ocamlyacc/ocamllex problems

Mailing list for all users of the OCaml language and system.

From: Laurent Reveillere @ 2001-01-31 17:16 UTC
  To: caml-list

I am writing a parser that uses Parsing.rhs_start and Parsing.rhs_end in
a rule. The problem is the following:

1) If I use a single lexer rule that matches the whole token, all is fine.
ex:
    |   "'" ['0' '1' '*' '.']+ "'"  { ... }

2) If I use a sub-automaton in the lexer to match the same token, the
results of Parsing.rhs_start and Parsing.rhs_end are wrong.
ex:
    | "'"  { ... bits lexbuf ... }
and bits = parse
    | '\'' { ... }
    | ['0' '1' '.' '*' ] { ... }
    | eof  { ... }
    | _    { ... }

Here is the output of a debug printf of the rhs_start and rhs_end values:
case 1) Debug: pats='1..00000'       at (964,965)
case 2) Debug: pats='1..00000'       at (955,965)


I am not sure I understand the reason for this problem.


-- 
Laurent




* Re: ocamlyacc/ocamllex problems
From: Xavier Leroy @ 2001-02-01 14:22 UTC
  To: Laurent Reveillere; +Cc: caml-list

> I am writing a parser that uses Parsing.rhs_start and Parsing.rhs_end in
> a rule.  The problem is the following,
> 
> 1) If I use a simple rule in the lexer that matches a token all is fine.
> ex:
>     |   "'" ['0' '1' '*' '.']+ "'"  { ... }
> 
> 2) If I use a sub-automaton in the lexer to match the same token, the
> results of Parsing.rhs_start and Parsing.rhs_end are wrong.
> ex:	
>     | "'"  { ... bits lexbuf ... }
> and bits = parse
>     | '\'' { ... }
>     | ['0' '1' '.' '*' ] { ... }
>     | eof  { ... }
>     | _    { ... }
>
> I am not sure I understand the reason for this problem.

For terminal symbols (tokens), the locations returned by
Parsing.rhs_start and Parsing.rhs_end are those returned by
Lexing.lexeme_start and Lexing.lexeme_end.  However, these two
functions track the location of the *last* regular expression matched by
the ocamllex-generated automaton.  (This location is stored and
updated in place in the "lexbuf" argument.)

So, if your lexing rule recursively calls other lexing rules (as in
case 2 above), the locations reported correspond to the part of the
token that was last matched by a regular expression (i.e. the last
"bit" of the token in your example 2).
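
The effect described above can be reproduced with a toy model in plain OCaml. This is only a sketch: the record toy_lexbuf and the function match_chars are hypothetical stand-ins, far simpler than the real Lexing.lexbuf, and the offsets start at 0 rather than at the values seen in the debug output.

```ocaml
(* Toy model of the two position fields a lexer buffer keeps.
   This record is hypothetical and much simpler than Lexing.lexbuf. *)
type toy_lexbuf = { mutable start_pos : int; mutable curr_pos : int }

(* Every regexp match restarts the lexeme at the current position
   and advances by the length of the matched text. *)
let match_chars buf n =
  buf.start_pos <- buf.curr_pos;
  buf.curr_pos <- buf.curr_pos + n

(* Case 1: one regexp matches the whole 10-character token. *)
let single_rule () =
  let buf = { start_pos = 0; curr_pos = 0 } in
  match_chars buf 10;
  (buf.start_pos, buf.curr_pos)

(* Case 2: opening quote, eight pattern characters, closing quote,
   each matched by a separate regexp. *)
let nested_rule () =
  let buf = { start_pos = 0; curr_pos = 0 } in
  match_chars buf 1;
  for _i = 1 to 8 do match_chars buf 1 done;
  match_chars buf 1;
  (buf.start_pos, buf.curr_pos)

let () =
  let s1, e1 = single_rule () and s2, e2 = nested_rule () in
  Printf.printf "single rule: (%d,%d)\nnested rule: (%d,%d)\n" s1 e1 s2 e2
```

In this model the single rule reports the span of the whole token, while the nested rule reports only the span of the last match (the closing quote), because each match overwrites the start position.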

To get correct locations in example 2, a bit of "lexbuf" hacking is
required to restore the start location to what it was when the first
regexp was matched:

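| "'"  { (* Remember where the opening quote started. *)
         let start = Lexing.lexeme_start lexbuf in
         let res = ... bits lexbuf ... in
         (* lex_start_pos is relative to the buffer, hence the lex_abs_pos offset. *)
         lexbuf.Lexing.lex_start_pos <- start - lexbuf.Lexing.lex_abs_pos;
         res }
and bits = parse ...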
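
The same save-and-restore idea can be packaged as a small helper. The sketch below targets later OCaml releases, where the lexbuf also carries lex_start_p and lex_curr_p position records; it assumes that Lexing.lexeme_start and the Parsing functions read lex_start_p in those releases, which should be checked against the version in use. The names with_saved_start and fake_match are hypothetical: fake_match only imitates the position updates of one regexp match so that the example runs without ocamllex.

```ocaml
(* Hypothetical helper, not part of the Lexing module: run a
   sub-rule, then restore the start position saved beforehand. *)
let with_saved_start (lexbuf : Lexing.lexbuf)
    (sub_rule : Lexing.lexbuf -> 'a) : 'a =
  let start_p = lexbuf.Lexing.lex_start_p in
  let result = sub_rule lexbuf in
  lexbuf.Lexing.lex_start_p <- start_p;
  result

(* Stand-in for one regexp match of [n] characters. It imitates the
   position updates only and does not read the buffer. *)
let fake_match (lexbuf : Lexing.lexbuf) (n : int) : unit =
  let curr = lexbuf.Lexing.lex_curr_p in
  lexbuf.Lexing.lex_start_p <- curr;
  lexbuf.Lexing.lex_curr_p <-
    { curr with Lexing.pos_cnum = curr.Lexing.pos_cnum + n }

(* Opening quote, then a simulated "bits" rule wrapped in the helper. *)
let span_with_restore () =
  let lexbuf = Lexing.from_string "'1..00000'" in
  fake_match lexbuf 1;
  with_saved_start lexbuf (fun lb ->
    for _i = 1 to 8 do fake_match lb 1 done;
    fake_match lb 1);
  (Lexing.lexeme_start lexbuf, Lexing.lexeme_end lexbuf)

let () =
  let s, e = span_with_restore () in
  Printf.printf "token span: (%d,%d)\n" s e
```

In a real .mll file the action would read something like { with_saved_start lexbuf bits }, with bits being the sub-rule from example 2.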

Hope this helps,

- Xavier Leroy
