lexing strings

Mailing list for all users of the OCaml language and system.

* lexing strings
@ 1997-06-01 23:53 Lyn A Headley
  1997-06-02 15:35 ` Pierre Weis
From: Lyn A Headley @ 1997-06-01 23:53 UTC
  To: caml-list

hi,

I pored over the flex/bison manuals without finding an answer,
so I hope this question is nontrivial.

I'm having a rough time lexing strings with ocamllex in a simple
read-eval-print interpreter whose main grammar rule is:

expr EOL                { $1 }

with one of the 'expr' rules like this:
| STRING                  { (O.String $1) }

My lex file has a rule like this:

| '\''           { slurp lexbuf }

which lexes the string recursively using the rule 'slurp'. slurp's main
regex looks like this:

[^'\n']*[^'\\']'\''

which should match any sequence of non-newlines until it reaches a '
not preceded by a backslash.  slurp returns the token STRING(!build).

My intent, when reading a string, is for the lexer to see the first ',
jump into 'slurp,' eat up the string and return it as the STRING token,
then have the parser read a newline and return EOL, thus matching the
main grammar rule and printing the result.  This almost works, but not
until the user types _two_ newlines will the "interpreter" respond
by printing the expression value! i.e., typing

'hi' [newline]

at the prompt is not enough; two newlines are required.  Other than
that, the expected value is returned.  Does this mean that the first
newline is interpreted as part of the STRING?  Why would my regex match
the newline?

any help appreciated,

Lyn Headley


* Re: lexing strings
  1997-06-01 23:53 lexing strings Lyn A Headley
@ 1997-06-02 15:35 ` Pierre Weis
From: Pierre Weis @ 1997-06-02 15:35 UTC
  To: Lyn A Headley; +Cc: caml-list

> [^'\n']*[^'\\']'\''
> 
> which should match any sequence of non-newlines until it reaches a '
> not preceded by a backslash.  slurp returns the token STRING(!build).
> 
> My intent, when reading a string, is for the lexer to see the first ',
> jump into 'slurp,' eat up the string and return it as the STRING token,
> then have the parser read a newline and return EOL, thus matching the
> main grammar rule and printing the result.  This almost works, but not
> until the user types _two_ newlines will the "interpreter" respond
> by printing the expression value! i.e., typing
> 
> 'hi' [newline]
> 
> at the prompt is not enough; two newlines are required.  Other than
> that, the expected value is returned.  Does this mean that the first
> newline is interpreted as part of the STRING?  Why would my regex match
> the newline?
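
The two-newline behaviour can be reproduced outside ocamllex. What follows is only a sketch in plain OCaml, not what ocamllex generates: it hand-simulates the regexp [^'\n']*[^'\\']'\'' under the usual longest-match rule, on the input that remains after the opening quote has been consumed. The names `outcome` and `scan` are made up for this illustration.

```ocaml
(* Hand-written simulation of the regexp  [^'\n']* [^'\\'] '\''
   under longest-match semantics. Hypothetical helper, not ocamllex output. *)

type outcome =
  | Need_more_input of int option  (* input exhausted, a longer match is still possible *)
  | Decided of int option * int    (* best match length, characters examined *)

let scan (input : string) : outcome =
  let n = String.length input in
  (* s0: still inside the starred part; s1: the single [^'\\'] has been read *)
  let rec go i (s0, s1) best =
    if not (s0 || s1) then Decided (best, i)   (* no transition left: decide *)
    else if i >= n then Need_more_input best   (* must wait for more characters *)
    else
      let c = input.[i] in
      let s0' = s0 && c <> '\n' in             (* the star rejects newlines only *)
      let s1' = s0 && c <> '\\' in             (* this class rejects backslash only *)
      let best = if s1 && c = '\'' then Some (i + 1) else best in
      go (i + 1) (s0', s1') best
  in
  go 0 (true, false) None

let describe = function
  | Need_more_input _ -> "still waiting for input"
  | Decided (Some len, read) ->
      Printf.sprintf "matched %d chars after examining %d" len read
  | Decided (None, read) ->
      Printf.sprintf "no match after examining %d chars" read

let () =
  List.iter (fun s -> print_endline (describe (scan s))) [ "hi'\n"; "hi'\n\n" ]
```

After hi' and one newline the simulated matcher is still alive, because the newline is accepted by [^'\\'] and a closing ' could still follow. Only the second newline kills it, at which point it falls back to the three-character match hi'.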

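Yes, 'hi'\n' matches your regexp: [^'\\'] excludes only the backslash,
so it also accepts a newline, and the lexer has to read past the first
newline to rule out a longer match. I guess you want something along the
lines of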

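and slurp = parse
    "'"
    { (* build holds the characters in reverse order *)
      let chars = List.rev !build in
      build := [];  (* reset for the next string, in case nothing else does *)
      STRING chars }
  | '\\' "'"
    { build := '\'' :: !build;
      slurp lexbuf }
  | eof
    { raise (Lexical_error "unterminated slurp") }
  | _
    { (* any other character: fetch it from the lexeme just matched *)
      build := Lexing.lexeme_char lexbuf 0 :: !build;
      slurp lexbuf }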

Hope this helps,

(Note: you need to define the exception Lexical_error, carrying a string
argument, in order to signal the error "unterminated slurp".)
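
For experimenting without running ocamllex, the same character-by-character idea can be sketched as an ordinary OCaml function over a string. This is an illustration under assumptions, not part of the lexer above: the function name `slurp` and its `start` parameter are hypothetical, and a Buffer stands in for the reversed character list.

```ocaml
exception Lexical_error of string

(* Read a string body from [s] starting at index [start], i.e. just after
   the opening quote. Returns the decoded string and the index of the
   character following the closing quote. *)
let slurp (s : string) (start : int) : string * int =
  let buf = Buffer.create 16 in
  let n = String.length s in
  let rec go i =
    if i >= n then raise (Lexical_error "unterminated slurp")
    else
      match s.[i] with
      | '\'' -> (Buffer.contents buf, i + 1)      (* closing quote: done *)
      | '\\' when i + 1 < n && s.[i + 1] = '\'' ->
          Buffer.add_char buf '\'';               (* escaped quote *)
          go (i + 2)
      | c ->
          Buffer.add_char buf c;                  (* any other character *)
          go (i + 1)
  in
  go start
```

Called on the text hi' followed by a newline, with start 0, it stops at the first unescaped quote and leaves the newline unread, so the next token can be EOL.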

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/