From: Pierre Weis <pierre.weis@inria.fr>
To: alex@baretta.com (Alessandro Baretta)
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Bug somewhere...
Date: Tue, 8 Oct 2002 22:07:01 +0200 (MET DST)
Message-ID: <200210082007.WAA27767@pauillac.inria.fr>
In-Reply-To: <3DA0C1F3.5010308@baretta.com> from Alessandro Baretta at "Oct 7, 102 01:06:27 am"
> Alessandro Baretta wrote:
> > It's either on my brain or in the Scanf module, the former possibility
> > being definitely more likely.
> >
> > I have written a very simple program to compute md5 checksums of a codes
> > taken from a text file. Here it is:
> >
> > let scan_line () = Scanf.scanf "%[^\n\r]\n" (fun a -> a)
> > let digest s = String.uppercase
> > (Digest.to_hex(Digest.string s))
> > let digest_line s = print_endline (s ^ "#" ^ (digest s))
> > let _ = try while true do digest_line (scan_line ()) done
> > with End_of_file -> ()
>
> I have rewritten my program in ocamllex. This one works.
> Here it is.
>
> {
>
> }
>
> rule scanline = parse
> | [^'\n''\r']* {Lexing.lexeme lexbuf}
> | ['\n''\r']* {scanline lexbuf }
> | eof {raise End_of_file}
>
> {
> let lexbuf = Lexing.from_channel stdin in
> let digest s = String.uppercase
> (Digest.to_hex (Digest.string s)) in
> let digest_line s = print_endline (s ^ "#" ^ (digest s)) in
> try while true do digest_line (scanline lexbuf) done
> with End_of_file -> ()
>
> }
>
> > Seems very reasonable...
[...]
>
> What's wrong with the Scanf version?
>
> Alex
A lot of problems in here: some are due to the semantics of the Scanf
module, some are due to the implementation, and some are even deeper than
those two!
Indeed the two programs are not equivalent (and their behaviours are
indeed different!).
The first reason is that you cannot match eof (as you did with your
lexer) using Scanf. This could be considered as a missing feature and
we may add a convention to match end of file (either ``@.'', ``@$'',
or ``$'' ?).
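For readers on a current OCaml: the standard library's Scanf documentation
describes a ``%!'' conversion that matches the end-of-input condition, and a
function Scanf.Scanning.end_of_input. Whether that is the convention that
grew out of this discussion is not stated here. A minimal sketch, where the
helper name only_one_token is hypothetical:

```ocaml
(* Sketch for a current OCaml standard library, not the 2002 one.
   only_one_token is a hypothetical helper name. *)

(* True when [s] holds a single whitespace-free token followed directly by
   end of input: "%s" reads up to the next whitespace, "%!" then requires
   that nothing is left. *)
let only_one_token (s : string) : bool =
  try Scanf.sscanf s "%s%!" (fun _ -> true)
  with Scanf.Scan_failure _ -> false

let () =
  Printf.printf "%b %b\n" (only_one_token "abc") (only_one_token "abc def")
```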
Second, your lexer uses an explicitly allocated buffer lexbuf, while
the corresponding scanf call allocates a new input buffer for each
invocation. The semantics of Scanf imposes a lookahead of 1
character to check that no other \n follows the \n that ends your
pattern (the semantics of \n being to match 0 or more \n, space, tab,
or return). So for each line Scanf reads an extra character after the end
of line and stores it (it is a '(' by the way) in the
input buffer. Note that this character has already been read from the
in_channel: the next scanf invocation allocates a new input
buffer that reads from stdin starting after the last character read by
the preceding invocation (the '(' lookahead character). Hence you
see that a '(' is missing at the beginning of each line after the
first one!
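The mechanism can be modelled on a current OCaml without touching stdin. The
sketch below is only a model, under two assumptions: a closure over a string
(make_source, a hypothetical helper) stands in for the channel, and the
format ends with a space instead of \n, because in current releases it is the
space that matches any amount of whitespace and therefore forces the
lookahead. Allocating a scanning buffer per call loses the lookahead
character; keeping one buffer does not.

```ocaml
(* Model of the lost lookahead character. make_source, scan_line_fresh and
   scan_line_shared are hypothetical names used for illustration only. *)

(* A character source standing in for stdin: once a character has been
   handed out it is gone, as with a real channel. *)
let make_source (s : string) : unit -> char =
  let pos = ref 0 in
  fun () ->
    if !pos >= String.length s then raise End_of_file
    else begin
      let c = s.[!pos] in
      incr pos;
      c
    end

(* Scan one line through a scanning buffer allocated for this call only. *)
let scan_line_fresh (next : unit -> char) : string =
  let ib = Scanf.Scanning.from_function next in
  Scanf.bscanf ib "%[^\n\r] " (fun line -> line)

(* Scan one line through a buffer that survives from call to call. *)
let scan_line_shared (ib : Scanf.Scanning.in_channel) : string =
  Scanf.bscanf ib "%[^\n\r] " (fun line -> line)

let input = "(a)\n(b)\n(c)\n"

(* A new buffer per call: the '(' read as lookahead is thrown away. *)
let fresh_lines : string list =
  let next = make_source input in
  let l1 = scan_line_fresh next in
  let l2 = scan_line_fresh next in
  let l3 = scan_line_fresh next in
  [l1; l2; l3]

(* One buffer for all calls: the lookahead character is still there. *)
let shared_lines : string list =
  let ib = Scanf.Scanning.from_function (make_source input) in
  let l1 = scan_line_shared ib in
  let l2 = scan_line_shared ib in
  let l3 = scan_line_shared ib in
  [l1; l2; l3]

let () =
  List.iter print_endline fresh_lines;
  List.iter print_endline shared_lines
```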
To solve this problem, you should use bscanf and an explicitly
allocated input buffer that survives from one call to scanf to
the next one. Considering that this phenomenon is general concerning
stdin and scanf, I rewrote the scanf code such that it allocates a
buffer once and for all. Hence this problem is solved in the working
sources.
In the meantime, explicitly allocating an input buffer would solve
this problem for you:
let lexbuf = Scanf.Scanning.from_channel stdin
let scan_line () = Scanf.bscanf lexbuf "%[^\n\r]\n" (fun a -> a)
let digest s = String.uppercase
  (Digest.to_hex(Digest.string s))
let digest_line s = print_endline (s ^ "#" ^ (digest s))
let _ = try while true do digest_line (scan_line ()) done
  with End_of_file -> ()
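To try the same idea without stdin on a current OCaml, here is a sketch under
a few assumptions: the scanning buffer is built from a string, the loop is
ended with Scanf.Scanning.end_of_input instead of an exception,
String.uppercase_ascii replaces String.uppercase (its name in recent standard
libraries), and every input line ends with a newline. The function name
digest_lines is hypothetical.

```ocaml
(* Sketch for a current OCaml; digest_lines is a hypothetical name. *)

(* Upper-case hexadecimal MD5 digest of a string. *)
let digest (s : string) : string =
  String.uppercase_ascii (Digest.to_hex (Digest.string s))

(* Read every line from one scanning buffer and pair it with its digest.
   The same buffer [ib] is used for all calls to bscanf, so nothing read
   ahead by one call is lost to the next. Assumes each line ends with a
   newline. *)
let digest_lines (ib : Scanf.Scanning.in_channel) : string list =
  let rec loop acc =
    if Scanf.Scanning.end_of_input ib then List.rev acc
    else
      let line = Scanf.bscanf ib "%[^\n\r]\n" (fun a -> a) in
      loop ((line ^ "#" ^ digest line) :: acc)
  in
  loop []

let () =
  Scanf.Scanning.from_string "(a)\n(b)\n"
  |> digest_lines
  |> List.iter print_endline
```

For real input, the buffer would be allocated once from stdin, as in the
program above, and passed to digest_lines.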
Another semantic question is: should the call
sscanf "" "%[^\n\r]\n" (fun x -> x)
be successful or not? If yes, what happens to your problem?
An interesting example indeed, one that helps to make the semantics of
Scanf patterns and functions more precise, thank you very much!
Pierre Weis
INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/