Mailing list for all users of the OCaml language and system.
From: Alessandro Baretta <alex@baretta.com>
To: Pierre Weis <pierre.weis@inria.fr>, Ocaml <caml-list@inria.fr>
Subject: Re: [Caml-list] Bug somewhere...
Date: Wed, 09 Oct 2002 01:31:38 +0200
Message-ID: <3DA36ADA.9050507@baretta.com>
In-Reply-To: <200210082007.WAA27767@pauillac.inria.fr>



Pierre Weis wrote:
> 
> A lot of problems in here: some are due to the semantics of the Scanf
> module some are due to the implementation, some are even deeper than
> those two!
> 
> Indeed the two programs are not equivalent (and their behaviour are
> indeed different!).

They are meant to be equivalent under the following 
assumption: the input file is divided into lines, each 
terminated by either '\n' or '\r'. The difference is mostly 
due to the fact that Scanf 3.06 reads one character beyond 
what the format string specifies. Any other 
differences are attributable to faulty connections in my brain.

> The first reason is that you cannot match eof (as you did with your
> lexer) using Scanf. This could be considered as a missing feature and
> we may add a convention to match end of file (either ``@.'', ``@$'',
> or ``$'' ?).

I can live with this. What Scanf *really lacks* is support 
for partial matches equivalent to C's. If a C format 
matches only partially, only the conversions specified in 
the matched prefix are performed. In OCaml, Scanf raises an 
exception. A better solution would be for Scanf.scanf to 
have type:
('a, Scanning.scanbuf, 'b) format -> 'a option -> 'b
For each conversion that is performed, the callback function 
f is passed Some(<result>); in a partial match, f receives 
None for each conversion that was not performed.

This approach would make Scanf much more useful. We would be 
able to explicitly code simple parsers in OCaml logic and 
Scanf formats, when, at present, we would be forced to go 
with ocamllex/ocamlyacc. Take my case, for example.
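
A minimal sketch of how the partial-match behaviour described above 
might be emulated with the existing Scanf, assuming a current OCaml 
standard library. The helper name scan_two_ints is hypothetical, and 
it returns options instead of passing them to a callback, so it only 
approximates the proposed type:

```ocaml
(* Hypothetical helper, not part of Scanf: scan up to two integers
   from a string and report how far the match got, C-style. *)
let scan_two_ints (s : string) : int option * int option =
  (* One scanning buffer shared by both conversions. *)
  let ib = Scanf.Scanning.from_string s in
  (* Try a single integer conversion; any scanning error becomes None. *)
  let next_int () =
    try Some (Scanf.bscanf ib " %d" (fun n -> n))
    with Scanf.Scan_failure _ | Failure _ | End_of_file -> None
  in
  match next_int () with
  | None -> (None, None)                 (* nothing matched *)
  | Some a -> (Some a, next_int ())      (* full or partial match *)
```

Splitting the format into one call per conversion is what makes the 
matched prefix observable: each conversion either yields Some value 
or fails on its own, without discarding the results obtained so far.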

> Second, your lexer uses an explicitly allocated buffer lexbuf, while
> the scanf corresponding call allocates a new input buffer for each
> invocation; but the semantics of Scanf imposes a look ahead of 1
> character to check that no other \n follows the \n that ends your
> pattern (the semantics of \n being to match 0 or more \n, space, tab,
> or return). For each line Scanf reads an extra character after the end
> of line; it stores this character (which is a '(' by the way) in the
> input buffer; but note that the character has been read from the
> in_channel; now the next scanf invocation will allocate a new input
> buffer that reads from stdin starting after the last character read by
> the preceding invocation (the '(' lookahead character). Hence you
> see that a '(' is missing at the beginning of each line after the
> first one!

This behaviour is counterintuitive, and should be considered 
buggy.

> To solve this problem, you should use bscanf and an explicitly
> allocated input buffer that would survive from one call to scanf to
> the next one. Considering that this phenomenon is general concerning
> stdin and scanf, I rewrote the scanf code such that it allocates a
> buffer once and for all. Hence this problem is solved in the working
> sources.

Very good. Thank you very much.
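
For illustration, a sketch of the bscanf approach suggested above, 
assuming a current OCaml standard library. The function name 
read_lines is hypothetical, and Scanning.from_string stands in for 
stdin so that the example is self-contained. The trailing space in 
the format skips any run of whitespace, which roughly corresponds to 
the '\n' semantics described in the quoted text:

```ocaml
(* Hypothetical helper: read every line from one scanning buffer.
   Because the same buffer is passed to each bscanf call, a lookahead
   character read after the end of a line stays in the buffer and is
   seen by the next call instead of being lost. *)
let read_lines (ib : Scanf.Scanning.in_channel) : string list =
  let rec loop acc =
    if Scanf.Scanning.end_of_input ib then List.rev acc
    else
      (* "%[^\n\r]" reads up to the line terminator; the space in the
         format then skips newlines, returns, tabs and spaces. *)
      let line = Scanf.bscanf ib "%[^\n\r] " (fun s -> s) in
      loop (line :: acc)
  in
  loop []

let lines =
  read_lines (Scanf.Scanning.from_string "(a b)\n(c d)\r\n(e f)")
```

Note that the space in the format also swallows blank lines and the 
leading whitespace of the following line, so this sketch is only 
suitable for input where that is acceptable.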

> ...
> Another semantical question is: should the call
> 
> sscanf "" "%[^\n\r]\n" (fun x -> x)
> 
> be successful or not ? If yes, what happens to your problem ?

With the present semantics, it should raise an exception. 
With the semantics of partial matches it should succeed.

> An interesting example indeed that helps precising the semantics of
> Scanf patterns and functions, thank you very much!
> 
> Pierre Weis

I humbly bow to your kindness. Thank you very much for 
sharing your work with all of us.

Alex

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

