* [Caml-list] Bug somewhere...
From: Alessandro Baretta @ 2002-10-06 22:57 UTC
To: Ocaml

It's either in my brain or in the Scanf module, the former possibility
being definitely more likely.

I have written a very simple program to compute md5 checksums of codes
taken from a text file. Here it is:

let scan_line () = Scanf.scanf "%[^\n\r]\n" (fun a -> a)
let digest s = String.uppercase (Digest.to_hex(Digest.string s))
let digest_line s = print_endline (s ^ "#" ^ (digest s))
let _ = try while true do digest_line (scan_line ()) done
        with End_of_file -> ()

Seems very reasonable...

Here's the input file:

(2002) DMD.CSB.1GL.001.01
(2002) DMD.CSB.1GL.001.02
(2002) DMD.CSB.1GL.001.03
(2002) DMD.CSB.1GL.001.04
(2002) DMD.CSB.1GL.001.05
(2002) DMD.CSB.1GL.001.06
(2002) DMD.CSB.1GL.001.07
(2002) DMD.CSB.1GL.001.08
(2002) DMD.CSB.1GL.001.09
(2002) DMD.CSB.1GL.001.10
(2002) DMD.CSB.1GL.001.11
(2002) DMD.CSB.1GL.001.12
(2002) DMD.CSB.1GL.001.13
(2002) DMD.CSB.1GL.001.14
(2002) DMD.CSB.1GL.001.15
(2002) DMD.CSB.1GL.001.16
(2002) DMD.CSB.1GL.001.17
(2002) DMD.CSB.1GL.001.18
(2002) DMD.CSB.1GL.001.19
(2002) DMD.CSB.1GL.001.20

Now here's the output file:

(2002) DMD.CSB.1GL.001.01#EA486F3F11C1D1E5BE6DDC2A444BC4E1
2002) DMD.CSB.1GL.001.02#4A3E838023756A5EE01C39D5DD02FC07
2002) DMD.CSB.1GL.001.03#605ED19A81C3B7748494038FEE93671A
2002) DMD.CSB.1GL.001.04#F475498E61CC896FA42B3869858B9B69
2002) DMD.CSB.1GL.001.05#60246106058EA46F7C5904F9A7D69FD7
2002) DMD.CSB.1GL.001.06#3FDF89041B44A8A3F5334B500A8B48A0
2002) DMD.CSB.1GL.001.07#657A508D402845454D5EAF0A2BC8380B
2002) DMD.CSB.1GL.001.08#230BDE6A530043CCB01434A6E19DB10E
2002) DMD.CSB.1GL.001.09#39CA6A302A6DE081DFC3BD24C8D4C38E
2002) DMD.CSB.1GL.001.10#BFBAE55D0808B5A8729E23459E45A617
2002) DMD.CSB.1GL.001.11#001F0B9F7F5EEDE05C8BA5A85F7D0F45
2002) DMD.CSB.1GL.001.12#77AB75131372E7FB723B280E084733B0
2002) DMD.CSB.1GL.001.13#1E605246D240D6B5735CDE40FF4614CC
2002) DMD.CSB.1GL.001.14#40970C955978A228AA308AB1B1169800
2002) DMD.CSB.1GL.001.15#7DED9C18A5700389CE670C9E8474C757
2002) DMD.CSB.1GL.001.16#8D396925D7867AF0BF2169B692EAECFF
2002) DMD.CSB.1GL.001.17#DEE78191DEF1E6BA7144AA14E29B8EE6
2002) DMD.CSB.1GL.001.18#F6E082FFD976B0A6721AC056C40C526E
2002) DMD.CSB.1GL.001.19#34F915DBF5B258C7BD4200C753C42BD1
2002) DMD.CSB.1GL.001.20#D310054DE7CF959F5946FABAF561FBEF

The '(' is only present on the first line, indicating--so it
seems--that scanf is eating away one more character than it should
every time.

Do I need brain surgery or is there really a problem with scanf?

Alex

-------------------
To unsubscribe, mail caml-list-request@inria.fr
Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs
FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

* Re: [Caml-list] Bug somewhere...
From: Alessandro Baretta @ 2002-10-06 23:06 UTC
To: Ocaml

Alessandro Baretta wrote:
> It's either in my brain or in the Scanf module, the former possibility
> being definitely more likely.
>
> I have written a very simple program to compute md5 checksums of codes
> taken from a text file. Here it is:
>
> let scan_line () = Scanf.scanf "%[^\n\r]\n" (fun a -> a)
> let digest s = String.uppercase
>    (Digest.to_hex(Digest.string s))
> let digest_line s = print_endline (s ^ "#" ^ (digest s))
> let _ = try while true do digest_line (scan_line ()) done
>         with End_of_file -> ()

I have rewritten my program in ocamllex. This one works.
Here it is.

{

}

rule scanline = parse
  | [^'\n''\r']* {Lexing.lexeme lexbuf}
  | ['\n''\r']*  {scanline lexbuf }
  | eof          {raise End_of_file}

{
let lexbuf = Lexing.from_channel stdin in
let digest s = String.uppercase
   (Digest.to_hex (Digest.string s)) in
let digest_line s = print_endline (s ^ "#" ^ (digest s)) in
try while true do digest_line (scanline lexbuf) done
with End_of_file -> ()
}

> Seems very reasonable...
>
> Here's the input file:
>
> (2002) DMD.CSB.1GL.001.01
> (2002) DMD.CSB.1GL.001.02
> (2002) DMD.CSB.1GL.001.03
> (2002) DMD.CSB.1GL.001.04
> (2002) DMD.CSB.1GL.001.05
> (2002) DMD.CSB.1GL.001.06
> (2002) DMD.CSB.1GL.001.07
> (2002) DMD.CSB.1GL.001.08
> (2002) DMD.CSB.1GL.001.09
> (2002) DMD.CSB.1GL.001.10
> (2002) DMD.CSB.1GL.001.11
> (2002) DMD.CSB.1GL.001.12
> (2002) DMD.CSB.1GL.001.13
> (2002) DMD.CSB.1GL.001.14
> (2002) DMD.CSB.1GL.001.15
> (2002) DMD.CSB.1GL.001.16
> (2002) DMD.CSB.1GL.001.17
> (2002) DMD.CSB.1GL.001.18
> (2002) DMD.CSB.1GL.001.19
> (2002) DMD.CSB.1GL.001.20

And the correct output:

(2002) DMD.CSB.1GL.001.01#EA486F3F11C1D1E5BE6DDC2A444BC4E1
(2002) DMD.CSB.1GL.001.02#DA0E405C9E982D4C51F9D21A2FAB5644
(2002) DMD.CSB.1GL.001.03#9D78774667150BBF2FE473CC149A72DB
(2002) DMD.CSB.1GL.001.04#72491ED198C8BAB5A659EF4730EBF76D
(2002) DMD.CSB.1GL.001.05#AE3CF2982E265B582725AFE770F685F8
(2002) DMD.CSB.1GL.001.06#8825A66BB3C4D1CEB362631C41FF0633
(2002) DMD.CSB.1GL.001.07#AE4F3D477E43943B044E05D5A0BDD498
(2002) DMD.CSB.1GL.001.08#84E0420BB0B52931EF839FB2673116D3
(2002) DMD.CSB.1GL.001.09#144ABD1E3136EBC4BF9642599340326A
(2002) DMD.CSB.1GL.001.10#92C65BDDFB8045D96D9B3DDE2580896C
(2002) DMD.CSB.1GL.001.11#AB9A737B83B040BCD4CE310977B3667B
(2002) DMD.CSB.1GL.001.12#20C1B0322756CC61D3792A6814FA175A
(2002) DMD.CSB.1GL.001.13#20C76BA308A80C93CA2A7FFCCBCD9696
(2002) DMD.CSB.1GL.001.14#BDD11EF273D429A7460E4A010F28AF8D
(2002) DMD.CSB.1GL.001.15#D55A8BEE54618241691AD349DB5D3B0A
(2002) DMD.CSB.1GL.001.16#D655BDC9DB0C22A2A03B718125884778
(2002) DMD.CSB.1GL.001.17#4EA753AEF91A7F497689DF1E43E0D083
(2002) DMD.CSB.1GL.001.18#B37C19DBE5ED47E9F3F9C8E257BC8F3E
(2002) DMD.CSB.1GL.001.19#A35BEE6D08F95935BFFC61ACFEAC54B7
(2002) DMD.CSB.1GL.001.20#FB357D47CF387E1EBFD94C9E79A1DD6A

What's wrong with the Scanf version?

Alex

* Re: [Caml-list] Bug somewhere...
From: Pierre Weis @ 2002-10-08 20:07 UTC
To: Alessandro Baretta; +Cc: caml-list

> Alessandro Baretta wrote:
> > It's either in my brain or in the Scanf module, the former possibility
> > being definitely more likely.
> >
> > I have written a very simple program to compute md5 checksums of codes
> > taken from a text file. Here it is:
> >
> > let scan_line () = Scanf.scanf "%[^\n\r]\n" (fun a -> a)
> > let digest s = String.uppercase
> >    (Digest.to_hex(Digest.string s))
> > let digest_line s = print_endline (s ^ "#" ^ (digest s))
> > let _ = try while true do digest_line (scan_line ()) done
> >         with End_of_file -> ()
>
> I have rewritten my program in ocamllex. This one works.
> Here it is.
>
> {
>
> }
>
> rule scanline = parse
>   | [^'\n''\r']* {Lexing.lexeme lexbuf}
>   | ['\n''\r']*  {scanline lexbuf }
>   | eof          {raise End_of_file}
>
> {
> let lexbuf = Lexing.from_channel stdin in
> let digest s = String.uppercase
>    (Digest.to_hex (Digest.string s)) in
> let digest_line s = print_endline (s ^ "#" ^ (digest s)) in
> try while true do digest_line (scanline lexbuf) done
> with End_of_file -> ()
>
> }
>
> > Seems very reasonable...
[...]
>
> What's wrong with the Scanf version?
>
> Alex

A lot of problems in here: some are due to the semantics of the Scanf
module, some are due to the implementation, some are even deeper than
those two!

Indeed the two programs are not equivalent (and their behaviours are
indeed different!).

The first reason is that you cannot match eof (as you did with your
lexer) using Scanf. This could be considered a missing feature, and
we may add a convention to match end of file (either ``@.'', ``@$'',
or ``$''?).

Second, your lexer uses an explicitly allocated buffer lexbuf, while
the corresponding scanf call allocates a new input buffer for each
invocation; but the semantics of Scanf imposes a lookahead of 1
character to check that no other \n follows the \n that ends your
pattern (the semantics of \n being to match 0 or more \n, space, tab,
or return). For each line, Scanf reads an extra character after the end
of line; it stores this character (which is a '(' by the way) in the
input buffer; but note that the character has been read from the
in_channel; now the next scanf invocation will allocate a new input
buffer that reads from stdin starting after the last character read by
the preceding invocation (the '(' lookahead character). Hence you
see that a '(' is missing at the beginning of each line after the
first one!

To solve this problem, you should use bscanf and an explicitly
allocated input buffer that would survive from one call to scanf to
the next one. Considering that this phenomenon is general concerning
stdin and scanf, I rewrote the scanf code such that it allocates a
buffer once and for all. Hence this problem is solved in the working
sources.

In the meantime, explicitly allocating an input buffer would solve
this problem for you:

let lexbuf = Scanf.Scanning.from_channel stdin
let scan_line () = Scanf.bscanf lexbuf "%[^\n\r]\n" (fun a -> a)
let digest s = String.uppercase (Digest.to_hex(Digest.string s))
let digest_line s = print_endline (s ^ "#" ^ (digest s))
let _ = try while true do digest_line (scan_line ()) done
        with End_of_file -> ()

Another semantic question is: should the call

sscanf "" "%[^\n\r]\n" (fun x -> x)

be successful or not? If yes, what happens to your problem?

An interesting example indeed that helps to make the semantics of
Scanf patterns and functions more precise, thank you very much!

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/
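
The workaround Pierre describes, one scanning buffer shared by every call, can be tried without stdin by scanning from a string. The following is only a minimal sketch written against a more recent standard library than the one discussed in the thread: it assumes String.uppercase_ascii (the later replacement for String.uppercase), and the names read_lines and tag are made up for illustration.

```ocaml
(* Sketch only: read_lines and tag are illustrative names, not from the thread. *)

(* Read every newline-terminated line from one shared scanning buffer.
   The same buffer [ib] is passed to each bscanf call, so whatever the
   scanner has already pulled from the source stays visible to the next call. *)
let read_lines (ib : Scanf.Scanning.in_channel) : string list =
  let rec loop acc =
    match Scanf.bscanf ib "%[^\n\r]\n" (fun line -> line) with
    | line -> loop (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  loop []

(* Append "#" and the upper-case hex MD5 digest, as digest_line does above.
   Assumption: String.uppercase_ascii stands in for String.uppercase. *)
let tag (s : string) : string =
  s ^ "#" ^ String.uppercase_ascii (Digest.to_hex (Digest.string s))

let () =
  let ib = Scanf.Scanning.from_string "(2002) A\n(2002) B\n" in
  List.iter (fun line -> print_endline (tag line)) (read_lines ib)
```

Replacing Scanf.Scanning.from_string with Scanf.Scanning.from_channel stdin gives the shape of the program in the message above. The sketch assumes newline-terminated input: a final line with no terminating newline would end the loop through End_of_file before being returned.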

* Re: [Caml-list] Bug somewhere...
From: Eric C. Cooper @ 2002-10-08 21:26 UTC
To: caml-list

On Tue, Oct 08, 2002 at 10:07:01PM +0200, Pierre Weis wrote:
> A lot of problems in here: some are due to the semantics of the Scanf
> module, some are due to the implementation, some are even deeper than
> those two!
> ...
> To solve this problem, you should use bscanf and an explicitly
> allocated input buffer that would survive from one call to scanf to
> the next one. Considering that this phenomenon is general concerning
> stdin and scanf, I rewrote the scanf code such that it allocates a
> buffer once and for all. Hence this problem is solved in the working
> sources.

In the C stdio library, this is solved by ungetc() (push back an
already-read character). That might be a useful addition to the
operations on in_channels.

--
Eric C. Cooper          e c c @ c m u . e d u
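
Eric's ungetc() idea can be sketched in OCaml as a thin wrapper that remembers one pushed-back character in front of a character source. This is a hypothetical illustration, not an existing operation on in_channels: the pushback type and the functions of_channel, of_string, get and unget are all invented names, and like C's ungetc() the sketch only promises a single character of pushback.

```ocaml
(* Sketch only: the pushback type and its functions are hypothetical
   names; nothing here is claimed to exist in the standard library. *)

type pushback = {
  next : unit -> char option;     (* underlying source; None at end of input *)
  mutable pending : char option;  (* at most one pushed-back character *)
}

(* Wrap an in_channel. *)
let of_channel (ic : in_channel) : pushback =
  { next = (fun () -> try Some (input_char ic) with End_of_file -> None);
    pending = None }

(* Wrap a string, handy for trying the idea without a file. *)
let of_string (s : string) : pushback =
  let pos = ref 0 in
  let next () =
    if !pos < String.length s then begin
      let c = s.[!pos] in
      incr pos;
      Some c
    end else None
  in
  { next; pending = None }

(* Return the pushed-back character if there is one, else read a new one. *)
let get (pb : pushback) : char option =
  match pb.pending with
  | Some _ as c -> pb.pending <- None; c
  | None -> pb.next ()

(* Push one character back; a second unget before a get overwrites the first. *)
let unget (pb : pushback) (c : char) : unit =
  pb.pending <- Some c
```

A scanner built on such a wrapper could peek at the character after a newline and hand it back, so the next read would still see the '(' that went missing in the original program.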

* Re: [Caml-list] Bug somewhere...
From: Alessandro Baretta @ 2002-10-08 23:31 UTC
To: Pierre Weis, Ocaml

Pierre Weis wrote:
>
> A lot of problems in here: some are due to the semantics of the Scanf
> module, some are due to the implementation, some are even deeper than
> those two!
>
> Indeed the two programs are not equivalent (and their behaviours are
> indeed different!).

They are meant to be equivalent under the following assumption: the
input file is divided into lines which are terminated by either '\n'
or '\r'. The difference is mostly due to the fact that Scanf 3.06
reads an extra character with respect to the specified format string.
Any other differences are attributable to faulty connections in my
brain.

> The first reason is that you cannot match eof (as you did with your
> lexer) using Scanf. This could be considered a missing feature, and
> we may add a convention to match end of file (either ``@.'', ``@$'',
> or ``$''?).

I can live with this. What Scanf *really lacks* is C-equivalent
support for partial matches. If a C format matches only partially,
only the conversions specified in the matched prefix are performed.
In O'Caml, Scanf throws an exception. A better solution would be for
Scanf.scanf to have type:

('a, Scanning.scanbuf, 'b) format -> 'a option -> 'b

If a conversion is performed then the callback function is passed
Some(<result>); otherwise, in a partial match, f gets a number of None
actual parameters from scanf. This approach would make Scanf much more
useful. We would be able to explicitly code simple parsers in Ocaml
logic and Scanf formats, when, at present, we would be forced to go
with Ocamllex/yacc. Take my case, for example.

> Second, your lexer uses an explicitly allocated buffer lexbuf, while
> the corresponding scanf call allocates a new input buffer for each
> invocation; but the semantics of Scanf imposes a lookahead of 1
> character to check that no other \n follows the \n that ends your
> pattern (the semantics of \n being to match 0 or more \n, space, tab,
> or return). For each line, Scanf reads an extra character after the end
> of line; it stores this character (which is a '(' by the way) in the
> input buffer; but note that the character has been read from the
> in_channel; now the next scanf invocation will allocate a new input
> buffer that reads from stdin starting after the last character read by
> the preceding invocation (the '(' lookahead character). Hence you
> see that a '(' is missing at the beginning of each line after the
> first one!

This behaviour is counterintuitive, and should be considered buggy.

> To solve this problem, you should use bscanf and an explicitly
> allocated input buffer that would survive from one call to scanf to
> the next one. Considering that this phenomenon is general concerning
> stdin and scanf, I rewrote the scanf code such that it allocates a
> buffer once and for all. Hence this problem is solved in the working
> sources.

Very good. Thank you very much.

> ...
> Another semantic question is: should the call
>
> sscanf "" "%[^\n\r]\n" (fun x -> x)
>
> be successful or not? If yes, what happens to your problem?

With the present semantics, it should raise an exception. With the
semantics of partial matches it should succeed.

> An interesting example indeed that helps to make the semantics of
> Scanf patterns and functions more precise, thank you very much!
>
> Pierre Weis

I humbly bow to your kindness. Thank you very much for sharing your
work with all of us.

Alex
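
Something in the spirit of Alessandro's option-returning proposal can be approximated in user code by catching the exceptions a scan may raise. This is a rough sketch under stated assumptions, not the interface proposed in the message: try_sscanf and parse_code are hypothetical helpers, and the result is all-or-nothing (None for any failed scan) rather than one option per conversion as in a C-style partial match.

```ocaml
(* Sketch only: try_sscanf is a hypothetical helper, not part of Scanf.
   It turns a failed scan into None instead of letting the exception escape. *)
let try_sscanf s fmt f =
  try Some (Scanf.sscanf s fmt f) with
  | Scanf.Scan_failure _ | Failure _ | End_of_file -> None

(* Parse a line shaped like "(year) code"; None if the line does not match. *)
let parse_code (line : string) : (int * string) option =
  try_sscanf line "(%d) %s" (fun year code -> (year, code))

let () =
  match parse_code "(2002) DMD.CSB.1GL.001.01" with
  | Some (year, code) -> Printf.printf "year=%d code=%s\n" year code
  | None -> print_endline "no match"
```

A caller can then chain several formats with ordinary pattern matching, trying the next one whenever the previous returns None, which covers simple line-oriented parsers without reaching for ocamllex.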

* Re: [Caml-list] Bug somewhere...
From: Pierre Weis @ 2002-10-07 8:03 UTC
To: Alessandro Baretta; +Cc: caml-list

> It's either in my brain or in the Scanf module, the former
> possibility being definitely more likely.
[...]
>
> The '(' is only present on the first line, indicating--so it
> seems--that scanf is eating away one more character than it
> should every time.
>
> Do I need brain surgery or is there really a problem with scanf?
>
> Alex

You probably discovered a bug in the implementation of the Scanf
module :(

I will correct it in the working sources as soon as possible.

However, you should report those bugs to caml-bugs@inria.fr instead
of reporting to this list. We have a bug tracking system which is much
easier to deal with for recording and tracking than the Caml mailing
list...

Best regards,

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/