* [Caml-list] Array.make exception and parser @ 2011-01-04 13:41 Jean Krivine 2011-01-04 14:56 ` Daniel Bünzli ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Jean Krivine @ 2011-01-04 13:41 UTC (permalink / raw) To: caml users [-- Attachment #1: Type: text/plain, Size: 509 bytes --] Hi all, I am encountering a weird problem, I am trying to parse a very large file (around 1.2 GB) according to a grammar defined in ocamlyacc. During the parsing I get the exception Invalid_argument "Array.make". This is strange because I am not using any array. My guess it that a big chunk of the file I am parsing is matching a non terminal, something like rule: non_term END {blah}; where non_term is going to be 1GB of characters. Does anyone know what could be raising the exception ? Thanks! J [-- Attachment #2: Type: text/html, Size: 585 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 13:41 [Caml-list] Array.make exception and parser Jean Krivine @ 2011-01-04 14:56 ` Daniel Bünzli 2011-01-04 14:57 ` Daniel Bünzli 2011-01-04 14:57 ` Török Edwin 2011-01-05 13:37 ` Xavier Leroy 2 siblings, 1 reply; 26+ messages in thread From: Daniel Bünzli @ 2011-01-04 14:56 UTC (permalink / raw) To: Jean Krivine; +Cc: caml users Compile and link in debug mode (-g flag, or debug tag with ocamlbuild). Run your program with : > export OCAMLRUNPARAM='b' myprogram.native You should get a backtrace to the offender. If the trace is not precise enough compile in bytecode. Best, Daniel ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 14:56 ` Daniel Bünzli @ 2011-01-04 14:57 ` Daniel Bünzli 0 siblings, 0 replies; 26+ messages in thread From: Daniel Bünzli @ 2011-01-04 14:57 UTC (permalink / raw) To: Jean Krivine; +Cc: caml users > export OCAMLRUNPARAM='b' myprogram.native export OCAMLRUNPARAM='b'; myprogram.native Sorry, Daniel ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 13:41 [Caml-list] Array.make exception and parser Jean Krivine 2011-01-04 14:56 ` Daniel Bünzli @ 2011-01-04 14:57 ` Török Edwin 2011-01-04 15:14 ` Jean Krivine 2011-01-05 13:37 ` Xavier Leroy 2 siblings, 1 reply; 26+ messages in thread From: Török Edwin @ 2011-01-04 14:57 UTC (permalink / raw) To: caml-list On 2011-01-04 15:41, Jean Krivine wrote: > Hi all, > > I am encountering a weird problem, I am trying to parse a very large file > (around 1.2 GB) according to a grammar defined in ocamlyacc. On a 32-bit or 64-bit architecture? > During the parsing I get the exception > > Invalid_argument "Array.make". > > This is strange because I am not using any array. You can try calling 'Printexc.record_backtrace true' on startup, and compile with -g. If you compile to bytecode the stacktrace will usually be better, but that is not always accurate either. Another way is to run your bytecode in ocamldebug, and use 'backstep' from the point where exception is raised. > My guess it that a big chunk of the file I am parsing is matching a non > terminal, something like > > rule: > non_term END {blah}; > > where non_term is going to be 1GB of characters. Does anyone know what > could be raising the exception ? Trying to allocate a too big array: #ifdef ARCH_SIXTYFOUR #define Max_wosize (((intnat)1 << 54) - 1) #else #define Max_wosize ((1 << 22) - 1) #endif if (wsize > Max_wosize) caml_invalid_argument("Array.make"); > > Thanks! > J > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 14:57 ` Török Edwin @ 2011-01-04 15:14 ` Jean Krivine 2011-01-04 15:31 ` Daniel Bünzli ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Jean Krivine @ 2011-01-04 15:14 UTC (permalink / raw) To: Török Edwin; +Cc: caml-list [-- Attachment #1: Type: text/plain, Size: 1878 bytes --] Thanks, it seems the exception is raised by the function grow_stacks() in the 'Parsing' module of Ocaml (not mine). Indeed, looking at the code I can see several calls to Array.create that are not encapsulated in a try .. with. Isn't that a bug ? Anyway I think I found the answer of my problem. Cheers! 2011/1/4 Török Edwin <edwintorok@gmail.com> > On 2011-01-04 15:41, Jean Krivine wrote: > > Hi all, > > > > I am encountering a weird problem, I am trying to parse a very large file > > (around 1.2 GB) according to a grammar defined in ocamlyacc. > > On a 32-bit or 64-bit architecture? > > > During the parsing I get the exception > > > > Invalid_argument "Array.make". > > > > This is strange because I am not using any array. > > You can try calling 'Printexc.record_backtrace true' on startup, and > compile with -g. If you compile to bytecode the stacktrace will usually > be better, but that is not always accurate either. > > Another way is to run your bytecode in ocamldebug, and use 'backstep' > from the point where exception is raised. > > > My guess it that a big chunk of the file I am parsing is matching a non > > terminal, something like > > > > rule: > > non_term END {blah}; > > > > where non_term is going to be 1GB of characters. Does anyone know what > > could be raising the exception ? > > Trying to allocate a too big array: > > #ifdef ARCH_SIXTYFOUR > #define Max_wosize (((intnat)1 << 54) - 1) > #else > #define Max_wosize ((1 << 22) - 1) > #endif > if (wsize > Max_wosize) caml_invalid_argument("Array.make"); > > > > Thanks! > > J > > > > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > > [-- Attachment #2: Type: text/html, Size: 2698 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 15:14 ` Jean Krivine @ 2011-01-04 15:31 ` Daniel Bünzli 2011-01-04 16:22 ` Yitzhak Mandelbaum 2011-01-04 17:04 ` Jean Krivine 2011-01-04 15:38 ` bluestorm [not found] ` <1259991756.440008.1294155536392.JavaMail.root@zmbs2.inria.fr> 2 siblings, 2 replies; 26+ messages in thread From: Daniel Bünzli @ 2011-01-04 15:31 UTC (permalink / raw) To: Jean Krivine; +Cc: caml-list > I can see several calls to Array.create that are > not encapsulated in a try .. with. Isn't that a bug ? That's not the bug. The Invalid_argument exception is never supposed to be caught, it denotes a programming error from the client of the module. The bug lies in calling Array.make with a value larger than Sys.max_array_length. Best, Daniel P.S. I don't know if that's your case but so many languages to parse are LL(k) for some k. I don't really understand why people insist on using yacc like parser generators where a recursive descent parser with some combinators fits perfectly, seems nearly as efficient and allow you to give more precise syntax errors to your users. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 15:31 ` Daniel Bünzli @ 2011-01-04 16:22 ` Yitzhak Mandelbaum 2011-01-04 16:42 ` Daniel Bünzli 2011-01-04 17:04 ` Jean Krivine 1 sibling, 1 reply; 26+ messages in thread From: Yitzhak Mandelbaum @ 2011-01-04 16:22 UTC (permalink / raw) To: Daniel Bünzli; +Cc: Jean Krivine, caml-list For one perspective on this issue, you might want to take a look here: http://lambda-the-ultimate.org/node/4150 A short answer (among many) to your particular question: it often requires a lot off effort just to discover whether " a recursive descent parser with some combinators fits perfectly," and many, many grammars do not fit perfectly. On Jan 4, 2011, at 10:31 AM, Daniel Bünzli wrote: > P.S. > > I don't know if that's your case but so many languages to parse are > LL(k) for some k. I don't really understand why people insist on using > yacc like parser generators where a recursive descent parser with some > combinators fits perfectly, seems nearly as efficient and allow you to > give more precise syntax errors to your users. ----------------------------- Yitzhak Mandelbaum ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 16:22 ` Yitzhak Mandelbaum @ 2011-01-04 16:42 ` Daniel Bünzli 2011-01-04 17:03 ` Yitzhak Mandelbaum 0 siblings, 1 reply; 26+ messages in thread From: Daniel Bünzli @ 2011-01-04 16:42 UTC (permalink / raw) To: Yitzhak Mandelbaum; +Cc: Jean Krivine, caml-list > A short answer (among many) to your particular question: it often requires a lot off effort just to discover whether " a recursive descent parser with some > combinators fits perfectly," Well I don't know what "a lot of effort" means to you. But IIRC my compiler course it's a matter of taking your EBNF grammar, eliminating left recursion and left factoring the grammar. If you can't do that with your grammar then you cannot parse it with LL(k). Best, Daniel ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 16:42 ` Daniel Bünzli @ 2011-01-04 17:03 ` Yitzhak Mandelbaum 0 siblings, 0 replies; 26+ messages in thread From: Yitzhak Mandelbaum @ 2011-01-04 17:03 UTC (permalink / raw) To: Daniel Bünzli; +Cc: Caml-list List Daniel, You're right about those transformations. Yet, when your grammar has hundreds of nonterminals, most or all with semantic actions attached, the transformations you describe can be quite tedious and error-prone. Tools can certainly help here, but that would require people being familiar with them. Its usually easier just to take Yacc for a spin than to go that effort. I'm not trying to advocate for one approach over another. I was just trying to answer your question about why people do the things they do. Also, if you have any further questions, may I suggest we take it off the caml-list? I think we've wandered off-topic here. :-) -- Yitzhak On Jan 4, 2011, at 11:42 AM, Daniel Bünzli wrote: >> A short answer (among many) to your particular question: it often requires a lot off effort just to discover whether " a recursive descent parser with some >> combinators fits perfectly," > > Well I don't know what "a lot of effort" means to you. But IIRC my > compiler course it's a matter of taking your EBNF grammar, eliminating > left recursion and left factoring the grammar. If you can't do that > with your grammar then you cannot parse it with LL(k). > > Best, > > Daniel > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > ----------------------------- Yitzhak Mandelbaum ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 15:31 ` Daniel Bünzli 2011-01-04 16:22 ` Yitzhak Mandelbaum @ 2011-01-04 17:04 ` Jean Krivine 2011-01-04 17:22 ` Daniel Bünzli 1 sibling, 1 reply; 26+ messages in thread From: Jean Krivine @ 2011-01-04 17:04 UTC (permalink / raw) To: Daniel Bünzli; +Cc: caml-list [-- Attachment #1: Type: text/plain, Size: 609 bytes --] On Tue, Jan 4, 2011 at 4:31 PM, Daniel Bünzli <daniel.buenzli@erratique.ch>wrote: > That's not the bug. The Invalid_argument exception is never supposed > to be caught, it denotes a programming error from the client of the > module. > > The bug lies in calling Array.make with a value larger than > Sys.max_array_length. > > Agree, but here I am never calling Array.make so ocaml is "internally" calling array.make with a wrong argument. So the bug is not really mine. An easy "patch" would be to define a Parsing.Overflow exception to throw whenever the function grow_stack() fails... J [-- Attachment #2: Type: text/html, Size: 922 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 17:04 ` Jean Krivine @ 2011-01-04 17:22 ` Daniel Bünzli 0 siblings, 0 replies; 26+ messages in thread From: Daniel Bünzli @ 2011-01-04 17:22 UTC (permalink / raw) To: Jean Krivine; +Cc: caml-list > So the bug is not really mine. Of course it's not. > An easy "patch" would be to define a Parsing.Overflow exception to throw > whenever the function grow_stack() fails... If you mean catching Invalid_argument and raising Overflow then no. What needs to be done is to raise Overflow when you are about to call Array.make with a a value greater than Sys.max_array_length. Again, Invalid_argument exceptions are programming errors (here a programming error of the Parsing module). Code catching Invalid_argument to implement an algorithm is unsound. The only place where you should catch Invalid_argument is at the toplevel of your program to report the stack trace. One obvious way of understanding that is to see what is raised on out of bounds access and consider the -unsafe compiler option. Daniel P.S. Btw if you want that to be fixed you should file a bug report. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 15:14 ` Jean Krivine 2011-01-04 15:31 ` Daniel Bünzli @ 2011-01-04 15:38 ` bluestorm 2011-01-04 17:43 ` Jean Krivine [not found] ` <1125074892.441923.1294163043602.JavaMail.root@zmbs2.inria.fr> [not found] ` <1259991756.440008.1294155536392.JavaMail.root@zmbs2.inria.fr> 2 siblings, 2 replies; 26+ messages in thread From: bluestorm @ 2011-01-04 15:38 UTC (permalink / raw) To: Jean Krivine; +Cc: Török Edwin, caml-list [-- Attachment #1: Type: text/plain, Size: 912 bytes --] Indeed, Ocaml arrays are limited in length, and the parser generator may use arrays internally that would run into the limit for specific grammars (I don't think the memory used is linear in the size of the input in the common use cases). You may be interested in trying to use menhir [1] as a Parser generator instead of ocamlyacc. Menhir is mostly compatible with ocamlyacc, but doesn't use the Parsing module. While I don't think it as done anything specific to support larger input files, the issue may go away (or don't appear on the input sizes you need) using the different menhir implementation. [1] http://gallium.inria.fr/~fpottier/menhir/ Of course, patching ocamlyacc (or any other generator) to fix this issue would be the best way to handle this. But still, switching to a different but 90% compatible software may be a least-effort solution for you -- provided it doesn't have the same issue. [-- Attachment #2: Type: text/html, Size: 1218 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 15:38 ` bluestorm @ 2011-01-04 17:43 ` Jean Krivine [not found] ` <1125074892.441923.1294163043602.JavaMail.root@zmbs2.inria.fr> 1 sibling, 0 replies; 26+ messages in thread From: Jean Krivine @ 2011-01-04 17:43 UTC (permalink / raw) To: bluestorm; +Cc: Török Edwin, caml-list [-- Attachment #1: Type: text/plain, Size: 1482 bytes --] I just tried menhir. Looks great ! Problem is it generates an invalid myParser.ml file: it contains a integer _menhir_shifted = 4611686018427387903; which way beyond int32 integers. I guess it comes from my grammar file which is certainly very bad, but in any case I should receive some insult from menhir first no ? I guess this error is linked to the Array.make exception that I get when I use ocamlyacc! J On Tue, Jan 4, 2011 at 4:38 PM, bluestorm <bluestorm.dylc@gmail.com> wrote: > Indeed, Ocaml arrays are limited in length, and the parser generator may > use arrays internally that would run into the limit for specific grammars (I > don't think the memory used is linear in the size of the input in the common > use cases). > > You may be interested in trying to use menhir [1] as a Parser generator > instead of ocamlyacc. Menhir is mostly compatible with ocamlyacc, but > doesn't use the Parsing module. While I don't think it as done anything > specific to support larger input files, the issue may go away (or don't > appear on the input sizes you need) using the different menhir > implementation. > > [1] http://gallium.inria.fr/~fpottier/menhir/<http://gallium.inria.fr/%7Efpottier/menhir/> > > Of course, patching ocamlyacc (or any other generator) to fix this issue > would be the best way to handle this. But still, switching to a different > but 90% compatible software may be a least-effort solution for you -- > provided it doesn't have the same issue. > [-- Attachment #2: Type: text/html, Size: 1900 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <1125074892.441923.1294163043602.JavaMail.root@zmbs2.inria.fr>]
* Re: [Caml-list] Array.make exception and parser [not found] ` <1125074892.441923.1294163043602.JavaMail.root@zmbs2.inria.fr> @ 2011-01-04 17:53 ` Francois Pottier 0 siblings, 0 replies; 26+ messages in thread From: Francois Pottier @ 2011-01-04 17:53 UTC (permalink / raw) To: Jean Krivine; +Cc: bluestorm, Török Edwin, caml-list On Tue, Jan 04, 2011 at 06:44:03PM +0100, Jean Krivine wrote: > I just tried menhir. Looks great ! Problem is it generates an invalid > myParser.ml file: > it contains a integer > > _menhir_shifted = 4611686018427387903; > > which way beyond int32 integers. I guess it comes from my grammar file which > is certainly very bad, but in any case I should receive some insult from > menhir first no ? Weird. Could you send a bug report that includes your grammar (if possible) as well as the generated file and send it either to me or (even better) to menhir-list@yquem.inria.fr? In the meantime, you may also try with the --table option (this uses a different code generation scheme). -- François Pottier Francois.Pottier@inria.fr http://gallium.inria.fr/~fpottier/ ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <1259991756.440008.1294155536392.JavaMail.root@zmbs2.inria.fr>]
* Re: [Caml-list] Array.make exception and parser [not found] ` <1259991756.440008.1294155536392.JavaMail.root@zmbs2.inria.fr> @ 2011-01-04 17:45 ` Francois Pottier 2011-01-04 19:30 ` Daniel Bünzli [not found] ` <1263353434.442766.1294169448342.JavaMail.root@zmbs2.inria.fr> 0 siblings, 2 replies; 26+ messages in thread From: Francois Pottier @ 2011-01-04 17:45 UTC (permalink / raw) To: bluestorm; +Cc: Jean Krivine, Török Edwin, caml-list Hello all, On Tue, Jan 04, 2011 at 04:38:56PM +0100, bluestorm wrote: > You may be interested in trying to use menhir [1] as a Parser generator > instead of ocamlyacc. Menhir is mostly compatible with ocamlyacc, but > doesn't use the Parsing module. While I don't think it as done anything > specific to support larger input files, the issue may go away (or don't > appear on the input sizes you need) using the different menhir > implementation. > > [1] http://gallium.inria.fr/~fpottier/menhir/ Interesting suggestion, indeed. Menhir does not use an ocaml array to implement the parser's stack; instead, it uses a heap-allocated linked list. It might scale up better than the ocamlyacc-generated parser (although of course it might just fill up the heap; the linked list approach probably consumes more space than the array-based approach by a constant factor). I would be curious to hear whether this works. Earlier, Daniel Bünzli wrote: > I don't really understand why people insist on using > yacc like parser generators where a recursive descent parser > with some combinators ... For what it's worth, here is my answer: an LR parser generator (like Menhir) accepts a larger class of grammars without refactoring (no need to eliminate left recursion, identify common left factors, etc.) and is also able to *explain* why the grammar is ambiguous (or rather, why it lies outside the LR class), which a combinator-based approach cannot do. -- François Pottier Francois.Pottier@inria.fr http://gallium.inria.fr/~fpottier/ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 17:45 ` Francois Pottier @ 2011-01-04 19:30 ` Daniel Bünzli 2011-01-04 19:52 ` Yitzhak Mandelbaum [not found] ` <1263353434.442766.1294169448342.JavaMail.root@zmbs2.inria.fr> 1 sibling, 1 reply; 26+ messages in thread From: Daniel Bünzli @ 2011-01-04 19:30 UTC (permalink / raw) To: Francois.Pottier; +Cc: caml-list > For what it's worth, here is my answer: an LR parser generator (like Menhir) > accepts a larger class of grammars without refactoring (no need to eliminate > left recursion, identify common left factors, etc.) and is also able to > *explain* why the grammar is ambiguous (or rather, why it lies outside the LR > class), which a combinator-based approach cannot do. So everybody trades developer convenience for end user convenience -- I mean, the syntax errors produced by LR parsers are just terrible, the ocaml compiler is here to witness that. That's depressing. Daniel ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 19:30 ` Daniel Bünzli @ 2011-01-04 19:52 ` Yitzhak Mandelbaum 2011-01-04 20:36 ` Daniel Bünzli 0 siblings, 1 reply; 26+ messages in thread From: Yitzhak Mandelbaum @ 2011-01-04 19:52 UTC (permalink / raw) To: Daniel Bünzli; +Cc: Caml-list List Daniel, I'm not sure you want to bring the ocaml compiler as proof-case for your argument. Its parser is generated by ocamlyacc, which is a relatively ancient tool (the codebase is a modified version of the Berkeley Yacc, written in C. I'm guessing that the large majority of that code is well over 20 years old at this point). There are more modern LR parser generators which do a far better job and there's no theoretical reason they can't do just as well as LL tools. Also, you might want to keep in mind that this is largely a zero-sum game. If the developers spend more time on the parser, that's less time on the rest of the compiler or new research. So, saving developer time does, in fact, mean *more* user convenience -- just in a different part of the compiler. Yitzhak On Jan 4, 2011, at 2:30 PM, Daniel Bünzli wrote: >> For what it's worth, here is my answer: an LR parser generator (like Menhir) >> accepts a larger class of grammars without refactoring (no need to eliminate >> left recursion, identify common left factors, etc.) and is also able to >> *explain* why the grammar is ambiguous (or rather, why it lies outside the LR >> class), which a combinator-based approach cannot do. > > So everybody trades developer convenience for end user convenience -- > I mean, the syntax errors produced by LR parsers are just terrible, > the ocaml compiler is here to witness that. > > That's depressing. > > Daniel > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > ----------------------------- Yitzhak Mandelbaum ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 19:52 ` Yitzhak Mandelbaum @ 2011-01-04 20:36 ` Daniel Bünzli 0 siblings, 0 replies; 26+ messages in thread From: Daniel Bünzli @ 2011-01-04 20:36 UTC (permalink / raw) To: Yitzhak Mandelbaum; +Cc: Caml-list List > There are more modern LR parser generators which do a far better job and there's no theoretical reason they can't do just as well as LL tools. I'm not talking about theory. Practically I have never seen an LR parser throw me good error messages. Besides ocaml users (understandably) do use ocamlyacc since it comes with the distribution and is needed by the ocaml compiler. > Also, you might want to keep in mind that this is largely a zero-sum game. If the developers spend more time on the parser, that's less time on the rest of the compiler or new research. So, saving developer time does, in fact, mean *more* user convenience -- just in a different part of the compiler. I dont buy this. Loosing the time of your users by producing crappy error messages is never a user convenience. And for a programming language like ocaml this also has the bad side effect to make the beginner's learning curve much steeper, which doesn't benefit ocaml developers as a whole. I think you overestimate the time it takes to code a recursive descent parser. The needed infrastructure is nothing more than a programming pattern. After that your code pretty much follows your (LL factored) grammar. Best, Daniel ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <1263353434.442766.1294169448342.JavaMail.root@zmbs2.inria.fr>]
* Re: [Caml-list] Array.make exception and parser [not found] ` <1263353434.442766.1294169448342.JavaMail.root@zmbs2.inria.fr> @ 2011-01-04 20:31 ` Francois Pottier 2011-01-04 20:40 ` Lukasz Stafiniak ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Francois Pottier @ 2011-01-04 20:31 UTC (permalink / raw) To: Daniel Bünzli; +Cc: caml-list On Tue, Jan 04, 2011 at 08:30:48PM +0100, Daniel Bünzli wrote: > So everybody trades developer convenience for end user convenience -- > I mean, the syntax errors produced by LR parsers are just terrible, It is true that ocamlyacc (and Menhir) offer essentially no support for explaining parse errors. (The "error" pseudo-token, inherited from yacc, is supposed to help, but in my opinion its use pollutes the grammar and makes it uncontrollable.) Nevertheless, as underlined by Yitzhak, I don't think there is a deep reason why LR parsers must be bad at explaining errors. In principle, upon detecting an error, an LR parser could easily dump the stack, which corresponds to the sentence (composed of terminal and non-terminal symbols) that has been recognized so far. It could also display the set of look-ahead tokens that would *not* have caused an error in the current state. (Come to think of it, this is a feature that I would like to add to Menhir, if only time was not so much of the essence!) -- François Pottier Francois.Pottier@inria.fr http://gallium.inria.fr/~fpottier/ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 20:31 ` Francois Pottier @ 2011-01-04 20:40 ` Lukasz Stafiniak 2011-01-04 21:03 ` Török Edwin 2011-01-05 14:12 ` Boris Yakobowski 2011-01-05 21:12 ` Boris Yakobowski 2 siblings, 1 reply; 26+ messages in thread From: Lukasz Stafiniak @ 2011-01-04 20:40 UTC (permalink / raw) To: Francois.Pottier; +Cc: Daniel Bünzli, caml-list On Tue, Jan 4, 2011 at 9:31 PM, Francois Pottier <Francois.Pottier@inria.fr> wrote: > On Tue, Jan 04, 2011 at 08:30:48PM +0100, Daniel Bünzli wrote: >> So everybody trades developer convenience for end user convenience -- >> I mean, the syntax errors produced by LR parsers are just terrible, > > It is true that ocamlyacc (and Menhir) offer essentially no support for > explaining parse errors. But Menhir's support for grammar debugging is very nice and already a strong enough argument to switch over from ocamlyacc! Happy New Year. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 20:40 ` Lukasz Stafiniak @ 2011-01-04 21:03 ` Török Edwin 2011-01-05 3:24 ` Yitzhak Mandelbaum 0 siblings, 1 reply; 26+ messages in thread From: Török Edwin @ 2011-01-04 21:03 UTC (permalink / raw) To: caml-list On 2011-01-04 22:40, Lukasz Stafiniak wrote: > On Tue, Jan 4, 2011 at 9:31 PM, Francois Pottier > <Francois.Pottier@inria.fr> wrote: >> On Tue, Jan 04, 2011 at 08:30:48PM +0100, Daniel Bünzli wrote: >>> So everybody trades developer convenience for end user convenience -- >>> I mean, the syntax errors produced by LR parsers are just terrible, >> >> It is true that ocamlyacc (and Menhir) offer essentially no support for >> explaining parse errors. > > But Menhir's support for grammar debugging is very nice and already a > strong enough argument to switch over from ocamlyacc! > How about camlp4, what kind of parser is it? LL/LR? I found its error messages more helpful than ocamlc's (for example just running my source code through camlp4o), though it could be better (it could say ; expected insteam of [semi]). Best regards, --Edwin ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 21:03 ` Török Edwin @ 2011-01-05 3:24 ` Yitzhak Mandelbaum 0 siblings, 0 replies; 26+ messages in thread From: Yitzhak Mandelbaum @ 2011-01-05 3:24 UTC (permalink / raw) To: Török Edwin; +Cc: caml-list Good question -- its languages fall into neither of LL(k) or LR(k). It uses a form of priority parsing, with one token lookahead and a few grammar transformations (assuming camlp4 shares the same parsing technology as camlp5, and nothing has changed since I last took a look at the docs and source code). In more detail: Its basic operation is to try each branch in a series of branches until one is found which successfully parses beyond one token. So, it uses only 1 token of lookahead, and it decides which branch to take at the beginning of set of branches. These characteristics make it similar to LL(1). However, it does not ensure that branches do not conflict. So, if two branches share the same token (see more on this below), then the second one will never be taken. IIRC, the parsec parser combinator library has similar semantics for its standard branch combinator. Furthermore, it performs a number of transformations: First, it does a (shallow) left factoring. So, if a set of branches are defined, it will left factor them without looking into any of the nonterminals. Second, it supports a certain amount of left-recursion through dynamic tracking of the current nonterminal. --Yitzhak On Jan 4, 2011, at 4:03 PM, Török Edwin wrote: > On 2011-01-04 22:40, Lukasz Stafiniak wrote: >> On Tue, Jan 4, 2011 at 9:31 PM, Francois Pottier >> <Francois.Pottier@inria.fr> wrote: >>> On Tue, Jan 04, 2011 at 08:30:48PM +0100, Daniel Bünzli wrote: >>>> So everybody trades developer convenience for end user convenience -- >>>> I mean, the syntax errors produced by LR parsers are just terrible, >>> >>> It is true that ocamlyacc (and Menhir) offer essentially no support for >>> explaining parse errors. >> >> But Menhir's support for grammar debugging is very nice and already a >> strong enough argument to switch over from ocamlyacc! >> > > How about camlp4, what kind of parser is it? LL/LR? > I found its error messages more helpful than ocamlc's (for example just > running my source code through camlp4o), though it could be better > (it could say ; expected insteam of [semi]). > > Best regards, > --Edwin > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > ----------------------------- Yitzhak Mandelbaum ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 20:31 ` Francois Pottier 2011-01-04 20:40 ` Lukasz Stafiniak @ 2011-01-05 14:12 ` Boris Yakobowski 2011-01-05 21:12 ` Boris Yakobowski 2 siblings, 0 replies; 26+ messages in thread From: Boris Yakobowski @ 2011-01-05 14:12 UTC (permalink / raw) To: The Caml Mailing List On Tue, Jan 4, 2011 at 9:31 PM, Francois Pottier <Francois.Pottier@inria.fr> wrote: > It is true that ocamlyacc (and Menhir) offer essentially no support for > explaining parse errors. (The "error" pseudo-token, inherited from yacc, is > supposed to help, but in my opinion its use pollutes the grammar and makes it > uncontrollable.) Nevertheless, as underlined by Yitzhak, I don't think there > is a deep reason why LR parsers must be bad at explaining errors. In > principle, upon detecting an error, an LR parser could easily dump the stack, > which corresponds to the sentence (composed of terminal and non-terminal > symbols) that has been recognized so far. I think the stack would be useless for the user: too long, and impossible to understand without the grammar. It would be barely better for the writer of the grammar, as he would need to recognize the parsing state to produce an intelligible error report. I think the error token is a good idea, that just went too far. Its ability to shift and reduce allows writing parsers that recovers from syntax errors, but we hardly do that nowadays. Instead, using the error token causes bogus shift/reduce conflicts... What I propose is the following: still use the error token, but do not allow reduction. Instead, only allow productions that return exceptions when they contain the error token. This way, the parse errors are caught inside the grammar, as they should, but do not pollute the parsing itself. > It could also display the set of > look-ahead tokens that would *not* have caused an error in the current > state. (Come to think of it, this is a feature that I would like to add to > Menhir, if only time was not so much of the essence!) This would be incredibly useful (provided the mly writer uses sane names for its tokens, or ideally with some further cooperation from the lexer). Finally, a remark on the parse errors returned by the Ocaml compiler. As many of us, I find them very frustrating. However, the fault does not lie only in the parsing technology. The Ocaml grammar is much too ambiguous for its own good (no difference between toplevel lets and inner ones, no delimiters for ifs and matchs, etc...). As a result, the compiler often reports the error much too far. Camlp4 explains what syntactic entity it expected when it finds a parse error, but this only works if the error is detected at the right place. Language writers would be well inspired to keep that in mind when they design their language. (BTW: a link to a changelog on the homepage of Menhir would be great. And on http://yquem.inria.fr/cgi-bin/mailman/listinfo/menhir-list, the link to the archives is broken.) Cheers, -- Boris ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 20:31 ` Francois Pottier 2011-01-04 20:40 ` Lukasz Stafiniak 2011-01-05 14:12 ` Boris Yakobowski @ 2011-01-05 21:12 ` Boris Yakobowski 2 siblings, 0 replies; 26+ messages in thread From: Boris Yakobowski @ 2011-01-05 21:12 UTC (permalink / raw) To: The Caml Mailing List On Tue, Jan 4, 2011 at 9:31 PM, Francois Pottier <Francois.Pottier@inria.fr> wrote: > It is true that ocamlyacc (and Menhir) offer essentially no support for > explaining parse errors. (The "error" pseudo-token, inherited from yacc, is > supposed to help, but in my opinion its use pollutes the grammar and makes it > uncontrollable.) Nevertheless, as underlined by Yitzhak, I don't think there > is a deep reason why LR parsers must be bad at explaining errors. In > principle, upon detecting an error, an LR parser could easily dump the stack, > which corresponds to the sentence (composed of terminal and non-terminal > symbols) that has been recognized so far. I think the stack would be useless for the user: too long, and impossible to understand without the grammar. It would be barely better for the writer of the grammar, as he would need to recognize the parsing state to produce an intelligible error report. I think the error token is a good idea, that just went too far. Its ability to shift and reduce allows writing parsers that recovers from syntax errors, but we hardly do that nowadays. Instead, using the error token causes bogus shift/reduce conflicts... What I propose is the following: still use the error token, but do not allow reduction. Instead, only allow productions that return exceptions when they contain the error token. This way, the parse errors are caught inside the grammar, as they should, but do not pollute the parsing itself. > It could also display the set of > look-ahead tokens that would *not* have caused an error in the current > state. (Come to think of it, this is a feature that I would like to add to > Menhir, if only time was not so much of the essence!) This would be incredibly useful (provided the mly writer uses sane names for its tokens, or ideally with some further cooperation from the lexer). Finally, a remark on the parse errors returned by the Ocaml compiler. As many of us, I find them very frustrating. However, the fault does not lie only in the parsing technology. The Ocaml grammar is much too ambiguous for its own good (no difference between toplevel lets and inner ones, no delimiters for ifs and matchs, etc...). As a result, the compiler often reports the error too far. Camlp4 explains what syntactic entity it expected when it finds a parse error, but this only works if the error is detected at the right place :-( (BTW: a link to a changelog on the homepage of Menhir would be great. And on http://yquem.inria.fr/cgi-bin/mailman/listinfo/menhir-list, the link to the archives is broken.) Cheers, ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-04 13:41 [Caml-list] Array.make exception and parser Jean Krivine 2011-01-04 14:56 ` Daniel Bünzli 2011-01-04 14:57 ` Török Edwin @ 2011-01-05 13:37 ` Xavier Leroy 2011-01-05 13:46 ` Jean Krivine 2 siblings, 1 reply; 26+ messages in thread From: Xavier Leroy @ 2011-01-05 13:37 UTC (permalink / raw) To: caml-list On 01/04/2011 02:41 PM, Jean Krivine wrote: > I am encountering a weird problem, I am trying to parse a very large > file (around 1.2 GB) according to a grammar defined in ocamlyacc. > During the parsing I get the exception > Invalid_argument "Array.make". > This is strange because I am not using any array. As others said, you're just overflowing the stack of the parse engine. As no one else said already, watch out for left recusion vs. right recursion in your grammar, esp. the rules matching long sequences. For more details, see e.g. http://tldp.org/HOWTO/Lex-YACC-HOWTO-6.html section 6.2, or many compiler textbooks. - Xavier Leroy ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Array.make exception and parser 2011-01-05 13:37 ` Xavier Leroy @ 2011-01-05 13:46 ` Jean Krivine 0 siblings, 0 replies; 26+ messages in thread From: Jean Krivine @ 2011-01-05 13:46 UTC (permalink / raw) To: Xavier Leroy; +Cc: caml-list Good point. I think the fundamental reason is my poor grammar. But I guess this should deserve a better warning that an Invalid_argument in my face :) Thanks! On Wed, Jan 5, 2011 at 2:37 PM, Xavier Leroy <Xavier.Leroy@inria.fr> wrote: > On 01/04/2011 02:41 PM, Jean Krivine wrote: > >> I am encountering a weird problem, I am trying to parse a very large >> file (around 1.2 GB) according to a grammar defined in ocamlyacc. >> During the parsing I get the exception >> Invalid_argument "Array.make". >> This is strange because I am not using any array. > > As others said, you're just overflowing the stack of the parse engine. > > As no one else said already, watch out for left recusion vs. right > recursion in your grammar, esp. the rules matching long sequences. For more > details, see e.g. > > http://tldp.org/HOWTO/Lex-YACC-HOWTO-6.html > > section 6.2, or many compiler textbooks. > > - Xavier Leroy > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > > ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2011-01-05 21:13 UTC | newest] Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-01-04 13:41 [Caml-list] Array.make exception and parser Jean Krivine 2011-01-04 14:56 ` Daniel Bünzli 2011-01-04 14:57 ` Daniel Bünzli 2011-01-04 14:57 ` Török Edwin 2011-01-04 15:14 ` Jean Krivine 2011-01-04 15:31 ` Daniel Bünzli 2011-01-04 16:22 ` Yitzhak Mandelbaum 2011-01-04 16:42 ` Daniel Bünzli 2011-01-04 17:03 ` Yitzhak Mandelbaum 2011-01-04 17:04 ` Jean Krivine 2011-01-04 17:22 ` Daniel Bünzli 2011-01-04 15:38 ` bluestorm 2011-01-04 17:43 ` Jean Krivine [not found] ` <1125074892.441923.1294163043602.JavaMail.root@zmbs2.inria.fr> 2011-01-04 17:53 ` Francois Pottier [not found] ` <1259991756.440008.1294155536392.JavaMail.root@zmbs2.inria.fr> 2011-01-04 17:45 ` Francois Pottier 2011-01-04 19:30 ` Daniel Bünzli 2011-01-04 19:52 ` Yitzhak Mandelbaum 2011-01-04 20:36 ` Daniel Bünzli [not found] ` <1263353434.442766.1294169448342.JavaMail.root@zmbs2.inria.fr> 2011-01-04 20:31 ` Francois Pottier 2011-01-04 20:40 ` Lukasz Stafiniak 2011-01-04 21:03 ` Török Edwin 2011-01-05 3:24 ` Yitzhak Mandelbaum 2011-01-05 14:12 ` Boris Yakobowski 2011-01-05 21:12 ` Boris Yakobowski 2011-01-05 13:37 ` Xavier Leroy 2011-01-05 13:46 ` Jean Krivine
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox