* camlp5/revised syntax questions
@ 2009-10-07 16:20 Aaron Bohannon
2009-10-07 20:16 ` [Caml-list] " blue storm
0 siblings, 1 reply; 5+ messages in thread
From: Aaron Bohannon @ 2009-10-07 16:20 UTC (permalink / raw)
To: caml-list
From reading the camlp5 documentation, I've managed to write a syntax
extension that adds a new expression starting with a distinct keyword,
and it seems to work fine. However, if I want to experiment with
infix notations, things get a little trickier. I need to specify its
precedence and associativity, of course.
So, there is a list of syntactic structures on this page:
http://pauillac.inria.fr/~ddr/camlp5/doc/htmlc/ast_transi.html
1) Where can I find the "level names" for each of these syntactic
constructs, for use with BEFORE, LIKE, etc? Is that what the
"Comment" column is for?
2) I am confused by the fact that this list is for the revised
syntax. I think most people (including me) want to modify the
original syntax. e.g., imagine that I want to modify the record
update operator "<-" in the original syntax---I need to refer to it
somehow, but it doesn't even appear in the list.
- Aaron
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] camlp5/revised syntax questions
2009-10-07 16:20 camlp5/revised syntax questions Aaron Bohannon
@ 2009-10-07 20:16 ` blue storm
2009-10-08 14:39 ` Aaron Bohannon
0 siblings, 1 reply; 5+ messages in thread
From: blue storm @ 2009-10-07 20:16 UTC (permalink / raw)
To: Aaron Bohannon; +Cc: caml-list

Disclaimer: I learned camlp4 from the OCaml distribution (>= 3.10),
which is different from camlp5, so take my words with a grain of salt.

On Wed, Oct 7, 2009 at 6:20 PM, Aaron Bohannon <bohannon@cis.upenn.edu> wrote:
> From reading the camlp5 documentation, I've managed to write a syntax
> extension that adds a new expression starting with a distinct keyword,
> and it seems to work fine. However, if I want to experiment with
> infix notations, things get a little trickier. I need to specify its
> precedence and associativity, of course.
>
> So, there is a list of syntactic structures on this page:
> http://pauillac.inria.fr/~ddr/camlp5/doc/htmlc/ast_transi.html
>
> 1) Where can I find the "level names" for each of these syntactic
> constructs, for use with BEFORE, LIKE, etc? Is that what the
> "Comment" column is for?

The "level names" are not absolute across all camlp4 grammars: they
are a property of each grammar rule of each grammar. If you want to
"modify" a specific grammar (that is, EXTEND it), you must check the
levels available in its definition. See the files meta/pa_r.ml and
etc/pa_o.ml in the camlp5 source tree for the definitions of the
revised and classical syntax, respectively. Luckily, the expr rules
(the ones defining OCaml expressions, which you seem interested in) of
both syntaxes mostly share the same levels (the revised syntax has an
additional "where" level, for example, but they are otherwise mostly
the same).
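As a rough illustration of levels belonging to one entry, here is a toy model in plain OCaml. It is not camlp4 or camlp5 code: the `level` record, `expr_levels`, `level_name` and `binds_tighter` are made-up names, the level names are ones mentioned in this thread, and the operator lists are only illustrative. It also assumes that levels listed earlier bind more loosely.

```ocaml
(* Toy model of one grammar entry: an ordered list of named levels.
   Hypothetical types and names, not the camlp4/camlp5 API. *)
type level = { name : string; ops : string list }

(* Assumption: earlier in the list means looser binding. *)
let expr_levels = [
  { name = ":="; ops = [ ":="; "<-" ] };
  { name = "||"; ops = [ "||" ] };
  { name = "&&"; ops = [ "&&" ] };
  { name = "<";  ops = [ "<" ] };
]

(* Position of the level that contains [op], if any. *)
let level_index op =
  let rec go i = function
    | [] -> None
    | l :: rest -> if List.mem op l.ops then Some i else go (i + 1) rest
  in
  go 0 expr_levels

(* Name of the level that holds [op]: the string one would look up
   before referring to that level in an extension. *)
let level_name op =
  List.find_opt (fun l -> List.mem op l.ops) expr_levels
  |> Option.map (fun l -> l.name)

(* [binds_tighter a b] is true when [a] sits in a later level than [b]. *)
let binds_tighter a b =
  match level_index a, level_index b with
  | Some i, Some j -> i > j
  | _ -> false
```

In this model "<-" is found in the level named ":=", which mirrors how the pa_o.ml rule for "<-" lives in the ":=" level. The real names and contents have to be read from pa_o.ml or pa_r.ml.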
There is no explicit "precedence" in camlp4 parlance; you must use
levels instead. Each precedence level of infix operators has its own
level, usually named after the most representative infix operator of
that level: ':=', '||', '&&', '<'...

> 2) I am confused by the fact that this list is for the revised
> syntax. I think most people (including me) want to modify the
> original syntax. e.g., imagine that I want to modify the record
> update operator "<-" in the original syntax---I need to refer to it
> somehow, but it doesn't even appear in the list.

While the concrete syntaxes of the revised and classical syntax differ,
the abstract syntax tree is the same. Camlp4 quotations work by
replacing (using camlp4) the quotation you wrote with the concrete
OCaml AST representation. A nice side effect of this is that you can
use quotations "in the revised syntax" (the code inside quotations uses
the revised syntax) when writing an extension for the classical syntax.

E.g., if you parse "for i = 0 to 10 step 2 do ... done" and you want to
generate the OCaml AST corresponding to
"for i = 0 to 10/2 do let i = i * 2 in ... done", you can write

    <:expr< for i = 0 to 10 / 2 do { let i = i * 2 in ... } >>

It will generate the corresponding AST, which is then printed (after
the source has been processed by the extension) in whatever syntax the
user of your extension is using (probably the classical one).

This way (using the revised syntax inside the quotations of your
extension), you can stay consistent with the camlp5 documentation
(which describes quotations in the revised syntax). camlp4 >= 3.10
also has quotations in the classical syntax, but I wouldn't recommend
using them: the revised syntax is less ambiguous, which makes those
things easier.

In your specific case, you can parse whatever syntax you want using the
"<-" operator, then output the corresponding AST using a quotation in
the revised syntax, that is <:expr< a := b >> (instead of "a <- b").
For reference, see the related rules in etc/pa_o.ml:

    | ":=" NONA
      [ e1 = SELF; ":="; e2 = expr LEVEL "expr1" ->
          <:expr< $e1$.val := $e2$ >>
      | e1 = SELF; "<-"; e2 = expr LEVEL "expr1" ->
          <:expr< $e1$ := $e2$ >> ]
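To make the "two concrete syntaxes, one AST" point concrete, here is a small sketch in plain OCaml. It is a toy model, not camlp5's MLast type: the constructors and the functions `classical_ref_assign`, `classical_update` and `to_revised` are hypothetical names that only mirror the two pa_o.ml actions quoted above.

```ocaml
(* Toy AST, not camlp5's MLast: one assignment node serves both syntaxes. *)
type expr =
  | Var of string
  | Field of expr * string   (* e.f *)
  | Assign of expr * expr    (* written "l := r" in the revised syntax *)

(* Stand-ins for the two pa_o.ml actions quoted above. *)
let classical_ref_assign e1 e2 = Assign (Field (e1, "val"), e2)  (* e1 := e2 *)
let classical_update e1 e2 = Assign (e1, e2)                     (* e1 <- e2 *)

(* Print the shared AST back in revised-style concrete syntax. *)
let rec to_revised = function
  | Var x -> x
  | Field (e, f) -> to_revised e ^ "." ^ f
  | Assign (l, r) -> to_revised l ^ " := " ^ to_revised r

let () =
  (* classical "a.f <- b" *)
  print_endline (to_revised (classical_update (Field (Var "a", "f")) (Var "b")));
  (* classical "r := b" *)
  print_endline (to_revised (classical_ref_assign (Var "r") (Var "b")))
```

Running this prints `a.f := b` and then `r.val := b`: both classical forms end up as the same kind of node, which is why a quotation written in the revised syntax is enough to produce either.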
* Re: [Caml-list] camlp5/revised syntax questions
2009-10-07 20:16 ` [Caml-list] " blue storm
@ 2009-10-08 14:39 ` Aaron Bohannon
2009-10-10 12:31 ` blue storm
0 siblings, 1 reply; 5+ messages in thread
From: Aaron Bohannon @ 2009-10-08 14:39 UTC (permalink / raw)
To: blue storm; +Cc: caml-list

Thanks for your detailed reply. I had a suspicion I would have to
read the source code to get all of the necessary documentation.
However, I'm still missing some basic point here.

On Wed, Oct 7, 2009 at 4:16 PM, blue storm <bluestorm.dylc@gmail.com> wrote:
> The different "level names" are not absolute among all camlp4 grammars:
> they're a property of each grammar rule of each grammar. If you want
> to "modify" a specific grammar (that is, EXTEND it), you must check
> the different levels available in the definition.

Yes, I understand that. But how do you specify which grammar your
file is extending? My file is structured like this:

    #load "pa_extend.cmo";
    #load "q_MLast.cmo";
    open Pcaml;

    EXTEND
      GLOBAL: expr;
      ...
    END;

So where did I specify whether I was extending the original syntax or
the revised syntax (or some other grammar entirely)? I suppose I must
have implicitly chosen the original syntax because my code works fine
on that.

> While concrete syntaxes for revised and classical syntax are
> different, the abstract syntax tree is the same. Camlp4 quotations
> work by replacing (using camlp4) the quotation you wrote by the
> concrete ocaml AST representation.

Yes, this point is crystal clear, and I have no problem writing the
quotations in the revised syntax.

> In your specific case, you can parse whatever syntax you want using
> the "<-" operator, then output the corresponding AST using a quotation
> in the revised syntax, that is <:expr< a := b >> (instead of "a <-
> b").
> For reference, see the related rules in etc/pa_o.ml:
>
> | ":=" NONA
>   [ e1 = SELF; ":="; e2 = expr LEVEL "expr1" ->
>       <:expr< $e1$.val := $e2$ >>
>   | e1 = SELF; "<-"; e2 = expr LEVEL "expr1" ->
>       <:expr< $e1$ := $e2$ >> ]

Thanks, I found this piece of code. Now on a more specific point, I
am confused about the parsing of record access and update:

1) In the parsing rule for the simple dot notation...

    | e1 = SELF; "."; e2 = SELF -> <:expr< $e1$ . $e2$ >> ]

...why is the field label an "expr"? This does not agree with the
OCaml manual, which has a separate syntactic category for "field"
(http://caml.inria.fr/pub/docs/manual-ocaml/expr.html), nor with my
intuition about the meaning of the code.

2) Furthermore, as one can see from the ":=" entry above, the entire
left side of a record update is parsed as its own subexpression. So
this means that, in the context of a record update, that subexpression
has to be interpreted as a reference, but in other contexts, the very
same expression must be interpreted as a value. I don't necessarily
care what kind of magic makes this possible on the back end, but I am
wondering whether this has any implications for modifying the record
syntax.

- Aaron
* Re: [Caml-list] camlp5/revised syntax questions
2009-10-08 14:39 ` Aaron Bohannon
@ 2009-10-10 12:31 ` blue storm
2009-10-14 2:04 ` Aaron Bohannon
0 siblings, 1 reply; 5+ messages in thread
From: blue storm @ 2009-10-10 12:31 UTC (permalink / raw)
To: Aaron Bohannon; +Cc: caml-list

On Thu, Oct 8, 2009 at 4:39 PM, Aaron Bohannon <bohannon@cis.upenn.edu> wrote:
> Thanks for your detailed reply. I had a suspicion I would have to
> read the source code to get all of the necessary documentation.

It is actually possible to pretty-print the grammar rules during
camlp* execution. For example, here is the code I gave to the toplevel
(using ocamlfind/findlib) to print the default "expr" grammar and
levels:

    #use "topfind";;
    #camlp4o;;
    open Camlp4.PreCast;;
    Gram.Entry.print Format.std_formatter Syntax.expr;;

This is probably camlp4-specific, but the printing routine is
documented for camlp5
( http://pauillac.inria.fr/~ddr/camlp5/doc/htmlc/library.html#b:printing-grammar-entries )
so equivalent code should work. However, I prefer to read the source
code, which is easier to browse and contains more information (pretty
printing shows the parsing rules, but not the parse actions).

> However, I'm still missing some basic point here.
>
> On Wed, Oct 7, 2009 at 4:16 PM, blue storm <bluestorm.dylc@gmail.com> wrote:
>> The different "level names" are not absolute among all camlp4 grammars:
>> they're a property of each grammar rule of each grammar. If you want
>> to "modify" a specific grammar (that is, EXTEND it), you must check
>> the different levels available in the definition.
>
> Yes, I understand that. But how do you specify which grammar your
> file is extending? My file is structured like this:
>
> #load "pa_extend.cmo";
> #load "q_MLast.cmo";
> open Pcaml;
>
> EXTEND
> GLOBAL: expr;
> ...
> END;
>
> So where did I specify whether I was extending the original syntax or
> the revised syntax (or some other grammar entirely)?
> I suppose I must have implicitly chosen the original syntax because
> my code works fine on that.

The syntax extension mechanism is imperative in nature: the EXTEND
statement works on an existing grammar and adds, changes or deletes
rules (camlp5 documentation:
http://pauillac.inria.fr/~ddr/camlp5/doc/htmlc/grammars.html ). More
precisely, the EXTEND syntax is itself a camlp4 extension, which gets
desugared to a bare OCaml expression that modifies the given
Grammar.Entry.t values (in an imperative way).

The revised and classical syntaxes are designed as syntax extensions
(pa_o.ml, pa_r.ml) that extend an empty grammar which already contains
some (empty) grammar entries. They first clear every entry of that
grammar (probably to make sure it's really empty), then add by
extension every syntactic construct of the OCaml language. They get
compiled to pa_o.cmo and pa_r.cmo, which you can pass to camlp4 to
choose one of the two syntaxes:

    camlp4 pa_o.cmo my_extension.cmo ...

What happens here is that:
- camlp4 starts with an empty OCaml grammar
- you link it to pa_o.cmo, which gets executed and sets up the
  classical syntax (by mutation of the (empty) grammar entries)
- you then add your own extension, which makes additional mutations

In essence, the effect of your extension depends on the side effects
that were performed before it. If pa_o.cmo or pa_r.cmo was passed as a
parameter, you build upon their syntax rules, but it may be that an
additional syntax extension was added before yours, in which case you
are actually working on slightly modified syntax rules.

camlp4o and camlp4r are just packaged versions of camlp4 with
"pa_o.cmo" and "pa_r.cmo", respectively, implicitly linked.

In general, reasonably local syntax extensions tend to work on both
the classical and the revised syntax (because their syntax rules are
quite similar). If your extension depends on one of the syntaxes, you
should specify it.
If your extension tries to delete a rule which was not present in the
syntax you're extending, you will get a runtime error (for example,
trying to delete the "where"-related rule in the classical syntax).

> 1) In the parsing rule for the simple dot notation...
>
> | e1 = SELF; "."; e2 = SELF -> <:expr< $e1$ . $e2$ >> ]
>
> ...why is the field label an "expr"? This does not agree with the
> OCaml manual, which has a separate syntactic category for "field"
> (http://caml.inria.fr/pub/docs/manual-ocaml/expr.html), nor with my
> intuition about the meaning of the code.

I suppose this presentation was chosen to make the grammar rules
simpler. Camlp4 parsers are not tied to the documented OCaml grammar.
Camlp4 grammars for OCaml (you can use camlp4 to parse other languages,
without necessarily starting from the OCaml grammar) use a
camlp4-specific OCaml AST, which then gets translated to the specific
AST the OCaml compiler expects (when no camlp4 preprocessing is needed,
the OCaml compiler uses its own yacc parser, which directly produces
the compiler's AST).

There are actually subtle differences in parsing (for example
"let id x = x in id fun _ -> ()" gets rejected by the non-camlp4 parser
but parses fine under camlp4 and camlp5), and I don't think any of them
is "right": they are all tied to implementation-specific parsing
strategies (weird recursive descent for camlp{4,5}, and yacc), and I'm
not sure even the yacc version rigorously respects the documented BNF
grammar.

> 2) Furthermore, as one can see from the ":=" entry above, the entire
> left side of a record update is parsed as its own subexpression. So
> this means, that in the context of a record update, that subexpression
> has to be interpreted as a reference, but in other contexts, the very
> same expression must be interpreted as a value. I don't necessarily
> care what kind of magic makes this possible on the back end, but I am
> wondering whether this has any implications for modifying the record
> syntax.
I'm not sure what you mean here, but I'm under the impression that
you're confusing the syntactic representation of the expression with
its runtime/compile-time semantics. Camlp* knows nothing of the meaning
of the code it produces; the output is an AST which has no idea of what
a "reference" and a "value" mean. The semantics of the given code
depend on the deeper passes of the compiler (for example typing), which
probably have an internal language of their own, and surely distinguish
between lvalues and rvalues.
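The load-order behaviour described in this message (an empty grammar, then pa_o.cmo or pa_r.cmo, then your own extension, each mutating shared entries) can be mimicked with a toy model in plain OCaml. Nothing here is the camlp4 or camlp5 API: `entry`, `load_classical`, `load_revised`, `extend_level` and `delete_level` are hypothetical names, and the level contents are placeholders.

```ocaml
(* Toy model: an entry is a mutable list of (level name, operators). *)
type entry = { mutable levels : (string * string list) list }

(* One shared entry, initially empty, like the predefined grammar entries. *)
let expr = { levels = [] }

(* Stand-ins for loading pa_o.cmo and pa_r.cmo; contents are placeholders. *)
let load_classical () =
  expr.levels <- [ (":=", [ ":="; "<-" ]); ("+", [ "+"; "-" ]) ]

let load_revised () =
  expr.levels <-
    [ ("where", [ "where" ]); (":=", [ ":=" ]); ("+", [ "+"; "-" ]) ]

(* An extension that adds an operator to an existing level. *)
let extend_level name op =
  if not (List.mem_assoc name expr.levels) then
    failwith ("no level named " ^ name);
  expr.levels <-
    List.map
      (fun (n, ops) -> if n = name then (n, ops @ [ op ]) else (n, ops))
      expr.levels

(* An extension that deletes a level; fails if the level was never added. *)
let delete_level name =
  if not (List.mem_assoc name expr.levels) then
    failwith ("no level named " ^ name);
  expr.levels <- List.remove_assoc name expr.levels

let () =
  load_classical ();
  extend_level "+" "+++";
  assert (List.assoc "+" expr.levels = [ "+"; "-"; "+++" ]);
  (* "where" exists only after load_revised, so this fails at run time. *)
  assert (try delete_level "where"; false with Failure _ -> true)
```

The same `extend_level` call works after either loader because both define a "+" level, while `delete_level "where"` only succeeds after `load_revised`. That is the sense in which an extension is compiled once but its effect depends on what was loaded before it.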
* Re: [Caml-list] camlp5/revised syntax questions
2009-10-10 12:31 ` blue storm
@ 2009-10-14 2:04 ` Aaron Bohannon
0 siblings, 0 replies; 5+ messages in thread
From: Aaron Bohannon @ 2009-10-14 2:04 UTC (permalink / raw)
To: blue storm; +Cc: caml-list

On Sat, Oct 10, 2009 at 8:31 AM, blue storm <bluestorm.dylc@gmail.com> wrote:
> The revised and classical syntax are designed as syntax extensions
> (pa_o.ml pa_r.ml) that extend an empty grammar, which already contains
> some (empty) grammar entries. They first clear every entry of that
> grammar (probably to make sure it's really empty), then add by
> extension every syntactic construct of the ocaml language. They get
> compiled to pa_o.cmo and pa_r.cmo, which you can pass to camlp4 to
> choose one of the two syntaxes:
> camlp4 pa_o.cmo my_extension.cmo ...
>
> What happens here is that:
> - camlp4 starts with an empty ocaml grammar
> - you link it to pa_o.cmo, which gets executed and sets up the
> classical syntax (by mutation of the (empty) grammar entries)
> - you then add your own extension which makes additional mutations

Ah. I didn't understand that you can (and must) compile syntax
extensions without committing to which grammar you are extending.
Sequencing the "cmo" files as arguments to "camlp4/5" makes enough
sense. Things seem a little more magical when loading extensions with
the "#load" directive because it seems the parser has to change its
behavior while it's in the middle of parsing, but at least I've got the
basic idea now.

> I'm not sure what you mean here, but I'm under the impression that
> you're confusing the syntactic representation of the expression and its
> runtime/compile-time semantics. Camlp* knows nothing of the meaning of
> the code it produces; the output is an AST which has no idea of what a
> "reference" and a "value" means.
> The semantics of the given code depend on the deeper passes of the
> compiler (for example typing), which probably have an internal
> language of their own, and surely make the difference between lvalues
> and rvalues.

OK. I wasn't very precise. I meant that, with those parsing rules,
there must be a sort of error that is generated after parsing, but
before type-checking. For instance, let's say we have

    type foo = { mutable bar : int; }

Then, according to the grammar, there is no parse error in the
expression

    { bar = 3 } . "bar"

But I don't know what OCaml type error this is supposed to generate,
either. And as I wrote this, I realized it's quite easy to try this
out and see what happens. What we actually get (using camlp5o) is:

    Failure: lowercase identifier expected

Similarly, you can try this expression

    "foo" <- 3

and you will get

    Failure: bad left part of assignment

I guess that, in practical terms, these are just parse errors, too.
However, they are still a bit mysterious since they get generated on
expressions that conform to the grammar. Poking around the camlp5
sources, it appears the errors are generated by camlp5, which means
these expressions probably do not represent valid OCaml ASTs. In that
case, I find camlp5's grammar design a little puzzling.

- Aaron
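The behaviour observed here, a permissive grammar followed by a failure that comes before type checking, can be pictured with a toy model in plain OCaml. This is only a sketch: the `expr` type and the `check` function are hypothetical, it does not show where camlp5 actually raises these failures, and the two error strings are simply copied from this message.

```ocaml
(* Toy AST in which the grammar accepts any expression around "." and "<-". *)
type expr =
  | Lid of string          (* lowercase identifier *)
  | Str of string          (* string literal *)
  | Int of int
  | Dot of expr * expr     (* e1 . e2, right side not yet restricted *)
  | Assign of expr * expr  (* e1 <- e2, left side not yet restricted *)

(* Pass run after parsing: rejects trees that have no valid meaning. *)
let rec check = function
  | Lid _ | Str _ | Int _ -> ()
  | Dot (e, Lid _) -> check e
  | Dot (_, _) -> failwith "lowercase identifier expected"
  | Assign (((Lid _ | Dot _) as l), r) -> check l; check r
  | Assign (_, _) -> failwith "bad left part of assignment"

let () =
  (* r.bar <- 3 passes the check. *)
  check (Assign (Dot (Lid "r", Lid "bar"), Int 3));
  (* "foo" <- 3 is a well-formed tree here but is rejected afterwards. *)
  match check (Assign (Str "foo", Int 3)) with
  | () -> print_endline "accepted"
  | exception Failure msg -> print_endline ("Failure: " ^ msg)
```

Keeping the grammar rule as `expr . expr` and narrowing it in a later pass trades a simpler grammar for errors that are not strictly parse errors, which is one way to read the puzzle raised above.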