From: Martin Jambon <martin.jambon@ens-lyon.org>
To: Andrej Bauer <andrej.bauer@andrej.com>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] ocamllex and python-style indentation
Date: Thu, 11 Jun 2009 15:44:59 +0200
Message-ID: <4A310A5B.9010404@ens-lyon.org>
In-Reply-To: <7d8707de0906110557n6a1511a2k9f4f00827f954cb6@mail.gmail.com>
Andrej Bauer wrote:
> My parsing powers are not sufficient to easily come up with
> lexer/parser for a simple language that uses python-style indentation
> and newline rules. Does anyone have such a thing lying around, written
> in ocamllex/yacc or menhir? I would appreciate a peek to see how
> you've dealt with it.
>
> For example, suppose we want just a very simple fragment of Python
> involving True, False, conditional statements, variables, and
> assignments, such as:
>
> if True:
>     x = 3
>     y = (2 +
>       4 + 5)
> else:
>     x = 5
>     if False:
>         x = 8
>         z = 2
>
> How would I go about writing a lexer/parser for such a thing in ocaml?
I would use a first pass that converts the input lines into this imaginary
structure:
{
  if True:
  ;
  {
    x = 3
    ;
    y = (2 +
    ;
    {
      4 + 5)
    }
  }
  ;
  else:
  ;
  {
    x = 5
    ;
    if False:
    ;
    {
      x = 8
      ;
      z = 2
    }
  }
}
You could create a generic tool that parses a file into this:
type t = Line of loc * string | Block of loc * t list
but as suggested by Yoann, the next step should probably be to flatten this
into a stream by introducing artificial tokens:
  type gen_token =
      Open of loc       (* fake "{" *)
    | Close of loc      (* fake "}" *)
    | Separator of loc  (* fake ";" *)
    | Line of loc * string
then parse each Line into a list of tokens and flatten the result into one
single token stream:
  type token =
      OPEN_BLOCK of loc   (* fake "{" *)
    | CLOSE_BLOCK of loc  (* fake "}" *)
    | SEPARATOR of loc    (* fake ";" *)
    | ...                 (* your language-specific tokens here *)
The token stream could then be processed by ocamlyacc/menhir.
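As a rough sketch of the first pass, the indentation-to-brackets step could look something like the following. It goes straight from input lines to the gen_token stream and skips the intermediate tree. Several things here are assumptions made for illustration only: loc is taken to be a plain line number, indentation is counted in leading spaces (tabs are not handled), blank lines are skipped, and the names indent_of and tokenize_lines are made up.

```ocaml
(* Assumption: a location is just a 1-based line number. *)
type loc = int

type gen_token =
  | Open of loc       (* fake "{" *)
  | Close of loc      (* fake "}" *)
  | Separator of loc  (* fake ";" *)
  | Line of loc * string

(* Number of leading spaces; tabs are not handled in this sketch. *)
let indent_of s =
  let n = String.length s in
  let rec go i = if i < n && s.[i] = ' ' then go (i + 1) else i in
  go 0

let tokenize_lines (lines : string list) : gen_token list =
  (* Pop every open block that is indented deeper than [n],
     emitting one Close per popped block. *)
  let rec close loc n stack acc =
    match stack with
    | top :: rest when top > n -> close loc n rest (Close loc :: acc)
    | _ -> (stack, acc)
  in
  (* [stack] holds the indentation of each open block, innermost first;
     [acc] is the output in reverse order. *)
  let step (lnum, stack, acc) raw =
    let lnum = lnum + 1 in
    let text = String.trim raw in
    if text = "" then (lnum, stack, acc) (* skip blank lines *)
    else
      let n = indent_of raw in
      match stack with
      | [] -> (lnum, [n], Line (lnum, text) :: Open lnum :: acc)
      | top :: _ when n > top ->
          (* Deeper indentation: a new block, itself an item of the
             enclosing block, hence the Separator before the Open. *)
          (lnum, n :: stack,
           Line (lnum, text) :: Open lnum :: Separator lnum :: acc)
      | _ ->
          let stack, acc = close lnum n stack acc in
          (match stack with
           | top :: _ when top = n -> ()
           | _ ->
               failwith
                 (Printf.sprintf "line %d: inconsistent dedent" lnum));
          (lnum, stack, Line (lnum, text) :: Separator lnum :: acc)
  in
  let last, stack, acc = List.fold_left step (0, [], []) lines in
  (* Close whatever is still open at end of input. *)
  List.rev (List.fold_left (fun acc _ -> Close last :: acc) acc stack)
```

On the example from the original question this yields the same bracket structure as the imaginary one shown earlier, including the extra block around the continuation line "4 + 5)". The parser (or a later pass) still has to decide what to do with such continuation blocks.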
That's the approach I would follow if I had to solve this problem again.
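The last step, turning each Line into real tokens and flattening everything into one stream, might be sketched as below. This is only an illustration: WORD and lex_line are hypothetical placeholders (a real lexer would more likely be an ocamllex rule applied to Lexing.from_string), loc is again assumed to be a line number, and List.concat_map needs OCaml 4.10 or later.

```ocaml
(* Assumption: a location is just a 1-based line number. *)
type loc = int

type gen_token =
  | Open of loc
  | Close of loc
  | Separator of loc
  | Line of loc * string

type token =
  | OPEN_BLOCK of loc      (* fake "{" *)
  | CLOSE_BLOCK of loc     (* fake "}" *)
  | SEPARATOR of loc       (* fake ";" *)
  | WORD of loc * string   (* hypothetical stand-in for the
                              language-specific tokens *)

(* Hypothetical per-line lexer: splits on spaces and drops empty
   pieces. Replace with the real lexer for your language. *)
let lex_line (loc : loc) (s : string) : token list =
  String.split_on_char ' ' s
  |> List.filter (fun w -> w <> "")
  |> List.map (fun w -> WORD (loc, w))

(* Map the fake brackets one-to-one and splice in the tokens of
   each line, giving a single flat token list. *)
let flatten (gts : gen_token list) : token list =
  List.concat_map
    (function
      | Open loc -> [OPEN_BLOCK loc]
      | Close loc -> [CLOSE_BLOCK loc]
      | Separator loc -> [SEPARATOR loc]
      | Line (loc, s) -> lex_line loc s)
    gts
```

To hand such a list to an ocamlyacc/menhir parser, it would still have to be wrapped in a function of type Lexing.lexbuf -> token that returns the next token on each call.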
Martin
--
http://mjambon.com/