From: Martin Jambon
Date: Thu, 11 Jun 2009 15:44:59 +0200
To: Andrej Bauer
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] ocamllex and python-style indentation

Andrej Bauer
wrote:
> My parsing powers are not sufficient to easily come up with
> lexer/parser for a simple language that uses python-style indentation
> and newline rules. Does anyone have such a thing lying around, written
> in ocamllex/yacc or menhir? I would appreciate a peek to see how
> you've dealt with it.
>
> For example, suppose we want just a very simple fragment of Python
> involving True, False, conditional statements, variables, and
> assignments, such as:
>
> if True:
>     x = 3
>     y = (2 +
>       4 + 5)
> else:
>     x = 5
>     if False:
>         x = 8
>         z = 2
>
> How would I go about writing a lexer/parser for such a thing in ocaml?

I would use a first pass that converts the input lines into this
imaginary structure:

{
  if True:
  ;
  {
    x = 3
    ;
    y = (2 +
    ;
    {
      4 + 5)
    }
  }
  ;
  else:
  ;
  {
    x = 5
    ;
    if False:
    ;
    {
      x = 8
      ;
      z = 2
    }
  }
}

You could create a generic tool that parses a file into this:

type t =
    Line of loc * string
  | Block of loc * t list

but as suggested by Yoann, the next step should probably be to flatten
this into a stream by introducing artificial tokens:

type gen_token =
    Open of loc           (* fake "{" *)
  | Close of loc          (* fake "}" *)
  | Separator of loc      (* fake ";" *)
  | Line of loc * string

then parse each Line into a list of tokens and flatten the result into
one single token stream:

type token =
    OPEN_BLOCK of loc     (* fake "{" *)
  | CLOSE_BLOCK of loc    (* fake "}" *)
  | SEPARATOR of loc      (* fake ";" *)
  | ...                   (* your language-specific tokens here *)

The token stream could then be processed by ocamlyacc/menhir.

That's the approach I would follow if I had to solve this problem again.


Martin

-- 
http://mjambon.com/
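
The first pass described above (turning indentation into fake "{", "}" and ";" tokens) could be sketched roughly as follows. This is only an illustration under several assumptions that are not in the original message: `loc` is taken to be a plain line number, indentation uses spaces only, blank lines are skipped, and the names `indent_of`, `gen_tokens` and `show` are hypothetical helpers.

```ocaml
(* Assumption: a location is just a 1-based line number. *)
type loc = int

type gen_token =
  | Open of loc            (* fake "{" *)
  | Close of loc           (* fake "}" *)
  | Separator of loc       (* fake ";" *)
  | Line of loc * string

(* Number of leading spaces. Assumption: tabs are not used for indentation. *)
let indent_of s =
  let n = String.length s in
  let rec go i = if i < n && s.[i] = ' ' then go (i + 1) else i in
  go 0

(* Convert raw input lines into a flat stream of generic tokens. *)
let gen_tokens (lines : string list) : gen_token list =
  (* Keep (loc, indent, text) for non-blank lines only. *)
  let items =
    List.mapi (fun i s -> (i + 1, indent_of s, String.trim s)) lines
    |> List.filter (fun (_, _, text) -> text <> "")
  in
  match items with
  | [] -> []
  | (first_loc, base, _) :: _ ->
    (* Close every open block that is indented deeper than [n]. *)
    let rec close_until n loc stack acc =
      match stack with
      | top :: rest when top > n -> close_until n loc rest (Close loc :: acc)
      | _ -> (stack, acc)
    in
    (* [stack]: indentation of the open blocks, innermost first.
       [acc]: tokens emitted so far, in reverse order. *)
    let step (stack, acc, first) (loc, n, text) =
      let top = List.hd stack in
      if first then (stack, Line (loc, text) :: acc, false)
      else if n > top then
        (* Deeper indentation opens a nested block. *)
        (n :: stack, Line (loc, text) :: Open loc :: Separator loc :: acc, false)
      else begin
        let stack, acc = close_until n loc stack acc in
        (match stack with
         | top :: _ when top = n -> ()
         | _ -> failwith (Printf.sprintf "line %d: inconsistent dedent" loc));
        (stack, Line (loc, text) :: Separator loc :: acc, false)
      end
    in
    let stack, acc, _ =
      List.fold_left step ([base], [Open first_loc], true) items in
    let last_loc = List.fold_left (fun _ (l, _, _) -> l) first_loc items in
    (* Close whatever is still open at end of input. *)
    List.rev (List.fold_left (fun acc _ -> Close last_loc :: acc) acc stack)

(* Render a token stream in the "imaginary structure" notation. *)
let show tokens =
  tokens
  |> List.map (function
       | Open _ -> "{"
       | Close _ -> "}"
       | Separator _ -> ";"
       | Line (_, s) -> s)
  |> String.concat " "
```

Applied to the nine lines of the example program, `show (gen_tokens lines)` yields the brace-and-semicolon structure shown in the message, written on a single line. Each `Line` would then still have to be lexed into language-specific tokens (for instance with an ocamllex rule) before the stream is handed to ocamlyacc/menhir.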