From: Martin Jambon
Date: Wed, 01 Jul 2009 17:19:27 +0200
To: Sylvain Le Gall
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Re: ocamllex and python-style indentation
Message-ID: <4A4B7E7F.3040908@ens-lyon.org>

Sylvain Le Gall wrote:
> Hello,
>
> On 01-07-2009, Andreas Rossberg wrote:
>> Mike Lin wrote:
>>> OK, now I'm curious :) how does your lexer match balanced parentheses,
>>> or in this case comments?
>>>
>> Easily, with a bit of side effects (I think that's roughly how all ML
>> compilers do it):
>>
>> ------------------------------------------------
>> let error l s = (* ... *)
>> let commentDepth = ref 0
>> let start = ref 0
>> let loc length = let pos = !start in (pos, pos+length)
>>
>> rule lex =
>>     parse eof { EOF }
>>     (* | ... *)
>>     | "{-" { start := pos lexbuf;
>>              lexNestComment lexbuf }
>>
>> and lexNestComment =
>>     parse eof { error (loc 2) "unterminated comment" }
>>     | "(*" { incr commentDepth;
>>              lexNestComment lexbuf }
>>     | "*)" { decr commentDepth;
>>              if !commentDepth > 0
>>              then lexNestComment lexbuf
>>              else lex lexbuf }
>>     | _ { lexNestComment lexbuf }
>> ------------------------------------------------
>>
>> If you also want to treat strings in comments specially (like OCaml),
>> then you need to do a bit more work, but it's basically the same idea.
>>
>
> May I recommend you to write this in a more simple way:
>
> -------------------------------------------------------------------------
> rule lex =
>     parse eof { () }
>     | "(*" { start := pos lexbuf; lexNestComment lexbuf; lex lexbuf }
>
> and lexNestComment =
>     parse eof { error (loc 2) "unterminated comment" }
>     | "(*" { lexNestComment lexbuf }
>     | "*)" { () }
>     | _ { lexNestComment lexbuf }
> -------------------------------------------------------------------------
>
> I think it works the same way, except that it uses less global
> variables.

You can even get rid of global variables completely:

(* x carries the lexer state, here a record with a mutable start field *)
rule lex x =
    parse eof { () }
    | "(*" { x.start <- pos lexbuf; lexNestComment x lexbuf; lex x lexbuf }

and lexNestComment x =
    parse eof { error (loc x 2) "unterminated comment" }
    | "(*" { (* skip the nested comment, then resume the enclosing one *)
             lexNestComment x lexbuf; lexNestComment x lexbuf }
    | "*)" { () }
    | _ { lexNestComment x lexbuf }


Martin

-- 
http://mjambon.com/
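
For readers who want to try the state-passing idea without running ocamllex, here is a minimal hand-written sketch in plain OCaml. It is only an illustration under stated assumptions: the `state` record, the `Unterminated_comment` exception, and the functions `at`, `skip_comment` and `strip_comments` are hypothetical names made up for this example, and the scanner works on a string index rather than on a `lexbuf`.

```ocaml
(* Hypothetical lexer state, passed explicitly instead of living in globals. *)
type state = { mutable start : int }

(* Carries the (start, stop) span of the outermost opening delimiter. *)
exception Unterminated_comment of int * int

(* True when the two-character token [tok] occurs in [s] at index [i]. *)
let at s i tok =
  i + 1 < String.length s && s.[i] = tok.[0] && s.[i + 1] = tok.[1]

(* [skip_comment x s i] is called with [i] just past an opening delimiter
   and returns the index just past the matching closing delimiter. *)
let rec skip_comment x s i =
  if i >= String.length s then
    raise (Unterminated_comment (x.start, x.start + 2))
  else if at s i "(*" then
    (* Skip the nested comment, then keep scanning the enclosing one. *)
    skip_comment x s (skip_comment x s (i + 2))
  else if at s i "*)" then i + 2
  else skip_comment x s (i + 1)

(* Return [s] with all (possibly nested) comments removed. *)
let strip_comments s =
  let x = { start = 0 } in
  let buf = Buffer.create (String.length s) in
  let rec lex i =
    if i >= String.length s then Buffer.contents buf
    else if at s i "(*" then begin
      x.start <- i;
      lex (skip_comment x s (i + 2))
    end else begin
      Buffer.add_char buf s.[i];
      lex (i + 1)
    end
  in
  lex 0
```

With these definitions, `strip_comments "a(* b (* c *) d *)e"` evaluates to `"ae"`, and an unclosed comment raises `Unterminated_comment` with the span of the outermost opening delimiter.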