From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id MAA16729 for caml-redistribution; Wed, 3 Feb 1999 12:08:28 +0100 (MET) Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id OAA26273 for ; Mon, 1 Feb 1999 14:13:36 +0100 (MET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.8.7/8.8.7) with ESMTP id OAA07145; Mon, 1 Feb 1999 14:13:27 +0100 (MET) Received: (from xleroy@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id OAA22789; Mon, 1 Feb 1999 14:13:28 +0100 (MET) Message-ID: <19990201141328.53683@pauillac.inria.fr> Date: Mon, 1 Feb 1999 14:13:28 +0100 From: Xavier Leroy To: David McClain , Liste CAML Subject: Re: GenLex stream parsers too eager? References: <000c01be4952$09752290$210148bf@dylan> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.89.1 In-Reply-To: <000c01be4952$09752290$210148bf@dylan>; from David McClain on Tue, Jan 26, 1999 at 10:33:46AM -0700 Sender: weis > It appears that the Genlex derived parsers always eagerly tokenize > negaitve integer and float constants. This causes incorrect behavior > in closely spaced code (no-spaces): > > a-2*c --> parses as "a", "-2" ,"*", "c" instead of "a","-","2","*","c" > Right. This is a classic compiler problem: one can either tokenize negative integer literals in the lexer (-?[0-9]+), which causes the weird behavior above for expressions without spaces, or have the lexer tokenize only positive integer literals ([0-9]+) and add a special case in the parser to recognize "-" followed by an integer literal. Genlex is very simple-minded and follows the former approach. The Caml compilers follow the latter. (The latter approach has its own problems. For instance, in Caml, it parses "f -1" as "f minus 1", not as "f applied to the integer -1", like many users expect.) > Any suggestions? (Perhaps I should be using OCAMLLEX and OCAMLYACC instead?) You'll have to write your own lexer, indeed. You can either use ocamllex to generate it, or start with the source code of the Genlex module and customize it to your needs. Best regards, - Xavier Leroy