From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id RAA09779 for caml-redistribution; Thu, 4 Dec 1997 17:08:16 +0100 (MET) Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id QAA08985 for ; Thu, 4 Dec 1997 16:28:00 +0100 (MET) Received: from infbssys.ips.cs.tu-bs.de (root@infbssys.ips.cs.tu-bs.de [134.169.32.1]) by nez-perce.inria.fr (8.8.7/8.8.5) with ESMTP id QAA08660 for ; Thu, 4 Dec 1997 16:27:58 +0100 (MET) Received: from infbsst3.ips.cs.tu-bs.de (lindig@infbsst3.ips.cs.tu-bs.de [134.169.32.68]) by infbssys.ips.cs.tu-bs.de with ESMTP id QAA22970 (8.8.8/IDA-1.6 for ); Thu, 4 Dec 1997 16:27:55 +0100 Date: Thu, 4 Dec 1997 16:27:54 +0100 (MET) Message-Id: <199712041527.QAA09344@infbsst3.ips.cs.tu-bs.de> From: Christian Lindig To: caml-list@inria.fr Subject: OCamllex 1.06 patch - adds let bound regexps Sender: weis As an enthusiastic user of OCaml I would like to support the OCaml project with a small contribution: an ocamllex patch that adds the ability to bind frequently used regular expressions to names. This is the feature I missed most in comparison to other lex implementations. You can find the patch against OCaml 1.06 at our ftp server: ftp://ftp.ips.cs.tu-bs.de/pub/local/softech/misc/ocamllex-1.06.patch.gz It includes examples and updated documentation. The README which explains the patch in more detail can be found there, too, and at the end of this mail. Best regards, Christian ------------------------------------------------------------------------------ Christian Lindig lindig@ips.cs.tu-bs.de TU Braunschweig fon +49 531 391 7465 Institut fuer Programmiersprachen fax +49 531 391 8140 D-38106 Braunschweig http://www.cs.tu-bs.de ----------------------------------------------------------------------------- ----------------------------------------------------------------------------- README ----------------------------------------------------------------------------- Ocamllex from OCaml 1.06 does not allow to bind frequently used regular expressions to names. This feature is provided by many other lex implementations and permits to write easier to maintain lex specification. This text describes a patch which adds a "let" construct to ocamllex which just adds this feature. After the header and before the entry points of an ocamllex specification "let" can be used to bind regular expressions to names: { (* header *) } let whitespace = [' ' '\t'] let digit = ['0'-'9'] let digits = ['0'-'9']+ let lowercase = ['a'-'z'] let uppercase = ['A'-'Z'] let ident = (uppercase|lowercase) (uppercase|lowercase|digit)* rule token = parse whitespace { token lexbuf } (* skip blanks *) | ['\n' ] { EOL } | digits { INT(int_of_string(Lexing.lexeme lexbuf)) } | '+' { PLUS } | '-' { MINUS } | '*' { TIMES } | '/' { DIV } | '(' { LPAREN } | ')' { RPAREN } | eof { raise Eof } Let bound symbols can be used to define other symbols and inside rules. The symbols used to bind regular expressions must be distinct from any ocamllex keyword like rule or eof und symbols must be defined before they can be used. Because let bound names are optional all old lex specifications that do not use let-bindings are accepted by the new ocamllex. The new feature is thus fully backward compatible. The documentation of ocamllex and ocamlyac has been updated to reflect the new feature. However, because the original documentation was not available as source only an ascii text 'lex.doc' is provided. To find the differences to the original documentation 'lex.doc.orig' is also provided. As a larger example the lex specification of ocamllex has been edited to use the new feature. It can be used to boostrap the new ocamllex and is provided as 'example.mll' The implementation of the let construct is straight forward: a preprocessing step builds a symbol table and replaces all symbols with the regular expressions they denote. The then symbol free regular expressions are passed to the existing machine. A better implementation what have used the bound names to avoid constructing certain automatons multiple times as it is done now. Christian Lindig Institut f\"ur Programmiersprachen und Informationssysteme Abteilung Softwaretechnologie TU Braunschweig D-38106 Braunschweig Germany lindig@ips.cs.tu-bs.de