From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id SAA12051 for caml-red; Thu, 12 Oct 2000 18:56:27 +0200 (MET DST) Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id SAA12048 for ; Thu, 12 Oct 2000 18:42:58 +0200 (MET DST) Received: from nef.ens.fr (nef.ens.fr [129.199.96.32]) by concorde.inria.fr (8.10.0/8.10.0) with ESMTP id e9CGgvX14765 for ; Thu, 12 Oct 2000 18:42:58 +0200 (MET DST) Received: from basilic.ens.fr (basilic.ens.fr [129.199.99.48]) by nef.ens.fr (8.10.1/1.01.28121999) with ESMTP id e9CGgvw84960 for ; Thu, 12 Oct 2000 18:42:57 +0200 (CEST) Received: from localhost by basilic.ens.fr (8.9.2/jb-1.1) id SAA08483 ; Thu, 12 Oct 2000 18:42:57 +0200 (MET DST) X-Authentication-Warning: basilic.ens.fr: monniaux owned process doing -bs Date: Thu, 12 Oct 2000 18:42:56 +0200 (MET DST) From: David Monniaux X-Sender: monniaux@basilic.ens.fr To: Liste CAML Subject: programmer-friendly regexp package Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT Sender: weis@pauillac.inria.fr Heya all camlers, I have an alpha version of a CamlP4 preprocessing package enabling the use of a straightforward, easy syntax for regexps: http://www.di.ens.fr/~monniaux/download/regexp.tar.gz (* Defines and compiles the regexp; substrings to be extracted are labelled "foo" and "bar" *) let regexp = let k = "zzz" in (* notice the support for embedded variables and expressions *) RE { "X" (foo: (['0'-'9']+))? bar:(['a'-'z']+) "Y" k };; (* Now the actual pattern matching *) REmatch "+++X45aaabYzzz" with regexp as contents ~bar ~foo -> Printf.printf "str=%s bar=%s foo=%s\n" contents bar (match foo with Some x -> x | None -> "") | _ -> Printf.printf "Not found!\n";; It uses the "Str" library, but it may come to support PCRE as well. It requires OCaml and CamlP4 3.00 and makes use of the labels (as a side effect, when using the traditional mode, the user must supply the various labels in the "as" clause in ASCII ordering). No documentation is available yet. The syntax for the regexps is supposed to be compatible with ocamllex. There are currently pitfalls in the handling of character sets. I specifically request comments on how to make this hack better, esp. with syntactic choices. I plan to support the predefinition of regexp parts; it is not yet decided whether the nesting will be preprocessing-time or execution-time, or both. Regards, David Monniaux http://www.di.ens.fr/~monniaux Laboratoire d'informatique de l'École Normale Supérieure, Paris, France