Date: Wed, 26 Nov 1997 19:00:58 +0100 (MET)
From: David Monniaux
To: Caml-list
Subject: hacks using camlp4

[ I wanted to set up regular expressions with a more pleasant syntax and proper typing, using camlp4. I have some results, but also some problems... ]

Hi,

I recently looked at the camlp4 preprocessor. I think this tool may have lots of useful applications, in particular because it allows custom syntaxes for entering certain kinds of objects, in a more programmer-friendly fashion than just writing raw data structures into the source code. This also lessens the pressure to put library-dependent features into the language itself, as was done for lists, strings and, most notably, the format strings of the Printf.*printf functions.

I thus tried to give a type-safe interface to regular expressions. My small hack (available on request) adds a syntax to match against a regular expression and return the components in a friendly and type-safe way, and also a replace function.
[example:

    let x = "meuh je (suis) un [beau] chaton {suis}" =~
      item => (String.uppercase)
      /{item: ['(' '['] ['a'-'z']+ [']' ')']}/ in
    Printf.printf "%s\n" x;;

will uppercase the substrings enclosed in () or []. ]

To get Perl-like efficiency, I precompile regexps (let-bindings for the precompiled regexps are prepended to the output); that allows efficient matching even in the middle of a loop. However, this precludes variable parts in the regexps (for now I only allow constant regexps); the ideal solution would be to compile each regexp at the point in the program where all the values involved can be computed. Alas, this requires some semantic analysis.

I therefore have three problems:

1. Precompiled expressions. More generally, what is needed is some construct to evaluate an expression as soon as that is possible.

2. It would be nice if regexp precompilation could be done at compile or preprocessing time (I was thinking of marshalling the precompiled regexp, but I fear there is some C-library private data structure inside the regexp type).

3. The stupid emacs regexp syntax (for now my code does not really work if you put ^ or ] in ranges).

The third point is only a library matter, but the others deserve some thought, I should say...
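To make the precompilation point concrete, here is a minimal sketch in plain OCaml, without camlp4 or any regexp library. Everything in it is a hypothetical stand-in and not the code of the hack: `compile_class` plays the role of the regexp compiler (it only builds a lookup table for a character class), `compile_count` exists purely to count compilations, and `precompiled_class_1` imitates the kind of let-binding that gets prepended to the output. The last function, `matcher_for`, shows by hand what problem 1 asks for: compiling as soon as the variable parts are known, and no later.

```ocaml
(* Hypothetical stand-in for a regexp compiler: turns a list of inclusive
   character ranges into a 256-entry lookup table.  The counter only
   serves to show how often compilation happens. *)
let compile_count = ref 0

let compile_class (ranges : (char * char) list) : bool array =
  incr compile_count;
  let table = Array.make 256 false in
  List.iter
    (fun (lo, hi) ->
      for c = Char.code lo to Char.code hi do
        table.(c) <- true
      done)
    ranges;
  table

(* Does every character of [s] belong to the compiled class? *)
let all_in_class (table : bool array) (s : string) : bool =
  let ok = ref true in
  String.iter (fun c -> if not table.(Char.code c) then ok := false) s;
  !ok

(* Naive version: the class is recompiled on every call. *)
let is_lower_naive (s : string) : bool =
  all_in_class (compile_class [ ('a', 'z') ]) s

(* Hoisted version: a prepended binding, evaluated once when the module
   is initialised, then reused by every call. *)
let precompiled_class_1 = compile_class [ ('a', 'z') ]

let is_lower_hoisted (s : string) : bool =
  all_in_class precompiled_class_1 s

(* Variable class: compile when [ranges] becomes known, then return a
   closure that reuses the table.  This is manual staging by partial
   application; a preprocessor would need semantic analysis to find
   this point automatically. *)
let matcher_for (ranges : (char * char) list) : string -> bool =
  let table = compile_class ranges in
  fun s -> all_in_class table s
```

Calling `is_lower_naive` in a loop compiles once per iteration, whereas `is_lower_hoisted` never compiles again after start-up. With `matcher_for`, writing `let m = matcher_for ranges in ...` before a loop gives one compilation per distinct `ranges`, which is the behaviour one would want the preprocessor to produce by itself.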