From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 1DD3DBB91 for ; Tue, 11 Jan 2005 20:58:54 +0100 (CET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j0BJwrHD006105 for ; Tue, 11 Jan 2005 20:58:53 +0100 Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id UAA21809 for ; Tue, 11 Jan 2005 20:58:53 +0100 (MET) Received: from wproxy.gmail.com (wproxy.gmail.com [64.233.184.202]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j0BJwqO6016593 for ; Tue, 11 Jan 2005 20:58:53 +0100 Received: by wproxy.gmail.com with SMTP id 68so135995wri for ; Tue, 11 Jan 2005 11:58:52 -0800 (PST) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:references; b=i+ikIFT4ps4S9NBWgo0334Rn1rd5d884hdjfppRCsl7FkOclKH25jzZzbh4M5oCjkzYGEwTK3eXSEVIrjdPpjatXSTnE0H6eLRhlEO8cmKR/u19zggkrtVXVRR0KRXvLsyLNjVNNSN4VecYyM6kHctU19JAnGwvQ7etuKfBsFCM= Received: by 10.54.26.65 with SMTP id 65mr93568wrz; Tue, 11 Jan 2005 11:58:52 -0800 (PST) Received: by 10.54.5.8 with HTTP; Tue, 11 Jan 2005 11:58:52 -0800 (PST) Message-ID: <875c7e0705011111583fac3741@mail.gmail.com> Date: Tue, 11 Jan 2005 14:58:52 -0500 From: Chris King Reply-To: Chris King To: "O'Caml Mailing List" Subject: Re: [Caml-list] Thread safe Str In-Reply-To: <875c7e07050111115618692184@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit References: <1105415669.3534.55.camel@pelican.wigram> <875c7e0705011023033fa114ef@mail.gmail.com> <1105430813.3534.116.camel@pelican.wigram> <875c7e070501111007dc3e86d@mail.gmail.com> <1105471138.2574.88.camel@pelican.wigram> <875c7e07050111115618692184@mail.gmail.com> X-Miltered: at concorde with ID 41E42FFD.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at nez-perce with ID 41E42FFC.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 regexp:01 api:01 regexp:01 parsing:01 syntax:01 statically:01 syntax:01 foo:01 literals:01 substrings:01 mangling:01 arrays:01 lexer:01 lexer:01 X-Spam-Checker-Version: SpamAssassin 3.0.0 (2004-09-13) on yquem.inria.fr X-Spam-Status: No, score=0.0 required=5.0 tests=RCVD_BY_IP autolearn=disabled version=3.0.0 X-Spam-Level: Forgot to CC this to the list, sorry if anyone gets a dupe.... > I'm sure you'd agree there are separable issues here: > > (1) using a string encoding of a regexp as opposed to > a lex like one -- this has nothing to do with > captures. Yes. As I stated in another e-mail in this thread, I'd love to see an API that exposes the parse tree of a regexp. > (2) Captures I'd like to add (3) Parsing vs. substitution. You can't effectively do the latter without captures (of course it can be done but it's messy). > The fact that the regexp syntax is not checked statically > isn't relevant in a dynamic language since the typing > of the rest of the program isn't either. I think we're talking about different things... I used the "s//g" syntax to represent the substitution function in whatever language is being used, not as an example of something to be compiled. (regexp "foo(.*)bar") certainly has a static type, and if it's malformed it simply raises an exception. > I'm trying to provide that in Felix. It has Python style literals, > and Python style substrings. However it is still clumbsy compared > with Perl (I guess .. I can't write Perl ..) Perl string mangling is clumsy compared with Python. The key is that Python treats strings as arrays (or lists) of characters. > > True. Performing multiple replacements on a single string with a > > regexp is retarded. But so is writing a lexer for a simple one-shot > > replacement job. > > That depends on how hard it is to write a lexer. Specifically, a lexer whose input and output are both strings and which performs substitution. Not pretty, unless captures are provided. > At present, Felix regexps have to be constants. It will be possible > in the bootstrapped compiler to generate them as text, and then > compile and link (i.e. there will be a function sort of like 'eval'). That's not acceptable for, say, an incremental search, though, where a new regexp must be generated on each keystroke. (Yes I know regexps aren't the best way to go about that but I'm sure there are better examples.) > > My point is just that regexps are useful enough to co-exist with lexers. > > But they're the same thing. Lexer provide regular definitions, > which is just a way of naming regexps, and reusing the regexps > by providing the name. Not in the form of lex/flex/ocamllex. Yes, lexer token definitions are equivalent to regexps, but everything else about them is different (specifically, lexers are event-based, and don't provied captures). I'd love to see a regexp engine allowing dynamic creation of token-based regexps, complete with captures. It could easily serve as both the base of a lexer and a substitution engine. Heck, that sounds like a fun project... what I am doing this weekend? :P