From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id TAA00702; Thu, 27 May 2004 19:29:07 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id TAA00824 for ; Thu, 27 May 2004 19:29:05 +0200 (MET DST) Received: from mail2.speakeasy.net (mail2.speakeasy.net [216.254.0.202]) by concorde.inria.fr (8.12.10/8.12.10) with ESMTP id i4RHT4SH020324 for ; Thu, 27 May 2004 19:29:04 +0200 Received: (qmail 6590 invoked from network); 27 May 2004 17:29:03 -0000 Received: from shell2.speakeasy.net ([69.17.110.71]) (envelope-sender ) by mail2.speakeasy.net (qmail-ldap-1.03) with AES256-SHA encrypted SMTP for ; 27 May 2004 17:29:03 -0000 Date: Thu, 27 May 2004 10:29:03 -0700 (PDT) From: brogoff To: Ocaml Subject: Re: [Caml-list] unix.chop_extension In-Reply-To: <20040526164308.GA22145@redhat.com> Message-ID: References: <1085429093.6065.336.camel@pelican.wigram> <20040526110508.A17806@pauillac.inria.fr> <40B47D9D.1050007@baretta.com> <20040526164308.GA22145@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Miltered: at concorde with ID 40B62560.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Loop: caml-list@inria.fr X-Spam: no; 0.00; brogoff:01 brogoff:01 caml-list:01 chop:01 2004:99 baretta:01 api:01 pcre:01 pcre:01 stateless:01 erwig's:01 api:01 fpls:01 haskell:01 egregious:01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk On Wed, 26 May 2004, Richard Jones wrote: > On Wed, May 26, 2004 at 01:21:01PM +0200, Alex Baretta wrote: > > Actually, I'm a happy user of Str, but I find the absence in Ocaml of a > > functional "canonical" regexp feature striking. > > I'm fascinated to know what this "functional" API would look like. I > use Pcre and it doesn't appear to have any global internal state > AFAICS ... As you note, the interface to Pcre, unlike Str, appears stateless. If you examine some functional data structure libraries, like Martin Erwig's FGL, or Gerard Huet's ZEN, you notice that they use the idea of "location + context" (similar to the C++ STL, where the locations are iterators) to navigate the structure. For string processing, the context is the entire string, and the location is the pair of endpoints of the current string. So it seems that the right way to a functional API for string processing would be in terms of substrings, { base : string; left : int; right : int } or just string * int * int, which is what is done in the SML Basis. The idea extends naturally to regexp scanning, and it's easy to see how to modify the Str interface to use substrings, eliminating Not_found exceptions by considering an empty substring to be a failed match. Of course, substrings make sense when the underlying string is array like, not in FPLs like Haskell which use lists of chars as strings, but I consider that an egregious error of Haskell. -- Brian ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners