From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by yquem.inria.fr (Postfix) with ESMTP id 04E94BB91 for ; Wed, 12 Jan 2005 08:42:26 +0100 (CET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j0C7gPpG015649 for ; Wed, 12 Jan 2005 08:42:25 +0100 Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id IAA09207 for ; Wed, 12 Jan 2005 08:42:24 +0100 (MET) Received: from smtp3.adl2.internode.on.net (smtp3.adl2.internode.on.net [203.16.214.203]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j0C7gMVF015641 for ; Wed, 12 Jan 2005 08:42:23 +0100 Received: from [192.168.1.200] (ppp201-179.lns1.syd3.internode.on.net [203.122.201.179]) by smtp3.adl2.internode.on.net (8.12.9/8.12.9) with ESMTP id j0C7gCcG065300; Wed, 12 Jan 2005 18:12:13 +1030 (CST) Subject: Re: [Caml-list] Thread safe Str From: skaller Reply-To: skaller@users.sourceforge.net To: Gerd Stolpmann Cc: Chris King , "O'Caml Mailing List" In-Reply-To: <1105475416.15581.105.camel@ice.gerd-stolpmann.de> References: <1105415669.3534.55.camel@pelican.wigram> <875c7e0705011023033fa114ef@mail.gmail.com> <1105430813.3534.116.camel@pelican.wigram> <1105445337.15581.82.camel@ice.gerd-stolpmann.de> <1105466124.2574.48.camel@pelican.wigram> <1105475416.15581.105.camel@ice.gerd-stolpmann.de> Content-Type: text/plain Message-Id: <1105515731.2574.213.camel@pelican.wigram> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-4) Date: 12 Jan 2005 18:42:12 +1100 Content-Transfer-Encoding: 7bit X-Miltered: at nez-perce with ID 41E4D4E1.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at nez-perce with ID 41E4D4DE.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 sourceforge:01 gerd:01 stolpmann:01 wrote:01 backtracking:01 backtracking:01 recursion:01 frisch:01 cardelli:01 pcre:01 hacked:01 regexps:01 iterator:01 dfa:01 X-Spam-Checker-Version: SpamAssassin 3.0.0 (2004-09-13) on yquem.inria.fr X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.0 X-Spam-Level: On Wed, 2005-01-12 at 07:30, Gerd Stolpmann wrote: > It's a matter of taste. Granted, you cannot define capturing directly > for an automaton, only by a possibly poor backtracking algorithm. (This > means the approach is "definition by algorithm", not really the best way > to do it.) It's worse than that .. > Which leads to a more serious problem: One must know the algorithm of > the implementation to avoid running into a hard backtracking case (which > can be exponential). .. it's worse .. some cases could be infinite recursion: see Frisch & Cardelli. It is known PCRE has this problem. It was hacked to fix one of the common cases. The hack isn't general. > There are a lot of tutorials explaining good and > bad regexps. Which are worthless junk if you're generating them.. The point is -- this is folklore not mathematics. > The lex-type scanners do not have this problem (but you may > run into space problems instead because the automaton becomes large). Indeed, but they also have another problem: they don't support captures at all. > > > > Unfortunately this has a problem: it requires a > > bidirectional iterator, whereas a DFA only requires > > an input iterator. (NFA's require forward iterators) > > For string regexps this is not a big problem, the string is in memory. > For XML, this is more a problem, especially when the XML document is > only available event by event (processing large documents). Hehe .. but the main use of the algorithm is in the specialist XML processing language XDuce... -- John Skaller, mailto:skaller@users.sf.net voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net