From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by yquem.inria.fr (Postfix) with ESMTP id 7F39DBB81 for ; Wed, 22 Dec 2004 11:37:18 +0100 (CET) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.191]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id iBMAbIhJ011643 for ; Wed, 22 Dec 2004 11:37:18 +0100 Received: from [212.227.126.160] (helo=mrelayng.kundenserver.de) by moutng.kundenserver.de with esmtp (Exim 3.35 #1) id 1Ch3ro-0003OJ-00; Wed, 22 Dec 2004 11:37:16 +0100 Received: from [80.129.112.48] (helo=gate.gerd-stolpmann.de) by mrelayng.kundenserver.de with asmtp (Exim 3.35 #1) id 1Ch3rn-0003Hq-00; Wed, 22 Dec 2004 11:37:15 +0100 Received: from ice.gerd-stolpmann.de (ice [192.168.0.13]) by gate.gerd-stolpmann.de (Postfix) with ESMTP id 73F2EC04C; Wed, 22 Dec 2004 11:37:15 +0100 (CET) Received: by ice.gerd-stolpmann.de (Postfix, from userid 1000) id 5D1FEB032; Wed, 22 Dec 2004 11:37:12 +0100 (CET) Subject: Re: [Caml-list] Str.string_match incorrect From: Gerd Stolpmann To: William Lovas Cc: caml-list@yquem.inria.fr In-Reply-To: <20041222080009.GA4501@force.stwing.upenn.edu> References: <1103687369.6979.50.camel@pelican.wigram> <20041222074455.GA81342@trout> <20041222080009.GA4501@force.stwing.upenn.edu> Content-Type: text/plain Content-Transfer-Encoding: 7bit Message-Id: <1103711830.1153.21.camel@ice.gerd-stolpmann.de> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Wed, 22 Dec 2004 11:37:11 +0100 X-Provags-ID: kundenserver.de abuse@kundenserver.de auth:a6865a839c0178d9aa0ce41878507ea2 X-Miltered: at nez-perce with ID 41C94E5E.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 gerd:01 stolpmann:01 lovas:01 wrote:01 wrote:01 corresponds:01 debatable:01 semantics:01 regexp:01 bool:01 regexp:01 bool:01 transforming:01 gerd:01 X-Spam-Checker-Version: SpamAssassin 3.0.0 (2004-09-13) on yquem.inria.fr X-Spam-Status: No, score=0.1 required=5.0 tests=FORGED_RCVD_HELO autolearn=disabled version=3.0.0 X-Spam-Level: On Mit, 2004-12-22 at 09:00, William Lovas wrote: > On Tue, Dec 21, 2004 at 11:44:55PM -0800, Evan Martin wrote: > > This is consistent with the docs, which say: > > [string_match r s start] tests whether the characters in s starting at > > position start match the regular expression r. > > and in general with how regular expression systems work. string_match > > corresponds to running your automaton directly and seeing whether you > > end up in an accept state, while string_partial_match effectively adds > > an extra ".*" to the beginning. > > (It's more debatable whether this makes sense.) > > I concur with your assessment, but i think you're characterization of the > semantics of string_partial_match is inaccurate: > > # Str.string_match (Str.regexp "ab") "a" 0;; > - : bool = false > # Str.string_partial_match (Str.regexp "ab") "a" 0;; > - : bool = true > # Str.string_match (Str.regexp ".*ab") "a" 0;; > - : bool = false > > ... unless of course, i've misunderstood you. I don't think there's a > simple transformation on regular expressions that allow you to emulate > string_partial_match's behavior using only string_match. (Does anyone > care to prove me wrong? :) The difference between string_match and string_partial_match is quite simple: - string_match: whether a prefix of the string matches the regexp - string_partial_match: whether the string is a prefix of another string matching the regexp On the level of the automaton, string_match is true when the automaton reaches an accept state whereas string_partial_match is true when the automaton is in a valid state at the end of the string and there is still a path to an accept state. Of course, it is easy to emulate string_partial_match when you already have the automaton - just make all states accepting that have a path to an accept state. But there is not a comparably simple rule for the regular expression. You can do an emulating transformation by generating the automaton, adding the additional accept states, and transforming the automaton back to the regexp. There is an algorithm for this, but the length of the regexp explodes (exponentially when I remember correctly). Gerd -- ------------------------------------------------------------ Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de ------------------------------------------------------------