From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id E77F8BB81 for ; Sat, 25 Dec 2004 05:24:19 +0100 (CET) Received: from smtp1.adl2.internode.on.net (smtp1.adl2.internode.on.net [203.16.214.181]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id iBP4OHIx008773 for ; Sat, 25 Dec 2004 05:24:19 +0100 Received: from [192.168.1.200] (ppp194-89.lns1.syd2.internode.on.net [203.122.194.89]) by smtp1.adl2.internode.on.net (8.12.9/8.12.9) with ESMTP id iBP4O5RF078445; Sat, 25 Dec 2004 14:54:05 +1030 (CST) Subject: Re: [Caml-list] Str.string_match incorrect From: skaller Reply-To: skaller@users.sourceforge.net To: "Christopher A. Watford" Cc: David Brown , William Lovas , caml-list@yquem.inria.fr In-Reply-To: <8008871f041224190768113639@mail.gmail.com> References: <1103687369.6979.50.camel@pelican.wigram> <20041222074455.GA81342@trout> <20041222080009.GA4501@force.stwing.upenn.edu> <1103731044.6979.109.camel@pelican.wigram> <20041222165846.GA30503@old.davidb.org> <1103769192.3443.51.camel@pelican.wigram> <8008871f04122409405d1b9679@mail.gmail.com> <1103936225.6201.243.camel@pelican.wigram> <8008871f041224190768113639@mail.gmail.com> Content-Type: text/plain Organization: Message-Id: <1103948644.6201.336.camel@pelican.wigram> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-4) Date: 25 Dec 2004 15:24:04 +1100 Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 41CCEB71.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 sourceforge:01 wrote:01 newlines:01 sed:01 bug:01 mismatch:01 imho:01 ocaml:01 ocaml:01 pcre:01 emacs:01 sed:01 semantics:01 newlines:01 X-Spam-Checker-Version: SpamAssassin 3.0.0 (2004-09-13) on yquem.inria.fr X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.0 X-Spam-Level: On Sat, 2004-12-25 at 14:07, Christopher A. Watford wrote: > >From the documentation, as you have pointed out (now I see what you > mean), it does not make this constraint obvious. It appears that > string_match's documentation should include that an implicit ^ is > added. string_partial_match makes this clear however. Except even that isn't quite correct -- ^ matches a newline as well as start of string. When matching on a 'string' the special handling of newline is out of place -- newlines are ordinary characters. When matching on a whole file, as in awk/sed/perl style of tools, the line orientation makes some sense. In any case, the bug is a mismatch, IMHO, between what the function does and what the documentation says it does. I had expected string_match to match a whole string or fail, because that's what the documentation says it does. However there are two compatibility arguments for changing the documentation instead -- non Ocaml user expectations (clearly indicated in replies to my post), and breaking of existing Ocaml code that relies on the actual behaviour not the documented behaviour, misreading it as others seem to have done, and assuming PCRE/Emacs/awk/sed ... or whatever .. semantics. If that path is chosen I think it would be nice to provide a function that actually does what the documentation *originally* specified (that is, what it says right now): match a whole string. This is because the obvious hack: ^r$ does not in fact work if the string contains newlines -- according to the documentation anyhow :) I'm not a heavy user of Str -- I would never use it in a serious application. However I used it in the program I wrote to build test harness for ExtLib (there is now an extlib-test module in CVS which tests ExtLib) and it didn't work. I did indeed fix the problem with ^r$, because I'm checking filenames which seem unlikely to have any newlines in them. Anyhow .. it's up to Xavier: I posted the original note into the bugtracker too :) -- John Skaller, mailto:skaller@users.sf.net voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net