From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id UAA21748; Mon, 15 Jul 2002 20:41:41 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id UAA21902 for ; Mon, 15 Jul 2002 20:41:40 +0200 (MET DST) Received: from hank.lickey.com (hank.lickey.com [64.81.100.235]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id g6FIfcj07610 for ; Mon, 15 Jul 2002 20:41:39 +0200 (MET DST) Received: from squeaker.lickey.com (squeaker.lickey.com [192.168.100.10]) by hank.lickey.com (Postfix) with ESMTP id CB0BBEE40 for ; Mon, 15 Jul 2002 12:41:37 -0600 (MDT) Received: from localhost (localhost [127.0.0.1]) by squeaker.lickey.com (Postfix) with ESMTP id A29B1BF1B for ; Mon, 15 Jul 2002 12:41:37 -0600 (MDT) Received: from naz.lickey.com (naz.lickey.com [192.168.100.246]) by squeaker.lickey.com (Postfix) with ESMTP id 1CB7DBF17 for ; Mon, 15 Jul 2002 12:41:37 -0600 (MDT) Received: by naz.lickey.com (Postfix, from userid 1000) id 8D1CB27E1A; Mon, 15 Jul 2002 12:41:36 -0600 (MDT) To: caml-list@inria.fr Subject: [Caml-list] Re: Regular expression library: a poll on features References: <20020705161302.C20462@pauillac.inria.fr> <20020715175244.B15691@pauillac.inria.fr> From: Matt Armstrong Date: Mon, 15 Jul 2002 12:41:36 -0600 In-Reply-To: <20020715175244.B15691@pauillac.inria.fr> (Xavier Leroy's message of "Mon, 15 Jul 2002 17:52:44 +0200") Message-ID: <87u1n0g7fj.fsf@naz.lickey.com> User-Agent: Gnus/5.090007 (Oort Gnus v0.07) Emacs/21.2 (i386-debian-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Virus-Scanned: by AMaViS snapshot-20020300 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk Xavier Leroy writes: [...] > Feature 2: partial string matching as per Str.string_partial_match, i.e. > the ability to recognize that a string is a prefix of a string that > match a regexp. > > has already used 0 > could use in some cases 6 > no use 8 [...] > Feature 2 is unusual and I haven't heard from anyone that uses it > :-) I got two replies suggesting one plausible scenario where > partial matching could come handy: find delimiters in a piece of > text that is being read block by block. However, I'm not sure > Str.string_partial_match is adequate here, it looks like a "search > forward for a partial match" operation is needed, which Str doesn't > provide... This is how a MIME message parser I wrote worked (written in a scripting language that made byte-by-byte string comparisons more costly than regexps). The parser read in the message chunk by chunk. I had a list of regexps representing the current set of MIME boundaries, and I was interested if the last N bytes of the current chunk ended with a (possibly partial) match of each regexp. If there was a match and it wasn't complete, you have to deal with a MIME boundary that might cross a chunk boundary. > It was also suggested to me that the effect of partial matching > against a regexp R can be achieved by exact matching against a > regexp R' derived from R. This is true for "textbook regexps", > e.g. if R is "ab*c", then R' would be > "epsilon|a(epsilon|b*(epsilon|c))", but doesn't work for more > complex regexps languages, especially if back-references are > supported. (Consider R = "(a+)\1".) And in the MIME parser, this is what I did -- since the regexps were simple. In Ocaml, I'm not sure I would use regexps for this at all since (I assume) comparing strings "by hand" would be fast. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners