Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed
From: skaller <skaller@users.sourceforge.net>
To: Gerd Stolpmann <info@gerd-stolpmann.de>
Cc: Chris King <colanderman@gmail.com>,
	"O'Caml Mailing List" <caml-list@inria.fr>
Subject: Re: [Caml-list] Thread safe Str
Date: 12 Jan 2005 04:55:26 +1100	[thread overview]
Message-ID: <1105466124.2574.48.camel@pelican.wigram> (raw)
In-Reply-To: <1105445337.15581.82.camel@ice.gerd-stolpmann.de>

On Tue, 2005-01-11 at 23:08, Gerd Stolpmann wrote:


> Capturing just for the purpose of string extraction is not problematic.

Unfortunately you're wrong. I had in fact hoped this was not
the case, but having read a couple of paper on it,
I now know that, unfortunately, they are. (See below).

Alain Frisch is one of the leaders in this area, perhaps
he can explain it better. You can get Cardelli and Frisch
paper here:

http://felix.sf.net/papers/greedy.pdf

another by Ville Laurikari here:

http://felix.sf.net/papers/spire2000-tnfa.ps

and his Master thesis on the topic here:

http://felix.sf.net/papers/regex-submatch.ps

[and if you can figure out how to actually build
a tagged DFA after reading any of that please let
me know .. ]

Frisch's algorithm used in CDuce works with a forward
pass that ignores captures, but records the automaton states,
and then a backwards pass the extracts the information
(CDuce actually builds trees).

Unfortunately this has a problem: it requires a
bidirectional iterator, whereas a DFA only requires
an input iterator. (NFA's require forward iterators)


-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net




  reply	other threads:[~2005-01-11 17:55 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-01-09 19:30 Christophe TROESTLER
2005-01-09 20:57 ` [Caml-list] " Gerd Stolpmann
2005-01-10  9:57 ` Alex Baretta
2005-01-10 15:49 ` Xavier Leroy
2005-01-10 16:39   ` Richard Jones
2005-01-10 18:21   ` Eric C. Cooper
2005-01-10 20:25   ` Martin Jambon
2005-01-11  3:54     ` skaller
2005-01-11  7:03       ` Chris King
2005-01-11  8:06         ` skaller
2005-01-11 12:08           ` Gerd Stolpmann
2005-01-11 17:55             ` skaller [this message]
2005-01-11 20:30               ` Gerd Stolpmann
2005-01-12  7:42                 ` skaller
     [not found]           ` <875c7e070501111007dc3e86d@mail.gmail.com>
     [not found]             ` <1105471138.2574.88.camel@pelican.wigram>
     [not found]               ` <875c7e07050111115618692184@mail.gmail.com>
2005-01-11 19:58                 ` Chris King
2005-01-11 20:53       ` Martin Jambon
2005-01-12  7:59         ` skaller
2005-01-12 20:12           ` Martin Jambon
2005-01-11  6:41   ` Chris King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1105466124.2574.48.camel@pelican.wigram \
    --to=skaller@users.sourceforge.net \
    --cc=caml-list@inria.fr \
    --cc=colanderman@gmail.com \
    --cc=info@gerd-stolpmann.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox