From: skaller <skaller@users.sourceforge.net>
To: Gerd Stolpmann <info@gerd-stolpmann.de>
Cc: Chris King <colanderman@gmail.com>,
"O'Caml Mailing List" <caml-list@inria.fr>
Subject: Re: [Caml-list] Thread safe Str
Date: 12 Jan 2005 04:55:26 +1100 [thread overview]
Message-ID: <1105466124.2574.48.camel@pelican.wigram> (raw)
In-Reply-To: <1105445337.15581.82.camel@ice.gerd-stolpmann.de>
On Tue, 2005-01-11 at 23:08, Gerd Stolpmann wrote:
> Capturing just for the purpose of string extraction is not problematic.
Unfortunately you're wrong. I had in fact hoped this was not
the case, but having read a couple of paper on it,
I now know that, unfortunately, they are. (See below).
Alain Frisch is one of the leaders in this area, perhaps
he can explain it better. You can get Cardelli and Frisch
paper here:
http://felix.sf.net/papers/greedy.pdf
another by Ville Laurikari here:
http://felix.sf.net/papers/spire2000-tnfa.ps
and his Master thesis on the topic here:
http://felix.sf.net/papers/regex-submatch.ps
[and if you can figure out how to actually build
a tagged DFA after reading any of that please let
me know .. ]
Frisch's algorithm used in CDuce works with a forward
pass that ignores captures, but records the automaton states,
and then a backwards pass the extracts the information
(CDuce actually builds trees).
Unfortunately this has a problem: it requires a
bidirectional iterator, whereas a DFA only requires
an input iterator. (NFA's require forward iterators)
--
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850,
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net
next prev parent reply other threads:[~2005-01-11 17:55 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-01-09 19:30 Christophe TROESTLER
2005-01-09 20:57 ` [Caml-list] " Gerd Stolpmann
2005-01-10 9:57 ` Alex Baretta
2005-01-10 15:49 ` Xavier Leroy
2005-01-10 16:39 ` Richard Jones
2005-01-10 18:21 ` Eric C. Cooper
2005-01-10 20:25 ` Martin Jambon
2005-01-11 3:54 ` skaller
2005-01-11 7:03 ` Chris King
2005-01-11 8:06 ` skaller
2005-01-11 12:08 ` Gerd Stolpmann
2005-01-11 17:55 ` skaller [this message]
2005-01-11 20:30 ` Gerd Stolpmann
2005-01-12 7:42 ` skaller
[not found] ` <875c7e070501111007dc3e86d@mail.gmail.com>
[not found] ` <1105471138.2574.88.camel@pelican.wigram>
[not found] ` <875c7e07050111115618692184@mail.gmail.com>
2005-01-11 19:58 ` Chris King
2005-01-11 20:53 ` Martin Jambon
2005-01-12 7:59 ` skaller
2005-01-12 20:12 ` Martin Jambon
2005-01-11 6:41 ` Chris King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1105466124.2574.48.camel@pelican.wigram \
--to=skaller@users.sourceforge.net \
--cc=caml-list@inria.fr \
--cc=colanderman@gmail.com \
--cc=info@gerd-stolpmann.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox