* [Caml-list] ocamllex problem
@ 2005-08-04 23:12 Jonathan Roewen
2005-08-04 23:53 ` Jonathan Roewen
0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Roewen @ 2005-08-04 23:12 UTC (permalink / raw)
To: caml-list
Hi,
Here's my .mll file, to match IRC strings. The problem is that it's
including spaces, which I assume it shouldn't.
let letter = [^' ']
rule token = parse
| ':'((letter|' ')* as s) { STRING s }
| letter+ as s { STRING s }
| [' ']+ { token lexbuf }
| eof { EOL }
(BTW, this is for matching rule 2). Maybe I'm just retarded, but this
should work, correct?
Jonathan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] ocamllex problem
2005-08-04 23:12 [Caml-list] ocamllex problem Jonathan Roewen
@ 2005-08-04 23:53 ` Jonathan Roewen
2005-08-05 1:03 ` skaller
0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Roewen @ 2005-08-04 23:53 UTC (permalink / raw)
To: caml-list
Yes, I'm retarded. Ignore me ;-)
On 8/5/05, Jonathan Roewen <jonathan.roewen@gmail.com> wrote:
> Hi,
>
> Here's my .mll file, to match IRC strings. The problem is that it's
> including spaces, which I assume it shouldn't.
>
> let letter = [^' ']
>
> rule token = parse
> | ':'((letter|' ')* as s) { STRING s }
> | letter+ as s { STRING s }
> | [' ']+ { token lexbuf }
> | eof { EOL }
>
> (BTW, this is for matching rule 2). Maybe I'm just retarded, but this
> should work, correct?
>
> Jonathan
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] ocamllex problem
2005-08-04 23:53 ` Jonathan Roewen
@ 2005-08-05 1:03 ` skaller
2005-08-05 5:11 ` Alain Frisch
2005-08-05 6:15 ` james woodyatt
0 siblings, 2 replies; 10+ messages in thread
From: skaller @ 2005-08-05 1:03 UTC (permalink / raw)
To: Jonathan Roewen; +Cc: caml-list
[-- Attachment #1: Type: text/plain, Size: 714 bytes --]
On Fri, 2005-08-05 at 11:53 +1200, Jonathan Roewen wrote:
> Yes, I'm retarded. Ignore me ;-)
No way .. you picked something I didn't know:
> > | ':'((letter|' ')* as s) { STRING s }
I check the manual .. yup, you can match
subgroups like that..
How does that work??? The algorithm for handling
this for a DFA is non-trivial. Any pointers to
the algorithm used?
Alain Frisch pointed me at some nasty papers on
this, one with a regexp -> NFA conversion and the
other with a NFA-> DFA conversion, but I couldn't
figure out how to do the direct regexp->DFA conversion,
I'd sure like to find an algorithm for that..
--
John Skaller <skaller at users dot sourceforge dot net>
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] ocamllex problem
2005-08-05 1:03 ` skaller
@ 2005-08-05 5:11 ` Alain Frisch
2005-08-05 6:15 ` james woodyatt
1 sibling, 0 replies; 10+ messages in thread
From: Alain Frisch @ 2005-08-05 5:11 UTC (permalink / raw)
To: skaller; +Cc: caml-list
skaller wrote:
> How does that work??? The algorithm for handling
> this for a DFA is non-trivial. Any pointers to
> the algorithm used?
As you can find in the source code of ocamllex:
(* To generate directly a NFA from a regular expression.
Confer Aho-Sethi-Ullman, dragon book, chap. 3
Extension to tagged automata.
Confer
Ville Larikari
``NFAs with Tagged Transitions, their Conversion to Deterministic
Automata and Application to Regular Expressions''.
Symposium on String Processing and Information Retrieval (SPIRE
2000),
http://kouli.iki.fi/~vlaurika/spire2000-tnfa.ps
(See also)
http://kouli.iki.fi/~vlaurika/regex-submatch.ps.gz
*)
-- Alain
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] ocamllex problem
2005-08-05 1:03 ` skaller
2005-08-05 5:11 ` Alain Frisch
@ 2005-08-05 6:15 ` james woodyatt
2005-08-05 8:35 ` skaller
2005-08-05 9:15 ` Berke Durak
1 sibling, 2 replies; 10+ messages in thread
From: james woodyatt @ 2005-08-05 6:15 UTC (permalink / raw)
To: Ocaml Trade
On 04 Aug 2005, at 18:03, skaller wrote:
>
> Alain Frisch pointed me at some nasty papers on this, one with a
> regexp -> NFA conversion and the other with a NFA-> DFA conversion,
> but I couldn't figure out how to do the direct regexp->DFA
> conversion, I'd sure like to find an algorithm for that..
In my OCaml NAE Core Foundation, there is a something you may find
interesting. See the [Cf_lex] module and its subordinate [Cf_dfa].
Since it isn't trying to be a multi-stage programming tool like
[ocamllex], it produces a parser monad that executes a Lazy-DFA,
instead of a fully space-time optimized DFA. At some point, I may
implement a [study] function that fully evaluates the Lazy-DFA and
optimizes it, but I don't yet see a compelling need for that.
One thing: the pattern [':'((letter|' ')* as s)] is interesting.
You're definitely right that something non-trivial is happening
inside the DFA. My [Cf_dfa] module does not keep a stack of
backtracking sequences because I did something else to resolve the
problem. Look at the ( $@ ) operators, which allow you to use a
parser monad on the recognized input sequence to obtain the result of
a lexical rule. Using this, you can implement something like the
feature you're interested in by defining a nested hierarchy of parsers.
I know. This is probably not what you're looking for. To get what
you're looking for, I'd have to extend [Cf_dfa] to handle marker
nodes in the NFA. I thought that would be more appropriate for
[ocamllex] and similar tools, so I didn't do it. Nice to see
[ocamllex] did.
--
j h woodyatt <jhw@wetware.com>
that's my village calling... no doubt, they want their idiot back.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] ocamllex problem
2005-08-05 6:15 ` james woodyatt
@ 2005-08-05 8:35 ` skaller
2005-08-05 9:15 ` Berke Durak
1 sibling, 0 replies; 10+ messages in thread
From: skaller @ 2005-08-05 8:35 UTC (permalink / raw)
To: james woodyatt; +Cc: Ocaml Trade
[-- Attachment #1: Type: text/plain, Size: 729 bytes --]
On Thu, 2005-08-04 at 23:15 -0700, james woodyatt wrote:
> On 04 Aug 2005, at 18:03, skaller wrote:
> I know. This is probably not what you're looking for.
Not directly: I have to generate C++ code from Ocaml.
At present, this is easy -- the Ocaml is a transliteration
of the Dragon book algorithm, and the C++ is two lines
of matrix lookup.
I would like to adapt the algorithm, which is regexp -> DFA
without constructing an NFA, to add enough information
so the C++ automata can capture groups.
Whilst a lazy automaton is nice in Ocaml, it's a bit
more challenging when the compile and run time
have to be split between Ocaml and C++ ;(
--
John Skaller <skaller at users dot sourceforge dot net>
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] ocamllex problem
2005-08-05 6:15 ` james woodyatt
2005-08-05 8:35 ` skaller
@ 2005-08-05 9:15 ` Berke Durak
2005-08-05 11:05 ` skaller
1 sibling, 1 reply; 10+ messages in thread
From: Berke Durak @ 2005-08-05 9:15 UTC (permalink / raw)
To: james woodyatt; +Cc: caml-list
On Thu, Aug 04, 2005 at 11:15:48PM -0700, james woodyatt wrote:
> One thing: the pattern [':'((letter|' ')* as s)] is interesting.
> You're definitely right that something non-trivial is happening
> inside the DFA. My [Cf_dfa] module does not keep a stack of
> backtracking sequences because I did something else to resolve the
> problem. Look at the ( $@ ) operators, which allow you to use a
> parser monad on the recognized input sequence to obtain the result of
> a lexical rule. Using this, you can implement something like the
> feature you're interested in by defining a nested hierarchy of parsers.
You may also wish to have a look at :
Thomas Reps, "Maximal-Munch" Tokenization in Linear Time, ACM Trans.
Program. Lang. Syst., vol. 20, num. 2, 1998, pp.259-273
http://www.cs.wisc.edu/wpis/papers/toplas98b.pdf
--
Berke Durak
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] ocamllex problem
2005-08-05 9:15 ` Berke Durak
@ 2005-08-05 11:05 ` skaller
2005-08-05 12:21 ` Jonathan Bryant
0 siblings, 1 reply; 10+ messages in thread
From: skaller @ 2005-08-05 11:05 UTC (permalink / raw)
To: Berke Durak; +Cc: james woodyatt, caml-list
[-- Attachment #1: Type: text/plain, Size: 394 bytes --]
On Fri, 2005-08-05 at 11:15 +0200, Berke Durak wrote:
> You may also wish to have a look at :
>
> Thomas Reps, "Maximal-Munch" Tokenization in Linear Time, ACM Trans.
> Program. Lang. Syst., vol. 20, num. 2, 1998, pp.259-273
> http://www.cs.wisc.edu/wpis/papers/toplas98b.pdf
Ah, that is interesting, thanks!
--
John Skaller <skaller at users dot sourceforge dot net>
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] ocamllex problem
2005-08-05 11:05 ` skaller
@ 2005-08-05 12:21 ` Jonathan Bryant
2005-08-05 12:39 ` David MENTRE
0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Bryant @ 2005-08-05 12:21 UTC (permalink / raw)
To: skaller, caml-list
Maybe also "Modern Compiler Implementation In Java". It's in Java, but
the examples are written in a functional style (as far as is possible).
Unfortunately, I can't remember the author's name off hand, but it is
published by Cambridge Press...
--Jonathan
On Fri, 2005-08-05 at 21:05 +1000, skaller wrote:
> On Fri, 2005-08-05 at 11:15 +0200, Berke Durak wrote:
>
> > You may also wish to have a look at :
> >
> > Thomas Reps, "Maximal-Munch" Tokenization in Linear Time, ACM Trans.
> > Program. Lang. Syst., vol. 20, num. 2, 1998, pp.259-273
> > http://www.cs.wisc.edu/wpis/papers/toplas98b.pdf
>
> Ah, that is interesting, thanks!
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
--
*=========================*
|Jonathan Bryant |
|Valdosta State University|
|Information Technology |
|System Operations |
|-------------------------|
|jtbryant@valdosta.edu |
|x6358 |
*=========================*
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] ocamllex problem
2005-08-05 12:21 ` Jonathan Bryant
@ 2005-08-05 12:39 ` David MENTRE
0 siblings, 0 replies; 10+ messages in thread
From: David MENTRE @ 2005-08-05 12:39 UTC (permalink / raw)
To: jtbryant; +Cc: skaller, caml-list
Hello,
2005/8/5, Jonathan Bryant <jtbryant@valdosta.edu>:
> Maybe also "Modern Compiler Implementation In Java". It's in Java, but
> the examples are written in a functional style (as far as is possible).
> Unfortunately, I can't remember the author's name off hand, but it is
> published by Cambridge Press...
The book exists in 3 languages, Java, ML and C :
http://www.cs.princeton.edu/~appel/modern/
Yours,
d.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2005-08-05 12:39 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-08-04 23:12 [Caml-list] ocamllex problem Jonathan Roewen
2005-08-04 23:53 ` Jonathan Roewen
2005-08-05 1:03 ` skaller
2005-08-05 5:11 ` Alain Frisch
2005-08-05 6:15 ` james woodyatt
2005-08-05 8:35 ` skaller
2005-08-05 9:15 ` Berke Durak
2005-08-05 11:05 ` skaller
2005-08-05 12:21 ` Jonathan Bryant
2005-08-05 12:39 ` David MENTRE
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox