* Question re: camlp4 parser
From: Paul Snively @ 2005-07-26 0:11 UTC (permalink / raw)
To: caml-list
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello everyone,
I'm beginning to explore some tasks in earnest. One of them is
writing a simple .ini file parser. This seems like something that
would fall easily within the LL(1) capabilities of the camlp4 parser
keyword, but I'm having a bit of trouble remembering how to structure
this.
For example, I'd like a parser that matches one or more printable
ASCII characters. Something that looks like:
let rec printable = parser [< '' '..'~'; x = printable >] -> x
This, of course, has two obvious problems:
1) On test data such as Stream.of_string "Test!\013" it raises
Stream.Failure, no doubt because it has found the \013 which doesn't
match the character range, i.e. it is, of course, not doing lookahead.
2) Even if that weren't the case, the resulting "x" would be missing
the first matching character. What I really need is the accumulation
of all of the characters.
It's just been too long since I've had to do LL(1), I think. I'm sure
I'm overlooking something obvious. Or do I just need to go ahead and
use ulex, even though I can't use it from the toplevel, which really
annoys me?
Many thanks and best regards,
Paul Snively
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)
iEYEARECAAYFAkLlf6wACgkQO3fYpochAqKU8wCcDYG8Z6ndVosBLI3tE3PZH2RM
n6YAoPjxNokFagTPoqI3Flnd0PbM0ESb
=BSi3
-----END PGP SIGNATURE-----
* Re: [Caml-list] Question re: camlp4 parser
From: Stephane Glondu @ 2005-07-26 1:17 UTC (permalink / raw)
To: Paul Snively; +Cc: caml-list
Paul Snively wrote:
> For example, I'd like a parser that matches one or more printable ASCII
> characters. Something that looks like:
>
> let rec printable = parser [< '' '..'~'; x = printable >] -> x
The inferred type should have given you a warning:
--> val printable : char Stream.t -> 'a = <fun>
In other words, your function never returns a correct value.
Try this:
let printable s =
  let buf = Buffer.create 100 in
  let rec aux = parser
      [< '' '..'~' as c; x = (Buffer.add_char buf c; aux) >] -> x
    | [< >] -> Buffer.contents buf
  in aux s ;;
--> val printable : char Stream.t -> string = <fun>
printable (Stream.of_string "Test!\013") ;;
--> - : string = "Test!"
Notice that you cannot remove the occurrences of "s" (even though it
would have the same type) if you are planning to use this function
several times.
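The point about keeping "s" can be illustrated with a minimal sketch. It is not the camlp4 version above: the parser is hand-written with Stream.peek and Stream.junk so that it runs without a preprocessor, and it assumes the Stream module is available (standard library before OCaml 5.0, or the camlp-streams package). The names is_printable, collect and printable_shared are hypothetical.

```ocaml
(* Assumption: Stream is available (OCaml < 5.0 stdlib, or camlp-streams). *)

let is_printable c = ' ' <= c && c <= '~'

(* Hand-written stand-in for the `aux` parser: consume printable
   characters into [buf] and stop at the first other character. *)
let rec collect buf s =
  match Stream.peek s with
  | Some c when is_printable c ->
      Stream.junk s;
      Buffer.add_char buf c;
      collect buf s
  | _ -> Buffer.contents buf

(* With [s]: a fresh buffer is created on every call. *)
let printable s =
  let buf = Buffer.create 100 in
  collect buf s

(* Without [s]: the buffer is created once, when the closure is built,
   and is then shared by every later call. *)
let printable_shared =
  let buf = Buffer.create 100 in
  collect buf

let () =
  let input () = Stream.of_string "Test!\013" in
  assert (printable (input ()) = "Test!");
  assert (printable (input ()) = "Test!");
  assert (printable_shared (input ()) = "Test!");
  (* The second call still sees the characters from the first one. *)
  assert (printable_shared (input ()) = "Test!Test!")
```

Both functions have type char Stream.t -> string, so the type checker does not catch the difference; only the second call reveals it.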
> Many thanks and best regards,
You're welcome.
--
Stephane Glondu.
* Re: [Caml-list] Question re: camlp4 parser
From: Paul Snively @ 2005-07-26 16:43 UTC (permalink / raw)
To: Stephane Glondu; +Cc: caml-list
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello, Stephane!
On Jul 25, 2005, at 6:17 PM, Stephane Glondu wrote:
>
> The inferred type should have given you a warning:
> --> val printable : char Stream.t -> 'a = <fun>
>
> In other words, your function never returns a correct value.
>
Excellent point.
> Try this:
>
> let printable s =
> let buf = Buffer.create 100 in
> let rec aux = parser
> [< '' '..'~' as c; x = (Buffer.add_char buf c; aux) >] -> x
> | [< >] -> Buffer.contents buf
> in aux s ;;
> --> val printable : char Stream.t -> string = <fun>
>
> printable (Stream.of_string "Test!\013") ;;
> --> - : string = "Test!"
>
> Notice that you cannot remove the occurrences of "s" (even though it
> would have the same type) if you are planning to use this function
> several times.
>
Thanks, this is exactly the kind of thing I was hoping for! So the
key points are:
1) Use the | and an empty alternative pattern to capture the "no more
matches" case.
2) Use "as" and take advantage of expression sequencing to accumulate
the matches into a variable (Buffer, in this case).
That makes perfect sense and now seems obvious. :-)
One hopefully final question: is there a convenient shorthand for
saying something like "all printable characters except '=' or '['?" I
assume not--that is, we have ranges (' '..'~') or we have variants
('A' | 'B' | 'C'...) and that's it. I'm somewhat spoiled, I think, by
Spirit in C++, and its notion of "character sets" and operations on
them, so I can say, e.g. "print_p - '='" and that will match all
printable characters other than '='.
>
>> Many thanks and best regards,
>>
>
> You're welcome.
>
Thanks again,
> --
>
> Stephane Glondu.
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
Paul Snively
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)
iEYEARECAAYFAkLmaEAACgkQO3fYpochAqIc5QCeOaHzKj+bTBOObRMisOSzdyO7
RrkAoKkWokql0JuuFvLUeelr5NgTNsgg
=IFMX
-----END PGP SIGNATURE-----
* Re: [Caml-list] Question re: camlp4 parser
From: Stephane Glondu @ 2005-07-26 17:05 UTC (permalink / raw)
To: caml-list; +Cc: Paul Snively
On Tuesday 26 July 2005 09:43, Paul Snively wrote:
> One hopefully final question: is there a convenient shorthand for
> saying something like "all printable characters except '=' or '['?" I
> assume not--that is, we have ranges (' '..'~') or we have variants
> ('A' | 'B' | 'C'...) and that's it. I'm somewhat spoiled, I think, by
> Spirit in C++, and its notion of "character sets" and operations on
> them, so I can say, e.g. "print_p - '='" and that will match all
> printable characters other than '='.
I don't know whether there is a way to do this directly. You can split your
range so that it avoids '=' and '[', or do something like this:
let printable s =
  let buf = Buffer.create 100 in
  let rec aux = parser
      [< ' ('=' | '[') >] -> Buffer.contents buf
    | [< '' '..'~' as c; x = (Buffer.add_char buf c; aux) >] -> x
    | [< >] -> Buffer.contents buf
  in aux s ;;
printable (Stream.of_string "path=/usr/src") ;;
--> - : string = "path"
Bear in mind that the '=' or '[' will be discarded by the parser. If you
don't want that, you can use Stream.peek (but it's much more annoying).
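A rough sketch of the Stream.peek route, for comparison. It is written without the parser keyword and assumes the Stream module is available (standard library before OCaml 5.0, or the camlp-streams package). The name printable_peek is hypothetical.

```ocaml
(* Assumption: Stream is available (OCaml < 5.0 stdlib, or camlp-streams). *)

(* Accumulate printable characters, stopping before '=' or '['.
   Stream.peek inspects the next character without consuming it, so the
   delimiter is still in the stream when the function returns. *)
let printable_peek s =
  let buf = Buffer.create 100 in
  let rec aux () =
    match Stream.peek s with
    | Some ('=' | '[') | None -> Buffer.contents buf
    | Some (' ' .. '~' as c) ->
        Stream.junk s;              (* consume only what we keep *)
        Buffer.add_char buf c;
        aux ()
    | Some _ -> Buffer.contents buf (* non-printable: stop, do not consume *)
  in
  aux ()

let () =
  let s = Stream.of_string "path=/usr/src" in
  assert (printable_peek s = "path");
  (* The '=' has not been discarded. *)
  assert (Stream.peek s = Some '=')
```

A caller can then decide what to do with the delimiter, for example consume it with Stream.junk and parse the value that follows.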
--
Stephane Glondu.
* Re: [Caml-list] Question re: camlp4 parser
From: Virgile Prevosto @ 2005-07-27 7:04 UTC (permalink / raw)
To: caml-list
2005/7/26, Paul Snively <psnively@mac.com>:
> One hopefully final question: is there a convenient shorthand for
> saying something like "all printable characters except '=' or '['?" I
> assume not--that is, we have ranges (' '..'~') or we have variants
> ('A' | 'B' | 'C'...) and that's it. I'm somewhat spoiled, I think, by
> Spirit in C++, and its notion of "character sets" and operations on
> them, so I can say, e.g. "print_p - '='" and that will match all
> printable characters other than '='.
>
Like any other pattern, stream patterns can be refined with a 'when' condition:
let printable s =
  let buf = Buffer.create 100 in
  let rec aux = parser
    | [< '' '..'~' as c when c <> '=' && c <> '[';
         x = (Buffer.add_char buf c; aux) >] -> x
    | [< >] -> Buffer.contents buf
  in aux s ;;
should do the trick. It might not be that convenient for a more
complex set of excluded characters, but it is possible to write a
char -> bool test outside of the stream parser.
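A minimal sketch of the external char -> bool test, roughly in the spirit of Spirit's character-set subtraction. To stay runnable without a preprocessor it uses Stream.peek and Stream.junk rather than the parser keyword, and it assumes the Stream module is available (standard library before OCaml 5.0, or the camlp-streams package). The names is_printable, is_key_char and take_while are hypothetical.

```ocaml
(* Assumption: Stream is available (OCaml < 5.0 stdlib, or camlp-streams). *)

let is_printable c = ' ' <= c && c <= '~'

(* "All printable characters except '=' or '['", as a plain predicate. *)
let is_key_char c = is_printable c && not (String.contains "=[" c)

(* Accumulate characters for as long as [pred] holds. The first
   character that fails the test is left in the stream. *)
let take_while pred s =
  let buf = Buffer.create 100 in
  let rec aux () =
    match Stream.peek s with
    | Some c when pred c ->
        Stream.junk s;
        Buffer.add_char buf c;
        aux ()
    | _ -> Buffer.contents buf
  in
  aux ()

let () =
  assert (take_while is_key_char (Stream.of_string "path=/usr/src") = "path");
  assert (take_while is_printable (Stream.of_string "Test!\013") = "Test!")
```

The same predicate should also be usable inside the guard of the stream parser above, in place of the explicit comparisons, i.e. "when is_key_char c".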
--
That's all for today, until next time
Virgile
* Re: [Caml-list] Question re: camlp4 parser
From: Paul Snively @ 2005-07-28 1:27 UTC (permalink / raw)
To: virgile.prevosto; +Cc: caml-list
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Jul 27, 2005, at 12:04 AM, Virgile Prevosto wrote:
> Like any other pattern, stream patterns can be refined with a 'when'
> condition:
>
> let printable s =
> let buf = Buffer.create 100 in
> let rec aux = parser
> | [< '' '..'~' as c when c <> '=' && c <> '[';
> x = (Buffer.add_char buf c; aux) >] -> x
> | [< >] -> Buffer.contents buf
> in aux s ;;
>
> should do the trick. It might not be that convenient for a more
> complex set of excluded characters, but it is possible to write a char
> -> bool test outside of the stream parser.
>
Of course: it's all becoming quite clear now. Thanks for the
excellent suggestion and your patience with my naïveté. :-)
> --
> That's all for today, until next time
> Virgile
>
Best regards,
Paul Snively
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)
iEYEARECAAYFAkLoNH8ACgkQO3fYpochAqI94gCfXosjSfFZAbtanYQstgCjYLfY
HqUAoIWd4QpsWhynHyj8A6WJDqWOP61B
=BKDa
-----END PGP SIGNATURE-----