* [Caml-list] Strange PCRE bug
@ 2004-09-16 15:44 Richard Jones
2004-09-17 0:21 ` Markus Mottl
From: Richard Jones @ 2004-09-16 15:44 UTC (permalink / raw)
To: caml-list
$ ocaml -I +pcre
Objective Caml version 3.08.1
# #load "pcre.cma";;
# let rex = Pcre.regexp "(:?([a-z]+)\\s+)*";;
val rex : Pcre.regexp = <abstr>
# Pcre.extract_all ~rex "a b c d ee ff ";;
(* Hangs, rapidly consuming memory. Killed with ^C ... *)
Interrupted.
# Gc.full_major ();;
- : unit = ()
The Gc.full_major () doesn't recover any memory.
On a more general point, how do I access all the strings captured by
the inner brackets in a pattern like (:? (..) )* ?
Rich.
--
Richard Jones. http://www.annexia.org/ http://www.j-london.com/
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
MOD_CAML lets you run type-safe Objective CAML programs inside the Apache
webserver. http://www.merjis.com/developers/mod_caml/
* Re: [Caml-list] Strange PCRE bug
From: Markus Mottl @ 2004-09-17 0:21 UTC (permalink / raw)
To: Richard Jones; +Cc: caml-list
On Thu, 16 Sep 2004, Richard Jones wrote:
> # #load "pcre.cma";;
> # let rex = Pcre.regexp "(:?([a-z]+)\\s+)*";;
> val rex : Pcre.regexp = <abstr>
> # Pcre.extract_all ~rex "a b c d ee ff ";;
>
> (* Hangs, rapidly consuming memory. Killed with ^C ... *)
This is a bug concerning null patterns (i.e. ones that match empty
strings, too). I have fixed this now.
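To illustrate why patterns that can match the empty string need special care in a "find all matches" loop, here is a minimal sketch in plain OCaml. It is not the code in the Pcre bindings and does not use them: the "matcher" type, "lower_run" (a toy stand-in for the pattern "[a-z]*") and "find_all" are hypothetical names made up for this example. The point is the single line that steps forward by one character after an empty match; without it the loop retries the same offset forever and keeps accumulating results.

```ocaml
(* Hypothetical matcher type: given a subject and a start offset, return
   Some (start, stop) for a match at that offset, or None. *)
type matcher = string -> int -> (int * int) option

(* Toy stand-in for the pattern "[a-z]*": matches the (possibly empty) run
   of lowercase letters that begins exactly at [pos]. *)
let lower_run : matcher = fun subj pos ->
  let n = String.length subj in
  if pos > n then None
  else begin
    let stop = ref pos in
    while !stop < n && subj.[!stop] >= 'a' && subj.[!stop] <= 'z' do
      incr stop
    done;
    Some (pos, !stop)
  end

(* Collect all matches from left to right. After an empty match the next
   search starts one character further on; otherwise the loop would never
   make progress. *)
let find_all (m : matcher) (subj : string) : string list =
  let n = String.length subj in
  let rec loop pos acc =
    if pos > n then List.rev acc
    else
      match m subj pos with
      | None -> List.rev acc
      | Some (start, stop) ->
        let acc = String.sub subj start (stop - start) :: acc in
        let next = if stop = start then stop + 1 else stop in
        loop next acc
  in
  loop 0 []

(* Drop the empty matches to keep only the words. *)
let words =
  List.filter (fun s -> s <> "") (find_all lower_run "a b c d ee ff ")
```

Here "words" evaluates to ["a"; "b"; "c"; "d"; "ee"; "ff"]; the unfiltered result also contains an empty string for every position where the toy pattern matched nothing.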
> On a more general point, how do I access all the strings captured by
> the inner brackets in a pattern like (:? (..) )* ?
The "(:?" should be "(?:".
Anyway, to answer your question: you can't. The capturing subpattern
"([a-z]+)" only ever holds the text from its last iteration when it is
repeated (here by the "*" in your example).
I'm not sure what you want to do, but I guess you want to extract all
words containing characters from a-z in a string? In that case I'd
rather use the much simpler pattern "[a-z]+". "extract_all" will then
return an array of arrays of strings: one inner array per match, holding
that match's substrings. Unless you specify "~full_match:false", each
inner array contains the full match in position 0. The full match
is what we want here.
E.g.:
  let () =
    let rex = Pcre.regexp "[a-z]+" in
    let subj = "this is 1 test" in
    let many_sstrs = Pcre.extract_all ~rex subj in
    let words = Array.map (fun sstrs -> sstrs.(0)) many_sstrs in
    Array.iter print_endline words
This will print:
this
is
test
"extract_all" is the dual to "split". In contrast to the latter, it
does not remove the text matched by the pattern but keeps it (including
any captured substrings), and ignores everything else.
Regards,
Markus
--
Markus Mottl http://www.oefai.at/~markus markus@oefai.at