* [Caml-list] [ann] Regexp library supporting binding for * and +'s
@ 2004-09-19 20:41 Yutaka OIWA
  2004-09-20  0:38 ` skaller
  0 siblings, 1 reply; 4+ messages in thread
From: Yutaka OIWA @ 2004-09-19 20:41 UTC
To: caml-list

Hi everyone at caml-list,

From the computer room at ICFP2004 in Snowbird Resort, I announce a beta
version of my combinator-based regular-expression matching library, which
supports list (Kleene-*) binding.

This library provides a set of typed "combinators" that can be used to
construct a "regular expression matcher", which tests strings against
regexps and captures the matched substrings in various ways. In particular,
the powerful "repeat" combinator, which corresponds to the * and + operators
in conventional regular expression notation, returns all values captured
inside it as a list value. For example, the small code below

    open Regexp_pp_ng
    let s = "1 2 3 4 5" in
    match_string s (repeat ~sep:spacesA int_decimal) (fun x -> x)

returns [1; 2; 3; 4; 5]: int list.

All combinators are given static types, and any mismatch between value
types and matcher types is statically rejected.

The implementation is available from a subversion repository. Using
subversion, you can check out the URL

    https://www.oiwa.jp/svn/regexp-ocaml/branches/combinators/

to get the up-to-date implementation, or you can open the above address
directly in a web browser to see the latest revision. There is also a
ViewCVS interface at the following address.

    http://www.oiwa.jp/viewcvs/regexp-ocaml/branches/combinators/

See regexp_pp_ng.mli for the interfaces, and regexp_pp_ng_test.ml for some
examples of the use of this library. It may work partially on some older
OCaml, but for real use it requires a newer version (3.07 or later) which
supports the relaxed value restriction.

I plan to construct a neat syntax sugar over this library and build a
next-generation version of the Regexp/OCaml library. Any comments are
welcome.

--
Yutaka Oiwa              Yonezawa Lab., Dept. of Computer Science,
Graduate School of Information Sci. & Tech., Univ. of Tokyo.
<oiwa@yl.is.s.u-tokyo.ac.jp>, <yutaka@oiwa.jp>
PGP fingerprint = C9 8D 5C B8 86 ED D8 07 EA 59 34 D8 F4 65 53 61

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
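
The idea of a "repeat" combinator that hands back its captures as a typed
list can be sketched without any regexp backend. The toy below is only an
illustration, not the Regexp_pp_ng implementation: the type `matcher`, the
helper `spaces`, and this simplified `match_string` (which returns an option
instead of taking a continuation) are all assumptions made for the sketch.

```ocaml
(* A toy matcher: given the input and a start position, return the
   captured value and the position just after the match, or None.
   Hypothetical stand-in; NOT the Regexp_pp_ng implementation. *)
type 'a matcher = string -> int -> ('a * int) option

(* Match one or more decimal digits and capture them as an int. *)
let int_decimal : int matcher = fun s pos ->
  let n = String.length s in
  let rec scan i =
    if i < n && s.[i] >= '0' && s.[i] <= '9' then scan (i + 1) else i in
  let stop = scan pos in
  if stop = pos then None
  else Some (int_of_string (String.sub s pos (stop - pos)), stop)

(* Match one or more spaces, capturing nothing. *)
let spaces : unit matcher = fun s pos ->
  let n = String.length s in
  let rec scan i = if i < n && s.[i] = ' ' then scan (i + 1) else i in
  let stop = scan pos in
  if stop = pos then None else Some ((), stop)

(* Kleene-star with a separator: every capture of [item] is collected
   into a list, so the result type is ['a list] for an ['a matcher]. *)
let repeat ~(sep : unit matcher) (item : 'a matcher) : 'a list matcher =
  fun s pos ->
    match item s pos with
    | None -> Some ([], pos)
    | Some (first, p) ->
      let rec loop acc p =
        match sep s p with
        | None -> (List.rev acc, p)
        | Some ((), p') ->
          (match item s p' with
           | None -> (List.rev acc, p)  (* stop before a dangling separator *)
           | Some (v, p'') -> loop (v :: acc) p'')
      in
      Some (loop [first] p)

(* Succeed only if the matcher consumes the whole string. *)
let match_string (s : string) (m : 'a matcher) : 'a option =
  match m s 0 with
  | Some (v, p) when p = String.length s -> Some v
  | _ -> None

let result = match_string "1 2 3 4 5" (repeat ~sep:spaces int_decimal)
```

Here `result` has type `int list option`. Passing a matcher of a different
capture type where an `int matcher` is expected is rejected by the type
checker, which is the kind of static guarantee the announcement describes.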
* Re: [Caml-list] [ann] Regexp library supporting binding for * and +'s
  2004-09-19 20:41 [Caml-list] [ann] Regexp library supporting binding for * and +'s Yutaka OIWA
@ 2004-09-20  0:38 ` skaller
  2004-09-20  6:54   ` Yutaka OIWA
  0 siblings, 1 reply; 4+ messages in thread
From: skaller @ 2004-09-20 0:38 UTC
To: Yutaka OIWA; +Cc: caml-list

On Mon, 2004-09-20 at 06:41, Yutaka OIWA wrote:
> From the computer room at ICFP2004 in Snowbird Resort,
> I announce a beta version of my combinator-based
> regular-expression match library which supports
> list (Kleene-*) binding.

> I plan to construct a neat syntax sugar over this library
> and build a next-generation version of Regexp/OCaml library.
> Any comments are welcome.

Can you explain why/how Pcre is being used?

I'm currently looking at providing the same kind of facility,
however I need:

(a) all pure OCaml -- reason: maintenance, soundness

(b) able to generate fairly simple automata
    Reason -- the execution target may be C, so it must be
    possible to both encode the data fairly simply, and also
    to provide C routines to execute various automata based
    on that data, without building complex data structures.

(c) must process at least a stream of integer inputs
    Reason: 8 bit inputs are unacceptable for i18n reasons.
    In addition, there are uses of state machines other
    than processing 'strings'.

I'd like to combine at least (i) tokenisation and
(ii) substring extraction; however, a more general facility
such as parsing as in C/XDuce is also appealing.
Alternatively, or as well, processing tagged NFAs
readily yields RTNs and hence CFG parsing support.

--
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850,
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net
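
Requirements (b) and (c) above can be made concrete with a small sketch: an
automaton stored as flat rows of integers (the sort of data that could be
emitted as a C array) and driven by integer inputs rather than 8-bit
characters. The record type `dfa`, its field names, and the row layout are
hypothetical, chosen only for this illustration; they are not taken from
either author's engine.

```ocaml
(* Hypothetical flat encoding: each transition is a row
   (source state, lowest input, highest input, target state).
   Rows of plain integers like these need no complex data structure. *)
type dfa = {
  start : int;
  accepting : bool array;                       (* indexed by state *)
  transitions : (int * int * int * int) array;
}

(* Target state for [state] on integer input [c], or -1 if there is none. *)
let step dfa state c =
  let n = Array.length dfa.transitions in
  let rec find i =
    if i >= n then -1
    else
      let (src, lo, hi, dst) = dfa.transitions.(i) in
      if src = state && lo <= c && c <= hi then dst else find (i + 1)
  in
  find 0

(* Run the automaton over a list of integer inputs (e.g. code points). *)
let accepts dfa inputs =
  let rec go state = function
    | [] -> dfa.accepting.(state)
    | c :: rest ->
      let next = step dfa state c in
      if next < 0 then false else go next rest
  in
  go dfa.start inputs

(* Example: one or more code points in the range U+3041..U+3096
   (Hiragana letters), which do not fit in 8 bits. *)
let hiragana_plus = {
  start = 0;
  accepting = [| false; true |];
  transitions = [| (0, 0x3041, 0x3096, 1); (1, 0x3041, 0x3096, 1) |];
}
```

For instance, `accepts hiragana_plus [0x3042; 0x3044]` evaluates to `true`,
while the empty input and `[0x41]` are rejected. The linear scan in `step`
keeps the sketch short; a real engine would likely index rows by state.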
* Re: [Caml-list] [ann] Regexp library supporting binding for * and +'s
  2004-09-20  0:38 ` skaller
@ 2004-09-20  6:54   ` Yutaka OIWA
  2004-09-20 11:12     ` skaller
  0 siblings, 1 reply; 4+ messages in thread
From: Yutaka OIWA @ 2004-09-20 6:54 UTC
To: skaller; +Cc: caml-list

>> On 20 Sep 2004 10:38:33 +1000, skaller <skaller@users.sourceforge.net> said:

skaller> On Mon, 2004-09-20 at 06:41, Yutaka OIWA wrote:

>> I plan to construct a neat syntax sugar over this library
>> and build a next-generation version of Regexp/OCaml library.
>> Any comments are welcome.

skaller> Can you explain why/how Pcre is being used?

The reason is simply current implementation convenience. It is stable, has
enough features (e.g. an unlimited number of captures, non-capturing groups,
many helper functions and runtime features), and performs well. My intention
is not to implement an automata engine by myself, at least in the near
future.

However, as you can see in the README of Regexp-OCaml (main version), my
future plan includes supporting backends other than PCRE/OCaml. Having its
own regexp parser and limiting the regexp syntax to strictly regular
languages are provisions for that possible future. When OCaml 3.07 was
released, I seriously considered supporting the standard Str module, but
unfortunately the current Str lacks some of the features required by the
current Regexp/OCaml implementation.

Anyway, backend is backend. And also, frontend is frontend. Period. The two
can be highly independent once designed that way, and my interests are
mainly in the frontend part. I highly appreciate support from people
working on the backend part.

Multilingualization is one of the items on my current high-priority to-do
list. At least one user has requested support for EUC-JP patterns, and you
might be the second person :-) I am considering how to support the M17N
feature: it may depend on the underlying backend (e.g. Camomile?), or it may
be supported solely in the frontend layer, by encoding multibyte handling
into regexps. This trick is used in the Japanese port of the Perl
interpreter on MS-DOS, and in (at least) one of the Japanese-handling
modules for Perl5.

# As you can imagine, just using the M17N feature of the underlying library
# is not sufficient: the internal regexp parser must also be modified to
# accept multibyte-encoded regular expressions. This is one of the reasons
# that the current Regexp/OCaml does not support the UTF8 option of
# PCRE/OCaml.

For supporting list-binding of Kleene-stars, I am very interested in richer
backends which support such features. Alain Frisch's recent posting has
interested me. There is also a talk with a related title at ICFP04, although
I have not yet read the paper. However, I feel at the same time that the
backend is not a current show-stopper: it is truly better to have such
backends, but the feature can be emulated without them, as I have shown in
the combinators. I can wait for a while for theoretical/practical progress.
The current problem is mainly the frontend: there are many language-design
problems once we introduce nested bindings. I already had a discussion with
some people at ICFP04, and I hope for more.

--
Yutaka Oiwa
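
The frontend-only approach mentioned above, encoding multibyte handling into
the regexp itself, can be illustrated for EUC-JP. In a byte-oriented engine,
a postfix `*` written after a two-byte character would bind only to its last
byte, so a frontend has to group the bytes first. The sketch below is an
assumption about how such a translation might look, not the method used by
Regexp/OCaml or by the Perl ports; the function names are hypothetical, and
the output uses the `(?:...)` and `\xHH` notation of PCRE-style patterns.

```ocaml
(* Length in bytes of the EUC-JP character whose first byte is [b]. *)
let euc_jp_char_length b =
  if b < 0x80 then 1          (* ASCII *)
  else if b = 0x8E then 2     (* half-width katakana: 0x8E + one byte *)
  else if b = 0x8F then 3     (* JIS X 0212: 0x8F + two bytes *)
  else 2                      (* JIS X 0208: two bytes *)

(* Split an EUC-JP byte string into its characters (truncated trailing
   sequences are kept as they are). *)
let split_euc_jp (s : string) : string list =
  let n = String.length s in
  let rec go pos acc =
    if pos >= n then List.rev acc
    else
      let len = min (euc_jp_char_length (Char.code s.[pos])) (n - pos) in
      go (pos + len) (String.sub s pos len :: acc)
  in
  go 0 []

(* Render one character as a byte-level pattern fragment. Multibyte
   characters are wrapped in a non-capturing group so that a following
   postfix operator applies to the whole character. *)
let byte_pattern (ch : string) : string =
  let bytes = List.init (String.length ch) (fun i -> Char.code ch.[i]) in
  let escaped = String.concat "" (List.map (Printf.sprintf "\\x%02X") bytes) in
  if String.length ch > 1 then "(?:" ^ escaped ^ ")" else escaped

(* "Zero or more repetitions of this character" at the byte level. *)
let star (ch : string) : string = byte_pattern ch ^ "*"
```

With these definitions, `star "\xA4\xA2"` (the EUC-JP bytes of one hiragana
character) yields the pattern text `(?:\xA4\xA2)*` rather than `\xA4\xA2*`.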
* Re: [Caml-list] [ann] Regexp library supporting binding for * and +'s
  2004-09-20  6:54   ` Yutaka OIWA
@ 2004-09-20 11:12     ` skaller
  0 siblings, 0 replies; 4+ messages in thread
From: skaller @ 2004-09-20 11:12 UTC
To: Yutaka OIWA; +Cc: caml-list

On Mon, 2004-09-20 at 16:54, Yutaka OIWA wrote:
> >> On 20 Sep 2004 10:38:33 +1000, skaller <skaller@users.sourceforge.net> said:

> I can wait for a while for
> theoretical/practical progresses. Current problem is mainly the frontend:
> there are many language-design problems once we introduce nested bindings.
> I already had a discussion with some people in ICFP04, and I hope more.

OK, keep us posted on anything that comes out of it.
My engine supports lexical analysis, but I can't do
substring extraction. It is unclear whether to move to supporting
tagged automata, or use Alain Frisch's parser, or both :)

--
John Skaller