* [Caml-list] String.unescaped and some other little pitiful laments
From: Berke Durak @ 2001-07-10 18:07 UTC
To: caml-list

There's a reversible String.escaped function, and I think it would be nice
to have its inverse function built in the String module.

Also I'd like to see those horrible functions returning parameters in
global variables be eradicated, such as those that can be found in the Str
(regular expression) module. Is there a complete, typeful regular
expression package entirely written in Ocaml ?

Many people on this list are talking lighthearted about functions such as
Obj.magic. These functions are pure evil. It makes me sorry to see that my
favorite language has an unsafe and ugly type casting function. Modules
using such features should be flagged as ``evil'', and the use of these
functions should not be publicly advocated.

PS. What is the purpose of the "uses unsafe features" flag in .cmo files ?
(it can be seen in the output of the "objinfo" program in the tools/
directory of the compiler). I've made a test program using unsafe features
such as Obj and Array.unsafe_get but the flag wasn't set.

--
Berke Durak
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr
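As a small illustration of the round trip requested above: the String module discussed in this thread has no inverse for String.escaped, but later OCaml releases (4.00 and up) provide Scanf.unescaped, which can play that role. The sketch below assumes such a later compiler; the helper name round_trip is only illustrative.

```ocaml
(* Assumes OCaml >= 4.00, where Scanf.unescaped is available.
   round_trip is a hypothetical helper name, not a library function. *)
let round_trip (s : string) : string =
  Scanf.unescaped (String.escaped s)

let () =
  let samples =
    [ "plain"; "tab\there"; "quote\"inside"; "back\\slash"; "line\nbreak" ]
  in
  (* Escaping and then unescaping should give back the original string. *)
  List.iter (fun s -> assert (round_trip s = s)) samples
```

Note that Scanf.unescaped raises an exception on input that is not a valid escaped string (for example, one containing a bare double quote), so it is only a safe inverse on the output of String.escaped.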
* Re: [Caml-list] String.unescaped and some other little pitiful laments
From: Markus Mottl @ 2001-07-10 18:55 UTC
To: Berke Durak; +Cc: caml-list

On Tue, 10 Jul 2001, Berke Durak wrote:
> Also I'd like to see those horrible functions returning parameters in
> global variables be eradicated, such as those that can be found in the Str
> (regular expression) module. Is there a complete, typeful regular
> expression package entirely written in Ocaml ?

Unfortunately not. My Pcre-library has to interface to C to access the
matching engine, but the large remainder of the functions that build on it
is written in OCaml. In contrast to the Str-library, the Pcre-library is
fully reentrant, which is nice if you want to use it with threads or want
to interleave several matches with others.

If somebody wants to give it a try, the SML-entry in the language shootout
implements a regexp-library with NFAs and DFAs. I haven't given it a
closer look yet, but performance looks excellent:

  http://www.bagley.org/~doug/shootout/bench/regexmatch/regexmatch.mlton

> Many people on this list are talking lighthearted about functions
> such as Obj.magic. These functions are pure evil. It makes me sorry
> to see that my favorite language has an unsafe and ugly type casting
> function. Modules using such features should be flagged as ``evil'',
> and the use of these functions should not be publicly advocated.

I don't think anybody would talk lightheartedly about "Obj.magic". It is
extremely seldom that one needs it, e.g. when you want to initialize the
contents of a reference with a fully polymorphic value, which you cannot
necessarily create (and matches on optional values with an "assert
false"-branch look really ugly and require many more lines, too). The
latter problem could be eliminated in some cases if one could raise
exceptions with polymorphic contents (by binding the type variable in some
enclosing expression in which the exception is defined).

There is also the trick of using "Obj.magic" in resizable arrays to
deallocate objects that lie outside the valid index range. I wouldn't know
how else to get the same behaviour.

Regards,
Markus Mottl

--
Markus Mottl                                             markus@oefai.at
Austrian Research Institute
for Artificial Intelligence                  http://www.oefai.at/~markus
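For comparison with the resizable-array trick mentioned above, here is a minimal sketch of one Obj.magic-free alternative: store 'a option in the backing array so that unused slots can be reset to None. This is not the approach described in the message, it costs an extra allocation per element, and every name in it (t, create, push, pop, length) is hypothetical.

```ocaml
(* Hypothetical minimal resizable array; slots beyond [len] hold None,
   so popped elements are no longer reachable from the backing store. *)
type 'a t = { mutable data : 'a option array; mutable len : int }

let create () = { data = Array.make 4 None; len = 0 }

let length v = v.len

let push v x =
  if v.len = Array.length v.data then begin
    (* Grow by doubling; fresh slots are filled with None. *)
    let bigger = Array.make (2 * Array.length v.data) None in
    Array.blit v.data 0 bigger 0 v.len;
    v.data <- bigger
  end;
  v.data.(v.len) <- Some x;
  v.len <- v.len + 1

let pop v =
  if v.len = 0 then None
  else begin
    v.len <- v.len - 1;
    let x = v.data.(v.len) in
    (* Clear the slot so the GC can reclaim the element once the
       caller drops it; this is what the Obj.magic trick achieves
       without the option wrapper. *)
    v.data.(v.len) <- None;
    x
  end
```

The trade-off is the one the message hints at: the option wrapper keeps the code type-safe but adds boxing and a match on every access.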
* Re: [Caml-list] String.unescaped and some other little pitiful laments
From: Jean-Christophe Filliatre @ 2001-07-11 6:44 UTC
To: Markus Mottl; +Cc: Berke Durak, caml-list

Markus Mottl writes:
> If somebody wants to give it a try, the SML-entry in the language shootout
> implements a regexp-library with NFAs and DFAs. I haven't given it a
> closer look yet, but performance looks excellent:
>
> http://www.bagley.org/~doug/shootout/bench/regexmatch/regexmatch.mlton

I tried Claude Marché's Regexp library instead of Pcre on that particular
example and there is a speedup of approximately 5%.

Note that the Regexp library compiles regular expressions into
deterministic finite automata using a very different algorithm from the
one in the SML entry, taken from this article:

  G. Berry and R. Sethi
  From Regular Expressions to Deterministic Automata
  Theoretical Computer Science 48 (1986) 117-126

This is a very nice and concise algorithm and I encourage anybody
interested in regexps and automata to have a look at it (again, the url
for the code is http://www.lri.fr/~marche/tmp/regexp-0.1.tar.gz)

--
Jean-Christophe Filliatre
  mailto:Jean-Christophe.Filliatre@lri.fr
  http://www.lri.fr/~filliatr
* Re: [Caml-list] String.unescaped and some other little pitiful laments
From: Jean-Christophe Filliatre @ 2001-07-11 6:37 UTC
To: Berke Durak; +Cc: caml-list

Berke Durak writes:
>
> Also I'd like to see those horrible functions returning parameters in
> global variables be eradicated, such as those that can be found in the Str
> (regular expression) module. Is there a complete, typeful regular
> expression package entirely written in Ocaml ?

Yes, there is one by Claude Marché, available (in a very first
release) at:

  http://www.lri.fr/~marche/tmp/regexp-0.1.tar.gz

(Documentation can be found in the .mli files)

--
Jean-Christophe Filliatre
  mailto:Jean-Christophe.Filliatre@lri.fr
  http://www.lri.fr/~filliatr
* Re: [Caml-list] String.unescaped and some other little pitiful laments
From: Claude Marche @ 2001-07-11 7:29 UTC
To: Jean-Christophe Filliatre; +Cc: Berke Durak, caml-list

Hi all,

>>>>> "Jean-Christophe" == Jean-Christophe Filliatre <Jean-Christophe.Filliatre@lri.fr> writes:

    Jean-Christophe> Berke Durak writes:
    >>
    >> Also I'd like to see those horrible functions returning parameters in
    >> global variables be eradicated, such as those that can be found in the Str
    >> (regular expression) module. Is there a complete, typeful regular
    >> expression package entirely written in Ocaml ?

    Jean-Christophe> Yes, there is one by Claude Marché, available (in a very first
    Jean-Christophe> release) at:

    Jean-Christophe> http://www.lri.fr/~marche/tmp/regexp-0.1.tar.gz

    Jean-Christophe> (Documentation can be found in the .mli files)

I made this small package quite recently, entirely in Caml because I
needed to do so. But with respect to Str and Pcre, several features
are missing. If there is enough demand I may add such features in the
future.

I added to the Web page the documentation (generated with ocamlweb,
http://www.lri.fr/~filliatr/ocamlweb/) in HTML, PDF and PS format. See

  http://www.lri.fr/~marche/regexp

- Claude

--
| Claude Marché           | mailto:Claude.Marche@lri.fr |
| LRI - Bât. 490          | http://www.lri.fr/~marche/  |
| Université de Paris-Sud | phoneto: +33 1 69 15 64 85  |
| F-91405 ORSAY Cedex     | faxto: +33 1 69 15 65 86    |
* Re: [Caml-list] String.unescaped and some other little pitiful laments
From: Jerome Vouillon @ 2001-07-11 18:03 UTC
To: Jean-Christophe Filliatre; +Cc: Berke Durak, caml-list

On Wed, Jul 11, 2001 at 08:37:38AM +0200, Jean-Christophe Filliatre wrote:
>
> Berke Durak writes:
> >
> > Also I'd like to see those horrible functions returning parameters in
> > global variables be eradicated, such as those that can be found in the Str
> > (regular expression) module. Is there a complete, typeful regular
> > expression package entirely written in Ocaml ?
>
> Yes, there is one by Claude Marché, available (in a very first
> release) at:
>
> http://www.lri.fr/~marche/tmp/regexp-0.1.tar.gz
>
> (Documentation can be found in the .mli files)

We also implemented a regular expression module for Unison, as the
standard one (Str) was unusably slow. It has a similar interface to
the one by Claude Marché, but it is more complete: it supports almost
all Posix extended regular expressions (only collating sequences are
missing), filename globbing, case insensitive matching, and boolean
operations (union, intersection and difference) on regular
expressions. Both implementations have the disadvantage of not
supporting submatches, though.

It would be interesting to compare their performance.

The sources are a stand-alone subset (files src/rx.ml and src/rx.mli)
of the sources of Unison (available from
http://www.cis.upenn.edu/~bcpierce/unison/index.html).

-- Jerome
* Re: [Caml-list] String.unescaped and some other little pitiful laments
From: Xavier Leroy @ 2001-07-11 19:30 UTC
To: Berke Durak; +Cc: caml-list

> Also I'd like to see those horrible functions returning parameters in
> global variables be eradicated, such as those that can be found in the Str
> (regular expression) module.

Yes, the Str module is a thorn in my side: not only is the API bad (too
much reliance on global state), but the underlying implementation (the
GNU regexp library) is awful -- on moderately complex regular
expressions, it can get really slow, or just abort on an exception.
(Stallman et al usually write better code than this!)

I'd really love to get rid of it, but as usual I'm obsessed with
backward compatibility, and couldn't find an existing regexp library
that recognizes the same regexp language as Str -- so that we could
easily keep the old Str interface as a wrapper around the new
interface.

So, this is a question to the developers of alternate regexp
libraries: how hard would it be to implement an Str emulation on top
of your libraries? If you're interested, we can pursue this
discussion by private e-mail.

> Many people on this list are talking lighthearted about functions such
> as Obj.magic. These functions are pure evil. It makes me sorry to see
> that my favorite language has an unsafe and ugly type casting
> function. Modules using such features should be flagged as
> ``evil'', and the use of these functions should not be publicly
> advocated.

But they are not! Not by us, at least. You'd be hard-pressed to find
any mention of the Obj module in the OCaml docs. There are a couple
of legitimate uses of Obj.magic in the toplevel loop, and a few other
uses (e.g. in ocamlyacc-generated parsers) that could be removed with
a little more work.

But, yes, I'd advise all OCaml programmers to never, never use
Obj.magic. In particular, this can lead to incorrect code being
generated by the ocamlopt compiler (because it fools its
type-dependent optimizations).

A few years ago, I spent a couple of hours tracking an obscure GC bug
in a program sent by a user as part of a bug report. It turned out to
be an incorrect use of Obj.magic in the source code... Since then, I
first grep for Obj.magic in every bug report sent to us!

> PS. What is the purpose of the "uses unsafe features" flag in .cmo
> files ? (it can be seen in the output of the "objinfo" program in the
> tools/ directory of the compiler). I've made a test program using
> unsafe features such as Obj and Array.unsafe_get but the flag wasn't
> set.

It's poorly named. Actually, it tracks whether the module declares
external primitives (using the "external" syntax). It's used for
type-safe dynamic loading of compiled bytecode: the Dynlink loader
lets you check that the bytecode was compiled against a set of known
interfaces (presumably not including unsafe operations such as
Obj.magic or Array.unsafe_get), but there is also the risk that the
bytecode simply declares these operations itself using well-chosen
"external" declarations. So, Dynlink can also track "external"
declarations and prohibit them.

- Xavier Leroy
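To make the point about "well-chosen external declarations" concrete, here is a minimal sketch of a module that never mentions Obj or Array.unsafe_get yet obtains the same unsafe operations by declaring them itself. The names my_magic and my_unsafe_get are hypothetical, and the primitive names "%identity" and "%array_unsafe_get" are assumed to match the ones the compiler recognizes; they may differ between compiler versions.

```ocaml
(* Hypothetical local re-declarations of unsafe primitives.
   Nothing here refers to the Obj module or to Array.unsafe_get,
   so a check on the interfaces a module was compiled against
   would not notice them; only tracking "external" declarations does. *)
external my_magic : 'a -> 'b = "%identity"
external my_unsafe_get : 'a array -> int -> 'a = "%array_unsafe_get"

let () =
  let a = [| 10; 20; 30 |] in
  (* In-bounds access returns the same value as a.(1), but no bounds
     check is performed; an out-of-bounds index would be undefined. *)
  assert (my_unsafe_get a 1 = 20);
  (* my_magic is the identity at run time; it is used here at a
     harmless type, int to int, purely to show that it type-checks. *)
  assert ((my_magic 42 : int) = 42)
```

This is the situation the flag described above is meant to catch: the module declares external primitives, whatever they are called.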
* Re: [Caml-list] String.unescaped and some other little pitiful laments
From: Markus Mottl @ 2001-07-11 20:33 UTC
To: Xavier Leroy; +Cc: Berke Durak, caml-list

On Wed, 11 Jul 2001, Xavier Leroy wrote:
> So, this is a question to the developers of alternate regexp
> libraries: how hard would it be to implement an Str emulation on top
> of your libraries? If you're interested, we can pursue this
> discussion by private e-mail.

Well, I somehow felt addressed by this request... ;)

It may seem that one would only have to rewrite Emacs-style patterns to
Perl-style ones to use my Pcre-interface. Unfortunately, there is no way to
stay backward compatible _and_ get rid of the statefulness, because the
interface of the Str-library just relies on the latter. One could surely
emulate this behaviour with not too much effort, but the Str-library would
still remain awfully stateful then (though it would perform matching
somewhat faster).

I think that in the long run there will be no way around declaring the
Str-interface obsolete. Especially for multi-threaded applications a
stateless regexp engine is really a requirement. The longer we keep this
interface, the more legacy code we will get...

If you want, just shamelessly grab the Pcre-library and adapt it to your
needs: it's LGPLed anyway. Though, I admit that I'd also like to see a
featureful and fast regexp-library purely written in OCaml rather than one
cowardly interfacing to existing C-libraries (would require significantly
more work).

If anybody wants to write a substitute for the Str-library building on the
Pcre-library and needs some hints, just tell me. Unfortunately, I won't
have time to work on this issue in the near future...

Best regards,
Markus Mottl

--
Markus Mottl                                             markus@oefai.at
Austrian Research Institute
for Artificial Intelligence                  http://www.oefai.at/~markus
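The emulation problem described above can be sketched in a few lines: a stateless matcher returns its result as a value, and a Str-style layer on top of it has to reintroduce a global to stay compatible. In this sketch a plain substring search stands in for a real regexp engine, and all names (find_sub, last_match, search, match_beginning) are hypothetical; only the shape of the interface is modelled on Str.

```ocaml
(* Stateless core: the match position is returned as a value.
   A naive substring search stands in for a real regexp engine. *)
let find_sub (pat : string) (s : string) : (int * int) option =
  let n = String.length s and m = String.length pat in
  let rec go i =
    if i + m > n then None
    else if String.sub s i m = pat then Some (i, i + m)
    else go (i + 1)
  in
  go 0

(* Str-style layer: the result of the last search lives in a global,
   which is exactly the statefulness a compatible wrapper must keep. *)
let last_match : (int * int) option ref = ref None

let search (pat : string) (s : string) : bool =
  last_match := find_sub pat s;
  !last_match <> None

let match_beginning () : int =
  match !last_match with
  | Some (start, _) -> start
  | None -> raise Not_found

let () =
  assert (search "cd" "abcdef");
  assert (match_beginning () = 2);
  (* A second, unrelated search silently overwrites the first result;
     with threads or interleaved matches this is where things break. *)
  ignore (search "f" "abcdef");
  assert (match_beginning () = 5)
```

Callers of find_sub can hold on to several results at once, whereas callers of search must read match_beginning before anything else performs a search.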