From: Bruno De Fraine <Bruno.De.Fraine@vub.ac.be>
To: Caml-list ml <caml-list@inria.fr>
Cc: Oliver Bandel <oliver@first.in-berlin.de>
Subject: ocamllex speed [was Re: [Caml-list] mboxlib reloaded ;-)]
Date: Mon, 24 Sep 2007 20:22:00 +0200
Message-ID: <EFC8B570-0C5E-44D6-88CF-4CC04FA2CAFA@vub.ac.be>
In-Reply-To: <20070427231220.GA1507@first.in-berlin.de>
Hello,
On 28 Apr 2007, at 01:12, Oliver Bandel wrote:
> So, I then checked my mboxlib and saw that it is quite slow,
> compared to what I expected (expect! I did not try it
> on my development machine because I have no mutt installed there)
> and even if native code is much faster, it's nevertheless slow...
> ...so I thought I have to redesign my scanner stage.
> (I use the Str module and ocamllex mixed together; maybe
> using a plain self-written OCaml scanner might be better here).
I don't know if Oliver ever got to the bottom of this speed problem,
but I have also noticed that ocamllex can be quite slow for simple scanning.
For example, I used this ocamllex source:

  { }

  rule translate = parse
    | "current_directory" { print_endline (Sys.getcwd ()); translate lexbuf }
    | _                   { translate lexbuf }
    | eof                 { () }

  {
    for i = 1 to Array.length Sys.argv - 1 do
      translate (Lexing.from_channel (open_in Sys.argv.(i)))
    done ;;
  }

And compared it against this version using the Str module:

  let re = Str.regexp_string "current_directory" ;;

  for i = 1 to Array.length Sys.argv - 1 do
    let ch = open_in Sys.argv.(i) in
    try
      while true do
        let line = input_line ch in
        try
          let _ = Str.search_forward re line 0 in
          print_endline (Sys.getcwd ())
        with Not_found -> ()
      done
    with End_of_file -> close_in ch
  done ;;

Neither version does anything useful except print the current
directory whenever it encounters the string "current_directory". I tested
both on a 57M text file that contains only a few "current_directory"
occurrences. The ocamllex version takes about 3.5s, while the Str
version takes only 0.35s. What causes this difference? Perhaps there
is a high overhead in calling the translate function for every input
character in such a big input file, but I don't know how this can be
avoided.
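
For comparison with the per-character guess above, here is a minimal sketch of the "plain self-written OCaml scanner" idea from the quoted message: it reads a line at a time and searches it with a direct loop, so no lexer action runs per input character. The names find_sub and count_matching_lines are hypothetical helpers introduced only for this illustration, the search is a naive one, and nothing here has been timed against the two programs above.

```ocaml
(* Hypothetical helper, not part of either program above.
   [find_sub s sub start] returns [Some i] for the first index i >= start
   at which [sub] occurs in [s], or [None] if there is no occurrence. *)
let find_sub s sub start =
  let n = String.length s and m = String.length sub in
  (* [matches_at i j] is true when sub.[j..m-1] equals s.[i+j..i+m-1]. *)
  let rec matches_at i j =
    j = m || (s.[i + j] = sub.[j] && matches_at i (j + 1))
  in
  let rec scan i =
    if i + m > n then None
    else if matches_at i 0 then Some i
    else scan (i + 1)
  in
  scan (max 0 start)

(* Hypothetical helper: count the lines of channel [ch] that contain
   [needle], reading one line at a time like the Str version. *)
let count_matching_lines ch needle =
  let count = ref 0 in
  (try
     while true do
       match find_sub (input_line ch) needle 0 with
       | Some _ -> incr count
       | None -> ()
     done
   with End_of_file -> ());
  !count
```

Like the Str version, this reports at most one hit per line, whereas the ocamllex rule fires once per occurrence, so the two styles are only equivalent on input with at most one match per line.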
Thanks,
Bruno