From: "Mattias Waldau" <mattias.waldau@abc.se>
To: <caml-list@inria.fr>
Subject: [Caml-list] Beware of compare (and Ocaml beaten by Java)
Date: Mon, 26 Nov 2001 19:40:00 +0100 [thread overview]
Message-ID: <AAEBJHFJOIPMMIILCEPBCECKDFAA.mattias.waldau@abc.se> (raw)
In-Reply-To: <20011127015149.A10358@earth.cs.mu.oz.au>
Why isn't compare compiled as '-'? According to the definition
of compare this should be okay.
The reason why I am asking this can be found below.
/mattias
Suffix arrays are very efficient for fast string searching
in big texts. I implemented a first nice version, which
took 40 s to run on my P4, when applied to the bible (4.5 MB)
The core of the slow program is
(* compare two substrings of the SAME text
[compare x y] returns [0] if [x=y], a negative integer if
[x<y], and a positive integer if [x>y] *)
let rec same_substr_compare str idx1 idx2 : int =
let length = String.length str in
(* shortest string is smaller *)
if idx1 = length then -1 else
if idx2 = length then 1 else
(* compare one char *)
let res = compare str.[idx1] str.[idx2] in
(* char was equal, recurse *)
if res = 0 then same_substr_compare str (idx1+1) (idx2+1)
(* char was different, finished *)
else res ;;
(* create and sort index array for string *)
let create_index str : int array =
let idx = Array.init (String.length str) (fun ii -> ii) in
Array.stable_sort (same_substr_compare str) idx;
idx ;;
However, a colleague implemented it using Java. His code
only takes 15 s to run.
Thus, I had to further tune the code by replacing 'compare'
with '-', inlining better and removing bound-checking
on strings and arrays. The new version runs in 10 s.
During this lesson I learned that
1. compare is incredible slow! By replacing
let res = compare str.[idx1] str.[idx2] in
with
let res = (-) (int_of_char str.[idx1]) (int_of_char str.[idx2]) in
the run-time of the complete program went down
from 20 s to 10 s.
2. Tail-recursive comparation is faster than using a
loop and exit (runtime down from 80s to 60s). See the file
other_compares.ml in the nice version for these other attempts,
one example is
exception Result of int ;;
let substr_compare str idx1 idx2 : int =
let res = compare str.[idx1] str.[idx2] in
if res <> 0 then res
else try
let length = String.length str in
let max_size = if idx1 > idx2 then
length - idx1
else length - idx2 in
for offset = 1 to max_size - 1 do
let res = compare str.[idx1+offset] str.[idx2+offset] in
if res<>0 then raise (Result res);
done;
0
with Result res -> res ;;
3. Mergesort (=Array.stable_sort) is faster than
heapsort (=Array.sort). (runtime down from 60s to 40s).
(I also tried quicksort (=Sort.array), but after 8 hours
it still hadn't finished.)
4. The rest is inline and removing bounds checking.
This is the first time I got beaten by Java, I will
have to reevaluate Java. Note that Java still does
bounds checking!
/mattias
The nice version is at
http://www.abc.se/~m10217/download/mans.tar.bz2
The fast version is at
http://www.abc.se/~m10217/download/mans.opt.tar.bz2
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
next prev parent reply other threads:[~2001-11-26 18:39 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2001-10-02 17:53 [Caml-list] replay debugger Yoann Padioleau
2001-10-04 12:39 ` Xavier Leroy
2001-11-26 14:51 ` Fergus Henderson
2001-11-26 16:14 ` Xavier Leroy
2001-11-26 18:40 ` Mattias Waldau [this message]
2001-11-27 9:21 ` [Caml-list] Beware of compare (and Ocaml beaten by Java) Xavier Leroy
2001-11-27 9:41 ` Mattias Waldau
2001-11-30 9:12 ` Pierre Weis
2001-12-03 21:37 ` Chris Hecker
2001-11-27 17:03 ` [Caml-list] bytegen.comp_expr error when doing object copying Neil Inala
2001-11-28 20:15 ` Xavier Leroy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=AAEBJHFJOIPMMIILCEPBCECKDFAA.mattias.waldau@abc.se \
--to=mattias.waldau@abc.se \
--cc=caml-list@inria.fr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox