From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id OAA29179 for caml-redistribution; Wed, 20 Jan 1999 14:02:14 +0100 (MET) Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id DAA30811 for ; Wed, 20 Jan 1999 03:01:37 +0100 (MET) Received: from miss.wu-wien.ac.at (miss.wu-wien.ac.at [137.208.107.17]) by nez-perce.inria.fr (8.8.7/8.8.7) with ESMTP id DAA19165 for ; Wed, 20 Jan 1999 03:01:34 +0100 (MET) Received: (from mottl@localhost) by miss.wu-wien.ac.at (8.9.0/8.9.0) id DAA17202; Wed, 20 Jan 1999 03:01:30 +0100 (MET) From: Markus Mottl Message-Id: <199901200201.DAA17202@miss.wu-wien.ac.at> Subject: Re: counting words To: tm1@cise.npl.co.uk (Toby Moth) Date: Wed, 20 Jan 1999 03:01:30 +0100 (MET) Cc: caml-list@inria.fr (OCAML) In-Reply-To: <199901191647.QAA06463@squall.cise.npl.co.uk> from "Toby Moth" at Jan 19, 99 04:47:19 pm X-Mailer: ELM [version 2.4 PL21] MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Sender: weis Hello, > Here are some results just to give a flavour of relative speeds on an Ultra 5. > > [tm1]$ ls -l tm.dv > -rw-rw-r-- 1 tm1 this 452849 Jan 13 13:06 tm.dv > [tm1]$ time cat tm.dv | perl ~/lang/perl/wc.pl > pcount > > real 0m28.604s PERL > user 0m28.260s > sys 0m0.100s > [tm1]$ time cat tm.dv | ~/lang/c/wc > ccount > > real 0m2.708s C++ > user 0m2.260s > sys 0m0.180s > [tm1]$ time cat tm.dv | ~/lang/caml/ywc > mcount > > real 0m0.999s OCAML > user 0m0.720s > sys 0m0.080s Since comparing the performance of compilers is one of my hobbies ;-), I have tried to follow this "contest" and have conducted a more thorough test: I implemented the solution to the problem with "flex" in C++, both with maps and hash tables; then I wrote the PERL-version with hashes, too (*very* short!) and then I took a look at the performance of the native code compiler and interpreter of OCAML. The result of all runs had an equal amount of characters (though, not in the same order, because hashes don't sort). Thus, the implementation seems to be correct. The test was conducted under Linux (PII 400Mhz, 128MB RAM), the C++-compiler is "egcs-2.91.57" (optimizations: -O2 -fomit-frame-pointer). The test case was the concatenated text of all the Linux HOWTO's on my machine (about 7.3MB). One of the problems with the C++-version (besides an internal compiler error when trying to code the solution ;-) is that the number of buckets in the hash table cannot be fixed at a low level. If the table is created without parameters, it starts out with 193 buckets and increases this number to 196613! To make up for this, I conducted the test under OCAML with 101 buckets (as used by Albert) and the maximum found in C++ (196613). In another test I let C++ start out with this high level - I think that we have a really fair comparison now. Another problem with C++ was that finding the right use of templates was everything but easy - the compiler not only sometimes emitted screen-fulls of expanded templates as error message, but I'd say that the C++-code looks really ill now! (I hope it's not my fault...) By the way: I didn't use the STL-"string"-class for efficiency reasons... Anyway, the speed result is very interesting indeed (in order from best to worst, three successive tests each): ocamlopt & ocamllex (196613 buckets): 4.270u 0.150s 0:04.42 100.0% 0+0k 0+0io 542pf+0w 4.250u 0.180s 0:04.66 95.0% 0+0k 0+0io 542pf+0w 4.240u 0.180s 0:04.42 100.0% 0+0k 0+0io 542pf+0w ocaml (interpreted) & ocamllex (196613 buckets): 9.080u 0.180s 0:09.25 100.1% 0+0k 0+0io 539pf+0w 9.120u 0.150s 0:09.26 100.1% 0+0k 0+0io 539pf+0w 8.990u 0.270s 0:09.45 97.9% 0+0k 0+0io 539pf+0w ocamlopt & ocamllex (101 buckets): 11.160u 0.270s 0:11.43 100.0% 0+0k 0+0io 542pf+0w 11.200u 0.240s 0:11.44 100.0% 0+0k 0+0io 542pf+0w 11.250u 0.200s 0:11.44 100.0% 0+0k 0+0io 542pf+0w PERL: 11.340u 0.240s 0:11.58 100.0% 0+0k 0+0io 643pf+0w 11.380u 0.200s 0:11.58 100.0% 0+0k 0+0io 643pf+0w 11.350u 0.230s 0:11.58 100.0% 0+0k 0+0io 643pf+0w C++ & flex (hash_map, 196613 buckets): 11.560u 0.340s 0:11.90 100.0% 0+0k 0+0io 552pf+0w 11.620u 0.280s 0:11.90 100.0% 0+0k 0+0io 552pf+0w 11.570u 0.320s 0:11.89 100.0% 0+0k 0+0io 552pf+0w C++ & flex (hash_map, 193++ buckets): 11.780u 0.260s 0:12.04 100.0% 0+0k 0+0io 552pf+0w 11.720u 0.310s 0:12.03 100.0% 0+0k 0+0io 552pf+0w 11.870u 0.160s 0:12.03 100.0% 0+0k 0+0io 552pf+0w C++ & flex (map): 14.530u 0.290s 0:14.82 100.0% 0+0k 0+0io 551pf+0w 14.540u 0.250s 0:14.80 99.9% 0+0k 0+0io 551pf+0w 14.630u 0.180s 0:14.81 100.0% 0+0k 0+0io 551pf+0w ocaml (interpreted) & ocamllex (101 buckets): 20.450u 0.130s 0:20.58 100.0% 0+0k 0+0io 541pf+0w 20.300u 0.270s 0:20.57 100.0% 0+0k 0+0io 539pf+0w 20.450u 0.150s 0:20.59 100.0% 0+0k 0+0io 539pf+0w The funny thing is that the OCAML-interpreter beats off all other competitors (not the OCAML native code compiler, of course)! Please tell this your local C++-guru :-) Even PERL beats C++ without problem. OCAML translated to native code is *way* faster than C++ (far more than twice!) The slowest competitor was the OCAML-interpreter with the unfair bucket count of 101 - I just added it to show a result for a not unreasonable number of buckets (196613 is probably a bit exaggerated...) Now for the source code (shortest first): PERL (yes, it *can* be this short!): --------------------------------------------------------------------------- while(<>) { for (/[^ \t\n\r]+/g) { ++$table{$_}; } } while (($key, $value) = each %table) { printf "%4d\t\%s\n", $value, $key; } --------------------------------------------------------------------------- ocamllex-file (note: 196613 buckets): --------------------------------------------------------------------------- { let hash = Hashtbl.create 101 let hash_add string = try let v = Hashtbl.find hash string in v:= !v + 1 with Not_found -> Hashtbl.add hash string (ref 1) } rule read = parse [' ' '\n' '\r' '\t']+ { read lexbuf } | [^' ' '\n' '\r' '\t']+ { hash_add ( Lexing.lexeme lexbuf ); read lexbuf } | eof { Hashtbl.iter ( fun k v -> Printf.printf "%4d\t%s\n" !v k ) hash } { let _ = read_rec ( Lexing.from_channel stdin ) 0 } --------------------------------------------------------------------------- flex and C++ with hash (note: 196613 buckets): Don't forget to use option "-+" to get C++-code! --------------------------------------------------------------------------- %option noyywrap #include #include #include struct eqstr { bool operator()(const char* s1, const char* s2) const { return strcmp(s1, s2) == 0; } }; typedef hash_map, eqstr > mymap; mymap vec(196613); %% [^ \t\n\r]+ { char *foo = new char[yyleng + 1]; strcpy(foo, yytext); ++vec[foo]; } [ \t\n\r]+ ; %% int main() { yyFlexLexer MyScanner; MyScanner.yylex(); for (mymap::const_iterator p = vec.begin(); p != vec.end(); ++p) printf ("%4d\t%s\n", p->second, p->first); } --------------------------------------------------------------------------- flex and C++ with map: Don't forget to use option "-+" to get C++-code! --------------------------------------------------------------------------- %option noyywrap #include #include #include struct less_str { bool operator()(const char* s1, const char* s2) const { return strcmp(s1, s2) < 0; } }; typedef map mymap; mymap vec; %% [^ \t\n\r]+ { char *foo = new char[yyleng + 1]; strcpy(foo, yytext); ++vec[foo]; } [ \t\n\r]+ ; %% int main() { yyFlexLexer MyScanner; MyScanner.yylex(); for (mymap::const_iterator p = vec.begin(); p != vec.end(); ++p) printf ("%4d\t%s\n", p->second, p->first); } --------------------------------------------------------------------------- So much about C++ and efficiency ;-) Interesting to see that PERL did perform very well on this test. Its brevity is unmatched! One shouldn't forget that its hash functions are highly optimized, its regular expression matching capabilities, as well. Though, it is definitely not a language for implementing data structures, as we have seen in Albert's example. I think that this test is really significant. I cannot imagine how to make C++ so much faster that it approaches OCAML - without low-level coding hash tables and pattern matching. This test showed me, what a nice life I have since I use OCAML instead of C++ (not only in terms of efficiency...) Best regards, Markus Mottl -- Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl