From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id OAA29179 for caml-redistribution; Wed, 20 Jan 1999 14:02:14 +0100 (MET)
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id DAA30811 for <caml-list@pauillac.inria.fr>; Wed, 20 Jan 1999 03:01:37 +0100 (MET)
Received: from miss.wu-wien.ac.at (miss.wu-wien.ac.at [137.208.107.17])
	by nez-perce.inria.fr (8.8.7/8.8.7) with ESMTP id DAA19165
	for <caml-list@inria.fr>; Wed, 20 Jan 1999 03:01:34 +0100 (MET)
Received: (from mottl@localhost)
	by miss.wu-wien.ac.at (8.9.0/8.9.0) id DAA17202;
	Wed, 20 Jan 1999 03:01:30 +0100 (MET)
From: Markus Mottl <mottl@miss.wu-wien.ac.at>
Message-Id: <199901200201.DAA17202@miss.wu-wien.ac.at>
Subject: Re: counting words
To: tm1@cise.npl.co.uk (Toby Moth)
Date: Wed, 20 Jan 1999 03:01:30 +0100 (MET)
Cc: caml-list@inria.fr (OCAML)
In-Reply-To: <199901191647.QAA06463@squall.cise.npl.co.uk> from "Toby Moth" at Jan 19, 99 04:47:19 pm
X-Mailer: ELM [version 2.4 PL21]
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Sender: weis

Hello,

> Here are some results just to give a flavour of relative speeds on an Ultra 5.
> 
> [tm1]$ ls -l tm.dv
> -rw-rw-r--   1 tm1      this      452849 Jan 13 13:06 tm.dv
> [tm1]$ time cat tm.dv | perl ~/lang/perl/wc.pl > pcount 
> 
> real    0m28.604s                       PERL
> user    0m28.260s                            
> sys     0m0.100s
> [tm1]$ time cat tm.dv | ~/lang/c/wc > ccount
> 
> real    0m2.708s                        C++
> user    0m2.260s
> sys     0m0.180s
> [tm1]$ time cat tm.dv | ~/lang/caml/ywc > mcount
> 
> real    0m0.999s                       OCAML
> user    0m0.720s
> sys     0m0.080s

Since comparing the performance of compilers is one of my hobbies ;-),
I have tried to follow this "contest" and have conducted a more thorough
test:

I implemented the solution to the problem with "flex" in C++, both with
maps and hash tables; then I wrote the PERL-version with hashes, too
(*very* short!) and then I took a look at the performance of the native
code compiler and interpreter of OCAML. The result of all runs had an
equal amount of characters (though, not in the same order, because hashes
don't sort). Thus, the implementation seems to be correct.

The test was conducted under Linux (PII 400Mhz, 128MB RAM), the
C++-compiler is "egcs-2.91.57" (optimizations: -O2 -fomit-frame-pointer).
The test case was the concatenated text of all the Linux HOWTO's on my
machine (about 7.3MB).

One of the problems with the C++-version (besides an internal compiler
error when trying to code the solution ;-) is that the number of buckets
in the hash table cannot be fixed at a low level. If the table is
created without parameters, it starts out with 193 buckets and increases
this number to 196613! To make up for this, I conducted the test under
OCAML with 101 buckets (as used by Albert) and the maximum found in C++
(196613). In another test I let C++ start out with this high level -
I think that we have a really fair comparison now.

Another problem with C++ was that finding the right use of templates was
everything but easy - the compiler not only sometimes emitted screen-fulls
of expanded templates as error message, but I'd say that the C++-code
looks really ill now! (I hope it's not my fault...)
By the way: I didn't use the STL-"string"-class for efficiency reasons...

Anyway, the speed result is very interesting indeed (in order from best to
worst, three successive tests each):

ocamlopt & ocamllex (196613 buckets):

4.270u 0.150s 0:04.42 100.0%    0+0k 0+0io 542pf+0w
4.250u 0.180s 0:04.66 95.0%     0+0k 0+0io 542pf+0w
4.240u 0.180s 0:04.42 100.0%    0+0k 0+0io 542pf+0w


ocaml (interpreted) & ocamllex (196613 buckets):

9.080u 0.180s 0:09.25 100.1%    0+0k 0+0io 539pf+0w
9.120u 0.150s 0:09.26 100.1%    0+0k 0+0io 539pf+0w
8.990u 0.270s 0:09.45 97.9%     0+0k 0+0io 539pf+0w


ocamlopt & ocamllex (101 buckets):

11.160u 0.270s 0:11.43 100.0%   0+0k 0+0io 542pf+0w
11.200u 0.240s 0:11.44 100.0%   0+0k 0+0io 542pf+0w
11.250u 0.200s 0:11.44 100.0%   0+0k 0+0io 542pf+0w


PERL:

11.340u 0.240s 0:11.58 100.0%   0+0k 0+0io 643pf+0w
11.380u 0.200s 0:11.58 100.0%   0+0k 0+0io 643pf+0w
11.350u 0.230s 0:11.58 100.0%   0+0k 0+0io 643pf+0w


C++ & flex (hash_map, 196613 buckets):

11.560u 0.340s 0:11.90 100.0%   0+0k 0+0io 552pf+0w
11.620u 0.280s 0:11.90 100.0%   0+0k 0+0io 552pf+0w
11.570u 0.320s 0:11.89 100.0%   0+0k 0+0io 552pf+0w

C++ & flex (hash_map, 193++ buckets):

11.780u 0.260s 0:12.04 100.0%   0+0k 0+0io 552pf+0w
11.720u 0.310s 0:12.03 100.0%   0+0k 0+0io 552pf+0w
11.870u 0.160s 0:12.03 100.0%   0+0k 0+0io 552pf+0w


C++ & flex (map):

14.530u 0.290s 0:14.82 100.0%   0+0k 0+0io 551pf+0w
14.540u 0.250s 0:14.80 99.9%    0+0k 0+0io 551pf+0w
14.630u 0.180s 0:14.81 100.0%   0+0k 0+0io 551pf+0w


ocaml (interpreted) & ocamllex (101 buckets):
20.450u 0.130s 0:20.58 100.0%   0+0k 0+0io 541pf+0w
20.300u 0.270s 0:20.57 100.0%   0+0k 0+0io 539pf+0w
20.450u 0.150s 0:20.59 100.0%   0+0k 0+0io 539pf+0w


The funny thing is that the OCAML-interpreter beats off all other
competitors (not the OCAML native code compiler, of course)!
Please tell this your local C++-guru :-)
Even PERL beats C++ without problem.

OCAML translated to native code is *way* faster than C++ (far more
than twice!)

The slowest competitor was the OCAML-interpreter with the unfair bucket
count of 101 - I just added it to show a result for a not unreasonable
number of buckets (196613 is probably a bit exaggerated...)


Now for the source code (shortest first):


PERL (yes, it *can* be this short!):
---------------------------------------------------------------------------
while(<>) { for (/[^ \t\n\r]+/g) { ++$table{$_}; } }
while (($key, $value) = each %table) { printf "%4d\t\%s\n", $value, $key; }
---------------------------------------------------------------------------


ocamllex-file (note: 196613 buckets):
---------------------------------------------------------------------------
{
let hash = Hashtbl.create 101

let hash_add string =
  try
    let v = Hashtbl.find hash string in v:= !v + 1
  with
    Not_found -> Hashtbl.add hash string (ref 1)
}

rule read = parse
  [' ' '\n' '\r' '\t']+ { read lexbuf }
| [^' ' '\n' '\r' '\t']+ { hash_add ( Lexing.lexeme lexbuf ); read lexbuf }
| eof { Hashtbl.iter ( fun k v -> Printf.printf "%4d\t%s\n" !v k ) hash }

{ let _ = read_rec ( Lexing.from_channel stdin ) 0 }
---------------------------------------------------------------------------


flex and C++ with hash (note: 196613 buckets):
Don't forget to use option "-+" to get C++-code!
---------------------------------------------------------------------------
%option noyywrap

  #include <stdio.h>
  #include <string.h>
  #include <hash_map>

  struct eqstr
  {
    bool operator()(const char* s1, const char* s2) const
    {
      return strcmp(s1, s2) == 0;
    }
  };

  typedef hash_map<const char*, int, hash<const char *>, eqstr > mymap;
  mymap vec(196613);

%%

[^ \t\n\r]+ { char *foo = new char[yyleng + 1]; strcpy(foo, yytext);
              ++vec[foo]; }
[ \t\n\r]+  ;

%%

int main()
{
  yyFlexLexer MyScanner;
  MyScanner.yylex();
  for (mymap::const_iterator p = vec.begin(); p != vec.end(); ++p)
    printf ("%4d\t%s\n",  p->second, p->first);
}
---------------------------------------------------------------------------


flex and C++ with map:
Don't forget to use option "-+" to get C++-code!
---------------------------------------------------------------------------
%option noyywrap

  #include <stdio.h>
  #include <string>
  #include <map>

  struct less_str
  {
    bool operator()(const char* s1, const char* s2) const
    {
      return strcmp(s1, s2) < 0;
    }
  };

  typedef map<const char*, int, less_str> mymap;
  mymap vec;

%%

[^ \t\n\r]+ { char *foo = new char[yyleng + 1]; strcpy(foo, yytext);
              ++vec[foo]; }
[ \t\n\r]+  ;

%%

int main()
{
  yyFlexLexer MyScanner;
  MyScanner.yylex();
  for (mymap::const_iterator p = vec.begin(); p != vec.end(); ++p)
    printf ("%4d\t%s\n",  p->second, p->first);
}
---------------------------------------------------------------------------

So much about C++ and efficiency ;-)

Interesting to see that PERL did perform very well on this test. Its
brevity is unmatched! One shouldn't forget that its hash functions
are highly optimized, its regular expression matching capabilities,
as well. Though, it is definitely not a language for implementing data
structures, as we have seen in Albert's example.

I think that this test is really significant. I cannot imagine how to
make C++ so much faster that it approaches OCAML - without low-level
coding hash tables and pattern matching.

This test showed me, what a nice life I have since I use OCAML instead of
C++ (not only in terms of efficiency...)

Best regards,
Markus Mottl

-- 
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl