Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed
From: Brian Hurt <bhurt@janestcapital.com>
To: Robert Fischer <robert@fischerventure.com>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Re: Why OCaml sucks
Date: Tue, 13 May 2008 10:01:42 -0400	[thread overview]
Message-ID: <48299F46.5030906@janestcapital.com> (raw)
In-Reply-To: <48299C67.1010905@fischerventure.com>

[-- Attachment #1: Type: text/plain, Size: 2673 bytes --]

Robert Fischer wrote:

>>>>>5. Strings: pushing unicode throughout a general purpose language is a
>>>>>mistake, IMHO. This is why languages like Java and C# are so slow.
>>>>>          
>>>>>
>>>>Unicode by itself, when wider-than-byte encodings are used, adds "zero"
>>>>runtime overhead; the only overhead is storage (2 or 4 bytes per
>>>>character).
>>>>        
>>>>
>>>You cannot degrade memory consumption without also degrading performance.
>>>Moreover, there are hidden costs such as the added complexity in a lexer
>>>which potentially has 256x larger dispatch tables or an extra indirection
>>>for every byte read.
>>>      
>>>
>
>Okay, I was going to let this slide, but it kept resurfacing and annoying me.
>
>Is there any empirical support for the assertion that Java and C# are slow because of *unicode*?  Of
>all things, *unicode*?  The fact that they're bytecod languages isn't a bigger hit?  At least with
>the JVM, the hypercomplicated GC should probably take some of the blame, too -- I've seen 2x speed
>increases by *reducing* the space available to the GC, and 10x speed increases by boosting the space
>available to ridiculous levels so that the full GC barely ever has to fire.  The the nigh-universal
>optimization-ruining mutable data and virtual function (e.g. method) dispatch I'm sure doesn't help,
>too.  And this is to say nothing of user-space problems like the explosion of nontrivial types
>associated with the object-driven style.  With all that going on, you're blaming their *Unicode
>support* for why they're slow?  "This is why languages like Java and C# are so slow."  Really?  Got
>evidence for that?
>
>~~ Robert.
>
>_
>

The problem, as I understand it, is in writting parsers.  Your standard 
finite automata based regular expression library or lexical analyzer is 
based, at it's heart, on a table lookup- you have a 2D array, whose size 
is the number of input characters times the number of states.  For ASCII 
input, the number of possible input characters is small- 256 at most.  
256 input characters times hundreds of states isn't that big of a table- 
we're looking at sizes in 10's of K- easily handlable even in the bad 
old days of 64K segments.  Even going to UTF-16 ups the number of input 
characters from 256 to 65,536- and now a moderately large state machine 
(hundreds of states) weighs in at tens of megabytes of table space.  
And, of course, if you try to handle the entire 31-bit full unicode 
point space, welcome to really large tables :-).

The solution, I think, is to change the implementation of your finite 
automata to use some data structure smarter than a flat 2D array, but 
that's me.

Brian


[-- Attachment #2: Type: text/html, Size: 3260 bytes --]

  reply	other threads:[~2008-05-13 14:01 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-09  0:39 Jon Harrop
2008-05-09  1:11 ` [Caml-list] " Matthew William Cox
2008-05-09  5:10   ` [Caml-list] Re: Why OCaml **cks Jon Harrop
2008-05-09  4:45 ` [Caml-list] Re: Why OCaml sucks Arthur Chan
2008-05-09  5:09   ` Jon Harrop
2008-05-09 11:12     ` [Caml-list] Re: Why OCaml rocks Gerd Stolpmann
2008-05-09 11:58       ` Gabriel Kerneis
2008-05-09 12:10         ` Concurrency [was Re: [Caml-list] Re: Why OCaml rocks] Robert Fischer
2008-05-09 12:41         ` [Caml-list] Re: Why OCaml rocks Gerd Stolpmann
2008-05-09 12:49         ` David Teller
2008-05-09 18:10       ` Jon Harrop
2008-05-09 20:40         ` Gerd Stolpmann
2008-05-09 20:55           ` Berke Durak
2008-05-10 10:56             ` Gerd Stolpmann
2008-05-09 21:00           ` Till Varoquaux
2008-05-09 21:13             ` Berke Durak
2008-05-09 22:26               ` Richard Jones
2008-05-09 23:01                 ` Berke Durak
2008-05-10  7:52                   ` Richard Jones
2008-05-10  8:24                     ` Berke Durak
2008-05-10  8:51                       ` Richard Jones
2008-05-13  3:47           ` Jon Harrop
2008-05-09 22:25         ` David Teller
2008-05-09 22:57           ` Vincent Hanquez
2008-05-10 19:59           ` Jon Harrop
2008-05-10 21:39             ` Charles Forsyth
2008-05-11  3:58               ` Jon Harrop
2008-05-11  9:41                 ` Charles Forsyth
2008-05-12 13:22             ` Richard Jones
2008-05-12 18:07               ` Jon Harrop
2008-05-12 20:05                 ` Arthur Chan
2008-05-13  0:42               ` Gerd Stolpmann
2008-05-13  1:19                 ` Jon Harrop
2008-05-13  2:03                   ` Gerd Stolpmann
2008-05-13  3:13                     ` Jon Harrop
2008-05-12 20:33             ` Arthur Chan
2008-05-12 21:22               ` Till Varoquaux
2008-05-09 13:00     ` [Caml-list] Re: Why OCaml sucks Ulf Wiger (TN/EAB)
2008-05-09 17:46       ` Jon Harrop
2008-05-09 18:17         ` Ulf Wiger (TN/EAB)
2008-05-10  1:29           ` Jon Harrop
2008-05-10 14:51             ` [Caml-list] Re: Why OCaml **cks Ulf Wiger (TN/EAB)
2008-05-10 18:19               ` Jon Harrop
2008-05-10 21:58                 ` Ulf Wiger (TN/EAB)
2008-05-10 18:39               ` Mike Lin
2008-05-12 13:31           ` [Caml-list] Re: Why OCaml sucks Kuba Ober
2008-05-12 18:18             ` Jon Harrop
2008-05-12 13:13   ` Kuba Ober
2008-05-12 19:32     ` Arthur Chan
2008-05-09  6:31 ` Tom Primožič
2008-05-09  6:46 ` Elliott Oti
2008-05-09  7:53   ` Till Varoquaux
2008-05-09  7:45 ` Richard Jones
2008-05-09  8:10   ` Jon Harrop
2008-05-09  9:31     ` Richard Jones
2008-05-09  7:58 ` [Caml-list] Re: Why OCaml rocks David Teller
2008-05-09 10:29   ` Jon Harrop
2008-05-09 13:08     ` David Teller
2008-05-09 15:38     ` Jeff Polakow
2008-05-09 18:09       ` Jon Harrop
2008-05-09 20:36         ` Berke Durak
2008-05-09 22:34         ` Richard Jones
2008-05-14 13:44           ` Kuba Ober
2008-05-09  8:29 ` constructive criticism about Ocaml Ulf Wiger (TN/EAB)
2008-05-09  9:45 ` [Caml-list] Re: Why OCaml sucks Vincent Hanquez
2008-05-09 10:23   ` [Caml-list] Re: Why OCaml **cks Jon Harrop
2008-05-09 22:01     ` Vincent Hanquez
2008-05-09 22:23       ` David Teller
2008-05-10  8:36       ` Christophe TROESTLER
2008-05-10  9:18         ` Vincent Hanquez
2008-05-09 11:37   ` [Caml-list] Re: Why OCaml sucks Ralph Douglass
2008-05-09 13:02     ` [Caml-list] Re: Why OCaml rocks David Teller
2008-05-09 12:33 ` not all functional languages lack parallelism Ulf Wiger (TN/EAB)
2008-05-09 18:10   ` Jon Harrop
2008-05-09 20:26     ` Ulf Wiger (TN/EAB)
2008-05-12 12:54 ` [Caml-list] Re: Why OCaml sucks Kuba Ober
2008-05-12 14:16   ` Jon Harrop
2008-05-13 13:33     ` Kuba Ober
2008-05-13 13:49       ` Robert Fischer
2008-05-13 14:01         ` Brian Hurt [this message]
2008-05-13 14:13           ` Robert Fischer
2008-05-13 15:18             ` Berke Durak
2008-05-14  4:40             ` Kuba Ober
2008-05-13 14:25           ` Gerd Stolpmann
2008-05-14  4:29           ` Kuba Ober
2008-05-12 13:01 ` Kuba Ober
2008-05-12 19:18   ` Arthur Chan
2008-05-12 19:41     ` Karl Zilles
2008-05-13 13:17     ` Kuba Ober

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48299F46.5030906@janestcapital.com \
    --to=bhurt@janestcapital.com \
    --cc=caml-list@yquem.inria.fr \
    --cc=robert@fischerventure.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox