From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <robert@fischerventure.com>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled 
	version=3.1.3
Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83])
	by yquem.inria.fr (Postfix) with ESMTP id A9554BBCA
	for <caml-list@yquem.inria.fr>; Tue, 13 May 2008 15:49:30 +0200 (CEST)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AiwCAKM5KUjQccgFf2dsb2JhbACSEAEBCwUCBgcRmkU
X-IronPort-AV: E=Sophos;i="4.27,479,1204498800"; 
   d="scan'208";a="10666553"
Received: from lax-green-bigip-5.dreamhost.com (HELO looneymail-a3.g.dreamhost.com) ([208.113.200.5])
  by mail2-smtp-roc.national.inria.fr with ESMTP; 13 May 2008 15:49:30 +0200
Received: from Robert-Fischers-Smokejumping-MacBook-Pro.local (unknown [64.241.37.140])
	by looneymail-a3.g.dreamhost.com (Postfix) with ESMTP id 2D6DE27E1F
	for <caml-list@yquem.inria.fr>; Tue, 13 May 2008 06:49:28 -0700 (PDT)
Message-ID: <48299C67.1010905@fischerventure.com>
Date: Tue, 13 May 2008 08:49:27 -0500
From: Robert Fischer <robert@fischerventure.com>
User-Agent: Thunderbird 2.0.0.14 (Macintosh/20080421)
MIME-Version: 1.0
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Re: Why OCaml sucks
References: <200805090139.54870.jon@ffconsultancy.com>	<200805120854.45499.ober.14@osu.edu>	<200805121516.13983.jon@ffconsultancy.com> <200805130933.13952.ober.14@osu.edu>
In-Reply-To: <200805130933.13952.ober.14@osu.edu>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Spam: no; 0.00; ocaml:01 encodings:01 runtime:01 lexer:01 indirection:01 byte:01 assertion:01 mutable:01 imho:01 caml-list:01 strings:01 data:02 bytes:03 dispatch:03 dispatch:03 

>>>> 5. Strings: pushing unicode throughout a general purpose language is a
>>>> mistake, IMHO. This is why languages like Java and C# are so slow.
>>> Unicode by itself, when wider-than-byte encodings are used, adds "zero"
>>> runtime overhead; the only overhead is storage (2 or 4 bytes per
>>> character).
>> You cannot degrade memory consumption without also degrading performance.
>> Moreover, there are hidden costs such as the added complexity in a lexer
>> which potentially has 256x larger dispatch tables or an extra indirection
>> for every byte read.
> 

Okay, I was going to let this slide, but it kept resurfacing and annoying me.

Is there any empirical support for the assertion that Java and C# are slow because of *unicode*?  Of
all things, *unicode*?  The fact that they're bytecod languages isn't a bigger hit?  At least with
the JVM, the hypercomplicated GC should probably take some of the blame, too -- I've seen 2x speed
increases by *reducing* the space available to the GC, and 10x speed increases by boosting the space
available to ridiculous levels so that the full GC barely ever has to fire.  The the nigh-universal
optimization-ruining mutable data and virtual function (e.g. method) dispatch I'm sure doesn't help,
too.  And this is to say nothing of user-space problems like the explosion of nontrivial types
associated with the object-driven style.  With all that going on, you're blaming their *Unicode
support* for why they're slow?  "This is why languages like Java and C# are so slow."  Really?  Got
evidence for that?

~~ Robert.