5. Strings: pushing Unicode throughout a general-purpose language is a
mistake, IMHO. This is why languages like Java and C# are so slow.
Unicode by itself, when wider-than-byte encodings are used, adds "zero"
runtime overhead; the only overhead is storage (2 or 4 bytes per
character).
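(For concreteness, here is roughly what that storage claim cashes out to in Java; the
sample string and class name below are mine, purely for illustration:)

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    public class EncodingCost {
        public static void main(String[] args) {
            String s = "héllo, wörld";  // 12 characters, two of them non-ASCII
            System.out.println(s.getBytes(StandardCharsets.UTF_8).length);    // 14 bytes
            System.out.println(s.getBytes(StandardCharsets.UTF_16LE).length); // 24 bytes: 2 per char
            System.out.println(s.getBytes(Charset.forName("UTF-32")).length); // 48 bytes: 4 per char
        }
    }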
You cannot increase memory consumption without also degrading performance.
Moreover, there are hidden costs such as the added complexity in a lexer
which potentially has 256x larger dispatch tables or an extra indirection
for every byte read.
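(Again for concreteness, a minimal sketch of the dispatch-table point: a byte lexer can
classify input with one flat 256-entry array lookup, while a full code-point classifier
pays an extra lookup step per character. The class and names are invented for illustration.)

    public class LexerDispatch {
        enum Cls { LETTER, DIGIT, SPACE, OTHER }

        // Byte-oriented lexer: one flat 256-entry table, one index per byte read.
        static final Cls[] BYTE_TABLE = new Cls[256];
        static {
            java.util.Arrays.fill(BYTE_TABLE, Cls.OTHER);
            for (int c = 'a'; c <= 'z'; c++) BYTE_TABLE[c] = Cls.LETTER;
            for (int c = 'A'; c <= 'Z'; c++) BYTE_TABLE[c] = Cls.LETTER;
            for (int c = '0'; c <= '9'; c++) BYTE_TABLE[c] = Cls.DIGIT;
            BYTE_TABLE[' '] = BYTE_TABLE['\t'] = BYTE_TABLE['\n'] = Cls.SPACE;
        }

        static Cls classifyByte(byte b) {
            return BYTE_TABLE[b & 0xFF];  // single direct lookup
        }

        // Code-point-oriented lexer: a flat table would need 0x110000 entries, so in
        // practice each character costs an extra step (here, a library call that
        // itself walks multi-level tables internally).
        static Cls classifyCodePoint(int cp) {
            if (Character.isLetter(cp))     return Cls.LETTER;
            if (Character.isDigit(cp))      return Cls.DIGIT;
            if (Character.isWhitespace(cp)) return Cls.SPACE;
            return Cls.OTHER;
        }

        public static void main(String[] args) {
            System.out.println(classifyByte((byte) 'x'));   // LETTER
            System.out.println(classifyCodePoint(0x03B1));  // Greek alpha -> LETTER
        }
    }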
Okay, I was going to let this slide, but it kept resurfacing and annoying me.
Is there any empirical support for the assertion that Java and C# are slow because of *Unicode*? Of
all things, *Unicode*? The fact that they're bytecode languages isn't a bigger hit? At least with
the JVM, the hypercomplicated GC should probably take some of the blame, too -- I've seen 2x speed
increases by *reducing* the space available to the GC, and 10x speed increases by boosting the space
available to ridiculous levels so that the full GC barely ever has to fire. Then the nigh-universal,
optimization-ruining combination of mutable data and virtual function (i.e., method) dispatch I'm
sure doesn't help either. And this is to say nothing of user-space problems like the explosion of
nontrivial types
associated with the object-driven style. With all that going on, you're blaming their *Unicode
support* for why they're slow? "This is why languages like Java and C# are so slow." Really? Got
evidence for that?
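To be clear about what I mean by "space available to the GC": nothing fancier than heap sizing.
A tiny sanity-check program (the flag values in the comments are illustrative, not from any
particular benchmark of mine):

    public class HeapCheck {
        public static void main(String[] args) {
            // Reports the -Xmx ceiling the JVM was actually launched with.
            long max = Runtime.getRuntime().maxMemory();
            System.out.printf("max heap: %d MiB%n", max / (1024 * 1024));
            // e.g.  java -Xmx64m HeapCheck        -> tiny heap, collections fire constantly
            //       java -Xms8g -Xmx8g HeapCheck  -> huge heap, the full GC barely ever runs
        }
    }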
~~ Robert.