Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed
From: Robert Fischer <robert@fischerventure.com>
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Re: Why OCaml sucks
Date: Tue, 13 May 2008 09:13:27 -0500	[thread overview]
Message-ID: <4829A207.2030601@fischerventure.com> (raw)
In-Reply-To: <48299F46.5030906@janestcapital.com>

> The problem, as I understand it, is in writting parsers.  Your standard
> finite automata based regular expression library or lexical analyzer is
> based, at it's heart, on a table lookup- you have a 2D array, whose size
> is the number of input characters times the number of states.  For ASCII
> input, the number of possible input characters is small- 256 at most. 
> 256 input characters times hundreds of states isn't that big of a table-
> we're looking at sizes in 10's of K- easily handlable even in the bad
> old days of 64K segments.  Even going to UTF-16 ups the number of input
> characters from 256 to 65,536- and now a moderately large state machine
> (hundreds of states) weighs in at tens of megabytes of table space. 
> And, of course, if you try to handle the entire 31-bit full unicode
> point space, welcome to really large tables :-).
> 
> The solution, I think, is to change the implementation of your finite
> automata to use some data structure smarter than a flat 2D array, but
> that's me.
> 
Yes.  It is certainly possible to write slow code to solve this problem.

A slightly more involved analysis is probably in order, so let's ask Wikipedia for some more light.

http://en.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design

As Kuba pointed out, the high bit is 0 on any ASCII characters, and the significant bits of a
multi-byte sequence determine the length of the sequence.  There are also a few large classes of bit
sequences which are simply not allowed.  Now, I'm not an expert in writing parsers, but these
qualities certainly sound like nice optimization hooks.

Getting back to the original question, though -- is there any evidence that Java/C# are slow because
of unicode support, and not because of other aspects of the languages?  Because that assertion seems
flat-out bogus to me.

~~ Robert.


  reply	other threads:[~2008-05-13 14:13 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-09  0:39 Jon Harrop
2008-05-09  1:11 ` [Caml-list] " Matthew William Cox
2008-05-09  5:10   ` [Caml-list] Re: Why OCaml **cks Jon Harrop
2008-05-09  4:45 ` [Caml-list] Re: Why OCaml sucks Arthur Chan
2008-05-09  5:09   ` Jon Harrop
2008-05-09 11:12     ` [Caml-list] Re: Why OCaml rocks Gerd Stolpmann
2008-05-09 11:58       ` Gabriel Kerneis
2008-05-09 12:10         ` Concurrency [was Re: [Caml-list] Re: Why OCaml rocks] Robert Fischer
2008-05-09 12:41         ` [Caml-list] Re: Why OCaml rocks Gerd Stolpmann
2008-05-09 12:49         ` David Teller
2008-05-09 18:10       ` Jon Harrop
2008-05-09 20:40         ` Gerd Stolpmann
2008-05-09 20:55           ` Berke Durak
2008-05-10 10:56             ` Gerd Stolpmann
2008-05-09 21:00           ` Till Varoquaux
2008-05-09 21:13             ` Berke Durak
2008-05-09 22:26               ` Richard Jones
2008-05-09 23:01                 ` Berke Durak
2008-05-10  7:52                   ` Richard Jones
2008-05-10  8:24                     ` Berke Durak
2008-05-10  8:51                       ` Richard Jones
2008-05-13  3:47           ` Jon Harrop
2008-05-09 22:25         ` David Teller
2008-05-09 22:57           ` Vincent Hanquez
2008-05-10 19:59           ` Jon Harrop
2008-05-10 21:39             ` Charles Forsyth
2008-05-11  3:58               ` Jon Harrop
2008-05-11  9:41                 ` Charles Forsyth
2008-05-12 13:22             ` Richard Jones
2008-05-12 18:07               ` Jon Harrop
2008-05-12 20:05                 ` Arthur Chan
2008-05-13  0:42               ` Gerd Stolpmann
2008-05-13  1:19                 ` Jon Harrop
2008-05-13  2:03                   ` Gerd Stolpmann
2008-05-13  3:13                     ` Jon Harrop
2008-05-12 20:33             ` Arthur Chan
2008-05-12 21:22               ` Till Varoquaux
2008-05-09 13:00     ` [Caml-list] Re: Why OCaml sucks Ulf Wiger (TN/EAB)
2008-05-09 17:46       ` Jon Harrop
2008-05-09 18:17         ` Ulf Wiger (TN/EAB)
2008-05-10  1:29           ` Jon Harrop
2008-05-10 14:51             ` [Caml-list] Re: Why OCaml **cks Ulf Wiger (TN/EAB)
2008-05-10 18:19               ` Jon Harrop
2008-05-10 21:58                 ` Ulf Wiger (TN/EAB)
2008-05-10 18:39               ` Mike Lin
2008-05-12 13:31           ` [Caml-list] Re: Why OCaml sucks Kuba Ober
2008-05-12 18:18             ` Jon Harrop
2008-05-12 13:13   ` Kuba Ober
2008-05-12 19:32     ` Arthur Chan
2008-05-09  6:31 ` Tom Primožič
2008-05-09  6:46 ` Elliott Oti
2008-05-09  7:53   ` Till Varoquaux
2008-05-09  7:45 ` Richard Jones
2008-05-09  8:10   ` Jon Harrop
2008-05-09  9:31     ` Richard Jones
2008-05-09  7:58 ` [Caml-list] Re: Why OCaml rocks David Teller
2008-05-09 10:29   ` Jon Harrop
2008-05-09 13:08     ` David Teller
2008-05-09 15:38     ` Jeff Polakow
2008-05-09 18:09       ` Jon Harrop
2008-05-09 20:36         ` Berke Durak
2008-05-09 22:34         ` Richard Jones
2008-05-14 13:44           ` Kuba Ober
2008-05-09  8:29 ` constructive criticism about Ocaml Ulf Wiger (TN/EAB)
2008-05-09  9:45 ` [Caml-list] Re: Why OCaml sucks Vincent Hanquez
2008-05-09 10:23   ` [Caml-list] Re: Why OCaml **cks Jon Harrop
2008-05-09 22:01     ` Vincent Hanquez
2008-05-09 22:23       ` David Teller
2008-05-10  8:36       ` Christophe TROESTLER
2008-05-10  9:18         ` Vincent Hanquez
2008-05-09 11:37   ` [Caml-list] Re: Why OCaml sucks Ralph Douglass
2008-05-09 13:02     ` [Caml-list] Re: Why OCaml rocks David Teller
2008-05-09 12:33 ` not all functional languages lack parallelism Ulf Wiger (TN/EAB)
2008-05-09 18:10   ` Jon Harrop
2008-05-09 20:26     ` Ulf Wiger (TN/EAB)
2008-05-12 12:54 ` [Caml-list] Re: Why OCaml sucks Kuba Ober
2008-05-12 14:16   ` Jon Harrop
2008-05-13 13:33     ` Kuba Ober
2008-05-13 13:49       ` Robert Fischer
2008-05-13 14:01         ` Brian Hurt
2008-05-13 14:13           ` Robert Fischer [this message]
2008-05-13 15:18             ` Berke Durak
2008-05-14  4:40             ` Kuba Ober
2008-05-13 14:25           ` Gerd Stolpmann
2008-05-14  4:29           ` Kuba Ober
2008-05-12 13:01 ` Kuba Ober
2008-05-12 19:18   ` Arthur Chan
2008-05-12 19:41     ` Karl Zilles
2008-05-13 13:17     ` Kuba Ober

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4829A207.2030601@fischerventure.com \
    --to=robert@fischerventure.com \
    --cc=caml-list@yquem.inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox