From: Berke Durak <berke.durak@exalead.com>
To: Robert Fischer <robert@fischerventure.com>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Re: Why OCaml sucks
Date: Tue, 13 May 2008 17:18:00 +0200 [thread overview]
Message-ID: <4829B128.2020708@exalead.com> (raw)
In-Reply-To: <4829A207.2030601@fischerventure.com>
Robert Fischer wrote:
> Getting back to the original question, though -- is there any evidence that Java/C# are slow because
> of unicode support, and not because of other aspects of the languages? Because that assertion seems
> flat-out bogus to me.
I do not think the JVM is especially slow in practice. However, one potential source of
slowness could be, in some particular cases, conversions to and from the internal short array-based
string representation to UTF8 when using native code. Similarly, Java strings being immutable,
in-place modification of strings is not possible from native code, so a lot of bindings to C libraries
end up duplicating strings a lot (see e.g. PCRE).
This is why the NIO API exposing mutable and/or externally allocated buffers was introduced in the JVM,
but it remains hard to use.
However it is true that regexes on UTF8 can be quite slow. Compare (on Linux):
/udir/durak> dd if=/dev/urandom bs=10M count=1 of=/dev/shm/z
/udir/durak> time LANG=en_US.UTF-8 grep -c "^[a-z]*$" /dev/shm/z
2.31s user 0.01s system 99% cpu 2.320 total
/udir/durak> time LANG=C grep -c "^[a-z]*$" /dev/shm/z
0.04s user 0.01s system 98% cpu 0.048 total
Lesson 1: when lexing, do not read unicode chars one at a time. Pre-process your regular expression
according to your input encoding, instead.
That being said, I think strings should be represented as they are today, and that the core
Ocaml libraries do not have much business dealing with UTF8. We seldom need letter-indexed
random access to strings.
However, the time is ripe for throwing out old 8-bit charsets such as ISO-8859-x (a.k.a Latin-y)
and whatnot. This simplifies considerably lesson 1: it's either ASCII or UTF8 Unicode. I think
the Ocaml lexer should simply treat any byte with its high bit set as a lowercase letter.
--
Berke DURAK
next prev parent reply other threads:[~2008-05-13 15:18 UTC|newest]
Thread overview: 89+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-05-09 0:39 Jon Harrop
2008-05-09 1:11 ` [Caml-list] " Matthew William Cox
2008-05-09 5:10 ` [Caml-list] Re: Why OCaml **cks Jon Harrop
2008-05-09 4:45 ` [Caml-list] Re: Why OCaml sucks Arthur Chan
2008-05-09 5:09 ` Jon Harrop
2008-05-09 11:12 ` [Caml-list] Re: Why OCaml rocks Gerd Stolpmann
2008-05-09 11:58 ` Gabriel Kerneis
2008-05-09 12:10 ` Concurrency [was Re: [Caml-list] Re: Why OCaml rocks] Robert Fischer
2008-05-09 12:41 ` [Caml-list] Re: Why OCaml rocks Gerd Stolpmann
2008-05-09 12:49 ` David Teller
2008-05-09 18:10 ` Jon Harrop
2008-05-09 20:40 ` Gerd Stolpmann
2008-05-09 20:55 ` Berke Durak
2008-05-10 10:56 ` Gerd Stolpmann
2008-05-09 21:00 ` Till Varoquaux
2008-05-09 21:13 ` Berke Durak
2008-05-09 22:26 ` Richard Jones
2008-05-09 23:01 ` Berke Durak
2008-05-10 7:52 ` Richard Jones
2008-05-10 8:24 ` Berke Durak
2008-05-10 8:51 ` Richard Jones
2008-05-13 3:47 ` Jon Harrop
2008-05-09 22:25 ` David Teller
2008-05-09 22:57 ` Vincent Hanquez
2008-05-10 19:59 ` Jon Harrop
2008-05-10 21:39 ` Charles Forsyth
2008-05-11 3:58 ` Jon Harrop
2008-05-11 9:41 ` Charles Forsyth
2008-05-12 13:22 ` Richard Jones
2008-05-12 18:07 ` Jon Harrop
2008-05-12 20:05 ` Arthur Chan
2008-05-13 0:42 ` Gerd Stolpmann
2008-05-13 1:19 ` Jon Harrop
2008-05-13 2:03 ` Gerd Stolpmann
2008-05-13 3:13 ` Jon Harrop
2008-05-12 20:33 ` Arthur Chan
2008-05-12 21:22 ` Till Varoquaux
2008-05-09 13:00 ` [Caml-list] Re: Why OCaml sucks Ulf Wiger (TN/EAB)
2008-05-09 17:46 ` Jon Harrop
2008-05-09 18:17 ` Ulf Wiger (TN/EAB)
2008-05-10 1:29 ` Jon Harrop
2008-05-10 14:51 ` [Caml-list] Re: Why OCaml **cks Ulf Wiger (TN/EAB)
2008-05-10 18:19 ` Jon Harrop
2008-05-10 21:58 ` Ulf Wiger (TN/EAB)
2008-05-10 18:39 ` Mike Lin
2008-05-12 13:31 ` [Caml-list] Re: Why OCaml sucks Kuba Ober
2008-05-12 18:18 ` Jon Harrop
2008-05-12 13:13 ` Kuba Ober
2008-05-12 19:32 ` Arthur Chan
2008-05-09 6:31 ` Tom Primožič
2008-05-09 6:46 ` Elliott Oti
2008-05-09 7:53 ` Till Varoquaux
2008-05-09 7:45 ` Richard Jones
2008-05-09 8:10 ` Jon Harrop
2008-05-09 9:31 ` Richard Jones
2008-05-09 7:58 ` [Caml-list] Re: Why OCaml rocks David Teller
2008-05-09 10:29 ` Jon Harrop
2008-05-09 13:08 ` David Teller
2008-05-09 15:38 ` Jeff Polakow
2008-05-09 18:09 ` Jon Harrop
2008-05-09 20:36 ` Berke Durak
2008-05-09 22:34 ` Richard Jones
2008-05-14 13:44 ` Kuba Ober
2008-05-09 8:29 ` constructive criticism about Ocaml Ulf Wiger (TN/EAB)
2008-05-09 9:45 ` [Caml-list] Re: Why OCaml sucks Vincent Hanquez
2008-05-09 10:23 ` [Caml-list] Re: Why OCaml **cks Jon Harrop
2008-05-09 22:01 ` Vincent Hanquez
2008-05-09 22:23 ` David Teller
2008-05-10 8:36 ` Christophe TROESTLER
2008-05-10 9:18 ` Vincent Hanquez
2008-05-09 11:37 ` [Caml-list] Re: Why OCaml sucks Ralph Douglass
2008-05-09 13:02 ` [Caml-list] Re: Why OCaml rocks David Teller
2008-05-09 12:33 ` not all functional languages lack parallelism Ulf Wiger (TN/EAB)
2008-05-09 18:10 ` Jon Harrop
2008-05-09 20:26 ` Ulf Wiger (TN/EAB)
2008-05-12 12:54 ` [Caml-list] Re: Why OCaml sucks Kuba Ober
2008-05-12 14:16 ` Jon Harrop
2008-05-13 13:33 ` Kuba Ober
2008-05-13 13:49 ` Robert Fischer
2008-05-13 14:01 ` Brian Hurt
2008-05-13 14:13 ` Robert Fischer
2008-05-13 15:18 ` Berke Durak [this message]
2008-05-14 4:40 ` Kuba Ober
2008-05-13 14:25 ` Gerd Stolpmann
2008-05-14 4:29 ` Kuba Ober
2008-05-12 13:01 ` Kuba Ober
2008-05-12 19:18 ` Arthur Chan
2008-05-12 19:41 ` Karl Zilles
2008-05-13 13:17 ` Kuba Ober
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4829B128.2020708@exalead.com \
--to=berke.durak@exalead.com \
--cc=caml-list@yquem.inria.fr \
--cc=robert@fischerventure.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox