From: Xavier Leroy <Xavier.Leroy@inria.fr>
To: Jens Wagner <jwagner@handshake.de>, caml-list@inria.fr
Subject: Re: Unicode-Support in Ocaml?
Date: Fri, 12 Mar 1999 16:26:10 +0100
Message-ID: <19990312162610.21453@pauillac.inria.fr>
In-Reply-To: <36E6C8E6.45008AEF@handshake.de>; from Jens Wagner on Wed, Mar 10, 1999 at 08:32:54PM +0100
> Are there any plans of supporting Unicode in objective caml?
> A support for Unicode characters is very important for writing
> Web-Applications and would be a great improvement for other applications
> too (IMHO)!
Several applications need Unicode support, indeed. For instance, my
CamlIDL COM interface will eventually need Unicode strings, because
many Windows APIs use them. Here are my thoughts about it.
Switching everything to use Unicode strings is too much work, so we'd
need a separate type "wstring" of wide-character strings in addition
to the standard "string", with conversion functions between the two.
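To make the idea of a separate wide-string type concrete, here is a minimal sketch. The module name Wstring and all of its functions are hypothetical illustrations, not an existing library. It assumes a wide string is simply an array of code points, and uses Latin-1 as the byte encoding for the conversions, since each Latin-1 byte maps directly to the code point of the same value.

```ocaml
(* Hypothetical sketch: "Wstring" and its functions are illustrative names,
   not part of any existing OCaml library. *)
module Wstring : sig
  type t
  val of_latin1 : string -> t          (* each byte becomes one code point *)
  val to_latin1 : t -> string option   (* None if a code point exceeds 255 *)
  val of_code_points : int list -> t   (* no range validation in this sketch *)
  val length : t -> int                (* length in code points, not bytes *)
  val get : t -> int -> int            (* code point at the given index *)
end = struct
  (* Assumed representation: one OCaml int per code point. *)
  type t = int array

  let of_latin1 s =
    Array.init (String.length s) (fun i -> Char.code s.[i])

  let to_latin1 w =
    if Array.for_all (fun c -> c >= 0 && c <= 255) w
    then Some (String.init (Array.length w) (fun i -> Char.chr w.(i)))
    else None

  let of_code_points = Array.of_list
  let length = Array.length
  let get w i = w.(i)
end
```

Keeping the type abstract means the representation could later be changed (for instance to 16-bit units packed in a string) without affecting code that only uses the conversion functions.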
Now, I can see two ways to go about it:
- Build on the ISO C "wide character" and "wide string" abstractions.
  Pros:
  * all recent C libraries support those, and provide powerful
    functions for conversion between wide strings and multi-byte strings
    (e.g. Unicode <-> UTF8), for comparing wide strings, etc;
  * the use of Unicode and UTF8 is not hardwired, most systems provide
    32-bit characters and support several multi-byte encodings.
    I've been told this is better for e.g. Japanese, which has
    several popular encodings that are not Unicode-based.
  Cons:
  * the use of Unicode and UTF8 is not hardwired, so different systems
    represent wide strings differently (e.g. as Unicode under Windows
    and 32-bit chars under Unix), precluding binary exchange of
    wide strings via output_value/input_value;
  * the various multi-byte encodings provided differ from Unix vendor
    to Unix vendor (different encodings, different names for the same
    encoding, ...);
  * while in theory Unicode and UTF8 are just one encoding among others,
    I don't know of any Unix system that actually provides "locale"
    files describing Unicode/UTF8: Linux with glibc 2.0 doesn't,
    Digital Unix 4 doesn't seem to either, and ditto for Solaris 2.5.
- Decide that Caml will use Unicode on every platform (like Java does).
  Pros:
  * binary compatibility between platforms;
  * proper Unicode support is guaranteed.
  Cons:
  * lots of code to write by hand (Unicode/UTF8 conversion,
    comparisons, upper/lowercasing, etc);
  * the Japanese might not be happy about it.
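To give a sense of the hand-written code the second option implies, here is a rough sketch of Unicode/UTF8 conversion. The function names (add_utf8, utf8_of_code_points, code_points_of_utf8) are hypothetical, and code points are assumed to be plain OCaml ints. The decoder is deliberately simplified: it checks lead and continuation bytes but does not reject overlong forms or encoded surrogates, so it is not a conforming validator.

```ocaml
(* Encode one code point as UTF-8 and append it to a buffer.
   Surrogates (U+D800..U+DFFF) and values above U+10FFFF are rejected. *)
let add_utf8 buf c =
  let byte n = Buffer.add_char buf (Char.chr n) in
  if c < 0 || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF) then
    invalid_arg "add_utf8: not a Unicode scalar value"
  else if c < 0x80 then byte c
  else if c < 0x800 then begin
    byte (0xC0 lor (c lsr 6));
    byte (0x80 lor (c land 0x3F))
  end else if c < 0x10000 then begin
    byte (0xE0 lor (c lsr 12));
    byte (0x80 lor ((c lsr 6) land 0x3F));
    byte (0x80 lor (c land 0x3F))
  end else begin
    byte (0xF0 lor (c lsr 18));
    byte (0x80 lor ((c lsr 12) land 0x3F));
    byte (0x80 lor ((c lsr 6) land 0x3F));
    byte (0x80 lor (c land 0x3F))
  end

(* Encode a list of code points as a UTF-8 byte string. *)
let utf8_of_code_points cps =
  let buf = Buffer.create 16 in
  List.iter (add_utf8 buf) cps;
  Buffer.contents buf

(* Decode a UTF-8 byte string into a list of code points.
   Simplified: overlong forms and encoded surrogates are not rejected. *)
let code_points_of_utf8 s =
  let n = String.length s in
  (* Return the 6 payload bits of the continuation byte at index i. *)
  let cont i =
    if i >= n then invalid_arg "code_points_of_utf8: truncated sequence";
    let b = Char.code s.[i] in
    if b land 0xC0 <> 0x80 then
      invalid_arg "code_points_of_utf8: bad continuation byte";
    b land 0x3F
  in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      let b = Char.code s.[i] in
      if b < 0x80 then go (i + 1) (b :: acc)
      else if b land 0xE0 = 0xC0 then
        let c1 = cont (i + 1) in
        go (i + 2) ((((b land 0x1F) lsl 6) lor c1) :: acc)
      else if b land 0xF0 = 0xE0 then
        let c1 = cont (i + 1) in
        let c2 = cont (i + 2) in
        go (i + 3) ((((b land 0x0F) lsl 12) lor (c1 lsl 6) lor c2) :: acc)
      else if b land 0xF8 = 0xF0 then
        let c1 = cont (i + 1) in
        let c2 = cont (i + 2) in
        let c3 = cont (i + 3) in
        go (i + 4)
          ((((b land 0x07) lsl 18) lor (c1 lsl 12) lor (c2 lsl 6) lor c3)
           :: acc)
      else invalid_arg "code_points_of_utf8: bad lead byte"
  in
  go 0 []
```

Because the UTF-8 output is an ordinary byte string with a fixed layout, it is the same on every platform, which is the binary-compatibility argument listed under the pros. Comparisons and case conversion would need additional tables and are not sketched here.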
> It could also be interesting having a standard signature for the DOM (
> http://www.w3.org/ ) in ocaml.
> Is there any organization defining signatures of existing APIs? Of
> course one could use an IDL->stub converter, but there is much more
> possible using signatures than IDL-Files.
In the great tradition of free software, whoever writes the code (to
interface to an existing API) gets to choose the signature. However,
you can ask for feedback or suggestions on this list.
- Xavier Leroy