From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id TAA21970 for caml-redistribution; Thu, 21 Oct 1999 19:10:47 +0200 (MET DST)
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id OAA13398 for <caml-list@pauillac.inria.fr>; Thu, 21 Oct 1999 14:05:03 +0200 (MET DST)
Received: from mail.nap.com.ar (mail-in.nap.com.ar [200.49.40.90])
	by nez-perce.inria.fr (8.8.7/8.8.7) with SMTP id OAA27992
	for <caml-list@inria.fr>; Thu, 21 Oct 1999 14:04:51 +0200 (MET DST)
Received: from [200.41.180.74] (HELO k-bell.com) by mail.nap.com.ar (Stalker SMTP Server 1.8b3) with ESMTP id S.0003814090; Thu, 21 Oct 1999 09:04:43 -0300
Message-ID: <380F0157.CDBBAD7D@k-bell.com>
Date: Thu, 21 Oct 1999 09:05:00 -0300
From: =?iso-8859-1?Q?Mat=EDas?= Giovannini <matias@k-bell.com>
Reply-To: matias@k-bell.com
Organization: Script S.A.
X-Mailer: Mozilla 4.7 (Macintosh; I; PPC)
X-Accept-Language: en,es-AR,es
MIME-Version: 1.0
To: caml-list@inria.fr
CC: Gerd.Stolpmann@darmstadt.netsurf.de, skaller <skaller@maxtal.com.au>
Subject: Re: localization, internationalization and Caml
References: <380CB30E.56D1A8A2@maxtal.com.au> <99102100543400.15513@ice>
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Sender: weis

Gerd Stolpmann wrote:
> 
> On Tue, 19 Oct 1999, John Skaller wrote:
> >Gerd Stolpmann wrote:
> >> The enlarged character sets become more and more important, and it is only a
> >> matter of time until every piece of software which wants to be taken seriously
> >> can process them, even a dumb terminal or simple text editor. So you will be
> >> able to put accented characters into your comments, and you will see them as
> >> such even if you 'cat' the program text to the terminal or printer; this will
> >> work everywhere...
> >
> >       Yes. This time is not here yet, but it will come soon that
> >international support is mandatory for all large software purchases
> >by governments and large corporations.
> 
> I do not believe that this will be the driving force because the current
> solutions exist, and it is VERY expensive to replace them. It is even cheaper
> to replace a language than a character set/encoding. Looks like another Year
> 2000 but without deadline.

I still don't understand the point of this discussion. As a MacOS
programmer of many years, I tend to view localization and
internationalization as tasks best performed by the operating system, or
at least by pluggable modules. This discussion of patching l10n and i18n
functions *into* OCaml is, to me at least, losing direction.

OCaml uses Latin1 for its *internal* encoding of identifiers. While I'll
agree that my view is chauvinistic (and selfish, perhaps: I already have
"¿¡áéíóúüñÁÉÍÓÚÜÑ" for writing in Spanish, why should I ask for more?),
I see no restriction in that (well, if I were Chinese, or Egyptian, I
would see things differently). What's more, the whole syntactic
apparatus of a programming language *assumes* a Latin setting, where
things make sense when read from left to right, from top to bottom; and
where punctuation is what we're used to. Programming languages suited
for a Han, or Arabic, or even a Hebrew audience would have to be
rethought from the ground up.

On the other hand, OCaml provides a String type that *can be* seen as a
variable-length sequence of uninterpreted bytes. We have uninterpreted
bytes! It's all we need to build whatever I18NString type we may need.
What is missing is *library* facilities to abstract that view into a
full-fledged i18n machinery. Of course, there's a problem with the
manipulation of 32-bit integer values, but if used with care, the Nat
datatype could serve perfectly well as the underlying, low-level datatype.
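
To make the "uninterpreted bytes" point concrete, here is a minimal
sketch of the kind of library code I have in mind. It assumes UTF-8 as
the byte encoding (nothing above fixes one), and the names decode_utf8
and Malformed are made up for illustration; they are not part of any
existing library. Plain ints stand in for code points, which is enough
for values up to U+10FFFF even where ints are narrower than 32 bits.

```ocaml
(* Hypothetical sketch: read an OCaml string as raw bytes and decode it
   as UTF-8 into a list of integer code points. It does not reject
   overlong forms or surrogates; a real library would have to. *)
exception Malformed of int  (* byte offset where decoding failed *)

let decode_utf8 (s : string) : int list =
  let n = String.length s in
  (* Payload of the continuation byte at offset i (must be 10xxxxxx). *)
  let cont i =
    if i >= n then raise (Malformed i);
    let b = Char.code s.[i] in
    if b land 0xC0 <> 0x80 then raise (Malformed i);
    b land 0x3F
  in
  let rec go i acc =
    if i >= n then List.rev acc
    else begin
      let b = Char.code s.[i] in
      if b < 0x80 then
        (* One byte: plain ASCII. *)
        go (i + 1) (b :: acc)
      else if b land 0xE0 = 0xC0 then begin
        (* Two bytes: 110xxxxx 10xxxxxx. *)
        let c1 = cont (i + 1) in
        go (i + 2) ((((b land 0x1F) lsl 6) lor c1) :: acc)
      end else if b land 0xF0 = 0xE0 then begin
        (* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx. *)
        let c1 = cont (i + 1) in
        let c2 = cont (i + 2) in
        go (i + 3)
          ((((b land 0x0F) lsl 12) lor (c1 lsl 6) lor c2) :: acc)
      end else if b land 0xF8 = 0xF0 then begin
        (* Four bytes: 11110xxx followed by three continuation bytes. *)
        let c1 = cont (i + 1) in
        let c2 = cont (i + 2) in
        let c3 = cont (i + 3) in
        go (i + 4)
          ((((b land 0x07) lsl 18) lor (c1 lsl 12) lor (c2 lsl 6) lor c3)
           :: acc)
      end else
        raise (Malformed i)
    end
  in
  go 0 []
```

The string itself never changes representation; only this one function
knows which encoding the bytes are in, and that is exactly the sort of
knowledge that belongs in a library rather than in the compiler.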

Which makes me think, John, you already have variable-length int arrays.
Nats are as unsafe as they get :-)

Regards,
Matías.

-- 
I got your message. I couldn't read it. It was a cryptogram.
-- Laurie Anderson