From: Xavier Leroy <Xavier.Leroy@inria.fr>
To: caml-list@inria.fr
Cc: shawnw@speakeasy.org
Subject: Re: [Caml-list] Ocaml interface to ctype.h functions
Date: Tue, 5 Jun 2001 18:29:09 +0200
Message-ID: <20010605182909.A16268@pauillac.inria.fr>
In-Reply-To: <20010601232433.A22189@speakeasy.org>; from shawnw@speakeasy.org on Fri, Jun 01, 2001 at 11:24:33PM -0700
> I've been working on some projects recently where it would be nice to have
> access to the ctype.h character classification functions (isalpha(),
> isspace(), etc.) in Ocaml, and couldn't find anything like them in a search
> through the standard library. It's easy to whip up a library for this, but
> before doing so, I thought I'd ask if there's any plans to put them in the
> Character module or some other place it makes sense to have them.
It would make sense to have classification functions in the Char
module. The main issue is: what is a letter? Or, put differently:
how do we deal with character sets?
If only one, fixed character set is supported (e.g. US-ASCII or
Latin-1), it's truly easy, but will not satisfy everyone. OCaml has
already been criticized for supporting ISO Latin-1 accented letters in
identifiers! (Look at the caml-list archives if you don't believe me.)
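To make the "what is a letter?" question concrete, here is a minimal sketch contrasting a US-ASCII letter predicate with a Latin-1 one. The names is_alpha_ascii and is_alpha_latin1 are hypothetical, and the Latin-1 ranges are an assumption (the accented letters at codes 192-214, 216-246 and 248-255, leaving out the multiplication and division signs at 215 and 247).

```ocaml
(* Letters under US-ASCII only. *)
let is_alpha_ascii = function
  | 'A'..'Z' | 'a'..'z' -> true
  | _ -> false

(* Letters under ISO Latin-1. Assumed ranges: ASCII letters plus the
   accented letters 192-214, 216-246 and 248-255 (decimal codes).
   Codes 215 and 247 are the multiplication and division signs. *)
let is_alpha_latin1 = function
  | 'A'..'Z' | 'a'..'z'
  | '\192'..'\214' | '\216'..'\246' | '\248'..'\255' -> true
  | _ -> false

let () =
  (* Code 233 is "e with acute accent" in Latin-1. *)
  let e_acute = '\233' in
  Printf.printf "ascii: %b, latin1: %b\n"
    (is_alpha_ascii e_acute) (is_alpha_latin1 e_acute)
```

The same byte is a letter under one definition and not under the other, which is why fixing a single character set in the standard library would not suit everyone.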
Building on the C functions isalpha(), etc, is a bit of a cop-out,
because then we're dependent on what these functions actually do on a
variety of Unix, Windows and Macintosh systems. In particular, we
become dependent on the ISO C internationalization framework ("locales"),
which I think is a mess because it relies too much on a global state
(the current locale).
To give an example of the kind of problems I fear, just doing
setlocale(LC_ALL, "fr_FR") in an OCaml program causes
float_of_string "3.14" to return 0.0. Guess why? float_of_string
relies on the C function atof(), which is internationalized, and
doesn't recognize "." as a decimal point -- French uses a "," instead...
Finally, there's the Unicode approach. Letters, etc, are well defined
without reference to a "locale" or whatever piece of state. But then
we've just shifted the problem to a more general one: retrofitting
Unicode into OCaml, which again has been the subject of lively
discussions on this mailing list :-)
> If it's
> just a matter of waiting for someone to do it, I'm willing to volunteer, as
> I'd probably be doing it anyways on my own.
It's mostly a matter of knowing what we want these classification
functions to do. Meanwhile, it might be easier to define your own
isalpha, etc, predicates; at least you get to choose the encoding!
Besides, it's really easy using pattern-matching, e.g. for ASCII:

  let isalpha = function 'A'..'Z'|'a'..'z' -> true | _ -> false

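The same pattern-matching approach extends to the other ctype.h-style predicates. The following is a sketch for ASCII only; the whitespace set is an assumption modelled on what C's isspace() accepts in the default "C" locale, and the count helper is hypothetical, included only to show usage. isalpha is restated so the block stands on its own.

```ocaml
(* ASCII-only classification predicates, written with character ranges. *)
let isdigit = function '0'..'9' -> true | _ -> false
let isupper = function 'A'..'Z' -> true | _ -> false
let islower = function 'a'..'z' -> true | _ -> false
let isalpha c = isupper c || islower c
let isalnum c = isalpha c || isdigit c

(* Assumed whitespace set: space, tab, newline, vertical tab (code 11),
   form feed (code 12) and carriage return. *)
let isspace = function
  | ' ' | '\t' | '\n' | '\011' | '\012' | '\r' -> true
  | _ -> false

(* Hypothetical helper: count the characters of s that satisfy p. *)
let count p s =
  let n = ref 0 in
  String.iter (fun c -> if p c then incr n) s;
  !n

let () =
  let s = "OCaml 3.01" in
  Printf.printf "letters: %d, digits: %d, spaces: %d\n"
    (count isalpha s) (count isdigit s) (count isspace s)
```

Because each predicate is an ordinary function on char, switching to another encoding only means swapping the ranges; no global locale state is involved.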
- Xavier Leroy