From: Martin Jambon <martin_jambon@emailuser.net>
To: Caml List <caml-list@inria.fr>
Cc: Martin Jambon <martin_jambon@emailuser.net>
Subject: Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
Date: Thu, 25 May 2006 12:54:17 -0700 (PDT)
Message-ID: <Pine.LNX.4.63.0605251207150.7706@munge>
In-Reply-To: <4475E9E0.2030009@cs.caltech.edu>
On Thu, 25 May 2006, Aleksey Nogin wrote:
> On 24.05.2006 22:56, Martin Jambon wrote:
>
>>> I think it's OK to have (mutable) byte arrays, but strings should simply
>>> always be immutable.
>> OCaml strings are compact byte arrays which serve their purpose well.
>
> Yes, however immutable strings are also very useful and that functionality is
> simply missing in OCaml. The usage I am very interested in is essentially
> using strings as "printable tokens". In other words, a data type that is easy
> to compare and has an obvious I/O representation.
>
>> Having a whole different type for immutable strings is in my opinion a
>> waste of energy. The problem is that freezing or unfreezing a string safely
>> involves a copy of the whole string. And obviously it's not possible to
>> handle only immutable strings since somehow you have to create them, and
>> unlike record fields, they won't be set in one operation but in n
>> operations, n being the length of the string.
>
> This is not true. All I want is having a purely functional interface with:
> - Constants (a compiler flag for turning "..." constants into immutable
> strings instead of mutable ones).
> - Inputting from a channel
> - Concatenation
> - Things like string_of_int for immutable string.
Isn't it a bit limited? What if I want other functions?
But if it satisfies you, you can do the syntax part with an unsafe_freeze
function and a bit of camlp4. The rest is just plain old OCaml.
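A minimal sketch of what such an interface could look like, hiding the representation behind a module signature. The module name Istring and all of its functions are hypothetical, not part of the standard library or of any proposal in this thread. On compilers from 4.06 on, string is already immutable by default and Bytes.t is the mutable type, so the sketch uses Bytes.t as the mutable buffer. The camlp4 part (turning "..." constants into Istring.t) is not shown.

```ocaml
(* Hypothetical module: every name below is an assumption made for
   illustration only. *)
module Istring : sig
  type t                            (* abstract: clients cannot mutate it *)
  val freeze : Bytes.t -> t         (* safe: copies the whole buffer *)
  val unsafe_freeze : Bytes.t -> t  (* no copy: the caller promises never
                                       to touch the buffer again *)
  val of_int : int -> t
  val concat : t -> t -> t
  val input_line : in_channel -> t
  val to_string : t -> string
  val compare : t -> t -> int
end = struct
  type t = Bytes.t
  let freeze b = Bytes.copy b
  let unsafe_freeze b = b
  let of_int n = Bytes.of_string (string_of_int n)
  let concat a b = Bytes.cat a b
  (* Not recursive: the inner [input_line] is the standard library one. *)
  let input_line ic = Bytes.of_string (input_line ic)
  let to_string t = Bytes.to_string t
  let compare = Bytes.compare
end

(* freeze copies, so later writes to the buffer are not visible. *)
let buf = Bytes.of_string "token"
let tok = Istring.freeze buf
let () = Bytes.set buf 0 'T'
let () = assert (Istring.to_string tok = "token")
```

The only place where the type system is bypassed is unsafe_freeze, which is the question raised further down: avoiding the copy means trusting every caller of that one function.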
> Of course, it might be the case that the standard library might have to use
> some sort of "unsafe" operations that would "inappropriately" mutate the
> newly created immutable string buffer, but this is IMHO no different than how
> the unsafe operations are already used in standard library for arrays and
> strings.
I disagree: have you ever mutated a string by accident?
I have never run into this situation, which is mostly why I don't see the
point of your suggestions. This contrasts strongly with mistakes in
array/string indices, which happen all the time.
>> So I'd really love to see actual examples where using immutable strings
>> would be such an improvement over mutable strings.
>> If the problem is just to ensure that string data won't be changed by the
>> user of a library, then it is trivial using module signatures and
>> String.copy for the conversions.
>
> Such a copy operation can be extremely prohibitive in a setting that assumes
> that a data structure is immutable and tries really hard to preserve sharing
> (including using functions like a sharing-preserving version of map (*),
> etc). In such a setting, these extra copies can potentially have a
> devastating effect on memory usage, cache performance, etc. And this
> situation is exactly what we have in our MetaPRL project - there we have
> resorted to simply using strings and pretending they are immutable, but this
> is clearly suboptimal.
Yes, so how do you avoid copies without using the "unsafe" conversions all
over the place?
> ----
> (*)
> let rec smap f = function
>     [] -> []
>   | (hd :: tl) as l ->
>       let hd' = f hd in
>       let tl' = smap f tl in
>       if hd == hd' && tl == tl' then l else hd' :: tl'
In order to maximize sharing, I'd rather use a global weak hash table.
In your context, it seems that you could afford String.copy, as long as it
doesn't break sharing:
  let freeze s =
    let s' = make_constant s (* using a copy! *) in
    if s' is in the table then return the element from the table
    else add s' and return s'
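The pseudocode above can be sketched with the standard library's Weak.Make functor, which builds a weak hash set. This is only one possible reading of the idea: the names Pool and pool are made up for the example, Bytes.to_string stands in for make_constant (it performs the copy), and Bytes.t plays the role of the mutable string on current compilers.

```ocaml
(* Weak hash set of strings: entries are dropped by the GC once nothing
   else references them. *)
module Pool = Weak.Make (struct
  type t = string
  let equal (a : string) (b : string) = a = b
  let hash (s : string) = Hashtbl.hash s
end)

(* Hypothetical global table; the initial size is an arbitrary choice. *)
let pool = Pool.create 1024

(* Copy the mutable buffer once, then return the pooled instance if an
   equal string is already in the table, or add the copy and return it. *)
let freeze (b : Bytes.t) : string =
  Pool.merge pool (Bytes.to_string b)

(* Two equal inputs yield the same physical string. *)
let a = freeze (Bytes.of_string "hello")
let b = freeze (Bytes.of_string "hello")
let () = assert (a == b)
```

Each call still pays for one temporary copy, but equal strings end up physically equal, so tests such as hd == hd' in smap keep succeeding.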
Martin
--
Martin Jambon, PhD
http://martin.jambon.free.fr
Edit http://wikiomics.org, bioinformatics wiki