* RE: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
@ 2006-05-28 23:20 Harrison, John R
2006-05-29 2:36 ` Martin Jambon
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Harrison, John R @ 2006-05-28 23:20 UTC (permalink / raw)
To: Martin Jambon, Caml List; +Cc: Harrison, John R
Hi Martin,
| I disagree: has it ever happened to you to mutate a string by
accident?
The point is not that I will mutate a string by accident. I've never
done
it by accident or by design. The point is that I can't depend on code
that I call, or code that calls mine, not to subsequently modify strings
that are passed as arguments. So if I really need to reliably fix them I
am forced into expensive copy operations.
In practice, the obvious library calls are safe, so like Aleksey, I use
the built-in strings for the sake of convenience and compatibility. But
it's unsatisfactory intellectually. Some of us want to program in a
primarily functional style, yet the implementation of one of the most
basic and useful datatypes is not functional.
| Yes, so how do you avoid copies without using the "unsafe" conversions
all
| over the place?
With immutable strings, you'd never need to do conversions at the module
interfaces. As with any other functional data structure, you only copy
when you want to change part of it.
John.
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-28 23:20 [Caml-list] Re: immutable strings (Re: Array 4 MB size limit) Harrison, John R
@ 2006-05-29 2:36 ` Martin Jambon
2006-05-31 12:53 ` Jean-Christophe Filliatre
2006-06-05 20:54 ` immutable strings Matti Jokinen
2 siblings, 0 replies; 20+ messages in thread
From: Martin Jambon @ 2006-05-29 2:36 UTC (permalink / raw)
To: Harrison, John R; +Cc: Caml List
Hi John,
On Sun, 28 May 2006, Harrison, John R wrote:
> With immutable strings, you'd never need to do conversions at the module
> interfaces. As with any other functional data structure, you only copy
> when you want to change part of it.
OK, but let's be pragmatic: what kind of interface and implementation do
you have in mind?
(and then: isn't it possible to implement in OCaml?)
If anyone is interested:
Before posting I tried a polymorphic (wrt mutability) string type.
It was fun enough, but it doesn't scale very well. I put it there:
http://martin.jambon.free.fr/ocaml.html#gstring
Martin
--
Martin Jambon, PhD
http://martin.jambon.free.fr
Edit http://wikiomics.org, bioinformatics wiki
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-28 23:20 [Caml-list] Re: immutable strings (Re: Array 4 MB size limit) Harrison, John R
2006-05-29 2:36 ` Martin Jambon
@ 2006-05-31 12:53 ` Jean-Christophe Filliatre
2006-06-05 20:54 ` immutable strings Matti Jokinen
2 siblings, 0 replies; 20+ messages in thread
From: Jean-Christophe Filliatre @ 2006-05-31 12:53 UTC (permalink / raw)
To: Harrison, John R; +Cc: Martin Jambon, Caml List
Harrison, John R writes:
> The point is not that I will mutate a string by accident.
I once discovered a bug in the Coq proof assistant that was precisely
due to a string (an identifier) mutated by accident (may be I
shouldn't say it :-) A name was capitalized in-place somewhere in a
piece of code unrelated with the Coq kernel but of course it had
consequences all over the system (including the kernel).
So I'm definitely in favor of immutable strings, for the exact reasons
mentioned by John.
But I think an abstract data type is not really an issue, since one
does little pattern-matching on strings in practice. And having your
own abstract data type for immutable strings has other advantages,
such as the ability to share equal strings (using hash-consing) to
speedup names comparisons. Even printing is not painful provided a
suitable formatter-based printing function and %a.
--
Jean-Christophe Filliâtre (http://www.lri.fr/~filliatr)
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: immutable strings
2006-05-28 23:20 [Caml-list] Re: immutable strings (Re: Array 4 MB size limit) Harrison, John R
2006-05-29 2:36 ` Martin Jambon
2006-05-31 12:53 ` Jean-Christophe Filliatre
@ 2006-06-05 20:54 ` Matti Jokinen
2006-06-07 0:36 ` [Caml-list] " Jacques Garrigue
2 siblings, 1 reply; 20+ messages in thread
From: Matti Jokinen @ 2006-06-05 20:54 UTC (permalink / raw)
To: Caml List
> In practice, the obvious library calls are safe, so like Aleksey, I use
> the built-in strings for the sake of convenience and compatibility. But
> it's unsatisfactory intellectually.
Actually, there are cases of unsafe sharing even in the standard library.
# let x = "X" in
let g = Genlex.make_lexer [x] in
let s = Stream.of_string "X" in
let t = g s in
let _ = Stream.peek t in
x.[0] <- 'Y';
Stream.peek t;;
result:
- : Genlex.token option = Some (Genlex.Kwd "Y")
Format:
# let x = "X" in
let f = Format.make_formatter (output stdout) (fun () -> flush stdout) in
Format.pp_print_string f x;
x.[0] <- 'Y';
Format.pp_print_newline f (); Format.pp_print_flush f ();;
output:
Y
I think this demonstrates that the problem is real: it is too easy to
forget copying.
- Matti Jokien
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Re: immutable strings
2006-06-05 20:54 ` immutable strings Matti Jokinen
@ 2006-06-07 0:36 ` Jacques Garrigue
0 siblings, 0 replies; 20+ messages in thread
From: Jacques Garrigue @ 2006-06-07 0:36 UTC (permalink / raw)
To: moj; +Cc: caml-list
From: moj@utu.fi (Matti Jokinen)
> > In practice, the obvious library calls are safe, so like Aleksey, I use
> > the built-in strings for the sake of convenience and compatibility. But
> > it's unsatisfactory intellectually.
>
> Actually, there are cases of unsafe sharing even in the standard library.
>
>
> # let x = "X" in
> let g = Genlex.make_lexer [x] in
> let s = Stream.of_string "X" in
> let t = g s in
> let _ = Stream.peek t in
> x.[0] <- 'Y';
> Stream.peek t;;
>
> result:
>
> - : Genlex.token option = Some (Genlex.Kwd "Y")
[...]
> I think this demonstrates that the problem is real: it is too easy to
> forget copying.
I don't think this is what the original poster meant by "unsafe".
Standard library functions do not mutate strings when this is not
explicitly stated.
If you apply this principle to user behaviour, it means that you
shouldn't mutate a string passed to or from a library function except
when it is explicitly ok.
In practice this usually works well, because the string type is
actually used as two independent types:
* mutable strings for some I/O and buffers
* immutable strings for all other uses
But this still puts a burden on users.
Jacques Garrigue
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
@ 2006-05-29 20:52 Harrison, John R
0 siblings, 0 replies; 20+ messages in thread
From: Harrison, John R @ 2006-05-29 20:52 UTC (permalink / raw)
To: Martin Jambon, Caml List; +Cc: Harrison, John R
Hi Martin,
| OK, but let's be pragmatic: what kind of interface and implementation
do
| you have in mind?
I did indeed have a very specific example in mind, my theorem prover HOL
Light. I have an OCaml type of typed lambda-terms:
type term =
Var of string * hol_type
| Const of string * hol_type
| Comb of term * term
| Abs of term * term
The type "term" is private, and the abstract type interface only permits
you to construct well-typed terms, via interface functions like "mk_var"
and "mk_comb". For example, the call "mk_comb(s,t)" gives "Comb(s,t)"
provided the types agree, and fails otherwise.
I would like the user to be able to write "mk_var(x,ty)" and the net
result to be just one cons "Var(x,ty)" with "x" and "ty" identical to
the
input arguments. But with mutable strings, it is possible in principle
for the string "x" inside that object to get modified by other code.
Of course, it's a bit artificial, but I would like it to be impossible,
given that the principle of LCF provers is that the user should be able
to use arbitrary programs while having soundness enforced by the ML
type system.
Of course, I can use my own private type of strings, but then I need
to convert every time I use standard library functions, pattern matching
is a bit less convenient, etc.
John.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Array 4 MB size limit
@ 2006-05-15 18:12 akalin
2006-05-19 5:57 ` [Caml-list] " Frederick Akalin
0 siblings, 1 reply; 20+ messages in thread
From: akalin @ 2006-05-15 18:12 UTC (permalink / raw)
To: caml-list
I'm running into cases where the 4 MB limit on arrays is starting to
become a problem. Lists are much slower and cause seg faults for me on
the same data set, and Bigarrays are a partial solution because I'd
like to be able to store arbitrary types in arrays (not just numbers).
I was greatly surprised when I found out there was such a low limit on
arrays. Is there a reason for this? Will this limit ever be increased?
Is the limit a limit on the number of elements or the total size? The
language in Sys.max_array_size implies the former, but the fact the
limit is halved for floats implies the latter. If I had a record type
with 5 floats, will the limit then by Sys.max_array_size / 10? Is there
some sort of existing ArrayList module that works around this problem?
Ideally, I'd like to have something like C++'s std::vector<> type,
which can be dynamically resized. Do I have to write my own? :(
Also, the fact that using lists crashes for the same data set is
surprising. Is there a similar hard limit for lists, or would this be
a bug? Should I post a test case?
Thanks!
Frederick Akalin
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
@ 2006-05-19 5:57 ` Frederick Akalin
2006-05-19 16:28 ` Jozef Kosoru
0 siblings, 1 reply; 20+ messages in thread
From: Frederick Akalin @ 2006-05-19 5:57 UTC (permalink / raw)
To: Xavier Leroy; +Cc: caml-list
Xavier Leroy wrote:
>> I was greatly surprised when I found out there was such a low limit on
>> arrays. Is there a reason for this? Will this limit ever be increased?
>>
>
> As Brian Hurt explained, this limit comes from the fact that heap
> object sizes are stored in N-10 bits, where N is the bit width of the
> processor (32 or 64).
>
> Historical digression: this representation decision was initially
> taken when designing Caml Light in 1989-1990. At that time, even
> professional workstations ... Little did I know that the 32-bitters would
> survive so long. Now, it's 2006, and 64-bit processors are becoming
> universally available, in desktop machines at least. (I've been
> running an AMD64 PC at home since january 2005 with absolutely zero
> problems.) So, no the data representations of OCaml are not going to
> change to lift the array size limit on 32-bit machines.
>
>
I see. This is a perfectly reasonable explanation. I ended up just
using Bigarrays of floats for now and converting from those to 3-d
vectors on demand (what I need), but it would be nice to have a type
that wraps around Array that can get around the 4M limit (using an array
of arrays like someone suggested earlier). It's sad, but I think 32-bit
is going to be around for a while, and surely I can't be the only person
to run up against this. :) Not that I mind writing such a library and
releasing it. I wonder if the Extlib guys would be interested...
> A better idea would be to determine exactly what data structure you need:
> which abstract operations, what performance requirements, etc. C++
> and Lisp programmers tend to encode everything as arrays or lists,
> respectively, but quite often these are not the best data structure
> for the application of interest.
>
I find your assertion surprising. C++ and Common LISP are by no means
lacking in standard data structures (and using bare arrays is
discouraged in C++, as far as I know) and in my experience I haven't
much seen C++ code that used arrays/vectors when not appropriate.
In any case, in my application (a raytracer) I am reading in lists of
numbers from a file representing the points of a mesh and the triangles
that make up the mesh (represented by a list of indices into the mesh
list). A dynamically resizable, reasonable scalable array seems like
the best data structure to use.
>> Also, the fact that using lists crashes for the same data set is
>> surprising. Is there a similar hard limit for lists, or would this be a
>> bug? Should I post a test case?
>>
>
> Depends on the platform you use. In principle, Caml should report
> stack overflows cleanly, by throwing a Stack_overflow exception.
> However, this is hard to do in native code as it depends a lot on the
> processor and OS used. So, some combinations (e.g. x86/Linux) will
> report stack overflows via an exception, and others will let the
> kernel generate a segfault.
>
> If you're getting the segfault under x86/Linux for instance, please
> post a test case on the bug tracking system. It's high time that
> Damien shaves :-)
>
>
Due to some confusion on my part (I didn't realize stderr was buffered
in OCaml and needed to be flushed), I erroneously thought my program was
crashing on building the list. It was, in fact, a List.map lurking
further along in my code. I'm running PPC/OS X, which I'm guessing is
one of those platforms that lets the kernel generate a segfault? I
tried it on Linux and it threw an exception as expected. Damien's
facial hair is safe. :)
> - Xavier Leroy
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
Frederick Akalin
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-19 5:57 ` [Caml-list] " Frederick Akalin
@ 2006-05-19 16:28 ` Jozef Kosoru
2006-05-19 21:26 ` Jon Harrop
0 siblings, 1 reply; 20+ messages in thread
From: Jozef Kosoru @ 2006-05-19 16:28 UTC (permalink / raw)
To: caml-list
On Thu, May 18, 2006 at 22:57:30 -0700, Frederick Akalin wrote:
> >Historical digression: this representation decision was initially
> >taken when designing Caml Light in 1989-1990. At that time, even
> >professional workstations ... Little did I know that the 32-bitters
> >would survive so long. Now, it's 2006, and 64-bit processors are
> >becoming universally available, in desktop machines at least. (I've
> >been running an AMD64 PC at home since january 2005 with absolutely
> >zero problems.) So, no the data representations of OCaml are not
> >going to change to lift the array size limit on 32-bit machines.
>
> I see. This is a perfectly reasonable explanation. I ended up just
> using Bigarrays of floats for now and converting from those to 3-d
> vectors on demand (what I need), but it would be nice to have a type
> that wraps around Array that can get around the 4M limit (using an
> array of arrays like someone suggested earlier). It's sad, but I
> think 32-bit is going to be around for a while, and surely I can't be
> the only person to run up against this. :) Not that I mind writing
> such a library and releasing it. I wonder if the Extlib guys would be
> interested...
Yes. 32-bit x86 platform is not going away anytime soon. Given that 512M
RAM is now standard and 1G RAM is very common for an average PC - having
a programming language with 4M limit for the array size is like to have
an 8+3 characters filename limitation on a filesystem using a mainstream
300G disk in 2006. However I understand the techical difficulty to solve
this issue - it's pretty annoying anyhow :)
Jozef
--
jozef kosoru
http://zyzstar.kosoru.com
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-19 16:28 ` Jozef Kosoru
@ 2006-05-19 21:26 ` Jon Harrop
2006-05-20 1:06 ` Brian Hurt
0 siblings, 1 reply; 20+ messages in thread
From: Jon Harrop @ 2006-05-19 21:26 UTC (permalink / raw)
To: caml-list
On Friday 19 May 2006 17:28, Jozef Kosoru wrote:
> Yes. 32-bit x86 platform is not going away anytime soon. Given that 512M
> RAM is now standard and 1G RAM is very common for an average PC - having
> a programming language with 4M limit for the array size is like to have
> an 8+3 characters filename limitation on a filesystem using a mainstream
> 300G disk in 2006.
I find it more concerning that array length is a function of the type, i.e.
different for float array.
> However I understand the techical difficulty to solve
> this issue - it's pretty annoying anyhow :)
Agreed. Should OCaml's successor have extensible arrays with 64-bit lengths
and strings as char arrays?
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-19 21:26 ` Jon Harrop
@ 2006-05-20 1:06 ` Brian Hurt
2006-05-20 21:11 ` immutable strings (Re: [Caml-list] Array 4 MB size limit) Oliver Bandel
0 siblings, 1 reply; 20+ messages in thread
From: Brian Hurt @ 2006-05-20 1:06 UTC (permalink / raw)
To: Jon Harrop; +Cc: caml-list
On Fri, 19 May 2006, Jon Harrop wrote:
> Agreed. Should OCaml's successor have extensible arrays with 64-bit lengths
> and strings as char arrays?
Why not just run Ocaml as a 64-bit app on a 64-bit OS?
We I designing a language today, I'd have 63-bit array lengths- of course,
I'd do it by not bothering to support 32-bit systems...
As for strings, I'd be inclined to make them immutable- the correct way to
manipulate strings is with regular expressions. But I'm widely
acknowledged to be an extremist.
Brian
^ permalink raw reply [flat|nested] 20+ messages in thread
* immutable strings (Re: [Caml-list] Array 4 MB size limit)
2006-05-20 1:06 ` Brian Hurt
@ 2006-05-20 21:11 ` Oliver Bandel
2006-05-25 4:32 ` immutable strings (Re: " Stefan Monnier
0 siblings, 1 reply; 20+ messages in thread
From: Oliver Bandel @ 2006-05-20 21:11 UTC (permalink / raw)
To: caml-list
On Fri, May 19, 2006 at 09:06:27PM -0400, Brian Hurt wrote:
>
>
> On Fri, 19 May 2006, Jon Harrop wrote:
>
> >Agreed. Should OCaml's successor have extensible arrays with 64-bit lengths
> >and strings as char arrays?
>
> Why not just run Ocaml as a 64-bit app on a 64-bit OS?
>
> We I designing a language today, I'd have 63-bit array lengths- of course,
> I'd do it by not bothering to support 32-bit systems...
>
> As for strings, I'd be inclined to make them immutable- the correct way to
> manipulate strings is with regular expressions. But I'm widely
> acknowledged to be an extremist.
IMHO strings should be possibly made immutable by using the "immutable"
keyword, which is the opposite of the "mutable" keyword as it is used for
records. So the user/programmer can chose the kind of strings he7she wants.
Ciao,
Oliver
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: immutable strings (Re: Array 4 MB size limit)
2006-05-20 21:11 ` immutable strings (Re: [Caml-list] Array 4 MB size limit) Oliver Bandel
@ 2006-05-25 4:32 ` Stefan Monnier
2006-05-25 5:56 ` [Caml-list] " Martin Jambon
2006-05-25 11:18 ` Brian Hurt
0 siblings, 2 replies; 20+ messages in thread
From: Stefan Monnier @ 2006-05-25 4:32 UTC (permalink / raw)
To: caml-list
> IMHO strings should be possibly made immutable by using the "immutable"
> keyword, which is the opposite of the "mutable" keyword as it is used for
> records. So the user/programmer can chose the kind of strings he7she wants.
I think it's OK to have (mutable) byte arrays, but strings should simply
always be immutable.
Stefan
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 4:32 ` immutable strings (Re: " Stefan Monnier
@ 2006-05-25 5:56 ` Martin Jambon
2006-05-25 7:23 ` j h woodyatt
` (2 more replies)
2006-05-25 11:18 ` Brian Hurt
1 sibling, 3 replies; 20+ messages in thread
From: Martin Jambon @ 2006-05-25 5:56 UTC (permalink / raw)
To: Stefan Monnier; +Cc: caml-list
On Thu, 25 May 2006, Stefan Monnier wrote:
>> IMHO strings should be possibly made immutable by using the "immutable"
>> keyword, which is the opposite of the "mutable" keyword as it is used for
>> records. So the user/programmer can chose the kind of strings he7she wants.
>
> I think it's OK to have (mutable) byte arrays, but strings should simply
> always be immutable.
OCaml strings are compact byte arrays which serve their purpose well.
Having a whole different type for immutable strings is in my opinion a
waste of energy. The problem is that freezing or unfreezing a string
safely involves a copy of the whole string. And obviously it's not
possible to handle only immutable strings since somehow you have to create
them, and unlike record fields, they won't be set in one operation but in
n operations, n being the length of the string.
So I'd really love to see actual examples where using immutable strings
would be such an improvement over mutable strings.
If the problem is just to ensure that string data won't be changed by the
user of a library, then it is trivial using module signatures and
String.copy for the conversions.
Martin
--
Martin Jambon, PhD
http://martin.jambon.free.fr
Edit http://wikiomics.org, bioinformatics wiki
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 5:56 ` [Caml-list] " Martin Jambon
@ 2006-05-25 7:23 ` j h woodyatt
2006-05-25 10:22 ` Jon Harrop
2006-05-25 19:28 ` Oliver Bandel
2006-05-25 11:14 ` Brian Hurt
2006-05-25 17:31 ` Aleksey Nogin
2 siblings, 2 replies; 20+ messages in thread
From: j h woodyatt @ 2006-05-25 7:23 UTC (permalink / raw)
To: The Caml Trade
On May 24, 2006, at 10:56 PM, Martin Jambon wrote:
>
> So I'd really love to see actual examples where using immutable
> strings would be such an improvement over mutable strings.
> If the problem is just to ensure that string data won't be changed
> by the user of a library, then it is trivial using module
> signatures and String.copy for the conversions.
I have no dog in this fight, but I can imagine a pragmatic approach
that might satisfy some of these concerns without introducing much in
the way of extra runtime complexity costs.
Change the way strings are boxed so that there is an additional tag
for immutable strings as opposed to the normal mutable ones. In all
respects, allow string objects with either tag to be treated the same
by functions that do not modify the content of the string. When the
"safe" variants of the string mutation functions are called on a
string object with the immutable tag, throw a runtime exception.
Finally, provide a function that changes the tag of a string from
mutable to immutable, i.e. like a special blessed case of
Obj.set_tag. Not from immutable to mutable, though-- that should be
disallowed. If the -unsafe is not set, then allow a new compiler
option -constliteral (or something) that would make all string
literals immutable. You'd need to use String.copy to get a mutable
string with contents copied from an immutable one.
Of course, I can't offer a convincing reason to believe this would
bring an improvement over what we have today. My proposal would not
give the programmer any assurance that contents of string values
passed as parameters to library functions are not modified by the
library, i.e. the "unsafe" functions would still work. It *might*
give a pragmatic benefit of surfacing unintentional programming
errors earlier than they might otherwise be found, but I'm not
prepared to argue in defense of that.
If you did immutable strings this way, it would be nice for the sake
of consistency to have a comparable feature for arrays.
—
j h woodyatt <jhw@conjury.org>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 7:23 ` j h woodyatt
@ 2006-05-25 10:22 ` Jon Harrop
2006-05-25 19:28 ` Oliver Bandel
1 sibling, 0 replies; 20+ messages in thread
From: Jon Harrop @ 2006-05-25 10:22 UTC (permalink / raw)
To: caml-list
On Thursday 25 May 2006 08:23, j h woodyatt wrote:
> Change the way strings are boxed so that there is an additional tag
> for immutable strings as opposed to the normal mutable ones. In all
> respects, allow string objects with either tag to be treated the same
> by functions that do not modify the content of the string. When the
> "safe" variants of the string mutation functions are called on a
> string object with the immutable tag, throw a runtime exception.
Blech. I didn't come all this way for dynamic typing... ;-)
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 7:23 ` j h woodyatt
2006-05-25 10:22 ` Jon Harrop
@ 2006-05-25 19:28 ` Oliver Bandel
1 sibling, 0 replies; 20+ messages in thread
From: Oliver Bandel @ 2006-05-25 19:28 UTC (permalink / raw)
To: caml-list
On Thu, May 25, 2006 at 12:23:51AM -0700, j h woodyatt wrote:
> On May 24, 2006, at 10:56 PM, Martin Jambon wrote:
> >
> >So I'd really love to see actual examples where using immutable
> >strings would be such an improvement over mutable strings.
> >If the problem is just to ensure that string data won't be changed
> >by the user of a library, then it is trivial using module
> >signatures and String.copy for the conversions.
>
> I have no dog in this fight, but I can imagine a pragmatic approach
> that might satisfy some of these concerns without introducing much in
> the way of extra runtime complexity costs.
>
> Change the way strings are boxed so that there is an additional tag
> for immutable strings as opposed to the normal mutable ones. In all
> respects, allow string objects with either tag to be treated the same
> by functions that do not modify the content of the string. When the
> "safe" variants of the string mutation functions are called on a
> string object with the immutable tag, throw a runtime exception.
[...]
Compiletime error would be what I would prefer.
[...]
> It *might*
> give a pragmatic benefit of surfacing unintentional programming
> errors earlier than they might otherwise be found,
[...]
The earlier the better it is.
It's like non-matching types found by the compiler: it helps
a lot in development. Debugging normallay takes the most time
in development; and with OCaml debugging time decreases much - that
is what I experienced. And this comes from early getting a hint
to possible problems (possible with compilers that would accept
such code; im possible when the compiler says: "no! I don't accept
these sources!").
Ciao,
Oliver
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 5:56 ` [Caml-list] " Martin Jambon
2006-05-25 7:23 ` j h woodyatt
@ 2006-05-25 11:14 ` Brian Hurt
2006-05-25 19:42 ` Oliver Bandel
2006-05-26 6:51 ` Alain Frisch
2006-05-25 17:31 ` Aleksey Nogin
2 siblings, 2 replies; 20+ messages in thread
From: Brian Hurt @ 2006-05-25 11:14 UTC (permalink / raw)
To: Martin Jambon; +Cc: Stefan Monnier, caml-list
On Wed, 24 May 2006, Martin Jambon wrote:
> And obviously it's not possible to handle only immutable strings since
> somehow you have to create them, and unlike record fields, they won't be
> set in one operation but in n operations, n being the length of the
> string.
This is the advantage of regular expressions- they can be implemented on
immutable strings (regular expressions don't modify the old strings, they
produce new strings), and can change large chunks of the string in one
"operation", limiting the number of unnecessary string copies you need to
do. Yes, the underlying implementation would be mutable- but so is they
underlying implementation of everything else. The semantics remain
applicative.
Note that this also eliminates an inefficiency of ocaml. When you use a
constant string, it has to allocate a new string for you. Consider
Ocaml's response to the expression:
"foo" == "foo";;
The reason ocaml returns false in the above expression is that when you
give a constant string in an expression, Ocaml first has to make a copy of
that string (in case you modify it), and then it returns the copy. So it
makes two copies of the string "foo" above, and then compares them for
referential equality- which, since they're different copies, returns
false.
Note that even pure-imperitive languages, like C/C++, hit this problem.
What is the output of the following peice of C code:
for (i = 0; i < 10; ++i) {
char * p = "foo";
p[0] += 1;
printf("%s\n", p);
}
C/C++ itself wants *some* strings to be immutable. Fortran used to let
you change the value of 2, which made for some interesting (in the bad
sense) code tricks. I comment that they've since fixed this. But mutable
strings allow you to change the value of "two", which is almost as bad.
Brian
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 11:14 ` Brian Hurt
@ 2006-05-25 19:42 ` Oliver Bandel
2006-05-26 6:51 ` Alain Frisch
1 sibling, 0 replies; 20+ messages in thread
From: Oliver Bandel @ 2006-05-25 19:42 UTC (permalink / raw)
To: caml-list
On Thu, May 25, 2006 at 07:14:55AM -0400, Brian Hurt wrote:
[...]
> Note that even pure-imperitive languages, like C/C++, hit this problem.
> What is the output of the following peice of C code:
> for (i = 0; i < 10; ++i) {
> char * p = "foo";
> p[0] += 1;
> printf("%s\n", p);
> }
[...]
Heheh, this brings a "Bus Error" (which one might call an "exception" ;-)
(an exception th C'ish way)).
Ciao,
Oliver
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 11:14 ` Brian Hurt
2006-05-25 19:42 ` Oliver Bandel
@ 2006-05-26 6:51 ` Alain Frisch
1 sibling, 0 replies; 20+ messages in thread
From: Alain Frisch @ 2006-05-26 6:51 UTC (permalink / raw)
To: Brian Hurt; +Cc: caml-list
Brian Hurt wrote:
> Note that this also eliminates an inefficiency of ocaml. When you use a
> constant string, it has to allocate a new string for you. Consider
> Ocaml's response to the expression:
>
> "foo" == "foo";;
>
> The reason ocaml returns false in the above expression is that when you
> give a constant string in an expression, Ocaml first has to make a copy
> of that string (in case you modify it), and then it returns the copy.
This is not true. The two strings above are physically different because
the two constants appear at different location of the source.
The expression
let f () = "foo" in f () == f ()
evaluates to true. The block for "foo" is allocated only once; the
string is not copied not each time the body of f is evaluated.
As a consequence,
let f () = "foo" in (f ()).[0] <- 'F'; f ()
returns "Foo".
-- Alain
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 5:56 ` [Caml-list] " Martin Jambon
2006-05-25 7:23 ` j h woodyatt
2006-05-25 11:14 ` Brian Hurt
@ 2006-05-25 17:31 ` Aleksey Nogin
2006-05-25 19:54 ` Martin Jambon
2 siblings, 1 reply; 20+ messages in thread
From: Aleksey Nogin @ 2006-05-25 17:31 UTC (permalink / raw)
To: Caml List; +Cc: Martin Jambon
On 24.05.2006 22:56, Martin Jambon wrote:
>> I think it's OK to have (mutable) byte arrays, but strings should simply
>> always be immutable.
>
> OCaml strings are compact byte arrays which serve their purpose well.
Yes, however immutable strings are also very useful and that
functionality is simply missing in OCaml. The usage I am very interested
in is essentially using strings as "printable tokens". In other words, a
data type that is easy to compare and has an obvious I/O representation.
> Having a whole different type for immutable strings is in my opinion a
> waste of energy. The problem is that freezing or unfreezing a string
> safely involves a copy of the whole string. And obviously it's not
> possible to handle only immutable strings since somehow you have to
> create them, and unlike record fields, they won't be set in one
> operation but in n operations, n being the length of the string.
This is not true. All I want is having a purely functional interface with:
- Constants (a compiler flag for turning "..." constants into immutable
strings instead of mutable ones).
- Inputing from a channel
- Concatenation
- Things like string_of_int for immutable string.
Of course, it might be the case that the standard library might have to
use some sort of "unsafe" operations that would "inappropriately" mutate
the newly created immutable string buffer, but this is IMHO no different
than how the unsafe operations are already used in standard library for
arrays and strings.
> So I'd really love to see actual examples where using immutable strings
> would be such an improvement over mutable strings.
> If the problem is just to ensure that string data won't be changed by
> the user of a library, then it is trivial using module signatures and
> String.copy for the conversions.
Such a copy operation can be extremely prohibitive in a setting that
assumes that a data structure is immutable and tries really hard to
preserve sharing (including using functions like a sharing-preserving
version of map (*), etc). In such a setting, these extra copies can
potentially have a devastating effect on memory usage, cache
performance, etc. And this situation is exactly what we have in our
MetaPRL project - there we have resorted to simply using strings and
pretending they are immutable, but this is clearly suboptimal.
----
(*)
let rec smap f = function
[] -> []
| (hd :: tl) as l ->
let hd' = f hd in
let tl' = smap f tl in
if hd == hd' && tl == tl' then l else hd' :: tl'
--
Aleksey Nogin
Home Page: http://nogin.org/
E-Mail: nogin@cs.caltech.edu (office), aleksey@nogin.org (personal)
Office: Moore 04, tel: (626) 395-2200
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 17:31 ` Aleksey Nogin
@ 2006-05-25 19:54 ` Martin Jambon
0 siblings, 0 replies; 20+ messages in thread
From: Martin Jambon @ 2006-05-25 19:54 UTC (permalink / raw)
To: Caml List; +Cc: Martin Jambon
On Thu, 25 May 2006, Aleksey Nogin wrote:
> On 24.05.2006 22:56, Martin Jambon wrote:
>
>>> I think it's OK to have (mutable) byte arrays, but strings should simply
>>> always be immutable.
>> OCaml strings are compact byte arrays which serve their purpose well.
>
> Yes, however immutable strings are also very useful and that functionality is
> simply missing in OCaml. The usage I am very interested in is essentially
> using strings as "printable tokens". In other words, a data type that is easy
> to compare and has an obvious I/O representation.
>
>> Having a whole different type for immutable strings is in my opinion a
>> waste of energy. The problem is that freezing or unfreezing a string safely
>> involves a copy of the whole string. And obviously it's not possible to
>> handle only immutable strings since somehow you have to create them, and
>> unlike record fields, they won't be set in one operation but in n
>> operations, n being the length of the string.
>
> This is not true. All I want is having a purely functional interface with:
> - Constants (a compiler flag for turning "..." constants into immutable
> strings instead of mutable ones).
> - Inputing from a channel
> - Concatenation
> - Things like string_of_int for immutable string.
Isn't it a bit limited? What if I want other functions?
But if it satisfies you, you can do the syntax part with an unsafe_freeze
function and a bit of camlp4. The rest is just plain old OCaml.
> Of course, it might be the case that the standard library might have to use
> some sort of "unsafe" operations that would "inappropriately" mutate the
> newly created immutable string buffer, but this is IMHO no different than how
> the unsafe operations are already used in standard library for arrays and
> strings.
I disagree: has it ever happened to you to mutate a string by accident?
I never met this situation and this is mostly why I don't see the point of
your suggestions. This strongly constrasts with mistakes in array/string
indices which happen all the time.
>> So I'd really love to see actual examples where using immutable strings
>> would be such an improvement over mutable strings.
>> If the problem is just to ensure that string data won't be changed by the
>> user of a library, then it is trivial using module signatures and
>> String.copy for the conversions.
>
> Such a copy operation can be extremely prohibitive in a setting that assumes
> that a data structure is immutable and tries really hard to preserve sharing
> (including using functions like a sharing-preserving version of map (*),
> etc). In such a setting, these extra copies can potentially have a
> devastating effect on memory usage, cache performance, etc. And this
> situation is exactly what we have in our MetaPRL project - there we have
> resorted to simply using strings and pretending they are immutable, but this
> is clearly suboptimal.
Yes, so how do you avoid copies without using the "unsafe" conversions all
over the place?
> ----
> (*)
> let rec smap f = function
> [] -> []
> | (hd :: tl) as l ->
> let hd' = f hd in
> let tl' = smap f tl in
> if hd == hd' && tl == tl' then l else hd' :: tl'
In order to maximize sharing, I'd rather use a global weak hash table.
In your context, it seems that you could afford String.copy, as long as it
doesn't break sharing:
let freeze s =
let s' = make_constant s (* using a copy! *) in
if s' is in the table then return the element from the table
else add s' and return s'
Martin
--
Martin Jambon, PhD
http://martin.jambon.free.fr
Edit http://wikiomics.org, bioinformatics wiki
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 4:32 ` immutable strings (Re: " Stefan Monnier
2006-05-25 5:56 ` [Caml-list] " Martin Jambon
@ 2006-05-25 11:18 ` Brian Hurt
2006-05-25 17:34 ` Aleksey Nogin
1 sibling, 1 reply; 20+ messages in thread
From: Brian Hurt @ 2006-05-25 11:18 UTC (permalink / raw)
To: caml-list
I wish to reiterate that my comments on immutable strings are solely in
the context of designing a new language, and should not be construed to
indicate a desire to change Ocaml. Changing this behavior in Ocaml would
break huge amounts of code (including some of my code).
Brian
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 11:18 ` Brian Hurt
@ 2006-05-25 17:34 ` Aleksey Nogin
2006-05-25 18:44 ` Tom
0 siblings, 1 reply; 20+ messages in thread
From: Aleksey Nogin @ 2006-05-25 17:34 UTC (permalink / raw)
To: Caml List
On 25.05.2006 04:18, Brian Hurt wrote:
> I wish to reiterate that my comments on immutable strings are solely in
> the context of designing a new language, and should not be construed to
> indicate a desire to change Ocaml.
Well, I am advocating a change. IMHO the immutable strings is a very
useful concept and I would really like to see it in OCaml.
> Changing this behavior in Ocaml
> would break huge amounts of code (including some of my code).
It is obvious that at this point changing the default behavior would not
be possible. However, adding the new "immutable strings" type would
definitely be great and adding a compiler flag that would turn all
string constants, "^", etc in a given module into immutable string
operations would be nice as well and would not break any existing code.
--
Aleksey Nogin
Home Page: http://nogin.org/
E-Mail: nogin@cs.caltech.edu (office), aleksey@nogin.org (personal)
Office: Moore 04, tel: (626) 395-2200
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 17:34 ` Aleksey Nogin
@ 2006-05-25 18:44 ` Tom
2006-05-25 23:00 ` Jon Harrop
0 siblings, 1 reply; 20+ messages in thread
From: Tom @ 2006-05-25 18:44 UTC (permalink / raw)
To: Caml List
[-- Attachment #1: Type: text/plain, Size: 609 bytes --]
On 25/05/06, Aleksey Nogin <nogin@cs.caltech.edu> wrote:
>
> On 25.05.2006 04:18, Brian Hurt wrote:
>
> > Changing this behavior in Ocaml
> > would break huge amounts of code (including some of my code).
>
> It is obvious that at this point changing the default behavior would not
> be possible. However, adding the new "immutable strings" type would
> definitely be great and adding a compiler flag that would turn all
> string constants, "^", etc in a given module into immutable string
> operations would be nice as well and would not break any existing code.
But why don't you add such a type yourself?
[-- Attachment #2: Type: text/html, Size: 943 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 18:44 ` Tom
@ 2006-05-25 23:00 ` Jon Harrop
2006-05-25 23:15 ` Martin Jambon
0 siblings, 1 reply; 20+ messages in thread
From: Jon Harrop @ 2006-05-25 23:00 UTC (permalink / raw)
To: caml-list
On Thursday 25 May 2006 19:44, Tom wrote:
[immutable strings]
> But why don't you add such a type yourself?
You would have to make your new type abstract in order to distinguish it from
the build-in mutable string type. Then you lose the ability to do pattern
matching and s.[i].
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 23:00 ` Jon Harrop
@ 2006-05-25 23:15 ` Martin Jambon
0 siblings, 0 replies; 20+ messages in thread
From: Martin Jambon @ 2006-05-25 23:15 UTC (permalink / raw)
To: Jon Harrop; +Cc: caml-list
On Fri, 26 May 2006, Jon Harrop wrote:
> On Thursday 25 May 2006 19:44, Tom wrote:
> [immutable strings]
>> But why don't you add such a type yourself?
>
> You would have to make your new type abstract in order to distinguish it from
> the build-in mutable string type. Then you lose the ability to do pattern
> matching and s.[i].
The former problem can be solved with camlp4 (which would use a "unsafe"
function to do the type conversion). The latter can be solved with camlp4
too, or simply by opening a module which defines a String module which
provides a "get" function.
Martin
--
Martin Jambon, PhD
http://martin.jambon.free.fr
Edit http://wikiomics.org, bioinformatics wiki
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2006-06-07 0:34 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-05-28 23:20 [Caml-list] Re: immutable strings (Re: Array 4 MB size limit) Harrison, John R
2006-05-29 2:36 ` Martin Jambon
2006-05-31 12:53 ` Jean-Christophe Filliatre
2006-06-05 20:54 ` immutable strings Matti Jokinen
2006-06-07 0:36 ` [Caml-list] " Jacques Garrigue
-- strict thread matches above, loose matches on Subject: below --
2006-05-29 20:52 [Caml-list] Re: immutable strings (Re: Array 4 MB size limit) Harrison, John R
2006-05-15 18:12 Array 4 MB size limit akalin
2006-05-19 5:57 ` [Caml-list] " Frederick Akalin
2006-05-19 16:28 ` Jozef Kosoru
2006-05-19 21:26 ` Jon Harrop
2006-05-20 1:06 ` Brian Hurt
2006-05-20 21:11 ` immutable strings (Re: [Caml-list] Array 4 MB size limit) Oliver Bandel
2006-05-25 4:32 ` immutable strings (Re: " Stefan Monnier
2006-05-25 5:56 ` [Caml-list] " Martin Jambon
2006-05-25 7:23 ` j h woodyatt
2006-05-25 10:22 ` Jon Harrop
2006-05-25 19:28 ` Oliver Bandel
2006-05-25 11:14 ` Brian Hurt
2006-05-25 19:42 ` Oliver Bandel
2006-05-26 6:51 ` Alain Frisch
2006-05-25 17:31 ` Aleksey Nogin
2006-05-25 19:54 ` Martin Jambon
2006-05-25 11:18 ` Brian Hurt
2006-05-25 17:34 ` Aleksey Nogin
2006-05-25 18:44 ` Tom
2006-05-25 23:00 ` Jon Harrop
2006-05-25 23:15 ` Martin Jambon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox