* Array 4 MB size limit
@ 2006-05-15 18:12 akalin
2006-05-15 18:22 ` [Caml-list] " Nicolas Cannasse
` (4 more replies)
0 siblings, 5 replies; 67+ messages in thread
From: akalin @ 2006-05-15 18:12 UTC (permalink / raw)
To: caml-list
I'm running into cases where the 4 MB limit on arrays is starting to
become a problem. Lists are much slower and cause seg faults for me on
the same data set, and Bigarrays are a partial solution because I'd
like to be able to store arbitrary types in arrays (not just numbers).
I was greatly surprised when I found out there was such a low limit on
arrays. Is there a reason for this? Will this limit ever be increased?
Is the limit a limit on the number of elements or the total size? The
language in Sys.max_array_size implies the former, but the fact the
limit is halved for floats implies the latter. If I had a record type
with 5 floats, will the limit then by Sys.max_array_size / 10? Is there
some sort of existing ArrayList module that works around this problem?
Ideally, I'd like to have something like C++'s std::vector<> type,
which can be dynamically resized. Do I have to write my own? :(
Also, the fact that using lists crashes for the same data set is
surprising. Is there a similar hard limit for lists, or would this be
a bug? Should I post a test case?
Thanks!
Frederick Akalin
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-15 18:12 Array 4 MB size limit akalin
@ 2006-05-15 18:22 ` Nicolas Cannasse
2006-05-15 20:32 ` Damien Doligez
` (3 subsequent siblings)
4 siblings, 0 replies; 67+ messages in thread
From: Nicolas Cannasse @ 2006-05-15 18:22 UTC (permalink / raw)
To: akalin; +Cc: caml-list
> Also, the fact that using lists crashes for the same data set is
> surprising. Is there a similar hard limit for lists, or would this be a
> bug? Should I post a test case?
>
> Thanks!
>
> Frederick Akalin
This is most probably because some list operation is causing a Stack
Overflow (List.map if I remember correctly). You can use ExtLib
(http://ocaml-libs.sf.net) for example that provides tail-recursive List
operations.
Nicolas
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-15 18:12 Array 4 MB size limit akalin
2006-05-15 18:22 ` [Caml-list] " Nicolas Cannasse
@ 2006-05-15 20:32 ` Damien Doligez
2006-05-15 21:27 ` akalin
2006-05-15 22:51 ` Oliver Bandel
` (2 subsequent siblings)
4 siblings, 1 reply; 67+ messages in thread
From: Damien Doligez @ 2006-05-15 20:32 UTC (permalink / raw)
To: caml users
On 2006-05-15, at 20:12, akalin@akalin.cx wrote:
> I'm running into cases where the 4 MB limit on arrays is starting
> to become a problem. Lists are much slower and cause seg faults
> for me on the same data set, and Bigarrays are a partial solution
> because I'd like to be able to store arbitrary types in arrays (not
> just numbers).
You could move to a two-level solution (an array of arrays). The
code is not
hard to write, but it does entail a performance hit, which may or may
not matter
for your application.
> I was greatly surprised when I found out there was such a low limit
> on arrays. Is there a reason for this? Will this limit ever be
> increased?
It will be increased when you get a 64-bit machine ;-)
> Also, the fact that using lists crashes for the same data set is
> surprising. Is there a similar hard limit for lists, or would this
> be a bug? Should I post a test case?
I bet my mustache these crashes are stack overflows.
-- Damien
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-15 20:32 ` Damien Doligez
@ 2006-05-15 21:27 ` akalin
0 siblings, 0 replies; 67+ messages in thread
From: akalin @ 2006-05-15 21:27 UTC (permalink / raw)
To: Damien Doligez; +Cc: caml users
Quoting Damien Doligez <damien.doligez@inria.fr>:
> On 2006-05-15, at 20:12, akalin@akalin.cx wrote:
>
>> I was greatly surprised when I found out there was such a low limit
>> on arrays. Is there a reason for this? Will this limit ever be
>> increased?
>
> It will be increased when you get a 64-bit machine ;-)
Of course it will. :) But what I meant was that the limit is nowhere
near 32 bits (4 MB is only 22 bits), which leads me to believe that the
limit is due to the implementation of arrays. I was asking if there
would be an improved implemetation of arrays which would increase this
limit (up to 32 bits).
>> Also, the fact that using lists crashes for the same data set is
>> surprising. Is there a similar hard limit for lists, or would this
>> be a bug? Should I post a test case?
>
> I bet my mustache these crashes are stack overflows.
I was thinking otherwise, as it seemed to be crashing only when
building the list. The list is built by a parser rule which is reading
from a line of floats. The rule itself is recursive, which does imply
a stack overflow, (or does it? ocamlyacc generates a LALR or some
variant, right?) but changing that rule to use a Queue worked fine.
(Until I tried to convert it into an array, which led me to discover
the 4 MB limit)
Incidentally, what data structure does the Queue use internally? I
didn't run into any of the problems I did with lists or arrays.
Frederick Akalin
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-15 18:12 Array 4 MB size limit akalin
2006-05-15 18:22 ` [Caml-list] " Nicolas Cannasse
2006-05-15 20:32 ` Damien Doligez
@ 2006-05-15 22:51 ` Oliver Bandel
2006-05-16 0:48 ` Brian Hurt
2006-05-16 8:01 ` Xavier Leroy
4 siblings, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-15 22:51 UTC (permalink / raw)
To: caml-list
On Mon, May 15, 2006 at 02:12:30PM -0400, akalin@akalin.cx wrote:
> I'm running into cases where the 4 MB limit on arrays is starting to
> become a problem.
ooops :(
> Lists are much slower and cause seg faults for me on
> the same data set,
well.... exceptions can be acceptable, but segfaults?
Very unnice :(
Shouldn't be...
[...]
> Also, the fact that using lists crashes for the same data set is
> surprising. Is there a similar hard limit for lists, or would this be
> a bug? Should I post a test case?
[...]
Do you really need an array or other linear data structures?
Couldn't the problem be re-formulated to use trees instead (Map/Set)?
(hoping these limitations are not there...)
Ciao,
Oliver
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-15 18:12 Array 4 MB size limit akalin
` (2 preceding siblings ...)
2006-05-15 22:51 ` Oliver Bandel
@ 2006-05-16 0:48 ` Brian Hurt
2006-05-16 9:57 ` Damien Doligez
2006-05-16 8:01 ` Xavier Leroy
4 siblings, 1 reply; 67+ messages in thread
From: Brian Hurt @ 2006-05-16 0:48 UTC (permalink / raw)
To: akalin; +Cc: caml-list
On Mon, 15 May 2006, akalin@akalin.cx wrote:
> I'm running into cases where the 4 MB limit on arrays is starting to become a
> problem. Lists are much slower and cause seg faults for me on the same data
> set, and Bigarrays are a partial solution because I'd like to be able to
> store arbitrary types in arrays (not just numbers).
>
> I was greatly surprised when I found out there was such a low limit on
> arrays. Is there a reason for this? Will this limit ever be increased?
It comes from how arrays are stored in memory. Every boxed value has a
"tag word", which, among other uses, is used by the GC to figure out how
big the object is (i.e. where the next object starts). Now, the encoding
scheme of this tag word is fairly complicated, but it works out that for
arrays, 10 bits of the 32 bits is used for other stuff, leaving 22 bits
for the size. This limits the size of an array to 4M-1 elements
(actually, it limits the size of the array to 16M-4 bytes, as size is
measured in words- which is why arrays of floats are limited to 2M-1
entries- each float takes up 8 bytes aka 2 words).
When you move to 64 bits, the tag word doubles in size, but the amount of
"other" information in the tag word doesn't- this means that suddenly you
have 52 bits of size, or 4T arrays. And now floats only take up one word,
not two, so they too can have 4T arrays.
The array of array idea someone brought up is a good one. There's a
slight performance hit, but not much. I'd keep the top level array short-
1K entries is enough. The advantage of this is that if you're heavily
using the array, the top level array will stay in cache (possibly even L1
cache), meaning the main cost of an array access only goes up by about 10%
or so, maybe less (maybe 1% if it stays in L1 cache). The reasoning goes
like this: the largest cost by far of an array access on a large array is
the cache miss. Doing a memory reference that is satisified out of L1
cache generally takes 1-4 clock cycles. If you have to go to L2 cache,
that's going to take 10-40 clock cycles (approximately). Going to main
memory? 100-350+ clock cycles. With a large array, you're likely going
to take the 100-350+ clock cycle hit anyways. If the top level array is in
L1 cache, you're only adding 1-4 clockcycles to the 100-350+ clock cycle
hit you're going to take anyways.
Personally, I think PCs should have gone 64 bit circa 1997 or so. Once we
started getting hard disks large enough that we couldn't mmap the entire
hard disk, we started hitting problems. Long long and the large file
interfaces are examples of these problems. The longer we go- and the
larger memories and hard disks get, the bigger the problems get.
Microsoft seems to be the main impediment, now that AMD has forced Intel
to get off the stick.
> If I had a record type with 5 floats,
> will the limit then by Sys.max_array_size / 10?
Within the record, the floats are unboxed (assuming you didn't have any
non-float elements). This means the floats are stored directly in the
record, and that the record doesn't hold pointers to the floats. But the
record itself is boxed- which means that an array of these records is
really an array of pointers to these records, meaning that you're back at
the 4M-1 limit.
Note that a record of 5 floats costs 44 bytes (40 bytes for the 5 floats +
4 bytes for the tag word). Assuming no records are stored in more than
one location, the total memory cost of an array of 4M-1 of them is 16M for
the array, plus (4M-1)*44 for the records, for a total of 192M-44 bytes.
This is almost 10% of your available memory space right there.
> Is there some sort of
> existing ArrayList module that works around this problem? Ideally, I'd like
> to have something like C++'s std::vector<> type, which can be dynamically
> resized.
Ocaml can't dynamically resize arrays. This gets tricky to do when arrays
get large. At the extreme, if the array takes up 50% + 1 byte of the
total address space, you can't resize it to be any larger- to resize
requires you to copy it, which takes more memory than you have. I've
seend problems well before then.
> Also, the fact that using lists crashes for the same data set is surprising.
> Is there a similar hard limit for lists, or would this be a bug? Should I
> post a test case?
I'm willing to bet dollars to donuts that the problem you hit with lists
was stack overflow. Without signifigantly impacting performance there
isn't a whole lot Ocaml can do about detecting stack overflow before the
segfault hits. My general rule is, if it's going to contain more than a
few dozen items, it probably shouldn't be a list. Extlib's library is
less prone to this error, but you can still run into serious problems with
long lists.
Brian
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-16 0:48 ` Brian Hurt
@ 2006-05-16 9:57 ` Damien Doligez
2006-05-16 15:10 ` Markus Mottl
0 siblings, 1 reply; 67+ messages in thread
From: Damien Doligez @ 2006-05-16 9:57 UTC (permalink / raw)
To: caml users
On 2006-05-16, at 02:48, Brian Hurt wrote:
> When you move to 64 bits, the tag word doubles in size, but the
> amount of "other" information in the tag word doesn't- this means
> that suddenly you have 52 bits of size, or 4T arrays.
This is not quite correct, but a mere factor of 1000 doesn't matter
much at
this point :-)
-- Damien
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-15 18:12 Array 4 MB size limit akalin
` (3 preceding siblings ...)
2006-05-16 0:48 ` Brian Hurt
@ 2006-05-16 8:01 ` Xavier Leroy
2006-05-16 8:20 ` Nicolas Cannasse
2006-05-19 5:57 ` Frederick Akalin
4 siblings, 2 replies; 67+ messages in thread
From: Xavier Leroy @ 2006-05-16 8:01 UTC (permalink / raw)
To: akalin; +Cc: caml-list
> I was greatly surprised when I found out there was such a low limit on
> arrays. Is there a reason for this? Will this limit ever be increased?
As Brian Hurt explained, this limit comes from the fact that heap
object sizes are stored in N-10 bits, where N is the bit width of the
processor (32 or 64).
Historical digression: this representation decision was initially
taken when designing Caml Light in 1989-1990. At that time, even
professional workstations had 16 M of RAM at best, so limiting arrays to
4M elements was reasonable. The decision was then reconsidered in
1995 during the redesign that led to OCaml. At that time,
64-bit architectures were all the rage: OCaml was actually implemented
on a 64-bit Alpha, and only then backported to 32-bit machines.
So, the original header format was kept, since it makes complete sense
on a 64-bit architecture. Little did I know that the 32-bitters would
survive so long. Now, it's 2006, and 64-bit processors are becoming
universally available, in desktop machines at least. (I've been
running an AMD64 PC at home since january 2005 with absolutely zero
problems.) So, no the data representations of OCaml are not going to
change to lift the array size limit on 32-bit machines.
> Is the limit a limit on the number of elements or the total size? The
> language in Sys.max_array_size implies the former, but the fact the
> limit is halved for floats implies the latter. If I had a record type
> with 5 floats, will the limit then by Sys.max_array_size / 10?
No. In general, Caml arrays are not unboxed, meaning that your array
of 5-float records is actually an array of pointers to individual
blocks holding 5 floats. The only exception is for arrays of floats,
which are unboxed.
> Is there
> some sort of existing ArrayList module that works around this problem?
> Ideally, I'd like to have something like C++'s std::vector<> type, which
> can be dynamically resized. Do I have to write my own? :(
A better idea would be to determine exactly what data structure you need:
which abstract operations, what performance requirements, etc. C++
and Lisp programmers tend to encode everything as arrays or lists,
respectively, but quite often these are not the best data structure
for the application of interest.
> Also, the fact that using lists crashes for the same data set is
> surprising. Is there a similar hard limit for lists, or would this be a
> bug? Should I post a test case?
Depends on the platform you use. In principle, Caml should report
stack overflows cleanly, by throwing a Stack_overflow exception.
However, this is hard to do in native code as it depends a lot on the
processor and OS used. So, some combinations (e.g. x86/Linux) will
report stack overflows via an exception, and others will let the
kernel generate a segfault.
If you're getting the segfault under x86/Linux for instance, please
post a test case on the bug tracking system. It's high time that
Damien shaves :-)
- Xavier Leroy
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-16 8:01 ` Xavier Leroy
@ 2006-05-16 8:20 ` Nicolas Cannasse
2006-05-19 17:13 ` Xavier Leroy
2006-05-19 5:57 ` Frederick Akalin
1 sibling, 1 reply; 67+ messages in thread
From: Nicolas Cannasse @ 2006-05-16 8:20 UTC (permalink / raw)
To: Xavier Leroy; +Cc: akalin, caml-list
>>Also, the fact that using lists crashes for the same data set is
>>surprising. Is there a similar hard limit for lists, or would this be a
>>bug? Should I post a test case?
>
>
> Depends on the platform you use. In principle, Caml should report
> stack overflows cleanly, by throwing a Stack_overflow exception.
> However, this is hard to do in native code as it depends a lot on the
> processor and OS used. So, some combinations (e.g. x86/Linux) will
> report stack overflows via an exception, and others will let the
> kernel generate a segfault.
A segfault will happen on Windows/MSVC port. I also found some cases
where the commandline program (the haXe compiler in that case) just
silently exited on Stack Overflow (exit code was not 0 but no error or
"program error" infamous message box was displayed).
I think that there is some MSVC specific C extension for catching such
stack overflows (__try / __except*). It would be nice to have such a
handling at the ocaml toplevel that would at least exit with a
meaningful error message. I don't care so much in my case since I'm not
using C code a lot, I know for sure that any crash/early abort is
indeeed a stack overflow. That might not be the case for all Win32 users.
Best,
Nicolas
*
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccelng/htm/key_s-z_4.asp
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-16 8:20 ` Nicolas Cannasse
@ 2006-05-19 17:13 ` Xavier Leroy
0 siblings, 0 replies; 67+ messages in thread
From: Xavier Leroy @ 2006-05-19 17:13 UTC (permalink / raw)
To: Nicolas Cannasse; +Cc: akalin, caml-list
>> [Stack overflow in ocamlopt-generated code]
> A segfault will happen on Windows/MSVC port. I also found some cases
> where the commandline program (the haXe compiler in that case) just
> silently exited on Stack Overflow (exit code was not 0 but no error or
> "program error" infamous message box was displayed).
>
> I think that there is some MSVC specific C extension for catching such
> stack overflows (__try / __except*). It would be nice to have such a
> handling at the ocaml toplevel that would at least exit with a
> meaningful error message.
I'm aware of __try/__finally, but a bit of experimentation suggested
that it doesn't play nice with OCaml's exception handling. There is a
lower-level API whose name I cannot remember that might be usable
within Caml, but I didn't pursue this. Windows users are welcome to
contribute suggestions or code.
- Xavier Leroy
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-16 8:01 ` Xavier Leroy
2006-05-16 8:20 ` Nicolas Cannasse
@ 2006-05-19 5:57 ` Frederick Akalin
2006-05-19 6:21 ` Oliver Bandel
` (2 more replies)
1 sibling, 3 replies; 67+ messages in thread
From: Frederick Akalin @ 2006-05-19 5:57 UTC (permalink / raw)
To: Xavier Leroy; +Cc: caml-list
Xavier Leroy wrote:
>> I was greatly surprised when I found out there was such a low limit on
>> arrays. Is there a reason for this? Will this limit ever be increased?
>>
>
> As Brian Hurt explained, this limit comes from the fact that heap
> object sizes are stored in N-10 bits, where N is the bit width of the
> processor (32 or 64).
>
> Historical digression: this representation decision was initially
> taken when designing Caml Light in 1989-1990. At that time, even
> professional workstations ... Little did I know that the 32-bitters would
> survive so long. Now, it's 2006, and 64-bit processors are becoming
> universally available, in desktop machines at least. (I've been
> running an AMD64 PC at home since january 2005 with absolutely zero
> problems.) So, no the data representations of OCaml are not going to
> change to lift the array size limit on 32-bit machines.
>
>
I see. This is a perfectly reasonable explanation. I ended up just
using Bigarrays of floats for now and converting from those to 3-d
vectors on demand (what I need), but it would be nice to have a type
that wraps around Array that can get around the 4M limit (using an array
of arrays like someone suggested earlier). It's sad, but I think 32-bit
is going to be around for a while, and surely I can't be the only person
to run up against this. :) Not that I mind writing such a library and
releasing it. I wonder if the Extlib guys would be interested...
> A better idea would be to determine exactly what data structure you need:
> which abstract operations, what performance requirements, etc. C++
> and Lisp programmers tend to encode everything as arrays or lists,
> respectively, but quite often these are not the best data structure
> for the application of interest.
>
I find your assertion surprising. C++ and Common LISP are by no means
lacking in standard data structures (and using bare arrays is
discouraged in C++, as far as I know) and in my experience I haven't
much seen C++ code that used arrays/vectors when not appropriate.
In any case, in my application (a raytracer) I am reading in lists of
numbers from a file representing the points of a mesh and the triangles
that make up the mesh (represented by a list of indices into the mesh
list). A dynamically resizable, reasonable scalable array seems like
the best data structure to use.
>> Also, the fact that using lists crashes for the same data set is
>> surprising. Is there a similar hard limit for lists, or would this be a
>> bug? Should I post a test case?
>>
>
> Depends on the platform you use. In principle, Caml should report
> stack overflows cleanly, by throwing a Stack_overflow exception.
> However, this is hard to do in native code as it depends a lot on the
> processor and OS used. So, some combinations (e.g. x86/Linux) will
> report stack overflows via an exception, and others will let the
> kernel generate a segfault.
>
> If you're getting the segfault under x86/Linux for instance, please
> post a test case on the bug tracking system. It's high time that
> Damien shaves :-)
>
>
Due to some confusion on my part (I didn't realize stderr was buffered
in OCaml and needed to be flushed), I erroneously thought my program was
crashing on building the list. It was, in fact, a List.map lurking
further along in my code. I'm running PPC/OS X, which I'm guessing is
one of those platforms that lets the kernel generate a segfault? I
tried it on Linux and it threw an exception as expected. Damien's
facial hair is safe. :)
> - Xavier Leroy
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
Frederick Akalin
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-19 5:57 ` Frederick Akalin
@ 2006-05-19 6:21 ` Oliver Bandel
2006-05-19 12:15 ` Jon Harrop
2006-05-19 16:28 ` Jozef Kosoru
2 siblings, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-19 6:21 UTC (permalink / raw)
To: caml-list
On Thu, May 18, 2006 at 10:57:30PM -0700, Frederick Akalin wrote:
[...]
> I see. This is a perfectly reasonable explanation. I ended up just
> using Bigarrays of floats for now and converting from those to 3-d
> vectors on demand (what I need), but it would be nice to have a type
> that wraps around Array that can get around the 4M limit (using an array
> of arrays like someone suggested earlier). It's sad, but I think 32-bit
> is going to be around for a while, and surely I can't be the only person
> to run up against this. :) Not that I mind writing such a library and
> releasing it. I wonder if the Extlib guys would be interested...
> >A better idea would be to determine exactly what data structure you need:
> >which abstract operations, what performance requirements, etc. C++
> >and Lisp programmers tend to encode everything as arrays or lists,
> >respectively, but quite often these are not the best data structure
> >for the application of interest.
[...]
You have > 4 Million points?
What datastructure are you using right now?
Some code to show?
Ciao,
Oliver
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-19 5:57 ` Frederick Akalin
2006-05-19 6:21 ` Oliver Bandel
@ 2006-05-19 12:15 ` Jon Harrop
2006-05-19 19:36 ` akalin
2006-05-19 16:28 ` Jozef Kosoru
2 siblings, 1 reply; 67+ messages in thread
From: Jon Harrop @ 2006-05-19 12:15 UTC (permalink / raw)
To: caml-list
On Friday 19 May 2006 06:57, Frederick Akalin wrote:
> I see. This is a perfectly reasonable explanation. I ended up just
> using Bigarrays of floats for now and converting from those to 3-d
> vectors on demand (what I need), but it would be nice to have a type
> that wraps around Array that can get around the 4M limit (using an array
> of arrays like someone suggested earlier). It's sad, but I think 32-bit
> is going to be around for a while, and surely I can't be the only person
> to run up against this. :) Not that I mind writing such a library and
> releasing it. I wonder if the Extlib guys would be interested...
If you want your code to be memory efficient then using 32-bit floats via big
arrays would seem like a good idea. If you want your code to be fast then
you'll want a monomorphic data structure to avoid the overhead of polymorphic
function calls (which is often very significant on simple, numerical
container types).
> > A better idea would be to determine exactly what data structure you need:
> > which abstract operations, what performance requirements, etc. C++
> > and Lisp programmers tend to encode everything as arrays or lists,
> > respectively, but quite often these are not the best data structure
> > for the application of interest.
>
> I find your assertion surprising. C++ and Common LISP are by no means
> lacking in standard data structures (and using bare arrays is
> discouraged in C++, as far as I know) and in my experience I haven't
> much seen C++ code that used arrays/vectors when not appropriate.
Yes, I would say that C and Fortran programmers overuse arrays because other
data structures are prohibitively difficult to implement and reuse.
> In any case, in my application (a raytracer) I am reading in lists of
> numbers from a file representing the points of a mesh and the triangles
> that make up the mesh (represented by a list of indices into the mesh
> list). A dynamically resizable, reasonable scalable array seems like
> the best data structure to use.
Why not use a list and then apply Array.of_list?
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-19 12:15 ` Jon Harrop
@ 2006-05-19 19:36 ` akalin
2006-05-19 20:17 ` Oliver Bandel
0 siblings, 1 reply; 67+ messages in thread
From: akalin @ 2006-05-19 19:36 UTC (permalink / raw)
To: Jon Harrop; +Cc: caml-list
> On Friday 19 May 2006 06:57, Frederick Akalin wrote:
>> > A better idea would be to determine exactly what data structure you
>> need:
>> > which abstract operations, what performance requirements, etc. C++
>> > and Lisp programmers tend to encode everything as arrays or lists,
>> > respectively, but quite often these are not the best data structure
>> > for the application of interest.
>>
>> I find your assertion surprising. C++ and Common LISP are by no means
>> lacking in standard data structures (and using bare arrays is
>> discouraged in C++, as far as I know) and in my experience I haven't
>> much seen C++ code that used arrays/vectors when not appropriate.
>
> Yes, I would say that C and Fortran programmers overuse arrays because
> other
> data structures are prohibitively difficult to implement and reuse.
I would agree with you on your statement about C and Fortran. However, I
was talking about C++, which has now grown far beyond its "C with Classes"
legacy. C++'s standard library includes all of the most common data
structures (maps, vectors, lists), so you'd have to go out of your way to
use arrays for everything.
>> In any case, in my application (a raytracer) I am reading in lists of
>> numbers from a file representing the points of a mesh and the triangles
>> that make up the mesh (represented by a list of indices into the mesh
>> list). A dynamically resizable, reasonable scalable array seems like
>> the best data structure to use.
>
> Why not use a list and then apply Array.of_list?
I did try that. Then I ran up against the 4 MB limit. Which led to my
original e-mail complaining about it here... :)
Frederick Akalin
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-19 19:36 ` akalin
@ 2006-05-19 20:17 ` Oliver Bandel
0 siblings, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-19 20:17 UTC (permalink / raw)
To: caml-list
On Fri, May 19, 2006 at 03:36:10PM -0400, akalin@akalin.cx wrote:
> > On Friday 19 May 2006 06:57, Frederick Akalin wrote:
> >> > A better idea would be to determine exactly what data structure you
> >> need:
> >> > which abstract operations, what performance requirements, etc. C++
> >> > and Lisp programmers tend to encode everything as arrays or lists,
> >> > respectively, but quite often these are not the best data structure
> >> > for the application of interest.
> >>
> >> I find your assertion surprising. C++ and Common LISP are by no means
> >> lacking in standard data structures (and using bare arrays is
> >> discouraged in C++, as far as I know) and in my experience I haven't
> >> much seen C++ code that used arrays/vectors when not appropriate.
> >
> > Yes, I would say that C and Fortran programmers overuse arrays because
> > other
> > data structures are prohibitively difficult to implement and reuse.
>
> I would agree with you on your statement about C and Fortran. However, I
> was talking about C++, which has now grown far beyond its "C with Classes"
> legacy. C++'s standard library includes all of the most common data
> structures (maps, vectors, lists), so you'd have to go out of your way to
> use arrays for everything.
But IMHO even today these lists are not type-polymorphc like in
OCaml.
==================
first:~ oliver$ ocaml
Objective Caml version 3.09.0
# List.map;;
- : ('a -> 'b) -> 'a list -> 'b list = <fun>
# List.map (fun x -> x * 3 ) [2; 5;1;22;4];;
- : int list = [6; 15; 3; 66; 12]
# List.map (fun str -> str ^ str ) [ "Hi"; "yes"; "hello"; "world" ];;
- : string list = ["HiHi"; "yesyes"; "hellohello"; "worldworld"]
#
==================
Ciao,
Oliver
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-19 5:57 ` Frederick Akalin
2006-05-19 6:21 ` Oliver Bandel
2006-05-19 12:15 ` Jon Harrop
@ 2006-05-19 16:28 ` Jozef Kosoru
2006-05-19 20:08 ` Oliver Bandel
` (2 more replies)
2 siblings, 3 replies; 67+ messages in thread
From: Jozef Kosoru @ 2006-05-19 16:28 UTC (permalink / raw)
To: caml-list
On Thu, May 18, 2006 at 22:57:30 -0700, Frederick Akalin wrote:
> >Historical digression: this representation decision was initially
> >taken when designing Caml Light in 1989-1990. At that time, even
> >professional workstations ... Little did I know that the 32-bitters
> >would survive so long. Now, it's 2006, and 64-bit processors are
> >becoming universally available, in desktop machines at least. (I've
> >been running an AMD64 PC at home since january 2005 with absolutely
> >zero problems.) So, no the data representations of OCaml are not
> >going to change to lift the array size limit on 32-bit machines.
>
> I see. This is a perfectly reasonable explanation. I ended up just
> using Bigarrays of floats for now and converting from those to 3-d
> vectors on demand (what I need), but it would be nice to have a type
> that wraps around Array that can get around the 4M limit (using an
> array of arrays like someone suggested earlier). It's sad, but I
> think 32-bit is going to be around for a while, and surely I can't be
> the only person to run up against this. :) Not that I mind writing
> such a library and releasing it. I wonder if the Extlib guys would be
> interested...
Yes. 32-bit x86 platform is not going away anytime soon. Given that 512M
RAM is now standard and 1G RAM is very common for an average PC - having
a programming language with 4M limit for the array size is like to have
an 8+3 characters filename limitation on a filesystem using a mainstream
300G disk in 2006. However I understand the techical difficulty to solve
this issue - it's pretty annoying anyhow :)
Jozef
--
jozef kosoru
http://zyzstar.kosoru.com
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-19 16:28 ` Jozef Kosoru
@ 2006-05-19 20:08 ` Oliver Bandel
2006-05-19 21:26 ` Jon Harrop
2006-05-20 0:57 ` [Caml-list] Array 4 MB size limit Brian Hurt
2 siblings, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-19 20:08 UTC (permalink / raw)
To: caml-list
On Fri, May 19, 2006 at 06:28:44PM +0200, Jozef Kosoru wrote:
> On Thu, May 18, 2006 at 22:57:30 -0700, Frederick Akalin wrote:
> > >Historical digression: this representation decision was initially
> > >taken when designing Caml Light in 1989-1990. At that time, even
> > >professional workstations ... Little did I know that the 32-bitters
> > >would survive so long. Now, it's 2006, and 64-bit processors are
> > >becoming universally available, in desktop machines at least. (I've
> > >been running an AMD64 PC at home since january 2005 with absolutely
> > >zero problems.) So, no the data representations of OCaml are not
> > >going to change to lift the array size limit on 32-bit machines.
> >
> > I see. This is a perfectly reasonable explanation. I ended up just
> > using Bigarrays of floats for now and converting from those to 3-d
> > vectors on demand (what I need), but it would be nice to have a type
> > that wraps around Array that can get around the 4M limit (using an
> > array of arrays like someone suggested earlier). It's sad, but I
> > think 32-bit is going to be around for a while, and surely I can't be
> > the only person to run up against this. :) Not that I mind writing
> > such a library and releasing it. I wonder if the Extlib guys would be
> > interested...
>
> Yes. 32-bit x86 platform is not going away anytime soon. Given that 512M
> RAM is now standard and 1G RAM is very common for an average PC - having
> a programming language with 4M limit for the array size is like to have
> an 8+3 characters filename limitation on a filesystem using a mainstream
> 300G disk in 2006. However I understand the techical difficulty to solve
> this issue - it's pretty annoying anyhow :)
[...]
Well, my first computer (VC 20/VIC 20) had about 3,5 kB RAM, was an 8-Bit
processor based computer (6502 CPU :)) and had 1 MHz clock frequency :)
Some years later HP came up with their HP 9000 workstation,
32 Bit processors they had! :)
I thought that some years later I also would have such a beast!
But: It took a too long time.... I was a computer-"hater" for many
years, had jumped into analog technics (electrical engineering)
and only came back to programming at the end of the study, when
I merged eletrotechnics and programming (noise simulation and
analog-digital converting in C under DOS or windows, don't know,
but I think it was a x386 processor machine with maybe 100 MHz clock frequency?
looking for quantization errors, yielded by noise - all simulated in C)
What I wanted to say: yes, the technological "revolution" we so often
hear in the media, is much slower than it could be...
...the fastest machines are created for game playing.
So, when the next playstation comes out, I maybe buy one and install
Linux on it. I hope, that there will be a native OCaml-port for it then :)
Best Regards,
Oliver
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-19 16:28 ` Jozef Kosoru
2006-05-19 20:08 ` Oliver Bandel
@ 2006-05-19 21:26 ` Jon Harrop
2006-05-20 1:06 ` Brian Hurt
2006-05-20 0:57 ` [Caml-list] Array 4 MB size limit Brian Hurt
2 siblings, 1 reply; 67+ messages in thread
From: Jon Harrop @ 2006-05-19 21:26 UTC (permalink / raw)
To: caml-list
On Friday 19 May 2006 17:28, Jozef Kosoru wrote:
> Yes. 32-bit x86 platform is not going away anytime soon. Given that 512M
> RAM is now standard and 1G RAM is very common for an average PC - having
> a programming language with 4M limit for the array size is like to have
> an 8+3 characters filename limitation on a filesystem using a mainstream
> 300G disk in 2006.
I find it more concerning that array length is a function of the type, i.e.
different for float array.
> However I understand the techical difficulty to solve
> this issue - it's pretty annoying anyhow :)
Agreed. Should OCaml's successor have extensible arrays with 64-bit lengths
and strings as char arrays?
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-19 21:26 ` Jon Harrop
@ 2006-05-20 1:06 ` Brian Hurt
2006-05-20 18:32 ` brogoff
2006-05-20 21:11 ` immutable strings (Re: [Caml-list] " Oliver Bandel
0 siblings, 2 replies; 67+ messages in thread
From: Brian Hurt @ 2006-05-20 1:06 UTC (permalink / raw)
To: Jon Harrop; +Cc: caml-list
On Fri, 19 May 2006, Jon Harrop wrote:
> Agreed. Should OCaml's successor have extensible arrays with 64-bit lengths
> and strings as char arrays?
Why not just run Ocaml as a 64-bit app on a 64-bit OS?
We I designing a language today, I'd have 63-bit array lengths- of course,
I'd do it by not bothering to support 32-bit systems...
As for strings, I'd be inclined to make them immutable- the correct way to
manipulate strings is with regular expressions. But I'm widely
acknowledged to be an extremist.
Brian
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 1:06 ` Brian Hurt
@ 2006-05-20 18:32 ` brogoff
2006-05-20 21:29 ` immutable strings II ([Caml-list] Array 4 MB size limit) Oliver Bandel
2006-05-20 21:11 ` immutable strings (Re: [Caml-list] " Oliver Bandel
1 sibling, 1 reply; 67+ messages in thread
From: brogoff @ 2006-05-20 18:32 UTC (permalink / raw)
To: caml-list
On Fri, 19 May 2006, Brian Hurt wrote:
> On Fri, 19 May 2006, Jon Harrop wrote:
>
> > Agreed. Should OCaml's successor have extensible arrays with 64-bit lengths
> > and strings as char arrays?
Strings are random access character collections, so under the hood they might
be arrays, but I think mutable strings as the default string are a mistake.
One can imagine other representations for strings, like ropes. I'd like a
language to allow many different underlying string representations.
Extensible arrays are useful, but they're slower than fixed sized arrays, so
it would be best to provide both.
It would be better if there were no special behavior for float arrays. For
performance, monomorphic FloatArray or even Float32Array, Float64Array,
etc., would be fine with me. And so would the same approach for strings
with different character types. It would get mildly annoying without
overloading, but maybe that would prompt the language designers to address
that issue.
> We I designing a language today, I'd have 63-bit array lengths- of course,
> I'd do it by not bothering to support 32-bit systems...
That would, even today, dramatically reduce the number of potential users
of your language.
> As for strings, I'd be inclined to make them immutable-
I strongly agree with that.
> the correct way to manipulate strings is with regular expressions.
But definitely not with that. REs have their place in the toolbox though.
-- Brian
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: immutable strings II ([Caml-list] Array 4 MB size limit)
2006-05-20 18:32 ` brogoff
@ 2006-05-20 21:29 ` Oliver Bandel
2006-05-22 22:09 ` Aleksey Nogin
0 siblings, 1 reply; 67+ messages in thread
From: Oliver Bandel @ 2006-05-20 21:29 UTC (permalink / raw)
To: caml-list
On Sat, May 20, 2006 at 11:32:15AM -0700, brogoff wrote:
> On Fri, 19 May 2006, Brian Hurt wrote:
> > On Fri, 19 May 2006, Jon Harrop wrote:
> >
> > > Agreed. Should OCaml's successor have extensible arrays with 64-bit lengths
> > > and strings as char arrays?
>
> Strings are random access character collections, so under the hood they might
> be arrays, but I think mutable strings as the default string are a mistake.
> One can imagine other representations for strings, like ropes. I'd like a
> language to allow many different underlying string representations.
I think it would break too much code when changing the string behaviour now.
There should be a keyword "immutable" to make strings immutable.
Your idea would be to make it like records immutable as default
and then the "mutable" word must be used for mutable strings.
This would also work, but then, as mentioned above, this would berak
a lot of aleady existing code.
A compiler switch, that per default uses mutable strings,
so as to provide backwards compatibility, but can be changed
to immutable strings as default.
This default string representation then could be changed with indivdual
used keywords "mutable" and "immutable".
I don't know how much development effort this would be, but it would be a fine thing.
============++=====================++=======================++
\ default || mutable strings || immutable strings ||
\-------+ || as default || as default ||
keyword \ || (compiler switch) || (compiler switch) ||
===========+++=====================++=======================++
./. || all strings are || all strings are ||
|| mutable || immutable ||
============++=====================++=======================++
mutable || all strings are || all strings that ||
|| mutable || are not declared as ||
|| || mutable are immutable ||
============++=====================++=======================++
immutable || all strings that || all strings are ||
|| are not declared as || immutable ||
|| immutable are || ||
|| mutable || ||
============++=====================++=======================++
In short words: The compiler flag would give the default string representation
in use, and the keywords "mutable" and "immutable" used for individual strings
would overwrite that default individally.
And as default for the compiler - for backward compatibility reasons -
would then be used the mutable string representation.
Ciao,
Oliver
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: immutable strings II ([Caml-list] Array 4 MB size limit)
2006-05-20 21:29 ` immutable strings II ([Caml-list] Array 4 MB size limit) Oliver Bandel
@ 2006-05-22 22:09 ` Aleksey Nogin
0 siblings, 0 replies; 67+ messages in thread
From: Aleksey Nogin @ 2006-05-22 22:09 UTC (permalink / raw)
To: Caml List
On 20.05.2006 14:29, Oliver Bandel wrote:
> On Sat, May 20, 2006 at 11:32:15AM -0700, brogoff wrote:
>
>>Strings are random access character collections, so under the hood they might
>>be arrays, but I think mutable strings as the default string are a mistake.
>>One can imagine other representations for strings, like ropes. I'd like a
>>language to allow many different underlying string representations.
>
> I think it would break too much code when changing the string behaviour now.
>
> There should be a keyword "immutable" to make strings immutable.
> Your idea would be to make it like records immutable as default
> and then the "mutable" word must be used for mutable strings.
>
> This would also work, but then, as mentioned above, this would berak
> a lot of aleady existing code.
>
> A compiler switch, that per default uses mutable strings,
> so as to provide backwards compatibility, but can be changed
> to immutable strings as default.
> This default string representation then could be changed with indivdual
> used keywords "mutable" and "immutable".
I would really love to have such a compiler switch (even without the
keywords). We currently have a huge amount of code that assumes that all
the strings are immutable. This is "sort of OK" as long as we do not
interface with anything that would mutate strings, but, of course, I
would love for the compiler to actually enforce this with some sort of
"immutable strings" type.
--
Aleksey Nogin
Home Page: http://nogin.org/
E-Mail: nogin@cs.caltech.edu (office), aleksey@nogin.org (personal)
Office: Moore 04, tel: (626) 395-2200
^ permalink raw reply [flat|nested] 67+ messages in thread
* immutable strings (Re: [Caml-list] Array 4 MB size limit)
2006-05-20 1:06 ` Brian Hurt
2006-05-20 18:32 ` brogoff
@ 2006-05-20 21:11 ` Oliver Bandel
2006-05-25 4:32 ` immutable strings (Re: " Stefan Monnier
1 sibling, 1 reply; 67+ messages in thread
From: Oliver Bandel @ 2006-05-20 21:11 UTC (permalink / raw)
To: caml-list
On Fri, May 19, 2006 at 09:06:27PM -0400, Brian Hurt wrote:
>
>
> On Fri, 19 May 2006, Jon Harrop wrote:
>
> >Agreed. Should OCaml's successor have extensible arrays with 64-bit lengths
> >and strings as char arrays?
>
> Why not just run Ocaml as a 64-bit app on a 64-bit OS?
>
> We I designing a language today, I'd have 63-bit array lengths- of course,
> I'd do it by not bothering to support 32-bit systems...
>
> As for strings, I'd be inclined to make them immutable- the correct way to
> manipulate strings is with regular expressions. But I'm widely
> acknowledged to be an extremist.
IMHO strings should be possibly made immutable by using the "immutable"
keyword, which is the opposite of the "mutable" keyword as it is used for
records. So the user/programmer can chose the kind of strings he7she wants.
Ciao,
Oliver
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: immutable strings (Re: Array 4 MB size limit)
2006-05-20 21:11 ` immutable strings (Re: [Caml-list] " Oliver Bandel
@ 2006-05-25 4:32 ` Stefan Monnier
2006-05-25 5:56 ` [Caml-list] " Martin Jambon
2006-05-25 11:18 ` Brian Hurt
0 siblings, 2 replies; 67+ messages in thread
From: Stefan Monnier @ 2006-05-25 4:32 UTC (permalink / raw)
To: caml-list
> IMHO strings should be possibly made immutable by using the "immutable"
> keyword, which is the opposite of the "mutable" keyword as it is used for
> records. So the user/programmer can chose the kind of strings he7she wants.
I think it's OK to have (mutable) byte arrays, but strings should simply
always be immutable.
Stefan
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 4:32 ` immutable strings (Re: " Stefan Monnier
@ 2006-05-25 5:56 ` Martin Jambon
2006-05-25 7:23 ` j h woodyatt
` (2 more replies)
2006-05-25 11:18 ` Brian Hurt
1 sibling, 3 replies; 67+ messages in thread
From: Martin Jambon @ 2006-05-25 5:56 UTC (permalink / raw)
To: Stefan Monnier; +Cc: caml-list
On Thu, 25 May 2006, Stefan Monnier wrote:
>> IMHO strings should be possibly made immutable by using the "immutable"
>> keyword, which is the opposite of the "mutable" keyword as it is used for
>> records. So the user/programmer can chose the kind of strings he7she wants.
>
> I think it's OK to have (mutable) byte arrays, but strings should simply
> always be immutable.
OCaml strings are compact byte arrays which serve their purpose well.
Having a whole different type for immutable strings is in my opinion a
waste of energy. The problem is that freezing or unfreezing a string
safely involves a copy of the whole string. And obviously it's not
possible to handle only immutable strings since somehow you have to create
them, and unlike record fields, they won't be set in one operation but in
n operations, n being the length of the string.
So I'd really love to see actual examples where using immutable strings
would be such an improvement over mutable strings.
If the problem is just to ensure that string data won't be changed by the
user of a library, then it is trivial using module signatures and
String.copy for the conversions.
Martin
--
Martin Jambon, PhD
http://martin.jambon.free.fr
Edit http://wikiomics.org, bioinformatics wiki
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 5:56 ` [Caml-list] " Martin Jambon
@ 2006-05-25 7:23 ` j h woodyatt
2006-05-25 10:22 ` Jon Harrop
2006-05-25 19:28 ` Oliver Bandel
2006-05-25 11:14 ` Brian Hurt
2006-05-25 17:31 ` Aleksey Nogin
2 siblings, 2 replies; 67+ messages in thread
From: j h woodyatt @ 2006-05-25 7:23 UTC (permalink / raw)
To: The Caml Trade
On May 24, 2006, at 10:56 PM, Martin Jambon wrote:
>
> So I'd really love to see actual examples where using immutable
> strings would be such an improvement over mutable strings.
> If the problem is just to ensure that string data won't be changed
> by the user of a library, then it is trivial using module
> signatures and String.copy for the conversions.
I have no dog in this fight, but I can imagine a pragmatic approach
that might satisfy some of these concerns without introducing much in
the way of extra runtime complexity costs.
Change the way strings are boxed so that there is an additional tag
for immutable strings as opposed to the normal mutable ones. In all
respects, allow string objects with either tag to be treated the same
by functions that do not modify the content of the string. When the
"safe" variants of the string mutation functions are called on a
string object with the immutable tag, throw a runtime exception.
Finally, provide a function that changes the tag of a string from
mutable to immutable, i.e. like a special blessed case of
Obj.set_tag. Not from immutable to mutable, though-- that should be
disallowed. If the -unsafe is not set, then allow a new compiler
option -constliteral (or something) that would make all string
literals immutable. You'd need to use String.copy to get a mutable
string with contents copied from an immutable one.
Of course, I can't offer a convincing reason to believe this would
bring an improvement over what we have today. My proposal would not
give the programmer any assurance that contents of string values
passed as parameters to library functions are not modified by the
library, i.e. the "unsafe" functions would still work. It *might*
give a pragmatic benefit of surfacing unintentional programming
errors earlier than they might otherwise be found, but I'm not
prepared to argue in defense of that.
If you did immutable strings this way, it would be nice for the sake
of consistency to have a comparable feature for arrays.
—
j h woodyatt <jhw@conjury.org>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 7:23 ` j h woodyatt
@ 2006-05-25 10:22 ` Jon Harrop
2006-05-25 19:28 ` Oliver Bandel
1 sibling, 0 replies; 67+ messages in thread
From: Jon Harrop @ 2006-05-25 10:22 UTC (permalink / raw)
To: caml-list
On Thursday 25 May 2006 08:23, j h woodyatt wrote:
> Change the way strings are boxed so that there is an additional tag
> for immutable strings as opposed to the normal mutable ones. In all
> respects, allow string objects with either tag to be treated the same
> by functions that do not modify the content of the string. When the
> "safe" variants of the string mutation functions are called on a
> string object with the immutable tag, throw a runtime exception.
Blech. I didn't come all this way for dynamic typing... ;-)
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 7:23 ` j h woodyatt
2006-05-25 10:22 ` Jon Harrop
@ 2006-05-25 19:28 ` Oliver Bandel
1 sibling, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-25 19:28 UTC (permalink / raw)
To: caml-list
On Thu, May 25, 2006 at 12:23:51AM -0700, j h woodyatt wrote:
> On May 24, 2006, at 10:56 PM, Martin Jambon wrote:
> >
> >So I'd really love to see actual examples where using immutable
> >strings would be such an improvement over mutable strings.
> >If the problem is just to ensure that string data won't be changed
> >by the user of a library, then it is trivial using module
> >signatures and String.copy for the conversions.
>
> I have no dog in this fight, but I can imagine a pragmatic approach
> that might satisfy some of these concerns without introducing much in
> the way of extra runtime complexity costs.
>
> Change the way strings are boxed so that there is an additional tag
> for immutable strings as opposed to the normal mutable ones. In all
> respects, allow string objects with either tag to be treated the same
> by functions that do not modify the content of the string. When the
> "safe" variants of the string mutation functions are called on a
> string object with the immutable tag, throw a runtime exception.
[...]
Compiletime error would be what I would prefer.
[...]
> It *might*
> give a pragmatic benefit of surfacing unintentional programming
> errors earlier than they might otherwise be found,
[...]
The earlier the better it is.
It's like non-matching types found by the compiler: it helps
a lot in development. Debugging normallay takes the most time
in development; and with OCaml debugging time decreases much - that
is what I experienced. And this comes from early getting a hint
to possible problems (possible with compilers that would accept
such code; im possible when the compiler says: "no! I don't accept
these sources!").
Ciao,
Oliver
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 5:56 ` [Caml-list] " Martin Jambon
2006-05-25 7:23 ` j h woodyatt
@ 2006-05-25 11:14 ` Brian Hurt
2006-05-25 19:42 ` Oliver Bandel
2006-05-26 6:51 ` Alain Frisch
2006-05-25 17:31 ` Aleksey Nogin
2 siblings, 2 replies; 67+ messages in thread
From: Brian Hurt @ 2006-05-25 11:14 UTC (permalink / raw)
To: Martin Jambon; +Cc: Stefan Monnier, caml-list
On Wed, 24 May 2006, Martin Jambon wrote:
> And obviously it's not possible to handle only immutable strings since
> somehow you have to create them, and unlike record fields, they won't be
> set in one operation but in n operations, n being the length of the
> string.
This is the advantage of regular expressions- they can be implemented on
immutable strings (regular expressions don't modify the old strings, they
produce new strings), and can change large chunks of the string in one
"operation", limiting the number of unnecessary string copies you need to
do. Yes, the underlying implementation would be mutable- but so is they
underlying implementation of everything else. The semantics remain
applicative.
Note that this also eliminates an inefficiency of ocaml. When you use a
constant string, it has to allocate a new string for you. Consider
Ocaml's response to the expression:
"foo" == "foo";;
The reason ocaml returns false in the above expression is that when you
give a constant string in an expression, Ocaml first has to make a copy of
that string (in case you modify it), and then it returns the copy. So it
makes two copies of the string "foo" above, and then compares them for
referential equality- which, since they're different copies, returns
false.
Note that even pure-imperitive languages, like C/C++, hit this problem.
What is the output of the following peice of C code:
for (i = 0; i < 10; ++i) {
char * p = "foo";
p[0] += 1;
printf("%s\n", p);
}
C/C++ itself wants *some* strings to be immutable. Fortran used to let
you change the value of 2, which made for some interesting (in the bad
sense) code tricks. I comment that they've since fixed this. But mutable
strings allow you to change the value of "two", which is almost as bad.
Brian
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 11:14 ` Brian Hurt
@ 2006-05-25 19:42 ` Oliver Bandel
2006-05-26 6:51 ` Alain Frisch
1 sibling, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-25 19:42 UTC (permalink / raw)
To: caml-list
On Thu, May 25, 2006 at 07:14:55AM -0400, Brian Hurt wrote:
[...]
> Note that even pure-imperitive languages, like C/C++, hit this problem.
> What is the output of the following peice of C code:
> for (i = 0; i < 10; ++i) {
> char * p = "foo";
> p[0] += 1;
> printf("%s\n", p);
> }
[...]
Heheh, this brings a "Bus Error" (which one might call an "exception" ;-)
(an exception th C'ish way)).
Ciao,
Oliver
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 11:14 ` Brian Hurt
2006-05-25 19:42 ` Oliver Bandel
@ 2006-05-26 6:51 ` Alain Frisch
1 sibling, 0 replies; 67+ messages in thread
From: Alain Frisch @ 2006-05-26 6:51 UTC (permalink / raw)
To: Brian Hurt; +Cc: caml-list
Brian Hurt wrote:
> Note that this also eliminates an inefficiency of ocaml. When you use a
> constant string, it has to allocate a new string for you. Consider
> Ocaml's response to the expression:
>
> "foo" == "foo";;
>
> The reason ocaml returns false in the above expression is that when you
> give a constant string in an expression, Ocaml first has to make a copy
> of that string (in case you modify it), and then it returns the copy.
This is not true. The two strings above are physically different because
the two constants appear at different location of the source.
The expression
let f () = "foo" in f () == f ()
evaluates to true. The block for "foo" is allocated only once; the
string is not copied not each time the body of f is evaluated.
As a consequence,
let f () = "foo" in (f ()).[0] <- 'F'; f ()
returns "Foo".
-- Alain
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 5:56 ` [Caml-list] " Martin Jambon
2006-05-25 7:23 ` j h woodyatt
2006-05-25 11:14 ` Brian Hurt
@ 2006-05-25 17:31 ` Aleksey Nogin
2006-05-25 19:54 ` Martin Jambon
2 siblings, 1 reply; 67+ messages in thread
From: Aleksey Nogin @ 2006-05-25 17:31 UTC (permalink / raw)
To: Caml List; +Cc: Martin Jambon
On 24.05.2006 22:56, Martin Jambon wrote:
>> I think it's OK to have (mutable) byte arrays, but strings should simply
>> always be immutable.
>
> OCaml strings are compact byte arrays which serve their purpose well.
Yes, however immutable strings are also very useful and that
functionality is simply missing in OCaml. The usage I am very interested
in is essentially using strings as "printable tokens". In other words, a
data type that is easy to compare and has an obvious I/O representation.
> Having a whole different type for immutable strings is in my opinion a
> waste of energy. The problem is that freezing or unfreezing a string
> safely involves a copy of the whole string. And obviously it's not
> possible to handle only immutable strings since somehow you have to
> create them, and unlike record fields, they won't be set in one
> operation but in n operations, n being the length of the string.
This is not true. All I want is having a purely functional interface with:
- Constants (a compiler flag for turning "..." constants into immutable
strings instead of mutable ones).
- Inputing from a channel
- Concatenation
- Things like string_of_int for immutable string.
Of course, it might be the case that the standard library might have to
use some sort of "unsafe" operations that would "inappropriately" mutate
the newly created immutable string buffer, but this is IMHO no different
than how the unsafe operations are already used in standard library for
arrays and strings.
> So I'd really love to see actual examples where using immutable strings
> would be such an improvement over mutable strings.
> If the problem is just to ensure that string data won't be changed by
> the user of a library, then it is trivial using module signatures and
> String.copy for the conversions.
Such a copy operation can be extremely prohibitive in a setting that
assumes that a data structure is immutable and tries really hard to
preserve sharing (including using functions like a sharing-preserving
version of map (*), etc). In such a setting, these extra copies can
potentially have a devastating effect on memory usage, cache
performance, etc. And this situation is exactly what we have in our
MetaPRL project - there we have resorted to simply using strings and
pretending they are immutable, but this is clearly suboptimal.
----
(*)
let rec smap f = function
[] -> []
| (hd :: tl) as l ->
let hd' = f hd in
let tl' = smap f tl in
if hd == hd' && tl == tl' then l else hd' :: tl'
--
Aleksey Nogin
Home Page: http://nogin.org/
E-Mail: nogin@cs.caltech.edu (office), aleksey@nogin.org (personal)
Office: Moore 04, tel: (626) 395-2200
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 17:31 ` Aleksey Nogin
@ 2006-05-25 19:54 ` Martin Jambon
0 siblings, 0 replies; 67+ messages in thread
From: Martin Jambon @ 2006-05-25 19:54 UTC (permalink / raw)
To: Caml List; +Cc: Martin Jambon
On Thu, 25 May 2006, Aleksey Nogin wrote:
> On 24.05.2006 22:56, Martin Jambon wrote:
>
>>> I think it's OK to have (mutable) byte arrays, but strings should simply
>>> always be immutable.
>> OCaml strings are compact byte arrays which serve their purpose well.
>
> Yes, however immutable strings are also very useful and that functionality is
> simply missing in OCaml. The usage I am very interested in is essentially
> using strings as "printable tokens". In other words, a data type that is easy
> to compare and has an obvious I/O representation.
>
>> Having a whole different type for immutable strings is in my opinion a
>> waste of energy. The problem is that freezing or unfreezing a string safely
>> involves a copy of the whole string. And obviously it's not possible to
>> handle only immutable strings since somehow you have to create them, and
>> unlike record fields, they won't be set in one operation but in n
>> operations, n being the length of the string.
>
> This is not true. All I want is having a purely functional interface with:
> - Constants (a compiler flag for turning "..." constants into immutable
> strings instead of mutable ones).
> - Inputing from a channel
> - Concatenation
> - Things like string_of_int for immutable string.
Isn't it a bit limited? What if I want other functions?
But if it satisfies you, you can do the syntax part with an unsafe_freeze
function and a bit of camlp4. The rest is just plain old OCaml.
> Of course, it might be the case that the standard library might have to use
> some sort of "unsafe" operations that would "inappropriately" mutate the
> newly created immutable string buffer, but this is IMHO no different than how
> the unsafe operations are already used in standard library for arrays and
> strings.
I disagree: has it ever happened to you to mutate a string by accident?
I never met this situation and this is mostly why I don't see the point of
your suggestions. This strongly constrasts with mistakes in array/string
indices which happen all the time.
>> So I'd really love to see actual examples where using immutable strings
>> would be such an improvement over mutable strings.
>> If the problem is just to ensure that string data won't be changed by the
>> user of a library, then it is trivial using module signatures and
>> String.copy for the conversions.
>
> Such a copy operation can be extremely prohibitive in a setting that assumes
> that a data structure is immutable and tries really hard to preserve sharing
> (including using functions like a sharing-preserving version of map (*),
> etc). In such a setting, these extra copies can potentially have a
> devastating effect on memory usage, cache performance, etc. And this
> situation is exactly what we have in our MetaPRL project - there we have
> resorted to simply using strings and pretending they are immutable, but this
> is clearly suboptimal.
Yes, so how do you avoid copies without using the "unsafe" conversions all
over the place?
> ----
> (*)
> let rec smap f = function
> [] -> []
> | (hd :: tl) as l ->
> let hd' = f hd in
> let tl' = smap f tl in
> if hd == hd' && tl == tl' then l else hd' :: tl'
In order to maximize sharing, I'd rather use a global weak hash table.
In your context, it seems that you could afford String.copy, as long as it
doesn't break sharing:
let freeze s =
let s' = make_constant s (* using a copy! *) in
if s' is in the table then return the element from the table
else add s' and return s'
Martin
--
Martin Jambon, PhD
http://martin.jambon.free.fr
Edit http://wikiomics.org, bioinformatics wiki
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 4:32 ` immutable strings (Re: " Stefan Monnier
2006-05-25 5:56 ` [Caml-list] " Martin Jambon
@ 2006-05-25 11:18 ` Brian Hurt
2006-05-25 17:34 ` Aleksey Nogin
1 sibling, 1 reply; 67+ messages in thread
From: Brian Hurt @ 2006-05-25 11:18 UTC (permalink / raw)
To: caml-list
I wish to reiterate that my comments on immutable strings are solely in
the context of designing a new language, and should not be construed to
indicate a desire to change Ocaml. Changing this behavior in Ocaml would
break huge amounts of code (including some of my code).
Brian
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 11:18 ` Brian Hurt
@ 2006-05-25 17:34 ` Aleksey Nogin
2006-05-25 18:44 ` Tom
0 siblings, 1 reply; 67+ messages in thread
From: Aleksey Nogin @ 2006-05-25 17:34 UTC (permalink / raw)
To: Caml List
On 25.05.2006 04:18, Brian Hurt wrote:
> I wish to reiterate that my comments on immutable strings are solely in
> the context of designing a new language, and should not be construed to
> indicate a desire to change Ocaml.
Well, I am advocating a change. IMHO the immutable strings is a very
useful concept and I would really like to see it in OCaml.
> Changing this behavior in Ocaml
> would break huge amounts of code (including some of my code).
It is obvious that at this point changing the default behavior would not
be possible. However, adding the new "immutable strings" type would
definitely be great and adding a compiler flag that would turn all
string constants, "^", etc in a given module into immutable string
operations would be nice as well and would not break any existing code.
--
Aleksey Nogin
Home Page: http://nogin.org/
E-Mail: nogin@cs.caltech.edu (office), aleksey@nogin.org (personal)
Office: Moore 04, tel: (626) 395-2200
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 17:34 ` Aleksey Nogin
@ 2006-05-25 18:44 ` Tom
2006-05-25 23:00 ` Jon Harrop
0 siblings, 1 reply; 67+ messages in thread
From: Tom @ 2006-05-25 18:44 UTC (permalink / raw)
To: Caml List
[-- Attachment #1: Type: text/plain, Size: 609 bytes --]
On 25/05/06, Aleksey Nogin <nogin@cs.caltech.edu> wrote:
>
> On 25.05.2006 04:18, Brian Hurt wrote:
>
> > Changing this behavior in Ocaml
> > would break huge amounts of code (including some of my code).
>
> It is obvious that at this point changing the default behavior would not
> be possible. However, adding the new "immutable strings" type would
> definitely be great and adding a compiler flag that would turn all
> string constants, "^", etc in a given module into immutable string
> operations would be nice as well and would not break any existing code.
But why don't you add such a type yourself?
[-- Attachment #2: Type: text/html, Size: 943 bytes --]
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 18:44 ` Tom
@ 2006-05-25 23:00 ` Jon Harrop
2006-05-25 23:15 ` Martin Jambon
0 siblings, 1 reply; 67+ messages in thread
From: Jon Harrop @ 2006-05-25 23:00 UTC (permalink / raw)
To: caml-list
On Thursday 25 May 2006 19:44, Tom wrote:
[immutable strings]
> But why don't you add such a type yourself?
You would have to make your new type abstract in order to distinguish it from
the build-in mutable string type. Then you lose the ability to do pattern
matching and s.[i].
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-25 23:00 ` Jon Harrop
@ 2006-05-25 23:15 ` Martin Jambon
0 siblings, 0 replies; 67+ messages in thread
From: Martin Jambon @ 2006-05-25 23:15 UTC (permalink / raw)
To: Jon Harrop; +Cc: caml-list
On Fri, 26 May 2006, Jon Harrop wrote:
> On Thursday 25 May 2006 19:44, Tom wrote:
> [immutable strings]
>> But why don't you add such a type yourself?
>
> You would have to make your new type abstract in order to distinguish it from
> the build-in mutable string type. Then you lose the ability to do pattern
> matching and s.[i].
The former problem can be solved with camlp4 (which would use a "unsafe"
function to do the type conversion). The latter can be solved with camlp4
too, or simply by opening a module which defines a String module which
provides a "get" function.
Martin
--
Martin Jambon, PhD
http://martin.jambon.free.fr
Edit http://wikiomics.org, bioinformatics wiki
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-19 16:28 ` Jozef Kosoru
2006-05-19 20:08 ` Oliver Bandel
2006-05-19 21:26 ` Jon Harrop
@ 2006-05-20 0:57 ` Brian Hurt
2006-05-20 1:17 ` Frederick Akalin
` (2 more replies)
2 siblings, 3 replies; 67+ messages in thread
From: Brian Hurt @ 2006-05-20 0:57 UTC (permalink / raw)
To: Jozef Kosoru; +Cc: caml-list
On Fri, 19 May 2006, Jozef Kosoru wrote:
> Yes. 32-bit x86 platform is not going away anytime soon. Given that 512M
> RAM is now standard and 1G RAM is very common for an average PC
This is what boggles my imagination. The combination of obstinate
adherence to a limited platform (x86-32) in the same breath as a
recognition that we are approaching the limitations of that architecture.
It's the 640K limit all over again.
And no, segments will not help the situation. Every single process is
limited to 4G of address space, period. Read the Intel CPU docs. With
reasonable amounts of virtual memory we're well above that already- and
we're approaching that with real memory.
Now, I realize the core reasons for the delay in moving to 64-bits are
industry wide. Intel's Itanium fiasco delayed Intel introducing a 64-bit
chip at least 7 years. And Microsoft seems to be incapable of releasing a
new OS- 32-bit or 64-bit. But that, I think, is the core of the problem-
Ocaml's array limit is just one of many symptoms.
And that's my point- we should be looking to fix the underlying problem,
not looking to patch the symptoms. Because often times patching the
symptoms and not addressing the core problem simply makes the whole
situation worse- the underlying problem simply shows up in new ways, and
the fix for the specific symptom often causes new problems.
Brian
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 0:57 ` [Caml-list] Array 4 MB size limit Brian Hurt
@ 2006-05-20 1:17 ` Frederick Akalin
2006-05-20 1:52 ` Brian Hurt
2006-05-21 9:26 ` Richard Jones
2006-05-20 10:51 ` Jozef Kosoru
2006-05-20 15:15 ` Don Syme
2 siblings, 2 replies; 67+ messages in thread
From: Frederick Akalin @ 2006-05-20 1:17 UTC (permalink / raw)
To: Brian Hurt; +Cc: caml-list
On May 19, 2006, at 5:57 PM, Brian Hurt wrote:
> Now, I realize the core reasons for the delay in moving to 64-bits
> are industry wide. Intel's Itanium fiasco delayed Intel
> introducing a 64-bit chip at least 7 years. And Microsoft seems to
> be incapable of releasing a new OS- 32-bit or 64-bit. But that, I
> think, is the core of the problem- Ocaml's array limit is just one
> of many symptoms.
>
> And that's my point- we should be looking to fix the underlying
> problem, not looking to patch the symptoms. Because often times
> patching the symptoms and not addressing the core problem simply
> makes the whole situation worse- the underlying problem simply
> shows up in new ways, and the fix for the specific symptom often
> causes new problems.
>
> Brian
I think that's an awfully simplistic point of view. My problem is
that I want to store more than 4 million items in an array. You're
suggesting moving to 64 bits? And even if everyone magically moves
to 64 bits, I can imagine that in the future someone will want to
read more than 2^(64 - 10) items from a file into memory, as 64 bits
isn't quite "number of atoms in the universe" big yet.
Fred
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 1:17 ` Frederick Akalin
@ 2006-05-20 1:52 ` Brian Hurt
2006-05-20 9:08 ` Jozef Kosoru
2006-05-21 9:26 ` Richard Jones
1 sibling, 1 reply; 67+ messages in thread
From: Brian Hurt @ 2006-05-20 1:52 UTC (permalink / raw)
To: Frederick Akalin; +Cc: caml-list
On Fri, 19 May 2006, Frederick Akalin wrote:
> I think that's an awfully simplistic point of view. My problem is that I
> want to store more than 4 million items in an array. You're suggesting
> moving to 64 bits? And even if everyone magically moves to 64 bits, I can
> imagine that in the future someone will want to read more than 2^(64 - 10)
> items from a file into memory, as 64 bits isn't quite "number of atoms in the
> universe" big yet.
And in 20 years or so, assuming Moore's law holds, we'll need to
transition to 128-bit architectures. Servers should probably transistion
a little sooner, maybe 15 years out.
Of course, a large hunk of my annoyance here is comming from C-99 and long
long- breaking conformant code in a desperate (and- might I add- futile)
attempt to patch broken and non-conformant code around what will be a
temporary problem (the only question left is the definition of temporary).
Eventually, just going to 64 bits will be the solution- just like just
going to 32 bits was the solution to the 640K limit.
Brian
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 1:52 ` Brian Hurt
@ 2006-05-20 9:08 ` Jozef Kosoru
2006-05-20 10:12 ` skaller
[not found] ` <Pine.LNX.4.63.0605200847530.10710@localhost.localdomain>
0 siblings, 2 replies; 67+ messages in thread
From: Jozef Kosoru @ 2006-05-20 9:08 UTC (permalink / raw)
To: Brian Hurt; +Cc: caml-list
On Fri, May 19, 2006 at 21:52:26 -0400, Brian Hurt wrote:
> >I think that's an awfully simplistic point of view. My problem is
> >that I want to store more than 4 million items in an array. You're
> >suggesting moving to 64 bits? And even if everyone magically moves
> >to 64 bits, I can imagine that in the future someone will want to
> >read more than 2^(64 - 10) items from a file into memory, as 64 bits
> >isn't quite "number of atoms in the universe" big yet.
>
> And in 20 years or so, assuming Moore's law holds, we'll need to
> transition to 128-bit architectures. Servers should probably
> transistion a little sooner, maybe 15 years out.
Sure. Average consumers in 20 years, servers in 15 years and OCaml users
in 5 years (remember N-10!).
jozef
--
jozef kosoru
http://zyzstar.kosoru.com
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 9:08 ` Jozef Kosoru
@ 2006-05-20 10:12 ` skaller
2006-05-20 11:06 ` Jozef Kosoru
2006-05-20 21:42 ` Oliver Bandel
[not found] ` <Pine.LNX.4.63.0605200847530.10710@localhost.localdomain>
1 sibling, 2 replies; 67+ messages in thread
From: skaller @ 2006-05-20 10:12 UTC (permalink / raw)
To: Jozef Kosoru; +Cc: Brian Hurt, caml-list
On Sat, 2006-05-20 at 11:08 +0200, Jozef Kosoru wrote:
> Sure. Average consumers in 20 years, servers in 15 years and OCaml users
> in 5 years (remember N-10!).
I'm not so sure it will happen like that. Look more at:
Today, newer desktops are all dual core. We've gone from
1 processor to 2. How long will it take to get 4?
It surely will not be 20 years.
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 10:12 ` skaller
@ 2006-05-20 11:06 ` Jozef Kosoru
2006-05-20 12:02 ` skaller
2006-05-20 21:42 ` Oliver Bandel
1 sibling, 1 reply; 67+ messages in thread
From: Jozef Kosoru @ 2006-05-20 11:06 UTC (permalink / raw)
To: skaller; +Cc: caml-list
On Sat, May 20, 2006 at 20:12:22 +1000, skaller wrote:
> > Sure. Average consumers in 20 years, servers in 15 years and OCaml
> > users in 5 years (remember N-10!).
>
> I'm not so sure it will happen like that. Look more at:
>
> Today, newer desktops are all dual core. We've gone from
> 1 processor to 2. How long will it take to get 4?
Yes, but that doesn't change the fact 99% of PCs out there still have a
single core CPU. Besides, adding more cores on the chip is backward
compatible, it isn't the quite same thing as changing the CPU
architecture.
> It surely will not be 20 years.
I think the migration from 64-bit to 128-bit architecture will take more
than 20 years. But such things are generally very hard to predict.
--
jozef kosoru
http://zyzstar.kosoru.com
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 11:06 ` Jozef Kosoru
@ 2006-05-20 12:02 ` skaller
0 siblings, 0 replies; 67+ messages in thread
From: skaller @ 2006-05-20 12:02 UTC (permalink / raw)
To: Jozef Kosoru; +Cc: caml-list
On Sat, 2006-05-20 at 13:06 +0200, Jozef Kosoru wrote:
>
> > It surely will not be 20 years.
>
> I think the migration from 64-bit to 128-bit architecture will take more
> than 20 years. But such things are generally very hard to predict.
I'm not disputing that -- I was asking how long it will
take until we have 4-core CPU chips in commercial quantities
for desktops.
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 10:12 ` skaller
2006-05-20 11:06 ` Jozef Kosoru
@ 2006-05-20 21:42 ` Oliver Bandel
2006-05-21 1:24 ` skaller
1 sibling, 1 reply; 67+ messages in thread
From: Oliver Bandel @ 2006-05-20 21:42 UTC (permalink / raw)
To: caml-list
On Sat, May 20, 2006 at 08:12:22PM +1000, skaller wrote:
> On Sat, 2006-05-20 at 11:08 +0200, Jozef Kosoru wrote:
>
> > Sure. Average consumers in 20 years, servers in 15 years and OCaml users
> > in 5 years (remember N-10!).
>
> I'm not so sure it will happen like that. Look more at:
>
> Today, newer desktops are all dual core. We've gone from
> 1 processor to 2. How long will it take to get 4?
>
> It surely will not be 20 years.
[...]
Two dual core G5:
http://www.apple.com/powermac/
Ciao,
Oliver
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 21:42 ` Oliver Bandel
@ 2006-05-21 1:24 ` skaller
2006-05-21 14:10 ` Oliver Bandel
0 siblings, 1 reply; 67+ messages in thread
From: skaller @ 2006-05-21 1:24 UTC (permalink / raw)
To: Oliver Bandel; +Cc: caml-list
On Sat, 2006-05-20 at 23:42 +0200, Oliver Bandel wrote:
> On Sat, May 20, 2006 at 08:12:22PM +1000, skaller wrote:
> Two dual core G5:
>
> http://www.apple.com/powermac/
We're looking at how N will increase in N-core CPU chips.
There are plenty of boards that support
multiple CPUs (you can get a 8x Opteron board, that's 16 cores,
and not really that expensive -- the CPUs are though :).
'In theory' CISC should die, RISC should support higher N
on the same wafer.
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-21 1:24 ` skaller
@ 2006-05-21 14:10 ` Oliver Bandel
0 siblings, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-21 14:10 UTC (permalink / raw)
To: caml-list
On Sun, May 21, 2006 at 11:24:34AM +1000, skaller wrote:
> On Sat, 2006-05-20 at 23:42 +0200, Oliver Bandel wrote:
> > On Sat, May 20, 2006 at 08:12:22PM +1000, skaller wrote:
>
> > Two dual core G5:
> >
> > http://www.apple.com/powermac/
>
> We're looking at how N will increase in N-core CPU chips.
>
> There are plenty of boards that support
> multiple CPUs (you can get a 8x Opteron board, that's 16 cores,
> and not really that expensive -- the CPUs are though :).
>
> 'In theory' CISC should die, RISC should support higher N
> on the same wafer.
[...]
Please do not forget that the G5 is a 64-Bit processor!
http://www.apple.com/g5processor/
So if you have a 2 x DualCore G5 then you have
four 64-Bit Cores on your computer.
Even when the Processors then switch to 64 Bits,
the next problem would be to have RAMs available
that are big enough to use that.
Ciao,
Oliver
^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <Pine.LNX.4.63.0605200847530.10710@localhost.localdomain>]
* Re: [Caml-list] Array 4 MB size limit
[not found] ` <Pine.LNX.4.63.0605200847530.10710@localhost.localdomain>
@ 2006-05-20 19:52 ` Jozef Kosoru
2006-05-20 21:45 ` Oliver Bandel
0 siblings, 1 reply; 67+ messages in thread
From: Jozef Kosoru @ 2006-05-20 19:52 UTC (permalink / raw)
To: Brian Hurt; +Cc: caml-list
On Sat, May 20, 2006 at 09:02:56 -0400, Brian Hurt wrote:
> >Sure. Average consumers in 20 years, servers in 15 years and OCaml
> >users in 5 years (remember N-10!).
>
> No. Ocaml users in about 20 years.
>
> The math:
>
> [...]
>
> Ocaml users actually won't start hitting problems until after that-
> probably more like 2050 or so. It's N-10, remember? It's not until
> memory sizes start hitting the peta-byte sizes- 2^51 elements being
> held in 2^54 bytes of memory, that 64-bit Ocaml starts hitting
> problems. 10 bits out of 32 is a much larger percentage than 10 bits
> out of 64.
>
> [...]
>
> And remember that this is assuming that Moore's law continues in it's
> exponential pace for the next 50-odd years or more. I'd have to rate
> this as *highly* unlikely. If we fall off of the exponential growth
> curve at any point much before 2040 or so, the switch to 128-bit
> architectures may not happen (or may not happen for centuries).
>
> Sorry, the alarmist "if 32 bits isn't enough, than neither is 64!"
> argument doesn't hold water.
Ok, perhaps you are correct. But that doesn't help programmers facing
this problem on 32-bit x86 architecture now.
--
jozef kosoru
http://zyzstar.kosoru.com
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 1:17 ` Frederick Akalin
2006-05-20 1:52 ` Brian Hurt
@ 2006-05-21 9:26 ` Richard Jones
[not found] ` <5CE30707-5DCE-4A22-970E-A49C36F9C901@akalin.cx>
1 sibling, 1 reply; 67+ messages in thread
From: Richard Jones @ 2006-05-21 9:26 UTC (permalink / raw)
To: caml-list
On Fri, May 19, 2006 at 06:17:38PM -0700, Frederick Akalin wrote:
> I think that's an awfully simplistic point of view. My problem is
> that I want to store more than 4 million items in an array. You're
> suggesting moving to 64 bits?
We moved our servers and development machines to 64 bits (AMD64
specifically) a while back and haven't regretted the decision at all.
You can buy low-end 64 bit processors and motherboards for peanuts
these days.
Rich.
--
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 0:57 ` [Caml-list] Array 4 MB size limit Brian Hurt
2006-05-20 1:17 ` Frederick Akalin
@ 2006-05-20 10:51 ` Jozef Kosoru
2006-05-20 14:22 ` Brian Hurt
2006-05-20 22:07 ` Oliver Bandel
2006-05-20 15:15 ` Don Syme
2 siblings, 2 replies; 67+ messages in thread
From: Jozef Kosoru @ 2006-05-20 10:51 UTC (permalink / raw)
To: Brian Hurt; +Cc: caml-list
On Fri, May 19, 2006 at 20:57:35 -0400, Brian Hurt wrote:
> >Yes. 32-bit x86 platform is not going away anytime soon. Given that
> >512M RAM is now standard and 1G RAM is very common for an average PC
>
> This is what boggles my imagination. The combination of obstinate
> adherence to a limited platform (x86-32) in the same breath as a
> recognition that we are approaching the limitations of that
> architecture. It's the 640K limit all over again.
>
> And no, segments will not help the situation. Every single process is
> limited to 4G of address space, period. Read the Intel CPU docs.
> With reasonable amounts of virtual memory we're well above that
> already- and we're approaching that with real memory.
4G address space limit is off-topic here. Please note that the OCaml
array size limit is 4M not 4G.
> Now, I realize the core reasons for the delay in moving to 64-bits are
> industry wide. Intel's Itanium fiasco delayed Intel introducing a
> 64-bit chip at least 7 years. And Microsoft seems to be incapable of
> releasing a new OS- 32-bit or 64-bit. But that, I think, is the core
> of the problem- Ocaml's array limit is just one of many symptoms.
>
> And that's my point- we should be looking to fix the underlying
> problem, not looking to patch the symptoms. Because often times
> patching the symptoms and not addressing the core problem simply makes
> the whole situation worse- the underlying problem simply shows up in
> new ways, and the fix for the specific symptom often causes new
> problems.
I disagree. Actually I think an opposite. A suggestion to move to 64-bit
is just patching the symptoms of the real problem - a design flaw in the
OCaml programming language. And this issue will not go away on 64-bits.
Just like 22 bits for the array size is a completely artificial limit on
32-bit platforms, (N - 10) imposes once again such an unlogical 54 bits
limit on 64-bit computers. And I don't think "it makes complete sense on
a 64-bit architecture".
If you create OCaml programs for yourself then maybe you can switch to
64-bits overnight (and perpahs you already did so). But if you need to
distribute your application to customers/users then it's much more
complicated. And explaining them that they should upgrade all
workstations because your software has various 4M limitations here and
there sounds like a bad joke, doesn't it?
32-bit x86 platform is still good enough for many purposes and its 2^32
limit is acceptable in most cases while 2^22 is not.
Jozef
--
jozef kosoru
http://zyzstar.kosoru.com
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 10:51 ` Jozef Kosoru
@ 2006-05-20 14:22 ` Brian Hurt
2006-05-20 18:41 ` j h woodyatt
` (2 more replies)
2006-05-20 22:07 ` Oliver Bandel
1 sibling, 3 replies; 67+ messages in thread
From: Brian Hurt @ 2006-05-20 14:22 UTC (permalink / raw)
To: Jozef Kosoru; +Cc: caml-list
On Sat, 20 May 2006, Jozef Kosoru wrote:
> 4G address space limit is off-topic here. Please note that the OCaml
> array size limit is 4M not 4G.
It is entirely on-topic. How much headroom do you think you have?
Let's perform an experiment here. Let's assume that Xavier decides you're
right, and releases a new version of Ocaml with removes the 4M array
limit. How big of an array will you be able to create then? More than
4M, yes- but how much more?
The hardware imposes a hard and fast 4G limit. On 32-bit x86 you can get
a process larger than this. But note, this includes everything- not just
4G of data, but also all code, stack, shared libraries, address space set
aside for the OS, everything. The OS alone wants to grab at least 1G, if
not 2G of address space (remember that it needs to mmap that hot new 512M
graphics card you just bought- so claiming 1G as a minimum isn't extreme,
as 512M will go to the graphics card). To make life simple, let's say
that everything except the huge array will fit in the upper 2G, leaving 2G
for the huge array.
Now if it's a huge array of integers, each element only takes up 4 bytes,
which means we can fit 512M integers in this array. This sounds nice- on
paper. Wow, we could make an array 128 times as big as we can now!
Except the proper way to look at this is that it's a mean 7 bits larger.
And that's best case.
If we changed it to an array of floats, each float now takes 8 bytes, and
that 2G can only hold 256M floats. Our 128x improvement has now dropped
to a 64x improvement- and we're starting to get a bad feeling about where
this is going.
If we start trying to hold 3-dimensional pointers, that's 3 floats- and
our 64x improvement has now dropped to about a 21x improvement. Skaller's
about to pipe up right about here and proclaim the advantages of
single-precision floats, which would get you back up to about a 42x
improvement. But also note that I'm ignoring the boxing costs here-
adding the tag word for the structure plus the pointer in the array (a
structure is boxed) ups us from a size of 24 bytes to 32 bytes (for double
precision- 12 bytes to 20 bytes for single precision), making our 21x
improvement (42x improvement) a mere 16x (25x) improvement.
Now, what happens if we try to hold a triangle of three points (i.e. for
3D polygon mapping)? How many polygons can we hold? Well, a polygon
takes 9 floats (3 floats each for 3 points), so that's 72 bytes there.
Assume the polygon is a structure and each point is a structure, and that
means 4 tag words (1 for each of the 3 points + 1 for the polygon
structure) and 4 pointers of 4 bytes each, for a grand total of 104 bytes
per polygon. 2G can now hold about 19M unique polygons- a mere 5x
advantage.
The 4M limit is "artificial" and "low"- but not by as much as you might
think.
And the 4G limit starts hitting you in other ways. Consider the classic
programming pattern of reading the entire file into a list (or array) of
strings. Considering that I can get a 300G hard disk for about $110:
http://froogle.google.com/froogle_cluster?q=SATA+hard+disk&pid=4829187676318423498&oid=15465117802862862360&btnG=Search+Froogle&scoring=mrd&hl=en
It costs about $0.73USD to buy 2G of hard disk space. I try to slurp that
73 cent file into memory, and boom- welcome to the 4G limit. Heck, you
have to do special incantations just to open and read a file that big, let
only trying to store the whole thing in memory. This is why I claim that
once hard disks start getting larger than the address space, the address
space needs to increase.
And I'm not even going to go into "subtle" bugs, like ptrdiff_t overflows,
or the fact that as the adress space fills, it's more likely a wild
pointer will point to valid memory, and thus cause harder to debug
problems than simple segfaults, here.
On the other side, increasing the array size has a cost. Among other
things, it slows down garbage collection. If done badly, it could break
existing code. The C-99 standard did this- by introducing long long, it
*silently* broke conformant code. I've yet to forgive them for doing
this. More to the point, it silently broke *my* code. Which is why I'm
violently allergic to suggestions that we "patch around" the 32-bit
limitation. When people suggest this, I tend to hear "I want to break
your code". Because that's how it worked last time.
> I disagree. Actually I think an opposite. A suggestion to move to 64-bit
> is just patching the symptoms of the real problem - a design flaw in the
> OCaml programming language. And this issue will not go away on 64-bits.
> Just like 22 bits for the array size is a completely artificial limit on
> 32-bit platforms, (N - 10) imposes once again such an unlogical 54 bits
> limit on 64-bit computers. And I don't think "it makes complete sense on
> a 64-bit architecture".
By this logic, Ocaml should be able to support array sizes of 10^100th on
16-bit platforms. Why is Ocaml imposing stupid artificial limits?
I note with humor that the Java programming language a) defined ints to be
32 bits, and b) made the index type for arrays, vectors, and many other
data structures ints and not longs. When you move to 64 bits, it's Java,
much more so than Ocaml, that hits artificial and low limits on array
size. Ocaml will be able to have arrays millions of times larger than the
largest possible array in Java.
> If you create OCaml programs for yourself then maybe you can switch to
> 64-bits overnight (and perpahs you already did so). But if you need to
> distribute your application to customers/users then it's much more
> complicated. And explaining them that they should upgrade all
> workstations because your software has various 4M limitations here and
> there sounds like a bad joke, doesn't it?
You call them 32-bit limits, which they are. I admit that this is a
problem- but it's not one created by Ocaml.
>
> 32-bit x86 platform is still good enough for many purposes and its 2^32
> limit is acceptable in most cases while 2^22 is not.
Except that you're here comparing apples to oranges again. The 2^22 limit
is in *words*, while the 2^32 limit is in *bytes*. So it's really more of
a 2^24 vr.s 2^31 comparison.
Brian
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 14:22 ` Brian Hurt
@ 2006-05-20 18:41 ` j h woodyatt
2006-05-20 19:37 ` Jon Harrop
2006-05-20 20:47 ` Jozef Kosoru
2006-05-26 18:34 ` Ken Rose
2 siblings, 1 reply; 67+ messages in thread
From: j h woodyatt @ 2006-05-20 18:41 UTC (permalink / raw)
To: The Caml Trade
On May 20, 2006, at 7:22 AM, Brian Hurt wrote:
>
> Consider the classic programming pattern of reading the entire file
> into a list (or array) of strings. [...] boom- welcome to the 4G
> limit. Heck, you have to do special incantations just to open and
> read a file that big, let only trying to store the whole thing in
> memory. This is why I claim that once hard disks start getting
> larger than the address space, the address space needs to increase.
Sounds like it's an *anti*-pattern to me. The offset parameter to
mmap(2) is there for a reason.
There is, however, another interesting point I hope the Caml team at
INRIA keeps in mind. I hear rumors that 64-bit processes on
[redacted] may not have access to all the application support
libraries. Some of them, e.g. [redacted] and [redacted], may only be
available to 32-bit processes-- for technical reasons, not business
reasons. The engineering resources to make the libraries available
to 64-bit processes are there, but the technical case for making 64-
bit versions of the libraries is apparently losing. At least for the
time being. I would be unsurprised if technical decisions like this
are happening all over the application services programming world.
What this tells me is that 64-bit hardware and 64-bit operating
systems will very likely continue to have 32-bit processes running on
them for some time to come. We will be living with 32-bit address
spaces for many years after everyone gets 64-bit operating systems
and 64-bit desktop machines.
All that said, I've got little sympathy for the critics on this
issue. The 4 MB array size limit is a fact of OCaml life, but
there's a reasonable workaround that will get most application
programmers around it, i.e. careful use of Bigarray. The 4G size
limit is another story altogether, and it will plague everyone
equally-- not just the OCaml world.
—
j h woodyatt <jhw@conjury.org>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 18:41 ` j h woodyatt
@ 2006-05-20 19:37 ` Jon Harrop
0 siblings, 0 replies; 67+ messages in thread
From: Jon Harrop @ 2006-05-20 19:37 UTC (permalink / raw)
To: caml-list
On Saturday 20 May 2006 19:41, j h woodyatt wrote:
> The 4G size
> limit is another story altogether, and it will plague everyone
> equally-- not just the OCaml world.
When I've ported C++ programs to OCaml they often use ~2x as much memory, so
you could argue that a memory limit is more limiting for OCaml users.
However, my personal impression is that memory has gone from the scarce
resource that it was 20 years ago to a resource that is now commonly wasted
and moving from C++ to OCaml certainly fits in with that trend.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 14:22 ` Brian Hurt
2006-05-20 18:41 ` j h woodyatt
@ 2006-05-20 20:47 ` Jozef Kosoru
2006-05-26 18:34 ` Ken Rose
2 siblings, 0 replies; 67+ messages in thread
From: Jozef Kosoru @ 2006-05-20 20:47 UTC (permalink / raw)
To: Brian Hurt; +Cc: caml-list
On Sat, May 20, 2006 at 10:22:42 -0400, Brian Hurt wrote:
> >4G address space limit is off-topic here. Please note that the OCaml
> >array size limit is 4M not 4G.
>
> It is entirely on-topic. How much headroom do you think you have?
>
> Let's perform an experiment here. Let's assume that Xavier decides you're
> right, and releases a new version of Ocaml with removes the 4M array
> limit. How big of an array will you be able to create then? More than
> 4M, yes- but how much more?
>
> The hardware imposes a hard and fast 4G limit. [...] To make life
> simple, let's say that everything except the huge array will fit in
> the upper 2G, leaving 2G for the huge array.
>
> Now if it's a huge array of integers, each element only takes up 4
> bytes, which means we can fit 512M integers in this array. [...] Wow,
> we could make an array 128 times as big as we can now! Except the
> proper way to look at this is that it's a mean 7 bits larger. And
> that's best case.
>
> If we changed it to an array of floats, each float now takes 8 bytes, and
> that 2G can only hold 256M floats. Our 128x improvement has now dropped
> to a 64x improvement- and we're starting to get a bad feeling about where
> this is going.
>
> If we start trying to hold 3-dimensional pointers, that's 3 floats-
> and our 64x improvement has now dropped to about a 21x improvement.
> [...] making our 21x improvement (42x improvement) a mere 16x (25x)
> improvement.
>
> Now, what happens if we try to hold a triangle of three points (i.e.
> for 3D polygon mapping)? [...] 2G can now hold about 19M unique
> polygons- a mere 5x advantage.
Thank you for elaborating on this issue and providing us some
interesting numbers. I think your results are valid. So it is perfectly
possible to find examples in a practice where there is a need to create
arrays from 5x to 128x bigger than OCaml allows. And that is the
problem.
> [...]
>
> >I disagree. Actually I think an opposite. A suggestion to move to
> >64-bit is just patching the symptoms of the real problem - a design
> >flaw in the OCaml programming language. And this issue will not go
> >away on 64-bits. Just like 22 bits for the array size is a
> >completely artificial limit on 32-bit platforms, (N - 10) imposes
> >once again such an unlogical 54 bits limit on 64-bit computers. And I
> >don't think "it makes complete sense on a 64-bit architecture".
>
> By this logic, Ocaml should be able to support array sizes of 10^100th
> on 16-bit platforms. Why is Ocaml imposing stupid artificial limits?
No. An above mentioned logic doesn't imply that. It should be 2^16 on
16-bits of course.
> I note with humor that the Java programming language a) defined ints
> to be 32 bits, and b) made the index type for arrays, vectors, and
> many other data structures ints and not longs. When you move to 64
> bits, it's Java, much more so than Ocaml, that hits artificial and low
> limits on array size. Ocaml will be able to have arrays millions of
> times larger than the largest possible array in Java.
Yeah, but we don't expect a lot of Java fans on this mailing list anyhow
;)
> >If you create OCaml programs for yourself then maybe you can switch
> >to 64-bits overnight (and perpahs you already did so). But if you
> >need to distribute your application to customers/users then it's much
> >more complicated. And explaining them that they should upgrade all
> >workstations because your software has various 4M limitations here
> >and there sounds like a bad joke, doesn't it?
>
> You call them 32-bit limits, which they are. I admit that this is a
> problem- but it's not one created by Ocaml.
It's been created by OCaml. CPU architecture is not obliged to have word
wide enough to allow programming languages to store some meta data along
with array indexes. That was OCaml "invention". An argument that 32-bit
is a crappy platform doesn't really hold. As Xavier acknowledged - OCaml
was not designed with 32-bit CPUs in mind; and that's the thing.
> >32-bit x86 platform is still good enough for many purposes and its
> >2^32 limit is acceptable in most cases while 2^22 is not.
>
> Except that you're here comparing apples to oranges again. The 2^22
> limit is in *words*, while the 2^32 limit is in *bytes*. So it's
> really more of a 2^24 vr.s 2^31 comparison.
My point was that 2^32 (or 2^31 if you want) is usually enough to count
both apples and oranges whereas 2^22 is barely enough to count apple
trees.
But let's close this thread - I think we have discussed the problem into
great details and the final conclusion might be that consensus is not
possible. :)
Jozef
--
jozef kosoru
http://zyzstar.kosoru.com
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 14:22 ` Brian Hurt
2006-05-20 18:41 ` j h woodyatt
2006-05-20 20:47 ` Jozef Kosoru
@ 2006-05-26 18:34 ` Ken Rose
2 siblings, 0 replies; 67+ messages in thread
From: Ken Rose @ 2006-05-26 18:34 UTC (permalink / raw)
To: Brian Hurt; +Cc: Jozef Kosoru, caml-list
Brian Hurt wrote:
> On the other side, increasing the array size has a cost. Among other
> things, it slows down garbage collection. If done badly, it could break
> existing code. The C-99 standard did this- by introducing long long, it
> *silently* broke conformant code. I've yet to forgive them for doing
> this. More to the point, it silently broke *my* code. Which is why I'm
> violently allergic to suggestions that we "patch around" the 32-bit
> limitation. When people suggest this, I tend to hear "I want to break
> your code". Because that's how it worked last time.
Maybe I'm being dense here, and maybe this is off-topic, but how did
long long break things in C99?
Thanks
- ken
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 10:51 ` Jozef Kosoru
2006-05-20 14:22 ` Brian Hurt
@ 2006-05-20 22:07 ` Oliver Bandel
1 sibling, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-20 22:07 UTC (permalink / raw)
To: caml-list
On Sat, May 20, 2006 at 12:51:08PM +0200, Jozef Kosoru wrote:
> On Fri, May 19, 2006 at 20:57:35 -0400, Brian Hurt wrote:
> > >Yes. 32-bit x86 platform is not going away anytime soon. Given that
> > >512M RAM is now standard and 1G RAM is very common for an average PC
> >
> > This is what boggles my imagination. The combination of obstinate
> > adherence to a limited platform (x86-32) in the same breath as a
> > recognition that we are approaching the limitations of that
> > architecture. It's the 640K limit all over again.
> >
> > And no, segments will not help the situation. Every single process is
> > limited to 4G of address space, period. Read the Intel CPU docs.
> > With reasonable amounts of virtual memory we're well above that
> > already- and we're approaching that with real memory.
>
> 4G address space limit is off-topic here. Please note that the OCaml
> array size limit is 4M not 4G.
>
> > Now, I realize the core reasons for the delay in moving to 64-bits are
> > industry wide. Intel's Itanium fiasco delayed Intel introducing a
> > 64-bit chip at least 7 years. And Microsoft seems to be incapable of
> > releasing a new OS- 32-bit or 64-bit. But that, I think, is the core
> > of the problem- Ocaml's array limit is just one of many symptoms.
> >
> > And that's my point- we should be looking to fix the underlying
> > problem, not looking to patch the symptoms. Because often times
> > patching the symptoms and not addressing the core problem simply makes
> > the whole situation worse- the underlying problem simply shows up in
> > new ways, and the fix for the specific symptom often causes new
> > problems.
>
> I disagree. Actually I think an opposite. A suggestion to move to 64-bit
> is just patching the symptoms of the real problem - a design flaw in the
> OCaml programming language.
[...]
No.
It would be a design flaw when future processors would make bigger problems.
You may have missed the historical note from Xavier: OCaml is a BACK-port.
What kind of problems would you run into, when backporting your current
applications to a 16 or 8 Bit processor?
Would you call that breaking code degned by flaw?
The year 2k problem we have left behind, but the POSIX timer (derived
from the Unix time) will overflow in 2038.
So, he have some years to switch to 64 Bit platforms,
before the year 2038 problem will make trouble ;-)
Ciao,
Oliver
^ permalink raw reply [flat|nested] 67+ messages in thread
* RE: [Caml-list] Array 4 MB size limit
2006-05-20 0:57 ` [Caml-list] Array 4 MB size limit Brian Hurt
2006-05-20 1:17 ` Frederick Akalin
2006-05-20 10:51 ` Jozef Kosoru
@ 2006-05-20 15:15 ` Don Syme
2006-05-20 22:15 ` Oliver Bandel
2 siblings, 1 reply; 67+ messages in thread
From: Don Syme @ 2006-05-20 15:15 UTC (permalink / raw)
To: Brian Hurt, Jozef Kosoru; +Cc: caml-list
> And Microsoft seems to be incapable of releasing a
> new OS- 32-bit or 64-bit.
64-bit editions of Windows XP and Windows 2003 Server have been
available for quite a while, e.g.
http://www.microsoft.com/windowsxp/64bit/default.mspx
The maximum memory sizes supported are listed here:
http://www.microsoft.com/windowsxp/64bit/facts/top10.mspx
Cheers!
Don
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 15:15 ` Don Syme
@ 2006-05-20 22:15 ` Oliver Bandel
2006-05-21 1:25 ` skaller
0 siblings, 1 reply; 67+ messages in thread
From: Oliver Bandel @ 2006-05-20 22:15 UTC (permalink / raw)
To: caml-list
On Sat, May 20, 2006 at 04:15:52PM +0100, Don Syme wrote:
>
> > And Microsoft seems to be incapable of releasing a
> > new OS- 32-bit or 64-bit.
>
> 64-bit editions of Windows XP and Windows 2003 Server have been
> available for quite a while, e.g.
What is Windows? :->
Linux also is available since a long time on 64-Bit machines. :)
Ciao,
Oliver
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [Caml-list] Array 4 MB size limit
2006-05-20 22:15 ` Oliver Bandel
@ 2006-05-21 1:25 ` skaller
0 siblings, 0 replies; 67+ messages in thread
From: skaller @ 2006-05-21 1:25 UTC (permalink / raw)
To: Oliver Bandel; +Cc: caml-list
On Sun, 2006-05-21 at 00:15 +0200, Oliver Bandel wrote:
> On Sat, May 20, 2006 at 04:15:52PM +0100, Don Syme wrote:
> >
> > > And Microsoft seems to be incapable of releasing a
> > > new OS- 32-bit or 64-bit.
> >
> > 64-bit editions of Windows XP and Windows 2003 Server have been
> > available for quite a while, e.g.
>
> What is Windows? :->
The dominant desktop platform. Linux is unlikely to challenge it.
Apple might.
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
^ permalink raw reply [flat|nested] 67+ messages in thread
* RE: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
@ 2006-05-28 23:20 Harrison, John R
2006-05-29 2:36 ` Martin Jambon
2006-05-31 12:53 ` Jean-Christophe Filliatre
0 siblings, 2 replies; 67+ messages in thread
From: Harrison, John R @ 2006-05-28 23:20 UTC (permalink / raw)
To: Martin Jambon, Caml List; +Cc: Harrison, John R
Hi Martin,
| I disagree: has it ever happened to you to mutate a string by
accident?
The point is not that I will mutate a string by accident. I've never
done
it by accident or by design. The point is that I can't depend on code
that I call, or code that calls mine, not to subsequently modify strings
that are passed as arguments. So if I really need to reliably fix them I
am forced into expensive copy operations.
In practice, the obvious library calls are safe, so like Aleksey, I use
the built-in strings for the sake of convenience and compatibility. But
it's unsatisfactory intellectually. Some of us want to program in a
primarily functional style, yet the implementation of one of the most
basic and useful datatypes is not functional.
| Yes, so how do you avoid copies without using the "unsafe" conversions
all
| over the place?
With immutable strings, you'd never need to do conversions at the module
interfaces. As with any other functional data structure, you only copy
when you want to change part of it.
John.
^ permalink raw reply [flat|nested] 67+ messages in thread
* RE: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-28 23:20 [Caml-list] Re: immutable strings (Re: Array 4 MB size limit) Harrison, John R
@ 2006-05-29 2:36 ` Martin Jambon
2006-05-31 12:53 ` Jean-Christophe Filliatre
1 sibling, 0 replies; 67+ messages in thread
From: Martin Jambon @ 2006-05-29 2:36 UTC (permalink / raw)
To: Harrison, John R; +Cc: Caml List
Hi John,
On Sun, 28 May 2006, Harrison, John R wrote:
> With immutable strings, you'd never need to do conversions at the module
> interfaces. As with any other functional data structure, you only copy
> when you want to change part of it.
OK, but let's be pragmatic: what kind of interface and implementation do
you have in mind?
(and then: isn't it possible to implement in OCaml?)
If anyone is interested:
Before posting I tried a polymorphic (wrt mutability) string type.
It was fun enough, but it doesn't scale very well. I put it there:
http://martin.jambon.free.fr/ocaml.html#gstring
Martin
--
Martin Jambon, PhD
http://martin.jambon.free.fr
Edit http://wikiomics.org, bioinformatics wiki
^ permalink raw reply [flat|nested] 67+ messages in thread
* RE: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
2006-05-28 23:20 [Caml-list] Re: immutable strings (Re: Array 4 MB size limit) Harrison, John R
2006-05-29 2:36 ` Martin Jambon
@ 2006-05-31 12:53 ` Jean-Christophe Filliatre
1 sibling, 0 replies; 67+ messages in thread
From: Jean-Christophe Filliatre @ 2006-05-31 12:53 UTC (permalink / raw)
To: Harrison, John R; +Cc: Martin Jambon, Caml List
Harrison, John R writes:
> The point is not that I will mutate a string by accident.
I once discovered a bug in the Coq proof assistant that was precisely
due to a string (an identifier) mutated by accident (may be I
shouldn't say it :-) A name was capitalized in-place somewhere in a
piece of code unrelated with the Coq kernel but of course it had
consequences all over the system (including the kernel).
So I'm definitely in favor of immutable strings, for the exact reasons
mentioned by John.
But I think an abstract data type is not really an issue, since one
does little pattern-matching on strings in practice. And having your
own abstract data type for immutable strings has other advantages,
such as the ability to share equal strings (using hash-consing) to
speedup names comparisons. Even printing is not painful provided a
suitable formatter-based printing function and %a.
--
Jean-Christophe Filliâtre (http://www.lri.fr/~filliatr)
^ permalink raw reply [flat|nested] 67+ messages in thread
* RE: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
@ 2006-05-29 20:52 Harrison, John R
0 siblings, 0 replies; 67+ messages in thread
From: Harrison, John R @ 2006-05-29 20:52 UTC (permalink / raw)
To: Martin Jambon, Caml List; +Cc: Harrison, John R
Hi Martin,
| OK, but let's be pragmatic: what kind of interface and implementation
do
| you have in mind?
I did indeed have a very specific example in mind, my theorem prover HOL
Light. I have an OCaml type of typed lambda-terms:
type term =
Var of string * hol_type
| Const of string * hol_type
| Comb of term * term
| Abs of term * term
The type "term" is private, and the abstract type interface only permits
you to construct well-typed terms, via interface functions like "mk_var"
and "mk_comb". For example, the call "mk_comb(s,t)" gives "Comb(s,t)"
provided the types agree, and fails otherwise.
I would like the user to be able to write "mk_var(x,ty)" and the net
result to be just one cons "Var(x,ty)" with "x" and "ty" identical to
the
input arguments. But with mutable strings, it is possible in principle
for the string "x" inside that object to get modified by other code.
Of course, it's a bit artificial, but I would like it to be impossible,
given that the principle of LCF provers is that the user should be able
to use arbitrary programs while having soundness enforced by the ML
type system.
Of course, I can use my own private type of strings, but then I need
to convert every time I use standard library functions, pattern matching
is a bit less convenient, etc.
John.
^ permalink raw reply [flat|nested] 67+ messages in thread
end of thread, other threads:[~2006-05-31 12:53 UTC | newest]
Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-05-15 18:12 Array 4 MB size limit akalin
2006-05-15 18:22 ` [Caml-list] " Nicolas Cannasse
2006-05-15 20:32 ` Damien Doligez
2006-05-15 21:27 ` akalin
2006-05-15 22:51 ` Oliver Bandel
2006-05-16 0:48 ` Brian Hurt
2006-05-16 9:57 ` Damien Doligez
2006-05-16 15:10 ` Markus Mottl
2006-05-16 8:01 ` Xavier Leroy
2006-05-16 8:20 ` Nicolas Cannasse
2006-05-19 17:13 ` Xavier Leroy
2006-05-19 5:57 ` Frederick Akalin
2006-05-19 6:21 ` Oliver Bandel
2006-05-19 12:15 ` Jon Harrop
2006-05-19 19:36 ` akalin
2006-05-19 20:17 ` Oliver Bandel
2006-05-19 16:28 ` Jozef Kosoru
2006-05-19 20:08 ` Oliver Bandel
2006-05-19 21:26 ` Jon Harrop
2006-05-20 1:06 ` Brian Hurt
2006-05-20 18:32 ` brogoff
2006-05-20 21:29 ` immutable strings II ([Caml-list] Array 4 MB size limit) Oliver Bandel
2006-05-22 22:09 ` Aleksey Nogin
2006-05-20 21:11 ` immutable strings (Re: [Caml-list] " Oliver Bandel
2006-05-25 4:32 ` immutable strings (Re: " Stefan Monnier
2006-05-25 5:56 ` [Caml-list] " Martin Jambon
2006-05-25 7:23 ` j h woodyatt
2006-05-25 10:22 ` Jon Harrop
2006-05-25 19:28 ` Oliver Bandel
2006-05-25 11:14 ` Brian Hurt
2006-05-25 19:42 ` Oliver Bandel
2006-05-26 6:51 ` Alain Frisch
2006-05-25 17:31 ` Aleksey Nogin
2006-05-25 19:54 ` Martin Jambon
2006-05-25 11:18 ` Brian Hurt
2006-05-25 17:34 ` Aleksey Nogin
2006-05-25 18:44 ` Tom
2006-05-25 23:00 ` Jon Harrop
2006-05-25 23:15 ` Martin Jambon
2006-05-20 0:57 ` [Caml-list] Array 4 MB size limit Brian Hurt
2006-05-20 1:17 ` Frederick Akalin
2006-05-20 1:52 ` Brian Hurt
2006-05-20 9:08 ` Jozef Kosoru
2006-05-20 10:12 ` skaller
2006-05-20 11:06 ` Jozef Kosoru
2006-05-20 12:02 ` skaller
2006-05-20 21:42 ` Oliver Bandel
2006-05-21 1:24 ` skaller
2006-05-21 14:10 ` Oliver Bandel
[not found] ` <Pine.LNX.4.63.0605200847530.10710@localhost.localdomain>
2006-05-20 19:52 ` Jozef Kosoru
2006-05-20 21:45 ` Oliver Bandel
2006-05-21 9:26 ` Richard Jones
[not found] ` <5CE30707-5DCE-4A22-970E-A49C36F9C901@akalin.cx>
2006-05-22 10:40 ` Richard Jones
2006-05-20 10:51 ` Jozef Kosoru
2006-05-20 14:22 ` Brian Hurt
2006-05-20 18:41 ` j h woodyatt
2006-05-20 19:37 ` Jon Harrop
2006-05-20 20:47 ` Jozef Kosoru
2006-05-26 18:34 ` Ken Rose
2006-05-20 22:07 ` Oliver Bandel
2006-05-20 15:15 ` Don Syme
2006-05-20 22:15 ` Oliver Bandel
2006-05-21 1:25 ` skaller
2006-05-28 23:20 [Caml-list] Re: immutable strings (Re: Array 4 MB size limit) Harrison, John R
2006-05-29 2:36 ` Martin Jambon
2006-05-31 12:53 ` Jean-Christophe Filliatre
2006-05-29 20:52 Harrison, John R
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox