* [Caml-list] Out_of_memory exception in output_value
@ 2004-05-28 9:10 Richard Jones
2004-05-28 14:31 ` Richard Jones
0 siblings, 1 reply; 6+ messages in thread
From: Richard Jones @ 2004-05-28 9:10 UTC (permalink / raw)
To: caml-list
I have a problem with a real program which tries to write an approx
200 MB data structure to disk using output_value. The output_value
statement throws an Out_of_memory exception, even though the Unix
process itself is only using around 350 MB on a machine with plenty of
physical RAM and swap.
The test program below reproduces the problem.
Is there any way I can tell the memory allocator that it's allowed to
keep allocating memory, or work around this bug in another way?
Rich.
---
(* ocamlopt -w s unix.cmxa memtest.ml -o memtest *)
let arraylen = 200000
let stringlen = 1000
let () =
prerr_endline "Creating big structure ...";
let data = Array.init arraylen (fun i -> String.create stringlen) in
prerr_endline "Sleeping for 5 seconds ...";
Unix.sleep 5;
prerr_endline "Saving big structure to a file ...";
let chan = open_out_bin "/tmp/test.dat" in
output_value chan data;
close_out chan;
ignore (Sys.command "ls -lh /tmp/test.dat")
---
$ ./memtest
Creating big structure ...
Sleeping for 5 seconds ...
Saving big structure to a file ...
Fatal error: exception Out_of_memory
During the sleep:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3098 rich 25 0 203m 198m 1640 S 43.4 42.0 0:03.21 memtest
Just before the crash:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3103 rich 18 0 333m 301m 1640 D 1.7 63.7 0:04.49 memtest
--
Richard Jones. http://www.annexia.org/ http://www.j-london.com/
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
http://www.winwinsales.co.uk/ - CRM improvement consultancy
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Out_of_memory exception in output_value
2004-05-28 9:10 [Caml-list] Out_of_memory exception in output_value Richard Jones
@ 2004-05-28 14:31 ` Richard Jones
2004-05-28 16:47 ` Jacques Carette
0 siblings, 1 reply; 6+ messages in thread
From: Richard Jones @ 2004-05-28 14:31 UTC (permalink / raw)
To: caml-list
Thanks to some people who helped me off-list. I've solved this now.
It turns out that during output_value, the OCaml code allocates a
block of memory to contain the marshalled representation of the data.
Each time it runs out of room, it doubles the size of that block using
realloc().
In my case the final allocation was something like 260MB because the
size of the data required was approx 144MB (ie. just over 130MB, so it
needed to double the block). realloc() was actually returning NULL
because the kernel couldn't allocate enough memory.
Allocating another 1GB of swap solves the problem nicely.
Rich.
--
Richard Jones. http://www.annexia.org/ http://www.j-london.com/
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
http://www.winwinsales.co.uk/ - CRM improvement consultancy
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [Caml-list] Out_of_memory exception in output_value
2004-05-28 14:31 ` Richard Jones
@ 2004-05-28 16:47 ` Jacques Carette
2004-05-28 19:44 ` skaller
0 siblings, 1 reply; 6+ messages in thread
From: Jacques Carette @ 2004-05-28 16:47 UTC (permalink / raw)
To: 'Richard Jones', caml-list
> It turns out that during output_value, the OCaml code allocates a
> block of memory to contain the marshalled representation of the data.
> Each time it runs out of room, it doubles the size of that block using
> realloc().
Wouldn't it be more system-friendly to try successively factors *2, *1.5,
*1.1, and *1.05 before actually failing? This would not cost any
performance penalty in successful cases, and could decrease the number of
failing cases. Seems win-win to me. And as long as it is a multiplicative
factor, this does not lead to algorithmic degenerescence.
Jacques
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [Caml-list] Out_of_memory exception in output_value
2004-05-28 16:47 ` Jacques Carette
@ 2004-05-28 19:44 ` skaller
2004-05-28 20:54 ` Eric Dahlman
0 siblings, 1 reply; 6+ messages in thread
From: skaller @ 2004-05-28 19:44 UTC (permalink / raw)
To: carette; +Cc: 'Richard Jones', caml-list
On Sat, 2004-05-29 at 02:47, Jacques Carette wrote:
> > It turns out that during output_value, the OCaml code allocates a
> > block of memory to contain the marshalled representation of the data.
> > Each time it runs out of room, it doubles the size of that block using
> > realloc().
>
> Wouldn't it be more system-friendly to try successively factors *2, *1.5,
> *1.1, and *1.05 before actually failing?
I have a published book chapter part of which deals with
this issue in some detail, including some performance
measurements.
The general solution is much cleaner -- to use
a user supplied allocation function something like:
new_size = f old_size requested_size
Doubling the size, or indeed even using a pure multiplier,
is only one possible option: simply adding a fixed amout
is another option, and an arbitrary function handles
both cases and many more. So a general solution is to make
the operation polymorphic by making the calculator function
an argument (in practice with a sensible default).
My experiments were with variable length arrays used
to hold big integers, so some operations produced
small increases (like addition) whereas other
produced increases much faster (like multiplication).
A quick and easy fix would be to use a global variable
containing the function which the client can reset.
Yeah, I hate global state but here the actual function
chosen has no semantic impact, it only affects performance
(unless you run out of memory .. of course). So this
time a global might be OK :)
--
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850,
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Out_of_memory exception in output_value
2004-05-28 19:44 ` skaller
@ 2004-05-28 20:54 ` Eric Dahlman
2004-05-29 7:13 ` skaller
0 siblings, 1 reply; 6+ messages in thread
From: Eric Dahlman @ 2004-05-28 20:54 UTC (permalink / raw)
To: skaller; +Cc: carette, 'Richard Jones', caml-list
On May 28, 2004, at 2:44 PM, skaller wrote:
> On Sat, 2004-05-29 at 02:47, Jacques Carette wrote:
>>> It turns out that during output_value, the OCaml code allocates a
>>> block of memory to contain the marshalled representation of the data.
>>> Each time it runs out of room, it doubles the size of that block
>>> using
>>> realloc().
>>
>> Wouldn't it be more system-friendly to try successively factors *2,
>> *1.5,
>> *1.1, and *1.05 before actually failing?
I am not sure that it would have that much benefit for all of the
complexity it would introduce. I expect the set of interesting things
with a marshaled representation which is too large for the present
approach but still small enough to work for a smaller factors is really
small. Keep in mind that when the size is grown the assumption is that
you will have to copy the lower half of the already marshaled data into
the new object since it is almost guaranteed that it will not be
possible to grow the object in place as something else will almost
surely have been allocated after it. A better approach would be to
have an output_really_big_object which allocated a huge buffer in a
single go or an approach which streamed the marshaled data so as not to
have to keep it in memory. (I don't know if that approach is possible
using ocaml's marshaling, i didn't look.) At any rate this still
addresses what I believe to be a small class of useful situations, on
the whole I suspect that the data will be small enough to work with the
present approach or so big that the fractional approach will also fail.
> I have a published book chapter part of which deals with
> this issue in some detail, including some performance
> measurements.
>
> The general solution is much cleaner -- to use
> a user supplied allocation function something like:
>
> new_size = f old_size requested_size
>
> Doubling the size, or indeed even using a pure multiplier,
> is only one possible option: simply adding a fixed amout
> is another option, and an arbitrary function handles
> both cases and many more. So a general solution is to make
> the operation polymorphic by making the calculator function
> an argument (in practice with a sensible default).
Since you will have to copy the data on a realloc doubling has the nice
effect of giving us a constant bound on the costs of reallocation in a
calculation. This does not hold for approaches like increasing in
fixed increments which will convert an algorithm which in linear in the
size of the data into one which is quadratic. I think that doubling
might well be the best "guestimate" for a general library to make.
> My experiments were with variable length arrays used
> to hold big integers, so some operations produced
> small increases (like addition) whereas other
> produced increases much faster (like multiplication).
In this domain specific case you have a wealth of information and you
can calculate a very tight bound on the size of the resulting array so
you should never have to grow the result as you go. Just allocate it
correctly in the beginning or am I missing something?
> A quick and easy fix would be to use a global variable
> containing the function which the client can reset.
> Yeah, I hate global state but here the actual function
> chosen has no semantic impact, it only affects performance
> (unless you run out of memory .. of course). So this
> time a global might be OK :)
I don't know about this one, it really smacks of premature optimization
and abstraction inversion. The present approach to buffer growing has
a constant amortized cost. If one was in the position of being able to
significantly improve on that for a given problem domain by carefully
tweaking such an allocation function then I would almost guarantee that
it would be better to create a custom solution for that domain. This
would allow explicitly using that information to create a better
algorithm rather than just tweaking a general mechanism.
Just my two cents,
-Eric
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Out_of_memory exception in output_value
2004-05-28 20:54 ` Eric Dahlman
@ 2004-05-29 7:13 ` skaller
0 siblings, 0 replies; 6+ messages in thread
From: skaller @ 2004-05-29 7:13 UTC (permalink / raw)
To: Eric Dahlman; +Cc: carette, 'Richard Jones', caml-list
On Sat, 2004-05-29 at 06:54, Eric Dahlman wrote:
> On May 28, 2004, at 2:44 PM, skaller wrote:
> >> Wouldn't it be more system-friendly to try successively factors *2,
> >> *1.5,
> >> *1.1, and *1.05 before actually failing?
>
> I am not sure that it would have that much benefit for all of the
> complexity it would introduce.
I don't quite agree for the following reason: if something
fails when you're only using 50% of memory instead of 90%,
you're likely to be both puzzled and annoyed. In practice
this can make quite a difference at what sized problems
you can handle on your machine. It can also really trash out
a large server because malloc() is *required* to actually
allocate memory, not just address space.
Since allocations of this kind are rare the extra cost
doing a most sophisticated calculation isn't important.
As you point out though, the extra complexity is a real
problem: we could argue forever how to choose an optimial
calculation. Which is why I suggested the user be able
to do it. This delegates the complexity back to the
client and out of the library.
--
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850,
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2004-05-29 7:14 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-05-28 9:10 [Caml-list] Out_of_memory exception in output_value Richard Jones
2004-05-28 14:31 ` Richard Jones
2004-05-28 16:47 ` Jacques Carette
2004-05-28 19:44 ` skaller
2004-05-28 20:54 ` Eric Dahlman
2004-05-29 7:13 ` skaller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox