* [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons
@ 2013-10-14 15:30 Enrico Tassi
2013-10-16 14:03 ` Jacques-Henri Jourdan
2013-10-22 9:38 ` [Caml-list] Marshalling: automatic discard of unmashalable data via oleg
0 siblings, 2 replies; 6+ messages in thread
From: Enrico Tassi @ 2013-10-14 15:30 UTC (permalink / raw)
To: caml-list
[-- Attachment #1: Type: text/plain, Size: 2057 bytes --]
Hello caml-list. I'm new here, It's my fist post, and I want to share
an hack I came up with to systematically solve some marshalling problems
I have in the context of Coq development. I hope the hack is of general
interest, maybe it is folklore, maybe not...
The Coq architecture is switching to a multi process one, where slave
processes are fed a system state, a portion of the proof script (an
entire proof at the moment) and are expected to return the resulting
proof term. Data is marshalled using the standard Marshal module.
The system state is made of pretty much anything, and is also user
extensible via plugins. Nothing prevents someone to stick in there
values that can hardly be marshalled, like callbacks, file descriptors,
lazy_t, and the like. Of course it is nice to be able to store
callbacks or lazy values in the system state, so forbidding all that is
not nice.
Luckily enough, I don't need this data in the slave processes.
Hence I could pre-process the system state and throw it away.
But the system state is big, and traversing it to prune out some bits is
likely to be expensive. Moreover I'm lazy, I don't want to code that
pruning function, hence the following solution.
Values that can't, or should not, be marshalled are stored in the system
state using a unique key, and given a key one can retrieve the
corresponding value. These (key,value) pairs are ephemeron, if the key
is not reachable, the value (if it has no extra references) will be
eventually collected too. When a key is marshalled, no error occurs
(keys are just doubly boxed integers). When a key is unmarshalled it
becomes invalid, i.e. the associated value cannot be retrieved anymore.
In some sense it is like if marshalling was forgetting the part of
the marshalled value that is stored as a key. All automatically, no
need to pre/post process the marshalled value and no extra memory
management hassle.
Thanks to Damien Doligez for his preliminary comments on the attached
files.
All sort of comments is welcome,
Cheers
--
Enrico Tassi
[-- Attachment #2: ephemeron.mli --]
[-- Type: text/plain, Size: 1638 bytes --]
(* Use case:
You have a data structure that needs to be marshalled but it contains
unmarshallable data (like a closure, or a file descriptor). Actually
you don't need this data to be preserved by marshalling, it just happens
to be there.
You could produced a trimmed down data structure, but then, once
unmarshalled, you can't used the very same code to process it, unless you
re-inject the trimmed down data structure into the standard one, using
dummy values for the unmarshallable stuff.
Similarly you could change your data structure turning all types [bad]
into [bad option], then just before marshalling you set all values of type
[bad option] to [None]. Still this pruning may be expensive and you have
to code it.
Desiderata:
The marshalling operation automatically discards values that cannot be
marshalled or cannot be properly unmarshalled.
Proposed solution:
Turn all occurrences of [bad] into [bad key] in your data structure.
Use [crate bad_val] to obtain a unique key [k] for [bad_val], and store
[k] in the data structure. Use [get k] to obtain [bad_val].
An ['a key] can always be marshalled. When marshalled, a key loses its
value. The function [get] raises Not_found on unmarshalled keys.
If a key is garbage collected, the corresponding value is garbage
collected too (unless extra references to it exist).
In short no memory management hassle, keys can just replace their
corresponding value in the data structure. *)
type 'a key
val create : 'a -> 'a key
val get : 'a key -> 'a
[-- Attachment #3: ephemeron.ml --]
[-- Type: text/plain, Size: 1484 bytes --]
type key_type = int
type boxed_key = key_type ref ref
let mk_key : unit -> boxed_key =
(* We should take a random value here, is there a random function in OCaml?*)
let bid = ref 0 in
(* According to OCaml Gc module documentation, Pervasives.ref is one of the
few ways of getting a boxed value the compiler will never alias. *)
fun () -> incr bid; Pervasives.ref (Pervasives.ref !bid)
(* A phantom type to preserve type safety *)
type 'a key = boxed_key
(* Comparing keys with == grants that if a key is unmarshalled (in the same
process where it was created or in another one) it is not mistaken for
an already existing one (unmarshal has no right to alias). If the initial
value of bid is taken at random, then one also avoids potential collisions *)
module HT = Hashtbl.Make(struct
type t = key_type ref
let equal k1 k2 = k1 == k2
let hash id = !id
end)
(* A key is the (unique) value inside a boxed key, hence it does not
keep its corresponding boxed key reachable (replacing key_type by boxed_key
would make the key always reachable) *)
let values : Obj.t HT.t = HT.create 1001
let collect k = HT.remove values !k
(* The only reference to the boxed key is the one returned, when the user drops
it the value eventually disappears from the values table above *)
let create (v : 'a) : 'a key =
let k = mk_key () in
HT.add values !k (Obj.repr v);
Gc.finalise collect k;
k
let get (k : 'a key) : 'a = Obj.obj (HT.find values !k)
[-- Attachment #4: test.ml --]
[-- Type: text/plain, Size: 839 bytes --]
module E = Ephemeron ;;
let test0 () =
let f = E.create (fun () -> prerr_endline "OK") in
try E.get f ()
with Not_found -> prerr_endline "WTF"
;;
let test1 () =
let f = E.create (fun () -> prerr_endline "OK") in
Gc.full_major ();
try E.get f ()
with Not_found -> prerr_endline "WTF"
;;
let test2 () =
let str =
let f = E.create (fun () -> prerr_endline "WTF") in
Marshal.to_string f [] in
Gc.full_major ();
let f : (unit -> unit) E.key = Marshal.from_string str 0 in
try E.get f ()
with Not_found -> prerr_endline "OK"
;;
let test3 () =
let str =
let f = E.create (fun () -> prerr_endline "WTF") in
Marshal.to_string f [] in
let f : (unit -> unit) E.key = Marshal.from_string str 0 in
try E.get f ()
with Not_found -> prerr_endline "OK"
;;
test0 ();;
test1 ();;
test2 ();;
test3 ();;
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons
2013-10-14 15:30 [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons Enrico Tassi
@ 2013-10-16 14:03 ` Jacques-Henri Jourdan
2013-10-16 14:42 ` Enrico Tassi
2013-10-22 9:38 ` [Caml-list] Marshalling: automatic discard of unmashalable data via oleg
1 sibling, 1 reply; 6+ messages in thread
From: Jacques-Henri Jourdan @ 2013-10-16 14:03 UTC (permalink / raw)
To: caml-list
[-- Attachment #1: Type: text/plain, Size: 926 bytes --]
Hi,
Le 14/10/2013 17:30, Enrico Tassi a écrit :
> let collect k = HT.remove values !k
>
> (* The only reference to the boxed key is the one returned, when the user drops
> it the value eventually disappears from the values table above *)
> let create (v : 'a) : 'a key =
> let k = mk_key () in
> HT.add values !k (Obj.repr v);
> Gc.finalise collect k;
> k
In that piece of code, you should take care to the fact that the
finalizer can be called at /any/ moment.
In particular, it can be called while you are modifying your hash table.
If this situation happens, then you are modifying the hash table (by
removing elements) when it is in an unconsistent state.
Did you think about that ? Do I miss something ?
The authors of Why3 encountered similar problems. I suggest you to read
the Weakhtbl module, that gives a solution to this problem.
Regards,
--
Jacques-Henri Jourdan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 555 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons
2013-10-16 14:03 ` Jacques-Henri Jourdan
@ 2013-10-16 14:42 ` Enrico Tassi
2013-10-16 15:30 ` François Bobot
0 siblings, 1 reply; 6+ messages in thread
From: Enrico Tassi @ 2013-10-16 14:42 UTC (permalink / raw)
To: caml-list
On Wed, Oct 16, 2013 at 04:03:18PM +0200, Jacques-Henri Jourdan wrote:
> The authors of Why3 encountered similar problems. I suggest you to read
> the Weakhtbl module, that gives a solution to this problem.
Thanks for the hint!
Indeed they use an imperative list to enqueue/dequeue deletions
that are hence never performed by the finalizer itself.
Ciao
--
Enrico Tassi
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons
2013-10-16 14:42 ` Enrico Tassi
@ 2013-10-16 15:30 ` François Bobot
2013-10-16 16:24 ` Alain Frisch
0 siblings, 1 reply; 6+ messages in thread
From: François Bobot @ 2013-10-16 15:30 UTC (permalink / raw)
To: caml-list
On 16/10/2013 16:42, Enrico Tassi wrote:
> On Wed, Oct 16, 2013 at 04:03:18PM +0200, Jacques-Henri Jourdan wrote:
>> The authors of Why3 encountered similar problems. I suggest you to read
>> the Weakhtbl module, that gives a solution to this problem.
>
> Thanks for the hint!
>
> Indeed they use an imperative list to enqueue/dequeue deletions
> that are hence never performed by the finalizer itself.
>
Another important points is that it allows concurrent access. The
finalizer can run when the actual deletions is performed (add can run
when iter_remove is run)
module ProdConsume :
sig
type 'a t
val create : unit -> 'a t
val add : 'a -> 'a t -> unit
val iter_remove : ('a -> unit) -> 'a t -> unit
end
--
François
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons
2013-10-16 15:30 ` François Bobot
@ 2013-10-16 16:24 ` Alain Frisch
0 siblings, 0 replies; 6+ messages in thread
From: Alain Frisch @ 2013-10-16 16:24 UTC (permalink / raw)
To: François Bobot, caml-list
I can see that Damien created an "ephemeron" branch in the SVN. Is it
related to François' proposal presented in the 2011 OCaml User Conference ?
https://forge.ocamlcore.org/docman/view.php/77/134/memoization2011.pdf
Alain
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Marshalling: automatic discard of unmashalable data via
2013-10-14 15:30 [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons Enrico Tassi
2013-10-16 14:03 ` Jacques-Henri Jourdan
@ 2013-10-22 9:38 ` oleg
1 sibling, 0 replies; 6+ messages in thread
From: oleg @ 2013-10-22 9:38 UTC (permalink / raw)
To: enrico.tassi; +Cc: caml-list
> The system state is made of pretty much anything, and is also user
> extensible via plugins. Nothing prevents someone to stick in there
> values that can hardly be marshalled, like callbacks, file descriptors,
> lazy_t, and the like. Of course it is nice to be able to store
> callbacks or lazy values in the system state, so forbidding all that is
> not nice.
There is a similar problem when marshalling a captured delimited
continuation. The continuation captures a part of stack that points to
various closures in system libraries, which contain lots of
unserializable stuff. In addition, reference cells (t ref) can't be
meaningfully serialized as they are copied in the process of the
serialization, which breaks their semantics.
The library delimcc proposes a solution. Unlike your situation, I do
care to properly serialize `bad' value and properly restore. The
unmarshalled delimited continuation must work properly. I cannot
afford not to care what the unmarshalled value will look like.
The solution, which involves pre-processing a value before marshalling
and post-processing after unmarshalling, is described at
http://okmij.org/ftp/ML/ML.html#persistent-delim2cc
as well in Sec 8 of
http://okmij.org/ftp/continuations/caml-shift-journal.pdf
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2013-10-22 9:38 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-14 15:30 [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons Enrico Tassi
2013-10-16 14:03 ` Jacques-Henri Jourdan
2013-10-16 14:42 ` Enrico Tassi
2013-10-16 15:30 ` François Bobot
2013-10-16 16:24 ` Alain Frisch
2013-10-22 9:38 ` [Caml-list] Marshalling: automatic discard of unmashalable data via oleg
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox