From: Enrico Tassi <enrico.tassi@inria.fr>
To: caml-list@inria.fr
Subject: [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons
Date: Mon, 14 Oct 2013 17:30:23 +0200 [thread overview]
Message-ID: <20131014153023.GA19032@birba.invalid> (raw)
[-- Attachment #1: Type: text/plain, Size: 2057 bytes --]
Hello caml-list. I'm new here, It's my fist post, and I want to share
an hack I came up with to systematically solve some marshalling problems
I have in the context of Coq development. I hope the hack is of general
interest, maybe it is folklore, maybe not...
The Coq architecture is switching to a multi process one, where slave
processes are fed a system state, a portion of the proof script (an
entire proof at the moment) and are expected to return the resulting
proof term. Data is marshalled using the standard Marshal module.
The system state is made of pretty much anything, and is also user
extensible via plugins. Nothing prevents someone to stick in there
values that can hardly be marshalled, like callbacks, file descriptors,
lazy_t, and the like. Of course it is nice to be able to store
callbacks or lazy values in the system state, so forbidding all that is
not nice.
Luckily enough, I don't need this data in the slave processes.
Hence I could pre-process the system state and throw it away.
But the system state is big, and traversing it to prune out some bits is
likely to be expensive. Moreover I'm lazy, I don't want to code that
pruning function, hence the following solution.
Values that can't, or should not, be marshalled are stored in the system
state using a unique key, and given a key one can retrieve the
corresponding value. These (key,value) pairs are ephemeron, if the key
is not reachable, the value (if it has no extra references) will be
eventually collected too. When a key is marshalled, no error occurs
(keys are just doubly boxed integers). When a key is unmarshalled it
becomes invalid, i.e. the associated value cannot be retrieved anymore.
In some sense it is like if marshalling was forgetting the part of
the marshalled value that is stored as a key. All automatically, no
need to pre/post process the marshalled value and no extra memory
management hassle.
Thanks to Damien Doligez for his preliminary comments on the attached
files.
All sort of comments is welcome,
Cheers
--
Enrico Tassi
[-- Attachment #2: ephemeron.mli --]
[-- Type: text/plain, Size: 1638 bytes --]
(* Use case:
You have a data structure that needs to be marshalled but it contains
unmarshallable data (like a closure, or a file descriptor). Actually
you don't need this data to be preserved by marshalling, it just happens
to be there.
You could produced a trimmed down data structure, but then, once
unmarshalled, you can't used the very same code to process it, unless you
re-inject the trimmed down data structure into the standard one, using
dummy values for the unmarshallable stuff.
Similarly you could change your data structure turning all types [bad]
into [bad option], then just before marshalling you set all values of type
[bad option] to [None]. Still this pruning may be expensive and you have
to code it.
Desiderata:
The marshalling operation automatically discards values that cannot be
marshalled or cannot be properly unmarshalled.
Proposed solution:
Turn all occurrences of [bad] into [bad key] in your data structure.
Use [crate bad_val] to obtain a unique key [k] for [bad_val], and store
[k] in the data structure. Use [get k] to obtain [bad_val].
An ['a key] can always be marshalled. When marshalled, a key loses its
value. The function [get] raises Not_found on unmarshalled keys.
If a key is garbage collected, the corresponding value is garbage
collected too (unless extra references to it exist).
In short no memory management hassle, keys can just replace their
corresponding value in the data structure. *)
type 'a key
val create : 'a -> 'a key
val get : 'a key -> 'a
[-- Attachment #3: ephemeron.ml --]
[-- Type: text/plain, Size: 1484 bytes --]
type key_type = int
type boxed_key = key_type ref ref
let mk_key : unit -> boxed_key =
(* We should take a random value here, is there a random function in OCaml?*)
let bid = ref 0 in
(* According to OCaml Gc module documentation, Pervasives.ref is one of the
few ways of getting a boxed value the compiler will never alias. *)
fun () -> incr bid; Pervasives.ref (Pervasives.ref !bid)
(* A phantom type to preserve type safety *)
type 'a key = boxed_key
(* Comparing keys with == grants that if a key is unmarshalled (in the same
process where it was created or in another one) it is not mistaken for
an already existing one (unmarshal has no right to alias). If the initial
value of bid is taken at random, then one also avoids potential collisions *)
module HT = Hashtbl.Make(struct
type t = key_type ref
let equal k1 k2 = k1 == k2
let hash id = !id
end)
(* A key is the (unique) value inside a boxed key, hence it does not
keep its corresponding boxed key reachable (replacing key_type by boxed_key
would make the key always reachable) *)
let values : Obj.t HT.t = HT.create 1001
let collect k = HT.remove values !k
(* The only reference to the boxed key is the one returned, when the user drops
it the value eventually disappears from the values table above *)
let create (v : 'a) : 'a key =
let k = mk_key () in
HT.add values !k (Obj.repr v);
Gc.finalise collect k;
k
let get (k : 'a key) : 'a = Obj.obj (HT.find values !k)
[-- Attachment #4: test.ml --]
[-- Type: text/plain, Size: 839 bytes --]
module E = Ephemeron ;;
let test0 () =
let f = E.create (fun () -> prerr_endline "OK") in
try E.get f ()
with Not_found -> prerr_endline "WTF"
;;
let test1 () =
let f = E.create (fun () -> prerr_endline "OK") in
Gc.full_major ();
try E.get f ()
with Not_found -> prerr_endline "WTF"
;;
let test2 () =
let str =
let f = E.create (fun () -> prerr_endline "WTF") in
Marshal.to_string f [] in
Gc.full_major ();
let f : (unit -> unit) E.key = Marshal.from_string str 0 in
try E.get f ()
with Not_found -> prerr_endline "OK"
;;
let test3 () =
let str =
let f = E.create (fun () -> prerr_endline "WTF") in
Marshal.to_string f [] in
let f : (unit -> unit) E.key = Marshal.from_string str 0 in
try E.get f ()
with Not_found -> prerr_endline "OK"
;;
test0 ();;
test1 ();;
test2 ();;
test3 ();;
next reply other threads:[~2013-10-14 15:30 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-14 15:30 Enrico Tassi [this message]
2013-10-16 14:03 ` Jacques-Henri Jourdan
2013-10-16 14:42 ` Enrico Tassi
2013-10-16 15:30 ` François Bobot
2013-10-16 16:24 ` Alain Frisch
2013-10-22 9:38 ` [Caml-list] Marshalling: automatic discard of unmashalable data via oleg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131014153023.GA19032@birba.invalid \
--to=enrico.tassi@inria.fr \
--cc=caml-list@inria.fr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox