From: Gabriel Scherer <gabriel.scherer@gmail.com>
To: oleg@okmij.org
Cc: "Goswin von Brederlow" <goswin-v-b@web.de>,
"caml users" <caml-list@inria.fr>,
"Frédéric Bour" <frederic.bour@lakaban.net>
Subject: Re: [Caml-list] Why List.map does not be implemented
Date: Wed, 29 Oct 2014 11:11:21 +0100 [thread overview]
Message-ID: <CAPFanBHQgD+octf9Qd=U+iL-LHms7Oy0a0RtWJ+HPX0gFXa=fg@mail.gmail.com> (raw)
In-Reply-To: <20141001102943.AA0C0C382A@www1.g3.pair.com>
During a discussion of Tail-Recursion Modulo Cons with Frédéric Bour,
I had an amusing idea of how to fix that problem pointed out by Oleg.
The idea is that we can detect at mutation time whether non-linear
control effect happened, and in that case do a deep copy of the
partially-filled structure. It sounds slightly crazy but works
beautifully.
Here is the current implementation style of `map` in Batteries; it is
merely an abstraction over Oleg's code example above (the decision to
implement destination-passing style was not made by Batteries, but in
the former Extlib project, but the small abstraction layer is due to
François Bérenger as part of Batteries).
type 'a mut_list = {
hd: 'a;
mutable tl: 'a list
}
external inj : 'a mut_list -> 'a list = "%identity"
module Acc1 = struct
let create x = { hd = x; tl = [] }
let accum acc x =
let cell = create x in
acc.tl <- inj cell;
cell
let return head = inj head
end
let map_dst f = function
| [] -> []
| x :: xs ->
let open Acc1 in
let rec loop f acc = function
| [] -> return acc
| x :: xs ->
loop f (accum acc (f x)) xs
in
let acc = create (f x) in
loop f acc xs
`accum` performs a mutation and returns the new destination in which
to produce the rest of the list. This is destination-passing-style,
and can also be understood as a manual TRMC optimization. It can also
be read as a "snoc" operation (adding a new element to the tail of a
list): it is possible to look at map_dst and think it is a pure
implementation. This pure implementation is optimized using mutation
under the hood, assuming (Mezzo-style) that the "map" function is the
unique owner of the intermediate list, which can thus be
mutated-in-place (even using strong mutation turning its type from a
mutable value to an immutable one).
Oleg explains that non-linear use of control operators (calling a
captured (delimited) continuation several times) allows to observe
this mutation, which thus destroys the probability computation of
HANSEI. Another way to see this problem is that non-linear use of
continuations breaks the unique-ownership invariant justifying the
optimization.
Yet it is possible to dynamically detect when we do not uniquely own
the list anymore (a previous invocation of the current continuation
has already mutated it), by filling the yet-to-be-mutated holes with a
secret canary value (instead of []). An accumulator is "fresh"
(uniquely owned) when its tail is (physically) equal to the canary
value. If we notice a mutator is not fresh, we perform a deep copy of
the list (from the head to the accumulator) and start accumulating
again from this copy.
Here is the new implementation of "map" using an Acc2 module
implementing freshness-check and deep-copy (code at the end of this
mail):
let rec map_dst2 f = function
| [] -> []
| x :: xs ->
let open Acc2 in
(* precondition: acc is fresh *)
let rec loop f head acc = function
| [] -> return head acc
| x :: xs ->
let next = f x in
(* control effects during (f x) may destroy freshness *)
if fresh acc then loop f head (accum acc next) xs
else begin
let head, acc = copy head (inj acc) in
(* fresh after copy *)
loop f head (accum acc next) xs
end
in
let head = create (f x) in
(* head is fresh *)
loop f head head xs
The code is slightly more verbose because the deep-copy iteration
needs to be passed both the current accumulator and the head of the
list (it wouldn't know where to copy from otherwise). Notice that the
fast path (in absence of non-linear control) only has an additional
freshness check. I benchmarked the two implementations and the
difference is negligible -- in particular, it does not perform any
additional allocation in the fast case.
This version works fine with HANSEI.
(I'm not claiming this implementation alone is what people should use
for List.map. It does not behave particularly well on small lists,
which is the most important thing for most use-cases. The only point
is that the destination-passing-style can be made to work with
non-linear control.)
Finally, here is the "protected" Acc module implementing canary-check
and deep copy:
module Acc2 = struct
let rec canary = Obj.magic () :: canary
let copy head limit =
let rec copy li limit tail =
match li with
| [] -> assert false
| hd::tl ->
let cell = { hd; tl = canary } in
tail.tl <- inj cell;
if li == limit then cell
else copy tl limit cell
in
let new_head = { hd = head.hd; tl = canary } in
if inj head == limit then (new_head, new_head)
else begin
let tail = copy head.tl limit new_head in
new_head, tail
end
let create x = { hd = x; tl = canary }
let fresh acc = acc.tl == canary
let accum acc x =
let cell = { hd = x; tl = canary } in
acc.tl <- inj cell;
cell
let return head acc =
acc.tl <- [];
inj head
end
Full source for this experiment is available here:
https://www.gitorious.org/gasche-snippets/destination-passing-style-and-control-effects/source/HEAD:map.ml
On Wed, Oct 1, 2014 at 12:29 PM, <oleg@okmij.org> wrote:
>
> Gabriel Scherer wrote
>> My intuition would be that the mutation should not be observable in this
>> case, as it is only done on "private" data that the List.map function
>> allocate, owns, and which never escapes in its immutable form.
>
> Let me explain, on the example of the simplest Hansei function
> val flip : float -> bool
> that flips (a possibly unfair) coin; flip 0.5 flips the fair coin.
> A good way to understand the meaning of this function is to take the
> frequentist approach. There are two ways the function call (flip 0.5)
> can return: it can return with true or with false. There are total two
> choices, in one of them the function returns true, in another
> false. So, the probability of returning true is 1/2. In general, the
> complete distribution -- which is the meaning of the program
> (flip 0.5) -- is as follows
>
> # exact_reify (fun () -> flip 0.5);;
> - : bool Ptypes.pV = [(0.5, Ptypes.V true); (0.5, Ptypes.V false)]
>
> which is what Hansei computes. Let's take a more complex example,
> closer to the topic: two independent coin flips
>
> # let model () = List.map flip [0.5; 0.5]
> val model : unit -> bool list = <fun>
>
> # exact_reify model;;
> - : bool list Ptypes.pV =
> [(0.25, Ptypes.V [true; true]); (0.25, Ptypes.V [true; false]);
> (0.25, Ptypes.V [false; true]); (0.25, Ptypes.V [false; false])]
>
> The result correctly tells that there are four possible choices.
>
> Let us consider the function map from the Batteries. After desugaring,
> it has the following form
>
> type 'a mut_list = {
> hd: 'a;
> mutable tl: 'a list
> }
>
> let map : ('a -> 'b) -> 'a list -> 'b list = fun f -> function
> | [] -> []
> | h :: t ->
> let rec loop dst = function
> | [] -> ()
> | h :: t ->
> let cell = {hd = f h; tl = []} in
> dst.tl <- Obj.magic cell;
> loop cell t
> in
> let r = {hd = f h; tl = []} in
> loop r t;
> Obj.magic r
>
> Using this function, we obtain
>
> # let model1 () = map flip [0.5; 0.5]
> val model1 : unit -> bool list = <fun>
>
> # exact_reify model1
> - : bool list Ptypes.pV =
> [(0.5, Ptypes.V [true; false]); (0.5, Ptypes.V [false; false])]
>
> Something went wrong, didn't it? Where are the other two possible
> choices -- with true for the second flip -- for the list of two
> independent coin flips?
>
> To understand the problem, let us recall the main principle stated
> earlier: ``There are two ways the function call (flip 0.5)
> can return: it can return with true or with false.'' That is, the
> function call (flip 0.5) returns *twice*. Therefore, the call (f h)
> in the let cell statement returns twice. Therefore, the assignment
> dst.tl <- Obj.magic cell;
> will be executed twice. And the second execution of that assignment
> will be with a different cell, which will override and destroy the
> result of the first assignment. That is how the "true" choices became
> lost.
>
>
>
next prev parent reply other threads:[~2014-10-29 10:12 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-28 19:31 [Caml-list] Why List.map does not be implemented tail-recursively? Shuai Wang
2014-09-28 19:36 ` Gabriel Scherer
2014-09-28 19:45 ` Anthony Tavener
2014-09-29 12:08 ` Goswin von Brederlow
2014-09-29 14:02 ` Pierre Chambart
2014-09-29 15:44 ` Yaron Minsky
2014-09-29 21:00 ` Gabriel Scherer
2014-09-30 8:46 ` [Caml-list] Why List.map does not be implemented oleg
2014-09-30 9:07 ` Gabriel Scherer
2014-10-01 10:29 ` oleg
2014-10-01 12:00 ` Gerd Stolpmann
2014-10-29 10:11 ` Gabriel Scherer [this message]
2014-10-02 10:09 ` [Caml-list] Why List.map does not be implemented tail-recursively? Stephen Dolan
2015-06-01 12:02 ` Jon Harrop
2015-06-02 12:04 ` Stephen Dolan
2015-06-05 10:21 ` Goswin von Brederlow
2014-09-30 6:29 ` Goswin von Brederlow
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAPFanBHQgD+octf9Qd=U+iL-LHms7Oy0a0RtWJ+HPX0gFXa=fg@mail.gmail.com' \
--to=gabriel.scherer@gmail.com \
--cc=caml-list@inria.fr \
--cc=frederic.bour@lakaban.net \
--cc=goswin-v-b@web.de \
--cc=oleg@okmij.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox