Re: [Caml-list] Why List.map does not be implemented

Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed

From: Gabriel Scherer <gabriel.scherer@gmail.com>
To: oleg@okmij.org
Cc: "Goswin von Brederlow" <goswin-v-b@web.de>,
	"caml users" <caml-list@inria.fr>,
	"Frédéric Bour" <frederic.bour@lakaban.net>
Subject: Re: [Caml-list] Why List.map does not be implemented
Date: Wed, 29 Oct 2014 11:11:21 +0100	[thread overview]
Message-ID: <CAPFanBHQgD+octf9Qd=U+iL-LHms7Oy0a0RtWJ+HPX0gFXa=fg@mail.gmail.com> (raw)
In-Reply-To: <20141001102943.AA0C0C382A@www1.g3.pair.com>

During a discussion of Tail-Recursion Modulo Cons with Frédéric Bour,
I had an amusing idea of how to fix that problem pointed out by Oleg.
The idea is that we can detect at mutation time whether non-linear
control effect happened, and in that case do a deep copy of the
partially-filled structure. It sounds slightly crazy but works
beautifully.

Here is the current implementation style of `map` in Batteries; it is
merely an abstraction over Oleg's code example above (the decision to
implement destination-passing style was not made by Batteries, but in
the former Extlib project, but the small abstraction layer is due to
François Bérenger as part of Batteries).

type 'a mut_list = {
  hd: 'a;
  mutable tl: 'a list
}
external inj : 'a mut_list -> 'a list = "%identity"

module Acc1 = struct
  let create x = { hd = x; tl = [] }

  let accum acc x =
    let cell = create x in
    acc.tl <- inj cell;
    cell

  let return head = inj head
end

let map_dst f = function
  | [] -> []
  | x :: xs ->
     let open Acc1 in
     let rec loop f acc = function
       | [] -> return acc
       | x :: xs ->
          loop f (accum acc (f x)) xs
     in
     let acc = create (f x) in
     loop f acc xs

`accum` performs a mutation and returns the new destination in which
to produce the rest of the list. This is destination-passing-style,
and can also be understood as a manual TRMC optimization. It can also
be read as a "snoc" operation (adding a new element to the tail of a
list): it is possible to look at map_dst and think it is a pure
implementation. This pure implementation is optimized using mutation
under the hood, assuming (Mezzo-style) that the "map" function is the
unique owner of the intermediate list, which can thus be
mutated-in-place (even using strong mutation turning its type from a
mutable value to an immutable one).

Oleg explains that non-linear use of control operators (calling a
captured (delimited) continuation several times) allows to observe
this mutation, which thus destroys the probability computation of
HANSEI. Another way to see this problem is that non-linear use of
continuations breaks the unique-ownership invariant justifying the
optimization.

Yet it is possible to dynamically detect when we do not uniquely own
the list anymore (a previous invocation of the current continuation
has already mutated it), by filling the yet-to-be-mutated holes with a
secret canary value (instead of []). An accumulator is "fresh"
(uniquely owned) when its tail is (physically) equal to the canary
value. If we notice a mutator is not fresh, we perform a deep copy of
the list (from the head to the accumulator) and start accumulating
again from this copy.

Here is the new implementation of "map" using an Acc2 module
implementing freshness-check and deep-copy (code at the end of this
mail):

let rec map_dst2 f = function
  | [] -> []
  | x :: xs ->
     let open Acc2 in
     (* precondition: acc is fresh *)
     let rec loop f head acc = function
       | [] -> return head acc
       | x :: xs ->
      let next = f x in
      (* control effects during (f x) may destroy freshness *)
      if fresh acc then loop f head (accum acc next) xs
      else begin
         let head, acc = copy head (inj acc) in
         (* fresh after copy *)
         loop f head (accum acc next) xs
      end
     in
     let head = create (f x) in
     (* head is fresh *)
     loop f head head xs

The code is slightly more verbose because the deep-copy iteration
needs to be passed both the current accumulator and the head of the
list (it wouldn't know where to copy from otherwise). Notice that the
fast path (in absence of non-linear control) only has an additional
freshness check. I benchmarked the two implementations and the
difference is negligible -- in particular, it does not perform any
additional allocation in the fast case.

This version works fine with HANSEI.

(I'm not claiming this implementation alone is what people should use
for List.map. It does not behave particularly well on small lists,
which is the most important thing for most use-cases. The only point
is that the destination-passing-style can be made to work with
non-linear control.)

Finally, here is the "protected" Acc module implementing canary-check
and deep copy:

module Acc2 = struct
  let rec canary = Obj.magic () :: canary

  let copy head limit =
    let rec copy li limit tail =
      match li with
        | [] -> assert false
        | hd::tl ->
           let cell = { hd; tl = canary } in
           tail.tl <- inj cell;
           if li == limit then cell
           else copy tl limit cell
    in
    let new_head = { hd = head.hd; tl = canary } in
    if inj head == limit then (new_head, new_head)
    else begin
      let tail = copy head.tl limit new_head in
      new_head, tail
    end

  let create x = { hd = x; tl = canary }

  let fresh acc = acc.tl == canary

  let accum acc x =
    let cell = { hd = x; tl = canary } in
    acc.tl <- inj cell;
    cell

  let return head acc =
    acc.tl <- [];
    inj head
end

Full source for this experiment is available here:
  https://www.gitorious.org/gasche-snippets/destination-passing-style-and-control-effects/source/HEAD:map.ml

On Wed, Oct 1, 2014 at 12:29 PM,  <oleg@okmij.org> wrote:
>
> Gabriel Scherer wrote
>> My intuition would be that the mutation should not be observable in this
>> case, as it is only done on "private" data that the List.map function
>> allocate, owns, and which never escapes in its immutable form.
>
> Let me explain, on the example of the simplest Hansei function
>         val flip : float -> bool
> that flips (a possibly unfair) coin; flip 0.5 flips the fair coin.
> A good way to understand the meaning of this function is to take the
> frequentist approach. There are two ways the function call (flip 0.5)
> can return: it can return with true or with false. There are total two
> choices, in one of them the function returns true, in another
> false. So, the probability of returning true is 1/2. In general, the
> complete distribution -- which is the meaning of the program
> (flip 0.5) -- is as follows
>
> # exact_reify (fun () -> flip 0.5);;
> - : bool Ptypes.pV = [(0.5, Ptypes.V true); (0.5, Ptypes.V false)]
>
> which is what Hansei computes. Let's take a more complex example,
> closer to the topic: two independent coin flips
>
> # let model () = List.map flip [0.5; 0.5]
> val model : unit -> bool list = <fun>
>
> # exact_reify model;;
> - : bool list Ptypes.pV =
> [(0.25, Ptypes.V [true; true]); (0.25, Ptypes.V [true; false]);
>  (0.25, Ptypes.V [false; true]); (0.25, Ptypes.V [false; false])]
>
> The result correctly tells that there are four possible choices.
>
> Let us consider the function map from the Batteries. After desugaring,
> it has the following form
>
> type 'a mut_list = {
>     hd: 'a;
>     mutable tl: 'a list
>   }
>
> let map : ('a -> 'b) -> 'a list -> 'b list = fun f -> function
>   | [] -> []
>   | h :: t ->
>       let rec loop dst = function
>         | [] -> ()
>         | h :: t ->
>             let cell = {hd = f h; tl = []} in
>             dst.tl <- Obj.magic cell;
>             loop cell t
>       in
>       let r = {hd = f h; tl = []} in
>       loop r t;
>       Obj.magic r
>
> Using this function, we obtain
>
> # let model1 () = map flip [0.5; 0.5]
> val model1 : unit -> bool list = <fun>
>
> # exact_reify model1
> - : bool list Ptypes.pV =
> [(0.5, Ptypes.V [true; false]); (0.5, Ptypes.V [false; false])]
>
> Something went wrong, didn't it? Where are the other two possible
> choices -- with true for the second flip -- for the list of two
> independent coin flips?
>
> To understand the problem, let us recall the main principle stated
> earlier: ``There are two ways the function call (flip 0.5)
> can return: it can return with true or with false.'' That is, the
> function call (flip 0.5) returns *twice*. Therefore, the call (f h)
> in the let cell statement returns twice. Therefore, the assignment
>             dst.tl <- Obj.magic cell;
> will be executed twice. And the second execution of that assignment
> will be with a different cell, which will override and destroy the
> result of the first assignment. That is how the "true" choices became
> lost.
>
>
>

next prev parent reply	other threads:[~2014-10-29 10:12 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-28 19:31 [Caml-list] Why List.map does not be implemented tail-recursively? Shuai Wang
2014-09-28 19:36 ` Gabriel Scherer
2014-09-28 19:45 ` Anthony Tavener
2014-09-29 12:08   ` Goswin von Brederlow
2014-09-29 14:02     ` Pierre Chambart
2014-09-29 15:44       ` Yaron Minsky
2014-09-29 21:00       ` Gabriel Scherer
2014-09-30  8:46         ` [Caml-list] Why List.map does not be implemented oleg
2014-09-30  9:07           ` Gabriel Scherer
2014-10-01 10:29             ` oleg
2014-10-01 12:00               ` Gerd Stolpmann
2014-10-29 10:11               ` Gabriel Scherer [this message]
2014-10-02 10:09         ` [Caml-list] Why List.map does not be implemented tail-recursively? Stephen Dolan
2015-06-01 12:02           ` Jon Harrop
2015-06-02 12:04             ` Stephen Dolan
2015-06-05 10:21               ` Goswin von Brederlow
2014-09-30  6:29       ` Goswin von Brederlow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPFanBHQgD+octf9Qd=U+iL-LHms7Oy0a0RtWJ+HPX0gFXa=fg@mail.gmail.com' \
    --to=gabriel.scherer@gmail.com \
    --cc=caml-list@inria.fr \
    --cc=frederic.bour@lakaban.net \
    --cc=goswin-v-b@web.de \
    --cc=oleg@okmij.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox