From: Arthur Breitman <arthur.breitman@gmail.com>
To: Goswin von Brederlow <goswin-v-b@web.de>
Cc: caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] Keeping A big data optimization problem functional
Date: Tue, 15 Apr 2014 07:16:22 -0600 [thread overview]
Message-ID: <CAAYUt0PLMS0j9dvQvMtAc-kOhNpk8oFCskaNfKiRKwspKFLOeg@mail.gmail.com> (raw)
In-Reply-To: <20140415090542.GB10918@frosties>
[-- Attachment #1: Type: text/plain, Size: 3613 bytes --]
On Apr 15, 2014 5:06 AM, "Goswin von Brederlow" <goswin-v-b@web.de> wrote:
>
> On Mon, Apr 14, 2014 at 01:32:18PM -0600, Arthur Breitman wrote:
> > Responses in-line
> >
> > On Mon, Apr 14, 2014 at 2:12 AM, Goswin von Brederlow <goswin-v-b@web.de
>wrote:
> >
> > > On Sun, Apr 13, 2014 at 11:25:18PM -0600, Arthur Breitman wrote:
> > > > Hello all,
> > >
> > > Another structure I'm dealing with is a key value store, whose size
is on
> > > > the same order of magnitude as the size of the data above.
Initially,
> > > this
> > > > key-value store is empty.
> > >
> > > You probably don't expect to keep 1TB data in ram. Does the key-value
> > > store just reference values in the DB?
> >
> >
> > Indeed it does. That's part of the difficulty.
> >
> >
> > > I am given an "apply" function that, for a given key-value store and
for
> > > a
> > > > given node in the tree produces a new key-value store. Typically by
> > > > modifying a few keys in the key-value store. The amount of bits
changed
> > > is
> > > > on the same order of magnitude as the size of the node. When I
reach a
> > > leaf
> > > > of the tree, I can compute the optimality of that leaf from the
state of
> > > > the key-value store. I want to find the best leaf.
> > >
> > > Not sure what you mean there. Maybe you could give an example with a
> > > small data set (maybe 10 keys).
> > >
> >
> > It would be hard to make it a small example, but abstractly I have:
> >
> > type node = {value: string; children: node list option}
>
> A list can be [], no need for an option usualy. Or is there a
> difference between None and Some []?
1) yes the "option" part is useless, I missed it in the wrote
> > module type Context = begin
> > type t
> > val apply: t -> node -> t
> > val get: t -> string -> string option
> > val set: t -> string -> string -> t
> > val fitness -> int
> > val empty -> t
> > end
>
> Shouldn't that be:
>
> module type Context = begin
> type t
> type db_ref
> val apply: t -> node -> t
> val get: t -> string -> db_ref option
> val set: t -> string -> db_ref -> t
> val fitness -> int
> val empty -> t
> end
>
2) I'm not sure what you mean by a "db_ref", but I don't think the module
signature you describe is what I want. I stand by the signature I wrote.
What I want is a functional map that just happens to be so large it needs a
disk database to be stored.
> > Given a tree of nodes, and starting with Context.empty find the leaf
node
> > with the highest fitness. You can think of every node in the tree as an
> > operation acting on the context.
> >
> >
> > >
> > > Wouldn't a simple Map work as your key-value store? The apply function
> > > would then modify the map, creating a new map that shares most of the
> > > unchanged data with the original map. That way keeping the results of
> > > many apply functions in memory should be possible and applying should
> > > be fast.
> > >
> >
> > It would if I didn't potentially have a terabyte of data.
>
> But you wouldn't put the data into the map, only the reference to the
> DB.
That would work if I had a small number of keys with large value types.
Instead I have on the order of a billion keyvalue pairs.
>
> > > But if memory becomes a problem then you probably want to use some
> > > form of B-Tree to.
> > >
> > > MfG
> > > Goswin
> > --
> > Arthur Breitman
>
> MfG
> Goswin
>
> --
> Caml-list mailing list. Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
[-- Attachment #2: Type: text/html, Size: 5264 bytes --]
next prev parent reply other threads:[~2014-04-15 13:16 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-04-14 5:25 Arthur Breitman
2014-04-14 5:43 ` Francois Berenger
2014-04-14 6:07 ` Arthur Breitman
2014-04-14 8:12 ` Goswin von Brederlow
2014-04-14 19:32 ` Arthur Breitman
2014-04-15 9:05 ` Goswin von Brederlow
2014-04-15 13:16 ` Arthur Breitman [this message]
2014-04-15 13:50 ` Thomas Gazagnaire
2014-04-16 11:02 ` Arthur Breitman
2014-04-14 20:38 ` Richard W.M. Jones
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAAYUt0PLMS0j9dvQvMtAc-kOhNpk8oFCskaNfKiRKwspKFLOeg@mail.gmail.com \
--to=arthur.breitman@gmail.com \
--cc=caml-list@inria.fr \
--cc=goswin-v-b@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox