From: Goswin von Brederlow <goswin-v-b@web.de>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Keeping A big data optimization problem functional
Date: Tue, 15 Apr 2014 11:05:42 +0200 [thread overview]
Message-ID: <20140415090542.GB10918@frosties> (raw)
In-Reply-To: <CAAYUt0OA_59j+E70TWAgDq+2dWStFOLgNa=NJku-jtQ7y_7nXA@mail.gmail.com>
On Mon, Apr 14, 2014 at 01:32:18PM -0600, Arthur Breitman wrote:
> Responses in-line
>
> On Mon, Apr 14, 2014 at 2:12 AM, Goswin von Brederlow <goswin-v-b@web.de>wrote:
>
> > On Sun, Apr 13, 2014 at 11:25:18PM -0600, Arthur Breitman wrote:
> > > Hello all,
> >
> > Another structure I'm dealing with is a key value store, whose size is on
> > > the same order of magnitude as the size of the data above. Initially,
> > this
> > > key-value store is empty.
> >
> > You probably don't expect to keep 1TB data in ram. Does the key-value
> > store just reference values in the DB?
>
>
> Indeed it does. That's part of the difficulty.
>
>
> > I am given an "apply" function that, for a given key-value store and for
> > a
> > > given node in the tree produces a new key-value store. Typically by
> > > modifying a few keys in the key-value store. The amount of bits changed
> > is
> > > on the same order of magnitude as the size of the node. When I reach a
> > leaf
> > > of the tree, I can compute the optimality of that leaf from the state of
> > > the key-value store. I want to find the best leaf.
> >
> > Not sure what you mean there. Maybe you could give an example with a
> > small data set (maybe 10 keys).
> >
>
> It would be hard to make it a small example, but abstractly I have:
>
> type node = {value: string; children: node list option}
A list can be [], no need for an option usualy. Or is there a
difference between None and Some []?
> module type Context = begin
> type t
> val apply: t -> node -> t
> val get: t -> string -> string option
> val set: t -> string -> string -> t
> val fitness -> int
> val empty -> t
> end
Shouldn't that be:
module type Context = begin
type t
type db_ref
val apply: t -> node -> t
val get: t -> string -> db_ref option
val set: t -> string -> db_ref -> t
val fitness -> int
val empty -> t
end
> Given a tree of nodes, and starting with Context.empty find the leaf node
> with the highest fitness. You can think of every node in the tree as an
> operation acting on the context.
>
>
> >
> > Wouldn't a simple Map work as your key-value store? The apply function
> > would then modify the map, creating a new map that shares most of the
> > unchanged data with the original map. That way keeping the results of
> > many apply functions in memory should be possible and applying should
> > be fast.
> >
>
> It would if I didn't potentially have a terabyte of data.
But you wouldn't put the data into the map, only the reference to the
DB. And maybe have a cache of recently used DB references so you don't
have to lookup the same object over and over.
> > But if memory becomes a problem then you probably want to use some
> > form of B-Tree to.
> >
> > MfG
> > Goswin
> --
> Arthur Breitman
MfG
Goswin
next prev parent reply other threads:[~2014-04-15 9:05 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-04-14 5:25 Arthur Breitman
2014-04-14 5:43 ` Francois Berenger
2014-04-14 6:07 ` Arthur Breitman
2014-04-14 8:12 ` Goswin von Brederlow
2014-04-14 19:32 ` Arthur Breitman
2014-04-15 9:05 ` Goswin von Brederlow [this message]
2014-04-15 13:16 ` Arthur Breitman
2014-04-15 13:50 ` Thomas Gazagnaire
2014-04-16 11:02 ` Arthur Breitman
2014-04-14 20:38 ` Richard W.M. Jones
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140415090542.GB10918@frosties \
--to=goswin-v-b@web.de \
--cc=caml-list@inria.fr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox