From: "John Caml" <camljohn42@gmail.com>
To: "Richard Jones" <rich@annexia.org>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] large hash tables
Date: Sat, 23 Feb 2008 21:39:53 -0800
Message-ID: <33d2b3f70802232139x6c4cb118ycdccd42c90d8644e@mail.gmail.com>
In-Reply-To: <20080222003315.GA5326@annexia.org>
Richard,

Thank you so much for revising my program. I learned a lot from
reading over your changes, and the program works very nicely now. 1.2
GB for all 100 million items, which is efficient enough for all
practical purposes. Thanks again.
John
On Thu, Feb 21, 2008 at 4:33 PM, Richard Jones <rich@annexia.org> wrote:
> My version's a bit longer than your version, but hopefully more
> idiomatic and easier to understand.
>
> Program - http://www.annexia.org/tmp/movies.ml
> Create the test file - http://www.annexia.org/tmp/make_movies.ml
>
> It's best to read the program like this:
>
> (1) Start with the _interface_ ('signature') of the new ExtArray1
> module & type. _Ignore_ the implementation of this module for now.
>
> (2) Then look at the main part of the program (from where we allocate
> the result array down through the loop which reads the data).
>
> (3) Then look at the implementation of the module. The main
> complexity is that you can't just extend a Bigarray, but you have to
> keep reallocating it (in large chunks for efficiency).
>
> I measured it as taking some 230 MB for a 10 million line data file,
> but that doesn't necessarily mean it'll take 2 GB for 100 million
> lines because there's some space overhead which will decline as a
> proportion of the total memory used.
>
>
>
> Rich.
>
> --
> Richard Jones
> Red Hat
>
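Richard's point (3), that a Bigarray cannot be extended in place and has to be reallocated in large chunks, can be sketched roughly as follows. This is not the code from movies.ml: the module name is borrowed from the message, but the signature (create, append, get, length), the float32 element type, and the fixed chunk size are all assumptions made for illustration. On OCaml releases older than 4.07 the separate bigarray library has to be linked.

```ocaml
(* Hypothetical sketch of a growable wrapper around Bigarray.Array1.
   All names and the signature are assumptions, not the original ExtArray1. *)
module ExtArray1 : sig
  type t
  val create : chunk:int -> t      (* chunk = elements added per reallocation *)
  val length : t -> int            (* number of elements appended so far *)
  val append : t -> float -> unit
  val get : t -> int -> float
end = struct
  module A = Bigarray.Array1

  type t = {
    mutable data : (float, Bigarray.float32_elt, Bigarray.c_layout) A.t;
    mutable len : int;             (* elements in use; capacity is A.dim data *)
    chunk : int;
  }

  let create ~chunk =
    if chunk <= 0 then invalid_arg "ExtArray1.create";
    { data = A.create Bigarray.float32 Bigarray.c_layout chunk; len = 0; chunk }

  let length t = t.len

  (* A Bigarray has a fixed size, so growing means allocating a larger one
     and copying the old contents into its prefix. *)
  let grow t =
    let old_cap = A.dim t.data in
    let bigger = A.create Bigarray.float32 Bigarray.c_layout (old_cap + t.chunk) in
    A.blit t.data (A.sub bigger 0 old_cap);
    t.data <- bigger

  let append t x =
    if t.len = A.dim t.data then grow t;
    A.set t.data t.len x;
    t.len <- t.len + 1

  let get t i =
    if i < 0 || i >= t.len then invalid_arg "ExtArray1.get";
    A.get t.data i
end

let () =
  let a = ExtArray1.create ~chunk:4 in
  for i = 0 to 9 do ExtArray1.append a (float_of_int i) done;
  Printf.printf "%d elements, last = %g\n" (ExtArray1.length a) (ExtArray1.get a 9)
```

Growing by a fixed chunk bounds the unused capacity to one chunk, at the cost of a full copy on every reallocation; while the copy is in progress the old and new buffers are both live, so peak memory is briefly about twice the array size.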
Thread overview: 12+ messages
2008-02-19 23:01 John Caml
2008-02-19 23:34 ` [Caml-list] " Gabriel Kerneis
2008-02-19 23:36 ` Gerd Stolpmann
2008-02-19 23:51 ` Francois Rouaix
2008-02-20 9:37 ` Berke Durak
2008-02-20 9:56 ` Berke Durak
2008-02-20 12:48 ` Richard Jones
2008-02-20 15:54 ` Oliver Bandel
2008-02-21 22:45 ` John Caml
2008-02-22 0:33 ` Richard Jones
2008-02-24 5:39 ` John Caml [this message]
2008-02-22 14:19 ` Brian Hurt