From: "John Caml" <camljohn42@gmail.com>
To: caml-list@yquem.inria.fr
Subject: Re: large hash tables
Date: Tue, 19 Feb 2008 21:18:32 -0800
Message-ID: <33d2b3f70802192118p4d887212mcf76e34447c54e52@mail.gmail.com>
In-Reply-To: <55e81f00-5ef7-4946-9272-05595299e114@41g2000hsc.googlegroups.com>
Thank you all for the assistance.
I've resolved the Stack_overflow problem by using an Array instead of
a Hashtbl; my keys were just consecutive integers, so this latter
approach is clearly preferable.
However, memory usage is still poor: the program takes nearly an
order of magnitude more memory than the equivalent C++ program. While
the C++ program requires 800 MB, my OCaml program requires roughly 6
GB. Am I doing something very inefficient? My revised code appears
below.
Also, if you have any other coding suggestions, I'd appreciate hearing
them. I'm a long-time coder but new to OCaml and eager to learn.
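A rough accounting of where the memory goes, assuming a 64-bit build: each element of a movieMajor list in the program below is a cons cell (header plus two fields, 3 words), an (int * float) tuple (3 words) and a boxed float (header plus 8 bytes, 2 words), so every rating costs 8 words, or 64 bytes. For example, 100 million ratings would need about 6.4 GB for the lists alone. The sketch below checks the per-rating figure with Obj.reachable_words (in the standard library since OCaml 4.04); sample_list and words_per_rating are hypothetical helpers, not part of the program.

```ocaml
(* Hypothetical helper: build a list shaped like one movieMajor entry,
   n pairs of (user id, rating). Each float is computed at run time,
   so every element gets its own boxed float. *)
let sample_list n = List.init n (fun i -> (i, float_of_int i /. 2.0))

(* Total heap words reachable from the list (headers included),
   divided by the number of ratings. *)
let words_per_rating n =
  float_of_int (Obj.reachable_words (Obj.repr (sample_list n)))
  /. float_of_int n

let () =
  Printf.printf "word size: %d bits, words per rating: %.1f\n"
    Sys.word_size (words_per_rating 100_000)
```

On a 64-bit build this should report 8.0 words per rating: 3 for the cons cell, 3 for the tuple and 2 for the boxed float.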
--------------
exception SplitError

let loadWholeFile filename =
  let infile = open_in filename
  and movieMajor = Array.make 17770 [] in
  (* Read one "movie user rating" line per call; the recursive call is
     in tail position, so the stack does not grow with the file size. *)
  let rec loadLines count =
    let line = input_line infile in
    let murList = Pcre.split line in
    match murList with
    | [m; u; r] ->
        let rFloat = float_of_string r
        and mInt = int_of_string m
        and uInt = int_of_string u in
        (* Prepend the (user, rating) pair to this movie's list. *)
        movieMajor.(mInt) <- (uInt, rFloat) :: movieMajor.(mInt);
        if count mod 1000000 = 0 then begin
          Printf.printf "count: %d\n" count;
          flush stdout
        end;
        loadLines (count + 1)
    | _ -> raise SplitError
  in
  try
    loadLines 0
  with End_of_file ->
    close_in infile;
    movieMajor
;;

let filename = Sys.argv.(1);;
let str = loadWholeFile filename;;
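One way to cut the per-rating cost, sketched under assumptions: keep each movie's ratings in two parallel growable arrays, an int array for user ids and a float array for ratings. In OCaml's default (flat float array) configuration a float array stores its elements unboxed, so each rating costs 2 words (16 bytes on 64-bit) plus the slack left by capacity doubling, instead of 8 words. The names ratings, push and load_whole_file are hypothetical, and the parser assumes whitespace-separated "movie user rating" lines, using Scanf.sscanf in place of Pcre.split.

```ocaml
(* Hypothetical growable buffer for one movie's ratings, stored as
   two parallel arrays; [len] is the number of slots in use. *)
type ratings = {
  mutable users : int array;
  mutable scores : float array;  (* unboxed in flat-float-array builds *)
  mutable len : int;
}

let empty () = { users = [||]; scores = [||]; len = 0 }

(* Append (u, s), doubling the capacity when the arrays are full. *)
let push r u s =
  if r.len = Array.length r.users then begin
    let cap = max 8 (2 * r.len) in
    let users = Array.make cap 0 and scores = Array.make cap 0.0 in
    Array.blit r.users 0 users 0 r.len;
    Array.blit r.scores 0 scores 0 r.len;
    r.users <- users;
    r.scores <- scores
  end;
  r.users.(r.len) <- u;
  r.scores.(r.len) <- s;
  r.len <- r.len + 1

(* Same loop as loadWholeFile, assuming "movie user rating" lines.
   Only End_of_file raised by input_line ends the loop. *)
let load_whole_file ?(n_movies = 17770) filename =
  let movies = Array.init n_movies (fun _ -> empty ()) in
  let ic = open_in filename in
  let rec loop count =
    match input_line ic with
    | line ->
        Scanf.sscanf line " %d %d %f" (fun m u r -> push movies.(m) u r);
        if count mod 1_000_000 = 0 then Printf.printf "count: %d\n%!" count;
        loop (count + 1)
    | exception End_of_file -> close_in ic
  in
  loop 0;
  movies
```

Malformed lines make Scanf.sscanf raise an exception (for example Scanf.Scan_failure) rather than SplitError. If the doubling slack matters, each buffer can be trimmed with Array.sub once loading is done.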