From: Sam Steingold <sds@podval.org>
To: caml-list@inria.fr
Subject: zcat vs CamlZip
Date: Tue, 29 Aug 2006 14:40:23 -0400
Message-ID: <44F48A17.5080005@podval.org>
I am reading through a huge *.gz file.
I have two versions of the code:
1. Use Unix.open_process_in "zcat foo.gz".
2. Use gzip.mli (1.2 2002/02/18) as it comes with godi 3.09.
It turns out that the zcat version is 3(!) times as fast as the gzip.mli
one (gzip.mli timings first, then zcat):
Run time: 189.435840 sec
  Self: 189.435840 sec
    sys: 183.447465 sec
    user: 5.988375 sec
  Children: 0.000000 sec
    sys: 0.000000 sec
    user: 0.000000 sec
GC: minor: 169778
    major: 478
    compactions: 3
Allocated: 5510457762.0 words
Wall clock: 206 sec (00:03:26)

vs

Run time: 58.471655 sec
  Self: 54.855429 sec
    sys: 48.527033 sec
    user: 6.328396 sec
  Children: 3.616226 sec
    sys: 3.168198 sec
    user: 0.448028 sec
GC: minor: 43174
    major: 229
    compactions: 5
Allocated: 1401290543.0 words
Wall clock: 78 sec (00:01:18)
Since gzip.mli lacks an input_line function, I had to roll my own:
let buf = Buffer.create 1024

(* Read one line from gz_in, one character at a time; the newline is dropped. *)
let gz_input_line gz_in char_counter line_counter =
  Buffer.clear buf;
  let finish () = incr line_counter; Buffer.contents buf in
  let rec loop () =
    let ch = Gzip.input_char gz_in in
    char_counter := Int64.succ !char_counter;
    if ch = '\n' then finish ()
    else (Buffer.add_char buf ch; loop ())
  in
  try loop ()
  with End_of_file ->
    (* A last line without a trailing newline is returned; otherwise EOF propagates. *)
    if Buffer.length buf = 0 then raise End_of_file else finish ()
Is there something wrong with my gz_input_line?
Is this a known performance issue with the CamlZip library?
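For comparison, one variant that might be worth measuring reads the input in blocks and scans each block for newlines, instead of calling Gzip.input_char once per character. The sketch below is only an illustration: the names reader, make_reader, next_line and string_source are hypothetical, and the reader is written against a generic block-read function so that it can be exercised without CamlZip. It assumes that passing Gzip.input gz_in as the read function fits the bytes -> int -> int -> int shape (recent CamlZip releases use bytes there; the 2002 version took a string) and that it returns 0 at end of input. Whether this narrows the gap with zcat is not established here.

```ocaml
(* Line reader over any block-read function [read buf pos len] that
   returns the number of bytes read, or 0 at end of input. *)
type reader = {
  read : bytes -> int -> int -> int;
  chunk : bytes;        (* block buffer, refilled as needed *)
  mutable pos : int;    (* next unread byte in [chunk] *)
  mutable len : int;    (* number of valid bytes in [chunk] *)
  line : Buffer.t;      (* accumulates a line that spans several blocks *)
}

let make_reader ?(size = 65536) read =
  { read; chunk = Bytes.create size; pos = 0; len = 0;
    line = Buffer.create 1024 }

(* Refill the block buffer; returns false at end of input. *)
let refill r =
  r.pos <- 0;
  r.len <- r.read r.chunk 0 (Bytes.length r.chunk);
  r.len > 0

(* Return the next line without its newline; raise End_of_file when done. *)
let next_line r =
  Buffer.clear r.line;
  let rec loop () =
    if r.pos >= r.len && not (refill r) then begin
      (* End of input: return a final unterminated line, if any. *)
      if Buffer.length r.line = 0 then raise End_of_file
      else Buffer.contents r.line
    end else
      match Bytes.index_from_opt r.chunk r.pos '\n' with
      | Some i when i < r.len ->
        Buffer.add_subbytes r.line r.chunk r.pos (i - r.pos);
        r.pos <- i + 1;
        Buffer.contents r.line
      | _ ->
        (* No newline in the valid part: keep it and read another block. *)
        Buffer.add_subbytes r.line r.chunk r.pos (r.len - r.pos);
        r.pos <- r.len;
        loop ()
  in
  loop ()

(* In-memory stand-in for a gzip channel, used only to try the reader. *)
let string_source s =
  let off = ref 0 in
  fun buf pos len ->
    let n = min len (String.length s - !off) in
    Bytes.blit_string s !off buf pos n;
    off := !off + n;
    n
```

With CamlZip, the reader would presumably be created as make_reader (Gzip.input gz_in), subject to the assumptions above. The character and line counters from gz_input_line are left out to keep the sketch short.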
thanks.
Sam.