From: Sam Steingold <sds@podval.org>
To: Bardur Arantsson <spam@scientician.net>, caml-list@inria.fr
Subject: Re: zcat vs CamlZip
Date: Tue, 29 Aug 2006 15:15:51 -0400
Message-ID: <44F49267.9080904@podval.org>
In-Reply-To: <ed22gp$un$1@sea.gmane.org>
Bardur Arantsson wrote:
> Sam Steingold wrote:
>> I read through a huge *.gz file.
>> I have two versions of the code:
> [--snip--]
>>
>> let buf = Buffer.create 1024
>> let gz_input_line gz_in char_counter line_counter =
>>   Buffer.clear buf;
>>   let finish () = incr line_counter; Buffer.contents buf in
>>   let rec loop () =
>>     let ch = Gzip.input_char gz_in in
>
> This is your most likely culprit. Any kind of "do this for every
> character" is usually insanely expensive when you can do it in bulk.
> (This is especially true when needing to do system calls, or if the
> called function cannot be inlined.)
>
Yes, I thought about that, but I assumed that the OCaml Gzip module
inlines Gzip.input_char. (Obviously the module needs an internal
buffer, so Gzip.input_char does not _always_ translate to a system call;
most of the time it just pops a char from that buffer.)
At any rate, do you really expect that calling Gzip.input and then
searching the result for newlines, slicing and dicing it into
individual input lines, etc., would be faster?
Sam.
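
For concreteness, the bulk approach being asked about might look roughly like the sketch below. It is not code from this thread: the names `reader`, `make_reader`, `input_line_bulk` and `refill_of_string` are made up for illustration, and the reader is written against a generic `refill buf pos len` function so that it runs without camlzip. The assumption is that `refill` behaves like `Gzip.input gz_in` (or `Stdlib.input ic`): it returns the number of bytes read, and 0 at end of input. Whether this actually beats the per-character loop has to be measured.

```ocaml
(* Bulk line reader: refill a chunk, then look for newlines in memory.
   Assumption: [refill buf pos len] returns the number of bytes read
   and 0 at end of input, like [Stdlib.input ic buf pos len]. *)
type reader = {
  refill : bytes -> int -> int -> int;
  chunk : bytes;          (* current chunk of decompressed data *)
  mutable pos : int;      (* next unread byte in [chunk] *)
  mutable len : int;      (* number of valid bytes in [chunk] *)
  line : Buffer.t;        (* accumulates a line that spans chunks *)
}

let make_reader ?(chunk_size = 65536) refill =
  { refill; chunk = Bytes.create chunk_size; pos = 0; len = 0;
    line = Buffer.create 1024 }

(* Returns the next line without its '\n'; raises End_of_file when the
   input is exhausted.  A final line without a newline is still returned. *)
let input_line_bulk r =
  Buffer.clear r.line;
  let rec loop () =
    if r.pos >= r.len then begin
      r.len <- r.refill r.chunk 0 (Bytes.length r.chunk);
      r.pos <- 0
    end;
    if r.len = 0 then
      if Buffer.length r.line = 0 then raise End_of_file
      else Buffer.contents r.line
    else
      match Bytes.index_from_opt r.chunk r.pos '\n' with
      | Some i when i < r.len ->
        (* newline inside the valid part of the chunk: line is complete *)
        Buffer.add_subbytes r.line r.chunk r.pos (i - r.pos);
        r.pos <- i + 1;
        Buffer.contents r.line
      | _ ->
        (* no newline in the valid part (a hit at or past [len] is stale
           data): keep what we have and fetch the next chunk *)
        Buffer.add_subbytes r.line r.chunk r.pos (r.len - r.pos);
        r.pos <- r.len;
        loop ()
  in
  loop ()

(* Stand-in source for trying this out: serves a string in pieces of at
   most 3 bytes, so lines are forced to span several refills. *)
let refill_of_string s =
  let off = ref 0 in
  fun buf pos len ->
    let n = min (min len 3) (String.length s - !off) in
    Bytes.blit_string s !off buf pos n;
    off := !off + n;
    n
```

With camlzip one would presumably write `make_reader (Gzip.input gz_in)`, assuming `Gzip.input` has the `in_channel -> bytes -> int -> int -> int` shape of current releases (older releases took a `string` buffer). The only per-character work left is the newline search; everything else is done per chunk or per line.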