From: Lauri Alanko <la@iki.fi>
To: caml-list@inria.fr
Subject: [Caml-list] input_line (Re: pervasives)
Date: Sat, 27 Apr 2002 19:02:09 +0300
Message-ID: <20020427160209.GB675@la.iki.fi>
In-Reply-To: <3CCA2C6E.5020007@ozemail.com.au>
On Sat, Apr 27, 2002 at 02:43:26PM +1000, John Max Skaller wrote:
> I have a philosophy .. a bit extreme perhaps .. I NEVER read anything
> other than lines (or whole files).
Vaguely related to this, I have some minor gripes about input_line.
Here's the implementation in 3.04:

let rec input_line chan =
  let n = input_scan_line chan in
  if n = 0 then                         (* n = 0: we are at EOF *)
    raise End_of_file
  else if n > 0 then begin              (* n > 0: newline found in buffer *)
    let res = string_create (n-1) in
    ignore (unsafe_input chan res 0 (n-1));
    ignore (input_char chan);           (* skip the newline *)
    res
  end else begin                        (* n < 0: newline not found *)
    let beg = string_create (-n) in
    ignore(unsafe_input chan beg 0 (-n));
    try
      beg ^ input_line chan
    with End_of_file ->
      beg
  end

It's obvious that this doesn't handle obnoxiously large newlineless
inputs very gracefully. Its running time is quadratic in the line
length, since every ^ on the way back copies the whole tail assembled
so far, and it's not tail recursive. Worst of all, there is no limit on
the size of the string to be created. So a maliciously designed huge
input could blow either the stack or the heap. I wouldn't want to use
input_line in a network application.
(All right, on 32-bit architectures input_line will terminate at ~16M
when string_create fails, but I wouldn't call that a solution.)
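
To put a rough number on the quadratic cost, here is a small sketch that
counts the bytes copied by the nested concatenations. It assumes the
channel hands back fixed-size chunks whenever no newline is found; the
chunk size and the name bytes_copied_nested are illustrative, not taken
from the runtime.

```ocaml
(* Bytes copied when [chunks] pieces of [chunk] bytes each are joined as
   b1 ^ (b2 ^ (... ^ bk)), the shape that the recursion in input_line
   produces. Each ^ allocates a fresh string and copies both operands.
   The function name and both parameters are hypothetical. *)
let bytes_copied_nested ~chunk ~chunks =
  let rec go remaining acc_len copied =
    if remaining = 0 then copied
    else
      (* one more chunk is prepended: the result has acc_len + chunk bytes,
         all of which are copied into the new string *)
      let new_len = acc_len + chunk in
      go (remaining - 1) new_len (copied + new_len)
  in
  if chunks <= 1 then 0 else go (chunks - 1) chunk 0
```

Under these assumptions a 1 MB line arriving as 256 chunks of 4096 bytes
costs 134737920 bytes of copying, roughly 128 times the line length, and
the ratio grows linearly with the number of chunks.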
So here's an alternative implementation. It's tail recursive, its
amortized CPU usage is linear, and the space usage can be bounded:

exception Buffer_overflow of string

let expand_buf buf old_size expand =
  if old_size == 0 then
    string_create expand
  else
    let new_buf = string_create (old_size + expand) in
    string_blit buf 0 new_buf 0 old_size;
    new_buf

let rec input_bounded_line_to_buf chan buf offset maxlen =
  let n = input_scan_line chan in
  (* n - 1 > maxlen, not n > maxlen + 1: the latter overflows
     when maxlen = max_int *)
  if n - 1 > maxlen || n < (-maxlen) then
    let err_buf = expand_buf buf offset maxlen in
    ignore (unsafe_input chan err_buf offset maxlen);
    raise (Buffer_overflow err_buf)
  else if n > 0 then
    let ret_buf = expand_buf buf offset (n - 1) in
    ignore (unsafe_input chan ret_buf offset (n - 1));
    ignore (input_char chan);           (* skip the newline *)
    ret_buf
  else if n < 0 then
    let m = (-n) in
    let new_offset = offset + m in
    let old_len = string_length buf in
    let new_buf =
      if new_offset > old_len then
        expand_buf buf offset (new_offset + old_len * 2)
      else
        buf
    in
    ignore (unsafe_input chan new_buf offset m);
    input_bounded_line_to_buf chan new_buf new_offset (maxlen - m)
  else if offset = 0 then               (* n = 0: EOF, nothing read *)
    raise End_of_file
  else                                  (* n = 0: EOF after a partial line *)
    expand_buf buf offset 0

let input_bounded_line chan maxlen =
  input_bounded_line_to_buf chan "" 0 maxlen

let input_line chan = input_bounded_line chan max_int

I hope that something similar to this could be included in Pervasives in
the future.
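
The code above relies on primitives that are only visible inside
Pervasives (input_scan_line, unsafe_input, string_create). For trying
the idea from user code, a minimal sketch using only the public Buffer
and input_char functions might look like the following. The names
Line_too_long and read_bounded_line are made up for illustration, and
the match ... with exception form assumes OCaml 4.02 or later.

```ocaml
(* Hypothetical names: neither Line_too_long nor read_bounded_line is
   part of the standard library. *)
exception Line_too_long of string

(* Read one line of at most [maxlen] bytes (newline excluded) from [chan].
   Raises Line_too_long with the first [maxlen] bytes if the line is
   longer (the byte that tripped the limit is discarded, the rest of the
   line stays in the channel), and End_of_file if nothing could be read. *)
let read_bounded_line chan maxlen =
  let buf = Buffer.create 80 in
  let rec loop () =
    match input_char chan with
    | '\n' -> Buffer.contents buf
    | c ->
        if Buffer.length buf >= maxlen then
          raise (Line_too_long (Buffer.contents buf));
        Buffer.add_char buf c;
        loop ()                         (* tail call: not under the handler *)
    | exception End_of_file ->
        if Buffer.length buf = 0 then raise End_of_file
        else Buffer.contents buf        (* last line without a newline *)
  in
  loop ()
```

Buffer doubles its storage as it grows, so the copying stays amortized
linear, and the limit check keeps memory use bounded by maxlen. Reading
one character at a time is slower than scanning the channel buffer with
input_scan_line, which is one reason a version inside Pervasives would
still be preferable.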
Lauri Alanko
la@iki.fi