From: Nuutti Kotivuori <naked+caml@naked.iki.fi>
To: Eric Dahlman <edahlman@atcorp.com>
Cc: skaller@users.sourceforge.net, caml-list@pauillac.inria.fr
Subject: Re: [Caml-list] Bug with really_input under cygwin
Date: Wed, 10 Mar 2004 17:25:42 +0200
Message-ID: <87hdww4tc9.fsf@aka.i.naked.iki.fi>
In-Reply-To: <1078888018.2452.52.camel@pelican.wigram> (skaller@users.sourceforge.net's message of "10 Mar 2004 14:06:59 +1100")
skaller@users.sourceforge.net wrote:
> On Wed, 2004-03-10 at 09:30, Eric Dahlman wrote:
>> Howdy all,
>>
>> I have some code which reads in a whole file and returns it
>> as a string.
If you have a master's degree in reading between the lines of a rant,
you probably picked out the right answer from the text below. But here
it is as a simple answer:

Loop doing 'input' on the file, until 'input' returns zero.

'really_input' is of course nice and easy, but since you have no really
reliable way of knowing how large the entire file is going to be in the
end, you need to make a decision about the buffer size anyway.

Binary or non-binary mode only affects the \r\n -> \n translation while
reading the file - and vice versa while writing.
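
As a minimal sketch of that loop, assuming a reasonably recent OCaml
(Bytes, Buffer.add_subbytes and Fun.protect, so 4.08 or later): the
names read_all and read_file and the 4096-byte chunk size are only
illustrative choices, not anything from the original code.

```ocaml
(* Read everything from a channel by calling [input] until it returns 0.
   [chunk_size] is an arbitrary, illustrative default. *)
let read_all ?(chunk_size = 4096) (ic : in_channel) : string =
  let chunk = Bytes.create chunk_size in
  let acc = Buffer.create chunk_size in
  let rec loop () =
    (* [input] may return fewer bytes than requested;
       only a result of 0 means end of file. *)
    let n = input ic chunk 0 chunk_size in
    if n > 0 then begin
      Buffer.add_subbytes acc chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents acc

(* Hypothetical convenience wrapper: open in binary mode so that no
   \r\n -> \n translation happens, and close the channel even if
   reading raises. *)
let read_file (path : string) : string =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () -> read_all ic)
```

Because the loop never asks how long the file is, it does not matter
whether the length reported for the channel matches the number of
bytes actually delivered after any newline translation.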
> The only correct way to do this is to read a block at a time
> until you get a partial block.
>
> This is so EVEN in 'binary' mode, which is just another
> ill conceived Unix hack :-)
[...]
> It is unfortunate that C and Unix do not provide a coherent
> abstraction in this area. Even binary I/O is ill-conceived:
[...]
> C has been plagued by extremely ill considered functions.
> Even the basic IO operation is not correctly defined.
[...]
> There is no such thing as 'the number of characters
> in a file'. Perhaps there is a number of bytes in a file.
[...]
> In MS-DOS, files *always* consist of a number of 256
> byte blocks. It is impossible to have a file with
> a non-256 byte multiple size. Of course, text files
> use an encoding with a Ctrl-Z at the end.
[...]
> Under Linux, the Standard for text encoding is UTF-8.
[...]
> I personally believe the easiest way to work around this
> quagmire of malspecification is to
>
> (a) ONLY use 8 bit binary I/O
> (b) ALWAYS read and write bytes
>
> even if you're processing text. Never depend on the
> language or OS conversion functions, it's very unlikely
> they'll be right. Do all the conversions needed yourself.
> At least when you find a problem you're not handling
> correctly you can fix it.
Luckily not everybody sees the world as glum :-)
-- Naked