* [Caml-list] Unix.lseek versus Pervasives.pos
@ 2003-03-17 22:45 Shivkumar Chandrasekaran
2003-03-18 6:54 ` Basile STARYNKEVITCH
0 siblings, 1 reply; 10+ messages in thread
From: Shivkumar Chandrasekaran @ 2003-03-17 22:45 UTC (permalink / raw)
To: caml-list
Hi,
Currently I am trying to handle "LargeFiles" while marshalling caml
values and I have run into this incidental problem (nothing to do with
LargeFile). If I open a file with "open_out_bin", write to it using
"output_value" and then try to determine the position in the file using
"pos", I get the correct value. However, if I use Unix.lseek thus
Unix.lseek (Unix.descr_of_out_channel fd_out) 0 Unix.SEEK_CUR
I get a different value (so far always 0) than the one I get from
pos fd_out
The manual does not seem to help. Any advice will be appreciated.
Thanks,
--shiv--
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
* [Caml-list] Unix.lseek versus Pervasives.pos
2003-03-17 22:45 [Caml-list] Unix.lseek versus Pervasives.pos Shivkumar Chandrasekaran
@ 2003-03-18 6:54 ` Basile STARYNKEVITCH
0 siblings, 0 replies; 10+ messages in thread
From: Basile STARYNKEVITCH @ 2003-03-18 6:54 UTC (permalink / raw)
To: caml-list
>>>>> "Shivkumar" == Shivkumar Chandrasekaran <shiv@ece.ucsb.edu> writes:
Shivkumar> Hi, Currently I am trying to handle "LargeFiles" while
Shivkumar> marshalling caml values and I have run into this
Shivkumar> incidental problem (nothing to do with LargeFile). If I
Shivkumar> open a file with "open_out_bin", write to it using
Shivkumar> "output_value"
You apparently forgot to flush the channel.
Shivkumar> and then try to determine the position
Shivkumar> in the file using "pos", I get the correct
Shivkumar> value. However, if I use Unix.lseek thus
Shivkumar> Unix.lseek (Unix.descr_of_out_channel fd_out) 0
Shivkumar> Unix.SEEK_CUR
Shivkumar> I get a different value (so far always 0)
Forgetting to flush a file gives similar errors on most systems.
Flushing is not OCaml-specific; it is a general issue (at least under
Unix, and probably under Windows too).
Perhaps the OCaml manual could add a hint never to forget flushing
files (the tip might be there already), but this hint is very basic
and generic, not OCaml-specific.
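To make the buffering point concrete, here is a minimal sketch (not taken from the original posts) that compares the channel position with the descriptor position before and after a flush. It assumes a recent OCaml with the unix library linked; the function name positions_around_flush and the temporary file are purely illustrative.

```ocaml
(* Sketch: compare pos_out with Unix.lseek before and after flushing.
   Requires the unix library. *)
let positions_around_flush () =
  let path = Filename.temp_file "lseek_demo" ".bin" in
  let oc = open_out_bin path in
  let fd = Unix.descr_of_out_channel oc in
  output_value oc [1; 2; 3];
  (* The marshalled bytes may still sit in the channel's buffer here. *)
  let chan_before = pos_out oc in
  let fd_before = Unix.lseek fd 0 Unix.SEEK_CUR in
  flush oc;
  (* After the flush, every byte has been handed to the descriptor. *)
  let chan_after = pos_out oc in
  let fd_after = Unix.lseek fd 0 Unix.SEEK_CUR in
  close_out oc;
  Sys.remove path;
  (chan_before, fd_before, chan_after, fd_after)

let () =
  let (cb, fb, ca, fa) = positions_around_flush () in
  Printf.printf "before flush: pos_out=%d lseek=%d\n" cb fb;
  Printf.printf "after flush:  pos_out=%d lseek=%d\n" ca fa
```

For a small value like this one, the descriptor position before the flush is expected to lag behind pos_out; after the flush the two should agree.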
Regards.
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net
aliases: basile<at>tunes<dot>org = bstarynk<at>nerim<dot>net
8, rue de la Faïencerie, 92340 Bourg La Reine, France
[parent not found: <46CF368E-5912-11D7-8289-000A95773ED2@rouaix.org>]
* Re: [Caml-list] Unix.lseek versus Pervasives.pos
[not found] <46CF368E-5912-11D7-8289-000A95773ED2@rouaix.org>
@ 2003-03-18 17:35 ` Shivkumar Chandrasekaran
2003-03-18 17:39 ` Shivkumar Chandrasekaran
1 sibling, 0 replies; 10+ messages in thread
From: Shivkumar Chandrasekaran @ 2003-03-18 17:35 UTC (permalink / raw)
To: caml-list
I went back to my code and put flushes after all writes. It still did
not help. Furthermore, once I replaced output_value by Unix.write (not
followed by flushes) lseek worked perfectly well! So I am not sure
whether the problem is due to non-flushing or not. Furthermore I
observed that in the Unix module there is no way to flush/sync a file.
Is it not needed? Apparently not.
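A likely explanation for the Unix.write observation is that descriptor-level writes are not buffered in user space, so there is nothing for a Unix-level flush to do and lseek sees the bytes immediately. The sketch below illustrates this under the assumption of a recent OCaml with the unix library; position_after_unix_write is a hypothetical helper name.

```ocaml
(* Sketch: descriptor-level writes bypass the channel buffer,
   so lseek reflects them immediately. Requires the unix library. *)
let position_after_unix_write () =
  let path = Filename.temp_file "unix_write_demo" ".bin" in
  let fd = Unix.openfile path [Unix.O_WRONLY; Unix.O_TRUNC] 0o600 in
  let data = "hello, world" in
  (* No channel is involved: the bytes go straight to the descriptor. *)
  let written = Unix.write_substring fd data 0 (String.length data) in
  let pos = Unix.lseek fd 0 Unix.SEEK_CUR in
  Unix.close fd;
  Sys.remove path;
  (written, pos)

let () =
  let (written, pos) = position_after_unix_write () in
  Printf.printf "wrote %d bytes, lseek reports %d\n" written pos
```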
--shiv--
On Monday, March 17, 2003, at 11:21 PM, Francois Rouaix (and similarly
Basile STARYNKEVITCH) wrote:
> You may need to flush the channel. If the data is still in the
> buffers, the fd position will not have been updated.
>
> --f
* Re: [Caml-list] Unix.lseek versus Pervasives.pos
[not found] <46CF368E-5912-11D7-8289-000A95773ED2@rouaix.org>
2003-03-18 17:35 ` Shivkumar Chandrasekaran
@ 2003-03-18 17:39 ` Shivkumar Chandrasekaran
2003-03-19 20:27 ` Xavier Leroy
1 sibling, 1 reply; 10+ messages in thread
From: Shivkumar Chandrasekaran @ 2003-03-18 17:39 UTC (permalink / raw)
To: caml-list
It would seem to me that it would be convenient to have 64 bit versions
of seek_in, seek_out, pos_in, pos_out in the Pervasives module. This
would help decouple the Pervasives I/O module a little more from the
Unix module.
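For reference, the standard library in current OCaml releases exposes a LargeFile submodule with int64 variants of seek_in, seek_out, pos_in and pos_out. Whether it covers the exact use case described above is an assumption; the sketch only shows the calls, and large_positions is a hypothetical helper name.

```ocaml
(* Sketch: 64-bit channel positions via the LargeFile submodule of the
   standard library. No Unix module is needed. *)
let large_positions () =
  let path = Filename.temp_file "largefile_demo" ".bin" in
  let oc = open_out_bin path in
  output_string oc "0123456789";
  let pos = LargeFile.pos_out oc in        (* position as an int64 *)
  LargeFile.seek_out oc 4L;                (* seek takes an int64 offset *)
  let after_seek = LargeFile.pos_out oc in
  close_out oc;
  Sys.remove path;
  (pos, after_seek)

let () =
  let (pos, after_seek) = large_positions () in
  Printf.printf "pos=%Ld after seek=%Ld\n" pos after_seek
```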
--shiv--
On Monday, March 17, 2003, at 11:21 PM, Francois Rouaix wrote:
> You may need to flush the channel. If the data is still in the
> buffers, the fd position will not have been updated.
>
> --f
* Re: [Caml-list] Unix.lseek versus Pervasives.pos
@ 2003-03-19 18:36 cashin
2003-03-19 18:48 ` Nicolas George
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: cashin @ 2003-03-19 18:36 UTC (permalink / raw)
To: caml-list
Sorry if this shows up as a duplicate.
Basile STARYNKEVITCH <basile@starynkevitch.net> writes:
...
> You apparently forgot to flush the channel.
Flushes are for writes, but even when using a test program that just
reads, zero is returned when it appears that it shouldn't return zero.
Compare the short ocaml program below to the comparable C version.
The ocaml version has lseek returning position zero after reading 10
bytes from the file.
ecashin@meili seek-tell$ ./test
after reading 10 chars: "let main =", position is 0
ecashin@meili seek-tell$ cat main.ml
let main =
  let fd = Unix.openfile "main.ml" [Unix.O_RDONLY] 0
  and buf = String.create 1024 in
  Printf.printf "after reading %d chars: \"%s\", position is %d\n"
    (UnixLabels.read fd ~buf ~pos:0 ~len:10)
    buf
    (UnixLabels.lseek fd 0 ~mode:Unix.SEEK_CUR)
;;
main
... but in the C version you get the expected position reported.
ecashin@meili seek-tell$ ./test
after reading "#include <" lseek returns 10
ecashin@meili seek-tell$ cat main.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* memset */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("main.c", O_RDONLY);
    char buf[1024];

    if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }
    memset(buf, '\0', sizeof(buf));
    read(fd, buf, 10);
    printf("after reading \"%s\" lseek returns %d\n",
           buf, (int) lseek(fd, 0, SEEK_CUR));
    return 0;
}
--
--Ed L Cashin | PGP public key:
ecashin@uga.edu | http://noserose.net/e/pgp/
* Re: [Caml-list] Unix.lseek versus Pervasives.pos
2003-03-19 18:36 cashin
@ 2003-03-19 18:48 ` Nicolas George
2003-03-19 19:01 ` cashin
2003-03-19 18:55 ` Ken Rose
2003-03-19 19:08 ` Basile STARYNKEVITCH
2 siblings, 1 reply; 10+ messages in thread
From: Nicolas George @ 2003-03-19 18:48 UTC (permalink / raw)
To: caml-list
On Nonidi 29 Ventôse, year CCXI, cashin@cs.uga.edu wrote:
> Printf.printf "after reading %d chars: \"%s\", position is %d\n"
> (UnixLabels.read fd ~buf ~pos:0 ~len:10)
> buf
> (UnixLabels.lseek fd 0 ~mode:Unix.SEEK_CUR)
Use strace, and you'll see something like this:
open("main.ml", O_RDONLY|O_LARGEFILE) = 3
_llseek(3, 0, [0], SEEK_CUR) = 0
read(3, " let main", 10) = 10
write(1, "after reading 10 chars: \" let m"..., 1066) = 1066
So you can see that the lseek is done before the read. Indeed, OCaml
does not specify the order in which function arguments are evaluated,
so your calls to read and lseek can occur in either order. I guess that
if you write
let len = UnixLabels.read ... in
let pos = UnixLabels.lseek ... in
Printf.printf ...
you will get the right result.
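The let-binding idea can be tried without touching any file. In the minimal sketch below, step is a hypothetical stand-in for a side-effecting call such as read or lseek, and trace records the order in which the calls actually run; none of these names come from the original program.

```ocaml
(* Sketch: force evaluation order with let-bindings.
   [step] is a hypothetical stand-in for a side-effecting call. *)
let trace : string list ref = ref []

let step name value =
  trace := name :: !trace;   (* remember that this call happened *)
  value

let sequenced () =
  trace := [];
  let len = step "read" 10 in     (* evaluated first *)
  let pos = step "lseek" 10 in    (* evaluated second *)
  Printf.printf "len=%d pos=%d\n" len pos;
  List.rev !trace                 (* calls in the order they ran *)

let () =
  List.iter print_endline (sequenced ())
```

Passing both step calls directly as arguments to Printf.printf would leave their relative order up to the compiler, which is the situation in the program quoted above.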
* Re: [Caml-list] Unix.lseek versus Pervasives.pos
2003-03-19 18:36 cashin
2003-03-19 18:48 ` Nicolas George
@ 2003-03-19 18:55 ` Ken Rose
2003-03-19 19:08 ` Basile STARYNKEVITCH
2 siblings, 0 replies; 10+ messages in thread
From: Ken Rose @ 2003-03-19 18:55 UTC (permalink / raw)
To: cashin; +Cc: caml-list
cashin@cs.uga.edu wrote:
>
> Flushes are for writes, but even when using a test program that just
> reads, zero is returned when it appears that it shouldn't return zero.
> Compare the short ocaml program below to the comparable C version.
>
> The ocaml version has lseek returning position zero after reading 10
> bytes from the file.
>
> ecashin@meili seek-tell$ ./test
> after reading 10 chars: "let main =", position is 0
> ecashin@meili seek-tell$ cat main.ml
> let main =
> let fd = Unix.openfile "main.ml" [Unix.O_RDONLY] 0
> and buf = String.create 1024 in
> Printf.printf "after reading %d chars: \"%s\", position is %d\n"
> (UnixLabels.read fd ~buf ~pos:0 ~len:10)
> buf
> (UnixLabels.lseek fd 0 ~mode:Unix.SEEK_CUR)
> ;;
>
> main
>
It looks like you're getting bitten by the order of evaluation of
function arguments.
$ cat main.ml
let main =
  let fd = Unix.openfile "main.ml" [Unix.O_RDONLY] 0
  and buf = String.create 1024 in
  let r = (UnixLabels.read fd ~buf ~pos:0 ~len:10) in
  Printf.printf "after reading %d chars: \"%s\", position is %d\n"
    r
    buf
    (UnixLabels.lseek fd 0 ~mode:Unix.SEEK_CUR)
;;
main
$ ./a.out
after reading 10 chars: "let main =", position is 10
$
- ken
* Re: [Caml-list] Unix.lseek versus Pervasives.pos
2003-03-19 18:36 cashin
2003-03-19 18:48 ` Nicolas George
2003-03-19 18:55 ` Ken Rose
@ 2003-03-19 19:08 ` Basile STARYNKEVITCH
2 siblings, 0 replies; 10+ messages in thread
From: Basile STARYNKEVITCH @ 2003-03-19 19:08 UTC (permalink / raw)
To: cashin; +Cc: caml-list
>>>>> "cashin" == cashin <cashin@cs.uga.edu> writes:
cashin> Sorry if this shows up as a duplicate. Basile
cashin> STARYNKEVITCH <basile@starynkevitch.net> writes:
Basile>> You apparently forgot to flush the channel.
Ok, I made a stupid mistake (flushing is only for writes!) but my
intuition was right, in the sense of taking buffering into account.
cashin> Flushes are for writes, but even when using a test program
cashin> that just reads, zero is returned when it appears that it
cashin> shouldn't return zero. Compare the short ocaml program
cashin> below to the comparable C version.
Ok; but the problem is the same: the OCaml I/O subsystem manages its
own internal buffering. Channels are not Unix file descriptors, but a
buffering layer on top of them. See the source code (in particular
ocaml/byterun/io.c and io.h) for details. In particular, a channel is
implemented (from io.h) as
struct channel {
  int fd;                     /* Unix file descriptor */
  file_offset offset;         /* Absolute position of fd in the file */
  char * end;                 /* Physical end of the buffer */
  char * curr;                /* Current position in the buffer */
  char * max;                 /* Logical end of the buffer (for input) */
  void * mutex;               /* Placeholder for mutex (for systhreads) */
  struct channel * next;      /* Linear chaining of channels (flush_all) */
  int revealed;               /* For Cash only */
  int old_revealed;           /* For Cash only */
  int refcount;               /* For flush_all and for Cash */
  char buff[IO_BUFFER_SIZE];  /* The buffer itself */
};
where IO_BUFFER_SIZE is usually 4096 bytes.
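The relationship between the channel position and the descriptor position can be sketched with a toy model. This is an illustration only, not the runtime's code: the record out_model and its functions are hypothetical names, and the model ignores the automatic flush that happens when the real buffer fills up.

```ocaml
(* Toy model of an output channel; illustrative only. *)
type out_model = {
  mutable fd_offset : int;  (* bytes already handed to the descriptor *)
  mutable pending : int;    (* bytes still sitting in the buffer *)
}

let create () = { fd_offset = 0; pending = 0 }

(* Writing to the channel only fills the buffer. *)
let write m n = m.pending <- m.pending + n

(* Flushing moves the buffered bytes to the descriptor. *)
let flush_model m =
  m.fd_offset <- m.fd_offset + m.pending;
  m.pending <- 0

(* Roughly what pos_out reports: descriptor offset plus buffered bytes. *)
let channel_pos m = m.fd_offset + m.pending

(* Roughly what lseek with SEEK_CUR reports: the descriptor offset alone. *)
let descriptor_pos m = m.fd_offset

let () =
  let m = create () in
  write m 100;
  Printf.printf "channel=%d descriptor=%d\n" (channel_pos m) (descriptor_pos m);
  flush_model m;
  Printf.printf "channel=%d descriptor=%d\n" (channel_pos m) (descriptor_pos m)
```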
The equivalent in C would be to mix lseek with a <stdio.h> FILE, which
gives a similar mess:
/* file main.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* memset */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    FILE *f = fopen("main.c", "r");
    char buf[1024];
    int fd;

    if (f == NULL) {
        perror("fopen");
        exit(EXIT_FAILURE);
    }
    fd = fileno(f);
    memset(buf, '\0', sizeof(buf));
    fread(buf, 1, 10, f);
    printf("after reading \"%s\" lseek returns %d\n",
           buf, (int) lseek(fd, 0, SEEK_CUR));
    return 0;
}
When I run the above file with tcc (www.tinycc.org) I get
after reading " /* file " lseek returns 483
which is as messy as I expected: the number depends on how far stdio
read ahead, not on the 10 bytes requested.
In short: never mix Unix.read (or other Unix I/O) with Pervasives.*
channel operations.
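The same read-ahead effect can be observed from OCaml on the input side. The following is a minimal sketch, assuming a recent OCaml with the unix library; read_ahead_positions and the 100-byte temporary file are illustrative choices, not part of the original discussion.

```ocaml
(* Sketch: an input channel reads ahead, so the descriptor position
   can be past the channel position. Requires the unix library. *)
let read_ahead_positions () =
  let path = Filename.temp_file "readahead_demo" ".txt" in
  let oc = open_out_bin path in
  output_string oc (String.make 100 'x');
  close_out oc;
  let ic = open_in_bin path in
  (* Ask the channel for 10 bytes only. *)
  let first = really_input_string ic 10 in
  let chan_pos = pos_in ic in
  (* Peek at the underlying descriptor behind the channel's back. *)
  let fd_pos = Unix.lseek (Unix.descr_of_in_channel ic) 0 Unix.SEEK_CUR in
  close_in ic;
  Sys.remove path;
  (String.length first, chan_pos, fd_pos)

let () =
  let (n, chan_pos, fd_pos) = read_ahead_positions () in
  Printf.printf "read %d bytes: pos_in=%d lseek=%d\n" n chan_pos fd_pos
```

pos_in accounts for the buffer and reports the logical position, while lseek reports how far the channel has actually read from the descriptor, which for a small file is typically the whole file.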
As usual with advice, this is a "don't do what I did" one; shame on
me :-( I must admit that I once opened a channel and then did only
Unix.read operations on it, but I commented this code (open-source code
in the Poesia monitor) with
(** IMPORTANT NOTICE: here outputxchannel_t-s are only used for their
Unix file descriptor; no output takes actually place on the output
channel; all output is thru Unix.write *)
and later
(** the reply channel from filter to monitor [don't use the
Pervasives.channel; using Unix] *)
The (bad) reason for mixing channels and Unix file descriptors (besides
perhaps a design bug) is that I use nonblocking Unix I/O and want
precise control over the actual read and write system calls, so I
don't want extra buffering.
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net
aliases: basile<at>tunes<dot>org = bstarynk<at>nerim<dot>net
8, rue de la Faïencerie, 92340 Bourg La Reine, France
Thread overview: 10+ messages
2003-03-17 22:45 [Caml-list] Unix.lseek versus Pervasives.pos Shivkumar Chandrasekaran
2003-03-18 6:54 ` Basile STARYNKEVITCH
[not found] <46CF368E-5912-11D7-8289-000A95773ED2@rouaix.org>
2003-03-18 17:35 ` Shivkumar Chandrasekaran
2003-03-18 17:39 ` Shivkumar Chandrasekaran
2003-03-19 20:27 ` Xavier Leroy
2003-03-19 18:36 cashin
2003-03-19 18:48 ` Nicolas George
2003-03-19 19:01 ` cashin
2003-03-19 18:55 ` Ken Rose
2003-03-19 19:08 ` Basile STARYNKEVITCH