From: "Rémi Dewitte" <remi@gide.net>
To: yminsky@gmail.com
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Threads performance issue.
Date: Tue, 17 Feb 2009 08:40:11 +0100 [thread overview]
Message-ID: <2184b2340902162340s540c5ac7g9f42b59d03f643cb@mail.gmail.com> (raw)
In-Reply-To: <2184b2340902160937i53b8f3fbga01eaf14ed829f8f@mail.gmail.com>
[-- Attachment #1.1: Type: text/plain, Size: 6548 bytes --]
I have made some further experiments.
I have a functional version of the reading algorithm. I have the original
imperative version of the algorithm.
Either it is linked to thread (T) or not (X). Either it uses extlib (E) or
not (X).
Results are.
XX TX XE TE
Imperative | 3.37 | 7.80 | 3.56 | 8.40
Functional | 4.20 | 8.28 | 4.47 | 9.08
test.csv is a 21mo file with ~13k rows and a thousands of columns on a 15rpm
disk.
ocaml version : 3.11.0
uname -a gives
Linux localhost 2.6.28.4-server-1mnb #1 SMP Mon Feb 9 09:05:19 EST 2009 i686
Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz GNU/Linux
While I think I have to find improvements to the functional version, I
struggle finding a rationale behind this high loss of performance while I am
not even using threads, just linking to...
Cheers,
Rémi
On Mon, Feb 16, 2009 at 18:37, Rémi Dewitte <remi@gide.net> wrote:
> Yaron,
>
> I use a slightly modified version of the CSV library's load_rows . Here is
> the main code which is highly imperative style. I might transform it in
> purely functional style ?
>
> The main program is :
>
> open Printf;;
> open Sys;;
> let timed_exec start_message f =
> print_string start_message;
> let st1 = time () in
> let r = f () in
> print_endline ("done in " ^ (string_of_float ((time ()) -. st1)) );
> r;;
>
> (* This line enabled makes the program really slow ! *)
> let run_threaded f = Thread.create (fun () -> f (); Thread.exit ()) ()
>
> let () = timed_exec "Reading data " (fun () ->
> load_rows (fun _ -> ()) (open_in "file1.csv");
> load_rows (fun _ -> ()) (open_in "file2.csv");
> ()
> )
>
> The load_rows :
> let load_rows ?(separator = ',') ?(nread = -1) f chan =
> let nr = ref 0 in
> let row = ref [] in (* Current row. *)
> let field = ref [] in (* Current field. *)
> let state = ref StartField in (* Current state. *)
> let end_of_field () =
> let field_list = List.rev !field in
> let field_len = List.length field_list in
> let field_str = String.create field_len in
> let rec loop i = function
> [] -> ()
> | x :: xs ->
> field_str.[i] <- x;
> loop (i+1) xs
> in
> loop 0 field_list;
> row := (Some field_str) :: !row;
> field := [];
> state := StartField
> in
> let empty_field () =
> row := None :: !row;
> field := [];
> state := StartField
> in
> let end_of_row () =
> let row_list = List.rev !row in
> f row_list;
> row := [];
> state := StartField;
> nr := !nr + 1;
> in
> let rec loop () =
> let c = input_char chan in
> if c != '\r' then ( (* Always ignore \r characters. *)
> match !state with
> StartField -> (* Expecting quote or other char. *)
> if c = '"' then (
> state := InQuotedField;
> field := []
> ) else if c = separator then (* Empty field. *)
> empty_field ()
> else if c = '\n' then ( (* Empty field, end of row. *)
> empty_field ();
> end_of_row ()
> ) else (
> state := InUnquotedField;
> field := [c]
> )
> | InUnquotedField -> (* Reading chars to end of field. *)
> if c = separator then (* End of field. *)
> end_of_field ()
> else if c = '\n' then ( (* End of field and end of row. *)
> end_of_field ();
> end_of_row ()
> ) else
> field := c :: !field
> | InQuotedField -> (* Reading chars to end of field. *)
> if c = '"' then
> state := InQuotedFieldAfterQuote
> else
> field := c :: !field
> | InQuotedFieldAfterQuote ->
> if c = '"' then ( (* Doubled quote. *)
> field := c :: !field;
> state := InQuotedField
> ) else if c = '0' then ( (* Quote-0 is ASCII NUL. *)
> field := '\000' :: !field;
> state := InQuotedField
> ) else if c = separator then (* End of field. *)
> end_of_field ()
> else if c = '\n' then ( (* End of field and end of row. *)
> end_of_field ();
> end_of_row ()
> ) else ( (* Bad single quote in field. *)
> field := c :: '"' :: !field;
> state := InQuotedField
> )
> ); (* end of match *)
> if( nread < 0 or !nr < nread) then loop () else ()
> in
> try
> loop ()
> with
> End_of_file ->
> (* Any part left to write out? *)
> (match !state with
> StartField ->
> if !row <> [] then
> ( empty_field (); end_of_row () )
> | InUnquotedField | InQuotedFieldAfterQuote ->
> end_of_field (); end_of_row ()
> | InQuotedField ->
> raise (Bad_CSV_file "Missing end quote after quoted field.")
> )
>
>
> Thanks,
> Rémi
>
>
> On Mon, Feb 16, 2009 at 17:47, Yaron Minsky <yminsky@gmail.com> wrote:
>
>> 2009/2/16 Rémi Dewitte <remi@gide.net>
>>
>>> Hello,
>>>
>>> I would like to read two files in two different threads.
>>>
>>> I have made a first version reading the first then the second and it
>>> takes 2.8s (native).
>>>
>>> I decided to make a threaded version and before any use of thread I
>>> realized that just linking no even using it to the threads library makes my
>>> first version of the program to run in 12s !
>>
>>
>> Do you have a short benchmark you can post? The idea that the
>> thread-overhead would make a difference like that, particularly for IO-bound
>> code (which I'm guessing this is) is pretty surprising.
>>
>> y
>>
>>
>>>
>>> I use pcre, extlib, csv libraries as well.
>>>
>>> I guess it might come from GC slowing down thinks here, doesn't it ?
>>> Where can it come from otherwise ? Is there a workaround or something I
>>> should know ?
>>>
>>> Can ocaml use multiple cores ?
>>>
>>> Do you have few pointers on libraries to make parallel I/Os ?
>>>
>>> Thanks,
>>> Rémi
>>>
>>> _______________________________________________
>>> Caml-list mailing list. Subscription management:
>>> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
>>> Archives: http://caml.inria.fr
>>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>>> Bug reports: http://caml.inria.fr/bin/caml-bugs
>>>
>>>
>>
>
[-- Attachment #1.2: Type: text/html, Size: 11086 bytes --]
[-- Attachment #2: transf.ml --]
[-- Type: application/octet-stream, Size: 3457 bytes --]
(* open ExtLib *)
(** Slithly modified copy from module CSV *)
exception Bad_CSV_file of string
type state_t = StartField
| InUnquotedField
| InQuotedField
| InQuotedFieldAfterQuote
let load_rows ?(separator = ',') ?(nread = -1) f chan =
let entry = (StartField,[],[],0) in
(* let st = ref entry in *)
let string_of_field field =
let field_list = List.rev field in
let field_len = List.length field_list in
let field_str = String.create field_len in
let rec loop i = function
[] -> ()
| x :: xs ->
field_str.[i] <- x;
loop (i+1) xs
in
loop 0 field_list;
field_str
in
let end_of_field field row nr =
let sf = (string_of_field field) in
(StartField,[],(Some sf :: row),nr)
in
let empty_field row nr =
(StartField,[],None :: row,nr)
in
let end_of_row row nr =
f (List.rev row);
(StartField,[],[],nr + 1)
in
let empty_field_and_end_row row nr =
end_of_row (None :: row) nr
in
let end_of_field_and_row field row nr =
let sf = (string_of_field field) in
end_of_row (Some sf :: row) nr
in
let rec read_char chan = try
let c = input_char chan in (if c != '\r' then Some c else read_char chan)
with End_of_file -> None
in
let rec loop st =
let (state,field,row,nr) = st in
if(nr = nread) then
()
else (
let co = read_char chan in
match co with
| None ->
(match state with
| StartField -> if (row <> []) then (let _ = empty_field_and_end_row row nr in ()) else ()
| InUnquotedField | InQuotedFieldAfterQuote ->
let _ = empty_field_and_end_row row nr in ()
| InQuotedField ->
raise (Bad_CSV_file "Missing end quote after quoted field.")
)
| Some c ->
(let stn = (match state with
StartField -> (* Expecting quote or other char. *)
if c = '"' then (
(InQuotedField,[],row,nr)
) else if c = separator then (* Empty field. *)
empty_field row nr
else if c = '\n' then ( (* Empty field, end of row. *)
empty_field_and_end_row row nr
) else (
(InUnquotedField,[c],row,nr)
)
| InUnquotedField -> (* Reading chars to end of field. *)
if c = separator then (* End of field. *)
end_of_field field row nr
else if c = '\n' then ( (* End of field and end of row. *)
end_of_field_and_row field row nr
) else
(state,c :: field,row,nr)
| InQuotedField -> (* Reading chars to end of field. *)
if c = '"' then
(InQuotedFieldAfterQuote,field,row,nr)
else
(state,c::field,row,nr)
| InQuotedFieldAfterQuote ->
if c = '"' then ( (* Doubled quote. *)
(InQuotedField,c::field,row,nr)
) else if c = '0' then ( (* Quote-0 is ASCII NUL. *)
(InQuotedField,'\000' :: field,row,nr)
) else if c = separator then (* End of field. *)
end_of_field field row nr
else if c = '\n' then ( (* End of field and end of row. *)
end_of_field_and_row field row nr
) else ( (* Bad single quote in field. *)
(InQuotedField,c :: '"' :: field,row,nr)
)
) in loop stn )
)
in
loop entry
(* let run_threaded f = Thread.create (fun () -> f (); Thread.exit ()) () *)
let () = let i = open_in "test.csv" in load_rows (fun _ -> ()) i; close_in i
let () = let i = open_in "test.csv" in load_rows (fun _ -> ()) i; close_in i
let () = let i = open_in "test.csv" in load_rows (fun _ -> ()) i; close_in i
[-- Attachment #3: transi.ml --]
[-- Type: application/octet-stream, Size: 3311 bytes --]
(* open ExtLib *)
(** Slithly modified copy from module CSV *)
exception Bad_CSV_file of string
type state_t = StartField
| InUnquotedField
| InQuotedField
| InQuotedFieldAfterQuote
let load_rows ?(separator = ',') ?(nread = -1) f chan =
let nr = ref 0 in
let row = ref [] in (* Current row. *)
let field = ref [] in (* Current field. *)
let state = ref StartField in (* Current state. *)
let end_of_field () =
let field_list = List.rev !field in
let field_len = List.length field_list in
let field_str = String.create field_len in
let rec loop i = function
[] -> ()
| x :: xs ->
field_str.[i] <- x;
loop (i+1) xs
in
loop 0 field_list;
row := (Some field_str) :: !row;
field := [];
state := StartField
in
let empty_field () =
row := None :: !row;
field := [];
state := StartField
in
let end_of_row () =
let row_list = List.rev !row in
f row_list;
row := [];
state := StartField;
nr := !nr + 1;
in
let rec loop () =
let c = input_char chan in
if c != '\r' then ( (* Always ignore \r characters. *)
match !state with
StartField -> (* Expecting quote or other char. *)
if c = '"' then (
state := InQuotedField;
field := []
) else if c = separator then (* Empty field. *)
empty_field ()
else if c = '\n' then ( (* Empty field, end of row. *)
empty_field ();
end_of_row ()
) else (
state := InUnquotedField;
field := [c]
)
| InUnquotedField -> (* Reading chars to end of field. *)
if c = separator then (* End of field. *)
end_of_field ()
else if c = '\n' then ( (* End of field and end of row. *)
end_of_field ();
end_of_row ()
) else
field := c :: !field
| InQuotedField -> (* Reading chars to end of field. *)
if c = '"' then
state := InQuotedFieldAfterQuote
else
field := c :: !field
| InQuotedFieldAfterQuote ->
if c = '"' then ( (* Doubled quote. *)
field := c :: !field;
state := InQuotedField
) else if c = '0' then ( (* Quote-0 is ASCII NUL. *)
field := '\000' :: !field;
state := InQuotedField
) else if c = separator then (* End of field. *)
end_of_field ()
else if c = '\n' then ( (* End of field and end of row. *)
end_of_field ();
end_of_row ()
) else ( (* Bad single quote in field. *)
field := c :: '"' :: !field;
state := InQuotedField
)
); (* end of match *)
if( nread < 0 or !nr < nread) then loop () else ()
in
try
loop ()
with
End_of_file ->
(* Any part left to write out? *)
(match !state with
StartField ->
if !row <> [] then
( empty_field (); end_of_row () )
| InUnquotedField | InQuotedFieldAfterQuote ->
end_of_field (); end_of_row ()
| InQuotedField ->
raise (Bad_CSV_file "Missing end quote after quoted field.")
)
(* let run_threaded f = Thread.create (fun () -> f (); Thread.exit ()) () *)
let () = let i = open_in "test.csv" in load_rows (fun _ -> ()) i; close_in i
let () = let i = open_in "test.csv" in load_rows (fun _ -> ()) i; close_in i
let () = let i = open_in "test.csv" in load_rows (fun _ -> ()) i; close_in i
next prev parent reply other threads:[~2009-02-17 7:40 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-02-16 15:15 Rémi Dewitte
2009-02-16 15:28 ` [Caml-list] " Michał Maciejewski
2009-02-16 15:32 ` Rémi Dewitte
2009-02-16 15:42 ` David Allsopp
2009-02-16 16:07 ` Rémi Dewitte
2009-02-16 16:32 ` Sylvain Le Gall
2009-02-17 13:52 ` [Caml-list] " Frédéric Gava
2009-02-16 16:47 ` [Caml-list] " Yaron Minsky
2009-02-16 17:37 ` Rémi Dewitte
2009-02-17 7:40 ` Rémi Dewitte [this message]
2009-02-17 8:59 ` Mark Shinwell
2009-02-17 9:09 ` Rémi Dewitte
2009-02-17 9:53 ` Jon Harrop
2009-02-17 10:07 ` Sylvain Le Gall
2009-02-17 10:26 ` [Caml-list] " Mark Shinwell
2009-02-17 10:50 ` Rémi Dewitte
2009-02-17 10:56 ` Mark Shinwell
2009-02-17 11:33 ` Jon Harrop
2009-02-17 12:20 ` Yaron Minsky
2009-02-17 12:26 ` Rémi Dewitte
2009-02-17 17:14 ` Sylvain Le Gall
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2184b2340902162340s540c5ac7g9f42b59d03f643cb@mail.gmail.com \
--to=remi@gide.net \
--cc=caml-list@yquem.inria.fr \
--cc=yminsky@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox