From: Siegfried Gonzi <siegfried.gonzi@stud.uni-graz.at>
To: Michal Moskal <malekith@pld-linux.org>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Reading a file
Date: Wed, 21 May 2003 08:11:40 +0200
Message-ID: <3ECB189C.5090400@stud.uni-graz.at>
In-Reply-To: <20030520132032.GA9564@roke.freak>
Michal Moskal wrote:
>
>If you expand each line of megabyte file to list of characters -- it
>cannot be fast.
>
Enclosed is the OCaml version in question:
'split' was pinched from comp.lang.functional. A year ago I had
a conversation there and someone posted this split function tailored to
my request: split "nil,2.23,3.34,nil" (-1.0) = [-1.0,2.23,3.34,-1.0]
'extractFloats' reads an already opened file line by line, applies split
to every line, and stores the results in a list:
==
let split s c =
  let rec loop start acc =
    try
      let next = String.index_from s start c in
      let substring = String.sub s start (next-start) in
      loop (next+1) (substring :: acc)
    with
      Not_found ->
        let len = String.length s in
        let substring = String.sub s start (len-start) in
        List.rev (substring :: acc)
  in loop 0 []
;;
let frob userval s =
  match s with
  | "n/a" -> userval
  | "nil" -> userval
  | _ -> float_of_string s
;;
let extractFloats file del nanProxy =
  let rec readLoop i acc =
    try
      let line = input_line file in
      let floatL = List.map (frob nanProxy) (split line del) in
      readLoop (i+1) (floatL :: acc)
    with
      End_of_file ->
        List.rev acc
  in
  readLoop 0 []
;;
let f = open_in "/home/gonzi/test.txt";;
let erg2 = extractFloats f ',' (-1.0);;
let rows = List.length erg2;;
rows;;
====
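As a worked example of the per-line step, the request above can be checked on the sample line. This is only a sketch under stated assumptions: it needs OCaml 4.04 or later and uses the standard library's String.split_on_char as a stand-in for the posted 'split'; 'parse_line' is a hypothetical name that does not appear in the post.

```ocaml
(* Sketch only: parse_line is a hypothetical helper, not part of the post.
   String.split_on_char (stdlib, OCaml >= 4.04) stands in for 'split'. *)
let frob userval s =
  match s with
  | "n/a" | "nil" -> userval          (* placeholder fields map to the proxy value *)
  | _ -> float_of_string s            (* raises Failure on malformed input *)

(* Split one line at the delimiter and convert every field to a float. *)
let parse_line del nan_proxy line =
  List.map (frob nan_proxy) (String.split_on_char del line)

let () =
  (* Prints the four values parsed from the sample line. *)
  let row = parse_line ',' (-1.0) "nil,2.23,3.34,nil" in
  List.iter (fun x -> Printf.printf "%g " x) row;
  print_newline ()
```

Note that the delimiter is passed to the splitting step and the proxy value to the conversion step, which is how the posted 'split' and 'frob' divide the work.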
Enclosed also is the Clean function. This version is far more
readable than the OCaml version, but I do not know how to translate it
to OCaml. My Clean function reads line after line and passes each
line on to RealsFromString. The latter function converts the
line to a char list, [x\\x <-: string-line], and uses takeWhile,
toString, dropWhile and toReal in order to get the double numbers. As I
said, the function is incredibly fast and takes about 15 seconds for a
50MB file.
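One possible OCaml rendering of the RealsFromString logic described above is sketched here. It is an illustration under assumptions, not the author's code: the names 'reals_from_string', 'nan_tag' and 'not_del_nl' are hypothetical, and it scans the string by index instead of building a char list. Unlike Clean's toReal, float_of_string raises Failure on an empty or malformed field.

```ocaml
(* Sketch only: a hypothetical index-based translation of the Clean
   RealsFromString; names are illustrative and not from the post. *)
let reals_from_string line del nan_tag nan_proxy =
  let len = String.length line in
  (* Same role as notDelNl: stop at the delimiter or at whitespace. *)
  let not_del_nl c = not (c = del || c = ' ' || c = '\t' || c = '\n') in
  let rec search_del pos acc =
    if pos >= len then List.rev acc
    else begin
      (* takeWhile notDelNl: find the end of the current field. *)
      let stop = ref pos in
      while !stop < len && not_del_nl line.[!stop] do incr stop done;
      let field = String.sub line pos (!stop - pos) in
      (* dropWhile ((<>) del): skip ahead to the next delimiter. *)
      let next = ref !stop in
      while !next < len && line.[!next] <> del do incr next done;
      (* toRealNaN: replace the placeholder by the proxy value.
         float_of_string raises Failure on an empty or malformed field. *)
      let x = if field = nan_tag then nan_proxy else float_of_string field in
      (* drop 1: continue just past the delimiter. *)
      search_del (!next + 1) (x :: acc)
    end
  in
  search_del 0 []
```

For instance, reals_from_string "nil,2.23,3.34,nil" ',' "nil" (-1.0) evaluates to [-1.0; 2.23; 3.34; -1.0].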
OCaml takes 8 minutes. If I only read the file line by line
(without the conversion to double numbers), OCaml takes
about 1 minute. Where is the bottleneck here? List.map or what?
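One thing that may be worth trying, offered as an unmeasured suggestion and not as a diagnosis: in the posted readLoop the recursive call sits inside the try block, so it is not a tail call and every line read leaves an exception handler on the stack. The sketch below keeps only input_line inside the handler. 'read_rows' is a hypothetical name and the per-line parser is passed in as a parameter.

```ocaml
(* Sketch only: read_rows is a hypothetical name, not part of the post. *)
let read_rows ic parse_line =
  let rec loop acc =
    (* Only input_line is evaluated inside the exception handler. *)
    match (try Some (input_line ic) with End_of_file -> None) with
    | None -> List.rev acc
    | Some line ->
      (* The recursive call is outside the try block, so it is a tail call. *)
      loop (parse_line line :: acc)
  in
  loop []
```

With the posted functions this could be called as read_rows f (fun line -> List.map (frob (-1.0)) (split line ',')). Whether it changes the timing for the 50MB file would have to be measured.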
I think everybody has one specific task which he tries to implement in
every programming language he encounters. My specific task is this
floating-point extraction from string-files.
I didn't play around with different OCaml solutions, because I
had to play a bit with OCaml's Psilab implementation (if you need
something like Python+Numeric+Dislin you could give Psilab a try).
If you need the whole Clean program drop me a note. By the way: my
Scheme version is clumsy and is more or less similar to the OCaml
version. I wrote this verbose Scheme (Bigloo) version a year ago when I
was a Scheme beginner. The Scheme (Bigloo) version takes
about 30 seconds for this 50MB file and is therefore on par with the
C++-template version, which also takes about 30 seconds.
Oh yes: do not jump to conclusions when I write "clumsy", as if it
implied that OCaml is clumsy too; I strongly believe that OCaml's
exception handling mechanism is better than Clean's,
because Clean does not possess such a thing as exception handling, so to
speak.
S. Gonzi
====
////////////////////////////////////////////////
// The dead-as-Latin functional language
// with the most readable syntax out there
// and one of the /fastest functional languages/:
// Clean (In the meantime open(source?)
// for Linux/Unix). But as life goes:
// nobody jumps onto the Clean bandwagon. Is this
// a pity or a blessing? Why doesn't the "most"
// readable syntax play a role in real life?
// Do not get me wrong, but why does the
// "punctuation syntax" always win in real life?
////////////////////////////////////////////////
FExtractReals:: HeaderKeys File -> [[Real]]
FExtractReals h file
    | sfend file = []
    # (line,nextline) = sfreadline file
    = [(RealsFromString line h.del h.nan h.nanProxy) :
       (FExtractReals h nextline)]

RealsFromString:: String Char String Real -> [Real]
RealsFromString line del nan nanProxy = searchDel [x\\x<-:line]
where
    searchDel:: [Char] -> [Real]
    searchDel [] = []
    searchDel linerest
        # val = toString( takeWhile notDelNl linerest )
        # rest = dropWhile ((<>)del) linerest
        = [toRealNaN val nan : searchDel (drop 1 rest)]

    notDelNl::Char -> Bool
    notDelNl x
        | x==del = False
        | x==' ' = False
        | x=='\t' = False
        | x=='\n' = False
        = True

    toRealNaN:: String String -> Real
    toRealNaN s nan
        | s==nan = nanProxy
        = toReal(s)
====