* [Caml-list] Better option to read a file
@ 2004-03-16 21:28 Agustín Valverde
2004-03-17 3:48 ` Pietro Abate
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Agustín Valverde @ 2004-03-16 21:28 UTC (permalink / raw)
To: caml-list
Hi
In a program I need to read the input data from a file, and I have
written several versions. I want to obtain the contents of a text file
as a string, and I don't know which option is better. Among others, I
have written the following:
First option:
let leer_file fl =
  let form = ref "" in
  let arch = open_in fl in
  let long = in_channel_length arch in
  form := String.create (long-1);
  really_input arch (!form) 0 (long-1);
  close_in arch;
  !form;;
Second:
let rec unir c ac = unir ac^(Char.escaped c);;
let leer2 fl =
  let form = ref "" in
  let c = ref '-' in
  let arch = open_in fl in
  (try
    (while true do
      (c := input_char arch);
      (if !c != '\n' then (form := unir !c !form) else ())
    done)
  with End_of_file -> close_in arch);
  !form;;
I also have a parser to convert the string. Could I improve these
functions by merging them with the parser in some way?
Thanks for your help
Agustín Valverde
Department of Applied Mathematics
University of Malaga, Spain
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
* Re: [Caml-list] Better option to read a file
2004-03-16 21:28 [Caml-list] Better option to read a file Agustín Valverde
@ 2004-03-17 3:48 ` Pietro Abate
2004-03-17 7:31 ` Christoph Bauer
2004-03-17 8:22 ` Jean-Christophe Filliatre
2 siblings, 0 replies; 10+ messages in thread
From: Pietro Abate @ 2004-03-17 3:48 UTC (permalink / raw)
To: caml-list
What about ...
let file_ch = (open_in input_file) in
let read_lines =
  let read_new_line n =
    try Some (input_line file_ch)
    with End_of_file -> None
  in
  Stream.from read_new_line
in
let rec get_line () =
  match Stream.next read_lines with
  | s when Str.string_match (Str.regexp "^[\n\t ]+$") s 0 -> get_line ()
  | s -> s
in
p
--
++ If you have an apple and I have an apple and we exchange apples
then you and I will still each have one apple. But if you have
an idea and I have an idea and we exchange these ideas, then each
of us will have two ideas. -- George Bernard Shaw
++ Please avoid sending me Word or PowerPoint attachments.
See http://www.fsf.org/philosophy/no-word-attachments.html
* Re: [Caml-list] Better option to read a file
2004-03-16 21:28 [Caml-list] Better option to read a file Agustín Valverde
2004-03-17 3:48 ` Pietro Abate
@ 2004-03-17 7:31 ` Christoph Bauer
2004-03-17 16:42 ` Agustin Valverde Ramos
2004-03-17 8:22 ` Jean-Christophe Filliatre
2 siblings, 1 reply; 10+ messages in thread
From: Christoph Bauer @ 2004-03-17 7:31 UTC (permalink / raw)
To: OCaml List
Hi Agustín,
I'm not quite sure what you want. `leer' reads a whole file (without
an upper limit on `long', this is a bad idea), `unir' doesn't work, and
`leer2' is `read the first line of file and don't close the file'.
(Do you mean `let leer2 = input_line'?)
Because of `leer2', I assume you have a parser operating on lines. Like
Pietro, I suggest using streams. A slightly more general approach
could improve the reuse of your code.
A file is a stream of chars, but mostly you want to access whole lines
or words. Therefore we write these three functions:
let splitted_stream ~on_words char_stream =
  let buf = Buffer.create 256 in
  let this_string () =
    let str = Buffer.contents buf in
    Buffer.reset buf; str in
  let rec accumulate =
    parser
      [< ''\n'; rest >] -> [< '(this_string ()); remove_cr rest >]
    | [< '' ' | '\t' as c; rest >] ->
        if on_words then [< '(this_string ()); accumulate rest >]
        else (
          Buffer.add_char buf c;
          accumulate rest
        )
    | [< 'c; rest >] ->
        Buffer.add_char buf c;
        accumulate rest
    | [< >] -> [< >]
  and remove_cr =
    parser
      [< ''\r'; rest >] -> accumulate rest
    | [< rest >] -> accumulate rest
  in accumulate char_stream
let line_stream = splitted_stream ~on_words:false
let word_stream = splitted_stream ~on_words:true
Note that the input stream can be obtained with Stream.of_channel,
Stream.of_string or Stream.from. The next function applies a function,
such as your parser, to every element of a stream.
let rec stream_map f =
  parser
    [< 'a; rest >] -> [< '(f a); stream_map f rest >]
  | [< >] -> [< >]
The functions so far should be put in a generic library.
Assuming your parser is a function
parse_line: string -> 'a
then you can get the desired 'a Stream with
let astream = stream_map parse_line (line_stream (Stream.of_channel (open_in fl)))
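For readers on a recent compiler: the stream parser syntax above needs the camlp4/camlp5 preprocessor, and the Stream module is no longer part of the standard library since OCaml 5.0. The same "map a parser over the lines of a file" idea can be sketched with the standard Seq module instead. This is only a sketch under assumptions: it needs OCaml 4.11 or later, `lines_of_channel` and `parse_file` are names invented here, and `parse_line` is a placeholder for the poster's real parser.

```ocaml
(* Lazily yield the lines of an input channel. The sequence is ephemeral:
   it consumes the channel, so it should be traversed only once. *)
let lines_of_channel (ic : in_channel) : string Seq.t =
  Seq.unfold
    (fun () ->
      match input_line ic with
      | line -> Some (line, ())
      | exception End_of_file -> None)
    ()

(* Hypothetical stand-in for the real parser (string -> 'a). *)
let parse_line (s : string) : int = String.length s

(* Apply parse_line to every line of the file and collect the results.
   The list is built before the channel is closed. *)
let parse_file (path : string) : int list =
  let ic = open_in path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () -> List.of_seq (Seq.map parse_line (lines_of_channel ic)))
```

Forcing the sequence with `List.of_seq` inside `Fun.protect` is a deliberate choice: a lazy sequence that escaped the function would try to read from a channel that is already closed.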
Regards,
Christoph Bauer
--
beginfig(1)u=3cm;draw fullcircle scaled 2u;x0=x1=y1=x2=y3=0;-y0=y2=x3=1u;
filldraw z0..{left}z1{left}..z2{curl 1}..z3..z0..cycle;def t(expr p)=fullcircle
scaled .25u shifted(0,p*u);enddef;unfill t(.5);fill t(-.5);endfig;bye
* Re: [Caml-list] Better option to read a file
2004-03-17 7:31 ` Christoph Bauer
@ 2004-03-17 16:42 ` Agustin Valverde Ramos
2004-03-17 17:46 ` Markus Mottl
0 siblings, 1 reply; 10+ messages in thread
From: Agustin Valverde Ramos @ 2004-03-17 16:42 UTC (permalink / raw)
To: caml-list
On 17/03/2004, at 8:31, Christoph Bauer wrote:
> I'm not quite sure, what you want. `leer' reads a whole file (this
> whitout a upper limit on `long' a bad idea),
Is the unbounded length the only problem, or are there other drawbacks?
> `unir' doesn't work and
Yes, I mistyped the definition in the email (it should be let rec unir c ac =
ac^(Char.escaped c);;), but I understand that, in any case, this is a
bad idea.
> `leer2' is `read the first line of file and don't close the file'.
> (Do you mean `let leer2 = input_line'?)
>
> Because of `leer2', I assume you have a parser operating on lines.
No, my parser works over all the file content. I want to read formulas
like the following:
((p | (r -> t)) &
(q | (t -> s)))
-> ((p & -(q -> -t)) |
(r -> ((q -> (s | r)) & s)))
For example, I have a file of 752 KB with 770028 characters, and I discard
the newline characters to reduce the size of the resulting string. So I don't
know whether I can apply Christoph's and Pietro's ideas directly.
I think Markus's suggestion is better for me, because I have
never worked with ocamllex. By the way, would I gain efficiency
by using ocamllex? If so, I will learn to use it.
Thanks for all the answers
*******************************
* Agustín Valverde Ramos
* Dept. Matemática Aplicada
* E.T.S. de Ingeniería Informática
* Universidad de Málaga
* Campus de Teatinos
* 29071 Málaga (España)
* ---------------------------------
* Tel: (+34) 952132878
* Fax: (+34) 952132746
* mailto:a_valverde@ctima.uma.es
* http://www.AgustinValverde.com
* ---------------------------------
* Soy miembro de GIMAC:
* "Grupo de Investigación
* en Matemática Aplicada para la Computación"
* http://batllo.informatica.uma.es/aciego/gimac-home.html
*
* I am member of GIMAC:
* "Research Group in Applied Mathematics for
* Computer Science"
* http://batllo.informatica.uma.es/aciego/gimac-home-eng.html
*******************************
* Re: [Caml-list] Better option to read a file
2004-03-17 16:42 ` Agustin Valverde Ramos
@ 2004-03-17 17:46 ` Markus Mottl
2004-03-17 18:20 ` Agustin Valverde Ramos
0 siblings, 1 reply; 10+ messages in thread
From: Markus Mottl @ 2004-03-17 17:46 UTC (permalink / raw)
To: Agustin Valverde Ramos; +Cc: caml-list
On Wed, 17 Mar 2004, Agustin Valverde Ramos wrote:
> No, my parser works over all the file content. I want to read formulas
> like the following:
[snip]
> I think that the Markus suggestion is better for me, because I have
> never worked with ocamllex. By the way, can I obtain benefits in
> efficiency using ocamllex? because in this case I'll learn to use it.
Please do yourself a favor, save a lot of work and use ocamllex and
ocamlyacc. These tools are the Right Thing (tm) for the job. Since I had
a very similar project handy, here are the files you need to get going:
file: ast.ml
Contains abstract syntax tree and pretty printer for logical expressions
---------------------------------------------------------------------------
open Format
type expr =
| Id of string
| Not of expr
| And of expr * expr
| Or of expr * expr
| Imp of expr * expr
let rec pp_expr ppf = function
| Id id -> pp_print_string ppf id
| Not e -> fprintf ppf "-%a" pp_expr e
| And (e1, e2) -> fprintf ppf "(@[%a &@ %a@])" pp_expr e1 pp_expr e2
| Or (e1, e2) -> fprintf ppf "(@[%a |@ %a@])" pp_expr e1 pp_expr e2
| Imp (e1, e2) -> fprintf ppf "(@[%a ->@ %a@])" pp_expr e1 pp_expr e2
---------------------------------------------------------------------------
file: parser.mly
Contains the translator: tokens -> abstract syntax tree
---------------------------------------------------------------------------
%token <string> ID
%token NOT AND OR IMP LPAREN RPAREN EOF
%start main
%type <Ast.expr> main
%%
main : expr EOF { $1 }
expr
: ID { Ast.Id $1 }
| LPAREN expr bin_op expr RPAREN { $3 $2 $4 }
| NOT expr { Ast.Not $2 }
bin_op
: AND { fun arg1 arg2 -> Ast.And (arg1, arg2) }
| OR { fun arg1 arg2 -> Ast.Or (arg1, arg2) }
| IMP { fun arg1 arg2 -> Ast.Imp (arg1, arg2) }
---------------------------------------------------------------------------
file: lexer.mll
Contains the translator: string -> tokens
---------------------------------------------------------------------------
{ open Parser }
rule token = parse
| [' ' '\t' '\n'] { token lexbuf }
| ['a' - 'z']+ as id { ID id }
| '&' { AND }
| '|' { OR }
| "->" { IMP }
| "-" { NOT }
| '(' { LPAREN }
| ')' { RPAREN }
| eof { EOF }
{
let lexbuf = Lexing.from_channel stdin in
let ast = Parser.main token lexbuf in
Format.printf "%a@." Ast.pp_expr ast
}
---------------------------------------------------------------------------
file: Makefile
Requires OCamlMakefile
---------------------------------------------------------------------------
SOURCES = ast.ml parser.mly lexer.mll
RESULT = parse
include OCamlMakefile
---------------------------------------------------------------------------
Just type "make" and the resulting "parse" program will read your
logical expressions from stdin until EOF and pretty-print them (note:
the topmost expression also needs parentheses!). E.g.:
file: test.dat
---------------------------------------------------------------------------
(
(
(p | (r -> t)) &
(q | (t -> s))
) ->
(
(p & -(q -> -t)) |
(r -> ((q -> (s | r)) & s))
)
)
---------------------------------------------------------------------------
Running "parse < test.dat" should yield:
(((p | (r -> t)) & (q | (t -> s))) ->
((p & -(q -> -t)) | (r -> ((q -> (s | r)) & s))))
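To see what the generated lexer and parser do on this grammar, the same rules can be written by hand in plain OCaml. The following is only an illustrative sketch, not a replacement for the ocamllex/ocamlyacc files above: the `token` type and the functions `tokenize`, `parse_expr` and `parse` are invented here, and error reporting is reduced to `failwith`.

```ocaml
type expr =
  | Id of string
  | Not of expr
  | And of expr * expr
  | Or of expr * expr
  | Imp of expr * expr

type token = ID of string | NOT | AND | OR | IMP | LPAREN | RPAREN | EOF

(* Turn a string into a token list; mirrors the rules of lexer.mll,
   including "->" taking priority over "-" (longest match). *)
let tokenize (s : string) : token list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev (EOF :: acc)
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | '&' -> go (i + 1) (AND :: acc)
      | '|' -> go (i + 1) (OR :: acc)
      | '(' -> go (i + 1) (LPAREN :: acc)
      | ')' -> go (i + 1) (RPAREN :: acc)
      | '-' when i + 1 < n && s.[i + 1] = '>' -> go (i + 2) (IMP :: acc)
      | '-' -> go (i + 1) (NOT :: acc)
      | 'a' .. 'z' ->
          (* Scan to the end of the identifier. *)
          let j = ref i in
          while !j < n && s.[!j] >= 'a' && s.[!j] <= 'z' do incr j done;
          go !j (ID (String.sub s i (!j - i)) :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %C" c)
  in
  go 0 []

(* expr ::= ID | '(' expr bin_op expr ')' | '-' expr
   Returns the parsed expression and the remaining tokens. *)
let rec parse_expr (toks : token list) : expr * token list =
  match toks with
  | ID id :: rest -> (Id id, rest)
  | NOT :: rest ->
      let e, rest = parse_expr rest in
      (Not e, rest)
  | LPAREN :: rest ->
      let e1, rest = parse_expr rest in
      let mk, rest =
        match rest with
        | AND :: r -> ((fun a b -> And (a, b)), r)
        | OR :: r -> ((fun a b -> Or (a, b)), r)
        | IMP :: r -> ((fun a b -> Imp (a, b)), r)
        | _ -> failwith "expected a binary operator"
      in
      let e2, rest = parse_expr rest in
      (match rest with
       | RPAREN :: rest -> (mk e1 e2, rest)
       | _ -> failwith "expected ')'")
  | _ -> failwith "expected an expression"

(* Parse a complete input: one expression followed by end of input. *)
let parse (s : string) : expr =
  match parse_expr (tokenize s) with
  | e, [ EOF ] -> e
  | _ -> failwith "trailing input"
```

For example, `parse "(p & -q)"` evaluates to `And (Id "p", Not (Id "q"))`, while `parse "p & q"` fails because the topmost expression is not parenthesized.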
Have fun!
Regards,
Markus
--
Markus Mottl http://www.oefai.at/~markus markus@oefai.at
* Re: [Caml-list] Better option to read a file
2004-03-16 21:28 [Caml-list] Better option to read a file Agustín Valverde
2004-03-17 3:48 ` Pietro Abate
2004-03-17 7:31 ` Christoph Bauer
@ 2004-03-17 8:22 ` Jean-Christophe Filliatre
2004-03-17 10:11 ` Markus Mottl
2 siblings, 1 reply; 10+ messages in thread
From: Jean-Christophe Filliatre @ 2004-03-17 8:22 UTC (permalink / raw)
To: Agustín Valverde; +Cc: caml-list
Agustín Valverde writes:
> Second:
>
> let rec unir c ac = unir ac^(Char.escaped c);;
>
> let leer2 fl =
> let form = ref "" in
> let c = ref '-' in
> let arch = open_in fl in
> (try
> (while true do (c := input_char arch); (if !c != '\n' then (form
> := unir !c !form) else ()) done)
> with End_of_file -> close_in arch);
> !form;;
Note that this function is very inefficient: each call to "unir" builds a
new intermediate string, so the total amount of memory allocated is
quadratic in the file size (even though the final string occupies linear space).
Using a buffer as suggested by Christoph is clearly better (the module
Buffer from the OCaml standard library doubles its internal string
buffer as needed, without you having to worry about it).
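As a minimal sketch of that advice, here is the character loop of `leer2` rewritten around a Buffer. The function name `read_without_newlines` and the initial buffer size are assumptions made for this example, and `Fun.protect` requires OCaml 4.08 or later.

```ocaml
(* Read a whole file into a string, dropping newline characters and
   accumulating into a Buffer instead of concatenating strings. *)
let read_without_newlines (path : string) : string =
  let ic = open_in path in
  let buf = Buffer.create 4096 in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () ->
      (try
         while true do
           let c = input_char ic in
           (* Structural comparison (<>) is the idiomatic test here. *)
           if c <> '\n' then Buffer.add_char buf c
         done
       with End_of_file -> ());
      Buffer.contents buf)
```

Each character is appended in amortized constant time, so the whole read is linear in the file size. `Buffer.add_char` also stores the character itself, whereas `Char.escaped` would turn special characters into escape sequences.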
> I also have a parser to convert the string, could I to improve these
> functions merging them with the parser in some way?
The use of ocamllex in combination with a buffer is both easy and
efficient. Here is a sketch:
======================================================================
{
  open Lexing
  let buf = Buffer.create 1024
}
rule read = parse
  | "\\n" { Buffer.add_char buf '\n'; read lexbuf }
  | "\\t" { Buffer.add_char buf '\t'; read lexbuf }
  | "\\\\" { Buffer.add_char buf '\\'; read lexbuf }
  | _ { Buffer.add_string buf (lexeme lexbuf); read lexbuf }
  | eof { Buffer.contents buf }
{
  let read_file f =
    let cin = open_in f in
    let lb = from_channel cin in
    let s = read lb in
    close_in cin;
    s
}
======================================================================
--
Jean-Christophe Filliâtre (http://www.lri.fr/~filliatr)
* Re: [Caml-list] Better option to read a file
2004-03-17 8:22 ` Jean-Christophe Filliatre
@ 2004-03-17 10:11 ` Markus Mottl
0 siblings, 0 replies; 10+ messages in thread
From: Markus Mottl @ 2004-03-17 10:11 UTC (permalink / raw)
To: OCaml
Hi,
I usually use one of the two functions below to read in whole strings.
Function "read_file" does the obvious: read a file as fast as possible
into a string.
Function "read_channel" reads a channel of unbounded size (as long as the
maximum string length is not exceeded, of course). It also takes the
optional argument "buf_size", which you can set depending on the kind
of channel you read from (the default 4096 bytes are somewhat optimal
when reading from files on Linux).
---------------------------------------------------------------------------
let rec copy_lst res ofs = function
  | [] -> res
  | (str, len) :: t ->
      let pos = ofs - len in
      String.unsafe_blit str 0 res pos len;
      copy_lst res pos t
let read_channel ?(buf_size = 4096) =
  let rec loop len lst ch =
    let buf = String.create buf_size in
    let n = input ch buf 0 buf_size in
    if n <> 0 then loop (len + n) ((buf, n) :: lst) ch
    else copy_lst (String.create len) len lst in
  loop 0 []
let read_file name =
  let file = open_in name in
  let size = in_channel_length file in
  try
    let buf = String.create size in
    really_input file buf 0 size;
    close_in file;
    buf
  with exc ->
    (try close_in file with _ -> ());
    raise exc
---------------------------------------------------------------------------
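In current OCaml releases strings are immutable and mutable byte sequences live in the Bytes module, so the functions above no longer compile as written. A rough modern equivalent might look like the sketch below. It is an adaptation, not the code from this message: it assumes OCaml 4.08 or later, opens the file in binary mode (a choice made here so that the channel length matches the number of bytes read), and replaces the chunk list with a Buffer for brevity.

```ocaml
(* Read a whole file in one go; the channel is closed even on error. *)
let read_file (path : string) : string =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () -> really_input_string ic (in_channel_length ic))

(* Read a channel of unknown size in chunks of buf_size bytes. *)
let read_channel ?(buf_size = 4096) (ic : in_channel) : string =
  let chunk = Bytes.create buf_size in
  let buf = Buffer.create buf_size in
  let rec loop () =
    (* input returns 0 only at end of file. *)
    let n = input ic chunk 0 buf_size in
    if n > 0 then begin
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf
```

`read_file` suits regular files whose size is known up front; `read_channel` also works for pipes and stdin, where `in_channel_length` is not meaningful.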
Regards,
Markus Mottl
--
Markus Mottl http://www.oefai.at/~markus markus@oefai.at
* [Caml-list] Better option to read a file
@ 2004-03-16 20:38 Agustín Valverde
0 siblings, 0 replies; 10+ messages in thread
From: Agustín Valverde @ 2004-03-16 20:38 UTC (permalink / raw)
To: caml-list
end of thread, other threads:[~2004-03-17 18:54 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-03-16 21:28 [Caml-list] Better option to read a file Agustín Valverde
2004-03-17 3:48 ` Pietro Abate
2004-03-17 7:31 ` Christoph Bauer
2004-03-17 16:42 ` Agustin Valverde Ramos
2004-03-17 17:46 ` Markus Mottl
2004-03-17 18:20 ` Agustin Valverde Ramos
2004-03-17 18:54 ` Markus Mottl
2004-03-17 8:22 ` Jean-Christophe Filliatre
2004-03-17 10:11 ` Markus Mottl
-- strict thread matches above, loose matches on Subject: below --
2004-03-16 20:38 Agustín Valverde