Mailing list for all users of the OCaml language and system.
From: oliver <oliver@first.in-berlin.de>
To: Philippe Veber <philippe.veber@gmail.com>
Cc: caml users <caml-list@inria.fr>
Subject: Re: [Caml-list] Efficient scanning of large strings from files
Date: Mon, 19 Mar 2012 00:56:35 +0100
Message-ID: <20120318235635.GA11051@siouxsie>
In-Reply-To: <CAOOOohQV+nYWyTXFU-sKdLkJSNAAkZH+=UCkwKEpnxjAiE=ORg@mail.gmail.com>

On Fri, Mar 16, 2012 at 02:03:38PM +0100, Philippe Veber wrote:
> Dear camlers,
> 
> Say that you'd like to search a regexp on a file with lines so long that
> you'd rather not load them entirely at once. If you can bound the size of a
> match by k << length of a line, then you know that you can only keep a
> small portion of the line in memory to search the regexp. Typically you'd
> like to access substrings of size k from left to right. I guess such a
> thing should involve buffered inputs and avoid copying strings as much as
> possible. My question is as follows: has anybody written a library to
> access these substrings gracefully and with decent performance?
> Cheers,
>   Philippe.

As for your question about such a library: I don't know of one.
I wonder whether your lines would really fill several GB of RAM...?!

Not sure if this answers your question, but if there is no such lib,
you may want to implement the regexp search yourself...?!

==> http://swtch.com/~rsc/regexp/regexp1.html
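
A rough sketch of what a hand-rolled search could look like when a match
is at most k bytes long, as in the question: search one window at a time
and carry the last k-1 bytes over to the next window, so a match that
straddles a chunk boundary is still seen. The names scan and find_literal
and the chunk parameter are invented for illustration. find_literal is a
naive literal search standing in for a real regexp matcher, and the code
assumes OCaml >= 4.02 (Bytes module).

```ocaml
(* Hypothetical helper: naive literal search standing in for a real
   regexp engine. Returns the offset of the first occurrence of [pat]
   in [s], if any. *)
let find_literal pat s =
  let m = String.length pat and n = String.length s in
  let rec go i =
    if i + m > n then None
    else if String.sub s i m = pat then Some i
    else go (i + 1)
  in
  go 0

(* Scan channel [ic] for the first match of length at most [k].
   [find] searches one window and returns an offset inside it.
   At most [k - 1 + chunk] bytes are held in the window at a time. *)
let scan ~k ~chunk ~find ic =
  if k < 1 || chunk < 1 then invalid_arg "scan";
  let buf = Bytes.create (k - 1 + chunk) in
  (* [base]: stream offset of the first byte of [buf];
     [carry]: number of bytes kept from the previous window. *)
  let rec loop base carry =
    let n = input ic buf carry chunk in
    if n = 0 then None (* end of file; carried bytes were already searched *)
    else begin
      let len = carry + n in
      match find (Bytes.sub_string buf 0 len) with
      | Some i -> Some (base + i)
      | None ->
        (* A match crossing the window end must start in its last k-1 bytes. *)
        let keep = min (k - 1) len in
        Bytes.blit buf (len - keep) buf 0 keep;
        loop (base + len - keep) keep
    end
  in
  loop 0 0
```

A call would look like scan ~k:4 ~chunk:65536 ~find:(find_literal "bcde") ic.
This copies each window into a fresh string for simplicity. A real
matcher would work on the buffer in place, and with greedy operators it
would also have to decide what to do when a match touches the end of a
window.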


For fast input, the Buffer module is a real performance boost
compared to normal string-appending operations.

==> http://caml.inria.fr/pub/docs/manual-ocaml/libref/Buffer.html
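
To illustrate the difference, here is a small sketch. The function names
(concat_naive, concat_buffer, read_all) and the chunk_size default are
made up for this example, and it assumes OCaml >= 4.02 for the Bytes
module. Repeated (^) copies everything accumulated so far on every step,
while a Buffer appends into storage that it enlarges only when needed.

```ocaml
(* Each (^) allocates a new string and copies the whole accumulator,
   so the total work grows quadratically with the output size. *)
let concat_naive pieces =
  List.fold_left (fun acc s -> acc ^ s) "" pieces

(* Buffer appends in place and resizes its storage only occasionally. *)
let concat_buffer pieces =
  let buf = Buffer.create 256 in
  List.iter (Buffer.add_string buf) pieces;
  Buffer.contents buf

(* Read the rest of channel [ic] in fixed-size chunks into a Buffer. *)
let read_all ?(chunk_size = 4096) ic =
  let buf = Buffer.create chunk_size in
  let chunk = Bytes.create chunk_size in
  let rec loop () =
    let n = input ic chunk 0 chunk_size in
    if n > 0 then begin
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf
```

Note that read_all still loads the whole input into memory, so it only
shows the Buffer idiom. It does not by itself solve the problem of lines
too long to load.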


Ciao,
   Oliver
