From: oliver <oliver@first.in-berlin.de>
To: Philippe Veber <philippe.veber@gmail.com>
Cc: caml users <caml-list@inria.fr>
Subject: Re: [Caml-list] Efficient scanning of large strings from files
Date: Fri, 16 Mar 2012 21:11:23 +0100
Message-ID: <20120316201123.GC21643@siouxsie>
In-Reply-To: <CAOOOohQV+nYWyTXFU-sKdLkJSNAAkZH+=UCkwKEpnxjAiE=ORg@mail.gmail.com>
On Fri, Mar 16, 2012 at 02:03:38PM +0100, Philippe Veber wrote:
> Dear camlers,
>
> Say that you'd like to search a regexp on a file with lines so long that
> you'd rather not load them entirely at once. If you can bound the size of a
> match by k << length of a line, then you know that you can only keep a
> small portion of the line in memory to search the regexp. Typically you'd
> like to access substrings of size k from left to right. I guess such a
> thing should involve buffered inputs and avoid copying strings as much as
> possible. My question is as follows: has anybody written a library to
> access these substrings gracefully and with decent performance?
> Cheers,
> Philippe.
[...]
I think the regexp itself also has an impact on
how fast and how easily this can be achieved.
The more complex the regexp, the more resources
you will need.
If you can boil your regexp down to something
easily parseable, the length of the lines might not matter.
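To illustrate, here is a minimal sketch, assuming the regexp can be
reduced to something as simple as [0-9]+ (maximal runs of digits).
The function name count_digit_runs and the buf_size parameter are
made up for this example. It reads the channel in fixed-size chunks
and carries a one-bit automaton state across chunks, so memory use
does not depend on the line length:

```ocaml
(* Count matches of [0-9]+ (maximal digit runs) in a channel.
   Only a fixed-size buffer and one boolean of state are kept,
   so very long lines never have to be loaded in full. *)
let count_digit_runs ?(buf_size = 65536) (ic : in_channel) : int =
  let buf = Bytes.create buf_size in
  let count = ref 0 in
  (* true while the previous character belonged to a digit run;
     this state survives chunk boundaries *)
  let in_run = ref false in
  let rec loop () =
    (* input returns 0 only at end of file *)
    let n = input ic buf 0 buf_size in
    if n > 0 then begin
      for i = 0 to n - 1 do
        match Bytes.get buf i with
        | '0' .. '9' ->
          if not !in_run then begin
            incr count;
            in_run := true
          end
        | _ -> in_run := false
      done;
      loop ()
    end
  in
  loop ();
  !count
```

It would be called on a channel opened with open_in_bin. A regexp
that needs backtracking or unbounded lookahead would not reduce to
such a small automaton, and would need more state than this.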
Ciao,
Oliver