From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82])
	by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id q2GD3vai029355
	for <caml-list@sympa-roc.inria.fr>; Fri, 16 Mar 2012 14:04:00 +0100
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AnUCAPM4Y0/RVdy2mGdsb2JhbABCti0IIgEBAQEBCAkNBxQnggAFHQIsARseAxIIAQddAREBBQEiNYUmgjASmVKCXQqMEoJxhRw/iHQBBQuQcgSVZo5IPYQJ
X-IronPort-AV: E=Sophos;i="4.73,597,1325458800"; 
   d="scan'208";a="149775395"
Received: from mail-vx0-f182.google.com ([209.85.220.182])
  by mail1-smtp-roc.national.inria.fr with ESMTP/TLS/RC4-SHA; 16 Mar 2012 14:03:59 +0100
Received: by vcmm1 with SMTP id m1so8315927vcm.27
        for <caml-list@inria.fr>; Fri, 16 Mar 2012 06:03:58 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20120113;
        h=mime-version:from:date:message-id:subject:to:content-type;
        bh=nx1+QLY9wROYi04phP/Wlw6KWWEmMAhz90A3so8iXDo=;
        b=cOyIdZZU0ayJRI0WI2gVcupl7R58uXMh6DVR7nsiLB6+XBSdZ0+Vel8A5lmeglKsi+
         v+3+J3QHORDGZn7sfcLN24B5BjWopY4jiOkP/pwxHRIWNR9EGp+jUfb78CpSot9ZZ4rW
         ifzSCVWg797w4DWmsskvwx96dOM6N1Xv6Y7mqh6DLaP/niZxh0ukz2ntMVMW5Y5/xJeK
         A4drbT8xmijEtywOCyE5QxZ70uzybUAH+jSZpT3M9JNPobA5qmaTD6NJsEm4u0BW1/I+
         AGJo/a1f11RukANnFkxjEULa4O0Di9adn0icPH2rlOrHv7/x/IMyGZZovUiOSnhNcfM+
         EMLg==
Received: by 10.52.34.212 with SMTP id b20mr1521071vdj.3.1331903038787; Fri,
 16 Mar 2012 06:03:58 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.52.31.136 with HTTP; Fri, 16 Mar 2012 06:03:38 -0700 (PDT)
From: Philippe Veber <philippe.veber@gmail.com>
Date: Fri, 16 Mar 2012 14:03:38 +0100
Message-ID: <CAOOOohQV+nYWyTXFU-sKdLkJSNAAkZH+=UCkwKEpnxjAiE=ORg@mail.gmail.com>
To: caml users <caml-list@inria.fr>
Content-Type: multipart/alternative; boundary=20cf307812fe80907804bb5bd701
X-Validation-by: philippe.veber@gmail.com
Subject: [Caml-list] Efficient scanning of large strings from files


--20cf307812fe80907804bb5bd701
Content-Type: text/plain; charset=ISO-8859-1

Dear camlers,

Say that you'd like to search a regexp on a file with lines so long that
you'd rather not load them entirely at once. If you can bound the size of a
match by k << length of a line, then you know that you can only keep a
small portion of the line in memory to search the regexp. Typically you'd
like to access substrings of size k from left to right. I guess such a
thing should involve buffered inputs and avoid copying strings as much as
possible. My question is as follows: has anybody written a library to
access these substrings gracefully and with decent performance?
Cheers,
  Philippe.

--20cf307812fe80907804bb5bd701
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Dear camlers,<br><br>Say that you&#39;d like to search a regexp on a file w=
ith lines so long that you&#39;d rather not load them entirely at once. If =
you can bound the size of a match by k &lt;&lt; length of a line, then you =
know that you can only keep a small portion of the line in memory to search=
 the regexp. Typically you&#39;d like to access substrings of size k from l=
eft to right. I guess such a thing should involve buffered inputs and avoid=
 copying strings as much as possible. My question is as follows: has anybod=
y written a library to access these substrings gracefully and with decent p=
erformance?<br>

Cheers,<br>=A0 Philippe.<br><br>

--20cf307812fe80907804bb5bd701--