From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 19 Mar 2012 00:56:35 +0100
From: oliver
To: Philippe Veber
Cc: caml users
Message-ID: <20120318235635.GA11051@siouxsie>
Subject: Re: [Caml-list] Efficient scanning of large strings from files

On Fri, Mar 16, 2012 at 02:03:38PM +0100, Philippe Veber wrote:
> Dear camlers,
>
> Say that you'd like to search a regexp on a file with lines so long that
> you'd rather not load them entirely at once. If you can bound the size of a
> match by k << length of a line, then you know that you can only keep a
> small portion of the line in memory to search the regexp. Typically you'd
> like to access substrings of size k from left to right.
> I guess such a
> thing should involve buffered inputs and avoid copying strings as much as
> possible. My question is as follows: has anybody written a library to
> access these substrings gracefully and with decent performance?
> Cheers,
> Philippe.

To your question about such a library: I don't know of one.

I wonder whether your lines would really fill several GB of RAM?

I'm not sure this answers your question, but if there is no such library,
you may want to implement the regexp search yourself:
  ==> http://swtch.com/~rsc/regexp/regexp1.html

For fast input, the Buffer module is a real performance boost compared
to ordinary string-appending operations:
  ==> http://caml.inria.fr/pub/docs/manual-ocaml/libref/Buffer.html

Ciao,
   Oliver
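
P.S. To make the "keep only a small portion in memory" idea concrete, here is a minimal sketch using only the standard library. It is not an existing library, and it searches for a fixed string rather than a regexp, so the match bound k is simply the pattern length. The names find_in_channel and chunk are made up for this illustration. The point is the overlap: after each read, the last k-1 bytes are kept so that a match straddling two reads is not missed. A real regexp matcher could be dropped in where matches_at is called, as long as its matches are bounded by k.

```ocaml
(* Return the byte offset of the first occurrence of [pat] in the data read
   from [ic], or None if it does not occur. At most [chunk + k - 1] bytes
   are held in memory at once, where k = String.length pat.
   Hypothetical helper for illustration; assumes chunk >= 1. *)
let find_in_channel ?(chunk = 65536) ic pat =
  let k = String.length pat in
  if k = 0 then Some 0
  else begin
    let buf = Bytes.create (chunk + k - 1) in
    (* Does [pat] occur in [buf] starting at index [i]? *)
    let matches_at i =
      let rec go j =
        (j = k) || (Bytes.get buf (i + j) = pat.[j] && go (j + 1))
      in
      go 0
    in
    (* [base] is the file offset of buf.[0]; [len] is the number of valid
       bytes currently in [buf] (the overlap kept from the previous read). *)
    let rec loop base len =
      let n = input ic buf len (Bytes.length buf - len) in
      if n = 0 then None (* end of file, nothing left to test *)
      else begin
        let len = len + n in
        let rec search i =
          if i + k > len then None
          else if matches_at i then Some (base + i)
          else search (i + 1)
        in
        match search 0 with
        | Some _ as found -> found
        | None ->
          (* Keep the last k-1 bytes: those start positions are untested. *)
          let keep = min (k - 1) len in
          Bytes.blit buf (len - keep) buf 0 keep;
          loop (base + len - keep) keep
      end
    in
    loop 0 0
  end
```

Usage would be something like `find_in_channel (open_in_bin "big.txt") "needle"`. The window is reused in place, so no strings are copied apart from the k-1 overlap bytes per read.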