From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.1 required=5.0 tests=AWL autolearn=disabled version=3.1.3 Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id 119F8BC69 for ; Tue, 10 Apr 2007 15:03:47 +0200 (CEST) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.187]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l3AD3koY005791 for ; Tue, 10 Apr 2007 15:03:46 +0200 Received: from [84.59.96.61] (helo=gate.lan.gerd-stolpmann.de) by mrelayeu.kundenserver.de (node=mrelayeu7) with ESMTP (Nemesis), id 0ML2xA-1HbG0g0m5j-0007V3; Tue, 10 Apr 2007 15:03:46 +0200 Received: from flakew.lan.gerd-stolpmann.de (fw.lan.gerd-stolpmann.de [192.168.1.1]) by gate.lan.gerd-stolpmann.de (Postfix) with ESMTP id E5862C094; Tue, 10 Apr 2007 15:03:45 +0200 (CEST) Subject: Re: [Caml-list] Variable encoding with netulex From: Gerd Stolpmann To: Till Varoquaux Cc: ocaml ml In-Reply-To: <9d3ec8300704100516x4ea11e69tf871bb5aefd491c@mail.gmail.com> References: <9d3ec8300704100516x4ea11e69tf871bb5aefd491c@mail.gmail.com> Content-Type: text/plain Date: Tue, 10 Apr 2007 15:03:43 +0200 Message-Id: <1176210224.8988.30.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.6.1 Content-Transfer-Encoding: 7bit X-Provags-ID: V01U2FsdGVkX1+xwQAaFkL71JfyMeboNCEfPcNQpv9+LqLucmm nxEm5UmX/n7mfguMKTbZTdArcMmCIyocRkBrp1QUgvshYxcWGp D7qx2HS5uYcCHARG6w34NFAPFQBgFMt X-Miltered: at discorde with ID 461B8B32.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; gerd:01 stolpmann:01 buffer:01 gerd:01 stolpmann:01 viktoriastr:01 64293:01 darmstadt:01 6151:01 6151:01 dienstag:98 behaviour:01 caml-list:01 emulate:01 motivation:02 Am Dienstag, den 10.04.2007, 14:16 +0200 schrieb Till Varoquaux: > I would like to parse a file where the encoding could vary on the fly. > Using Ulex that was quite easy (using form var_enc_channel). What > would be the cannonical way to emulate such a behaviour with netulex? I would say this is not possible. Netulex always reads characters in advance (to improve performance). There is a set_encoding function, but it only affects the next refill (after all read characters are scanned), but it is almost impossible to control when this happens. As the read-ahead buffer was the main motivation to write Netulex, my advice is to stick to plain Ulex. Buffers and encoding changes are hard to get right at the same time. Gerd -- ------------------------------------------------------------ Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de Phone: +49-6151-153855 Fax: +49-6151-997714 ------------------------------------------------------------