From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id DFD6A7EF1E for ; Fri, 22 Jul 2016 23:46:54 +0200 (CEST) IronPort-PHdr: 9a23:njiGWBdnYilN7TkybSq14qDZlGMj4u6mDksu8pMizoh2WeGdxc68ZR7h7PlgxGXEQZ/co6odzbGH6+a9BidZvM3JmUtBWaQEbwUCh8QSkl5oK+++Imq/EsTXaTcnFt9JTl5v8iLzG0FUHMHjew+a+SXqvnYsExnyfTB4Ov7yUtaLyZ/mj6bvpNaKPl4ArQH+SIs6FA+xowTVu5teqqpZAYF19CH0pGBVcf9d32JiKAHbtR/94sCt4MwrqHwI6Lpyv/JHBInzeaU1SYtymDI0N2F9sMHisxjOSU2F+3YaQGEXuhdSGQHZ7QjnU9H6sn2pmPB63XyhJczsSqt8dDCj6L9sVRvvk29TLDM98WbPjdFYg6dSoRbnrBt6ld2HKLqJPeZzK/uONegRQnBMC55c Authentication-Results: mail2-smtp-roc.national.inria.fr; spf=None smtp.pra=seliopou@gmail.com; spf=Pass smtp.mailfrom=seliopou@gmail.com; spf=None smtp.helo=postmaster@mail-lf0-f50.google.com Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of seliopou@gmail.com) identity=pra; client-ip=209.85.215.50; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="seliopou@gmail.com"; x-sender="seliopou@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail2-smtp-roc.national.inria.fr: domain of seliopou@gmail.com designates 209.85.215.50 as permitted sender) identity=mailfrom; client-ip=209.85.215.50; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="seliopou@gmail.com"; x-sender="seliopou@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-lf0-f50.google.com) identity=helo; client-ip=209.85.215.50; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="seliopou@gmail.com"; x-sender="postmaster@mail-lf0-f50.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A0APAwC0k5JXhjLXVdFehBUtTwatHYY6hQWBeyOFeQKBKwc4FAEBAQEBAQEBEQEBAQgLCwkZL4IyFYIWAQQBEhEEGQEbHQEDAQsGBQs3AgIiAREBBQEcBhMih3MBAw8IDp8jgTI+MYs7gWqCWgWEIAoZJw1UgyMBAQEBAQEEAQEBAQEBGQIBBRCGGoRNhG2CVIJaBZkmhhaIWI86jmISHoEPHoJSgXMgMohzAQEB X-IPAS-Result: A0APAwC0k5JXhjLXVdFehBUtTwatHYY6hQWBeyOFeQKBKwc4FAEBAQEBAQEBEQEBAQgLCwkZL4IyFYIWAQQBEhEEGQEbHQEDAQsGBQs3AgIiAREBBQEcBhMih3MBAw8IDp8jgTI+MYs7gWqCWgWEIAoZJw1UgyMBAQEBAQEEAQEBAQEBGQIBBRCGGoRNhG2CVIJaBZkmhhaIWI86jmISHoEPHoJSgXMgMohzAQEB X-IronPort-AV: E=Sophos;i="5.28,405,1464645600"; d="scan'208,217";a="227650568" Received: from mail-lf0-f50.google.com ([209.85.215.50]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/AES128-GCM-SHA256; 22 Jul 2016 23:46:54 +0200 Received: by mail-lf0-f50.google.com with SMTP id l69so93901670lfg.1 for ; Fri, 22 Jul 2016 14:46:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=2TQOKWC5GzvbSYbixQ4Wy+ofyLH42kjaC2KebgW3YYo=; b=D6xGL9q2o9KwUXsZqI9Mx59ciY8px2R5XjQYaQ7Ilna6dmP31ZkPkBggnWBcNu+3Qr dtux4uFsEXKA2LFYO7UeE8r9OIy0wMke49tx9bV4rZStjv+QYkSdjCjtspumqrYgnuST YzbJVGOTwepgUMoG5PUa0PY5U83k/M2b8/dTaRC+QsS1JIQd8J3tSV8S59UBh1+Kld1E p3mgbRgfavhivw321lZ7P+0ExNwJ0ZEQHQvjt5L1DulWf2xjS4PWWJO+5vZ4xqcDRHsD rHpcYbqtY3mP5ktbtWHw3kSHZpepU43Nt25xC4Hrvu5fcO0QITyICGjSEQlegs7nYhOJ SiVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=2TQOKWC5GzvbSYbixQ4Wy+ofyLH42kjaC2KebgW3YYo=; b=CB+f2ectIkhtACD8we5x7PP87RZTQ4Ez1jW8KGvZrMVhiXbGAThbDCs9ffg2ja0CTw bIg6CE+m0D3tWqK2BZ4v9P/H6kYhJkuFd+2anMl7o/VikZFy7FwdwJ0XR4RXNbEnFdEI FMeloe2V4DLYHVGzaC0VjxQCigd1LSJPV6mPDVA5DIgSxA8seLNNyLw0E62Wk13MLmyV igIj2x5Rd9IIqrWcxrJrasG6slOkdEijkrbdUlfmMkC/pnhN+VbVfF5QikwwcWK9ewy6 i+NuRkartD2hFTMDv1cO2rhgPr0eqt+/tB2iIud7Fl/fWqyNGL/NGJchzO/cHAtsqz9S OGeQ== X-Gm-Message-State: AEkoouu53VuNKwdTo2fhCHlblbLYiCUXsF+CC3m/bERIXqZuV7I8lYz3YxPpjHPq2rPwspN7eaPw1RPC2UpIvA== X-Received: by 10.46.32.29 with SMTP id g29mr3367916ljg.32.1469224013077; Fri, 22 Jul 2016 14:46:53 -0700 (PDT) MIME-Version: 1.0 Received: by 10.25.87.136 with HTTP; Fri, 22 Jul 2016 14:46:33 -0700 (PDT) In-Reply-To: References: From: Spiros Eliopoulos Date: Fri, 22 Jul 2016 17:46:33 -0400 Message-ID: To: =?UTF-8?Q?Daniel_B=C3=BCnzli?= Cc: OCaml Content-Type: multipart/alternative; boundary=001a1142b0186534f40538405fd2 Subject: Re: [Caml-list] ANN: angstrom --001a1142b0186534f40538405fd2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On Fri, Jul 22, 2016 at 5:38 PM, Daniel B=C3=BCnzli wrote: > > For a high-level comparison of Angstrom's features to other > parser-combinator libraries, see the table included in the README: > > > > https://github.com/inhabitedtype/angstrom#comparison-to-other-libraries > Do you have a story for precise line-column and byte count tracking ? It's > quite important in practice to be able to give good error reports. > Angstrom's targeting the use case of network protocols and serialization formats, a use case where line/column numbers are of dubious value, and doesn't really make sense when dealing with binary formats. So it will likely never support line/column tracking. It will however at some point report character position on failure. I have a local branch that implements it, though it's untested. > Also does the API easily allow to do best-effort decoding (after reporting > an error allow to resync input by discarding and restart the parsing > process) ? =46rom what I understand, this would require users to modify input rather than putting any correction logic into the parser itself. Angstrom does not support this functionality, and likely won't. In principle the only change necessary would be to simply surface the success continuation on failure. Everything else is accessible to the user (except for failure position, see above). Why it's valuable to do this outside of the parser is unclear to me though. > Yojson wins hands down (it benefits greatly from not having to support > non-blocking incremental input), > > I guess it also benefits of not implementing the standard at all, e.g. it > won't check the validity of the underlying character stream. > > Also regarding benchmarks it would be more interesting to have benchmarks > on real world examples where you convert json input to concrete ocaml data > types. E.g. it would be cool if you could provide a jsonm-like streaming > interface with angstrom and then use angstrom's combinators to parse the > stream of json lexeme into OCaml data structures. Doing that would seem to muddle application and library performance measurements within the benchmark. Arguably, constructing a generic in-memory representation is doing the same in essence. At least this way it's an "application" that's easy for benchmarks to standardize on (more or less) and implement, so that one can use the benchmark results to compare different libraries. But anyways, there's nothing in principle preventing it from happening. There parser would look something like this: skip_many (token >>| (handler : json_token -> unit)) -Spiros E. --001a1142b0186534f40538405fd2 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

= On Fri, Jul 22, 2016 at 5:38 PM, Daniel B=C3=BCnzli <<= a href=3D"mailto:daniel.buenzli@erratique.ch" target=3D"_blank">daniel.buen= zli@erratique.ch> wrote:
> For a high-level comparison of Angstrom's features to other pa= rser-combinator libraries, see the table included in the README:
>
> https://github.com/inhab= itedtype/angstrom#comparison-to-other-libraries
Do you have a story for precise line-column and byte count tracking = ? It's quite important in practice to be able to give good error report= s.

Angstrom's targeting the use cas= e of network protocols and serialization formats, a use case where line/col= umn numbers are of dubious value, and doesn't really make sense when de= aling with binary formats. So it will likely never support line/column trac= king. It will however at some point report character position on failure. I= have a local branch that implements it, though it's untested.
=C2=A0
Also does the API easily allow to do best-effort decoding (after reporting = an error allow to resync input by discarding and restart the parsing proces= s) ?

From what I understand, this would req= uire users to modify input rather than putting any correction logic into th= e parser itself. Angstrom does not support this functionality, and likely w= on't. In principle the only change necessary would be to simply surface= the success continuation on failure. Everything else is accessible to the = user (except for failure position, see above). Why it's valuable to do = this outside of the parser is unclear to me though.

> Yojson wins hands down (it benefits greatly from not having to support= non-blocking incremental input),

I guess it also benefits of not implementing the standard at all, e.= g. it won't check the validity of the underlying character stream.

Also regarding benchmarks it would be more interesting to have benchmarks o= n real world examples where you convert json input to concrete ocaml data t= ypes. E.g. it would be cool if you could provide a jsonm-like streaming int= erface with angstrom and then use angstrom's combinators to parse the s= tream of json lexeme into OCaml data structures.

Doing that would seem to muddle application and library performance m= easurements within the benchmark. Arguably, constructing a generic in-memor= y representation is doing the same in essence. At least this way it's a= n "application" that's easy for benchmarks to standardize on = (more or less) and implement, so that one can use the benchmark results to = compare different libraries.

But anyways, there= 9;s nothing in principle preventing it from happening. There parser would l= ook something like this:

=C2=A0 skip_many (token &= gt;>| (handler : json_token -> unit))

-Spiro= s E.
--001a1142b0186534f40538405fd2--