Mailing list for all users of the OCaml language and system.
From: Dario Teixeira <dario.teixeira@nleyten.com>
To: caml-list@inria.fr
Subject: [Caml-list] Menhir grammar with sequences delimited by same token
Date: Sun, 08 May 2016 10:33:48 +0100
Message-ID: <0a49598f1e0c8838fa69cd4d803af83d@nleyten.com>

Hi,

(Sending this to Caml-list because Menhir-list is currently down.)

I've come across an interesting parsing problem, and I wonder whether
Menhir offers a succinct solution to it.  Suppose I want to parse a
markup language that uses the same token to delimit *both* the
beginning and the end of a bold sequence (and likewise for an emph
sequence).  Basically this:

   inline:
     | TEXT               {Ast.Text $1}
     | BOLD inline* BOLD  {Ast.Bold $2}
     | EMPH inline* EMPH  {Ast.Emph $2}


Which of course has a shift/reduce conflict: if the token stream is
[BOLD; TEXT; BOLD; ...], what should the parser do upon encountering
the second BOLD -- start a new nesting level, or close the current
one?  I can force the latter behaviour by rearranging the grammar
so that an inline sequence within BOLDs cannot contain BOLD itself,
and likewise for EMPH:

   inline:
     | TEXT                        {Ast.Text $1}
     | BOLD inline_sans_bold* BOLD {Ast.Bold $2}
     | EMPH inline_sans_emph* EMPH {Ast.Emph $2}

   inline_sans_bold:
     | TEXT                        {Ast.Text $1}
     | EMPH inline_sans_emph* EMPH {Ast.Emph $2}

   inline_sans_emph:
     | TEXT                        {Ast.Text $1}
     | BOLD inline_sans_bold* BOLD {Ast.Bold $2}


For this simple example this approach is feasible, but it blows up
into silliness in a real-world case where, besides BOLD and EMPH, I
have many other similar tokens.  Does Menhir offer a more succinct
solution to this problem?  (I reckon the priority mechanism could be
used somehow, but exactly how eludes me.)
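
To make the intended behaviour concrete, here is a minimal sketch of the
same "close the current one" policy written by hand in plain OCaml rather
than in Menhir.  It is not a Menhir solution, only an illustration of what
the rearranged grammar accepts.  The token and AST types, the Parse_error
exception, and the functions parse_seq and parse are all assumptions made
for this example; the AST constructors mirror Ast.Text, Ast.Bold and
Ast.Emph from the grammar.

```ocaml
(* Hypothetical token and AST types, mirroring the grammar in this message. *)
type token = TEXT of string | BOLD | EMPH

type inline =
  | Text of string
  | Bold of inline list
  | Emph of inline list

exception Parse_error of string

(* [parse_seq closer tokens] reads inlines until the end of input or until
   it meets the delimiter that opened the innermost enclosing sequence
   ([closer]).  It returns the parsed inlines and the unconsumed tokens.
   Only the innermost delimiter is tracked, as in the inline_sans_bold /
   inline_sans_emph grammar. *)
let rec parse_seq closer tokens =
  match tokens with
  | [] -> ([], [])
  | TEXT s :: rest ->
      let items, rest = parse_seq closer rest in
      (Text s :: items, rest)
  | d :: _ when closer = Some d ->
      (* Same delimiter as the innermost open one: it closes the sequence.
         Leave it in the input so that the caller can consume it. *)
      ([], tokens)
  | d :: rest ->
      (* Any other delimiter opens a new nested sequence. *)
      let body, rest = parse_seq (Some d) rest in
      (match rest with
       | d' :: rest when d' = d ->
           let node = if d = BOLD then Bold body else Emph body in
           let items, rest = parse_seq closer rest in
           (node :: items, rest)
       | _ -> raise (Parse_error "unterminated delimiter"))

(* Parse a whole token list at top level, where no delimiter is open. *)
let parse tokens =
  match parse_seq None tokens with
  | items, [] -> items
  | _, _ :: _ -> raise (Parse_error "unexpected trailing tokens")
```

Adding another delimiter in this sketch only needs a new token and a new
AST constructor, because the "currently open delimiter" is passed around as
a value instead of being encoded in one nonterminal per delimiter.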

Thanks in advance for your time!
Best regards,
Dario Teixeira


Thread overview: 8+ messages
2016-05-08  9:33 Dario Teixeira [this message]
2016-05-08 10:27 ` Jacques-Henri Jourdan
2016-05-08 11:57   ` Sébastien Hinderer
2016-05-08 14:16     ` Dario Teixeira
2016-05-08 13:43   ` Dario Teixeira
2016-05-08 21:29     ` Jacques-Henri Jourdan
2016-05-08 13:35 ` Allan Wegan
2016-05-08 14:19   ` Dario Teixeira
