From: "François Pottier" <francois.pottier@inria.fr>
To: OCaML Mailing List <caml-list@inria.fr>, menhir-list <menhir@inria.fr>
Subject: [Caml-list] [ANN] New release of Menhir (20211230)
Date: Fri, 31 Dec 2021 09:22:49 +0100
Message-ID: <82582cbe-09da-1410-0cce-f1ebf4f71294@inria.fr>

Dear OCaml & Menhir users,

I am pleased to announce a new release of Menhir, with a major improvement.

The code back-end has been rewritten from the ground up by Émile Trotignon
and by myself, and now produces efficient and well-typed OCaml code. The
infamous Obj.magic is not used any more.

Furthermore, the new code back-end produces code that is more aggressively
optimized, leading to a significant reduction in memory allocation and a
typical performance improvement of up to 20% compared to the previous code
back-end.

To install the new version:

    opam update
    opam install menhir.20211230

Happy well-typed parsing in 2022!
--
François Pottier
francois.pottier@inria.fr
http://cambium.inria.fr/~fpottier/

## 2021/12/30

* The code back-end has been rewritten from the ground up by Émile Trotignon
  and François Pottier, and now produces efficient and **well-typed** OCaml
  code. The infamous `Obj.magic` is not used any more.

  The table back-end and the Coq back-end are unaffected by this change.

  The main side effects of this change are as follows:

  - The code back-end now needs type information. This means that
    *either* Menhir's type inference mechanism must be enabled
    (the easiest way of enabling it is to use Menhir via `dune`
    and to check that the `dune-project` file says
    `(using menhir 2.0)` or later)
    *or* the type of every nonterminal symbol must be
    explicitly given via a `%type` declaration.

  - The code back-end no longer allows the type of any symbol to be an
    open polymorphic variant type, such as ```[> `A ]```. As a workaround,
    we suggest using a closed polymorphic variant instead.

  - The code back-end now adheres to the *simplified* error-handling
    strategy, as opposed to the *legacy* strategy.

    For grammars that do *not* use the `error` token, this makes no
    difference.

    For grammars that use the `error` token in the limited way permitted by
    the simplified strategy, this makes no difference either. The simplified
    strategy makes the following requirement: the `error` token should always
    appear at the end of a production, whose semantic action should abort the
    parser by raising an exception.

    Grammars that make more complex use of the `error` token, and therefore
    need the `legacy` strategy, cannot be compiled by the new code back-end.
    As a workaround, it is possible to switch to the table back-end (using
    `--table --strategy legacy`) or to the ancient code back-end (using
    `--code-ancient`). **In the long run, we recommend abandoning the use of
    the `error` token**. Support for the `error` token may be removed
    entirely at some point in the future.

  The original code back-end, which has been around since the early days of
  Menhir (2005), temporarily remains available (using `--code-ancient`). It
  will be removed at some point in the future.

  The new code back-end offers several levels of optimization, which remain
  undocumented and are subject to change in the future. At present, the main
  levels are roughly as follows:

  - `-O 0 --represent-everything` uses a uniform representation of the stack
    and produces straightforward code.

  - `-O 0` uses a non-uniform representation of the stack; some stack cells
    have fewer fields; some stack cells disappear altogether.

  - `-O 1` reduces memory traffic by moving `PUSH` operations so that they
    meet `POP` operations and cancel out.

  - `-O 2` optimizes the reduction of unit productions (that is, productions
    whose right-hand side has length 1) by performing a limited amount of
    code specialization.

  The default level of optimization is the maximum level, `-O 2`.

* The new command line switch `--exn-carries-state` causes the exception
  `Error` to carry an integer parameter: `exception Error of int`. When the
  parser detects a syntax error, the number of the current state is reported
  in this way. This allows the caller to select a suitable syntax error
  message, along the lines described in
  [Section 11](http://cambium.inria.fr/~fpottier/menhir/manual.html#sec68)
  of the manual. This command line switch is currently supported by the code
  back-end only.

* The `$syntaxerror` keyword is no longer supported.

* Document the trick of wrapping module aliases in `open struct ... end`,
  like this: `%{ open struct module M = MyLongModuleName end %}`.
  This allows you to use the short name `M` in your grammar, but forces
  OCaml to infer types that refer to the long name `MyLongModuleName`.
  (Suggested by Frédéric Bour.)
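
As an illustration of the workaround for open polymorphic variants described in the changelog above, here is a minimal OCaml sketch. It assumes a hypothetical nonterminal whose semantic value used to have an open type such as ```[> `A ]```; the type names `atom` and `term` and the helper functions are invented for this example. The value is given a closed type, and an explicit coercion `:>` widens it wherever a larger variant type is expected.

```ocaml
(* Hypothetical type for a nonterminal's semantic value: a *closed*
   polymorphic variant, as the new code back-end requires. *)
type atom = [ `A | `B of int ]

(* A larger, hypothetical type used elsewhere in client code. *)
type term = [ atom | `C of string ]

(* The semantic value is produced at the closed type... *)
let make_atom (n : int) : atom = if n = 0 then `A else `B n

(* ...and widened by an explicit coercion, since [atom] is a subtype of [term]. *)
let to_term (a : atom) : term = (a :> term)

let describe (t : term) : string =
  match t with
  | `A -> "A"
  | `B n -> "B " ^ string_of_int n
  | `C s -> "C " ^ s
```

In a grammar, the closed type would then appear in the corresponding `%type` declaration, and any widening happens in client code rather than inside the parser.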
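
The changelog describes `-O 1` as moving `PUSH` operations so that they meet `POP` operations and cancel out. Because these optimization levels are undocumented, the following OCaml fragment is only a toy model of the cancellation step, written over an invented instruction type `instr`; it is not Menhir's internal representation, and it does not perform the code motion that brings a `PUSH` next to its `POP`.

```ocaml
(* A toy instruction language, invented for this sketch. *)
type instr =
  | Push of string            (* push the value of a variable onto the stack *)
  | Pop of string             (* pop the top stack cell into a variable *)
  | Move of string * string   (* Move (dst, src): copy src into dst *)
  | Call of string            (* any other operation, left untouched *)

(* Cancel each PUSH that is immediately followed by a POP. The pair leaves
   the stack unchanged, so it is equivalent to a plain copy, or to nothing
   at all when source and destination are the same variable. *)
let rec cancel (code : instr list) : instr list =
  match code with
  | Push src :: Pop dst :: rest ->
      let rest = cancel rest in
      if src = dst then rest else Move (dst, src) :: rest
  | i :: rest -> i :: cancel rest
  | [] -> []
```

In this model, every cancelled pair is one stack cell that is never allocated, which is the kind of memory-traffic saving the `-O 1` entry refers to.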
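
The following OCaml sketch shows how a caller might use the state number carried by `Error` when the parser is generated with `--exn-carries-state`. The module `Parser` is a hand-written stand-in for a generated parser; only the shape of its exception, `exception Error of int`, comes from the changelog. The entry point `main`, the state numbers 0 and 17, and the `message` table are all hypothetical: real state numbers depend on the grammar, and Section 11 of the manual describes how such messages can be maintained.

```ocaml
(* Stand-in for a parser generated with --exn-carries-state. *)
module Parser = struct
  exception Error of int

  (* Hypothetical entry point: fails in a made-up state on bad input. *)
  let main (input : string) : int =
    match int_of_string_opt input with
    | Some n -> n
    | None -> raise (Error (if input = "" then 0 else 17))
end

(* Hypothetical table from state numbers to syntax error messages. *)
let message (s : int) : string =
  match s with
  | 0 -> "Unexpected end of input."
  | 17 -> "Expected an integer literal."
  | _ -> raise Not_found

(* Driver: catch the state-carrying exception and select a message,
   falling back to a generic one for states without an entry. *)
let parse (input : string) : (int, string) result =
  try Ok (Parser.main input) with
  | Parser.Error s ->
      (match message s with
       | msg -> Error msg
       | exception Not_found ->
           Error (Printf.sprintf "Syntax error (state %d)." s))
```

Falling back to a generic message keeps the driver total even when the message table does not cover every state.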
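
The `open struct ... end` trick from the last changelog entry can be tried in plain OCaml (version 4.08 or later, which introduced `open struct`). In the sketch below, `MyLongModuleName` is a stand-in defined locally and `double` plays the role of a semantic action; both are hypothetical. Because the alias `M` lives inside `open struct ... end`, it is usable in the rest of the file but is not part of the file's inferred interface, so the type of `double` is expressed in terms of `MyLongModuleName.t`.

```ocaml
(* A module with a long name, standing in for a real dependency. *)
module MyLongModuleName = struct
  type t = { value : int }
  let make (n : int) : t = { value = n }
end

(* The alias M is visible below, but since it is wrapped in
   [open struct ... end] it is not exported from this file. *)
open struct
  module M = MyLongModuleName
end

(* Stand-in for a semantic action written with the short name. Its
   inferred type is MyLongModuleName.t -> MyLongModuleName.t. *)
let double (x : M.t) : M.t = M.make (2 * x.M.value)
```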