* [Caml-list] [ANN] New release of Menhir (20211230)
@ 2021-12-31 8:22 François Pottier
From: François Pottier @ 2021-12-31 8:22 UTC (permalink / raw)
To: OCaML Mailing List, menhir-list
Dear OCaml & Menhir users,
I am pleased to announce a new release of Menhir, with a major improvement.
The code back-end has been rewritten from the ground up by Émile Trotignon
and by myself, and now produces efficient and well-typed OCaml code. The
infamous Obj.magic is not used any more.
Furthermore, the new code back-end produces code that is more aggressively
optimized, leading to a significant reduction in memory allocation and a
typical performance improvement of up to 20% compared to the previous code
back-end.
  opam update
  opam install menhir.20211230
Happy well-typed parsing in 2022!
--
François Pottier
francois.pottier@inria.fr
http://cambium.inria.fr/~fpottier/
## 2021/12/30
* The code back-end has been rewritten from the ground up by Émile Trotignon
and François Pottier, and now produces efficient and **well-typed** OCaml
code. The infamous `Obj.magic` is not used any more.
The table back-end and the Coq back-end are unaffected by this change.
The main side effects of this change are as follows:
- The code back-end now needs type information. This means that
*either* Menhir's type inference mechanism must be enabled
(the easiest way of enabling it is to use Menhir via `dune`
and to check that the `dune-project` file says
`(using menhir 2.0)` or later)
*or* the type of every nonterminal symbol must be
explicitly given via a `%type` declaration.
- The code back-end no longer allows the type of any symbol to be an
open polymorphic variant type, such as ```[> `A ]```. As a workaround,
we suggest using a closed polymorphic variant instead.
- The code back-end now adheres to the *simplified* error-handling strategy,
as opposed to the *legacy* strategy.
For grammars that do *not* use the `error` token, this makes no difference.
For grammars that use the `error` token in the limited way permitted by
the simplified strategy, this makes no difference either. The simplified
strategy makes the following requirement: the `error` token should always
appear at the end of a production, whose semantic action should abort the
parser by raising an exception.
Grammars that make more complex use of the `error` token, and therefore
need the `legacy` strategy, cannot be compiled by the new code back-end.
As a workaround, it is possible to switch to the table back-end (using
`--table --strategy legacy`) or to the ancient code back-end (using
`--code-ancient`). **In the long run, we recommend abandoning the use of
the `error` token**. Support for the `error` token may be removed
entirely at some point in the future.
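As a rough illustration of the requirements listed above, here is a minimal grammar sketch that should be acceptable to the new code back-end. The token and nonterminal names (`INT`, `PLUS`, `EOF`, `main`, `expr`) are hypothetical and chosen only for this example. Every nonterminal symbol is given an explicit type, and the only use of `error` is at the end of a production whose semantic action raises an exception.

```ocaml
(* Hypothetical tokens for a tiny language of sums. *)
%token <int> INT
%token PLUS EOF

(* Every nonterminal symbol has an explicit type, so this grammar does
   not depend on Menhir's type inference mechanism being enabled. *)
%start <int> main
%type <int> expr

%%

main:
| e = expr EOF
    { e }
| error
    (* `error` is the last symbol of its production, and the semantic
       action aborts the parser by raising an exception. *)
    { failwith "syntax error" }

expr:
| i = INT
    { i }
| e = expr PLUS i = INT
    { e + i }
```

When Menhir is used via `dune` with `(using menhir 2.0)` or later in the `dune-project` file, the `%type` declaration for `expr` should be redundant, since type inference supplies the missing types.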
The original code back-end, which has been around since the early days of
Menhir (2005), temporarily remains available (using `--code-ancient`). It
will be removed at some point in the future.
The new code back-end offers several levels of optimization, which remain
undocumented and are subject to change in the future. At present, the main
levels are roughly as follows:
- `-O 0 --represent-everything` uses a uniform representation of the stack
and produces straightforward code.
- `-O 0` uses a non-uniform representation of the stack; some stack cells
have fewer fields; some stack cells disappear altogether.
- `-O 1` reduces memory traffic by moving `PUSH` operations so that they
meet `POP` operations and cancel out.
- `-O 2` optimizes the reduction of unit productions (that is, productions
whose right-hand side has length 1) by performing a limited amount of
code specialization.
The default level of optimization is the maximum level, `-O 2`.
* The new command line switch `--exn-carries-state` causes the exception
`Error` to carry an integer parameter: `exception Error of int`. When the
parser detects a syntax error, the number of the current state is reported
in this way. This allows the caller to select a suitable syntax error
message, along the lines described in
[Section 11](http://cambium.inria.fr/~fpottier/menhir/manual.html#sec68)
of the manual. This command line switch is currently supported by the code
back-end only.
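The following sketch shows how a caller might turn the state number into a message. It is self-contained: the `Parser` module below is a hypothetical stand-in for a parser generated with `--exn-carries-state`, and the state numbers and messages are invented for illustration. A real generated parser has a different entry point and its own state numbering.

```ocaml
(* Hypothetical stand-in for a parser generated with --exn-carries-state.
   Only the shape of the exception, [Error of int], comes from the
   release notes; the entry point [main] is an assumption. *)
module Parser = struct
  exception Error of int

  (* Pretends to parse: fails in the given state, if any. *)
  let main (fail_in_state : int option) : int =
    match fail_in_state with
    | Some state -> raise (Error state)
    | None -> 0
end

(* Hypothetical mapping from state numbers to messages. The numbers
   below are made up; real ones depend on the grammar's automaton. *)
let message_of_state (state : int) : string =
  match state with
  | 0 -> "Expected an expression at the start of the input."
  | 3 -> "Expected an operand after '+'."
  | _ -> Printf.sprintf "Syntax error (state %d)." state

(* Runs the parser and converts a syntax error into a message. *)
let parse (fail_in_state : int option) : (int, string) result =
  try Ok (Parser.main fail_in_state)
  with Parser.Error state -> Error (message_of_state state)
```

Keeping the mapping in one function means that states without a dedicated message still produce a generic report that includes the state number.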
* The `$syntaxerror` keyword is no longer supported.
* Document the trick of wrapping module aliases in `open struct ... end`,
like this: `%{ open struct module M = MyLongModuleName end %}`.
This allows you to use the short name `M` in your grammar, but forces
OCaml to infer types that refer to the long name `MyLongModuleName`.
(Suggested by Frédéric Bour.)
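The same trick can be tried outside a grammar, in plain OCaml (version 4.08 or later, which is when `open struct ... end` became available). In this minimal sketch, `MyLongModuleName` and its contents are hypothetical, and the alias is written with the ordinary OCaml syntax `module M = ...`.

```ocaml
(* Hypothetical module with a long name. *)
module MyLongModuleName = struct
  type t = { line : int; column : int }
  let make line column = { line; column }
end

(* The alias M is usable below, but it is not exported from the
   enclosing structure, so inferred types cannot refer to M. *)
open struct
  module M = MyLongModuleName
end

(* The short name is used here; the type of [origin] is expected to be
   reported as MyLongModuleName.t. *)
let origin = M.make 1 0
```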