* hacks using camlp4
From: David Monniaux @ 1997-11-26 18:00 UTC (permalink / raw)
To: Caml-list
[ I wanted to set up regular expressions with a more pleasant, well-typed
syntax, using camlp4. I have some results but also some
problems... ]
Hi,
I recently tried out the camlp4 preprocessor. I think this tool may have lots
of useful applications, especially since it allows custom syntaxes for the
input of certain kinds of objects, in a more programmer-friendly fashion
than just inputting raw data structures into the source code.
This also lessens the pressure to put library-dependent features into
the language itself, as was done for lists, strings and, even more so,
the format strings for the Printf.*printf functions.
I thus tried to give a type-safe interface to regular expressions. My
small hack (available on request) adds a syntax to do matching on a
regular expression and return components in a friendly and type-safe way,
and also a replace function.
[example:
let x = "meuh je (suis) un [beau] chaton {suis}" =~ item
=> (String.uppercase)
/{item: ['(' '['] ['a'-'z']+ [']' ')']}/
in Printf.printf "%s\n" x;;
will uppercase the substrings enclosed in () or [].
]
To give Perl-like efficiency, I precompile regexps (let-bindings for
precompiled regexps are prepended to the output); that allows efficient
matching even in the middle of a loop. However, this precludes using some
variable parts in the regexps (I for now only allow constant regexps); the
ideal solution would be to compile the regexps at a step in the program
where all the involved values can be computed.
This alas involves some semantic analysis.
I therefore have three problems:
1. Precompiled expressions.
More generally, what would be needed would be some construction to
evaluate an expression as soon as it is possible.
2. It would be nice if regexp precompilation could be done at compile or
preprocessing time (I was thinking of marshalling the precompiled
regexp, but I fear some C-library private data structure inside the
regexp type).
3. The stupid emacs regexp syntax (my code is for now not really
working if you put ^ or ] in ranges).
The third point is only a matter of library, but the others deserve some
thinking, I should say...
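A minimal sketch of the precompilation point, assuming a hypothetical `compile` function that stands in for the regexp compiler (for instance `Str.regexp`) and a hypothetical `matches` function that stands in for matching. The counter exists only to show when compilation happens; `re_1` plays the role of a let-binding prepended by the preprocessor.

```ocaml
(* Hypothetical stand-in for an expensive regexp compiler such as
   Str.regexp; the counter only records how often compilation happens. *)
let compilations = ref 0

let compile (pattern : string) : string =
  incr compilations;
  "compiled:" ^ pattern

(* Hypothetical stand-in matcher: it only compares lengths, which is
   enough to show when [compile] runs. *)
let matches (compiled : string) (input : string) : bool =
  String.length input < String.length compiled

(* Naive version: the constant pattern is recompiled at every iteration. *)
let count_naive inputs =
  List.length (List.filter (fun s -> matches (compile "[a-z]+") s) inputs)

(* Hoisted version: a top-level binding, compiled once when the module
   is initialised, however many times the loop body runs. *)
let re_1 = compile "[a-z]+"

let count_hoisted inputs =
  List.length (List.filter (fun s -> matches re_1 s) inputs)
```

With three inputs, the naive version calls `compile` three times, while the hoisted version adds no call beyond the single one made at initialisation. A pattern containing a variable part cannot be hoisted this way, which is the limitation described above.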
* Re: hacks using camlp4
From: Daniel de Rauglaudre @ 1997-11-27 19:45 UTC (permalink / raw)
To: David Monniaux; +Cc: caml-list
> I lately checked the camlp4 preprocessor. I think this tool may have lots
> of useful applications, since it allows especially custom syntaxes for the
> input of certain kind of objects, in a more programmer-friendly fashion
> than just inputting raw data structures into the source code.
>
> ...
>
> I therefore have three problems:
>
> 1. Precompiled expressions.
> More generally, what would be needed would be some construction to
> evaluate an expression as soon as it is possible.
>
> 2. It would be nice if regexp precompilation could be done at compile or
> preprocessing time (I was thinking of marshalling the precompiled
> regexp, but I fear some C-library private data structure inside the
> regexp type).
One solution is partial evaluation. I have implemented a syntactic solution
below (it works with the OCaml syntax, not the righteous one). The idea is to
generate global declarations automatically.
The important file is "partial.ml", which provides a general mechanism for
partial evaluation, in OCaml (not righteous) syntax. The function
"Partial.eval" generates a global variable and returns a (syntactic)
access to it.
Then, an example: partially evaluate string concatenation (^) when
both parameters are pure strings (string literals). The file "concat.ml"
changes the predefined string concatenation to generate a global variable
in that case.
Once these files are compiled (see their sources below), here is an
example:
$ cat foo.ml
let foo x =
  function
    0 -> x
  | 1 -> x ^ "aa"
  | 2 -> "aa" ^ x
  | _ -> "aa" ^ "bb"
;;
No partial evaluation:
$ camlp4o pr_o.cmo foo.ml
let foo x =
  function
    0 -> x
  | 1 -> x ^ "aa"
  | 2 -> "aa" ^ x
  | _ -> "aa" ^ "bb"
;;
Partial evaluation, automatically generated by partial.cmo + concat.cmo:
$ camlp4o pr_o.cmo ./partial.cmo ./concat.cmo foo.ml
let v_1 = "aa" ^ "bb";;
let foo x =
  function
    0 -> x
  | 1 -> x ^ "aa"
  | 2 -> "aa" ^ x
  | _ -> v_1
;;
Remark: the last case is not pretty printed like this by Camlp4
version 1.06+1; this is just a pretty printing bug fixed in the
version 1.06+2 (patches soon available in the ftp distribution).
Anyway, it works.
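The transformation shown above can be modelled without Camlp4 at all. The following is only a sketch on a toy expression type (the type `expr` and the function `hoist` are hypothetical and are not Camlp4's MLast or its API); `genvar` and `eval` mirror the roles of the functions of the same name in "partial.ml".

```ocaml
(* Toy expression type; this is NOT Camlp4's MLast, only a model of it. *)
type expr =
  | Str of string
  | Var of string
  | Concat of expr * expr

(* Global declarations generated so far, in order, as (name, body) pairs. *)
let globals : (string * expr) list ref = ref []

(* Generates a fresh global variable name. *)
let genvar =
  let cnt = ref 0 in
  fun () -> incr cnt; "v_" ^ string_of_int !cnt

(* In the spirit of Partial.eval: record [e] as a global declaration and
   return an access to the new variable. *)
let eval e =
  let v = genvar () in
  globals := !globals @ [ (v, e) ];
  Var v

(* Rewrite bottom-up; only a concatenation of two string literals is
   turned into a global, everything else is rebuilt unchanged. *)
let rec hoist = function
  | Concat (e1, e2) ->
      let a = hoist e1 in
      let b = hoist e2 in
      (match a, b with
       | Str _, Str _ -> eval (Concat (a, b))
       | _ -> Concat (a, b))
  | e -> e
```

Applied to the model of `"aa" ^ "bb"`, `hoist` returns `Var "v_1"` and leaves one entry in `globals`; applied to the model of `x ^ "aa"`, it returns the expression unchanged.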
Now, the files "partial.ml" and "concat.ml":
========================= partial.ml
open Pcaml;;
Grammar.warning_verbose := false;;
let o2b = function Some _ -> True | None -> False;;
(* Global declarations generated *)
let globals = ref [];;
let add_globals loc si =
  match !globals with
    [] -> si
  | g -> globals := []; <:str_item< declare $list:g @ [si]$ end >>
;;
(* Changes the declarations which holds expressions (cf etc/pa_o.ml) in
   order to declare before the possible global declarations *)
EXTEND
  str_item:
    [ [ "let"; r = OPT "rec"; "_"; "="; e = expr ->
          add_globals loc <:str_item< $exp:e$ >>
      | "let"; r = OPT "rec"; l = LIST1 let_binding SEP "and"; "in";
        x = expr ->
          let e = <:expr< let $rec:o2b r$ $list:l$ in $x$ >> in
          add_globals loc <:str_item< $exp:e$ >>
      | "let"; r = OPT "rec"; l = LIST1 let_binding SEP "and" ->
          add_globals loc <:str_item< value $rec:o2b r$ $list:l$ >>
      | e = expr ->
          add_globals loc <:str_item< $exp:e$ >> ] ]
  ;
END;;
(* Generates a global variable name. If conflict with program variables,
   the user must change the "v_" into something else. *)
let genvar =
  let cnt = ref 0 in
  fun () -> incr cnt; "v_" ^ string_of_int !cnt
;;
(* [Partial.eval loc e] generates a global variable equal to [e] and returns
   an access to it; [loc] is the location of [e] for possible semantic
   errors *)
let eval loc e =
  let v = genvar () in
  globals := !globals @ [ <:str_item< value $lid:v$ = $e$ >> ];
  <:expr< $lid:v$ >>
;;
========================= concat.ml
open Pcaml;;
(* Changes the syntax of "^" (cf etc/pa_o.ml) in order to generate
   a global variable, using [Partial.eval] when both parameters are
   pure strings *)
EXTEND
  expr: LEVEL "^"
    [ [ e1 = expr;
        f = [ op = "^" -> op
            | op = "@" -> op ];
        e2 = expr ->
          let e = <:expr< $lid:f$ $e1$ $e2$ >> in
          match f, e1, e2 with
            "^", <:expr< $str:_$ >>, <:expr< $str:_$ >> -> Partial.eval loc e
          | _ -> e ] ]
  ;
END;;
=========================
Compilations:
ocamlc -pp "camlp4o pa_extend.cmo q_MLast.cmo" -I `camlp4 -where` -c partial.ml
ocamlc -pp "camlp4o pa_extend.cmo q_MLast.cmo" -I `camlp4 -where` -c concat.ml
--------------------------------------------------------------------------
Daniel de RAUGLAUDRE
Projet Cristal - INRIA Rocquencourt
Tel: +33 (01) 39 63 53 51
Email: daniel.de_rauglaudre@inria.fr
Web: http://pauillac.inria.fr/~ddr/
--------------------------------------------------------------------------
* Re: hacks using camlp4
From: Michel Mauny @ 1997-11-30 9:59 UTC (permalink / raw)
To: David Monniaux; +Cc: caml-list
David Monniaux wrote:
> > I lately checked the camlp4 preprocessor. I think this tool may have lots
> > of useful applications, since it allows especially custom syntaxes for the
> > input of certain kind of objects, in a more programmer-friendly fashion
> > than just inputting raw data structures into the source code.
Yes, indeed. One of the Camlp4 applications we had in mind is to provide
dedicated syntactic support for the use of some libraries. We are (at
least, I am) eager to see non-trivial examples, and regular
expressions could be one of these.
> > I therefore have three problems:
> >
> > 1. Precompiled expressions.
> > More generally, what would be needed would be some construction to
> > evaluate an expression as soon as it is possible.
> >
> > 2. It would be nice if regexp precompilation could be done at compile or
> > preprocessing time (I was thinking of marshalling the precompiled
> > regexp, but I fear some C-library private data structure inside the
> > regexp type).
Daniel de Rauglaudre answered:
> A solution is partial evaluation.
Another one (suggested by Xavier Leroy) is to turn the regexp
compiler into a memo-function: encapsulate the compiler together with
a private memory (assoc list, hash table) that can be used to return
immediately the result for previously compiled
regexps. (Maybe this could be provided directly by the Str
library as a `memo_regexp' function?)
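A minimal sketch of such a memo-function, under stated assumptions: `compile` is a hypothetical stand-in for the regexp compiler (with the Str library one would pass `Str.regexp` instead), `memoize` is a hypothetical helper, and `memo_regexp` is only the name proposed above, not an existing Str function.

```ocaml
(* Hypothetical stand-in for the regexp compiler; the counter is only
   there to make the number of real compilations observable. *)
let compilations = ref 0

let compile (pattern : string) : string =
  incr compilations;
  "compiled:" ^ pattern

(* Wrap a function on strings together with a private hash table that
   remembers the result for every argument already seen. *)
let memoize (f : string -> 'a) : string -> 'a =
  let table : (string, 'a) Hashtbl.t = Hashtbl.create 16 in
  fun key ->
    match Hashtbl.find_opt table key with
    | Some v -> v
    | None ->
        let v = f key in
        Hashtbl.add table key v;
        v

(* The memoized compiler: each distinct pattern is compiled only once. *)
let memo_regexp = memoize compile
```

Unlike the syntactic approach, this also covers regexps with variable parts, at the price of one table lookup per use and of a table that keeps growing for as long as new patterns appear.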
More generally, Camlp4 is only a front-end to the OCaml compiler, not
part of it. It can therefore only perform source-level
transformations. If it were possible to express regexp compilation as a
source-to-source transformation, then general partial evaluation
would definitely solve your problem #2. Unfortunately, the regexp
library is a black box, and can only be used as such, as far as I
understand. If the regexp compiler had some known properties (such as
compositionality), then something could also be done, but
this isn't the case either.
Coming to your point #1, where the question is pretty general, this
can be dealt with by partial evaluation (which is a pretty general answer
to pretty general questions :-), as suggested by Daniel. At the compiler
level, classical compile-time optimizations (such as extraction of
loop invariants) or less classical ones (obtaining full laziness
in lazy functional languages by extracting maximal subexpressions)
seem to address that question. Unfortunately, they are less effective
in languages such as Caml (because of side effects) than in pure functional
languages, if semantic preservation is an issue (and it generally is).
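A small illustration of why side effects get in the way, as a sketch only: `next`, `sum_original` and `sum_hoisted` are hypothetical names. The second function is what a naive "extract the loop invariant" transformation would produce if it treated `next ()` as invariant.

```ocaml
(* An impure function: each call returns the next integer. *)
let counter = ref 0
let next () = incr counter; !counter

(* Original loop: [next ()] is evaluated once per iteration. *)
let sum_original n =
  counter := 0;
  let total = ref 0 in
  for _i = 1 to n do
    total := !total + next ()
  done;
  !total

(* Naively transformed loop: [next ()] was hoisted out of the loop as if
   it were invariant, so it is evaluated only once. *)
let sum_hoisted n =
  counter := 0;
  let v = next () in
  let total = ref 0 in
  for _i = 1 to n do
    total := !total + v
  done;
  !total
```

For `n = 3` the original computes 1 + 2 + 3 = 6 while the hoisted version computes 1 + 1 + 1 = 3, so the transformation does not preserve the semantics. A compiler has to prove the absence of such effects before hoisting, which is easy for a literal regexp but hard in general.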
In conclusion, for your application, Daniel's solution could be used
to globalize non-variable regexps, and a `memo' interface to the
regexp compiler could do the rest.
Regards,
--
Michel Mauny
----------------------------------------------
INRIA -- BP 105 -- F-78153 Le Chesnay Cedex
Tel.: +33 1 39 63 57 96 Fax: +33 1 39 63 56 84
Email: Michel.Mauny@inria.fr
WWW: http://www.inria.fr/Michel.Mauny
----------------------------------------------
* Re: hacks using camlp4
From: Thierry Bravier @ 1997-12-02 17:03 UTC (permalink / raw)
To: caml-list; +Cc: daniel.de_rauglaudre
Daniel de Rauglaudre wrote:
>
> > I lately checked the camlp4 preprocessor. I think this tool may have lots
> > of useful applications, since it allows especially custom syntaxes for the
> > input of certain kind of objects, in a more programmer-friendly fashion
> > than just inputting raw data structures into the source code.
> >
>
> A solution is partial evaluation. I have implemented a syntax solution
> below (working with ocaml syntax, not righteous one). The idea is to
> automatically generate global declarations.
>
(*
============================================================================
* File: camlp4/fold.ml
* Language: caml
* Author: Thierry Bravier
* Time-stamp: <97/12/02 17:41:28 tb>
* Created: 97/12/02 14:18:53 tb
*
=========================================================================
*)
(*
Dear ocaml users,
Smart as OCaml is, camlp4 makes it even smarter.
Here is a small constant-folding camlp4 module that can be useful to
efficiently parse OCaml (standard syntax) code.
This extension is adapted from [mkumin] in etc/pa_o.ml, which can be
seen as a minimal constant-folding algorithm in its own right.
It is also a degenerate case of the pre-compiled expressions that
were discussed on the ocaml mailing list.
In this precise case, the smart [partial.ml] module presented by
Daniel de Rauglaudre is too powerful, since there is no need to
store folded values in global variables, folded values are put in
new literals instead.
Example:
ocamlc -pp "camlp4o pa_extend.cmo q_MLast.cmo" -I `camlp4 -where` -c fold.ml
cat foldable.ml
let f x = "foo"^"bar"^"gee", 50 - 100 -10 * x + 4 * 8
camlp4o pr_o.cmo ./fold.cmo foldable.ml
let f x = "foobargee", -50 - 10 * x + 32;;
Unfortunately, THIS MODULE IS NOT FULLY SAFE, since it relies on
the dangerous assumption that ( + ) is the integer addition, ( ** )
is the floating point power operator, etc.
This is not always true:
let (+) x y = y - x;;
let x = 10 + 100;;
x is 90, not 110 !
Or even:
let f (^) = "foo" ^ "bar"
val f : (string -> string -> 'a) -> 'a
So please, first check the lexical environment in which expressions
are to be expanded.
Thierry Bravier Dassault Aviation - DGT / DTN / ELO / EAV
78, Quai Marcel Dassault F-92214 Saint-Cloud Cedex - France
Telephone : (33) 01 47 11 53 07 Telecopie : (33) 01 47 11 52 83
E-Mail : mailto:thierry.bravier@dassault-aviation.fr
*)
(*
=========================================================================
*)
open Pcaml
;;
(*
=========================================================================
*)
type literal =
  | Int of int
  | Flo of float
  | Str of string
  | Chr of char
let get_literal loc = function
  | <:expr< $int:i$ >> -> Some (Int (int_of_string i))
  | <:expr< $flo:f$ >> -> Some (Flo (float_of_string f))
  | <:expr< $str:s$ >> -> Some (Str s)
  | <:expr< $chr:c$ >> -> Some (Chr c)
  | _ -> None
and put_literal loc = function
  | Int i -> <:expr< $int:string_of_int i$ >>
  | Flo f -> <:expr< $flo:string_of_float f$ >>
  | Str s -> <:expr< $str:s$ >>
  | Chr c -> <:expr< $chr:c$ >>
(*
=========================================================================
*)
let fold_1 (nop, fop) loc e1 =
  match get_literal loc e1 with
  | Some l1 -> <:expr< $fop l1$ >>
  | _ -> <:expr< $lid:nop$ $e1$ >>
and fold_2 (nop, fop) loc e1 e2 =
  match get_literal loc e1, get_literal loc e2 with
  | Some l1, Some l2 -> <:expr< $fop l1 l2$ >>
  | _ -> <:expr< $lid:nop$ $e1$ $e2$ >>
(*
=========================================================================
*)
let dont_fold_1 nop loc =
  fold_1
    (nop, (fun l1 -> <:expr< $lid:nop$ $put_literal loc l1$ >>))
    loc
and fold_int_1 (nop, fop) loc =
  fold_1
    (nop,
     (function
      | Int i1 -> put_literal loc (Int (fop i1))
      | l1 -> <:expr< $lid:nop$ $put_literal loc l1$ >>))
    loc
and fold_flo_1 (nop, fop) loc =
  fold_1
    (nop,
     (function
      | Flo f1 -> put_literal loc (Flo (fop f1))
      | l1 -> <:expr< $lid:nop$ $put_literal loc l1$ >>))
    loc
let fold_int_2 (nop, fop) loc =
  fold_2
    (nop,
     (fun l1 l2 ->
        match l1, l2 with
        | Int i1, Int i2 -> put_literal loc (Int (fop i1 i2))
        | _ -> <:expr< $lid:nop$ $put_literal loc l1$ $put_literal loc l2$ >>))
    loc
and dont_fold_2 nop loc =
  fold_2
    (nop,
     (fun l1 l2 ->
        <:expr< $lid:nop$ $put_literal loc l1$ $put_literal loc l2$ >>))
    loc
and fold_flo_2 (nop, fop) loc =
  fold_2
    (nop,
     (fun l1 l2 ->
        match l1, l2 with
        | Flo f1, Flo f2 -> put_literal loc (Flo (fop f1 f2))
        | _ -> <:expr< $lid:nop$ $put_literal loc l1$ $put_literal loc l2$ >>))
    loc
and fold_str_2 (nop, fop) loc =
  fold_2
    (nop,
     (fun l1 l2 ->
        match l1, l2 with
        | Str s1, Str s2 -> put_literal loc (Str (fop s1 s2))
        | _ -> <:expr< $lid:nop$ $put_literal loc l1$ $put_literal loc l2$ >>))
    loc
(*
=========================================================================
*)
type folder_1 =
  | Dont_Fold_1
  | Fold_Int_1 of (int -> int)
  | Fold_Flo_1 of (float -> float)
and folder_2 =
  | Dont_Fold_2
  | Fold_Int_2 of (int -> int -> int)
  | Fold_Flo_2 of (float -> float -> float)
  | Fold_Str_2 of (string -> string -> string)
let make_folder_1 = function
  | nop, Dont_Fold_1 -> dont_fold_1 nop
  | nop, Fold_Int_1 fop -> fold_int_1 (nop, fop)
  | nop, Fold_Flo_1 fop -> fold_flo_1 (nop, fop)
and make_folder_2 = function
  | nop, Dont_Fold_2 -> dont_fold_2 nop
  | nop, Fold_Int_2 fop -> fold_int_2 (nop, fop)
  | nop, Fold_Flo_2 fop -> fold_flo_2 (nop, fop)
  | nop, Fold_Str_2 fop -> fold_str_2 (nop, fop)
;;
(*
=========================================================================
*)
EXTEND
  expr: LEVEL "^"
    [ [ e1 = SELF;
        f = [ op = "^" -> op, Fold_Str_2 ( ^ )
            | op = "@" -> op, Dont_Fold_2 ];
        e2 = SELF -> make_folder_2 f loc e1 e2 ] ]
  ;
  expr: LEVEL "+"
    [ [ e1 = SELF;
        f = [ op = "+" -> op, Fold_Int_2 ( + )
            | op = "-" -> op, Fold_Int_2 ( - )
            | op = "+." -> op, Fold_Flo_2 ( +. )
            | op = "-." -> op, Fold_Flo_2 ( -. ) ];
        e2 = SELF -> make_folder_2 f loc e1 e2 ] ]
  ;
  expr: LEVEL "*"
    [ [ e1 = SELF;
        f = [ op = "*" -> op, Fold_Int_2 ( * )
            | op = "/" -> op, Fold_Int_2 ( / )
            | op = "*." -> op, Fold_Flo_2 ( *. )
            | op = "/." -> op, Fold_Flo_2 ( /. )
            | op = "land" -> op, Fold_Int_2 ( land )
            | op = "lor" -> op, Fold_Int_2 ( lor )
            | op = "lxor" -> op, Fold_Int_2 ( lxor )
            | op = "mod" -> op, Fold_Int_2 ( mod ) ];
        e2 = SELF -> make_folder_2 f loc e1 e2 ] ]
  ;
  expr: LEVEL "**"
    [ [ e1 = SELF;
        f = [ op = "**" -> op, Fold_Flo_2 ( ** )
            | op = "asr" -> op, Fold_Int_2 ( asr )
            | op = "lsl" -> op, Fold_Int_2 ( lsl )
            | op = "lsr" -> op, Fold_Int_2 ( lsr ) ];
        e2 = SELF -> make_folder_2 f loc e1 e2 ] ]
  ;
  expr: LEVEL "unary minus"
    [ [ f = [ op = "-" -> "~-", Fold_Int_1 ( ~- )
            | op = "-." -> "~-.", Fold_Flo_1 ( ~-. ) ];
        e = SELF -> make_folder_1 f loc e ] ]
  ;
  expr: LEVEL "~-"
    [ [ f = [ op = "~-" -> op, Fold_Int_1 ( ~- )
            | op = "~-." -> op, Fold_Flo_1 ( ~-. ) ];
        e = SELF -> make_folder_1 f loc e ] ]
  ;
END
(*
=========================================================================
*)
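The folding idea can also be tried outside Camlp4. The following is a sketch on a toy syntax tree (the type `expr` and the functions `fold_2` and `fold` here are hypothetical and much smaller than the module above); like the module, it assumes that the operators keep their standard integer meaning.

```ocaml
(* Toy syntax tree; this is NOT Camlp4's MLast. *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Sub of expr * expr
  | Mul of expr * expr

(* Fold a binary node when both operands are integer literals; otherwise
   rebuild the node from its (already folded) operands. *)
let fold_2 rebuild op e1 e2 =
  match e1, e2 with
  | Int a, Int b -> Int (op a b)
  | _ -> rebuild e1 e2

(* Bottom-up constant folding over the toy tree. *)
let rec fold = function
  | Add (a, b) -> fold_2 (fun x y -> Add (x, y)) ( + ) (fold a) (fold b)
  | Sub (a, b) -> fold_2 (fun x y -> Sub (x, y)) ( - ) (fold a) (fold b)
  | Mul (a, b) -> fold_2 (fun x y -> Mul (x, y)) ( * ) (fold a) (fold b)
  | e -> e

(* The integer part of foldable.ml, with left-associative operators:
   ((50 - 100) - (10 * x)) + (4 * 8). *)
let example =
  Add (Sub (Sub (Int 50, Int 100), Mul (Int 10, Var "x")),
       Mul (Int 4, Int 8))
```

Here `fold example` yields the tree for `-50 - 10 * x + 32`, matching the output shown for foldable.ml. The sub-term `10 * x` blocks further folding because the folder works purely on syntax and does not reassociate.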
* Re: hacks using camlp4
From: Daniel de Rauglaudre @ 1997-12-02 21:06 UTC (permalink / raw)
To: Thierry Bravier; +Cc: caml-list
Interesting, your constant folding code.
> In this precise case, the smart [partial.ml] module presented by
> Daniel de Rauglaudre is too powerful, since there is no need to
> store folded values in global variables, folded values are put in
> new literals instead.
Yes, my implementation was meant to globalize
Str.regexp "..."
which cannot be written as a literal.