From: Matthieu Wipliez <mwipliez@yahoo.fr>
To: O'Caml Mailing List <caml-list@yquem.inria.fr>
Subject: Re : [Caml-list] Re: camlp4 stream parser syntax
Date: Sat, 7 Mar 2009 23:21:42 +0000 (GMT) [thread overview]
Message-ID: <46331.52510.qm@web27007.mail.ukl.yahoo.com> (raw)
In-Reply-To: <24D11586-4F15-4B6E-8FB7-58651317164D@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 3737 bytes --]
Hi Joel,
why are you using stream parsers instead of Camlp4 grammars ?
This:
> let rec parse_primary = parser
>
> | [< 'INT n >] -> Int n
> | [< 'FLOAT n >] -> Float n
> | [< 'STRING n >] -> Str n
> | [< 'TRUE >] -> Bool true
> | [< 'FALSE >] -> Bool false
>
> | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
could be written as:
expression: [
[ (i, _) = INT -> Int i
| (s, _) = STRING -> Str s
... ]
];
Note that Camlp4 will automatically raise an exception if the input cannot be parsed with the grammar given.
Also if you have input that is syntactically correct but is not semantically correct, and you want to raise an exception with the error location during parsing, you might want to use Loc.raise as follows:
expression: [
[ e1 = SELF; "/"; e2 = SELF ->
if e2 = Int 0 then
Loc.raise _loc (Failure "division by zero")
else
BinaryOp (e1, Div, e2) ]
];
By the way, do you need you own tailor-made lexer? Camlp4 provides one that might satisfy your needs.
Otherwise, you can always define your own lexer (I had to do that for the project I'm working on, see file attached).
Your parser would then look like
(* functor application *)
module Camlp4Loc = Camlp4.Struct.Loc
module Lexer = Cal_lexer.Make(Camlp4Loc)
module Gram = Camlp4.Struct.Grammar.Static.Make(Lexer)
(* exposes EOI and other stuff *)
open Lexer
(* rule definition *)
let rule = Gram.Entry.mk "rule"
(* grammar definition *)
EXTEND Gram
rule: [ [ ... ] ];
END
(* to parse a file *)
Gram.parse rule (Loc.mk file) (Stream.of_channel ch)
This should be compiled with camlp4of.
I hope this helps you with what you'd like to do,
Cheers,
Matthieu
----- Message d'origine ----
> De : Joel Reymont <joelr1@gmail.com>
> À : O'Caml Mailing List <caml-list@yquem.inria.fr>
> Envoyé le : Samedi, 7 Mars 2009, 23h52mn 52s
> Objet : [Caml-list] Re: camlp4 stream parser syntax
>
> > Where can I read up on the syntax of the following in a camlp4 stream parser?
> >
> > | [<' INT n >] -> Int n
> >
> > For example, where are [< ... >] described and why is the ' needed in between?
>
>
> To be more precise, I'm using camlp4 to parse a language into a non-OCaml AST.
>
> I'm trying to figure out the meaning of [<, >], [[ and ]]
>
> My ocamllex lexer is wrapped to make it look like a stream lexer (below) and I'm
> returning a tuple of (tok, loc) because I don't see another way of making token
> location available to the parser.
>
> Still, I'm how to integrate the reporting of error location into ?? in something
> like this
>
> | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
>
> Would someone kindly shed light on this?
>
> Thanks in advance, Joel
>
> P.S. ocamllex wrapper to return a' Stream.t
>
> {
> let from_lexbuf tab lb =
> let next _ =
> let tok = token tab lb in
> let loc = Loc.of_lexbuf lb in
> Some (tok, loc)
> in Stream.from next
>
> let setup_loc lb loc =
> let start_pos = Loc.start_pos loc in
> lb.lex_abs_pos <- start_pos.pos_cnum;
> lb.lex_curr_p <- start_pos
>
> let from_string loc tab str =
> let lb = Lexing.from_string str in
> setup_loc lb loc;
> from_lexbuf tab lb
>
> }
>
> ---
> http://tinyco.de
> Mac, C++, OCaml
>
>
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
[-- Attachment #2: cal_lexer.mll --]
[-- Type: application/octet-stream, Size: 10564 bytes --]
(*****************************************************************************)
(* Cal2C *)
(* Copyright (c) 2007-2008, IETR/INSA of Rennes. *)
(* All rights reserved. *)
(* *)
(* This software is governed by the CeCILL-B license under French law and *)
(* abiding by the rules of distribution of free software. You can use, *)
(* modify and/ or redistribute the software under the terms of the CeCILL-B *)
(* license as circulated by CEA, CNRS and INRIA at the following URL *)
(* "http://www.cecill.info". *)
(* *)
(* Matthieu WIPLIEZ <Matthieu.Wipliez@insa-rennes.fr *)
(*****************************************************************************)
(* File cal_lexer.mll *)
{
open Printf
open Format
module Make (Loc : Camlp4.Sig.Loc) = struct
module Loc = Loc
type token =
| KEYWORD of string
| SYMBOL of string
| IDENT of string
| INT of int * string
| FLOAT of float * string
| CHAR of char * string
| STRING of string * string
| EOI
module Token = struct
module Loc = Loc
type t = token
let to_string =
function
KEYWORD s -> sprintf "KEYWORD %S" s
| SYMBOL s -> sprintf "SYMBOL %S" s
| IDENT s -> sprintf "IDENT %S" s
| INT (_, s) -> sprintf "INT %s" s
| FLOAT (_, s) -> sprintf "FLOAT %s" s
| CHAR (_, s) -> sprintf "CHAR '%s'" s
| STRING (_, s) -> sprintf "STRING \"%s\"" s
(* here it's not %S since the string is already escaped *)
| EOI -> sprintf "EOI"
let print ppf x = pp_print_string ppf (to_string x)
let match_keyword kwd = function
KEYWORD kwd' when kwd = kwd' -> true
| _ -> false
let extract_string =
function
KEYWORD s
| IDENT s
| INT (_, s)
| FLOAT (_, s)
| CHAR (_, s)
| STRING (_, s) -> s
| tok ->
invalid_arg ("Cannot extract a string from this token: "^
to_string tok)
module Error = struct
type t =
Illegal_token of string
| Keyword_as_label of string
| Illegal_token_pattern of string * string
| Illegal_constructor of string
exception E of t
let print ppf =
function
Illegal_token s ->
fprintf ppf "Illegal token (%s)" s
| Keyword_as_label kwd ->
fprintf ppf "`%s' is a keyword, it cannot be used as label name" kwd
| Illegal_token_pattern (p_con, p_prm) ->
fprintf ppf "Illegal token pattern: %s %S" p_con p_prm
| Illegal_constructor con ->
fprintf ppf "Illegal constructor %S" con
let to_string x =
let b = Buffer.create 50 in
let () = bprintf b "%a" print x in Buffer.contents b
end
module M = Camlp4.ErrorHandler.Register(Error)
module Filter = struct
type token_filter = (t, Loc.t) Camlp4.Sig.stream_filter
type t =
{ is_kwd : string -> bool;
mutable filter : token_filter }
let mk is_kwd =
{ is_kwd = is_kwd;
filter = fun s -> s }
let keyword_conversion tok is_kwd =
match tok with
SYMBOL s | IDENT s when is_kwd s -> KEYWORD s
| _ -> tok
let filter x =
let f tok loc =
let tok' = keyword_conversion tok x.is_kwd in
(tok', loc)
in
let rec filter =
parser
| [< '(tok, loc); s >] -> [< ' f tok loc; filter s >]
| [< >] -> [< >]
in
fun strm -> x.filter (filter strm)
let define_filter x f = x.filter <- f x.filter
let keyword_added _ _ _ = ()
let keyword_removed _ _ = ()
end
end
open Lexing
(* Error report *)
module Error = struct
type t =
| Illegal_character of char
| Illegal_escape of string
| Unterminated_comment
| Unterminated_string
| Unterminated_quotation
| Unterminated_antiquot
| Unterminated_string_in_comment
| Comment_start
| Comment_not_end
| Literal_overflow of string
exception E of t
open Format
let print ppf =
function
| Illegal_character c ->
fprintf ppf "Illegal character (%s)" (Char.escaped c)
| Illegal_escape s ->
fprintf ppf "Illegal backslash escape in string or character (%s)" s
| Unterminated_comment ->
fprintf ppf "Comment not terminated"
| Unterminated_string ->
fprintf ppf "String literal not terminated"
| Unterminated_string_in_comment ->
fprintf ppf "This comment contains an unterminated string literal"
| Unterminated_quotation ->
fprintf ppf "Quotation not terminated"
| Unterminated_antiquot ->
fprintf ppf "Antiquotation not terminated"
| Literal_overflow ty ->
fprintf ppf "Integer literal exceeds the range of representable integers of type %s" ty
| Comment_start ->
fprintf ppf "this is the start of a comment"
| Comment_not_end ->
fprintf ppf "this is not the end of a comment"
let to_string x =
let b = Buffer.create 50 in
let () = bprintf b "%a" print x in Buffer.contents b
end
let module M = Camlp4.ErrorHandler.Register(Error) in ()
open Error
open Cal2c_util
exception Eof
(* String construction *)
let str = ref ""
type context = {
loc : Loc.t;
in_comment : bool;
quotations : bool;
antiquots : bool;
lexbuf : lexbuf;
buffer : Buffer.t
}
(* Update the current location with file name and line number. *)
let update_loc c file line absolute chars =
let lexbuf = c.lexbuf in
let pos = lexbuf.lex_curr_p in
let new_file =
match file with
| None -> pos.pos_fname
| Some s -> s
in
lexbuf.lex_curr_p <- { pos with
pos_fname = new_file;
pos_lnum = if absolute then line else pos.pos_lnum + line;
pos_bol = pos.pos_cnum - chars;
}
(* Matches either \ or $. Why so many backslashes? Because \ has to be escaped*)
(* in strings, so we get \\. \, | and $ also have to be escaped in regexps, *)
(* so we have \\\\ \\| \\$. *)
let re_id = Str.regexp "\\\\\\|\\$"
}
(* Numbers *)
let nonZeroDecimalDigit = ['1'-'9']
let decimalDigit = '0' | nonZeroDecimalDigit
let decimalLiteral = nonZeroDecimalDigit (decimalDigit)*
let hexadecimalDigit = decimalDigit | ['a'-'f'] | ['A'-'F']
let hexadecimalLiteral = '0' ('x'|'X') hexadecimalDigit (hexadecimalDigit)*
let octalDigit = ['0'-'7']
let octalLiteral = '0' (octalDigit)*
let integer = decimalLiteral | hexadecimalLiteral | octalLiteral
let exponent = ('e'|'E') ('+'|'-')? decimalDigit+
let real = decimalDigit+ '.' (decimalDigit)* exponent?
| '.' decimalDigit+ exponent?
| decimalDigit+ exponent
(* Identifiers *)
let char = ['a'-'z' 'A'-'Z']
let any_identifier = (char | '_' | decimalDigit | '$')+
let other_identifier =
(char | '_') (char | '_' | decimalDigit | '$')*
| '$' (char | '_' | decimalDigit | '$')+
let identifier = '\\' any_identifier '\\' | other_identifier
let newline = ('\010' | '\013' | "\013\010")
(* Token rule *)
rule token c = parse
| [' ' '\t'] {token c lexbuf}
| newline { update_loc c None 1 false 0; token c lexbuf }
| "^" { SYMBOL "^" }
| "->" { SYMBOL "->" }
| ':' { SYMBOL ":" }
| ":=" { SYMBOL ":=" }
| ',' { SYMBOL "," }
| "!=" { SYMBOL "!=" }
| '/' { SYMBOL "/" }
| '.' { SYMBOL "." }
| ".." { SYMBOL ".." }
| "::" { SYMBOL "::" }
| "-->" { SYMBOL "-->" }
| "==>" { SYMBOL "==>" }
| '=' { SYMBOL "=" }
| ">=" { SYMBOL ">=" }
| '>' { SYMBOL ">" }
| '{' { SYMBOL "{" }
| '[' { SYMBOL "[" }
| "<=" { SYMBOL "<=" }
| '<' { SYMBOL "<" }
| '(' { SYMBOL "(" }
| '-' { SYMBOL "-" }
| '+' { SYMBOL "+" }
| '}' { SYMBOL "}" }
| ']' { SYMBOL "]" }
| ')' { SYMBOL ")" }
| ';' { SYMBOL ";" }
| '#' { SYMBOL "#" }
| '*' { SYMBOL "*" }
| integer as lxm { INT (int_of_string lxm, lxm) }
| real as lxm { FLOAT (float_of_string lxm, lxm) }
| identifier as ident {
let ident = Str.global_replace re_id "_" ident in
IDENT ident }
| '"' { let str = string c lexbuf in STRING (str, str) }
| "//" { single_line_comment c lexbuf }
| "/*" { multi_line_comment c lexbuf }
| eof { EOI }
and string ctx = parse
| "\\\"" { str := !str ^ "\\\""; string ctx lexbuf }
| '"' { let s = !str in str := ""; s }
| _ as c { str := !str ^ (String.make 1 c); string ctx lexbuf }
and single_line_comment c = parse
| newline { update_loc c None 1 false 0; token c lexbuf }
| _ { single_line_comment c lexbuf }
and multi_line_comment c = parse
| "*/" { token c lexbuf }
| newline { update_loc c None 1 false 0; multi_line_comment c lexbuf }
| _ { multi_line_comment c lexbuf }
{
let default_context lb =
{ loc = Loc.ghost ;
in_comment = false ;
quotations = true ;
antiquots = false ;
lexbuf = lb ;
buffer = Buffer.create 256 }
let update_loc c = { (c) with loc = Loc.of_lexbuf c.lexbuf }
let with_curr_loc f c = f (update_loc c) c.lexbuf
let lexing_store s buff max =
let rec self n s =
if n >= max then n
else
match Stream.peek s with
| Some x ->
Stream.junk s;
buff.[n] <- x;
succ n
| _ -> n
in
self 0 s
let from_context c =
let next _ =
let tok = with_curr_loc token c in
let loc = Loc.of_lexbuf c.lexbuf in
Some ((tok, loc))
in Stream.from next
let from_lexbuf ?(quotations = true) lb =
let c = { (default_context lb) with
loc = Loc.of_lexbuf lb;
antiquots = !Camlp4_config.antiquotations;
quotations = quotations }
in from_context c
let setup_loc lb loc =
let start_pos = Loc.start_pos loc in
lb.lex_abs_pos <- start_pos.pos_cnum;
lb.lex_curr_p <- start_pos
let from_string ?quotations loc str =
let lb = Lexing.from_string str in
setup_loc lb loc;
from_lexbuf ?quotations lb
let from_stream ?quotations loc strm =
let lb = Lexing.from_function (lexing_store strm) in
setup_loc lb loc;
from_lexbuf ?quotations lb
let mk () loc strm =
from_stream ~quotations:!Camlp4_config.quotations loc strm
end
}
next prev parent reply other threads:[~2009-03-07 23:22 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-07 22:38 Joel Reymont
2009-03-07 22:52 ` Joel Reymont
2009-03-07 23:21 ` Matthieu Wipliez [this message]
2009-03-07 23:42 ` Re : [Caml-list] " Joel Reymont
2009-03-08 0:40 ` Joel Reymont
2009-03-08 1:08 ` Re : " Matthieu Wipliez
2009-03-08 8:25 ` Joel Reymont
2009-03-08 9:37 ` Daniel de Rauglaudre
2009-03-08 9:51 ` Joel Reymont
2009-03-08 10:27 ` Daniel de Rauglaudre
2009-03-08 10:35 ` Joel Reymont
2009-03-08 11:07 ` Joel Reymont
2009-03-08 11:28 ` Daniel de Rauglaudre
2009-03-08 11:45 ` Re : Re : " Matthieu Wipliez
2009-03-08 11:52 ` Joel Reymont
2009-03-08 13:33 ` Re : " Matthieu Wipliez
2009-03-08 13:59 ` Joel Reymont
2009-03-08 14:09 ` Re : " Matthieu Wipliez
2009-03-08 14:30 ` Joel Reymont
2009-03-08 15:07 ` Re : " Matthieu Wipliez
2009-03-08 15:24 ` Joel Reymont
2009-03-08 15:32 ` Re : " Matthieu Wipliez
2009-03-08 15:39 ` Joel Reymont
2009-03-08 15:46 ` Joel Reymont
2009-03-08 15:55 ` Re : " Matthieu Wipliez
2009-03-08 16:58 ` Joel Reymont
2009-03-08 17:04 ` Re : " Matthieu Wipliez
2009-03-08 17:15 ` Joel Reymont
2009-03-08 9:34 ` Joel Reymont
2009-03-07 23:52 ` [Caml-list] " Jon Harrop
2009-03-07 23:53 ` Joel Reymont
2009-03-08 0:12 ` Jon Harrop
2009-03-08 0:20 ` Re : " Matthieu Wipliez
2009-03-08 0:29 ` Jon Harrop
2009-03-08 0:30 ` Re : " Joel Reymont
2009-03-08 0:37 ` Re : " Matthieu Wipliez
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=46331.52510.qm@web27007.mail.ukl.yahoo.com \
--to=mwipliez@yahoo.fr \
--cc=caml-list@yquem.inria.fr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox