* ocamlyacc: Bug ? Meaning of $end ?
@ 2000-09-25 10:23 Markus Leypold
0 siblings, 0 replies; only message in thread
From: Markus Leypold @ 2000-09-25 10:23 UTC (permalink / raw)
To: caml-list
Dear Friends,
I have found some points in the implementation of ocamlyacc,
which I'm not quite sure, wether I understand right, or wether
this behaviour might even be called a bug.
Let me show my example first: There are 3 modules: The Lexer Cword,
the parser Cword2imf and the test driver tcw. I have stripped the
sample code down to the bare minimum which seems to be necessary to
demonstrate the effect which puzzles me.
The Lexer divides the stream of input characters into convenient
words, runs or white space and newline and end-of-file-marks.
cword.mll:
-----------------------------------
{
open Cwords2imf
}
let newline = '\n'
let normalchar = [^ '\n' '/' '*' ' ' '\t']
let space = [' ' '\t']
let word = normalchar+
rule nextword = parse
word {WORD(Lexing.lexeme lexbuf)}
| space {SPACE(Lexing.lexeme lexbuf)}
| newline {NEWLINE}
| eof {END}
| _ {WORD(Lexing.lexeme lexbuf)}
----------------------------------
The parser is supposed to collect the tokens into lines and newline marks.
cwords2imf.mly:
-----------------------------------
%token NEWLINE END
%token <string> WORD
%token <string> SPACE
%start main
%type <string> main
%%
main
: END {"X\n"}
| NEWLINE {"N\n"}
| linefrag {"L "^$1^"\n"};
linefrag
: SPACE linefrag {$1^$2}
| WORD {$1}
| {"**"};
--------------------------------------
The Testdriver invokes the parser on stdin just once (reading one semantic
value).
tcw.ml:
--------------------------------------
open Cwords2imf;;
let lexbuf = (Lexing.from_channel stdin);;
print_string (Cwords2imf.main
Cwords.nextword
lexbuf);;
--------------------------------------
Now I call tcw and feed the some input like the following to it
bla bla foobar[newline]
blubb hello[newline]
[eof]
where the text in [] is to be interpreted (not to be taken as
verbatim).
So far this works and I can even call the parser+lexer repeatedly on
the input streams, thus reading a 'sentence' from the input every time
and obtaining it's semantic value. But if I replace the rule about
linefrag with
linefrag
: SPACE linefrag {$1^$2}
| WORD linefrag {$1^$2}
| {"**"};
everything goes down the drain, and I get a parse error, even if I call
parser+lexer only once.
So I started to look into the generated statemachine in the .output file,
but things became even more confusing to me:
In both variants of the grammar, the only place, where the parser action
is 'accept', is the following:
state 2
$accept : %entry% . $end (0)
$end accept
It's a bit irritating, that ist doesn't say '. error' also, but never mind.
Ah, then, I thought, that is the problem: 'accept' is only done, if the
matched phrase is followed by the end of the input. $end, I thought,
is a pseudotoken, which signifies the end of input from the lexer
(the original yacc behaves like this, I think).
-> But than i cannot explain, why in the first case, I get a match,
which ist not obtained in the second case. Is the generation
of parse tables erronous ?
-> What is the meaning of $end ? It cannot be the end of input,
since from the example in the manual I understand, that a
parser/lexer-combination might be called on an input stream
repeatedly.
On the other side, to match just the longest prefix sentence
in the input, I'd more expect a state with '. accept' after
entries, which check all the tokens, wether it is possible to
continue parsing ie 'WORD shift 17'.
-> My original idea was to collect the tokens to larger tokens, and to
pass them to another parser, ie letting the parser act as part of a
(handwritten) lexer to another parser. Now I read something about
'the parsers not being reentrant' ? Is that true ? Does that mean
the calling sequence
parser1 -> function1 (as lexer to parser1)
-> parser2
-> lexer2
will not work ?
Can anyone comment on this, possibly showing me THE LIGHT :-) ?
Regards -- Markus
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2000-09-25 17:21 UTC | newest]
Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-09-25 10:23 ocamlyacc: Bug ? Meaning of $end ? Markus Leypold
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox