From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id UAA29587; Mon, 28 Apr 2003 20:13:58 +0200 (MET DST)
X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id UAA29845 for <caml-list@pauillac.inria.fr>; Mon, 28 Apr 2003 20:13:54 +0200 (MET DST)
Received: from jamlikhet.polytechnique.org (b.mx.m4x.polytechnique.org [129.104.30.35])
	by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id h3SIDsH14372
	for <caml-list@inria.fr>; Mon, 28 Apr 2003 20:13:54 +0200 (MET DST)
Received: from alan-schm1p.inria.fr (DHCP12-12.CIS.UPENN.EDU [158.130.13.42])
	by ssl.polytechnique.org (Postfix) with ESMTP id 496EFB41E
	for <caml-list@inria.fr>; Mon, 28 Apr 2003 20:13:53 +0200 (CEST)
Received: by alan-schm1p.inria.fr (Postfix, from userid 501)
	id 6633B4044; Mon, 28 Apr 2003 14:13:48 -0400 (EDT)
Date: Mon, 28 Apr 2003 14:13:48 -0400
From: Alan Schmitt <alan.schmitt@polytechnique.org>
To: caml-list@inria.fr
Subject: [Caml-list] a few lexing questions
Message-ID: <20030428181348.GM9337@alan-schm1p>
Mail-Followup-To: caml-list@inria.fr
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4i
X-Editor: Vim http://vim.sf.net/
X-Info: http://pauillac.inria.fr/~aschmitt/
X-Operating-System: Linux/2.4.21pre4-1mdk (i686)
X-Uptime: 14:04:41 up 15:21,  1 user,  load average: 0.02, 0.03, 0.00
X-Spam: no; 0.00; reuse:01 symmetric:01 parser:02 bytes:02 buggy:03 lexbuf:03 lex:04 merged:04 parse:04 merging:05 implement:05 suggestion:06 i'd:06 'x':07 letter:92 
Sender: owner-caml-list@pauillac.inria.fr
Precedence: bulk

Hi,

I am writing a parser / pretty-printer for the iCalendar file format,
and I have a few questions.

First of all, there are some times when I need to differentiate lexing
according to the first letter of what I am lexing. As I want to reuse
the lexing code, I did the following:

and alt_lang_xparam = parse
| 'A'                    { lexbuf.Lexing.lex_curr_pos <- Lexing.lexeme_start lexbuf;
                           Altrepparam (altrepparam lexbuf) }
| 'X'                    { lexbuf.Lexing.lex_curr_pos <- Lexing.lexeme_start lexbuf;
                           Xparam (xparam lexbuf) }
| 'L'                    { lexbuf.Lexing.lex_curr_pos <- Lexing.lexeme_start lexbuf;
                           Langparam (languageparam lexbuf) }

Is it buggy ? Bad style ? Is there a nicer way to do it ?

A second question is about integration this code with other lexing code
or streams. An iCalendar file cannot have a line that is longer than 75
bytes, excluding line break. A line may be broken anywhere as long as
there is a space at the beginning of the next line, as in:

this is a very lo
 ng line

represents "this is a very long line". As this break may occur anywhere
(even inside keywords), I assume when writing the lexer these kind of
lines have been already merged together. I know how to implement the
merging using a temp file, but I'm looking for a nicer solution (like
using a stream, or using one lexer to feed the current lexer). Any
suggestion ?

I'd also like to have some advice on doing the symmetric transformation
(breaking long lines when necessary) ... Would streams be a good
solution here ?

Thanks a lot,

Alan Schmitt

-- 
The hacker: someone who figured things out and made something cool happen.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners