Re: [Caml-list] updating position in an ocamllex lexer


From: Leo White <leo@lpw25.net>
To: caml-list@inria.fr
Subject: Re: [Caml-list] updating position in an ocamllex lexer
Date: Thu, 30 Jul 2015 04:21:06 -0400
Message-ID: <1438244466.807779.337036689.55800F88@webmail.messagingengine.com>
In-Reply-To: <20150728155638.GA27761@pl-59055.rocqadm.inria.fr>

I think the problem with your attempt to manually update the current
position is that you are adjusting pos_cnum, which is the offset in
characters from the beginning of the whole file, not from the beginning
of the line. The column is derived as pos_cnum - pos_bol, so you should
be adjusting pos_bol instead. For example, if you look at the source for
Lexing.new_line:

  let new_line lexbuf =
    let lcp = lexbuf.lex_curr_p in
    lexbuf.lex_curr_p <- { lcp with
      pos_lnum = lcp.pos_lnum + 1;
      pos_bol = lcp.pos_cnum;
    }

you can see that it adds one to the line number and sets pos_bol
to be the current position.
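
For a lexeme such as "\n\t", where blanks follow the newline, calling
new_line alone would put pos_bol at the end of the lexeme rather than
just after the '\n'. A minimal sketch of one way to handle that is
below. The names newline_then_blanks, column and demo are hypothetical
helpers, not part of Lexing, and the sketch assumes the '\n' is always
the first character of the lexeme, as in your rule. The demo sets
lex_start_p and lex_curr_p by hand only to imitate what a generated
lexer would have done after matching.

```ocaml
(* Hypothetical helper: call it from the action of a rule whose lexeme
   starts with '\n' and may continue with blanks on the next line. *)
let newline_then_blanks (lexbuf : Lexing.lexbuf) : unit =
  let open Lexing in
  let lcp = lexbuf.lex_curr_p in
  lexbuf.lex_curr_p <- { lcp with
    pos_lnum = lcp.pos_lnum + 1;
    (* The new line begins one character past the start of the lexeme,
       i.e. just after the '\n', not at the end of the lexeme. *)
    pos_bol = lexbuf.lex_start_p.pos_cnum + 1;
  }

(* Column of a position, counted in characters from the line start. *)
let column (p : Lexing.position) : int =
  p.Lexing.pos_cnum - p.Lexing.pos_bol

(* Imitate the lexer state right after matching "\n\t" at offsets 3..5
   of "abc\n\tdef", then apply the helper and return the position. *)
let demo () : Lexing.position =
  let open Lexing in
  let lexbuf = from_string "abc\n\tdef" in
  let p0 = lexbuf.lex_curr_p in
  lexbuf.lex_start_p <- { p0 with pos_cnum = 3 };
  lexbuf.lex_curr_p <- { p0 with pos_cnum = 5 };
  newline_then_blanks lexbuf;
  lexbuf.lex_curr_p
```

With this, pos_cnum is left untouched, since the lexer already advances
it, and the next token's column counts the blanks that follow the
newline.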

Regards,

Leo

On Tue, 28 Jul 2015, at 11:56 AM, Sébastien Hinderer wrote:
> Dear all,
> 
> I have to maintain a lexer which did not, so far, compute token
> positions during the lexing step. The positions were computed
> later, which was not very efficient. So I am now trying to compute
> positions during lexing and am having trouble figuring out what
> to do for one token:
> 
>   | ['\n'] [' ' '\t' '\r' '\011' '\012' ]* { ... }
> 
> When this token is met and contains just "\n", calling new_line on the
> lexbuf is enough to keep further position information accurate.
> 
> However, if the recognized token is, say, "\n\t", then it seems that the
> column for further tokens will be incorrect.
> 
> I assumed that I had to manually update the current position and added
> code like so:
> 
>   let s = Lexing.lexeme lexbuf in
>   let l = String.length s in
>   let t = TCommentNewline (tokinfo lexbuf) in
>   (* Adjust the column manually *)
>   Lexing.new_line lexbuf;
>   let lcp = lexbuf.lex_curr_p in
>   lexbuf.lex_curr_p <- { lcp with
>     pos_cnum = lcp.pos_bol + l - 1;
>   };
>   t
> 
> But that does not seem to work.
> 
> Does somebody know how such tokens should be handled, please?
> 
> Many thanks in advance for any help,
> 
> Sébastien.

