From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: Pierre Weis <pierre.weis@inria.fr>
Cc: vincent@leleu.info, caml-list@inria.fr
Subject: Re: [Caml-list] Query: email parser in ocamllex/ocamlyacc
Date: Wed, 23 Oct 2002 00:31:34 +0200
Message-ID: <20021022223134.GA6028@ice.gerd-stolpmann.de>
In-Reply-To: <200210222116.XAA05792@pauillac.inria.fr>; from pierre.weis@inria.fr on Tue, Oct 22, 2002 at 23:16:19 +0200
On 2002.10.22 23:16, Pierre Weis wrote:
> > French version at the end
> > ------------------------------
> >
> > Hello,
> >
> > I'm writing an ocamllex/ocamlyacc based application that extracts a <string
> > list> of emails embedded in a text/html file.
> > Would any of you know of an available implementation I could draw
> > inspiration from (and save some time)?
>
> Really precise parsing of email messages requires implementing
> RFC822 (more precisely RFC2822 nowadays), which is not a trivial
> task. I started to do it but gave up due to the absence of a scanf
> facility. I launched a thread to implement scanf, and 5 years later I
> understood how to do it in the Caml system!
>
> Now that we have scanf, I could go on to implement RFC(2)822.
>
> But don't hold your breath: if you don't need a full parser for mail
> messages, the simplest way is to write an (incorrect but trivial)
> approximation with a lexer...
>
> There may be such a program in Xavier's spamoracle?
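For illustration, a crude lexer-based approximation of the kind suggested above might look like the following ocamllex sketch. It is not taken from spamoracle or ocamlnet, and it is deliberately not RFC 2822 compliant: it only recognizes simple "local@domain.tld" shapes. The names atext, label, emails and extract_emails are hypothetical, chosen for this example only.

```ocaml
(* emails.mll: a deliberately crude approximation, NOT an RFC 2822 parser.
   All names here (atext, label, emails, extract_emails) are hypothetical. *)

(* Characters accepted in the local part; quoted strings and comments
   allowed by the RFC are ignored on purpose. *)
let atext = ['a'-'z' 'A'-'Z' '0'-'9' '.' '_' '%' '+' '-']

(* One dot-separated component of the domain. *)
let label = ['a'-'z' 'A'-'Z' '0'-'9' '-']+

(* Scan the whole input, accumulating matches in reverse order. *)
rule emails acc = parse
  | atext+ '@' label ('.' label)+
      { emails (Lexing.lexeme lexbuf :: acc) lexbuf }
  | eof { List.rev acc }
  | _   { emails acc lexbuf }   (* skip any other character *)

{
  (* Convenience wrapper: all address-like substrings of a string. *)
  let extract_emails s = emails [] (Lexing.from_string s)
}
```

Assuming the file is saved as emails.mll, running "ocamllex emails.mll" should generate emails.ml, after which Emails.extract_emails can be applied to the text of an HTML page. Because the catch-all rule skips one character at a time, markup around an address is simply ignored; the price is that anything merely shaped like an address is reported too.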
Well, O'Caml programming is so much fun that everybody wants to
reinvent the wheel. I really understand that; I'm tempted
every day myself.
My wheel came into the world in the spring of 2000, and has grown
a lot since then. It is now called "ocamlnet" after the merger
with Patrick Doane's wheel, and includes not only a parser for RFC(2)822
messages, but also support for the MIME RFCs (2045-47) and RFC 2231,
parsing of dates, the ability to parse input from pipelines chunk by
chunk, and last but not least even printers for these (partly
brain-dead) formats. You will also find an HTML parser and a lot of
other useful stuff. By now it is more of a mobile construction set than
a wheel.
By the way: if anybody has something to contribute, any addition
that is useful, works, and will be maintained is still accepted.
You can find it here:
http://sourceforge.net/projects/ocamlnet
Gerd
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de
------------------------------------------------------------
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners