* [Caml-list] Query: email parser in ocamllex/ocamlyacc
From: Vincent Leleu @ 2002-10-22 14:27 UTC
To: caml-list

Hello,

I'm writing an ocamllex/ocamlyacc-based application that extracts a <string
list> of the email addresses embedded in a text or HTML file.
Would any of you know of an available implementation I could draw
inspiration from (and save some time!)?

Thanks a lot,
Vincent Leleu

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
* Re: [Caml-list] Query: email parser in ocamllex/ocamlyacc
From: Pierre Weis @ 2002-10-22 21:16 UTC
To: vincent
Cc: caml-list

> I'm writing an ocamllex/ocamlyacc-based application that extracts a <string
> list> of the email addresses embedded in a text or HTML file.
> Would any of you know of an available implementation I could draw
> inspiration from (and save some time!)?

Really precise parsing of email messages requires implementing
RFC 822 (more precisely RFC 2822 nowadays), which is not a trivial
task. I started to do it but gave up due to the absence of a scanf
facility. So I started a side task, implementing scanf, and 5 years
later I finally understood how to do it in the Caml system!

Now that we have scanf, I could get back to implementing RFC(2)822.

But don't hold your breath: if you don't need a full parser for mail
messages, the simpler way is to write a (false but trivial)
approximation with a lexer...

Perhaps there is such a program in Xavier's spamoracle filter?

Best regards,

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/
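Pierre's "false but trivial" approximation does not even need ocamlyacc. As a minimal sketch (not taken from spamoracle or ocamlnet), here is a hand-written scanner in plain OCaml using only the standard library. It assumes an address is a run of `[A-Za-z0-9._%+-]` characters, an `@`, and a domain made of at least two non-empty dot-separated labels; quoted local parts, comments, and the other RFC 2822 features are deliberately ignored. The names `extract_emails`, `is_local_char`, `is_domain_char` and `valid_domain` are hypothetical.

```ocaml
(* "False but trivial" email-address extraction (assumption-based sketch).
   Assumed shape:  local "@" label ("." label)+
   local chars: [A-Za-z0-9._%+-], domain chars: [A-Za-z0-9-.]
   This is NOT RFC 2822: quoted strings, comments, etc. are ignored. *)

let is_local_char c =
  match c with
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9'
  | '.' | '_' | '%' | '+' | '-' -> true
  | _ -> false

let is_domain_char c =
  match c with
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '-' | '.' -> true
  | _ -> false

(* A domain needs at least two labels and no empty label
   (so "localhost", ".com" and "a..b" are rejected). *)
let valid_domain d =
  let labels = String.split_on_char '.' d in
  List.length labels >= 2 && List.for_all (fun l -> l <> "") labels

(* Scan the text for every '@' and grow a candidate address around it. *)
let extract_emails (text : string) : string list =
  let n = String.length text in
  let found = ref [] in
  for i = 0 to n - 1 do
    if text.[i] = '@' then begin
      (* extend leftwards over local-part characters *)
      let start = ref i in
      while !start > 0 && is_local_char text.[!start - 1] do decr start done;
      (* extend rightwards over domain characters *)
      let stop = ref (i + 1) in
      while !stop < n && is_domain_char text.[!stop] do incr stop done;
      (* drop trailing dots, e.g. a sentence-final "." after the address *)
      while !stop > i + 1 && text.[!stop - 1] = '.' do decr stop done;
      let local = String.sub text !start (i - !start) in
      let domain = String.sub text (i + 1) (!stop - i - 1) in
      if local <> "" && valid_domain domain then
        found := (local ^ "@" ^ domain) :: !found
    end
  done;
  (* addresses were accumulated in reverse order *)
  List.rev !found
```

For example, `extract_emails "write to Pierre.Weis@inria.fr."` returns `["Pierre.Weis@inria.fr"]`, with the sentence-final dot dropped, and an HTML `mailto:` link yields just the address because `:` and `"` stop the scan. In an ocamllex version, the two character predicates would roughly become character sets in a single rule around `'@'`; the hand-written loop simply avoids the extra build step.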
* Re: [Caml-list] Query: email parser in ocamllex/ocamlyacc
From: Gerd Stolpmann @ 2002-10-22 22:31 UTC
To: Pierre Weis
Cc: vincent, caml-list

On 2002.10.22 23:16, Pierre Weis wrote:
> > I'm writing an ocamllex/ocamlyacc-based application that extracts a <string
> > list> of the email addresses embedded in a text or HTML file.
> > Would any of you know of an available implementation I could draw
> > inspiration from (and save some time!)?
>
> Really precise parsing of email messages requires implementing
> RFC 822 (more precisely RFC 2822 nowadays), which is not a trivial
> task. I started to do it but gave up due to the absence of a scanf
> facility. So I started a side task, implementing scanf, and 5 years
> later I finally understood how to do it in the Caml system!
>
> Now that we have scanf, I could get back to implementing RFC(2)822.
>
> But don't hold your breath: if you don't need a full parser for mail
> messages, the simpler way is to write a (false but trivial)
> approximation with a lexer...
>
> Perhaps there is such a program in Xavier's spamoracle filter?

Well, O'caml programming is so much fun that everybody wants to
reinvent the wheel. I really understand that; I'm tempted every day
too.

My wheel came into the world in the spring of 2000, and has grown
a lot since then. It is now called "ocamlnet" after the merger
with Patrick Doane's wheel, and it includes not only a parser for
RFC(2)822 messages but also supports the MIME RFCs (2045-47),
RFC 2231, parsing of dates, the ability to parse from pipelines
chunk by chunk, and last but not least even printers for these
(partly brain-dead) formats. You will also find an HTML parser, and
a lot of other useful stuff. It is now more a mobile construction
set than a wheel.

By the way: if anybody has something to contribute, any addition
that is useful, works, and will be maintained is still welcome.

You can find it here: http://sourceforge.net/projects/ocamlnet

Gerd
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
------------------------------------------------------------
* Re: [Caml-list] Query: email parser in ocamllex/ocamlyacc
From: Stefano Zacchiroli @ 2002-10-22 22:43 UTC
To: caml-list

On Wed, Oct 23, 2002 at 12:31:34AM +0200, Gerd Stolpmann wrote:
> My wheel came into the world in the spring of 2000, and has grown
> a lot since then. It is now called "ocamlnet" after the merger

BTW, IMO ocamlnet is lacking documentation.
All the .mli files are really well commented, but there is no out-of-band
documentation like the really good PXP manual, nor examples for the
various ocamlnet modules.

Are you planning to write something like that?

TIA,
Cheers.

-- 
Stefano Zacchiroli - undergraduate student of CS @ Univ. Bologna, Italy
zack@cs.unibo.it | ICQ# 33538863 | http://www.cs.unibo.it/~zacchiro
"I know you believe you understood what you think I said, but I am not
sure you realize that what you heard is not what I meant!" -- G.Romney
* Re: [Caml-list] Query: email parser in ocamllex/ocamlyacc
From: Gerd Stolpmann @ 2002-10-22 23:29 UTC
To: Stefano Zacchiroli
Cc: caml-list

On 2002.10.23 00:43, Stefano Zacchiroli wrote:
> On Wed, Oct 23, 2002 at 12:31:34AM +0200, Gerd Stolpmann wrote:
> > My wheel came into the world in the spring of 2000, and has grown
> > a lot since then. It is now called "ocamlnet" after the merger
>
> BTW, IMO ocamlnet is lacking documentation.
> All the .mli files are really well commented, but there is no out-of-band
> documentation like the really good PXP manual, nor examples for the
> various ocamlnet modules.
>
> Are you planning to write something like that?

Yes, a manual is really needed. I currently don't have enough time
to write it. Maybe I will find time for certain special topics... I can
imagine that an introduction to netchannels with some references to
examples would already do most of the job.

Gerd
* Re: [Caml-list] Query: email parser in ocamllex/ocamlyacc
From: Stefano Zacchiroli @ 2002-10-23 7:32 UTC
To: caml-list

On Wed, Oct 23, 2002 at 01:29:02AM +0200, Gerd Stolpmann wrote:
> Yes, a manual is really needed. I currently don't have enough time
> to write it. Maybe I will find time for certain special topics... I can
> imagine that an introduction to netchannels with some references to
> examples would already do most of the job.

Yes, this would surely be good, but I'm also thinking of an introduction
to the CGI module with some examples. This could help improve OCaml's
visibility in the server-side scripting world.

Cheers.