From: Yitzhak Mandelbaum <yitzhakm@CS.Princeton.EDU>
To: Mike Lin <nilekim@gmail.com>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Parallelized parsing
Date: Mon, 20 Apr 2009 20:52:40 -0400
Message-ID: <87C7CCB2-26AB-4F87-BF38-5924FA68077C@cs.princeton.edu>
In-Reply-To: <2a1a1a0c0904201435y12e36603t60fb40fd1a7d8260@mail.gmail.com>
Unfortunately, most forms of parsing are not terribly amenable to
efficient parallelization because of the irregular nature of the
subcomponents of the parsing problem. That is, you can't easily break
up the problem into subcomponents that can be farmed out to different
CPUs. That said, if you've just got CPUs lying around unused anyhow,
and wasting resources isn't that important, then there is plenty of
work on this topic which might be applicable. Some good places to
start are here:
A. Nijholt. Parallel approaches to context-free language parsing.
Chapter 2 in: Parallel Natural Language Processing, U. Hahn and G.
Adriaens (eds.), Ablex Publishing Corporation, Norwood, New Jersey,
1994, 135-167 (ISBN 0.89391.869.5).
and here:
Survey of Parallel Context-Free Parsing Techniques, M. P. van
Lohuizen, 1997.
In case you're really interested in the topic, here's a more complete
list of references with assorted notes (my own). References below are
taken from the above survey.
* **Parallel Natural Language Processing**. A book on parallel
parsing algorithms. By Geert Adriaens, Udo Hahn. [Available in Google
Books](http://books.google.com/books?id=G9-67_mQPnkC). Some relevant
chapters follow:
* YO94: nonterminal-per-processor.
Akinori Yonezawa and Ichiro Ohsawa. Object-oriented parallel parsing
for context-free grammars. In Adriaens and Hahn.
* Fan94: "Connectionist" parsing. Good for massively parallel
machines. The survey authors comment that it is not promising for CF parsing.
Mark Fanty. Context-free parsing in connectionist networks. In
Adriaens and Hahn.
* Sik93b: A Cross-breeding of Tomita and Earley.
Klaas Sikkel. **Parsing Schemata**. PhD thesis, Dept of Computer Science,
University of Twente, Enschede, The Netherlands.
* There's a book-chapter preprint on parsing schemata which is
related to the above. There's also a book of the same name, which is a
revised version of the thesis.
* GC88: MIMD shared-memory Earley.
Ralph Grishman, Mahesh Chitrao. Evaluation of a parallel chart parser
([citeseer](http://citeseer.comp.nus.edu.sg/579833.html)).
* dV93b: measurements on parallel implementations of CYK, Earley and DD.
J.P.M. de Vreught. A practical comparison between parallel tabular
recognizers.
* CF84, Sij86, Tan83: VLSI Earley.
* More VLSI Earley: **A Parallel Parsing VLSI Architecture for
Arbitrary Context Free Grammars**
Andreas Koulouris, Nectarios Koziris, Theodore Andronikos,
George Papakonstantinou, Panayotis Tsanakas. 1998.
* HdV91, IPS91: load balancing approaches to chart parsing.
J. Hoogerbrugge and J.P.M. de Vreught. **Parallel recognizing in
practice**.
Ibarra, Pong, and Sohn, **Parallel recognition and parsing on the
hyper-cube**. IEEE Transactions on Computers, 40(6):764-770, 1991.
* The proposal for Berkeley's new parallel computing center
(sponsored by Intel & MS) mentions the need for parallel parsing of
web pages. I don't know what, if any, progress they've made in that
direction.
* More parallel Earley. Includes a description of the algorithm
(basically, bottom-up Earley) together with proofs relating to running
time and communication time. Also reports results from running an
implementation on a parallel-machine simulator. **A parallel parsing
algorithm for arbitrary context-free grammars**. Dong-Yul Ra and
Jong-Hyun Kim.
* A Static Load-Balancing Scheme for Parallel XML Parsing on
Multicore CPUs. Yinfei Pan, Wei Lu, Ying Zhang, Kenneth Chiu. A paper
on parallel XML parsing. I've seen a few of these. This one is
representative.
* A paper on linear algebra on GPUs
(http://www.cs.utexas.edu/users/flame/pubs/sc08.pdf). Possibly
relevant because chart parsing can be implemented as a
form of matrix multiply.
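
To make the matrix-multiply remark concrete, here is a minimal sketch of a CYK-style recognizer in OCaml. It is only an illustration under stated assumptions: the toy grammar (nonempty balanced parentheses in Chomsky normal form), the bitmask encoding of chart cells, and the names `mul`, `terminal` and `recognize` are all made up for this example and are not taken from any of the papers above. The point is just that filling cell (i, j) has the same shape as computing one entry of a matrix product, with `lor` playing the role of addition and `mul` the role of multiplication.

```ocaml
(* Hypothetical toy grammar in Chomsky normal form (nonempty balanced parens):
     S -> L R | L X | S S
     X -> S R
     L -> '('      R -> ')'
   Each nonterminal gets one bit, so a chart cell is an int bitmask. *)
let s = 1 and x = 2 and l = 4 and r = 8

(* Binary rules as (head, left child, right child). *)
let rules = [ (s, l, r); (s, l, x); (s, s, s); (x, s, r) ]

(* "Multiply" two chart cells: collect every head A of a rule A -> B C
   such that B is in the left cell and C is in the right cell. *)
let mul a b =
  List.fold_left
    (fun acc (h, left, right) ->
      if a land left <> 0 && b land right <> 0 then acc lor h else acc)
    0 rules

(* Unary (terminal) rules. *)
let terminal = function '(' -> l | ')' -> r | _ -> 0

(* t.(i).(j) holds the nonterminals that derive input[i .. j-1]. *)
let recognize input =
  let n = String.length input in
  let t = Array.make_matrix (n + 1) (n + 1) 0 in
  for i = 0 to n - 1 do
    t.(i).(i + 1) <- terminal input.[i]
  done;
  for len = 2 to n do
    for i = 0 to n - len do
      let j = i + len in
      (* Same shape as one entry of a matrix product:
         t[i][j] = "sum" over k of t[i][k] "times" t[k][j]. *)
      for k = i + 1 to j - 1 do
        t.(i).(j) <- t.(i).(j) lor mul t.(i).(k) t.(k).(j)
      done
    done
  done;
  n > 0 && t.(0).(n) land s <> 0
```

For example, `recognize "(())()"` accepts and `recognize ")("` rejects. Note that all cells with the same span length `len` depend only on cells for shorter spans, so in principle the `i` loop for a given `len` could be split across processors; the sketch above is purely sequential.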
Yitzhak
On Apr 20, 2009, at 5:35 PM, Mike Lin wrote:
> There is certainly a reasonable body of basic CS research on
> parallelizing CFG algorithms such as CYK, the Earley parser, and to a
> lesser extent the more practical LALR strategy used by yacc etc. (In
> the latter case it seems to get easier if you're willing to trade off
> determinism when parsing ambiguous grammars.)
>
> I know some people who use some of this stuff in very specific
> contexts (RNA folding), but I haven't seen any practical
> general-purpose tools like a parallel yacc...
>
> Overall, I don't actually know much more than you could figure out
> from Google Scholar in an hour but hopefully these were some useful
> search terms.
>
> On Mon, Apr 20, 2009 at 5:15 PM, Jon Harrop <jon@ffconsultancy.com> wrote:
>>
>> I'm desperately trying to prepare for the imminent drop of a rock-solid
>> multicore-friendly OCaml implementation and was wondering what work has been
>> done on parallelized parsers and/or parallel-friendly grammars?
>>
>> For example, Mathematica syntax for nested lists of integers looks like:
>>
>> {{{1, 2}}, {{3, 4}, {4, 5}}, ..}
>>
>> and there are obvious divide-and-conquer approaches to lexing and parsing that
>> grammar. You can recursively subdivide the string (e.g. memory mapped from a
>> file) to build a tree of where the tokens { , and } appear by index and then
>> recursively convert the tree into an AST.
>>
>> What other grammars can be lexed and/or parsed efficiently in parallel?
>>
>> --
>> Dr Jon Harrop, Flying Frog Consultancy Ltd.
>> http://www.ffconsultancy.com/?e
>>
>> _______________________________________________
>> Caml-list mailing list. Subscription management:
>> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
>> Archives: http://caml.inria.fr
>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>> Bug reports: http://caml.inria.fr/bin/caml-bugs
>>
-----------------------------
Yitzhak Mandelbaum