From: Mike Lin <nilekim@gmail.com>
To: Erick Matsen <matsen@berkeley.edu>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] speeding up matrix multiplication (newbie question)
Date: Fri, 20 Feb 2009 17:23:15 -0500 [thread overview]
Message-ID: <2a1a1a0c0902201423o5ae7dd3dhb5c64da3c3255a69@mail.gmail.com> (raw)
In-Reply-To: <243054520902201153q4efbe1f0j686bd7212b515c03@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 5438 bytes --]
Erick, we should compare notes sometime. I have a lot of code for doing this
kind of stuff (I am working on empirical codon models with 61x61 rate
matrices). The right way to speed up matrix-vector operations is to use BLAS
via either Lacaml or ocamlgsl. But if, like me, you like to
counterproductively fiddle around with stupid things, here's a little ocaml
library I wrote that links to a C function to do vector-vector dot using
SSE2 vectorization.
http://www.broad.mit.edu/~mlin/for_others/SuperFast_dot.zip
IIRC in my tests it gave about 50% speedup vs. the obvious C code and 100%
speedup vs. the obvious ocaml code. But, I am doing 61x61 and I'm not sure
if the speedup scales down to 20x20 or especially 4x4.
Mike
On Fri, Feb 20, 2009 at 2:53 PM, Erick Matsen <matsen@berkeley.edu> wrote:
> Wow, once again I am amazed by the vitality of this list. Thank you
> for your suggestions.
>
> Here is the context: we are interested in calculating the likelihood
> of taxonomic placement of short "metagenomics" sequence fragments from
> unknown organisms in the ocean. We start by assuming a model of
> sequence evolution, which is a reversible Markov chain. The taxonomy
> is represented as a tree, and the sequence information is a collection
> of likelihoods of sequence identities. As we move up the tree, these
> sequences "evolve" by getting multiplied by the exponentiated
> instantaneous Markov matrix.
>
> The matrices are of the size of the sequence model: 4x4 when looking
> at nucleotides, and 20x20 when looking at proteins.
>
> The bottleneck is (I mis-spoke before) that we are multiplying many
> length-4 or length-20 vectors by a collection of matrices which
> represent the time evolution of those sequences as follows.
>
> Outer loop:
> modify the amount of time each markov process runs
> exponentiate the rate matrices to get transition matrices
>
> Recur over the tree, starting at the leaves:
> at a node, multiply all of the daughter likelihood vectors together
> return the multiplication of that product by the trasition matrix
> (bottleneck!)
>
> The trees are on the order of 50 leaves, and there are about 500
> likelihood vectors done at once.
>
> All of this gets run on a big cluster of Xeons. It's not worth
> parallelizing because we are running many instances of this process
> already, which fills up the cluster nodes.
>
> So far I have been doing the simplest thing possible, which is just to
> multiply the matrices out like \sum_j a_ij v_j. Um, this is a bit
> embarassing.
>
> let mul_vec m v =
> if Array.length v <> n_cols m then
> failwith "mul_vec: matrix size and vector size don't match!";
> let result = Array.create (n_rows m) N.zero in
> for i=0 to (n_rows m)-1 do
> for j=0 to (n_cols m)-1 do
> result.(i) <- N.add result.(i) (N.mul (get m i j) v.(j))
> done;
> done;
> result
>
> I have implemented it in a functorial way for flexibility. N is the
> number class. How much improvement might I hope for if I make a
> dedicated float vector multiplication function? I'm sorry, I know
> nothing about "boxing." Where can I read about that?
>
>
> Thank you again,
>
> Erick
>
>
>
> On Fri, Feb 20, 2009 at 10:46 AM, Xavier Leroy <Xavier.Leroy@inria.fr>
> wrote:
> >> I'm working on speeding up some code, and I wanted to check with
> >> someone before implementation.
> >>
> >> As you can see below, the code primarily spends its time multiplying
> >> relatively small matrices. Precision is of course important but not
> >> an incredibly crucial issue, as the most important thing is relative
> >> comparison between things which *should* be pretty different.
> >
> > You need to post your matrix multiplication code so that the regulars
> > on this list can tear it to pieces :-)
> >
> > From the profile you gave, it looks like you parameterized your matrix
> > multiplication code over the + and * operations over matrix elements.
> > This is good for genericity but not so good for performance, as it
> > will result in more boxing (heap allocation) of floating-point values.
> > The first thing you should try is write a version of matrix
> > multiplication that is specialized for type "float".
> >
> > Then, there are several ways to write the textbook matrix
> > multiplication algorithm, some of which perform less boxing than
> > others. Again, post your code and we'll let you know.
> >
> >> Currently I'm just using native (double-precision) ocaml floats and
> >> the native ocaml arrays for a first pass on the problem. Now I'm
> >> thinking about moving to using float32 bigarrays, and I'm hoping
> >> that the code will double in speed. I'd like to know: is that
> >> realistic? Any other suggestions?
> >
> > It won't double in speed: arithmetic operations will take exactly the
> > same time in single or double precision. What single-precision
> > bigarrays buy you is halving the memory footprint of your matrices.
> > That could result in better cache behavior and therefore slightly
> > better speed, but it depends very much on the sizes and number of your
> > matrices.
> >
> > - Xavier Leroy
> >
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
[-- Attachment #2: Type: text/html, Size: 6990 bytes --]
next prev parent reply other threads:[~2009-02-20 22:23 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-02-20 15:40 Erick Matsen
2009-02-20 15:45 ` [Caml-list] " RABIH.ELCHAAR
2009-02-20 17:46 ` Jon Harrop
2009-02-20 18:46 ` Xavier Leroy
2009-02-20 19:53 ` Erick Matsen
2009-02-20 21:21 ` Will M. Farr
2009-02-20 21:37 ` Martin Jambon
2009-02-20 22:23 ` Mike Lin [this message]
2009-02-20 22:30 ` Will M. Farr
2009-02-20 22:43 ` Markus Mottl
2009-02-23 22:59 ` Erick Matsen
2009-02-20 22:43 ` Mike Lin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2a1a1a0c0902201423o5ae7dd3dhb5c64da3c3255a69@mail.gmail.com \
--to=nilekim@gmail.com \
--cc=caml-list@inria.fr \
--cc=matsen@berkeley.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox