From: "Andreas Biegert" <andreas.biegert@googlemail.com>
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Package with multidimensional AND sparse matrices?
Date: Tue, 4 Jul 2006 09:46:33 +0200 [thread overview]
Message-ID: <fd0847170607040046q6c96c8e9w2d0a494948bb40db@mail.gmail.com> (raw)
In-Reply-To: <fd0847170607040038u117d2594r38adcbdf1f00ec04@mail.gmail.com>
Well, in short: the algorithm computes the optimal multiple alignment
of up to n protein sequences with maximum length L. The
straightforward and theoretically optimal approach to solve this
problem is by dynamic programming, therefore the time and space
complexity is O(L^n). I managed to come up with a pretty good
heuristic that lets me boil down the number of cells that need to be
computed giving me a complexitiy of O(L^2) instead of O(L^n), in fact
I can even manually adjust the number of nonzero entries in the
matrix. In the 2000*2000*2000*2000 matrix only at most 2000 cells will
be non-zero. Furthermore for all non-zero cells M(i,j,k,l) holds
i<j<k<l. The bad news is that performance is important since the
program has to run under 5min. An estimation of the time requirements
given the number of sequences n and masimum lengths L is:
24 * L^2 * T_x * 2^(n+1) - 1
with T_x being about 100ns
For n=4 and L=2000 I get just about 5min. Would it be a bad idea to
use Hashtables all the way instead of matrices? Accessing elements in
a hashtable has O(1), right?
Andreas
2006/7/4, Andreas Biegert <andreas.biegert@googlemail.com>:
> Well, in short: the algorithm computes the optimal multiple alignment
> of up to n protein sequences with maximum length L. The
> straightforward and theoretically optimal approach to solve this
> problem is by dynamic programming, therefore the time and space
> complexity is O(L^n). I managed to come up with a pretty good
> heuristic that lets me boil down the number of cells that need to be
> computed giving me a complexitiy of O(L^2) instead of O(L^n), in fact
> I can even manually adjust the number of nonzero entries in the
> matrix. In the 2000*2000*2000*2000 matrix only at most 2000 cells will
> be non-zero. Furthermore for all non-zero cells M(i,j,k,l) holds
> i<j<k<l. The bad news is that performance is important since the
> program has to run under 5min. An estimation of the time requirements
> given the number of sequences n and masimum lengths L is:
>
> 24 * L^2 * T_x * 2^(n+1) - 1
> with T_x being about 100ns
>
> For n=4 and L=2000 I get just about 5min. Would it be a bad idea to
> use Hashtables all the way instead of matrices? Accessing elements in
> a hashtable has O(1), right?
>
> Andreas
>
>
>
> 2000/1/2, Jon Harrop <jon@ffconsultancy.com>:
> > On Monday 03 July 2006 23:55, Andreas Biegert wrote:
> > > The ocamlfloat package provides a sparse matrix implmentation for
> > > 2-dimensional matrices, but I would need at least 4 dimensions.
> >
> > You probably mean "tensor" rather than "matrix".
> >
> > > In case there is no such package, would it be very hard to write one by
> > > myself?
> >
> > Provided performance and memory usage are not too important, it is quite
> > simple to write your own implementation. First, you must decide what
> > requirements you have, i.e. what do you want to do with these tensors? Then,
> > you must choose a data structure to represent your tensors, e.g. a hash table
> > that maps int * int * int * int to float. Finally, you must implement the
> > functions that you need.
> >
> > You could start with something like this:
> >
> > module SparseArray : sig
> > type t
> >
> > val make : int list -> t
> > val get : t -> int list -> float
> > val set : t -> int list -> float -> unit
> > end = struct
> > type t = int list * (int list, float) Hashtbl.t
> >
> > let make n = n, Hashtbl.create 0
> >
> > let check s n i =
> > List.iter2 (fun n i -> if i<0 || i>=n then invalid_arg s) n i
> >
> > let get (n, m) i =
> > check "SparseArray.get" n i;
> > try Hashtbl.find m i with Not_found -> 0.
> >
> > let set (n, m) i x =
> > check "SparseArray.set" n i;
> > if x<>0. then Hashtbl.replace m i x else
> > try Hashtbl.remove m i with Not_found -> ()
> > end;;
> >
> > That should be fine for getting and setting but you probably want to loop over
> > dimensions and perform arithmetic operations, in which case you need a way to
> > jump to the non-zero elements.
> >
> > How many non-zero elements do you have and how long do you expect your
> > calculations to take?
> >
> > --
> > Dr Jon D Harrop, Flying Frog Consultancy Ltd.
> > Objective CAML for Scientists
> > http://www.ffconsultancy.com/products/ocaml_for_scientists
> >
> > _______________________________________________
> > Caml-list mailing list. Subscription management:
> > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> > Archives: http://caml.inria.fr
> > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> > Bug reports: http://caml.inria.fr/bin/caml-bugs
> >
>
next prev parent reply other threads:[~2006-07-04 7:46 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-07-03 22:55 Andreas Biegert
2000-01-02 4:56 ` [Caml-list] " Jon Harrop
[not found] ` <fd0847170607040038u117d2594r38adcbdf1f00ec04@mail.gmail.com>
2006-07-04 7:46 ` Andreas Biegert [this message]
2006-07-04 9:06 ` Martin Jambon
2006-07-04 10:03 ` Jon Harrop
2006-07-04 16:55 ` Andreas Biegert
2006-07-03 23:59 ` Markus Mottl
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fd0847170607040046q6c96c8e9w2d0a494948bb40db@mail.gmail.com \
--to=andreas.biegert@googlemail.com \
--cc=caml-list@yquem.inria.fr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox