From: Jon Harrop <jon@ffconsultancy.com>
To: caml-list@yquem.inria.fr
Subject: Optimising: div and mod on AMD64
Date: Wed, 30 Nov 2005 04:29:56 +0000 [thread overview]
Message-ID: <200511300429.56862.jon@ffconsultancy.com> (raw)
I've spent a little time playing around with my 19-line Sudoku program:
http://www.ffconsultancy.com/free/sudoku
Algorithmic optimisations are clearly the way forward from my brute force
approach. However, I am curious about low-level aspects of the performance of
the existing implementation in various ways.
Note that 85-90% of the time is spent in the "invalid" function:
let rec invalid ?(i=0) x y n =
i<9 && (m.(y).[i] = n || m.(i).[x] = n ||
m.(y/3*3 + i/3).[x/3*3 + i mod 3] = n || invalid ~i:(i+1) x y n)
This is a simple recursive function. You can improve its performance a bit by
getting rid of the optional argument:
let rec invalid i x y n =
i<9 && (m.(y).[i] = n || m.(i).[x] = n ||
m.(y/3*3 + i/3).[x/3*3 + i mod 3] = n || invalid (i+1) x y n)
The C++ equivalent of this function (including bounds checking) is:
bool invalid(int i, int x, int y, int n) {
return i<9 && (m.at(y).at(i) == n || m.at(i).at(x) == n ||
m.at(y/3*3 + i/3).at(x/3*3 + i%3) == n ||
invalid(i+1, x, y, n));
}
When switching from 32-bit x86 to 64-bit AMD64, the C++ becomes slightly
faster but the OCaml becomes much slower, to the extent that the OCaml is
4.8x slower than the C++ on AMD64:
Platform x86 (900MHz) AMD64 (800MHz)
OCaml 3.716 6.209
C++ 1.351 1.284
After quite a bit of playing I found out some interesting things. Allocation
is very slow, but the "invalid" function does no allocation and incurs no GC.
Recursion can be slow. Unrolling the invalid function makes little difference
to performance. Removing bounds checking only improves performance by 6% on
both platforms.
To my surprise, it seems that the vast majority of the time in the OCaml
implementation is spent computing x/3*3, i/3 and i mod 3. This is surprising
because these operations are "normally" very fast, e.g. in the C++.
Getting rid of integer div and mod in the "invalid" function by hoisting the
computation of x/3*3 out of "invalid" and then out of the fold, and recursing
over ix=[0,3) and iy=[0,3) rather than i=[0,9) gives:
let rec invalid ix iy fx fy x y n =
if ix=3 then invalid 0 (iy+1) fx fy x y n else
iy<3 && (m.(y).[ix + 3*iy] = n || m.(ix + 3*iy).[x] = n ||
m.(fy + iy).[fx + ix] = n || invalid (ix+1) iy fx fy x y n)
Greatly improves performance, particularly on AMD64:
OCaml 1.946 1.308
Hoisting x/3*3 out of the fold (which isn't even the inner loop!) in the
"search" function almost doubles the performance of the whole program:
let fx = x/3*3 and fy = y/3*3 in
fold (fun accu n ->
let n = Char.chr (n + 48) in
if invalid 0 0 fx fy x y n then accu else
(m.(y).[x] <- n;
let accu = search (x+1) y f accu in
m.(y).[x] <- '0';
accu)) accu 1 10
From my mediocre knowledge of x86 and AMD64 assembler, it looks as though
ocamlopt is generating naive integer divisions and modulos, even when the
divisors are small, constant integers. In contrast, g++ is converting these
operations into multiplications by constants, shifts and
addition/subtraction.
I am surprised that this makes such a big difference but, assuming I'm right,
may we have optimised integer arithmetic for constant divisors please?
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists
reply other threads:[~2005-11-30 4:35 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200511300429.56862.jon@ffconsultancy.com \
--to=jon@ffconsultancy.com \
--cc=caml-list@yquem.inria.fr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox