From: Siegfried Gonzi <siegfried.gonzi@kfunigraz.ac.at>
To: caml-list@inria.fr
Subject: [Caml-list] Some clarifications to the language shootout page
Date: Mon, 25 Nov 2002 10:01:01 +0100 [thread overview]
Message-ID: <3DE1E6CD.2020906@kfunigraz.ac.at> (raw)
Introduction
------------
The language shootout page has some nice examples where Bigloo and OCaml
stacked up well when compared to C. But I think this is only valid for
the micro benchmarks. Take the matrix-matrix multiplication (MM) for
example. Though, rarely ever will you be in dire need for a
matrix-matrix multiplication, because there exists better methods for
solving problems in linear algebra. However, the matrix-matrix
multiplication is useful to check some facts.
Method
------
I downloaded the codes for the MM from the language shootout page and
adapted the examples as follows: The overall code scaffolding has been
left as it was with the original version on the language shootout page:
a) The dimension is increased to 512x512
b) The run is just performed for one MM
c) The arrays are double precision
Results and Discussion
-----------------------
Though, the published MM codes at the language shootout page do not use
any sophisticated constructs like loop unrolling or the like, they can
nevertheless be used because they are consistent within their specific
programming language namely C, C++, Scheme (Bigloo), OCaml, Python and
Fortran 90/95. However, I have assumed that the posted codes there are
posted by masters of their specific language (not every code has been
written by the owner of the language shootout page).
The code can be found at the end of this post. It is easy to copy and
paste it for own tests: memory has been observed via "top":
compile
real/usr/sys memory
--------------------------------------------------------------
bigloo -Obench bench.scm 22s/22s/0.1s 6%
bigloo -Obench bench.scm 1m30s/1m30s/0.3s 30%
ocamlopt bench.ml 38s/38s/0.1s 3%
g++ -O6 bench.c 8s/8s/0.1s 3%
g++ -O6 bench.cpp 8.5s/8.5s/0.1s 5%
python bench.py 4m16s/4m16s/0.1s 6%
ifc -O3 bench.f90 13s/13s/0.04s 3%
[suffixes: scm...Scheme (Bigloo); ml...OCaml; c...gnu C; cpp...gnu C++;
py...Python; f90...Fortran 90/95]
[Machine: laptop Celeron 1000MHz, RAM 256MB, SuSE Linux 8.0, gcc, bigloo
2.5a, ocaml 3.04, Python 2.2, Intel Fortran 90/95]
The second Bigloo version uses ordinary "+" and "-" operators. The first
version uses Bigloo's operators "+fl", "+fx",... It is incomprehensible
why the second ordinary Bigloo version uses that much on memory! For a
dimension of 1024x1024 the overall timing picture is the same, except
that the normal Bigloo version consumes 80% of the memory as compared to
the C version which consumes 20%.
Especially important is the fact that the Python version stacks up very
well; you may not forget Python is just interpreted and not compiled.
The elegant C++ version is of interest too, because it relies on the
vector container and it is possible to completely avoid the awful "*".
Some note to the Fortran version: Fortran starts at 1 (C starts at 0)
when indexing arrays; and the first index runs faster (in C the second
index runs faster). Reading from left to right is more natural (index
wise), so using C arrays is way more pleasant. Bigloo uses C's array
indexing scheme. I hate Fortran for this array indexing nightmare and
hope I didn't mix things up because I was a little bit surprised that C
is faster than Fortran 90. The Fortran 90/95 compiler stems from Intel
and is the one and only free Fortran 90/95 compiler available for Linux (go to the
Intel site and download the 50MB if you need a Fortran 90/95 compiler).
Nowadays times are over were Fortran is that much faster than C/C++ on
ordinary home PCs. Computing in Science and Engineering had once a nice
article (Theurich, G., et al., 2001: "Making the Fortran-To-C
Transition: How Painful is it Really?", January/February 2001, p.21)
about translating a 20,000 lines of code program (in the field of
chemistry) from Fortran 77 to C. Everybody would have excpected that the
C version runs slower than the original Fortran version, but it turned
out that the C version runs even slightly faster than the Fortran version.
Before you scream:
a) I am sure there exists an ocaml option for further optimizations;
"ocamlopt --help" did not unveil it.
b) The Bigloo (Scheme) version could surely be improved but how? /I mean
not the code scheme itslef, because this should remain consistent as
compared to the other code versions./ Fore example transposing one array
is not allowed, because this would not only reduce Bigloo's execution
time but also the time of the C version would be reduced by a factor
of 2. Surely, there exists the caveat: But mum told me, we should
not use the same algorithms for every programming language. But mum
didn't tell that often the reality does not let you do so. And in
addition mum did not tell you that you are not a Forth programmer.
c) I post it here for soliciting comments. I could have gone to bed to
construe my own explanations and theories. But this would likely result
in a desaster, because most of the time it isn't how I would like that
it is.
The original language shootout page results on the MM are misleading:
a) Pure integers are hardly ever used in reality
b) Problems often do not fit into the cache
c) A comparsion of the FFT would be more interesting
Conclusion
----------
Nevertheless Bigloo remains still a very decent compiler in my opinion,
and I do not regret that I have made the transition from Python (which
is one of the most confusing programming languages out there) to Scheme
speak Bigloo.
People should be more wary that consulting the language shootout page
can result in misleading conclusions; as you know: benchmarking is black
art.
S. Gonzi
APPENDIX:
===============================
Normal Bigloo version
original v.: (C) M. Serrano
usage: bigloo -Obench bench.scm
time ./a.out
===============================
(module bench-matrix)
(define (mkmatrix rows cols)
(let ((mx (make-vector rows 0.0))
(count 1.0))
(do ((i 0 (+ i 1)))
((= i rows))
(let ((row (make-vector cols 0.0)))
(do ((j 0 (+ j 1)))
((= j cols))
(vector-set! row j count)
(set! count (+ count 1.0)))
(vector-set! mx i row)))
mx))
(define (num-cols mx)
(let ((row (vector-ref mx 0)))
(vector-length row)))
(define (num-rows mx)
(vector-length mx))
(define (mmult rows cols m1 m2)
(let ((m3 (make-vector rows 0.0)))
(do ((i 0 (+ 1 i)))
((= i rows))
(let ((m1i (vector-ref m1 i))
(row (make-vector cols 0.0)))
(do ((j 0 (+ 1 j)))
((= j cols))
(let ((val 0.0))
(do ((k 0 (+ 1 k)))
((= k cols))
(set! val (+ val (* (vector-ref m1i k)
(vector-ref (vector-ref m2 k) j)))))
(vector-set! row j val)))
(vector-set! m3 i row)))
m3))
(define (do-main size)
(let ((mm 0.0)
(m1 (mkmatrix size size))
(m2 (mkmatrix size size)))
(set! mm (mmult size size m1 m2))
(let ((r0 (vector-ref mm 0))
(r2 (vector-ref mm 2))
(r3 (vector-ref mm 3))
(r4 (vector-ref mm 4)))
(print (vector-ref r0 0) " " (vector-ref r2 3) " "
(vector-ref r3 2) " " (vector-ref r4 4)))))
(do-main 512)
====
===============================
Bigloo version based on +fl,...
original v.: (C) M. Serrano
usage: bigloo -Obench bench.scm
time ./a.out
===============================
(module bench-matrix
(option (set! *genericity* #f)))
(define (mkmatrix rows cols)
(let ((mx (make-vector rows 0.0))
(count 1.0))
(do ((i 0 (+fx i 1)))
((=fx i rows))
(let ((row (make-vector cols 0.0)))
(do ((j 0 (+fx j 1)))
((=fx j cols))
(vector-set! row j count)
(set! count (+fl count 1.0)))
(vector-set! mx i row)))
mx))
(define (num-cols mx)
(let ((row (vector-ref mx 0)))
(vector-length row)))
(define (num-rows mx)
(vector-length mx))
(define (mmult rows cols m1 m2)
(let ((m3 (make-vector rows 0.0)))
(do ((i 0 (+fx 1 i)))
((=fx i rows))
(let ((m1i (vector-ref m1 i))
(row (make-vector cols 0.0)))
(do ((j 0 (+fx 1 j)))
((=fx j cols))
(let ((val 0.0))
(do ((k 0 (+fx 1 k)))
((=fx k cols))
(set! val (+fl val (*fl (vector-ref m1i k)
(vector-ref (vector-ref m2 k) j)))))
(vector-set! row j val)))
(vector-set! m3 i row)))
m3))
(define (do-main size)
(let ((mm 0.0)
(m1 (mkmatrix size size))
(m2 (mkmatrix size size)))
(set! mm (mmult size size m1 m2))
(let ((r0 (vector-ref mm 0))
(r2 (vector-ref mm 2))
(r3 (vector-ref mm 3))
(r4 (vector-ref mm 4)))
(print (vector-ref r0 0) " " (vector-ref r2 3) " "
(vector-ref r3 2) " " (vector-ref r4 4)))))
(do-main 512)
====
=========================
Ocaml version
original v.: (C) M. Mottl
usage: ocamlopt bench.ml
time ./a.out
=========================
let mkmatrix rows cols =
let count = ref 1 and last_col = cols - 1
and m = Array.make_matrix rows cols 0.0 in
for i = 0 to rows - 1 do
let mi = m.(i) in
for j = 0 to last_col do
mi.(j) <- float_of_int(!count);
incr count done;
done;
m
let rec inner_loop k v m1i m2 j =
if k < 0 then v
else inner_loop (k - 1) (v +. m1i.(k) *. m2.(k).(j)) m1i m2 j
let mmult rows cols m1 m2 m3 =
let last_col = cols - 1 and last_row = rows - 1 in
for i = 0 to last_row do
let m1i = m1.(i) and m3i = m3.(i) in
for j = 0 to last_col do m3i.(j) <- inner_loop last_row 0.0 m1i m2 j
done;
done
let size = 512
let m1 = mkmatrix size size and m2 = mkmatrix size size
let m3 = Array.make_matrix size size 0.0;;
mmult size size m1 m2 m3;
Printf.printf "%f %f %f %f\n" m3.(0).(0) m3.(2).(3) m3.(3).(2) m3.(4).(4)
====
=======================
C version
usage: g++ -O6 bench.c
time ./a.out
=======================
#include <iostream>
#include <stdlib.h>
using namespace std;
double **mkmatrix(int rows, int cols) {
int i, j, count = 1;
double **m = (double **) malloc(rows * sizeof(double *));
for (i=0; i<rows; i++) {
m[i] = (double *) malloc(cols * sizeof(double));
for (j=0; j<cols; j++) {
m[i][j] = count++;
}
}
return(m);
}
void zeromatrix(int rows, int cols, double **m) {
int i, j;
for (i=0; i<rows; i++)
for (j=0; j<cols; j++)
m[i][j] = 0;
}
void freematrix(int rows, double **m) {
while (--rows > -1) { free(m[rows]); }
free(m);
}
double **mmult(int rows, int cols, double **m1, double **m2, double **m3) {
int i, j, k;
double val;
for (i=0; i<rows; i++) {
for (j=0; j<cols; j++) {
val = double(0.0);
for (k=0; k<cols; k++) {
val += m1[i][k] * m2[k][j];
}
m3[i][j] = val;
}
}
return(m3);
}
int main()
{
int i, SIZE=512;
double **m1 = mkmatrix(SIZE, SIZE);
double **m2 = mkmatrix(SIZE, SIZE);
double **mm = mkmatrix(SIZE, SIZE);
mm = mmult(SIZE, SIZE, m1, m2, mm);
cout << mm[0][0] << " " << mm[2][3] << " " << mm[3][2] << " " <<
mm[4][4] << endl;
freematrix(SIZE, m1);
freematrix(SIZE, m2);
freematrix(SIZE, mm);
return(0);
}
====
=========================
C++ version
(C) S. Gonzi
usage: g++ -O6 bench.cpp
time ./a.out
=========================
#include <iostream>
#include <vector>
using namespace std;
vector<vector<double> > mkmatrix(int rows, int cols)
{
double counter=double(0.0);
vector<vector<double> > m;
for(int i=0; i<rows; i++)
{
vector<double> row_vec;
for(int j=0; j<cols; j++)
{
counter = counter + 1.0;
row_vec.push_back(counter);
}
m.push_back(row_vec);
}
return(m);
}
vector<vector<double> > mmult(int rows, int cols,
vector<vector<double> > m1,
vector<vector<double> > m2,
vector<vector<double> > m3)
{
double val;
for(int i=0; i<rows; i++)
{
for(int j=0; j<rows; j++)
{
val = double(0.0);
for(int k=0; k<cols; k++)
{
val += m1[i][k] * m2[k][j];
}
m3[i][j] = val;
}
}
return(m3);
}
int main()
{
int SIZE=512;
vector<vector<double> > m1 = mkmatrix(SIZE,SIZE);
vector<vector<double> > m2 = mkmatrix(SIZE,SIZE);
vector<vector<double> > mm = mkmatrix(SIZE,SIZE);
mm = mmult(SIZE,SIZE,m1,m2,mm);
cout << mm[0][0] << " " << mm[2][3] << " " << mm[3][2] << " " <<
mm[4][4] << endl;
return(0);
}
====
=============================
Python version
usage: time python bench.py
=============================
def mkmatrix(rows, cols):
count = 1.0
mx = [ None ] * rows
for i in xrange(rows):
mx[i] = [0] * cols
for j in xrange(cols):
mx[i][j] = count
count += 1.0
return mx
def mmult(rows, cols, m1, m2):
m3 = [ None ] * rows
for i in xrange( rows ):
m3[i] = [0] * cols
for j in xrange( cols ):
val = 0.0
for k in xrange( cols ):
val += m1[i][k] * m2[k][j]
m3[i][j] = val
return m3
size = 512
def main():
m1 = mkmatrix(size, size)
m2 = mkmatrix(size, size)
mm = mmult(size, size, m1, m2)
print mm[0][0], mm[2][3], mm[3][2], mm[4][4]
main()
====
=============================
Intel Fortran 90/95 version
(C) Siegfried Gonzi
usage: ifc -O3 bench.f90
timer ./a.out
=============================
program main
integer:: size=512
double precision, POINTER,dimension(:,:):: m1, m2, mm
call mkmatrix(size,size,m1)
call mkmatrix(size,size,m2)
call mkmatrix(size,size,mm)
call mmult(size,size,m1,m2,mm)
print *, mm(1,1)
deallocate(m1)
deallocate(m2)
deallocate(mm)
contains
subroutine mkmatrix(rows, cols,m)
integer, intent(in):: rows, cols
double precision, POINTER, dimension(:,:):: m
double precision:: counter
integer:: i,j
allocate(m(rows,cols))
counter = 1.0
do i=1,cols
do j=1,rows
counter = counter + 1.0
m(j,i) = counter
end do
end do
end subroutine mkmatrix
subroutine mmult(rows, cols, m1, m2, m3)
integer, intent(in):: rows, cols
double precision, dimension(rows,cols), intent(in):: m1, m2
double precision, dimension(rows,cols), intent(in out):: m3
double precision:: val
integer:: i,j,k
do i=1,rows
do j=1,rows
val = 0.0
do k=1,cols
val = val + m1(k,i)*m2(j,k)
end do
m3(j,i) = val
end do
end do
end subroutine mmult
end program main
====
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
next reply other threads:[~2002-11-25 8:10 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2002-11-25 9:01 Siegfried Gonzi [this message]
2002-11-25 10:23 ` Pierre Weis
2002-11-25 13:03 ` Siegfried Gonzi
2002-11-25 11:09 ` Siegfried Gonzi
2002-11-26 9:43 ` Siegfried Gonzi
2002-11-26 13:22 ` Pierre Weis
2002-11-27 20:50 isaac gouy
2002-11-29 8:32 ` Siegfried Gonzi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3DE1E6CD.2020906@kfunigraz.ac.at \
--to=siegfried.gonzi@kfunigraz.ac.at \
--cc=caml-list@inria.fr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox