Lahey Support
08-15-2003, 01:33 AM
The fact that MATMUL is a slow matrix multiplier was discussed in a
computing class I took a couple of years ago. I had hoped that would have
been fixed by now. The best thing you can do is write your own version of
MATMUL that has the do loops properly ordered to take advantage of memory
caching. The inner loop should run down the columns of the matrix. Since
Fortran stores arrays in column-major order, the data you need next is much
more likely to be in cache already if you run down a column of a matrix than
if you run along a row.
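A minimal sketch of the loop ordering described above, assuming a standard-conforming Fortran 95 (or later) compiler. The module and routine names (loop_order_matmul, my_matmul) are hypothetical, the routine does no shape checking, and it uses kind(1.0d0) in place of REAL(8). The j-k-i order keeps the innermost loop walking down a column of both C and A:

```fortran
module loop_order_matmul
  implicit none
  ! Double precision kind; equivalent to REAL(8) on most compilers.
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical hand-written C = A*B. Assumes the caller passes conforming
  ! shapes: size(a,2) == size(b,1), and c is size(a,1) by size(b,2).
  subroutine my_matmul(a, b, c)
    real(dp), intent(in)  :: a(:, :), b(:, :)
    real(dp), intent(out) :: c(:, :)
    integer  :: i, j, k
    real(dp) :: bkj

    c = 0.0_dp
    do j = 1, size(b, 2)          ! outer loop: columns of C and B
      do k = 1, size(a, 2)        ! middle loop: columns of A
        bkj = b(k, j)             ! scalar reused for a whole column of A
        do i = 1, size(a, 1)      ! inner loop: down column k of A / column j of C
          c(i, j) = c(i, j) + a(i, k) * bkj
        end do
      end do
    end do
  end subroutine my_matmul
end module loop_order_matmul
```

Whether this actually beats MATMUL depends on the compiler and matrix size, so it is worth timing both on your own system before switching.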
As for your example, you are comparing a very small matrix multiplication
against inline code. The inline code avoids the overhead of calling a
subprogram, which MATMUL incurs, so this is not really a fair test. You should
test MATMUL against a subprogram of your own, using larger matrices. MATMUL
was designed for use with large matrices.
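A rough sketch of the fairer test suggested above: both versions go through a subprogram call, on random n-by-n matrices. The names (matmul_benchmark, loop_matmul, compare) are hypothetical, the timing uses the standard cpu_time intrinsic, and an optimizer may still rearrange repeated work, so treat the numbers as indicative only:

```fortran
module matmul_benchmark
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical hand-written multiply, j-k-i order (inner loop down columns).
  subroutine loop_matmul(a, b, c)
    real(dp), intent(in)  :: a(:, :), b(:, :)
    real(dp), intent(out) :: c(:, :)
    integer :: i, j, k
    c = 0.0_dp
    do j = 1, size(b, 2)
      do k = 1, size(a, 2)
        do i = 1, size(a, 1)
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine loop_matmul

  ! Times `reps` products of two random n-by-n matrices, first with
  ! loop_matmul and then with the MATMUL intrinsic. Returns CPU seconds
  ! for each and the largest elementwise difference between the results.
  subroutine compare(n, reps, t_loop, t_intrinsic, max_diff)
    integer,  intent(in)  :: n, reps
    real(dp), intent(out) :: t_loop, t_intrinsic, max_diff
    real(dp), allocatable :: a(:, :), b(:, :), c1(:, :), c2(:, :)
    real    :: t0, t1
    integer :: r

    allocate(a(n, n), b(n, n), c1(n, n), c2(n, n))
    call random_number(a)
    call random_number(b)

    call cpu_time(t0)
    do r = 1, reps
      call loop_matmul(a, b, c1)
    end do
    call cpu_time(t1)
    t_loop = real(t1 - t0, dp)

    call cpu_time(t0)
    do r = 1, reps
      c2 = matmul(a, b)
    end do
    call cpu_time(t1)
    t_intrinsic = real(t1 - t0, dp)

    ! Summation order may differ, so expect tiny rounding differences.
    max_diff = maxval(abs(c1 - c2))
  end subroutine compare
end module matmul_benchmark
```

Calling compare with n = 2, 10, 100 and 500 (for example) would show where, if anywhere, MATMUL starts to pay off on a given compiler, which also bears on the question below about larger dimensions.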
Thomas (Tom) W. Laub
Sandia National Laboratories
Dept. 15341, Simulation Technology Research
Mail Stop 1179
Albuquerque, NM 87111-1179
Phone: 505-844-9142
Fax: 505-844-0092
-----Original Message-----
From: W. Schmidt [address removed]
Sent: Friday, September 07, 2001 2:03 AM
To: Lahey-Forum
Subject: [LF] MATMUL-Performance
Dear forum members,
while testing the new Fortran 90/95 intrinsic MATMUL (with LF90 4.5 and LF95
5.6f, on Win9x and Windows 2000, with and without optimization), I found that
using MATMUL makes programs run remarkably more slowly.
For example, inside a subroutine (called often from a large main program)
with the following subset of declarations
REAL(8) :: A(2,2),B(2,2)
REAL(8) :: X(2),Y(2),Z(2)
I replaced the following code
Z(1)=A(1,1)*X(1)+A(1,2)*X(2)-B(1,1)*Y(1)-B(1,2)*Y(2)
Z(2)=A(2,1)*X(1)+A(2,2)*X(2)-B(2,1)*Y(1)-B(2,2)*Y(2)
to the equivalent code
Z=MATMUL(A,X)-MATMUL(B,Y)
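For comparison, a sketch (not part of the original test) of how the two explicit statements above generalize to any dimension while keeping the column-oriented loop order: for each column j, add X(j) times column j of A and subtract Y(j) times column j of B, with no temporary arrays. The module and routine names are hypothetical, and the summation order differs slightly from the hand-written lines, so results may differ in the last bits for general data:

```fortran
module matvec_columns
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Computes z = A*x - B*y, the same quantity as MATMUL(A,X)-MATMUL(B,Y).
  ! Assumes conforming shapes: size(a,2) == size(x), size(b,2) == size(y),
  ! and size(a,1) == size(b,1) == size(z).
  subroutine a_x_minus_b_y(a, x, b, y, z)
    real(dp), intent(in)  :: a(:, :), x(:), b(:, :), y(:)
    real(dp), intent(out) :: z(:)
    integer :: i, j

    z = 0.0_dp
    do j = 1, size(a, 2)          ! add x(j) * column j of A
      do i = 1, size(a, 1)        ! inner loop runs down the column
        z(i) = z(i) + a(i, j) * x(j)
      end do
    end do
    do j = 1, size(b, 2)          ! subtract y(j) * column j of B
      do i = 1, size(b, 1)
        z(i) = z(i) - b(i, j) * y(j)
      end do
    end do
  end subroutine a_x_minus_b_y
end module matvec_columns
```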
The results from the MATMUL code are exactly the same as from the
explicit code, but the execution time with MATMUL increases by a factor of
more than 6.
In another test with the same matrix and vector dimensions, the
execution time with the MATMUL code increased by a factor of 3.
Has anyone had similar experiences with other f90/f95 compilers?
Will MATMUL performance improve with larger matrix and
vector dimensions?
How up to date are the popular benchmarks with respect to the new
Fortran 90/95 features?
Best regards,
W. Schmidt
----------------------------------------------------------
To unsubscribe, send to [address removed] the following
as the first and only line of the message body:
unsubscribe fortran
----------------------------------------------------------