[petsc-users] performance issue
Xavier Garnaud
xavier.garnaud at ladhyx.polytechnique.fr
Sat Mar 10 11:05:45 CST 2012
I am using an explicit time stepper. The matrices are assembled only once,
and the linear operator is then used, for example, to compute the least stable
eigenmode(s). I have attached the -log_summary output for the same number of
time steps using the linear and the nonlinear operators.
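
Not the poster's actual code -- just a minimal sketch of how an explicit TS can
reuse matrices that were assembled once, with the RHS evaluation reduced to
MatMult() calls. LinOpCtx, ApplyBCs() and Differentiate() are hypothetical
placeholders standing in for steps 1-4 described in the quoted message below.

  /* Sketch only: linearized RHS that reuses matrices assembled once. */
  #include <petscts.h>

  typedef struct {
    Mat Flux;       /* linearization of step 2: fluxes                   */
    Mat FluxBC;     /* linearization of step 4: BCs on flux derivatives  */
    Vec flux, dflux;
  } LinOpCtx;

  extern PetscErrorCode ApplyBCs(Vec);            /* step 1, already linear */
  extern PetscErrorCode Differentiate(Vec, Vec);  /* step 3, already linear */

  static PetscErrorCode LinearizedRHS(TS ts, PetscReal t, Vec q, Vec qdot, void *ptr)
  {
    LinOpCtx      *c = (LinOpCtx*)ptr;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = ApplyBCs(q);CHKERRQ(ierr);                         /* step 1 */
    ierr = MatMult(c->Flux, q, c->flux);CHKERRQ(ierr);        /* step 2 */
    ierr = Differentiate(c->flux, c->dflux);CHKERRQ(ierr);    /* step 3 */
    ierr = MatMult(c->FluxBC, c->dflux, qdot);CHKERRQ(ierr);  /* step 4 */
    PetscFunctionReturn(0);
  }

Registered with TSSetRHSFunction(ts, NULL, LinearizedRHS, &ctx) and an explicit
type such as TSRK, TSSolve() then applies only the assembled matrices at every
stage, with no reassembly inside the time loop.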
On Sat, Mar 10, 2012 at 5:10 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> On Sat, Mar 10, 2012 at 09:59, Xavier Garnaud <
> xavier.garnaud at ladhyx.polytechnique.fr> wrote:
>
>> I am solving the compressible Navier--Stokes equations in compressible
>> form, so in order to apply the operator, I
>>
>> 1. apply BCs on the flow field
>> 2. compute the flux
>> 3. take the derivative using finite differences
>> 4. apply BCs on the derivatives of the flux
>>
>>
>> In order to apply the linearized operator, I wish to linearize steps 2
>> and 4 (the others are linear). For this, I assemble sparse matrices (MPIAIJ).
>> The matrices should be block diagonal -- with square or rectangular blocks
>> -- so I preallocate the whole diagonal blocks (but I only call MatSetValues
>> for nonzero entries). When I do this, the linearized code runs
>> approximately 50% slower (the computation of derivatives takes more than
>> 70% of the time in the non-linear code), so steps 2 and 4 are much slower
>> for the linear operator even though the number of operations is very similar.
>> Could this be due to poor preallocation? Is there a way to improve the
>> performance?
>>
>
> It's not clear to me from this description whether you are even using an
> implicit method. Is the linearization for use in a Newton iteration? How
> often do you have to reassemble? Please always send -log_summary output
> with performance questions.
>
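
Regarding the preallocation question above, here is a minimal, self-contained
sketch (again, not the actual code; nlocal and bs are arbitrary illustrative
sizes) of exact preallocation for a block-diagonal AIJ matrix, inserting only
the true nonzeros so that MatSetValues() never allocates during assembly.

  /* Sketch only: exact preallocation for a block-diagonal AIJ matrix. */
  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat            A;
    PetscInt       nlocal = 40, bs = 5, rstart, rend, i;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
    ierr = MatSetType(A, MATAIJ);CHKERRQ(ierr);
    /* each local row holds at most bs entries, all in the diagonal block, so
       d_nz = bs and o_nz = 0; switch to the d_nnz/o_nnz array arguments if
       the (square or rectangular) blocks give a different count per row */
    ierr = MatSeqAIJSetPreallocation(A, bs, NULL);CHKERRQ(ierr);
    ierr = MatMPIAIJSetPreallocation(A, bs, NULL, 0, NULL);CHKERRQ(ierr);
    /* turn any preallocation miss into an error instead of a silent slowdown */
    ierr = MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE);CHKERRQ(ierr);

    ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
    for (i = rstart; i < rend; i++) {     /* insert only the true nonzeros */
      PetscScalar v = 1.0;
      ierr = MatSetValues(A, 1, &i, 1, &i, &v, INSERT_VALUES);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

Whether the real assembly hits extra mallocs can be checked by running with
-info (the number of mallocs used during MatSetValues() is reported at
MatAssemblyEnd) or by looking at the mallocs field returned by MatGetInfo().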
-------------- next part: -log_summary for the run using the linearized operator --------------
nx = 256
ny = 128
JET MESH
ix = 74
iy_in = 44
iy_out = 65
rank: 0, i0-i1xj0-j1 : 0 - 74 x 44 - 63
rank: 2, i0-i1xj0-j1 : 0 - 74 x 64 - 65
TIME STEPPING
Tf = 1
CFL = 2
State loaded from q0.h5
32768 40960
32768 40960
32768 40960
32768 40960
Euler - x
Euler - y
LODI - x
LODIq - x
LODI - y
LODIq - y
Stress - x
Stress - x
dFv/dq - x
dFv/dtau - x
dFv/dq - y
dFv/dtau - y
|MatEulerx | = 21.7871
|MatEulery | = 10.4999
|MatLODIx | = 13.3652
|MatLODIy | = 15.0075
|MatLODIqx | = 4.58531e+06
|MatLODIqy | = 1.00002
|MatViscousx_q | = 0.00122487
|MatViscousy_q | = 0.00125045
|MatViscousx_tau | = 1.99893
|MatViscousy_tau | = 1.99893
dt = 0.00571429
|q(1.000000)|/|q(0)| = 1.84842
Elapsed time (CPU) = 27.2226 s
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ns2d on a real_opt named muzo.polytechnique.fr with 4 processors, by garnaud Sat Mar 10 18:02:03 2012
Using Petsc Release Version 3.2.0, Patch 5, Sat Oct 29 13:45:54 CDT 2011
Max Max/Min Avg Total
Time (sec): 2.762e+01 1.00000 2.762e+01
Objects: 1.900e+02 1.00000 1.900e+02
Flops: 1.068e+10 1.01222 1.065e+10 4.258e+10
Flops/sec: 3.869e+08 1.01222 3.855e+08 1.542e+09
MPI Messages: 3.260e+04 1.00000 3.260e+04 1.304e+05
MPI Message Lengths: 2.277e+08 1.00000 6.984e+03 9.108e+08
MPI Reductions: 4.280e+02 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.7615e+01 100.0% 4.2584e+10 100.0% 1.304e+05 100.0% 6.984e+03 100.0% 4.270e+02 99.8%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
DERIVATIVES 10508 1.0 1.4299e+01 1.0 8.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 51 77 0 0 0 51 77 0 0 0 2295
FILTERS 350 1.0 1.9905e-01 1.0 1.26e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2535
VECMANIP 21716 1.0 2.8288e+00 1.2 0.00e+00 0.0 1.3e+05 7.0e+03 6.0e+00 9 0100100 1 9 0100100 1 0
VecView 6 1.0 3.7352e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecNorm 2 1.0 5.9009e-04 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScale 1800 1.0 7.1079e-02 1.2 5.90e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 3323
VecCopy 414 1.0 3.7731e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 7051 1.0 7.2879e-01 1.1 4.97e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 5 0 0 0 3 5 0 0 0 2726
VecAXPBYCZ 350 1.0 6.3609e-02 1.0 4.01e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2524
VecLoad 1 1.0 1.8210e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 31858 1.0 1.5961e+00 1.1 0.00e+00 0.0 1.3e+05 7.0e+03 0.0e+00 6 0100100 0 6 0100100 0 0
VecScatterEnd 31858 1.0 8.4421e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
IMPOSEBC_VISC 5251 1.0 7.2675e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
IMPOSEBC_EULER 1945 1.0 8.8332e-03 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
FLUXES_VISC 22 1.0 4.4665e-03 1.1 4.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3874
FLUXES_EULER 14 1.0 2.4092e-03 1.3 2.75e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4570
STRESSES 12 1.0 1.9977e-03 1.1 1.67e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3346
MatMult 12250 1.0 6.6647e+00 1.0 1.34e+09 1.1 0.0e+00 0.0e+00 0.0e+00 24 12 0 0 0 24 12 0 0 0 784
MatMultAdd 8750 1.0 2.5075e+00 1.0 4.13e+08 1.1 0.0e+00 0.0e+00 0.0e+00 9 4 0 0 0 9 4 0 0 0 642
MatAssemblyBegin 12 1.0 7.1454e-04 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 6 0 0 0 0 6 0
MatAssemblyEnd 12 1.0 2.1005e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+02 0 0 0 0 25 0 0 0 0 25 0
TSStep 175 1.0 2.6759e+01 1.0 1.05e+10 1.0 1.3e+05 7.0e+03 6.0e+00 97 99 97 97 1 97 99 97 97 1 1570
TSFunctionEval 1750 1.0 2.6487e+01 1.0 1.04e+10 1.0 1.3e+05 7.0e+03 0.0e+00 96 97 97 97 0 96 97 97 97 0 1562
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Distributed Mesh 5 5 905272 0
Vector 59 59 6793272 0
Vector Scatter 22 22 23320 0
Index Set 49 49 220252 0
IS L to G Mapping 10 10 703172 0
Viewer 4 3 2064 0
Matrix 39 36 33641936 0
TS 1 1 1088 0
SNES 1 1 1200 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 3.57628e-06
Average time for zero size MPI_Send(): 5.24521e-06
#PETSc Option Table entries:
-Tf 1.
-cfl 2
-lints
-log_summary
-nx 256
-ny 128
-save 1.
#End of PETSc Option Table entries
Compiled with FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Thu Feb 16 17:51:17 2012
Configure options: --with-mpi=yes --with-shared-libraries --with-scalar-type=real --with-fortran-interfaces=1 --FFLAGS=-I/usr/include --with-fortran --with-fortran-kernels=1 --with-clanguage=c COPTFLAGS=-O3 FOPTFLAGS=-O3 --download-mumps=MUMPS_4.10.0.tar.gz --download-scalapack=SCALAPACK-1.7.tar.gz --download-blacs=blacs-dev.tar.gz --download-parmetis=ParMetis-3.2.0-p1.tar.gz --download-superlu=superlu_4.2.tar.gz --download-superlu_dist=superlu_dist_2.5.tar.gz --download-spooles=spooles-2.2-dec-2008.tar.gz --download-umfpack=UMFPACK-5.5.1.tar.gz --with-debugging=0 --with-mpi-dir=/home/garnaud/local/openmpi-1.4.4 --download-hdf5 --download-f-blas-lapack
-----------------------------------------
Libraries compiled on Thu Feb 16 17:51:17 2012 on muzo.polytechnique.fr
Machine characteristics: Linux-2.6.39.4-4.2-desktop-x86_64-with-mandrake-2011.0-Official
Using PETSc directory: /home/garnaud/local/petsc/petsc-3.2-p5
Using PETSc arch: real_opt
-----------------------------------------
Using C compiler: /home/garnaud/local/openmpi-1.4.4/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/garnaud/local/openmpi-1.4.4/bin/mpif90 -I/usr/include -fPIC -O3 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/include -I/home/garnaud/local/petsc/petsc-3.2-p5/include -I/home/garnaud/local/petsc/petsc-3.2-p5/include -I/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/include -I/home/garnaud/local/openmpi-1.4.4/include
-----------------------------------------
Using C linker: /home/garnaud/local/openmpi-1.4.4/bin/mpicc
Using Fortran linker: /home/garnaud/local/openmpi-1.4.4/bin/mpif90
Using libraries: -Wl,-rpath,/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/lib -L/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/lib -lpetsc -lX11 -lpthread -Wl,-rpath,/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/lib -L/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/lib -lsuperlu_dist_2.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lspooles -lscalapack -lblacs -lsuperlu_4.2 -lumfpack -lamd -lflapack -lfblas -lhdf5_fortran -lhdf5 -lz -Wl,-rpath,/home/garnaud/local/openmpi-1.4.4/lib -L/home/garnaud/local/openmpi-1.4.4/lib -Wl,-rpath,/usr/lib64/gcc/x86_64-mandriva-linux-gnu/4.6.1 -L/usr/lib64/gcc/x86_64-mandriva-linux-gnu/4.6.1 -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -lgfortran -lm -lm -lquadmath -lm -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl
-----------------------------------------
-------------- next part: -log_summary for the run using the nonlinear operator --------------
nx = 256
ny = 128
JET MESH
ix = 74
iy_in = 44
iy_out = 65
rank: 0, i0-i1xj0-j1 : 0 - 74 x 44 - 63
rank: 2, i0-i1xj0-j1 : 0 - 74 x 64 - 65
TIME STEPPING
Tf = 1
CFL = 2
dt = 0.00571429
|q(1.000000)|/|q(0)| = 1.0005
Elapsed time (CPU) = 19.2814 s
Final state saved in q0.h5
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ns2d on a real_opt named muzo.polytechnique.fr with 4 processors, by garnaud Sat Mar 10 18:03:09 2012
Using Petsc Release Version 3.2.0, Patch 5, Sat Oct 29 13:45:54 CDT 2011
Max Max/Min Avg Total
Time (sec): 1.955e+01 1.00000 1.955e+01
Objects: 8.400e+01 1.00000 8.400e+01
Flops: 1.090e+10 1.00000 1.090e+10 4.358e+10
Flops/sec: 5.574e+08 1.00000 5.574e+08 2.229e+09
MPI Messages: 3.259e+04 1.00000 3.259e+04 1.303e+05
MPI Message Lengths: 2.276e+08 1.00000 6.984e+03 9.103e+08
MPI Reductions: 1.180e+02 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.9549e+01 100.0% 4.3584e+10 100.0% 1.303e+05 100.0% 6.984e+03 100.0% 1.170e+02 99.2%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
DERIVATIVES 10502 1.0 1.4136e+01 1.0 8.20e+09 1.0 0.0e+00 0.0e+00 0.0e+00 72 75 0 0 0 72 75 0 0 0 2321
FILTERS 350 1.0 1.9755e-01 1.0 1.26e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2554
VECMANIP 21704 1.0 2.4231e+00 1.2 0.00e+00 0.0 1.3e+05 7.0e+03 6.0e+00 11 0100100 5 11 0100100 5 0
VecView 7 1.0 4.2857e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecNorm 2 1.0 6.0606e-04 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 2 0 0 0 0 2 0
VecScale 1750 1.0 6.4685e-02 1.1 5.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 3546
VecCopy 352 1.0 3.0717e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 8750 1.0 8.0684e-01 1.1 6.08e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 6 0 0 0 4 6 0 0 0 3013
VecAXPBYCZ 350 1.0 6.5956e-02 1.0 4.01e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2434
VecScatterBegin 10852 1.0 1.4070e+00 1.1 0.00e+00 0.0 1.3e+05 7.0e+03 0.0e+00 7 0100100 0 7 0100100 0 0
VecScatterEnd 10852 1.0 7.2064e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
IMPOSEBC_VISC 5250 1.0 7.5293e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
IMPOSEBC_EULER 5425 1.0 5.2320e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
FLUXES_VISC 3500 1.0 6.5287e-01 1.1 6.88e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 6 0 0 0 3 6 0 0 0 4216
FLUXES_EULER 3500 1.0 4.5190e-01 1.1 6.88e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 6 0 0 0 2 6 0 0 0 6091
STRESSES 3500 1.0 4.7757e-01 1.1 4.87e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 4 0 0 0 2 4 0 0 0 4083
TSStep 175 1.0 1.8815e+01 1.0 1.08e+10 1.0 1.3e+05 7.0e+03 1.0e+01 96 99 97 97 8 96 99 97 97 9 2289
TSFunctionEval 1750 1.0 1.8553e+01 1.0 1.06e+10 1.0 1.3e+05 7.0e+03 4.0e+00 95 97 97 97 3 95 97 97 97 3 2287
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Distributed Mesh 5 5 905272 0
Vector 28 28 4715800 0
Vector Scatter 10 10 10600 0
Index Set 25 25 202300 0
IS L to G Mapping 10 10 703172 0
Viewer 4 3 2064 0
TS 1 1 1088 0
SNES 1 1 1200 0
========================================================================================================================
Average time to get PetscTime(): 2.14577e-07
Average time for MPI_Barrier(): 3.57628e-06
Average time for zero size MPI_Send(): 5.00679e-06
#PETSc Option Table entries:
-Tf 1.
-cfl 2
-log_summary
-nx 256
-ny 128
-save 1.
-ts
#End of PETSc Option Table entries
Compiled with FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Thu Feb 16 17:51:17 2012
Configure options: --with-mpi=yes --with-shared-libraries --with-scalar-type=real --with-fortran-interfaces=1 --FFLAGS=-I/usr/include --with-fortran --with-fortran-kernels=1 --with-clanguage=c COPTFLAGS=-O3 FOPTFLAGS=-O3 --download-mumps=MUMPS_4.10.0.tar.gz --download-scalapack=SCALAPACK-1.7.tar.gz --download-blacs=blacs-dev.tar.gz --download-parmetis=ParMetis-3.2.0-p1.tar.gz --download-superlu=superlu_4.2.tar.gz --download-superlu_dist=superlu_dist_2.5.tar.gz --download-spooles=spooles-2.2-dec-2008.tar.gz --download-umfpack=UMFPACK-5.5.1.tar.gz --with-debugging=0 --with-mpi-dir=/home/garnaud/local/openmpi-1.4.4 --download-hdf5 --download-f-blas-lapack
-----------------------------------------
Libraries compiled on Thu Feb 16 17:51:17 2012 on muzo.polytechnique.fr
Machine characteristics: Linux-2.6.39.4-4.2-desktop-x86_64-with-mandrake-2011.0-Official
Using PETSc directory: /home/garnaud/local/petsc/petsc-3.2-p5
Using PETSc arch: real_opt
-----------------------------------------
Using C compiler: /home/garnaud/local/openmpi-1.4.4/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/garnaud/local/openmpi-1.4.4/bin/mpif90 -I/usr/include -fPIC -O3 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/include -I/home/garnaud/local/petsc/petsc-3.2-p5/include -I/home/garnaud/local/petsc/petsc-3.2-p5/include -I/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/include -I/home/garnaud/local/openmpi-1.4.4/include
-----------------------------------------
Using C linker: /home/garnaud/local/openmpi-1.4.4/bin/mpicc
Using Fortran linker: /home/garnaud/local/openmpi-1.4.4/bin/mpif90
Using libraries: -Wl,-rpath,/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/lib -L/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/lib -lpetsc -lX11 -lpthread -Wl,-rpath,/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/lib -L/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/lib -lsuperlu_dist_2.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lspooles -lscalapack -lblacs -lsuperlu_4.2 -lumfpack -lamd -lflapack -lfblas -lhdf5_fortran -lhdf5 -lz -Wl,-rpath,/home/garnaud/local/openmpi-1.4.4/lib -L/home/garnaud/local/openmpi-1.4.4/lib -Wl,-rpath,/usr/lib64/gcc/x86_64-mandriva-linux-gnu/4.6.1 -L/usr/lib64/gcc/x86_64-mandriva-linux-gnu/4.6.1 -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -lgfortran -lm -lm -lquadmath -lm -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl
-----------------------------------------