MPI questions

Barry Smith bsmith at mcs.anl.gov
Thu Dec 18 08:20:01 CST 2008


On Dec 18, 2008, at 2:02 AM, Lars Rindorf wrote:

> Dear all
>
> I have some questions regarding MPI and PETSc. I have browsed the
> documentation and the internet, and I have not found the answers to
> them. But if they are trivial, then please accept my apologies :-)
>
> First of all, I'm running PETSc on a quad-core Xeon and I'm using the
> MUMPS direct solver (iterative solvers are not an option for my
> purposes). I compile PETSc with g++ and gfortran under Red Hat Linux.
> My program using PETSc is compiled with mpiCC (it is a C++ program).

>
>
> The questions are:
>
> 1. Does PETSc need to be compiled with the MPI compilers (mpicc, mpiCC,
> etc.) to be efficient? If not, is it faster when compiled with the MPI
> compilers?

You should compile PETSc with the exact same compilers as your code
(or, equivalently, compile your code with the exact same compilers as
PETSc). This is not a matter of speed but of compatibility: not all
compilers can be mixed with one another. We highly recommend using the
mpicc/mpiCC etc. wrappers to build PETSc rather than g++ directly;
otherwise people end up mixing different compilers. The wrappers make no
difference in speed; in the end they ALWAYS just invoke the underlying
compiler to compile the code.

For performance, you should build PETSc with the config/configure.py
option --with-debugging=0.
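
For example, a configure line along these lines is one reasonable
starting point (the exact option names can vary a little between PETSc
versions, and MUMPS also needs ScaLAPACK and BLACS, which configure can
download for you):

   ./config/configure.py --with-cc=mpicc --with-cxx=mpiCC --with-fc=mpif90 \
       --with-debugging=0 --download-mumps=1 --download-scalapack=1 \
       --download-blacs=1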

>
>
> 2. Using a CPU with four cores, does MPI provide any significant
> speed-up? And if so, can it be optimized?

    Make sure you use --with-debugging=0; beyond that there is no particular
"optimization" you can do.

    You are unlikely to see a speedup in going from 1 process to 2 with
MUMPS (the parallelism introduces a lot of overhead that is not there
when running on one process). You could see some improvement in going
from 2 to 4 processes.
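
    As a concrete illustration, below is a minimal sketch of a linear solve
set up as a pure direct solve, so that MUMPS can be chosen at run time; the
same source runs unchanged on 1, 2, or 4 MPI processes. It is written
against the PETSc API of roughly this vintage (KSPSetOperators taking a
MatStructure argument, the Destroy routines taking the object itself);
newer releases changed both, and the run-time option that selects MUMPS
also differs between versions (something like
-pc_factor_mat_solver_package mumps in newer releases, a MUMPS matrix type
in older ones), so check the manual pages for your version.

#include "petscksp.h"

int main(int argc,char **argv)
{
  Mat         A;
  Vec         x,b;
  KSP         ksp;
  PC          pc;
  PetscInt    i,j,rstart,rend,n = 1000;
  PetscScalar v;

  PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);

  /* 1D Laplacian, just something to factor; each process fills only the
     rows it owns.  (Newer PETSc versions also want a MatSetUp() or
     preallocation call before MatSetValues().) */
  MatCreate(PETSC_COMM_WORLD,&A);
  MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,n,n);
  MatSetFromOptions(A);
  MatGetOwnershipRange(A,&rstart,&rend);
  for (i=rstart; i<rend; i++) {
    v = 2.0;  MatSetValues(A,1,&i,1,&i,&v,INSERT_VALUES);
    v = -1.0;
    if (i > 0)   {j = i-1; MatSetValues(A,1,&i,1,&j,&v,INSERT_VALUES);}
    if (i < n-1) {j = i+1; MatSetValues(A,1,&i,1,&j,&v,INSERT_VALUES);}
  }
  MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);

  VecCreate(PETSC_COMM_WORLD,&b);
  VecSetSizes(b,PETSC_DECIDE,n);
  VecSetFromOptions(b);
  VecDuplicate(b,&x);
  VecSet(b,1.0);

  /* Pure direct solve: no Krylov iterations, the LU factorization acts as
     the "preconditioner".  On more than one process PETSc's built-in LU is
     not parallel, so an external package such as MUMPS must be selected
     through the options database. */
  KSPCreate(PETSC_COMM_WORLD,&ksp);
  KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);
  KSPSetType(ksp,KSPPREONLY);
  KSPGetPC(ksp,&pc);
  PCSetType(pc,PCLU);
  KSPSetFromOptions(ksp);   /* MUMPS is picked here via the command line */
  KSPSolve(ksp,b,x);

  KSPDestroy(ksp);
  MatDestroy(A);
  VecDestroy(b);
  VecDestroy(x);
  PetscFinalize();
  return 0;
}

    Timing KSPSolve() for the same matrix on 1, 2, and 4 processes is the
quickest way to see how much of the MUMPS overhead your particular problem
amortizes.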



>
>
> 3. I'm having trouble with a problem that should scale (approximately)
> linearly in time as a function of DOF. But it only scales linearly for a
> small number of DOF; for a large number of DOF it scales quadratically.
> This is before the RAM is even 1/4 full (direct solvers use a lot of
> RAM). I guess what is limiting the speed is the communication between
> the CPU and the RAM, but I don't know. Does anybody have experience with
> this problem? Or maybe a small test program that I can try?

   Where is the sparse matrix coming from? If it is coming from a 3D grid
problem (like a PDE discretized with finite elements), you could easily see
quadratic growth with problem size with a direct solver: for 3D problems
the flop count of a sparse LU factorization grows roughly as the square of
the number of unknowns (even with a good fill-reducing ordering), while the
memory grows much more slowly, so running out of time well before running
out of RAM is expected. How do you know it should scale linearly?

    If you run the PETSc program with -log_summary you can see where the
run is spending its time and check whether that makes sense. What you
should see is that the large bulk of the time is in the LU numerical
factorization.
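
    For example, running something like

       mpiexec -n 4 ./yourprogram -log_summary

prints a table of events and timings when PetscFinalize() is called. With a
direct solver the numeric factorization event (MatLUFactorNum, plus
MatLUFactorSym for the symbolic phase; the exact event names can vary a bit
between versions) should account for the large majority of the time, and
its share should grow as the problem gets bigger.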


    Barry

>
>
> Thanks!
>
> KR, Lars
>
