[petsc-users] Performance of superlu_dist

Desire NUENTSA WAKAM desire.nuentsa_wakam at inria.fr
Fri Apr 1 09:48:45 CDT 2011


On a multicore node, you may not get very good speedup if the memory 
bandwidth is heavily shared between all the cores. I believe this is what 
the PETSc developers explain here: 
http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers
If you have a multi-socket multicore node, my suggestion would be to keep 
one MPI process on each socket and then use a multithreaded BLAS (like 
GotoBLAS) within each socket to keep the cores busy during the BLAS 
operations.
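As a rough sketch of that kind of launch, assuming a node with two 
sockets of four cores each, Open MPI as the launcher, and GotoBLAS as 
the threaded BLAS (the binding flags and the environment variable are 
specific to those packages, so adjust them for your MPI and BLAS):

  # One MPI rank per socket, each rank bound to its socket
  # (Open MPI syntax; other launchers use different binding options).
  # GOTO_NUM_THREADS tells GotoBLAS how many threads each rank may use,
  # here one thread per core of the socket.
  export GOTO_NUM_THREADS=4
  mpiexec -n 2 --map-by socket --bind-to socket ./ex15f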
Hope this helps
Desire

On 04/01/2011 03:43 PM, Ormiston, Scott J. wrote:
> I am just starting to try superlu_dist to get a direct solver that 
> runs in parallel with PETSc.
>
> My first tests (with ex15f) show that it takes longer and longer as 
> the number of cores increases. For example, 4 cores take 8 times 
> longer than 2 cores, and 8 cores take 25 times longer than 4 cores. 
> Obviously I expected a speed-up; has anyone else seen this behaviour 
> with superlu_dist? If not, what could be going wrong here?
>
> Scott Ormiston
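
For reference, the usual way to select SuperLU_DIST for a run like ex15f 
from the command line, with profiling output so you can see where the 
extra time is going, looks something like this (option names may differ 
slightly between PETSc versions):

  # Direct LU solve through SuperLU_DIST, plus solver and timing summaries
  mpiexec -n 4 ./ex15f -pc_type lu \
      -pc_factor_mat_solver_package superlu_dist \
      -ksp_view -log_summary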

