[petsc-users] Performance of superlu_dist

Ormiston, Scott J. SJ_Ormiston at UManitoba.ca
Fri Apr 1 12:36:05 CDT 2011


Gaetan Kenway wrote:
> I have seen the same thing with SuperLU_dist as Scott Ormiston has. I've
> been using it to solve (small-ish) 3D solid finite element structural
> systems with rarely more than ~30,000 dof. Basically, if you use more
> than 2 cores, SuperLU_dist tanks and the factorization time goes through
> the roof exponentially. However, if you solve the same system with
> Spooles, it's orders of magnitude faster. I'm not overly concerned with
> speed, since I only do this factorization once in my code, and as such I
> don't have precise timing results. With 22,000 dof on a dual-socket
> Xeon X5500 series machine (8 cores per node), Spooles shows a speed-up
> going from 1 to 8 procs, and I could go up to about 32 procs before it
> takes longer than the single-processor case.

Following the suggestion of Desire Nuentsa Wakam (who pointed me to the 
FAQ), I have had better performance from superlu_dist using

mpiexec --cpus-per-proc 4 --bind-to-core -np 3 executable_name \
             -pc_type lu -pc_factor_mat_solver_package superlu_dist

on a server that has 4 quad-core CPUs and 64 GB of RAM. I assume different
option settings will be needed for other arrangements of cores and
interconnects.
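
For reference, the factorization package can also be selected in code rather
than through runtime options. The sketch below is only my own illustration
(not taken from my runs) and assumes PETSc's C interface of roughly this
vintage; names such as PCFactorSetMatSolverPackage and MATSOLVERSUPERLU_DIST,
and the KSPSetOperators/KSPDestroy signatures, differ slightly between
releases.

#include <petscksp.h>

/* Illustrative only: direct LU solve of A x = b using SuperLU_dist as the
   factorization package.  A, b, and x are assumed to exist already. */
PetscErrorCode solve_with_superlu_dist(Mat A, Vec b, Vec x)
{
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);   /* direct solve, no Krylov iterations */
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU_DIST);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);         /* runtime options still take effect */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  return 0;
}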

I have not run enough tests to quantify any speed-up.
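
If anyone wants to repeat the SuperLU_dist/Spooles comparison on the same
problem, my understanding is that only the solver package option needs to
change, e.g. (untested here; the executable name is a placeholder)

mpiexec -np 4 executable_name \
             -pc_type lu -pc_factor_mat_solver_package spooles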

Thank you for your pointer to Spooles.
Scott Ormiston

