I have seen the same thing with SuperLU_dist as Scott Ormiston has. I've been using it to solve (small-ish) 3D solid finite element structural systems with rarely more than ~30,000 dof. Basically, if you use more than 2 cores, SuperLU_dist tanks and the factorization time goes through the roof. However, if you solve the same system with Spooles, it's orders of magnitude faster. I'm not overly concerned with speed, since I only do this factorization once in my code, and as such I don't have precise timing results. With 22,000 dof on a dual-socket Xeon X5500 series machine (8 cores per node), with Spooles, there's a speedup going from 1 to 8 procs. I could go up to about 32 procs before it takes longer than the single-processor case. <br>
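For anyone who wants to put numbers on this kind of comparison, here is a minimal sketch of how the speedup and the break-even process count could be computed from wall-clock timings. None of this comes from my own code: the helper names (`wall_time`, `speedup`, `efficiency`, `largest_useful_procs`) are made up for illustration, and the timings would have to be collected separately for each solver and process count.

```python
import time
from typing import Callable, Dict


def wall_time(fn: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by one call to fn (hypothetical helper)."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup relative to the single-process run; values below 1 mean slower."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, nprocs: int) -> float:
    """Parallel efficiency: speedup divided by the number of processes."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    return speedup(t_serial, t_parallel) / nprocs


def largest_useful_procs(timings: Dict[int, float]) -> int:
    """Largest process count whose time does not exceed the 1-process time.

    timings maps process count to measured factorization time in seconds
    and must contain an entry for 1 process.
    """
    t1 = timings[1]
    return max(p for p, t in timings.items() if t <= t1)
```

A speedup below 1 corresponds to the "takes longer than the single-processor case" situation described above, and `largest_useful_procs` reports the last process count before that happens.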
<br>I hope this is of some use. <br><br>Gaetan<br>