[petsc-users] Using direct solvers in parallel
Thomas Witkowski
thomas.witkowski at tu-dresden.de
Tue May 15 02:54:56 CDT 2012
On 15.05.2012 09:36, Dave May wrote:
> I have seen similar behaviour comparing umfpack and superlu_dist,
> however the difference wasn't enormous; umfpack was perhaps a factor
> of 1.2-1.4 times faster on 1 - 4 cores.
> What sort of time differences are you observing? Can you post the
> numbers somewhere?
I have attached my data to this mail. For the largest matrix, umfpack
failed after allocating 4 GB of memory; I have not tried to figure out
what the problem is there. As you can see, for these matrices the
distributed solvers are slower by a factor of 2 or 3 compared to
umfpack. For all solvers I used the default parameters, so I have not
played around with the permutation strategies and similar settings.
This may also be the reason why superlu is much slower than
superlu_dist on just one core, as the two use different default column
and row permutation strategies.
> However, umfpack will not work on a distributed memory machine.
> My personal preference is to use superlu_dist in parallel. In my
> experience using it as a coarse grid solver for multigrid, I find it
> much more reliable than mumps. However, when mumps works, it is
> typically slightly faster than superlu_dist. Again, not by a large
> amount - never more than a factor of 2 faster.
In my codes I also use the distributed direct solvers for the coarse
grid problems. I just wanted to run some tests to see how far these
solvers are from their sequential counterparts.
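For the coarse problem the direct solver is set through the PCMG option
prefix; a minimal sketch of what I mean (again, newer PETSc releases use
-mg_coarse_pc_factor_mat_solver_type):

    -pc_type mg
    -mg_coarse_ksp_type preonly
    -mg_coarse_pc_type lu
    -mg_coarse_pc_factor_mat_solver_package superlu_dist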
Thomas
>
> The failure rate using mumps is definitely higher (in my experience)
> when running on large numbers of cores compared to superlu_dist. I've
> never got to the bottom of why it fails.
>
> Cheers,
> Dave
>
>
> On 15 May 2012 09:25, Thomas Witkowski <thomas.witkowski at tu-dresden.de> wrote:
>> I made some comparisons of using umfpack, superlu, superlu_dist and mumps to
>> solve systems with sparse matrices arising from the finite element method.
>> The size of the matrices ranges from around 50000 to more than 3 million
>> unknowns. I used 1, 2, 4, 8 and 16 nodes for the benchmark. Now I am
>> surprised that in all cases the sequential umfpack was the fastest one. So
>> even with 16 cores, superlu_dist and mumps are slower. Can any of you
>> confirm this observation? Are there any other parallel direct solvers
>> around which are more efficient?
>>
>> Thomas
-------------- next part --------------
   #rows | umfpack / 1 core | superlu / 1 core | superlu_dist / 1 core | mumps / 1 core |
---------|------------------|------------------|-----------------------|----------------|
   49923 |            0.644 |            4.914 |                 2.148 |          1.731 |
  198147 |            3.992 |            41.53 |                 13.05 |          10.04 |
  792507 |            32.33 |            463.5 |                 66.75 |          52.56 |
 3151975 |        4GB limit |                - |                 394.1 |          303.2 |
   #rows | superlu_dist / 1 core | superlu_dist / 2 cores | superlu_dist / 4 cores | superlu_dist / 8 cores | superlu_dist / 16 cores |
---------|-----------------------|------------------------|------------------------|------------------------|-------------------------|
   49923 |                 2.148 |                  1.922 |                  1.742 |                  1.705 |                   1.745 |
  198147 |                 13.05 |                  11.77 |                  10.47 |                  9.832 |                   9.565 |
  792507 |                 66.75 |                  61.07 |                  53.58 |                  49.14 |                   47.06 |
 3151875 |                 394.1 |                 failed |                 failed |                 failed |                  failed |
   #rows | mumps / 1 core | mumps / 2 cores | mumps / 4 cores | mumps / 8 cores | mumps / 16 cores |
---------|----------------|-----------------|-----------------|-----------------|------------------|
   49923 |          1.731 |           1.562 |           1.485 |           1.426 |            1.418 |
  198147 |          10.04 |           8.959 |           8.468 |           8.144 |            7.978 |
  792507 |          52.56 |           45.38 |           42.56 |           40.22 |            39.12 |
 3151875 |          303.2 |           255.7 |           232.2 |           216.0 |            210.0 |