[petsc-users] nondeterministic behavior of MUMPS when filtering out zero rows and columns

Matthew Knepley knepley at gmail.com
Thu Nov 7 10:40:05 CST 2019


On Thu, Nov 7, 2019 at 6:44 AM s.a.hack--- via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Hi,
>
>
>
> I am doing calculations with version 3.12.0 of PETSc.
>
> Using the finite-element method, I solve the Maxwell equations on the
> interior of a 3D domain, coupled with boundary condition auxiliary
> equations on the boundary of the domain. The auxiliary equations employ
> auxiliary variables g.
>
>
>
> For ease of implementation of element matrix assembly, the auxiliary
> variables g are defined on the entire domain. However, only the basis
> functions for g with nonzero value at the boundary give nonzero entries in
> the system matrix.
>
>
>
> The element matrices hence have the structure
>
> [ A B; C D]
>
> at the boundary.
>
>
>
> In the interior the element matrices have the structure
>
> [A 0; 0 0].
>
>
>
> The degrees of freedom in the system matrix can be ordered by element
> [u_e1 g_e1 u_e2 g_e2 …] or by parallel process [u_p1 g_p1 u_p2 g_p2 …].
>
>
>
> To solve the system matrix, I need to filter out zero rows and columns:
>
> error = MatFindNonzeroRows(stiffnessMatrix, &nonzeroRows);
>
> CHKERRABORT(PETSC_COMM_WORLD, error);
>
> error = MatCreateSubMatrix(stiffnessMatrix, nonzeroRows, nonzeroRows,
> MAT_INITIAL_MATRIX, &stiffnessMatrixSubMatrix);
>
> CHKERRABORT(PETSC_COMM_WORLD, error);
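
For reference, a self-contained sketch of this filtering pattern (the function
name and error handling below are illustrative, not taken from the original
code):

  #include <petscmat.h>

  /* Extract the submatrix of structurally nonzero rows/columns so that the
     direct solver never sees the empty g rows from the interior elements. */
  PetscErrorCode FilterZeroRowsAndColumns(Mat stiffnessMatrix, Mat *subMatrix)
  {
    IS             nonzeroRows;
    PetscErrorCode error;

    PetscFunctionBeginUser;
    /* Index set of all rows that contain at least one nonzero entry */
    error = MatFindNonzeroRows(stiffnessMatrix, &nonzeroRows);CHKERRQ(error);
    /* The zero rows and zero columns coincide here, so the same index set
       is used for both the row and the column selection */
    error = MatCreateSubMatrix(stiffnessMatrix, nonzeroRows, nonzeroRows,
                               MAT_INITIAL_MATRIX, subMatrix);CHKERRQ(error);
    error = ISDestroy(&nonzeroRows);CHKERRQ(error);
    PetscFunctionReturn(0);
  }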
>
>
>
> I solve the system matrix in parallel on multiple nodes connected with
> InfiniBand.
>
> The problem is that the MUMPS solver frequently (nondeterministically)
> hangs during KSPSolve() (after KSPSetUp() is completed).
>
> Running with the options -ksp_view and -info the last printed statement is:
>
> [0] VecScatterCreate_SF(): Using StarForest for vector scatter
>
There is a bug in some older MPI implementations. You can try using

  -vec_assembly_legacy -matstash_legacy

to see if you avoid the bug.
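
The same options can also be put into the options database from code before the
matrix and vectors are assembled; a minimal sketch, assuming the default (NULL)
global options database and the usual ierr error-code variable:

  ierr = PetscOptionsSetValue(NULL, "-vec_assembly_legacy", NULL);CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-matstash_legacy", NULL);CHKERRQ(ierr);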

> In the calculations where the program does not hang, the calculated
> solution is correct.
>
>
>
> The problem doesn’t occur for calculations on a single node, or for
> calculations with the SuperLU solver (but SuperLU will not allow
> calculations that are as large).
>

SuperLU_dist can do large problems. Use --download-superlu_dist


> The problem also doesn’t seem to occur for small problems.
>
> The problem doesn’t occur either when I put ones on the diagonal of the zero
> rows instead, but this is computationally expensive:
>
> error = MatFindZeroRows(stiffnessMatrix, &zeroRows);
>
> CHKERRABORT(PETSC_COMM_WORLD, error);
>
> error = MatZeroRowsColumnsIS(stiffnessMatrix, zeroRows, diagEntry,
> PETSC_IGNORE, PETSC_IGNORE);
>
> CHKERRABORT(PETSC_COMM_WORLD, error);
>

The two function calls above are expensive? Can you run it with -log_view
and send the timing?

  Thanks,

    Matt
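
As a side note, the cost of those two calls can be broken out in the -log_view
summary by wrapping them in a user-defined logging stage; a rough sketch, using
the variables from the snippet above and an arbitrary stage name:

  PetscLogStage stage;

  ierr = PetscLogStageRegister("ZeroRowsCols", &stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  ierr = MatFindZeroRows(stiffnessMatrix, &zeroRows);CHKERRQ(ierr);
  ierr = MatZeroRowsColumnsIS(stiffnessMatrix, zeroRows, diagEntry, NULL, NULL);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);

-log_view then reports these calls in a separate "ZeroRowsCols" stage.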


>
>
> Would you have any ideas on what I could check?
>
>
>
> Best regards,
>
> Sjoerd
>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/