[petsc-users] Vexing deadlock situation with petsc4py

Guyer, Jonathan E. Dr. (Fed) jonathan.guyer at nist.gov
Wed Oct 28 11:35:19 CDT 2020


We use petsc4py as a solver suite in our [FiPy](https://www.ctcms.nist.gov/fipy) Python-based PDE solver package. Some time back, I refactored some of the code and provoked a deadlock situation in our test suite. I have been tearing what remains of my hair out trying to isolate things and am at a loss. I’ve gone through the refactoring line-by-line and I just don’t think I’ve changed anything substantive, just how the code is organized.

I have posted a branch that exhibits the issue at https://github.com/usnistgov/fipy/pull/761

The pull request explains how to reproduce the problem in greater detail, but in short, after a substantial number of our tests have run, the code either deadlocks or raises exceptions:

On processor 0 in

  matrix.setUp()

specifically in

  [0] PetscSplitOwnership() line 93 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/utils/psplit.c

and on other processors a few lines earlier in

  matrix.create(comm)

specifically in

  [1] PetscCommDuplicate() line 126 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/objects/tagm.c


The circumstances that lead to this failure are really fragile, and memory corruption seems a likely cause, particularly since I can make the failure go away by removing seemingly irrelevant things like

    >>> from scipy.stats.mstats import argstoarray

Note that when I run the full test suite after taking out this scipy import, the same problem just arises elsewhere without any obvious similar import trigger.

Running with `-malloc_debug true` doesn’t illuminate anything.

I’ve run with `-info` and `-log_trace` and don’t see any obvious issues, but there’s a ton of output.
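
One way to tame that volume is to split the output by rank and look only at the last few messages from each process before the hang. The following is a minimal sketch, not part of FiPy or PETSc: it assumes each line of interest starts with the rank in square brackets, as in the traces quoted above, and the helper name `last_lines_by_rank` is purely illustrative.

```python
import re
from collections import defaultdict, deque

# Assumption: relevant lines start with the MPI rank in brackets,
# e.g. "[1] PetscCommDuplicate() line 126 in ...".
RANK_PREFIX = re.compile(r"^\s*\[(\d+)\]\s?(.*)$")


def last_lines_by_rank(lines, keep=5):
    """Return {rank: [last `keep` messages]} for lines prefixed with "[rank]".

    Lines without a rank prefix are ignored.
    """
    tails = defaultdict(lambda: deque(maxlen=keep))
    for line in lines:
        match = RANK_PREFIX.match(line.rstrip("\n"))
        if match:
            tails[int(match.group(1))].append(match.group(2))
    return {rank: list(tail) for rank, tail in sorted(tails.items())}


# The two trace lines quoted earlier in this message, used as sample input.
sample = [
    "[0] PetscSplitOwnership() line 93 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/utils/psplit.c",
    "[1] PetscCommDuplicate() line 126 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/objects/tagm.c",
]

for rank, messages in last_lines_by_rank(sample).items():
    print(f"rank {rank}:")
    for message in messages:
        print(f"  {message}")
```

With the real output saved to a file, the same function can be fed the open file object directly. Comparing the tail for each rank should show which collective call each process reached last.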



I have tried reducing things to a minimal reproducible example, but unfortunately things remain way too complicated and idiosyncratic to FiPy. I’m grateful for any help anybody can offer despite the mess that I’m offering.