[petsc-users] MPI barrier issue using MatZeroRows

Amneet Bhalla mail2amneet at gmail.com
Wed Nov 29 10:50:45 CST 2023


OK, I added both, but it still hangs. Here is the bt output from all three tasks:

Task 1:

amneetb at APSB-MacBook-Pro-16:~$ lldb  -p 44691

(lldb) process attach --pid 44691

Process 44691 stopped

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP

    frame #0: 0x000000018a2d750c libsystem_kernel.dylib`__semwait_signal + 8

libsystem_kernel.dylib`:

->  0x18a2d750c <+8>:  b.lo   0x18a2d752c               ; <+40>

    0x18a2d7510 <+12>: pacibsp

    0x18a2d7514 <+16>: stp    x29, x30, [sp, #-0x10]!

    0x18a2d7518 <+20>: mov    x29, sp

Target 0: (fo_acoustic_streaming_solver_2d) stopped.

Executable module set to
"/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".

Architecture set to: arm64-apple-macosx-.

(lldb) cont

Process 44691 resuming

Process 44691 stopped

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP

    frame #0: 0x000000010ba40b60
libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752

libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release:

->  0x10ba40b60 <+752>: add    w8, w8, #0x1

    0x10ba40b64 <+756>: ldr    w9, [x22]

    0x10ba40b68 <+760>: cmp    w8, w9

    0x10ba40b6c <+764>: b.lt   0x10ba40b4c               ; <+732>

Target 0: (fo_acoustic_streaming_solver_2d) stopped.

(lldb) bt

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP

  * frame #0: 0x000000010ba40b60
libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752

    frame #1: 0x000000010ba48528
libpmpi.12.dylib`MPIDI_POSIX_mpi_allreduce_release_gather + 1088

    frame #2: 0x000000010ba47964
libpmpi.12.dylib`MPIDI_Allreduce_intra_composition_gamma + 368

    frame #3: 0x000000010ba35e78 libpmpi.12.dylib`MPIR_Allreduce + 1588

    frame #4: 0x0000000103f587dc libmpi.12.dylib`MPI_Allreduce + 2280

    frame #5: 0x0000000106d67650
libpetsc.3.17.dylib`MatZeroRows_MPIAIJ(A=0x0000000105846470, N=1,
rows=0x000000016dbfa9f4, diag=1, x=0x0000000000000000,
b=0x0000000000000000) at mpiaij.c:827:3

    frame #6: 0x0000000106aadfac
libpetsc.3.17.dylib`MatZeroRows(mat=0x0000000105846470, numRows=1,
rows=0x000000016dbfa9f4, diag=1, x=0x0000000000000000,
b=0x0000000000000000) at matrix.c:5935:3

    frame #7: 0x00000001023952d0
fo_acoustic_streaming_solver_2d`IBAMR::AcousticStreamingPETScMatUtilities::constructPatchLevelFOAcousticStreamingOp(mat=0x000000016dc04168,
omega=1, sound_speed=1, rho_idx=3, mu_idx=2, lambda_idx=4,
u_bc_coefs=0x000000016dc04398, data_time=NaN, num_dofs_per_proc=size=3,
u_dof_index_idx=27, p_dof_index_idx=28,
patch_level=Pointer<SAMRAI::hier::PatchLevel<2> > @ 0x000000016dbfcec0,
mu_interp_type=VC_HARMONIC_INTERP) at AcousticStreamingPETScMatUtilities.cpp
:799:36

    frame #8: 0x00000001023acb8c
fo_acoustic_streaming_solver_2d`IBAMR::FOAcousticStreamingPETScLevelSolver::initializeSolverStateSpecialized(this=0x000000016dc04018,
x=0x000000016dc05778, (null)=0x000000016dc05680) at
FOAcousticStreamingPETScLevelSolver.cpp:149:5

    frame #9: 0x000000010254a2dc
fo_acoustic_streaming_solver_2d`IBTK::PETScLevelSolver::initializeSolverState(this=0x000000016dc04018,
x=0x000000016dc05778, b=0x000000016dc05680) at PETScLevelSolver.cpp:340:5

    frame #10: 0x0000000102202e5c
fo_acoustic_streaming_solver_2d`main(argc=11, argv=0x000000016dc07450) at
fo_acoustic_streaming_solver.cpp:400:22

    frame #11: 0x0000000189fbbf28 dyld`start + 2236

(lldb)


Task 2:

amneetb at APSB-MacBook-Pro-16:~$ lldb  -p 44692

(lldb) process attach --pid 44692

Process 44692 stopped

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP

    frame #0: 0x000000018a2d750c libsystem_kernel.dylib`__semwait_signal + 8

libsystem_kernel.dylib`:

->  0x18a2d750c <+8>:  b.lo   0x18a2d752c               ; <+40>

    0x18a2d7510 <+12>: pacibsp

    0x18a2d7514 <+16>: stp    x29, x30, [sp, #-0x10]!

    0x18a2d7518 <+20>: mov    x29, sp

Target 0: (fo_acoustic_streaming_solver_2d) stopped.

Executable module set to
"/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".

Architecture set to: arm64-apple-macosx-.

(lldb) cont

Process 44692 resuming

Process 44692 stopped

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP

    frame #0: 0x000000010e5a022c
libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather + 516

libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather:

->  0x10e5a022c <+516>: ldr    x10, [x19, #0x4e8]

    0x10e5a0230 <+520>: cmp    x9, x10

    0x10e5a0234 <+524>: b.hs   0x10e5a0254               ; <+556>

    0x10e5a0238 <+528>: add    w8, w8, #0x1

Target 0: (fo_acoustic_streaming_solver_2d) stopped.

(lldb) bt

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP

  * frame #0: 0x000000010e5a022c
libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather + 516

    frame #1: 0x000000010e59fd14 libpmpi.12.dylib`MPIDI_SHM_mpi_barrier +
224

    frame #2: 0x000000010e59fb60
libpmpi.12.dylib`MPIDI_Barrier_intra_composition_alpha + 44

    frame #3: 0x000000010e585490 libpmpi.12.dylib`MPIR_Barrier + 900

    frame #4: 0x0000000106ac5030 libmpi.12.dylib`MPI_Barrier + 684

    frame #5: 0x0000000108e62638
libpetsc.3.17.dylib`PetscCommDuplicate(comm_in=1140850688,
comm_out=0x00000001408ae4b0, first_tag=0x00000001408ae4e4) at tagm.c:235:5

    frame #6: 0x0000000108e6a910
libpetsc.3.17.dylib`PetscHeaderCreate_Private(h=0x00000001408ae470,
classid=1211228, class_name="KSP", descr="Krylov Method", mansec="KSP",
comm=1140850688, destroy=(libpetsc.3.17.dylib`KSPDestroy at itfunc.c:1418),
view=(libpetsc.3.17.dylib`KSPView at itcreate.c:113)) at inherit.c:62:3

    frame #7: 0x000000010aa28010
libpetsc.3.17.dylib`KSPCreate(comm=1140850688, inksp=0x000000016b0a4160) at
itcreate.c:679:3

    frame #8: 0x00000001050aa2f4
fo_acoustic_streaming_solver_2d`IBTK::PETScLevelSolver::initializeSolverState(this=0x000000016b0a4018,
x=0x000000016b0a5778, b=0x000000016b0a5680) at PETScLevelSolver.cpp:344:12

    frame #9: 0x0000000104d62e5c
fo_acoustic_streaming_solver_2d`main(argc=11, argv=0x000000016b0a7450) at
fo_acoustic_streaming_solver.cpp:400:22

    frame #10: 0x0000000189fbbf28 dyld`start + 2236

(lldb)


Task 3:

amneetb at APSB-MacBook-Pro-16:~$ lldb  -p 44693

(lldb) process attach --pid 44693

Process 44693 stopped

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP

    frame #0: 0x000000018a2d750c libsystem_kernel.dylib`__semwait_signal + 8

libsystem_kernel.dylib`:

->  0x18a2d750c <+8>:  b.lo   0x18a2d752c               ; <+40>

    0x18a2d7510 <+12>: pacibsp

    0x18a2d7514 <+16>: stp    x29, x30, [sp, #-0x10]!

    0x18a2d7518 <+20>: mov    x29, sp

Target 0: (fo_acoustic_streaming_solver_2d) stopped.

Executable module set to
"/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".

Architecture set to: arm64-apple-macosx-.

(lldb) cont

Process 44693 resuming

Process 44693 stopped

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP

    frame #0: 0x000000010e59c68c
libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_gather + 952

libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_gather:

->  0x10e59c68c <+952>: ldr    w9, [x21]

    0x10e59c690 <+956>: cmp    w8, w9

    0x10e59c694 <+960>: b.lt   0x10e59c670               ; <+924>

    0x10e59c698 <+964>: bl     0x10e59ce64               ;
MPID_Progress_test

Target 0: (fo_acoustic_streaming_solver_2d) stopped.

(lldb) bt

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP

  * frame #0: 0x000000010e59c68c
libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_gather + 952

    frame #1: 0x000000010e5a44bc
libpmpi.12.dylib`MPIDI_POSIX_mpi_allreduce_release_gather + 980

    frame #2: 0x000000010e5a3964
libpmpi.12.dylib`MPIDI_Allreduce_intra_composition_gamma + 368

    frame #3: 0x000000010e591e78 libpmpi.12.dylib`MPIR_Allreduce + 1588

    frame #4: 0x0000000106ab47dc libmpi.12.dylib`MPI_Allreduce + 2280

    frame #5: 0x00000001098c3650
libpetsc.3.17.dylib`MatZeroRows_MPIAIJ(A=0x0000000136862270, N=1,
rows=0x000000016b09e9f4, diag=1, x=0x0000000000000000,
b=0x0000000000000000) at mpiaij.c:827:3

    frame #6: 0x0000000109609fac
libpetsc.3.17.dylib`MatZeroRows(mat=0x0000000136862270, numRows=1,
rows=0x000000016b09e9f4, diag=1, x=0x0000000000000000,
b=0x0000000000000000) at matrix.c:5935:3

    frame #7: 0x0000000104ef12d0
fo_acoustic_streaming_solver_2d`IBAMR::AcousticStreamingPETScMatUtilities::constructPatchLevelFOAcousticStreamingOp(mat=0x000000016b0a8168,
omega=1, sound_speed=1, rho_idx=3, mu_idx=2, lambda_idx=4,
u_bc_coefs=0x000000016b0a8398, data_time=NaN, num_dofs_per_proc=size=3,
u_dof_index_idx=27, p_dof_index_idx=28,
patch_level=Pointer<SAMRAI::hier::PatchLevel<2> > @ 0x000000016b0a0ec0,
mu_interp_type=VC_HARMONIC_INTERP) at AcousticStreamingPETScMatUtilities.cpp
:799:36

    frame #8: 0x0000000104f08b8c
fo_acoustic_streaming_solver_2d`IBAMR::FOAcousticStreamingPETScLevelSolver::initializeSolverStateSpecialized(this=0x000000016b0a8018,
x=0x000000016b0a9778, (null)=0x000000016b0a9680) at
FOAcousticStreamingPETScLevelSolver.cpp:149:5

    frame #9: 0x00000001050a62dc
fo_acoustic_streaming_solver_2d`IBTK::PETScLevelSolver::initializeSolverState(this=0x000000016b0a8018,
x=0x000000016b0a9778, b=0x000000016b0a9680) at PETScLevelSolver.cpp:340:5

    frame #10: 0x0000000104d5ee5c
fo_acoustic_streaming_solver_2d`main(argc=11, argv=0x000000016b0ab450) at
fo_acoustic_streaming_solver.cpp:400:22

    frame #11: 0x0000000189fbbf28 dyld`start + 2236

(lldb)


On Wed, Nov 29, 2023 at 7:22 AM Barry Smith <bsmith at petsc.dev> wrote:

>
>
> On Nov 29, 2023, at 1:16 AM, Amneet Bhalla <mail2amneet at gmail.com> wrote:
>
> BTW, I think you meant using MatSetOption(mat, *MAT_NO_OFF_PROC_ZERO_ROWS*,
> PETSC_TRUE)
>
>
> Yes
>
>  instead of MatSetOption(mat, *MAT_NO_OFF_PROC_ENTRIES*, PETSC_TRUE) ??
>
>
>   Please try setting both flags.
>
>  However, that also did not help to overcome the MPI Barrier issue.
>
>
>   If there is still a problem, please trap all the MPI processes in the
> debugger when they hang and send the output of bt from all of them. That
> way we can see the different places the different MPI processes are stuck.
>
>
>
> On Tue, Nov 28, 2023 at 9:57 PM Amneet Bhalla <mail2amneet at gmail.com>
> wrote:
>
>> I added that option, but the code still gets stuck at the same
>> MatZeroRows() call with 3 processors.
>>
>> On Tue, Nov 28, 2023 at 7:23 PM Amneet Bhalla <mail2amneet at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Nov 28, 2023 at 6:42 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>>>
>>>> for (int comp = 0; comp < 2; ++comp)
>>>> {
>>>>     .......
>>>>     for (Box<NDIM>::Iterator bc(bc_coef_box); bc; bc++)
>>>>     {
>>>>         ......
>>>>         if (IBTK::abs_equal_eps(b, 0.0))
>>>>         {
>>>>             const double diag_value = a;
>>>>             ierr = MatZeroRows(mat, 1, &u_dof_index, diag_value, NULL, NULL);
>>>>             IBTK_CHKERRQ(ierr);
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> In general, this code will not work because each process calls
>>>> MatZeroRows a different number of times, so the collective calls cannot
>>>> match up across all the processes.
>>>>
>>>> If u_dof_index is always local to the current process, you can call
>>>> MatSetOption(mat, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE) above the for
>>>> loop, and then MatZeroRows will not synchronize across the MPI processes
>>>> (since it does not need to, and you told it that).
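>>>>
>>>> A minimal sketch of that pattern, combined with the "both flags"
>>>> suggestion above (assuming every u_dof_index is locally owned; the loop
>>>> body is the one quoted earlier):
>>>>
>>>>     // Promise PETSc that no rank will zero rows (or insert entries) it
>>>>     // does not own, so MatZeroRows() can skip the cross-rank sync.
>>>>     ierr = MatSetOption(mat, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE);
>>>>     IBTK_CHKERRQ(ierr);
>>>>     ierr = MatSetOption(mat, MAT_NO_OFF_PROC_ZERO_ROWS, PETSC_TRUE);
>>>>     IBTK_CHKERRQ(ierr);
>>>>     for (int comp = 0; comp < 2; ++comp)
>>>>     {
>>>>         ....... // unchanged loop calling MatZeroRows() row by row
>>>>     }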
>>>>
>>>
>>> Yes, u_dof_index is going to be local and I put a check on it a few
>>> lines before calling MatZeroRows.
>>>
>>> Can MatSetOption() be called after the matrix has been assembled?
>>>
>>>
>>>> If the u_dof_index will not always be local, then each process needs to
>>>> collect all of its u_dof_index values in an array and call MatZeroRows()
>>>> once after the loop, so that MatZeroRows() can exchange the needed
>>>> information with the other MPI processes and get the row indices to the
>>>> right place.
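>>>>
>>>> For example (a sketch only: rows_to_zero is a hypothetical name, and
>>>> diag_value is hoisted out of the loop from the snippet above):
>>>>
>>>>     std::vector<PetscInt> rows_to_zero; // needs <vector>
>>>>     const double diag_value = a;
>>>>     for (int comp = 0; comp < 2; ++comp)
>>>>     {
>>>>         for (Box<NDIM>::Iterator bc(bc_coef_box); bc; bc++)
>>>>         {
>>>>             if (IBTK::abs_equal_eps(b, 0.0))
>>>>             {
>>>>                 rows_to_zero.push_back(u_dof_index);
>>>>             }
>>>>         }
>>>>     }
>>>>     // One collective call: every rank participates (possibly with zero
>>>>     // rows), so the reductions inside MatZeroRows() match up.
>>>>     ierr = MatZeroRows(mat, static_cast<PetscInt>(rows_to_zero.size()),
>>>>                        rows_to_zero.data(), diag_value, NULL, NULL);
>>>>     IBTK_CHKERRQ(ierr);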
>>>>
>>>> Barry
>>>>
>>>>
>>>>
>>>>
>>>> On Nov 28, 2023, at 6:44 PM, Amneet Bhalla <mail2amneet at gmail.com>
>>>> wrote:
>>>>
>>>>
>>>> Hi Folks,
>>>>
>>>> I am using MatZeroRows() to set Dirichlet boundary conditions. This
>>>> works fine for the serial run, and the solver produces correct results
>>>> (verified against an analytical solution). However, when I run the case
>>>> in parallel, the simulation gets stuck at MatZeroRows(). My understanding
>>>> is that this function needs to be called after MatAssemblyBegin() and
>>>> MatAssemblyEnd(), and should be called by all processors. Here is the bit
>>>> of the code that calls MatZeroRows() after the matrix has been assembled:
>>>>
>>>>
>>>> https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L724-L801
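>>>>
>>>> For reference, the calling pattern I understand MatZeroRows() to expect
>>>> (a generic sketch with placeholder names nlocal/local_rows, not the
>>>> linked code):
>>>>
>>>>     MatAssemblyBegin(mat, MAT_FINAL_ASSEMBLY);
>>>>     MatAssemblyEnd(mat, MAT_FINAL_ASSEMBLY);
>>>>     /* Collective: every rank must make this call, even if nlocal == 0. */
>>>>     MatZeroRows(mat, nlocal, local_rows, 1.0, NULL, NULL);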
>>>>
>>>> I ran the parallel code (on 3 processors) in the debugger
>>>> (-start_in_debugger). Below is the call stack from the processor that gets
>>>> stuck
>>>>
>>>> amneetb at APSB-MBP-16:~$ lldb  -p 4307
>>>> (lldb) process attach --pid 4307
>>>> Process 4307 stopped
>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>>> SIGSTOP
>>>>     frame #0: 0x000000018a2d750c
>>>> libsystem_kernel.dylib`__semwait_signal + 8
>>>> libsystem_kernel.dylib`:
>>>> ->  0x18a2d750c <+8>:  b.lo   0x18a2d752c               ; <+40>
>>>>     0x18a2d7510 <+12>: pacibsp
>>>>     0x18a2d7514 <+16>: stp    x29, x30, [sp, #-0x10]!
>>>>     0x18a2d7518 <+20>: mov    x29, sp
>>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>>> Executable module set to
>>>> "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
>>>> Architecture set to: arm64-apple-macosx-.
>>>> (lldb) cont
>>>> Process 4307 resuming
>>>> Process 4307 stopped
>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>>> SIGSTOP
>>>>     frame #0: 0x0000000109d281b8
>>>> libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather + 400
>>>> libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather:
>>>> ->  0x109d281b8 <+400>: ldr    w9, [x24]
>>>>     0x109d281bc <+404>: cmp    w8, w9
>>>>     0x109d281c0 <+408>: b.lt   0x109d281a0               ; <+376>
>>>>     0x109d281c4 <+412>: bl     0x109d28e64               ;
>>>> MPID_Progress_test
>>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>>> (lldb) bt
>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>>> SIGSTOP
>>>>   * frame #0: 0x0000000109d281b8
>>>> libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather + 400
>>>>     frame #1: 0x0000000109d27d14
>>>> libpmpi.12.dylib`MPIDI_SHM_mpi_barrier + 224
>>>>     frame #2: 0x0000000109d27b60
>>>> libpmpi.12.dylib`MPIDI_Barrier_intra_composition_alpha + 44
>>>>     frame #3: 0x0000000109d0d490 libpmpi.12.dylib`MPIR_Barrier + 900
>>>>     frame #4: 0x000000010224d030 libmpi.12.dylib`MPI_Barrier + 684
>>>>     frame #5: 0x00000001045ea638
>>>> libpetsc.3.17.dylib`PetscCommDuplicate(comm_in=-2080374782,
>>>> comm_out=0x000000010300bcb0, first_tag=0x000000010300bce4) at tagm.c:
>>>> 235:5
>>>>     frame #6: 0x00000001045f2910
>>>> libpetsc.3.17.dylib`PetscHeaderCreate_Private(h=0x000000010300bc70,
>>>> classid=1211227, class_name="PetscSF", descr="Star Forest",
>>>> mansec="PetscSF", comm=-2080374782,
>>>> destroy=(libpetsc.3.17.dylib`PetscSFDestroy at sf.c:224),
>>>> view=(libpetsc.3.17.dylib`PetscSFView at sf.c:841)) at inherit.c:62:3
>>>>     frame #7: 0x00000001049cf820
>>>> libpetsc.3.17.dylib`PetscSFCreate(comm=-2080374782, sf=0x000000016f911a50)
>>>> at sf.c:62:3
>>>>     frame #8: 0x0000000104cd3024
>>>> libpetsc.3.17.dylib`MatZeroRowsMapLocal_Private(A=0x00000001170c1270, N=1,
>>>> rows=0x000000016f912cb4, nr=0x000000016f911df8, olrows=0x000000016f911e00)
>>>> at zerorows.c:36:5
>>>>     frame #9: 0x000000010504ea50
>>>> libpetsc.3.17.dylib`MatZeroRows_MPIAIJ(A=0x00000001170c1270, N=1,
>>>> rows=0x000000016f912cb4, diag=1, x=0x0000000000000000,
>>>> b=0x0000000000000000) at mpiaij.c:768:3
>>>>     frame #10: 0x0000000104d95fac
>>>> libpetsc.3.17.dylib`MatZeroRows(mat=0x00000001170c1270, numRows=1,
>>>> rows=0x000000016f912cb4, diag=1, x=0x0000000000000000,
>>>> b=0x0000000000000000) at matrix.c:5935:3
>>>>     frame #11: 0x000000010067d320
>>>> fo_acoustic_streaming_solver_2d`IBAMR::AcousticStreamingPETScMatUtilities::constructPatchLevelFOAcousticStreamingOp(mat=0x000000016f91c178,
>>>> omega=1, sound_speed=1, rho_idx=3, mu_idx=2, lambda_idx=4,
>>>> u_bc_coefs=0x000000016f91c3a8, data_time=NaN, num_dofs_per_proc=size=3,
>>>> u_dof_index_idx=27, p_dof_index_idx=28,
>>>> patch_level=Pointer<SAMRAI::hier::PatchLevel<2> > @ 0x000000016f914ed0,
>>>> mu_interp_type=VC_HARMONIC_INTERP) at
>>>> AcousticStreamingPETScMatUtilities.cpp:794:36
>>>>     frame #12: 0x0000000100694bdc
>>>> fo_acoustic_streaming_solver_2d`IBAMR::FOAcousticStreamingPETScLevelSolver::initializeSolverStateSpecialized(this=0x000000016f91c028,
>>>> x=0x000000016f91d788, (null)=0x000000016f91d690) at
>>>> FOAcousticStreamingPETScLevelSolver.cpp:149:5
>>>>     frame #13: 0x000000010083232c
>>>> fo_acoustic_streaming_solver_2d`IBTK::PETScLevelSolver::initializeSolverState(this=0x000000016f91c028,
>>>> x=0x000000016f91d788, b=0x000000016f91d690) at PETScLevelSolver.cpp:340
>>>> :5
>>>>     frame #14: 0x00000001004eb230
>>>> fo_acoustic_streaming_solver_2d`main(argc=11, argv=0x000000016f91f460) at
>>>> fo_acoustic_streaming_solver.cpp:400:22
>>>>     frame #15: 0x0000000189fbbf28 dyld`start + 2236
>>>>
>>>>
>>>> Any suggestions on how to avoid this barrier? Here are all the Mat
>>>> options I am using (in debug mode), in case that is helpful:
>>>>
>>>>
>>>> https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L453-L458
>>>>
>>>> Thanks,
>>>> --
>>>> --Amneet
>>
>> --
>> --Amneet
>
> --
> --Amneet

-- 
--Amneet