[petsc-users] MPI barrier issue using MatZeroRows

Amneet Bhalla mail2amneet at gmail.com
Wed Nov 29 12:59:00 CST 2023


Actually it is 3.17.5

On Wed, Nov 29, 2023 at 10:56 AM Amneet Bhalla <mail2amneet at gmail.com>
wrote:

> I am using 3.17
>
> On Wed, Nov 29, 2023 at 10:50 AM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>>    What PETSc version are you using?
>>
>>
>> On Nov 29, 2023, at 1:02 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>
>> On Wed, Nov 29, 2023 at 12:30 PM Amneet Bhalla <mail2amneet at gmail.com>
>> wrote:
>>
>>> Ok, I added both, but it still hangs. Here is the bt from all three tasks:
>>>
>>
>> It looks like two processes are calling MPI_Allreduce, but one is not. Is
>> every process calling MatZeroRows?
>>
>>   Thanks,
>>
>>      Matt
>>
>>
>>> Task 1:
>>>
>>> amneetb at APSB-MacBook-Pro-16:~$ lldb  -p 44691
>>> (lldb) process attach --pid 44691
>>> Process 44691 stopped
>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>> SIGSTOP
>>>     frame #0: 0x000000018a2d750c libsystem_kernel.dylib`__semwait_signal
>>> + 8
>>> libsystem_kernel.dylib`:
>>> ->  0x18a2d750c <+8>:  b.lo   0x18a2d752c               ; <+40>
>>>     0x18a2d7510 <+12>: pacibsp
>>>     0x18a2d7514 <+16>: stp    x29, x30, [sp, #-0x10]!
>>>     0x18a2d7518 <+20>: mov    x29, sp
>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>> Executable module set to
>>> "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
>>> Architecture set to: arm64-apple-macosx-.
>>> (lldb) cont
>>> Process 44691 resuming
>>> Process 44691 stopped
>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>> SIGSTOP
>>>     frame #0: 0x000000010ba40b60 libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752
>>> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release:
>>> ->  0x10ba40b60 <+752>: add    w8, w8, #0x1
>>>     0x10ba40b64 <+756>: ldr    w9, [x22]
>>>     0x10ba40b68 <+760>: cmp    w8, w9
>>>     0x10ba40b6c <+764>: b.lt   0x10ba40b4c               ; <+732>
>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>> (lldb) bt
>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>> SIGSTOP
>>>   * frame #0: 0x000000010ba40b60 libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752
>>>     frame #1: 0x000000010ba48528 libpmpi.12.dylib`MPIDI_POSIX_mpi_allreduce_release_gather + 1088
>>>     frame #2: 0x000000010ba47964 libpmpi.12.dylib`MPIDI_Allreduce_intra_composition_gamma + 368
>>>     frame #3: 0x000000010ba35e78 libpmpi.12.dylib`MPIR_Allreduce + 1588
>>>     frame #4: 0x0000000103f587dc libmpi.12.dylib`MPI_Allreduce + 2280
>>>     frame #5: 0x0000000106d67650 libpetsc.3.17.dylib`MatZeroRows_MPIAIJ(A=0x0000000105846470,
>>> N=1, rows=0x000000016dbfa9f4, diag=1, x=0x0000000000000000,
>>> b=0x0000000000000000) at mpiaij.c:827:3
>>>     frame #6: 0x0000000106aadfac libpetsc.3.17.dylib`MatZeroRows(mat=0x0000000105846470,
>>> numRows=1, rows=0x000000016dbfa9f4, diag=1, x=0x0000000000000000,
>>> b=0x0000000000000000) at matrix.c:5935:3
>>>     frame #7: 0x00000001023952d0 fo_acoustic_streaming_solver_2d`IBAMR::AcousticStreamingPETScMatUtilities::constructPatchLevelFOAcousticStreamingOp(mat=0x000000016dc04168,
>>> omega=1, sound_speed=1, rho_idx=3, mu_idx=2, lambda_idx=4,
>>> u_bc_coefs=0x000000016dc04398, data_time=NaN, num_dofs_per_proc=size=3,
>>> u_dof_index_idx=27, p_dof_index_idx=28,
>>> patch_level=Pointer<SAMRAI::hier::PatchLevel<2> > @ 0x000000016dbfcec0,
>>> mu_interp_type=VC_HARMONIC_INTERP) at
>>> AcousticStreamingPETScMatUtilities.cpp:799:36
>>>     frame #8: 0x00000001023acb8c fo_acoustic_streaming_solver_2d`IBAMR::FOAcousticStreamingPETScLevelSolver::initializeSolverStateSpecialized(this=0x000000016dc04018,
>>> x=0x000000016dc05778, (null)=0x000000016dc05680) at
>>> FOAcousticStreamingPETScLevelSolver.cpp:149:5
>>>     frame #9: 0x000000010254a2dc fo_acoustic_streaming_solver_2d`IBTK::PETScLevelSolver::initializeSolverState(this=0x000000016dc04018,
>>> x=0x000000016dc05778, b=0x000000016dc05680) at PETScLevelSolver.cpp:340:5
>>>     frame #10: 0x0000000102202e5c fo_acoustic_streaming_solver_2d`main(argc=11,
>>> argv=0x000000016dc07450) at fo_acoustic_streaming_solver.cpp:400:22
>>>     frame #11: 0x0000000189fbbf28 dyld`start + 2236
>>> (lldb)
>>>
>>>
>>> Task 2:
>>>
>>> amneetb at APSB-MacBook-Pro-16:~$ lldb  -p 44692
>>> (lldb) process attach --pid 44692
>>> Process 44692 stopped
>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>> SIGSTOP
>>>     frame #0: 0x000000018a2d750c libsystem_kernel.dylib`__semwait_signal
>>> + 8
>>> libsystem_kernel.dylib`:
>>> ->  0x18a2d750c <+8>:  b.lo   0x18a2d752c               ; <+40>
>>>     0x18a2d7510 <+12>: pacibsp
>>>     0x18a2d7514 <+16>: stp    x29, x30, [sp, #-0x10]!
>>>     0x18a2d7518 <+20>: mov    x29, sp
>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>> Executable module set to
>>> "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
>>> Architecture set to: arm64-apple-macosx-.
>>> (lldb) cont
>>> Process 44692 resuming
>>> Process 44692 stopped
>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>> SIGSTOP
>>>     frame #0: 0x000000010e5a022c libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather + 516
>>> libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather:
>>> ->  0x10e5a022c <+516>: ldr    x10, [x19, #0x4e8]
>>>     0x10e5a0230 <+520>: cmp    x9, x10
>>>     0x10e5a0234 <+524>: b.hs   0x10e5a0254               ; <+556>
>>>     0x10e5a0238 <+528>: add    w8, w8, #0x1
>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>> (lldb) bt
>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>> SIGSTOP
>>>   * frame #0: 0x000000010e5a022c libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather + 516
>>>     frame #1: 0x000000010e59fd14 libpmpi.12.dylib`MPIDI_SHM_mpi_barrier + 224
>>>     frame #2: 0x000000010e59fb60 libpmpi.12.dylib`MPIDI_Barrier_intra_composition_alpha + 44
>>>     frame #3: 0x000000010e585490 libpmpi.12.dylib`MPIR_Barrier + 900
>>>     frame #4: 0x0000000106ac5030 libmpi.12.dylib`MPI_Barrier + 684
>>>     frame #5: 0x0000000108e62638 libpetsc.3.17.dylib`PetscCommDuplicate(comm_in=1140850688,
>>> comm_out=0x00000001408ae4b0, first_tag=0x00000001408ae4e4) at tagm.c:235:5
>>>     frame #6: 0x0000000108e6a910 libpetsc.3.17.dylib`PetscHeaderCreate_Private(h=0x00000001408ae470,
>>> classid=1211228, class_name="KSP", descr="Krylov Method", mansec="KSP",
>>> comm=1140850688, destroy=(libpetsc.3.17.dylib`KSPDestroy at itfunc.c:1418),
>>> view=(libpetsc.3.17.dylib`KSPView at itcreate.c:113)) at inherit.c:62:3
>>>     frame #7: 0x000000010aa28010 libpetsc.3.17.dylib`KSPCreate(comm=1140850688,
>>> inksp=0x000000016b0a4160) at itcreate.c:679:3
>>>     frame #8: 0x00000001050aa2f4 fo_acoustic_streaming_solver_2d`IBTK::PETScLevelSolver::initializeSolverState(this=0x000000016b0a4018,
>>> x=0x000000016b0a5778, b=0x000000016b0a5680) at PETScLevelSolver.cpp:344:12
>>>     frame #9: 0x0000000104d62e5c fo_acoustic_streaming_solver_2d`main(argc=11,
>>> argv=0x000000016b0a7450) at fo_acoustic_streaming_solver.cpp:400:22
>>>     frame #10: 0x0000000189fbbf28 dyld`start + 2236
>>> (lldb)
>>>
>>>
>>> Task 3:
>>>
>>> amneetb at APSB-MacBook-Pro-16:~$ lldb  -p 44693
>>> (lldb) process attach --pid 44693
>>> Process 44693 stopped
>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>> SIGSTOP
>>>     frame #0: 0x000000018a2d750c libsystem_kernel.dylib`__semwait_signal
>>> + 8
>>> libsystem_kernel.dylib`:
>>> ->  0x18a2d750c <+8>:  b.lo   0x18a2d752c               ; <+40>
>>>     0x18a2d7510 <+12>: pacibsp
>>>     0x18a2d7514 <+16>: stp    x29, x30, [sp, #-0x10]!
>>>     0x18a2d7518 <+20>: mov    x29, sp
>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>> Executable module set to
>>> "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
>>> Architecture set to: arm64-apple-macosx-.
>>> (lldb) cont
>>> Process 44693 resuming
>>> Process 44693 stopped
>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>> SIGSTOP
>>>     frame #0: 0x000000010e59c68c libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_gather + 952
>>> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_gather:
>>> ->  0x10e59c68c <+952>: ldr    w9, [x21]
>>>     0x10e59c690 <+956>: cmp    w8, w9
>>>     0x10e59c694 <+960>: b.lt   0x10e59c670               ; <+924>
>>>     0x10e59c698 <+964>: bl     0x10e59ce64               ;
>>> MPID_Progress_test
>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>> (lldb) bt
>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>> SIGSTOP
>>>   * frame #0: 0x000000010e59c68c libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_gather + 952
>>>     frame #1: 0x000000010e5a44bc libpmpi.12.dylib`MPIDI_POSIX_mpi_allreduce_release_gather + 980
>>>     frame #2: 0x000000010e5a3964 libpmpi.12.dylib`MPIDI_Allreduce_intra_composition_gamma + 368
>>>     frame #3: 0x000000010e591e78 libpmpi.12.dylib`MPIR_Allreduce + 1588
>>>     frame #4: 0x0000000106ab47dc libmpi.12.dylib`MPI_Allreduce + 2280
>>>     frame #5: 0x00000001098c3650 libpetsc.3.17.dylib`MatZeroRows_MPIAIJ(A=0x0000000136862270,
>>> N=1, rows=0x000000016b09e9f4, diag=1, x=0x0000000000000000,
>>> b=0x0000000000000000) at mpiaij.c:827:3
>>>     frame #6: 0x0000000109609fac libpetsc.3.17.dylib`MatZeroRows(mat=0x0000000136862270,
>>> numRows=1, rows=0x000000016b09e9f4, diag=1, x=0x0000000000000000,
>>> b=0x0000000000000000) at matrix.c:5935:3
>>>     frame #7: 0x0000000104ef12d0 fo_acoustic_streaming_solver_2d`IBAMR::AcousticStreamingPETScMatUtilities::constructPatchLevelFOAcousticStreamingOp(mat=0x000000016b0a8168,
>>> omega=1, sound_speed=1, rho_idx=3, mu_idx=2, lambda_idx=4,
>>> u_bc_coefs=0x000000016b0a8398, data_time=NaN, num_dofs_per_proc=size=3,
>>> u_dof_index_idx=27, p_dof_index_idx=28,
>>> patch_level=Pointer<SAMRAI::hier::PatchLevel<2> > @ 0x000000016b0a0ec0,
>>> mu_interp_type=VC_HARMONIC_INTERP) at
>>> AcousticStreamingPETScMatUtilities.cpp:799:36
>>>     frame #8: 0x0000000104f08b8c fo_acoustic_streaming_solver_2d`IBAMR::FOAcousticStreamingPETScLevelSolver::initializeSolverStateSpecialized(this=0x000000016b0a8018,
>>> x=0x000000016b0a9778, (null)=0x000000016b0a9680) at
>>> FOAcousticStreamingPETScLevelSolver.cpp:149:5
>>>     frame #9: 0x00000001050a62dc fo_acoustic_streaming_solver_2d`IBTK::PETScLevelSolver::initializeSolverState(this=0x000000016b0a8018,
>>> x=0x000000016b0a9778, b=0x000000016b0a9680) at PETScLevelSolver.cpp:340:5
>>>     frame #10: 0x0000000104d5ee5c fo_acoustic_streaming_solver_2d`main(argc=11,
>>> argv=0x000000016b0ab450) at fo_acoustic_streaming_solver.cpp:400:22
>>>     frame #11: 0x0000000189fbbf28 dyld`start + 2236
>>> (lldb)
>>>
>>
>>>
>>> On Wed, Nov 29, 2023 at 7:22 AM Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>
>>>>
>>>> On Nov 29, 2023, at 1:16 AM, Amneet Bhalla <mail2amneet at gmail.com>
>>>> wrote:
>>>>
>>>> BTW, I think you meant using MatSetOption(mat,
>>>> *MAT_NO_OFF_PROC_ZERO_ROWS*, PETSC_TRUE)
>>>>
>>>>
>>>> Yes
>>>>
>>>>  instead of MatSetOption(mat, *MAT_NO_OFF_PROC_ENTRIES*, PETSC_TRUE) ??
>>>>
>>>>
>>>>   Please try setting both flags.
>>>>
>>>>  However, that also did not help to overcome the MPI Barrier issue.
>>>>
>>>>
>>>>   If there is still a problem, please trap all the MPI processes in the
>>>> debugger when they hang and send the output of bt from each of them. This
>>>> way we can see where the different MPI processes are stuck.
>>>>
>>>>
>>>>
>>>> On Tue, Nov 28, 2023 at 9:57 PM Amneet Bhalla <mail2amneet at gmail.com>
>>>> wrote:
>>>>
>>>>> I added that option, but the code still gets stuck at the same
>>>>> MatZeroRows call with 3 processors.
>>>>>
>>>>> On Tue, Nov 28, 2023 at 7:23 PM Amneet Bhalla <mail2amneet at gmail.com>
>>>>> wrote:
>>>>>
>>>>
>>>>>>
>>>>>> On Tue, Nov 28, 2023 at 6:42 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>>>
>>>>>>>
>>>>>>> for (int comp = 0; comp < 2; ++comp)
>>>>>>> {
>>>>>>>     .......
>>>>>>>     for (Box<NDIM>::Iterator bc(bc_coef_box); bc; bc++)
>>>>>>>     {
>>>>>>>         ......
>>>>>>>         if (IBTK::abs_equal_eps(b, 0.0))
>>>>>>>         {
>>>>>>>             const double diag_value = a;
>>>>>>>             ierr = MatZeroRows(mat, 1, &u_dof_index, diag_value, NULL, NULL);
>>>>>>>             IBTK_CHKERRQ(ierr);
>>>>>>>         }
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> In general, this code will not work because each process calls
>>>>>>> MatZeroRows a different number of times, so the calls cannot match up
>>>>>>> across all the processes.
>>>>>>>
>>>>>>> If u_dof_index is always local to the current process, you can call
>>>>>>> MatSetOption(mat, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE) above the for loop,
>>>>>>> and the MatZeroRows will not synchronize across the MPI processes (since
>>>>>>> it does not need to and you told it that).
>>>>>>>
>>>>>>
>>>>>> Yes, u_dof_index is going to be local and I put a check on it a few
>>>>>> lines before calling MatZeroRows.
>>>>>>
>>>>>> Can MatSetOption() be called after the matrix has been assembled?
>>>>>>
>>>>>>
>>>>>>> If the u_dof_index will not always be local, then you need, on each
>>>>>>> process, to collect all of that process's u_dof_index values in an array
>>>>>>> and then call MatZeroRows() once after the loop, so it can exchange the
>>>>>>> needed information with the other MPI processes to get the row indices to
>>>>>>> the right place.
>>>>>>>
>>>>>>
>>>>>>> Barry
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Nov 28, 2023, at 6:44 PM, Amneet Bhalla <mail2amneet at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Folks,
>>>>>>>
>>>>>>> I am using MatZeroRows() to set Dirichlet boundary conditions. This
>>>>>>> works fine for the serial run and the solver produces correct results
>>>>>>> (verified through analytical solution). However, when I run the case in
>>>>>>> parallel, the simulation gets stuck at MatZeroRows(). My understanding is
>>>>>>> that this function needs to be called after the MatAssemblyBegin{End}() has
>>>>>>> been called, and should be called by all processors. Here is the part of
>>>>>>> the code that calls MatZeroRows() after the matrix has been assembled:
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L724-L801
>>>>>>>
>>>>>>>
>>>>>>> I ran the parallel code (on 3 processors) in the debugger
>>>>>>> (-start_in_debugger). Below is the call stack from the processor that gets
>>>>>>> stuck:
>>>>>>>
>>>>>>> amneetb at APSB-MBP-16:~$ lldb  -p 4307
>>>>>>> (lldb) process attach --pid 4307
>>>>>>> Process 4307 stopped
>>>>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>>>>>> SIGSTOP
>>>>>>>     frame #0: 0x000000018a2d750c libsystem_kernel.dylib`__semwait_signal
>>>>>>> + 8
>>>>>>> libsystem_kernel.dylib`:
>>>>>>> ->  0x18a2d750c <+8>:  b.lo   0x18a2d752c               ; <+40>
>>>>>>>     0x18a2d7510 <+12>: pacibsp
>>>>>>>     0x18a2d7514 <+16>: stp    x29, x30, [sp, #-0x10]!
>>>>>>>     0x18a2d7518 <+20>: mov    x29, sp
>>>>>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>>>>>> Executable module set to
>>>>>>> "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
>>>>>>> Architecture set to: arm64-apple-macosx-.
>>>>>>> (lldb) cont
>>>>>>> Process 4307 resuming
>>>>>>> Process 4307 stopped
>>>>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>>>>>> SIGSTOP
>>>>>>>
>>>>>>>     frame #0: 0x0000000109d281b8 libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather + 400
>>>>>>> libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather:
>>>>>>> ->  0x109d281b8 <+400>: ldr    w9, [x24]
>>>>>>>     0x109d281bc <+404>: cmp    w8, w9
>>>>>>>     0x109d281c0 <+408>: b.lt   0x109d281a0               ; <+376>
>>>>>>>     0x109d281c4 <+412>: bl     0x109d28e64               ;
>>>>>>> MPID_Progress_test
>>>>>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>>>>>> (lldb) bt
>>>>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>>>>>> SIGSTOP
>>>>>>>   * frame #0: 0x0000000109d281b8 libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather + 400
>>>>>>>     frame #1: 0x0000000109d27d14 libpmpi.12.dylib`MPIDI_SHM_mpi_barrier + 224
>>>>>>>     frame #2: 0x0000000109d27b60 libpmpi.12.dylib`MPIDI_Barrier_intra_composition_alpha + 44
>>>>>>>     frame #3: 0x0000000109d0d490 libpmpi.12.dylib`MPIR_Barrier + 900
>>>>>>>     frame #4: 0x000000010224d030 libmpi.12.dylib`MPI_Barrier + 684
>>>>>>>     frame #5: 0x00000001045ea638 libpetsc.3.17.dylib`PetscCommDuplicate(comm_in=-2080374782,
>>>>>>> comm_out=0x000000010300bcb0, first_tag=0x000000010300bce4) at tagm.c:235
>>>>>>>
>>>>>>>

-- 
--Amneet
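
For reference, the second suggestion quoted above (collect the row indices on each process, then call MatZeroRows() once after the loop) could look roughly like the sketch below. It is only an illustration under stated assumptions, not the IBAMR code: BoundaryDof, zeroDirichletRowsOnce, and the zero_rows callback are hypothetical names introduced so the example compiles without PETSc. In real code the callback would wrap a single MatZeroRows(mat, n, rows, diag, NULL, NULL) call. Since MatZeroRows takes one diag value for all listed rows, the sketch also assumes a common diagonal value is acceptable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one boundary location visited by the loop.
struct BoundaryDof
{
    int row;           // global row index (u_dof_index in the thread)
    bool is_dirichlet; // true when the "b" coefficient is numerically zero
};

// Collect every Dirichlet row first, then invoke the row-zeroing routine
// exactly once, even if this process found no rows. Returns the number of
// rows that were passed to the routine.
std::size_t zeroDirichletRowsOnce(
    const std::vector<BoundaryDof>& dofs,
    const std::function<void(const std::vector<int>&)>& zero_rows)
{
    assert(static_cast<bool>(zero_rows)); // a callable must be supplied

    std::vector<int> rows;
    for (const BoundaryDof& dof : dofs)
    {
        if (dof.is_dirichlet) rows.push_back(dof.row);
    }

    // Single call per process. With PETSc this is where one would call
    // MatZeroRows(mat, (PetscInt)rows.size(), rows.data(), diag, NULL, NULL),
    // assuming PetscInt is a 32-bit int in the build.
    zero_rows(rows);
    return rows.size();
}
```

Because the callback runs once per process regardless of how many boundary rows that process owns, every rank reaches the collective call the same number of times, which is the property the per-row loop lacks.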