[petsc-users] MPI barrier issue using MatZeroRows
Amneet Bhalla
mail2amneet at gmail.com
Wed Nov 29 12:56:44 CST 2023
I am using PETSc 3.17.
On Wed, Nov 29, 2023 at 10:50 AM Barry Smith <bsmith at petsc.dev> wrote:
>
> What PETSc version are you using?
>
>
> On Nov 29, 2023, at 1:02 PM, Matthew Knepley <knepley at gmail.com> wrote:
>
> On Wed, Nov 29, 2023 at 12:30 PM Amneet Bhalla <mail2amneet at gmail.com>
> wrote:
>
>> OK, I added both options, but it still hangs. Here is the bt output from all three tasks:
>>
>
> It looks like two processes are calling AllReduce, but one is not. Is it
> possible that not all processes are calling MatZeroRows?
>
> Thanks,
>
> Matt
>
>
>> Task 1:
>>
>> amneetb at APSB-MacBook-Pro-16:~$ lldb -p 44691
>> (lldb) process attach --pid 44691
>> Process 44691 stopped
>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>> SIGSTOP
>> frame #0: 0x000000018a2d750c libsystem_kernel.dylib`__semwait_signal
>> + 8
>> libsystem_kernel.dylib`:
>> -> 0x18a2d750c <+8>: b.lo 0x18a2d752c ; <+40>
>> 0x18a2d7510 <+12>: pacibsp
>> 0x18a2d7514 <+16>: stp x29, x30, [sp, #-0x10]!
>> 0x18a2d7518 <+20>: mov x29, sp
>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>> Executable module set to
>> "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
>> Architecture set to: arm64-apple-macosx-.
>> (lldb) cont
>> Process 44691 resuming
>> Process 44691 stopped
>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>> SIGSTOP
>> frame #0: 0x000000010ba40b60 libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release
>> + 752
>> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release:
>> -> 0x10ba40b60 <+752>: add w8, w8, #0x1
>> 0x10ba40b64 <+756>: ldr w9, [x22]
>> 0x10ba40b68 <+760>: cmp w8, w9
>> 0x10ba40b6c <+764>: b.lt 0x10ba40b4c ; <+732>
>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>> (lldb) bt
>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>> SIGSTOP
>> * frame #0: 0x000000010ba40b60 libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release
>> + 752
>> frame #1: 0x000000010ba48528 libpmpi.12.dylib`MPIDI_POSIX_mpi_allreduce_release_gather
>> + 1088
>> frame #2: 0x000000010ba47964 libpmpi.12.dylib`MPIDI_Allreduce_intra_composition_gamma
>> + 368
>> frame #3: 0x000000010ba35e78 libpmpi.12.dylib`MPIR_Allreduce + 1588
>> frame #4: 0x0000000103f587dc libmpi.12.dylib`MPI_Allreduce + 2280
>> frame #5: 0x0000000106d67650 libpetsc.3.17.dylib`MatZeroRows_MPIAIJ(A=0x0000000105846470,
>> N=1, rows=0x000000016dbfa9f4, diag=1, x=0x0000000000000000,
>> b=0x0000000000000000) at mpiaij.c:827:3
>> frame #6: 0x0000000106aadfac libpetsc.3.17.dylib`MatZeroRows(mat=0x0000000105846470,
>> numRows=1, rows=0x000000016dbfa9f4, diag=1, x=0x0000000000000000,
>> b=0x0000000000000000) at matrix.c:5935:3
>> frame #7: 0x00000001023952d0 fo_acoustic_streaming_solver_2d`IBAMR::AcousticStreamingPETScMatUtilities::constructPatchLevelFOAcousticStreamingOp(mat=0x000000016dc04168,
>> omega=1, sound_speed=1, rho_idx=3, mu_idx=2, lambda_idx=4,
>> u_bc_coefs=0x000000016dc04398, data_time=NaN, num_dofs_per_proc=size=3,
>> u_dof_index_idx=27, p_dof_index_idx=28,
>> patch_level=Pointer<SAMRAI::hier::PatchLevel<2> > @ 0x000000016dbfcec0,
>> mu_interp_type=VC_HARMONIC_INTERP) at
>> AcousticStreamingPETScMatUtilities.cpp:799:36
>> frame #8: 0x00000001023acb8c fo_acoustic_streaming_solver_2d`IBAMR::FOAcousticStreamingPETScLevelSolver::initializeSolverStateSpecialized(this=0x000000016dc04018,
>> x=0x000000016dc05778, (null)=0x000000016dc05680) at
>> FOAcousticStreamingPETScLevelSolver.cpp:149:5
>> frame #9: 0x000000010254a2dc fo_acoustic_streaming_solver_2d`IBTK::PETScLevelSolver::initializeSolverState(this=0x000000016dc04018,
>> x=0x000000016dc05778, b=0x000000016dc05680) at PETScLevelSolver.cpp:340:5
>> frame #10: 0x0000000102202e5c fo_acoustic_streaming_solver_2d`main(argc=11,
>> argv=0x000000016dc07450) at fo_acoustic_streaming_solver.cpp:400:22
>> frame #11: 0x0000000189fbbf28 dyld`start + 2236
>> (lldb)
>>
>>
>> Task 2:
>>
>> amneetb at APSB-MacBook-Pro-16:~$ lldb -p 44692
>> (lldb) process attach --pid 44692
>> Process 44692 stopped
>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>> SIGSTOP
>> frame #0: 0x000000018a2d750c libsystem_kernel.dylib`__semwait_signal
>> + 8
>> libsystem_kernel.dylib`:
>> -> 0x18a2d750c <+8>: b.lo 0x18a2d752c ; <+40>
>> 0x18a2d7510 <+12>: pacibsp
>> 0x18a2d7514 <+16>: stp x29, x30, [sp, #-0x10]!
>> 0x18a2d7518 <+20>: mov x29, sp
>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>> Executable module set to
>> "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
>> Architecture set to: arm64-apple-macosx-.
>> (lldb) cont
>> Process 44692 resuming
>> Process 44692 stopped
>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>> SIGSTOP
>> frame #0: 0x000000010e5a022c libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather
>> + 516
>> libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather:
>> -> 0x10e5a022c <+516>: ldr x10, [x19, #0x4e8]
>> 0x10e5a0230 <+520>: cmp x9, x10
>> 0x10e5a0234 <+524>: b.hs 0x10e5a0254 ; <+556>
>> 0x10e5a0238 <+528>: add w8, w8, #0x1
>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>> (lldb) bt
>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>> SIGSTOP
>> * frame #0: 0x000000010e5a022c libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather
>> + 516
>> frame #1: 0x000000010e59fd14 libpmpi.12.dylib`MPIDI_SHM_mpi_barrier
>> + 224
>> frame #2: 0x000000010e59fb60 libpmpi.12.dylib`MPIDI_Barrier_intra_composition_alpha
>> + 44
>> frame #3: 0x000000010e585490 libpmpi.12.dylib`MPIR_Barrier + 900
>> frame #4: 0x0000000106ac5030 libmpi.12.dylib`MPI_Barrier + 684
>> frame #5: 0x0000000108e62638 libpetsc.3.17.dylib`PetscCommDuplicate(comm_in=1140850688,
>> comm_out=0x00000001408ae4b0, first_tag=0x00000001408ae4e4) at tagm.c:235:5
>> frame #6: 0x0000000108e6a910 libpetsc.3.17.dylib`PetscHeaderCreate_Private(h=0x00000001408ae470,
>> classid=1211228, class_name="KSP", descr="Krylov Method", mansec="KSP",
>> comm=1140850688, destroy=(libpetsc.3.17.dylib`KSPDestroy at itfunc.c:1418),
>> view=(libpetsc.3.17.dylib`KSPView at itcreate.c:113)) at inherit.c:62:3
>> frame #7: 0x000000010aa28010 libpetsc.3.17.dylib`KSPCreate(comm=1140850688,
>> inksp=0x000000016b0a4160) at itcreate.c:679:3
>> frame #8: 0x00000001050aa2f4 fo_acoustic_streaming_solver_2d`IBTK::PETScLevelSolver::initializeSolverState(this=0x000000016b0a4018,
>> x=0x000000016b0a5778, b=0x000000016b0a5680) at PETScLevelSolver.cpp:344:12
>> frame #9: 0x0000000104d62e5c fo_acoustic_streaming_solver_2d`main(argc=11,
>> argv=0x000000016b0a7450) at fo_acoustic_streaming_solver.cpp:400:22
>> frame #10: 0x0000000189fbbf28 dyld`start + 2236
>> (lldb)
>>
>>
>> Task 3:
>>
>> amneetb at APSB-MacBook-Pro-16:~$ lldb -p 44693
>> (lldb) process attach --pid 44693
>> Process 44693 stopped
>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>> SIGSTOP
>> frame #0: 0x000000018a2d750c libsystem_kernel.dylib`__semwait_signal
>> + 8
>> libsystem_kernel.dylib`:
>> -> 0x18a2d750c <+8>: b.lo 0x18a2d752c ; <+40>
>> 0x18a2d7510 <+12>: pacibsp
>> 0x18a2d7514 <+16>: stp x29, x30, [sp, #-0x10]!
>> 0x18a2d7518 <+20>: mov x29, sp
>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>> Executable module set to
>> "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
>> Architecture set to: arm64-apple-macosx-.
>> (lldb) cont
>> Process 44693 resuming
>> Process 44693 stopped
>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>> SIGSTOP
>> frame #0: 0x000000010e59c68c libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_gather
>> + 952
>> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_gather:
>> -> 0x10e59c68c <+952>: ldr w9, [x21]
>> 0x10e59c690 <+956>: cmp w8, w9
>> 0x10e59c694 <+960>: b.lt 0x10e59c670 ; <+924>
>> 0x10e59c698 <+964>: bl 0x10e59ce64 ;
>> MPID_Progress_test
>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>> (lldb) bt
>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>> SIGSTOP
>> * frame #0: 0x000000010e59c68c libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_gather
>> + 952
>> frame #1: 0x000000010e5a44bc libpmpi.12.dylib`MPIDI_POSIX_mpi_allreduce_release_gather
>> + 980
>> frame #2: 0x000000010e5a3964 libpmpi.12.dylib`MPIDI_Allreduce_intra_composition_gamma
>> + 368
>> frame #3: 0x000000010e591e78 libpmpi.12.dylib`MPIR_Allreduce + 1588
>> frame #4: 0x0000000106ab47dc libmpi.12.dylib`MPI_Allreduce + 2280
>> frame #5: 0x00000001098c3650 libpetsc.3.17.dylib`MatZeroRows_MPIAIJ(A=0x0000000136862270,
>> N=1, rows=0x000000016b09e9f4, diag=1, x=0x0000000000000000,
>> b=0x0000000000000000) at mpiaij.c:827:3
>> frame #6: 0x0000000109609fac libpetsc.3.17.dylib`MatZeroRows(mat=0x0000000136862270,
>> numRows=1, rows=0x000000016b09e9f4, diag=1, x=0x0000000000000000,
>> b=0x0000000000000000) at matrix.c:5935:3
>> frame #7: 0x0000000104ef12d0 fo_acoustic_streaming_solver_2d`IBAMR::AcousticStreamingPETScMatUtilities::constructPatchLevelFOAcousticStreamingOp(mat=0x000000016b0a8168,
>> omega=1, sound_speed=1, rho_idx=3, mu_idx=2, lambda_idx=4,
>> u_bc_coefs=0x000000016b0a8398, data_time=NaN, num_dofs_per_proc=size=3,
>> u_dof_index_idx=27, p_dof_index_idx=28,
>> patch_level=Pointer<SAMRAI::hier::PatchLevel<2> > @ 0x000000016b0a0ec0,
>> mu_interp_type=VC_HARMONIC_INTERP) at
>> AcousticStreamingPETScMatUtilities.cpp:799:36
>> frame #8: 0x0000000104f08b8c fo_acoustic_streaming_solver_2d`IBAMR::FOAcousticStreamingPETScLevelSolver::initializeSolverStateSpecialized(this=0x000000016b0a8018,
>> x=0x000000016b0a9778, (null)=0x000000016b0a9680) at
>> FOAcousticStreamingPETScLevelSolver.cpp:149:5
>> frame #9: 0x00000001050a62dc fo_acoustic_streaming_solver_2d`IBTK::PETScLevelSolver::initializeSolverState(this=0x000000016b0a8018,
>> x=0x000000016b0a9778, b=0x000000016b0a9680) at PETScLevelSolver.cpp:340:5
>> frame #10: 0x0000000104d5ee5c fo_acoustic_streaming_solver_2d`main(argc=11,
>> argv=0x000000016b0ab450) at fo_acoustic_streaming_solver.cpp:400:22
>> frame #11: 0x0000000189fbbf28 dyld`start + 2236
>> (lldb)
>>
>> On Wed, Nov 29, 2023 at 7:22 AM Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>> On Nov 29, 2023, at 1:16 AM, Amneet Bhalla <mail2amneet at gmail.com>
>>> wrote:
>>>
>>> BTW, I think you meant using MatSetOption(mat,
>>> MAT_NO_OFF_PROC_ZERO_ROWS, PETSC_TRUE)
>>>
>>>
>>> Yes
>>>
>>> instead of MatSetOption(mat, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE)?
>>>
>>>
>>> Please try setting both flags.
>>>
>>> However, that also did not help to overcome the MPI Barrier issue.
>>>
>>>
>>> If there is still a problem, please attach the debugger to every MPI
>>> process once they hang and send the bt output from each of them. That way
>>> we can see where each MPI process is stuck.
>>>
>>>
>>>
>>> On Tue, Nov 28, 2023 at 9:57 PM Amneet Bhalla <mail2amneet at gmail.com>
>>> wrote:
>>>
>>>> I added that option, but the code still gets stuck at the same
>>>> MatZeroRows call with 3 processors.
>>>>
>>>> On Tue, Nov 28, 2023 at 7:23 PM Amneet Bhalla <mail2amneet at gmail.com>
>>>> wrote:
>>>>
>>>
>>>>>
>>>>> On Tue, Nov 28, 2023 at 6:42 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>>
>>>>>>
>>>>>> for (int comp = 0; comp < 2; ++comp)
>>>>>> {
>>>>>>     .......
>>>>>>     for (Box<NDIM>::Iterator bc(bc_coef_box); bc; bc++)
>>>>>>     {
>>>>>>         ......
>>>>>>         if (IBTK::abs_equal_eps(b, 0.0))
>>>>>>         {
>>>>>>             const double diag_value = a;
>>>>>>             ierr = MatZeroRows(mat, 1, &u_dof_index, diag_value, NULL, NULL);
>>>>>>             IBTK_CHKERRQ(ierr);
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> In general, this code will not work in parallel: MatZeroRows is collective,
>>>>>> and here each process calls it a different number of times, so the calls
>>>>>> cannot match up across the processes.
>>>>>>
>>>>>> If u_dof_index is always local to the current process, you can call
>>>>>> MatSetOption(mat, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE) above the for loop,
>>>>>> and MatZeroRows will then not synchronize across the MPI processes (since
>>>>>> it does not need to and you told it that).
>>>>>>
>>>>>
>>>>> Yes, u_dof_index is going to be local and I put a check on it a few
>>>>> lines before calling MatZeroRows.
>>>>>
>>>>> Can MatSetOption() be called after the matrix has been assembled?
>>>>>
>>>>>
>>>>>> If u_dof_index will not always be local, then each process needs to
>>>>>> collect all of its u_dof_index values in an array and call MatZeroRows()
>>>>>> once after the loop, so that it can exchange the needed information with
>>>>>> the other MPI processes and get the row indices to the right place.
>>>>>>
>>>>>
>>>>>> Barry
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Nov 28, 2023, at 6:44 PM, Amneet Bhalla <mail2amneet at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Folks,
>>>>>>
>>>>>> I am using MatZeroRows() to set Dirichlet boundary conditions. This
>>>>>> works fine for the serial run and the solver produces correct results
>>>>>> (verified through analytical solution). However, when I run the case in
>>>>>> parallel, the simulation gets stuck at MatZeroRows(). My understanding is
>>>>>> that this function needs to be called after the MatAssemblyBegin{End}() has
>>>>>> been called, and should be called by all processors. Here is the part of
>>>>>> the code that calls MatZeroRows() after the matrix has been assembled:
>>>>>>
>>>>>>
>>>>>> https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L724-L801
>>>>>>
>>>>>>
>>>>>> I ran the parallel code (on 3 processors) in the debugger
>>>>>> (-start_in_debugger). Below is the call stack from the processor that gets
>>>>>> stuck
>>>>>>
>>>>>> amneetb at APSB-MBP-16:~$ lldb -p 4307
>>>>>> (lldb) process attach --pid 4307
>>>>>> Process 4307 stopped
>>>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>>>>> SIGSTOP
>>>>>> frame #0: 0x000000018a2d750c libsystem_kernel.dylib`__semwait_signal
>>>>>> + 8
>>>>>> libsystem_kernel.dylib`:
>>>>>> -> 0x18a2d750c <+8>: b.lo 0x18a2d752c ; <+40>
>>>>>> 0x18a2d7510 <+12>: pacibsp
>>>>>> 0x18a2d7514 <+16>: stp x29, x30, [sp, #-0x10]!
>>>>>> 0x18a2d7518 <+20>: mov x29, sp
>>>>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>>>>> Executable module set to
>>>>>> "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
>>>>>> Architecture set to: arm64-apple-macosx-.
>>>>>> (lldb) cont
>>>>>> Process 4307 resuming
>>>>>> Process 4307 stopped
>>>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>>>>> SIGSTOP
>>>>>> frame #0: 0x0000000109d281b8 libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather
>>>>>> + 400
>>>>>> libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather:
>>>>>> -> 0x109d281b8 <+400>: ldr w9, [x24]
>>>>>> 0x109d281bc <+404>: cmp w8, w9
>>>>>> 0x109d281c0 <+408>: b.lt 0x109d281a0 ; <+376>
>>>>>> 0x109d281c4 <+412>: bl 0x109d28e64 ;
>>>>>> MPID_Progress_test
>>>>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>>>>> (lldb) bt
>>>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>>>>> SIGSTOP
>>>>>> * frame #0: 0x0000000109d281b8 libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather
>>>>>> + 400
>>>>>> frame #1: 0x0000000109d27d14 libpmpi.12.dylib`MPIDI_SHM_mpi_barrier
>>>>>> + 224
>>>>>> frame #2: 0x0000000109d27b60 libpmpi.12.dylib`MPIDI_Barrier_intra_composition_alpha
>>>>>> + 44
>>>>>> frame #3: 0x0000000109d0d490 libpmpi.12.dylib`MPIR_Barrier + 900
>>>>>> frame #4: 0x000000010224d030 libmpi.12.dylib`MPI_Barrier + 684
>>>>>> frame #5: 0x00000001045ea638 libpetsc.3.17.dylib`PetscCommDuplicate(comm_in=-2080374782,
>>>>>> comm_out=0x000000010300bcb0, first_tag=0x000000010300bce4) at tagm.c:235
>>>>>>
>>>>>>
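Barry's point in the quoted thread (a per-row MatZeroRows call inside the boundary loop gives each rank a different number of collective calls, while collecting the rows and calling once does not) can be sketched without PETSc or MPI. This is only a toy model under stated assumptions: BoundaryDof, ZeroRowsFn, zeroRowsOneByOne and zeroRowsBatched are hypothetical names, the callable stands in for a collective routine such as MatZeroRows, and the 1.0e-12 tolerance is an assumed replacement for IBTK::abs_equal_eps.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one boundary location visited by the bc loop.
struct BoundaryDof
{
    int row;  // global row index (u_dof_index in the thread)
    double a; // coefficient used as the diagonal value in the thread
    double b; // coefficient; b == 0 marks a Dirichlet row
};

// Stand-in for a collective routine such as MatZeroRows: every rank must
// call it the same number of times for the calls to match up.
using ZeroRowsFn = std::function<void(const std::vector<int>&)>;

// Returns true when this location is treated as a Dirichlet row.
// The tolerance is an assumption, not the one IBTK uses.
bool isDirichlet(const BoundaryDof& d)
{
    return std::abs(d.b) < 1.0e-12;
}

// Pattern from the quoted loop: one collective call per Dirichlet row.
// Returns how many collective calls this rank made.
std::size_t zeroRowsOneByOne(const std::vector<BoundaryDof>& dofs, const ZeroRowsFn& zero_rows)
{
    std::size_t calls = 0;
    for (const BoundaryDof& d : dofs)
    {
        if (isDirichlet(d))
        {
            zero_rows(std::vector<int>{ d.row });
            ++calls;
        }
    }
    return calls;
}

// Batched pattern: collect the rows first, then make exactly one
// collective call, even on a rank that owns no Dirichlet rows.
std::size_t zeroRowsBatched(const std::vector<BoundaryDof>& dofs, const ZeroRowsFn& zero_rows)
{
    std::vector<int> rows;
    for (const BoundaryDof& d : dofs)
    {
        if (isDirichlet(d)) rows.push_back(d.row);
    }
    zero_rows(rows);
    return 1;
}
```

With per-rank lists of different lengths, zeroRowsOneByOne returns a different call count on each rank, which is the situation that hangs; zeroRowsBatched always returns 1. In real PETSc code the callable would presumably wrap a single MatZeroRows(mat, n, rows, diag, NULL, NULL) call per rank. Note that MatZeroRows takes one diag value for all rows, so the per-row diag_value = a from the quoted loop would need separate handling; how best to do that is not settled in this thread.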