[petsc-users] MPI barrier issue using MatZeroRows
Barry Smith
bsmith at petsc.dev
Wed Nov 29 09:21:47 CST 2023
> On Nov 29, 2023, at 1:16 AM, Amneet Bhalla <mail2amneet at gmail.com> wrote:
>
> BTW, I think you meant using MatSetOption(mat, MAT_NO_OFF_PROC_ZERO_ROWS, PETSC_TRUE)
Yes
> instead of MatSetOption(mat, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE) ??
Please try setting both flags.
> However, that also did not help to overcome the MPI Barrier issue.
If there is still a problem, please trap all the MPI processes in the debugger when they hang and send the output of bt from each of them. That way
we can see the different places where the different MPI processes are stuck.
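For reference, a minimal sketch of setting both flags (this assumes mat is your already-created matrix and that both calls happen before the loop that calls MatZeroRows()):

    ierr = MatSetOption(mat, MAT_NO_OFF_PROC_ZERO_ROWS, PETSC_TRUE); // every row passed to MatZeroRows() is owned by this rank
    IBTK_CHKERRQ(ierr);
    ierr = MatSetOption(mat, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE);   // no entries are set into rows owned by other ranks
    IBTK_CHKERRQ(ierr);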
>
> On Tue, Nov 28, 2023 at 9:57 PM Amneet Bhalla <mail2amneet at gmail.com> wrote:
>> I added that option, but the code still gets stuck at the same MatZeroRows call with 3 processors.
>>
>> On Tue, Nov 28, 2023 at 7:23 PM Amneet Bhalla <mail2amneet at gmail.com> wrote:
>>>
>>>
>>> On Tue, Nov 28, 2023 at 6:42 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>
>>>> for (int comp = 0; comp < 2; ++comp)
>>>> {
>>>>     .......
>>>>     for (Box<NDIM>::Iterator bc(bc_coef_box); bc; bc++)
>>>>     {
>>>>         ......
>>>>         if (IBTK::abs_equal_eps(b, 0.0))
>>>>         {
>>>>             const double diag_value = a;
>>>>             ierr = MatZeroRows(mat, 1, &u_dof_index, diag_value, NULL, NULL);
>>>>             IBTK_CHKERRQ(ierr);
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> In general, this code will not work because each process may call MatZeroRows a different number of times, so the calls cannot match up across the processes.
>>>>
>>>> If u_dof_index is always local to the current process, you can call MatSetOption(mat, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE) above the for loop and
>>>> the MatZeroRows will not synchronize across the MPI processes (since it does not need to, and you have told it so).
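>>>> A sketch of that placement, reusing the names from your snippet (illustrative only, not a drop-in):
>>>>
>>>>     ierr = MatSetOption(mat, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE); // set once, before the loops over components and boxes
>>>>     IBTK_CHKERRQ(ierr);
>>>>     // ... then each per-row call inside the box loop needs no synchronization,
>>>>     // provided u_dof_index is always a row owned by this rank:
>>>>     ierr = MatZeroRows(mat, 1, &u_dof_index, diag_value, NULL, NULL);
>>>>     IBTK_CHKERRQ(ierr);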
>>>
>>> Yes, u_dof_index is going to be local and I put a check on it a few lines before calling MatZeroRows.
>>>
>>> Can MatSetOption() be called after the matrix has been assembled?
>>>
>>>>
>>>> If u_dof_index will not always be local, then on each process you need to collect all of that process's u_dof_index values in an array and call MatZeroRows()
>>>> once after the loop, so that it can exchange the needed information with the other MPI processes and get the row indices to the right place.
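>>>> For example, something along these lines (a sketch only; it assumes a single diag_value applies to every zeroed row, and zero_rows is just a placeholder name):
>>>>
>>>>     std::vector<PetscInt> zero_rows; // rows this rank wants zeroed; may be empty on some ranks
>>>>     for (int comp = 0; comp < 2; ++comp)
>>>>     {
>>>>         for (Box<NDIM>::Iterator bc(bc_coef_box); bc; bc++)
>>>>         {
>>>>             if (IBTK::abs_equal_eps(b, 0.0)) zero_rows.push_back(u_dof_index);
>>>>         }
>>>>     }
>>>>     // every rank makes exactly one collective call, even if its list is empty
>>>>     ierr = MatZeroRows(mat, static_cast<PetscInt>(zero_rows.size()),
>>>>                        zero_rows.empty() ? NULL : zero_rows.data(), diag_value, NULL, NULL);
>>>>     IBTK_CHKERRQ(ierr);
>>>>
>>>> With the rows batched like this, MatZeroRows() can do the needed communication itself, so the options above are not required for correctness.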
>>>>
>>>> Barry
>>>>
>>>>
>>>>
>>>>
>>>>> On Nov 28, 2023, at 6:44 PM, Amneet Bhalla <mail2amneet at gmail.com> wrote:
>>>>>
>>>>>
>>>>> Hi Folks,
>>>>>
>>>>> I am using MatZeroRows() to set Dirichlet boundary conditions. This works fine for the serial run, and the solver produces correct results (verified against an analytical solution). However, when I run the case in parallel, the simulation gets stuck at MatZeroRows(). My understanding is that this function needs to be called after MatAssemblyBegin/End() has been called, and should be called by all processors. Here is the bit of the code that calls MatZeroRows() after the matrix has been assembled:
>>>>>
>>>>> https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L724-L801
>>>>>
>>>>> I ran the parallel code (on 3 processors) in the debugger (-start_in_debugger). Below is the call stack from the processor that gets stuck:
>>>>>
>>>>> amneetb at APSB-MBP-16:~$ lldb -p 4307
>>>>> (lldb) process attach --pid 4307
>>>>> Process 4307 stopped
>>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>>>>> frame #0: 0x000000018a2d750c libsystem_kernel.dylib`__semwait_signal + 8
>>>>> libsystem_kernel.dylib`:
>>>>> -> 0x18a2d750c <+8>: b.lo 0x18a2d752c ; <+40>
>>>>> 0x18a2d7510 <+12>: pacibsp
>>>>> 0x18a2d7514 <+16>: stp x29, x30, [sp, #-0x10]!
>>>>> 0x18a2d7518 <+20>: mov x29, sp
>>>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>>>> Executable module set to "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
>>>>> Architecture set to: arm64-apple-macosx-.
>>>>> (lldb) cont
>>>>> Process 4307 resuming
>>>>> Process 4307 stopped
>>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>>>>> frame #0: 0x0000000109d281b8 libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather + 400
>>>>> libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather:
>>>>> -> 0x109d281b8 <+400>: ldr w9, [x24]
>>>>> 0x109d281bc <+404>: cmp w8, w9
>>>>> 0x109d281c0 <+408>: b.lt   0x109d281a0 ; <+376>
>>>>> 0x109d281c4 <+412>: bl 0x109d28e64 ; MPID_Progress_test
>>>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>>>> (lldb) bt
>>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>>>>> * frame #0: 0x0000000109d281b8 libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather + 400
>>>>> frame #1: 0x0000000109d27d14 libpmpi.12.dylib`MPIDI_SHM_mpi_barrier + 224
>>>>> frame #2: 0x0000000109d27b60 libpmpi.12.dylib`MPIDI_Barrier_intra_composition_alpha + 44
>>>>> frame #3: 0x0000000109d0d490 libpmpi.12.dylib`MPIR_Barrier + 900
>>>>> frame #4: 0x000000010224d030 libmpi.12.dylib`MPI_Barrier + 684
>>>>> frame #5: 0x00000001045ea638 libpetsc.3.17.dylib`PetscCommDuplicate(comm_in=-2080374782, comm_out=0x000000010300bcb0, first_tag=0x000000010300bce4) at tagm.c:235:5
>>>>> frame #6: 0x00000001045f2910 libpetsc.3.17.dylib`PetscHeaderCreate_Private(h=0x000000010300bc70, classid=1211227, class_name="PetscSF", descr="Star Forest", mansec="PetscSF", comm=-2080374782, destroy=(libpetsc.3.17.dylib`PetscSFDestroy at sf.c:224), view=(libpetsc.3.17.dylib`PetscSFView at sf.c:841)) at inherit.c:62:3
>>>>> frame #7: 0x00000001049cf820 libpetsc.3.17.dylib`PetscSFCreate(comm=-2080374782, sf=0x000000016f911a50) at sf.c:62:3
>>>>> frame #8: 0x0000000104cd3024 libpetsc.3.17.dylib`MatZeroRowsMapLocal_Private(A=0x00000001170c1270, N=1, rows=0x000000016f912cb4, nr=0x000000016f911df8, olrows=0x000000016f911e00) at zerorows.c:36:5
>>>>> frame #9: 0x000000010504ea50 libpetsc.3.17.dylib`MatZeroRows_MPIAIJ(A=0x00000001170c1270, N=1, rows=0x000000016f912cb4, diag=1, x=0x0000000000000000, b=0x0000000000000000) at mpiaij.c:768:3
>>>>> frame #10: 0x0000000104d95fac libpetsc.3.17.dylib`MatZeroRows(mat=0x00000001170c1270, numRows=1, rows=0x000000016f912cb4, diag=1, x=0x0000000000000000, b=0x0000000000000000) at matrix.c:5935:3
>>>>> frame #11: 0x000000010067d320 fo_acoustic_streaming_solver_2d`IBAMR::AcousticStreamingPETScMatUtilities::constructPatchLevelFOAcousticStreamingOp(mat=0x000000016f91c178, omega=1, sound_speed=1, rho_idx=3, mu_idx=2, lambda_idx=4, u_bc_coefs=0x000000016f91c3a8, data_time=NaN, num_dofs_per_proc=size=3, u_dof_index_idx=27, p_dof_index_idx=28, patch_level=Pointer<SAMRAI::hier::PatchLevel<2> > @ 0x000000016f914ed0, mu_interp_type=VC_HARMONIC_INTERP) at AcousticStreamingPETScMatUtilities.cpp:794:36
>>>>> frame #12: 0x0000000100694bdc fo_acoustic_streaming_solver_2d`IBAMR::FOAcousticStreamingPETScLevelSolver::initializeSolverStateSpecialized(this=0x000000016f91c028, x=0x000000016f91d788, (null)=0x000000016f91d690) at FOAcousticStreamingPETScLevelSolver.cpp:149:5
>>>>> frame #13: 0x000000010083232c fo_acoustic_streaming_solver_2d`IBTK::PETScLevelSolver::initializeSolverState(this=0x000000016f91c028, x=0x000000016f91d788, b=0x000000016f91d690) at PETScLevelSolver.cpp:340:5
>>>>> frame #14: 0x00000001004eb230 fo_acoustic_streaming_solver_2d`main(argc=11, argv=0x000000016f91f460) at fo_acoustic_streaming_solver.cpp:400:22
>>>>> frame #15: 0x0000000189fbbf28 dyld`start + 2236
>>>>>
>>>>>
>>>>> Any suggestions on how to avoid this barrier? Here are all the Mat options I am using (in debug mode), in case that is helpful:
>>>>>
>>>>> https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L453-L458
>>>>>
>>>>> Thanks,
>>>>> --
>>>>> --Amneet
>>>>>
>>>>>
>>>>>
>>>>
>>
>>
>> --
>> --Amneet
>>
>>
>>
>
>
> --
> --Amneet
>
>
>