[petsc-users] MPI error for large number of processes and subcomms

Junchao Zhang junchao.zhang at gmail.com
Thu Apr 16 23:13:13 CDT 2020


Randy,
  I reproduced your error with petsc-3.12.4 and 5120 mpi ranks. I also
found that the error goes away with petsc-3.13. However, I have not yet
figured out what the bug is or which commit fixed it :).
  So on your side, it is better to use the latest petsc.
--Junchao Zhang


On Thu, Apr 16, 2020 at 9:06 PM Junchao Zhang <junchao.zhang at gmail.com>
wrote:

> Randy,
>   So far I have not been able to reproduce your error, even with the biggest
> case, mpirun -n 5120 ./test -nsubs 320 -nx 100 -ny 100 -nz 100.
>   While I continue testing, you can try other options. It looks like you
> want to duplicate a vector onto subcomms. I don't think you need these two
> lines:
>
> call AOApplicationToPetsc(aoParent,nis,ind1,ierr)
> call AOApplicationToPetsc(aoSub,nis,ind2,ierr)
>
>  In addition, you can use simpler and more memory-efficient index sets.
> There is a petsc example for this task; see case 3 in
> https://gitlab.com/petsc/petsc/-/blob/master/src/vec/vscat/tests/ex9.c
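>
>  Roughly, the pattern looks like the sketch below. This is only a minimal
> illustration in C (your test code is Fortran), and the names N, nsubs, and
> the color-based comm split are made up for the example, not taken from your
> test.F90: each subcomm gets a full-length copy of a world-comm vector, and
> the index sets are plain strides over each rank's ownership range of the
> subcomm vector, so no AO remapping is needed.
>
> #include <petscvec.h>
>
> int main(int argc,char **argv)
> {
>   Vec            x,y;
>   VecScatter     scat;
>   IS             ix,iy;
>   PetscInt       N=100,ystart,yend;
>   PetscMPIInt    rank,nsubs=4,color;
>   MPI_Comm       subcomm;
>   PetscErrorCode ierr;
>
>   ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
>   ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
>
>   /* Split PETSC_COMM_WORLD into nsubs subcommunicators (illustrative split) */
>   color = rank%nsubs;
>   ierr = MPI_Comm_split(PETSC_COMM_WORLD,color,rank,&subcomm);CHKERRQ(ierr);
>
>   /* Global vector on the world communicator */
>   ierr = VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,N,&x);CHKERRQ(ierr);
>   ierr = VecSet(x,1.0);CHKERRQ(ierr);
>
>   /* A full-length copy of x, distributed over each subcommunicator */
>   ierr = VecCreateMPI(subcomm,PETSC_DECIDE,N,&y);CHKERRQ(ierr);
>   ierr = VecGetOwnershipRange(y,&ystart,&yend);CHKERRQ(ierr);
>
>   /* Each rank pulls from x exactly the entries it owns in y; stride index
>      sets avoid building and AO-remapping large general index arrays */
>   ierr = ISCreateStride(PETSC_COMM_SELF,yend-ystart,ystart,1,&ix);CHKERRQ(ierr);
>   ierr = ISCreateStride(PETSC_COMM_SELF,yend-ystart,ystart,1,&iy);CHKERRQ(ierr);
>
>   ierr = VecScatterCreate(x,ix,y,iy,&scat);CHKERRQ(ierr);
>   ierr = VecScatterBegin(scat,x,y,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
>   ierr = VecScatterEnd(scat,x,y,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
>
>   ierr = ISDestroy(&ix);CHKERRQ(ierr);
>   ierr = ISDestroy(&iy);CHKERRQ(ierr);
>   ierr = VecScatterDestroy(&scat);CHKERRQ(ierr);
>   ierr = VecDestroy(&x);CHKERRQ(ierr);
>   ierr = VecDestroy(&y);CHKERRQ(ierr);
>   ierr = MPI_Comm_free(&subcomm);CHKERRQ(ierr);
>   ierr = PetscFinalize();
>   return ierr;
> }
>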
>  BTW, it is good to use petsc master so we are on the same page.
> --Junchao Zhang
>
>
> On Wed, Apr 15, 2020 at 10:28 AM Randall Mackie <rlmackie862 at gmail.com>
> wrote:
>
>> Hi Junchao,
>>
>> So I was able to create a small test code that duplicates the issue we
>> have been having, and it is attached to this email in a zip file.
>> Included are the test.F90 code, the commands to duplicate the crash and to
>> duplicate a successful run, the output errors, and our petsc configuration.
>>
>> Our findings to date include:
>>
>> - The error is reproducible in a very short time with this script.
>> - It is related to nproc*nsubs and (although to a lesser extent) to the DM
>>   grid size.
>> - It happens regardless of MPI implementation (mpich, intel mpi 2018, 2019,
>>   openmpi) or compiler (gfortran/gcc, intel 2018).
>> - Changing vecscatter_type to mpi1 or mpi3 has no effect; mpi1 seems to
>>   slightly increase the limit, but it still fails on the full machine set.
>> - Nothing looks interesting in valgrind.
>>
>> Our initial tests were carried out on an Azure cluster, but we also
>> tested on our smaller cluster, and we found the following:
>>
>> Works:
>> $PETSC_DIR/lib/petsc/bin/petscmpiexec -n 1280 -hostfile hostfile ./test
>> -nsubs 80 -nx 100 -ny 100 -nz 100
>>
>> Crashes (this works on Azure):
>> $PETSC_DIR/lib/petsc/bin/petscmpiexec -n 2560 -hostfile hostfile ./test
>> -nsubs 80 -nx 100 -ny 100 -nz 100
>>
>> So it looks like it may also be related to the physical number of nodes.
>>
>> In any case, even with 2560 processes on 192 cores, the memory does not go
>> above 3.5 GBytes, so you don't need a huge cluster to test.
>>
>> Thanks,
>>
>> Randy M.
>>
>>
>>
>> On Apr 14, 2020, at 12:23 PM, Junchao Zhang <junchao.zhang at gmail.com>
>> wrote:
>>
>> There is an MPI_Allreduce in PetscGatherNumberOfMessages, which is why I
>> suspected it was the problem. Even if users configure petsc with 64-bit
>> indices, we use PetscMPIInt in MPI calls, so that is not a problem.
>> Try -vecscatter_type mpi1 to restore the original VecScatter
>> implementation. If the problem remains, could you provide a test example
>> for me to debug?
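>>
>> For example, the option is passed at runtime on the command line; a generic
>> (hypothetical) launch line, with ./your_app standing in for your executable,
>> would look something like:
>>
>> mpiexec -n <nproc> ./your_app -vecscatter_type mpi1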
>>
>> --Junchao Zhang
>>
>>
>> On Tue, Apr 14, 2020 at 12:13 PM Randall Mackie <rlmackie862 at gmail.com>
>> wrote:
>>
>>> Hi Junchao,
>>>
>>> We have tried your two suggestions, but the problem remains.
>>> The problem seems to be the MPI_Isend at line 117 in
>>> PetscGatherMessageLengths, not MPI_Allreduce.
>>>
>>> We have now tried Intel MPI, MPICH, and OpenMPI, and so we think the
>>> problem must be elsewhere and not in MPI.
>>>
>>> Given that this is a 64-bit indices build of PETSc, is there some
>>> possible incompatibility between PETSc and the MPI calls?
>>>
>>> We are open to any other suggestions to try; other than running valgrind
>>> on thousands of processes, we seem to have run out of ideas.
>>>
>>> Thanks, Randy M.
>>>
>>> On Apr 13, 2020, at 8:54 AM, Junchao Zhang <junchao.zhang at gmail.com>
>>> wrote:
>>>
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Mon, Apr 13, 2020 at 10:53 AM Junchao Zhang <junchao.zhang at gmail.com>
>>> wrote:
>>>
>>>> Randy,
>>>>    Someone reported a similar problem before. It turned out to be an Intel
>>>> MPI MPI_Allreduce bug. A workaround is setting the environment variable
>>>> I_MPI_ADJUST_ALLREDUCE=1.
>>>>
>>>>    But you mentioned mpich also had the error, so maybe the problem is
>>>> not the same. Still, let's try the workaround first. If it doesn't work,
>>>> add another petsc option, -build_twosided allreduce, which is a workaround
>>>> for Intel MPI_Ibarrier bugs we have met.
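>>>>
>>>>    For example, with a hypothetical launch line (./your_app and <nproc>
>>>> stand in for your executable and process count), the two workarounds
>>>> together would look something like:
>>>>
>>>>    export I_MPI_ADJUST_ALLREDUCE=1
>>>>    mpiexec -n <nproc> ./your_app -build_twosided allreduce
>>>>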
>>>>    Thanks.
>>>> --Junchao Zhang
>>>>
>>>>
>>>> On Mon, Apr 13, 2020 at 10:38 AM Randall Mackie <rlmackie862 at gmail.com>
>>>> wrote:
>>>>
>>>>> Dear PETSc users,
>>>>>
>>>>> We are trying to understand an issue that has come up in running our
>>>>> code on a large cloud cluster with a large number of processes and subcomms.
>>>>> This is code that we use daily on multiple clusters without problems,
>>>>> and that runs valgrind clean for small test problems.
>>>>>
>>>>> The run generates the following messages; it doesn't crash, but it just
>>>>> seems to hang, with all processes continuing to show activity:
>>>>>
>>>>> [492]PETSC ERROR: #1 PetscGatherMessageLengths() line 117 in
>>>>> /mnt/home/cgg/PETSc/petsc-3.12.4/src/sys/utils/mpimesg.c
>>>>> [492]PETSC ERROR: #2 VecScatterSetUp_SF() line 658 in
>>>>> /mnt/home/cgg/PETSc/petsc-3.12.4/src/vec/vscat/impls/sf/vscatsf.c
>>>>> [492]PETSC ERROR: #3 VecScatterSetUp() line 209 in
>>>>> /mnt/home/cgg/PETSc/petsc-3.12.4/src/vec/vscat/interface/vscatfce.c
>>>>> [492]PETSC ERROR: #4 VecScatterCreate() line 282 in
>>>>> /mnt/home/cgg/PETSc/petsc-3.12.4/src/vec/vscat/interface/vscreate.c
>>>>>
>>>>>
>>>>> Looking at line 117 in PetscGatherMessageLengths, we find the offending
>>>>> statement is the MPI_Isend:
>>>>>
>>>>>
>>>>>   /* Post the Isends with the message length-info */
>>>>>   for (i=0,j=0; i<size; ++i) {
>>>>>     if (ilengths[i]) {
>>>>>       ierr = MPI_Isend((void*)(ilengths+i),1,MPI_INT,i,tag,comm,s_waits+j);CHKERRQ(ierr);
>>>>>       j++;
>>>>>     }
>>>>>   }
>>>>>
>>>>> We have tried this with Intel MPI 2018, 2019, and mpich, all giving
>>>>> the same problem.
>>>>>
>>>>> We suspect there is some limit being set on this cloud cluster on the
>>>>> number of file connections or something, but we don’t know.
>>>>>
>>>>> Anyone have any ideas? We are sort of grasping at straws at this
>>>>> point.
>>>>>
>>>>> Thanks, Randy M.
>>>>>
>>>>
>>>
>>