[petsc-users] MatAssemblyEnd taking too long

Barry Smith bsmith at petsc.dev
Fri Aug 21 14:31:25 CDT 2020


  Yes, absolutely a test suite will not solve all problems. In the PETSc model, which is not uncommon, each bug/problem found is supposed to result in another test that detects that problem, so the test suite can catch recurrences of the problem without redoing all the hard work from scratch.

   So this OpenMPI suite, if it gets off the ground, will be valuable ONLY if they accept community additions efficiently and happily. For example, would the test suite detect the problem reported by the PETSc user? It should be trivial to have the user run the suite on their system (which is why it needs to be very easy to run) and find out. If it does not detect the problem, then, working with the appropriate "test suite" community, we could submit an MR to the test suite that looks for the problem and finds it. Now the test suite is better and we have one less hassle that comes up multiple times for us. In addition, the OpenMPI, MPICH, etc. developers should do the same thing: each time they fix a bug that was not detected by testing, they should donate the code that reproduces the bug to the universal test suite.

  The question is: would our effort in helping the MPI test suite community be more than the "wasted" effort we currently spend dealing with buggy MPIs?

   Barry

  It is a bit curious that after 25 years no friendly, extensible, universal MPI test suite community has emerged. Perhaps it is because each MPI implementation has its own test processes and suites and cannot form the wider community needed for a single friendly, extensible, universal MPI test suite. Looking back, one could say this was a mistake of the MPI Forum; they should have set that in motion in 1995. It would have saved a lot of duplicated effort and would be very valuable now.

> On Aug 21, 2020, at 2:17 PM, Junchao Zhang <junchao.zhang at gmail.com> wrote:
> 
> Barry,
>   I mentioned a test suite from MPICH at https://lists.mcs.anl.gov/pipermail/petsc-users/2020-July/041738.html. Since it is not easy to use, I did not put it in the PETSc FAQ.
>   I also asked on the OpenMPI mailing list. An OpenMPI developer said he could make their tests public and is in the process of checking with all the authors on a license :). If it is done, it will be at https://github.com/open-mpi/ompi-tests-public
> 
>   A test suite will be helpful, but I doubt it will solve the problem. A user's particular case (number of ranks, message size, communication pattern, etc.) might not be covered by a test suite.
> --Junchao Zhang
> 
> 
> On Fri, Aug 21, 2020 at 12:33 PM Barry Smith <bsmith at petsc.dev> wrote:
> 
>   There really needs to be a usable, extensive MPI test suite that can find these performance issues; we spend time helping users with these problems when it is really the MPI community's job.
> 
> 
> 
>> On Aug 21, 2020, at 11:55 AM, Manav Bhatia <bhatiamanav at gmail.com> wrote:
>> 
>> I built petsc with mpich-3.3.2 on my MacBook Pro with Apple clang 11.0.3, and the test finishes at my end. 
>> 
>> So, it appears that there is some issue with openmpi-4.0.1 on this machine. 
>> 
>> I will now build my entire dependency toolchain with mpich, and hopefully things will work for my application code. 
>> 
>> Thank you again for your help. 
>> 
>> Regards, 
>> Manav
>> 
>> 
>>> On Aug 20, 2020, at 10:45 PM, Junchao Zhang <junchao.zhang at gmail.com> wrote:
>>> 
>>> Manav,
>>>  I downloaded your petsc_mat.tgz but could not reproduce the problem on either Linux or Mac. I used the petsc commit id df0e4300 you mentioned.
>>>  On Linux, I have openmpi-4.0.2 + gcc-8.3.0, and petsc is configured  --with-debugging --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --COPTFLAGS="-g -O0" --FOPTFLAGS="-g -O0" --CXXOPTFLAGS="-g -O0" --PETSC_ARCH=linux-host-dbg
>>>  On Mac, I have mpich-3.3.1 + clang-11.0.0-apple, and petsc is configured --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --with-ctable=0 COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g" PETSC_ARCH=mac-clang-dbg
>>> 
>>> mpirun -n 8 ./test
>>> rank: 1 : stdout.processor.1
>>> rank: 4 : stdout.processor.4
>>> rank: 0 : stdout.processor.0
>>> rank: 5 : stdout.processor.5
>>> rank: 6 : stdout.processor.6
>>> rank: 7 : stdout.processor.7
>>> rank: 3 : stdout.processor.3
>>> rank: 2 : stdout.processor.2
>>> rank: 1 : Beginning reading nnz...
>>> rank: 4 : Beginning reading nnz...
>>> rank: 0 : Beginning reading nnz...
>>> rank: 5 : Beginning reading nnz...
>>> rank: 7 : Beginning reading nnz...
>>> rank: 2 : Beginning reading nnz...
>>> rank: 3 : Beginning reading nnz...
>>> rank: 6 : Beginning reading nnz...
>>> rank: 5 : Finished reading nnz
>>> rank: 5 : Beginning mat preallocation...
>>> rank: 3 : Finished reading nnz
>>> rank: 3 : Beginning mat preallocation...
>>> rank: 4 : Finished reading nnz
>>> rank: 4 : Beginning mat preallocation...
>>> rank: 7 : Finished reading nnz
>>> rank: 7 : Beginning mat preallocation...
>>> rank: 1 : Finished reading nnz
>>> rank: 1 : Beginning mat preallocation...
>>> rank: 0 : Finished reading nnz
>>> rank: 0 : Beginning mat preallocation...
>>> rank: 2 : Finished reading nnz
>>> rank: 2 : Beginning mat preallocation...
>>> rank: 6 : Finished reading nnz
>>> rank: 6 : Beginning mat preallocation...
>>> rank: 5 : Finished preallocation
>>> rank: 5 : Beginning reading and setting matrix values...
>>> rank: 1 : Finished preallocation
>>> rank: 1 : Beginning reading and setting matrix values...
>>> rank: 7 : Finished preallocation
>>> rank: 7 : Beginning reading and setting matrix values...
>>> rank: 2 : Finished preallocation
>>> rank: 2 : Beginning reading and setting matrix values...
>>> rank: 4 : Finished preallocation
>>> rank: 4 : Beginning reading and setting matrix values...
>>> rank: 0 : Finished preallocation
>>> rank: 0 : Beginning reading and setting matrix values...
>>> rank: 3 : Finished preallocation
>>> rank: 3 : Beginning reading and setting matrix values...
>>> rank: 6 : Finished preallocation
>>> rank: 6 : Beginning reading and setting matrix values...
>>> rank: 1 : Finished reading and setting matrix values
>>> rank: 1 : Beginning mat assembly...
>>> rank: 5 : Finished reading and setting matrix values
>>> rank: 5 : Beginning mat assembly...
>>> rank: 4 : Finished reading and setting matrix values
>>> rank: 4 : Beginning mat assembly...
>>> rank: 2 : Finished reading and setting matrix values
>>> rank: 2 : Beginning mat assembly...
>>> rank: 3 : Finished reading and setting matrix values
>>> rank: 3 : Beginning mat assembly...
>>> rank: 7 : Finished reading and setting matrix values
>>> rank: 7 : Beginning mat assembly...
>>> rank: 6 : Finished reading and setting matrix values
>>> rank: 6 : Beginning mat assembly...
>>> rank: 0 : Finished reading and setting matrix values
>>> rank: 0 : Beginning mat assembly...
>>> rank: 1 : Finished mat assembly
>>> rank: 3 : Finished mat assembly
>>> rank: 7 : Finished mat assembly
>>> rank: 0 : Finished mat assembly
>>> rank: 5 : Finished mat assembly
>>> rank: 2 : Finished mat assembly
>>> rank: 4 : Finished mat assembly
>>> rank: 6 : Finished mat assembly
>>> 
>>> --Junchao Zhang
>>> 
>>> 
>>> On Thu, Aug 20, 2020 at 5:29 PM Junchao Zhang <junchao.zhang at gmail.com> wrote:
>>> I will have a look and report back to you. Thanks.
>>> --Junchao Zhang
>>> 
>>> 
>>> On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia <bhatiamanav at gmail.com> wrote:
>>> I have created a standalone test that demonstrates the problem at my end. I have stored the indices, etc., from my problem in a text file for each rank, which I use to initialize the matrix.
>>> Please note that the test is specifically for 8 ranks. 
>>> 
>>> The .tgz file is on my google drive: https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing 
>>> 
>>> This contains a README file with instructions on running. Please note that the work directory needs the index files. 
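>>> 
>>> In outline, the test does the following (a compressed sketch rather than the exact code in the tarball; in the real test the local sizes, per-row nonzero counts, column indices and values all come from the per-rank index files instead of the hard-coded stencil used here):
>>> 
>>> #include <petscmat.h>
>>> 
>>> int main(int argc, char **argv)
>>> {
>>>   Mat            A;
>>>   PetscInt       i, rstart, rend, M, nlocal = 1000;
>>>   PetscErrorCode ierr;
>>> 
>>>   ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
>>> 
>>>   /* In the real test, nlocal and the d_nnz/o_nnz preallocation arrays are read
>>>      from this rank's index file ("Beginning reading nnz" in the output). */
>>>   ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
>>>   ierr = MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
>>>   ierr = MatSetType(A, MATMPIAIJ);CHKERRQ(ierr);
>>>   ierr = MatMPIAIJSetPreallocation(A, 3, NULL, 2, NULL);CHKERRQ(ierr);
>>> 
>>>   /* In the real test, the (row, column, value) data are read from the same files;
>>>      a simple tridiagonal stencil stands in here. Entries destined for rows owned
>>>      by other ranks are what exercise the MatStash communication during assembly. */
>>>   ierr = MatGetSize(A, &M, NULL);CHKERRQ(ierr);
>>>   ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
>>>   for (i = rstart; i < rend; i++) {
>>>     PetscInt    cols[3], ncols = 0;
>>>     PetscScalar vals[3];
>>>     if (i > 0)     { cols[ncols] = i - 1; vals[ncols] = -1.0; ncols++; }
>>>     cols[ncols] = i; vals[ncols] = 2.0; ncols++;
>>>     if (i < M - 1) { cols[ncols] = i + 1; vals[ncols] = -1.0; ncols++; }
>>>     ierr = MatSetValues(A, 1, &i, ncols, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
>>>   }
>>> 
>>>   /* This is the step that takes too long / appears to hang at my end. */
>>>   ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>   ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>> 
>>>   ierr = MatDestroy(&A);CHKERRQ(ierr);
>>>   ierr = PetscFinalize();
>>>   return ierr;
>>> }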
>>> 
>>> Please let me know if I can provide any further information. 
>>> 
>>> Thank you all for your help. 
>>> 
>>> Regards,
>>> Manav
>>> 
>>>> On Aug 20, 2020, at 12:54 PM, Jed Brown <jed at jedbrown.org> wrote:
>>>> 
>>>> Matthew Knepley <knepley at gmail.com> writes:
>>>> 
>>>>> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia <bhatiamanav at gmail.com> wrote:
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Aug 20, 2020, at 8:31 AM, Stefano Zampini <stefano.zampini at gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>> Can you add a MPI_Barrier before
>>>>>> 
>>>>>> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);
>>>>>> 
>>>>>> 
>>>>>> With an MPI_Barrier before this function call:
>>>>>> —  three of the processes have already hit this barrier,
>>>>>> —  the other five are inside MatStashScatterGetMesg_Private -> MatStashScatterGetMesg_BTS -> MPI_Waitsome (2 processes) / MPI_Waitall (3 processes)
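>>>>>> 
>>>>>> (For reference, the experiment amounts to something like this at that call site in MatAssemblyEnd_MPIAIJ; a sketch only, with the barrier added purely for debugging and the variable names assumed from the surrounding code:)
>>>>>> 
>>>>>>   ierr = MPI_Barrier(PetscObjectComm((PetscObject)mat));CHKERRQ(ierr); /* added for debugging only */
>>>>>>   ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);                  /* existing call on the local diagonal block */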
>>>> 
>>>> This is not itself evidence of inconsistent state.  You can use
>>>> 
>>>>  -build_twosided allreduce
>>>> 
>>>> to avoid the nonblocking sparse algorithm.
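>>>> 
>>>> e.g., as a runtime option (assuming the reproducer executable is named ./test and run on 8 ranks):
>>>> 
>>>>   mpirun -n 8 ./test -build_twosided allreduce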
>>>> 
>>>>> 
>>>>> Okay, you should run this with -matstash_legacy just to make sure it is not
>>>>> a bug in your MPI implementation. But it looks like
>>>>> there is inconsistency in the parallel state. This can happen because we
>>>>> have a bug, or it could be that you called a collective
>>>>> operation on a subset of the processes. Is there any way you could cut down
>>>>> the example (say put all 1s in the matrix, etc) so
>>>>> that you could give it to us to run?
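>>>>> 
>>>>> For example (assuming your executable is named ./test and run on 8 ranks; -matstash_legacy is a runtime option, so no rebuild is needed):
>>>>> 
>>>>>   mpirun -n 8 ./test -matstash_legacy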
>>> 
>> 
> 
