[petsc-users] MatAssemblyEnd taking too long

Matthew Knepley knepley at gmail.com
Fri Aug 21 14:50:55 CDT 2020


On Fri, Aug 21, 2020 at 3:32 PM Barry Smith <bsmith at petsc.dev> wrote:

>
>   Yes, absolutely, a test suite will not solve all problems. In the PETSc
> model, which is not uncommon, each bug/problem found is supposed to result
> in another test that detects it, so the test suite can catch repeats of
> the problem without redoing all the hard work from scratch.
>
>    So this OpenMPI suite, if it gets off the ground, will be valuable ONLY
> if they accept community additions efficiently and happily. For example,
> would the test suite detect the problem reported by the PETSc user? It
> should be trivial to have the user run the suite on their system (which is
> why it needs to be very easy to run) and find out. If it does not detect
> the problem, then, working with the appropriate "test suite" community, we
> could submit an MR to the test suite that looks for the problem and finds
> it. Now the test suite is better, and we have one less hassle that comes up
> multiple times for us. In addition, the OpenMPI, MPICH, etc. developers
> should do the same thing: each time they fix a bug that was not detected by
> testing, they should donate the code that reproduces the bug to the
> universal test suite.
>
>   The question is: would our effort helping the MPI test suite community
> be more than the effort we currently "waste" dealing with buggy MPIs?
>
>    Barry
>
>   It is a bit curious that after 25 years no friendly, extensible,
> universal MPI test suite community has emerged. Perhaps it is because each
> MPI implementation has its own test processes and suites, and so the wider
> community needed for a single friendly, extensible, universal MPI test
> suite never formed. Looking back, one could say this was a mistake of the
> MPI Forum: they should have set that in motion in 1995. It would have saved
> a lot of duplication of effort and would be very valuable now.
>

I think they do not do it because people do not hold the
implementors accountable, only the packages using MPI.

   Matt


> On Aug 21, 2020, at 2:17 PM, Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
> Barry,
>   I mentioned a test suite from MPICH at
> https://lists.mcs.anl.gov/pipermail/petsc-users/2020-July/041738.html.
> Since it is not easy to use, I did not put it in the PETSc FAQ.
>   I also asked on the OpenMPI mailing list. An OpenMPI developer said he
> could make their tests public, and he is in the process of checking with
> all the authors about the license :). Once that is done, it will be at
> https://github.com/open-mpi/ompi-tests-public
>
>   A test suite will be helpful, but I doubt it will solve the problem. A
> user's particular case (number of ranks, message size,
> communication pattern, etc.) might not be covered by the suite.
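>
>   As a rough illustration (a hypothetical test, not one taken from the
> MPICH or OpenMPI suites), a targeted test for one such pattern could look
> like the sketch below; the message size and the 10-second threshold are
> arbitrary choices:
>
> /* sketch_exchange.c: every rank exchanges a fixed-size message with every
>    other rank and times the completion of the whole pattern.
>    Build: mpicc sketch_exchange.c -o sketch    Run: mpirun -n 8 ./sketch */
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(int argc, char **argv)
> {
>   int size, rank, i, n = 1 << 16;             /* message size: arbitrary */
>   MPI_Init(&argc, &argv);
>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>   char        *buf  = (char*)calloc((size_t)size, 2*(size_t)n);
>   MPI_Request *reqs = (MPI_Request*)malloc(2*(size_t)size*sizeof(*reqs));
>   MPI_Barrier(MPI_COMM_WORLD);                /* exclude startup skew */
>   double t0 = MPI_Wtime();
>   for (i = 0; i < size; i++) {                /* post receives, then sends */
>     MPI_Irecv(buf + (size_t)i*n, n, MPI_CHAR, i, 0, MPI_COMM_WORLD,
>               &reqs[i]);
>     MPI_Isend(buf + (size_t)(size+i)*n, n, MPI_CHAR, i, 0, MPI_COMM_WORLD,
>               &reqs[size+i]);
>   }
>   MPI_Waitall(2*size, reqs, MPI_STATUSES_IGNORE);
>   double t1 = MPI_Wtime();
>   if (t1 - t0 > 10.0)                         /* threshold: arbitrary */
>     printf("rank %d: exchange took %g s, suspiciously long\n", rank, t1-t0);
>   free(buf); free(reqs);
>   MPI_Finalize();
>   return 0;
> }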
> --Junchao Zhang
>
>
> On Fri, Aug 21, 2020 at 12:33 PM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>>   There really needs to be a usable, extensive MPI test suite that can
>> find these performance issues; we spend time helping users with these
>> problems when it is really the MPI community's job.
>>
>>
>>
>> On Aug 21, 2020, at 11:55 AM, Manav Bhatia <bhatiamanav at gmail.com> wrote:
>>
>> I built PETSc with mpich-3.3.2 on my MacBook Pro with Apple clang 11.0.3,
>> and the test finishes at my end.
>>
>> So, it appears that there is some issue with openmpi-4.0.1 on this
>> machine.
>>
>> I will now build my entire dependency toolchain with mpich, and hopefully
>> things will work for my application code.
>>
>> Thank you again for your help.
>>
>> Regards,
>> Manav
>>
>>
>> On Aug 20, 2020, at 10:45 PM, Junchao Zhang <junchao.zhang at gmail.com>
>> wrote:
>>
>> Manav,
>>  I downloaded your petsc_mat.tgz but could not reproduce the problem on
>> either Linux or Mac. I used the petsc commit id df0e4300 you mentioned.
>>  On Linux, I have openmpi-4.0.2 + gcc-8.3.0, and petsc is configured
>> --with-debugging --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort
>> --COPTFLAGS="-g -O0" --FOPTFLAGS="-g -O0" --CXXOPTFLAGS="-g -O0"
>> --PETSC_ARCH=linux-host-dbg
>>  On Mac, I have mpich-3.3.1 + clang-11.0.0-apple, and petsc is
>> configured --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx
>> --with-fc=mpifort --with-ctable=0 COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g"
>> PETSC_ARCH=mac-clang-dbg
>>
>> mpirun -n 8 ./test
>> rank: 1 : stdout.processor.1
>> rank: 4 : stdout.processor.4
>> rank: 0 : stdout.processor.0
>> rank: 5 : stdout.processor.5
>> rank: 6 : stdout.processor.6
>> rank: 7 : stdout.processor.7
>> rank: 3 : stdout.processor.3
>> rank: 2 : stdout.processor.2
>> rank: 1 : Beginning reading nnz...
>> rank: 4 : Beginning reading nnz...
>> rank: 0 : Beginning reading nnz...
>> rank: 5 : Beginning reading nnz...
>> rank: 7 : Beginning reading nnz...
>> rank: 2 : Beginning reading nnz...
>> rank: 3 : Beginning reading nnz...
>> rank: 6 : Beginning reading nnz...
>> rank: 5 : Finished reading nnz
>> rank: 5 : Beginning mat preallocation...
>> rank: 3 : Finished reading nnz
>> rank: 3 : Beginning mat preallocation...
>> rank: 4 : Finished reading nnz
>> rank: 4 : Beginning mat preallocation...
>> rank: 7 : Finished reading nnz
>> rank: 7 : Beginning mat preallocation...
>> rank: 1 : Finished reading nnz
>> rank: 1 : Beginning mat preallocation...
>> rank: 0 : Finished reading nnz
>> rank: 0 : Beginning mat preallocation...
>> rank: 2 : Finished reading nnz
>> rank: 2 : Beginning mat preallocation...
>> rank: 6 : Finished reading nnz
>> rank: 6 : Beginning mat preallocation...
>> rank: 5 : Finished preallocation
>> rank: 5 : Beginning reading and setting matrix values...
>> rank: 1 : Finished preallocation
>> rank: 1 : Beginning reading and setting matrix values...
>> rank: 7 : Finished preallocation
>> rank: 7 : Beginning reading and setting matrix values...
>> rank: 2 : Finished preallocation
>> rank: 2 : Beginning reading and setting matrix values...
>> rank: 4 : Finished preallocation
>> rank: 4 : Beginning reading and setting matrix values...
>> rank: 0 : Finished preallocation
>> rank: 0 : Beginning reading and setting matrix values...
>> rank: 3 : Finished preallocation
>> rank: 3 : Beginning reading and setting matrix values...
>> rank: 6 : Finished preallocation
>> rank: 6 : Beginning reading and setting matrix values...
>> rank: 1 : Finished reading and setting matrix values
>> rank: 1 : Beginning mat assembly...
>> rank: 5 : Finished reading and setting matrix values
>> rank: 5 : Beginning mat assembly...
>> rank: 4 : Finished reading and setting matrix values
>> rank: 4 : Beginning mat assembly...
>> rank: 2 : Finished reading and setting matrix values
>> rank: 2 : Beginning mat assembly...
>> rank: 3 : Finished reading and setting matrix values
>> rank: 3 : Beginning mat assembly...
>> rank: 7 : Finished reading and setting matrix values
>> rank: 7 : Beginning mat assembly...
>> rank: 6 : Finished reading and setting matrix values
>> rank: 6 : Beginning mat assembly...
>> rank: 0 : Finished reading and setting matrix values
>> rank: 0 : Beginning mat assembly...
>> rank: 1 : Finished mat assembly
>> rank: 3 : Finished mat assembly
>> rank: 7 : Finished mat assembly
>> rank: 0 : Finished mat assembly
>> rank: 5 : Finished mat assembly
>> rank: 2 : Finished mat assembly
>> rank: 4 : Finished mat assembly
>> rank: 6 : Finished mat assembly
>>
>> --Junchao Zhang
>>
>>
>> On Thu, Aug 20, 2020 at 5:29 PM Junchao Zhang <junchao.zhang at gmail.com>
>> wrote:
>>
>>> I will have a look and report back to you. Thanks.
>>> --Junchao Zhang
>>>
>>>
>>> On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia <bhatiamanav at gmail.com>
>>> wrote:
>>>
>>>> I have created a standalone test that demonstrates the problem at my
>>>> end. I have stored the indices, etc., from my problem in a text file
>>>> for each rank, which the test uses to initialize the matrix.
>>>> Please note that the test is specifically for 8 ranks.
>>>>
>>>> The .tgz file is on my google drive:
>>>> https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing
>>>>
>>>>
>>>> This contains a README file with instructions for running it. Please
>>>> note that the working directory must contain the index files.
>>>>
>>>> Please let me know if I can provide any further information.
>>>>
>>>> Thank you all for your help.
>>>>
>>>> Regards,
>>>> Manav
>>>>
>>>> On Aug 20, 2020, at 12:54 PM, Jed Brown <jed at jedbrown.org> wrote:
>>>>
>>>> Matthew Knepley <knepley at gmail.com> writes:
>>>>
>>>> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia <bhatiamanav at gmail.com>
>>>> wrote:
>>>>
>>>>
>>>>
>>>> On Aug 20, 2020, at 8:31 AM, Stefano Zampini <stefano.zampini at gmail.com
>>>> >
>>>> wrote:
>>>>
>>>> Can you add an MPI_Barrier before
>>>>
>>>> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);
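>>>>
>>>> A hedged sketch of that instrumentation (the surrounding function is
>>>> MatAssemblyEnd_MPIAIJ; the file, src/mat/impls/aij/mpi/mpiaij.c, is our
>>>> assumption, not stated in this thread):
>>>>
>>>> /* barrier on the matrix's communicator, just before the suspect call */
>>>> ierr = MPI_Barrier(PetscObjectComm((PetscObject)mat));CHKERRQ(ierr);
>>>> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);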
>>>>
>>>>
>>>> With an MPI_Barrier before this function call:
>>>> -  three of the processes have already hit this barrier,
>>>> -  the other five are inside MatStashScatterGetMesg_Private ->
>>>> MatStashScatterGetMesg_BTS -> MPI_Waitsome (2 processes) / MPI_Waitall
>>>> (3 processes)
>>>>
>>>>
>>>> This is not in itself evidence of inconsistent state.  You can use
>>>>
>>>>  -build_twosided allreduce
>>>>
>>>> to avoid the nonblocking sparse algorithm.
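>>>>
>>>> (A hedged aside: the "nonblocking sparse algorithm" is, as far as we
>>>> understand, a nonblocking-consensus (NBX-style) exchange in the spirit
>>>> of Hoefler et al. The sketch below illustrates that pattern only; it is
>>>> a simplification, not PETSc's actual PetscCommBuildTwoSided code. A
>>>> buggy MPI_Issend/MPI_Iprobe/MPI_Ibarrier would hang in such a loop.)
>>>>
>>>> #include <mpi.h>
>>>> #include <stdlib.h>
>>>>
>>>> /* Each rank knows whom it must send to (dests), but not who will send
>>>>    to it; the consensus discovers when all traffic has been delivered. */
>>>> void nbx_sketch(MPI_Comm comm, int ndests, const int *dests,
>>>>                 const int *outgoing)
>>>> {
>>>>   MPI_Request *sreqs = (MPI_Request*)malloc(ndests*sizeof(MPI_Request));
>>>>   MPI_Request  barrier = MPI_REQUEST_NULL;
>>>>   int          i, sends_done = 0, barrier_done = 0;
>>>>
>>>>   /* synchronous sends: completion implies the receiver matched them */
>>>>   for (i = 0; i < ndests; i++)
>>>>     MPI_Issend(&outgoing[i], 1, MPI_INT, dests[i], 0, comm, &sreqs[i]);
>>>>
>>>>   while (!barrier_done) {
>>>>     int flag, incoming;
>>>>     MPI_Status status;
>>>>     /* drain messages from senders we did not know about in advance */
>>>>     MPI_Iprobe(MPI_ANY_SOURCE, 0, comm, &flag, &status);
>>>>     if (flag) MPI_Recv(&incoming, 1, MPI_INT, status.MPI_SOURCE, 0,
>>>>                        comm, MPI_STATUS_IGNORE);
>>>>     if (!sends_done) {
>>>>       MPI_Testall(ndests, sreqs, &sends_done, MPI_STATUSES_IGNORE);
>>>>       /* all my sends matched: announce I am done via the ibarrier */
>>>>       if (sends_done) MPI_Ibarrier(comm, &barrier);
>>>>     } else {
>>>>       MPI_Test(&barrier, &barrier_done, MPI_STATUS_IGNORE);
>>>>     }
>>>>   }
>>>>   free(sreqs);
>>>> }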
>>>>
>>>>
>>>> Okay, you should run this with -matstash_legacy just to make sure it is
>>>> not a bug in your MPI implementation. But it looks like the parallel
>>>> state is inconsistent. This can happen because we have a bug, or it
>>>> could be that you called a collective operation on a subset of the
>>>> processes. Is there any way you could cut down the example (say, put all
>>>> 1s in the matrix, etc.) so that you could give it to us to run?
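>>>>
>>>> (A minimal sketch of the kind of cut-down reproducer being asked for;
>>>> this is hypothetical code, not Manav's actual test. Every rank adds 1.0
>>>> into rows owned by another rank, so MatAssemblyBegin/End must move the
>>>> values through the MatStash.)
>>>>
>>>> #include <petscmat.h>
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>   Mat            A;
>>>>   PetscInt       i, rstart, rend, N = 1000;
>>>>   PetscErrorCode ierr;
>>>>
>>>>   ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
>>>>   ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, N, N,
>>>>                       2, NULL, 2, NULL, &A);CHKERRQ(ierr);
>>>>   ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
>>>>   for (i = rstart; i < rend; i++) {
>>>>     PetscInt    row = (i + (rend - rstart)) % N; /* off-process row */
>>>>     PetscScalar one = 1.0;
>>>>     ierr = MatSetValues(A, 1, &row, 1, &i, &one, ADD_VALUES);CHKERRQ(ierr);
>>>>   }
>>>>   ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>   ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); /* hangs? */
>>>>   ierr = MatDestroy(&A);CHKERRQ(ierr);
>>>>   ierr = PetscFinalize();
>>>>   return ierr;
>>>> }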
>>>>
>>>>
>>>>
>>
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/