[petsc-users] MatAssemblyEnd taking too long
Junchao Zhang
junchao.zhang at gmail.com
Fri Aug 21 14:17:55 CDT 2020
Barry,
I mentioned a test suite from MPICH at
https://lists.mcs.anl.gov/pipermail/petsc-users/2020-July/041738.html.
Since it is not easy to use, I did not add it to the PETSc FAQ.
I also asked on the Open MPI mailing list. An Open MPI developer said he
could make their tests public and is checking with all the authors about
a license :). Once that is done, the tests will be at
https://github.com/open-mpi/ompi-tests-public
A test suite would help, but I doubt it would solve the problem: a user's
particular case (number of ranks, message sizes, communication pattern,
etc.) might not be covered by it.
--Junchao Zhang
On Fri, Aug 21, 2020 at 12:33 PM Barry Smith <bsmith at petsc.dev> wrote:
>
> There really needs to be a usable, extensive MPI test suite that can find
> these performance issues; we spend time helping users with these problems
> when it is really the MPI community's job.
>
>
>
> On Aug 21, 2020, at 11:55 AM, Manav Bhatia <bhatiamanav at gmail.com> wrote:
>
> I built PETSc with mpich-3.3.2 on my MacBook Pro with Apple clang 11.0.3,
> and the test now finishes on my end.
>
> So, it appears that there is some issue with openmpi-4.0.1 on this
> machine.
>
> I will now rebuild my entire dependency toolchain with MPICH, and hopefully
> my application code will then work.
>
> Thank you again for your help.
>
> Regards,
> Manav
>
>
> On Aug 20, 2020, at 10:45 PM, Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
> Manav,
> I downloaded your petsc_mat.tgz but could not reproduce the problem on
> either Linux or Mac. I used the PETSc commit df0e4300 you mentioned.
> On Linux, I have openmpi-4.0.2 + gcc-8.3.0, and PETSc is configured with
>   --with-debugging --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort
>   --COPTFLAGS="-g -O0" --FOPTFLAGS="-g -O0" --CXXOPTFLAGS="-g -O0"
>   --PETSC_ARCH=linux-host-dbg
> On Mac, I have mpich-3.3.1 + clang-11.0.0-apple, and PETSc is configured with
>   --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort
>   --with-ctable=0 COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g"
>   PETSC_ARCH=mac-clang-dbg
>
> mpirun -n 8 ./test
> rank: 1 : stdout.processor.1
> rank: 4 : stdout.processor.4
> rank: 0 : stdout.processor.0
> rank: 5 : stdout.processor.5
> rank: 6 : stdout.processor.6
> rank: 7 : stdout.processor.7
> rank: 3 : stdout.processor.3
> rank: 2 : stdout.processor.2
> rank: 1 : Beginning reading nnz...
> rank: 4 : Beginning reading nnz...
> rank: 0 : Beginning reading nnz...
> rank: 5 : Beginning reading nnz...
> rank: 7 : Beginning reading nnz...
> rank: 2 : Beginning reading nnz...
> rank: 3 : Beginning reading nnz...
> rank: 6 : Beginning reading nnz...
> rank: 5 : Finished reading nnz
> rank: 5 : Beginning mat preallocation...
> rank: 3 : Finished reading nnz
> rank: 3 : Beginning mat preallocation...
> rank: 4 : Finished reading nnz
> rank: 4 : Beginning mat preallocation...
> rank: 7 : Finished reading nnz
> rank: 7 : Beginning mat preallocation...
> rank: 1 : Finished reading nnz
> rank: 1 : Beginning mat preallocation...
> rank: 0 : Finished reading nnz
> rank: 0 : Beginning mat preallocation...
> rank: 2 : Finished reading nnz
> rank: 2 : Beginning mat preallocation...
> rank: 6 : Finished reading nnz
> rank: 6 : Beginning mat preallocation...
> rank: 5 : Finished preallocation
> rank: 5 : Beginning reading and setting matrix values...
> rank: 1 : Finished preallocation
> rank: 1 : Beginning reading and setting matrix values...
> rank: 7 : Finished preallocation
> rank: 7 : Beginning reading and setting matrix values...
> rank: 2 : Finished preallocation
> rank: 2 : Beginning reading and setting matrix values...
> rank: 4 : Finished preallocation
> rank: 4 : Beginning reading and setting matrix values...
> rank: 0 : Finished preallocation
> rank: 0 : Beginning reading and setting matrix values...
> rank: 3 : Finished preallocation
> rank: 3 : Beginning reading and setting matrix values...
> rank: 6 : Finished preallocation
> rank: 6 : Beginning reading and setting matrix values...
> rank: 1 : Finished reading and setting matrix values
> rank: 1 : Beginning mat assembly...
> rank: 5 : Finished reading and setting matrix values
> rank: 5 : Beginning mat assembly...
> rank: 4 : Finished reading and setting matrix values
> rank: 4 : Beginning mat assembly...
> rank: 2 : Finished reading and setting matrix values
> rank: 2 : Beginning mat assembly...
> rank: 3 : Finished reading and setting matrix values
> rank: 3 : Beginning mat assembly...
> rank: 7 : Finished reading and setting matrix values
> rank: 7 : Beginning mat assembly...
> rank: 6 : Finished reading and setting matrix values
> rank: 6 : Beginning mat assembly...
> rank: 0 : Finished reading and setting matrix values
> rank: 0 : Beginning mat assembly...
> rank: 1 : Finished mat assembly
> rank: 3 : Finished mat assembly
> rank: 7 : Finished mat assembly
> rank: 0 : Finished mat assembly
> rank: 5 : Finished mat assembly
> rank: 2 : Finished mat assembly
> rank: 4 : Finished mat assembly
> rank: 6 : Finished mat assembly
>
> --Junchao Zhang
>
>
> On Thu, Aug 20, 2020 at 5:29 PM Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
>> I will have a look and report back to you. Thanks.
>> --Junchao Zhang
>>
>>
>> On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia <bhatiamanav at gmail.com>
>> wrote:
>>
>>> I have created a standalone test that demonstrates the problem at my
>>> end. I have stored the indices, etc. from my problem in a text file
>>> for each rank, which I use to initialize the matrix.
>>> Please note that the test is written specifically for 8 ranks.
>>>
>>> The .tgz file is on my google drive:
>>> https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing
>>>
>>>
>>> It contains a README file with instructions for running the test. Please
>>> note that the index files must be in the working directory.
>>>
>>> Please let me know if I can provide any further information.
>>>
>>> Thank you all for your help.
>>>
>>> Regards,
>>> Manav
>>>
>>> On Aug 20, 2020, at 12:54 PM, Jed Brown <jed at jedbrown.org> wrote:
>>>
>>> Matthew Knepley <knepley at gmail.com> writes:
>>>
>>> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia <bhatiamanav at gmail.com>
>>> wrote:
>>>
>>>
>>>
>>> On Aug 20, 2020, at 8:31 AM, Stefano Zampini <stefano.zampini at gmail.com>
>>> wrote:
>>>
>>> Can you add an MPI_Barrier before
>>>
>>> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);
>>>
>>>
>>> With an MPI_Barrier before this function call:
>>> - three of the processes have already hit this barrier;
>>> - the other five are inside MatStashScatterGetMesg_Private ->
>>>   MatStashScatterGetMesg_BTS -> MPI_Waitsome (2 processes) / MPI_Waitall
>>>   (3 processes).
>>>
>>>
>>> This is not itself evidence of an inconsistent state. You can use
>>>
>>> -build_twosided allreduce
>>>
>>> to avoid the nonblocking sparse algorithm.
>>>
>>>
>>> Okay, you should run this with -matstash_legacy just to make sure it is
>>> not a bug in your MPI implementation. But it looks like there is an
>>> inconsistency in the parallel state. This can happen because we have a
>>> bug, or because you called a collective operation on only a subset of
>>> the processes. Is there any way you could cut down the example (say, put
>>> all 1s in the matrix, etc.) so that you could give it to us to run?
>>>
>>>
>>>
>
>