[petsc-dev] error with karlrupp/fix-cuda-streams

Mark Adams mfadams at lbl.gov
Sat Sep 28 12:17:42 CDT 2019


The logic is basically correct: I simply zero out the yy vector (the
output vector) and it runs fine now. The numerics look fine without CPU
pinning.

And it worked with 1, 2, and 3 GPUs (one node, one socket), but failed with
4 GPUs, which uses the second socket. Strange.

On Sat, Sep 28, 2019 at 3:43 AM Stefano Zampini <stefano.zampini at gmail.com>
wrote:

> Mark,
>
>
> MatMultTransposeAdd_SeqAIJCUSPARSE checks whether the matrix is in compressed
> row storage, but MatMultTranspose_SeqAIJCUSPARSE does not. Could this be
> the issue? The CUSPARSE classes are kind of messy.
>
>
>
> Il giorno sab 28 set 2019 alle ore 07:55 Karl Rupp via petsc-dev <
> petsc-dev at mcs.anl.gov> ha scritto:
>
>> Hi Mark,
>>
>> > OK, so now the problem has shifted somewhat: it now manifests
>> > itself on small cases. In earlier investigation I was drawn to
>> > MatTranspose but had a hard time pinning it down. The bug seems more
>> > stable now; you have probably fixed what look like all the other bugs.
>> >
>> > I added print statements with norms of vectors in mg.c (v-cycle) and
>> > found that the diffs between the CPU and GPU runs came in MatRestrict,
>> > which calls MatMultTranspose. I added identical print statements in the
>> > two versions of MatMultTranspose and see this (pinning to the CPU does
>> > not seem to make any difference). Note that the problem comes in the
>> > 2nd iteration, where the *output* vector is non-zero coming in (this
>> > should not matter).
>> >
>> > Karl, I zeroed out the output vector (yy) when I come into this method,
>> > and that fixed the problem. This is with -n 4; it always works with
>> > -n 3. See the attached process layouts. It looks like this happens when
>> > the 2nd socket is used.
>> >
>> > So this looks like an NVIDIA bug. Let me know what you think and I can
>> > pass it on to ORNL.
>>
>> Hmm, there were some issues with MatMultTranspose_MPIAIJ at some point.
>> I've addressed some of them, but I can't confidently say that all of the
>> issues were fixed. Thus, I don't think it's a problem in NVIDIA's
>> cuSPARSE, but rather something we need to fix in PETSc. Note that the
>> problem shows up with multiple MPI ranks; if it were a problem in
>> cuSPARSE, it would show up on a single rank as well.
>>
>> Best regards,
>> Karli
>>
>>
>>
>>
>>
>> > 06:49  /gpfs/alpine/geo127/scratch/adams$ jsrun -n 4 -a 4 -c 4 -g 1
>> > ./ex56 -cells 8,12,16 -ex56_dm_vec_type cuda -ex56_dm_mat_type
>> aijcusparse
>> > [0] 3465 global equations, 1155 vertices
>> > [0] 3465 equations in vector, 1155 vertices
>> >    0 SNES Function norm 1.725526579328e+01
>> >      0 KSP Residual norm 1.725526579328e+01
>> >          2) call Restrict with |r| = 1.402719214830704e+01
>> >                          MatMultTranspose_MPIAIJCUSPARSE |x in| =
>> > 1.40271921483070e+01
>> > *                        MatMultTranspose_MPIAIJ |y in| =
>> > 0.00000000000000e+00
>> > *                        MatMultTranspose_MPIAIJCUSPARSE |a->lvec| =
>> > 0.00000000000000e+00
>> >                          *** MatMultTranspose_MPIAIJCUSPARSE |yy| =
>> > 3.43436359545813e+00
>> >                          MatMultTranspose_MPIAIJCUSPARSE final |yy| =
>> > 1.29055494844681e+01
>> >                  3) |R| = 1.290554948446808e+01
>> >          2) call Restrict with |r| = 4.109771717986951e+00
>> >                          MatMultTranspose_MPIAIJCUSPARSE |x in| =
>> > 4.10977171798695e+00
>> > *                        MatMultTranspose_MPIAIJ |y in| =
>> > 0.00000000000000e+00
>> > *                        MatMultTranspose_MPIAIJCUSPARSE |a->lvec| =
>> > 0.00000000000000e+00
>> >                          *** MatMultTranspose_MPIAIJCUSPARSE |yy| =
>> > 1.79415048609144e-01
>> >                          MatMultTranspose_MPIAIJCUSPARSE final |yy| =
>> > 9.01083013948788e-01
>> >                  3) |R| = 9.010830139487883e-01
>> >                  4) |X| = 2.864698671963022e+02
>> >                  5) |x| = 9.763280000911783e+02
>> >                  6) post smooth |x| = 8.940011621494751e+02
>> >                  4) |X| = 8.940011621494751e+02
>> >                  5) |x| = 1.005081556495388e+03
>> >                  6) post smooth |x| = 1.029043994031627e+03
>> >      1 KSP Residual norm 8.102614049404e+00
>> >          2) call Restrict with |r| = 4.402603749876137e+00
>> >                          MatMultTranspose_MPIAIJCUSPARSE |x in| =
>> > 4.40260374987614e+00
>> > *                        MatMultTranspose_MPIAIJ |y in| =
>> > 1.29055494844681e+01
>> > *                        MatMultTranspose_MPIAIJCUSPARSE |a->lvec| =
>> > 0.00000000000000e+00
>> >                          *** MatMultTranspose_MPIAIJCUSPARSE |yy| =
>> > 1.68544559626318e+00
>> >                          MatMultTranspose_MPIAIJCUSPARSE final |yy| =
>> > 1.82129824300863e+00
>> >                  3) |R| = 1.821298243008628e+00
>> >          2) call Restrict with |r| = 1.068309793900564e+00
>> >                          MatMultTranspose_MPIAIJCUSPARSE |x in| =
>> > 1.06830979390056e+00
>> >                          MatMultTranspose_MPIAIJ |y in| =
>> > 9.01083013948788e-01
>> >                          MatMultTranspose_MPIAIJCUSPARSE |a->lvec| =
>> > 0.00000000000000e+00
>> >                          *** MatMultTranspose_MPIAIJCUSPARSE |yy| =
>> > 1.40519177065298e-01
>> >                          MatMultTranspose_MPIAIJCUSPARSE final |yy| =
>> > 1.01853904152812e-01
>> >                  3) |R| = 1.018539041528117e-01
>> >                  4) |X| = 4.949616392884510e+01
>> >                  5) |x| = 9.309440014159884e+01
>> >                  6) post smooth |x| = 5.432486021529479e+01
>> >                  4) |X| = 5.432486021529479e+01
>> >                  5) |x| = 8.246142532204632e+01
>> >                  6) post smooth |x| = 7.605703654091440e+01
>> >    Linear solve did not converge due to DIVERGED_ITS iterations 1
>> > Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE
>> iterations 0
>> > 06:50  /gpfs/alpine/geo127/scratch/adams$ jsrun -n 4 -a 4 -c 4 -g 1
>> > ./ex56 -cells 8,12,16
>> > [0] 3465 global equations, 1155 vertices
>> > [0] 3465 equations in vector, 1155 vertices
>> >    0 SNES Function norm 1.725526579328e+01
>> >      0 KSP Residual norm 1.725526579328e+01
>> >          2) call Restrict with |r| = 1.402719214830704e+01
>> >                          MatMultTranspose_MPIAIJ |x in| =
>> > 1.40271921483070e+01
>> > *                        MatMultTranspose_MPIAIJ |y in| =
>> > 0.00000000000000e+00
>> > *                        MatMultTranspose_MPIAIJ |a->lvec| =
>> > 0.00000000000000e+00
>> >                          *** MatMultTranspose_MPIAIJ |yy| =
>> > 3.43436359545813e+00
>> >                          MatMultTranspose_MPIAIJ final |yy| =
>> > 1.29055494844681e+01
>> >                  3) |R| = 1.290554948446809e+01
>> >          2) call Restrict with |r| = 4.109771717986956e+00
>> >                          MatMultTranspose_MPIAIJ |x in| =
>> > 4.10977171798696e+00
>> > *                        MatMultTranspose_MPIAIJ |y in| =
>> > 0.00000000000000e+00
>> > *                        MatMultTranspose_MPIAIJ |a->lvec| =
>> > 0.00000000000000e+00
>> >                          *** MatMultTranspose_MPIAIJ |yy| =
>> > 1.79415048609143e-01
>> >                          MatMultTranspose_MPIAIJ final |yy| =
>> > 9.01083013948789e-01
>> >                  3) |R| = 9.010830139487889e-01
>> >                  4) |X| = 2.864698671963023e+02
>> >                  5) |x| = 9.763280000911785e+02
>> >                  6) post smooth |x| = 8.940011621494754e+02
>> >                  4) |X| = 8.940011621494754e+02
>> >                  5) |x| = 1.005081556495388e+03
>> >                  6) post smooth |x| = 1.029043994031627e+03
>> >      1 KSP Residual norm 8.102614049404e+00
>> >          2) call Restrict with |r| = 4.402603749876139e+00
>> >                          MatMultTranspose_MPIAIJ |x in| =
>> > 4.40260374987614e+00
>> > *                        MatMultTranspose_MPIAIJ |y in| =
>> > 1.29055494844681e+01
>> > *                        MatMultTranspose_MPIAIJ |a->lvec| =
>> > 0.00000000000000e+00
>> >                          *** MatMultTranspose_MPIAIJ |yy| =
>> > 4.43650979822523e-01
>> >                          MatMultTranspose_MPIAIJ final |yy| =
>> > 1.18089369006243e+00
>> >                  3) |R| = 1.180893690062426e+00
>> >          2) call Restrict with |r| = 6.868764720156294e-01
>> >                          MatMultTranspose_MPIAIJ |x in| =
>> > 6.86876472015629e-01
>> >                          MatMultTranspose_MPIAIJ |y in| =
>> > 9.01083013948789e-01
>> >                          MatMultTranspose_MPIAIJ |a->lvec| =
>> > 0.00000000000000e+00
>> >                          *** MatMultTranspose_MPIAIJ |yy| =
>> > 3.36768099045088e-02
>> >                          MatMultTranspose_MPIAIJ final |yy| =
>> > 6.40334376876017e-02
>> >                  3) |R| = 6.403343768760170e-02
>> >                  4) |X| = 2.380471873599142e+01
>> >                  5) |x| = 6.932703848368443e+01
>> >                  6) post smooth |x| = 4.502536862656444e+01
>> >                  4) |X| = 4.502536862656444e+01
>> >                  5) |x| = 7.998534854728734e+01
>> >                  6) post smooth |x| = 7.660075651381680e+01
>> >    Linear solve did not converge due to DIVERGED_ITS iterations 1
>> > Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE
>> iterations 0
>> > 06:50  /gpfs/alpine/geo127/scratch/adams$
>> >
>>
>
>
> --
> Stefano
>