<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Sep 28, 2019 at 12:55 AM Karl Rupp <<a href="mailto:rupp@iue.tuwien.ac.at">rupp@iue.tuwien.ac.at</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Mark,<br>
<br>
> OK, so now the problem has shifted somewhat in that it now manifests <br>
> itself on small cases. </blockquote><div><br></div><div>It is somewhat random and anecdotal, but it does happen on the smaller test problem now. When I try to narrow down where the problem manifests by reducing the number of GPUs/procs, I find that the problem cannot be too small (i.e., the bug does not manifest on even smaller problems).</div><div><br></div><div>But it is much more stable, and there does seem to be only this one problem, with mat-transpose-mult (MatMultTranspose). You made a lot of progress. </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">In earlier investigation I was drawn to <br>
> MatTranspose but had a hard time pinning it down. The bug seems more <br>
> stable now or you probably fixed what looks like all the other bugs.<br>
> <br>
> I added print statements with norms of vectors in mg.c (v-cycle) and <br>
> found that the diffs between the CPU and GPU runs came in MatRestrict, <br>
> which calls MatMultTranspose. I added identical print statements in the <br>
> two versions of MatMultTranspose and see this. (pinning to the CPU does <br>
> not seem to make any difference). Note that the problem comes in the 2nd <br>
> iteration where the *output* vector is non-zero coming in (this should <br>
> not matter).<br>
> <br>
> Karl, I zeroed out the output vector (yy) when I come into this method <br>
> and it fixed the problem. This is with -n 4, and this always works with <br>
> -n 3. See the attached process layouts. It looks like this comes when <br>
> you use the 2nd socket.<br>
> <br>
> So this looks like an Nvidia bug. Let me know what you think and I can <br>
> pass it on to ORNL.<br>
<br>
Hmm, there were some issues with MatMultTranspose_MPIAIJ at some point. <br>
I've addressed some of them, but I can't confidently say that all of the <br>
issues were fixed. Thus, I don't think it's a problem in NVIDIA's <br>
cuSparse, but rather something we need to fix in PETSc. Note that the <br>
problem shows up with multiple MPI ranks; </blockquote><div><br></div><div>It seems to need two sockets. My current test works with 1, 2, and 3 GPUs (one socket) but fails with 4, when you go to the second socket.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">if it were a problem in <br>
cuSparse, it would show up on a single rank as well.<br></blockquote><div><br></div><div>What I am seeing is consistent with CUSPARSE having a race condition in zeroing out the output vector in some way, but I don't know.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Best regards,<br>
Karli<br>
<br>
<br>
<br>
<br>
<br>
> 06:49  /gpfs/alpine/geo127/scratch/adams$ jsrun -n 4 -a 4 -c 4 -g 1 <br>
> ./ex56 -cells 8,12,16 -ex56_dm_vec_type cuda -ex56_dm_mat_type aijcusparse<br>
> [0] 3465 global equations, 1155 vertices<br>
> [0] 3465 equations in vector, 1155 vertices<br>
>    0 SNES Function norm 1.725526579328e+01<br>
>      0 KSP Residual norm 1.725526579328e+01<br>
>          2) call Restrict with |r| = 1.402719214830704e+01<br>
>                          MatMultTranspose_MPIAIJCUSPARSE |x in| = <br>
> 1.40271921483070e+01<br>
> *                        MatMultTranspose_MPIAIJ |y in| = <br>
> 0.00000000000000e+00<br>
> *                        MatMultTranspose_MPIAIJCUSPARSE |a->lvec| = <br>
> 0.00000000000000e+00<br>
>                          *** MatMultTranspose_MPIAIJCUSPARSE |yy| = <br>
> 3.43436359545813e+00<br>
>                          MatMultTranspose_MPIAIJCUSPARSE final |yy| = <br>
> 1.29055494844681e+01<br>
>                  3) |R| = 1.290554948446808e+01<br>
>          2) call Restrict with |r| = 4.109771717986951e+00<br>
>                          MatMultTranspose_MPIAIJCUSPARSE |x in| = <br>
> 4.10977171798695e+00<br>
> *                        MatMultTranspose_MPIAIJ |y in| = <br>
> 0.00000000000000e+00<br>
> *                        MatMultTranspose_MPIAIJCUSPARSE |a->lvec| = <br>
> 0.00000000000000e+00<br>
>                          *** MatMultTranspose_MPIAIJCUSPARSE |yy| = <br>
> 1.79415048609144e-01<br>
>                          MatMultTranspose_MPIAIJCUSPARSE final |yy| = <br>
> 9.01083013948788e-01<br>
>                  3) |R| = 9.010830139487883e-01<br>
>                  4) |X| = 2.864698671963022e+02<br>
>                  5) |x| = 9.763280000911783e+02<br>
>                  6) post smooth |x| = 8.940011621494751e+02<br>
>                  4) |X| = 8.940011621494751e+02<br>
>                  5) |x| = 1.005081556495388e+03<br>
>                  6) post smooth |x| = 1.029043994031627e+03<br>
>      1 KSP Residual norm 8.102614049404e+00<br>
>          2) call Restrict with |r| = 4.402603749876137e+00<br>
>                          MatMultTranspose_MPIAIJCUSPARSE |x in| = <br>
> 4.40260374987614e+00<br>
> *                        MatMultTranspose_MPIAIJ |y in| = <br>
> 1.29055494844681e+01<br>
> *                        MatMultTranspose_MPIAIJCUSPARSE |a->lvec| = <br>
> 0.00000000000000e+00<br>
>                          *** MatMultTranspose_MPIAIJCUSPARSE |yy| = <br>
> 1.68544559626318e+00<br>
>                          MatMultTranspose_MPIAIJCUSPARSE final |yy| = <br>
> 1.82129824300863e+00<br>
>                  3) |R| = 1.821298243008628e+00<br>
>          2) call Restrict with |r| = 1.068309793900564e+00<br>
>                          MatMultTranspose_MPIAIJCUSPARSE |x in| = <br>
> 1.06830979390056e+00<br>
>                          MatMultTranspose_MPIAIJ |y in| = <br>
> 9.01083013948788e-01<br>
>                          MatMultTranspose_MPIAIJCUSPARSE |a->lvec| = <br>
> 0.00000000000000e+00<br>
>                          *** MatMultTranspose_MPIAIJCUSPARSE |yy| = <br>
> 1.40519177065298e-01<br>
>                          MatMultTranspose_MPIAIJCUSPARSE final |yy| = <br>
> 1.01853904152812e-01<br>
>                  3) |R| = 1.018539041528117e-01<br>
>                  4) |X| = 4.949616392884510e+01<br>
>                  5) |x| = 9.309440014159884e+01<br>
>                  6) post smooth |x| = 5.432486021529479e+01<br>
>                  4) |X| = 5.432486021529479e+01<br>
>                  5) |x| = 8.246142532204632e+01<br>
>                  6) post smooth |x| = 7.605703654091440e+01<br>
>    Linear solve did not converge due to DIVERGED_ITS iterations 1<br>
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0<br>
> 06:50  /gpfs/alpine/geo127/scratch/adams$ jsrun -n 4 -a 4 -c 4 -g 1 <br>
> ./ex56 -cells 8,12,16<br>
> [0] 3465 global equations, 1155 vertices<br>
> [0] 3465 equations in vector, 1155 vertices<br>
>    0 SNES Function norm 1.725526579328e+01<br>
>      0 KSP Residual norm 1.725526579328e+01<br>
>          2) call Restrict with |r| = 1.402719214830704e+01<br>
>                          MatMultTranspose_MPIAIJ |x in| = <br>
> 1.40271921483070e+01<br>
> *                        MatMultTranspose_MPIAIJ |y in| = <br>
> 0.00000000000000e+00<br>
> *                        MatMultTranspose_MPIAIJ |a->lvec| = <br>
> 0.00000000000000e+00<br>
>                          *** MatMultTranspose_MPIAIJ |yy| = <br>
> 3.43436359545813e+00<br>
>                          MatMultTranspose_MPIAIJ final |yy| = <br>
> 1.29055494844681e+01<br>
>                  3) |R| = 1.290554948446809e+01<br>
>          2) call Restrict with |r| = 4.109771717986956e+00<br>
>                          MatMultTranspose_MPIAIJ |x in| = <br>
> 4.10977171798696e+00<br>
> *                        MatMultTranspose_MPIAIJ |y in| = <br>
> 0.00000000000000e+00<br>
> *                        MatMultTranspose_MPIAIJ |a->lvec| = <br>
> 0.00000000000000e+00<br>
>                          *** MatMultTranspose_MPIAIJ |yy| = <br>
> 1.79415048609143e-01<br>
>                          MatMultTranspose_MPIAIJ final |yy| = <br>
> 9.01083013948789e-01<br>
>                  3) |R| = 9.010830139487889e-01<br>
>                  4) |X| = 2.864698671963023e+02<br>
>                  5) |x| = 9.763280000911785e+02<br>
>                  6) post smooth |x| = 8.940011621494754e+02<br>
>                  4) |X| = 8.940011621494754e+02<br>
>                  5) |x| = 1.005081556495388e+03<br>
>                  6) post smooth |x| = 1.029043994031627e+03<br>
>      1 KSP Residual norm 8.102614049404e+00<br>
>          2) call Restrict with |r| = 4.402603749876139e+00<br>
>                          MatMultTranspose_MPIAIJ |x in| = <br>
> 4.40260374987614e+00<br>
> *                        MatMultTranspose_MPIAIJ |y in| = <br>
> 1.29055494844681e+01<br>
> *                        MatMultTranspose_MPIAIJ |a->lvec| = <br>
> 0.00000000000000e+00<br>
>                          *** MatMultTranspose_MPIAIJ |yy| = <br>
> 4.43650979822523e-01<br>
>                          MatMultTranspose_MPIAIJ final |yy| = <br>
> 1.18089369006243e+00<br>
>                  3) |R| = 1.180893690062426e+00<br>
>          2) call Restrict with |r| = 6.868764720156294e-01<br>
>                          MatMultTranspose_MPIAIJ |x in| = <br>
> 6.86876472015629e-01<br>
>                          MatMultTranspose_MPIAIJ |y in| = <br>
> 9.01083013948789e-01<br>
>                          MatMultTranspose_MPIAIJ |a->lvec| = <br>
> 0.00000000000000e+00<br>
>                          *** MatMultTranspose_MPIAIJ |yy| = <br>
> 3.36768099045088e-02<br>
>                          MatMultTranspose_MPIAIJ final |yy| = <br>
> 6.40334376876017e-02<br>
>                  3) |R| = 6.403343768760170e-02<br>
>                  4) |X| = 2.380471873599142e+01<br>
>                  5) |x| = 6.932703848368443e+01<br>
>                  6) post smooth |x| = 4.502536862656444e+01<br>
>                  4) |X| = 4.502536862656444e+01<br>
>                  5) |x| = 7.998534854728734e+01<br>
>                  6) post smooth |x| = 7.660075651381680e+01<br>
>    Linear solve did not converge due to DIVERGED_ITS iterations 1<br>
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0<br>
> 06:50  /gpfs/alpine/geo127/scratch/adams$<br>
> <br>
</blockquote></div></div>
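<div><br></div><div>The symptom discussed in this thread (the result of the transpose product changes when the output vector is non-zero on entry, and zeroing yy first hides it) can be turned into a small standalone check. The following is only a sketch in plain Python with hypothetical names (mult_transpose, mult_transpose_stale, depends_on_output); it does not use PETSc or cuSPARSE and makes no claim about where the actual bug lives. The deliberately faulty variant exists only to show what the check detects.</div>
<pre>
```python
import math


def mult_transpose(rows, x, y):
    """Overwrite y with A^T x, where A is a dense matrix given as a list of rows."""
    # Overwrite semantics: clear the output first so stale entries cannot leak in.
    for j in range(len(y)):
        y[j] = 0.0
    for i, row in enumerate(rows):
        for j, a_ij in enumerate(row):
            y[j] += a_ij * x[i]
    return y


def mult_transpose_stale(rows, x, y):
    """Deliberately faulty variant: accumulates into whatever y already holds."""
    for i, row in enumerate(rows):
        for j, a_ij in enumerate(row):
            y[j] += a_ij * x[i]
    return y


def norm2(v):
    """Euclidean norm, the same quantity the debug prints in the logs report."""
    return math.sqrt(sum(t * t for t in v))


def depends_on_output(kernel, rows, x, ncols):
    """Return True if the result norm changes when y starts out non-zero."""
    y_zero = kernel(rows, x, [0.0] * ncols)
    y_dirty = kernel(rows, x, [1.0] * ncols)
    tol = 1e-12 * max(1.0, norm2(y_zero))
    return abs(norm2(y_zero) - norm2(y_dirty)) > tol
```
</pre>
<div>For example, with rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]] and x = [1.0, 1.0, 1.0], depends_on_output(mult_transpose, rows, x, 2) is False, while the same call with mult_transpose_stale is True. Running an analogous comparison around the real kernel on each rank might help separate a stale-output problem from a communication problem.</div>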