<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><br class=""></div>  There really needs to be a usable, extensive MPI test suite that can find these performance issues; we spend time helping users with these problems when it is really the MPI community's job.<div class=""><br class=""></div><div class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Aug 21, 2020, at 11:55 AM, Manav Bhatia <<a href="mailto:bhatiamanav@gmail.com" class="">bhatiamanav@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">I built petsc with mpich-3.3.2 on my MacBook Pro with Apple clang 11.0.3, and the test finishes on my end. <div class=""><br class=""></div><div class="">So, it appears that there is some issue with openmpi-4.0.1 on this machine. </div><div class=""><br class=""></div><div class="">I will now rebuild my entire dependency toolchain with mpich, and hopefully things will work for my application code. </div><div class=""><br class=""></div><div class="">Thank you again for your help. </div><div class=""><br class=""></div><div class="">Regards, </div><div class="">Manav</div><div class=""><br class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Aug 20, 2020, at 10:45 PM, Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" class="">junchao.zhang@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Manav,<div class=""> I downloaded your petsc_mat.tgz but could not reproduce the problem on either Linux or Mac. 
I used the petsc commit id df0e4300 you mentioned.</div><div class=""> On Linux, I have openmpi-4.0.2 + gcc-8.3.0, and petsc is configured  --with-debugging --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --COPTFLAGS="-g -O0" --FOPTFLAGS="-g -O0" --CXXOPTFLAGS="-g -O0" --PETSC_ARCH=linux-host-dbg</div><div class=""> On Mac, I have mpich-3.3.1 + clang-11.0.0-apple, and petsc is configured --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --with-ctable=0 COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g" PETSC_ARCH=mac-clang-dbg</div><div class=""><br class=""></div><div class="">mpirun -n 8 ./test<br class=""></div><div class="">rank: 1 : stdout.processor.1<br class="">rank: 4 : stdout.processor.4<br class="">rank: 0 : stdout.processor.0<br class="">rank: 5 : stdout.processor.5<br class="">rank: 6 : stdout.processor.6<br class="">rank: 7 : stdout.processor.7<br class="">rank: 3 : stdout.processor.3<br class="">rank: 2 : stdout.processor.2<br class="">rank: 1 : Beginning reading nnz...<br class="">rank: 4 : Beginning reading nnz...<br class="">rank: 0 : Beginning reading nnz...<br class="">rank: 5 : Beginning reading nnz...<br class="">rank: 7 : Beginning reading nnz...<br class="">rank: 2 : Beginning reading nnz...<br class="">rank: 3 : Beginning reading nnz...<br class="">rank: 6 : Beginning reading nnz...<br class="">rank: 5 : Finished reading nnz<br class="">rank: 5 : Beginning mat preallocation...<br class="">rank: 3 : Finished reading nnz<br class="">rank: 3 : Beginning mat preallocation...<br class="">rank: 4 : Finished reading nnz<br class="">rank: 4 : Beginning mat preallocation...<br class="">rank: 7 : Finished reading nnz<br class="">rank: 7 : Beginning mat preallocation...<br class="">rank: 1 : Finished reading nnz<br class="">rank: 1 : Beginning mat preallocation...<br class="">rank: 0 : Finished reading nnz<br class="">rank: 0 : Beginning mat preallocation...<br class="">rank: 2 : Finished reading nnz<br class="">rank: 2 : Beginning mat 
preallocation...<br class="">rank: 6 : Finished reading nnz<br class="">rank: 6 : Beginning mat preallocation...<br class="">rank: 5 : Finished preallocation<br class="">rank: 5 : Beginning reading and setting matrix values...<br class="">rank: 1 : Finished preallocation<br class="">rank: 1 : Beginning reading and setting matrix values...<br class="">rank: 7 : Finished preallocation<br class="">rank: 7 : Beginning reading and setting matrix values...<br class="">rank: 2 : Finished preallocation<br class="">rank: 2 : Beginning reading and setting matrix values...<br class="">rank: 4 : Finished preallocation<br class="">rank: 4 : Beginning reading and setting matrix values...<br class="">rank: 0 : Finished preallocation<br class="">rank: 0 : Beginning reading and setting matrix values...<br class="">rank: 3 : Finished preallocation<br class="">rank: 3 : Beginning reading and setting matrix values...<br class="">rank: 6 : Finished preallocation<br class="">rank: 6 : Beginning reading and setting matrix values...<br class="">rank: 1 : Finished reading and setting matrix values<br class="">rank: 1 : Beginning mat assembly...<br class="">rank: 5 : Finished reading and setting matrix values<br class="">rank: 5 : Beginning mat assembly...<br class="">rank: 4 : Finished reading and setting matrix values<br class="">rank: 4 : Beginning mat assembly...<br class="">rank: 2 : Finished reading and setting matrix values<br class="">rank: 2 : Beginning mat assembly...<br class="">rank: 3 : Finished reading and setting matrix values<br class="">rank: 3 : Beginning mat assembly...<br class="">rank: 7 : Finished reading and setting matrix values<br class="">rank: 7 : Beginning mat assembly...<br class="">rank: 6 : Finished reading and setting matrix values<br class="">rank: 6 : Beginning mat assembly...<br class="">rank: 0 : Finished reading and setting matrix values<br class="">rank: 0 : Beginning mat assembly...<br class="">rank: 1 : Finished mat assembly<br class="">rank: 3 : 
Finished mat assembly<br class="">rank: 7 : Finished mat assembly<br class="">rank: 0 : Finished mat assembly<br class="">rank: 5 : Finished mat assembly<br class="">rank: 2 : Finished mat assembly<br class="">rank: 4 : Finished mat assembly<br class="">rank: 6 : Finished mat assembly<br class=""></div><div class=""><br class=""></div><div class=""><div class=""><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr" class="">--Junchao Zhang</div></div></div><br class=""></div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Aug 20, 2020 at 5:29 PM Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" class="">junchao.zhang@gmail.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class="">I will have a look and report back to you. Thanks.<br clear="all" class=""><div class=""><div dir="ltr" class=""><div dir="ltr" class="">--Junchao Zhang</div></div></div><br class=""></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia <<a href="mailto:bhatiamanav@gmail.com" target="_blank" class="">bhatiamanav@gmail.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="">I have created a standalone test that demonstrates the problem at my end. I have stored the indices, etc. <span style="" class=""> from my problem </span>in a text file for each rank, which I use to initialize the matrix.<div class="">Please note that the test is specifically for 8 ranks. 
</div><div class=""><br class=""></div><div class="">The .tgz file is on my google drive: <a href="https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing" target="_blank" class="">https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing</a> </div><div class=""><br class=""></div><div class="">This contains a README file with instructions on running. Please note that the work directory needs the index files. </div><div class=""><br class=""></div><div class="">Please let me know if I can provide any further information. </div><div class=""><br class=""></div><div class="">Thank you all for your help. </div><div class=""><br class=""></div><div class="">Regards,</div><div class="">Manav<br class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Aug 20, 2020, at 12:54 PM, Jed Brown <<a href="mailto:jed@jedbrown.org" target="_blank" class="">jed@jedbrown.org</a>> wrote:</div><br class=""><div class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline" class="">Matthew Knepley <</span><a href="mailto:knepley@gmail.com" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" target="_blank" class="">knepley@gmail.com</a><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline" class="">> writes:</span><br 
style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class="">On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia <<a href="mailto:bhatiamanav@gmail.com" target="_blank" class="">bhatiamanav@gmail.com</a>> wrote:<br class=""><br class=""><blockquote type="cite" class=""><br class=""><br class="">On Aug 20, 2020, at 8:31 AM, Stefano Zampini <<a href="mailto:stefano.zampini@gmail.com" target="_blank" class="">stefano.zampini@gmail.com</a>><br class="">wrote:<br class=""><br class="">Can you add a MPI_Barrier before<br class=""><br class="">ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);<br class=""><br class=""><br class="">With a MPI_Barrier before this function call:<br class="">—  three of the processes have already hit this barrier,<br class="">—  the other 5 are inside MatStashScatterGetMesg_Private -><br class="">MatStashScatterGetMesg_BTS -> MPI_Waitsome(2 processes)/MPI_Waitall(3<br class="">processes)<br class=""></blockquote></blockquote><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><span 
style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline" class="">This is not itself evidence of inconsistent state.  You can use</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline" class=""> -build_twosided allreduce</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline" class="">to avoid the nonblocking sparse algorithm.</span><br 
style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><br class="">Okay, you should run this with -matstash_legacy just to make sure it is not<br class="">a bug in your MPI implementation. But it looks like<br class="">there is inconsistency in the parallel state. This can happen because we<br class="">have a bug, or it could be that you called a collective<br class="">operation on a subset of the processes. Is there any way you could cut down<br class="">the example (say put all 1s in the matrix, etc) so<br class="">that you could give it to us to run?</blockquote></div></blockquote></div><br class=""></div></div></blockquote></div>
</blockquote></div>
</div></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></body></html>