Dear PETSc users,

We are trying to understand an issue that has come up when running our code on a large cloud cluster with a large number of processes and subcomms. This is code that we use daily on multiple clusters without problems, and it runs valgrind-clean for small test problems.

The run generates the following messages, but it doesn't crash; it just seems to hang, with all processes continuing to show activity:

[492]PETSC ERROR: #1 PetscGatherMessageLengths() line 117 in /mnt/home/cgg/PETSc/petsc-3.12.4/src/sys/utils/mpimesg.c
[492]PETSC ERROR: #2 VecScatterSetUp_SF() line 658 in /mnt/home/cgg/PETSc/petsc-3.12.4/src/vec/vscat/impls/sf/vscatsf.c
[492]PETSC ERROR: #3 VecScatterSetUp() line 209 in /mnt/home/cgg/PETSc/petsc-3.12.4/src/vec/vscat/interface/vscatfce.c
[492]PETSC ERROR: #4 VecScatterCreate() line 282 in /mnt/home/cgg/PETSc/petsc-3.12.4/src/vec/vscat/interface/vscreate.c

Looking at line 117 in PetscGatherMessageLengths() we find that the offending statement is the MPI_Isend:

  /* Post the Isends with the message length-info */
  for (i=0,j=0; i<size; ++i) {
    if (ilengths[i]) {
      ierr = MPI_Isend((void*)(ilengths+i),1,MPI_INT,i,tag,comm,s_waits+j);CHKERRQ(ierr);
      j++;
    }
  }

We have tried this with Intel MPI 2018, Intel MPI 2019, and MPICH, all giving the same problem.

We suspect there is some limit being set on this cloud cluster on the number of connections (or something similar), but we don't know.
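To test that hypothesis outside of PETSc, we have been thinking about a small standalone reproducer. Below is a minimal sketch (our own test code, not part of PETSc) that mimics the communication pattern of PetscGatherMessageLengths(): every rank posts one small MPI_Isend to every other rank plus matching MPI_Irecvs, then waits. If something like this also hangs at the same scale on the cluster, that would point at the MPI/fabric configuration rather than at PETSc itself:

  /* Dense length-exchange test: each rank sends one MPI_INT to every
   * other rank and receives one from every other rank, roughly the
   * pattern used by PetscGatherMessageLengths(). */
  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
    int          rank, size, i, j, tag = 42;
    int         *sendlens, *recvlens;
    MPI_Request *reqs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sendlens = (int *)malloc(size * sizeof(int));
    recvlens = (int *)malloc(size * sizeof(int));
    reqs     = (MPI_Request *)malloc(2 * size * sizeof(MPI_Request));

    for (i = 0; i < size; i++) sendlens[i] = rank + 1; /* dummy "message lengths" */

    /* Post the receives first, one per remote rank, from any source
     * (buffer index does not match the sender; irrelevant for this test) */
    for (i = 0, j = 0; i < size; i++) {
      if (i == rank) continue;
      MPI_Irecv(recvlens + i, 1, MPI_INT, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, reqs + j++);
    }
    /* Post the sends, same shape as the PETSc loop quoted above */
    for (i = 0; i < size; i++) {
      if (i == rank) continue;
      MPI_Isend(sendlens + i, 1, MPI_INT, i, tag, MPI_COMM_WORLD, reqs + j++);
    }
    MPI_Waitall(j, reqs, MPI_STATUSES_IGNORE);

    if (!rank) printf("all-to-all length exchange completed on %d ranks\n", size);

    free(sendlens); free(recvlens); free(reqs);
    MPI_Finalize();
    return 0;
  }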
class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><span style="font-size: 11pt; font-family: "Courier New"; color: rgb(31, 73, 125);" class=""> j++;</span><o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><span style="font-size: 11pt; font-family: "Courier New"; color: rgb(31, 73, 125);" class=""> }</span><o:p class=""></o:p></div><div class=""><span style="color: rgb(31, 73, 125); font-family: "Courier New"; font-size: 11pt;" class=""> }</span> </div><div class=""><br class=""></div><div class="">We have tried this with Intel MPI 2018, 2019, and mpich, all giving the same problem.</div><div class=""><br class=""></div><div class="">We suspect there is some limit being set on this cloud cluster on the number of file connections or something, but we don’t know.</div><div class=""><br class=""></div><div class="">Anyone have any ideas? We are sort of grasping for straws at this point.</div><div class=""><br class=""></div><div class="">Thanks, Randy M.</div></body></html>