Hi Junchao,

I was able to create a small test code that reproduces the issue we have been having; it is attached to this email in a zip file. Included are the test.F90 code, the commands that reproduce the crash and a successful run, the output errors, and our PETSc configuration.

Our findings to date:

- The error is reproducible in a very short time with this script.
- It is related to nproc*nsubs and, to a lesser extent, to the DM grid size.
- It happens regardless of MPI implementation (MPICH, Intel MPI 2018/2019, OpenMPI) or compiler (gfortran/gcc, Intel 2018).
- Changing the vecscatter_type to mpi1 or mpi3 has no effect (example invocations in the P.S. below). mpi1 seems to raise the limit slightly, but it still fails on the full machine set.
- Nothing interesting shows up under valgrind.

Our initial tests were carried out on an Azure cluster, but we also tested on our smaller cluster and found the following:

Works:
$PETSC_DIR/lib/petsc/bin/petscmpiexec -n 1280 -hostfile hostfile ./test -nsubs 80 -nx 100 -ny 100 -nz 100

Crashes (this works on Azure):
$PETSC_DIR/lib/petsc/bin/petscmpiexec -n 2560 -hostfile hostfile ./test -nsubs 80 -nx 100 -ny 100 -nz 100

So it looks like it may also be related to the physical number of nodes. In any case, even with 2560 processes on 192 cores the memory does not go above 3.5 Gbytes, so you don't need a huge cluster to test.

Thanks,

Randy M.
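
P.S. For reference, the vecscatter variants mentioned above can be selected with the standard -vecscatter_type runtime option; the exact invocations below are illustrative (they simply append that option to the same crashing command line shown above):

$PETSC_DIR/lib/petsc/bin/petscmpiexec -n 2560 -hostfile hostfile ./test -nsubs 80 -nx 100 -ny 100 -nz 100 -vecscatter_type mpi1
$PETSC_DIR/lib/petsc/bin/petscmpiexec -n 2560 -hostfile hostfile ./test -nsubs 80 -nx 100 -ny 100 -nz 100 -vecscatter_type mpi3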