<div dir="ltr"><div dir="ltr">On Thu, Nov 23, 2023 at 9:01 AM Joauma Marichal <<a href="mailto:joauma.marichal@uclouvain.be">joauma.marichal@uclouvain.be</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg2433361434654699384">
<div lang="FR-BE" style="overflow-wrap: break-word;">
<div class="m_2433361434654699384WordSection1">
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt">Hello,
<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt">My problem persists… Is there anything I could try?</span></p></div></div></div></blockquote><div><br></div><div>Yes. The failure appears to come from a call inside PetscSFSetUpRanks(). That routine allocates memory, the failure is in libc, and it only happens on larger examples, so I suspect an allocation problem.</div><div>Can you rebuild with debugging and run this example? Then we can see whether the allocation fails.</div><div><br></div> Thanks,<br><br> Matt<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg2433361434654699384"><div lang="FR-BE" style="overflow-wrap: break-word;"><div class="m_2433361434654699384WordSection1">
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt">Thanks a lot.
<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt">Best regards,
<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt">Joauma<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt"><u></u> <u></u></span></p>
<div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(181,196,223);padding:3pt 0cm 0cm">
<p class="MsoNormal" style="margin-bottom:12pt"><b><span style="font-size:12pt;color:black">From:
</span></b><span style="font-size:12pt;color:black">Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>><br>
<b>Date: </b>Wednesday, 25 October 2023 at 14:45<br>
<b>To: </b>Joauma Marichal <<a href="mailto:joauma.marichal@uclouvain.be" target="_blank">joauma.marichal@uclouvain.be</a>><br>
<b>Cc: </b><a href="mailto:petsc-maint@mcs.anl.gov" target="_blank">petsc-maint@mcs.anl.gov</a> <<a href="mailto:petsc-maint@mcs.anl.gov" target="_blank">petsc-maint@mcs.anl.gov</a>>, <a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a> <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>><br>
<b>Subject: </b>Re: [petsc-maint] DMSwarm on multiple processors<u></u><u></u></span></p>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11pt">On Wed, Oct 25, 2023 at 8:32 AM Joauma Marichal via petsc-maint <<a href="mailto:petsc-maint@mcs.anl.gov" target="_blank">petsc-maint@mcs.anl.gov</a>> wrote:<u></u><u></u></span></p>
</div>
<div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<div>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;color:rgb(33,33,33)">Hello, </span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;color:rgb(33,33,33)"> </span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;color:rgb(33,33,33)">I am using DMSwarm in an Eulerian-Lagrangian approach to model vapor bubbles in water. </span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;color:rgb(33,33,33)">I have obtained nice results recently and wanted to perform bigger simulations. Unfortunately, when I increase the number
of processors used to run the simulation, I get the following error:</span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;color:rgb(33,33,33)"> </span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">free(): invalid size</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] *** Process received signal ***</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] Signal: Aborted (6)</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] Signal code:</span></span><span class="m_2433361434654699384m-1889958021554448809apple-converted-space"><span lang="EN-US">
</span></span><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">(-6)</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7f56cd4c9b20]</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f56cd4c9a9f]</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f56cd49ce05]</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] [ 3] /lib64/libc.so.6(+0x91037)[0x7f56cd50c037]</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] [ 4] /lib64/libc.so.6(+0x9819c)[0x7f56cd51319c]</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] [ 5] /lib64/libc.so.6(+0x99aac)[0x7f56cd514aac]</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUpRanks+0x4c4)[0x7f56cea71e64]</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(+0x841642)[0x7f56cea83642]</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUp+0x9e)[0x7f56cea7043e]</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1">[cns136:590327] [ 9] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(VecScatterCreate+0x164e)[0x7f56cea7bbde]</span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1">[cns136:590327] [10] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA_3D+0x3e38)[0x7f56cee84dd8]</span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1">[cns136:590327] [11] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA+0xd8)[0x7f56cee9b448]</span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1">[cns136:590327] [12] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp+0x20)[0x7f56cededa20]</span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] [13] ./cobpor[0x4418dc]</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] [14] ./cobpor[0x408b63]</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] [15] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f56cd4b5cf3]</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] [16] ./cobpor[0x40bdee]</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">[cns136:590327] *** End of error message ***</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">--------------------------------------------------------------------------</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">Primary job</span></span><span class="m_2433361434654699384m-1889958021554448809apple-converted-space"><span lang="EN-US">
</span></span><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">terminated normally, but 1 process returned</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1">a non-zero exit code.
</span><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">Per user-direction, the job has been aborted.</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">--------------------------------------------------------------------------</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">--------------------------------------------------------------------------</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">mpiexec noticed that process rank 84 with PID 590327 on node cns136 exited on signal 6 (Aborted).</span></span><u></u><u></u></p>
<p class="m_2433361434654699384m-1889958021554448809p1"><span class="m_2433361434654699384m-1889958021554448809s1"><span lang="EN-US">--------------------------------------------------------------------------</span></span><u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;color:rgb(33,33,33)"> </span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;color:rgb(33,33,33)">When I reduce the number of processors, the error disappears. The code also runs without error when I leave out the vapor bubbles.
</span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;color:rgb(33,33,33)">The problem seems to occur at this point in the code:</span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;color:rgb(33,33,33)"> </span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal" style="line-height:13.5pt;background:rgb(30,30,30)">
<span lang="EN-US" style="font-size:9pt;font-family:Menlo;color:rgb(220,220,170)">DMCreate</span><span lang="EN-US" style="font-size:9pt;font-family:Menlo;color:rgb(212,212,212)">(PETSC_COMM_WORLD,swarm);</span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal" style="line-height:13.5pt;background:rgb(30,30,30)">
<span lang="EN-US" style="font-size:9pt;font-family:Menlo;color:rgb(212,212,212)"> </span>
<span lang="EN-US" style="font-size:9pt;font-family:Menlo;color:rgb(220,220,170)">DMSetType</span><span lang="EN-US" style="font-size:9pt;font-family:Menlo;color:rgb(212,212,212)">(*swarm,DMSWARM);</span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal" style="line-height:13.5pt;background:rgb(30,30,30)">
<span lang="EN-US" style="font-size:9pt;font-family:Menlo;color:rgb(212,212,212)"> </span>
<span lang="EN-US" style="font-size:9pt;font-family:Menlo;color:rgb(220,220,170)">DMSetDimension</span><span lang="EN-US" style="font-size:9pt;font-family:Menlo;color:rgb(212,212,212)">(*swarm,</span><span lang="EN-US" style="font-size:9pt;font-family:Menlo;color:rgb(181,206,168)">3</span><span lang="EN-US" style="font-size:9pt;font-family:Menlo;color:rgb(212,212,212)">);</span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal" style="line-height:13.5pt;background:rgb(30,30,30)">
<span lang="EN-US" style="font-size:9pt;font-family:Menlo;color:rgb(212,212,212)"> </span>
<span lang="EN-US" style="font-size:9pt;font-family:Menlo;color:rgb(220,220,170)">DMSwarmSetType</span><span lang="EN-US" style="font-size:9pt;font-family:Menlo;color:rgb(212,212,212)">(*swarm,DMSWARM_PIC);</span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal" style="line-height:13.5pt;background:rgb(30,30,30)">
<span lang="EN-US" style="font-size:9pt;font-family:Menlo;color:rgb(212,212,212)"> </span>
<span lang="EN-US" style="font-size:9pt;font-family:Menlo;color:rgb(220,220,170)">DMSwarmSetCellDM</span><span lang="EN-US" style="font-size:9pt;font-family:Menlo;color:rgb(212,212,212)">(*swarm,*dmcell);</span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;color:rgb(33,33,33)"> </span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;color:rgb(33,33,33)"> </span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;color:rgb(33,33,33)">Thanks a lot for your help. </span><span style="font-size:11pt"><u></u><u></u></span></p>
</div>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><span style="font-size:11pt"><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11pt">Things that would help us track this down:<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11pt"><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11pt">1) The smallest example where it fails<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11pt"><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11pt">2) The smallest number of processes where it fails<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11pt"><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11pt">3) A stack trace of the failure<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11pt"><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11pt">4) A simple example that we can run that also fails<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11pt"><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11pt"> Thanks,<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11pt"><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11pt"> Matt<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11pt"> <u></u><u></u></span></p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<div>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;color:rgb(33,33,33)">Best regards, </span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;color:rgb(33,33,33)"> </span><span style="font-size:11pt"><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;color:rgb(33,33,33)">Joauma</span><span style="font-size:11pt"><u></u><u></u></span></p>
</div>
</div>
</div>
</blockquote>
</div>
<p class="MsoNormal"><span style="font-size:11pt"><br clear="all">
<u></u><u></u></span></p>
<div>
<p class="MsoNormal"><span style="font-size:11pt"><u></u> <u></u></span></p>
</div>
<p class="MsoNormal"><span class="m_2433361434654699384gmailsignatureprefix"><span style="font-size:11pt">--
</span></span><span style="font-size:11pt"><u></u><u></u></span></p>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11pt">What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11pt"><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11pt"><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><u></u><u></u></span></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div></blockquote></div></div>
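<div>One reason a debugging build can help with an abort such as <code>free(): invalid size</code>: debugging allocators commonly place a known guard value next to each block and verify it, so an out-of-bounds write is caught by the allocator's own check instead of surfacing later inside libc. The following is a minimal, self-contained C sketch of that general technique. It is not PETSc's implementation, whose memory checking may work differently; <code>guarded_malloc</code>, <code>guarded_check</code>, <code>guarded_free</code>, <code>overrun_is_detected</code> and <code>GUARD_VALUE</code> are hypothetical names chosen for illustration.</div>

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sentinel stored directly after each user buffer. */
#define GUARD_VALUE 0xDEADBEEFu

/* Bookkeeping placed in front of the user buffer. The union pads the
   header so the user pointer keeps the alignment malloc() provides. */
typedef union {
  size_t      size; /* number of user bytes requested */
  max_align_t pad;
} GuardHeader;

/* Allocate n user bytes with a header in front and a guard word behind. */
void *guarded_malloc(size_t n)
{
  const uint32_t guard = GUARD_VALUE;
  unsigned char *raw   = malloc(sizeof(GuardHeader) + n + sizeof guard);

  if (!raw) return NULL;
  ((GuardHeader *)raw)->size = n;
  memcpy(raw + sizeof(GuardHeader) + n, &guard, sizeof guard);
  return raw + sizeof(GuardHeader);
}

/* Return 1 if the guard word behind p is intact, 0 if it was overwritten. */
int guarded_check(const void *p)
{
  const unsigned char *raw = (const unsigned char *)p - sizeof(GuardHeader);
  const size_t         n   = ((const GuardHeader *)raw)->size;
  uint32_t             guard;

  memcpy(&guard, raw + sizeof(GuardHeader) + n, sizeof guard);
  return guard == GUARD_VALUE;
}

/* Check the guard, release the block, and report what the check found. */
int guarded_free(void *p)
{
  const int intact = guarded_check(p);

  free((unsigned char *)p - sizeof(GuardHeader));
  return intact;
}

/* Demo: an off-by-one loop writes one int past a 4-entry array.
   Returns 1 if the overrun was detected at free time, 0 otherwise. */
int overrun_is_detected(void)
{
  const size_t n     = 4;
  int         *ranks = guarded_malloc(n * sizeof *ranks);

  if (!ranks) return 0;
  for (size_t i = 0; i <= n; i++) ranks[i] = (int)i; /* bug: should be i < n */
  return guarded_free(ranks) == 0;
}
```

<div>In this sketch the stray write lands on the guard word, which lies inside the block obtained from <code>malloc()</code>, so the program can report the overrun itself instead of aborting. A guard word only catches writes that touch it; tools such as Valgrind or AddressSanitizer cover more cases.</div>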