On Tue, Oct 23, 2012 at 1:03 PM, Jinquan Zhong <span dir="ltr"><<a href="mailto:jzhong@scsolutions.com" target="_blank">jzhong@scsolutions.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="EN-US" link="blue" vlink="purple">
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Dear folks,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">I have a question on how to use mumps properly in PETSc. It appears that I didn’t set up mumps right. I followed the example in
<u></u><u></u></span></p>
<p class="MsoNormal" style="text-indent:.5in"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><a href="http://www.mcs.anl.gov/petsc/petsc-dev/src/mat/examples/tests/ex125.c.html" target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/src/mat/examples/tests/ex125.c.html</a><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">to set up my program. Here is my situation on using the default setting
<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<pre> <a href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Sys/PetscInt.html#PetscInt" target="_blank">PetscInt</a> icntl_7 = 5;<u></u><u></u></pre>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">
<a href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatMumpsSetIcntl.html#MatMumpsSetIcntl" target="_blank">
MatMumpsSetIcntl</a>(F,7,icntl_7);<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">in the example ex125.c:
<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span>1.<span style="font:7.0pt "Times New Roman"">
</span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">The program work fine on all
<b>small</b> models (sparse matrices at the order of m= 894, 1097, 31k with a dense matrix included in the sparse matrix). The residuals are at the magnitude of 10^-3.</span></p></div></div></div></blockquote><div><br></div>
<div>This suggests that your systems are nearly singular. If you have a condition number of 1e12, it's time to reconsider the model.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="EN-US" link="blue" vlink="purple"><div><div><p><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u><u></u></span></p>
<p><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span>2.<span style="font:7.0pt "Times New Roman"">
</span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">The program has some issues on
<b>medium</b> size problem (m=460k with a dense matrix at the order of n=30k included in the sparse matrix). The full sparse matrix is sized at 17GB.<u></u><u></u></span></p>
<p style="margin-left:1.0in">
<u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span>a.<span style="font:7.0pt "Times New Roman"">
</span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">We used another software to generate sparse matrix by using 144 cores:
<u></u><u></u></span></p>
<p style="margin-left:1.5in">
<u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span><span style="font:7.0pt "Times New Roman"">
</span>i.<span style="font:7.0pt "Times New Roman""> </span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">When I used the resource from 144 cores (12 nodes with 48GB/node), it could not provide
the solution. There was a complain on the memory violation.</span></p></div></div></div></blockquote><div>Always send the entire error message.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="EN-US" link="blue" vlink="purple"><div><div><p style="margin-left:1.5in"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u><u></u></span></p>
<p style="margin-left:1.5in">
<u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span><span style="font:7.0pt "Times New Roman"">
</span>ii.<span style="font:7.0pt "Times New Roman""> </span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">When I used the resource from 432 cores (36 nodes with 48GB/node), it provided the solution.
<u></u><u></u></span></p>
<p style="margin-left:1.0in">
<u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span>b.<span style="font:7.0pt "Times New Roman"">
</span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">We used another software to generate the same sparse matrix by using 576 cores:
<u></u><u></u></span></p>
<p style="margin-left:1.5in">
<u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span><span style="font:7.0pt "Times New Roman"">
</span>i.<span style="font:7.0pt "Times New Roman""> </span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">When I used the resource from 576 cores (48 nodes with 48GB/node), it could not provide
the solution. There was a complain on the memory violation.<u></u><u></u></span></p>
<p style="margin-left:1.5in">
<u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span><span style="font:7.0pt "Times New Roman"">
</span>ii.<span style="font:7.0pt "Times New Roman""> </span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">When I used the resource from 1152 cores (96 nodes with 48GB/node), it provided the solution.
<u></u><u></u></span></p>
<p style="margin-left:1.5in"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span>3.<span style="font:7.0pt "Times New Roman"">
</span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">The program could not solve the
<b>large</b> size problem (m=640k with a dense matrix at the order of n=178k included in the sparse matrix). The full sparse matrix is sized at 511GB.<u></u><u></u></span></p>
<p style="margin-left:1.0in">
<u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span>a.<span style="font:7.0pt "Times New Roman"">
</span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">We used another software to generate sparse matrix by using 900 cores:
<u></u><u></u></span></p>
<p style="margin-left:1.5in">
<u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span><span style="font:7.0pt "Times New Roman"">
</span>i.<span style="font:7.0pt "Times New Roman""> </span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">When I used the resource from 900 cores (75 nodes with 48GB/node), it could not provide
the solution. There was a complain on the memory violation.<u></u><u></u></span></p>
<p style="margin-left:1.5in">
<u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span><span style="font:7.0pt "Times New Roman"">
</span>ii.<span style="font:7.0pt "Times New Roman""> </span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">When I used the resource from 2400 cores (200 nodes with 48GB/node), it STILL COULD NOT
provide the solution. </span></p></div></div></div></blockquote><div><br></div><div>This has a huge dense block and we can't tell from your description how large the vertex separators are. MUMPS is well-known to have some non-scalable data structures. They do some analysis on rank 0 and require right hand sides to be provided entirely on rank 0.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple"><div><div><p style="margin-left:1.5in"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> <u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">My confusion starts from the medium size problem:<u></u><u></u></span></p>
<p style="margin-left:41.45pt">
<u></u><span style="font-size:11.0pt;font-family:Symbol;color:#1f497d"><span>·<span style="font:7.0pt "Times New Roman"">
</span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">It seems something was not right in the default setting in ex125.c for these problems. </span></p>
</div></div></div></blockquote><div>I don't know what you're asking. Do a heap profile if you want to find out where the memory is leaking. It's *much* better to call the solver directly from the process that assembles the matrix. Going through a file is terribly wasteful.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple"><div><div><p style="margin-left:41.45pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">
<u></u><u></u></span></p>
<p style="margin-left:41.45pt">
<u></u><span style="font-size:11.0pt;font-family:Symbol;color:#1f497d"><span>·<span style="font:7.0pt "Times New Roman"">
</span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">I got the info that METIS was used instead of ParMETIS in solving these problems. </span></p>
</div></div></div></blockquote><div>Did you ask for parallel ordering (-mat_mumps_icntl_28) and parmetis (-mat_mumps_icntl_29)? These options are shown in -help. (It's not our fault the MUMPS developers have Fortran numbered option insanity baked in. Read their manual and translate to our numbered options. While you're at it, ask them to write a decent options and error reporting mechanism.</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple"><div><div><p style="margin-left:41.45pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">
<u></u><u></u></span></p>
<p style="margin-left:41.45pt">
<u></u><span style="font-size:11.0pt;font-family:Symbol;color:#1f497d"><span>·<span style="font:7.0pt "Times New Roman"">
</span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Furthermore, it appears that there was unreasonable demand on the solver even on the medium size problem. </span></p>
</div></div></div></blockquote><div>Good, it's easier to debug. Check -log_summary for that run and use the heap profilers at your computing facility.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="EN-US" link="blue" vlink="purple"><div><div><p style="margin-left:41.45pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u><u></u></span></p>
<p style="margin-left:41.45pt">
<u></u><span style="font-size:11.0pt;font-family:Symbol;color:#1f497d"><span>·<span style="font:7.0pt "Times New Roman"">
</span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">I suspect one rank was trying to collect all data from other ranks. </span></p></div></div></div>
</blockquote><div>Naturally.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple"><div><div><p style="margin-left:41.45pt">
<span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">
<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">What other addition setting is needed for mumps such that it could deal with medium and large size problems?<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Do you guys have similar experience on that?<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Thanks,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Jinquan<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
</div>
</div>
</div>
</blockquote></div><br>