Wen:
Do you use superlu_dist as the parallel direct solver?
I suggest also installing MUMPS (it needs an F90 compiler; configure PETSc with '--download-blacs --download-scalapack --download-mumps').
When superlu_dist fails, switch to MUMPS (use the runtime option '-pc_factor_mat_solver_package mumps'). If both solvers fail, something might be wrong with your model or code. A minimal sketch of selecting the solver from code follows below.
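For reference, here is a minimal sketch (not taken from your code; the 1-D Laplacian test matrix, its size, and the right-hand side are made up for illustration, and error checking is omitted for brevity) of how the factorization package can also be selected from C with PCFactorSetMatSolverPackage. Newer PETSc releases rename this to PCFactorSetMatSolverType and the option to -pc_factor_mat_solver_type, and older releases need an extra MatStructure argument in KSPSetOperators. Anything set here is still overridden by the runtime option because of KSPSetFromOptions.

/* Solve a small test system with a parallel direct solver (PCLU + MUMPS),
   switchable at runtime with -pc_factor_mat_solver_package <mumps|superlu_dist>. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x, b;
  KSP      ksp;
  PC       pc;
  PetscInt i, n = 100, Istart, Iend;

  PetscInitialize(&argc, &argv, (char*)0, NULL);

  /* Assemble a 1-D Laplacian as a stand-in for the user's matrix */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
  MatSetFromOptions(A);
  MatSetUp(A);
  MatGetOwnershipRange(A, &Istart, &Iend);
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
    if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
    MatSetValue(A, i, i, 2.0, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  VecCreate(PETSC_COMM_WORLD, &b);
  VecSetSizes(b, PETSC_DECIDE, n);
  VecSetFromOptions(b);
  VecDuplicate(b, &x);
  VecSet(b, 1.0);

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);          /* older PETSc: add SAME_NONZERO_PATTERN */
  KSPSetType(ksp, KSPPREONLY);         /* direct solve: factor + one triangular solve */
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCLU);
  PCFactorSetMatSolverPackage(pc, MATSOLVERMUMPS); /* or MATSOLVERSUPERLU_DIST */
  KSPSetFromOptions(ksp);              /* command-line options override the above */

  KSPSolve(ksp, b, x);

  KSPDestroy(&ksp); MatDestroy(&A); VecDestroy(&x); VecDestroy(&b);
  PetscFinalize();
  return 0;
}

Run it, for example, with 'mpiexec -n 16 ./ex_direct -pc_factor_mat_solver_package superlu_dist' to compare the two solvers on the same system.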
Hong

> I reported this several days ago: I found that my code hung inside the SuperLU_DIST solve. For testing purposes, I let my code keep solving the same linear system many times. The code still hangs at the solving step, but not at the same stage every time. It was distributed over 4 nodes with 4 processes per node (16 processes in total). Before it gets stuck, one process disappears, meaning I can no longer see it with the top command. The other 15 processes are still running; I think they might not know that one has been lost and just keep waiting for it. It looks like the cluster system kills that process without giving me any error information. I am fairly sure the memory is big enough for my calculation (each core has 6 GB), so I cannot figure out the cause. I have very little knowledge about cluster systems; could you give me any hints on this issue? Is this a problem with PETSc, SuperLU_DIST, or the cluster? Thanks.
> Regards,
>
> Wen