Hi Hong,

Thanks for your reply.

Yes, I am using superlu_dist as the parallel direct solver, and I will ask our cluster administrator to install MUMPS and give it a shot (I have sketched the switch in code after your quoted message below).

But I still have a question. For testing purposes, I have already reduced my code to repeatedly solving the same well-conditioned matrix. In most runs it completes the solve two or three times, sometimes more, and then it loses one process and gets stuck. What I do not quite understand is why no error message is printed and the job is not aborted even though one of the processes is gone. Is this just a problem with the cluster, or could it also be a problem with the libraries? Thanks.
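For reference, my test loop is essentially of the following shape. This is a simplified sketch, not my exact code: the real matrix comes from my application, and here I just assemble a trivial well-conditioned diagonal system so the sketch is self-contained. Every PETSc call is checked with CHKERRQ, and I query KSPGetConvergedReason after each solve, so I would expect a failed solve to be reported:

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat                A;
  Vec                x, b;
  KSP                ksp;
  KSPConvergedReason reason;
  PetscInt           i, n = 100, rstart, rend, row;
  PetscErrorCode     ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

  /* Stand-in for the application matrix: 2.0 on the diagonal,
     so the system is trivially well conditioned. */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (row = rstart; row < rend; row++) {
    PetscScalar v = 2.0;
    ierr = MatSetValues(A, 1, &row, 1, &row, &v, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = VecCreate(PETSC_COMM_WORLD, &b);CHKERRQ(ierr);
  ierr = VecSetSizes(b, PETSC_DECIDE, n);CHKERRQ(ierr);
  ierr = VecSetFromOptions(b);CHKERRQ(ierr);
  ierr = VecDuplicate(b, &x);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  /* Older PETSc versions take an extra MatStructure argument here. */
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  /* Solver package (superlu_dist, mumps, ...) is picked at run time. */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

  /* Repeat the same solve; report and stop on the first failure. */
  for (i = 0; i < 10; i++) {
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = KSPGetConvergedReason(ksp, &reason);CHKERRQ(ierr);
    if (reason < 0) {
      ierr = PetscPrintf(PETSC_COMM_WORLD, "solve %D failed, reason %D\n",
                         i, (PetscInt)reason);CHKERRQ(ierr);
      break;
    }
  }

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

As far as I understand, if one rank dies inside the parallel factorization, the surviving ranks simply block in an MPI collective, so none of these checks ever gets a chance to run or print anything; that would be consistent with the silent hang I see, but I would like to confirm it.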
Regards,

Wen

> Wen:
> Do you use superlu_dist as the parallel direct solver?
> I suggest also installing MUMPS (this needs an F90 compiler; configure
> PETSc with '--download-blacs --download-scalapack --download-mumps').
> When superlu_dist fails, switch to MUMPS (use the runtime option
> '-pc_factor_mat_solver_package mumps').
> If both solvers fail, something might be wrong with your model or code.
>
> Hong
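P.S. Here is the sketch I mentioned above of how I understand the solver switch would look in code. To hedge: PCFactorSetMatSolverPackage and the package names "superlu_dist"/"mumps" are my reading of the runtime option '-pc_factor_mat_solver_package', so please correct me if the call differs in your PETSc version:

#include <petscksp.h>

/* Configure a KSP as a pure parallel direct solve and select the
 * factorization package ("superlu_dist" or "mumps").  This mirrors
 * the runtime options:
 *   -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package <package>
 */
static PetscErrorCode UseDirectSolver(KSP ksp, const char *package)
{
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr); /* factor + solve only */
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverPackage(pc, package);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

So once MUMPS is installed, switching after a superlu_dist failure should just be a matter of rerunning with '-pc_factor_mat_solver_package mumps', with no recompilation.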