[petsc-users] one process is lost but others are still running (Hong Zhang)

Thu Apr 5 22:09:12 CDT 2012

Hi Hong,

Thanks for your reply.

Yes, I am using superlu_dist as the parallel direct solver. I will ask our
cluster administrator to install MUMPS and give it a try.

But I still have a question. For testing purposes, I have already reduced my
code to repeatedly solving the same well-conditioned matrix. In most cases it
completes the solve 2 times, 3 times, or even more before it loses one
process and gets stuck. What I do not quite understand is why it does not
print any error information and abort the job even though one of its
processes is gone. Is this just a problem with the cluster, or might it also
be a problem with the libraries? Thanks.

Regards,

Wen

Wen:
> Do you use superlu_dist as parallel direct solver?
> Suggest also install mumps.
> (need F90 compiler, configure petsc with
> '--download-blacs --download-scalapack --download-mumps').
> When superlu_dist fails, switch to mumps
> (use runtime option '-pc_factor_mat_solver_package mumps').
> If both solvers fail, something might be wrong with your model or code.
>
> Hong
>
>
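
For reference, the suggestion quoted above might look roughly like this on
the command line. This is only a sketch: the configure flags and the
-pc_factor_mat_solver_package option are taken from Hong's message, while
the executable name ./my_solver, the process count, and the choice of
-ksp_type preonly -pc_type lu are assumptions made for illustration.

```shell
# Configure PETSc with MUMPS and its dependencies
# (flags as quoted from Hong's message; an F90 compiler is required).
./configure --download-blacs --download-scalapack --download-mumps

# Run with superlu_dist as the direct solver.
# ./my_solver and -n 4 are hypothetical placeholders.
mpiexec -n 4 ./my_solver -ksp_type preonly -pc_type lu \
    -pc_factor_mat_solver_package superlu_dist

# If superlu_dist fails, switch to MUMPS at runtime without recompiling.
mpiexec -n 4 ./my_solver -ksp_type preonly -pc_type lu \
    -pc_factor_mat_solver_package mumps
```

Because the solver package is selected at runtime, the same binary can be
used to compare both solvers on the same matrix.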