[petsc-users] one process is lost but others are still running (Hong Zhang)

Satish Balay balay at mcs.anl.gov
Thu Apr 5 22:15:56 CDT 2012


No point in trying to debug old code. If you wish to debug this issue,
try upgrading to petsc-3.2 [or better yet, petsc-dev] and see if
the problem persists.

petsc-dev uses superlu_dist_3.0

Satish

On Thu, 5 Apr 2012, Wen Jiang wrote:

> Hi Hong,
> 
> Thanks for your reply.
> 
> Yes, I am using superlu_dist as the parallel direct solver, and I will ask our
> cluster administrator to install mumps and give it a shot.
> 
> But I still have a question. For testing purposes, I have already made my
> code as simple as repeatedly solving the same well-conditioned matrix. In
> most cases it completes the solve, maybe 2 times, 3 times or even more,
> until it loses one process and gets stuck. What I do not quite
> understand is why it does not give out any error information and abort the
> job even when one of its processes is gone. Is that just a problem with the
> cluster, or might it also be a problem with the libraries? Thanks.
> 
> Regards,
> 
> Wen
> 
> 
> Wen:
> > Do you use superlu_dist as the parallel direct solver?
> > I suggest also installing mumps
> > (it needs an F90 compiler; configure petsc with
> > '--download-blacs --download-scalapack --download-mumps').
> > When superlu_dist fails, switch to mumps
> > (use runtime option '-pc_factor_mat_solver_package mumps').
> > If both solvers fail, something might be wrong with your model or code.
> >
> > Hong
> >
> >
> 
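For reference, below is a minimal, self-contained sketch (not Wen's actual
code; it uses petsc-3.2-era API names and assumes a PETSc build configured
with '--download-blacs --download-scalapack --download-mumps') of the switch
Hong describes above: a parallel direct LU solve through KSP/PC with the
factorization handed to MUMPS.

#include <petscksp.h>

/* Sketch only: assemble a small 1-D Laplacian and solve it with a
   parallel direct LU factorization delegated to MUMPS. */
int main(int argc,char **argv)
{
  Mat            A;
  Vec            x,b;
  KSP            ksp;
  PC             pc;
  PetscInt       i,n = 100,Istart,Iend;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,(char*)0,(char*)0);CHKERRQ(ierr);

  /* Simple tridiagonal test matrix so the example is self-contained */
  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,n,n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(A,3,PETSC_NULL);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A,3,PETSC_NULL,2,PETSC_NULL);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr);
  for (i=Istart; i<Iend; i++) {
    if (i>0)   {ierr = MatSetValue(A,i,i-1,-1.0,INSERT_VALUES);CHKERRQ(ierr);}
    if (i<n-1) {ierr = MatSetValue(A,i,i+1,-1.0,INSERT_VALUES);CHKERRQ(ierr);}
    ierr = MatSetValue(A,i,i,2.0,INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatGetVecs(A,&x,&b);CHKERRQ(ierr);
  ierr = VecSet(b,1.0);CHKERRQ(ierr);

  /* Direct solve: KSPPREONLY applies the LU factorization once;
     "mumps" here is what '-pc_factor_mat_solver_package mumps' sets */
  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverPackage(pc,"mumps");CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); /* command-line options still override */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

Running the same binary with '-pc_factor_mat_solver_package superlu_dist'
switches back to SuperLU_DIST without recompiling, which makes it easy to
compare the two packages on the same matrix, as Hong suggests.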


