[petsc-dev] BG hang still broken in petsc-maint!
Satish Balay
balay at mcs.anl.gov
Wed Dec 18 13:45:32 CST 2013
On Wed, 18 Dec 2013, Jed Brown wrote:
> Satish Balay <balay at mcs.anl.gov> writes:
>
> > Works for me on vesta with [the following on sys/examples/tutorials/ex1]
> >
> > runjob --np 8192 --ranks-per-node 16 --cwd $PWD --block VST-00440-33771-512 : $PWD/ex1 -log_summary
>
> This is only 512 nodes. According to ALCF, the probability of MPI_Bcast
> crossing paths goes way up at more than 1024 nodes. IBM should really
> fix this problem, but until then, the workaround is to fall back to the
> reference implementations (PAMID_COLLECTIVES=0) which are sometimes
> also faster (go figure).
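A sketch of the workaround described above, assuming the Blue Gene/Q `runjob` launcher's `--envs` option and reusing the block name and example from earlier in the thread (block names are allocation-specific, so substitute your own):

```
# Fall back to the reference MPI collectives instead of the PAMID
# optimized ones, to avoid the MPI_Bcast hang at >1024 nodes.
runjob --np 8192 --ranks-per-node 16 \
       --envs PAMID_COLLECTIVES=0 \
       --cwd $PWD --block VST-00440-33771-512 \
       : $PWD/ex1 -log_summary
```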
I had a chat with Derek this morning. The error case was with 512
nodes [same as above] with --ranks-per-node 4 or 8, and this was on
cetus. The hang was confirmed to be in PetscInitialize [via the
debugger], and -skip_petscrc got past the hang.
Will try reproducing the problem on cetus.
Satish