[MPICH] stuck in bcast
Martin Kleinschmidt
mk at theochem.uni-duesseldorf.de
Thu Oct 26 08:20:16 CDT 2006
Hi,
I'm having problems with my code. It hangs in broadcast:
      call MPI_bcast(ediag, nsaf,
     $     MPI_double_precision, 0, MPI_Comm_World, MPIerr)
when nsaf is large (see below).
The symptoms are:
process 0 is using 100% CPU, all others are idle.
process 0 cannot be killed, not even with kill -9.
In a loop, I increased nsaf and found that the bcast goes well up to
nsaf=1495039 but fails with nsaf=1495040 (which is 0x16D000).
As far as I can see, this is not a hard limit on message size, because
I am able to bcast approx. 45 million double complex (750 MB)
successfully, whereas ediag is only 12 MB.
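
A minimal sketch of such a scan over nsaf, for anyone who wants to try
to reproduce the hang. The program name and the bounds nmin and nmax
are placeholders chosen for illustration and are not taken from the
real code; only the bcast call itself mirrors the one above. The
allocatable array needs a Fortran 90 capable compiler, and flush is a
common compiler extension rather than standard Fortran 77.

```fortran
      program bcast_scan
      implicit none
      include 'mpif.h'
c     Placeholder bounds around the size where the hang was seen.
      integer nmin, nmax
      parameter (nmin = 1495000, nmax = 1495100)
      double precision, allocatable :: ediag(:)
      integer nsaf, rank, MPIerr

      call MPI_Init(MPIerr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, MPIerr)

c     Allocate once at the largest size and fill with dummy data.
      allocate(ediag(nmax))
      ediag = 0.0d0

      do nsaf = nmin, nmax
c        Print before the call, so the last line of output
c        names the size that did not complete.
         if (rank .eq. 0) then
            write(*,*) 'bcast nsaf =', nsaf, ' bytes =', 8*nsaf
c           flush is a compiler extension (assumed available).
            call flush(6)
         end if
         call MPI_Bcast(ediag, nsaf, MPI_DOUBLE_PRECISION, 0,
     $        MPI_COMM_WORLD, MPIerr)
      end do

      deallocate(ediag)
      call MPI_Finalize(MPIerr)
      end
```

Run under the same mpd setup with at least two processes; if the
problem reproduces, the last size printed by process 0 should be the
first one that hangs.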
Any ideas?
...martin
(I'm using mpich2-1.0.4p1, Intel compiler 9.0, Fedora Core 2, mpd)