[MPICH] stuck in bcast

Martin Kleinschmidt mk at theochem.uni-duesseldorf.de
Thu Oct 26 08:20:16 CDT 2006


Hi,

I'm having problems with my code. It hangs in broadcast:

      call MPI_bcast(ediag, nsaf,
     $           MPI_double_precision, 0, MPI_Comm_World, MPIerr)


when nsaf is large (see below).

The symptoms are:
process 0 uses 100% CPU, all other processes are idle.
process 0 cannot be killed, not even with kill -9.

In a loop, I increased nsaf and found that the bcast works up to
nsaf=1495039 but hangs with nsaf=1495040 (which is 0x16D000).
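
A minimal sketch of such a sweep, for anyone who wants to try to
reproduce the hang. The program name, the window bounds nstart/nend,
the barrier and the flush call are illustrative assumptions and are not
taken from the actual code:

```fortran
      program bcast_sweep
c     Hypothetical reproducer: broadcast a growing number of doubles
c     from rank 0 and report each count that completed on all ranks.
      implicit none
      include 'mpif.h'
      integer MPIerr, myrank, nsaf, nstart, nend
      double precision, allocatable :: ediag(:)

c     Assumed sweep window around the reported threshold.
      nstart = 1495030
      nend   = 1495050

      call MPI_Init(MPIerr)
      call MPI_Comm_rank(MPI_Comm_World, myrank, MPIerr)

c     One buffer of the maximum size; only the root fills it.
      allocate(ediag(nend))
      if (myrank .eq. 0) ediag = 1.0d0

      do nsaf = nstart, nend
         call MPI_bcast(ediag, nsaf,
     $        MPI_double_precision, 0, MPI_Comm_World, MPIerr)
c        The barrier makes sure every rank has left the bcast
c        before the root reports success.
         call MPI_Barrier(MPI_Comm_World, MPIerr)
         if (myrank .eq. 0) then
            write(*,*) 'bcast ok, nsaf =', nsaf,
     $           ' bytes =', 8*nsaf
c           flush is a common compiler extension (assumed available);
c           it keeps the last line from being lost in a buffer.
            call flush(6)
         end if
      end do

      deallocate(ediag)
      call MPI_Finalize(MPIerr)
      end
```

If the problem reproduces, the last line printed by rank 0 should show
the largest nsaf that still went through.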

As far as I can see, this is not a hard limit on message size, because
I am able to bcast approx. 45 million double complex values (750 MB)
successfully, whereas ediag is only 12 MB.
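
The size of the failing message can be checked with a few lines of
arithmetic. This is only a sketch: 8 bytes per double precision value
is the usual size, and the 4096-byte page size is an assumption about
the machine, which may or may not be relevant to the hang:

```fortran
      program msgsize
c     Byte count of the first failing broadcast (nsaf = 1495040).
      implicit none
      integer nsaf, nbytes
      nsaf   = 1495040
c     8 bytes per double precision element (usual size, assumed).
      nbytes = 8*nsaf
      write(*,'(a,z8)') ' nsaf (hex)        : ', nsaf
      write(*,*) 'message size bytes :', nbytes
c     Assumed 4096-byte pages: remainder 0 means a whole number
c     of pages.
      write(*,*) 'bytes mod 4096     :', mod(nbytes, 4096)
      end
```

The failing message is 11960320 bytes, which is exactly 2920 blocks of
4096 bytes, so the threshold sits on a round boundary rather than at an
arbitrary size.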

any ideas?


   ...martin


(I'm using mpich2-1.0.4p1, Intel compiler 9.0, Fedora Core 2, mpd)



