[mpich-discuss] Problem with MPI_Bcast
Gus Correa
gus at ldeo.columbia.edu
Tue Dec 14 11:46:56 CST 2010
Pavan Balaji wrote:
>
> On Tue, 14 Dec 2010, Hisham Adel wrote:
>> Thanks for your fast reply. The program runs well when I remove
>> "node10" and increase the number of processes.
>> Now, I don't know what the problem with "node10" is. It has the same
>> Linux version, the same configuration, and is on the same network.
>>
>> Do you have any ideas ?
>
> Unfortunately, with the information we have, it's hard to tell what's
> wrong with the node. Could be a hardware issue; could be a network
> configuration issue. I don't really know.
>
> You might be able to run some tests with just two nodes in the host file
> (node00 and node10) and see what errors it throws. It gets harder to
> debug with 11 nodes. Also try doing an ssh and ping from node00 to
> node10, and node10 to node00.
>
> I'm just randomly throwing out things you can try here. Maybe something
> will stick.
>
> -- Pavan
>
Hi Hisham,

Have you tried rebooting node10?
Sometimes that is all it takes: a quick and dirty way to restore
a node to a sane state.
My two cents,
Gus Correa
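
PS: If a reboot does not help, the two-node test Pavan suggests above
(node00 and node10 only, plus ssh and ping in both directions) could be
scripted roughly as follows. This is only a sketch under some assumptions:
passwordless ssh already works from the machine running the script,
ping accepts the Linux "-c" option, and the file name "hosts.two" and all
function names are made up for illustration.

```python
import subprocess

# Assumption: key-based ssh is set up; BatchMode makes ssh fail
# instead of hanging on a password prompt.
SSH = ["ssh", "-o", "BatchMode=yes"]


def connectivity_checks(head="node00", suspect="node10"):
    """Build ping and ssh commands for both directions between two nodes."""
    checks = []
    for src, dst in ((head, suspect), (suspect, head)):
        # Each check is launched on `src` through ssh, so it tests src -> dst.
        checks.append(SSH + [src, "ping", "-c", "3", dst])
        checks.append(SSH + [src] + SSH + [dst, "hostname"])
    return checks


def write_hostfile(path, hosts=("node00", "node10")):
    """Write a host file with one host name per line."""
    with open(path, "w") as f:
        f.write("\n".join(hosts) + "\n")


def run_checks(checks, runner=subprocess.run):
    """Run each command; return (command string, return code) pairs.

    A return code of -1 means the command timed out or could not be started.
    """
    results = []
    for cmd in checks:
        try:
            proc = runner(cmd, capture_output=True, text=True, timeout=30)
            rc = proc.returncode
        except (subprocess.TimeoutExpired, OSError):
            rc = -1
        results.append((" ".join(cmd), rc))
    return results
```

Calling run_checks(connectivity_checks()) and printing any entry with a
nonzero return code shows which direction fails. After
write_hostfile("hosts.two"), the program can be rerun on just the two
nodes; with Hydra's mpiexec that would be something like
"mpiexec -f hosts.two -n 2 ./your_program", where the program name is a
placeholder.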