[mpich-discuss] Problem with MPI_Bcast

Gus Correa gus at ldeo.columbia.edu
Tue Dec 14 11:46:56 CST 2010


Pavan Balaji wrote:
> 
> On Tue, 14 Dec 2010, Hisham Adel wrote:
>> Thanks for your fast reply. The program runs well after I removed
>> "node10" and increased the number of processes.
>> Now I don't know what the problem with "node10" is. It has the same
>> Linux version and the same configuration, and it is on the same network.
>>
>> Do you have any ideas?
> 
> Unfortunately, with the information we have, it's hard to tell what's 
> wrong with the node. Could be a hardware issue; could be a network 
> configuration issue. I don't really know.
> 
> You might be able to run some tests with just two nodes in the host file 
> (node00 and node10) and see what errors it throws. It gets harder to 
> debug with 11 nodes. Also try doing an ssh and ping from node00 to 
> node10, and node10 to node00.
> 
> I'm just randomly throwing out things you can try here. Maybe something 
> will stick.
> 
>  -- Pavan
> 
Hi Hisham

Have you tried rebooting node10?
Sometimes that is all it takes.
It is a quick and dirty way to restore a node to a sane state.
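
If a reboot does not help, the two-node test Pavan suggested could be
scripted roughly as below. This is only a sketch under assumptions: the
host names node00 and node10 come from this thread, but the file name
"hosts.two", the helper names, and the ssh options are my own choices,
and it assumes passwordless ssh and a Linux-style "ping -c". It only
writes the two-line host file and checks ping/ssh in both directions;
how you pass the host file to mpiexec depends on your process manager.

```python
import subprocess

SSH = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5"]


def two_node_hostfile(head="node00", suspect="node10"):
    """Return host file text listing only the two nodes under test."""
    return f"{head}\n{suspect}\n"


def check_commands(head="node00", suspect="node10"):
    """Build ping/ssh checks in both directions, meant to be run on `head`."""
    return [
        ["ping", "-c", "1", suspect],              # ping head -> suspect
        SSH + [suspect, "true"],                   # ssh  head -> suspect
        SSH + [suspect, "ping", "-c", "1", head],  # ping suspect -> head
        SSH + [suspect] + SSH + [head, "true"],    # ssh  suspect -> head
    ]


def run_checks(commands):
    """Run each command; return (command, succeeded) pairs."""
    results = []
    for cmd in commands:
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
            results.append((cmd, proc.returncode == 0))
        except (OSError, subprocess.TimeoutExpired):
            # Missing binary or a hung connection both count as a failure.
            results.append((cmd, False))
    return results


def main():
    # Hypothetical file name; pass it to mpiexec in whatever way your
    # process manager expects.
    with open("hosts.two", "w") as fh:
        fh.write(two_node_hostfile())
    for cmd, ok in run_checks(check_commands()):
        print("OK  " if ok else "FAIL", " ".join(cmd))
```

Calling main() on node00 prints one OK/FAIL line per check. A FAIL in
only one direction would point at the network or ssh setup on node10
rather than at MPICH itself.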

My two cents,
Gus Correa

