[MPICH] Re: collective abort of all ranks

Kamaraju Kusumanchi kamaraju at gmail.com
Tue Jun 12 13:24:52 CDT 2007


On 6/12/07, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> There might be some problem on one of the nodes. Can you try running
> individually on each of the nodes (not across the nodes).
>
> Rajeev
>

When the program is run individually on each node, it works fine. I
have tried it on all the individual nodes. The problem comes up only
when I run it across the nodes.

AFAIK, the only difference across the nodes is that their clocks are
not synchronized.

node1: Tue Jun 12 14:18:07 EDT 2007
node2: Tue Jun 12 08:49:17 EDT 2007
node3: Tue Jun 12 14:19:49 EDT 2007
node4: Tue Jun 12 13:48:01 EDT 2007
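
To put rough numbers on the skew, here is a small sketch that computes each
node's offset relative to node1 from the four readings above. It assumes the
readings were taken at about the same moment, which may not be exactly true,
so the offsets should be read as approximate. The helper name offsets_from is
only illustrative.

```python
from datetime import datetime

# Timestamps as reported above. All four are EDT, so the zone is dropped
# because it cancels out when taking differences.
reported = {
    "node1": "Tue Jun 12 14:18:07 2007",
    "node2": "Tue Jun 12 08:49:17 2007",
    "node3": "Tue Jun 12 14:19:49 2007",
    "node4": "Tue Jun 12 13:48:01 2007",
}


def offsets_from(reference, stamps):
    """Return each node's clock offset in seconds relative to `reference`.

    Assumption: all readings were taken at roughly the same instant.
    """
    fmt = "%a %b %d %H:%M:%S %Y"
    parsed = {node: datetime.strptime(text, fmt) for node, text in stamps.items()}
    base = parsed[reference]
    return {node: int((t - base).total_seconds()) for node, t in parsed.items()}


skew = offsets_from("node1", reported)
for node, seconds in sorted(skew.items()):
    print(f"{node}: {seconds:+d} s relative to node1")
```

By this estimate node2 is about five and a half hours behind node1, node4
about half an hour behind, and node3 under two minutes ahead.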

I asked the administrator to synchronize the clocks. I will report
here whether synchronizing them has any effect on the code's
behavior.

raju



