[MPICH] Re: collective abort of all ranks
Kamaraju Kusumanchi
kamaraju at gmail.com
Tue Jun 12 13:24:52 CDT 2007
On 6/12/07, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> There might be some problem on one of the nodes. Can you try running
> individually on each of the nodes (not across the nodes).
>
> Rajeev
>
When the program is run individually on each node, it works fine. I
have tried it on every node individually. The problem comes up only
when I run it across the nodes.
AFAIK, the only difference across the nodes is that their clocks are
not synchronized.
node1: Tue Jun 12 14:18:07 EDT 2007
node2: Tue Jun 12 08:49:17 EDT 2007
node3: Tue Jun 12 14:19:49 EDT 2007
node4: Tue Jun 12 13:48:01 EDT 2007
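To put numbers on the skew, here is a small sketch that parses the `date` output above and reports each node's offset from node1. It assumes the four `date` commands were run within a few seconds of each other, so the offsets are approximate. The helper names `parse` and `skews` are purely illustrative.

```python
from datetime import datetime

# `date` output from each node, as listed above.
readings = {
    "node1": "Tue Jun 12 14:18:07 EDT 2007",
    "node2": "Tue Jun 12 08:49:17 EDT 2007",
    "node3": "Tue Jun 12 14:19:49 EDT 2007",
    "node4": "Tue Jun 12 13:48:01 EDT 2007",
}


def parse(stamp: str) -> datetime:
    # Every node reports the same zone (EDT), so the zone token is dropped
    # and the times are compared as naive datetimes. This also avoids
    # relying on strptime's platform-dependent handling of "%Z".
    parts = stamp.split()
    del parts[4]  # remove the time-zone token
    return datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")


def skews(readings: dict, reference: str) -> dict:
    """Offset of each node's clock from the reference node, in whole seconds."""
    ref = parse(readings[reference])
    return {
        node: int((parse(stamp) - ref).total_seconds())
        for node, stamp in readings.items()
    }


if __name__ == "__main__":
    for node, offset in skews(readings, "node1").items():
        print(f"{node}: {offset:+d} s")
```

Relative to node1, this gives roughly +102 s for node3, -1806 s (about 30 minutes) for node4, and -19730 s (about 5.5 hours) for node2.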
I asked the administrator to synchronize the clocks. I will report
back here on whether synchronizing them has any effect on the code's
behavior.
raju