[MPICH] MPICH2 freezes at Send/Recv pair that it previously executed

Rajeev Thakur thakur at mcs.anl.gov
Tue May 29 17:05:45 CDT 2007


One way to track this down is to simplify the program until you have the
smallest version that still fails. For example, delete all the computation,
leave only the communication, and see if it still hangs. Does it also hang on
a small number of processes (2 or 4)?
 
Rajeev


  _____  

From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Christian Zemlin
Sent: Tuesday, May 29, 2007 4:02 PM
To: mpich-discuss at mcs.anl.gov
Subject: [MPICH] MPICH2 freezes at Send/Recv pair that it previously
executed


I am running a parallel simulation using MPICH2, and occasionally the
simulation freezes in the middle of execution, as far as I can tell at a
point where two slave nodes exchange data.
What I don't understand is that the same Send/Recv pair is executed thousands
of times without problems before the freeze occurs, as if the nodes could
suddenly no longer communicate.
 
Any ideas on how I can solve this, or better understand what is going wrong?
 
Christian
