[MPICH] mpi jobs not exiting

Steve Angelovich sangelovich at lgc.com
Fri Jun 30 14:10:50 CDT 2006


It is really a hang.

Thanks,
Steve


Rajeev Thakur wrote:

>Sometimes I have found that the job appears to hang, but if I hit the return
>key a few times, the prompt comes back. Does that work for you or is it a
>real hang?
>
>Rajeev 
>
>  
>
>>-----Original Message-----
>>From: owner-mpich-discuss at mcs.anl.gov 
>>[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Steve Angelovich
>>Sent: Thursday, June 29, 2006 5:15 PM
>>To: mpich-discuss at mcs.anl.gov
>>Subject: [MPICH] mpi jobs not exiting
>>
>>We have a cluster running redhat aw 3 that has started having problems
>>with MPI jobs not terminating properly. I've been able to reproduce the
>>problem by doing the following:
>> - Start the mpd ring:
>>
>>mpdboot -n 16 -f ~/mpd.hosts
>>
>> - Run the following command:
>>
>>mpiexec -n 16 uptime
>>
>>It usually takes several iterations before the mpiexec command hangs.
>>As best I can tell, the process created on each of the nodes has
>>completed and exited, but for some reason the mpd daemon still thinks
>>it is running. If I list the jobs running on the ring, the job still
>>shows up. I can signal the job, but there is no response.
>>
>>I've looked in the log file for the head node on the cluster and have 
>>found nothing useful.  Any insight into how to track down this issue 
>>would be greatly appreciated.
>>
>>Thanks,
>>Steve
>>
>
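The reproduction in the quoted report (repeating `mpiexec -n 16 uptime` until one run fails to return) can be automated so the hanging iteration is caught without watching the terminal. This is a minimal sketch, assuming Python 3.7+ on the node that runs `mpiexec`; `run_until_hang`, the iteration count, and the timeout are hypothetical illustrative choices, not part of MPICH or MPD. It only detects a run that exceeds the timeout; it does not explain why mpd still lists the job.

```python
import subprocess
from typing import Optional, Sequence


def run_until_hang(cmd: Sequence[str], iterations: int, timeout_s: float) -> Optional[int]:
    """Run cmd up to `iterations` times (hypothetical helper).

    Returns the 1-based iteration whose run did not finish within
    timeout_s seconds, or None if every run completed in time.
    """
    for i in range(1, iterations + 1):
        try:
            # Output is discarded so a full pipe can never be mistaken for a hang.
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run() has already killed and reaped the timed-out
            # child; any job it started may still be registered elsewhere.
            return i
    return None
```

For example, with the ring already booted via `mpdboot -n 16 -f ~/mpd.hosts`, calling `run_until_hang(["mpiexec", "-n", "16", "uptime"], iterations=100, timeout_s=30)` returns the first iteration that did not finish within 30 seconds. Because the timed-out `mpiexec` is killed, the stale job may still need to be inspected on the ring afterwards.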




More information about the mpich-discuss mailing list