[mpich-discuss] version 1.1 strange behavior : all processes become idle for extensive period

chong tan chong_guan_tan at yahoo.com
Mon Jul 13 12:51:56 CDT 2009


Rajeev,
when the problem (issue #2) popped up, the processes had already run for a few minutes.  Many MPICH
calls had been made and completed successfully.  I don't know the count, maybe 10 million?

All the processes were running on 1 physical machine.  I don't know for sure what they were doing
when the 'idle' happened.  My guess is MPI calls, since that is the only place the processes ever wait
for external input.
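
One way to check that guess would be to time each blocking call and log the ones that stall. The C sketch below is an illustration only, not code from the application: now_seconds and report_if_stalled are hypothetical helper names, and the 1-second threshold shown in the usage comment is an arbitrary assumption.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime on strict compilers */
#endif
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Monotonic clock reading in seconds (hypothetical helper). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Hypothetical helper: log a blocking call that lasted at least
 * `threshold` seconds. Returns 1 if a stall was logged, 0 otherwise.
 */
static int report_if_stalled(const char *what, int rank,
                             double start, double end, double threshold)
{
    assert(threshold >= 0.0);
    double elapsed = end - start;
    if (elapsed < threshold)
        return 0;
    fprintf(stderr, "rank %d: %s blocked for %.3f s\n", rank, what, elapsed);
    return 1;
}

/*
 * Intended use around a blocking call (the MPI call itself is not
 * part of this sketch):
 *
 *     double t0 = now_seconds();
 *     ... blocking Recv() or Barrier() call goes here ...
 *     report_if_stalled("Recv", rank, t0, now_seconds(), 1.0);
 */
```

If every rank logs a long wait at about the same moment, the time is being spent inside the MPI calls. If nothing is logged during an idle period, the stall is somewhere else.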

I forgot to mention that the processes share a global memory segment created by an explicit shm call.
However, I have 3 other tests that also use this shm feature in my code, and they don't have this issue.

By any measure, 1 minute of continuous idleness for all processes in a parallelized run is very strange.

tan
________________________________
From: Rajeev Thakur <thakur at mcs.anl.gov>
To: mpich-discuss at mcs.anl.gov
Sent: Saturday, July 11, 2009 9:27:23 AM
Subject: Re: [mpich-discuss] version 1.1 strange behavior : all processes become idle for extensive period


The first issue has been fixed. If you try one of the nightly snapshots, it should go away. The fix will be included in 1.1.1, which is due out next week.
 
Can you tell us more about the second issue? What are the processes doing when they suddenly become idle? Have they already communicated before? Are they all running on a single machine?
 
Rajeev
 


________________________________
From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of chong tan
>Sent: Friday, July 10, 2009 6:20 PM
>To: mpich-discuss at mcs.anl.gov
>Subject: [mpich-discuss] version 1.1 strange behavior : all processes become idle for extensive period
>  
>
>I am seeing this funny situation which I did not see on 1.0.6 and 1.0.8.  Some background:
>
>machine : INTEL 4Xcore 2
>
>running mpiexec -n 4
>
>machine has 32G of mem.  
>
>When my application runs, almost all memory is used.  However, there is no swapping.
>I have exclusive use of the machine, so contention is not an issue.
>
>issue #1 :  processes take extra long to be initialized, compared to 1.0.6
>issue #2 : during the run, at times all of them become idle at the same time, for almost a
>                minute.  We never observed this with 1.0.6
>
>
>The code is the same, only linked against different versions of MPICH2.
>
>MPICH2 was built with --enable-threads=multiple for 1.1, and without it for 1.0.6 and 1.0.8.
>
>MPI calls are all in the main application thread.  I used only 4 MPI functions :
>Init(), Send(), Recv() and Barrier().  
>
>
>
>Any suggestions?
>
>thanks
>tan

