[mpich-discuss] version 1.1 strange behavior : all processes become idle for extensive period

chong tan chong_guan_tan at yahoo.com
Mon Jul 13 13:35:55 CDT 2009


Sorry, I can't do that.  The benchmark involves two things, one of which
comes from my customer, and I am not allowed to distribute it.  I may be
able to get a limited license of my product for you to try, but I
definitely cannot send source code.

tan




________________________________
From: Darius Buntinas <buntinas at mcs.anl.gov>
To: mpich-discuss at mcs.anl.gov
Sent: Monday, July 13, 2009 10:54:50 AM
Subject: Re: [mpich-discuss] version 1.1 strange behavior : all processes become idle for extensive period


Can you send us the benchmark you're using?  This will help us figure
out what's going on.

Thanks,
-d

On 07/13/2009 12:36 PM, chong tan wrote:
> 
> Thanks, Darius.
>  
> For the comparison (benchmarking), I had two identical source trees.
> Everything was recompiled from the ground up and compiled/linked
> against the version of MPICH2 being used.
>  
> I have many tests; this is the only one showing this behavior, and it
> is predictably repeatable.  Most of my tests show comparable
> performance, and many do better with 1.1.
>  
> The 'weirdest' thing is the ~1 minute span where there is no activity
> on the box at all: zero activity except 'top', with the machine load
> at around 0.12.  I don't know how to explain this behavior, and I am
> extremely curious whether anyone can explain it.
>  
> I can't reproduce this on AMD boxes, as I don't have one with only 32G
> of memory.  I can't reproduce it on a Niagara box, as MPICH2 won't
> build there with thread multiple.
>  
> I will try to rebuild 1.1 without thread-multiple.  Will keep you posted.
>  
> Meanwhile, if anyone has any speculations on this, please bring them up.
>  
> thanks
> tan
>  
> ------------------------------------------------------------------------
> From: Darius Buntinas <buntinas at mcs.anl.gov>
> To: mpich-discuss at mcs.anl.gov
> Sent: Monday, July 13, 2009 8:30:19 AM
> Subject: Re: [mpich-discuss] version 1.1 strange behavior : all
> processes become idle for extensive period
> 
> Tan,
> 
> Did you just re-link the applications, or did you recompile them?
> Version 1.1 is most likely not binary compatible with 1.0.6, so you
> really need to recompile the application.
> 
> Next, don't use the --enable-threads=multiple flag when configuring
> mpich2.  By default, mpich2 supports all thread levels and will select
> the thread level at run time (depending on the parameters passed to
> MPI_Init_thread).  By allowing the thread level to be selected
> automatically at run time, you'll avoid the overhead of thread safety
> when it's not needed, allowing your non-threaded applications to run faster.
> 
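
For context, the run-time selection described here is driven by
MPI_Init_thread: the application asks for a thread level and the library
reports the level it actually granted.  A minimal generic sketch of that
pattern (illustrative only, not code from this thread):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        /* Request only the level the application actually needs; here
           we assume all MPI calls are made from one thread. */
        int required = MPI_THREAD_SINGLE;
        int provided;

        MPI_Init_thread(&argc, &argv, required, &provided);
        if (provided < required)
            fprintf(stderr, "requested thread level not available\n");

        /* ... application ... */

        MPI_Finalize();
        return 0;
    }

A single-threaded application that calls plain MPI_Init() gets the same
benefit: the library is free to run without the locking that
MPI_THREAD_MULTIPLE requires.
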
> Let us know if either of these fixes the problem, especially if just
> removing the --enable-threads option fixes this.
> 
> Thanks,
> -d
> 
> On 07/10/2009 06:19 PM, chong tan wrote:
>> I am seeing a strange situation that I did not see with 1.0.6 or
>> 1.0.8.  Some background:
>> 
>> machine: Intel, 4 x Core 2
>> 
>> running mpiexec -n 4
>> 
>> machine has 32G of mem.
>> 
>> When my application runs, almost all memory is used; however, there
>> is no swapping.  I have exclusive use of the machine, so contention
>> is not an issue.
>> 
>> issue #1: processes take much longer to initialize than with 1.0.6
>> issue #2: during the run, at times all of the processes become idle
>> at the same time for almost a minute.  We never observed this with
>> 1.0.6.
>> 
>> 
>> The code is the same, only linked against different versions of MPICH2.
>> 
>> MPICH2 was built with --enable-threads=multiple for 1.1, and without
>> it for 1.0.6 and 1.0.8.
>> 
>> MPI calls are all made in the main application thread.  I use only 4
>> MPI functions: Init(), Send(), Recv(), and Barrier().
>> 
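
Restated as a runnable skeleton (with MPI_Comm_rank added to tell the
ranks apart; the tag and payload are made up for illustration), that
call pattern would look roughly like:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, value = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Blocking point-to-point between two of the ranks; the
           remaining ranks fall through to the barrier. */
        if (rank == 0)
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        /* Synchronize all ranks (mpiexec -n 4 in the report). */
        MPI_Barrier(MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }

With every call blocking and issued from one thread, MPI_THREAD_SINGLE
(or plain MPI_Init, as above) is sufficient.
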
>> 
>> 
>> Any suggestions?
>> 
>> thanks
>> tan
> 



      