[mpich-discuss] version 1.1 strange behavior : all processes become idle for extensive period
chong tan
chong_guan_tan at yahoo.com
Mon Jul 13 13:35:55 CDT 2009
Sorry, I can't do that. The benchmark involves 2 things. One is from my customer, which
I am not allowed to distribute. I may be able to get a limited license of my product
for you to try, but I definitely cannot send source code.
tan
________________________________
From: Darius Buntinas <buntinas at mcs.anl.gov>
To: mpich-discuss at mcs.anl.gov
Sent: Monday, July 13, 2009 10:54:50 AM
Subject: Re: [mpich-discuss] version 1.1 strange behavior : all processes become idle for extensive period
Can you send us the benchmark you're using? This will help us figure
out what's going on.
Thanks,
-d
On 07/13/2009 12:36 PM, chong tan wrote:
>
> thanks darius,
>
> When I did the comparison (or benchmarking), I had 2 identical source
> trees. Everything was recompiled from the ground up and compiled/linked
> against the version of MPICH2 to be used.
>
> I have many tests; this is the only one showing this behavior, and it is
> predictably repeatable.
> Most of my tests show comparable performance, and many do better
> with 1.1.
>
> The 'weirdest' thing is the ~1 minute span where there is no activity on
> the box at all, zero activity except 'top', with machine load at around
> 0.12. I don't know how to explain this 'behavior', and I am extremely
> curious if anyone can explain it.
>
> I can't repeat this on AMD boxes as I don't have one that has only 32G
> of memory. I can't repeat this on a Niagara box as the thread-multiple
> build fails there.
>
> I will try to rebuild 1.1 without thread-multiple. Will keep you posted.
>
> Meanwhile, if anyone has any speculations on this, please bring them up.
>
> thanks
> tan
>
> ------------------------------------------------------------------------
> *From:* Darius Buntinas <buntinas at mcs.anl.gov>
> *To:* mpich-discuss at mcs.anl.gov
> *Sent:* Monday, July 13, 2009 8:30:19 AM
> *Subject:* Re: [mpich-discuss] version 1.1 strange behavior : all
> processes become idle for extensive period
>
> Tan,
>
> Did you just re-link the applications, or did you recompile them?
> Version 1.1 is most likely not binary compatible with 1.0.6, so you
> really need to recompile the application.
>
> Next, don't use the --enable-threads=multiple flag when configuring
> mpich2. By default, mpich2 supports all thread levels and will select
> the thread level at run time (depending on the parameters passed to
> MPI_Init_thread). By allowing the thread level to be selected
> automatically at run time, you'll avoid the overhead of thread safety
> when it's not needed, allowing your non-threaded applications to run faster.
>
> Let us know if either of these fixes the problem, especially if just
> removing the --enable-threads option fixes this.
>
> Thanks,
> -d
>
> On 07/10/2009 06:19 PM, chong tan wrote:
>> I am seeing this funny situation which I did not see on 1.0.6 and
>> 1.0.8. Some background:
>>
>> machine : INTEL 4Xcore 2
>>
>> running mpiexec -n 4
>>
>> machine has 32G of mem.
>>
>> When my application runs, almost all memory is used. However, there
>> is no swapping.
>> I have exclusive use of the machine, so contention is not an issue.
>>
>> issue #1 : processes take extra long to be initialized, compared to 1.0.6
>> issue #2 : during the run, at times all of them become idle at the
>> same time, for almost a minute. We never observed this with 1.0.6
>>
>>
>> The code is the same, only linked with different versions of MPICH2.
>>
>> MPICH2 was built with --enable-threads=multiple for 1.1, and without it
>> for 1.0.6 and 1.0.8.
>>
>> MPI calls are all in the main application thread. I used only 4 MPI
>> functions :
>> init(), Send(), Recv() and Barrier().
>>
>>
>>
>> Any suggestions?
>>
>> thanks
>> tan
>>
>