[mpich-discuss] mpich-discuss Digest, Vol 14, Issue 37

Dave Goodell goodell at mcs.anl.gov
Tue Nov 24 14:37:06 CST 2009


Also, if you need particularly high-bandwidth intranode communication
and you have root privileges on your machines, you might consider
installing the knem kernel module included in the contrib/knem
directory of the MPICH2 source tree. Then configure MPICH2 with
"--with-nemesis-local-lmt=knem" (see the README in the same directory).
Information about knem's performance benefits can be found in the
following paper:

http://hal.archives-ouvertes.fr/docs/00/39/00/64/PDF/article.pdf
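
A minimal sketch of that build sequence (the knem module's own build is
outlined only loosely here; the contrib/knem README is authoritative,
and the "sudo" steps are assumptions):

    # 1. build and install the knem kernel module (requires root)
    cd contrib/knem
    ./configure && make && sudo make install

    # 2. rebuild MPICH2 with knem-assisted large-message transfers
    cd ../..
    ./configure --with-nemesis-local-lmt=knem
    make && sudo make install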

knem is an alternative to nemesis's default transfer mechanism, so
comparing it against the mechanism in the shm channel, which has no
kernel-level assistance, is not strictly "fair".

-Dave

On Nov 24, 2009, at 2:06 PM, Darius Buntinas wrote:

> We have noticed performance issues as well on certain machines, but we
> haven't pinned the cause down yet.
>
> BTW, process-core binding is very important for getting consistent
> performance measurements, especially for intranode communication.
> You'll get the best performance when the processes share an L2 cache.
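>
> A minimal sketch of one way to do the pinning (taskset is a standard
> Linux tool; that MPICH2's process managers set PMI_RANK for each
> process, and the choice of cores, are assumptions to verify on your
> machine):
>
>     #!/bin/sh
>     # bind.sh: pin MPI rank i to core i. Pick two cores that the
>     # shared_cpu_map output requested below shows on the same L2.
>     exec taskset -c "$PMI_RANK" "$@"
>
> Then launch the benchmark through the wrapper, e.g.
> "mpiexec -n 2 ./bind.sh ./osu_bw".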
>
> Here's a page with hints on how to get the best benchmark performance
> from mpich2:
> http://wiki.mcs.anl.gov/mpich2/index.php/Measuring_Nemesis_Performance
>
> Can you send us the output of the following commands on the machine
> you're running the benchmarks on to help us debug the performance
> problem you're seeing?
>
> ls /sys/devices/system/cpu/cpu*/cache/index*/shared_cpu_map
> cat /sys/devices/system/cpu/cpu*/cache/index*/shared_cpu_map
> cat /sys/devices/system/cpu/cpu*/cache/index*/size
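> # (hypothetical reading: if index2 -- typically the L2 cache -- shows
> # shared_cpu_map 00000003 for both cpu0 and cpu1, those two cores
> # share an L2 and are the pair to bind the benchmark processes to)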
>
> Thanks
> -d
>
>
>
> On 11/24/2009 03:08 AM, BryantMichael Bryant wrote:
>> Hello. I recently did a study of MPICH2 comparing the "nemesis" and
>> "shm" channels. First I built and installed MPICH2 with nemesis, and
>> then built and installed it with shm (configure sketches below). My
>> goal is to compare the latency and bandwidth of the two channels
>> within a single SMP node.
>> The test environment is an Intel(R) Xeon(R) X5355 CPU at 2.66 GHz
>> with a 4096 KB cache.
>> The bandwidth results (measured with the OSU benchmarks, message
>> sizes from 1 byte to 4 MB) are: for 1-32 byte messages the two
>> channels are similar; from 32 B to 512 B nemesis is slightly better
>> than shm; but from 512 B to 1 MB shm is clearly better, peaking at
>> about 1800 MB/s versus only 1100 MB/s for nemesis. From 1 MB to 4 MB
>> the shm bandwidth drops rapidly, though it still beats nemesis, and
>> at 4 MB the two channels show similar bandwidth.
>> My question is: why does shm perform so much better than nemesis? Is
>> it due to the data transfer method, or does shm have some optimized
>> functions?
>> Please tell me. Thank you!
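>>
>> A minimal sketch of the two builds (the ch3:nemesis and ch3:shm
>> device names are assumptions based on MPICH2 1.x configure
>> conventions, and the install prefixes are hypothetical):
>>
>>     ./configure --with-device=ch3:nemesis --prefix=/opt/mpich2-nemesis
>>     make && make install
>>
>>     ./configure --with-device=ch3:shm --prefix=/opt/mpich2-shm
>>     make && make install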
>>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


