[mpich-discuss] MPICH2 (or MPI_Init) limitation | scalability

Bernard Chambon bernard.chambon at cc.in2p3.fr
Wed Jan 11 03:36:08 CST 2012


Hi,

On 10 Jan 2012, at 19:20, Darius Buntinas wrote:

> I think Dave has the right idea.  You may not have enough shared memory available to support that many processes.  There are two ways MPICH2 allocates shared memory: System V or mmap.  System V typically has very low limits on the size of shared memory regions, so we use mmap by default.  To make sure mmap is being used, send us the output of:
> 
> grep "shared memory" src/mpid/ch3/channels/nemesis/config.log
> 
> Thanks


Yes, mmap is used:

>grep "shared memory" src/mpid/ch3/channels/nemesis/config.log
configure:7220: Using a memory-mapped file for shared memory
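
Since the segment is mmap'd rather than System V, I wonder whether the real
limit is on the filesystem backing the mapped file.  Assuming that file lives
on a tmpfs such as /dev/shm (this depends on the build and on TMPDIR, so it
is only a guess on my part), its size and usage can be checked with:

>df -h /dev/shm
>mount | grep -i shm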

The bad news is that the shm* parameters have no influence:
I still get the failure at 153 tasks, even after increasing the values by a factor of 8.

>sysctl -A | egrep "sem|shm"
vm.hugetlb_shm_group = 0
kernel.sem = 250	32000	32	128
kernel.shmmni = 4096
kernel.shmall = 2097152
kernel.shmmax = 33554432
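
If I understand correctly, these kernel.shm* values only constrain System V
segments (shmget/shmat); an mmap-based allocation never consults them, which
would explain why changing them has no effect here.  A minimal sketch of the
two allocation paths (my own illustration, not MPICH's actual code):

/* Only the System V path is subject to the kernel.shm* sysctls; the
 * mmap path is limited by the backing filesystem and address space. */
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t seg_sz = 4 * 1024 * 1024;

    /* System V path: shmget fails with EINVAL if seg_sz > kernel.shmmax */
    int id = shmget(IPC_PRIVATE, seg_sz, IPC_CREAT | 0600);
    if (id >= 0) {
        void *p = shmat(id, NULL, 0);
        if (p != (void *)-1) shmdt(p);
        shmctl(id, IPC_RMID, NULL);
    }

    /* mmap path: a temporary file extended to seg_sz and mapped shared;
     * kernel.shmmax plays no role here */
    char path[] = "/tmp/shm_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0) {
        if (ftruncate(fd, (off_t)seg_sz) == 0) {
            void *p = mmap(NULL, seg_sz, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            if (p != MAP_FAILED) munmap(p, seg_sz);
        }
        close(fd);
        unlink(path);
    }
    return 0;
}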

>mpiexec -genvall -profile -np 152 bin/my_test ; echo $status

================================================================================
[mpiexec at ccwpge0062] Number of PMI calls seen by the server: 306
================================================================================

0
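
For reference, a minimal program of the following shape should be enough to
reproduce the problem, since the abort happens inside MPI_Init (this is a
stand-in sketch, not the actual bin/my_test):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);    /* the abort below occurs in here */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0)
        printf("%d processes initialized\n", size);
    MPI_Finalize();
    return 0;
}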

>mpiexec -genvall -profile -np 153 bin/my_test
Assertion failed in file /scratch/BC/mpich2-1.4.1p1/src/util/wrappers/mpiu_shm_wrappers.h at line 889: seg_sz > 0
internal ABORT - process 0
[proxy:0:0 at ccwpge0062] send_cmd_downstream (./pm/pmiserv/pmip_pmi_v1.c:80): assert (!closed) failed
[proxy:0:0 at ccwpge0062] fn_get (./pm/pmiserv/pmip_pmi_v1.c:349): error sending PMI response
[proxy:0:0 at ccwpge0062] pmi_cb (./pm/pmiserv/pmip_cb.c:327): PMI handler returned error
...
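
The assertion says the computed segment size came out non-positive.  One
purely speculative explanation on my part (with made-up numbers): if the
per-node total were accumulated in a 32-bit signed integer, it would wrap
negative at a sharp process-count threshold, exactly the kind of cliff I see
between 152 and 153 tasks.  An illustration only, not MPICH's actual code:

#include <stdio.h>

int main(void)
{
    long long per_proc = 14100000LL;   /* hypothetical per-process share */
    for (int np = 150; np <= 155; np++) {
        long long total = per_proc * np;
        int seg_sz = (int)total;       /* truncated to 32 bits: wraps
                                          negative on common platforms */
        printf("np=%3d  true=%lld  as int=%d%s\n", np, total, seg_sz,
               seg_sz > 0 ? "" : "  <- \"seg_sz > 0\" would fail");
    }
    return 0;
}

Whether that is what actually happens at mpiu_shm_wrappers.h:889, I of
course cannot tell.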

Lowering the shm* values (e.g. by a factor of 16) also has no influence.

Thanks,

---------------
Bernard CHAMBON
IN2P3 / CNRS
04 72 69 42 18
