[mpich-discuss] MPICH2 (or MPI_Init) limitation | scalability
Bernard Chambon
bernard.chambon at cc.in2p3.fr
Wed Jan 11 03:36:08 CST 2012
Hi,
On 10 Jan 2012, at 19:20, Darius Buntinas wrote:
> I think Dave has the right idea. You may not have enough shared memory available to support that many processes. There are two ways MPICH2 allocates shared memory: System V or mmap. System V typically has very low limits on the size of shared memory regions, so we use mmap by default. To make sure mmap is being used, send us the output of:
>
> grep "shared memory" src/mpid/ch3/channels/nemesis/config.log
>
> Thanks
Yes, mmap is used:
>grep "shared memory" src/mpid/ch3/channels/nemesis/config.log
configure:7220: Using a memory-mapped file for shared memory
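
For reference, here is a small sketch of my own (not MPICH code; the file name and segment size are placeholders) of the two allocation paths described above, System V vs. a memory-mapped file:

/* Illustrative sketch only -- not MPICH source. Shows the two ways a
 * shared-memory region can be obtained: System V shmget/shmat (subject
 * to kernel.shmmax and friends) vs. a memory-mapped file (mmap), which
 * is what configure reported above. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

#define SEG_SZ (4 * 1024 * 1024)   /* placeholder segment size */

static void *sysv_alloc(void)
{
    /* System V path: limited by kernel.shmmax / shmall / shmmni */
    int id = shmget(IPC_PRIVATE, SEG_SZ, IPC_CREAT | 0600);
    if (id < 0) { perror("shmget"); return NULL; }
    void *p = shmat(id, NULL, 0);
    shmctl(id, IPC_RMID, NULL);          /* mark for removal on detach */
    return (p == (void *) -1) ? NULL : p;
}

static void *mmap_alloc(const char *path)
{
    /* mmap path: backed by a file, not governed by the SysV limits */
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) { perror("open"); return NULL; }
    if (ftruncate(fd, SEG_SZ) != 0) { perror("ftruncate"); close(fd); return NULL; }
    void *p = mmap(NULL, SEG_SZ, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return (p == MAP_FAILED) ? NULL : p;
}

int main(void)
{
    void *a = sysv_alloc();
    void *b = mmap_alloc("/tmp/shm_demo");   /* placeholder path */
    printf("sysv=%p mmap=%p\n", a, b);
    return (a && b) ? EXIT_SUCCESS : EXIT_FAILURE;
}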
The bad news is that the shm* parameters have no influence.
I always get a failure when reaching 153 tasks, even after increasing the values by 8.
>sysctl -A | egrep "sem|shm"
vm.hugetlb_shm_group = 0
kernel.sem = 250 32000 32 128
kernel.shmmni = 4096
kernel.shmall = 2097152
kernel.shmmax = 33554432
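
If I understand correctly (my assumption, not something from the MPICH docs), these kernel.shm* settings only govern System V segments, so I would not expect them to affect the mmap path anyway; what should matter for a memory-mapped file is the space on the filesystem backing it (often /dev/shm or the job's temp directory). A small sketch to check that, with "/dev/shm" being a guess:

/* Sketch: report free space on the filesystem that would back an
 * mmap'ed shared-memory file. "/dev/shm" is an assumption; substitute
 * the directory MPICH actually uses on this system. */
#include <stdio.h>
#include <sys/statvfs.h>

int main(void)
{
    struct statvfs vfs;
    const char *dir = "/dev/shm";   /* assumed location of the backing file */
    if (statvfs(dir, &vfs) != 0) { perror("statvfs"); return 1; }
    unsigned long long free_bytes =
        (unsigned long long) vfs.f_bavail * vfs.f_frsize;
    printf("%s: %llu MB available\n", dir, free_bytes / (1024 * 1024));
    return 0;
}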
>mpiexec -genvall -profile -np 152 bin/my_test ; echo $status
================================================================================
[mpiexec at ccwpge0062] Number of PMI calls seen by the server: 306
================================================================================
0
>mpiexec -genvall -profile -np 153 bin/my_test
Assertion failed in file /scratch/BC/mpich2-1.4.1p1/src/util/wrappers/mpiu_shm_wrappers.h at line 889: seg_sz > 0
internal ABORT - process 0
[proxy:0:0 at ccwpge0062] send_cmd_downstream (./pm/pmiserv/pmip_pmi_v1.c:80): assert (!closed) failed
[proxy:0:0 at ccwpge0062] fn_get (./pm/pmiserv/pmip_pmi_v1.c:349): error sending PMI response
[proxy:0:0 at ccwpge0062] pmi_cb (./pm/pmiserv/pmip_cb.c:327): PMI handler returned error
...
Lowering the shm* values (e.g. by 16) also has no influence.
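
For what it's worth, the failing assertion only checks that the computed segment size is positive before the region is mapped. Below is a rough sketch of how such a size could stop fitting in a 32-bit value once the local process count grows; the per-pair buffer size and the formula are made up for illustration and are not taken from the MPICH source:

/* Sketch only: shows how a per-node segment size that grows with the
 * square of the local process count can exceed what a 32-bit size
 * field holds, which would trip a "seg_sz > 0" style check.
 * The per-pair buffer size and formula are hypothetical. */
#include <limits.h>
#include <stdio.h>

int main(void)
{
    const long long per_pair_bytes = 90 * 1024;  /* hypothetical per-pair buffer */
    for (int np = 150; np <= 156; np++) {
        long long need = (long long) np * np * per_pair_bytes;
        printf("np=%3d need=%lld bytes  fits in a 32-bit size? %s\n",
               np, need, need <= INT_MAX ? "yes" : "no");
    }
    return 0;
}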
Thanks,
---------------
Bernard CHAMBON
IN2P3 / CNRS
04 72 69 42 18