[mpich-discuss] MPICH2 (or MPI_Init) limitation | scalability

Bernard Chambon bernard.chambon at cc.in2p3.fr
Tue Jan 10 09:20:46 CST 2012


On 10 Jan 2012, at 00:52, Dave Goodell wrote:

> Make sure you include a call to MPI_Finalize in your test program as well.
> 
> -Dave


I'm afraid that's not the problem.

The question is whether there is a limitation in the MPICH2 software or, more probably, in my OS or machine, but I can't find it.
There is clearly a limit at 152 tasks: the job runs with 152 tasks and fails with 153, even after removing the resource limits (*) and increasing the shared memory values (**).

 >mpiexec -genvall -profile -np 152 bin/my_test   (my_test = MPI_Init + MPI_Finalize)

================================================================================
[mpiexec@ccdvli10] Number of PMI calls seen by the server: 306
================================================================================

 >mpiexec -genvall -profile -np 153 bin/my_test
Assertion failed in file /scratch/BC/mpich2-1.4.1p1/src/util/wrappers/mpiu_shm_wrappers.h at line 889: seg_sz > 0
internal ABORT - process 0
[proxy:0:0@ccdvli10] send_cmd_downstream (./pm/pmiserv/pmip_pmi_v1.c:80): assert (!closed) failed
[proxy:0:0@ccdvli10] fn_get (./pm/pmiserv/pmip_pmi_v1.c:349): error sending PMI response
[proxy:0:0@ccdvli10] pmi_cb (./pm/pmiserv/pmip_cb.c:327): PMI handler returned error
[proxy:0:0@ccdvli10] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@ccdvli10] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
[mpiexec@ccdvli10] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed
[mpiexec@ccdvli10] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@ccdvli10] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
[mpiexec@ccdvli10] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion


(*)
>limit
cputime      unlimited
filesize     unlimited
datasize     unlimited
stacksize    unlimited
coredumpsize unlimited
memoryuse    unlimited
vmemoryuse   unlimited
descriptors  1000000 
memorylocked unlimited
maxproc      409600 

(**)

>sysctl -A | grep kernel.sh
kernel.shmmni = 16000
kernel.shmall = 8388608000
kernel.shmmax = 33554432



Best regards
---------------
Bernard CHAMBON
IN2P3 / CNRS
04 72 69 42 18


