<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi,<div><br><div><div>On 10 Jan 2012, at 19:20, Darius Buntinas wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><span class="Apple-style-span" style="border-collapse: separate; font-family: Courier; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; ">I think Dave has the right idea. You may not have enough shared memory available to support that many processes. There are two ways MPICH2 allocates shared memory: System V or mmap. System V typically has very low limits on the size of shared memory regions, so we use mmap by default. 
To make sure mmap is being used, send us the output of:<br><br>grep "shared memory" src/mpid/ch3/channels/nemesis/config.log<br><br>Thanks</span></blockquote></div><div><br></div><div>Yes, mmap is used:</div><div><br></div><div style="font-size: 16px; "><i>>grep "shared memory" src/mpid/ch3/channels/nemesis/config.log</i></div><div style="font-size: 16px; "><i>configure:7220: Using a memory-mapped file for shared memory</i></div><div><br></div><div>The bad news is that the shm* parameters have NO influence.</div><div>I always get a failure on reaching 153 tasks, even after increasing the values by 8.</div><div><br></div><div><font class="Apple-style-span" size="4"><span class="Apple-style-span" style="font-size: 14px;"><div style="font-size: 15px; "><i> >sysctl -A | egrep "sem|shm"</i></div><div style="font-size: 15px; "><i>vm.hugetlb_shm_group = 0</i></div><div style="font-size: 15px; "><i>kernel.sem = 250</i><span class="Apple-tab-span" style="white-space: pre; "><i>        </i></span><i>32000</i><span class="Apple-tab-span" style="white-space: pre; "><i>        </i></span><i>32</i><span class="Apple-tab-span" style="white-space: pre; "><i>        </i></span><i>128</i></div><div style="font-size: 15px; "><i>kernel.shmmni = 4096</i></div><div style="font-size: 15px; "><i>kernel.shmall = 2097152</i></div><div style="font-size: 15px; "><i>kernel.shmmax = 33554432</i></div><div style="font-size: medium; "><i><br></i></div><div style="font-size: medium; "><div style="font-size: 14px; "><i>>mpiexec -genvall -profile -np </i><b><i>152</i></b><i> bin/my_test ; echo $status</i></div><div style="font-size: 14px; "><i><br></i></div><div style="font-size: 14px; "><i>================================================================================</i></div><div style="font-size: 14px; "><i>[mpiexec@ccwpge0062] Number of PMI calls seen by the server: 306</i></div><div style="font-size: 14px; "><i>================================================================================</i></div><div style="font-size: 14px; "><i><br></i></div><div style="font-size: 14px; "><i>0</i></div></div><div style="font-size: 14px; "><i><br></i></div><div style="font-size: 14px; "><div><i>>mpiexec -genvall -profile -np </i><b><i>153</i></b><i> bin/my_test</i></div><div><i>Assertion failed in file /scratch/BC/mpich2-1.4.1p1/src/util/wrappers/mpiu_shm_wrappers.h at line 889: seg_sz > 0</i></div><div><i>internal ABORT - process 0</i></div><div><i>[proxy:0:0@ccwpge0062] send_cmd_downstream (./pm/pmiserv/pmip_pmi_v1.c:80): assert (!closed) failed</i></div><div><i>[proxy:0:0@ccwpge0062] fn_get (./pm/pmiserv/pmip_pmi_v1.c:349): error sending PMI response</i></div><div><i>[proxy:0:0@ccwpge0062] pmi_cb (./pm/pmiserv/pmip_cb.c:327): PMI handler returned error</i></div><div>...</div></div></span></font></div><div><br></div><div>Lowering the shm* values (e.g. by 16) also has no influence.</div><div><br></div><div>Thanks,<br></div><div><br></div><div>
<div style="font-size: 18px; "><div><div><div><div>---------------<br>Bernard CHAMBON<br>IN2P3 / CNRS<br>04 72 69 42 18<br></div></div></div></div></div>
</div>
<br></div></body></html>