<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div style="font-size: 17px; ">Hello,</div><div style="font-size: 17px; "><br></div><div style="font-size: 17px; ">I can confirm a failure when specifying -iface together with a large number of tasks.</div><div style="font-size: 17px; ">I am running Hydra version 1.4.1p1 with a shared-memory patch (seg_sz.patch).</div><div style="font-size: 17px; "><br></div><div style="font-size: 17px; "><div style="font-size: 15px; ">Test run "by hand" (i.e. not through the batch system) between the following two machines:</div><div style="font-size: 15px; "><span class="Apple-style-span" style="font-size: 17px; "><font class="Apple-style-span" size="4" style="font-size: 21px; "><span class="Apple-style-span" style="font-size: 15px; "><div style="font-size: 13px; "><i>>more /tmp/machines </i></div><div style="font-size: 13px; "><i>ccwpge0061:128</i></div><div style="font-size: 13px; "><i>ccwpge0062:128</i></div></span></font></span></div></div><div style="font-size: 17px; "><br></div><div style="font-size: 15px; "><span class="Apple-style-span" style="font-size: 17px; "><font class="Apple-style-span" size="4" style="font-size: 21px; "><span class="Apple-style-span" style="font-size: 15px; "><div>1/ Without specifying -iface, it works (more than 10 tries):</div><div><br></div></span></font></span></div><div style="font-size: 17px; "><div><div style="font-size: 13px; "><div>mpiexec -f /tmp/machines -n 150 bin/advance_test</div><div>bchambon@ccwpge0062's password: </div><div><br></div><div>I am there </div><div>Running MPI version 2, subversion 2 </div><div>ref_message is ready </div><div>I am the master task 0 sur ccwpge0061, for 149 slaves tasks, we will exchange a buffer of 1 MB</div><div><br></div><div>slave number 1, iteration = 1</div><div>slave number 2, iteration = 1</div><div>slave number 3, iteration = 1</div><div>…</div><div><br></div></div></div></div><div style="font-size: 14px; ">>echo 
$status</div><div style="font-size: 14px; ">0</div><div style="font-size: 17px; "><br></div><div style="font-size: 17px; ">2/ When <span class="Apple-style-span" style="font-size: 15px; ">specifying -iface eth0, </span><span class="Apple-style-span" style="font-size: 15px; ">I <u>always</u> get an assertion failure:</span></div><div style="font-size: 17px; "><font class="Apple-style-span" size="4" style="font-size: 21px; "><span class="Apple-style-span" style="font-size: 15px; "><br></span></font></div><div style="font-size: 17px; "><div style="font-size: 12px; ">>mpiexec -iface eth0 -f /tmp/machines -n 150 bin/advance_test <span class="Apple-style-span" style="font-size: 15px; ">(same as before, more than 10 tries)</span></div><div style="font-size: 12px; ">bchambon@ccwpge0062's password: </div><div style="font-size: 12px; "><br></div><div style="font-size: 12px; ">Segmentation fault</div><div style="font-size: 12px; ">[mpiexec@ccwpge0061] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed</div><div style="font-size: 12px; ">[mpiexec@ccwpge0061] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status</div><div style="font-size: 12px; ">[mpiexec@ccwpge0061] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event</div><div style="font-size: 12px; ">[mpiexec@ccwpge0061] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion</div></div><div style="font-size: 17px; "><br></div><div style="font-size: 17px; "><br></div><div style="font-size: 17px; ">I'm fairly sure the failure only appears as the number of tasks increases.</div><div style="font-size: 17px; ">With a machine file like:</div><div style="font-size: 17px; "><div>ccwpge0061:8</div><div>ccwpge0062:8</div></div><div style="font-size: 17px; "><br></div><div><font class="Apple-style-span" size="5"><span class="Apple-style-span" style="font-size: 17px; "> >mpiexec -verbose -iface eth0 -f /tmp/machines -n 16 
bin/advance_test</span></font></div><div><font class="Apple-style-span" size="5"><span class="Apple-style-span" style="font-size: 17px;"><br></span></font></div><div><font class="Apple-style-span" size="5"><span class="Apple-style-span" style="font-size: 17px;">the run seems to be OK!</span></font></div><div><font class="Apple-style-span" size="5"><span class="Apple-style-span" style="font-size: 17px;"><br></span></font></div><div><span class="Apple-style-span" style="font-size: 17px; ">Best regards.</span></div><div><font class="Apple-style-span" size="5"><span class="Apple-style-span" style="font-size: 17px;"><br></span></font></div><div><span class="Apple-style-span" style="font-size: 17px; ">PS: my shell's resource limits:</span></div><div><span class="Apple-style-span" style="font-size: 17px; "><br></span></div><div><span class="Apple-style-span" style="font-size: 17px; "><div> >limit</div><div>cputime unlimited</div><div>filesize unlimited</div><div>datasize unlimited</div><div>stacksize unlimited</div><div>coredumpsize unlimited</div><div>memoryuse unlimited</div><div>vmemoryuse unlimited</div><div>descriptors 1000000 </div><div>memorylocked unlimited</div><div>maxproc 409600 </div><div><br></div></span></div><div><span class="Apple-style-span" style="font-size: 17px; "><br></span></div><div><span class="Apple-style-span" style="font-size: 17px; ">---------------</span></div><div style="font-size: 17px; "><div><div><div><div><div>Bernard CHAMBON<br>IN2P3 / CNRS<br>04 72 69 42 18<br></div></div></div></div></div>
</div>
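<div style="font-size: 17px; "><br></div>
<div style="font-size: 17px; ">For the two runs above (150 tasks on ccwpge0061:128 / ccwpge0062:128, which fails with -iface, and 16 tasks on ccwpge0061:8 / ccwpge0062:8, which works), here is a minimal sketch of how the ranks would be split across the hosts. It assumes that Hydra fills each host up to its slot count in machine-file order and wraps around when there are more ranks than slots. The helpers parse_machinefile and place are hypothetical and are not part of MPICH/Hydra.</div>
<pre style="font-size: 12px; ">
```python
# Hypothetical helpers (not part of MPICH/Hydra). Assumption: ranks are
# assigned host by host in machine-file order, filling each host up to its
# slot count, and wrapping around if there are more ranks than slots.


def parse_machinefile(text):
    """Parse 'host:count' lines into (host, count) pairs; count defaults to 1."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        host, _, count = line.partition(":")
        hosts.append((host, int(count) if count else 1))
    return hosts


def place(hosts, nprocs):
    """Return a dict mapping each host to the number of ranks placed on it."""
    if sum(count for _, count in hosts) == 0:
        raise ValueError("machine file provides no slots")
    placement = {}
    remaining = nprocs
    while remaining > 0:
        for host, count in hosts:
            take = min(count, remaining)
            if take:
                placement[host] = placement.get(host, 0) + take
            remaining -= take
            if remaining == 0:
                break  # all ranks placed, stop mid-pass
    return placement


big = parse_machinefile("ccwpge0061:128\nccwpge0062:128")
small = parse_machinefile("ccwpge0061:8\nccwpge0062:8")
print(place(big, 150))   # {'ccwpge0061': 128, 'ccwpge0062': 22}
print(place(small, 16))  # {'ccwpge0061': 8, 'ccwpge0062': 8}
```
</pre>
<div style="font-size: 17px; ">Under that assumption, the failing run puts 128 ranks on ccwpge0061 (the host running mpiexec) and 22 on ccwpge0062, while the working run puts 8 ranks on each host.</div>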
<br style="font-size: 17px; "></body></html>