<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div style="font-size: 14px; ">Hello,</div>
<div style="font-size: 14px; "><br></div>
<div style="font-size: 14px; ">I'm still working on the failures I encounter as the number of tasks increases.</div>
<div style="font-size: 14px; ">(Using mpich2-1.4, compiled with gcc 4.1, on Scientific Linux 5, kernel 2.6.18-238.12cc.el5)</div>
<div style="font-size: 14px; "><br></div>
<div style="font-size: 14px; ">Here is the smallest mpich2 code with which I get a failure above ~150 tasks.</div>
<div style="font-size: 14px; ">No communication, only basic calls.</div>
<div style="font-size: 14px; "><br></div>
<div style="font-size: 14px; ">The code:</div>
<div style="font-size: 14px; "><br></div>
<div style="font-size: 12px; ">// Compiled with:</div>
<div style="font-size: 12px; ">// mpicc -O2 -I $HOME/mpich2-1.4/include -L $HOME/mpich2-1.4/lib -o bin/basic_test basic_test.c</div>
<div style="font-size: 14px; "><div style="font-size: 12px; "><br></div>
<div style="font-size: 12px; ">#include &lt;mpi.h&gt;</div>
<div style="font-size: 12px; ">#include &lt;stdio.h&gt;</div>
<div style="font-size: 12px; ">#include &lt;unistd.h&gt; // sleep()</div>
<div style="font-size: 12px; "><br></div>
<div style="font-size: 12px; ">int main(int argc, char **argv) {</div>
<div style="font-size: 12px; "> if (MPI_Init(&argc, &argv) != MPI_SUCCESS ) {</div>
<div style="font-size: 12px; "> printf("Error calling MPI_Init !!, exiting \n") ; fflush(stdout);</div>
<div style="font-size: 12px; "> return(1);</div>
<div style="font-size: 12px; "> }</div>
<div style="font-size: 12px; "><br></div>
<div style="font-size: 12px; "> int rank;</div>
<div style="font-size: 12px; "> if ( MPI_Comm_rank(MPI_COMM_WORLD, &rank)!= MPI_SUCCESS ) {</div>
<div style="font-size: 12px; "> printf("Error calling MPI_Comm_rank !!, exiting \n") ; fflush(stdout);</div>
<div style="font-size: 12px; "> MPI_Abort(MPI_COMM_WORLD, 1);</div>
<div style="font-size: 12px; "> return(1);</div>
<div style="font-size: 12px; "> }</div>
<div style="font-size: 12px; "> </div>
<div style="font-size: 12px; "> if (rank == 0) {</div>
<div style="font-size: 12px; "> // Rank 0 reports the number of tasks, then finalizes</div>
<div style="font-size: 12px; "> int nprocs;</div>
<div style="font-size: 12px; "> if (MPI_Comm_size(MPI_COMM_WORLD, &nprocs)!= MPI_SUCCESS ) {</div>
<div style="font-size: 12px; "> printf("Error calling MPI_Comm_size !!, exiting \n") ; fflush(stdout);</div>
<div style="font-size: 12px; "> MPI_Abort(MPI_COMM_WORLD, 1);</div>
<div style="font-size: 12px; "> return(1);</div>
<div style="font-size: 12px; "> }</div>
<div style="font-size: 12px; "> </div>
<div style="font-size: 12px; "> printf("Running %d tasks \n", nprocs) ; fflush(stdout);</div>
<div style="font-size: 12px; "> MPI_Finalize(); </div>
<div style="font-size: 12px; "> return(0); </div>
<div style="font-size: 12px; "> } else {</div>
<div style="font-size: 12px; "> // All other ranks sleep one second and return without calling MPI_Finalize</div>
<div style="font-size: 12px; "> sleep(1);</div>
<div style="font-size: 12px; "> return(0);</div>
<div style="font-size: 12px; "> }</div>
<div style="font-size: 12px; ">}</div>
<div><br></div>
<div><br></div>
<div>Running the code (on Scientific Linux 5, kernel 2.6.18-238.12cc.el5):</div>
<div>Everything works fine up to around 150 tasks, but fails at 160:</div></div>
<div style="font-size: 13px; "><font class="Apple-style-span" size="3" style="font-size: 13px; "> >mpiexec -np 128 bin/basic_test</font></div>
<div style="font-size: 13px; "><font class="Apple-style-span" size="3" style="font-size: 13px; ">Running 128 tasks </font></div>
<div style="font-size: 13px; "><font class="Apple-style-span" size="3" style="font-size: 13px; "><br></font></div>
<div style="font-size: 13px; "><font class="Apple-style-span" size="3" style="font-size: 13px; "> >mpiexec -np 150 bin/basic_test</font></div>
<div style="font-size: 13px; "><font class="Apple-style-span" size="3" style="font-size: 13px; ">Running 150 tasks </font></div>
<div style="font-size: 13px; "><font class="Apple-style-span" size="3" style="font-size: 13px; "><br></font></div>
<div style="font-size: 13px; "><font class="Apple-style-span" size="3" style="font-size: 13px; "> >mpiexec -np 160 bin/basic_test</font></div>
<div style="font-size: 13px; "><font class="Apple-style-span" size="3" style="font-size: 13px; ">[proxy:0:0@ccdvli10] send_cmd_downstream (./pm/pmiserv/pmip_pmi_v1.c:80): assert (!closed) failed</font></div>
<div 
style="font-size: 13px; "><font class="Apple-style-span" size="3" style="font-size: 13px; ">[proxy:0:0@ccdvli10] fn_get_maxes (./pm/pmiserv/pmip_pmi_v1.c:205): error sending PMI response</font></div>
<div style="font-size: 13px; "><font class="Apple-style-span" size="3" style="font-size: 13px; ">[proxy:0:0@ccdvli10] pmi_cb (./pm/pmiserv/pmip_cb.c:327): PMI handler returned error</font></div>
<div style="font-size: 13px; "><font class="Apple-style-span" size="3" style="font-size: 13px; ">[proxy:0:0@ccdvli10] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status</font></div>
<div style="font-size: 13px; "><font class="Apple-style-span" size="3" style="font-size: 13px; ">[proxy:0:0@ccdvli10] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event</font></div>
<div style="font-size: 13px; "><font class="Apple-style-span" size="3" style="font-size: 13px; ">[mpiexec@ccdvli10] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed</font></div>
<div style="font-size: 13px; "><font class="Apple-style-span" size="3" style="font-size: 13px; ">[mpiexec@ccdvli10] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status</font></div>
<div style="font-size: 13px; "><font class="Apple-style-span" size="3" style="font-size: 13px; ">[mpiexec@ccdvli10] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event</font></div>
<div style="font-size: 13px; "><font class="Apple-style-span" size="3" style="font-size: 13px; ">[mpiexec@ccdvli10] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion</font></div>
<div style="font-size: 14px; "><br></div>
<div style="font-size: 14px; "><br></div>
<div style="font-size: 14px; "><div>Does anybody have an idea of what the error in my code might be?</div><div><div>What is the upper limit on the number of tasks?</div></div><div><br></div><div>Best regards,</div></div><div style="font-size: 14px; ">
<div><div><div><div><div>---------------<br>Bernard CHAMBON<br>IN2P3 / CNRS<br>04 72 69 42 18<br></div></div></div></div></div>
</div>
<br style="font-size: 14px; "></body></html>