<html><head><style type="text/css"><!-- DIV {margin:0px;} --></style></head><body><div style="font-family:arial, helvetica, sans-serif;font-size:10pt"><div style="color: black; font-family: arial, helvetica, sans-serif; font-size: 10pt; "></div><div style="color: black; font-family: arial, helvetica, sans-serif; font-size: 10pt; ">Hi All,</div><div style="color: black; font-family: arial, helvetica, sans-serif; font-size: 10pt; "><br></div><div style="color: black; font-family: arial, helvetica, sans-serif; font-size: 10pt; ">I have installed the new MPICH2 version "1.3.1" with this configuration:</div><div style="color: black; font-family: arial, helvetica, sans-serif; font-size: 10pt; "><br></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif" size="2"><b>./configure --without-mpe --disable-f77 --disable-fc </b></font></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif"
size="2"><br></font></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif" size="2">After the installation, I started running some old programs I had written with MPI.</font></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif" size="2">All of them hang when the number of cores is greater than 20. They hang at an <b>MPI_Bcast</b> call.</font></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif" size="2"><br></font></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif" size="2">So I took the "Hello_world" example and ran it, and it works well. I then modified it to add a simple <b>MPI_Bcast</b> call, and the program </font></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif" size="2">starts to hang when the number of cores is greater than 20.</font></div><div><font
class="Apple-style-span" face="arial, helvetica, sans-serif" size="2"><br></font></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif" size="2"><br></font></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif" size="2">I have also tried the new installation with the "<b>cpi</b>" example included in the package, and it hangs when the number of processes is greater than 20.</font></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif" size="2"><br></font></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif" size="2"><br></font></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif" size="2">Do you have any idea what might be causing this?</font></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif" size="2"><br></font></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif" size="2"><br></font></div><div><font
class="Apple-style-span" face="arial, helvetica, sans-serif" size="6"><u>Here is the "Hello World" example:</u></font></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif" size="2"><br></font></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif"><div style="font-size: small; "><b>#include &lt;stdio.h&gt;</b></div><div style="font-size: small; "><b>#include "mpi.h"</b></div><div style="font-size: small; "><b>#include &lt;string.h&gt;</b></div><div style="font-size: small; "><b><br></b></div><div style="font-size: small; "><b>int main(int argc, char **argv)</b></div><div style="font-size: small; "><b>{</b></div><div style="font-size: small; "><b> int my_rank;</b></div><div style="font-size: small; "><b> int source;</b></div><div style="font-size: small; "><b> int dest;</b></div><div style="font-size: small;
"><b> int p,len;</b></div><div style="font-size: small; "><b> int tag = 50;</b></div><div style="font-size: small; "><b> char message [100];</b></div><div style="font-size: small; "><b> char name[MPI_MAX_PROCESSOR_NAME];</b></div><div style="font-size: small; "><b> MPI_Status status;</b></div><div style="font-size: small; "><b><br></b></div><div style="font-size: small; "><b> MPI_Init(&argc, &argv);</b></div><div style="font-size: small; "><b> MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);</b></div><div style="font-size: small; "><b> MPI_Comm_size(MPI_COMM_WORLD, &p);</b></div><div style="font-size: small; "><b> int x=0;</b></div><div style="font-size: small;
"><b> if(my_rank==0)</b></div><div style="font-size: small; "><b> {</b></div><div style="font-size: small; "><b> x=923;</b></div><div style="font-size: small; "><b> }</b></div><div style="font-size: small; "><b> MPI_Bcast(&x,1,MPI_INT,0,MPI_COMM_WORLD);</b></div><div style="font-size: small; "><b> printf("\nI %d got %d from node 0\n",my_rank,x);</b></div><div style="font-size: small; "><b> if (my_rank != 0) {</b></div><div style="font-size: small; "><b> MPI_Get_processor_name(name, &len);</b></div><div style="font-size: small; "><b> sprintf(message, "Greetings from process %d, I
am %s !", my_rank, name);</b></div><div style="font-size: small; "><b> dest = 0;</b></div><div style="font-size: small; "><b> MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag,</b></div><div style="font-size: small; "><b> MPI_COMM_WORLD);</b></div><div style="font-size: small; "><b> } else {</b></div><div style="font-size: small; "><b> for (source = 1; source < p; source++) {</b></div><div style="font-size: small; "><b> MPI_Recv(message, 100, MPI_CHAR, source, tag,</b></div><div style="font-size: small; "><b>
 MPI_COMM_WORLD, &status);</b></div><div style="font-size: small; "><b> printf("%s\n", message);</b></div><div style="font-size: small; "><b> }</b></div><div style="font-size: small; "><b> }</b></div><div style="font-size: small; "><b> MPI_Finalize();</b></div><div style="font-size: small; "><b> return 0;</b></div><div style="font-size: small; "><b>}</b></div><div style="font-size: small; "><br></div><div><font class="Apple-style-span" size="6"><u>Here is the error message I got when I ran the "Hello World" example:</u></font></div><div style="font-size: small; "><br></div><div style="font-size: small; ">Fatal error in PMPI_Bcast: Other MPI error, error stack:<br><div>PMPI_Bcast(1306)......................:
MPI_Bcast(buf=0x7fff463d2ad4, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed</div><div>MPIR_Bcast_impl(1150).................: </div><div>MPIR_Bcast_intra(990).................: </div><div>MPIR_Bcast_scatter_ring_allgather(840): </div><div>MPIR_Bcast_binomial(187)..............: </div><div>MPIC_Send(66).........................: </div><div>MPIC_Wait(528)........................: </div><div>MPIDI_CH3I_Progress(335)..............: </div><div>MPID_nem_mpich2_blocking_recv(906)....: </div><div>MPID_nem_tcp_connpoll(1830)...........: Communication error with rank 20: </div><div>Fatal error in PMPI_Bcast: Other MPI error, error stack:</div><div>PMPI_Bcast(1306)......................: MPI_Bcast(buf=0x7fff8c374d84, count=1, MPI_INT, root=0, MPI_COMM_WORLD)
failed</div><div>MPIR_Bcast_impl(1150).................: </div><div>MPIR_Bcast_intra(990).................: </div><div>MPIR_Bcast_scatter_ring_allgather(840): </div><div>MPIR_Bcast_binomial(187)..............: </div><div>MPIC_Send(66).........................: </div><div>MPIC_Wait(528)........................: </div><div>MPIDI_CH3I_Progress(335)..............: </div><div>MPID_nem_mpich2_blocking_recv(906)....: </div><div>MPID_nem_tcp_connpoll(1843)...........: </div><div>state_commrdy_handler(1674)...........: </div><div>MPID_nem_tcp_recv_handler(1653).......: Communication error with rank 16</div><div>MPID_nem_tcp_recv_handler(1554).......: socket closed</div><div>APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)</div></div><div style="font-size: small; "><br></div><div style="font-size: small; "><br></div><div><font class="Apple-style-span" size="6"><u>Here is the error message I got when
I ran the "cpi" example:</u></font></div><div style="font-size: small; "><br></div><div style="font-size: small; "><br></div><div style="font-size: small; "><div>Process 1 of 22 is on node00</div><div>Process 0 of 22 is on node00</div><div>Process 4 of 22 is on node02</div><div>Process 5 of 22 is on node02</div><div>Process 6 of 22 is on node03</div><div>Process 7 of 22 is on node03</div><div>Process 20 of 22 is on node10</div><div>Process 21 of 22 is on node10</div><div>Fatal error in PMPI_Bcast: Other MPI error, error stack:</div><div>PMPI_Bcast(1306)......................: MPI_Bcast(buf=0x7fff44bcfd3c, count=1, MPI_INT, root=0, MPI_COMM_WORLD)
failed</div><div>MPIR_Bcast_impl(1150).................: </div><div>MPIR_Bcast_intra(990).................: </div><div>MPIR_Bcast_scatter_ring_allgather(840): </div><div>MPIR_Bcast_binomial(187)..............: </div><div>MPIC_Send(66).........................: </div><div>MPIC_Wait(528)........................: </div><div>MPIDI_CH3I_Progress(335)..............: </div><div>MPID_nem_mpich2_blocking_recv(906)....: </div><div>MPID_nem_tcp_connpoll(1843)...........: </div><div>state_commrdy_handler(1674)...........: </div><div>MPID_nem_tcp_recv_handler(1653).......: Communication error with rank 16</div><div>MPID_nem_tcp_recv_handler(1554).......: socket closed</div><div>Process 2 of 22 is on node01</div><div>Process 3 of 22 is on node01</div><div>[proxy:0:2@node02] HYDT_dmxu_poll_wait_for_event (/home/k/mpich2-1.3.1/src/pm/hydra/tools/demux/demux_poll.c:70): assert (!(pollfds[i].revents & ~POLLIN &
~POLLOUT & ~POLLHUP)) failed</div><div>[proxy:0:2@node02] main (/home/k/mpich2-1.3.1/src/pm/hydra/pm/pmiserv/pmip.c:225): demux engine error waiting for event</div><div>Process 8 of 22 is on node04</div><div>Process 9 of 22 is on node04</div><div>Process 18 of 22 is on node09</div><div>Process 19 of 22 is on node09</div><div>Fatal error in PMPI_Bcast: Other MPI error, error stack:</div><div>PMPI_Bcast(1306)......................: MPI_Bcast(buf=0x7ffff9d75dec, count=1, MPI_INT, root=0, MPI_COMM_WORLD)
failed</div><div>MPIR_Bcast_impl(1150).................: </div><div>MPIR_Bcast_intra(990).................: </div><div>MPIR_Bcast_scatter_ring_allgather(840): </div><div>MPIR_Bcast_binomial(157)..............: </div><div>MPIC_Recv(108)........................: </div><div>MPIC_Wait(528)........................: </div><div>MPIDI_CH3I_Progress(335)..............: </div><div>MPID_nem_mpich2_blocking_recv(906)....: </div><div>MPID_nem_tcp_connpoll(1830)...........: Communication error with rank 0: </div><div>Fatal error in PMPI_Bcast: Other MPI error, error stack:</div><div>PMPI_Bcast(1306)......................: MPI_Bcast(buf=0x7fff9645255c, count=1, MPI_INT, root=0, MPI_COMM_WORLD)
failed</div><div>MPIR_Bcast_impl(1150).................: </div><div>MPIR_Bcast_intra(990).................: </div><div>MPIR_Bcast_scatter_ring_allgather(840): </div><div>MPIR_Bcast_binomial(187)..............: </div><div>MPIC_Send(66).........................: </div><div>MPIC_Wait(528)........................: </div><div>MPIDI_CH3I_Progress(335)..............: </div><div>MPID_nem_mpich2_blocking_recv(906)....: </div><div>MPID_nem_tcp_connpoll(1843)...........: </div><div>state_commrdy_handler(1674)...........: </div><div>MPID_nem_tcp_recv_handler(1653).......: Communication error with rank 0</div><div>MPID_nem_tcp_recv_handler(1554).......: socket closed</div><div>Process 16 of 22 is on node08</div><div>Process 17 of 22 is on node08</div><div>Fatal error in PMPI_Bcast: Other MPI error, error stack:</div><div>PMPI_Bcast(1306)......................: MPI_Bcast(buf=0x7fff02102e6c, count=1, MPI_INT, root=0,
MPI_COMM_WORLD) failed</div><div>MPIR_Bcast_impl(1150).................: </div><div>MPIR_Bcast_intra(990).................: </div><div>MPIR_Bcast_scatter_ring_allgather(840): </div><div>MPIR_Bcast_binomial(187)..............: </div><div>MPIC_Send(66).........................: </div><div>MPIC_Wait(528)........................: </div><div>MPIDI_CH3I_Progress(335)..............: </div><div>MPID_nem_mpich2_blocking_recv(906)....: </div><div>MPID_nem_tcp_connpoll(1830)...........: Communication error with rank 20: </div><div>Process 12 of 22 is on node06</div><div>Process 13 of 22 is on node06</div><div>Process 14 of 22 is on node07</div><div>Process 15 of 22 is on node07</div><div>[mpiexec@node00] HYDT_bscu_wait_for_completion (/home/k/mpich2-1.3.1/src/pm/hydra/tools/bootstrap/utils/bscu_wait.c:99): one of the processes terminated badly; aborting</div><div>[mpiexec@node00] HYDT_bsci_wait_for_completion
(/home/k/mpich2-1.3.1/src/pm/hydra/tools/bootstrap/src/bsci_wait.c:18): bootstrap device returned error waiting for completion</div><div>[mpiexec@node00] HYD_pmci_wait_for_completion (/home/k/mpich2-1.3.1/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:352): bootstrap server returned error waiting for completion</div><div>[mpiexec@node00] main (/home/k/mpich2-1.3.1/src/pm/hydra/ui/mpich/mpiexec.c:302): process manager error waiting for completion</div></div><div style="font-size: small; "><br></div><div style="font-size: small; "><br></div><div style="font-size: small; "><br></div><div style="font-size: small; "><br></div><div style="font-size: small; "><br></div><div><font class="Apple-style-span" size="6"><u>Here are the commands I ran:</u></font></div><div style="font-size: small; "><br></div><div style="font-size: small; ">&gt; mpiexec -f hosts -n 22 ./mpi-Hello.exe</div><div style="font-size: small; ">&gt; mpiexec.hydra -f hosts
-n 22 ./mpi-Hello.exe</div><div style="font-size: small; "><br></div><div style="font-size: small; "><br></div><div style="font-size: small; ">When the number of cores is 20, the program runs correctly.</div><div style="font-size: small; "><br></div><div style="font-size: small; "><br></div><div style="font-size: small; "><br></div><div><font class="Apple-style-span" size="6"><u>Here is the "hosts" file:</u></font></div><div style="font-size: small; "><div>node00:2</div><div>node01:2</div><div>node02:2</div><div>node03:2</div><div>node04:2</div><div>node05:2</div><div>node06:2</div><div>node07:2</div><div>node08:2</div><div>node09:2</div><div>node10:2</div></div><div style="font-size: small; "><br></div><div style="font-size: small; "><br></div><div style="font-size: small; "><br></div><div style="font-size: small; "><br></div><div style="font-size: small;
"><br></div><div style="font-size: small; "><br></div></font></div><div style="position: fixed; color: black; font-family: arial, helvetica, sans-serif; font-size: 10pt; "></div>
</div><br>
</body></html>