[mpich-discuss] Problem while running example program Cpi with more than 1 task

Thejna Tharammal ttharammal at marum.de
Fri Sep 3 07:43:11 CDT 2010


Ok, I tried that. The results with different numbers of hosts are below.
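For reference, each test only changes how many machines are listed in the host file that HYDRA_HOST_FILE points to (the path and file name below are just placeholders, shown trimmed to four hosts; the machines are k1-k6):

-bash-3.2$ cat ~/hosts                      # placeholder path; one machine per line
k1
k2
k3
k4
-bash-3.2$ export HYDRA_HOST_FILE=~/hosts   # actually set in my bash profile
-bash-3.2$ mpiexec -n 7 ./cpi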

No. of hosts: 1
-bash-3.2$ mpiexec -n 7 ./cpi
Process 1 of 7 is on k1
Process 4 of 7 is on k1
Process 5 of 7 is on k1
Process 2 of 7 is on k1
Process 6 of 7 is on k1
Process 0 of 7 is on k1
Process 3 of 7 is on k1
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.000198
APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)

===========
No. of hosts: 2
 -bash-3.2$ mpiexec -n 7 ./cpi
Process 0 of 7 is on k1
Process 4 of 7 is on k1
Process 2 of 7 is on k1
Process 6 of 7 is on k1
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.000522
Process 1 of 7 is on k2
Process 5 of 7 is on k2
Process 3 of 7 is on k2
APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)

========
No. of hosts: 3
 -bash-3.2$ mpiexec -n 7 ./cpi
Process 4 of 7 is on k2
Process 1 of 7 is on k2
Process 2 of 7 is on k3
Process 5 of 7 is on k3
Process 0 of 7 is on k1
Process 3 of 7 is on k1
Process 6 of 7 is on k1
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.000855
APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
====================
No. of hosts: 4
-bash-3.2$ mpiexec -n 7 ./cpi
Process 0 of 7 is on k1
Process 4 of 7 is on k1
Process 3 of 7 is on k4
Process 5 of 7 is on k2
Process 1 of 7 is on k2
Process 2 of 7 is on k3
Process 6 of 7 is on k3
Fatal error in PMPI_Reduce: Other MPI error, error stack:
PMPI_Reduce(1207).................: MPI_Reduce(sbuf=0x7fff66b3c220,
rbuf=0x7fff66b3c228, count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD)
failed
MPIR_Reduce_impl(1025)............: 
MPIR_Reduce_intra(803)............: 
MPIR_Reduce_impl(1025)............: 
MPIR_Reduce_intra(855)............: 
MPIR_Reduce_binomial(171).........: 
MPIC_Recv(108)....................: 
MPIC_Wait(534)....................: 
MPIDI_CH3I_Progress(184)..........: 
MPID_nem_mpich2_blocking_recv(895): 
MPID_nem_tcp_connpoll(1760).......: 
state_commrdy_handler(1588).......: 
MPID_nem_tcp_recv_handler(1567)...: Communication error with rank 1
MPID_nem_tcp_recv_handler(1467)...: socket closed
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1314)......................: MPI_Bcast(buf=0x7fffed0286f8,
count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1158).................: 
MPIR_Bcast_intra(996).................: 
MPIR_Bcast_scatter_ring_allgather(840): 
MPIR_Bcast_binomial(187)..............: 
MPIC_Send(66).........................: 
MPIC_Wait(534)........................: 
MPIDI_CH3I_Progress(184)..............: 
MPID_nem_mpich2_blocking_recv(895)....: 
MPID_nem_tcp_connpoll(1760)...........: 
state_commrdy_handler(1588)...........: 
MPID_nem_tcp_recv_handler(1567).......: Communication error with rank 0
MPID_nem_tcp_recv_handler(1467).......: socket closed
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1314)......................: MPI_Bcast(buf=0x7fff5f6ac518,
count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1158).................: 
MPIR_Bcast_intra(996).................: 
MPIR_Bcast_scatter_ring_allgather(840): 
MPIR_Bcast_binomial(157)..............: 
MPIC_Recv(108)........................: 
MPIC_Wait(534)........................: 
MPIDI_CH3I_Progress(184)..............: 
MPID_nem_mpich2_blocking_recv(895)....: 
MPID_nem_tcp_connpoll(1746)...........: Communication error with rank 2: 
APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
==============

The same error messages are produced when I use the original host file with all 6 hosts, e.g.:

Fatal error in PMPI_Reduce: Other MPI error, error stack:
PMPI_Reduce(1207).................: MPI_Reduce(sbuf=0x7fffc6433c20,
rbuf=0x7fffc6433c28, count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD)
failed
MPIR_Reduce_impl(1025)............:
MPIR_Reduce_intra(803)............:
MPIR_Reduce_impl(1025)............:
MPIR_Reduce_intra(855)............:
MPIR_Reduce_binomial(171).........:
MPIC_Recv(108)....................:
MPIC_Wait(534)....................:
MPIDI_CH3I_Progress(184)..........:
MPID_nem_mpich2_blocking_recv(895):
MPID_nem_tcp_connpoll(1760).......:
state_commrdy_handler(1588).......:
MPID_nem_tcp_recv_handler(1567)...: Communication error with rank 2
MPID_nem_tcp_recv_handler(1467)...: socket closed
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(302).................: MPI_Finalize failed

============================================
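In case it is useful to see which calls the stacks above refer to: cpi broadcasts the interval count from rank 0 and then reduces the partial sums back to rank 0. A stripped-down sketch of that pattern (not the exact cpi.c source, just the same communication structure) looks like this:

/* Sketch of the Bcast/Reduce pattern used by cpi (illustrative, not the shipped source) */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, i, n = 10000, namelen;   /* n = number of intervals */
    double h, x, sum, mypi, pi;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);
    printf("Process %d of %d is on %s\n", rank, size, name);

    /* rank 0 broadcasts the interval count: the MPI_Bcast of one MPI_INT in the stack */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    h = 1.0 / (double)n;
    sum = 0.0;
    for (i = rank + 1; i <= n; i += size) {
        x = h * ((double)i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    /* partial sums are combined on rank 0: the MPI_Reduce of one MPI_DOUBLE with MPI_SUM in the stack */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi is approximately %.16f\n", pi);

    MPI_Finalize();
    return 0;
}

So the failures above happen in exactly those two collectives, right after the processes print their host names.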

Thank you,
Thejna

 
----------------original message-----------------
From: "Pavan Balaji" balaji at mcs.anl.gov
To: "Thejna Tharammal" ttharammal at marum.de
CC: mpich-discuss at mcs.anl.gov
Date: Thu, 02 Sep 2010 17:30:53 -0500
-------------------------------------------------
 
 
> 
> On 09/02/2010 05:18 PM, Thejna Tharammal wrote:
>> Yes, I'm running the 1.3b1 version.
>> I have set the environment variable HYDRA_HOST_FILE in the bash profile, so it's
>> running on the machines I specify (k1-k6), I think.
> 
> Ah, I see. This works fine for me (though I'm using the svn trunk, and 
> not 1.3b1).
> 
> Can you try using a smaller host file (with fewer hosts) and see if you 
> can reproduce this problem?
> 
> -- Pavan
> 
> -- 
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
> 



