[mpich-discuss] MPICH2: MPI_Barrier(comm=0x44000002) failed

Mohammed Mediani medmediani at hotmail.com
Mon Aug 31 10:14:44 CDT 2009


Hello guys,
I am using MPICH2 under SLURM. Everything works fine if all the processes are on the same machine,
but if different machines are involved I get the error shown below.
Any help would be appreciated.
Here are some details:
Platform: openSUSE Linux on a cluster of 13 nodes (on average, each node has 8 processors and 32 GB of memory)
MPICH version: mpich2-1.1.1
MPICH2 Build:
./configure --prefix=/project/mt/user/mmediani/tools/mpich-slurm --with-pmi=slurm --with-slurm=/usr/local/slurm --with-pm=no  --with-device=ch3:nemesis
$ srun -n2 cpi
      srun: job 17919 queued and waiting for resources
      srun: job 17919 has been allocated resources
      Process 1 of 2 is on i13hpc2
      Process 0 of 2 is on i13hpc2
      pi is approximately 3.1415926544231318, Error is 0.0000000008333387
      wall clock time = 0.000477
$ srun -n5 cpi
     srun: job 17921 queued and waiting for resources
     srun: job 17921 has been allocated resources
     Process 1 of 5 is on i13hpc2
     Process 0 of 5 is on i13hpc2
     Fatal error in PMPI_Bcast: Other MPI error, error stack:
     PMPI_Bcast(1301)......................: MPI_Bcast(buf=0x7fff892506e8, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
     MPIR_Bcast(998).......................:
     MPIR_Bcast_scatter_ring_allgather(842):
     MPIR_Bcast_binomial(187)..............:
     MPIC_Send(41).........................:
     MPIC_Wait(405)........................:
     MPIDI_CH3I_Progress(150)..............:
     MPID_nem_mpich2_blocking_recv(1074)...:
     MPID_nem_tcp_connpoll(1663)...........: Communication error
     Process 4 of 5 is on i13hpc3
     Process 3 of 5 is on i13hpc3
     Process 2 of 5 is on i13hpc3
     srun: error: i13hpc2: task 0: Exited with exit code 1
Best,
Mohammed


