[mpich-discuss] MPICH2: MPI_Barrier(comm=0x44000002) failed
Mohammed Mediani
medmediani at hotmail.com
Mon Aug 31 10:14:44 CDT 2009
Hello guys,
I am using MPICH2 under SLURM. Everything works fine as long as all processes run on the same machine, but as soon as different machines are involved, I get the error below.
Any help would be appreciated.
Here are some details:
Platform: openSUSE Linux on a cluster of 13 nodes (on average, each node has 8 processors and 32 GB of memory)
MPICH version: mpich2-1.1.1
MPICH2 Build:
./configure --prefix=/project/mt/user/mmediani/tools/mpich-slurm --with-pmi=slurm --with-slurm=/usr/local/slurm --with-pm=no --with-device=ch3:nemesis
$ srun -n2 cpi
srun: job 17919 queued and waiting for resources
srun: job 17919 has been allocated resources
Process 1 of 2 is on i13hpc2
Process 0 of 2 is on i13hpc2
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.000477
$ srun -n5 cpi
srun: job 17921 queued and waiting for resources
srun: job 17921 has been allocated resources
Process 1 of 5 is on i13hpc2
Process 0 of 5 is on i13hpc2
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1301)......................: MPI_Bcast(buf=0x7fff892506e8, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(998).......................:
MPIR_Bcast_scatter_ring_allgather(842):
MPIR_Bcast_binomial(187)..............:
MPIC_Send(41).........................:
MPIC_Wait(405)........................:
MPIDI_CH3I_Progress(150)..............:
MPID_nem_mpich2_blocking_recv(1074)...:
MPID_nem_tcp_connpoll(1663)...........: Communication error
Process 4 of 5 is on i13hpc3
Process 3 of 5 is on i13hpc3
Process 2 of 5 is on i13hpc3
srun: error: i13hpc2: task 0: Exited with exit code 1
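In case it is useful for diagnosing this: the bottom of the error stack points at MPID_nem_tcp_connpoll, i.e. the nemesis TCP netmod failing to talk between nodes, and name resolution or firewalls between the nodes are common culprits for that. A minimal sanity check (hostnames taken from the log above; substitute your own) might be:

```shell
# check_nodes.sh - hypothetical sanity check for inter-node MPI runs.
# Verifies that each compute node's hostname resolves on the machine
# running the check; a name that fails here cannot be used by the
# nemesis TCP netmod to open connections between ranks.
check_nodes() {
    for host in "$@"; do
        if getent hosts "$host" > /dev/null; then
            echo "$host: resolves"
        else
            echo "$host: DOES NOT RESOLVE"
        fi
    done
}

# Node names from the srun output above.
check_nodes i13hpc2 i13hpc3
```

If the names resolve, the next thing to rule out would be a firewall blocking the ephemeral TCP ports the ranks open to each other.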
Best,
Mohammed