[mpich-discuss] Cluster problem running MPI programs

Brice Chaffin linuxmage at lavabit.com
Tue Apr 10 19:27:35 CDT 2012


Hi all,

I have built a small cluster, but seem to be having a problem.

I am using Ubuntu Linux 11.04 Server Edition on two nodes, with an NFS
share providing a common working directory for the cluster.

According to mpdtrace, the ring is fully functional: both machines are
recognized and communicating.
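For reference, this is roughly the check I ran (the hostnames, ports,
and addresses below are placeholders for my actual nodes, not verbatim
output):

$ mpdtrace
node0
node1
$ mpdtrace -l
node0_52139 (192.168.1.10)
node1_33851 (192.168.1.11)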

I can run regular C programs compiled with gcc using mpiexec or mpirun,
and results come back from both nodes. But when I run actual MPI
programs, such as the examples included with MPICH2 or ones I compile
myself with mpicc, I get this:

rank 1 in job 8  node1_33851   caused collective abort of all ranks
  exit status of rank 1: killed by signal 4
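
In case it helps reproduce the problem, even a minimal hello-world-style
MPI program fails for me this way. This is a representative sketch of
what I am testing, not the exact example shipped with MPICH2:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    /* Start MPI and find out who we are and where we are running. */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &len);

    printf("rank %d of %d on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}

I compile and launch it from the NFS-shared directory with:

$ mpicc hello.c -o hello
$ mpiexec -n 2 ./hello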

I am including the output of mpich2version so you can see exactly how I
built it.

MPICH2 Version:    	1.4.1p1
MPICH2 Release date:	Thu Sep  1 13:53:02 CDT 2011
MPICH2 Device:    	ch3:nemesis
MPICH2 configure: 	--disable-f77 --disable-fc --with-pm=mpd
--prefix=/home/bchaffin/mpich2
MPICH2 CC: 	gcc    -O2
MPICH2 CXX: 	c++   -O2
MPICH2 F77: 	  
MPICH2 FC:

This is my first time working with a cluster, so any advice or
suggestions are more than welcome.

