[mpich-discuss] Cluster problem running MPI programs
Brice Chaffin
linuxmage at lavabit.com
Tue Apr 10 19:27:35 CDT 2012
Hi all,
I have built a small cluster, but seem to be having a problem.
I am using Ubuntu Linux 11.04 server edition on two nodes, with an NFS
share providing a common working directory for the cluster.
According to mpdtrace the ring is fully functional. Both machines are
recognized and communicating.
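For reference, this is how I bring the ring up and check it (the
hostnames below are just illustrative):

    $ mpdboot -n 2 -f mpd.hosts
    $ mpdtrace
    node0
    node1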
I can run regular C programs compiled with gcc using mpiexec or mpirun,
and results are returned from both nodes. When running actual MPI
programs, such as the examples included with MPICH2, or ones I compile
myself with mpicc, I get this:
rank 1 in job 8 node1_33851 caused collective abort of all ranks
exit status of rank 1: killed by signal 4
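Signal 4 is SIGILL (illegal instruction). Even a minimal test program,
roughly equivalent to the hellow example shipped with MPICH2, dies the
same way. This is a sketch of the kind of program I am compiling:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);               /* every rank initializes MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this rank's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total ranks in the job */
        MPI_Get_processor_name(name, &len);   /* which node we landed on */
        printf("rank %d of %d on %s\n", rank, size, name);
        MPI_Finalize();
        return 0;
    }

I compile and launch it from the NFS-shared directory:

    mpicc hello.c -o hello
    mpiexec -n 2 ./hello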
I am including the output of mpich2version so you can see exactly how I
built it.
MPICH2 Version: 1.4.1p1
MPICH2 Release date: Thu Sep 1 13:53:02 CDT 2011
MPICH2 Device: ch3:nemesis
MPICH2 configure: --disable-f77 --disable-fc --with-pm=mpd
--prefix=/home/bchaffin/mpich2
MPICH2 CC: gcc -O2
MPICH2 CXX: c++ -O2
MPICH2 F77:
MPICH2 FC:
This is my first time working with a cluster, so any advice or
suggestions are more than welcome.