[mpich-discuss] Cluster problem running MPI programs
Pavan Balaji
balaji at mcs.anl.gov
Tue Apr 10 19:41:48 CDT 2012
Hello,
How did you configure MPICH2? Please note that mpd is deprecated and
no longer supported; in 1.4.1p1, mpd should not even be built by
default.
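If you reconfigure without --with-pm=mpd, you will get hydra, the
default process manager. Roughly something like this (a sketch based
on the flags in your mpich2version output; adjust the paths for your
own setup):

  ./configure --disable-f77 --disable-fc --prefix=/home/bchaffin/mpich2
  make && make install

Hydra takes its hosts from a plain file instead of an mpd ring, e.g.:

  $ cat hostfile
  node1
  node2
  $ mpiexec -f hostfile -n 2 ./examples/cpi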
-- Pavan
On 04/10/2012 07:27 PM, Brice Chaffin wrote:
> Hi all,
>
> I have built a small cluster, but seem to be having a problem.
>
> I am using Ubuntu Linux 11.04 server edition on two nodes, with an NFS
> share providing a common directory across the cluster.
>
> According to mpdtrace the ring is fully functional. Both machines are
> recognized and communicating.
>
> I can run regular C programs compiled with gcc using mpiexec or mpirun,
> and results come back from both nodes. But when I run actual MPI
> programs, such as the examples included with MPICH2 or ones I compile
> myself with mpicc, I get this:
>
> rank 1 in job 8 node1_33851 caused collective abort of all ranks
> exit status of rank 1: killed by signal 4
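>
> (For reference, a minimal sketch of the kind of MPI program that
> fails for me, nothing more than init, rank, finalize:)
>
>   #include <stdio.h>
>   #include <mpi.h>
>
>   int main(int argc, char **argv)
>   {
>       int rank, size;
>       MPI_Init(&argc, &argv);                /* initialize MPI */
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank */
>       MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total ranks in the job */
>       printf("Hello from rank %d of %d\n", rank, size);
>       MPI_Finalize();                        /* clean shutdown */
>       return 0;
>   }
>
> compiled and launched as:
>
>   $ mpicc hello.c -o hello
>   $ mpiexec -n 2 ./hello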
>
> I am including mpich2version output so you can see exactly how I built
> it.
>
> MPICH2 Version: 1.4.1p1
> MPICH2 Release date: Thu Sep 1 13:53:02 CDT 2011
> MPICH2 Device: ch3:nemesis
> MPICH2 configure: --disable-f77 --disable-fc --with-pm=mpd
> --prefix=/home/bchaffin/mpich2
> MPICH2 CC: gcc -O2
> MPICH2 CXX: c++ -O2
> MPICH2 F77:
> MPICH2 FC:
>
> This is my first time working with a cluster, so any advice or
> suggestions are more than welcome.
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji