[mpich-discuss] Cluster problem running MPI programs

Brice Chaffin linuxmage at lavabit.com
Tue Apr 10 20:18:43 CDT 2012


Thank you.

I'll look at building mpich2 in a better way now that I at least have a
trial version running. The tutorials were Ubuntu-specific and older, so
mpd probably hadn't been deprecated yet when they were written.

I'll check the README file and rebuild mpich2.
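
Roughly what I have in mind, going by the README's usual
configure/make/make install sequence (same prefix as my first build,
Fortran still disabled, and no --with-pm=mpd so the default process
manager gets built; the log file names are just my own habit):

  # unpack and configure; keep the old prefix, skip Fortran, and let
  # the default process manager (Hydra) be built
  tar xzf mpich2-1.4.1p1.tar.gz
  cd mpich2-1.4.1p1
  ./configure --prefix=/home/bchaffin/mpich2 --disable-f77 --disable-fc 2>&1 | tee configure.log
  make 2>&1 | tee make.log
  make install 2>&1 | tee install.log
  # make sure the new install comes first in PATH on both nodes
  export PATH=/home/bchaffin/mpich2/bin:$PATH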

If the problem persists with Hydra, I'll let you know.
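
For the Hydra test itself I'm planning a plain hostfile launch of the
bundled cpi example, something along these lines (the hostfile name and
the node names below are placeholders for my actual machines):

  # hosts file listing the two machines, one hostname per line
  echo node1  > hosts
  echo node2 >> hosts
  # launch the cpi example from the build tree across both nodes
  mpiexec -f hosts -n 2 ./examples/cpi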

Thanks again. :)



On Tue, 2012-04-10 at 20:01 -0500, Pavan Balaji wrote:
> [please keep mpich-discuss cc'ed]
> 
> Sorry, we can't help you with mpd.
> 
> I'm not sure which tutorial you are following, but if you look at the 
> README, it will give you step-by-step instructions for building and 
> installing mpich2.
> 
>   -- Pavan
> 
> On 04/10/2012 07:58 PM, Brice Chaffin wrote:
> > I realize mpd is deprecated, but this being my first time, I was
> > following tutorials that relied on mpd, so I made the decision to use it
> > on my first run. I did very little configuration of mpich2, beyond
> > disabling the Fortran options (since I have no Fortran compilers installed)
> > and setting the install directory. I am not familiar with all the config
> > options yet, so I may have left something important out. The problem
> > with the tutorials is that they assume everything works the first time.
> > I had to do some fine tuning afterwards, and may need to do some more to
> > correct this. I'm just not quite sure yet where the trouble is.
> >
> > On Tue, 2012-04-10 at 19:41 -0500, Pavan Balaji wrote:
> >> Hello,
> >>
> >> How did you configure mpich2?  Please note that mpd is now deprecated
> >> and is not supported.  In 1.4.1p1, mpd should not be built at all by
> >> default.
> >>
> >>    -- Pavan
> >>
> >> On 04/10/2012 07:27 PM, Brice Chaffin wrote:
> >>> Hi all,
> >>>
> >>> I have built a small cluster, but seem to be having a problem.
> >>>
> >>> I am using Ubuntu Linux 11.04 server edition on two nodes, with an NFS
> >>> share for a common directory when running as a cluster.
> >>>
> >>> According to mpdtrace the ring is fully functional. Both machines are
> >>> recognized and communicating.
> >>>
> >>> I can run regular C programs compiled with gcc using mpiexec or mpirun,
> >>> and results are returned from both nodes. When running actual MPI
> >>> programs, such as the examples included with MPICH2, or ones I compile
> >>> myself with mpicc, I get this:
> >>>
> >>> rank 1 in job 8  node1_33851   caused collective abort of all ranks
> >>>     exit status of rank 1: killed by signal 4
> >>>
> >>> I am including mpich2version output so you can see exactly how I built
> >>> it.
> >>>
> >>> MPICH2 Version:    	1.4.1p1
> >>> MPICH2 Release date:	Thu Sep  1 13:53:02 CDT 2011
> >>> MPICH2 Device:    	ch3:nemesis
> >>> MPICH2 configure: 	--disable-f77 --disable-fc --with-pm=mpd --prefix=/home/bchaffin/mpich2
> >>> MPICH2 CC: 	gcc    -O2
> >>> MPICH2 CXX: 	c++   -O2
> >>> MPICH2 F77: 	
> >>> MPICH2 FC:
> >>>
> >>> This is my first time working with a cluster, so any advice or
> >>> suggestions are more than welcome.
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> >>> To manage subscription options or unsubscribe:
> >>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> >>
> >
> >
> >
> 




