[mpich-discuss] Cluster problem running MPI programs

Brice Chaffin linuxmage at lavabit.com
Tue Apr 10 23:38:05 CDT 2012


Thanks again guys. I really appreciate the input. Rebuilding the setup
based on Hydra and recompiling the test program worked like a charm. My
cluster is now fully operational.
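
For anyone landing here with the same abort, here is a rough sketch of what a Hydra-based rebuild might look like. It is not a record of the exact commands used. The install prefix and the two --disable flags come from the mpich2version output quoted below. The "hosts" file name, a host file listing one node name per line, and the choice of the bundled cpi example are assumptions for illustration. Hydra is already the default process manager in 1.4.1p1, so --with-pm=hydra only makes the choice explicit. The script prints the commands by default and runs them only when DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical sketch: rebuild MPICH2 1.4.1p1 with the Hydra process manager.
# PREFIX and the "hosts" file name are assumptions, not taken from the thread.
PREFIX="${PREFIX:-$HOME/mpich2}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = execute them

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_with_hydra() {
    # Run from the top of the unpacked MPICH2 source tree.
    run ./configure --prefix="$PREFIX" --disable-f77 --disable-fc --with-pm=hydra
    run make
    run make install

    # Recompile the bundled example with the new mpicc, then launch it on
    # the nodes listed in the (assumed) "hosts" file, one host name per line.
    run "$PREFIX/bin/mpicc" -o cpi examples/cpi.c -lm
    run "$PREFIX/bin/mpiexec" -f hosts -n 2 ./cpi
}

rebuild_with_hydra
```

With Hydra there is no mpd ring to start first; mpiexec reads the host file and launches the processes itself, typically over ssh. The test binary should sit on the NFS share so that both nodes see the same path.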

On Tue, 2012-04-10 at 23:31 -0400, Brice Chaffin wrote:
> Thanks Gus,
> 
> That is the main tutorial I used. It is dated, but complete, which is why I
> chose it to base my first try on. Now that I have some experience with
> the overall process, I am going back over the mpich2 docs and rebuilding
> the whole thing.
> 
> On Tue, 2012-04-10 at 22:42 -0400, Gus Correa wrote:
> > On 04/10/2012 09:01 PM, Pavan Balaji wrote:
> > >
> > > [please keep mpich-discuss cc'ed]
> > >
> > > Sorry, we can't help you with mpd.
> > >
> > > I'm not sure which tutorial you are following, but if you look at the
> > > README, it will give you step-by-step instructions for building and
> > > installing mpich2.
> > >
> > > -- Pavan
> > 
> > For what it is worth, this issue appeared recently
> > in the list, related to the following tutorial,
> > which is very detailed, but unfortunately seems
> > to have become obsolete now [particularly regarding mpd]:
> > 
> > https://help.ubuntu.com/community/MpichCluster
> > 
> > To configure/make/install MPICH and to run MPI programs,
> > you may be better off if you use the MPICH2
> > documentation instead:
> > 
> > http://www.mcs.anl.gov/research/projects/mpich2/documentation/index.php?s=docs
> > 
> > If you already have or install gfortran,
> > you don't need to disable Fortran in MPICH.
> > You don't need to 'overconfigure' MPICH2 either.
> > To get it up and running you may need at most to
> > set --prefix [if you don't want it to install in /usr/local],
> > and point to the compilers [e.g. FC=gfortran]
> > 
> > './configure -help' shows the options available.
> > 
> > My $0.02,
> > Gus Correa
> > 
> > 
> > 
> > 
> > >
> > > On 04/10/2012 07:58 PM, Brice Chaffin wrote:
> > >> I realize mpd is deprecated, but this being my first time, I was
> > >> following tutorials that relied on mpd, so I made the decision to use it
> > >> on my first run. I did very little configuration of mpich2, beyond
> > >> disabling Fortran options (since I have no Fortran compilers installed)
> > >> and setting the install directory. I am not familiar with all the config
> > >> options yet, so I may have left something important out. The problem
> > >> with the tutorials is that they assume everything works the first time.
> > >> I had to do some fine tuning afterwards, and may need to do some more to
> > >> correct this. I'm just not quite sure yet where the trouble is.
> > >>
> > >> On Tue, 2012-04-10 at 19:41 -0500, Pavan Balaji wrote:
> > >>> Hello,
> > >>>
> > >>> How did you configure mpich2? Please note that mpd is now deprecated
> > >>> and is not supported. In 1.4.1p1, mpd should not be built at all by
> > >>> default.
> > >>>
> > >>> -- Pavan
> > >>>
> > >>> On 04/10/2012 07:27 PM, Brice Chaffin wrote:
> > >>>> Hi all,
> > >>>>
> > >>>> I have built a small cluster, but seem to be having a problem.
> > >>>>
> > >>>> I am using Ubuntu Linux 11.04 server edition on two nodes, with an NFS
> > >>>> share for a common directory when running as a cluster.
> > >>>>
> > >>>> According to mpdtrace the ring is fully functional. Both machines are
> > >>>> recognized and communicating.
> > >>>>
> > >>>> I can run regular C programs compiled with gcc using mpiexec or mpirun,
> > >>>> and results are returned from both nodes. When running actual MPI
> > >>>> programs, such as the examples included with MPICH2, or ones I compile
> > >>>> myself with mpicc, I get this:
> > >>>>
> > >>>> rank 1 in job 8 node1_33851 caused collective abort of all ranks
> > >>>> exit status of rank 1: killed by signal 4
> > >>>>
> > >>>> I am including mpich2version output so you can see exactly how I built
> > >>>> it.
> > >>>>
> > >>>> MPICH2 Version: 1.4.1p1
> > >>>> MPICH2 Release date: Thu Sep 1 13:53:02 CDT 2011
> > >>>> MPICH2 Device: ch3:nemesis
> > >>>> MPICH2 configure: --disable-f77 --disable-fc --with-pm=mpd
> > >>>> --prefix=/home/bchaffin/mpich2
> > >>>> MPICH2 CC: gcc -O2
> > >>>> MPICH2 CXX: c++ -O2
> > >>>> MPICH2 F77:
> > >>>> MPICH2 FC:
> > >>>>
> > >>>> This is my first time working with a cluster, so any advice or
> > >>>> suggestions are more than welcome.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> _______________________________________________
> > >>>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
> > >>>> To manage subscription options or unsubscribe:
> > >>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> > >>>
> > >>
> > >>
> > >>
> > >
> > 
