[mpich-discuss] Not able to run MPI program in parallel...

Albert Spade albert.spade at gmail.com
Tue May 1 05:30:09 CDT 2012


Hi,
Thanks for your reply.
I am using mpich2-1.4.1p1 with its default process manager (Hydra).
I disabled the firewall on all the machines in the cluster using:
#service iptables stop
#chkconfig iptables off
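
To verify, the firewall state on each node can be checked afterwards with the
standard commands:
#service iptables status
#chkconfig --list iptables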

I created a file named hosts in the home directory (/root) on all the
cluster machines, containing:
beowulf.master
beowulf.node1
beowulf.node2
beowulf.node3
beowulf.node4
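
(As far as I understand, Hydra's host file format also accepts an optional
process count per node, e.g. a line like
beowulf.node1:2
but I am using the plain hostnames above for now.)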

Also, in /etc/hosts on all machines I added their IP addresses and the
corresponding names.
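
For example, the entries look roughly like this (the addresses below are just
placeholders, not my real ones):
192.168.1.1    beowulf.master
192.168.1.2    beowulf.node1
192.168.1.3    beowulf.node2
192.168.1.4    beowulf.node3
192.168.1.5    beowulf.node4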

I am able to log in to any machine from any other without a password.
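
(This was set up the usual way, roughly:
ssh-keygen -t rsa
ssh-copy-id root@beowulf.node1   # and likewise for the other nodes
and ssh beowulf.node1 from the master opens a shell without prompting.)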

I also set the environment for the Hydra process manager by adding
export HYDRA_FILE=/root/hosts
to the .bashrc file in /root.
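
As an alternative to the environment variable, I can also pass the host file
directly on the mpiexec command line with Hydra's -f option, e.g.:
mpiexec -f /root/hosts -n 4 ./cpi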

Am I missing something??
Thanks...
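
P.S. As a plain launch test (independent of the MPI code itself) I will also
try something like:
mpiexec -f /root/hosts -n 4 hostname
If the host file is being picked up, that should start processes on the other
nodes and print their hostnames.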

------------------------------
>
> Date: Tue, 1 May 2012 12:27:26 +0800
> From: Ju JiaJia <jujj603 at gmail.com>
> Subject: Re: [mpich-discuss] Not able to run MPI program in parallel...
> To: mpich-discuss at mcs.anl.gov
>
> Which process manager are you using? If you are using mpd, make sure mpd
> is running and all the nodes are in the ring. Use mpdtrace -l to check.
>
> On Tue, May 1, 2012 at 5:27 AM, Albert Spade <albert.spade at gmail.com>
> wrote:
>
> > Hi, I want to run my program in parallel on my small cluster. It has 5
> > nodes: one master and 4 compute nodes.
> > When I run the program below on an individual machine it works fine and
> > gives the proper output. But when I run it on the cluster it gives the
> > error below.
> > I disabled the firewall.
> >
> > OUTPUT....
> > -----------------
> > [root@beowulf ~]# mpiexec -n 4 ./cpi
> > Process 2 of 4 is on beowulf.master
> > Process 3 of 4 is on beowulf.master
> > Process 1 of 4 is on beowulf.master
> > Process 0 of 4 is on beowulf.master
> > Fatal error in PMPI_Reduce: Other MPI error, error stack:
> > PMPI_Reduce(1270)...............: MPI_Reduce(sbuf=0xbfa66ba8,
> > rbuf=0xbfa66ba0, count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD)
> > failed
> > MPIR_Reduce_impl(1087)..........:
> > MPIR_Reduce_intra(895)..........:
> > MPIR_Reduce_binomial(144).......:
> > MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 2
> > MPIR_Reduce_binomial(144).......:
> > MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 1
> >

