[mpich-discuss] Fatal error in PMPI_Bcast:

Fujun Liu liufujun07 at gmail.com
Fri May 27 14:17:23 CDT 2011


I reconfigured it with
 /home/netlab/MPI/mpich2-1.3.2p1/configure
--prefix=/home/netlab/MPI/mpich2-install --enable-g=all
and ran it with
mpiexec -n 2 -l -f machinefile /home/netlab/MPI/mpich2-build/examples/cpi
-mpich-dbg=file -mpich-dbg-level=verbose -mpich2-dbg-class=all

I got the following error message:
netlab at query:~/MPI$ mpiexec -n 2 -l -f machinefile
/home/netlab/MPI/mpich2-build/examples/cpi -mpich-dbg=file
-mpich-dbg-level=verbose -mpich2-dbg-class=all
[0] /home/netlab/MPI/mpich2-build/examples/cpi: error while loading shared
libraries: libopa.so.1: cannot open shared object file: No such file or
directory
[mpiexec at query] ONE OF THE PROCESSES TERMINATED BADLY: CLEANING UP
[1] /home/netlab/MPI/mpich2-build/examples/cpi: error while loading shared
libraries: libopa.so.1: cannot open shared object file: No such file or
directory
[mpiexec at query] ONE OF THE PROCESSES TERMINATED BADLY: CLEANING UP
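
By the way, I guess the loader simply can't find MPICH2's shared
libraries (libopa is the OpenPA library that ships with MPICH2). Assuming
"make install" put them under /home/netlab/MPI/mpich2-install/lib on both
hosts, I might try adding that directory to LD_LIBRARY_PATH in ~/.bashrc
on each host:

export LD_LIBRARY_PATH=/home/netlab/MPI/mpich2-install/lib:$LD_LIBRARY_PATH

or passing it per run through mpiexec:

mpiexec -genv LD_LIBRARY_PATH /home/netlab/MPI/mpich2-install/lib -n 2 -l
-f machinefile /home/netlab/MPI/mpich2-build/examples/cpi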

Sorry, I couldn't find the log files.

Best Wishes,

On Fri, May 27, 2011 at 1:38 PM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:

> Can you reconfigure with the --enable-g=all option, then re-run it like
> this (all on one line):
>
> mpiexec -n 2 -l -f machinefile /home/netlab/MPI/mpich2-build/examples/cpi
> -mpich-dbg=file -mpich-dbg-level=verbose -mpich2-dbg-class=all
>
> There should then be two files starting with "dbg" and ending with ".log".
>  Please send those to us.
>
> Thanks,
> -d
>
> On May 27, 2011, at 11:51 AM, Fujun Liu wrote:
>
> > I also suspect it is a networking problem. I am trying to figure out how
> > to find it. Anyway, thanks a lot.
> >
> > On Fri, May 27, 2011 at 12:46 PM, Dave Goodell <goodell at mcs.anl.gov>
> wrote:
> > If your firewall truly is disabled and those /etc/hosts files are
> accurate, then I don't know what the problem might be.  It still sounds like
> a networking problem, but I don't have any concrete suggestions for what
> else to check.
> >
> > Perhaps others on the list have experienced these sorts of problems
> before and can offer ideas.
> >
> > -Dave
> >
> > On May 27, 2011, at 11:24 AM CDT, Fujun Liu wrote:
> >
> > > I use two hosts: one is query, the other is trigger
> > >
> > > (1) about firewall
> > >
> > > netlab at query:~$ sudo ufw status
> > > Status: inactive
> > >
> > > netlab at trigger:~$ sudo ufw status
> > > Status: inactive
> > >
> > > Both firewalls are turned off.
> > >
> > > (2)about DNS
> > >
> > > for query, /etc/hosts is as below:
> > >
> > > 127.0.0.1       localhost
> > > #127.0.1.1      query
> > >
> > > xxx.xxx.xxx.42  trigger
> > > xxx.xxx.xxx.43  query
> > >
> > > for trigger, /etc/hosts is as below:
> > > 127.0.0.1       localhost
> > > #127.0.1.1      trigger
> > >
> > > xxx.xxx.xxx.42  trigger
> > > xxx.xxx.xxx.43  query
> > >
> > > In fact, the two files are identical.
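> > >
> > > As a further sanity check, I can confirm name resolution and
> > > reachability by looking each name up and pinging across:
> > >
> > > netlab at query:~$ getent hosts trigger
> > > netlab at query:~$ ping -c 2 trigger
> > > netlab at trigger:~$ ping -c 2 query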
> > >
> > > (3) version of MPICH2
> > >
> > > mpich2-1.3.2p1, downloaded from
> > > http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads
> > > As you can see, it is labeled as the stable version.
> > >
> > > (4) about configure.
> > >
> > > I did nothing special here. I just used the --prefix option. Do I need
> > > any other configure options?
> > >
> > > Now helloworld works fine on two hosts, and cpi works fine on a single
> > > host. The problem is probably that the two hosts can't communicate with
> > > each other. Any suggestions?
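> > >
> > > To isolate point-to-point traffic, I could also try a minimal send/recv
> > > test between the two hosts (my own sketch, not from the tutorial; the
> > > file name is made up):
> > >
> > > /* sendrecv_test.c - rank 0 sends one int to rank 1 */
> > > #include <mpi.h>
> > > #include <stdio.h>
> > >
> > > int main(int argc, char **argv)
> > > {
> > >     int rank, value = 42;
> > >
> > >     MPI_Init(&argc, &argv);
> > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> > >     if (rank == 0) {
> > >         /* send one int from rank 0 to rank 1 with tag 0 */
> > >         MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
> > >         printf("rank 0 sent %d\n", value);
> > >     } else if (rank == 1) {
> > >         int recvd = 0;
> > >         MPI_Recv(&recvd, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
> > >                  MPI_STATUS_IGNORE);
> > >         printf("rank 1 received %d\n", recvd);
> > >     }
> > >     MPI_Finalize();
> > >     return 0;
> > > }
> > >
> > > compiled with mpicc sendrecv_test.c -o sendrecv_test and run with
> > > mpiexec -n 2 -f machinefile ./sendrecv_test. If this hangs the same
> > > way, the problem is in basic TCP connectivity rather than in the
> > > broadcast.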
> > >
> > > Best Wishes,
> > >
> > > On Fri, May 27, 2011 at 11:55 AM, Dave Goodell <goodell at mcs.anl.gov>
> wrote:
> > > The problem looks like a networking issue, either a firewall or DNS
> (bad /etc/hosts file?) issue.  Are the firewalls disabled on these machines?
>  How are the hostnames configured?
> > >
> > > What version of MPICH2 is this?  What configure options did you use
> when you built MPICH2?
> > >
> > > -Dave
> > >
> > > On May 27, 2011, at 10:49 AM CDT, Fujun Liu wrote:
> > >
> > > > The cpi example also does not work. There is no error message, but it
> > > > hangs forever:
> > > >
> > > > xxxx at query:~/MPI$ mpiexec -n 2 -f machinefile
> /home/netlab/MPI/mpich2-build/examples/cpi
> > > > Process 1 of 2 is on query
> > > > Process 0 of 2 is on trigger
> > > >
> > > > I think my two hosts are still trying to communicate with each other.
> > > > Any suggestions?
> > > >
> > > > Best wishes,
> > > >
> > > >
> > > > On Fri, May 27, 2011 at 9:42 AM, Dave Goodell <goodell at mcs.anl.gov>
> wrote:
> > > > Does the "examples/cpi" program from the MPICH2 build directory work
> correctly for you when you run it on multiple nodes?
> > > >
> > > > -Dave
> > > >
> > > > On May 26, 2011, at 5:49 PM CDT, Fujun Liu wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > When I try an example from
> > > > > http://beige.ucs.indiana.edu/I590/node62.html, I get the error message
> > > > > below. The MPI cluster has two hosts. If I run both processes on just
> > > > > one host, everything works fine. But if I run the two processes across
> > > > > the two-host cluster, the following error happens. I think the two
> > > > > hosts just can't send/receive messages to each other, but I don't know
> > > > > how to resolve this.
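> > > > >
> > > > > The example boils down to broadcasting a fixed-size character buffer
> > > > > from rank 0; roughly this (my sketch of the pattern, not the exact
> > > > > tutorial code; the 8192 matches the count in the error stack below):
> > > > >
> > > > > /* greet_bcast.c - minimal sketch of the failing broadcast pattern */
> > > > > #include <mpi.h>
> > > > > #include <stdio.h>
> > > > > #include <string.h>
> > > > >
> > > > > int main(int argc, char **argv)
> > > > > {
> > > > >     char greeting[8192];
> > > > >     int rank;
> > > > >
> > > > >     MPI_Init(&argc, &argv);
> > > > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> > > > >     if (rank == 0)
> > > > >         strcpy(greeting, "Greetings from rank 0");
> > > > >     /* rank 0 broadcasts the buffer to every rank in MPI_COMM_WORLD */
> > > > >     MPI_Bcast(greeting, 8192, MPI_CHAR, 0, MPI_COMM_WORLD);
> > > > >     printf("rank %d got: %s\n", rank, greeting);
> > > > >     MPI_Finalize();
> > > > >     return 0;
> > > > > }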
> > > > >
> > > > > Thanks in advance!
> > > > >
> > > > > xxxx at query:~/MPI$ mpiexec -n 2 -f machinefile ./GreetMaster
> > > > > Fatal error in PMPI_Bcast: Other MPI error, error stack:
> > > > > PMPI_Bcast(1430).......................:
> MPI_Bcast(buf=0x7fff13114cb0, count=8192, MPI_CHAR, root=0, MPI_COMM_WORLD)
> failed
> > > > > MPIR_Bcast_impl(1273)..................:
> > > > > MPIR_Bcast_intra(1107).................:
> > > > > MPIR_Bcast_binomial(143)...............:
> > > > > MPIC_Recv(110).........................:
> > > > > MPIC_Wait(540).........................:
> > > > > MPIDI_CH3I_Progress(353)...............:
> > > > > MPID_nem_mpich2_blocking_recv(905).....:
> > > > > MPID_nem_tcp_connpoll(1823)............:
> > > > > state_commrdy_handler(1665)............:
> > > > > MPID_nem_tcp_recv_handler(1559)........:
> > > > > MPID_nem_handle_pkt(587)...............:
> > > > > MPIDI_CH3_PktHandler_EagerSend(632)....: failure occurred while
> posting a receive for message data (MPIDI_CH3_PKT_EAGER_SEND)
> > > > > MPIDI_CH3U_Receive_data_unexpected(251): Out of memory (unable to
> allocate -1216907051 bytes)
> > > > > [mpiexec at query] ONE OF THE PROCESSES TERMINATED BADLY: CLEANING UP
> > > > > APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
> > > > >



-- 
Fujun Liu
Department of Computer Science, University of Kentucky, 2010.08-
fujun.liu at uky.edu, (859)229-3659