[mpich-discuss] [mpich2-dev] mpich2-1.4.1 communication error.

张磊 jyklzhang at gmail.com
Thu Jun 21 07:38:20 CDT 2012


Sorry, I mistakenly sent my question to the wrong mailing list.
I tried to reconfigure mpich2 with --enable-g=all, but the build failed
during make:

> ./configure --enable-g=all -prefix=/usr/mpich2-install
>
> make[5]: Entering directory
> `/home/zhang/Downloads/mpich2-1.4.1p1/src/mpi/romio/adio/common'
>   CC              malloc.c
> In file included from malloc.c:24:0:
> /usr/include/malloc.h:61:14: error: expected identifier or ‘(’ before
> '\x6c6c6f63'
> /usr/include/malloc.h:64:14: error: expected identifier or ‘(’ before
> '\x6c6c6f63'
> /usr/include/malloc.h:72:49: error: macro "realloc" passed 2 arguments,
> but takes just 1
> /usr/include/malloc.h:72:14: error: ‘realloc’ redeclared as different kind
> of symbol
> /usr/include/malloc.h:76:13: error: expected identifier or ‘(’ before
> '\x46726565'
> make[5]: *** [malloc.o] Error 1
>
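
A note on the odd constants in this log: '\x6c6c6f63' and '\x46726565' look
like multi-character constants printed as hex. Decoding the bytes as ASCII
gives readable text, which suggests (this is an assumption, not something the
log states) that a macro in the build has redefined malloc, realloc and free
to expand to an error constant before /usr/include/malloc.h is included. The
sketch below is a small POSIX shell helper; hex_to_ascii is a hypothetical
name, not part of mpich2.

```shell
# Decode a string of hex byte pairs (e.g. 6c6c6f63) into ASCII text.
# hex_to_ascii is a hypothetical helper written for this note.
hex_to_ascii() {
    hex=$1
    out=""
    # Refuse odd-length input so the loop below always terminates.
    [ $(( ${#hex} % 2 )) -eq 0 ] || return 1
    while [ -n "$hex" ]; do
        byte=${hex%"${hex#??}"}   # first two hex digits
        hex=${hex#??}             # drop them from the input
        # hex value -> octal escape -> character
        out="$out$(printf "\\$(printf '%03o' "0x$byte")")"
    done
    printf '%s\n' "$out"
}

hex_to_ascii 6c6c6f63   # constant reported at malloc.h:61 and :64
hex_to_ascii 46726565   # constant reported at malloc.h:76
```

The two calls print "lloc" and "Free", which is what one would expect if a
longer message ending in "...alloc" or "...Free" had been truncated to its
last four characters by the compiler.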

Not wanting to give up, I tried this instead:

./configure --enable-FEATURE=dbg -prefix=/usr/mpich2-install

However, this did not seem to work either: I did not get any log output.
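
One way to check whether a build actually picked up a debug option is to look
at the configure line it records. In configure's help text, --enable-FEATURE
is a placeholder for a real feature name, and configure scripts usually accept
unknown --enable-* options without stopping, so the second build above
probably had no logging support at all. The sketch below assumes that the
mpich2version program installed with mpich2 1.4.x is on PATH and prints the
configure options; built_with_enable_g is a hypothetical helper name.

```shell
# Return success if the text on stdin contains an --enable-g=... option.
# built_with_enable_g is a hypothetical helper, not an mpich2 command.
built_with_enable_g() {
    grep -q -e '--enable-g='
}

# Assumption: mpich2version is on PATH and reports the configure line.
if command -v mpich2version >/dev/null 2>&1; then
    if mpich2version | built_with_enable_g; then
        echo "this build was configured with --enable-g"
    else
        echo "no --enable-g option found; MPICH_DBG_* settings will likely be ignored"
    fi
else
    echo "mpich2version not found on PATH"
fi
```

If no --enable-g option shows up, an empty set of dbg-*.txt files says nothing
about the communication error itself; it only means the library was built
without logging.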

2012/6/20 Darius Buntinas <buntinas at mcs.anl.gov>

> Please use the mpich-discuss mailing list to get help with mpich2.  This
> mailing list is for mpich2 developers.
>
> It's hard to tell what happened from the stack trace, but it looks like
> process 0 thinks that process 1 failed.  Can you reconfigure mpich with
> --enable-g=all then rerun it with logging enabled?
>
>    ./configure --enable-g=all <other configure options you specified>
>    make clean
>    make
>    make install
>    cd examples
>    export MPICH_DBG_FILENAME="dbg-%w-%d.txt"
>    export MPICH_DBG_CLASS=ALL
>    export MPICH_DBG_LEVEL=VERBOSE
>    mpiexec -f hosts -n 2 ./cpi
>
> Then, if it happens again, send us the files dbg-*.txt but please reply to
> the mpich-discuss mailing list.
>
> -d
>
> On Jun 20, 2012, at 6:18 AM, 张磊 wrote:
>
> > I've set up two Ubuntu 12.04 computers with mpich2-1.4.1p1, and the
> > Hello World program works well on one machine or on both machines
> > together.
> > But I get an error when running the cpi example. Beforehand, I made sure
> > that the two machines can ping each other and ssh to each other without
> > a password, and I also turned off the firewall (I checked with ufw
> > status). However, when I run the cpi example, I get the following:
> >
> > zhang at zhang-Lenovo-IdeaPad-Y470:~/test$ mpiexec -f hosts -np 2 ./cpi
> > Process 0 of 2 is on zhang-Lenovo-IdeaPad-Y470
> > Process 1 of 2 is on lianjie2
> > Fatal error in PMPI_Reduce: Other MPI error, error stack:
> > PMPI_Reduce(1270)...............: MPI_Reduce(sbuf=0xbf8f9b18,
> rbuf=0xbf8f9b20, count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD)
> failed
> > MPIR_Reduce_impl(1087)..........:
> > MPIR_Reduce_intra(895)..........:
> > MPIR_Reduce_binomial(144).......:
> > MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 1
> >
> > [mpiexec at zhang-Lenovo-IdeaPad-Y470] control_cb
> (./pm/pmiserv/pmiserv_cb.c:321): assert (!closed) failed
> > [mpiexec at zhang-Lenovo-IdeaPad-Y470] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> > [mpiexec at zhang-Lenovo-IdeaPad-Y470] HYD_pmci_wait_for_completion
> (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
> > [mpiexec at zhang-Lenovo-IdeaPad-Y470] main (./ui/mpich/mpiexec.c:405):
> process manager error waiting for completion
> > The two computers are on a private network, 192.168.48.xxx. I would
> > greatly appreciate it if somebody could give me a hint on how to solve
> > this problem.
> >
> > Thanks very much in advance for any tip.
>
>