[MPICH] troubles with mpich2-1.0.3

Anthony Chan chan at mcs.anl.gov
Wed Jan 18 04:05:25 CST 2006



On Wed, 18 Jan 2006, Philip Sydney Lavers wrote:

> 2) The new version would not build on all the nodes even though it builds without problem on identical machines.

From the output that you sent, I can't definitively identify the cause of
the problem. However, the failure seems to occur earlier than what you
showed below. So could you send me your configure and make outputs as
seen on your screen?
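
One way to capture both outputs completely is to send each command's stdout and stderr through tee. This is a minimal sketch, assuming a POSIX sh. The function name capture_log and the file names c.txt and m.txt are illustrative choices, not part of MPICH2:

```shell
#!/bin/sh
# capture_log FILE CMD [ARGS...]
# Run CMD, showing its combined stdout and stderr on the screen while also
# saving a copy to FILE. Returns CMD's own exit status; a plain
# "CMD 2>&1 | tee FILE" pipeline would return tee's status instead.
capture_log() {
    log=$1
    shift
    # The left side of a pipe runs in a subshell, so the exit status is
    # recorded in a temporary file rather than in a shell variable.
    status_file=$(mktemp "${TMPDIR:-/tmp}/capture.XXXXXX") || return 1
    { "$@" 2>&1; echo $? > "$status_file"; } | tee "$log"
    status=$(cat "$status_file")
    rm -f "$status_file"
    return "$status"
}

# Example, run from the top of the mpich2-1.0.3 source tree:
#   capture_log c.txt ./configure
#   capture_log m.txt make
```

The temporary file exists only so the function can report whether configure or make actually failed. If that does not matter, "make 2>&1 | tee m.txt" produces the same log.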

A.Chan

>
> Here is the make error:
>
> "lots of output
> ...........make /home/psl/downloads/mpich2-1.0.3/src/mpe2/lib/libmpe.a
> `/home/psl/downloads/mpich2-1.0.3/src/mpe2/lib/libmpe.a' is up to date.
> make /home/psl/downloads/mpich2-1.0.3/src/mpe2/bin/clog2_print
> gcc -O3 -march=athlon64 -I.. -I/home/psl/downloads/mpich2-1.0.3/src/mpe2/src/logging/include  -I../../.. -I/home/psl/downloads/mpich2-1.0.3/src/mpe2/src/logging/../../include    -DCLOG_NOMPI -c clog_print.c
> gcc  -O3 -march=athlon64  -o /home/psl/downloads/mpich2-1.0.3/src/mpe2/bin/clog2_print clog_print.o  -L/home/psl/downloads/mpich2-1.0.3/src/mpe2/lib -lmpe_nompi
> clog_print.o(.text+0x2e): In function `main':
> : undefined reference to `CLOG_Rec_sizes_init'
> clog_print.o(.text+0x33): In function `main':
> : undefined reference to `CLOG_Preamble_create'
> clog_print.o(.text+0x41): In function `main':
> : undefined reference to `CLOG_Preamble_read'
> clog_print.o(.text+0x51): In function `main':
> : undefined reference to `CLOG_Preamble_print'
> clog_print.o(.text+0x5d): In function `main':
> : undefined reference to `CLOG_BlockData_create'
> clog_print.o(.text+0x80): In function `main':
> : undefined reference to `CLOG_BlockData_reset'
> clog_print.o(.text+0x91): In function `main':
> : undefined reference to `CLOG_BlockData_print'
> clog_print.o(.text+0xea): In function `main':
> : undefined reference to `CLOG_BlockData_free'
> clog_print.o(.text+0xf2): In function `main':
> : undefined reference to `CLOG_Preamble_free'
> clog_print.o(.text+0x10e): In function `main':
> : undefined reference to `CLOG_BlockData_swap_bytes_first'
> *** Error code 1
>
> Stop in /home/psl/downloads/mpich2-1.0.3/src/mpe2/src/logging/src.
> *** Error code 1
>
> Stop in /home/psl/downloads/mpich2-1.0.3/src/mpe2/src/logging/src.
> *** Error code 1
>
> Stop in /home/psl/downloads/mpich2-1.0.3/src/mpe2/src/logging.
> *** Error code 1
>
> Stop in /home/psl/downloads/mpich2-1.0.3/src/mpe2.
> *** Error code 1
>
> Stop in /home/psl/downloads/mpich2-1.0.3/src.
> *** Error code 1
>
> Stop in /home/psl/downloads/mpich2-1.0.3.
>
> "
>
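The link line above takes the CLOG_* routines from libmpe_nompi.a, so a quick check is whether that archive actually defines them. This is a minimal sketch, assuming a POSIX sh, the standard nm tool, and the make output saved to a file. The function names list_missing and check_archive are hypothetical helpers, not part of MPICH2:

```shell
#!/bin/sh
# list_missing LOG: print each distinct symbol named in an
# "undefined reference to `SYMBOL'" line of a saved build log.
list_missing() {
    sed -n "s/.*undefined reference to \`\([^']*\)'.*/\1/p" "$1" | sort -u
}

# check_archive LIB LOG: for every missing symbol in LOG, report whether
# the static archive LIB defines it. nm marks defined text (code) symbols
# with "T"; an unreadable or absent archive reports every symbol as MISSING.
check_archive() {
    lib=$1
    list_missing "$2" | while read -r sym; do
        if nm "$lib" 2>/dev/null | grep -q " T $sym\$"; then
            echo "defined: $sym"
        else
            echo "MISSING: $sym"
        fi
    done
}
```

For example, with the make output saved as build.log:

    check_archive /home/psl/downloads/mpich2-1.0.3/src/mpe2/lib/libmpe_nompi.a build.log

If nm reports the symbols as missing, that suggests libmpe_nompi.a was built without the CLOG objects. If they are all defined, the problem probably lies elsewhere.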
> This problem is an old "friend". I remember once changing the source code to fix the build, but I don't want to do that now that I have ten machines.
>
> 3) The above problem goes away at random if I simply repeat the 'make' command: either the error recurs or the build completes.
>
> I then su to root and run 'make install'.
>
> Great, but I think MPI on some nodes is fractured.
>
> 4) Here is a broken production run output:
>
> '
> psl at claude1$ mpdtrace -l
> claude1_56774 (192.168.1.11)
> Paul1_49487 (192.168.1.221)
> claude8_64665 (192.168.1.18)
> claude10_54749 (192.168.1.20)
> psl at claude1$ mpiexec -n 6 ./rogfn 1 100 10  789 500 .0000001
> Process 0 of 6 is on claude1
> Process 1 of 6 is on Paul1Process 2 of 6 is on claude8
>
> Process 3 of 6 is on claude10
> Process 4 of 6 is on claude1
> Process 5 of 6 is on Paul1
> [cli_3]: aborting job:
> Fatal error in MPI_Barrier: Error message texts are not available, error stack:
> (unknown)(): Error message texts are not available
> rank 3 in job 1  claude1_56774   caused collective abort of all ranks
>   exit status of rank 3: return code 13
>
> '
> Is this problem related to the build problem?
> Can anyone help?
>
> rogfn is a very large MD-type programme that was running very well on the smaller cluster. claude1 and Paul1 are a dual-core Athlon and a dual-Opteron machine, respectively. Asking for six processes on the above four machines means that both processors on claude1 and Paul1 are used.
>
> Thanks for any advice,
>
> Phil Lavers
>
>



