[mpich2-dev] MPI_COMM_WORLD failed when using pio-bench on PVFS
汪迪
otheryou at yahoo.cn
Thu Mar 26 00:05:04 CDT 2009
Hi Rajeev,
Other MPI programs run correctly, including cpi under the examples folder:
gxwangdi at WANGDI ~/D/m/examples> pwd
~/Downloads/mpich2-1.0.8/examples
gxwangdi at WANGDI ~/D/m/examples> mpd &
gxwangdi at WANGDI ~/D/m/examples> mpdtrace
WANGDI
gxwangdi at WANGDI ~/D/m/examples> which mpiexec
/usr/local/bin/mpiexec
gxwangdi at WANGDI ~/D/m/examples> mpiexec -n 10 ./cpi
Process 2 of 10 is on WANGDI
Process 1 of 10 is on WANGDI
Process 0 of 10 is on WANGDI
Process 6 of 10 is on WANGDI
Process 4 of 10 is on WANGDI
Process 9 of 10 is on WANGDI
Process 3 of 10 is on WANGDI
Process 7 of 10 is on WANGDI
Process 8 of 10 is on WANGDI
Process 5 of 10 is on WANGDI
pi is approximately 3.1415926544231256, Error is 0.0000000008333325
wall clock time = 0.012229
Thanks for your suggestion. I ran "make testing" in the top-level directory of mpich2; one test does not pass:
gxwangdi at WANGDI ~/D/mpich2-1.0.8> mpd &
gxwangdi at WANGDI ~/D/mpich2-1.0.8> make testing
(cd test && make testing)
make[1]: Entering directory `/home/gxwangdi/Downloads/mpich2-1.0.8/test'
(NOXMLCLOSE=YES && export NOXMLCLOSE && cd mpi && make testing)
make[2]: Entering directory `/home/gxwangdi/Downloads/mpich2-1.0.8/test/mpi'
./runtests -srcdir=. -tests=testlist \
-mpiexec=/usr/local/bin/mpiexec \
-xmlfile=summary.xml
Looking in ./testlist
Processing directory attr
Looking in ./attr/testlist
Processing directory coll
Looking in ./coll/testlist
Processing directory comm
Looking in ./comm/testlist
Some programs (cmsplit) may still be running:
pids = 2279
The executable (cmsplit) will not be removed.
Processing directory datatype
Looking in ./datatype/testlist
Processing directory errhan
Looking in ./errhan/testlist
Processing directory group
Looking in ./group/testlist
Processing directory info
Looking in ./info/testlist
Processing directory init
Looking in ./init/testlist
Processing directory pt2pt
Looking in ./pt2pt/testlist
Some programs (sendrecv3) may still be running:
pids = 4049
The executable (sendrecv3) will not be removed.
Processing directory rma
Looking in ./rma/testlist
Some programs (transpose3) may still be running:
pids = 5713
The executable (transpose3) will not be removed.
Processing directory spawn
Looking in ./spawn/testlist
Processing directory topo
Looking in ./topo/testlist
Processing directory perf
Looking in ./perf/testlist
Processing directory io
Looking in ./io/testlist
Processing directory cxx
Looking in ./cxx/testlist
Processing directory attr
Looking in ./cxx/attr/testlist
Processing directory pt2pt
Looking in ./cxx/pt2pt/testlist
Failed to build bsend1cxx; make[3]: Entering directory `/home/gxwangdi/Downloads/mpich2-1.0.8/test/mpi/cxx/pt2pt'
/usr/local/bin/mpicxx -DHAVE_CONFIG_H -I. -I. -I../../include -I./../../include -c bsend1cxx.cxx
bsend1cxx.cxx: In function ‘int main(int, char**)’:
bsend1cxx.cxx:81: error: ‘strcmp’ was not declared in this scope
bsend1cxx.cxx:91: error: ‘strcmp’ was not declared in this scope
make[3]: *** [bsend1cxx.o] Error 1
make[3]: Leaving directory `/home/gxwangdi/Downloads/mpich2-1.0.8/test/mpi/cxx/pt2pt'
Processing directory comm
Looking in ./cxx/comm/testlist
Processing directory coll
Looking in ./cxx/coll/testlist
Processing directory init
Looking in ./cxx/init/testlist
Processing directory info
Looking in ./cxx/info/testlist
Processing directory datatype
Looking in ./cxx/datatype/testlist
Processing directory io
Looking in ./cxx/io/testlist
Processing directory spawn
Looking in ./cxx/spawn/testlist
Processing directory rma
Looking in ./cxx/rma/testlist
Processing directory errors
Looking in ./errors/testlist
Processing directory attr
Looking in ./errors/attr/testlist
Processing directory coll
Looking in ./errors/coll/testlist
Processing directory comm
Looking in ./errors/comm/testlist
Processing directory group
Looking in ./errors/group/testlist
Processing directory pt2pt
Looking in ./errors/pt2pt/testlist
Processing directory topo
Looking in ./errors/topo/testlist
Processing directory rma
Looking in ./errors/rma/testlist
Processing directory spawn
Looking in ./errors/spawn/testlist
Processing directory io
Looking in ./errors/io/testlist
Processing directory cxx
Looking in ./errors/cxx/testlist
Processing directory errhan
Looking in ./errors/cxx/errhan/testlist
Processing directory io
Looking in ./errors/cxx/io/testlist
Processing directory threads
Looking in ./threads/testlist
Processing directory pt2pt
Looking in ./threads/pt2pt/testlist
Processing directory comm
Looking in ./threads/comm/testlist
1 tests failed out of 385
Details in /home/gxwangdi/Downloads/mpich2-1.0.8/test/mpi/summary.xml
make[2]: Leaving directory `/home/gxwangdi/Downloads/mpich2-1.0.8/test/mpi'
(XMLFILE=../mpi/summary.xml && XMLCONTINUE=YES && \
export XMLFILE && export XMLCONTINUE && \
cd commands && make testing)
make[2]: Entering directory `/home/gxwangdi/Downloads/mpich2-1.0.8/test/commands'
make[2]: Nothing to be done for `testing'.
make[2]: Leaving directory `/home/gxwangdi/Downloads/mpich2-1.0.8/test/commands'
make[1]: Leaving directory `/home/gxwangdi/Downloads/mpich2-1.0.8/test'
The attachment is the /home/gxwangdi/Downloads/mpich2-1.0.8/test/mpi/summary.xml file; it is too long to paste all of its content here. I cannot yet tell what the problem is.
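Looking at the compiler output, the bsend1cxx failure seems to be just a missing include rather than an MPI problem: GCC 4.3 (the default on Ubuntu 8.10) no longer declares strcmp through other standard headers, so the test source apparently needs an explicit #include <cstring>. A minimal sketch of the same situation (illustration only, not the actual test code):

// Illustration only: using strcmp() without <cstring> triggers the same
// "'strcmp' was not declared in this scope" error under GCC 4.3.
#include <cstring>   // the include that appears to be missing in bsend1cxx.cxx
#include <iostream>

int main()
{
    const char *expected = "hello";
    const char *received = "hello";

    // Without <cstring> this comparison fails to compile.
    if (std::strcmp(expected, received) != 0)
        std::cout << "buffers differ" << std::endl;
    else
        std::cout << "buffers match" << std::endl;
    return 0;
}

If that is the cause, this one test failure is probably unrelated to the pio-bench crash.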
From: Rajeev Thakur <thakur at mcs.anl.gov>
Subject: RE: [mpich2-dev] MPI_COMM_WORLD failed when using pio-bench on PVFS
To: otheryou at yahoo.cn, mpich2-dev at mcs.anl.gov
Date: Thursday, March 26, 2009, 2:41
Do the other MPICH2 tests run, such as the cpi example in the examples directory? If you run "make testing" in the top-level mpich2 directory, it will run the entire test suite in test/mpi (it can take more than an hour).
Rajeev
From: mpich2-dev-bounces at mcs.anl.gov [mailto:mpich2-dev-bounces at mcs.anl.gov] On Behalf Of 汪迪
Sent: Wednesday, March 25, 2009 12:11 AM
To: MPICH2-developer mailing list
Subject: [mpich2-dev] MPI_COMM_WORLD failed when using pio-bench on PVFS
Hi,
I have configured a PVFS server on my own laptop, which uses MPICH2 as its Trove storage system implementation. Now I intend to use pio-bench to get a trace of the server. It works when I set the number of processes to 1, but when I set the number of processes to more than 1, it does not work. It reports that MPI_COMM_WORLD failed, yet running the hostname command through MPI works, which is peculiar. I checked the MPICH2 documentation and found no particular configuration needed for a single-host MPI deployment. Am I doing anything wrong?
By the way, my laptop is an IBM i386 machine with an Intel Centrino 2 vPro CPU running Ubuntu 8.10; MPICH2 1.0.8 is installed in the default location under /usr/local, and PVFS 2.7.1 is under /root/pvfs-install/.
gxwangdi at WANGDI:~/Desktop/pio-bench$ sudo mpiexec -n 1 ./pio-bench
[sudo] password for gxwangdi:
File under test: /mnt/pvfs2/ftpaccess
Number of Processes: 1
Sync: off
Averaging: Off
the nested strided pattern needs to be run with an even amount of processes
file pio-bench.c, line 586: access pattern initialization error: -1
gxwangdi at WANGDI:~/Desktop/pio-bench$ sudo mpiexec -n 4 ./pio-bench
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406)..........................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(77)..........................:
MPIC_Sendrecv(126)........................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(637)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)[cli_0]: aborting job:
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406)..........................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(77)..........................:
MPIC_Sendrecv(126)........................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_readFatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x1fd6ca78, count=20, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...........................:
MPIC_Recv(81).............................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(456):
adjust_iov(973)...........................: ch3|sock|immedread 0x1e5a0d60 0x1f329978 0x1f3258d0
MPIDU_Sock_readv(455).....................: the supplied buffer contains invalid memory (set=0,sock=1,errno=14:Bad address)[cli_1]: aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x1fd6ca78, count=20, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...........................:
MPIC_Recv(81).............................:
MPIC_Wait(270)............................(637)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)
:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(456):
adjust_iov(973)...........................: ch3|sock|immedread 0x1e5a0d60 0x1f329978 0x1f3258d0
MPIDU_Sock_readv(455).....................: the supplied buffer contains invalid memory (set=0,sock=1,errno=14:Bad address)
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x1fd6ca78, count=20, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...........................:
MPIC_Recv(81).............................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(456):
adjust_iov(973)...........................: ch3|sock|immedread 0x1e5a0d60 0x1eb9f978 0x1eb9b8d0
MPIDU_Sock_readv(455).....................: the supplied buffer contains invalid memory (set=0,sock=1,errno=14:Bad address)[cli_2]: aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x1fd6ca78, count=20, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...........................:
MPIC_Recv(81).............................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(456):
adjust_iov(973)...........................: ch3|sock|immedread 0x1e5a0d60 0x1eb9f978 0x1eb9b8d0
MPIDU_Sock_readv(455).....................: the supplied buffer contains invalid memory (set=0,sock=1,errno=14:Bad address)
rank 1 in job 9 WANGDI_59039 caused collective abort of all ranks
exit status of rank 1: return code 1
rank 0 in job 9 WANGDI_59039 caused collective abort of all ranks
exit status of rank 0: return code 1
gxwangdi at WANGDI:~/Desktop/pio-bench$ sudo mpiexec -n 4 hostname
WANGDI
WANGDI
WANGDI
WANGDI
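Since plain hostname runs fine under mpiexec but the collectives in pio-bench do not, a small program that repeats only the calls from the error stack above (MPI_Barrier on MPI_COMM_WORLD, then an MPI_Bcast of 20 bytes from rank 0) might help isolate the problem. This is only a sketch, independent of pio-bench and PVFS; the file name is made up:

// Minimal isolation test: the same collective sequence pio-bench aborts in.
// (Hypothetical file name barrier_bcast.cxx.)
// Build with: mpicxx barrier_bcast.cxx -o barrier_bcast
// Run with:   mpiexec -n 4 ./barrier_bcast
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Step 1: barrier across MPI_COMM_WORLD (where pio-bench first fails).
    MPI_Barrier(MPI_COMM_WORLD);

    // Step 2: broadcast 20 bytes from rank 0, as in the MPI_Bcast error above.
    char buf[20] = "filled by rank 0";
    MPI_Bcast(buf, 20, MPI_BYTE, 0, MPI_COMM_WORLD);

    std::printf("rank %d of %d got: %s\n", rank, size, buf);

    MPI_Finalize();
    return 0;
}

If this also fails with -n 4, the problem is in the MPICH2/mpd setup itself; if it passes, the failure is more likely in how pio-bench drives MPI-IO on the PVFS volume.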
My pio-bench.conf file looks like this:
Testfile "/mnt/pvfs2/ftpaccess"
OutputToFile "/home/gxwangdi/Desktop/pio-bench/results/result"
<ap_module>
ModuleName "Nested Strided (read)"
ModuleReps 3
ModuleSettleTime 5
</ap_module>
<ap_module>
ModuleName "Nested Strided (write)"
ModuleReps 3
ModuleSettleTime 5
</ap_module>
<ap_module>
ModuleName "Nested Strided (read-modify-write)"
ModuleReps 3
ModuleSettleTime 5
</ap_module>
<ap_module>
ModuleName "Nested Strided (re-read)"
ModuleReps 3
ModuleSettleTime 5
</ap_module>
<ap_module>
ModuleName "Nested Strided (re-write)"
ModuleReps 3
ModuleSettleTime 5
</ap_module>
I also tried a test file that is not under /mnt/pvfs2; it reports the following:
gxwangdi at WANGDI:~/Desktop/pio-bench$ sudo mpiexec -n 4 ./pio-bench
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406)..........................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(77)..........................:
MPIC_Sendrecv(126)........................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(637)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)[cli_0]: aborting job:
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406)..........................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(77)..........................:
MPIC_Sendrecv(126)........................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_readFatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x1f60fad0, count=20, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...........................:
MPIC_Recv(81).............................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(456):
adjust_iov(973)...........................: ch3|sock|immedread 0x1e5a0d60 0x1f525978 0x1f5218d0
MPIDU_Sock_readv(455).....................: the supplied buffer contains invalid memory (set=0,sock=1,errno=14:Bad address)[cli_2]: aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x1f60fad0, count=20, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...........................:
MPIC_Recv(81).............................:
MPIC_Wait(270)............................(637)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)
:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(456):
adjust_iov(973)...........................: ch3|sock|immedread 0x1e5a0d60 0x1f525978 0x1f5218d0
MPIDU_Sock_readv(455).....................: the supplied buffer contains invalid memory (set=0,sock=1,errno=14:Bad address)
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x1f60fad0, count=20, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...........................:
MPIC_Recv(81).............................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(456):
adjust_iov(973)...........................: ch3|sock|immedread 0x1e5a0d60 0x202d2978 0x202ce8d0
MPIDU_Sock_readv(455).....................: the supplied buffer contains invalid memory (set=0,sock=1,errno=14:Bad address)[cli_1]: aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x1f60fad0, count=20, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...........................:
MPIC_Recv(81).............................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(456):
adjust_iov(973)...........................: ch3|sock|immedread 0x1e5a0d60 0x202d2978 0x202ce8d0
MPIDU_Sock_readv(455).....................: the supplied buffer contains invalid memory (set=0,sock=1,errno=14:Bad address)
rank 2 in job 11 WANGDI_59039 caused collective abort of all ranks
exit status of rank 2: return code 1
rank 1 in job 11 WANGDI_59039 caused collective abort of all ranks
exit status of rank 1: return code 1
rank 0 in job 11 WANGDI_59039 caused collective abort of all ranks
exit status of rank 0: return code 1
MPI_COMM_WORLD failed again, but this time it ended with a collective abort of all ranks, which is a little different. Since pio-bench leaves no syslog to check, I do not understand what is happening and cannot solve this problem.
I would appreciate your responses.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: summary.xml
Type: text/xml
Size: 43399 bytes
Desc: not available
URL: <https://lists.mcs.anl.gov/mailman/private/mpich2-dev/attachments/20090326/a454943f/attachment-0001.bin>