[mpich2-dev] MPI_COMM_WORLD failed when using pio-bench on PVFS

Rajeev Thakur thakur at mcs.anl.gov
Wed Mar 25 13:41:03 CDT 2009


Do the other MPICH2 tests run, such as the cpi example in the examples
directory? If you run "make testing" in the top-level mpich2 directory, it
will run the entire test suite in test/mpi (this can take more than an hour).
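Concretely, the checks would look something like this (directory names assume a standard MPICH2 1.0.8 source tree; adjust to wherever you unpacked and built it):

```shell
# Quick sanity check: build and run the cpi example with several processes
cd mpich2-1.0.8/examples
mpicc -o cpi cpi.c
mpiexec -n 4 ./cpi

# Full test suite in test/mpi (can take more than an hour)
cd ..
make testing
```

If cpi runs with -n 4 but pio-bench does not, that points at pio-bench or its PVFS access rather than the MPICH2 install.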
 
Rajeev
 


  _____  

From: mpich2-dev-bounces at mcs.anl.gov [mailto:mpich2-dev-bounces at mcs.anl.gov]
On Behalf Of ??
Sent: Wednesday, March 25, 2009 12:11 AM
To: MPICH2-developer mailing list
Subject: [mpich2-dev] MPI_COMM_WORLD failed when using pio-bench on PVFS


Hi, 

I have set up a PVFS server on my own laptop, which uses MPICH2 as its
Trove storage system implementation. Now I intend to use pio-bench to get a
trace of the server. It works when I set the number of processes to 1, but
whenever I set the number of processes to more than 1, it fails every time,
reporting that MPI_COMM_WORLD failed. However, running the hostname command
through MPI works, which is peculiar. I checked the MPICH2 documentation and
found no special configuration required for a single-host MPI deployment. Am
I doing anything wrong?


 By the way, my laptop is an IBM i386 machine with an Intel Centrino 2
vPro CPU, running Ubuntu 8.10. MPICH2 1.0.8 is installed by default under
/usr/local, and PVFS 2.7.1 is under /root/pvfs-install/. 

gxwangdi at WANGDI:~/Desktop/pio-bench$ sudo mpiexec -n 1 ./pio-bench
[sudo] password for gxwangdi:
File under test: /mnt/pvfs2/ftpaccess
Number of Processes: 1
Sync: off
Averaging: Off
the nested strided pattern needs to be run with an even amount of processes
file pio-bench.c, line 586: access pattern initialization error: -1

gxwangdi at WANGDI:~/Desktop/pio-bench$ sudo mpiexec -n 4 ./pio-bench
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406)..........................: MPI_Barrier(MPI_COMM_WORLD)
failed
MPIR_Barrier(77)..........................:
MPIC_Sendrecv(126)........................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(637)..............: connection failure
(set=0,sock=1,errno=104:Connection reset by peer)[cli_0]: aborting job:
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406)..........................: MPI_Barrier(MPI_COMM_WORLD)
failed
MPIR_Barrier(77)..........................:
MPIC_Sendrecv(126)........................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_readFatal error in MPI_Bcast: Other MPI error, error
stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x1fd6ca78,
count=20, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...........................:
MPIC_Recv(81).............................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(456):
adjust_iov(973)...........................: ch3|sock|immedread
0x1e5a0d60 0x1f329978 0x1f3258d0
MPIDU_Sock_readv(455).....................: the supplied buffer contains
invalid memory (set=0,sock=1,errno=14:Bad address)[cli_1]: aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x1fd6ca78,
count=20, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...........................:
MPIC_Recv(81).............................:
MPIC_Wait(270)............................(637)..............:
connection failure (set=0,sock=1,errno=104:Connection reset by peer)
:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(456):
adjust_iov(973)...........................: ch3|sock|immedread
0x1e5a0d60 0x1f329978 0x1f3258d0
MPIDU_Sock_readv(455).....................: the supplied buffer contains
invalid memory (set=0,sock=1,errno=14:Bad address)
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x1fd6ca78,
count=20, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...........................:
MPIC_Recv(81).............................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(456):
adjust_iov(973)...........................: ch3|sock|immedread
0x1e5a0d60 0x1eb9f978 0x1eb9b8d0
MPIDU_Sock_readv(455).....................: the supplied buffer contains
invalid memory (set=0,sock=1,errno=14:Bad address)[cli_2]: aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x1fd6ca78,
count=20, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...........................:
MPIC_Recv(81).............................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(456):
adjust_iov(973)...........................: ch3|sock|immedread
0x1e5a0d60 0x1eb9f978 0x1eb9b8d0
MPIDU_Sock_readv(455).....................: the supplied buffer contains
invalid memory (set=0,sock=1,errno=14:Bad address)
rank 1 in job 9  WANGDI_59039   caused collective abort of all ranks
  exit status of rank 1: return code 1
rank 0 in job 9  WANGDI_59039   caused collective abort of all ranks
  exit status of rank 0: return code 1
gxwangdi at WANGDI:~/Desktop/pio-bench$ sudo mpiexec -n 4 hostname
WANGDI
WANGDI
WANGDI
WANGDI


My pio-bench.conf file is:

Testfile "/mnt/pvfs2/ftpaccess"

OutputToFile "/home/gxwangdi/Desktop/pio-bench/results/result"

<ap_module>
ModuleName "Nested Strided (read)"
ModuleReps 3
ModuleSettleTime 5
</ap_module>

<ap_module>
ModuleName "Nested Strided (write)"
ModuleReps 3
ModuleSettleTime 5
</ap_module>

<ap_module>
ModuleName "Nested Strided (read-modify-write)"
ModuleReps 3
ModuleSettleTime 5
</ap_module>

<ap_module>
ModuleName "Nested Strided (re-read)"
ModuleReps 3
ModuleSettleTime 5
</ap_module>

<ap_module>
ModuleName "Nested Strided (re-write)"
ModuleReps 3
ModuleSettleTime 5
</ap_module>

I also tried another test file that is not under /mnt/pvfs2, and it reports
the following:

gxwangdi at WANGDI:~/Desktop/pio-bench$ sudo mpiexec -n 4 ./pio-bench
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406)..........................: MPI_Barrier(MPI_COMM_WORLD)
failed
MPIR_Barrier(77)..........................:
MPIC_Sendrecv(126)........................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(637)..............: connection failure
(set=0,sock=1,errno=104:Connection reset by peer)[cli_0]: aborting job:
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406)..........................: MPI_Barrier(MPI_COMM_WORLD)
failed
MPIR_Barrier(77)..........................:
MPIC_Sendrecv(126)........................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_readFatal error in MPI_Bcast: Other MPI error, error
stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x1f60fad0,
count=20, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...........................:
MPIC_Recv(81).............................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(456):
adjust_iov(973)...........................: ch3|sock|immedread
0x1e5a0d60 0x1f525978 0x1f5218d0
MPIDU_Sock_readv(455).....................: the supplied buffer contains
invalid memory (set=0,sock=1,errno=14:Bad address)[cli_2]: aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x1f60fad0,
count=20, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...........................:
MPIC_Recv(81).............................:
MPIC_Wait(270)............................(637)..............:
connection failure (set=0,sock=1,errno=104:Connection reset by peer)
:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(456):
adjust_iov(973)...........................: ch3|sock|immedread
0x1e5a0d60 0x1f525978 0x1f5218d0
MPIDU_Sock_readv(455).....................: the supplied buffer contains
invalid memory (set=0,sock=1,errno=14:Bad address)
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x1f60fad0,
count=20, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...........................:
MPIC_Recv(81).............................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(456):
adjust_iov(973)...........................: ch3|sock|immedread
0x1e5a0d60 0x202d2978 0x202ce8d0
MPIDU_Sock_readv(455).....................: the supplied buffer contains
invalid memory (set=0,sock=1,errno=14:Bad address)[cli_1]: aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x1f60fad0,
count=20, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)...........................:
MPIC_Recv(81).............................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(456):
adjust_iov(973)...........................: ch3|sock|immedread
0x1e5a0d60 0x202d2978 0x202ce8d0
MPIDU_Sock_readv(455).....................: the supplied buffer contains
invalid memory (set=0,sock=1,errno=14:Bad address)
rank 2 in job 11  WANGDI_59039   caused collective abort of all ranks
  exit status of rank 2: return code 1
rank 1 in job 11  WANGDI_59039   caused collective abort of all ranks
  exit status of rank 1: return code 1
rank 0 in job 11  WANGDI_59039   caused collective abort of all ranks
  exit status of rank 0: return code 1


MPI_COMM_WORLD failed again, and at the end it caused a collective abort
of all ranks, which is slightly different from the first case. Since
pio-bench leaves no syslog to check, I do not understand what is
happening and cannot solve this problem.

I would appreciate your responses.

