[mpich-discuss] Too many open files (errno 24) when using MPI_Alltoallv

jt.meng at siat.ac.cn jt.meng at siat.ac.cn
Thu Jun 7 01:16:52 CDT 2012


I am having trouble using MPI_Alltoallv on more than 1000 cores. I cannot figure out why this error occurs. Can you help me?

TestAlltoAll.cpp is attached to this email. You can run it on 1000 cores to reproduce the error. Thanks.



node73:/lustrefs/home/temp/BGI/newGraph/MPIGraph # mpic++ -O2 TestAlltoAll.cpp  -o TestAlltoAll
node73:/lustrefs/home/temp/BGI/newGraph/MPIGraph # time mpirun -np 1056 -machinefile hostfile ./TestAlltoAll
Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
PMPI_Alltoallv(549)..............: MPI_Alltoallv(sbuf=0x2b2738837010, scnts=0x5f06e0, sdispls=0x5f89b0, MPI_LONG_LONG_INT, rbuf=0x2b276ade2010, rcnts=0x5f7920, rdispls=0x5f9a40, MPI_LONG_LONG_INT, MPI_COMM_WORLD) failed
MPIR_Alltoallv_impl(389).........:
MPIR_Alltoallv(355)..............:
MPIR_Alltoallv_intra(190)........:
MPIC_Isend(475)..................:
MPID_nem_lmt_RndvSend(81)........:
MPIDI_CH3_RndvSend(63)...........: failure occurred while attempting to send RTS packet
MPID_nem_tcp_iStartContigMsg(298):
MPID_nem_tcp_connect(849)........: unable to create a socket, Too many open files (errno 24)
Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
PMPI_Alltoallv(549)..............: MPI_Alltoallv(sbuf=0x2b242dea0010, scnts=0x5f06e0, sdispls=0x5f89b0, MPI_LONG_LONG_INT, rbuf=0x2b246044b010, rcnts=0x5f7920, rdispls=0x5f9a40, MPI_LONG_LONG_INT, MPI_COMM_WORLD) failed
MPIR_Alltoallv_impl(389).........:
MPIR_Alltoallv(355)..............:
MPIR_Alltoallv_intra(190)........:
MPIC_Isend(475)..................:
MPID_nem_lmt_RndvSend(81)........:
MPIDI_CH3_RndvSend(63)...........: failure occurred while attempting to send RTS packet
MPID_nem_tcp_iStartContigMsg(298):
MPID_nem_tcp_connect(849)........: unable to create a socket, Too many open files (errno 24)
Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
PMPI_Alltoallv(549)..............: MPI_Alltoallv(sbuf=0x2b1a26776010, scnts=0x5f06e0, sdispls=0x5f89b0, MPI_LONG_LONG_INT, rbuf=0x2b1a58d21010, rcnts=0x5f7920, rdispls=0x5f9a40, MPI_LONG_LONG_INT, MPI_COMM_WORLD) failed
MPIR_Alltoallv_impl(389).........:
MPIR_Alltoallv(355)..............:
MPIR_Alltoallv_intra(190)........:
MPIC_Isend(475)..................:
MPID_nem_lmt_RndvSend(81)........:
MPIDI_CH3_RndvSend(63)...........: failure occurred while attempting to send RTS packet
MPID_nem_tcp_iStartContigMsg(298):
MPID_nem_tcp_connect(849)........: unable to create a socket, Too many open files (errno 24)
Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
PMPI_Alltoallv(549)...............: MPI_Alltoallv(sbuf=0x2b7cd63af010, scnts=0x5f06e0, sdispls=0x5f89b0, MPI_LONG_LONG_INT, rbuf=0x2b7d0895a010, rcnts=0x5f7920, rdispls=0x5f9a40, MPI_LONG_LONG_INT, MPI_COMM_WORLD) failed
MPIR_Alltoallv_impl(389)..........:
MPIR_Alltoallv(355)...............:
MPIR_Alltoallv_intra(199).........:
MPIC_Waitall_ft(852)..............:
MPIR_Waitall_impl(121)............:
MPIDI_CH3I_Progress(402)..........:
MPID_nem_mpich2_blocking_recv(905):
MPID_nem_tcp_connpoll(1838).......:
state_listening_handler(1908).....: accept of socket fd failed - Too many open files
Ctrl-C caught... cleaning up processes

real    13m0.684s
user    0m13.065s
sys     0m49.703s
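
The traces above fail in socket creation and accept with errno 24, which on Linux is EMFILE: the process has reached its limit on open file descriptors. One thing that may be worth checking is whether the per-process soft limit is smaller than the number of ranks. The following is only a minimal sketch under stated assumptions, not a diagnosis: it assumes, as a worst case, that a rank may end up with one TCP socket per remote peer during the all-to-all. The helper names soft_fd_limit and fd_limit_may_be_exceeded and the reserved margin of 16 descriptors are hypothetical and are not part of MPICH.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_NOFILE (POSIX)

// Returns the soft limit on open file descriptors for this process.
// Returns -1 if the limit cannot be queried and -2 if it is unlimited.
long long soft_fd_limit() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return -2;
    }
    return static_cast<long long>(rl.rlim_cur);
}

// Hypothetical worst-case check: assumes up to one socket per remote peer
// (nprocs - 1) plus a small margin for stdio, listeners, and other files.
// The margin of 16 is an illustrative guess, not a measured value.
bool fd_limit_may_be_exceeded(long long limit, int nprocs, int reserved = 16) {
    if (limit < 0) {
        return false;  // unknown or unlimited: nothing to compare against
    }
    return static_cast<long long>(nprocs - 1) + reserved > limit;
}
```

Calling fd_limit_may_be_exceeded(soft_fd_limit(), 1056) on a compute node would show whether the limit in effect there is below this worst case. With a soft limit of 1024, for example, the check returns true for 1056 ranks.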


Jintao Meng
High Performance Computing Center
Shenzhen Institutes of Advanced Technology, CAS





-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: TestAlltoAll.cpp
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20120607/db1f6238/attachment.ksh>

