[mpich-discuss] MPI error on MPI_Alltoallv.

jt.meng at siat.ac.cn
Thu Aug 2 03:15:56 CDT 2012


Hi, 
    My program runs well on 960 cores; however, when I run it on 1024 cores, I get the following error.  
    I suspect this is caused by an OS limit, most likely on open file descriptors, given the "Too many open files" message in the error stack. 
    Can anyone help me resolve this problem? 
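For reference, here is a small check I can run (just a sketch, assuming a Linux/POSIX system; nothing in it is from my real program) that prints the open-file limit each rank actually inherits. The shell ulimit shown below may not match what the mpiexec-launched processes see on the compute nodes:

/* Print the RLIMIT_NOFILE limit as seen by every MPI rank. */
#include <mpi.h>
#include <stdio.h>
#include <sys/resource.h>

int main(int argc, char **argv)
{
    int rank;
    struct rlimit rl;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
        printf("rank %d: open files soft=%llu hard=%llu\n", rank,
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);

    MPI_Finalize();
    return 0;
}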


ulimit output starts here: 
---------------------------------------------------------
# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 136192
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 819200
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 136192
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited




Error logs start here: 
------------------------------------------------------------------------------------
Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
PMPI_Alltoallv(549)...........: MPI_Alltoallv(sbuf=0x2b08c2bd7010, scnts=0x64ac20, sdispls=0x659b40, MPI_LONG_LONG_INT, rbuf=0x2b08c5bde010, rcnts=0x658b30, rdispls=0x65ab50, MPI_LONG_LONG_INT, MPI_COMM_WORLD) failed
MPIR_Alltoallv_impl(389)......:
MPIR_Alltoallv(355)...........:
MPIR_Alltoallv_intra(199).....:
MPIC_Waitall_ft(852)..........:
MPIR_Waitall_impl(121)........:
MPIDI_CH3I_Progress(402)......:
MPID_nem_mpich2_test_recv(747):
MPID_nem_tcp_connpoll(1838)...:
state_listening_handler(1908).: accept of socket fd failed - Too many open files
Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
PMPI_Alltoallv(549)...........: MPI_Alltoallv(sbuf=0x2b974c333010, scnts=0x64ac20, sdispls=0x659b40, MPI_LONG_LONG_INT, rbuf=0x2b974f335010, rcnts=0x658b30, rdispls=0x65ab50, MPI_LONG_LONG_INT, MPI_COMM_WORLD) failed
MPIR_Alltoallv_impl(389)......:
MPIR_Alltoallv(355)...........:
MPIR_Alltoallv_intra(199).....:
MPIC_Waitall_ft(852)..........:
MPIR_Waitall_impl(121)........:
MPIDI_CH3I_Progress(402)......:
MPID_nem_mpich2_test_recv(747):
MPID_nem_tcp_connpoll(1838)...:
state_listening_handler(1908).: accept of socket fd failed - Too many open files
[proxy:0:9 at node15] handle_pmi_response (./pm/pmiserv/pmip_cb.c:406): assert (!closed) failed
[proxy:0:9 at node15] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:952): unable to handle PMI response
[proxy:0:9 at node15] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:9 at node15] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
[mpiexec at node73] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed
[mpiexec at node73] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec at node73] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
[mpiexec at node73] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
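In case it helps with reproducing, a stripped-down sketch of the failing call pattern (placeholder counts and payload, not my actual program) looks like this:

/* Every rank exchanges one MPI_LONG_LONG_INT with every other rank
 * via MPI_Alltoallv on MPI_COMM_WORLD, mirroring the call in the
 * error stack above. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long long *sbuf = malloc(size * sizeof *sbuf);
    long long *rbuf = malloc(size * sizeof *rbuf);
    int *scnts   = malloc(size * sizeof *scnts);
    int *rcnts   = malloc(size * sizeof *rcnts);
    int *sdispls = malloc(size * sizeof *sdispls);
    int *rdispls = malloc(size * sizeof *rdispls);

    for (i = 0; i < size; i++) {
        sbuf[i] = rank;              /* placeholder payload */
        scnts[i] = rcnts[i] = 1;     /* one element per peer */
        sdispls[i] = rdispls[i] = i;
    }

    MPI_Alltoallv(sbuf, scnts, sdispls, MPI_LONG_LONG_INT,
                  rbuf, rcnts, rdispls, MPI_LONG_LONG_INT,
                  MPI_COMM_WORLD);

    free(sbuf); free(rbuf); free(scnts);
    free(rcnts); free(sdispls); free(rdispls);
    MPI_Finalize();
    return 0;
}

As far as I understand the ch3:nemesis TCP netmod, each rank ends up holding a socket to every peer it communicates with, so an all-to-all across 1024 ranks needs on the order of a thousand descriptors per process, on top of whatever else is open.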


Jintao



