[mpich-discuss] MPI error on MPI_alltoallv.
jt.meng at siat.ac.cn
Thu Aug 2 03:15:56 CDT 2012
Hi,
My program runs well on 960 cores; however, when it runs on 1024 cores, I get the following error.
I guess this may be caused by an OS limitation.
Can anyone help me resolve this problem?
ulimit output starts here:
---------------------------------------------------------
# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
pending signals (-i) 136192
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 819200
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 136192
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Error logs start here:
------------------------------------------------------------------------------------
Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
PMPI_Alltoallv(549)...........: MPI_Alltoallv(sbuf=0x2b08c2bd7010, scnts=0x64ac20, sdispls=0x659b40, MPI_LONG_LONG_INT, rbuf=0x2b08c5bde010, rcnts=0x658b30, rdispls=0x65ab50, MPI_LONG_LONG_INT, MPI_COMM_WORLD) failed
MPIR_Alltoallv_impl(389)......:
MPIR_Alltoallv(355)...........:
MPIR_Alltoallv_intra(199).....:
MPIC_Waitall_ft(852)..........:
MPIR_Waitall_impl(121)........:
MPIDI_CH3I_Progress(402)......:
MPID_nem_mpich2_test_recv(747):
MPID_nem_tcp_connpoll(1838)...:
state_listening_handler(1908).: accept of socket fd failed - Too many open files
Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
PMPI_Alltoallv(549)...........: MPI_Alltoallv(sbuf=0x2b974c333010, scnts=0x64ac20, sdispls=0x659b40, MPI_LONG_LONG_INT, rbuf=0x2b974f335010, rcnts=0x658b30, rdispls=0x65ab50, MPI_LONG_LONG_INT, MPI_COMM_WORLD) failed
MPIR_Alltoallv_impl(389)......:
MPIR_Alltoallv(355)...........:
MPIR_Alltoallv_intra(199).....:
MPIC_Waitall_ft(852)..........:
MPIR_Waitall_impl(121)........:
MPIDI_CH3I_Progress(402)......:
MPID_nem_mpich2_test_recv(747):
MPID_nem_tcp_connpoll(1838)...:
state_listening_handler(1908).: accept of socket fd failed - Too many open files
[proxy:0:9 at node15] handle_pmi_response (./pm/pmiserv/pmip_cb.c:406): assert (!closed) failed
[proxy:0:9 at node15] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:952): unable to handle PMI response
[proxy:0:9 at node15] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:9 at node15] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
[mpiexec at node73] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed
[mpiexec at node73] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec at node73] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
[mpiexec at node73] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
Jintao