<div>Hi,</div>
<div>My program runs well on 960 cores; however, when it runs on 1024 cores, I get the following error.</div>
<div>I guess this may be caused by an OS limitation.</div>
<div>Can anyone help me resolve this problem?</div>
<div><div><br></div>
<div>ulimit output starts here:</div>
<div>---------------------------------------------------------</div>
<div># ulimit -a</div>
<div>core file size (blocks, -c) 0</div>
<div>data seg size (kbytes, -d) unlimited</div>
<div>file size (blocks, -f) unlimited</div>
<div>pending signals (-i) 136192</div>
<div>max locked memory (kbytes, -l) unlimited</div>
<div>max memory size (kbytes, -m) unlimited</div>
<div>open files (-n) 819200</div>
<div>pipe size (512 bytes, -p) 8</div>
<div>POSIX message queues (bytes, -q) 819200</div>
<div>stack size (kbytes, -s) unlimited</div>
<div>cpu time (seconds, -t) unlimited</div>
<div>max user processes (-u) 136192</div>
<div>virtual memory (kbytes, -v) unlimited</div>
<div>file locks (-x) unlimited</div></div>
<div><br></div>
<div><br></div>
<div>Error logs start here:</div>
<div>------------------------------------------------------------------------------------</div>
<div>Fatal error in PMPI_Alltoallv: Other MPI error, error stack:</div>
<div>PMPI_Alltoallv(549)...........: MPI_Alltoallv(sbuf=0x2b08c2bd7010
, scnts=0x64ac20, sdispls=0x659b40, MPI_LONG_LONG_INT, rbuf=0x2b08c5bde010, rcnts=0x658b30, rdispls=0x65ab50, MPI_LONG_LONG_INT, MPI_COMM_WORLD) failed</div>
<div>MPIR_Alltoallv_impl(389)......:</div>
<div>MPIR_Alltoallv(355)...........:</div>
<div>MPIR_Alltoallv_intra(199).....:</div>
<div>MPIC_Waitall_ft(852)..........:</div>
<div>MPIR_Waitall_impl(121)........:</div>
<div>MPIDI_CH3I_Progress(402)......:</div>
<div>MPID_nem_mpich2_test_recv(747):</div>
<div>MPID_nem_tcp_connpoll(1838)...:</div>
<div>state_listening_handler(1908).: accept of socket fd failed - Too many open files</div>
<div>Fatal error in PMPI_Alltoallv: Other MPI error, error stack:</div>
<div>PMPI_Alltoallv(549)...........: MPI_Alltoallv(sbuf=0x2b974c333010, scnts=0x64ac20, sdispls=0x659b40, MPI_LONG_LONG_INT, rbuf=0x2b974f335010, rcnts=0x658b30, rdispls=0x65ab50, MPI_LONG_LONG_INT, MPI_COMM_WORLD) failed</div>
<div>MPIR_Alltoallv_impl(389)......:</div>
<div>MPIR_Alltoallv(355)...........:</div>
<div>MPIR_Alltoallv_intra(199).....:</div>
<div>MPIC_Waitall_ft(852)..........:</div>
<div>MPIR_Waitall_impl(121)........:</div>
<div>MPIDI_CH3I_Progress(402)......:</div>
<div>MPID_nem_mpich2_test_recv(747):</div>
<div>MPID_nem_tcp_connpoll(1838)...:</div>
<div>state_listening_handler(1908).: accept of socket fd failed - Too many open files</div>
<div>[proxy:0:9@node15] handle_pmi_response (./pm/pmiserv/pmip_cb.c:406): assert (!closed) failed</div>
<div>[proxy:0:9@node15] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:952): unable to handle PMI response</div>
<div>[proxy:0:9@node15] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status</div>
<div>[proxy:0:9@node15] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event</div>
<div>[mpiexec@node73] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed</div>
<div>[mpiexec@node73] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status</div>
<div>[mpiexec@node73] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event</div>
<div>[mpiexec@node73] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion</div>
<div><br></div>
Jintao<br>