[mpich-discuss] MPI error on MPI_alltoallv.

Pavan Balaji balaji at mcs.anl.gov
Thu Aug 2 13:35:21 CDT 2012


Please see this:

https://lists.mcs.anl.gov/mailman/htdig/mpich-discuss/2012-June/012590.html
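As a quick sanity check (a minimal sketch, not something taken from the thread
above), a small program like the one below prints the open-file limit each
launched process actually inherits; the interactive ulimit -a output quoted
below is not necessarily the limit in effect for remotely started ranks:

    /* check_nofile.c -- illustrative: report RLIMIT_NOFILE per rank */
    #include <mpi.h>
    #include <stdio.h>
    #include <sys/resource.h>

    int main(int argc, char **argv)
    {
        int rank;
        struct rlimit rl;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
            printf("rank %d: RLIMIT_NOFILE soft=%llu hard=%llu\n", rank,
                   (unsigned long long) rl.rlim_cur,
                   (unsigned long long) rl.rlim_max);

        MPI_Finalize();
        return 0;
    }

Launching it the same way as the failing job (e.g. mpiexec -n 1024
./check_nofile) shows the limit in effect on every compute node.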

  -- Pavan

On 08/02/2012 03:15 AM, jt.meng at siat.ac.cn wrote:
> Hi,
>      My program runs well on 960 cores; however, when it runs on
> 1024 cores, I get the following error.
>      I suspect this may be caused by an OS limitation.
>      Can anyone help me resolve this problem?
>
> ulimit output start here:
> ---------------------------------------------------------
> # ulimit -a
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 136192
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 819200
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> stack size              (kbytes, -s) unlimited
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 136192
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>
>
> Error logs start here:
> ------------------------------------------------------------------------------------
> Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
> PMPI_Alltoallv(549)...........: MPI_Alltoallv(sbuf=0x2b08c2bd7010,
> scnts=0x64ac20, sdispls=0x659b40, MPI_LONG_LONG_INT,
> rbuf=0x2b08c5bde010, rcnts=0x658b30, rdispls=0x65ab50,
> MPI_LONG_LONG_INT, MPI_COMM_WORLD) failed
> MPIR_Alltoallv_impl(389)......:
> MPIR_Alltoallv(355)...........:
> MPIR_Alltoallv_intra(199).....:
> MPIC_Waitall_ft(852)..........:
> MPIR_Waitall_impl(121)........:
> MPIDI_CH3I_Progress(402)......:
> MPID_nem_mpich2_test_recv(747):
> MPID_nem_tcp_connpoll(1838)...:
> state_listening_handler(1908).: accept of socket fd failed - Too many
> open files
> Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
> PMPI_Alltoallv(549)...........: MPI_Alltoallv(sbuf=0x2b974c333010,
> scnts=0x64ac20, sdispls=0x659b40, MPI_LONG_LONG_INT,
> rbuf=0x2b974f335010, rcnts=0x658b30, rdispls=0x65ab50,
> MPI_LONG_LONG_INT, MPI_COMM_WORLD) failed
> MPIR_Alltoallv_impl(389)......:
> MPIR_Alltoallv(355)...........:
> MPIR_Alltoallv_intra(199).....:
> MPIC_Waitall_ft(852)..........:
> MPIR_Waitall_impl(121)........:
> MPIDI_CH3I_Progress(402)......:
> MPID_nem_mpich2_test_recv(747):
> MPID_nem_tcp_connpoll(1838)...:
> state_listening_handler(1908).: accept of socket fd failed - Too many
> open files
> [proxy:0:9 at node15] handle_pmi_response (./pm/pmiserv/pmip_cb.c:406):
> assert (!closed) failed
> [proxy:0:9 at node15] HYD_pmcd_pmip_control_cmd_cb
> (./pm/pmiserv/pmip_cb.c:952): unable to handle PMI response
> [proxy:0:9 at node15] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:9 at node15] main (./pm/pmiserv/pmip.c:226): demux engine error
> waiting for event
> [mpiexec at node73] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert
> (!closed) failed
> [mpiexec at node73] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec at node73] HYD_pmci_wait_for_completion
> (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
> [mpiexec at node73] main (./ui/mpich/mpiexec.c:405): process manager error
> waiting for completion
>
> Jintao
>
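If those per-rank limits turn out to be lower than expected, a generic
workaround (again only a sketch, not a fix confirmed by the thread linked
above, and only useful when the hard limit is large enough for the sockets an
all-to-all over 1024 processes opens) is to raise the soft limit early in the
program:

    /* raise_nofile.c -- illustrative: raise the soft open-file limit to the
     * hard limit (an unprivileged process cannot exceed the hard limit). */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }
        if (rl.rlim_cur < rl.rlim_max) {
            rl.rlim_cur = rl.rlim_max;      /* soft limit up to hard limit */
            if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
                perror("setrlimit(RLIMIT_NOFILE)");
        }
        printf("RLIMIT_NOFILE: soft=%llu hard=%llu\n",
               (unsigned long long) rl.rlim_cur,
               (unsigned long long) rl.rlim_max);
        return 0;
    }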

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

