[mpich-discuss] mpich2 1.4.1p1 exit codes

Dave Goodell goodell at mcs.anl.gov
Wed Dec 14 16:29:01 CST 2011


What version of MPICH2 are you using?

This looks like a segfault in your program (exit code 139 = 128 + signal 11, i.e. SIGSEGV), although this usually shows up instead as "exited with signal 11" or some similar error from hydra.  Perhaps "dotprod" is a shell script that runs your actual binary, which is in turn segfaulting?
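
To make the arithmetic concrete, here is a minimal sketch (not part of MPICH2 or hydra) that decodes a shell-style exit code and shows why a wrapper script would turn "signal 11" into 139. The helper names describe_exit_code, run_direct and run_via_shell are hypothetical, and the sketch assumes a POSIX system with "sh" on the PATH. The child process deliberately sends itself SIGSEGV to stand in for a crashing binary.

```python
import signal
import subprocess
import sys

# Stand-in for a crashing binary: a child that sends itself SIGSEGV.
CRASH = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def describe_exit_code(code):
    """Describe a shell-style exit code (values above 128 mean 128 + signal)."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with status {code}"


def run_direct():
    """Launch the crashing child directly; the parent sees the signal itself."""
    return subprocess.run([sys.executable, "-c", CRASH]).returncode


def run_via_shell():
    """Launch the same child through a shell wrapper that exits with $?."""
    script = '"$0" -c "$1"; exit $?'
    return subprocess.run(
        ["sh", "-c", script, sys.executable, CRASH],
        stderr=subprocess.DEVNULL,  # hide the shell's own crash message
    ).returncode


if __name__ == "__main__":
    print(describe_exit_code(139))
    print("direct child returncode:", run_direct())
    print("wrapped child returncode:", run_via_shell())
```

When launched directly, the parent is told the child died from signal 11 (Python reports this as a negative return code). When a shell sits in between, the shell exits normally with status 128 + 11, which is all the launcher can see.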

Anyway, segfaults like this can be debugged in numerous ways.  My favorite method is usually to start with Valgrind and then move on to gdb and/or printf debugging depending on the circumstances.
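
As a rough illustration of the Valgrind step, the sketch below only assembles the command line that runs every rank under Valgrind; it does not execute anything. The function name build_valgrind_command and the log file pattern are assumptions made for this example, while the mpiexec arguments mirror the invocation quoted below. It also assumes the program was compiled with -g so that Valgrind can report source lines.

```python
import shlex


def build_valgrind_command(nprocs, hostfile, program):
    """Return an argv list that runs each MPI rank under Valgrind."""
    return [
        "mpiexec", "-n", str(nprocs), "-f", hostfile,
        "valgrind",
        "--track-origins=yes",          # report where uninitialised values came from
        "--log-file=valgrind.%p.log",   # %p expands to the PID: one log per rank
        program,
    ]


if __name__ == "__main__":
    argv = build_valgrind_command(10, "machines", "./dotprod")
    # Print a copy-and-paste friendly version of the command.
    print(shlex.join(argv))
```

Writing one log per process keeps the reports from different ranks from interleaving, which makes it easier to find the rank that actually faulted.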

-Dave

On Dec 14, 2011, at 4:21 PM CST, Ricardo Román Brenes wrote:

> this is the complete error output:
> 
> [rroman@meta:/home/rroman] mpiexec -n 10 -f machines ./dotprod
> =====================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 139
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> =====================================================================================
> [proxy:0:0@cadejos-4] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:928): assert (!closed) failed
> [proxy:0:0@cadejos-4] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0@cadejos-4] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
> [mpiexec@cadejos-0] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
> [mpiexec@cadejos-0] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> [mpiexec@cadejos-0] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:191): launcher returned error waiting for completion
> [mpiexec@cadejos-0] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
> 
> On Wed, Dec 14, 2011 at 4:05 PM, Ricardo Román Brenes <roman.ricardo at gmail.com> wrote:
> Where can I see the exit codes for applications?
> 
> I have a CentOS machine running an mpich2 program that exits with 139, while the same program on another machine running Debian finishes successfully.
> 
> 
> 
> _______________________________________________
> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


