[mpich-discuss] mpich2 1.4.1p1 exit codes

Dave Goodell goodell at mcs.anl.gov
Wed Dec 14 16:30:52 CST 2011


Also, the "-print-all-exitcodes" option to mpiexec might help somewhat.
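
The arithmetic in the quoted reply below (128 plus the signal number) can be checked with a few lines of Python. This is only a minimal sketch, assuming the common POSIX shell convention that a status above 128 means "killed by signal (status - 128)"; the helper name describe_exit_code is made up for illustration and is not part of MPICH2 or hydra.

```python
import signal


def describe_exit_code(code):
    """Interpret a shell-style exit status (hypothetical helper).

    Assumes the shell convention of reporting 128 + N for a process
    killed by signal N. A program can also call exit() with a value
    above 128 on its own, so the result is a likely cause, not proof.
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"likely killed by signal {signum} ({name})"
    return f"exited with status {code}"


print(describe_exit_code(139))
```

On Linux this should report signal 11 (SIGSEGV) for 139, which matches the segfault reading of the error output.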

-Dave

On Dec 14, 2011, at 4:29 PM CST, Dave Goodell wrote:

> What version of MPICH2 are you using?
> 
> This looks like a segfault in your program (signal 11 plus 128 = 139), although this usually shows up instead as "exited with signal 11" or some similar error from hydra.  Perhaps "dotprod" is a shell script that is running your actual binary which is in turn segfaulting?
> 
> Anyway, segfaults like this can be debugged in numerous ways.  My favorite method is usually to start with Valgrind and then move on to gdb and/or printf debugging depending on the circumstances.
> 
> -Dave
> 
> On Dec 14, 2011, at 4:21 PM CST, Ricardo Román Brenes wrote:
> 
>> This is the complete error output:
>> 
>> [rroman@meta:/home/rroman] mpiexec -n 10 -f machines ./dotprod
>> =====================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   EXIT CODE: 139
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> =====================================================================================
>> [proxy:0:0@cadejos-4] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:928): assert (!closed) failed
>> [proxy:0:0@cadejos-4] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>> [proxy:0:0@cadejos-4] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
>> [mpiexec@cadejos-0] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
>> [mpiexec@cadejos-0] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
>> [mpiexec@cadejos-0] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:191): launcher returned error waiting for completion
>> [mpiexec@cadejos-0] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
>> 
>> On Wed, Dec 14, 2011 at 4:05 PM, Ricardo Román Brenes <roman.ricardo at gmail.com> wrote:
>> Where can I see the exit codes for applications?
>> 
>> I have a CentOS machine running an MPICH2 program that exits with 139, while the same program finishes successfully on another machine running Debian.
>> 
>> 
>> 
>> _______________________________________________
>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>> To manage subscription options or unsubscribe:
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> 


