[mpich-discuss] mpich2 1.4.1p1 exit codes
Dave Goodell
goodell at mcs.anl.gov
Wed Dec 14 16:30:52 CST 2011
Also, the "-print-all-exitcodes" option to mpiexec might help somewhat.
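
To make the "signal 11 plus 128 = 139" arithmetic in the quoted message below concrete, here is a minimal Python sketch that decodes a shell-style exit code. It assumes the usual POSIX shell convention that a status above 128 means "killed by signal (status - 128)". The helper name describe_exit_code is made up for illustration and is not part of MPICH2 or hydra.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit code using the shell's 128 + signal convention."""
    if code == 0:
        return "exited successfully"
    if code > 128:
        # Shells report a child killed by signal N as exit status 128 + N.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {code}"


if __name__ == "__main__":
    # 139 is the exit code reported in the error output quoted below.
    print(describe_exit_code(139))
```

On Linux, running this on 139 should report signal 11 (SIGSEGV), which matches the segfault reading of the error output.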
-Dave
On Dec 14, 2011, at 4:29 PM CST, Dave Goodell wrote:
> What version of MPICH2 are you using?
>
> This looks like a segfault in your program (signal 11 plus 128 = 139), although this usually shows up instead as "exited with signal 11" or some similar error from hydra. Perhaps "dotprod" is a shell script that is running your actual binary which is in turn segfaulting?
>
> Anyway, segfaults like this can be debugged in numerous ways. My favorite method is usually to start with Valgrind and then move on to gdb and/or printf debugging depending on the circumstances.
>
> -Dave
>
> On Dec 14, 2011, at 4:21 PM CST, Ricardo Román Brenes wrote:
>
>> This is the complete error output:
>>
>> [rroman at meta:/home/rroman] mpiexec -n 10 -f machines ./dotprod
>> =====================================================================================
>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> = EXIT CODE: 139
>> = CLEANING UP REMAINING PROCESSES
>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> =====================================================================================
>> [proxy:0:0 at cadejos-4] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:928): assert (!closed) failed
>> [proxy:0:0 at cadejos-4] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>> [proxy:0:0 at cadejos-4] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
>> [mpiexec at cadejos-0] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
>> [mpiexec at cadejos-0] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
>> [mpiexec at cadejos-0] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:191): launcher returned error waiting for completion
>> [mpiexec at cadejos-0] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
>>
>> On Wed, Dec 14, 2011 at 4:05 PM, Ricardo Román Brenes <roman.ricardo at gmail.com> wrote:
>> Where can I see the exit codes for applications?
>>
>> I have a CentOS machine running an MPICH2 program that exits with 139, while the same program on another machine running Debian finishes successfully.
>>
>>
>>
>> _______________________________________________
>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
>> To manage subscription options or unsubscribe:
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss