[mpich-discuss] Hydra handling of non-zero exit codes (1.3.2, 1.4rc2)

Yauheni Zelenko zelenko at cadence.com
Thu Apr 28 17:37:36 CDT 2011


I just re-verified with logging that MPI_Finalize() is called at the end.

I also experimented with cpi.c by replacing "return 0;" with "return 100;" (MPI_Finalize() is still called before it). The modified version demonstrates the issue.

So I still think this is a problem with exit code handling.

Eugene.
________________________________________
From: Pavan Balaji [balaji at mcs.anl.gov]
Sent: Thursday, April 28, 2011 3:23 PM
To: mpich-discuss at mcs.anl.gov
Cc: Yauheni Zelenko
Subject: Re: [mpich-discuss] Hydra handling of non-zero exit codes (1.3.2, 1.4rc2)

Sorry, I misspoke. For cleanup, we don't actually look at the return
code, but rather at whether an internal (PMI) connection to the MPI processes is
broken. This is only for MPI processes -- for non-MPI processes, we
don't do any of this and let the user clean it up.

So, to go back to your question -- Hydra has no problem with non-zero
exit codes. It does have a problem with applications aborting without
calling MPI_Finalize. But you can override that by passing
-disable-auto-cleanup.

  -- Pavan

On 04/28/2011 05:16 PM, Yauheni Zelenko wrote:
> Hi, Pavan!
>
> Thank you for your help!
>
> Eugene.
> ________________________________________
> From: Pavan Balaji [balaji at mcs.anl.gov]
> Sent: Thursday, April 28, 2011 3:11 PM
> To: mpich-discuss at mcs.anl.gov
> Cc: Yauheni Zelenko
> Subject: Re: [mpich-discuss] Hydra handling of non-zero exit codes (1.3.2, 1.4rc2)
>
> If a process terminates with a non-zero return code, Hydra cleans up the
> remaining processes. Not doing this is bad, because it might cause the
> application to hang. You can disable automatic cleanup by passing the
> -disable-auto-cleanup option. I think this is what you are looking for.
>
> The return code of mpiexec is a bit-wise OR of all the process exit
> codes, so if all processes return the same exit code, mpiexec will
> return the same exit code as well.
>
>    -- Pavan
>
> On 04/28/2011 05:02 PM, Yauheni Zelenko wrote:
>> Hi!
>>
>> Our application may return non-zero exit codes as a flag to the launching script, to trigger further post-processing.
>>
>> Hydra prints "BAD TERMINATION OF ONE OF YOUR PROCESSES".
>>
>> I think it would be a good idea to add a command-line option to Hydra that allows non-zero exit codes and leaves them unchanged if they are the same across all MPI processes.
>>
>> The problem can be reproduced with any MPICH2 example by returning a non-zero value from main().
>>
>> I also think it would be a good idea to print exit codes in Hydra's verbose output.
>>
>> Eugene.
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji

--
Pavan Balaji
http://www.mcs.anl.gov/~balaji
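
Note on the exit-code rule quoted above: the reply says mpiexec returns a bit-wise OR of all the process exit codes. The following is only a small sketch of that arithmetic, not Hydra's actual code; combine_exit_codes is a hypothetical helper name introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Hydra): combine per-process exit
 * codes with a bit-wise OR, as described in the thread. */
int combine_exit_codes(const int *codes, size_t n)
{
    /* A NULL array is only acceptable when there is nothing to combine. */
    assert(codes != NULL || n == 0);

    int combined = 0;
    for (size_t i = 0; i < n; i++)
        combined |= codes[i];   /* accumulate the bits of every exit code */
    return combined;
}
```

Under this rule, three processes that all return 100 combine to 100, so the launching script sees the same value. Processes returning 1, 2 and 0 combine to 3, which matches none of the individual codes, so a script can only rely on the combined value when all processes return the same code.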

