[mpich-discuss] Recovering from a Bcast Timeout
Pavan Balaji
balaji at mcs.anl.gov
Tue Jan 5 07:44:40 CST 2010
Calling an init after a finalize in the same program is incorrect as per
the MPI standard. If it worked in some cases, you were lucky :-).
See pg. 291 line 1 of the MPI-2.2 standard.
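
As a rough illustration of what that rule means for a recovery path, here is a minimal sketch. The names `NextStep` and `next_step` are hypothetical and not part of MPI. It assumes the two flags would come from the real calls MPI_Initialized(&flag) and MPI_Finalized(&flag). Note that MPI_Initialized keeps reporting true after MPI_Finalize, so both flags have to be consulted.

```cpp
// What a recovery path may legally do next, given the MPI lifecycle state.
enum class NextStep {
    CallInit,        // MPI was never initialized: MPI_Init is still allowed
    KeepUsingMpi,    // MPI is active: handle the error on the existing session
    RelaunchProcess  // MPI was finalized: no MPI_Init may follow in this process
};

// Hypothetical helper (not an MPI function). In a real program the
// arguments would be filled in by MPI_Initialized and MPI_Finalized.
constexpr NextStep next_step(bool initialized, bool finalized) {
    return finalized     ? NextStep::RelaunchProcess
           : initialized ? NextStep::KeepUsingMpi
                         : NextStep::CallInit;
}

// State inside a catch block that has already called Finalize():
// both flags are true, so the only valid option is to exit and relaunch.
static_assert(next_step(true, true) == NextStep::RelaunchProcess,
              "re-initializing after finalize is not permitted");
```

In other words, an error handler that wants to keep the process alive should not call Finalize at all. Once it has, the job has to be stopped and started again.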
-- Pavan
On 01/04/2010 05:11 PM, Hiatt, Dave M wrote:
> A general question to those in the know. From time to time I get a Bcast timeout error. I'm putting in an error handler to do a "catch" on this exception (C++). My question is, will an MPI::Finalize() followed by an MPI::Init() work from the same process? This error is being caused by our deficient network; we've never lost a blade, and I'm confident both the app and MPI are functioning properly through considerable investigation.
>
> So are there any consequences to simply doing a Finalize() and a new Init() to start up, or will I have to stop the whole process and start again? I'm assuming that it should restart without prejudice. I'm on 1.2.1 Windows/Linux releases.
>
> Thanks
>
> dave
>
>
> "Consequences, Schmonsequences, as long as I'm rich". - Daffy Duck
> Dave Hiatt
> Market Risk Systems Integration
> CitiMortgage, Inc.
> 1000 Technology Dr.
> Third Floor East, M.S. 55
> O'Fallon, MO 63368-2240
>
> Phone: 636-261-1408
> Mobile: 314-452-9165
> FAX: 636-261-1312
> Email: Dave.M.Hiatt at citigroup.com
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji