[mpich-discuss] Recovering from a Bcast Timeout

Pavan Balaji balaji at mcs.anl.gov
Tue Jan 5 07:44:40 CST 2010


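Calling MPI_Init after MPI_Finalize in the same process is incorrect as per
the MPI standard: once MPI has been finalized it cannot be initialized
again, so the process has to be stopped and started again. If it worked in
some cases, you were lucky :-).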

See pg. 291 line 1 of the MPI-2.2 standard.

 -- Pavan

On 01/04/2010 05:11 PM, Hiatt, Dave M wrote:
> A general question to those in the know.  From time to time I get a Bcast timeout error.  I'm putting in an error handler to do a "catch" on this exception (C++).  My question is, will an MPI::Finalize() followed by an MPI::Init() work from the same process?  This error is being caused by our deficient network; we've never lost a blade, and I'm confident both the app and MPI are functioning properly through considerable investigation.
> 
> So are there any consequences to simply doing a Finalize() and a new Init() to start up, or will I have to stop the whole process and start again?  I'm assuming that it should restart without prejudice.  I'm on 1.2.1 Windows/Linux releases.
> 
> Thanks
> 
> dave
> 

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

