[mpich-discuss] Recovering from a Bcast Timeout

Hiatt, Dave M dave.m.hiatt at citi.com
Mon Jan 4 17:11:08 CST 2010


A general question for those in the know.  From time to time I get a Bcast timeout error.  I'm putting in an error handler to "catch" this exception (C++).  My question is: will an MPI::Finalize() followed by an MPI::Init() work from the same process?  This error is being caused by our deficient network; we've never lost a blade, and after considerable investigation I'm confident both the app and MPI are functioning properly.

So are there any consequences to simply doing a Finalize() and a new Init() to start up again, or will I have to stop the whole process and start over?  I'm assuming that it should restart without prejudice.  I'm on the 1.2.1 releases for Windows and Linux.
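
For illustration only, the "catch" described above might be shaped roughly like the sketch below. It is not the actual application code and it does not link against MPI. BcastTimeout, BcastResult and guarded_bcast are hypothetical names, and the broadcast itself is passed in as a callable so the example stays self-contained. With the MPI C++ bindings, the thrown type would be MPI::Exception, which is raised only after MPI::ERRORS_THROW_EXCEPTIONS has been installed on the communicator.

```cpp
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the exception a failed broadcast would raise.
// With the MPI C++ bindings this role is played by MPI::Exception once
// MPI::ERRORS_THROW_EXCEPTIONS is installed on the communicator.
struct BcastTimeout : std::runtime_error {
    explicit BcastTimeout(const std::string& what) : std::runtime_error(what) {}
};

// Outcome of one guarded broadcast attempt.
struct BcastResult {
    bool ok;
    std::string message;
};

// Run the broadcast callable and turn a thrown timeout into a result the
// caller can inspect, instead of letting it terminate the process.
BcastResult guarded_bcast(const std::function<void()>& do_bcast) {
    try {
        do_bcast();
        return {true, ""};
    } catch (const BcastTimeout& e) {
        std::cerr << "Bcast failed: " << e.what() << '\n';
        return {false, e.what()};
    }
}
```

The caller would wrap the real broadcast in the callable and decide what to do when ok is false; whether the process can then carry on with MPI at all is exactly the open question here.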

Thanks

dave


"Consequences, Schmonsequences, as long as I'm rich". - Daffy Duck
Dave Hiatt
Market Risk Systems Integration
CitiMortgage, Inc.
1000 Technology Dr.
Third Floor East, M.S. 55
O'Fallon, MO 63368-2240

Phone:  636-261-1408
Mobile: 314-452-9165
FAX:    636-261-1312
Email:     Dave.M.Hiatt at citigroup.com





