[mpich-discuss] Suggestions solicited

Thu Dec 4 10:52:05 CST 2008

My topology is as follows: I have my MPI cluster on a pile of Linux workstations, with network access from an AIX server and from a bunch of Windows workstations.  Right now, to trigger a run, I have to rlogin to the node 0 Linux box and run mpiexec.

My plan is to convert the parallel application into a "service" that sits in a blocking receive, waiting for a message with the material calculation details for the next set of calculations.
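
As a rough illustration of that "service" shape only, the loop could look like the sketch below. It assumes a Python queue.Queue standing in for the blocking MPI receive; the names service_loop, requests and results, the None shutdown sentinel, and the sum() placeholder calculation are all hypothetical and not part of the actual application.

```python
import queue
import threading


def service_loop(requests, results):
    """Block until a request arrives, compute, store the result, repeat."""
    while True:
        job = requests.get()            # blocks here, like a blocking receive
        if job is None:                 # hypothetical shutdown sentinel
            break
        job_id, values = job
        results[job_id] = sum(values)   # placeholder for the real calculation


requests = queue.Queue()
results = {}
worker = threading.Thread(target=service_loop, args=(requests, results))
worker.start()

# Two requests arrive, then the service is told to stop.
requests.put((1, [1.0, 2.0, 3.0]))
requests.put((2, [10.0, 20.0]))
requests.put(None)
worker.join()
print(results)
```

The point of the sketch is only that the service stays resident between runs and idles in the receive call rather than being relaunched for each request.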

The idea would be to let the cluster receive requests from any of the Windows workstations.

My first idea was to have a workstation become part of the cluster in a special communicator: that is, start a process, call MPI::Init, send a message to node 0 with the calculation data, call MPI::Finalize, and cease execution.  Later it would return, repeat the steps, and retrieve the results.  But it was pointed out to me that the only official method that can be called after MPI::Finalize has run is MPI::Is_finalized, so I wondered whether the following is possible.

If the process on the workstation that I propose to be volatile runs and terminates, and a new process later comes back to request the results, is that within the bounds of the standard and a supported approach?
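
To make the submit-now, collect-later flow concrete, here is a minimal sketch under stated assumptions. It says nothing about what the MPI standard permits. It only shows that the two short-lived clients need to share nothing except a ticket number, with all other state held by the long-running service. JobStore, submit, run_pending and fetch are hypothetical names, and sum() again stands in for the real calculation.

```python
import itertools


class JobStore:
    """In-memory stand-in for the state kept by the long-running service."""

    def __init__(self):
        self._tickets = itertools.count(1)
        self._pending = {}   # ticket -> input data not yet processed
        self._done = {}      # ticket -> finished result

    def submit(self, values):
        """Called by the first short-lived client; returns a ticket."""
        ticket = next(self._tickets)
        self._pending[ticket] = list(values)
        return ticket

    def run_pending(self):
        """Called by the service to work through queued requests."""
        for ticket, values in list(self._pending.items()):
            self._done[ticket] = sum(values)   # placeholder calculation
            del self._pending[ticket]

    def fetch(self, ticket):
        """Called by a later client; None means the job is not finished."""
        return self._done.get(ticket)


store = JobStore()
ticket = store.submit([1.5, 2.5])   # first client submits, then exits
early = store.fetch(ticket)         # a client asking too soon gets None
store.run_pending()                 # the service does the work
late = store.fetch(ticket)          # a new client returns with the ticket
print(ticket, early, late)
```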

Also, as long as the hardware that the Linux and Windows workstations run on has the same endianness, would this still be considered a homogeneous MPI cluster, or is there any issue between Windows and Linux?
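
One quick way to compare the machines is to print each platform's byte order and native type sizes, as in the sketch below (describe_platform is a hypothetical helper name). Byte order is not the only property that can differ: for example, a C long is 4 bytes on 64-bit Windows but 8 bytes on 64-bit Linux. Whether a given MPI implementation treats such a mix as homogeneous is not something this sketch answers.

```python
import struct
import sys


def describe_platform():
    """Report byte order and a few native C type sizes for this machine."""
    return {
        "byteorder": sys.byteorder,               # 'little' or 'big'
        "long_size": struct.calcsize("l"),        # native C long, in bytes
        "pointer_size": struct.calcsize("P"),     # native pointer, in bytes
        "double_size": struct.calcsize("d"),      # native C double, in bytes
    }


# The same 32-bit integer laid out in each byte order.
little = struct.pack("<I", 1)   # least significant byte first
big = struct.pack(">I", 1)      # most significant byte first

print(describe_platform())
print(little, big)
```

Running this on one Linux node and one Windows workstation and comparing the output would show whether the two agree on these basics.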

How, in general, do others approach this, that is, allowing workstations to send in new requests for runs and later come back to retrieve the results?

If you lived here you'd be home by now
Dave Hiatt
Manager, Market Risk Systems Integration
CitiMortgage, Inc.
1000 Technology Dr.
Third Floor East, M.S. 55
O'Fallon, MO 63368-2240

Phone:	636-261-1408
Mobile:	314-452-9165
FAX:	636-261-1312
Email:     Dave.M.Hiatt at citigroup.com