[mpich-discuss] MPI Cluster Hangs
Jayesh Krishna
jayesh at mcs.anl.gov
Fri Jan 15 09:51:01 CST 2010
Hi,
Can you send us a test program that shows the problem?
Regards,
Jayesh
----- Original Message -----
From: "abhishek pandey" <hipandey at gmail.com>
To: mpich-discuss at mcs.anl.gov
Sent: Friday, January 15, 2010 8:57:15 AM GMT -06:00 US/Canada Central
Subject: [mpich-discuss] MPI Cluster Hangs
Hi,
I am using MPI for communication in a cluster organized as one controller with workers. There is one controller and 5 workers, and all of the workers are spawned by the controller.
Most of the time the communication between the controller and the workers works fine, but sometimes a worker hangs. I am running the cluster on Windows and my program is multi-threaded.
The flow is as follows:
Worker:
1. The worker posts an Irecv request to receive the message from the controller.
2. The worker sends a (blocking) message to the controller asking for data; the reply is received into the buffer posted in step 1.
3. The worker tests the Irecv request. If the request has not completed yet, the worker sleeps for some time and then tests again.
Controller:
1. The controller receives the message from the worker and sends a (blocking) message back to the worker.
The controller does send the message to the worker, and this message is not lost. But sometimes the Irecv request posted by the worker never completes, and the worker hangs.
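The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual program: it uses Python with mpi4py-style calls (irecv, send, recv, Request.test) instead of the original language, an existing communicator instead of spawned workers, and a single thread. The tag values, the function names, the (rank, payload) message layout and the timeout are all hypothetical. The timeout is added so that a missing reply shows up as an error instead of an endless wait.

```python
import time

REQUEST_TAG = 1  # hypothetical tag: worker -> controller
REPLY_TAG = 2    # hypothetical tag: controller -> worker


def worker_exchange(comm, rank, controller, payload,
                    poll_interval=0.05, timeout=10.0):
    """Worker steps 1-3: post the receive, send the request, poll for the reply."""
    # Step 1: post the nonblocking receive before asking for data.
    reply_req = comm.irecv(source=controller, tag=REPLY_TAG)
    # Step 2: blocking send of the request; the rank tells the controller
    # where to send the reply (assumed message layout).
    comm.send((rank, payload), dest=controller, tag=REQUEST_TAG)
    # Step 3: test the request, sleeping between tests.
    deadline = time.monotonic() + timeout
    while True:
        done, reply = reply_req.test()
        if done:
            return reply
        if time.monotonic() >= deadline:
            # Report the hang instead of polling forever.
            raise TimeoutError("no reply from controller within %.1f s" % timeout)
        time.sleep(poll_interval)


def controller_serve_one(comm, any_source, make_reply):
    """Controller step 1: receive one request and send a blocking reply."""
    sender, payload = comm.recv(source=any_source, tag=REQUEST_TAG)
    comm.send(make_reply(payload), dest=sender, tag=REPLY_TAG)
    return sender


def main():
    # Requires mpi4py and an MPI launcher; imported here so the helper
    # functions above can be read and exercised without MPI installed.
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    if rank == 0:
        # Rank 0 plays the controller and answers each worker once.
        for _ in range(comm.Get_size() - 1):
            controller_serve_one(comm, MPI.ANY_SOURCE, lambda p: "data for " + p)
    else:
        print(rank, worker_exchange(comm, rank, 0, "job request"))
```

To try it, call main() from a script started under an MPI launcher with one controller and five workers, i.e. six processes. Using distinct tags for requests and replies is a design choice of this sketch: it keeps the posted Irecv from matching any other message the controller might send to the same worker.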
Any thoughts on this?
Thanks,
Abhishek
_______________________________________________
mpich-discuss mailing list
mpich-discuss at mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss