[mpich-discuss] mpich2 error

Jayesh Krishna jayesh at mcs.anl.gov
Tue Feb 7 12:52:23 CST 2012


Hi,
 Your dummy code worked for me (with NTOTAL = 68 and dummy work functions). Next time, I would recommend sending us a working test code along with the skeleton; it saves us a lot of time. Please debug your code further to make sure that it has no bugs (I would recommend looking into the work funcs & the "command").
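
For anyone reproducing the test, dummy work functions of the kind mentioned above might look like the sketch below. The names work(myid, job) and workProgress() come from the posted skeleton; the bodies, the g_completed counter, the 10 ms sleep and the int return value are assumptions made for illustration, since the real functions were not posted.

```cpp
#include <chrono>
#include <iostream>
#include <thread>

// Hypothetical counter of jobs finished by this process.
static int g_completed = 0;

// Stand-in for work(): pretends to run a job by sleeping briefly.
void work(int myid, int job) {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    ++g_completed;
    std::cerr << "Worker " << myid << " finished job " << job << std::endl;
}

// Stand-in for workProgress(): prints and returns the local count.
int workProgress() {
    std::cout << g_completed << " job(s) finished locally" << std::endl;
    return g_completed;
}
```

If the scheduler runs cleanly with stubs like these, the remaining suspects are the real work functions and the command they launch.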

(PS: Also make sure you use the latest stable release of MPICH2)
Regards,
Jayesh

----- Original Message -----
From: "daniel shawul" <dshawul at yahoo.com>
To: mpich-discuss at mcs.anl.gov
Sent: Tuesday, February 7, 2012 9:18:38 AM
Subject: [mpich-discuss] mpich2 error




Hello,
I am trying to schedule tasks from a batch file using a small MPI C program as a scheduler.
Processor 0 is the scheduler: it sends jobs to the others, checks when a job has finished, and sends
the idle processor back to work. Other than that, it does no real work.
With mpich2 the program works, but I sometimes get the error below when a job takes a long time to finish.
It seems to me that it could be related to a timeout. Thank you for any suggestions.
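
The scheduling policy described above can be sketched without MPI as plain bookkeeping. This is only a simulation under stated assumptions: simulate_dispatch is a hypothetical name, no messages are sent, and workers are assumed to finish in the order they were started, which the real program does not guarantee.

```cpp
#include <deque>
#include <vector>

// Simulates the master's bookkeeping only; no MPI calls are made.
// Returns the number of jobs given to each worker (ranks 1..nworkers;
// index 0 is the master and always stays 0).
std::vector<int> simulate_dispatch(int ntotal, int nworkers) {
    std::vector<int> jobs_per_worker(nworkers + 1, 0);
    std::deque<int> busy;  // workers running a job, in assumed finish order
    int njobs = 0;         // jobs handed out so far

    // Initial round: one job per worker until jobs or workers run out.
    for (int w = 1; w <= nworkers && njobs < ntotal; ++w) {
        ++njobs;
        ++jobs_per_worker[w];
        busy.push_back(w);
    }

    // Each time a worker reports back, give it the next job or retire it.
    while (!busy.empty()) {
        int w = busy.front();
        busy.pop_front();
        if (njobs < ntotal) {
            ++njobs;
            ++jobs_per_worker[w];
            busy.push_back(w);
        }
        // Otherwise the worker would be sent the stop message here.
    }
    return jobs_per_worker;
}
```

Calling simulate_dispatch(68, 1) models a run with one scheduler and one worker: the single worker ends up with all 68 jobs.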


[quote] 


E:\Alltests\solver\Projects\Release>mpiexec -n 2 test commands.bat 68 
Process [Process [Worker 1 started problem 0 
0/2] on cee-3624-ab52 : pid 118980 
1/2] on cee-3624-ab52 : pid 120092 
mytest\controls.txt 
mytest\controlsp.txt 
10 File(s) copied 

1 file(s) copied. 
[01:97888]..ERROR:Error while connecting to host, No connection could be made because the target machine actively refused it. (10061) 
Fatal error in MPI_Init: Other MPI error, error stack: 
MPIR_Init_thread(388): 
MPID_Init(107).......: channel initialization failed 
MPID_Init(371).......: PMI_Init returned -1 
[/quote] 



The code is shown below:


[code] 

#include <mpi.h>
#include <cstdlib>
#include <iostream>
using namespace std;

/*
 * This is a skeleton: command, update, PID, SLEEP, work() and
 * workProgress() are not declared here.
 */
int main(int argc, char* argv[]) {
    int myid, nprocs, namelen, master;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    MPI_Request request;
    MPI_Status status;
    int NTOTAL;
    int job;

    /* command and number of times to execute it */
    command = argv[1];
    NTOTAL = atoi(argv[2]);

    /*
     * Initialize MPI environment
     */
    int res = MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Get_processor_name(processor_name, &namelen);
    cerr << "Process [" << myid << "/" << nprocs << "] on "
         << processor_name << " : pid " << PID << endl;
    cerr.flush();
    master = 0;
    nprocs--; /* number of workers, excluding the master */

    /*
     * master
     */
    if (myid == master) {
        int r, sent, njobs;
        /*
         * Master sends slaves to work here
         */
        sent = 0;
        njobs = 0;
        while (njobs < NTOTAL && sent < nprocs) {
            sent++;
            njobs++;
            MPI_Send(&njobs, 1, MPI_INT, sent, njobs, MPI_COMM_WORLD);
        }
        while (sent) {
            /*
             * Non-blocking receive, so that housekeeping
             * can be done in the meantime
             */
            int flag = 0;
            MPI_Irecv(&r, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &request);
            MPI_Test(&request, &flag, &status);
            double t1, t2;
            t1 = MPI_Wtime();
            while (!flag) {
                SLEEP(1000);
                t2 = MPI_Wtime();
                if (t2 - t1 >= update) {
                    cout << "Progress " << njobs << "/" << NTOTAL << " completed." << endl;
                    workProgress();
                    t1 = t2;
                }
                MPI_Test(&request, &flag, &status);
            }
            /* We got an idle processor now; r holds its rank */
            if (njobs < NTOTAL) {
                njobs++;
                MPI_Send(&njobs, 1, MPI_INT, r, njobs, MPI_COMM_WORLD);
            } else {
                /* No jobs left: tag 0 tells the worker to stop */
                MPI_Send(MPI_BOTTOM, 0, MPI_INT, r, 0, MPI_COMM_WORLD);
                sent--;
            }
        }
        cout << "Work finished" << endl;
        workProgress();
    }
    /*
     * Slave processors pick up jobs here
     */
    else {
        while (true) {
            MPI_Recv(&job, 1, MPI_INT, master, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
            if (status.MPI_TAG == 0) {
                break; /* stop signal from the master */
            } else {
                work(myid, job);
                /* Report back with our rank so the master knows who is idle */
                MPI_Send(&myid, 1, MPI_INT, master, status.MPI_TAG, MPI_COMM_WORLD);
            }
        }
    }

    MPI_Finalize();

    return 0;
}


[/code] 
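
The skeleton above reads argv[1] and argv[2] without checking argc, so a missing or malformed argument goes unnoticed. One way to guard against that is sketched below; Options and parse_options are hypothetical names, not part of the posted program, and rejecting non-positive counts is an assumption about what the scheduler should accept.

```cpp
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical container for the two command-line arguments.
struct Options {
    std::string command;  // command to execute (argv[1])
    int ntotal;           // number of times to execute it (argv[2])
};

// Returns true and fills `out` only when both arguments are present
// and the count is a positive integer that fits in an int.
bool parse_options(int argc, const char* const argv[], Options& out) {
    if (argc < 3) {
        return false;
    }
    char* end = nullptr;
    long n = std::strtol(argv[2], &end, 10);
    if (end == argv[2] || *end != '\0' || n <= 0 || n > INT_MAX) {
        return false;
    }
    out.command = argv[1];
    out.ntotal = static_cast<int>(n);
    return true;
}
```

A caller could print a usage message and return early when parse_options reports failure, instead of passing a null pointer to atoi.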

