[mpich-discuss] mpich2 error
daniel shawul
dshawul at yahoo.com
Wed Feb 8 13:14:41 CST 2012
Dear Jayesh
I just realized that what I wanted to do was an array job submission with qsub -t 1-68, which
passes a different task id to each execution of the script. I didn't really need to write code.
I will let you know if I have problems with mpich2.
thanks again
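
For readers who still want the program itself to know which array task it is running as, here is a minimal sketch. It assumes the batch system exports the task id in an environment variable. The names checked (SGE_TASK_ID for Grid Engine, PBS_ARRAYID for Torque) are assumptions, since the thread does not say which scheduler is in use, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

// Parse an array-task id from the text of an environment variable.
// Returns -1 when the text is missing, empty, or not a positive integer.
int parse_task_id(const char* text) {
    if (text == nullptr || *text == '\0') return -1;
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    assert(end != nullptr);  // strtol always sets the end pointer
    if (*end != '\0' || value < 1 || value > 1000000) return -1;
    return static_cast<int>(value);
}

// Look up the task id under variable names used by common schedulers.
// ASSUMPTION: which variable is set depends on the batch system;
// the thread does not name it.
int current_task_id() {
    const char* names[] = {"SGE_TASK_ID", "PBS_ARRAYID"};
    for (const char* name : names) {
        const int id = parse_task_id(std::getenv(name));
        if (id > 0) return id;
    }
    return -1;  // not running as an array task (or an unknown scheduler)
}
```

With a submission such as qsub -t 1-68, each of the 68 runs would see a different value from current_task_id() and could pick its own input from that.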
________________________________
From: Jayesh Krishna <jayesh at mcs.anl.gov>
To: daniel shawul <dshawul at yahoo.com>
Cc: mpich-discuss at mcs.anl.gov
Sent: Wednesday, February 8, 2012 11:35 AM
Subject: Re: [mpich-discuss] mpich2 error
Hi,
FWIW, I ran your test code (68 jobs) such that each job takes 20s and did not get any errors.
Let us know if you need any help.
Regards,
Jayesh
----- Original Message -----
From: "daniel shawul" <dshawul at yahoo.com>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>, mpich-discuss at mcs.anl.gov
Sent: Wednesday, February 8, 2012 8:40:21 AM
Subject: Re: [mpich-discuss] mpich2 error
Dear Jayesh
I have confirmed it runs correctly on a linux machine with openmpi.
So it is probably a problem with my setup of mpich2 on the windows machine.
thank you for your help
From: Jayesh Krishna <jayesh at mcs.anl.gov>
To: daniel shawul <dshawul at yahoo.com>; mpich-discuss at mcs.anl.gov
Sent: Tuesday, February 7, 2012 1:52 PM
Subject: Re: [mpich-discuss] mpich2 error
Hi,
Your dummy code (I would recommend sending us a working test code along with the skeleton next time - it saves us a lot of time) worked for me (with NTOTAL = 68 and dummy work functions). Please debug your code further to make sure that there are no bugs in it (I would recommend looking into the work funcs & the "command").
(PS: Also make sure you use the latest stable release of MPICH2)
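
One way to follow the suggestion about the work funcs and the "command" is to have the worker log any failure of the command it launches. The sketch below is hypothetical: the original work() is not shown in the thread, so the function signatures and the way the job number is passed to the command are assumptions made only for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <string>

// Build the command line for one job.
// ASSUMPTION: the job number is passed as the only argument; the
// original work() is not shown, so this format is illustrative.
std::string build_command(const std::string& command, int job) {
    assert(job >= 1);  // the posted code numbers jobs from 1 and uses tag 0 to stop
    return command + " " + std::to_string(job);
}

// Hypothetical stand-in for work(): run the command and report failures.
// The meaning of the value returned by std::system is platform-specific,
// so it is only compared against zero and printed as-is.
int run_job(const std::string& command, int myid, int job) {
    const std::string line = build_command(command, job);
    const int rc = std::system(line.c_str());
    if (rc != 0) {
        std::cerr << "Worker " << myid << ": \"" << line
                  << "\" returned " << rc << std::endl;
    }
    return rc;
}
```

Logging the exact command line next to the failure makes it easier to rerun a single failing job by hand, outside mpiexec.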
Regards,
Jayesh
----- Original Message -----
From: "daniel shawul" < dshawul at yahoo.com >
To: mpich-discuss at mcs.anl.gov
Sent: Tuesday, February 7, 2012 9:18:38 AM
Subject: [mpich-discuss] mpich2 error
Hello ,
I am trying to schedule tasks from a batch file using a small MPI C program as a scheduler.
Processor 0 is the scheduler: it sends jobs to the other processors, checks when a job is finished, and sends
the idle processor back to work. Other than that it does no real work.
Using mpich2 the program works, but I sometimes get the error below when a job takes a long time to finish.
It looks like it could be related to a timeout. Thank you for any suggestions.
[quote]
E:\Alltests\solver\Projects\Release>mpiexec -n 2 test commands.bat 68
Process [Process [Worker 1 started problem 0
0/2] on cee-3624-ab52 : pid 118980
1/2] on cee-3624-ab52 : pid 120092
mytest\controls.txt
mytest\controlsp.txt
10 File(s) copied
1 file(s) copied.
[01:97888]..ERROR:Error while connecting to host, No connection could be made because the target machine actively refused it. (10061)
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(388):
MPID_Init(107).......: channel initialization failed
MPID_Init(371).......: PMI_Init returned -1
[/quote]
And the code is shown below
[code]
int main(int argc, char* argv[] ) {
int myid,nprocs,namelen,master;
char processor_name[MPI_MAX_PROCESSOR_NAME];
MPI_Request request;
MPI_Status status;
int NTOTAL;
int job;
/*command and number of times to execute it*/
command = argv[1];
NTOTAL = atoi(argv[2]);
/*
* Initialize MPI environment
*/
int res = MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
MPI_Get_processor_name(processor_name, &namelen);
cerr << "Process [" << myid << "/" << nprocs<< "] on "
<< processor_name << " : pid " << PID << endl;
cerr.flush();
master = 0;
nprocs--;
/*
* master
*/
if(myid == master) {
int r,sent,njobs;
/*
* Master sends slaves to work here
*/
sent = 0;
njobs = 0;
while(njobs < NTOTAL && sent < nprocs) {
sent++;
njobs++;
MPI_Send(&njobs,1,MPI_INT,sent,njobs,MPI_COMM_WORLD);
}
while(sent) {
/*
*Non-blocking receive to do housekeeping
*stuff in the meantime
*/
int flag = 0;
MPI_Irecv(&r,1,MPI_INT,MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,&request);
MPI_Test(&request, &flag, &status);
double t1,t2;
t1 = MPI_Wtime();
while (!flag) {
SLEEP(1000);
t2 = MPI_Wtime();
if(t2 - t1 >= update) {
cout << "Progress " << njobs << "/" << NTOTAL << " completed." << endl;
workProgress();
t1 = t2;
}
MPI_Test(&request, &flag, &status);
}
/*We got an idle processor now*/
if(njobs < NTOTAL) {
njobs++;
MPI_Send(&njobs,1,MPI_INT,r,njobs,MPI_COMM_WORLD);
} else {
MPI_Send(MPI_BOTTOM,0,MPI_INT,r,0,MPI_COMM_WORLD);
sent--;
}
}
cout << "Work finished" << endl;
workProgress();
}
/*
* Slave processors pick up jobs here
*/
else {
while(true) {
MPI_Recv(&job,1,MPI_INT,master,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
if(status.MPI_TAG == 0) {
break;
} else {
work(myid,job);
MPI_Send(&myid,1,MPI_INT,master,status.MPI_TAG,MPI_COMM_WORLD);
}
}
}
MPI_Finalize();
return 0;
}
[/code]
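
The master's bookkeeping in the code above can be checked without MPI. The following is a plain C++ simulation, not the posted program: a queue stands in for the MPI_Irecv/MPI_Test completions, workers are assumed to finish in the order they were given work, and the names DispatchResult and simulate_dispatch are hypothetical.

```cpp
#include <cassert>
#include <queue>
#include <vector>

struct DispatchResult {
    std::vector<int> jobs;  // jobs[r] = jobs handed to rank r (index 0 is the master)
    int stopped;            // number of workers that were sent the stop message
};

// Mirror the master's counters (sent, njobs) from the posted code.
// ASSUMPTION: workers finish in FIFO order, which real jobs need not do.
DispatchResult simulate_dispatch(int ntotal, int nworkers) {
    assert(ntotal >= 0 && nworkers >= 1);
    DispatchResult result{std::vector<int>(nworkers + 1, 0), 0};
    std::queue<int> busy;  // ranks currently working
    int sent = 0, njobs = 0;

    // Initial round: one job per worker until jobs or workers run out.
    while (njobs < ntotal && sent < nworkers) {
        ++sent;
        ++njobs;
        ++result.jobs[sent];
        busy.push(sent);
    }
    // Each completion gets either the next job or the stop message (tag 0).
    while (sent > 0) {
        const int r = busy.front();
        busy.pop();
        if (njobs < ntotal) {
            ++njobs;
            ++result.jobs[r];
            busy.push(r);
        } else {
            ++result.stopped;
            --sent;
        }
    }
    return result;
}
```

For the run in this thread (68 jobs, one worker) the simulation hands all 68 jobs to rank 1 and then stops it. It also shows that when NTOTAL is smaller than the number of workers, the unused workers are never sent a stop message in this scheme, so in the MPI program they would stay blocked in MPI_Recv.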