[mpich-discuss] Problem in using wmpiexec.exe in Windows XP

jayesh at mcs.anl.gov
Thu Jun 3 15:12:20 CDT 2010


Hi,
 I am assuming the two email ids (timothy.leblanc & lhy2008lx) are associated with the same discussion.
 The DNS resolution could be taking a long time. Can you ping the machines from each other using their IP addresses? Can you also try running your MPI job with the machines' IP addresses specified and measure the timings?
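A quick way to test the DNS hypothesis outside MPI is to time a single name lookup on each machine. The sketch below is a hypothetical helper (`time_lookup` is not an MPICH2 function) written against POSIX `getaddrinfo()`; on Windows XP the same call comes from `<ws2tcpip.h>` and requires `WSAStartup()` first.

```cpp
#include <chrono>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

// Hypothetical helper (not part of MPICH2): times one forward name lookup.
// Assumes POSIX getaddrinfo(); on Windows use <ws2tcpip.h> after WSAStartup().
// Returns the elapsed seconds; `ok` is set to whether the lookup succeeded.
double time_lookup(const char* host, bool& ok) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // accept IPv4 or IPv6 results
    hints.ai_socktype = SOCK_STREAM;  // one entry per address, not per socket type
    addrinfo* result = nullptr;

    auto start = std::chrono::steady_clock::now();
    int rc = getaddrinfo(host, nullptr, &hints, &result);
    auto stop = std::chrono::steady_clock::now();

    ok = (rc == 0);
    if (result != nullptr) {
        freeaddrinfo(result);  // release the list getaddrinfo() allocated
    }
    return std::chrono::duration<double>(stop - start).count();
}
```

Comparing `time_lookup("mars", ok)` with a lookup of mars's IP address on each Windows machine would show whether name resolution accounts for the delay; a numeric address is parsed locally and should return almost immediately.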

Regards,
Jayesh
----- Original Message -----
From: "Timothy LeBlanc" <timothy.leblanc at gmail.com>
To: mpich-discuss at mcs.anl.gov
Sent: Thursday, June 3, 2010 11:48:43 AM GMT -06:00 US/Canada Central
Subject: Re: [mpich-discuss] Problem in using wmpiexec.exe in Windows XP


Hi 

Thanks for getting back to me. 

On my Solaris 10 box (hostname mars) I can ping all other machines, using either IP addresses or hostnames. 

From all other machines I can ping mars (the Solaris 10 box) by IP address only. 

During configuration of Solaris I let my server assign a DHCP address to mars, and I did not register the name with my DNS server. If you believe this is a problem, I can configure it with a static address. 

Thanks 
Tim 






On Thu, Jun 3, 2010 at 11:47 AM, < jayesh at mcs.anl.gov > wrote: 


Hi, 
Can you try pinging the hosts from each other (from host1: ping host2; from host2: ping host1) and let us know the results? 

Regards, 
Jayesh 
----- Original Message ----- 
From: "lhy stony" < lhy2008lx at gmail.com > 
To: jayesh at mcs.anl.gov 
Sent: Wednesday, June 2, 2010 8:17:58 PM GMT -06:00 US/Canada Central 
Subject: Re: [mpich-discuss] Problem in using wmpiexec.exe in Windows XP 



Hi, 
I have already done what you said. The mpiexec command line is: 
mpiexec.exe -l -env -channel nemesis -hosts 2 rs dgj -noprompt C:\MPI\DataTransport.exe 
In my 100Mbps network, the result is: 
[1]Start time: (min:sec:msec)=(50:16:781) 
[0]Start time: (min:sec:msec)=(44:42:765) 
[0]End time: (min:sec:msec)=(44:47:937) 
[1]Time for Receiving Data in Process1 is: 5.164633! 
[1]Transmission Speed is :9.487605 M byte/s! 
[1]End time: (min:sec:msec)=(50:21:953) 
Everything goes well. 

In my 1Gbps network, the result is: 
[1]Start time: (min:sec:msec)=(3:19:890) 
[0]Start time: (min:sec:msec)=(57:46:31) 
[0]End time: (min:sec:msec)=(57:46:640) 
[1]Time for Receiving Data in Process1 is: 0.606684! 
[1]Transmission Speed is :80.766892 M byte/s! 
[1]End time: (min:sec:msec)=(3:20:500) 

However, before my program can print 
[1]Start time: (min:sec:msec)=(3:19:890) 
[0]Start time: (min:sec:msec)=(57:46:31) 
there is still some waiting time. It seems that the program stops for a while (about 30 seconds) before it really starts. I have no idea why this happens. 
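Because each rank prints its own GetSystemTime() stamps and the two machines' clocks evidently disagree (rank 0 and rank 1 start minutes apart), only stamps from the same rank can be meaningfully subtracted. Below is a minimal sketch of that arithmetic; `Stamp` and `elapsed_ms` are hypothetical names, and it assumes less than an hour elapses because the stamps carry no hour field. The program prints milliseconds with %d, so "46:31" means 46 s and 31 ms.

```cpp
// Hypothetical helper: one (min:sec:msec) stamp as printed by print_time().
// Milliseconds are printed with %d, so "46:31" means 46 s and 31 ms.
struct Stamp {
    int min;
    int sec;
    int msec;
};

// Milliseconds from `start` to `end`, both taken on the SAME machine.
// Assumes less than one hour elapsed, since the stamps have no hour field.
long elapsed_ms(Stamp start, Stamp end) {
    long a = (start.min * 60L + start.sec) * 1000L + start.msec;
    long b = (end.min * 60L + end.sec) * 1000L + end.msec;
    if (b < a) {
        b += 60L * 60L * 1000L;  // minute counter wrapped past the hour
    }
    return b - a;
}
```

Applied to the 1Gbps run above, rank 1 spends 610 ms and rank 0 609 ms between their "Start time" and "End time" lines. Since "Start time" is printed before MPI_Init, this suggests the roughly 30 s wait falls outside that span (for example during process launch) rather than inside MPI_Init or the transfer itself.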

regards 

stonylhy 


2010/5/24 < jayesh at mcs.anl.gov > 


Hi, 
Did you try the nemesis channel (mpiexec -n 2 -channel nemesis MYPROGRAM.exe)? What options are you using to run your job (copy-paste the mpiexec command line in your email)? 
I have modified your code slightly to include time measurement for the entire process. Please compile/run the code below with the "-l" option for nemesis channel (mpiexec -l -n 2 -channel nemesis -machinefile mf.txt MYPROGRAM.exe) 

============================================================================ 
#include <stdio.h>
#include <windows.h>
#include "mpi.h"

/* Print the current UTC time (from GetSystemTime) as (min:sec:msec) */
void print_time(void){
SYSTEMTIME stime;

ZeroMemory(&stime, sizeof(SYSTEMTIME));
GetSystemTime(&stime);
printf("\t(min:sec:msec)=(%d:%d:%d)\n", stime.wMinute, stime.wSecond, stime.wMilliseconds); fflush(stdout);
}

int main(int argc, char** argv)
{
int myid,numprocs;
int namelen;
char processor_name[MPI_MAX_PROCESSOR_NAME];
/* Print the start time before MPI_Init - use the "-l" option to label output with ranks */
printf("Start time :"); print_time();

MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Get_processor_name(processor_name,&namelen);

double TimeStart, TimeEnd;
int nWidth, nHeight;
nWidth = 7000;
nHeight = 7000;

MPI_Request rRequest;
MPI_Request rRequest1;
BYTE *Data_Send;
BYTE *Data_Rec;

if ( myid == 0 )
{
/* Buffer contents do not matter for the timing */
Data_Send = new BYTE[ nHeight * nWidth];
TimeStart = MPI_Wtime();

//Send
MPI_Isend( Data_Send, nHeight * nWidth, MPI_BYTE,
1, 1, MPI_COMM_WORLD, &rRequest1 );

MPI_Status status1;
MPI_Wait( &rRequest1, &status1 );
TimeEnd = MPI_Wtime();
delete[] Data_Send; /* array new requires array delete */

}
else
{
Data_Rec = new BYTE[ nHeight * nWidth];
TimeStart = MPI_Wtime();
MPI_Irecv( Data_Rec, nHeight * nWidth, MPI_BYTE,
0, 1, MPI_COMM_WORLD, &rRequest);

MPI_Status status2;
MPI_Wait( &rRequest, &status2 );
TimeEnd = MPI_Wtime();
printf( "Time for Receiving Data in Process%d is: %f!\n", myid, TimeEnd - TimeStart );
printf( "Transmission Speed is :%f M byte/s!\n\n", nHeight * nWidth * sizeof(BYTE)/(TimeEnd - TimeStart) / 1000000.0 );

delete[] Data_Rec; /* array new requires array delete */
}
MPI_Finalize();
/* Print the end time after MPI_Finalize - use the "-l" option to label output with ranks */
printf("End time :"); print_time();
return 0;
}
============================================================================ 

I would recommend running a benchmark program like the OSU bandwidth micro benchmark to measure the bandwidth ( http://mvapich.cse.ohio-state.edu/benchmarks/ - The benchmark does some warmup steps before measuring the bandwidth etc). 
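The warm-up idea mentioned above can be sketched without MPI: run the transfer a few times untimed, then time a batch of repetitions. `bandwidth_mb_per_s` is a hypothetical helper, not code from the OSU suite, and it reports MB/s as 10^6 bytes per second to match the test program's "M byte/s".

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper illustrating warm-up before measurement: `transfer`
// is called `warmup` times untimed (first-call costs such as page faults or
// connection setup, if any, land here), then `iterations` times under the clock.
// Returns throughput in MB/s, where 1 MB = 10^6 bytes.
double bandwidth_mb_per_s(std::size_t bytes_per_call, int warmup, int iterations,
                          const std::function<void()>& transfer) {
    for (int i = 0; i < warmup; ++i) {
        transfer();  // untimed warm-up calls
    }

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        transfer();
    }
    auto stop = std::chrono::steady_clock::now();

    double seconds = std::chrono::duration<double>(stop - start).count();
    if (seconds <= 0.0) {
        return 0.0;  // clock too coarse to measure; avoid dividing by zero
    }
    return static_cast<double>(bytes_per_call) * iterations / seconds / 1e6;
}
```

In an MPI program, `transfer` could wrap the MPI_Isend/MPI_Wait pair on rank 0 and the matching MPI_Irecv/MPI_Wait pair on rank 1, with both ranks using the same warm-up and iteration counts. The OSU bandwidth test additionally keeps a window of several outstanding messages in flight, which this sketch omits.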
Let us know the results. 


Regards, 
Jayesh 
----- Original Message ----- 
From: "lhy stony" < lhy2008lx at gmail.com > 



To: "Jayesh Krishna" < jayesh at mcs.anl.gov > 
Sent: Friday, May 21, 2010 2:08:36 AM GMT -06:00 US/Canada Central 
Subject: Re: [mpich-discuss] Problem in using wmpiexec.exe in Windows XP 



Hi, 
I upgraded MPICH2 to 1.2.1p1 and also tried running the program from the command line, but the problem still exists. I measured the "initializing time" on my 1Gbps network and found that it is about 30 seconds. 
To illustrate the problem, here is the code of my simple program. 
#include <stdio.h>
#include <windows.h> /* BYTE and other Win32 types */
#include "mpi.h"

int main(int argc, char** argv)
{ 
int myid,numprocs; 
int namelen; 
char processor_name[MPI_MAX_PROCESSOR_NAME]; 
MPI_Init(&argc,&argv); 
MPI_Comm_rank(MPI_COMM_WORLD,&myid); 
MPI_Comm_size(MPI_COMM_WORLD,&numprocs); 
MPI_Get_processor_name(processor_name,&namelen); 

double TimeStart, TimeEnd; 
int nWidth, nHeight; 
nWidth = 7000; 
nHeight = 7000; 

MPI_Request rRequest; 
MPI_Request rRequest1; 
BYTE *Data_Send ; 
BYTE *Data_Rec; 

BOOL bIfOver = FALSE; 
if ( myid == 0 ) 
{ 
TimeStart = MPI_Wtime(); 
Data_Send= new BYTE[ nHeight * nWidth]; 
//Send 
MPI_Isend( Data_Send, nHeight * nWidth, MPI_BYTE, 
1, 1, MPI_COMM_WORLD, &rRequest1 ); 

MPI_Status status1; 
MPI_Wait( &rRequest1, &status1 ); 
TimeEnd = MPI_Wtime(); 
delete[] Data_Send; /* array new requires array delete */

} 
else 
{ 
Data_Rec = new BYTE[ nHeight * nWidth]; 
TimeStart = MPI_Wtime(); 
MPI_Irecv( Data_Rec, nHeight * nWidth, MPI_BYTE, 
0, 1, MPI_COMM_WORLD, &rRequest); 

MPI_Status status2; 
MPI_Wait( &rRequest, &status2 ); 
TimeEnd = MPI_Wtime(); 
printf( "Time for Receiving Data in Process%d is: %f!\n", myid, TimeEnd - TimeStart ); 
printf( "Transmission Speed is :%f M byte/s!\n\n", nHeight * nWidth * sizeof(BYTE)/(TimeEnd - TimeStart) / 1000000.0 ); 

delete[] Data_Rec; /* array new requires array delete */
} 
MPI_Finalize(); 
return 0;
} 

The output is: 
Time for Receiving Data in Process1 is: 1.146748! 
Transmission Speed is :42.729512 M byte/s! 

But by my stopwatch, the time from starting the program to the output being printed is almost 32 seconds! 
Therefore, I think that the "initializing time" on my 1Gbps network is about 30 seconds. 
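Instead of a stopwatch, the end-to-end time (launch, MPI_Init, run and teardown) can be measured by a small wrapper that starts the job and times it until it exits. `time_command` is a hypothetical helper built on `std::system`; the command string is whatever would be typed at the prompt.

```cpp
#include <chrono>
#include <cstdlib>

// Hypothetical stopwatch replacement: runs `command` through the system
// shell and returns the wall-clock seconds until it exits. This includes
// process launch and teardown, unlike MPI_Wtime taken inside the program.
// `status` receives the value returned by std::system.
double time_command(const char* command, int& status) {
    auto start = std::chrono::steady_clock::now();
    status = std::system(command);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

For example, `time_command("mpiexec -n 2 -channel nemesis -machinefile mf.txt MYMPIPGM.exe", status)` would return the full wall-clock time; subtracting the MPI_Wtime figure the program prints leaves the time spent outside the measured transfer.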
Why? I'm really confused. 

Besides, I have another question. 
When I increase the size of the data being sent, the transmission speed decreases. 
For example, when I set nWidth = 5000; nHeight = 5000; the output is : 
Time for Receiving Data in Process1 is: 0.396994! 
Transmission Speed is :60.055994 M byte/s! 

but when I set nWidth = 20000; nHeight = 20000; the output is: 
Time for Receiving Data in Process1 is: 14.101276! 
Transmission Speed is :27.052142 M byte/s! 
Why does the transmission speed decrease? 
I also tested the speed on my original 100Mbps network, and there the speed stays the same. 
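The "Transmission Speed" line is computed as bytes / seconds / 10^6. Below is a small sketch reproducing it, plus the fraction of the nominal link rate achieved; both function names are hypothetical.

```cpp
// Throughput as the test program prints it: bytes / seconds / 10^6,
// i.e. "M byte/s" means 10^6 bytes per second.
double mbyte_per_s(long long bytes, double seconds) {
    return static_cast<double>(bytes) / seconds / 1e6;
}

// Fraction of the nominal link rate actually achieved.
// `link_bits_per_s` is the advertised rate: 1e8 for 100Mbps, 1e9 for 1Gbps.
double link_utilisation(long long bytes, double seconds, double link_bits_per_s) {
    return (static_cast<double>(bytes) * 8.0 / seconds) / link_bits_per_s;
}
```

For the 7000 × 7000 runs at the top of the thread (49,000,000 bytes), 5.164633 s on 100Mbps is about 76% of the nominal rate and 0.606684 s on 1Gbps is about 65%; 1.146748 s reproduces the 42.73 M byte/s printed above.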

Regards 

stonylhy 


2010/5/21 Jayesh Krishna < jayesh at mcs.anl.gov > 


Hi, 
First of all, MPICH2 1.0.7 is old. You should upgrade to a newer version of MPICH2 ( http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads ). I am still confused about the timings mentioned in your email. For example, how does the 10 s measurement differ from the 2 s measurement for the 1Gbps network? 
Can you also try submitting your job from the command line and see if it helps (mpiexec -n 2 -machinefile mf.txt MYMPIPGM.exe). The latest stable version of MPICH2 should also have the newer nemesis channel that you might want to try out (mpiexec -n 2 -channel nemesis -machinefile mf.txt MYMPIPGM.exe). 


Regards, 
Jayesh 

----- Original Message ----- 
From: "lhy stony" < lhy2008lx at gmail.com > 



To: jayesh at mcs.anl.gov 
Sent: Thursday, May 20, 2010 10:51:37 AM GMT -06:00 US/Canada Central 
Subject: Re: [mpich-discuss] Problem in using wmpiexec.exe in Windows XP 


Hi, Jayesh 
The version of MPICH2 I am using is 1.0.7. I also changed the number of nodes and tried other MPI programs, but the problem still exists. 
To describe the problem clearly, I have done a simpler experiment. 
In this experiment, only two nodes are working. One sends an image whose size is 80M, and the other receives it. The receiving time is recorded and printed. On my 100Mbps network, the transmission takes nearly 10 seconds and the printed time agrees (10 seconds). On my 1Gbps network, the transmission actually takes almost 10 seconds (maybe more), but the printed time is less than 2 seconds, which is normal for a 1Gbps network. 
It seems that once the 1Gbps network gets going it works normally (the printed time shows that the transmission is indeed faster), but it must spend some time initializing. If so, isn't the 1Gbps network useless for MPI programs? I wonder whether I forgot to set up some necessary configuration after changing the network. 
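As a sanity check on these figures, the wire time has a lower bound set by the link rate, ignoring protocol overhead. The worked equation below assumes "80M" means 80 × 10^6 bytes; the message does not say whether M is 10^6 or 2^20.

```latex
t_{\min} = \frac{8B}{R}, \qquad B = 80 \times 10^{6}\ \text{bytes} \;\Rightarrow\; 8B = 6.4 \times 10^{8}\ \text{bit}

t_{\min}(100\ \text{Mbps}) = \frac{6.4 \times 10^{8}\ \text{bit}}{10^{8}\ \text{bit/s}} = 6.4\ \text{s}
\qquad
t_{\min}(1\ \text{Gbps}) = \frac{6.4 \times 10^{8}\ \text{bit}}{10^{9}\ \text{bit/s}} = 0.64\ \text{s}
```

Both printed times (about 10 s and under 2 s) sit above these bounds, so the in-program measurements look plausible; the extra time on the 1Gbps network is time that the MPI_Wtime interval does not cover.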

Thanks. 

Regards, 
stonylhy 

2010/5/20 < jayesh at mcs.anl.gov > 


Hi, 
Which version of MPICH2 are you using? If I understand you correctly, are you saying that your MPI program takes the same amount of time on the 100Mbps and 1Gbps networks, but the MPI program launch environment (MPICH2 runtime initialization etc.) takes more time on the 1Gbps network? 
What is the time difference that you see between the two networks? 
Did you change the number of processes running on the individual nodes when you changed your network (MPI processes running on the same node, MPI processes running across the network etc.)? 
Does running other MPI programs (e.g. c:\program files\MPICH2\examples\cpi.exe) take more time on the 1Gbps network? 

Regards, 
Jayesh 



----- Original Message ----- 
From: "lhy stony" < lhy2008lx at gmail.com > 
To: "MPICH Discussion" < mpich-discuss at mcs.anl.gov > 
Sent: Wednesday, May 19, 2010 8:29:46 PM GMT -06:00 US/Canada Central 
Subject: [mpich-discuss] Problem in using wmpiexec.exe in Windows XP 



Hi, all 
I am using wmpiexec.exe to run my MPI program on Windows XP. When I ran the program on a 100Mbps network, everything was OK. But after I upgraded my network to 1000Mbps, wmpiexec.exe seems to take more time to initialize: for the first few seconds the CPU usage barely changes, which is clearly abnormal for my program. 
In my program, I use MPI_Wtime to measure the processing time. The time printed by my program looks correct, but the actual processing time is much longer. I don't know why or how to fix it. 
Can anyone help me? 
_______________________________________________ 
mpich-discuss mailing list 

mpich-discuss at mcs.anl.gov 
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss 








