[mpich-discuss] Problem in using wmpiexec.exe in Windows XP
Jayesh Krishna
jayesh at mcs.anl.gov
Mon Jun 7 09:58:20 CDT 2010
Hi,
Good to know that you were able to find the source of the problem.
Let us know if you need any further help.
Regards,
Jayesh
----- Original Message -----
From: "lhy stony" <lhy2008lx at gmail.com>
To: jayesh at mcs.anl.gov
Sent: Monday, June 7, 2010 2:13:53 AM GMT -06:00 US/Canada Central
Subject: Re: [mpich-discuss] Problem in using wmpiexec.exe in Windows XP
Hi,
I think I have found the cause. I had previously installed VMware Workstation on some of the worker nodes of my cluster. I didn't know that the virtual network adapters created by VMware Workstation would affect my 1Gbps network. Perhaps some time is spent trying the other network adapters before my MPI program really starts running. Anyway, after I uninstalled VMware Workstation, the problem I described before was solved.
Thank you very much for your help and patience!
Regards
stonylhy
On June 6, 2010 at 8:29 AM, lhy stony < lhy2008lx at gmail.com > wrote:
Hi,
I have tried running my program by specifying the IP addresses of the two machines, but the problem still exists. The timings were the same, but there was still some waiting time before the program really started. This waiting time could not be recorded by my program.
Here are the command lines that I used.
C:\Documents and Settings\Work_Station>"E:\Program Files\MPICH2\bin\mpiexec.exe" -env MPICH2_CHANNEL nemesis -hosts 2 25.20.209.160 25.20.209.26 -noprompt C:\MPI\DataTransport.exe
Start time: (min:sec:msec)=(34:35:0)
Start time: (min:sec:msec)=(30:52:593)
End time: (min:sec:msec)=(30:53:109)
Time for Receiving Data in Process1 is: 0.509694!
Transmission Speed is :96.136096 M byte/s!
End time: (min:sec:msec)=(34:35:515)
On the 100Mbps network, I tried the same command line as above and found that the same problem appeared. There was also a waiting time on the 100Mbps network, which never appeared when I specified the hostnames of the two machines. It is very strange.
I have also pinged the working PC on the 1Gbps network, and I didn't find any problem.
C:\Documents and Settings\Work_Station>ping 25.20.209.26
Pinging 25.20.209.26 with 32 bytes of data:
Reply from 25.20.209.26 : bytes=32 time<1ms TTL=128
Reply from 25.20.209.26 : bytes=32 time<1ms TTL=128
Reply from 25.20.209.26 : bytes=32 time<1ms TTL=128
Reply from 25.20.209.26 : bytes=32 time<1ms TTL=128
Ping statistics for 25.20.209.26 :
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms
I also tried the OSU benchmark program osu_latency.c. I didn't find any problem there either, apart from the waiting time. Here are the results on the 100Mbps and 1Gbps networks.
1Gbps
C:\Documents and Settings\Work_Station>"E:\Program Files\MPICH2\bin\mpiexec.exe" -hosts 2 rs dgj -noprompt C:\MPI\OSU.exe
# OSU MPI Latency Test
# Size Latency (us)
0 92.1
1 87.9
2 91.4
4 88.1
8 87.6
16 91.7
32 90.1
64 93.4
128 90.3
256 92.4
512 96.2
1024 103.9
2048 175.9
4096 171.1
8192 237.2
16384 291.9
32768 524.3
65536 941.9
131072 1603.7
262144 3351.7
524288 6514.3
1048576 12465.6
2097152 29551.8
4194304 47658.5
100 Mbps
C:\Documents and Settings\Work_Station>"E:\Program Files\MPICH2\bin\mpiexec.exe" -hosts 2 rs dgj -noprompt C:\MPI\OSU.exe
# OSU MPI Latency Test
# Size Latency (us)
0 92.6
1 87.7
2 92.1
4 87.7
8 87.5
16 91.9
32 90.8
64 93.3
128 90.1
256 98.7
512 183.0
1024 227.7
2048 303.8
4096 471.3
8192 866.4
16384 1780.2
32768 3794.3
65536 7489.1
131072 14714.1
262144 30857.6
524288 61756.8
1048576 122839.6
2097152 245868.7
4194304 491271.9
Need your help, thanks!
Regards,
stonylhy
2010/6/4 < jayesh at mcs.anl.gov >
Hi,
I am assuming the two email ids (timothy.leblanc & lhy2008lx) are associated with the same discussion.
The DNS resolution could be taking long. Can you ping the machines (from each other) using the IP addresses of the machines? Can you try running your MPI job by specifying the IP addresses of the machines and measure the timings?
Regards,
Jayesh
----- Original Message -----
From: "Timothy LeBlanc" < timothy.leblanc at gmail.com >
To: mpich-discuss at mcs.anl.gov
Sent: Thursday, June 3, 2010 11:48:43 AM GMT -06:00 US/Canada Central
Subject: Re: [mpich-discuss] Problem in using wmpiexec.exe in Windows XP
Hi
Thanks for getting back to me.
On my Solaris 10 box (hostname mars) I can ping all other machines using IP addresses or hostnames.
On all other machines I can ping mars (the Solaris 10 box) by IP address only.
During configuration of Solaris I allowed my server to offer a DHCP address to mars, and I did not register the name with my DNS server. If you believe this to be a problem I can configure it with a static address.
Thanks
Tim
On Thu, Jun 3, 2010 at 11:47 AM, < jayesh at mcs.anl.gov > wrote:
Hi,
Can you try pinging hosts from each other (From host1: ping host2, From host2: ping host1) and let us know the results ?
Regards,
Jayesh
----- Original Message -----
From: "lhy stony" < lhy2008lx at gmail.com >
To: jayesh at mcs.anl.gov
Sent: Wednesday, June 2, 2010 8:17:58 PM GMT -06:00 US/Canada Central
Subject: Re: [mpich-discuss] Problem in using wmpiexec.exe in Windows XP
Hi,
I have already done what you said. The mpiexec command line is:
mpiexec.exe -l -env -channel nemesis -hosts 2 rs dgj -noprompt C:\MPI\DataTransport.exe
In my 100Mbps network, the result is:
[1]Start time: (min:sec:msec)=(50:16:781)
[0]Start time: (min:sec:msec)=(44:42:765)
[0]End time: (min:sec:msec)=(44:47:937)
[1]Time for Receiving Data in Process1 is: 5.164633!
[1]Transmission Speed is :9.487605 M byte/s!
[1]End time: (min:sec:msec)=(50:21:953)
Everything goes well.
In my 1Gbps network, the result is:
[1]Start time: (min:sec:msec)=(3:19:890)
[0]Start time: (min:sec:msec)=(57:46:31)
[0]End time: (min:sec:msec)=(57:46:640)
[1]Time for Receiving Data in Process1 is: 0.606684!
[1]Transmission Speed is :80.766892 M byte/s!
[1]End time: (min:sec:msec)=(3:20:500)
However, before my program can print
[1]Start time: (min:sec:msec)=(3:19:890)
[0]Start time: (min:sec:msec)=(57:46:31)
there is still some waiting time. It seems that the program stopped for a while (about 30 seconds) before it really started. I have no clue why this happens.
regards
stonylhy
2010/5/24 < jayesh at mcs.anl.gov >
Hi,
Did you try the nemesis channel (mpiexec -n 2 -channel nemesis MYPROGRAM.exe) ? What options are you using to run your job (Copy-paste the mpiexec command line in your email)?
I have modified your code slightly to include time measurement for the entire process. Please compile/run the code below with the "-l" option for nemesis channel (mpiexec -l -n 2 -channel nemesis -machinefile mf.txt MYPROGRAM.exe)
============================================================================
#include <stdio.h>
#include <windows.h>
#include "mpi.h"
/* Print the current system time (UTC) as (min:sec:msec) */
void print_time(void ){
    SYSTEMTIME stime;
    ZeroMemory(&stime, sizeof(SYSTEMTIME));
    GetSystemTime(&stime);
    printf("\t(min:sec:msec)=(%d:%d:%d)\n", stime.wMinute, stime.wSecond, stime.wMilliseconds); fflush(stdout);
}
int main(int argc, char** argv)
{
    int myid,numprocs;
    int namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    /* Print the start time = Use "-l" option to print output based on the ranks */
    printf("Start time :"); print_time();
    MPI_Init(&argc,&argv);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Get_processor_name(processor_name,&namelen);
    double TimeStart, TimeEnd;
    int nWidth, nHeight;
    nWidth = 7000;
    nHeight = 7000;
    MPI_Request rRequest;
    MPI_Request rRequest1;
    BYTE *Data_Send ;
    BYTE *Data_Rec;
    BOOL bIfOver = FALSE;
    if ( myid == 0 )
    {
        Data_Send= new BYTE[ nHeight * nWidth];
        TimeStart = MPI_Wtime();
        //Send
        MPI_Isend( Data_Send, nHeight * nWidth, MPI_BYTE,
                   1, 1, MPI_COMM_WORLD, &rRequest1 );
        MPI_Status status1;
        MPI_Wait( &rRequest1, &status1 );
        TimeEnd = MPI_Wtime();
        delete[] Data_Send; /* array form: the buffer was allocated with new[] */
    }
    else
    {
        Data_Rec = new BYTE[ nHeight * nWidth];
        TimeStart = MPI_Wtime();
        MPI_Irecv( Data_Rec, nHeight * nWidth, MPI_BYTE,
                   0, 1, MPI_COMM_WORLD, &rRequest);
        MPI_Status status2;
        MPI_Wait( &rRequest, &status2 );
        TimeEnd = MPI_Wtime();
        printf( "Time for Receiving Data in Process%d is: %f!\n", myid, TimeEnd - TimeStart );
        printf( "Transmission Speed is :%f M byte/s!\n\n", nHeight * nWidth * sizeof(BYTE)/(TimeEnd - TimeStart) / 1000000.0 );
        delete[] Data_Rec; /* array form: the buffer was allocated with new[] */
    }
    MPI_Finalize();
    /* Print the end time = Use "-l" option to print output based on the ranks */
    printf("End time :"); print_time();
    return 0; /* 0 = success; returning TRUE (1) would report a failure exit code */
}
============================================================================
I would recommend running a benchmark program like the OSU bandwidth micro benchmark to measure the bandwidth ( http://mvapich.cse.ohio-state.edu/benchmarks/ - The benchmark does some warmup steps before measuring the bandwidth etc).
Let us know the results.
Regards,
Jayesh
----- Original Message -----
From: "lhy stony" < lhy2008lx at gmail.com >
To: "Jayesh Krishna" < jayesh at mcs.anl.gov >
Sent: Friday, May 21, 2010 2:08:36 AM GMT -06:00 US/Canada Central
Subject: Re: [mpich-discuss] Problem in using wmpiexec.exe in Windows XP
Hi,
I changed the version of MPICH2 to 1.2.1p1 and I also tried running the program from the command line, but the problem still exists. I tested the "initializing time" of my 1Gbps network and found that it is about 30 seconds.
To illustrate the problem, the code of my simple program is given here.
#include <stdio.h>
#include <tchar.h>   /* _tmain; assumes a non-Unicode build so that _tmain maps to main */
#include <windows.h> /* BYTE, BOOL */
#include "mpi.h"
int _tmain(int argc, char** argv)
{
    int myid,numprocs;
    int namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    MPI_Init(&argc,&argv);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Get_processor_name(processor_name,&namelen);
    double TimeStart, TimeEnd;
    int nWidth, nHeight;
    nWidth = 7000;
    nHeight = 7000;
    MPI_Request rRequest;
    MPI_Request rRequest1;
    BYTE *Data_Send ;
    BYTE *Data_Rec;
    BOOL bIfOver = FALSE;
    if ( myid == 0 )
    {
        TimeStart = MPI_Wtime();
        Data_Send= new BYTE[ nHeight * nWidth];
        //Send
        MPI_Isend( Data_Send, nHeight * nWidth, MPI_BYTE,
                   1, 1, MPI_COMM_WORLD, &rRequest1 );
        MPI_Status status1;
        MPI_Wait( &rRequest1, &status1 );
        TimeEnd = MPI_Wtime();
        delete[] Data_Send; /* array form: the buffer was allocated with new[] */
    }
    else
    {
        Data_Rec = new BYTE[ nHeight * nWidth];
        TimeStart = MPI_Wtime();
        MPI_Irecv( Data_Rec, nHeight * nWidth, MPI_BYTE,
                   0, 1, MPI_COMM_WORLD, &rRequest);
        MPI_Status status2;
        MPI_Wait( &rRequest, &status2 );
        TimeEnd = MPI_Wtime();
        printf( "Time for Receiving Data in Process%d is: %f!\n", myid, TimeEnd - TimeStart );
        printf( "Transmission Speed is :%f M byte/s!\n\n", nHeight * nWidth * sizeof(BYTE)/(TimeEnd - TimeStart) / 1000000.0 );
        delete[] Data_Rec; /* array form: the buffer was allocated with new[] */
    }
    MPI_Finalize();
    return 0; /* 0 = success; returning TRUE (1) would report a failure exit code */
}
The output is:
Time for Receiving Data in Process1 is: 1.146748!
Transmission Speed is :42.729512 M byte/s!
But, using a stopwatch, I measured the time from launching the program until the output was printed. It's almost 32 seconds!
Therefore, I think that the "initializing time" of my 1Gbps network is about 30 seconds.
Why? I'm really confused.
Besides, I have another question.
When I increase the size of the data being sent, the transmission speed decreases.
For example, when I set nWidth = 5000; nHeight = 5000; the output is:
Time for Receiving Data in Process1 is: 0.396994!
Transmission Speed is :60.055994 M byte/s!
but when I set nWidth = 20000; nHeight = 20000; the output is:
Time for Receiving Data in Process1 is: 14.101276!
Transmission Speed is :27.052142 M byte/s!
Why does the transmission speed decrease?
I also tested the speed on my original 100Mbps network, and the result shows that the speed stays unchanged.
Regards
stonylhy
2010/5/21 Jayesh Krishna < jayesh at mcs.anl.gov >
Hi,
First of all, MPICH2 1.0.7 is old. You should upgrade to a newer version of MPICH2 ( http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads ). I am still confused about the timings mentioned in your email. For example, how is the measurement of 10s different from that of 2s for the 1Gbps n/w.
Can you also try submitting your job from the command line and see if it helps (mpiexec -n 2 -machinefile mf.txt MYMPIPGM.exe). The latest stable version of MPICH2 should also have the newer nemesis channel that you might want to try out (mpiexec -n 2 -channel nemesis -machinefile mf.txt MYMPIPGM.exe).
Regards,
Jayesh
----- Original Message -----
From: "lhy stony" < lhy2008lx at gmail.com >
To: jayesh at mcs.anl.gov
Sent: Thursday, May 20, 2010 10:51:37 AM GMT -06:00 US/Canada Central
Subject: Re: [mpich-discuss] Problem in using wmpiexec.exe in Windows XP
Hi, Jayesh
The version of MPICH2 I am using is 1.0.7. I also changed the number of nodes and tried other MPI programs, but the problem still exists.
To describe the problem clearly, I ran a simpler experiment.
In this experiment, only two nodes are working. One sends an image of size 80M, and the other receives the image. The receiving time is recorded and printed. On my 100Mbps network, it takes nearly 10 seconds to complete the transmission and the printed time is just the same (10 seconds). On my 1Gbps network, it actually takes almost 10 seconds (maybe more) to complete the transmission, but the printed time is less than 2 seconds, which is normal for a 1Gbps network.
It seems that once the 1Gbps network gets to work, it works normally (the printed time shows that the transmission speed is indeed faster), but it must spend some time initializing first. If so, isn't the 1Gbps network useless for MPI programs? I wonder whether I forgot to set up some necessary configuration after I changed the network.
Thanks.
Regards,
stonylhy
2010/5/20 < jayesh at mcs.anl.gov >
Hi,
Which version of MPICH2 are you using ? If I understand you correctly, are you saying that your MPI program takes the same amount of time with 100 & 1Gbps n/ws but the MPI program launch environment (MPICH2 runtime initialization etc) takes more time with the 1Gbps n/w ?
What is the time difference that you see with the two networks ?
Did you change the number of processes running on the individual nodes when you changed your network (MPI processes running on the same node, MPI processes running across network etc)?
Does running other MPI programs (eg: c:\program files\MPICH2\examples\cpi.exe) take more time with the 1Gbps network ?
Regards,
Jayesh
----- Original Message -----
From: "lhy stony" < lhy2008lx at gmail.com >
To: "MPICH Discussion" < mpich-discuss at mcs.anl.gov >
Sent: Wednesday, May 19, 2010 8:29:46 PM GMT -06:00 US/Canada Central
Subject: [mpich-discuss] Problem in using wmpiexec.exe in Windows XP
Hi, all
I am using wmpiexec.exe to run my MPI program on Windows XP. When I ran the program on a 100Mbps network, everything was OK. But after I upgraded my network to 1000Mbps, it seems that wmpiexec.exe takes more time to initialize, because in the first few seconds the CPU usage does not change much, which is obviously abnormal for my program.
In my program, I use MPI_Wtime to measure the processing time. The time printed by my program is correct according to my program, but the actual processing time is much longer. I don't know why or how to fix it.
Can anyone help me?
_______________________________________________
mpich-discuss mailing list
mpich-discuss at mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss