[mpich-discuss] Error while connecting to host
Jayesh Krishna
jayesh at mcs.anl.gov
Thu Mar 11 22:48:45 CST 2010
Hi,
You are very close. When you set up the environment for MPI processes that use SMPD as the process manager, do not set the PMI_LOCAL environment variable.
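If you want to double-check a window's environment before launching, here is a minimal sketch of a pre-launch check. It is a hypothetical helper, not part of MPICH2 or SMPD. The names `env_lookup` and `check_pmi_env` are made up for illustration. It scans an environment block for the variables set in the steps below, flags PMI_LOCAL, checks that each PMI_HOST/PMI_PORT/PMI_DOMAIN value matches its root/KVS counterpart, and checks that PMI_RANK lies in 0..PMI_SIZE-1. The lookup is case-sensitive, although Windows itself treats variable names case-insensitively.

```c
/* Hypothetical pre-launch check for a hand-made PMI environment.
   Not part of MPICH2/SMPD; names are illustrative only. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns the value of NAME in a NULL-terminated array of "NAME=VALUE"
   strings (the layout of envp), or NULL if NAME is absent.
   Case-sensitive, unlike Windows itself. */
static const char *env_lookup(char **env, const char *name)
{
    size_t n = strlen(name);
    for (; *env != NULL; ++env) {
        if (strncmp(*env, name, n) == 0 && (*env)[n] == '=')
            return *env + n + 1;
    }
    return NULL;
}

/* Counts problems in the environment. If report is not NULL,
   one line per problem is written to it. */
static int check_pmi_env(char **env, FILE *report)
{
    static const char *const required[] = {
        "PMI_ROOT_HOST", "PMI_HOST", "PMI_ROOT_PORT", "PMI_PORT",
        "PMI_RANK", "PMI_SIZE", "PMI_KVS", "PMI_DOMAIN"
    };
    /* Pairs expected to hold the same value when a single PMI server
       (started with "mpiexec -pmiserver") serves all ranks.
       Compared with strcmp, so host names must match in letter case. */
    static const char *const pairs[][2] = {
        { "PMI_HOST", "PMI_ROOT_HOST" },
        { "PMI_PORT", "PMI_ROOT_PORT" },
        { "PMI_DOMAIN", "PMI_KVS" }
    };
    const char *rank, *size, *a, *b;
    int problems = 0;
    size_t i;

    assert(env != NULL);
    for (i = 0; i < sizeof required / sizeof required[0]; ++i) {
        if (env_lookup(env, required[i]) == NULL) {
            if (report) fprintf(report, "missing %s\n", required[i]);
            ++problems;
        }
    }
    for (i = 0; i < sizeof pairs / sizeof pairs[0]; ++i) {
        a = env_lookup(env, pairs[i][0]);
        b = env_lookup(env, pairs[i][1]);
        if (a && b && strcmp(a, b) != 0) {
            if (report) fprintf(report, "%s differs from %s\n", pairs[i][0], pairs[i][1]);
            ++problems;
        }
    }
    /* With SMPD as the process manager, PMI_LOCAL must not be set. */
    if (env_lookup(env, "PMI_LOCAL") != NULL) {
        if (report) fprintf(report, "PMI_LOCAL is set; remove it\n");
        ++problems;
    }
    /* Ranks must lie in 0 .. PMI_SIZE-1. */
    rank = env_lookup(env, "PMI_RANK");
    size = env_lookup(env, "PMI_SIZE");
    if (rank && size) {
        long r = strtol(rank, NULL, 10);
        long s = strtol(size, NULL, 10);
        if (s <= 0 || r < 0 || r >= s) {
            if (report) fprintf(report, "PMI_RANK=%s is outside 0..PMI_SIZE-1\n", rank);
            ++problems;
        }
    }
    return problems;
}
```

One way to use it is to call `check_pmi_env(envp, stderr)` at the top of `main(int argc, char *argv[], char *envp[])` (a form Visual C++ accepts) and exit before `MPI_Init` if the count is nonzero.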
Try the following steps to start the MPI processes manually:
1) Run "mpiexec -pmiserver 2" at the command prompt. The command prints the host name, the port number, and the KVS name, in that order:
>D:\temp>mpiexec -pmiserver 2
>000520P80812.ad.hitachi-metals.co.jp (set PMI_HOST & PMI_ROOT_HOST to this hostname)
>2612 (set PMI_PORT & PMI_ROOT_PORT to this port number)
>65B8416A-4669-436b-A342-9FD0FF3357F9 (set PMI_KVS & PMI_DOMAIN to this KVS name)
2) Now open two command prompts: command_prompt_1 and command_prompt_2.
3) On command_prompt_1, set the PMI environment as follows:
>set PMI_ROOT_HOST=000520p80812.ad.hitachi-metals.co.jp
>set PMI_HOST=000520p80812.ad.hitachi-metals.co.jp
>set PMI_ROOT_PORT=2612
>set PMI_PORT=2612
>set PMI_RANK=0
>set PMI_SIZE=2
>set PMI_KVS=65B8416A-4669-436b-A342-9FD0FF3357F9
>set PMI_DOMAIN=65B8416A-4669-436b-A342-9FD0FF3357F9
4) On command_prompt_2, set the PMI environment as follows:
>set PMI_ROOT_HOST=000520p80812.ad.hitachi-metals.co.jp
>set PMI_HOST=000520p80812.ad.hitachi-metals.co.jp
>set PMI_ROOT_PORT=2612
>set PMI_PORT=2612
>set PMI_RANK=1
>set PMI_SIZE=2
>set PMI_KVS=65B8416A-4669-436b-A342-9FD0FF3357F9
>set PMI_DOMAIN=65B8416A-4669-436b-A342-9FD0FF3357F9
5) Now run the MPI program on command_prompt_1 and command_prompt_2 by typing the name of the executable at each prompt.
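The values printed in step 1 map mechanically onto the `set` lines of steps 3 and 4, so a small helper can generate them for any rank. This is a sketch, not an MPICH2 tool. `struct pmi_server_info`, `parse_pmiserver_output` and `format_pmi_env` are hypothetical names. The sketch assumes the three values arrive as whitespace-separated lines exactly as shown in step 1. Visual C++ versions before 2015 do not provide `snprintf`, and `_snprintf` is the closest substitute there.

```c
/* Hypothetical helpers for illustration; not part of MPICH2/SMPD. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The three values printed by "mpiexec -pmiserver <n>", in order. */
struct pmi_server_info {
    char host[256];
    int port;
    char kvs[256];
};

/* Parses "host\nport\nkvs\n" as printed in step 1. Returns 0 on success. */
static int parse_pmiserver_output(const char *text, struct pmi_server_info *out)
{
    memset(out, 0, sizeof *out);
    /* %255s stops at whitespace, so the line breaks separate the fields. */
    return sscanf(text, "%255s %d %255s", out->host, &out->port, out->kvs) == 3 ? 0 : -1;
}

/* Writes the cmd.exe "set" lines for one rank into buf and returns the
   length snprintf needed. PMI_HOST/PMI_PORT repeat the root values,
   PMI_DOMAIN repeats PMI_KVS, and PMI_LOCAL is deliberately left out. */
static int format_pmi_env(char *buf, size_t len, const struct pmi_server_info *srv,
                          int rank, int size)
{
    assert(rank >= 0 && rank < size);
    return snprintf(buf, len,
                    "set PMI_ROOT_HOST=%s\n"
                    "set PMI_HOST=%s\n"
                    "set PMI_ROOT_PORT=%d\n"
                    "set PMI_PORT=%d\n"
                    "set PMI_RANK=%d\n"
                    "set PMI_SIZE=%d\n"
                    "set PMI_KVS=%s\n"
                    "set PMI_DOMAIN=%s\n",
                    srv->host, srv->host, srv->port, srv->port,
                    rank, size, srv->kvs, srv->kvs);
}
```

Calling `format_pmi_env` with rank 0 and then with rank 1 should reproduce the two blocks of steps 3 and 4. The only difference is that the host name keeps the letter case printed in step 1.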
If the above steps don't work, please try the preview release of MPICH2 (1.3a1; several bug fixes have been added since the 1.2.1p1 release) and see if that helps.
Let us know the results. Meanwhile, why do you want to start your MPI processes manually?
Regards,
Jayesh
----- Original Message -----
From: "Takahiro Someji" <Takahiro_Someji at hitachi-metals.co.jp>
To: jayesh at mcs.anl.gov
Cc: mpich-discuss at mcs.anl.gov
Sent: Thursday, March 11, 2010 10:30:39 PM GMT -06:00 US/Canada Central
Subject: Re: [mpich-discuss] Error while connecting to host
Hello.
Sorry, I was not able to understand your explanation completely. Since
I am not familiar with PMI, could you please explain it in more detail?
If there is another way to launch the processes manually without using
PMI, please let me know.
Anyway, here is what I tried.
In one command window, I ran the following:
>D:\temp>mpiexec -pmiserver 2
>000520P80812.ad.hitachi-metals.co.jp
>2612
>65B8416A-4669-436b-A342-9FD0FF3357F9
Next, in two other windows (the Host and Sub windows), I set up the PMI
environment as shown below.
>set PMI_ROOT_HOST=000520p80812.ad.hitachi-metals.co.jp
>set PMI_HOST=000520p80812.ad.hitachi-metals.co.jp
>set PMI_ROOT_PORT=2612
>set PMI_PORT=2612
>set PMI_LOCAL=1
>set PMI_RANK=%1
>set PMI_SIZE=2
>set PMI_KVS=65B8416A-4669-436b-A342-9FD0FF3357F9
>set PMI_DOMAIN=65B8416A-4669-436b-A342-9FD0FF3357F9
Next, I started the program with the command "program.exe" in both the
Host and Sub windows.
The program then terminated with a fatal error (access violation).
(Is this a fundamental problem?)
Next, I set PMI_ROOT_PORT and PMI_PORT to 9222.
This time the result was "ERROR:Error while connecting to host......".
What is wrong?
Regards,
Someji
(2010/03/12 1:05), jayesh at mcs.anl.gov wrote:
> Hi,
> Try also setting the PMI_DOMAIN value to the PMI_KVS value.
> If you just want to start your programs manually (launch processes manually) and still use SMPD as a PMI server, I would recommend using the "-pmiserver" option of mpiexec. "mpiexec -pmiserver 2" starts an instance of SMPD and prints out the port/KVS/domain values; you can then launch your MPI processes with the corresponding PMI environment, i.e., set PMI_SIZE, PMI_RANK, PMI_KVS, PMI_DOMAIN, PMI_PORT, PMI_ROOT_PORT, PMI_HOST, and PMI_ROOT_HOST.
> Let us know if it works for you.
> Regards,
> Jayesh
>
> (PS: When connecting to the process manager, the default KVS values won't work.)
> ----- Original Message -----
> From: "Takahiro Someji" <Takahiro_Someji at hitachi-metals.co.jp>
> To: jayesh at mcs.anl.gov, mpich-discuss at mcs.anl.gov
> Sent: Wednesday, March 10, 2010 6:53:22 PM GMT -06:00 US/Canada Central
> Subject: Re: [mpich-discuss] Error while connecting to host
>
> Hello.
>
> I tried your suggestion. However, the result did not change.
>
> I also tried setting PMI_ROOT_PORT=8676 (the default smpd port number)
> in the Host and Sub windows.
> Then the program terminated with a fatal error.
> Is communication with smpd being blocked? Or is this a problem specific
> to the Japanese OS?
>
> Regards,
> Someji
>
> (2010/03/11 0:08), jayesh at mcs.anl.gov wrote:
>
>> Hi,
>> Try adding PMI_HOST (with value of PMI_ROOT_HOST) and PMI_PORT (with value of PMI_ROOT_PORT) into your PMI environment and see if it works. Make sure that you start the process with rank=0 before other processes.
>> Are you trying to debug your program? (Another way to debug your code would be to attach a debugger to the MPI process.)
>>
>> Regards,
>> Jayesh
>> ----- Original Message -----
>> From: "染次 孝博" <Takahiro_Someji at hitachi-metals.co.jp>
>> To: mpich-discuss at mcs.anl.gov
>> Sent: Tuesday, March 9, 2010 11:51:40 PM GMT -06:00 US/Canada Central
>> Subject: [mpich-discuss] Error while connecting to host
>>
>>
>> Hi.
>>
>> I am developing a program that I want to start manually (Windows XP SP3, MPICH2-1.2.1p1, Visual Studio 2008 SP1, C++).
>> The program is very simple, as shown below.
>>
>> ************************
>> *#include <mpi.h>
>> *#include <stdio.h>
>> *
>> *int main(int argc, char* argv[])
>> *{
>> *    int numprocs, myid, namelen, i;
>> *    char processor_name[MPI_MAX_PROCESSOR_NAME];
>> *
>> *    MPI_Init(&argc, &argv);
>> *    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);  /* number of processes */
>> *    MPI_Comm_rank(MPI_COMM_WORLD, &myid);      /* rank of this process */
>> *    MPI_Get_processor_name(processor_name, &namelen);
>> *
>> *    /* Print the command-line arguments this process received. */
>> *    for (i = 0; i < argc; i++)
>> *    {
>> *        printf("arg%d:%s\n", i, argv[i]);
>> *    }
>> *    printf("numprocs:%d\n", numprocs);
>> *    printf("myid:%d\n", myid);
>> *    printf("processor name:%s\n", processor_name);
>> *
>> *    MPI_Finalize();
>> *    return 0;
>> *}
>> ************************
>>
>> Running the program with "mpiexec -n 2 program.exe" gives the following output:
>>
>> *************************
>> *arg0:D:\temp\MPITest1\Debug\mpitest1.exe
>> *numprocs:2
>> *myid:1
>> *processor name:HOSTPC
>> *arg0:D:\temp\MPITest1\Debug\mpitest1.exe
>> *numprocs:2
>> *myid:0
>> *processor name:HOSTPC
>> *************************
>>
>> Next, I tried to start the two processes manually on one PC.
>> Following the manual section "Debugging jobs by starting them manually", I set up the environment as follows:
>> In Host window:
>> set PMI_ROOT_HOST=HOSTPC
>> set PMI_RANK=0
>> set PMI_ROOT_PORT=9222
>> set PMI_ROOT_LOCAL=1
>> set PMI_SIZE=2
>> set PMI_KVS=mpich2
>> In Sub window:
>> set PMI_ROOT_HOST=HOSTPC
>> set PMI_RANK=1
>> :
>> : same as Host
>>
>> Then I started the program with the command "program.exe" in both the Host and Sub windows.
>> It failed with the following errors:
>>
>> [01:3156]..ERROR:Error while connecting to host,
>> No connection could be made because the target machine actively refused it. (10061)
>> [01:3156]..ERROR:Connect on sock (host=000520p80812.ad.hitachi-metals.co.jp, port=9222) failed, exhaused all end points
>> SMPDU_Sock_post_connect failed.
>> [0] PMI_ConnectToHost failed: unable to post a connect to 000520p80812.ad.hitachi-metals.co.jp:9222, error: Undefined dynamic error code
>> uPMI_ConnectToHost returning PMI_FAIL
>> [0] PMI_Init failed.
>> Fatal error in MPI_Init: Other MPI error, error stack:
>> MPIR_Init_thread(394): Initialization failed
>> MPID_Init(103).......: channel initialization failed
>> MPID_Init(374).......: PMI_Init returned -1
>>
>>
>> Please let me know how to solve this.
>>
>> -- Thank you.
>>
>> -----------------------------------------------
>> Takahiro Someji , Senior Engineer
>>
>> Hitachi Metals Ltd. Production System Lab.
>> 6010, Mikajiri
>> Kumagaya city,Saitama pref. JAPAN
>> zip: 360-0843
>>
>> phone: +81-485-31-1720
>> fax: +81-485-33-3398
>> eMail:takahiro_someji at hitachi-metals.co.jp
>> web:http://www.hitachi-metals.co.jp
>> -----------------------------------------------