<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7036.0">
<TITLE>RE: [mpich-discuss] unable to connect ?</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<P><FONT SIZE=2>Hi,<BR>
<BR>
>>.. I launch mpiexec.exe from another Windows user account...<BR>
<BR>
This could be your problem. Try registering a username/password that is valid on both machines with the "-user" option (mpiexec -register -user 1), then launch your job as that user (mpiexec -n 2 -user 1 -hosts 2 10.0.0.10 10.0.0.13 hostname). You can also check whether those credentials are able to launch a job on each host with the "-validate" option of mpiexec (mpiexec -validate -user 1 10.0.0.10 ; mpiexec -validate -user 1 10.0.0.13).<BR>
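<BR>
The register, validate, and launch steps above can be sketched as a small Python helper that only builds the command lines. This is a minimal sketch, not part of MPICH2: the function names are hypothetical, the flags are exactly the ones quoted above, and it assumes mpiexec is on the PATH when the printed commands are eventually run by hand.<BR>

```python
# Hypothetical helpers: they only assemble the mpiexec command lines
# quoted in this message; nothing is executed here.

def register_command(user_index):
    """Command that registers credentials under the given -user index."""
    return ["mpiexec", "-register", "-user", str(user_index)]


def validate_commands(user_index, hosts):
    """One -validate command per host, as in the message above."""
    return [
        ["mpiexec", "-validate", "-user", str(user_index), host]
        for host in hosts
    ]


def launch_command(user_index, hosts, program):
    """Launch one process per listed host as the registered user."""
    count = str(len(hosts))
    return ["mpiexec", "-n", count, "-user", str(user_index),
            "-hosts", count, *hosts, *program]


if __name__ == "__main__":
    hosts = ["10.0.0.10", "10.0.0.13"]
    commands = [register_command(1)]
    commands += validate_commands(1, hosts)
    commands.append(launch_command(1, hosts, ["hostname"]))
    for argv in commands:
        # Print each command so it can be pasted into a command prompt.
        print(" ".join(argv))
```

Printing the commands instead of running them keeps the password prompt of the register step interactive and makes it easy to paste the exact command lines back into a reply.<BR>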
<BR>
(PS: Did you copy-paste the complete output of the mpiexec command, along with the command itself? Please don't remove any part of the output; the full text helps us debug your problem.)<BR>
<BR>
Regards,<BR>
Jayesh<BR>
<BR>
-----Original Message-----<BR>
From: kiss attila [<A HREF="mailto:kissattila2008@gmail.com">mailto:kissattila2008@gmail.com</A>]<BR>
Sent: Thursday, February 26, 2009 12:26 AM<BR>
To: Jayesh Krishna<BR>
Subject: Re: [mpich-discuss] unable to connect ?<BR>
<BR>
1. Yes, ping works fine. With wmpiconfig.exe I can see both machines.<BR>
2. MPICH2 1.0.8 is installed on both.<BR>
3. No firewalls of any kind.<BR>
4. For smpd -status I get:<BR>
smpd running on 10.0.0.10<BR>
smpd running on 10.0.0.13<BR>
<BR>
5. from 10.0.0.10<BR>
C:\Program Files\MPICH2\bin>mpiexec -hosts 2 10.0.0.10 10.0.0.13 hostname<BR>
abort: unable to connect to 10.0.0.13<BR>
<BR>
from 10.0.0.13<BR>
C:\Program Files\MPICH2\bin>mpiexec -hosts 2 10.0.0.10 10.0.0.13 hostname<BR>
abort: unable to connect to 10.0.0.10<BR>
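<BR>
The checks in items 4 and 5 can be summarized mechanically. The following is a minimal Python sketch, assuming the output has been saved as plain text and that the lines look exactly like the ones pasted above ("smpd running on ..." and "abort: unable to connect to ..."); the function name is hypothetical.<BR>

```python
import re

# Patterns taken from the output lines pasted in this message.
RUNNING = re.compile(r"^smpd running on (\S+)$")
ABORT = re.compile(r"^abort(?:ing)?: unable to connect to (\S+)$", re.IGNORECASE)


def summarize_output(text):
    """Return (hosts with smpd running, hosts reported as unreachable)."""
    running, unreachable = [], []
    for raw in text.splitlines():
        line = raw.strip()
        match = RUNNING.match(line)
        if match:
            running.append(match.group(1))
            continue
        match = ABORT.match(line)
        if match:
            unreachable.append(match.group(1))
    return running, unreachable


if __name__ == "__main__":
    sample = (
        "smpd running on 10.0.0.10\n"
        "smpd running on 10.0.0.13\n"
        "abort: unable to connect to 10.0.0.13\n"
    )
    running, unreachable = summarize_output(sample)
    print("running:", running)
    print("unreachable:", unreachable)
```

A host that appears in both lists, as 10.0.0.13 does here, has a running process manager that still refuses the connection, which points toward credentials or account setup as the next thing to check.<BR>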
<BR>
And here is the output in -verbose mode:<BR>
<BR>
...../first_token<BR>
.....\compare_token<BR>
...../compare_token<BR>
.....\next_token<BR>
......\first_token<BR>
....../first_token<BR>
......\first_token<BR>
....../first_token<BR>
...../next_token<BR>
..../smpd_hide_string_arg<BR>
..../smpd_hide_string_arg<BR>
.....\smpd_option_on<BR>
......\smpd_get_smpd_data<BR>
.......\smpd_get_smpd_data_from_environment<BR>
......./smpd_get_smpd_data_from_environment<BR>
.......\smpd_get_smpd_data_default<BR>
......./smpd_get_smpd_data_default<BR>
.......Unable to get the data for the key 'nocache'<BR>
....../smpd_get_smpd_data<BR>
...../smpd_option_on<BR>
....\smpd_hide_string_arg<BR>
.....\first_token<BR>
...../first_token<BR>
.....\compare_token<BR>
...../compare_token<BR>
.....\next_token<BR>
......\first_token<BR>
....../first_token<BR>
......\first_token<BR>
....../first_token<BR>
...../next_token<BR>
..../smpd_hide_string_arg<BR>
..../smpd_hide_string_arg<BR>
.../smpd_handle_op_read<BR>
...sock_waiting for the next event.<BR>
...SOCK_OP_WRITE<BR>
...\smpd_handle_op_write<BR>
....\smpd_state_writing_cred_ack_yes<BR>
.....wrote cred request yes ack.<BR>
..../smpd_state_writing_cred_ack_yes<BR>
.../smpd_handle_op_write<BR>
...sock_waiting for the next event.<BR>
...SOCK_OP_WRITE<BR>
...\smpd_handle_op_write<BR>
....\smpd_state_writing_account<BR>
.....wrote account: 'mpiuser'<BR>
.....\smpd_encrypt_data<BR>
...../smpd_encrypt_data<BR>
..../smpd_state_writing_account<BR>
.../smpd_handle_op_write<BR>
...sock_waiting for the next event.<BR>
...SOCK_OP_WRITE<BR>
...\smpd_handle_op_write<BR>
....\smpd_hide_string_arg<BR>
.....\first_token<BR>
...../first_token<BR>
.....\compare_token<BR>
...../compare_token<BR>
.....\next_token<BR>
......\first_token<BR>
....../first_token<BR>
......\first_token<BR>
....../first_token<BR>
...../next_token<BR>
..../smpd_hide_string_arg<BR>
..../smpd_hide_string_arg<BR>
.....\smpd_hide_string_arg<BR>
......\first_token<BR>
....../first_token<BR>
......\compare_token<BR>
....../compare_token<BR>
......\next_token<BR>
.......\first_token<BR>
......./first_token<BR>
.......\first_token<BR>
......./first_token<BR>
....../next_token<BR>
...../smpd_hide_string_arg<BR>
...../smpd_hide_string_arg<BR>
....\smpd_hide_string_arg<BR>
.....\first_token<BR>
...../first_token<BR>
.....\compare_token<BR>
...../compare_token<BR>
.....\next_token<BR>
......\first_token<BR>
....../first_token<BR>
......\first_token<BR>
....../first_token<BR>
...../next_token<BR>
..../smpd_hide_string_arg<BR>
..../smpd_hide_string_arg<BR>
.../smpd_handle_op_write<BR>
...sock_waiting for the next event.<BR>
...SOCK_OP_READ<BR>
...\smpd_handle_op_read<BR>
....\smpd_state_reading_process_result<BR>
.....read process session result: 'SUCCESS'<BR>
..../smpd_state_reading_process_result<BR>
.../smpd_handle_op_read<BR>
...sock_waiting for the next event.<BR>
...SOCK_OP_READ<BR>
...\smpd_handle_op_read<BR>
....\smpd_state_reading_reconnect_request<BR>
.....read re-connect request: '3972'<BR>
.....closing the old socket in the left context.<BR>
.....MPIDU_Sock_post_close(1720)<BR>
.....connecting a new socket.<BR>
.....\smpd_create_context<BR>
......\smpd_init_context<BR>
.......\smpd_init_command<BR>
......./smpd_init_command<BR>
....../smpd_init_context<BR>
...../smpd_create_context<BR>
.....posting a re-connect to 10.0.0.10:3972 in left context.<BR>
..../smpd_state_reading_reconnect_request<BR>
.../smpd_handle_op_read<BR>
...sock_waiting for the next event.<BR>
...SOCK_OP_CLOSE<BR>
...\smpd_handle_op_close<BR>
....\smpd_get_state_string<BR>
..../smpd_get_state_string<BR>
....op_close received - SMPD_CLOSING state.<BR>
....Unaffiliated left context closing.<BR>
....\smpd_free_context<BR>
.....freeing left context.<BR>
.....\smpd_init_context<BR>
......\smpd_init_command<BR>
....../smpd_init_command<BR>
...../smpd_init_context<BR>
..../smpd_free_context<BR>
.../smpd_handle_op_close<BR>
...sock_waiting for the next event.<BR>
...SOCK_OP_CONNECT<BR>
...\smpd_handle_op_connect<BR>
....\smpd_generate_session_header<BR>
.....session header: (id=1 parent=0 level=0)<BR>
..../smpd_generate_session_header<BR>
.../smpd_handle_op_connect<BR>
...sock_waiting for the next event.<BR>
...SOCK_OP_WRITE<BR>
...\smpd_handle_op_write<BR>
....\smpd_state_writing_session_header<BR>
.....wrote session header: 'id=1 parent=0 level=0'<BR>
.....\smpd_post_read_command<BR>
......posting a read for a command header on the left context, sock 1656<BR>
...../smpd_post_read_command<BR>
.....creating connect command for left node<BR>
.....creating connect command to '10.0.0.13'<BR>
.....\smpd_create_command<BR>
......\smpd_init_command<BR>
....../smpd_init_command<BR>
...../smpd_create_command<BR>
.....\smpd_add_command_arg<BR>
...../smpd_add_command_arg<BR>
.....\smpd_add_command_int_arg<BR>
...../smpd_add_command_int_arg<BR>
.....\smpd_post_write_command<BR>
......\smpd_package_command<BR>
....../smpd_package_command<BR>
......smpd_post_write_command on the left context sock 1656: 65 bytes for command: "cmd=connect src=0 dest=1 tag=0 host=10.0.0.13 id=2 "<BR>
...../smpd_post_write_command<BR>
.....not connected yet: 10.0.0.13 not connected<BR>
..../smpd_state_writing_session_header<BR>
.../smpd_handle_op_write<BR>
...sock_waiting for the next event.<BR>
...SOCK_OP_WRITE<BR>
...\smpd_handle_op_write<BR>
....\smpd_state_writing_cmd<BR>
.....wrote command<BR>
.....command written to left: "cmd=connect src=0 dest=1 tag=0 host=10.0.0.13 id=2 "<BR>
.....moving 'connect' command to the wait_list.<BR>
..../smpd_state_writing_cmd<BR>
.../smpd_handle_op_write<BR>
...sock_waiting for the next event.<BR>
...SOCK_OP_READ<BR>
...\smpd_handle_op_read<BR>
....\smpd_state_reading_cmd_header<BR>
.....read command header<BR>
.....command header read, posting read for data: 69 bytes<BR>
..../smpd_state_reading_cmd_header<BR>
.../smpd_handle_op_read<BR>
...sock_waiting for the next event.<BR>
...SOCK_OP_READ<BR>
...\smpd_handle_op_read<BR>
....\smpd_state_reading_cmd<BR>
.....read command<BR>
.....\smpd_parse_command<BR>
...../smpd_parse_command<BR>
.....read command: "cmd=abort src=1 dest=0 tag=0 error="unable to connect to 10.0.0.13" "<BR>
.....\smpd_handle_command<BR>
......handling command:<BR>
...... src = 1<BR>
...... dest = 0<BR>
...... cmd = abort<BR>
...... tag = 0<BR>
...... ctx = left<BR>
...... len = 69<BR>
...... str = cmd=abort src=1 dest=0 tag=0 error="unable to connect to 10.0.0.13"<BR>
......\smpd_command_destination<BR>
.......0 -> 0 : returning NULL context<BR>
....../smpd_command_destination<BR>
......\smpd_handle_abort_command<BR>
.......abort: unable to connect to 10.0.0.13<BR>
....../smpd_handle_abort_command<BR>
...../smpd_handle_command<BR>
.....\smpd_post_read_command<BR>
......posting a read for a command header on the left context, sock 1656<BR>
...../smpd_post_read_command<BR>
.....\smpd_create_command<BR>
......\smpd_init_command<BR>
....../smpd_init_command<BR>
...../smpd_create_command<BR>
.....\smpd_post_write_command<BR>
......\smpd_package_command<BR>
....../smpd_package_command<BR>
......smpd_post_write_command on the left context sock 1656: 43 bytes for command: "cmd=close src=0 dest=1 tag=1 "<BR>
...../smpd_post_write_command<BR>
..../smpd_state_reading_cmd<BR>
.../smpd_handle_op_read<BR>
...sock_waiting for the next event.<BR>
...SOCK_OP_READ<BR>
...\smpd_handle_op_read<BR>
....\smpd_state_reading_cmd_header<BR>
.....read command header<BR>
.....command header read, posting read for data: 31 bytes<BR>
..../smpd_state_reading_cmd_header<BR>
.../smpd_handle_op_read<BR>
...sock_waiting for the next event.<BR>
...SOCK_OP_WRITE<BR>
...\smpd_handle_op_write<BR>
....\smpd_state_writing_cmd<BR>
.....wrote command<BR>
.....command written to left: "cmd=close src=0 dest=1 tag=1 "<BR>
.....\smpd_free_command<BR>
......\smpd_init_command<BR>
....../smpd_init_command<BR>
...../smpd_free_command<BR>
..../smpd_state_writing_cmd<BR>
.../smpd_handle_op_write<BR>
...sock_waiting for the next event.<BR>
...SOCK_OP_READ<BR>
...\smpd_handle_op_read<BR>
....\smpd_state_reading_cmd<BR>
.....read command<BR>
.....\smpd_parse_command<BR>
...../smpd_parse_command<BR>
.....read command: "cmd=closed src=1 dest=0 tag=1 "<BR>
.....\smpd_handle_command<BR>
......handling command:<BR>
...... src = 1<BR>
...... dest = 0<BR>
...... cmd = closed<BR>
...... tag = 1<BR>
...... ctx = left<BR>
...... len = 31<BR>
...... str = cmd=closed src=1 dest=0 tag=1<BR>
......\smpd_command_destination<BR>
.......0 -> 0 : returning NULL context<BR>
....../smpd_command_destination<BR>
......\smpd_handle_closed_command<BR>
.......closed command received from left child, closing sock.<BR>
.......MPIDU_Sock_post_close(1656)<BR>
.......received a closed at node with no parent context, assuming root, returning SMPD_EXITING.<BR>
....../smpd_handle_closed_command<BR>
...../smpd_handle_command<BR>
.....not posting read for another command because SMPD_EXITING returned<BR>
..../smpd_state_reading_cmd<BR>
.../smpd_handle_op_read<BR>
...sock_waiting for the next event.<BR>
...SOCK_OP_CLOSE<BR>
...\smpd_handle_op_close<BR>
....\smpd_get_state_string<BR>
..../smpd_get_state_string<BR>
....op_close received - SMPD_EXITING state.<BR>
....\smpd_free_context<BR>
.....freeing left context.<BR>
.....\smpd_init_context<BR>
......\smpd_init_command<BR>
....../smpd_init_command<BR>
...../smpd_init_context<BR>
..../smpd_free_context<BR>
.../smpd_handle_op_close<BR>
../smpd_enter_at_state<BR>
./main<BR>
.\smpd_exit<BR>
..\smpd_kill_all_processes<BR>
../smpd_kill_all_processes<BR>
..\smpd_finalize_drive_maps<BR>
../smpd_finalize_drive_maps<BR>
..\smpd_dbs_finalize<BR>
../smpd_dbs_finalize<BR>
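<BR>
Most of the trace above is function entry/exit noise; the informative lines are the ones of the form read command: "cmd=...". As a rough sketch (Python, hypothetical function name, assuming the log has been saved as plain text with one trace line per line), the received commands and their error strings could be pulled out like this:<BR>

```python
import re

# A trace line such as:
#   .....read command: "cmd=abort src=1 dest=0 tag=0 error="..." "
# The greedy (.*) runs to the last double quote on the line.
READ_COMMAND = re.compile(r'read command: "(.*)"')

# key=value pairs; a value is either double-quoted or a bare token.
PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def received_commands(log_text):
    """Return one dict of key/value pairs per 'read command:' line."""
    commands = []
    for line in log_text.splitlines():
        match = READ_COMMAND.search(line)
        if not match:
            continue
        fields = {}
        for key, quoted, bare in PAIR.findall(match.group(1)):
            # Exactly one of the two value groups is non-empty.
            fields[key] = quoted if quoted else bare
        commands.append(fields)
    return commands


if __name__ == "__main__":
    sample = (
        '.....read command: "cmd=abort src=1 dest=0 tag=0 '
        'error="unable to connect to 10.0.0.13" "\n'
        '.....read command: "cmd=closed src=1 dest=0 tag=1 "\n'
    )
    for fields in received_commands(sample):
        print(fields.get("cmd"), fields.get("error", ""))
```

Applied to this trace, the only commands that come back from the left context are the abort carrying the "unable to connect to 10.0.0.13" error and the closing handshake that follows it.<BR>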
<BR>
I have registered the same user with the same password on both computers using wmpiregister.exe, but I launch mpiexec.exe from a different Windows user account. Could this be the problem? Thanks.<BR>
<BR>
regards<BR>
k.a.albert<BR>
<BR>
<BR>
<BR>
<BR>
2009/2/25 Jayesh Krishna &lt;jayesh@mcs.anl.gov&gt;:<BR>
> Hi,<BR>
><BR>
> # Can you ping the machines from each other ?<BR>
> # Make sure that you have the same version of MPICH2 installed on both<BR>
> the machines.<BR>
> # Do you have any firewalls (windows, third-party) running on the<BR>
> machines (Turn off any firewalls running on the machines)?<BR>
> # Make sure that you have the MPICH2 process manager, smpd.exe,<BR>
> running as a service on both the machines (To check the status of the<BR>
> process manager type, smpd -status, at the command prompt).<BR>
> # Before trying to execute an MPI program like cpi.exe, try executing<BR>
> a non-MPI program like hostname on the machines (mpiexec -hosts 2<BR>
> 10.0.0.10 10.0.0.13 hostname).<BR>
><BR>
> Let us know the results.<BR>
><BR>
> (PS: In your reply please copy-paste the commands and the output)<BR>
> Regards,<BR>
> Jayesh<BR>
><BR>
><BR>
><BR>
> -----Original Message-----<BR>
> From: mpich-discuss-bounces@mcs.anl.gov<BR>
> [<A HREF="mailto:mpich-discuss-bounces@mcs.anl.gov">mailto:mpich-discuss-bounces@mcs.anl.gov</A>] On Behalf Of kiss attila<BR>
> Sent: Wednesday, February 25, 2009 1:46 PM<BR>
> To: mpich-discuss@mcs.anl.gov<BR>
> Subject: [mpich-discuss] unable to connect ?<BR>
><BR>
> Hi<BR>
><BR>
> I have two WinXP machines (10.0.0.13, 10.0.0.10) with MPICH2<BR>
> installed, and with this command:<BR>
> "D:\Program Files\MPICH2\bin\mpiexec.exe" -hosts 2 10.0.0.10 10.0.0.13<BR>
> -noprompt c:\ex\cpi.exe<BR>
><BR>
> I get:<BR>
><BR>
> Aborting: unable to connect to 10.0.0.10<BR>
><BR>
> Somehow I can't start any process on the remote machine (10.0.0.10). It<BR>
> annoys me that it worked a few days ago, but I had to reinstall one<BR>
> of them, and since then I haven't been able to figure out what's wrong<BR>
> with my settings. Thanks.<BR>
><BR>
> regards<BR>
> K.A. Albert<BR>
><BR>
</FONT>
</P>
</BODY>
</HTML>