[mpich-discuss] unable to connect ?

Jayesh Krishna jayesh at mcs.anl.gov
Thu Feb 26 09:40:53 CST 2009


Hi,

>>.. I launch mpiexec.exe from another Windows user account...

 This could be your problem. Try registering a username/password that is
valid on both machines using the "-user" option (mpiexec -register
-user 1), then launch your job as that user (mpiexec -n 2 -user 1 -hosts 2
10.0.0.10 10.0.0.13 hostname). You can also check whether the user
credentials are able to launch a job with the "-validate" option of
mpiexec (mpiexec -validate -user 1 10.0.0.10 ; mpiexec -validate -user 1
10.0.0.13).
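
The register / validate / launch sequence above can be written down as a
small script. This is only a minimal sketch: the helper names
build_commands and run_all are hypothetical, the mpiexec options are exactly
the ones quoted in this message, and it assumes mpiexec is on the PATH. The
register step may prompt for the account and password, so the script only
prints the commands unless dry_run is turned off.

```python
import subprocess

HOSTS = ["10.0.0.10", "10.0.0.13"]  # the two machines from this thread


def build_commands(hosts, user_index=1, program="hostname"):
    """Return the mpiexec command lines quoted in the message, in order."""
    user = str(user_index)
    count = str(len(hosts))
    # Step 1: register credentials under the given user index.
    register = ["mpiexec", "-register", "-user", user]
    # Step 2: validate those credentials against each host.
    validate = [["mpiexec", "-validate", "-user", user, host] for host in hosts]
    # Step 3: launch a non-MPI program on all hosts as that user.
    launch = ["mpiexec", "-n", count, "-user", user, "-hosts", count]
    launch += list(hosts) + [program]
    return [register] + validate + [launch]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    run_all(build_commands(HOSTS))
```

Running the validate commands before the launch makes it easier to tell a
credentials problem apart from a network problem.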

(PS: Did you copy-paste the complete output of the mpiexec command and the
command itself? Please don't remove any part of the output; the complete
output helps us debug your problem.)

Regards,
Jayesh

-----Original Message-----
From: kiss attila [mailto:kissattila2008 at gmail.com]
Sent: Thursday, February 26, 2009 12:26 AM
To: Jayesh Krishna
Subject: Re: [mpich-discuss] unable to connect ?

1. Yes, the ping works fine. With wmpiconfig.exe I can see both machines.
2. MPICH2 1.0.8 is installed on both.
3. No firewalls of any kind.
4. For smpd -status I get:
smpd running on 10.0.0.10
smpd running on 10.0.0.13

5. From 10.0.0.10:
C:\Program Files\MPICH2\bin>mpiexec -hosts 2 10.0.0.10 10.0.0.13 hostname
abort: unable to connect to 10.0.0.13

From 10.0.0.13:
C:\Program Files\MPICH2\bin>mpiexec -hosts 2 10.0.0.10 10.0.0.13 hostname
abort: unable to connect to 10.0.0.10
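
A successful ping only shows that ICMP gets through; it does not show that
a TCP connection to the process manager can be opened. The check below is a
minimal sketch of such a test. The function name can_connect is
hypothetical, and port 8676 is an assumption (the usual default smpd
listener port); change it if smpd was configured with a different port.

```python
import socket

# Assumption: default smpd listener port; adjust to match your installation.
SMPD_PORT = 8676


def can_connect(host, port=SMPD_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    for host in ("10.0.0.10", "10.0.0.13"):
        state = "reachable" if can_connect(host) else "NOT reachable"
        print(host, state)
```

Run it from each machine against the other one. A failure here points to a
network or firewall problem rather than to user credentials.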

And here is the output in -verbose mode:

...../first_token
.....\compare_token
...../compare_token
.....\next_token
......\first_token
....../first_token
......\first_token
....../first_token
...../next_token
..../smpd_hide_string_arg
..../smpd_hide_string_arg
.....\smpd_option_on
......\smpd_get_smpd_data
.......\smpd_get_smpd_data_from_environment
......./smpd_get_smpd_data_from_environment
.......\smpd_get_smpd_data_default
......./smpd_get_smpd_data_default
.......Unable to get the data for the key 'nocache'
....../smpd_get_smpd_data
...../smpd_option_on
....\smpd_hide_string_arg
.....\first_token
...../first_token
.....\compare_token
...../compare_token
.....\next_token
......\first_token
....../first_token
......\first_token
....../first_token
...../next_token
..../smpd_hide_string_arg
..../smpd_hide_string_arg
.../smpd_handle_op_read
...sock_waiting for the next event.
...SOCK_OP_WRITE
...\smpd_handle_op_write
....\smpd_state_writing_cred_ack_yes
.....wrote cred request yes ack.
..../smpd_state_writing_cred_ack_yes
.../smpd_handle_op_write
...sock_waiting for the next event.
...SOCK_OP_WRITE
...\smpd_handle_op_write
....\smpd_state_writing_account
.....wrote account: 'mpiuser'
.....\smpd_encrypt_data
...../smpd_encrypt_data
..../smpd_state_writing_account
.../smpd_handle_op_write
...sock_waiting for the next event.
...SOCK_OP_WRITE
...\smpd_handle_op_write
....\smpd_hide_string_arg
.....\first_token
...../first_token
.....\compare_token
...../compare_token
.....\next_token
......\first_token
....../first_token
......\first_token
....../first_token
...../next_token
..../smpd_hide_string_arg
..../smpd_hide_string_arg
.....\smpd_hide_string_arg
......\first_token
....../first_token
......\compare_token
....../compare_token
......\next_token
.......\first_token
......./first_token
.......\first_token
......./first_token
....../next_token
...../smpd_hide_string_arg
...../smpd_hide_string_arg
....\smpd_hide_string_arg
.....\first_token
...../first_token
.....\compare_token
...../compare_token
.....\next_token
......\first_token
....../first_token
......\first_token
....../first_token
...../next_token
..../smpd_hide_string_arg
..../smpd_hide_string_arg
.../smpd_handle_op_write
...sock_waiting for the next event.
...SOCK_OP_READ
...\smpd_handle_op_read
....\smpd_state_reading_process_result
.....read process session result: 'SUCCESS'
..../smpd_state_reading_process_result
.../smpd_handle_op_read
...sock_waiting for the next event.
...SOCK_OP_READ
...\smpd_handle_op_read
....\smpd_state_reading_reconnect_request
.....read re-connect request: '3972'
.....closing the old socket in the left context.
.....MPIDU_Sock_post_close(1720)
.....connecting a new socket.
.....\smpd_create_context
......\smpd_init_context
.......\smpd_init_command
......./smpd_init_command
....../smpd_init_context
...../smpd_create_context
.....posting a re-connect to 10.0.0.10:3972 in left context.
..../smpd_state_reading_reconnect_request
.../smpd_handle_op_read
...sock_waiting for the next event.
...SOCK_OP_CLOSE
...\smpd_handle_op_close
....\smpd_get_state_string
..../smpd_get_state_string
....op_close received - SMPD_CLOSING state.
....Unaffiliated left context closing.
....\smpd_free_context
.....freeing left context.
.....\smpd_init_context
......\smpd_init_command
....../smpd_init_command
...../smpd_init_context
..../smpd_free_context
.../smpd_handle_op_close
...sock_waiting for the next event.
...SOCK_OP_CONNECT
...\smpd_handle_op_connect
....\smpd_generate_session_header
.....session header: (id=1 parent=0 level=0)
..../smpd_generate_session_header
.../smpd_handle_op_connect
...sock_waiting for the next event.
...SOCK_OP_WRITE
...\smpd_handle_op_write
....\smpd_state_writing_session_header
.....wrote session header: 'id=1 parent=0 level=0'
.....\smpd_post_read_command
......posting a read for a command header on the left context, sock 1656
...../smpd_post_read_command
.....creating connect command for left node
.....creating connect command to '10.0.0.13'
.....\smpd_create_command
......\smpd_init_command
....../smpd_init_command
...../smpd_create_command
.....\smpd_add_command_arg
...../smpd_add_command_arg
.....\smpd_add_command_int_arg
...../smpd_add_command_int_arg
.....\smpd_post_write_command
......\smpd_package_command
....../smpd_package_command
......smpd_post_write_command on the left context sock 1656: 65 bytes for
command: "cmd=connect src=0 dest=1 tag=0 host=10.0.0.13 id=2 "
...../smpd_post_write_command
.....not connected yet: 10.0.0.13 not connected
..../smpd_state_writing_session_header
.../smpd_handle_op_write
...sock_waiting for the next event.
...SOCK_OP_WRITE
...\smpd_handle_op_write
....\smpd_state_writing_cmd
.....wrote command
.....command written to left: "cmd=connect src=0 dest=1 tag=0
host=10.0.0.13 id=2 "
.....moving 'connect' command to the wait_list.
..../smpd_state_writing_cmd
.../smpd_handle_op_write
...sock_waiting for the next event.
...SOCK_OP_READ
...\smpd_handle_op_read
....\smpd_state_reading_cmd_header
.....read command header
.....command header read, posting read for data: 69 bytes
..../smpd_state_reading_cmd_header
.../smpd_handle_op_read
...sock_waiting for the next event.
...SOCK_OP_READ
...\smpd_handle_op_read
....\smpd_state_reading_cmd
.....read command
.....\smpd_parse_command
...../smpd_parse_command
.....read command: "cmd=abort src=1 dest=0 tag=0 error="unable to connect
to 10.0.0.13" "
.....\smpd_handle_command
......handling command:
...... src  = 1
...... dest = 0
...... cmd  = abort
...... tag  = 0
...... ctx  = left
...... len  = 69
...... str  = cmd=abort src=1 dest=0 tag=0 error="unable to connect to
10.0.0.13"
......\smpd_command_destination
.......0 -> 0 : returning NULL context
....../smpd_command_destination
......\smpd_handle_abort_command
.......abort: unable to connect to 10.0.0.13
....../smpd_handle_abort_command
...../smpd_handle_command
.....\smpd_post_read_command
......posting a read for a command header on the left context, sock 1656
...../smpd_post_read_command
.....\smpd_create_command
......\smpd_init_command
....../smpd_init_command
...../smpd_create_command
.....\smpd_post_write_command
......\smpd_package_command
....../smpd_package_command
......smpd_post_write_command on the left context sock 1656: 43 bytes for
command: "cmd=close src=0 dest=1 tag=1 "
...../smpd_post_write_command
..../smpd_state_reading_cmd
.../smpd_handle_op_read
...sock_waiting for the next event.
...SOCK_OP_READ
...\smpd_handle_op_read
....\smpd_state_reading_cmd_header
.....read command header
.....command header read, posting read for data: 31 bytes
..../smpd_state_reading_cmd_header
.../smpd_handle_op_read
...sock_waiting for the next event.
...SOCK_OP_WRITE
...\smpd_handle_op_write
....\smpd_state_writing_cmd
.....wrote command
.....command written to left: "cmd=close src=0 dest=1 tag=1 "
.....\smpd_free_command
......\smpd_init_command
....../smpd_init_command
...../smpd_free_command
..../smpd_state_writing_cmd
.../smpd_handle_op_write
...sock_waiting for the next event.
...SOCK_OP_READ
...\smpd_handle_op_read
....\smpd_state_reading_cmd
.....read command
.....\smpd_parse_command
...../smpd_parse_command
.....read command: "cmd=closed src=1 dest=0 tag=1 "
.....\smpd_handle_command
......handling command:
...... src  = 1
...... dest = 0
...... cmd  = closed
...... tag  = 1
...... ctx  = left
...... len  = 31
...... str  = cmd=closed src=1 dest=0 tag=1
......\smpd_command_destination
.......0 -> 0 : returning NULL context
....../smpd_command_destination
......\smpd_handle_closed_command
.......closed command received from left child, closing sock.
.......MPIDU_Sock_post_close(1656)
.......received a closed at node with no parent context, assuming root,
returning SMPD_EXITING.
....../smpd_handle_closed_command
...../smpd_handle_command
.....not posting read for another command because SMPD_EXITING returned
..../smpd_state_reading_cmd
.../smpd_handle_op_read
...sock_waiting for the next event.
...SOCK_OP_CLOSE
...\smpd_handle_op_close
....\smpd_get_state_string
..../smpd_get_state_string
....op_close received - SMPD_EXITING state.
....\smpd_free_context
.....freeing left context.
.....\smpd_init_context
......\smpd_init_command
....../smpd_init_command
...../smpd_init_context
..../smpd_free_context
.../smpd_handle_op_close
../smpd_enter_at_state
./main
.\smpd_exit
..\smpd_kill_all_processes
../smpd_kill_all_processes
..\smpd_finalize_drive_maps
../smpd_finalize_drive_maps
..\smpd_dbs_finalize
../smpd_dbs_finalize
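
In a trace like the one above, the line that matters is the cmd=abort
command, which carries the error string. A minimal sketch for pulling it
out of a saved trace follows. The function name find_abort_errors is
hypothetical, and the pattern is based only on the lines shown in this
message, so other smpd versions may need a different one.

```python
import re

# Matches the abort command that appears in the verbose trace, e.g.
#   cmd=abort src=1 dest=0 tag=0 error="unable to connect to 10.0.0.13"
ABORT_RE = re.compile(r'cmd=abort\b[^"]*?\berror="([^"]*)"')


def find_abort_errors(log_text):
    """Return the distinct error strings carried by cmd=abort commands."""
    # Mail clients wrap long lines, so join the lines before matching.
    flat = " ".join(line.strip() for line in log_text.splitlines())
    errors = []
    for message in ABORT_RE.findall(flat):
        if message not in errors:
            errors.append(message)
    return errors


if __name__ == "__main__":
    sample = (
        '.....read command: "cmd=abort src=1 dest=0 tag=0 error="unable to connect\n'
        'to 10.0.0.13" "\n'
        '...... cmd  = abort\n'
    )
    print(find_abort_errors(sample))
```

The trace is flattened first because the error string can be split across
two lines, as it is in the copy pasted above.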

I have registered the same user with the same password on both computers
using wmpiregister.exe, but I launch mpiexec.exe from a different
Windows user account. Could this be the problem? Thanks.

regards
k.a.albert




2009/2/25 Jayesh Krishna <jayesh at mcs.anl.gov>:
>  Hi,
>
> # Can you ping the machines from each other ?
> # Make sure that you have the same version of MPICH2 installed on both
> the machines.
> # Do you have any firewalls (windows, third-party) running on the
> machines (Turn off any firewalls running on the machines)?
> # Make sure that you have the MPICH2 process manager, smpd.exe,
> running as a service on both the machines (to check the status of the
> process manager, type smpd -status at the command prompt).
> # Before trying to execute an MPI program like cpi.exe, try executing
> a non-MPI program like hostname on the machines
> (mpiexec -hosts 2 10.0.0.10 10.0.0.13 hostname).
>
>  Let us know the results.
>
> (PS: In your reply please copy-paste the commands and the output)
>
> Regards,
> Jayesh
>
>
>
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of kiss attila
> Sent: Wednesday, February 25, 2009 1:46 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss] unable to connect ?
>
> Hi
>
>   I have two WinXP machines (10.0.0.13, 10.0.0.10) with MPICH2
> installed, and on this command:
> "D:\Program Files\MPICH2\bin\mpiexec.exe" -hosts 2 10.0.0.10 10.0.0.13
> -noprompt c:\ex\cpi.exe
>
> I get:
>
> Aborting: unable to connect to 10.0.0.10
>
> Somehow I can't start any process on the remote machine (10.0.0.10). It
> annoys me that it worked a few days ago, but I had to reinstall one
> of them, and since then I haven't been able to figure out what's wrong
> with my settings. Thanks.
>
> regards
> K.A. Albert
>