[mpich-discuss] help-mpich2-unable to connect
Jayesh Krishna
jayesh at mcs.anl.gov
Mon Jan 24 11:21:19 CST 2011
Hi,
You need to have the same username (with the same password) on both machines to launch MPI jobs (or use a domain user on both machines). Create local users on both machines with the same username/password and let us know if MPICH2 works for you.
Regards,
Jayesh
----- Original Message -----
From: "Sayed Zulfikar" <sayed.zulfikar at yahoo.com>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Cc: mpich-discuss at mcs.anl.gov
Sent: Monday, January 24, 2011 11:09:35 AM
Subject: Re: help-mpich2-unable to connect
Sorry, I forgot to copy mpich-discuss, my bad...
Can you ping one machine from the other (192.168.1.3 from 192.168.1.5 and 192.168.1.5 from 192.168.1.3)?
=> Yes, both machines can ping each other.
Do you have the same username on both machines? (From your email I am assuming that you are able to run your job locally on each machine, right?)
=> No, the usernames are different, but the password and the passphrase are the same.
Yes, I am able to run my job locally on each machine.
The error message says "unable to connect to xxxx".
I've tried on both machines, but the error message is the same.
Can anybody help me with this problem?
thx
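Ping only shows that ICMP echo requests get through. smpd and mpiexec talk over TCP, and the verbose log in this thread ends with the smpd on 192.168.1.5 reporting that it could not connect to 192.168.1.3. Below is a minimal Python sketch of a TCP reachability check that can be run from each machine against the other. The helper name tcp_port_open is hypothetical. The usage note's port 8676 is assumed to be smpd's default listening port, so substitute whatever port smpd was actually started with.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) is accepted within timeout seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out (socket.timeout is an OSError subclass).
        return False
```

For example, run tcp_port_open("192.168.1.3", 8676) on 192.168.1.5, and the reverse on the other machine. A False result on either side points at the network path or at smpd not listening on that port. A True result on both sides suggests the problem lies elsewhere, for example in the account setup.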
From: Jayesh Krishna <jayesh at mcs.anl.gov>
To: Sayed Zulfikar <sayed.zulfikar at yahoo.com>
Cc: mpich-discuss at mcs.anl.gov
Sent: Mon, January 17, 2011 9:57:12 PM
Subject: Re: help-mpich2-unable to connect
Hi,
Can you ping one machine from the other (192.168.1.3 from 192.168.1.5 and 192.168.1.5 from 192.168.1.3)?
Do you have the same username on both machines? (From your email I am assuming that you are able to run your job locally on each machine, right?)
(PS: Please copy your response to mpich-discuss; apart from me, other devs and users can also pitch in with their comments/solutions.)
Regards,
Jayesh
----- Original Message -----
From: "Sayed Zulfikar" <sayed.zulfikar at yahoo.com>
To: "jayesh MPICH2 master" <jayesh at mcs.anl.gov>
Sent: Saturday, January 15, 2011 4:24:51 PM
Subject: help-mpich2-unable to connect
Dear Jayesh,
I'm sorry for mailing you, but from what I found on the internet, you are the one who often helps people with MPICH2,
e.g. in this thread, where you helped someone:
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-February/004657.html
I have the same problem. When I try to run MPICH2 on 2 computers connected by a LAN cable, it says
"abort : unable to connect to 192.168.1.3"
Note: I set static IPs; my computer is 192.168.1.5 and the other one is 192.168.1.3.
I tested cpi.exe and some other parallel programs on my single computer, which has a Core 2 Duo processor, and they run correctly.
I use MPICH2 1.3.
Both machines run Windows 7 Professional.
It seems I have done what you suggested in that thread:
I turned off the firewall.
I made sure both machines run the same smpd version.
I made sure MPICH2 is installed correctly on both machines (installed with admin privileges, for everyone, with the default passphrase).
I ran "mpiexec -register -user 1"; I didn't enter a username, I just pressed Enter, and then I entered the password "behappy".
I validated that user, then ran "mpiexec -hosts 2 xxxxxxx xxxxxx -user 1", which didn't work.
I gave both machines the same logon password, "behappy".
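For reference, the mpiexec shipped with MPICH2 on Windows expects the host count as a separate argument (-hosts 2 host1 host2). In this thread, -user n selects the credentials stored with "mpiexec -register -user n". The following minimal Python sketch assembles that argument list so the spacing cannot slip. build_mpiexec_args is a hypothetical helper, and the addresses and C:\pp\MergeSort.exe are taken from this thread.

```python
def build_mpiexec_args(hosts, exe, user_index=1):
    """Assemble an mpiexec argument list of the form used in this thread.

    Hypothetical helper: '-hosts <n> <host1> ... <hostn>' lists the hosts and
    '-user <n>' picks the credentials saved with 'mpiexec -register -user <n>'.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    # The host count is its own argument, followed by exactly that many hosts.
    return ["mpiexec", "-hosts", str(len(hosts)), *hosts,
            "-user", str(user_index), exe]


args = build_mpiexec_args(["192.168.1.5", "192.168.1.3"], r"C:\pp\MergeSort.exe")
print(" ".join(args))
```

The resulting list can be passed unchanged to subprocess.run on the machine where mpiexec is installed.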
Thank you,
This is the verbose output:
..handling executable:
C:\pp\MergeSort.exe
..Processing environment variables
..Processing drive mappings
..Creating launch nodes (2)
..\smpd_get_next_host
...\smpd_get_host_id
.../smpd_get_host_id
../smpd_get_next_host
..Adding host (192.168.1.5) to launch list
..\smpd_get_next_host
...\smpd_get_host_id
.../smpd_get_host_id
../smpd_get_next_host
..Adding host (192.168.1.3) to launch list
..\smpd_create_cliques
...\prev_launch_node
.../prev_launch_node
...\prev_launch_node
.../prev_launch_node
...\prev_launch_node
.../prev_launch_node
...\prev_launch_node
.../prev_launch_node
../smpd_create_cliques
..\smpd_fix_up_host_tree
../smpd_fix_up_host_tree
./mp_parse_command_args
.host tree:
. host: 192.168.1.5, parent: 0, id: 1
. host: 192.168.1.3, parent: 1, id: 2
.launch nodes:
. iproc: 1, id: 2, exe: C:\pp\MergeSort.exe
. iproc: 0, id: 1, exe: C:\pp\MergeSort.exe
.\SMPDU_Sock_create_set
..\smpd_get_smpd_data
...\smpd_get_smpd_data_from_environment
.../smpd_get_smpd_data_from_environment
../smpd_get_smpd_data
..\smpd_create_context
...\smpd_init_context
....\smpd_init_command
..../smpd_init_command
.../smpd_init_context
../smpd_create_context
..\SMPDU_Sock_post_connect
../SMPDU_Sock_post_connect
..\SMPDU_Sock_set_user_ptr
../SMPDU_Sock_set_user_ptr
..\smpd_make_socket_loop
...\smpd_get_hostname
.../smpd_get_hostname
../smpd_make_socket_loop
..\SMPDU_Sock_native_to_sock
../SMPDU_Sock_native_to_sock
..\SMPDU_Sock_native_to_sock
../SMPDU_Sock_native_to_sock
..\smpd_create_context
...\smpd_init_context
....\smpd_init_command
..../smpd_init_command
....\SMPDU_Sock_set_user_ptr
..../SMPDU_Sock_set_user_ptr
.../smpd_init_context
../smpd_create_context
..\SMPDU_Sock_post_read
...\SMPDU_Sock_post_readv
.../SMPDU_Sock_post_readv
../SMPDU_Sock_post_read
..\smpd_enter_at_state
...sock_waiting for the next event.
...\SMPDU_Sock_wait
.../SMPDU_Sock_wait
...SOCK_OP_CONNECT event.error = 0, result = 0, context=left
...\smpd_handle_op_connect
....connect succeeded, posting read of the challenge string
....\SMPDU_Sock_post_read
.....\SMPDU_Sock_post_readv
...../SMPDU_Sock_post_readv
..../SMPDU_Sock_post_read
.../smpd_handle_op_connect
...sock_waiting for the next event.
...\SMPDU_Sock_wait
.../SMPDU_Sock_wait
...SOCK_OP_READ event.error = 0, result = 0, context=left
...\smpd_handle_op_read
....\smpd_state_reading_challenge_string
.....read challenge string: '1.3 28253'
.....\smpd_verify_version
...../smpd_verify_version
.....Verification of smpd version succeeded
.....\smpd_hash
...../smpd_hash
.....\SMPDU_Sock_post_write
......\SMPDU_Sock_post_writev
....../SMPDU_Sock_post_writev
...../SMPDU_Sock_post_write
..../smpd_state_reading_challenge_string
.../smpd_handle_op_read
...sock_waiting for the next event.
...\SMPDU_Sock_wait
.../SMPDU_Sock_wait
...SOCK_OP_WRITE event.error = 0, result = 0, context=left
...\smpd_handle_op_write
....\smpd_state_writing_challenge_response
.....wrote challenge response: 'ac829821dd2160834c62236a36b026c8'
.....\SMPDU_Sock_post_read
......\SMPDU_Sock_post_readv
....../SMPDU_Sock_post_readv
...../SMPDU_Sock_post_read
..../smpd_state_writing_challenge_response
.../smpd_handle_op_write
...sock_waiting for the next event.
...\SMPDU_Sock_wait
.../SMPDU_Sock_wait
...SOCK_OP_READ event.error = 0, result = 0, context=left
...\smpd_handle_op_read
....\smpd_state_reading_connect_result
.....read connect result: 'SUCCESS'
.....\SMPDU_Sock_post_write
......\SMPDU_Sock_post_writev
....../SMPDU_Sock_post_writev
...../SMPDU_Sock_post_write
..../smpd_state_reading_connect_result
.../smpd_handle_op_read
...sock_waiting for the next event.
...\SMPDU_Sock_wait
.../SMPDU_Sock_wait
...SOCK_OP_WRITE event.error = 0, result = 0, context=left
...\smpd_handle_op_write
....\smpd_state_writing_process_session_request
.....wrote process session request: 'process'
.....\SMPDU_Sock_post_read
......\SMPDU_Sock_post_readv
....../SMPDU_Sock_post_readv
...../SMPDU_Sock_post_read
..../smpd_state_writing_process_session_request
.../smpd_handle_op_write
...sock_waiting for the next event.
...\SMPDU_Sock_wait
.../SMPDU_Sock_wait
...SOCK_OP_READ event.error = 0, result = 0, context=left
...\smpd_handle_op_read
....\smpd_state_reading_cred_request
.....read cred request: 'credentials'
.....\smpd_hide_string_arg
......\first_token
....../first_token
......\compare_token
....../compare_token
......\next_token
.......\first_token
......./first_token
.......\first_token
......./first_token
....../next_token
...../smpd_hide_string_arg
...../smpd_hide_string_arg
.....\smpd_hide_string_arg
......\first_token
....../first_token
......\compare_token
....../compare_token
......\next_token
.......\first_token
......./first_token
.......\first_token
......./first_token
....../next_token
...../smpd_hide_string_arg
...../smpd_hide_string_arg
.....\smpd_hide_string_arg
......\first_token
....../first_token
......\compare_token
....../compare_token
......\next_token
.......\first_token
......./first_token
.......\first_token
......./first_token
....../next_token
...../smpd_hide_string_arg
...../smpd_hide_string_arg
.....\smpd_hide_string_arg
......\first_token
....../first_token
......\compare_token
....../compare_token
......\next_token
.......\first_token
......./first_token
.......\first_token
......./first_token
....../next_token
...../smpd_hide_string_arg
...../smpd_hide_string_arg
.....\smpd_hide_string_arg
......\first_token
....../first_token
......\compare_token
....../compare_token
......\next_token
.......\first_token
......./first_token
.......\first_token
......./first_token
....../next_token
...../smpd_hide_string_arg
...../smpd_hide_string_arg
......\smpd_option_on
.......\smpd_get_smpd_data
........\smpd_get_smpd_data_from_environment
......../smpd_get_smpd_data_from_environment
........\smpd_get_smpd_data_default
......../smpd_get_smpd_data_default
........Unable to get the data for the key 'nocache'
......./smpd_get_smpd_data
....../smpd_option_on
.....\smpd_hide_string_arg
......\first_token
....../first_token
......\compare_token
....../compare_token
......\next_token
.......\first_token
......./first_token
.......\first_token
......./first_token
....../next_token
...../smpd_hide_string_arg
...../smpd_hide_string_arg
.....\SMPDU_Sock_post_write
......\SMPDU_Sock_post_writev
....../SMPDU_Sock_post_writev
...../SMPDU_Sock_post_write
..../smpd_handle_op_read
....sock_waiting for the next event.
....\SMPDU_Sock_wait
..../SMPDU_Sock_wait
....SOCK_OP_WRITE event.error = 0, result = 0, context=left
....\smpd_handle_op_write
.....\smpd_state_writing_cred_ack_yes
......wrote cred request yes ack.
......\SMPDU_Sock_post_write
.......\SMPDU_Sock_post_writev
......./SMPDU_Sock_post_writev
....../SMPDU_Sock_post_write
...../smpd_state_writing_cred_ack_yes
..../smpd_handle_op_write
....sock_waiting for the next event.
....\SMPDU_Sock_wait
..../SMPDU_Sock_wait
....SOCK_OP_WRITE event.error = 0, result = 0, context=left
....\smpd_handle_op_write
.....\smpd_state_writing_account
......wrote account: 'morrow-PC\morrow'
......\smpd_encrypt_data
....../smpd_encrypt_data
......\SMPDU_Sock_post_write
.......\SMPDU_Sock_post_writev
......./SMPDU_Sock_post_writev
....../SMPDU_Sock_post_write
...../smpd_state_writing_account
..../smpd_handle_op_write
....sock_waiting for the next event.
....\SMPDU_Sock_wait
..../SMPDU_Sock_wait
....SOCK_OP_WRITE event.error = 0, result = 0, context=left
....\smpd_handle_op_write
.....\smpd_hide_string_arg
......\first_token
....../first_token
......\compare_token
....../compare_token
......\next_token
.......\first_token
......./first_token
.......\first_token
......./first_token
....../next_token
...../smpd_hide_string_arg
...../smpd_hide_string_arg
......\smpd_hide_string_arg
.......\first_token
......./first_token
.......\compare_token
......./compare_token
.......\next_token
........\first_token
......../first_token
........\first_token
......../first_token
......./next_token
....../smpd_hide_string_arg
....../smpd_hide_string_arg
......\SMPDU_Sock_post_read
.......\SMPDU_Sock_post_readv
......./SMPDU_Sock_post_readv
....../SMPDU_Sock_post_read
.....\smpd_hide_string_arg
......\first_token
....../first_token
......\compare_token
....../compare_token
......\next_token
.......\first_token
......./first_token
.......\first_token
......./first_token
....../next_token
...../smpd_hide_string_arg
...../smpd_hide_string_arg
..../smpd_handle_op_write
....sock_waiting for the next event.
....\SMPDU_Sock_wait
..../SMPDU_Sock_wait
....SOCK_OP_READ event.error = 0, result = 0, context=left
....\smpd_handle_op_read
.....\smpd_state_reading_process_result
......read process session result: 'SUCCESS'
......\SMPDU_Sock_post_read
.......\SMPDU_Sock_post_readv
......./SMPDU_Sock_post_readv
....../SMPDU_Sock_post_read
...../smpd_state_reading_process_result
..../smpd_handle_op_read
....sock_waiting for the next event.
....\SMPDU_Sock_wait
..../SMPDU_Sock_wait
....SOCK_OP_READ event.error = 0, result = 0, context=left
....\smpd_handle_op_read
.....\smpd_state_reading_reconnect_request
......read re-connect request: '49432'
......closing the old socket in the left context.
......\SMPDU_Sock_get_sock_id
....../SMPDU_Sock_get_sock_id
......SMPDU_Sock_post_close(464)
......\SMPDU_Sock_post_close
.......\SMPDU_Sock_post_read
........\SMPDU_Sock_post_readv
......../SMPDU_Sock_post_readv
......./SMPDU_Sock_post_read
....../SMPDU_Sock_post_close
......connecting a new socket.
......\smpd_create_context
.......\smpd_init_context
........\smpd_init_command
......../smpd_init_command
......./smpd_init_context
....../smpd_create_context
......posting a re-connect to 192.168.1.5:49432 in left context.
......\SMPDU_Sock_post_connect
....../SMPDU_Sock_post_connect
...../smpd_state_reading_reconnect_request
..../smpd_handle_op_read
....sock_waiting for the next event.
....\SMPDU_Sock_wait
..../SMPDU_Sock_wait
....SOCK_OP_CLOSE event.error = 0, result = 0, context=left
....\smpd_handle_op_close
.....\smpd_get_state_string
...../smpd_get_state_string
.....op_close received - SMPD_CLOSING state.
.....Unaffiliated left context closing.
.....\smpd_free_context
......freeing left context.
......\smpd_init_context
.......\smpd_init_command
......./smpd_init_command
....../smpd_init_context
...../smpd_free_context
..../smpd_handle_op_close
....sock_waiting for the next event.
....\SMPDU_Sock_wait
..../SMPDU_Sock_wait
....SOCK_OP_CONNECT event.error = 0, result = 0, context=left
....\smpd_handle_op_connect
.....\smpd_generate_session_header
......session header: (id=1 parent=0 level=0)
...../smpd_generate_session_header
.....\SMPDU_Sock_post_write
......\SMPDU_Sock_post_writev
....../SMPDU_Sock_post_writev
...../SMPDU_Sock_post_write
..../smpd_handle_op_connect
....sock_waiting for the next event.
....\SMPDU_Sock_wait
..../SMPDU_Sock_wait
....SOCK_OP_WRITE event.error = 0, result = 0, context=left
....\smpd_handle_op_write
.....\smpd_state_writing_session_header
......wrote session header: 'id=1 parent=0 level=0'
......\smpd_post_read_command
.......\SMPDU_Sock_get_sock_id
......./SMPDU_Sock_get_sock_id
.......posting a read for a command header on the left context, sock 528
.......\SMPDU_Sock_post_read
........\SMPDU_Sock_post_readv
......../SMPDU_Sock_post_readv
......./SMPDU_Sock_post_read
....../smpd_post_read_command
......creating connect command for left node
......creating connect command to '192.168.1.3'
......\smpd_create_command
.......\smpd_init_command
......./smpd_init_command
....../smpd_create_command
......\smpd_add_command_arg
....../smpd_add_command_arg
......\smpd_add_command_int_arg
....../smpd_add_command_int_arg
......\smpd_post_write_command
.......\smpd_package_command
......./smpd_package_command
.......\SMPDU_Sock_get_sock_id
......./SMPDU_Sock_get_sock_id
.......smpd_post_write_command on the left context sock 528: 67 bytes for command: "cmd=connect src=0 dest=1 tag=0 host=192.168.1.3 id=2 "
.......\SMPDU_Sock_post_writev
......./SMPDU_Sock_post_writev
....../smpd_post_write_command
......not connected yet: 192.168.1.3 not connected
...../smpd_state_writing_session_header
..../smpd_handle_op_write
....sock_waiting for the next event.
....\SMPDU_Sock_wait
..../SMPDU_Sock_wait
....SOCK_OP_WRITE event.error = 0, result = 0, context=left
....\smpd_handle_op_write
.....\smpd_state_writing_cmd
......wrote command
......command written to left: "cmd=connect src=0 dest=1 tag=0 host=192.168.1.3 id=2 "
......moving 'connect' command to the wait_list.
...../smpd_state_writing_cmd
..../smpd_handle_op_write
....sock_waiting for the next event.
....\SMPDU_Sock_wait
..../SMPDU_Sock_wait
....SOCK_OP_READ event.error = 0, result = 0, context=left
....\smpd_handle_op_read
.....\smpd_state_reading_cmd_header
......read command header
......command header read, posting read for data: 71 bytes
......\SMPDU_Sock_post_read
.......\SMPDU_Sock_post_readv
......./SMPDU_Sock_post_readv
....../SMPDU_Sock_post_read
...../smpd_state_reading_cmd_header
..../smpd_handle_op_read
....sock_waiting for the next event.
....\SMPDU_Sock_wait
..../SMPDU_Sock_wait
....SOCK_OP_READ event.error = 0, result = 0, context=left
....\smpd_handle_op_read
.....\smpd_state_reading_cmd
......read command
......\smpd_parse_command
....../smpd_parse_command
......read command: "cmd=abort src=1 dest=0 tag=0 error="Unable to connect to 192.168.1.3" "
......\smpd_handle_command
.......handling command:
....... src = 1
....... dest = 0
....... cmd = abort
....... tag = 0
....... ctx = left
....... len = 71
....... str = cmd=abort src=1 dest=0 tag=0 error="Unable to connect to 192.168.1.3"
.......\smpd_command_destination
........0 -> 0 : returning NULL context
......./smpd_command_destination
.......\smpd_handle_abort_command
........abort: Unable to connect to 192.168.1.3
......./smpd_handle_abort_command
....../smpd_handle_command
......\smpd_post_read_command
.......\SMPDU_Sock_get_sock_id
......./SMPDU_Sock_get_sock_id
.......posting a read for a command header on the left context, sock 528
.......\SMPDU_Sock_post_read
........\SMPDU_Sock_post_readv
......../SMPDU_Sock_post_readv
......./SMPDU_Sock_post_read
....../smpd_post_read_command
......\smpd_create_command
.......\smpd_init_command
......./smpd_init_command
....../smpd_create_command
......\smpd_post_write_command
.......\smpd_package_command
......./smpd_package_command
.......\SMPDU_Sock_get_sock_id
......./SMPDU_Sock_get_sock_id
.......smpd_post_write_command on the left context sock 528: 43 bytes for command: "cmd=close src=0 dest=1 tag=1 "
.......\SMPDU_Sock_post_writev
......./SMPDU_Sock_post_writev
....../smpd_post_write_command
...../smpd_state_reading_cmd
..../smpd_handle_op_read
....sock_waiting for the next event.
....\SMPDU_Sock_wait
..../SMPDU_Sock_wait
....SOCK_OP_WRITE event.error = 0, result = 0, context=left
....\smpd_handle_op_write
.....\smpd_state_writing_cmd
......wrote command
......command written to left: "cmd=close src=0 dest=1 tag=1 "
......\smpd_free_command
.......\smpd_init_command
......./smpd_init_command
....../smpd_free_command
...../smpd_state_writing_cmd
..../smpd_handle_op_write
....sock_waiting for the next event.
....\SMPDU_Sock_wait
..../SMPDU_Sock_wait
....SOCK_OP_READ event.error = 0, result = 0, context=left
....\smpd_handle_op_read
.....\smpd_state_reading_cmd_header
......read command header
......command header read, posting read for data: 31 bytes
......\SMPDU_Sock_post_read
.......\SMPDU_Sock_post_readv
......./SMPDU_Sock_post_readv
....../SMPDU_Sock_post_read
...../smpd_state_reading_cmd_header
..../smpd_handle_op_read
....sock_waiting for the next event.
....\SMPDU_Sock_wait
..../SMPDU_Sock_wait
....SOCK_OP_READ event.error = 0, result = 0, context=left
....\smpd_handle_op_read
.....\smpd_state_reading_cmd
......read command
......\smpd_parse_command
....../smpd_parse_command
......read command: "cmd=closed src=1 dest=0 tag=1 "
......\smpd_handle_command
.......handling command:
....... src = 1
....... dest = 0
....... cmd = closed
....... tag = 1
....... ctx = left
....... len = 31
....... str = cmd=closed src=1 dest=0 tag=1
.......\smpd_command_destination
........0 -> 0 : returning NULL context
......./smpd_command_destination
.......\smpd_handle_closed_command
........closed command received from left child, closing sock.
........\SMPDU_Sock_get_sock_id
......../SMPDU_Sock_get_sock_id
........SMPDU_Sock_post_close(528)
........\SMPDU_Sock_post_close
.........\SMPDU_Sock_post_read
..........\SMPDU_Sock_post_readv
........../SMPDU_Sock_post_readv
........./SMPDU_Sock_post_read
......../SMPDU_Sock_post_close
........received a closed at node with no parent context, assuming root, returning SMPD_EXITING.
......./smpd_handle_closed_command
....../smpd_handle_command
......not posting read for another command because SMPD_EXITING returned
...../smpd_state_reading_cmd
..../smpd_handle_op_read
....sock_waiting for the next event.
....\SMPDU_Sock_wait
..../SMPDU_Sock_wait
....SOCK_OP_CLOSE event.error = 0, result = 0, context=left
....\smpd_handle_op_close
.....\smpd_get_state_string
...../smpd_get_state_string
.....op_close received - SMPD_EXITING state.
.....\smpd_free_context
......freeing left context.
......\smpd_init_context
.......\smpd_init_command
......./smpd_init_command
....../smpd_init_context
...../smpd_free_context
..../smpd_handle_op_close
.../smpd_enter_at_state
...calling SMPDU_Sock_finalize
...\SMPDU_Sock_finalize
.../SMPDU_Sock_finalize
../main
..\smpd_exit
...\smpd_kill_all_processes
.../smpd_kill_all_processes
...\smpd_finalize_drive_maps
.../smpd_finalize_drive_maps
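Here is how the log above reads. The host tree gives 192.168.1.5 id 1, with parent 0, which is mpiexec (its commands carry src=0). It gives 192.168.1.3 id 2, with parent 1. The smpd on 192.168.1.5 accepts the process session (credentials sent for 'morrow-PC\morrow', an account qualified by a machine name rather than a domain, result 'SUCCESS'). mpiexec then sends it "cmd=connect ... host=192.168.1.3 id=2", and the reply is "cmd=abort src=1 ... error="Unable to connect to 192.168.1.3"". So the hop that fails is the smpd on 192.168.1.5 connecting to 192.168.1.3.

Below is a minimal Python sketch that extracts this from a saved verbose log, assuming the line formats shown above. parse_smpd_command and diagnose are hypothetical helpers, not part of MPICH2, and the key=value parsing only approximates smpd's own command format.

```python
import re
import shlex

# Line formats assumed from the verbose log in this thread.
HOST_RE = re.compile(r"host: ([^,\s]+), parent: (\d+), id: (\d+)")
CMD_MARKER = 'read command: "'


def parse_smpd_command(cmd):
    """Split space-separated key=value tokens; double-quoted values may contain spaces."""
    fields = {}
    for token in shlex.split(cmd):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def diagnose(log_lines):
    """Return one message per abort command, naming the host whose smpd sent it."""
    hosts = {"0": "mpiexec"}  # commands from mpiexec carry src=0 in the log
    messages = []
    for line in log_lines:
        match = HOST_RE.search(line)
        if match:
            host, _parent, node_id = match.groups()
            hosts[node_id] = host
            continue
        start = line.find(CMD_MARKER)
        if start == -1:
            continue
        # Drop the marker and the closing quote the logger wraps around the command.
        body = line[start + len(CMD_MARKER):].rstrip()
        if body.endswith('"'):
            body = body[:-1]
        try:
            fields = parse_smpd_command(body)
        except ValueError:  # unbalanced quotes: skip lines this sketch cannot parse
            continue
        if fields.get("cmd") == "abort":
            src = fields.get("src", "?")
            messages.append(
                f"smpd id {src} ({hosts.get(src, 'unknown host')}) aborted: "
                f"{fields.get('error', '')}"
            )
    return messages
```

For example, passing an open file holding this log (a hypothetical verbose.txt) to diagnose should report that smpd id 1 (192.168.1.5) aborted with "Unable to connect to 192.168.1.3". The closing "cmd=closed" message is ignored because it is not an abort.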