[mpich-discuss] Cannot use the main node to run a process of the programme

Waruna Ranasinghe warunapww at gmail.com
Fri Jul 25 03:05:39 CDT 2008


Hi all,

I'm using MPICH2 on Windows.
My programme runs without errors as long as I don't use the machine on which I
execute the command (the main node).

mpiexec -channel ssm -n 3 -exitcodes -machinefile "c:\Program Files\MPICH2\bin\hosts.txt" -wdir //10.8.102.27/ClusterShared GBMTest

If I also use the main node to run one of the 3 processes, the programme prints
the output I expect, but then fails with the error below.
I would like to know whether this is an issue with my programme (GBMTest) or
whether the main node cannot be used to run a process.
The machinefile lists three machines:
10.8.102.28
10.8.102.30
10.8.102.27 (main node)

Everything works fine if I remove the main node and add another node in its
place.

This is the error:
////////////////////////////////////////////////////////////////////////////////////
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(255)............: MPI_Finalize failed
MPI_Finalize(154)............:
MPID_Finalize(94)............:
MPI_Barrier(406).............: MPI_Barrier(comm=0x44000002) failed
MPIR_Barrier(77).............:
MPIC_Sendrecv(120)...........:
MPID_Isend(103)..............: failure occurred while attempting to send an eager message
MPIDI_CH3_iSend(168).........:
MPIDI_CH3I_Sock_connect(1191): [ch3:sock] rank 1 unable to connect to rank 2 using business card <port=1179 description=cse-365237834578 ifname=10.8.102.27 shm_host=cse-365237834578 shm_queue=376D692D-A683-4917-BF58-13BD35D071E8 shm_pid=2840 >
MPIDU_Sock_post_connect(1228): unable to connect to cse-365237834578 on port 1179, exhausted all endpoints (errno -1)
MPIDU_Sock_post_connect(1244): gethostbyname failed, The requested name is valid and was found in the database, but it does not have the correct associated data being resolved for. (errno 11004)
job aborted:
rank: node: exit code[: error message]
0: 10.8.102.28: 1
1: 10.8.102.30: 1: Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(255)............: MPI_Finalize failed
MPI_Finalize(154)............:
MPID_Finalize(94)............:
MPI_Barrier(406).............: MPI_Barrier(comm=0x44000002) failed
MPIR_Barrier(77).............:
MPIC_Sendrecv(120)...........:
MPID_Isend(103)..............: failure occurred while attempting to send an eager message
MPIDI_CH3_iSend(168).........:
MPIDI_CH3I_Sock_connect(1191): [ch3:sock] rank 1 unable to connect to rank 2 using business card <port=1179 description=cse-365237834578 ifname=10.8.102.27 shm_host=cse-365237834578 shm_queue=376D692D-A683-4917-BF58-13BD35D071E8 shm_pid=2840 >
MPIDU_Sock_post_connect(1228): unable to connect to cse-365237834578 on port 1179, exhausted all endpoints (errno -1)
MPIDU_Sock_post_connect(1244): gethostbyname failed, The requested name is valid and was found in the database, but it does not have the correct associated data being resolved for. (errno 11004)
2: 10.8.102.27: 1
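
The stack above shows rank 1 (on 10.8.102.30) failing in gethostbyname for the
name cse-365237834578, which the business card gives as the host name of the
main node (10.8.102.27). Winsock error 11004 is WSANO_DATA. A quick way to check
whether that name resolves from a given machine is sketched below. This is only
a minimal sketch: it assumes Python is available on the node, and the function
names resolve_ipv4 and check_hosts are hypothetical helpers, not part of MPICH2.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses for hostname, or [] if lookup fails."""
    try:
        # gethostbyname_ex returns (canonical_name, aliases, ip_addresses)
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError
        return []
    return sorted(addresses)


def check_hosts(hostnames):
    """Map each host name to the list of IPv4 addresses it resolves to."""
    return {name: resolve_ipv4(name) for name in hostnames}


if __name__ == "__main__":
    # Host name and IP address taken from the error output above.
    for name, addresses in check_hosts(["cse-365237834578", "10.8.102.27"]).items():
        status = ", ".join(addresses) if addresses else "DOES NOT RESOLVE"
        print(f"{name}: {status}")
```

Running this on 10.8.102.30 would show whether cse-365237834578 resolves there.
If it does not, the failure is presumably a name-resolution problem on the
network rather than a bug in GBMTest, although this check alone does not prove
that.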