[mpich-discuss] Cannot use the main node to run a process of the programme

Jayesh Krishna jayesh at mcs.anl.gov
Fri Jul 25 10:03:28 CDT 2008


Hi,
 You should be able to use all the nodes (with MPICH2 installed) for
running your job (i.e., you should be able to use the main node to run
your MPI processes).
 If you are using a shared drive to run your program, you should map the
drive on all the nodes using the "-map" option of mpiexec (see the Windows
developer's guide, available at
http://www.mcs.anl.gov/research/projects/mpich2/documentation/index.php?s=docs,
for details).
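 For example, something like the following should work (just a sketch based
on the command in your mail below; the Z: drive letter is arbitrary and the
exact "-map" syntax is described in the developer's guide):

   mpiexec -map Z:\\10.8.102.27\ClusterShared -channel ssm -n 3 -exitcodes -machinefile "c:\Program Files\MPICH2\bin\hosts.txt" -wdir Z:\ GBMTest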
 
Regards,
Jayesh

  _____  

From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Waruna Ranasinghe
Sent: Friday, July 25, 2008 3:06 AM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] Cannot use the main node to run a process of the
programme


Hi all,

I'm using MPICH2 in Windows.
I can run my programme without errors if I don't use the machine from
which I execute the command (the main node).

mpiexec -channel ssm -n 3 -exitcodes -machinefile "c:\Program Files\MPICH2\bin\hosts.txt" -wdir //10.8.102.27/ClusterShared GBMTest

If I also use the main node to run one of the 3 processes, it prints the
output I wanted but then gives the error below.
I wanted to know whether this is an issue with my programme (GBMTest) or
whether I simply can't use the main node to run a process.
In the machinefile I have included three machines. 
10.8.102.28
10.8.102.30
10.8.102.27 (main node)

This works fine if I remove the main node and add another node instead.

This is the error:
////////////////////////////////////////////////////////////////////////////////
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(255)............: MPI_Finalize failed
MPI_Finalize(154)............:
MPID_Finalize(94)............:
MPI_Barrier(406).............: MPI_Barrier(comm=0x44000002) failed
MPIR_Barrier(77).............:
MPIC_Sendrecv(120)...........:
MPID_Isend(103)..............: failure occurred while attempting to send an eager message
MPIDI_CH3_iSend(168).........:
MPIDI_CH3I_Sock_connect(1191): [ch3:sock] rank 1 unable to connect to rank 2 using business card <port=1179 description=cse-365237834578 ifname=10.8.102.27 shm_host=cse-365237834578 shm_queue=376D692D-A683-4917-BF58-13BD35D071E8 shm_pid=2840 >
MPIDU_Sock_post_connect(1228): unable to connect to cse-365237834578 on port 1179, exhausted all endpoints (errno -1)
MPIDU_Sock_post_connect(1244): gethostbyname failed, The requested name is valid and was found in the database, but it does not have the correct associated data being resolved for. (errno 11004)
job aborted:
rank: node: exit code[: error message]
0: 10.8.102.28: 1
1: 10.8.102.30: 1: Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(255)............: MPI_Finalize failed
MPI_Finalize(154)............:
MPID_Finalize(94)............:
MPI_Barrier(406).............: MPI_Barrier(comm=0x44000002) failed
MPIR_Barrier(77).............:
MPIC_Sendrecv(120)...........:
MPID_Isend(103)..............: failure occurred while attempting to send an eager message
MPIDI_CH3_iSend(168).........:
MPIDI_CH3I_Sock_connect(1191): [ch3:sock] rank 1 unable to connect to rank 2 using business card <port=1179 description=cse-365237834578 ifname=10.8.102.27 shm_host=cse-365237834578 shm_queue=376D692D-A683-4917-BF58-13BD35D071E8 shm_pid=2840 >
MPIDU_Sock_post_connect(1228): unable to connect to cse-365237834578 on port 1179, exhausted all endpoints (errno -1)
MPIDU_Sock_post_connect(1244): gethostbyname failed, The requested name is valid and was found in the database, but it does not have the correct associated data being resolved for. (errno 11004)
2: 10.8.102.27: 1


