[mpich-discuss] Cannot use the main node to run a process of the programme
Jayesh Krishna
jayesh at mcs.anl.gov
Thu Aug 7 11:59:45 CDT 2008
Hi,
Can you run cpi.exe (provided with MPICH2 in the examples directory) on
all the nodes?
Can you run a simple hello world program using all the nodes?
Try running the programs both with and without sharing the executable
across nodes (i.e., mapping a network drive).
/* ############ MPI Hello world ################ */
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank;
    MPI_Init(&argc, &argv);                /* initialize the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* get this process's rank */
    printf("[%d] Hello world\n", rank);
    MPI_Finalize();                        /* shut down MPI */
    return 0;
}
/* ############ MPI Hello world ################ */
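To run it across all the nodes, an invocation would look something like the
following (hello.exe is just a placeholder for whatever you name the compiled
executable; adjust the machinefile path and share to match your setup):

mpiexec -n 3 -machinefile "c:\Program Files\MPICH2\bin\hosts.txt" \\10.8.102.27\ClusterShared\hello.exe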
Regards,
Jayesh
_____
From: warunapww at gmail.com [mailto:warunapww at gmail.com] On Behalf Of Waruna
Ranasinghe
Sent: Thursday, August 07, 2008 11:36 AM
To: Jayesh Krishna
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] Cannot use the main node to run a process of
the programme
Hi,
I tried mapping the drive as Jayesh mentioned, but the problem is still
the same.
If I run the programme only on the master node, it runs fine. But if I use
other nodes in addition to the master node, the programme gives the output
but won't exit (MPI_Finalize does not complete or is never called).
Please help me to overcome this issue.
Regards,
Waruna Ranasinghe
2008/7/25 Jayesh Krishna <jayesh at mcs.anl.gov>
Hi,
You should be able to use all the nodes (with MPICH2 installed) for
running your job (i.e., you should be able to use the main node to run
your MPI processes).
If you are using a shared drive to run your program, you should map the
drive on all the nodes using the "-map" option of mpiexec (see the Windows
developer's guide, available at
http://www.mcs.anl.gov/research/projects/mpich2/documentation/index.php?s=docs,
for details).
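For example, the syntax looks something like this (the drive letter z: is a
placeholder, and the share is the one from your command; check the guide for
the exact details):

mpiexec -map z:\\10.8.102.27\ClusterShared -n 3 -machinefile hosts.txt z:\GBMTest.exe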
Regards,
Jayesh
_____
From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Waruna Ranasinghe
Sent: Friday, July 25, 2008 3:06 AM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] Cannot use the main node to run a process of the
programme
Hi all,
I'm using MPICH2 on Windows.
I can run my programme without errors as long as I don't include the
machine from which I execute the command (the main node):
mpiexec -channel ssm -n 3 -exitcodes -machinefile "c:\Program Files\MPICH2\bin\hosts.txt" -wdir //10.8.102.27/ClusterShared GBMTest
If I also use the main node to run one of the 3 processes, it prints the
output I wanted but then gives the error below.
I want to know whether this is an issue with my programme (GBMTest) or
whether I cannot use the main node to run a process.
In the machinefile I have included three machines:
10.8.102.28
10.8.102.30
10.8.102.27 (main node)
This works fine if I remove the main node and add another node instead.
This is the error:
////////////////////////////////////////////////////////////////////////////////////
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(255)............: MPI_Finalize failed
MPI_Finalize(154)............:
MPID_Finalize(94)............:
MPI_Barrier(406).............: MPI_Barrier(comm=0x44000002) failed
MPIR_Barrier(77).............:
MPIC_Sendrecv(120)...........:
MPID_Isend(103)..............: failure occurred while attempting to send an eager message
MPIDI_CH3_iSend(168).........:
MPIDI_CH3I_Sock_connect(1191): [ch3:sock] rank 1 unable to connect to rank 2 using business card <port=1179 description=cse-365237834578 ifname=10.8.102.27 shm_host=cse-365237834578 shm_queue=376D692D-A683-4917-BF58-13BD35D071E8 shm_pid=2840 >
MPIDU_Sock_post_connect(1228): unable to connect to cse-365237834578 on port 1179, exhausted all endpoints (errno -1)
MPIDU_Sock_post_connect(1244): gethostbyname failed, The requested name is valid and was found in the database, but it does not have the correct associated data being resolved for. (errno 11004)
job aborted:
rank: node: exit code[: error message]
0: 10.8.102.28: 1
1: 10.8.102.30: 1: Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(255)............: MPI_Finalize failed
MPI_Finalize(154)............:
MPID_Finalize(94)............:
MPI_Barrier(406).............: MPI_Barrier(comm=0x44000002) failed
MPIR_Barrier(77).............:
MPIC_Sendrecv(120)...........:
MPID_Isend(103)..............: failure occurred while attempting to send an eager message
MPIDI_CH3_iSend(168).........:
MPIDI_CH3I_Sock_connect(1191): [ch3:sock] rank 1 unable to connect to rank 2 using business card <port=1179 description=cse-365237834578 ifname=10.8.102.27 shm_host=cse-365237834578 shm_queue=376D692D-A683-4917-BF58-13BD35D071E8 shm_pid=2840 >
MPIDU_Sock_post_connect(1228): unable to connect to cse-365237834578 on port 1179, exhausted all endpoints (errno -1)
MPIDU_Sock_post_connect(1244): gethostbyname failed, The requested name is valid and was found in the database, but it does not have the correct associated data being resolved for. (errno 11004)
2: 10.8.102.27: 1