[mpich-discuss] Cannot use the main node to run a process of the programme

Jayesh Krishna jayesh at mcs.anl.gov
Thu Aug 7 11:59:45 CDT 2008


Hi,
 Can you run cpi.exe (provided with MPICH2 in the examples directory) on
all the nodes?
 Can you run a simple hello world program (such as the one below) using
all the nodes?
 Try running the programs both with and without sharing the executable
across the nodes (i.e., with and without mapping a network drive).
 
/* ############ MPI Hello world ################ */
#include <stdio.h>
#include "mpi.h"
int main(int argc, char *argv[]){
    int rank;

    /* Initialize the MPI environment */
    MPI_Init(&argc, &argv);
    /* Get the rank of this process in MPI_COMM_WORLD */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("[%d] Hello world\n", rank);
    /* Shut down the MPI environment before exiting */
    MPI_Finalize();
    return 0;
}
 
/* ############ MPI Hello world ################ */
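 
As a rough sketch, a launch of this test on all three nodes might look
like the following (the executable name hello.exe and the share path
are placeholders; substitute your own paths and machine file):
 
mpiexec -n 3 -machinefile "c:\Program Files\MPICH2\bin\hosts.txt" \\10.8.102.27\ClusterShared\hello.exe
 
If this simple program also hangs only when the main node is included,
the problem is likely in the node/network configuration rather than in
your application code.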
 
Regards,
Jayesh

  _____  

From: warunapww at gmail.com [mailto:warunapww at gmail.com] On Behalf Of Waruna
Ranasinghe
Sent: Thursday, August 07, 2008 11:36 AM
To: Jayesh Krishna
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] Cannot use the main node to run a process of
the programme


Hi,
I even tried mapping the drive as Jayesh suggested, but the problem
remains the same.
If I run the programme only on the master node, it runs fine. However,
if I use other nodes together with the master node, the programme
prints its output but does not exit (MPI_Finalize does not complete,
or is never reached).

Please help me to overcome this issue.

Regards,
Waruna Ranasinghe


2008/7/25 Jayesh Krishna <jayesh at mcs.anl.gov>


Hi,
 You should be able to use all the nodes (with MPICH2 installed) for
running your job (i.e., you should be able to use the main node to run
your MPI processes).
 If you are using a shared drive to run your program, you should map the
drive on all the nodes using the "-map" option of mpiexec (see the
Windows developer's guide, available at
http://www.mcs.anl.gov/research/projects/mpich2/documentation/index.php?s=docs,
for details).
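 
As a minimal sketch (assuming the shared directory from your earlier
command and an arbitrary free drive letter z:; adjust both to your
setup), the invocation might look like:
 
mpiexec -map z:\\10.8.102.27\ClusterShared -channel ssm -n 3 -machinefile "c:\Program Files\MPICH2\bin\hosts.txt" -wdir z:\ z:\GBMTest.exe
 
With -map, mpiexec should map the drive on each node for the duration
of the job and release it when the job finishes, as described in the
developer's guide.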
 
Regards,
Jayesh

  _____  

From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Waruna Ranasinghe
Sent: Friday, July 25, 2008 3:06 AM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] Cannot use the main node to run a process of the
programme


Hi all,

I'm using MPICH2 on Windows.
I can run my programme without errors as long as I don't include the
machine on which I execute the command (the main node).

mpiexec -channel ssm -n 3 -exitcodes -machinefile "c:\Program Files\MPICH2\bin\hosts.txt" -wdir //10.8.102.27/ClusterShared GBMTest

If I also use the main node to run one of the 3 processes, it prints
the output I expect but then gives the error below.
I would like to know whether this is an issue with my programme
(GBMTest) or whether I simply cannot use the main node to run a
process.
In the machinefile I have included three machines. 
10.8.102.28
10.8.102.30
10.8.102.27 (main node)

This works fine if I remove the main node and add another node instead.

This is the error:
////////////////////////////////////////////////////////////////////////////////////
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(255)............: MPI_Finalize failed
MPI_Finalize(154)............:
MPID_Finalize(94)............:
MPI_Barrier(406).............: MPI_Barrier(comm=0x44000002) failed
MPIR_Barrier(77).............:
MPIC_Sendrecv(120)...........:
MPID_Isend(103)..............: failure occurred while attempting to send an eager message
MPIDI_CH3_iSend(168).........:
MPIDI_CH3I_Sock_connect(1191): [ch3:sock] rank 1 unable to connect to rank 2 using business card <port=1179 description=cse-365237834578 ifname=10.8.102.27 shm_host=cse-365237834578 shm_queue=376D692D-A683-4917-BF58-13BD35D071E8 shm_pid=2840 >
MPIDU_Sock_post_connect(1228): unable to connect to cse-365237834578 on port 1179, exhausted all endpoints (errno -1)
MPIDU_Sock_post_connect(1244): gethostbyname failed, The requested name is valid and was found in the database, but it does not have the correct associated data being resolved for. (errno 11004)
job aborted:
rank: node: exit code[: error message]
0: 10.8.102.28: 1
1: 10.8.102.30: 1: Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(255)............: MPI_Finalize failed
MPI_Finalize(154)............:
MPID_Finalize(94)............:
MPI_Barrier(406).............: MPI_Barrier(comm=0x44000002) failed
MPIR_Barrier(77).............:
MPIC_Sendrecv(120)...........:
MPID_Isend(103)..............: failure occurred while attempting to send an eager message
MPIDI_CH3_iSend(168).........:
MPIDI_CH3I_Sock_connect(1191): [ch3:sock] rank 1 unable to connect to rank 2 using business card <port=1179 description=cse-365237834578 ifname=10.8.102.27 shm_host=cse-365237834578 shm_queue=376D692D-A683-4917-BF58-13BD35D071E8 shm_pid=2840 >
MPIDU_Sock_post_connect(1228): unable to connect to cse-365237834578 on port 1179, exhausted all endpoints (errno -1)
MPIDU_Sock_post_connect(1244): gethostbyname failed, The requested name is valid and was found in the database, but it does not have the correct associated data being resolved for. (errno 11004)
2: 10.8.102.27: 1


