[mpich-discuss] Cannot use the main node to run a process of the programme

Waruna Ranasinghe warunapww at gmail.com
Thu Aug 7 12:15:54 CDT 2008


Hi Jayesh,
I will try it and let you know.

Regards,
Waruna

2008/8/7 Jayesh Krishna <jayesh at mcs.anl.gov>

>  Hi,
>  Can you run cpi.exe (provided with MPICH2 in the examples directory) on
> all the nodes?
>  Can you run a simple hello world program using all the nodes?
>  Try running the programs both with and without sharing the executable
> across the nodes (i.e., with and without mapping a network drive).
>
> /* ############ MPI Hello world ################ */
> #include <stdio.h>
> #include "mpi.h"
> int main(int argc, char *argv[]){
>     int rank;
>     MPI_Init(&argc, &argv);                /* initialize the MPI environment */
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank of this process in MPI_COMM_WORLD */
>     printf("[%d] Hello world\n", rank);
>     MPI_Finalize();                        /* every rank must reach this before exiting */
>     return 0;
> }
>
> /* ############ MPI Hello world ################ */
>
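> As a rough sketch (the executable name hello.exe and the hosts.txt file
> below are placeholders, not necessarily your setup), launching it on all
> three nodes listed in a machinefile could look like:
>
>   mpiexec -n 3 -machinefile hosts.txt \\10.8.102.27\ClusterShared\hello.exe
>
> For the "without sharing" case, copy hello.exe to the same local path on
> every node and pass that local path to mpiexec instead.
>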
> Regards,
> Jayesh
>
>  ------------------------------
> *From:* warunapww at gmail.com [mailto:warunapww at gmail.com]
> *On Behalf Of* Waruna Ranasinghe
> *Sent:* Thursday, August 07, 2008 11:36 AM
> *To:* Jayesh Krishna
> *Cc:* mpich-discuss at mcs.anl.gov
> *Subject:* Re: [mpich-discuss] Cannot use the main node to run a process
> of the programme
>
>  Hi,
> I even tried mapping the drive as Jayesh mentioned, but the problem is
> still the same.
> If I run the programme only on the master node, it runs fine. However,
> if I use the other nodes together with the master node, the programme
> prints its output but never exits (MPI_Finalize does not return, or is
> never reached).
>
> Please help me to overcome this issue.
>
> Regards,
> Waruna Ranasinghe
>
> 2008/7/25 Jayesh Krishna <jayesh at mcs.anl.gov>
>
>>  Hi,
>>  You should be able to use all the nodes (with MPICH2 installed) for
>> running your job (i.e., you should be able to use the main node to run your
>> MPI processes).
>>  If you are using a shared drive to run your program, you should map the
>> drive on all the nodes using the "-map" option of mpiexec (see the Windows
>> developer's guide, available at
>> http://www.mcs.anl.gov/research/projects/mpich2/documentation/index.php?s=docs,
>> for details).
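>>
>> As an illustrative sketch only (the drive letter Z:, the hosts.txt file,
>> and the executable name are assumptions, not necessarily your
>> configuration), mapping the share on every node and launching from it
>> might look like:
>>
>>   mpiexec -map Z:\\10.8.102.27\ClusterShared -n 3 -machinefile hosts.txt Z:\GBMTest.exe
>>
>> With -map, mpiexec should create the Z: mapping on each node before the
>> processes start and remove it after the job finishes.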
>>
>> Regards,
>> Jayesh
>>
>>  ------------------------------
>> *From:* owner-mpich-discuss at mcs.anl.gov
>> [mailto:owner-mpich-discuss at mcs.anl.gov] *On Behalf Of* Waruna Ranasinghe
>> *Sent:* Friday, July 25, 2008 3:06 AM
>> *To:* mpich-discuss at mcs.anl.gov
>> *Subject:* [mpich-discuss] Cannot use the main node to run a process of
>> the programme
>>
>>   Hi all,
>>
>> I'm using MPICH2 on Windows.
>> I can run my programme without errors as long as I don't use the machine
>> on which I execute the command (the main node).
>>
>> mpiexec -channel ssm -n 3 -exitcodes -machinefile "c:\Program
>> Files\MPICH2\bin\hosts.txt" -wdir //10.8.102.27/ClusterShared GBMTest
>>
>> If I also use the main node to execute one of the 3 processes, the
>> programme still prints the output I expect, but then it gives the error
>> below.
>> I would like to know whether this is an issue with my programme (GBMTest)
>> or whether I simply cannot use the main node to run a process.
>> In the machinefile I have included three machines.
>> 10.8.102.28
>> 10.8.102.30
>> 10.8.102.27 (main node)
>>
>> This works fine if I remove the main node and add another node instead.
>>
>> This is the error:
>>
>> ////////////////////////////////////////////////////////////////////////////////////
>> Fatal error in MPI_Finalize: Other MPI error, error stack:
>> MPI_Finalize(255)............: MPI_Finalize failed
>> MPI_Finalize(154)............:
>> MPID_Finalize(94)............:
>> MPI_Barrier(406).............: MPI_Barrier(comm=0x44000002) failed
>> MPIR_Barrier(77).............:
>> MPIC_Sendrecv(120)...........:
>> MPID_Isend(103)..............: failure occurred while attempting to send
>> an eager message
>> MPIDI_CH3_iSend(168).........:
>> MPIDI_CH3I_Sock_connect(1191): [ch3:sock] rank 1 unable to connect to
>> rank 2 using business card <port=1179 description=cse-365237834578
>> ifname=10.8.102.27 shm_host=cse-365237834578
>> shm_queue=376D692D-A683-4917-BF58-13BD35D071E8 shm_pid=2840 >
>> MPIDU_Sock_post_connect(1228): unable to connect to cse-365237834578 on
>> port 1179, exhausted all endpoints (errno -1)
>> MPIDU_Sock_post_connect(1244): gethostbyname failed, The requested name
>> is valid and was found in the database, but it does not have the correct
>> associated data being resolved for. (errno 11004)
>> job aborted:
>> rank: node: exit code[: error message]
>> 0: 10.8.102.28: 1
>> 1: 10.8.102.30: 1: Fatal error in MPI_Finalize: Other MPI error, error
>> stack:
>> MPI_Finalize(255)............: MPI_Finalize failed
>> MPI_Finalize(154)............:
>> MPID_Finalize(94)............:
>> MPI_Barrier(406).............: MPI_Barrier(comm=0x44000002) failed
>> MPIR_Barrier(77).............:
>> MPIC_Sendrecv(120)...........:
>> MPID_Isend(103)..............: failure occurred while attempting to send
>> an eager message
>> MPIDI_CH3_iSend(168).........:
>> MPIDI_CH3I_Sock_connect(1191): [ch3:sock] rank 1 unable to connect to
>> rank 2 using business card <port=1179 description=cse-365237834578
>> ifname=10.8.102.27 shm_host=cse-365237834578
>> shm_queue=376D692D-A683-4917-BF58-13BD35D071E8 shm_pid=2840 >
>> MPIDU_Sock_post_connect(1228): unable to connect to cse-365237834578 on
>> port 1179, exhausted all endpoints (errno -1)
>> MPIDU_Sock_post_connect(1244): gethostbyname failed, The requested name
>> is valid and was found in the database, but it does not have the correct
>> associated data being resolved for. (errno 11004)
>> 2: 10.8.102.27: 1
>>
>
>