[mpich-discuss] problem with running mpd on different nodes

Rajeev Thakur thakur at mcs.anl.gov
Thu Jul 17 09:44:24 CDT 2008


There is probably something wrong with the networking configuration on the
machines; the "Connection timed out" errors from node-05-02 point that way.
To debug the problem, use the mpdcheck utility as described in the
installation guide, and follow all of the steps.
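
As a rough first check alongside mpdcheck (not a replacement for it), a
short Python sketch like the one below can show whether a node resolves its
own hostname and can open a TCP connection back to itself. This is only an
illustration under the assumption that the mpd_sockpair frames in the
traceback involve a local connect of this kind; the helper names resolve and
can_self_connect are made up for this example and are not part of mpd.

```python
import socket


def resolve(hostname):
    """Return the sorted IPv4 addresses that hostname resolves to ([] on failure)."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def can_self_connect(host, timeout=5.0):
    """Listen on an ephemeral port on this machine, then connect to it via host.

    Hypothetical helper for illustration; it only approximates the kind of
    local connection a process manager might make to itself.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("", 0))          # all interfaces, OS-chosen port
        listener.listen(1)
        port = listener.getsockname()[1]
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:                 # timeouts, refusals, resolution errors
            return False
    finally:
        listener.close()


if __name__ == "__main__":
    name = socket.gethostname()
    print("hostname:", name)
    print("resolves to:", resolve(name))
    print("self-connect via hostname:", can_self_connect(name))
    print("self-connect via loopback:", can_self_connect("127.0.0.1"))
```

If the loopback check succeeds but the hostname check fails or times out on
one node only, that would be consistent with a name-resolution or interface
problem on that node, which is the kind of thing mpdcheck is meant to narrow
down.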

Rajeev 

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Vlad Cojocaru
> Sent: Thursday, July 17, 2008 5:09 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss] problem with running mpd on different nodes
> 
> Dear MPICH2 users,
> 
> Yesterday I compiled mpich2 1.0.7 on a machine called node-06-01
> (64-bit Opteron). My MPI application runs properly on this machine
> and on another one, 06-02. However, when I moved to a different
> node, 05-02, and started mpd &, mpdtrace no longer returned the
> name of the host (as it does on 06-01 and 06-02). Instead I get
> errors like the ones below.
> 
> node-05-02 is a similar machine. My ~/mpd.conf file is
> visible from all nodes.
> Looking at the mpdtrace Python script, I noticed that on this
> machine the msg{} at line 57 is empty, while on both 06-01 and
> 06-02 it is not empty. The problem appears to be located
> somewhere in the recv_dict_msg function. However, I am not
> very good with Python, so I was not able to pin down the problem.
> 
> Does anybody have any idea how to solve this?
> 
> Thanks
> vlad
> 
> 
> ----------------error-----------------------
> Alarm clock
> node-05-02_42768 (mpd_sockpair 226): connect 110 Connection timed out
> node-05-02_42768 (mpd_sockpair 233): connect error with 110 Connection timed out
> node-05-02_42768 (mpd_sockpair 244): connect 22 Invalid argument
> node-05-02_42768: mpd_uncaught_except_tb handling:
>   socket.error: (22, 'Invalid argument')
>     /scratch/node-06-01/cojocavd/Software/mpich2-1.0.7-install/bin/mpdlib.py  245  mpd_sockpair
>         raise socket.error, errinfo
>     /scratch/node-06-01/cojocavd/Software/mpich2-1.0.7-install/bin/mpdlib.py  802  create_single_mem_ring
>         self.lhsSock,self.rhsSock = mpd_sockpair()
>     /scratch/node-06-01/cojocavd/Software/mpich2-1.0.7-install/bin/mpdlib.py  848  enter_ring
>         rhsHandler=rhsHandler)
>     /scratch/node-06-01/cojocavd/Software/mpich2/bin/mpd  250  run
>         rhsHandler=self.handle_rhs_input)
>     /scratch/node-06-01/cojocavd/Software/mpich2/bin/mpd  1492  ?
>         mpd.run()
> node-05-02_49229: mpd_uncaught_except_tb handling:
>   exceptions.OSError: [Errno 2] No such file or directory: '/tmp/mpd2.console_cojocavd'
>     /scratch/node-06-01/cojocavd/Software/mpich2-1.0.7-install/bin/mpdlib.py  1128  __init__
>         os.unlink(self.conFilename)
>     /scratch/node-06-01/cojocavd/Software/mpich2/bin/mpd  237  run
>         self.conListenSock = MPDConListenSock(secretword=self.parmdb['MPD_SECRETWORD'])
>     /scratch/node-06-01/cojocavd/Software/mpich2/bin/mpd  1492  ?
>         mpd.run()
> node-05-02_43805: mpd_uncaught_except_tb handling:
>   exceptions.OSError: [Errno 2] No such file or directory: '/tmp/mpd2.console_cojocavd'
>     /scratch/node-06-01/cojocavd/Software/mpich2-1.0.7-install/bin/mpdlib.py  1128  __init__
>         os.unlink(self.conFilename)
>     /scratch/node-06-01/cojocavd/Software/mpich2/bin/mpd  237  run
>         self.conListenSock = MPDConListenSock(secretword=self.parmdb['MPD_SECRETWORD'])
>     /scratch/node-06-01/cojocavd/Software/mpich2/bin/mpd  1492  ?
>         mpd.run()
> node-05-02_42949: mpd_uncaught_except_tb handling:
>   exceptions.OSError: [Errno 2] No such file or directory: '/tmp/mpd2.console_cojocavd'
>     /scratch/node-06-01/cojocavd/Software/mpich2-1.0.7-install/bin/mpdlib.py  1128  __init__
>         os.unlink(self.conFilename)
>     /scratch/node-06-01/cojocavd/Software/mpich2/bin/mpd  237  run
>         self.conListenSock = MPDConListenSock(secretword=self.parmdb['MPD_SECRETWORD'])
>     /scratch/node-06-01/cojocavd/Software/mpich2/bin/mpd  1492  ?
>         mpd.run()
> 
> 
> --
> --------------------------------------------------------------
> --------------
> Dr. Vlad Cojocaru
> 
> EML Research gGmbH
> Schloss-Wolfsbrunnenweg 33
> 69118 Heidelberg
> 
> Tel: ++49-6221-533266
> Fax: ++49-6221-533298
> 
> e-mail:Vlad.Cojocaru[at]eml-r.villa-bosch.de
> 
> http://projects.villa-bosch.de/mcm/people/cojocaru/
> 
> 
> 
> 



