[mpich-discuss] mpdtrace cannot connect (cygwin & mpd), previous smpd remnants?

Jayesh Krishna jayesh at mcs.anl.gov
Fri Mar 28 14:03:22 CDT 2008


 Hi,
  Currently, the only process manager available with MPICH2 on Windows is
SMPD (you can, however, install MPD by compiling and installing MPICH2 under
Cygwin). A native Windows installation of MPICH2 should not interfere with
an MPICH2+MPD installation built under Cygwin.
  When you recompile MPICH2 to use MPD, make sure that you run "make
distclean" first and then reconfigure, compile, and install MPICH2.
  Do you get any messages from mpd when you run it as a foreground process
(just "mpd" instead of "mpd &")?
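
  To capture whatever a foreground mpd prints, a small Python sketch along
these lines may help. It is only an illustration: run_foreground and the
5-second limit are names and values made up for this example, and it assumes
a Python 3 interpreter rather than the Python 2.5 that mpd itself runs under
in this thread.

```python
import subprocess


def run_foreground(cmd, seconds=5.0):
    """Run cmd in the foreground for at most `seconds`.

    Returns (exit_code, output). exit_code is None if the command was still
    running when the time limit expired, or if it could not be started.
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            timeout=seconds,
        )
        return done.returncode, done.stdout.decode(errors="replace")
    except subprocess.TimeoutExpired as exc:
        # A healthy daemon keeps running, so a timeout is the normal case.
        partial = exc.stdout or b""
        return None, partial.decode(errors="replace")
    except OSError as exc:
        # For example, the command is not on the PATH.
        return None, "could not start %r: %s" % (cmd, exc)


if __name__ == "__main__":
    # Assumption: "mpd" is on the PATH, as it is on the poster's machine.
    code, output = run_foreground(["mpd"])
    print("exit code:", code)
    print(output)
```

An exit code of None with empty output would suggest that mpd started and
stayed up without complaining; any text it does print is what is being asked
for above.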

Regards,
Jayesh
-----Original Message-----
From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Ingo Bojak
Sent: Friday, March 28, 2008 1:30 PM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] mpdtrace cannot connect (cygwin & mpd), previous
smpd remnants?

Hi,

I've been installing mpich2 on my local Windows machines, simply for testing
out my mpi programs locally (i.e., these machines are not part of the Linux
cluster my programs will eventually run on).

At first I installed the smpd version on my office computer, but I didn't
get around to doing anything with it. Next I compiled the mpd version under
cygwin on my laptop, and I've been using that for a few days now for coding
- works nicely. So when I got back to the office I decided to wipe the smpd
version and compile mpd instead there too, to have a consistent environment
(also consistent with the Linux cluster I will use this on). Thus I
uninstalled smpd with "Control Panel - Add or Remove Programs". Later on
(since it didn't work, as explained below) I also scanned with regedit for
"mpich" and "smpd" and removed what I could find. I also checked that no
smpd service is started anymore after reboot.

Now, on the office machine the mpd compilation worked fine under cygwin,
path is set and binaries are found ("which"), I did create an ".mpd.conf" -
all was going well as on the laptop. But then I tried to start mpd:

---
mpd &
[1] 2668

ps
      PID    PPID    PGID     WINPID  TTY  UID    STIME COMMAND
     2896       1    2896       2896  con 1008 18:03:07 /usr/bin/bash
     2668    2896    2668       3772  con 1008 18:04:28 /usr/bin/python2.5
     3256    2896    3256       3276  con 1008 19:03:38 /usr/bin/sh

ls -l /tmp
total 2048
-rw-r--r-- 1 username None 2111 Mar 28 19:01 XWin.log
-rw-r--r-- 1 username None   53 Mar 28 18:02 mpd2.console_username

mpdtrace
mpdtrace: cannot connect to local mpd (/tmp/mpd2.console_username); possible
causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
In case 1, you can start an mpd on this host with:
    mpd &
and you will be able to run jobs just on this host.
For more details on starting mpds on a set of hosts, see the MPICH2
Installation Guide.
---

No other mpds are running (no pythons with ps). I should mention that the
first time I tried this I had to - and did - unblock python for the Win
firewall. I've checked that, too.
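
The connection that mpdtrace fails to make can be approximated by hand. The
sketch below tries to connect to the console file named in the error message.
It assumes the console is a Unix-domain socket at
/tmp/mpd2.console_<username>, which nothing in this thread confirms;
probe_console is a hypothetical helper, and the code targets Python 3, not the
Python 2.5 shown in the ps listing.

```python
import getpass
import os
import socket


def probe_console(path, timeout=2.0):
    """Try to connect to a Unix-domain socket at `path`.

    Returns (ok, detail), where detail is a short human-readable reason.
    """
    if not os.path.exists(path):
        return False, "no such file: %s" % path
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True, "connected"
    except OSError as exc:
        # Typical cases: nothing is listening, or the file is not a socket.
        return False, "connect failed: %s" % exc
    finally:
        sock.close()


if __name__ == "__main__":
    # Assumption: the console file follows the pattern from the error message.
    console = "/tmp/mpd2.console_" + getpass.getuser()
    print(console, "->", probe_console(console))
```

If the file exists but the connect is refused, that would point at a stale or
unusable console file rather than at a missing mpd.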

Now, I guess a possible solution is to re-install smpd again and see if that
works any better. But I don't really see why mpd would not work on my office
machine, and I would prefer to use it. Is there perhaps some remnant of the
earlier smpd install which is interfering with mpd now, or any other clue
what is going on?

I had some /etc/hosts.allow and /etc/hosts.deny files on the cygwin in the
office. I checked on the laptop, and there were simply no such files in the
/etc there. So I renamed those files on the office computer. That didn't
help though after restart. Here's a "mpd --trace" output:

---
computername_1359: ENTER mpd_check_python_version in
/usr/local/bin/mpich2/bin/mpdlib.py at line 275; ARGS=()
computername_1359: EXIT mpd_check_python_version at line 280
computername_1359: ENTER mpd_get_my_username in
/usr/local/bin/mpich2/bin/mpdlib.py at line 285; ARGS=()
computername_1359: EXIT mpd_get_my_username at line 294
computername_1359: ENTER mpd_get_my_username in
/usr/local/bin/mpich2/bin/mpdlib.py at line 285; ARGS=()
computername_1359: EXIT mpd_get_my_username at line 294
computername_1359: ENTER mpd_get_my_username in
/usr/local/bin/mpich2/bin/mpdlib.py at line 285; ARGS=()
computername_1359: EXIT mpd_get_my_username at line 294
computername_1359: ENTER set_handler in /usr/local/bin/mpich2/bin/mpdlib.py
at line 719; ARGS=(self=<mpdlib.MPDStreamHandler object at 0x7fed6b0c>,
stream=<mpdlib.MPDConListenSock object at 0x7feec02c>, handler=<bound method
MPD.handle_console_connection of <__main__.MPD object at 
0x7fed680c>>, args=())
computername_1359: EXIT set_handler at line 720
computername_1359: ENTER register in
/tmp/python.6884/usr/lib/python2.5/atexit.py at line 37; ARGS=(func=<bound
method MPD.cleanup of <__main__.MPD object at 
0x7fed680c>>, *targs=(), **kargs={})
computername_1359: EXIT register at line 44
computername_1359: ENTER seed in
/tmp/python.6884/usr/lib/python2.5/random.py at line 97;
ARGS=(self=<random.Random object at 0x74d44c>, a=None)
computername_1359: EXIT seed at line 114
computername_1359: ENTER enter_ring in
/usr/local/bin/mpich2/bin/mpdlib.py at line 827; ARGS=(self=<mpdlib.MPDRing
object at 0x7fed6b4c>, entryIfhn='', entryPort=0, lhsHandler=<bound method
MPD.handle_lhs_input of <__main__.MPD object at 0x7fed680c>>,
rhsHandler=<bound method MPD.handle_rhs_input of <__main__.MPD object at
0x7fed680c>>, ntries=1)
computername_1359: ENTER create_single_mem_ring in
/usr/local/bin/mpich2/bin/mpdlib.py at line 801; ARGS=(self=<mpdlib.MPDRing
object at 0x7fed6b4c>, ifhn='1xx.1xx.2xx.8', port=1359, lhsHandler=<bound
method MPD.handle_lhs_input of <__main__.MPD object at 0x7fed680c>>,
rhsHandler=<bound method MPD.handle_rhs_input of <__main__.MPD object at
0x7fed680c>>)
computername_1359: ENTER mpd_sockpair in /usr/local/bin/mpich2/bin/mpdlib.py
at line 199; ARGS=()
computername_1359: EXIT mpd_sockpair at line 259
computername_1359: ENTER set_handler in /usr/local/bin/mpich2/bin/mpdlib.py
at line 719; ARGS=(self=<mpdlib.MPDStreamHandler object at 0x7fed6b0c>,
stream=<mpdlib.MPDSock object at 0x7feec32c>, handler=<bound method
MPD.handle_lhs_input of <__main__.MPD object at 0x7fed680c>>, args=())
computername_1359: EXIT set_handler at line 720
computername_1359: ENTER set_handler in /usr/local/bin/mpich2/bin/mpdlib.py
at line 719; ARGS=(self=<mpdlib.MPDStreamHandler object at 0x7fed6b0c>,
stream=<mpdlib.MPDSock object at 0x7feec40c>, handler=<bound method
MPD.handle_rhs_input of <__main__.MPD object at 0x7fed680c>>, args=())
computername_1359: EXIT set_handler at line 720
computername_1359: EXIT create_single_mem_ring at line 810
computername_1359: EXIT enter_ring at line 872
computername_1359: ENTER runmainloop in /usr/local/bin/mpich2/bin/mpd at
line 281; ARGS=(self=<__main__.MPD object at 0x7fed680c>)
computername_1359: ENTER handle_active_streams in
/usr/local/bin/mpich2/bin/mpdlib.py at line 728;
ARGS=(self=<mpdlib.MPDStreamHandler object at 0x7fed6b0c>, streams=None,
timeout=8.0)
---

and this is the output of "mpdcheck -v"

---
obtaining hostname via gethostname and getfqdn
gethostname gives  computername
getfqdn gives  computername.somwhere.xx.
checking out unqualified hostname; make sure is not "localhost", etc.
checking out qualified hostname; make sure is not "localhost", etc.
obtain IP addrs via qualified and unqualified hostnames;  make sure other
than 127.0.0.1
gethostbyname_ex:  ('computername', [], ['1xx.1xx.2xx.8'])
gethostbyname_ex:  ('computername.somwhere.xx', [], ['1xx.1xx.2xx.8'])
checking that IP addrs resolve to same host
now do some gethostbyaddr and gethostbyname_ex for machines in hosts file
---
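
The lookups in that output can be repeated with Python's standard socket
module. This is a rough sketch of the same kind of check, not mpdcheck's own
code: check_name and its resolver parameter are made up for illustration, and
it assumes Python 3.

```python
import socket


def check_name(name, resolver=socket.gethostbyname_ex):
    """Resolve `name` and list the problems the output above warns about.

    Returns (addresses, problems). `resolver` can be swapped out for testing.
    """
    problems = []
    if name.split(".")[0].lower() == "localhost":
        problems.append("hostname is localhost")
    try:
        _canonical, _aliases, addrs = resolver(name)
    except OSError as exc:  # socket.gaierror and socket.herror are OSErrors
        return [], problems + ["cannot resolve: %s" % exc]
    if any(addr.startswith("127.") for addr in addrs):
        problems.append("resolves to loopback")
    return addrs, problems


if __name__ == "__main__":
    # Check both the unqualified and the fully qualified name of this host.
    for host in (socket.gethostname(), socket.getfqdn()):
        print(host, "->", check_name(host))
```

An empty problem list for both names, with the same non-loopback address for
each, would match what the mpdcheck output above reports.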

Thanks in advance for any help,
Ingo






