FW: [MPICH] Problems installing MPICH2 on single machine

Rajeev Thakur thakur at mcs.anl.gov
Wed Jul 13 10:51:29 CDT 2005


Ralph,
      Any idea what the problem is here? You can reply directly to the list.

Rajeev
 

-----Original Message-----
From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of James Frye
Sent: Monday, July 11, 2005 8:25 AM
To: mpich-discuss at mcs.anl.gov
Subject: [MPICH] Problems installing MPICH2 on single machine

Hi,

I'm having problems setting up MPICH2 (and MPICH) to run on a single
machine, and would be grateful for any help.

The machine is a Dell laptop with P3 processor, running Linux.  Output of
uname -a is:

Linux a.me.org 2.6.5-1.358 #1 Sat May 8 09:04:50 EDT 2004 i686 i686 i386
GNU/Linux

The machine is used for development.  It needs to run MPI only on itself, 
but sometimes with multiple "processors" (e.g.  "mpirun -np 4 program") to 
test code that will eventually run on a parallel machine.  I'm often not 
connected to any network, so I've arbitrarily set HOSTNAME to "a.me.org". 
When I am connected, it's often via a DHCP connection, so AFAIK (I'm not a 
network guru) there's no way to give it a real, fixed name.

I've configured & installed per instructions, in directory
/opt/mpich2/gcc.  Paths & environment are set by sourcing the script

   #! /bin/csh

   echo 'Setting MPICH2/gcc paths'
   setenv MPICH2 /opt/mpich2/gcc
   setenv PATH $PATH\:${MPICH2}/bin
   setenv MANPATH $MANPATH\:${MPICH2}/man

"mpd &" starts, and "mpdtrace -l" responds with "a.me.org_32811".

"mpiexec -n 1 /bin/hostname" will eventually time out with messages like

   a.me.org_mpdman_1: conn error in connect_lhs: Connection timed out
   a.me.org_mpdman_0: mpd_uncaught_except_tb handling:
     socket_error: (110, 'Connection timed out')
   ...

"mpdcheck" and "mpdcheck -f mpd.hosts" give no output.
"mpdcheck -f mpd.hosts -ssh" gives

   ** ssh timed out to a.me.org
   ** ssh failed to a.me.org
   ** here is the output:

(but there is no output.)

If instead I use "localhost" in the mpd.hosts file, I get

   ** Timed out waiting for client on localhost to produce output.
   client on localhost failed to access the server.
   here is the output:

and again, no output.
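
One thing that might help narrow this down is checking, outside of MPICH2,
whether the names in question resolve and whether a process on this machine
can connect back to itself through the resolved address.  The following is a
minimal Python sketch of that idea, not the actual mpdcheck implementation.
The function names resolve and can_connect_to_self are made up for
illustration, and the 2-second timeout is an arbitrary assumption.

```python
import socket


def resolve(name):
    """Return the IPv4 address that `name` resolves to, or None on failure."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def can_connect_to_self(addr, timeout=2.0):
    """Listen on an ephemeral port on all interfaces, then try to reach
    that port through `addr`.  Returns True if the connection succeeds."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind(("", 0))          # port 0: let the kernel pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        client.settimeout(timeout)    # hypothetical timeout, chosen arbitrarily
        client.connect((addr, port))
        return True
    except OSError:                   # covers refused, unreachable, timed out
        return False
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    # Check the configured hostname and "localhost", as in the mpd.hosts tests.
    for name in (socket.gethostname(), "localhost"):
        addr = resolve(name)
        ok = addr is not None and can_connect_to_self(addr)
        print("%s -> %s, connect to self: %s" % (name, addr, ok))
```

If a name resolves to an address that is not currently assigned to any
interface (for instance one left over from an earlier DHCP lease), the connect
step would be expected to fail or time out, which would be consistent with the
errors above, though that is only a guess.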

---

I get similar problems with MPICH (the most recent version, downloaded &
installed yesterday).  It tries to use rsh instead of ssh, and can only
run with "-np 1".

The problem would seem to be related to the DHCP connection.  I've run
MPICH for a couple of years on my network at home without problems, but
the machines there are all connected to each other and not to the outside
world.  Unfortunately they're about 6000 miles away at the moment, and I
need to get some work done...

Thanks,
James






