[MPICH] MPICH2 not running as 'normal' users

Troy Telford ttelford.groups at gmail.com
Tue Oct 9 16:05:40 CDT 2007


I'm hoping this is a simple oversight.  I've had no real issues using MPICH2 in 
the past, so this is a bit of a surprise to me.

I've got a new cluster I'm setting up to use MPICH2.

The program I'm running is a simple hello world that reports the rank of the 
process, and the node it's running on.

I can execute it fine when running from the login node as a non-privileged 
user.

It also executes fine when running on the compute nodes as 'root'.

However, when I try to run as an unprivileged user on the compute nodes, the 
job quits with an error.

Here's a rundown of sorts:
(from the login node, running on itself)
$ mpdboot -n 8
$ mpdtrace -l
login.default.domain_48762 (10.254.1.250)
n001_47142 (10.254.1.1)
n002_40636 (10.254.1.2)
n003_40697 (10.254.1.3)
n004_40394 (10.254.1.4)
n005_40151 (10.254.1.5)
n006_39487 (10.254.1.6)
n007_39540 (10.254.1.7)
[mpdringtest works fine]
$ mpiexec -n 1 ./test
login.default.domain : proc (0)

Same thing, but including compute nodes:
$ mpiexec -n 2 ./test
[cli_1]: aborting job:
Fatal error in MPI_Init: Other MPI error, Null value
rank 1 in job 1  ls1host.default.domain_48874   caused collective abort of all ranks
  exit status of rank 1: return code 1

Now, if I log into the compute node and try running it there, the error is similar:
$ mpiexec -n 1 ./test
[cli_0]: aborting job:
Fatal error in MPI_Init: Other MPI error, Null value
rank 0 in job 1  n001_47208   caused collective abort of all ranks
  exit status of rank 0: killed by signal 9

If I use just one process and specify the login node as the host, it does work 
(which is what I'd expect):
$ mpiexec -n 1 -host login ./test
login.default.domain : proc (0)
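
The same single-process run can be repeated against every host in the ring to 
see at a glance which nodes fail.  The following is only a minimal sketch: 
probe_hosts is a hypothetical helper name, and it assumes nothing beyond the 
`mpiexec -n 1 -host <name>` form already used above.

```shell
#!/bin/sh
# probe_hosts PROGRAM HOST...
# Run PROGRAM as a single-process job pinned to each HOST in turn and
# print one pass/fail line per host. (Hypothetical helper, for illustration.)
probe_hosts() {
    prog=$1
    shift
    for host in "$@"; do
        # Job output is discarded; only the exit status is of interest here.
        if mpiexec -n 1 -host "$host" "$prog" >/dev/null 2>&1; then
            echo "$host ok"
        else
            # In the else branch, $? still holds mpiexec's exit status.
            echo "$host FAILED (exit $?)"
        fi
    done
}
```

It would be called as, for example, `probe_hosts ./test login n001 n002`, once 
as root and once as the unprivileged user.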


None of this happens when the user is 'root'.  There aren't any login issues 
(ssh keys are fine, rsh is fine, etc.).  I noticed an mpd logfile 
at /tmp/mpd2.logfile_<user>, but its only content is:
  logfile for mpd with pid 25772
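
Since the logfile says nothing useful, one way to narrow down a root-versus-user 
difference is to capture the same snapshot of the environment as both users on 
a compute node and diff the two.  The sketch below is an assumption about what 
might be worth comparing (identity, resource limits, and whether /tmp and 
/dev/shm are writable); env_snapshot is a hypothetical helper name, and none of 
the checks are specific to MPICH2 or known to be the cause here.

```shell
#!/bin/sh
# env_snapshot
# Print a few per-user facts that commonly differ between root and an
# unprivileged account. (Hypothetical helper; the choice of checks is a guess.)
env_snapshot() {
    echo "user=$(id -un) uid=$(id -u)"
    echo "host=$(uname -n)"
    # All resource limits in effect for this shell, one per line.
    ulimit -a | sed 's/^/limit: /'
    # Can this user create files in the usual scratch directories?
    for dir in /tmp /dev/shm; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "writable=$dir"
        else
            echo "not_writable=$dir"
        fi
    done
}
```

Running `env_snapshot > snap.$(id -un)` as each user on the same node and then 
diffing the two files shows where the accounts differ.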


Could anybody please give me a clue about what may be happening such that I'm 
able to run as root, but not as a 'normal' user?
-- 
Troy Telford
