[MPICH] MPICH2 not running as 'normal' users
Troy Telford
ttelford.groups at gmail.com
Tue Oct 9 16:05:40 CDT 2007
I'm hoping this is a simple oversight. I've had no real issues using MPICH2 in
the past, so this is a bit of a surprise to me.
I've got a new cluster I'm setting up to use MPICH2.
The program I'm running is a simple hello world that reports the rank of the
process, and the node it's running on.
I can execute it fine when running from the login node as a non-privileged
user.
It also executes fine when running on the compute nodes as 'root'.
However, when I try to run as an unprivileged user on the compute nodes, the
job quits with an error.
Here's a rundown of sorts:
(from the login node, running on itself)
$ mpdboot -n 8
$ mpdtrace -l
login.default.domain_48762 (10.254.1.250)
n001_47142 (10.254.1.1)
n002_40636 (10.254.1.2)
n003_40697 (10.254.1.3)
n004_40394 (10.254.1.4)
n005_40151 (10.254.1.5)
n006_39487 (10.254.1.6)
n007_39540 (10.254.1.7)
[mpdringtest works fine]
$ mpiexec -n 1 ./test
login.default.domain : proc (0)
Same thing, but including compute nodes:
$ mpiexec -n 2 ./test
[cli_1]: aborting job:
Fatal error in MPI_Init: Other MPI error, Null value
rank 1 in job 1 ls1host.default.domain_48874 caused collective abort of all
ranks
exit status of rank 1: return code 1
Now, if I log into the compute node and try running it, the error is similar:
$ mpiexec -n 1 ./test
[cli_0]: aborting job:
Fatal error in MPI_Init: Other MPI error, Null value
rank 0 in job 1 n001_47208 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
If I use just one process and specify the login node with '-host', it does work
(which I'd expect to see):
$ mpiexec -n 1 -host login ./test
login.default.domain : proc (0)
None of this happens when the user is 'root'. There aren't any login issues
(ssh keys are fine, rsh is fine, etc.). I noticed an mpd logfile
in /tmp/mpd2.logfile_<user>, but its contents are just:
logfile for mpd with pid 25772
Could anybody please give me a clue about what may be happening such that I'm
able to run as root, but not as a 'normal' user?
--
Troy Telford