[MPICH] Running on root's MPD as either root or another user
Ralph Butler
rbutler at mtsu.edu
Mon Oct 1 17:26:22 CDT 2007
You probably want to run just a single mpd on a single host until you
get it all sorted out. Running on multiple hosts before then will
merely cause more confusion.

As stated in the manual, when mpd is run as root, it can service
multiple users. You need just one per host, and it must be run as
root. The users (as described below) must run an mpiexec that is
linked to an mpdroot that has been marked setuid-root. This causes
their execution of mpiexec to contact root's mpd; it will not start
a separate daemon for the user.

I have tested this on an Ubuntu box and it works fine. I had to make
sure that MPD_USE_ROOT_MPD was set as described in the manual.
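
If it helps with debugging, here is a rough Python sketch of the two
client-side checks above: that mpdroot is owned by root with the setuid
bit on the owner, and that MPD_USE_ROOT_MPD is set. The function names
diagnose and check are made up for this example and are not part of
MPICH. The sketch also assumes the variable is set in the environment;
if you set it some other way described in the manual, that check will
report a false alarm.

```python
import os
import stat


def diagnose(st_mode, st_uid, env):
    """Return a list of problems found (empty if none).

    st_mode and st_uid are the mode and owner uid of the mpdroot file;
    env is a mapping of environment variables.
    """
    problems = []
    # setuid-root means: owned by uid 0 AND the setuid bit is on.
    if st_uid != 0:
        problems.append("mpdroot is not owned by root")
    if not st_mode & stat.S_ISUID:
        problems.append("mpdroot lacks the setuid bit on the owner")
    # Assumption: only checks that the variable is set and non-empty.
    if not env.get("MPD_USE_ROOT_MPD"):
        problems.append("MPD_USE_ROOT_MPD is not set")
    return problems


def check(mpdroot_path, env=None):
    """Run diagnose() against a real file and the current environment."""
    st = os.stat(mpdroot_path)
    return diagnose(st.st_mode, st.st_uid, os.environ if env is None else env)
```

A user would call check() with the path of the mpdroot that sits next
to their mpiexec; an empty list means none of these three problems was
found, not that the whole setup is correct.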
On Oct 1, 2007, at 5:13 PM, Matthew Chambers wrote:
> Hmm, I tried that (killed the running mpd and restarted it on all
> machines) but no change. The MPD is running as root:
> root 9629 1 0 17:06 ? 00:00:00 python2.4 /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --daemon
>
> Is the user's mpiexec call supposed to start its own mpd even when
> it's supposed to be using the root's existing mpd?
>
> Also, is this user supposed to exist on the other machines or is
> the MPI program run as whichever user started the mpd?
>
> -Matt
>
>
> Ralph Butler wrote:
>> I would suggest killing all mpds and related processes and
>> starting them from scratch.
>> Make sure that the mpd is running as root if users plan to use its
>> services. Then, the users
>> need to make sure they use the mpiexec that is linked to the
>> mpdroot which is marked as +s.
>>
>> On Oct 1, 2007, at 5:01 PM, Matthew Chambers wrote:
>>
>>> Ah, the s bit has to be on the owner's set, I hadn't tried that
>>> (and don't really understand why). But now I'm back to mpiexec
>>> locking up when I try to run a job from the user's account, and
>>> when I break the process, I get:
>>> (mpiexec 413): mpiexec: failed to obtain sock from manager
>>> And also there's a stranded mpd process in the process list each
>>> time I try:
>>> rslebos 27003 10967 0 16:58 ? 00:00:00 python2.4 /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --dae
>>> rslebos 27008 10967 0 16:58 ? 00:00:00 python2.4 /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --dae
>>> rslebos 27028 10967 0 17:00 ? 00:00:00 python2.4 /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --dae
>>>
>>> Confused...
>>> -Matt
>>>
>>
>