[MPICH] Running on root's MPD as either root or another user
Matt Chambers
matthew.chambers at vanderbilt.edu
Mon Oct 1 18:23:44 CDT 2007
I should have mentioned that the mpiexec call does NOT hang if I do, for
example, "mpiexec -n 1 hostname", i.e. as long as it doesn't have to
create a socket to another machine, it works fine. I haven't tried
multiple processes on one machine; do you suggest that? I posted the
.mpd.conf of the user I'm testing with. It has MPD_USE_ROOT_MPD=1 set
just like the manual describes. And yet when I run an mpiexec job
("which mpiexec" returns "/frogstar/usr/ppc/bin/mpiexec", which is the
same directory where I setuid'd mpdroot; actually, I setuid'd everything
in there), the user's account DOES seem to try to start an mpd of its
own, and then it just hangs. Anything else to try? :(
-Matt
Ralph Butler wrote:
> You probably want to just have a single mpd running on a single host
> until you get it all sorted out. Running on multiple hosts until then
> will merely cause more confusion. As stated in the manual, when mpd is
> run as root, it can service multiple users. You need just one per host.
> It must be run as root. The users (as described below) must run an
> mpiexec that is linked to an mpdroot that has been marked as
> setuid-root. This will cause their execution of mpiexec to contact
> root's mpd. It will not start up a separate daemon for the user.
>
> I have tested this on an Ubuntu box and it works fine. I had to make
> sure that MPD_USE_ROOT_MPD was set as described in the manual.
>
> On Oct 1, 2007, at 5:13 PM, Matthew Chambers wrote:
>
>> Hmm, I tried that (killed the running mpd and restarted it on all
>> machines) but no change. The MPD is running as root:
>> root 9629 1 0 17:06 ? 00:00:00 python2.4
>> /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --daemon
>>
>> Is the user's mpiexec call supposed to start its own mpd even when
>> it's supposed to be using the root's existing mpd?
>>
>> Also, is this user supposed to exist on the other machines or is the
>> MPI program run as whichever user started the mpd?
>>
>> -Matt
>>
>>
>> Ralph Butler wrote:
>>> I would suggest killing all mpds and related processes and starting
>>> them from scratch.
>>> Make sure that the mpd is running as root if users plan to use its
>>> services. Then, the users
>>> need to make sure they use the mpiexec that is linked to the mpdroot
>>> which is marked as +s.
>>>
>>> On Oct 1, 2007, at 5:01 PM, Matthew Chambers wrote:
>>>
>>>> Ah, the s bit has to be on the owner's set, I hadn't tried that
>>>> (and don't really understand why). But now I'm back to mpiexec
>>>> locking up when I try to run a job from the user's account, and
>>>> when I break the process, I get:
>>>> (mpiexec 413): mpiexec: failed to obtain sock from manager
>>>> And also there's a stranded mpd process in the process list each
>>>> time I try:
>>>> rslebos 27003 10967 0 16:58 ? 00:00:00 python2.4
>>>> /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --dae
>>>> rslebos 27008 10967 0 16:58 ? 00:00:00 python2.4
>>>> /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --dae
>>>> rslebos 27028 10967 0 17:00 ? 00:00:00 python2.4
>>>> /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --dae
>>>>
>>>> Confused...
>>>> -Matt
>>>>
>>>
>>
>
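
The two preconditions discussed in this thread (mpdroot owned by root with
the s bit on the owner's set, and MPD_USE_ROOT_MPD=1 in the user's
.mpd.conf) can be checked with a short script. This is only a rough sketch,
not part of MPICH: the function names are made up for illustration, and the
parser assumes .mpd.conf holds plain KEY=VALUE lines with "#" comments.

```python
import os
import stat


def has_owner_setuid(mode: int) -> bool:
    """Return True if the setuid bit (the 's' in the owner's set) is on."""
    return bool(mode & stat.S_ISUID)


def is_setuid_root(path: str) -> bool:
    """Return True if path is owned by uid 0 and carries the setuid bit."""
    info = os.stat(path)
    return info.st_uid == 0 and has_owner_setuid(info.st_mode)


def uses_root_mpd(conf_text: str) -> bool:
    """Return True if the .mpd.conf text sets MPD_USE_ROOT_MPD=1.

    Assumption: the file is a list of KEY=VALUE lines; lines starting
    with '#' are treated as comments and ignored.
    """
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "MPD_USE_ROOT_MPD" and value.strip() == "1":
            return True
    return False
```

For example, is_setuid_root("/frogstar/usr/ppc/bin/mpdroot") should be True
on the setup described above, and uses_root_mpd() can be given the contents
of the test user's ~/.mpd.conf. Neither check explains the hang by itself;
they only rule out the two setup mistakes mentioned in the thread.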