[MPICH] Running on root's MPD as either root or another user

Matt Chambers matthew.chambers at vanderbilt.edu
Mon Oct 1 18:23:44 CDT 2007


I should have mentioned that the mpiexec call does NOT hang if I run, for 
example, "mpiexec -n 1 hostname"; as long as it doesn't have to create a 
socket to another machine, it works fine.  I haven't tried multiple 
processes on one machine.  Do you suggest that?  I posted the .mpd.conf 
of the user I'm testing with.  It has MPD_USE_ROOT_MPD=1 set, just as 
the manual describes.  And yet when I run an mpiexec job ("which 
mpiexec" returns "/frogstar/usr/ppc/bin/mpiexec", which is in the same 
directory where I made mpdroot setuid; actually, I made everything in 
there setuid), the user's account DOES seem to try to start an mpd of 
its own, and then it just hangs.  Anything else to try? :(
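
For anyone retracing this, the two preconditions discussed in this 
thread (mpdroot carrying the setuid bit in the owner's permission set 
and being owned by root, and MPD_USE_ROOT_MPD=1 appearing in the user's 
.mpd.conf) can be checked with a short script.  This is only a sketch 
under assumptions: the function names are made up for illustration, the 
.mpd.conf is assumed to hold plain KEY=VALUE lines, and it is written 
for Python 3 rather than the python2.4 that runs mpd here.  It does not 
reproduce how mpiexec or mpdroot make their own decision.

```python
import os
import stat


def has_owner_setuid(mode):
    """True if the permission bits include setuid (the owner's 's' bit).

    The setgid bit (the group's 's' bit) does not count.
    """
    return bool(mode & stat.S_ISUID)


def is_setuid_root(path):
    """True if the file at path is owned by uid 0 and has the setuid bit."""
    st = os.stat(path)
    return st.st_uid == 0 and has_owner_setuid(st.st_mode)


def uses_root_mpd(conf_path):
    """True if the file contains a line MPD_USE_ROOT_MPD=1.

    Assumes simple KEY=VALUE lines; blank lines and '#' comments are skipped.
    """
    with open(conf_path) as conf:
        for line in conf:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            if key.strip() == "MPD_USE_ROOT_MPD":
                return value.strip() == "1"
    return False
```

With the paths from this thread, that would be 
is_setuid_root("/frogstar/usr/ppc/bin/mpdroot") and 
uses_root_mpd(os.path.expanduser("~/.mpd.conf")), run as the user who 
launches mpiexec.  Both returning True only confirms the setup matches 
what the manual describes; it does not explain the hang by itself.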

-Matt

Ralph Butler wrote:
> You probably want to just have a single mpd running on a single host 
> until you get it all sorted out.
> Running on multiple hosts until then will merely cause more 
> confusion.  As stated in the manual,
> when mpd is run as root, it can service multiple users.  You need just 
> one per host.  It must be run
> as root.  The users (as described below) must run an mpiexec that is 
> linked to an mpdroot that has been
> marked as setuid-root.  This will cause their execution of mpiexec to 
> contact root's mpd.  It will not
> start up a separate daemon for the user.
>
> I have tested this on an Ubuntu box and it works fine.  I had to make 
> sure that MPD_USE_ROOT_MPD was set as described in the manual.
>
> On Oct 1, 2007, at 5:13 PM, Matthew Chambers wrote:
>
>> Hmm, I tried that (killed the running mpd and restarted it on all 
>> machines) but no change.  The MPD is running as root:
>> root      9629     1  0 17:06 ?        00:00:00 python2.4 
>> /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --daemon
>>
>> Is the user's mpiexec call supposed to start its own mpd even when 
>> it's supposed to be using the root's existing mpd?
>>
>> Also, is this user supposed to exist on the other machines or is the 
>> MPI program run as whichever user started the mpd?
>>
>> -Matt
>>
>>
>> Ralph Butler wrote:
>>> I would suggest killing all mpds and related processes and starting 
>>> them from scratch.
>>> Make sure that the mpd is running as root if users plan to use its 
>>> services.  Then, the users
>>> need to make sure they use the mpiexec that is linked to the mpdroot 
>>> which is marked as +s.
>>>
>>> On Oct 1, 2007, at 5:01 PM, Matthew Chambers wrote:
>>>
>>>> Ah, the s bit has to be in the owner's permission set; I hadn't 
>>>> tried that (and don't really understand why).  But now I'm back to 
>>>> mpiexec locking up when I try to run a job from the user's account, 
>>>> and when I interrupt the process, I get:
>>>> (mpiexec 413): mpiexec: failed to obtain sock from manager
>>>> And also there's a stranded mpd process in the process list each 
>>>> time I try:
>>>> rslebos  27003 10967  0 16:58 ?        00:00:00 python2.4 
>>>> /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --dae
>>>> rslebos  27008 10967  0 16:58 ?        00:00:00 python2.4 
>>>> /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --dae
>>>> rslebos  27028 10967  0 17:00 ?        00:00:00 python2.4 
>>>> /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --dae
>>>>
>>>> Confused...
>>>> -Matt
>>>>
>>>
>>
>



