[MPICH] Running on root's MPD as either root or another user

Ralph Butler rbutler at mtsu.edu
Mon Oct 1 17:26:22 CDT 2007


You probably want to run just a single mpd on a single host until you
get it all sorted out. Running on multiple hosts before then will merely
cause more confusion. As stated in the manual, when mpd is run as root,
it can service multiple users. You need just one per host, and it must
be run as root. The users (as described below) must run an mpiexec that
is linked to an mpdroot that has been marked setuid-root. This causes
their execution of mpiexec to contact root's mpd. It will not start up a
separate daemon for the user.

I have tested this on an Ubuntu box and it works fine. I had to make
sure that MPD_USE_ROOT_MPD was set as described in the manual.
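
If it helps while debugging, here is a rough sketch of a script that checks the two user-side conditions mentioned above: that mpdroot has the setuid bit and is owned by root, and that MPD_USE_ROOT_MPD is present in the environment. The function name check_root_mpd_setup is made up for illustration and is not part of MPICH. The path to mpdroot is whatever you pass on the command line. It is written for a current Python 3 rather than the python2.4 shown in the process listings, and it only tests whether the variable is set, not what value the manual expects.

```python
import os
import stat
import sys


def check_root_mpd_setup(mpdroot_path, environ=None):
    """Report the user-side conditions for using root's mpd.

    Hypothetical helper, not part of MPICH. It only inspects file
    metadata and the environment; it does not contact any mpd.
    """
    if environ is None:
        environ = os.environ
    st = os.stat(mpdroot_path)
    return {
        # The "s" bit in the owner's permission set (chmod u+s).
        "setuid_bit": bool(st.st_mode & stat.S_ISUID),
        # setuid-root additionally requires the file to be owned by uid 0.
        "owned_by_root": st.st_uid == 0,
        # Presence only; the expected value is described in the manual.
        "use_root_mpd_set": "MPD_USE_ROOT_MPD" in environ,
    }


if __name__ == "__main__":
    # Usage: python check_mpdroot.py <path to mpdroot>
    for path in sys.argv[1:]:
        print(path, check_root_mpd_setup(path))
```

Run it as the ordinary user who will call mpiexec, since that user's environment is the one that matters here. If all three values come back True and mpiexec still hangs, the problem is more likely on the daemon side.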

On Oct 1, 2007, at 5:13 PM, Matthew Chambers wrote:

> Hmm, I tried that (killed the running mpd and restarted it on all  
> machines) but no change.  The MPD is running as root:
> root      9629     1  0 17:06 ?        00:00:00 python2.4 /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --daemon
>
> Is the user's mpiexec call supposed to start its own mpd even when  
> it's supposed to be using the root's existing mpd?
>
> Also, is this user supposed to exist on the other machines or is  
> the MPI program run as whichever user started the mpd?
>
> -Matt
>
>
> Ralph Butler wrote:
>> I would suggest killing all mpds and related processes and  
>> starting them from scratch.
>> Make sure that the mpd is running as root if users plan to use its  
>> services.  Then, the users
>> need to make sure they use the mpiexec that is linked to the  
>> mpdroot which is marked as +s.
>>
>> On Oct 1, 2007, at 5:01 PM, Matthew Chambers wrote:
>>
>>> Ah, the s bit has to be on the owner's set, I hadn't tried that  
>>> (and don't really understand why).  But now I'm back to mpiexec  
>>> locking up when I try to run a job from the user's account, and  
>>> when I break the process, I get:
>>> (mpiexec 413): mpiexec: failed to obtain sock from manager
>>> And also there's a stranded mpd process in the process list each  
>>> time I try:
>>> rslebos  27003 10967  0 16:58 ?        00:00:00 python2.4 /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --dae
>>> rslebos  27008 10967  0 16:58 ?        00:00:00 python2.4 /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --dae
>>> rslebos  27028 10967  0 17:00 ?        00:00:00 python2.4 /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --dae
>>>
>>> Confused...
>>> -Matt
>>>
>>
>




More information about the mpich-discuss mailing list