[MPICH] Running on root's MPD as either root or another user
Ralph Butler
rbutler at mtsu.edu
Mon Oct 1 19:32:56 CDT 2007
Yes, if it all works inside one machine, then the problem is almost
certainly due to host/network configuration issues. The manual actually
suggests not trying things as root until you have all those issues
addressed. You can diagnose those problems by running mpdcheck.
Here is a blurb about that:
Sometimes there are problems with mpd or mpdboot while following
the Quick Start portion of the mpich2 install guide. This typically
happens somewhere during Steps 10-13, but may occur during other
steps as well. The guide suggests that when mpd/mpdboot problems
arise, you follow the procedures in Appendix A (Troubleshooting MPDs).
Section A.1 (Getting Started with MPD) provides a 7-step procedure
to follow to get one or more mpds working, first by hand, and
then via mpdboot. However, some of the early steps begin with a
pre-MPD program called mpdcheck. That program is designed to help
determine in advance if there will be problems associated with host
or network configuration. The instructions in section A.1 suggest
first using mpdcheck on individual machines, and then pair-wise.
It is particularly important to try the pair-wise experiments where
one machine plays the role of the server and the other the client,
and then to reverse the roles.
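
To make the idea of a pair-wise test concrete, here is a rough sketch of what such a check amounts to. This is not mpdcheck itself and not how mpdcheck is implemented; it is only an illustration, written in Python 3, and the names serve_once and pairwise_check are made up for this example. It runs both roles on one host over the loopback address, so it only shows the mechanics. On a real cluster the server and client halves would run on two different machines, and you would then swap them.

```python
import socket
import threading


def serve_once(listener):
    """Server role: accept one connection and echo back what it receives."""
    try:
        conn, _addr = listener.accept()
    except socket.timeout:
        return  # no client showed up in time; nothing to echo
    with conn:
        conn.sendall(conn.recv(1024))


def pairwise_check(server_host="127.0.0.1", payload=b"hello"):
    """Return True if a client can reach the server and get its payload back."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind((server_host, 0))  # port 0 asks the OS for a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()
    try:
        # Client role: connect to the server, send the payload, read the echo.
        with socket.create_connection((server_host, port), timeout=5) as client:
            client.sendall(payload)
            reply = client.recv(1024)
    finally:
        server.join()
        listener.close()
    return reply == payload


if __name__ == "__main__":
    print("round trip ok:", pairwise_check())
```

If a round trip like this works in one direction between two hosts but not the other, that points at name resolution or a firewall on one side rather than at mpd.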
Sometimes the procedures in A.1 indicate that MPDs are not likely
to run on your systems due to problems with host and/or network
configuration. At those points, you are referred to subsequent
sections, e.g. A.2 Debugging host/network configuration problems,
or A.3 Firewalls, etc.
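
One common kind of host configuration problem is a machine whose own hostname resolves only to a loopback address, which other machines cannot use to reach it. The following is a minimal sketch of how one might look for that by hand, assuming Python 3; it is not taken from mpdcheck, and the function names is_loopback and check_hostname are invented for this example.

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True if the given IP address string is a loopback address."""
    return ipaddress.ip_address(address).is_loopback


def check_hostname(name=None):
    """Resolve a hostname and report which of its addresses are loopback."""
    name = name or socket.gethostname()
    try:
        _canonical, _aliases, addresses = socket.gethostbyname_ex(name)
    except OSError as exc:  # covers socket.gaierror and socket.herror
        return {"name": name, "error": str(exc), "addresses": []}
    return {
        "name": name,
        "error": None,
        "addresses": [(addr, is_loopback(addr)) for addr in addresses],
    }


if __name__ == "__main__":
    result = check_hostname()
    print(result["name"])
    if result["error"]:
        print("  could not resolve:", result["error"])
    for addr, loopback in result["addresses"]:
        note = "loopback, not reachable from other hosts" if loopback else "ok"
        print("  %s (%s)" % (addr, note))
```

Running it on each machine gives a quick view of what that machine thinks its own address is, which is worth comparing against the address passed to mpd.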
On Oct 1, 2007, at 6:23 PM, Matt Chambers wrote:
> I should have mentioned that the mpiexec call does NOT hang if I
> do, for example, "mpiexec -n 1 hostname", i.e. as long as it
> doesn't have to create a socket to another machine it works fine.
> I haven't tried multiple processes on one machine, do you suggest
> that? I posted the .mpd.conf of the user I'm testing with. It's
> got MPD_USE_ROOT_MPD=1 set just like the manual describes. And yet
> when I run an mpiexec job ("which mpiexec" returns
> "/frogstar/usr/ppc/bin/mpiexec", which is the same directory where I
> setuid'd mpdroot; actually I setuid'd everything in there) the user's
> account DOES seem to try to start an mpd of its own, and then it
> just hangs. Anything else to try? :(
>
> -Matt
>
> Ralph Butler wrote:
>> You probably want to just have a single mpd running on a single
>> host until you get it all sorted out.
>> Running on multiple hosts until then will merely cause more
>> confusion. As stated in the manual,
>> when mpd is run as root, it can service multiple users. You need
>> just one per host. It must be run
>> as root. The users (as described below) must run mpiexec that is
>> linked to mpdroot that has been
>> marked as setuid-root. This will cause their execution of mpiexec
>> to contact root's mpd. It will not
>> start up a separate daemon for the user.
>>
>> I have tested this on an ubuntu box and it works fine. I had to
>> make sure
>> that MPD_USE_ROOT_MPD was set as described in the manual.
>>
>> On Oct 1, 2007, at 5:13 PM, Matthew Chambers wrote:
>>
>>> Hmm, I tried that (killed the running mpd and restarted it on all
>>> machines) but no change. The MPD is running as root:
>>> root 9629 1 0 17:06 ? 00:00:00 python2.4 /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --daemon
>>>
>>> Is the user's mpiexec call supposed to start its own mpd even
>>> when it's supposed to be using the root's existing mpd?
>>>
>>> Also, is this user supposed to exist on the other machines or is
>>> the MPI program run as whichever user started the mpd?
>>>
>>> -Matt
>>>
>>>
>>> Ralph Butler wrote:
>>>> I would suggest killing all mpds and related processes and
>>>> starting them from scratch.
>>>> Make sure that the mpd is running as root if users plan to use
>>>> its services. Then, the users
>>>> need to make sure they use the mpiexec that is linked to the
>>>> mpdroot which is marked as +s.
>>>>
>>>> On Oct 1, 2007, at 5:01 PM, Matthew Chambers wrote:
>>>>
>>>>> Ah, the s bit has to be on the owner's set, I hadn't tried that
>>>>> (and don't really understand why). But now I'm back to mpiexec
>>>>> locking up when I try to run a job from the user's account, and
>>>>> when I break the process, I get:
>>>>> (mpiexec 413): mpiexec: failed to obtain sock from manager
>>>>> And also there's a stranded mpd process in the process list
>>>>> each time I try:
>>>>> rslebos 27003 10967 0 16:58 ? 00:00:00 python2.4 /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --dae
>>>>> rslebos 27008 10967 0 16:58 ? 00:00:00 python2.4 /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --dae
>>>>> rslebos 27028 10967 0 17:00 ? 00:00:00 python2.4 /frogstar/usr/ppc/bin/mpd --listenport=4050 --ifhn=172.20.0.1 --dae
>>>>>
>>>>> Confused...
>>>>> -Matt
>>>>>
>>>>
>>>
>>
>