[MPICH] Running on root's MPD as either root or another user

Matt Chambers matthew.chambers at vanderbilt.edu
Mon Oct 1 20:14:24 CDT 2007


I don't understand how getting an MPD running from a user account will 
help me in debugging why I can't get my user account to use root's 
existing MPD ring.  To clarify, root's MPD ring works perfectly fine and 
I can run MPI programs as root over any number of nodes.  It's only when 
I try to use root's ring from a user account that the multi-node mpiexec 
calls hang (apparently while trying to get a socket, or at least that's 
the error it gives when breaking at the hang).

-Matt

Ralph Butler wrote:
> Yes, if it all works inside one machine, then the problem is almost 
> certainly due to host/net config
> issues.  The manual actually suggests not trying things as root until 
> you have all those issues
> addressed.  Those problems can be diagnosed by running mpdcheck.
> Here is a blurb about that:
>
> Sometimes there are problems with mpd or mpdboot while following
> the Quick Start portion of the mpich2 install guide.  This typically
> happens somewhere during Steps 10-13, but may occur during other
> steps as well.  The guide suggests that when mpd/mpdboot problems
> arise, you follow the procedures in Appendix A (Troubleshooting MPDs).
>
> Section A.1 (Getting Started with MPD) provides a 7-step procedure
> to follow to get one or more mpds working, first by hand, and
> then via mpdboot.  However, some of the early steps begin with a
> pre-MPD program called mpdcheck.  That program is designed to help
> determine in advance if there will be problems associated with host
> or network configuration.  The instructions in section A.1 suggest
> first using mpdcheck on individual machines, and then pair-wise.
> It is particularly important to try the pair-wise experiments where
> one machine plays the role of the server and the other the client,
> and then to reverse the roles.
>
> Sometimes the procedures in A.1 indicate that MPDs are not likely
> to run on your systems due to problems with host and/or network
> configuration.  At those points, you are referred to subsequent
> sections, e.g. A.2 Debugging host/network configuration problems,
> or A.3 Firewalls, etc.
>
>
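
For readers following the pair-wise test described in the quoted blurb,
here is a rough Python sketch of the kind of check involved: one side
listens on a port, the other connects and exchanges a short message, and
then the roles are swapped.  This is not mpdcheck itself and not how MPD
is implemented; the function names (start_server, answer_one_client,
check_client, loopback_demo) are made up for illustration, and the demo
runs both roles on 127.0.0.1 only so that it is self-contained.

```python
import socket
import threading


def start_server(bind_host=""):
    """Listen on an OS-chosen free port; return (listener, port)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((bind_host, 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    return listener, listener.getsockname()[1]


def answer_one_client(listener):
    """Accept one connection, read a short greeting, and echo it back."""
    conn, _peer = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"server got: " + data)
    listener.close()


def check_client(server_host, server_port, timeout=5.0):
    """Connect to the server and return its reply; raises OSError on failure."""
    with socket.create_connection((server_host, server_port),
                                  timeout=timeout) as sock:
        sock.sendall(b"hello")
        return sock.recv(1024)


def loopback_demo():
    """Run both roles on the local machine (illustration only)."""
    listener, port = start_server("127.0.0.1")
    worker = threading.Thread(target=answer_one_client, args=(listener,))
    worker.start()
    reply = check_client("127.0.0.1", port)
    worker.join()
    return reply


if __name__ == "__main__":
    print(loopback_demo())
```

On two real machines, the idea would be to run the server half on host A,
note the port it reports, run the client half on host B against A's
hostname, and then repeat with A and B swapped.  A connection that hangs
or is refused in either direction points at name resolution or a
firewall rather than at MPD.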
