[MPICH2-dev] Eclipse PTP support for MPICH2
Rusty Lusk
lusk at mcs.anl.gov
Wed Jun 7 17:12:43 CDT 2006
Hi Greg,
I am glad this is working for you so far. We would like to help
with it in any way we can. With regard to your specific questions below:
a) Our component-based resource management system (Cobalt) does separate
allocation and launch, if I understand what you mean correctly. In
fact, so far you have only seen the "launch" part, in mpd. We have a
completely separate scheduler that you would use for allocation, and
feed results from it to mpd (either directly, through mpiexec, or
through an XML file) to launch jobs. Your "proxy" essentially becomes a
peer component in Cobalt and can do all sorts of things.
b) Yes, environment variables are used to enable MPI_Init to work, along
with information that comes through the PMI interface. In fact, if you
run "printenv" with MPD you can see what the environment looks like:
shakey% mpiexec -genvnone -n 1 printenv
PMI_RANK=0
PMI_SIZE=1
PMI_PORT=140.221.9.72:44587
PMI_SPAWNED=0
PMI_DEBUG=0
MPICH_INTERFACE_HOSTNAME=140.221.9.72
PATH=/sandbox/lusk/jul6-install-mpd/bin:/usr/local/X11R5/bin:.:/mcs/bin:/usr/local/bin:/homes/lusk/bin:.
PMI_TOTALVIEW=0
Again if I understand you correctly, the mpd infrastructure either
already has what you need or can be tweaked to provide it, and we would
like to help.
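
As an illustration only, here is a minimal Python sketch of how a tool such as a debugger proxy might read the PMI-related settings shown in the printenv output above. The helper name read_pmi_env and the returned dictionary layout are hypothetical. The variable names are taken from that output, and nothing here is confirmed to be the complete set that MPI_Init needs.

```python
import os


def read_pmi_env(environ=None):
    """Collect the PMI_* settings seen in the printenv output.

    Hypothetical helper: the variable names come from the output above;
    the dictionary layout is an assumption made for this sketch.
    """
    env = os.environ if environ is None else environ
    # PMI_PORT has the form "host:port"; split on the last colon.
    host, _, port = env["PMI_PORT"].rpartition(":")
    return {
        "rank": int(env["PMI_RANK"]),
        "size": int(env["PMI_SIZE"]),
        "host": host,
        "port": int(port),
        "spawned": env.get("PMI_SPAWNED", "0") == "1",
    }


# Values copied from the printenv output above.
sample = {
    "PMI_RANK": "0",
    "PMI_SIZE": "1",
    "PMI_PORT": "140.221.9.72:44587",
    "PMI_SPAWNED": "0",
}
info = read_pmi_env(sample)
print(info)
```

Calling read_pmi_env() with no argument reads the current process environment instead, which is what a process started under mpiexec would presumably see.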
Also, have you looked at mpd's approach to debugging, the -gdb option on
mpiexec (formerly known as mpigdb)? It has some features I like, and
is an example of slipping a debugger in between the mpd managers and the
application programs.
I hope the above is helpful.
Regards,
Rusty
From: Greg Watson <gwatson at lanl.gov>
Subject: [MPICH2-dev] Eclipse PTP support for MPICH2
Date: Wed, 7 Jun 2006 11:56:49 -0600
> Ok, I've got a prototype working against mpd. I've written a proxy
> that lets Eclipse/PTP query the mpd for available hosts and load
> these into its model so you get a view of the cluster. You can then
> launch an MPI job, see the job status reflected in the views, see
> stdout from each process, and terminate a job. The only assumption is
> that the mpd's have already been started before you launch Eclipse.
>
> I'd like to try and get the debugger working next. The debugger is an
> MPI program itself and the current (non-attach) debug startup works
> as follows:
>
> 1. Request a process allocation for the debugger (n+1 procs) and
> obtain a job id from the runtime.
>
> 2. Request a process allocation for the target program (n procs) and
> obtain a job id from the runtime.
>
> 3. Using the debugger job id, request the runtime launch the debugger.
>
> 4. Once the debugger has started (MPI_Init has completed), set the
> following in the environment of each process:
>
> - the job id of the target allocation
> - the task id for each process in the target allocation
> - the total number of processes in the target allocation
>
> 5. The debugger then forks/execs the target executable, which is now
> under its control.
>
> 6. The target eventually calls MPI_Init and completes the
> initialization using the information from the environment variables.
>
> This process requires support from the runtime, and assumes that: (a)
> the runtime supports the separation of allocation and launch, and (b)
> that MPI initialization can be completed using values taken from the
> environment.
>
> Would anyone be able to comment if (a) is currently feasible with mpd
> (maybe requiring modification) and if (b) is supported by PMI and
> what environment variables are necessary?
>
> Thanks,
>
> Greg
>
>
> On May 18, 2006, at 12:12 PM, Rusty Lusk wrote:
>
> > Hi Greg,
> >
> > Now that I see what you need, I have a different answer. At least
> > most of what you need has been implemented and documented as part of the
> > mpd package, which includes other commands besides mpiexec, such as
> > mpdlistjobs, mpdsigjob, and mpdkilljob. Other parts may be available as
> > effects of doing something to the mpiexec process, like suspending it,
> > continuing it, or killing it. (These were originally implemented for
> > the purpose of interactive control, but should do what you want if you
> > deliver the appropriate signals to mpiexec.) For at least one level of
> > documentation, once you have installed mpd, you can do
> >
> > mpdhelp
> >
> > to get a list of the mpd commands, and then
> >
> > <mpdcmd> --help
> >
> > to get a description of how to use each command. More information is in
> > the MPICH2 Installer's Guide and User's Guide, in the doc subdirectory
> > of mpich2.
> >
> > What these commands do (including mpiexec) is contact the locally
> > running mpd and talk to it via messages consisting of python
> > dictionaries. Yes, you could write your own program to generate these
> > messages, but I would hope that we have already implemented much of what
> > you need, and we would be interested in implementing the rest of what
> > you need in collaboration with you.
> >
> > Regards,
> > Rusty
> >
> > P.S. I am about to invite you to a workshop at Oak Ridge on July 12-14.
> > Mark your calendar. :-)
> >
> >
> >
> >
> >
> > From: Greg Watson <gwatson at lanl.gov>
> > Subject: Re: [MPICH2-dev] mpd client library and protocol?
> > Date: Thu, 18 May 2006 11:15:30 -0600
> >
> >> Rajeev,
> >>
> >> Sorry for not being clearer. Yes, I need to be able to control an
> >> MPI program, not implement MPI or a process manager. I'm exploring
> >> the possibility of using Eclipse (via the Parallel Tools Platform) to
> >> manage the launch and control of MPI programs using MPICH2. The
> >> architecture requires an interface between PTP (Java) and the runtime
> >> system (in this case the MPICH2 process manager) that supports a few
> >> basic commands (including RUN, TERMINATE, GETJOBS, GETPROCESSES,
> >> etc.) and responds to certain events, such as process termination.
> >>
> >> Unless you can suggest a better approach, I'm thinking of writing a
> >> python program that will provide this interface to mpd. I'd prefer to
> >> do it in Java or C as I'll have to re-implement a bunch of stuff in
> >> python, but because of the way you serialize python objects it
> >> doesn't look possible to use a non-python program to communicate
> >> with mpd.
> >>
> >> If you have any documentation that would assist, it would be
> >> appreciated.
> >>
> >> Regards,
> >>
> >> Greg
> >>
> >> On May 17, 2006, at 7:46 PM, Rajeev Thakur wrote:
> >>
> >>> Greg,
> >>> What exactly are you trying to do? That might help us figure out what
> >>> might be your best option. I am attaching the document describing the
> >>> PMI interface. We use PMI to *implement* MPI. You seem to want to
> >>> control an MPI program. Is that right?
> >>>
> >>> Rajeev
> >>>
> >>>
> >>>
> >>> On Wed, 17 May 2006, Greg Watson wrote:
> >>>
> >>>> Bill,
> >>>>
> >>>> I'm not sure, since I still don't really understand the architecture.
> >>>> Can I use PMI to launch and control an MPI program on a cluster? Or
> >>>> is that something that will be available in the future? I would
> >>>> rather not have to provide a different program for each process
> >>>> manager, but cluster support is also essential.
> >>>>
> >>>> Any information or documentation you can provide on the architecture
> >>>> and APIs would be appreciated.
> >>>>
> >>>> Greg
> >>>>
> >>>>
> >>>> On May 16, 2006, at 10:21 PM, William Gropp wrote:
> >>>>
> >>>>> At 11:15 PM 5/16/2006, Greg Watson wrote:
> >>>>>> Rajeev,
> >>>>>>
> >>>>>> Many thanks for your reply. Can you suggest the best approach if I
> >>>>>> want to write a C program to control mpd? At a minimum, I'd like to
> >>>>>> be able to spawn/terminate an MPI job using a C program. Is PMI what
> >>>>>> I'd use to do this?
> >>>>>>
> >>>>>> Any documentation you could provide would be appreciated.
> >>>>>
> >>>>> An alternative is to not use MPD at all and to use the PMI
> >>>>> interface. A C example of this is the "gforker" process manager;
> >>>>> this is built using a set of utility routines in mpich2/src/pm/util
> >>>>> that provide the "other" side of the simple PMI interface. gforker
> >>>>> implements all of the PM functions, including spawning MPI jobs.
> >>>>> Let me know if this is the direction in which you are interested.
> >>>>>
> >>>>> Bill
> >>>>>
> >>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Greg
> >>>>>>
> >>>>>> On May 16, 2006, at 7:56 PM, Rajeev Thakur wrote:
> >>>>>>
> >>>>>>> Greg,
> >>>>>>>
> >>>>>>>> I assume that mpdlib.py is a client library that other applications
> >>>>>>>> (i.e. other than mpiexec) could potentially use to communicate with
> >>>>>>>> and/or control mpd.
> >>>>>>>>
> >>>>>>>> 1. Is there any API documentation?
> >>>>>>>
> >>>>>>> The API is the Process Manager Interface (PMI), which is the interface
> >>>>>>> MPICH2 uses for interacting with process managers. There is some
> >>>>>>> documentation for it, which I could send you if you like (it may
> >>>>>>> not be 100% up to date).
> >>>>>>>
> >>>>>>>> 2. Is there a C version of the client library?
> >>>>>>>
> >>>>>>> The PMI library is in C. It is implemented in
> >>>>>>> src/pmi/simple/simple_pmi.c.
> >>>>>>>
> >>>>>>>> 3. Is the mpd protocol documented anywhere?
> >>>>>>>
> >>>>>>> Not currently, but the plan is to :-).
> >>>>>>>
> >>>>>>>> 4. Is the protocol used by mpd the same as that used by smpd?
> >>>>>>>
> >>>>>>> No, they are different.
> >>>>>>>
> >>>>>>> Rajeev
> >>>>>
> >>>>> William Gropp
> >>>>> http://www.mcs.anl.gov/~gropp
> >>>>
> >>>> <paper.pdf>
> >>
>