[MPICH2-dev] Eclipse PTP support for MPICH2

Greg Watson gwatson at lanl.gov
Wed Jun 7 12:56:49 CDT 2006


Ok, I've got a prototype working against mpd. I've written a proxy  
that lets Eclipse/PTP query the mpd for available hosts and load  
these into its model so you get a view of the cluster. You can then
launch an MPI job, see the job status reflected in the views, see  
stdout from each process, and terminate a job. The only assumption is  
that the mpds have already been started before you launch Eclipse.
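
For anyone who wants to experiment with the same operations, here is a
rough sketch of one way to drive them from Python. It is not the proxy
described above: it simply shells out to the mpd command-line tools
rather than talking to mpd directly. It assumes that `mpdtrace` and
`mpiexec` are on the PATH, that an mpd ring is already running, and
that `mpdtrace` prints one host name per line. The function names are
made up for illustration.

```python
import subprocess


def parse_host_list(text):
    """Turn one-host-per-line text into a list of host names, skipping blanks."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def query_hosts(run=subprocess.run):
    """Ask the running mpd ring for its hosts.

    Assumption: `mpdtrace` is on PATH and prints one host name per line.
    """
    result = run(["mpdtrace"], capture_output=True, text=True, check=True)
    return parse_host_list(result.stdout)


def launch_job(nprocs, program, args=(), popen=subprocess.Popen):
    """Start an MPI job through mpiexec and return the process handle.

    stdout and stderr are piped together so a front end can display them.
    """
    cmd = ["mpiexec", "-n", str(nprocs), program, *args]
    return popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
```

The `run` and `popen` parameters exist only so the functions can be
exercised without an mpd ring. Terminating the job is left to the
caller, for example by signalling the returned mpiexec process.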

I'd like to try and get the debugger working next. The debugger is an  
MPI program itself and the current (non-attach) debug startup works  
as follows:

1. Request a process allocation for the debugger (n+1 procs) and  
obtain a job id from the runtime.

2. Request a process allocation for the target program (n procs) and  
obtain a job id from the runtime.

3. Using the debugger job id, request that the runtime launch the debugger.

4. Once the debugger has started (MPI_Init has completed), set the  
following in the environment of each process:

- the job id of the target allocation
- the task id for each process in the target allocation
- the total number of processes in the target allocation

5. The debugger then forks/execs the target executable, which is now
under its control.

6. The target eventually calls MPI_Init and completes the  
initialization using the information from the environment variables.

This process requires support from the runtime, and assumes that (a)
the runtime supports the separation of allocation and launch, and (b)
MPI initialization can be completed using values taken from the
environment.
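
To make steps 4 and 5 concrete, here is a minimal sketch in Python of
passing the three values through the environment and then doing the
fork/exec. The variable names (TARGET_JOB_ID, TARGET_TASK_ID,
TARGET_NPROCS) and the function names are placeholders invented for
this example; they are not names that PMI or mpd define. A real
debugger would also arrange to trace the child before the exec, which
is left out here.

```python
import os
import sys

# Hypothetical variable names, chosen only for illustration.
ENV_JOB_ID = "TARGET_JOB_ID"
ENV_TASK_ID = "TARGET_TASK_ID"
ENV_NPROCS = "TARGET_NPROCS"


def build_target_env(base_env, job_id, task_id, nprocs):
    """Return a copy of base_env with the target allocation details added."""
    env = dict(base_env)
    env[ENV_JOB_ID] = str(job_id)
    env[ENV_TASK_ID] = str(task_id)
    env[ENV_NPROCS] = str(nprocs)
    return env


def fork_exec_target(argv, env):
    """Fork, then exec argv in the child with env; return the child pid."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvpe(argv[0], argv, env)
        finally:
            # Only reached if the exec itself failed.
            os._exit(127)
    return pid


if __name__ == "__main__":
    env = build_target_env(os.environ, job_id=42, task_id=0, nprocs=4)
    show = "import os; print(os.environ['TARGET_TASK_ID'])"
    child = fork_exec_target([sys.executable, "-c", show], env)
    os.waitpid(child, 0)
```

The target would then read these values during its own startup, which
is the part that depends on assumption (b).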

Would anyone be able to comment on whether (a) is currently feasible
with mpd (maybe requiring modification), whether (b) is supported by
PMI, and if so, which environment variables are necessary?

Thanks,

Greg


On May 18, 2006, at 12:12 PM, Rusty Lusk wrote:

> Hi Greg,
>
>      Now that I see what you need, I have a different answer.  At least
> most of what you need has been implemented and documented as part of the
> mpd package, which includes other commands besides mpiexec, such as
> mpdlistjobs, mpdsigjob, and mpdkilljob.  Other parts may be available as
> effects of doing something to the mpiexec process, like suspending it,
> continuing it, or killing it.  (These were originally implemented for
> the purpose of interactive control, but should do what you want if you
> deliver the appropriate signals to mpiexec.)  For at least one level of
> documentation, once you have installed mpd, you can do
>
>     mpdhelp
>
> to get a list of the mpd commands, and then
>
>     <mpdcmd> --help
>
> to get a description of how to use each command.  More information is in
> the MPICH2 Installer's Guide and User's Guide, in the doc subdirectory
> of mpich2.
>
> What these commands do (including mpiexec) is contact the locally
> running mpd and talk to it via messages consisting of python
> dictionaries.  Yes, you could write your own program to generate these
> messages, but I would hope that we have already implemented much of what
> you need, and we would be interested in implementing the rest of what
> you need in collaboration with you.
>
> Regards,
> Rusty
>
> P.S.  I am about to invite you to a workshop at Oak Ridge on July 12-14.
> Mark your calendar. :-)
>
>
>
>
>
> From: Greg Watson <gwatson at lanl.gov>
> Subject: Re: [MPICH2-dev] mpd client library and protocol?
> Date: Thu, 18 May 2006 11:15:30 -0600
>
>> Rajeev,
>>
>> Sorry for not being clearer.  Yes, I need to be able to control an
>> MPI program, not implement MPI or a process manager. I'm exploring
>> the possibility of using Eclipse (via the Parallel Tools Platform) to
>> manage the launch and control of MPI programs using MPICH2. The
>> architecture requires an interface between PTP (Java) and the runtime
>> system (in this case the MPICH2 process manager) that supports a few
>> basic commands (including RUN, TERMINATE, GETJOBS, GETPROCESSES,
>> etc.) and responds to certain events, such as process termination.
>>
>> Unless you can suggest a better approach, I'm thinking of writing a
>> python program that will provide this interface to mpd. I'd prefer to
>> do it in Java or C as I'll have to re-implement a bunch of stuff in
>> python, but because of the way you serialize python objects it
>> doesn't look possible to use a non-python program to communicate
>> with mpd.
>>
>> If you have any documentation that would assist, it would be
>> appreciated.
>>
>> Regards,
>>
>> Greg
>>
>> On May 17, 2006, at 7:46 PM, Rajeev Thakur wrote:
>>
>>> Greg,
>>>      What exactly are you trying to do? That might help us figure out
>>> what might be your best option. I am attaching the document describing
>>> the PMI interface. We use PMI to *implement* MPI. You seem to want to
>>> control an MPI program. Is that right?
>>>
>>> Rajeev
>>>
>>>
>>>
>>> On Wed, 17 May 2006, Greg Watson wrote:
>>>
>>>> Bill,
>>>>
>>>> I'm not sure, since I still don't really understand the architecture.
>>>> Can I use PMI to launch and control an MPI program on a cluster? Or
>>>> is that something that will be available in the future? I would
>>>> rather not have to provide a different program for each process
>>>> manager, but cluster support is also essential.
>>>>
>>>> Any information or documentation you can provide on the architecture
>>>> and APIs would be appreciated.
>>>>
>>>> Greg
>>>>
>>>>
>>>> On May 16, 2006, at 10:21 PM, William Gropp wrote:
>>>>
>>>>> At 11:15 PM 5/16/2006, Greg Watson wrote:
>>>>>> Rajeev,
>>>>>>
>>>>>> Many thanks for your reply. Can you suggest the best approach if I
>>>>>> want to write a C program to control mpd? At a minimum, I'd like to
>>>>>> be able to spawn/terminate an MPI job using a C program. Is PMI what
>>>>>> I'd use to do this?
>>>>>>
>>>>>> Any documentation you could provide would be appreciated.
>>>>>
>>>>> An alternative is to not use MPD at all and to use the PMI
>>>>> interface.  A C example of this is the "gforker" process manager;
>>>>> this is built using a set of utility routines in mpich2/src/pm/util
>>>>> that provide the "other" side of the simple PMI interface.  gforker
>>>>> implements all of the PM functions, including spawning MPI jobs.
>>>>> Let me know if this is the direction in which you are interested.
>>>>>
>>>>> Bill
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Greg
>>>>>>
>>>>>> On May 16, 2006, at 7:56 PM, Rajeev Thakur wrote:
>>>>>>
>>>>>>> Greg,
>>>>>>>
>>>>>>>> I assume that mpdlib.py is a client library that other applications
>>>>>>>> (i.e. other than mpiexec) could potentially use to communicate with
>>>>>>>> and/or control mpd.
>>>>>>>>
>>>>>>>> 1. Is there any API documentation?
>>>>>>>
>>>>>>> The API is the Process Manager Interface (PMI), which is the interface
>>>>>>> MPICH2 uses for interacting with process managers. There is some
>>>>>>> documentation for it, which I could send you if you like (it may
>>>>>>> not be 100% up to date).
>>>>>>>
>>>>>>>> 2. Is there a C version of the client library?
>>>>>>>
>>>>>>> The PMI library is in C. It is implemented in
>>>>>>> src/pmi/simple/simple_pmi.c.
>>>>>>>
>>>>>>>> 3. Is the mpd protocol documented anywhere?
>>>>>>>
>>>>>>> Not currently, but the plan is to :-).
>>>>>>>
>>>>>>>> 4. Is the protocol used by mpd the same as that used by smpd?
>>>>>>>
>>>>>>> No, they are different.
>>>>>>>
>>>>>>> Rajeev
>>>>>
>>>>> William Gropp
>>>>> http://www.mcs.anl.gov/~gropp
>>>>
>>>> <paper.pdf>
>>



