[Swift-devel] Re: Probing running jobs

Michael Wilde wilde at mcs.anl.gov
Sat Apr 4 17:01:53 CDT 2009


Wow! Way cool - I can't wait to try this and the monitor.
But I need to clone myself first.

Glen, maybe you can try this on the oops tests...

- Mike


On 4/4/09 4:34 PM, Mihael Hategan wrote:
> On Fri, 2009-04-03 at 08:38 -0500, Michael Wilde wrote:
>> Following up on Mihael's question about a feature I listed in the to-do 
>> list I proposed for coasters:
>>
>> On 4/2/09 11:17 PM, Mihael Hategan wrote:
>>> On Thu, 2009-04-02 at 21:01 -0500, Michael Wilde wrote:
>>>>>> - some way to probe a job that's running on a coaster?
>>>>> Define "probe".
>>>> - ps -f on the running process.
>>>> - probe its resource usage (/proc, also ps, etc)
>>>> - ls -lR of its jobdir (as these will more often be on /tmp)
>>>>
>>>> We have these needs today; on the BGP under falkon we manually log in to 
>>>> the node, but that's cumbersome: the node is hard to find, and login is a 
>>>> two-stage process.
>>>>
>>>> Low prio, a pipe dream. But theoretically do-able.
>>> It should be possible (and somewhat interesting) to have a simple shell
>>> that can execute stuff on the workers while the job is running, so that
>>> you can issue your own commands.
>>>
>>> The question of how to find the right worker remains. Can you go a bit
>>> deeper into the details? How do you find the node currently (be as
>>> specific as you can be)?
>> In the oops workflow, I recall these cases at the moment:
>>
>> 1) Have my (large set of similar) jobs started?
>>
>> 2) Most jobs have finished. Are the remaining ones hung, or proceeding 
>> normally but slower for some application- or data-specific reason?
> [...]
> 
> In swift r2821 cog r2365 (I think), there is such a feature.
> 
> If you start with the console monitor, you can go to the list of jobs.
> Then select the desired job and press Enter to display a detail pane. If the
> job is in the active state and if it's running on a coaster worker, that
> detail pane will have an extra button named "Worker Terminal". Pressing
> that will pop up a simple terminal that can be used to run relatively
> arbitrary commands on the worker that the job is running on.
> 
> It won't run commands that require console input (e.g., vi), so don't
> try.
> 
> It won't start you in the job directory, but in the swift workflow
> directory. That's because at some point we stopped using the GRAM
> directory attribute for setting the initial job dir because some silly
> site on OSG doesn't honor it. I think we should revisit the issue (I
> suspect there is a solution that works in both cases).
> 
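
For reference, the three probes listed earlier in the thread (ps -f, /proc,
ls -lR of the jobdir) could be bundled into one non-interactive helper to
paste into a terminal like the one described above. This is only a sketch
under stated assumptions: the function name probe_job and its two arguments
(PID and job directory) are hypothetical and not part of swift or cog, and
the /proc step assumes a Linux worker. Since the terminal starts in the
workflow directory rather than the job directory, the job directory is
passed explicitly.

```shell
#!/bin/sh
# Hypothetical helper (not part of swift/cog): probe_job PID JOBDIR
# Runs the three probes from the thread without needing console input.
probe_job() {
    pid=$1
    jobdir=$2

    # 1) Full-format listing of the job's process.
    ps -f -p "$pid" || echo "ps could not list process $pid" >&2

    # 2) Resource usage from /proc (Linux only); skipped if not readable.
    if [ -r "/proc/$pid/status" ]; then
        grep -E '^(State|VmSize|VmRSS|Threads):' "/proc/$pid/status" || true
    fi

    # 3) Recursive listing of the job directory.
    ls -lR "$jobdir"
}
```

Example use, with a made-up PID and path: probe_job 12345 /tmp/myjob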


