[Swift-devel] Re: Probing running jobs

Mihael Hategan hategan at mcs.anl.gov
Sat Apr 4 16:34:32 CDT 2009


On Fri, 2009-04-03 at 08:38 -0500, Michael Wilde wrote:
> Following up on Mihael's question about a feature I listed in the to-do 
> list I proposed for coasters:
> 
> On 4/2/09 11:17 PM, Mihael Hategan wrote:
> > On Thu, 2009-04-02 at 21:01 -0500, Michael Wilde wrote:
> >>>> - some way to probe a job thats running on a coaster?
> >>> Define "probe".
> >> - ps -f on the running process.
> >> - probe its resource usage (/proc, also ps, etc)
> >> - ls -lR of its jobdir (as these will more often be on /tmp)
> >>
> >> We have these needs today; on the BGP under falkon we manually login to 
> >> the node, but thats cumbersome: hard to find the node; 2-stage login 
> >> process.
> >>
> >> Low prio, a pipe dream. But theoretically do-able.
> > 
> > It should be possible (and somewhat interesting) to have a simple shell
> > that can execute stuff on the workers while the job is running, so that
> > you can issue your own commands.
> > 
> > The question of how to find the right worker remains. Can you go a bit
> > deeper into the details? How do you find the node currently (be as
> > specific as you can be)?
> 
> In the oops workflow, I recall these cases at the moment:
> 
> 1) Have my (large set of similar) jobs started?
> 
> 2) Most jobs have finished. Are the remaining ones hung, or proceeding 
> normally but slower for some application- or data-specific reason?
[...]

As of Swift r2821 and cog r2365 (I think), there is such a feature.

If you start Swift with the console monitor, you can go to the list of
jobs. Select the desired job and press Enter to display a detail pane. If
the job is in the active state and is running on a coaster worker, that
detail pane will have an extra button named "Worker Terminal". Pressing
it pops up a simple terminal that can be used to run relatively
arbitrary commands on the worker that the job is running on.

It won't run commands that require console input (e.g., vi), so don't
try.
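
For the probes Mike listed (ps -f, /proc, ls -lR), something along these
lines might do. This is only a sketch under assumptions: probe_job is a
made-up helper name, the PID and job directory are placeholders you have
to find yourself, and it assumes the worker terminal hands what you type
to a POSIX shell, which is not stated above. Every command is
non-interactive.

```shell
#!/bin/sh
# Hypothetical helper (not part of Swift or coasters).
# Usage: probe_job PID JOBDIR
probe_job() {
    pid=$1
    jobdir=$2

    echo "== process =="
    # Full-format listing for the one process we care about.
    ps -f -p "$pid"

    echo "== resource usage =="
    # /proc is Linux-specific, so skip this part where it is missing.
    if [ -r "/proc/$pid/status" ]; then
        grep -E '^(State|VmRSS|Threads):' "/proc/$pid/status"
    fi

    echo "== job directory =="
    # Recursive listing of whatever the job has written so far.
    ls -lR "$jobdir"
}
```

If multi-line input turns out not to work in the terminal, the three
commands inside the function can be typed one at a time with the PID and
directory filled in by hand.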

It won't start you in the job directory, but in the Swift workflow
directory. That's because at some point we stopped using the GRAM
directory attribute to set the initial job dir, since some silly
site on OSG doesn't honor it. I think we should revisit the issue (I
suspect there is a solution that works in both cases).



