[Swift-devel] Re: Probing running jobs

Michael Wilde wilde at mcs.anl.gov
Wed Dec 15 12:22:17 CST 2010


Mihael, I never tried this feature but have a good use for it now for Swift-R debugging.

Is the console monitor you refer to below the -tui or -monitor?

Do you know if this feature is currently working in trunk?

In the same vein, I think the -tui option was broken the last time I tried it in trunk (it was just silent: nothing showed up on the screen).  Whoever next has a chance to try both of these interfaces, could you report back to the list on whether they worked or failed for you?

Thanks,

- Mike


----- Original Message -----
> On Fri, 2009-04-03 at 08:38 -0500, Michael Wilde wrote:
> > Following up on Mihael's question about a feature I listed in the
> > to-do list I proposed for coasters:
> >
> > On 4/2/09 11:17 PM, Mihael Hategan wrote:
> > > On Thu, 2009-04-02 at 21:01 -0500, Michael Wilde wrote:
> > >>>> - some way to probe a job that's running on a coaster?
> > >>> Define "probe".
> > >> - ps -f on the running process.
> > >> - probe its resource usage (/proc, also ps, etc)
> > >> - ls -lR of its jobdir (as these will more often be on /tmp)
> > >>
> > >> We have these needs today; on the BGP under falkon we manually
> > >> log in to the node, but that's cumbersome: hard to find the node;
> > >> 2-stage login process.
> > >>
> > >> Low prio, a pipe dream. But theoretically do-able.
> > >
> > > It should be possible (and somewhat interesting) to have a simple
> > > shell that can execute stuff on the workers while the job is running,
> > > so that you can issue your own commands.
> > >
> > > The question of how to find the right worker remains. Can you go a
> > > bit deeper into the details? How do you find the node currently (be
> > > as specific as you can be)?
> >
> > In the oops workflow, I recall these cases at the moment:
> >
> > 1) Have my (large set of similar) jobs started?
> >
> > 2) Most jobs have finished. Are the remaining ones hung, or proceeding
> > normally but slower for some application- or data-specific reason?
> [...]
> 
> In swift r2821 cog r2365 (I think), there is such a feature.
> 
> If you start with the console monitor, you can go to the list of jobs.
> Then select the desired job and press Enter to display a detail pane.
> If the job is in the active state and is running on a coaster worker,
> that detail pane will have an extra button named "Worker Terminal".
> Pressing that will pop up a simple terminal that can be used to run
> relatively arbitrary commands on the worker that the job is running on.
> 
> It won't run commands that require console input (e.g., vi), so don't try.
> 
> It won't start you in the job directory, but in the swift workflow
> directory. That's because at some point we stopped using the GRAM
> directory attribute for setting the initial job dir, because some silly
> site on OSG doesn't honor it. I think we should revisit the issue (I
> suspect there is a solution that works in both cases).

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



