[Swift-devel] Re: osg question: how to find sites' health

Michael Wilde wilde at mcs.anl.gov
Mon May 2 18:29:46 CDT 2011


Ketan, let's discuss this more tomorrow and report our progress back to the list.

I think "health" is a hard term to define for grid sites. At any given time, each service on each site is either working or not.

- the sites file builder can do various checks
- the checks need to be done under the user's cert to be meaningful
- Swift needs to recover from what doesn't get caught by the sites file builder
- clean reporting of errors helps OSG site admins catch and fix problems
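
As a rough illustration of the first kind of check, here is a minimal Python sketch built on the ReSS query Allan describes in the quoted message below. The pool host and the GlueCEInfoTotalCPUs attribute come from his message; the function names, the use of "condor_status -long" output, and the "Name" attribute as the site identifier are assumptions made for illustration, not part of any existing tool. It only reflects what a site advertises, so it does not replace running real checks under the user's cert.

```python
import subprocess


def parse_classads(text):
    """Split `condor_status -long` style output into one dict per ClassAd.

    Assumes ads are separated by blank lines and each line is `Key = Value`.
    """
    ads, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current ad.
            if current:
                ads.append(current)
                current = {}
            continue
        key, sep, value = line.partition("=")
        if sep:
            current[key.strip()] = value.strip().strip('"')
    if current:
        ads.append(current)
    return ads


def sites_with_cpus(ads, min_cpus=1):
    """Return names of ads advertising at least `min_cpus` total CPUs.

    "Name" as the identifying attribute is an assumption.
    """
    names = []
    for ad in ads:
        try:
            cpus = int(ad.get("GlueCEInfoTotalCPUs", "0"))
        except ValueError:
            continue  # skip ads whose value is not a plain integer
        if cpus >= min_cpus:
            names.append(ad.get("Name", "unknown"))
    return names


def query_ress(pool="engage-central.renci.org"):
    """Hypothetical helper: run condor_status against the ReSS pool."""
    result = subprocess.run(
        ["condor_status", "-pool", pool, "-long"],
        capture_output=True, text=True, check=True,
    )
    return parse_classads(result.stdout)
```

With condor_status on the PATH, sites_with_cpus(query_ress()) would give a first-pass candidate list that the sites file builder could then probe further.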

I'd like to see Allan's tools merged/extended with a few others, packaged with Swift and documented and tested for Swift users.

Mike


----- Original Message -----
> Hi Ketan,
> 
> Most of the time I just query the ReSS condor pool (condor_status
> -pool engage-central.renci.org), then look for the following classads:
> 
> GlueCEInfoTotalCPUs
> GlueCEInfo*Jobs* <= jobs running, total acceptable jobs, free cores,
> etc.
> 
> The OSG monitoring webpages (gratia, rsv) also have related
> information.
> 
> 2011/5/2 Ketan Maheshwari <ketan at mcs.anl.gov>:
> > Hi Allan,
> >
> > I am trying to reuse the work on OSG that you did for extenci. So
> > I am using your scripts from allantools/..
> >
> > A quick question about OSG: How do you find the health of
> > participating sites?
> >
> > On EGI we have something called "lcg-infosites" series of commands
> > that do this.
> >
> > Thanks,
> > Ketan
> >
> >
> >
> 
> 
> 
> --
> Allan M. Espinosa <http://amespinosa.wordpress.com>
> PhD student, Computer Science
> University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



