[Swift-devel] localscheduler (condor/ condorg) breaking on lots of condor jobs

Mihael Hategan hategan at mcs.anl.gov
Mon Aug 23 18:31:04 CDT 2010


On Mon, 2010-08-23 at 17:25 -0600, Michael Wilde wrote:
> ----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:
> 
> > Yeah. That's why the provider should be updated to use job logs instead
> > of condor_qstat/condor_qedit for figuring out status.
> 
> Is that easy or hard?

Should be doable in a week or two by somebody who has some experience
with providers and some with condor. That includes testing. And then a
few more scattered hours due to subtleties that weren't obvious from the
start.

I might already have some code that I never committed. If somebody wants
to clean it/test it, I'd be happy to send it.

> 
> For such an approach, should we make all the submit files specify a single per-user condorg user log file?

Yes. You would want that for scalability reasons. From my limited
testing, condor seems to properly handle that situation.
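
To make the job-log idea concrete, here is a rough sketch of deriving job
status from one shared user log instead of polling the queue. It is not the
uncommitted provider code mentioned above. The function name, the status
names, and the sample log contents are made up for illustration. The
three-digit event codes and the "..." event terminator follow the Condor
user log format as I understand it.

```python
import re

# Condor user log event codes that change the coarse job state.
# The status names on the right are illustrative, not provider constants.
EVENT_STATUS = {
    "000": "submitted",  # job submitted
    "001": "running",    # job executing
    "004": "idle",       # job evicted, back in the queue
    "005": "done",       # job terminated
    "009": "removed",    # job aborted
    "012": "held",       # job was held
    "013": "idle",       # job released from hold
}

# An event header looks like: "005 (101.000.000) 08/23 18:05:42 Job terminated."
HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\) ")


def job_statuses(log_text):
    """Return {(cluster, proc): status}, using the latest event per job."""
    statuses = {}
    for line in log_text.splitlines():
        m = HEADER.match(line)
        if m is None:
            continue  # detail line or "..." terminator
        code, cluster, proc = m.group(1), int(m.group(2)), int(m.group(3))
        status = EVENT_STATUS.get(code)
        if status is not None:
            statuses[(cluster, proc)] = status
    return statuses


# Hypothetical contents of a single per-user log shared by two jobs.
SAMPLE_LOG = """\
000 (101.000.000) 08/23 18:00:01 Job submitted from host: <10.0.0.1:9618>
...
000 (102.000.000) 08/23 18:00:02 Job submitted from host: <10.0.0.1:9618>
...
001 (101.000.000) 08/23 18:01:10 Job executing on host: <10.0.0.2:9618>
...
005 (101.000.000) 08/23 18:05:42 Job terminated.
\t(1) Normal termination (return value 0)
...
012 (102.000.000) 08/23 18:06:00 Job was held.
...
"""

if __name__ == "__main__":
    for job, status in sorted(job_statuses(SAMPLE_LOG).items()):
        print(job, status)
```

A real provider would presumably keep the file offset between polls and
read only newly appended events, so the cost per poll does not grow with
the number of jobs in the queue.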

> 
> > That or update limits (and, btw, what does ulimit -a say on that
> > machine)?
> 
> I've asked for the limit to be changed from 1024 to 20,000 - that's what engage-submit on OSG is using.

Mmm, decimal...




