[Swift-devel] localscheduler (condor/ condorg) breaking on lots of condor jobs
Mihael Hategan
hategan at mcs.anl.gov
Mon Aug 23 18:31:04 CDT 2010
On Mon, 2010-08-23 at 17:25 -0600, Michael Wilde wrote:
> ----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:
>
> > Yeah. That's why the provider should be updated to use job logs instead
> > of condor_qstat/condor_qedit for figuring out status.
>
> Is that easy or hard?
Should be doable in a week or two by somebody who has some experience
with providers and some with condor. That includes testing. And then a
few more scattered hours due to subtleties that weren't obvious from the
start.
I might already have some code that I never committed. If somebody wants
to clean it/test it, I'd be happy to send it.
>
> For such an approach should we make all the submit files specify a single per-user condorg user log file?
Yes. You would want that for scalability reasons. From my limited
testing, condor seems to properly handle that situation.
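
For illustration only, here is a minimal sketch of what log-based status tracking could look like. It is not the uncommitted code mentioned above and not the actual Swift provider. It assumes every submit file points at the same user log, and that the log uses the usual Condor text layout: a header line with a three-digit event code and a (cluster.proc.subproc) job id, optional detail lines, then a "..." separator. The function name job_statuses and the EVENT_STATUS table are hypothetical, and the table covers only a handful of event codes.

```python
import re
from typing import Dict, Iterable

# Assumed mapping from Condor user-log event codes to a coarse job status.
# Only a few common codes are listed; anything else is ignored.
EVENT_STATUS = {
    "000": "submitted",
    "001": "running",
    "005": "completed",
    "009": "aborted",
    "012": "held",
    "013": "released",
}

# Event header line, e.g.
# "001 (123.000.000) 08/23 18:32:00 Job executing on host: <...>"
EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\) ")


def job_statuses(lines: Iterable[str]) -> Dict[str, str]:
    """Return the latest known status for each job, keyed by "cluster.proc"."""
    statuses: Dict[str, str] = {}
    for line in lines:
        match = EVENT_HEADER.match(line)
        if match is None:
            continue  # detail line or "..." separator between events
        code, cluster, proc = match.groups()
        status = EVENT_STATUS.get(code)
        if status is not None:
            # Later events overwrite earlier ones, so the last one wins.
            statuses[f"{int(cluster)}.{int(proc)}"] = status
    return statuses
```

Because all jobs share one file, a provider built this way would only need to read newly appended lines on each pass, rather than running one status query per job.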
>
> > That or update limits (and, btw, what does ulimit -a say on that
> > machine)?
>
> I've asked for the limit to be changed from 1024 to 20,000 - that's what engage-submit on OSG is using.
Mmm, decimal...