[Swift-devel] localscheduler (condor/condorg) breaking on lots of condor jobs
Allan Espinosa
aespinosa at cs.uchicago.edu
Wed Jul 28 15:00:44 CDT 2010
Ah, only 1024 files. That's why.
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 122880
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 122880
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
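A minimal Java sketch, assuming a HotSpot/OpenJDK JVM on Unix where `com.sun.management.UnixOperatingSystemMXBean` is available. It reports how many descriptors the JVM has open against the limit the JVM sees. A provider could use this to back off before spawning another `condor_qedit` instead of failing with error=24, since each `Runtime.exec` opens pipes for the child's stdin, stdout and stderr. The class name `FdHeadroom` and the reserve of 64 descriptors are hypothetical choices, not part of Swift. The limit the JVM reports may differ from the shell's `ulimit -n` value, because some JVMs raise their own soft limit at startup.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdHeadroom {
    /** Returns {open, max} file descriptor counts for this JVM, or null if the JVM does not expose them. */
    public static long[] fdUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] { unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount() };
        }
        return null;
    }

    /** True if at least `reserve` descriptors remain free below the limit (hypothetical policy). */
    public static boolean hasHeadroom(long open, long max, long reserve) {
        return max - open >= reserve;
    }

    public static void main(String[] args) {
        long[] u = fdUsage();
        if (u == null) {
            System.out.println("file descriptor counts not available on this JVM");
            return;
        }
        System.out.printf("open=%d max=%d headroom(64)=%b%n", u[0], u[1], hasHeadroom(u[0], u[1], 64));
    }
}
```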
On Wed, Jul 28, 2010 at 02:48:36PM -0500, Mihael Hategan wrote:
> Yeah. That's why the provider should be updated to use job logs instead
> of condor_qstat/condor_qedit for figuring out status.
>
> That, or raise the limits. (And, btw, what does ulimit -a say on that
> machine?)
>
> On Wed, 2010-07-28 at 14:34 -0500, Allan Espinosa wrote:
> > Hi,
> >
> > it seems that when there are too many submitted condor jobs, the submit host
> > starts to complain because it opens too many log, stderr, and stdout files:
> >
> > 330 Finished successfully:162 Failed but can retry:927
> > Failed to transfer wrapper log from sleep-LGU-estimate/info/x on USCMS-FNAL-WC1
> > Progress: Initializing site shared directory:1 Stage in:2 Submitted:1332
> > Active:245 Failed:331 Finished successfully:162 Failed but can retry:928
> > Progress:Failed to cancel job 57445
> > java.io.IOException: Cannot run program "condor_qedit": java.io.IOException:
> > error=24, Too many open files
> > at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
> > at java.lang.Runtime.exec(Runtime.java:593)
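A minimal Java sketch of Mihael's job-log suggestion, assuming the standard Condor user log format. In that format each event begins with a three-digit event code and a `(cluster.proc.subproc)` job id, and ends with a `...` line. Reading one log file (the file named by the submit file's `log` setting) would replace spawning a status-query process per poll. The class `CondorLogStatus` and its status names are hypothetical, not the provider's actual API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CondorLogStatus {
    // Event header: "<3-digit event code> (<cluster>.<proc>.<subproc>) <date> <time> <text>"
    private static final Pattern HEADER =
        Pattern.compile("^(\\d{3}) \\((\\d+)\\.(\\d+)\\.(\\d+)\\)");

    /** Maps a few standard user-log event codes to a coarse status; returns null for others. */
    static String statusFor(int code) {
        switch (code) {
            case 0:  return "SUBMITTED";  // job submitted
            case 1:  return "ACTIVE";     // job executing
            case 5:  return "COMPLETED";  // job terminated (exit code is in the body; not checked here)
            case 9:  return "FAILED";     // job aborted, e.g. by condor_rm
            case 12: return "HELD";       // job held
            case 13: return "SUBMITTED";  // job released back to the queue
            default: return null;         // other events leave the status unchanged in this sketch
        }
    }

    /** Replays log lines in order; the last recognized event per job wins. Keys are "cluster.proc". */
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> status = new HashMap<>();
        for (String line : lines) {
            Matcher m = HEADER.matcher(line);
            if (!m.find()) continue;  // skip event body lines and "..." separators
            String s = statusFor(Integer.parseInt(m.group(1)));
            if (s == null) continue;
            String key = Long.parseLong(m.group(2)) + "." + Long.parseLong(m.group(3));
            status.put(key, s);
        }
        return status;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CondorLogStatus <path to the job log>
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (Map.Entry<String, String> e : parse(lines).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Only a few event codes are mapped. Evictions (004) and Condor-G grid events are ignored, and event 005 covers both successful and failed exits, so a fuller version would also read the return value from the event body. A long-running provider would tail the file incrementally rather than re-reading it on every poll.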