[Swift-user] Large Job not starting
Lorenzo Pesce
lpesce at uchicago.edu
Fri Jan 11 08:59:53 CST 2013
Hi,
We are working on a project that involves about 3 million tasks. We have run through 1.5 million tasks and have now resumed the job.
I have been seeing this for a while:
Progress: time: Thu, 10 Jan 2013 20:39:20 +0000 Selecting site:63831 Submitted:7171 Finished in previous run:1486037
...
Progress: time: Fri, 11 Jan 2013 14:50:21 +0000 Selecting site:63831 Submitted:7171 Finished in previous run:1486037
From the ps command:
lpesce 28172 28102 19 Jan10 pts/4 04:20:32 java -Xmx12072M -XX:+HeapDumpOnOutOfMemoryError -Djava.endorsed.dirs=/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/endorsed -DUID=1978 -DGLOBUS_HOSTNAME=login5.beagle.ci.uchicago.edu -DCOG_INSTALL_PATH=/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/.. -Dswift.home=/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/.. -Duser.home=/lustre/beagle/lpesce -Djava.security.egd=file:///dev/urandom -XX:+UseParallelGC -XX:ParallelGCThreads=2 -classpath /home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../etc:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../libexec:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/ant.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/antlr-2.7.5.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/castor-0.9.6.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/coaster-bootstrap.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cog-abstraction-common-2.4.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cog-grapheditor-0.47.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cog-jglobus-1.7.0.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cog-karajan-0.36-dev.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cog-provider-coaster-0.3.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cog-provider-dcache-0.1.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cog-provider-gt2-2.4.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cog-provider-local-2.2.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cog-provider-localscheduler-0.4.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cog-provider-ssh-2.4.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cog-provider-webdav-2.1.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cog-resources-1.0.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cog-swift-svn.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cog-util-0.92.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/commons-httpclient.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/commons-logging-1.1.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cryptix32.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cryptix-asn1.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/cryptix.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/j2ssh-common-0.2.2.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/j2ssh-core-0.2.2-patch-b.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/jakarta-regexp-1.2.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/jakarta-slide-webdavlib-2.0.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/jaxrpc.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/jce-jdk13-131.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/jgss.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/jline-0.9.94.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/jsr173_1.0_api.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/jug-lgpl-2.0.0.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/junit.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/log4j-1.2.16.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/puretls.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/resolver.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn/bin/../lib/stringtemplate.jar:/home/davidk/swift-trunk/cog/modules/swift/dist/swift
lpesce at login5:/lustre/beagle/GCNet/RG/Oreo/o080522_BS1> ps v 28172
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
28172 pts/4 Sl+ 260:32 84 2 12868101 11612816 70.2 java -Xmx12072M -XX:+HeapDumpOnOutOfMemoryError
The job seems to be using zero CPU at this time.
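The "zero CPU" observation can be checked more precisely than with the ps C or %CPU columns, which on Linux report an average over the whole lifetime of the process rather than current activity. A minimal sketch, assuming a Linux login node that exposes /proc; the function names cpu_ticks and cpu_ticks_over are hypothetical helpers, not part of Swift. It samples the cumulative user and system CPU time of a PID twice and prints the difference in clock ticks.

```shell
#!/bin/sh
# Sketch (assumption: Linux /proc layout). Fields 14 (utime) and 15 (stime)
# of /proc/<pid>/stat hold cumulative user and system CPU time in clock ticks.

# Print cumulative CPU ticks (user + system) for a PID.
# Assumes the command name in field 2 contains no spaces (true for "java").
cpu_ticks() {
    awk '{ print $14 + $15 }' "/proc/$1/stat"
}

# Print how many CPU ticks a PID consumed during the given number of seconds.
# Hypothetical helper; uses plain POSIX sh variables (no "local").
cpu_ticks_over() {
    pid=$1
    seconds=$2
    before=$(cpu_ticks "$pid")
    sleep "$seconds"
    after=$(cpu_ticks "$pid")
    echo $((after - before))
}
```

For example, cpu_ticks_over 28172 60 would print the ticks the Swift JVM used over one minute; dividing by the output of getconf CLK_TCK (commonly 100) gives seconds. A result of zero would support the idle reading. A large value while the progress counters stay flat might instead point at the JVM working close to its -Xmx12072M heap limit (the RSS in the ps v output is about 11 GiB). In either case, a thread dump from the JDK's jstack tool (jstack 28172) could show what the JVM threads are waiting on.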
It has no jobs in the queue:
lpesce at login5:/lustre/beagle/GCNet/RG/Oreo/o080522_BS1> qstat -u lpesce
sdb:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
1919801.sdb          lpesce   advanced B0109-030545-00    6144    --  --     -- 117:4 R 45:54
1919802.sdb          lpesce   advanced B0109-030545-00    2868    --  --     -- 117:4 R 45:53
1919806.sdb          lpesce   advanced B0109-080540-00   27222    --  --     -- 117:3 R 45:49
1919807.sdb          lpesce   advanced B0109-080540-00    6609    --  --     -- 117:3 R 45:49
1919808.sdb          lpesce   advanced B0109-080540-00    3328    --  --     -- 117:3 R 45:48
(These are unrelated jobs, which have been running for more than a day.)
Suggestions?
Thanks,
Lorenzo