[Swift-devel] Null pointer exception in scheduling?
Michael Wilde
wilde at mcs.anl.gov
Sun Mar 22 23:21:29 CDT 2009
I just saw this one for what I think is the first time:
Progress: uninitialized:1 Finished successfully:2
Progress: Initializing:999 Selecting site:1 Finished successfully:2
Progress: Selecting site:999 Finished successfully:2 Initializing site shared directory:1
Progress: Stage in:999 Submitting:1 Finished successfully:2
Exception caught while processing event
java.lang.NullPointerException
at org.globus.cog.karajan.scheduler.LateBindingScheduler.statusChanged(LateBindingScheduler.java:609)
at org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler.statusChanged(WeightedHostScoreScheduler.java:421)
at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.statusChanged(VDSAdaptiveScheduler.java:410)
at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:236)
at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:224)
at org.globus.cog.abstraction.impl.file.CachingDelegatedFileOperationHandler.setTaskStatus(CachingDelegatedFileOperationHandler.java:68)
at org.globus.cog.abstraction.impl.file.CachingDelegatedFileOperationHandler.submit(CachingDelegatedFileOperationHandler.java:42)
at org.globus.cog.abstraction.impl.common.task.CachingFileOperationTaskHandler.submit(CachingFileOperationTaskHandler.java:28)
at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:86)
at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431)
at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166)
at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643)
at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668)
at java.lang.Thread.run(Thread.java:801)
Progress: Stage in:867 Submitting:133 Finished successfully:2
Progress: Stage in:256 Submitting:742 Submitted:2 Finished successfully:2
The script ran almost to the end; it looks like 1 job is hung in "stage in",
and thus the final analysis job didn't run.
This was my first run with the new wait logic at a scale of 1000 jobs.
It ran OK (once) at 500 jobs, as I've been scaling up the testing.
I'll see if this is reproducible. I'm not sure whether it's related to the
wait logic or not.
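A quick way to check which state the remaining jobs are sitting in is to tally the last "Progress:" line of the run. The following is only a rough sketch: it assumes the console format shown above (state name, colon, count, separated by spaces, one record per line), and the names parse_progress and last_progress are made up for illustration; they are not part of Swift.

```python
import re

# Assumed format, taken from the console output above:
#   Progress: <state name>:<count> <state name>:<count> ...
# State names may contain spaces (e.g. "Stage in", "Finished successfully").
PAIR = re.compile(r"\s*([A-Za-z ]+?):(\d+)")


def parse_progress(line):
    """Return {state: count} for one 'Progress:' line, or None for any other line."""
    prefix = "Progress:"
    if not line.startswith(prefix):
        return None
    return {state: int(count) for state, count in PAIR.findall(line[len(prefix):])}


def last_progress(lines):
    """Return the state counts from the last 'Progress:' line seen, or None."""
    last = None
    for line in lines:
        counts = parse_progress(line.strip())
        if counts is not None:
            last = counts
    return last


if __name__ == "__main__":
    # Sample lines copied from the run above; a real check would read the log or console capture.
    sample = [
        "Progress: Stage in:867 Submitting:133 Finished successfully:2",
        "Exception caught while processing event",
        "Progress: Stage in:256 Submitting:742 Submitted:2 Finished successfully:2",
    ]
    print(last_progress(sample))
```

Any state other than "Finished successfully" that still has a nonzero count in the final record points at where a job is stuck.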
The log is at:
www.ci.uchicago.edu/~wilde/oops-20090322-2312-ubvg3su6.log