[Swift-devel] slow swift startup time
Michael Wilde
wilde at mcs.anl.gov
Mon Nov 5 22:01:08 CST 2007
Like I said when I sent it - I wasnt complaining about it as an error; I
was wondering if the *fact* that its requesting a lock could be causing
a delay. (Eg, trying to lock, sleeping, failing, and then continuing).
I was just trying to look for an explanation of why startup is slow.
- Mike
On 11/5/07 8:45 PM, Mihael Hategan wrote:
> It's a warning. warning != error.
>
> On Mon, 2007-11-05 at 18:56 -0600, Michael Wilde wrote:
>> The lock error I was referring to is this, on stdout/err:
>>
>> Failed to acquire exclusive lock on log file.
>>
>> Below is the log file text that accompanied it.
>>
>> Note that Im not complaining about this error - it went away when I
>> started compiling on terminable again.
>>
>> Im just pointing it out as a suspect in the slow startup. It surprised
>> me that we bother to lock the logfile, unless Java is gratuituously doig
>> it for us.
>>
>> - Mike
>>
>> Logfile head showed:
>>
>> 2007-11-04 10:15:28,949-0600 INFO Loader awf3.swift: source file is
>> new. Recompiling.
>> 2007-11-04 10:15:31,315-0600 INFO Karajan Validation of XML
>> intermediate file was successful
>> 2007-11-04 10:15:34,597-0600 INFO unknown Using sites file: ./sites.xml
>> 2007-11-04 10:15:34,599-0600 INFO unknown Using tc.data: ./tc.data
>> 2007-11-04 10:15:36,606-0600 INFO unknown Swift v0.3-dev
>> libexec/svn-revision: line 1: svn: command not found
>> libexec/svn-revision: line 1: svn: command not found
>>
>>
>> 2007-11-04 10:15:36,607-0600 INFO unknown RunID: 20071104-1015-afgc18i3
>> 2007-11-04 10:15:36,825-0600 INFO AbstractDataNode Found data
>> pcapfiles.$[]/1.[0]
>> 2007-11-04 10:15:36,825-0600 INFO AbstractDataNode Found data
>> pcapfiles.$[]/2.[1]
>> 2007-11-04 10:15:36,825-0600 INFO AbstractDataNode Found data
>> pcapfiles.$[]/3.[2]
>> 2007-11-04 10:15:36,825-0600 INFO AbstractDataNode Found data
>> pcapfiles.$[]/4.[3]
>> 2007-11-04 10:15:36,825-0600 INFO AbstractDataNode Found data
>> pcapfiles.$[]/5.[4]
>> 2007-11-04 10:16:26,917-0600 INFO FlushableLockedFileWriter Could not
>> acquire lock on /home/wilde/angle/data/./awf3-20071104-1015
>> -afgc18i3.0.rlog
>> java.io.IOException: No locks available
>> at sun.nio.ch.FileChannelImpl.lock0(Native Method)
>> at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:804)
>> at java.nio.channels.FileChannel.tryLock(FileChannel.java:983)
>> at
>> org.globus.cog.karajan.workflow.nodes.restartLog.FlushableLockedFileWriter.<init>(FlushableLockedFileWriter.java:26)
>> at
>> org.globus.cog.karajan.workflow.nodes.restartLog.RestartLog.create(RestartLog.java:123)
>> at
>> org.globus.cog.karajan.workflow.nodes.restartLog.RestartLog.partialArgumentsEvaluated(RestartLog.java:55)
>> at
>> org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.executeChildren(PartialArgumentsContainer.java:51)
>> at
>> org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java(Compiled
>> Code))
>> at
>> org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java(Compiled
>> Code))
>> at
>> org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java(Inlined
>> Compiled Code))
>> at
>> org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java(Compiled
>> Code))
>> at
>> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java(Compiled
>> Code))
>> at
>> org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java(Compiled
>> Code))
>> at
>> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java(Compiled
>> Code))
>> at
>> org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java(Inlined
>> Compiled Code))
>> at
>> org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java(Compiled
>> Code))
>> 2007-11-04 10:16:26,921-0600 WARN RestartLog Failed to acquire
>> exclusive lock on log file.
>> 2007-11-04 10:16:26,999-0600 INFO vdl:execute START thread=0-4 tr=angle4
>> 2007-11-04 10:16:26,999-0600 INFO vdl:execute START thread=0-1 tr=angle4
>> 2007-11-04 10:16:26,999-0600 INFO vdl:execute START thread=0-5 tr=angle4
>> 2007-11-04 10:16:27,000-0600 INFO vdl:execute START thread=0-3 tr=angle4
>> 2007-11-04 10:16:27,001-0600 INFO vdl:execute START thread=0-2 tr=angle4
>>
>>
>>
>> On 11/5/07 6:46 PM, Michael Wilde wrote:
>>> Im running on tg-login.uc.teragrid.org (IA64) and seeing very long swift
>>> start times.
>>>
>>> My swift command wrapper prints the wf start and end times with the
>>> swift stdout sandwiched in between. Here's an example of those,
>>> followed by the swift log file. In this run, i start swift in the
>>> background, then tail the stdout file. it was about 70 seconds (on my
>>> watch) before swift responded with its initial messages on stdout. (I
>>> dont think its being buffered, but thats worth checking...)
>>>
>>> Note that swift was launched at 18:30:49 and its logfile entry with the
>>> runid came at 18:32:05. 32:05-30:49 = 76 seconds!
>>>
>>> This was swift 1456 compiled on terminable (or login, i forget).
>>>
>>> Suspicious: when I was running a version compiled in tg-login under Java
>>> 1.4 I would get an error message from a Java method trying to lock the
>>> log file. Not sure if this logging action (which now does not give a
>>> message) is related to this slow start time.
>>>
>>> - Mike
>>>
>>> UC64$ cat swift.out
>>> Swift Script local-noks.swift starting at Mon Nov 5 18:30:49 CST 2007
>>>
>>> Swift v0.3-dev r1456
>>>
>>> RunID: 20071105-1831-d7t5l2n3
>>> angle4 started
>>> angle4 started
>>> angle4 started
>>> angle4 started
>>> angle4 started
>>> angle4 completed
>>> angle4 completed
>>> angle4 completed
>>> angle4 completed
>>> angle4 completed
>>>
>>> Swift Script local-noks.swift ended at Mon Nov 5 18:32:34 CST 2007 with
>>> exit code 0
>>>
>>>
>>> UC64$ head awf*.log
>>> 2007-11-05 18:31:09,566-0600 INFO Loader awf6.swift: source file is
>>> new. Recompiling.
>>> 2007-11-05 18:31:30,454-0600 INFO Karajan Validation of XML
>>> intermediate file was successful
>>> 2007-11-05 18:31:55,465-0600 INFO unknown Using sites file: ./sites.xml
>>> 2007-11-05 18:31:55,466-0600 INFO unknown Using tc.data: ./tc.data
>>> 2007-11-05 18:32:05,518-0600 INFO unknown Swift v0.3-dev r1456
>>>
>>> 2007-11-05 18:32:05,520-0600 INFO unknown RunID: 20071105-1831-d7t5l2n3
>>> 2007-11-05 18:32:06,701-0600 INFO AbstractDataNode Found data
>>> pcapfiles.$[]/1.[0]
>>> 2007-11-05 18:32:06,702-0600 INFO AbstractDataNode Found data
>>> pcapfiles.$[]/2.[1]
>>> 2007-11-05 18:32:06,702-0600 INFO AbstractDataNode Found data
>>> pcapfiles.$[]/3.[2]
>>> UC64$
>>> _______________________________________________
>>> Swift-devel mailing list
>>> Swift-devel at ci.uchicago.edu
>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>
>>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>
>
>
More information about the Swift-devel
mailing list