[Swift-devel] Initial tests of new (faster) trunk
Michael Wilde
wilde at mcs.anl.gov
Wed Jul 10 12:53:44 CDT 2013
Initial tests of the new trunk are working for me, but I'm seeing three odd things:
1. [Error] sites.beagle.coasters.xml:1:9: cvc-elt.1: Cannot find the declaration of element 'config'.
(as reported in the email below)
2. {env.HOME} is no longer expanded in sites.xml as it was in prior versions.
Swift created a workdir literally named "$PWD/{env.HOME}/swiftwork"
3. Progress ticker lines seem to default to one per second, even when no status has changed.
But so far so good - a first test script using the new code is running nicely on SGE.
- Mike
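
As a stopgap for item 2, the placeholder could be expanded before the sites file reaches Swift. The following is a minimal Python sketch, not Swift's own mechanism. It assumes the placeholder syntax is exactly {env.NAME}, and expand_env_placeholders is a hypothetical helper name introduced here for illustration:

```python
import os
import re

# Matches placeholders of the form {env.NAME}. The exact grammar Swift
# accepted in prior versions is an assumption here.
_ENV_PLACEHOLDER = re.compile(r"\{env\.([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_placeholders(text, env=None):
    """Replace each {env.NAME} with env[NAME]; leave unknown names untouched."""
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        # Fall back to the original placeholder text if NAME is not set.
        return env.get(name, match.group(0))

    return _ENV_PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    # Expands against the real process environment.
    print(expand_env_placeholders("{env.HOME}/swiftwork"))
```

Running the sites file text through such a function and writing the result to a temporary file would avoid the literal "{env.HOME}" directory until the trunk behaviour is sorted out.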
----- Original Message -----
> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> To: "David Kelly" <davidk at ci.uchicago.edu>
> Cc: swift-support at ci.uchicago.edu, "Michael Wilde" <wilde at mcs.anl.gov>
> Sent: Tuesday, February 19, 2013 2:46:14 PM
> Subject: Re: [Swift Support #22699] Fwd: [Swift-devel] First tests with swift faster
>
> Yeah. The validation fails. You can ignore it for now. I'll fix it in
> the future. Basically there is code to validate the sites file against
> the XML schema, but it fails. It's not a fatal issue though, and
> parsing still happens.
>
> Mihael
>
> On Tue, 2013-02-19 at 14:22 -0600, David Kelly wrote:
> > I tried updating from svn and running with the added url tags:
> >
> >
> >
> > <config>
> >
> >
> > <pool handle="beagle">
> > <execution provider="coaster" jobmanager="local:pbs"
> > url="localhost"/>
> > <profile namespace="globus" key="jobsPerNode">1</profile>
> > <profile namespace="globus" key="lowOverAllocation">100</profile>
> > <profile namespace="globus" key="highOverAllocation">100</profile>
> > <profile namespace="globus"
> > key="providerAttributes">pbs.aprun;pbs.mpp;depth=24</profile>
> > <profile namespace="globus" key="maxTime">4000</profile>
> > <profile namespace="globus" key="maxWallTime">00:05:00</profile>
> > <profile namespace="globus"
> > key="disableIdleBlockCleanup">true</profile>
> > <profile namespace="globus" key="slots">1</profile>
> > <profile namespace="globus" key="nodeGranularity">1</profile>
> > <profile namespace="globus" key="maxNodes">1</profile>
> > <profile namespace="globus" key="queue">batch</profile>
> > <profile namespace="karajan" key="jobThrottle">8.00</profile>
> > <profile namespace="karajan" key="initialScore">10000</profile>
> > <filesystem provider="local" url="localhost" />
> > <workdirectory>/lustre/beagle/davidk</workdirectory>
> > </pool>
> >
> >
> > </config>
> >
> >
> > I am seeing this error:
> >
> >
> > [Error] sites.beagle.coasters.xml:1:9: cvc-elt.1: Cannot find the
> > declaration of element 'config'.
> >
> >
> > ----- Original Message -----
> >
> >
> > From: "Mike Wilde" <swift-support at ci.uchicago.edu>
> > Sent: Tuesday, February 19, 2013 1:50:15 PM
> > Subject: [Swift Support #22699] Fwd: [Swift-devel] First tests with
> > swift faster
> >
> >
> > Tue Feb 19 13:50:14 2013: Request 22699 was acted upon.
> > Transaction: Ticket created by wilde at mcs.anl.gov
> > Queue: swift-support
> > Subject: Fwd: [Swift-devel] First tests with swift faster
> > Owner: Nobody
> > Requestors: wilde at ci.uchicago.edu
> > Status: new
> > Ticket <URL:
> > https://rt.ci.uchicago.edu/Ticket/Display.html?id=22699 >
> >
> >
> >
> > David, Mihael, Yadu: could one of you try this on Beagle on the
> > faster branch?
> >
> > Does the faster branch include the PBS support for Beagle?
> >
> > It shouldn't be too hard to see what part of the PBS pool def it
> > doesn't like.
> >
> > - Mike
> >
> > ----- Forwarded Message -----
> > From: "Lorenzo Pesce" <lpesce at uchicago.edu>
> > To: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > Sent: Tuesday, February 19, 2013 1:26:20 PM
> > Subject: [Swift-devel] First tests with swift faster
> >
> >
> > This is the content of the file where we have the first complaint
> > from swift (see attached):
> >
> >
> > <config>
> > <pool handle="pbs">
> > <execution provider="coaster" jobmanager="local:pbs"/>
> > <!-- replace with your project -->
> > <profile namespace="globus" key="project">CI-DEB000002</profile>
> >
> >
> > <profile namespace="globus"
> > key="providerAttributes">pbs.aprun;pbs.mpp;depth=24</profile>
> >
> >
> >
> >
> > <profile namespace="globus" key="jobsPerNode">24</profile>
> > <profile namespace="globus" key="maxTime">172800</profile>
> > <profile namespace="globus" key="maxwalltime">0:10:00</profile>
> > <profile namespace="globus" key="lowOverallocation">100</profile>
> > <profile namespace="globus" key="highOverallocation">100</profile>
> >
> >
> > <profile namespace="globus" key="slots">200</profile>
> > <profile namespace="globus" key="nodeGranularity">1</profile>
> > <profile namespace="globus" key="maxNodes">1</profile>
> >
> >
> > <profile namespace="karajan" key="jobThrottle">47.99</profile>
> > <profile namespace="karajan" key="initialScore">10000</profile>
> >
> >
> > <filesystem provider="local"/>
> > <!-- replace this with your home on lustre -->
> > <workdirectory>/lustre/beagle/samseaver/GS/swift.workdir</workdirectory>
> > </pool>
> > </config>
> >
> >
> > Any ideas?
> >
> >
> > Begin forwarded message:
> >
> >
> >
> > From: Sam Seaver < samseaver at gmail.com >
> >
> > Date: February 19, 2013 1:16:28 PM CST
> >
> > To: Lorenzo Pesce < lpesce at uchicago.edu >
> >
> > Subject: Re: How are things going?
> >
> >
> > I got this error. I suspect using the new SWIFT_HOME directory
> > means that there's possibly a missing parameter someplace:
> >
> >
> >
> > should we resume a previous calculation? [y/N] y
> > rlog files displayed in reverse time order
> > should I use GS-20130203-0717-jgeppt98.0.rlog ?[y/n]
> > y
> > Using GS-20130203-0717-jgeppt98.0.rlog
> > [Error] GS_sites.xml:1:9: cvc-elt.1: Cannot find the declaration of
> > element 'config'.
> >
> >
> > Execution failed:
> > Failed to parse site catalog
> > swift:siteCatalog @ scheduler.k, line: 31
> > Caused by: Invalid pool entry 'pbs':
> > swift:siteCatalog @ scheduler.k, line: 31
> > Caused by: java.lang.IllegalArgumentException: Missing URL
> > at org.griphyn.vdl.karajan.lib.SiteCatalog.execution(SiteCatalog.java:173)
> > at org.griphyn.vdl.karajan.lib.SiteCatalog.pool(SiteCatalog.java:100)
> > at org.griphyn.vdl.karajan.lib.SiteCatalog.buildResources(SiteCatalog.java:60)
> > at org.griphyn.vdl.karajan.lib.SiteCatalog.function(SiteCatalog.java:48)
> > at org.globus.cog.karajan.compiled.nodes.functions.AbstractFunction.runBody(AbstractFunction.java:38)
> > at org.globus.cog.karajan.compiled.nodes.InternalFunction.run(InternalFunction.java:154)
> > at org.globus.cog.karajan.compiled.nodes.CompoundNode.runChild(CompoundNode.java:87)
> > at org.globus.cog.karajan.compiled.nodes.InternalFunction.run(InternalFunction.java:147)
> > at org.globus.cog.karajan.compiled.nodes.CompoundNode.runChild(CompoundNode.java:87)
> > at org.globus.cog.karajan.compiled.nodes.InternalFunction.run(InternalFunction.java:147)
> > at org.globus.cog.karajan.compiled.nodes.CompoundNode.runChild(CompoundNode.java:87)
> > at org.globus.cog.karajan.compiled.nodes.FramedInternalFunction.run(FramedInternalFunction.java:63)
> > at org.globus.cog.karajan.compiled.nodes.Import.runBody(Import.java:269)
> > at org.globus.cog.karajan.compiled.nodes.InternalFunction.run(InternalFunction.java:154)
> > at org.globus.cog.karajan.compiled.nodes.CompoundNode.runChild(CompoundNode.java:87)
> > at org.globus.cog.karajan.compiled.nodes.FramedInternalFunction.run(FramedInternalFunction.java:63)
> > at org.globus.cog.karajan.compiled.nodes.Main.run(Main.java:79)
> > at k.thr.LWThread.run(LWThread.java:243)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > at java.lang.Thread.run(Thread.java:722)
> >
> >
> >
> > On Tue, Feb 19, 2013 at 1:13 PM, Sam Seaver < samseaver at gmail.com >
> > wrote:
> >
> >
> >
> > OK, it got to the point where it really did hang. I'm retrying, but
> > with your suggestions. The other three finished fine!
> >
> >
> >
> > Progress: time: Tue, 19 Feb 2013 19:08:53 +0000 Selecting
> > site:18147 Submitted:174 Active:96 Failed:2 Finished
> > successfully:132323 Failed but can retry:183
> > Progress: time: Tue, 19 Feb 2013 19:09:23 +0000 Selecting
> > site:18147 Submitted:174 Active:96 Failed:2 Finished
> > successfully:132323 Failed but can retry:183
> > Progress: time: Tue, 19 Feb 2013 19:09:53 +0000 Selecting
> > site:18147 Submitted:174 Active:96 Failed:2 Finished
> > successfully:132323 Failed but can retry:183
> > Progress: time: Tue, 19 Feb 2013 19:10:23 +0000 Selecting
> > site:18147 Submitted:174 Active:96 Failed:2 Finished
> > successfully:132323 Failed but can retry:183
> > Progress: time: Tue, 19 Feb 2013 19:10:53 +0000 Selecting
> > site:18147 Submitted:174 Active:96 Failed:2 Finished
> > successfully:132323 Failed but can retry:183
> > Progress: time: Tue, 19 Feb 2013 19:11:23 +0000 Selecting
> > site:18147 Submitted:174 Active:96 Failed:2 Finished
> > successfully:132323 Failed but can retry:183
> >
> >
> >
> >
> >
> > On Tue, Feb 19, 2013 at 8:51 AM, Lorenzo Pesce <
> > lpesce at uchicago.edu > wrote:
> >
> >
> >
> > Hmm...
> >
> >
> > foreach.max.threads=100
> >
> >
> > maybe you should increase this number a bit and see what happens.
> >
> >
> > Also, I would try to replace
> >
> >
> > SWIFT_HOME=/home/wilde/swift/rev/swift-r6151-cog-r3552
> >
> >
> > with
> >
> >
> > SWIFT_HOME=/soft/swift/fast
> >
> >
> > Keep me posted. Let's get this rolling.
> >
> >
> > if it doesn't work, I can redo the packing.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Feb 19, 2013, at 1:07 AM, Sam Seaver wrote:
> >
> >
> >
> > Actually, the ten agents job does seem to be stuck in a strange
> > loop. It is incrementing the number of jobs that have finished
> > successfully, and at a fast pace, but the number of jobs it's
> > starting is decrementing much more slowly. It's almost as if it's
> > repeatedly attempting the same set of parameters multiple times...
> >
> >
> > I'll see what it's doing in the morning
> > S
> >
> >
> >
> > On Tue, Feb 19, 2013 at 1:00 AM, Sam Seaver < samseaver at gmail.com >
> > wrote:
> >
> >
> >
> > Seems to have worked overall this time!
> >
> >
> > I resumed four jobs, each for a different number of agents
> > (10, 100, 1000, 10000); it made it easier for me to decide on the app
> > time. Two of them have already finished, e.g.:
> >
> >
> >
> > Progress: time: Mon, 18 Feb 2013 23:50:12 +0000 Active:4 Checking
> > status:1 Finished in previous run:148098 Finished
> > successfully:37897
> > Progress: time: Mon, 18 Feb 2013 23:50:15 +0000 Active:2 Checking
> > status:1 Finished in previous run:148098 Finished
> > successfully:37899
> > Final status: Mon, 18 Feb 2013 23:50:15 +0000 Finished in previous
> > run:148098 Finished successfully:37902
> >
> >
> > and the only one that is showing any failure (50/110000) is the
> > ten agents version, which is so short I can understand why, but it's
> > still actively trying to run jobs and is actively finishing jobs,
> > so that's good.
> >
> >
> > Yay!
> >
> >
> >
> >
> >
> >
> >
> > On Mon, Feb 18, 2013 at 1:09 PM, Lorenzo Pesce <
> > lpesce at uchicago.edu > wrote:
> >
> >
> >
> > Good. Keep me posted. I would really like to solve your problems
> > running on Beagle this week; I wish that Swift had been
> > friendlier.
> >
> >
> >
> >
> >
> > On Feb 18, 2013, at 1:01 PM, Sam Seaver wrote:
> >
> >
> >
> > I just resumed the jobs that I'd killed before the system went
> > down, let's see how it does. I always did a mini-review of the data
> > I've got and it seems to be working as expected.
> >
> >
> >
> > On Mon, Feb 18, 2013 at 12:28 PM, Lorenzo Pesce <
> > lpesce at uchicago.edu > wrote:
> >
> >
> >
> > I have lost track a bit of what's up. I am happy to try and go over
> > it with you when you are ready.
> >
> >
> > Some of the problems of swift might have improved with a new
> > version and the new system.
> >
> >
> >
> >
> >
> >
> >
> > On Feb 18, 2013, at 12:22 PM, Sam Seaver wrote:
> >
> >
> >
> > They're not, I've not looked since Beagle came back up. Will do so
> > later today.
> > S
> >
> >
> >
> > On Mon, Feb 18, 2013 at 12:20 PM, Lorenzo Pesce <
> > lpesce at uchicago.edu > wrote:
> >
> >
> >
> >
> >
> >
> > --
> > Postdoctoral Fellow
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> > 9700 S. Cass Avenue
> > Argonne, IL 60439
> >
> > http://www.linkedin.com/pub/sam-seaver/0/412/168
> > samseaver at gmail.com
> > (773) 796-7144
> >
> > "We shall not cease from exploration
> > And the end of all our exploring
> > Will be to arrive where we started
> > And know the place for the first time."
> > --T. S. Eliot
> >
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >
> >
> >
>
>
>
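
The "Missing URL" failure in the quoted thread is raised from SiteCatalog.execution for pool 'pbs', whose <execution> element has no url attribute; David's later file adds url="localhost". A small pre-flight check along those lines might look like the Python sketch below. It assumes an un-namespaced sites file shaped like the ones quoted above, and pools_missing_execution_url is a hypothetical helper, not part of Swift:

```python
import xml.etree.ElementTree as ET


def pools_missing_execution_url(sites_xml):
    """Return handles of <pool> entries whose <execution> lacks a url attribute."""
    # Assumes no XML namespace on <config>/<pool>, as in the quoted files.
    root = ET.fromstring(sites_xml)
    missing = []
    for pool in root.iter("pool"):
        for execution in pool.findall("execution"):
            if not execution.get("url"):
                missing.append(pool.get("handle"))
                break  # report each pool at most once
    return missing


if __name__ == "__main__":
    sample = (
        '<config><pool handle="pbs">'
        '<execution provider="coaster" jobmanager="local:pbs"/>'
        "</pool></config>"
    )
    print(pools_missing_execution_url(sample))
```

This only flags the missing attribute; whether "localhost" is the right value for a given site is a separate question.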