[Swift-devel] First tests with swift faster

Mihael Hategan hategan at mcs.anl.gov
Wed Feb 20 14:55:12 CST 2013


Same idea, except with the filesystem tag: it also needs a URL.

Also, the job manager attribute should be "jobManager", not "jobmanager".

I'm working on proper validation of the sites file so that these issues
are checked and reported clearly.
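
For example, a sketch combining both points, just the two affected lines
(it reuses the "localhost" URLs from the workaround quoted below; adjust
them to your site):

<!-- illustrative only: url values copied from the workaround below -->
<execution provider="coaster" jobManager="local:pbs" url="localhost"/>
...
<filesystem provider="local" url="localhost"/>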

Mihael

On Wed, 2013-02-20 at 14:17 -0600, Lorenzo Pesce wrote:
> Trying with a different project. I have no idea here either. 
> 
> Could not initialize shared directory on pbs
>         exception @ swift-int.k, line: 380
> Caused by: null
> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: java.lang.NullPointerException
> Caused by: java.lang.NullPointerException
>         at org.globus.cog.abstraction.impl.common.task.ServiceImpl.hashCode(ServiceImpl.java:171)
>         at java.util.HashMap.getEntry(HashMap.java:344)
>         at java.util.HashMap.containsKey(HashMap.java:335)
>         at org.globus.cog.abstraction.impl.file.FileResourceCache.getResource(FileResourceCache.java:77)
>         at org.globus.cog.abstraction.impl.file.CachingDelegatedFileOperationHandler.getResource(CachingDelegatedFileOperationHandler.java:75)
>         at org.globus.cog.abstraction.impl.file.CachingDelegatedFileOperationHandler.submit(CachingDelegatedFileOperationHandler.java:40)
>         at org.globus.cog.abstraction.impl.common.task.CachingFileOperationTaskHandler.submit(CachingFileOperationTaskHandler.java:28)
>         at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:113)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 
>         parallelFor @ swift.k, line: 206
> 
> Thanks!
> 
> Lorenzo
> 
> On Feb 19, 2013, at 1:52 PM, Mihael Hategan wrote:
> 
> > The way the site catalog is parsed has changed. I committed a fix in
> > svn. A workaround is to make sure you specify a url:
> > 
> > ...
> > <execution provider="coaster" jobmanager="local:pbs" url="localhost"/>
> > ...
> > <filesystem provider="local" url="localhost"/>
> > ...
> > 
> > The URL is actually required for the coaster provider.
> > 
> > Mihael
> > 
> > On Tue, 2013-02-19 at 13:26 -0600, Lorenzo Pesce wrote:
> >> This is the content of the file that Swift first complains about (see attached): 
> >> <config>
> >>  <pool handle="pbs">
> >>    <execution provider="coaster" jobmanager="local:pbs"/>
> >>    <!-- replace with your project -->
> >>    <profile namespace="globus" key="project">CI-DEB000002</profile>
> >> 
> >>    <profile namespace="globus" key="providerAttributes">pbs.aprun;pbs.mpp;depth=24</profile>
> >> 
> >> 
> >>    <profile namespace="globus" key="jobsPerNode">24</profile>
> >>    <profile namespace="globus" key="maxTime">172800</profile>
> >>    <profile namespace="globus" key="maxwalltime">0:10:00</profile>
> >>    <profile namespace="globus" key="lowOverallocation">100</profile>
> >>    <profile namespace="globus" key="highOverallocation">100</profile>
> >> 
> >>    <profile namespace="globus" key="slots">200</profile>
> >>    <profile namespace="globus" key="nodeGranularity">1</profile>
> >>    <profile namespace="globus" key="maxNodes">1</profile>
> >> 
> >>    <profile namespace="karajan" key="jobThrottle">47.99</profile>
> >>    <profile namespace="karajan" key="initialScore">10000</profile>
> >> 
> >>    <filesystem provider="local"/>
> >>    <!-- replace this with your home on lustre -->
> >>    <workdirectory>/lustre/beagle/samseaver/GS/swift.workdir</workdirectory>
> >>  </pool>
> >> </config>
> >> 
> >> Any ideas?
> >> 
> >> Begin forwarded message:
> >> 
> >>> From: Sam Seaver <samseaver at gmail.com>
> >>> Date: February 19, 2013 1:16:28 PM CST
> >>> To: Lorenzo Pesce <lpesce at uchicago.edu>
> >>> Subject: Re: How are things going?
> >>> 
> >>> I got this error.  I suspect that using the new SWIFT_HOME directory means there's a missing parameter somewhere:
> >>> 
> >>> should we resume a previous calculation? [y/N] y
> >>> rlog files displayed in reverse time order
> >>> should I use GS-20130203-0717-jgeppt98.0.rlog ?[y/n]
> >>> y
> >>> Using  GS-20130203-0717-jgeppt98.0.rlog
> >>> [Error] GS_sites.xml:1:9: cvc-elt.1: Cannot find the declaration of element 'config'.
> >>> 
> >>> Execution failed:
> >>> Failed to parse site catalog
> >>>        swift:siteCatalog @ scheduler.k, line: 31
> >>> Caused by: Invalid pool entry 'pbs': 
> >>>        swift:siteCatalog @ scheduler.k, line: 31
> >>> Caused by: java.lang.IllegalArgumentException: Missing URL
> >>>        at org.griphyn.vdl.karajan.lib.SiteCatalog.execution(SiteCatalog.java:173)
> >>>        at org.griphyn.vdl.karajan.lib.SiteCatalog.pool(SiteCatalog.java:100)
> >>>        at org.griphyn.vdl.karajan.lib.SiteCatalog.buildResources(SiteCatalog.java:60)
> >>>        at org.griphyn.vdl.karajan.lib.SiteCatalog.function(SiteCatalog.java:48)
> >>>        at org.globus.cog.karajan.compiled.nodes.functions.AbstractFunction.runBody(AbstractFunction.java:38)
> >>>        at org.globus.cog.karajan.compiled.nodes.InternalFunction.run(InternalFunction.java:154)
> >>>        at org.globus.cog.karajan.compiled.nodes.CompoundNode.runChild(CompoundNode.java:87)
> >>>        at org.globus.cog.karajan.compiled.nodes.InternalFunction.run(InternalFunction.java:147)
> >>>        at org.globus.cog.karajan.compiled.nodes.CompoundNode.runChild(CompoundNode.java:87)
> >>>        at org.globus.cog.karajan.compiled.nodes.InternalFunction.run(InternalFunction.java:147)
> >>>        at org.globus.cog.karajan.compiled.nodes.CompoundNode.runChild(CompoundNode.java:87)
> >>>        at org.globus.cog.karajan.compiled.nodes.FramedInternalFunction.run(FramedInternalFunction.java:63)
> >>>        at org.globus.cog.karajan.compiled.nodes.Import.runBody(Import.java:269)
> >>>        at org.globus.cog.karajan.compiled.nodes.InternalFunction.run(InternalFunction.java:154)
> >>>        at org.globus.cog.karajan.compiled.nodes.CompoundNode.runChild(CompoundNode.java:87)
> >>>        at org.globus.cog.karajan.compiled.nodes.FramedInternalFunction.run(FramedInternalFunction.java:63)
> >>>        at org.globus.cog.karajan.compiled.nodes.Main.run(Main.java:79)
> >>>        at k.thr.LWThread.run(LWThread.java:243)
> >>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >>>        at java.lang.Thread.run(Thread.java:722)
> >>> 
> >>> 
> >>> On Tue, Feb 19, 2013 at 1:13 PM, Sam Seaver <samseaver at gmail.com> wrote:
> >>> OK, it got to the point where it really did hang.  I'm retrying, but with your suggestions.  The other three finished fine!
> >>> 
> >>> Progress:  time: Tue, 19 Feb 2013 19:08:53 +0000  Selecting site:18147  Submitted:174  Active:96  Failed:2  Finished successfully:132323  Failed but can retry:183
> >>> Progress:  time: Tue, 19 Feb 2013 19:09:23 +0000  Selecting site:18147  Submitted:174  Active:96  Failed:2  Finished successfully:132323  Failed but can retry:183
> >>> Progress:  time: Tue, 19 Feb 2013 19:09:53 +0000  Selecting site:18147  Submitted:174  Active:96  Failed:2  Finished successfully:132323  Failed but can retry:183
> >>> Progress:  time: Tue, 19 Feb 2013 19:10:23 +0000  Selecting site:18147  Submitted:174  Active:96  Failed:2  Finished successfully:132323  Failed but can retry:183
> >>> Progress:  time: Tue, 19 Feb 2013 19:10:53 +0000  Selecting site:18147  Submitted:174  Active:96  Failed:2  Finished successfully:132323  Failed but can retry:183
> >>> Progress:  time: Tue, 19 Feb 2013 19:11:23 +0000  Selecting site:18147  Submitted:174  Active:96  Failed:2  Finished successfully:132323  Failed but can retry:183
> >>> 
> >>> 
> >>> On Tue, Feb 19, 2013 at 8:51 AM, Lorenzo Pesce <lpesce at uchicago.edu> wrote:
> >>> Hmm... 
> >>> 
> >>> foreach.max.threads=100
> >>> 
> >>> Maybe you should increase this number a bit and see what happens.
> >>> 
> >>> Also, I would try to replace
> >>> 
> >>> SWIFT_HOME=/home/wilde/swift/rev/swift-r6151-cog-r3552
> >>> 
> >>> with 
> >>> 
> >>> SWIFT_HOME=/soft/swift/fast
> >>> 
> >>> Keep me posted. Let's get this rolling.
> >>> 
> >>> If it doesn't work, I can redo the packing.
> >>> 
> >>> 
> >>> 
> >>> 
> >>> On Feb 19, 2013, at 1:07 AM, Sam Seaver wrote:
> >>> 
> >>>> Actually, the ten-agents job does seem to be stuck in a strange loop.  It is incrementing the number of jobs that have finished successfully, and at a fast pace, but the number of jobs it's starting is decreasing much more slowly; it's almost as if it's repeatedly attempting the same set of parameters multiple times...
> >>>> 
> >>>> I'll see what it's doing in the morning
> >>>> S
> >>>> 
> >>>> 
> >>>> On Tue, Feb 19, 2013 at 1:00 AM, Sam Seaver <samseaver at gmail.com> wrote:
> >>>> Seems to have worked overall this time!
> >>>> 
> >>>> I resumed four jobs, each for a different number of agents (10, 100, 1000, 10000); that made it easier for me to decide on the app time.  Two of them have already finished, for example:
> >>>> 
> >>>> Progress:  time: Mon, 18 Feb 2013 23:50:12 +0000  Active:4  Checking status:1  Finished in previous run:148098  Finished successfully:37897
> >>>> Progress:  time: Mon, 18 Feb 2013 23:50:15 +0000  Active:2  Checking status:1  Finished in previous run:148098  Finished successfully:37899
> >>>> Final status: Mon, 18 Feb 2013 23:50:15 +0000  Finished in previous run:148098  Finished successfully:37902
> >>>> 
> >>>> and the only one that is showing any failures (50/110000) is the ten-agents version, which is so short that I can understand why; but it's still actively trying to run jobs and actively finishing jobs, so that's good.
> >>>> 
> >>>> Yay!
> >>>> 
> >>>> 
> >>>> 
> >>>> On Mon, Feb 18, 2013 at 1:09 PM, Lorenzo Pesce <lpesce at uchicago.edu> wrote:
> >>>> Good. Keep me posted; I would really like to solve your problems running on Beagle this week. I wish Swift had been friendlier.
> >>>> 
> >>>> On Feb 18, 2013, at 1:01 PM, Sam Seaver wrote:
> >>>> 
> >>>>> I just resumed the jobs that I'd killed before the system went down; let's see how it does.  I also did a mini-review of the data I've got, and it seems to be working as expected.
> >>>>> 
> >>>>> 
> >>>>> On Mon, Feb 18, 2013 at 12:28 PM, Lorenzo Pesce <lpesce at uchicago.edu> wrote:
> >>>>> I have lost track a bit of what's up. I am happy to try and go over it with you when you are ready.
> >>>>> 
> >>>>> Some of the problems with Swift might have improved with the new version and the new system.
> >>>>> 
> >>>>> 
> >>>>> On Feb 18, 2013, at 12:22 PM, Sam Seaver wrote:
> >>>>> 
> >>>>>> They're not; I've not looked since Beagle came back up. Will do so later today.
> >>>>>> S
> >>>>>> 
> >>>>>> 
> >>>>>> On Mon, Feb 18, 2013 at 12:20 PM, Lorenzo Pesce <lpesce at uchicago.edu> wrote:
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> -- 
> >>> Postdoctoral Fellow
> >>> Mathematics and Computer Science Division
> >>> Argonne National Laboratory
> >>> 9700 S. Cass Avenue
> >>> Argonne, IL 60439
> >>> 
> >>> http://www.linkedin.com/pub/sam-seaver/0/412/168
> >>> samseaver at gmail.com
> >>> (773) 796-7144
> >>> 
> >>> "We shall not cease from exploration
> >>> And the end of all our exploring
> >>> Will be to arrive where we started
> >>> And know the place for the first time."
> >>>   --T. S. Eliot
> >> 
> > 
> > 
> 




