From wozniak at mcs.anl.gov Wed Dec 1 10:55:16 2010
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Wed, 1 Dec 2010 10:55:16 -0600 (CST)
Subject: [Swift-devel] coaster-service error on Intrepid
Message-ID:

Hello all

I'm getting started with the coaster-service on Intrepid. I start up the
service and the first run completes. The second fails with the trace below.
sites.xml is also included below. I'm looking into this but I thought I
should post it...

    Justin

Intrepid: ~> coaster-service -p 2390 -nosec
Started coaster service: http://140.221.82.115:2390
original callback URI is http://10.40.5.144:32907
callback URI has been overridden to http://172.17.5.144:32907
Failed to send remote log message
org.globus.cog.karajan.workflow.service.channels.ChannelException: Channel died and no contact available
    at org.globus.cog.karajan.workflow.service.channels.ChannelManager.connect(ChannelManager.java:235)
    at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:257)
    at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:227)
    at org.globus.cog.abstraction.coaster.rlog.RemoteLogger.log(RemoteLogger.java:31)
    at org.globus.cog.abstraction.coaster.service.job.manager.Block.start(Block.java:87)
    at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.addBlock(BlockQueueProcessor.java:213)
    at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.allocateBlocks(BlockQueueProcessor.java:395)
    at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:518)
    at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:100)

    jobmanager="local:cobalt" url="http://140.221.82.115:2390" />
    key="internalHostname">172.17.5.144
    HTCScienceApps
    prod-devel
    zeptoos
    true
    21
    10000
    1
    1
    3300
    64
    64
    key="hookClass">org.globus.swift.data.policy.AllocationHook
    /home/wozniak/work

-- 
Justin M Wozniak

From wilde at mcs.anl.gov Wed Dec 1 11:15:52 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 1 Dec 2010 11:15:52 -0600 (CST)
Subject: [Swift-devel] coaster-service error on Intrepid
In-Reply-To:
Message-ID: <121884289.150396.1291223752989.JavaMail.root@zimbra.anl.gov>

Justin, I was experimenting on PADS with the persistent coaster service;
that's where I tested Mihael's fix, which enabled the service to be used
repeatedly and to remain up for extended periods of time.

I just started yesterday trying to move that to the BG/P - I think for the
same reason as you.

My script is in /home/wilde/swift/lab/pecos/start-coasters on Surveyor.

I'll stop by to see if we can get this working, as it will help us both on
the CDM runs.

One thing to note: I run one artificial job to put the service into passive
mode, which seems necessary to enable externally started workers to connect
to it. Ideally we'll soon just make this a command line flag to the service.

- Mike

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Wed Dec 1 11:46:13 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 01 Dec 2010 09:46:13 -0800
Subject: [Swift-devel] Re: Persistent coaster service fails after several runs
In-Reply-To:
References: <959208306.127425.1291008222274.JavaMail.root@zimbra.anl.gov> <1291157472.21980.0.camel@blabla2.none>
Message-ID: <1291225573.26027.8.camel@blabla2.none>

I made a change in r2948. Because the array sizes were 32768 (which is
exactly the I/O buffer size that the worker uses), I suspect that they held
file data.

The problem was that the channel needed to hold a persistent buffer, because
there is no guarantee that all of the data in one chunk can be read at once
from the stream/socket. But that persistence was only needed until the whole
chunk was read, after which the buffer could be released.

In the BG/P case, I suspect the problem was a lot worse than in Mike's case
because you probably used many more workers.

So please give that a try. There will, of course, always be byte arrays with
the coaster I/O, but they should be GCed eventually.

Mihael

On Tue, 2010-11-30 at 16:57 -0600, Justin M Wozniak wrote:
> I'm on Intrepid so it's an IBM heap dump. There's a good one there in
> ~wozniak/Public/heapdumps .
>
> The byte[]s are definitely associated with TCPChannel but that's all I
> have been able to figure out so far- I don't see where they are retained.
>
> It is possible that the reader is generating the bytes faster than the
> network can push them out, so we just need to tighten up the throttle?
>
> On Tue, 30 Nov 2010, Mihael Hategan wrote:
>
> > Can you make a heap dump of the relevant issue?
> >
> > On Tue, 2010-11-30 at 09:59 -0600, Justin M Wozniak wrote:
> >> Along these lines, I'm looking at memory usage in Coasters. There's a
> >> plot attached below- usage spikes when the workers start running.
> >>
> >> 96% of the usage is byte[] which makes me think it could be KarajanChannel
> >> stuff...
> >>
> >> http://www.ci.uchicago.edu/wiki/bin/view/SWFT/PerformanceNotes#Memory
> >>
> >> On Sun, 28 Nov 2010, Michael Wilde wrote:
> >>
> >>> This fix looks great so far - I've tested with varying workflow sizes
> >>> and delays, and have not seen any problems.
> >>>
> >>> - Mike

From wilde at mcs.anl.gov Wed Dec 1 12:18:35 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 1 Dec 2010 12:18:35 -0600 (CST)
Subject: [Swift-devel] alcfbgpnat and BG/P compute-node-to-login-host connectivity
In-Reply-To: <121884289.150396.1291223752989.JavaMail.root@zimbra.anl.gov>
Message-ID: <918999100.150870.1291227515772.JavaMail.root@zimbra.anl.gov>

was: Re: [Swift-devel] coaster-service error on Intrepid

Mihael, how does "alcfbgpnat" work, and what does that imply for running
manual persistent coasters on BG/P with the workers launched from a single
qsub job?

I'm probing on Surveyor at the moment, trying to figure out how worker.pl
can reach a persistent coaster service on the login node, and I seem unable
to ping login6 from a compute node.

Does the worker.pl script (or coaster service) do something special when
alcfbgpnat is set to enable connectivity?

- Mike

From hategan at mcs.anl.gov Wed Dec 1 13:34:24 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 01 Dec 2010 11:34:24 -0800
Subject: [Swift-devel] Re: alcfbgpnat and BG/P compute-node-to-login-host connectivity
In-Reply-To: <918999100.150870.1291227515772.JavaMail.root@zimbra.anl.gov>
References: <918999100.150870.1291227515772.JavaMail.root@zimbra.anl.gov>
Message-ID: <1291232064.28806.0.camel@blabla2.none>

This code snippet may be of relevance:

    if (settings.getAlcfbgpnat()) {
        spec.addEnvironmentVariable("ZOID_ENABLE_NAT", "true");
    }

So you should set that env variable for the job if you want NAT.

Mihael

From aespinosa at cs.uchicago.edu Wed Dec 1 15:17:41 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 1 Dec 2010 15:17:41 -0600
Subject: [Swift-devel] cdm site-aware policy
Message-ID:

Hi Justin,

I get some errors when using a site-aware CDM policy:

CDM file: fs.data
Exception in thread "main" java.lang.NullPointerException
    at org.globus.swift.data.Director.addRule(Director.java:113)
    at org.globus.swift.data.Director.addLine(Director.java:97)
    at org.globus.swift.data.Director.load(Director.java:87)
    at org.griphyn.vdl.karajan.Loader.loadCDM(Loader.java:223)
    at org.griphyn.vdl.karajan.Loader.main(Loader.java:97)

Am I missing something or is the feature not yet in trunk?

rule .*TEST_f[x|y]_644.sgt DIRECT /gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles
rule .*LGU_f[x|y]_664.sgt DIRECT .
rule PADS .*LGU.*subf[x|y].sgt DIRECT .
#rule .*sub*.sgt DIRECT /gpfs/pads/swift/swift/aespinosa/science/cybershake/Results
#rule .*[0-9]+/[0-9]+/.*.txt.variation.* DIRECT /gpfs/teraport/osgtg/cybershake/RuptureVariations
#rule .* DEFAULT
rule ANYWHERE .* DEFAULT

Thanks
-Allan

From wozniak at mcs.anl.gov Wed Dec 1 15:24:09 2010
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Wed, 1 Dec 2010 15:24:09 -0600 (CST)
Subject: [Swift-devel] cdm site-aware policy
In-Reply-To:
References:
Message-ID:

Hi Allan,

The feature is not yet in trunk.
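A side note on the rule patterns themselves, separate from the
NullPointerException: assuming the CDM loader treats them as java.util.regex
patterns (an assumption, not something confirmed in this thread), `[x|y]` is a
character class that also accepts a literal `|`, and an unescaped `.` matches
any character. A small sketch; the class name and the file names are made up
for illustration:

```java
import java.util.regex.Pattern;

public class CdmPatternCheck {
    /** Returns true if the whole file name matches the rule pattern. */
    static boolean matches(String pattern, String fileName) {
        return Pattern.matches(pattern, fileName);
    }

    public static void main(String[] args) {
        // Pattern as written in the policy file above.
        String asWritten = ".*TEST_f[x|y]_644.sgt";
        // Tighter variant: only x or y, and a literal dot before "sgt".
        String tightened = ".*TEST_f[xy]_644\\.sgt";

        // Hypothetical file names, chosen to show the difference.
        String[] names = {
            "run/TEST_fx_644.sgt",
            "run/TEST_f|_644.sgt",
            "run/TEST_fy_644Xsgt"
        };
        for (String name : names) {
            System.out.println(name
                + "  asWritten=" + matches(asWritten, name)
                + "  tightened=" + matches(tightened, name));
        }
    }
}
```

If the intent was "x or y", `f[xy]` is probably what is wanted; the pattern as
written still matches the intended files, it is just looser than it looks.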
    Justin

-- 
Justin M Wozniak

From wilde at mcs.anl.gov Wed Dec 1 15:29:01 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 1 Dec 2010 15:29:01 -0600 (CST)
Subject: [Swift-devel] Coaster worker REGISTER fails from BG/P to persistent passive coaster service
In-Reply-To: <65144216.153092.1291238872450.JavaMail.root@zimbra.anl.gov>
Message-ID: <882570914.153108.1291238941181.JavaMail.root@zimbra.anl.gov>

Mihael,

When I try to connect from a BG/P worker back to a passive persistent
coaster service on the login host, I get this from worker.pl:

Failed to register (service returned error: Unknown command: REGISTER) at /home/wilde/swift/src/trunk/cog/modules/swift/dist/swift-svn/bin/worker.pl line 703.

My service-side log shows this for every worker start attempt:

Unknown handler: REGISTER. Available handlers: {CHMOD=class org.globus.cog.abstraction.impl.file.coaster.handlers.ChmodHandler, ISDIR=class org.globus.cog.abstraction.impl.file.coaster.handlers.IsDirectoryHandler, etc etc

My worker log shows this:

1291238056.000 INFO - mytest Logging started: Wed Dec 1 15:13:53 2010
1291238056.000 INFO - Running on node 172.18.2.83
1291238056.000 DEBUG - uri=http://172.17.3.16:1985
1291238056.000 DEBUG - scheme=http
1291238056.000 DEBUG - host=172.17.3.16
1291238056.000 DEBUG - port=1985
1291238056.000 DEBUG - blockid=mytest
1291238056.000 INFO - Connecting (0)...
1291238056.000 DEBUG - Trying 172.17.3.16:1985...
1291238056.000 INFO - Connected
1291238056.000 DEBUG - Replies: {}
1291238056.000 DEBUG - OUT: len=8, tag=0, flags=0
1291238056.000 DEBUG - OUT: len=6, tag=0, flags=0
1291238056.000 DEBUG - OUT: len=0, tag=0, flags=2
1291238056.000 DEBUG - done sending frags for 0

The dummy swift run that set the service to passive mode said this on
stdout/err:

Swift svn swift-r3730 cog-r2943

RunID: 20101201-1448-szco4z4c
Progress:
Find: http://login6.surveyor.alcf.anl.gov:1985
Find: keepalive(120), reconnect - http://login6.surveyor.alcf.anl.gov:1985
Passive queue processor initialized. Callback URI is http://172.17.3.16:50000
Progress: Active:1
Progress: Checking status:1
Final status: Finished successfully:1

Any thoughts on why the worker fails to register with the service?

- Mike

From wilde at mcs.anl.gov Wed Dec 1 15:40:35 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 1 Dec 2010 15:40:35 -0600 (CST)
Subject: [Swift-devel] Coaster worker REGISTER fails from BG/P to persistent passive coaster service
In-Reply-To: <882570914.153108.1291238941181.JavaMail.root@zimbra.anl.gov>
Message-ID: <938516501.153282.1291239635632.JavaMail.root@zimbra.anl.gov>

Sorry - please ignore this, problem solved. I got my service ports crossed.
- Mike

From wilde at mcs.anl.gov Sun Dec 5 12:36:43 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 5 Dec 2010 12:36:43 -0600 (CST)
Subject: [Swift-devel] Does hookClass sites tag affect concurrency/throttling?
In-Reply-To: <1954526748.169880.1291512310021.JavaMail.root@zimbra.anl.gov>
Message-ID: <1770571507.170487.1291574203224.JavaMail.root@zimbra.anl.gov>

Justin, what does this line in sites.xml do on the BG/P?

    key="hookClass">org.globus.swift.data.policy.AllocationHook

I think I got that from one of your examples for BG/P use; when I use it, I
seem to get throttled to about 20 active jobs max. When I take it off, I
seem to be able to utilize all CPUs in a pset (256 cores in my last test).

My full pool element, which seems to be limiting me to about 20 active, is
this (despite the fact that the throttle should let 276 jobs run at once):

    url="http://localhost:1985" jobmanager=""/>
    passive
    key="project">HTCScienceApps
    default
    zeptoos
    true
    2.75
    10000
    4
    1
    3600
    64
    64
    key="hookClass">org.globus.swift.data.policy.AllocationHook
    /dev/shm
    $rundir

When I use the following pool element, I get the full expected concurrency:

    url="http://localhost:1985" jobmanager=""/>
    passive
    4
    3.00
    10000
    /dev/shm
    $rundir

I am not *sure* it's the hookClass tag that's causing the throttling, but
it's my primary suspect. I will try to confirm, but I'm still curious what
this tag does.

Thanks,

Mike

From wozniak at mcs.anl.gov Sun Dec 5 22:24:43 2010
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Sun, 5 Dec 2010 22:24:43 -0600 (Central Standard Time)
Subject: [Swift-devel] Re: Does hookClass sites tag affect concurrency/throttling?
In-Reply-To: <1770571507.170487.1291574203224.JavaMail.root@zimbra.anl.gov>
References: <1770571507.170487.1291574203224.JavaMail.root@zimbra.anl.gov>
Message-ID:

This is a callback that is triggered by coasters when a Block is allocated.
This result is a real surprise, I'll check this out right away.

-- 
Justin M Wozniak

From skenny at uchicago.edu Mon Dec 6 01:33:55 2010
From: skenny at uchicago.edu (Sarah Kenny)
Date: Mon, 6 Dec 2010 01:33:55 -0600
Subject: [Swift-devel] Next Swift release
In-Reply-To: <1278622160.117059.1290543042855.JavaMail.root@zimbra.anl.gov>
References: <577622484.117049.1290542892759.JavaMail.root@zimbra.anl.gov> <1278622160.117059.1290543042855.JavaMail.root@zimbra.anl.gov>
Message-ID:

so, my expectation for the release, as we've discussed somewhat on the list
already, is to put out swift 1.0 on 12/20. as i see it, that primarily
involves editing the documentation/web content, since all new code (and the
documentation associated with the new code) going into trunk is expected to
be in the 1.1 release, which hopefully we can have out in the next few
months. i'm also assuming we're sticking with the plan to allow each release
to have its own doc version along with the code.

let me know if anyone thinks there are other things that can/should go into
the 12/20 release.

~sk

On Tue, Nov 23, 2010 at 2:10 PM, Michael Wilde wrote:

> All,
>
> Sarah is going to take the lead in producing the next Swift release, and
> will propose a release definition and plan. We want to have the release done
> by Dec 20.
>
> - Mike
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From wilde at mcs.anl.gov Mon Dec 6 07:35:29 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 6 Dec 2010 07:35:29 -0600 (CST)
Subject: [Swift-devel] Re: Does hookClass sites tag affect concurrency/throttling?
In-Reply-To:
Message-ID: <669415963.174296.1291642529245.JavaMail.root@zimbra.anl.gov>

Note that I'm not certain that the hookClass is indeed the cause of the
throttling.
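For reference, the 276 figure quoted earlier in this thread is consistent
with a throttle value of 2.75 if the cap on concurrent jobs is computed as
jobThrottle * 100 + 1. A minimal sketch of that arithmetic, assuming that
rule holds for this Swift version (it is not confirmed in this thread); the
class and method names are made up for illustration:

```java
public class ThrottleEstimate {
    /**
     * Estimated cap on concurrently active jobs for a given jobThrottle,
     * under the ASSUMED rule: jobThrottle * 100 + 1.
     */
    static int maxConcurrentJobs(double jobThrottle) {
        return (int) Math.round(jobThrottle * 100) + 1;
    }

    public static void main(String[] args) {
        // Throttle values taken from the two pool elements in this thread.
        System.out.println("2.75 -> " + maxConcurrentJobs(2.75)); // 2.75 -> 276
        System.out.println("3.00 -> " + maxConcurrentJobs(3.00)); // 3.00 -> 301
    }
}
```

By the same rule the second pool element (3.00) would allow 301 jobs, so
neither throttle value on its own accounts for a ceiling of about 20 active
jobs.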
- Mike

From wilde at mcs.anl.gov Mon Dec 6 07:56:10 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 6 Dec 2010 07:56:10 -0600 (CST)
Subject: [Swift-devel] Next Swift release
In-Reply-To:
Message-ID: <104908231.174379.1291643770947.JavaMail.root@zimbra.anl.gov>

Let's call the release 0.10 (oh-point-ten, per Ben) and save 1.0 for
something further down the road, when we are in better shape to publicize.

Regarding features, let's spend all our effort on testing to make sure the
release works on an important set of systems and configurations, which we
should now list. Something like:

    local
    pbs (on PADS, Fusion, and a set of TG machines)
    sge (on Ranger, IBIcluster, and a few PSD machines)
    BG/P (intrepid, surveyor)
    OSG over Condor-G
    UCI systems?

If possible:

    Sicortex
    Amazon EC2
    BioNimbus cloud
    Magellan
    FutureGrid

I'm happy for the initial test to be a simple "catsn" script of N cat jobs
with one file in and one file out. Each of the systems above would need a
few variations of coaster vs plain and a few data staging configs: local,
provider-staging, gridftp.

If the doc set can be made release-specific and "testable" for 0.10 that
would be great. If not, defer the per-release doc effort to 0.11.

The main features I'd like to see in 0.11 would be some of the error and
logging improvements, and swiftconfig/swiftrun. Maybe support for Globus
Online staging. I don't want to think about those for 0.10, as I think "it
works" is the most important feature for this next release.

Let's define a test plan for the configs above and focus on how to
automatically and repeatedly (i.e. nightly) validate the release on all
these configs. That means a set of sites/tc/properties files and the
integration of the tests, ideally, into the test harness.
- Mike

----- Original Message -----

so, my expectation for the release, as we've discussed somewhat on the list already, is to put out swift 1.0 on 12/20 which, as i see it, involves primarily editing of the documentation/web content more so than anything else since all new code (and documentation associated with the new code) going into trunk is expected to be in the 1.1 release--which hopefully we can have out in the next few months. i'm also assuming we're sticking with the plan to allow each release to have its own doc version along with the code.

let me know if anyone thinks there are other things that can/should go into the 12/20 release.

~sk

On Tue, Nov 23, 2010 at 2:10 PM, Michael Wilde < wilde at mcs.anl.gov > wrote:

All,

Sarah is going to take the lead in producing the next Swift release, and will propose a release definition and plan. We want to have the release done by Dec 20.

- Mike

_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wozniak at mcs.anl.gov Mon Dec 6 09:51:27 2010
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Mon, 6 Dec 2010 09:51:27 -0600 (Central Standard Time)
Subject: [Swift-devel] Next Swift release
In-Reply-To:
References: <577622484.117049.1290542892759.JavaMail.root@zimbra.anl.gov> <1278622160.117059.1290543042855.JavaMail.root@zimbra.anl.gov>
Message-ID:

Sounds great- I was actually thinking about setting up the branch-specific docs later this week, do you already have a start on that?

--
Justin M Wozniak

From hategan at mcs.anl.gov Mon Dec 6 12:01:25 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 06 Dec 2010 10:01:25 -0800
Subject: [Swift-devel] Re: Does hookClass sites tag affect concurrency/throttling?
In-Reply-To: <669415963.174296.1291642529245.JavaMail.root@zimbra.anl.gov>
References: <669415963.174296.1291642529245.JavaMail.root@zimbra.anl.gov>
Message-ID: <1291658485.26461.1.camel@blabla2.none>

It should not, because that call happens in a thread separate from the job submission one, but while we're at it, it should also return as soon as possible (such that the block management thread doesn't get stuck doing stuff it wasn't supposed to do).

Mihael

On Mon, 2010-12-06 at 07:35 -0600, Michael Wilde wrote:
> Note that I'm not certain that the hookClass is indeed the cause of the throttling.
>
> - Mike

From skenny at uchicago.edu Mon Dec 6 13:48:39 2010
From: skenny at uchicago.edu (Sarah Kenny)
Date: Mon, 6 Dec 2010 11:48:39 -0800
Subject: [Swift-devel] Next Swift release
In-Reply-To:
References: <577622484.117049.1290542892759.JavaMail.root@zimbra.anl.gov> <1278622160.117059.1290543042855.JavaMail.root@zimbra.anl.gov>
Message-ID:

feel free, justin. i'm currently editing stuff that i think should go into doc for the 12/20 release (e.g. describing features that exist but aren't documented, etc.).

so, branch 1.0 will become release 0.10...seems a bit confusing to me...also considering the differences between 0.9 and what we're releasing doesn't calling it 1.0 make sense? just a thought...

~sk

From wilde at mcs.anl.gov Mon Dec 6 14:01:24 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 6 Dec 2010 14:01:24 -0600 (CST)
Subject: [Swift-devel] Next Swift release
In-Reply-To:
Message-ID: <623912472.178112.1291665684520.JavaMail.root@zimbra.anl.gov>

I'm losing track, but I thought trunk will become branch 0.10?

I wanted to name it based on what we're trying to say to the user community: this next release I feel is still pre-1.0 quality. After more doc cleanup and usability cleanup and web polishing, I feel we're ready to try to make a broader announcement and call it 1.0. I'm thinking end of this January for that.

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From skenny at uchicago.edu Mon Dec 6 14:35:24 2010
From: skenny at uchicago.edu (Sarah Kenny)
Date: Mon, 6 Dec 2010 12:35:24 -0800
Subject: [Swift-devel] Next Swift release
In-Reply-To: <623912472.178112.1291665684520.JavaMail.root@zimbra.anl.gov>
References: <623912472.178112.1291665684520.JavaMail.root@zimbra.anl.gov>
Message-ID:

here's how i understand it (feel free to correct me):

1.0 is the most recent stable branch ready for release--it's probably what most people *should* be downloading now if they want to start using swift, though our web site still has the 1.5 yr old .9 listed as the release download. trunk contains 'bleeding edge' code. for a 12/20 release we'd want to release something that does not have any new features currently being added to it (just bug fixes). i'm suggesting that we do add *some* new doc since that won't break anything and we need to do some cleanup there. but documentation for new features should go into the latest trunk doc.

if we want to look at releasing what's in trunk RIGHT NOW, it seems to me it should be branched and go into testing mode if we want to get it to a point where it's stable enough to release (?)

that said, .9 vs branch 1.0 is a pretty significant upgrade...is why i suggested .10 was rather confusing as a name for it.

thoughts?

~sk

From wilde at mcs.anl.gov Mon Dec 6 14:49:40 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 6 Dec 2010 14:49:40 -0600 (CST)
Subject: [Swift-devel] Next Swift release
In-Reply-To:
Message-ID: <1435030155.178494.1291668580454.JavaMail.root@zimbra.anl.gov>

> here's how i understand it (feel free to correct me):
>
> 1.0 is the most recent stable branch ready for release--it's probably
> what most people *should* be downloading now if they want to start
> using swift, though our web site still has the 1.5 yr old .9 listed as
> the release download.

Right - and thus almost no users know about or use the 1.0 branch. I only use trunk, as do all the users that I'm working with.

I believe trunk should be the basis for the 12/20 release.

I do not feel we should release test what's in any of the "stable" branches.

Instead I feel we should "save" the 1.0 branch for when we are ready for doing a 1.0 release: say Jan 31 2011.

I propose we create an 0.10 stable branch as the release candidate for a Dec 20 0.10, and that we use tags to mark release candidates in this branch.

> trunk contains 'bleeding edge' code. for a 12/20
> release we'd want to release something that does not have any new
> features currently being added to it (just bug fixes).

Yes - but just bug fixes over current trunk. No new features, just bug fixes from tests and any user-reported bugs. If we can make a release candidate this week, we can have users starting to test this 0.10 RC in parallel with our testing.

> i'm suggesting
> that we do add *some* new doc since that won't break anything and we
> need to do some cleanup there.

Doc improvements for 0.10 sound good to me, but we need to balance the effort required vs testing 0.10.

> but documentation for new features
> should go into the latest trunk doc.

Agreed. But with "new features" defined as features beyond what's in trunk as of this moment.

> if we want to look at releasing what's in trunk RIGHT NOW, it seems to
> me it should be branched and go into testing mode if we want to get it
> to a point where it's stable enough to release (?)

Yes, I agree, per above. Let's branch it asap.

Does tagging release candidates on this branch seem the way to go?

> that said, .9 vs branch 1.0 is a pretty significant upgrade...is why i
> suggested .10 was rather confusing as a name for it.

I took the name 0.10 from a suggestion by Ben (long ago) to deal with the fact that we may need more point-releases between 0.9 and 1.0.

I agree that 0.10 is a *bit* confusing, but I'm hoping that this release has about a 6-week lifetime from 12/20 to 1/31.

Sound OK?

- Mike

> thoughts?
>
> ~sk

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Mon Dec 6 15:10:00 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 6 Dec 2010 15:10:00 -0600 (CST)
Subject: [Swift-devel] Next Swift release
In-Reply-To:
Message-ID: <2077726669.178721.1291669800951.JavaMail.root@zimbra.anl.gov>

Sounds good to me, given that the release ID is "just a string" at the moment, until we publicize some meaning for the ID - which we have not done to date.

- Mike

----- Original Message -----

Been following along. Just a random suggestion but perhaps if you called this next release *0.10.0* people would realize that it's zero-point-ten-point-oh, as in 0.10.0 > 0.9, not zero-point-one-oh, as in 0.10 < 0.9.

-Glen

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Mon Dec 6 15:11:09 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 6 Dec 2010 15:11:09 -0600 (CST)
Subject: [Swift-devel] Next Swift release
In-Reply-To:
Message-ID: <1874357545.178745.1291669869507.JavaMail.root@zimbra.anl.gov>

Speaking of which, Glen - can you be a friendly release-candidate tester for 0.10.0?

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From glen842 at uchicago.edu Mon Dec 6 15:05:48 2010
From: glen842 at uchicago.edu (Glen Hocky)
Date: Mon, 6 Dec 2010 16:05:48 -0500
Subject: [Swift-devel] Next Swift release
In-Reply-To: <1435030155.178494.1291668580454.JavaMail.root@zimbra.anl.gov>
References: <1435030155.178494.1291668580454.JavaMail.root@zimbra.anl.gov>
Message-ID:

Been following along.
Just a random suggestion but perhaps if you called this next release **0.10.0* *people would realize that it's zero-point-ten-point-oh as in 0.10.0> 0.9 not zero-point-one-oh as in 0.10<0.9 -Glen On Mon, Dec 6, 2010 at 3:49 PM, Michael Wilde wrote: > > here's how i understand it (feel free to correct me): > > > > 1.0 is the most recent stable branch ready for release--it's probably > > what most people *should* be downloading now if they want to start > > using swift, though our web site still has the 1.5 yr old .9 listed as > > the release download. > > Right - and thus almost no users know about or use the 1.0 branch. > I only use trunk, as do all the users that I'm working with. > > I believe trunk should be the basis for the 12/20 release. > > I do not feel we should release test what's in any of the "stable" > branches. > > Instead I feel we should "save" the 1.0 branch for when we are ready for > doing a 1.0 release: say Jan 31 2011. > > I propose we create an 0.10 stable branch as the release candidate for a > Dec 20 0.10, and that we use tags to mark release candidates in this branch. > > > trunk contains 'bleeding edge' code. for a 12/20 > > release we'd want to release something that does not have any new > > features currently being added to it (just bug fixes). > > Yes - but just bug fixes over current trunk. No new features, just bug > fixes from tests and any user-reported bugs. If we can make a release > candidate this week, we can have users starting to test thus 0.10 RC in > parallel with our testing. > > > i'm suggesting > > that we do add *some* new doc since that won't break anything and we > > need to do some cleanup there. > > Doc improvements for 0.10 sound good to me, but need to balance the effort > required vs testing 0.10. > > > but documenation for new features > > should go into the latest trunk doc. > > Agreed. But with "new features" defined as features beyond whats in trunk > as of this moment. 
> > > if we want to look at releasing what's in trunk RIGHT NOW, it seems to > > me it should be branched and go into testing mode if we want to get it > > to a point where it's stable enough to release (?) > > Yes, I agree, per above. Let's branch it asap. > > Does tagging release candidates on this branch seem the way to go? > > > that said, .9 vs branch 1.0 is a pretty significant upgrade...is why i > > suggested .10 was rather confusing as a name for it. > > I took the name 0.10 from a suggestion by Ben (long ago) to deal with the > fact that we may need more point-releases between 0.9 and 1.0. > > I agree that 0.10 is a *bit* confusing, but I'm hoping that this release has > about a 6-week lifetime from 12/20 to 1/31. > > Sound OK? > > - Mike > > > thoughts? > > > > ~sk > > > > > > On Mon, Dec 6, 2010 at 12:01 PM, Michael Wilde < wilde at mcs.anl.gov > > > wrote: > > > > > > > > > > I'm losing track, but I thought trunk will become branch 0.10? > > > > > > I wanted to name it based on what we're trying to say to the user > > community: this next release I feel is still pre-1.0 quality. After > > more doc cleanup and usability cleanup and web polishing, I feel we're > > ready to try to make a broader announcement and call it 1.0. I'm > > thinking end of this January for that. > > > > > > - Mike > > > > > > > > > > > > > > > > feel free, justin. i'm currently editing stuff that i think should go > > into doc for the 12/20 release (e.g. describing features that exist > > but aren't documented, etc.). > > > > so, branch 1.0 will become release 0.10...seems a bit confusing to > > me...also considering the differences between 0.9 and what we're > > releasing doesn't calling it 1.0 make sense? just a thought...
> > > > ~sk > > > > > > On Mon, Dec 6, 2010 at 7:51 AM, Justin M Wozniak < wozniak at mcs.anl.gov > > > wrote: > > > > > > > > Sounds great- I was actually thinking about setting up the > > branch-specific docs later this week, do you already have a start on > > that? > > > > > > > > > > On Mon, 6 Dec 2010, Sarah Kenny wrote: > > > > > > > > so, my expectation for the release, as we've discussed somewhat on the > > list > > already, is to put out swift 1.0 on 12/20 which, as i see it, involves > > primarily editing of the documentation/web content more so than > > anything > > else since all new code (and documentation associated with the new > > code) > > going into trunk is expected to be in the 1.1. release--which > > hopefully we > > can have out in the next few months. i'm also assuming we're sticking > > with > > the plan to allow each release to have its own doc version along with > > the > > code. > > > > let me know if anyone thinks there are other things that can/should go > > into > > the 12/20 release. > > > > ~sk > > > > On Tue, Nov 23, 2010 at 2:10 PM, Michael Wilde < wilde at mcs.anl.gov > > > wrote: > > > > > > > > All, > > > > Sarah is going to take the lead in producing the next Swift release, > > and > > will propose a release definition and plan. We want to have the > > release done > > by Dec 20. 
> > > > - Mike > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > -- > > Justin M Wozniak > > > > > > > > > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >

From iraicu at cs.uchicago.edu Mon Dec 6 15:22:47 2010
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Mon, 06 Dec 2010 15:22:47 -0600
Subject: [Swift-devel] Next Swift release
In-Reply-To: <1435030155.178494.1291668580454.JavaMail.root@zimbra.anl.gov>
References: <1435030155.178494.1291668580454.JavaMail.root@zimbra.anl.gov>
Message-ID: <4CFD5427.4000907@cs.uchicago.edu>

How about 0.91, or 0.95?

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor
=================================================================
Computer Science Department
Illinois Institute of Technology
10 W. 31st Street
Stuart Building, Room 237D
Chicago, IL 60616
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
=================================================================
=================================================================

From wilde at mcs.anl.gov Mon Dec 6 15:34:48 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 6 Dec 2010 15:34:48 -0600 (CST)
Subject: [Swift-devel] Next Swift release
In-Reply-To:
Message-ID: <286224047.179006.1291671288684.JavaMail.root@zimbra.anl.gov>

I'm much more focused on the content than on what we call it.

I'm happy to call it anything except 1.0.

Thus 0.91 would be fine by me - to leave some headroom for more point releases if we're not ready for 1.0 at end of Jan.

Do people like that better?

Let's decide this asap so we can make the branch and testable RC.

- Mike

----- Original Message -----
> I would think 0.95 > 0.90 >> 0.10 > 0.9 if i turn off the scientist
> part and turn on the software engineer part of my brain :)
>
> just like how GNOME is now 2.28 , 2.30 , etc.
>
> But i do agree this type of numbering confuses scientists who are the
> main users of this software.
>
> -Allan

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From aespinosa at cs.uchicago.edu Mon Dec 6 15:29:09 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Mon, 6 Dec 2010 15:29:09 -0600
Subject: [Swift-devel] Next Swift release
In-Reply-To: <4CFD5427.4000907@cs.uchicago.edu>
References: <1435030155.178494.1291668580454.JavaMail.root@zimbra.anl.gov> <4CFD5427.4000907@cs.uchicago.edu>
Message-ID:

I would think 0.95 > 0.90 >> 0.10 > 0.9 if i turn off the scientist part and turn on the software engineer part of my brain :)

just like how GNOME is now 2.28 , 2.30 , etc.

But i do agree this type of numbering confuses scientists who are the main users of this software.

-Allan

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago

From wozniak at mcs.anl.gov Mon Dec 6 15:36:01 2010
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Mon, 6 Dec 2010 15:36:01 -0600 (CST)
Subject: [Swift-devel] Next Swift release
In-Reply-To: <286224047.179006.1291671288684.JavaMail.root@zimbra.anl.gov>
References: <286224047.179006.1291671288684.JavaMail.root@zimbra.anl.gov>
Message-ID:

I think 0.91 makes sense.

--
Justin M Wozniak

From skenny at uchicago.edu Mon Dec 6 15:43:46 2010
From: skenny at uchicago.edu (Sarah Kenny)
Date: Mon, 6 Dec 2010 13:43:46 -0800
Subject: [Swift-devel] Next Swift release
In-Reply-To:
References: <286224047.179006.1291671288684.JavaMail.root@zimbra.anl.gov>
Message-ID:

honestly if we're shooting for the end of january on a 1.0 release i think it would be better to branch trunk now for testing/debugging & doc and focus our effort on that rather than to also have an intermediary 12/20 release. releasing branch 1.0 as a stable release in the interim *kind of* made sense to me since it truly is stable, ready for release and would help update our web site so we're not specifying such an old version for download, but if we're talking about branching trunk in its current state i'd lean towards a 1.0 release on 1/31.

~sk

On Mon, Dec 6, 2010 at 1:36 PM, Justin M Wozniak wrote: > > I think 0.91 makes sense. > > > On Mon, 6 Dec 2010, Michael Wilde wrote: > > Im much more focused on the content than on what we call it. >> >> Im happy to call it anything except 1.0 >> >> Thus 0.91 would be fine by me - to leave some headroom for more point >> releases if we're not ready for 1.0 at end of Jan. >> >> Do people like that better? >> >> Lets decide this asap so we can make the branch and testable RC. >> >> - Mike >> >> >> ----- Original Message ----- >> >>> I would think 0.95 > 0.90 >> 0.10 > 0.9 if i turn off the scientist >>> part and turn on the software engineer part of my brain :) >>> >>> just like how GNOME is now 2.28 , 2.30 , etc. >>> >>> But i do agree this type of numbering confuses scientists who are the >>> main users of this software. >>> >>> -Allan >>> >>> 2010/12/6 Ioan Raicu : >>> >>>> How about 0.91, or 0.95?
>>>> >>>> >>>> >>>> >>>> On 12/6/2010 2:49 PM, Michael Wilde wrote: >>>> >>>>> >>>>>> here's how i understand it (feel free to correct me): >>>>>> >>>>>> 1.0 is the most recent stable branch ready for release--it's >>>>>> probably >>>>>> what most people *should* be downloading now if they want to start >>>>>> using swift, though our web site still has the 1.5 yr old .9 >>>>>> listed as >>>>>> the release download. >>>>>> >>>>> >>>>> Right - and thus almost no users know about or use the 1.0 branch. >>>>> I only use trunk, as do all the users that I'm working with. >>>>> >>>>> I believe trunk should be the basis for the 12/20 release. >>>>> >>>>> I do not feel we should release test what's in any of the "stable" >>>>> branches. >>>>> >>>>> Instead I feel we should "save" the 1.0 branch for when we are >>>>> ready for >>>>> doing a 1.0 release: say Jan 31 2011. >>>>> >>>>> I propose we create an 0.10 stable branch as the release candidate >>>>> for a >>>>> Dec 20 0.10, and that we use tags to mark release candidates in >>>>> this branch. >>>>> >>>>> trunk contains 'bleeding edge' code. for a 12/20 >>>>>> release we'd want to release something that does not have any new >>>>>> features currently being added to it (just bug fixes). >>>>>> >>>>> >>>>> Yes - but just bug fixes over current trunk. No new features, just >>>>> bug >>>>> fixes from tests and any user-reported bugs. If we can make a >>>>> release >>>>> candidate this week, we can have users starting to test thus 0.10 >>>>> RC in >>>>> parallel with our testing. >>>>> >>>>> i'm suggesting >>>>>> that we do add *some* new doc since that won't break anything and >>>>>> we >>>>>> need to do some cleanup there. >>>>>> >>>>> >>>>> Doc improvements for 0.10 sound good to me, but need to balance the >>>>> effort >>>>> required vs testing 0.10. >>>>> >>>>> but documenation for new features >>>>>> should go into the latest trunk doc. >>>>>> >>>>> >>>>> Agreed. 
But with "new features" defined as features beyond whats in >>>>> trunk >>>>> as of this moment. >>>>> >>>>> if we want to look at releasing what's in trunk RIGHT NOW, it >>>>>> seems to >>>>>> be it should be brached and go into testing mode if we want to get >>>>>> it >>>>>> to a point where it's stable enough to release (?) >>>>>> >>>>> >>>>> Yes, I agree, per above. Lets branch it asap. >>>>> >>>>> Does tagging releae candidates on this branch seem the way to go? >>>>> >>>>> that said, .9 vs branch 1.0 is a pretty significant upgrade...is >>>>>> why i >>>>>> suggested .10 was rather confusing as a name for it. >>>>>> >>>>> >>>>> I took the name 0.10 from a suggestion by Ben (long ago) to deal >>>>> with the >>>>> fact that we may need more point-releases between 0.9 and 1.0. >>>>> >>>>> I agree that 0.10 is a *bit* confusing, but Im hoping that this >>>>> release >>>>> has about a 6-week lifetime from 12/20 to 1/31. >>>>> >>>>> Sound OK? >>>>> >>>>> - Mike >>>>> >>>>> thoughts? >>>>>> >>>>>> ~sk >>>>>> >>>>>> >>>>>> On Mon, Dec 6, 2010 at 12:01 PM, Michael Wilde< wilde at mcs.anl.gov> >>>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Im loosing track, but I thought trunk will become branch 0.10? >>>>>> >>>>>> >>>>>> I wanted to name it based on what we're trying to say to the user >>>>>> community: this next release I feel is still pre-1.0 quality. >>>>>> After >>>>>> more doc cleanup and usability cleanup and web polishing, I feel >>>>>> we're >>>>>> ready to try to make a broader announcement and call it 1.0. Im >>>>>> thinking end of this January for that. >>>>>> >>>>>> >>>>>> - Mike >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> feel free, justin. i'm currently editing stuff that i think should >>>>>> go >>>>>> into doc for the 12/20 release (e.g. describing features that >>>>>> exist >>>>>> but aren't documented, etc.). 
>>>>>> >>>>>> so, branch 1.0 will become release 0.10...seems a bit confusing to >>>>>> me...also considering the differences between 0.9 and what we're >>>>>> releasing doesn't calling it 1.0 make sense? just a thought... >>>>>> >>>>>> ~sk >>>>>> >>>>>> >>>>>> On Mon, Dec 6, 2010 at 7:51 AM, Justin M Wozniak< >>>>>> wozniak at mcs.anl.gov >>>>>> >>>>>>> >>>>>>> wrote: >>>>>>> >>>>>> >>>>>> >>>>>> Sounds great- I was actually thinking about setting up the >>>>>> branch-specific docs later this week, do you already have a start >>>>>> on >>>>>> that? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Mon, 6 Dec 2010, Sarah Kenny wrote: >>>>>> >>>>>> >>>>>> >>>>>> so, my expectation for the release, as we've discussed somewhat on >>>>>> the >>>>>> list >>>>>> already, is to put out swift 1.0 on 12/20 which, as i see it, >>>>>> involves >>>>>> primarily editing of the documentation/web content more so than >>>>>> anything >>>>>> else since all new code (and documentation associated with the new >>>>>> code) >>>>>> going into trunk is expected to be in the 1.1. release--which >>>>>> hopefully we >>>>>> can have out in the next few months. i'm also assuming we're >>>>>> sticking >>>>>> with >>>>>> the plan to allow each release to have its own doc version along >>>>>> with >>>>>> the >>>>>> code. >>>>>> >>>>>> let me know if anyone thinks there are other things that >>>>>> can/should go >>>>>> into >>>>>> the 12/20 release. >>>>>> >>>>>> ~sk >>>>>> >>>>>> On Tue, Nov 23, 2010 at 2:10 PM, Michael Wilde< wilde at mcs.anl.gov> >>>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>>> All, >>>>>> >>>>>> Sarah is going to take the lead in producing the next Swift >>>>>> release, >>>>>> and >>>>>> will propose a release definition and plan. We want to have the >>>>>> release done >>>>>> by Dec 20. 
>>>>>>
>>>>>> - Mike
>>>>>>
>>>>>> _______________________________________________
>>>>>> Swift-devel mailing list
>>>>>> Swift-devel at ci.uchicago.edu
>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>
>>>>>>
>>>>>>
>>> --
>>> Allan M. Espinosa
>>> PhD student, Computer Science
>>> University of Chicago
>>> >
>>>
>>
>>
>>
> --
> Justin M Wozniak
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>

From wilde at mcs.anl.gov Mon Dec 6 16:03:12 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 6 Dec 2010 16:03:12 -0600 (CST)
Subject: [Swift-devel] Next Swift release
In-Reply-To:
Message-ID: <1047657812.179283.1291672992687.JavaMail.root@zimbra.anl.gov>

I think we need to move more toward a "release often" strategy, and I think a 12/20 release is a good step in that direction. I want to stay focused on that target.

To focus on that:

- call it 0.91
- based on trunk as of this week (i.e. as soon as we can branch)
- doc improvements as possible
- decide on platforms we can test by 12/20
- test is: language tests already in test suite
- platform tests to be added, based on "catsn" simple cat loop

Mike

----- Original Message -----

honestly if we're shooting for the end of january on a 1.0 release i think it would be better to branch trunk now for testing/debugging & doc and focus our effort on that rather than to also have an intermediary 12/20 release.

releasing branch 1.0 as a stable release in the interim *kind of* made sense to me since it truly is stable,ready for release and would help update our web site so we're not specifiying such an old version for download , but if we're talking about branching trunk in its current state i'd lean towards a 1.0 release on 1/31.
~sk On Mon, Dec 6, 2010 at 1:36 PM, Justin M Wozniak < wozniak at mcs.anl.gov > wrote: I think 0.91 makes sense. On Mon, 6 Dec 2010, Michael Wilde wrote: Im much more focused on the content than on what we call it. Im happy to call it anything except 1.0 Thus 0.91 would be fine by me - to leave some headroom for more point releases if we're not ready for 1.0 at end of Jan. Do people like that better? Lets decide this asap so we can make the branch and testable RC. - Mike ----- Original Message ----- I would think 0.95 > 0.90 >> 0.10 > 0.9 if i turn off the scientist part and turn on the software engineer part of my brain :) just like how GNOME is now 2.28 , 2.30 , etc. But i do agree this type of numbering confuses scientists who are the main users of this software. -Allan 2010/12/6 Ioan Raicu < iraicu at cs.uchicago.edu >: How about 0.91, or 0.95? On 12/6/2010 2:49 PM, Michael Wilde wrote: here's how i understand it (feel free to correct me): 1.0 is the most recent stable branch ready for release--it's probably what most people *should* be downloading now if they want to start using swift, though our web site still has the 1.5 yr old .9 listed as the release download. Right - and thus almost no users know about or use the 1.0 branch. I only use trunk, as do all the users that I'm working with. I believe trunk should be the basis for the 12/20 release. I do not feel we should release test what's in any of the "stable" branches. Instead I feel we should "save" the 1.0 branch for when we are ready for doing a 1.0 release: say Jan 31 2011. I propose we create an 0.10 stable branch as the release candidate for a Dec 20 0.10, and that we use tags to mark release candidates in this branch. trunk contains 'bleeding edge' code. for a 12/20 release we'd want to release something that does not have any new features currently being added to it (just bug fixes). Yes - but just bug fixes over current trunk. No new features, just bug fixes from tests and any user-reported bugs. 
If we can make a release candidate this week, we can have users starting to test thus 0.10 RC in parallel with our testing. i'm suggesting that we do add *some* new doc since that won't break anything and we need to do some cleanup there. Doc improvements for 0.10 sound good to me, but need to balance the effort required vs testing 0.10. but documenation for new features should go into the latest trunk doc. Agreed. But with "new features" defined as features beyond whats in trunk as of this moment. if we want to look at releasing what's in trunk RIGHT NOW, it seems to be it should be brached and go into testing mode if we want to get it to a point where it's stable enough to release (?) Yes, I agree, per above. Lets branch it asap. Does tagging releae candidates on this branch seem the way to go? that said, .9 vs branch 1.0 is a pretty significant upgrade...is why i suggested .10 was rather confusing as a name for it. I took the name 0.10 from a suggestion by Ben (long ago) to deal with the fact that we may need more point-releases between 0.9 and 1.0. I agree that 0.10 is a *bit* confusing, but Im hoping that this release has about a 6-week lifetime from 12/20 to 1/31. Sound OK? - Mike thoughts? ~sk On Mon, Dec 6, 2010 at 12:01 PM, Michael Wilde< wilde at mcs.anl.gov > wrote: Im loosing track, but I thought trunk will become branch 0.10? I wanted to name it based on what we're trying to say to the user community: this next release I feel is still pre-1.0 quality. After more doc cleanup and usability cleanup and web polishing, I feel we're ready to try to make a broader announcement and call it 1.0. Im thinking end of this January for that. - Mike feel free, justin. i'm currently editing stuff that i think should go into doc for the 12/20 release (e.g. describing features that exist but aren't documented, etc.). 
so, branch 1.0 will become release 0.10...seems a bit confusing to me...also considering the differences between 0.9 and what we're releasing doesn't calling it 1.0 make sense? just a thought... ~sk On Mon, Dec 6, 2010 at 7:51 AM, Justin M Wozniak< wozniak at mcs.anl.gov wrote: Sounds great- I was actually thinking about setting up the branch-specific docs later this week, do you already have a start on that? On Mon, 6 Dec 2010, Sarah Kenny wrote: so, my expectation for the release, as we've discussed somewhat on the list already, is to put out swift 1.0 on 12/20 which, as i see it, involves primarily editing of the documentation/web content more so than anything else since all new code (and documentation associated with the new code) going into trunk is expected to be in the 1.1. release--which hopefully we can have out in the next few months. i'm also assuming we're sticking with the plan to allow each release to have its own doc version along with the code. let me know if anyone thinks there are other things that can/should go into the 12/20 release. ~sk On Tue, Nov 23, 2010 at 2:10 PM, Michael Wilde< wilde at mcs.anl.gov > wrote: All, Sarah is going to take the lead in producing the next Swift release, and will propose a release definition and plan. We want to have the release done by Dec 20. - Mike _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Allan M. 
Espinosa < http://amespinosa.wordpress.com >
PhD student, Computer Science
University of Chicago < http://people.cs.uchicago.edu/~aespinosa >

--
Justin M Wozniak

_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From skenny at uchicago.edu Mon Dec 6 17:45:33 2010
From: skenny at uchicago.edu (Sarah Kenny)
Date: Mon, 6 Dec 2010 15:45:33 -0800
Subject: [Swift-devel] Next Swift release
In-Reply-To: <1047657812.179283.1291672992687.JavaMail.root@zimbra.anl.gov>
References: <1047657812.179283.1291672992687.JavaMail.root@zimbra.anl.gov>
Message-ID:

alrighty, so looks like we should branch as soon as possible. mihael and justin would you be ok with branching the current trunk into a release candidate tomorrow?

also, mike wilde and i were just discussing that it would be good to know if there are any significant features available in 'stable branch 1.0' that are NOT currently in trunk so that we could include them...any ideas on a good way to determine that?

~sk

On Mon, Dec 6, 2010 at 2:03 PM, Michael Wilde wrote:

> I think we need to move more toward a "release often" strategy, and I think
> a 12/20 release is a good step in that direction. I want to stay focused on
> that target.
>
> To focus on that:
>
> - call it 0.91
> - based on trunk as of this week (i.e.
as soon as we can branch) > - doc improvements as possible > - decide on platforms we can test by 12/20 > - test is: language tests already in test suite > - platform tests to be added, based on "catsn" simple cat loop > > Mike > > > ------------------------------ > > honestly if we're shooting for the end of january on a 1.0 release i think > it would be better to branch trunk now for testing/debugging & doc and focus > our effort on that rather than to also have an intermediary 12/20 release. > > releasing branch 1.0 as a stable release in the interim *kind of* made > sense to me since it truly is stable,ready for release and would help update > our web site so we're not specifiying such an old version for download , but > if we're talking about branching trunk in its current state i'd lean towards > a 1.0 release on 1/31. > > ~sk > > On Mon, Dec 6, 2010 at 1:36 PM, Justin M Wozniak wrote: > >> >> I think 0.91 makes sense. >> >> >> On Mon, 6 Dec 2010, Michael Wilde wrote: >> >> Im much more focused on the content than on what we call it. >>> >>> Im happy to call it anything except 1.0 >>> >>> Thus 0.91 would be fine by me - to leave some headroom for more point >>> releases if we're not ready for 1.0 at end of Jan. >>> >>> Do people like that better? >>> >>> Lets decide this asap so we can make the branch and testable RC. >>> >>> - Mike >>> >>> >>> ----- Original Message ----- >>> >>>> I would think 0.95 > 0.90 >> 0.10 > 0.9 if i turn off the scientist >>>> part and turn on the software engineer part of my brain :) >>>> >>>> just like how GNOME is now 2.28 , 2.30 , etc. >>>> >>>> But i do agree this type of numbering confuses scientists who are the >>>> main users of this software. >>>> >>>> -Allan >>>> >>>> 2010/12/6 Ioan Raicu : >>>> >>>>> How about 0.91, or 0.95? 
>>>>> >>>>> >>>>> >>>>> >>>>> On 12/6/2010 2:49 PM, Michael Wilde wrote: >>>>> >>>>>> >>>>>>> here's how i understand it (feel free to correct me): >>>>>>> >>>>>>> 1.0 is the most recent stable branch ready for release--it's >>>>>>> probably >>>>>>> what most people *should* be downloading now if they want to start >>>>>>> using swift, though our web site still has the 1.5 yr old .9 >>>>>>> listed as >>>>>>> the release download. >>>>>>> >>>>>> >>>>>> Right - and thus almost no users know about or use the 1.0 branch. >>>>>> I only use trunk, as do all the users that I'm working with. >>>>>> >>>>>> I believe trunk should be the basis for the 12/20 release. >>>>>> >>>>>> I do not feel we should release test what's in any of the "stable" >>>>>> branches. >>>>>> >>>>>> Instead I feel we should "save" the 1.0 branch for when we are >>>>>> ready for >>>>>> doing a 1.0 release: say Jan 31 2011. >>>>>> >>>>>> I propose we create an 0.10 stable branch as the release candidate >>>>>> for a >>>>>> Dec 20 0.10, and that we use tags to mark release candidates in >>>>>> this branch. >>>>>> >>>>>> trunk contains 'bleeding edge' code. for a 12/20 >>>>>>> release we'd want to release something that does not have any new >>>>>>> features currently being added to it (just bug fixes). >>>>>>> >>>>>> >>>>>> Yes - but just bug fixes over current trunk. No new features, just >>>>>> bug >>>>>> fixes from tests and any user-reported bugs. If we can make a >>>>>> release >>>>>> candidate this week, we can have users starting to test thus 0.10 >>>>>> RC in >>>>>> parallel with our testing. >>>>>> >>>>>> i'm suggesting >>>>>>> that we do add *some* new doc since that won't break anything and >>>>>>> we >>>>>>> need to do some cleanup there. >>>>>>> >>>>>> >>>>>> Doc improvements for 0.10 sound good to me, but need to balance the >>>>>> effort >>>>>> required vs testing 0.10. >>>>>> >>>>>> but documenation for new features >>>>>>> should go into the latest trunk doc. 
>>>>>>> >>>>>> >>>>>> Agreed. But with "new features" defined as features beyond whats in >>>>>> trunk >>>>>> as of this moment. >>>>>> >>>>>> if we want to look at releasing what's in trunk RIGHT NOW, it >>>>>>> seems to >>>>>>> be it should be brached and go into testing mode if we want to get >>>>>>> it >>>>>>> to a point where it's stable enough to release (?) >>>>>>> >>>>>> >>>>>> Yes, I agree, per above. Lets branch it asap. >>>>>> >>>>>> Does tagging releae candidates on this branch seem the way to go? >>>>>> >>>>>> that said, .9 vs branch 1.0 is a pretty significant upgrade...is >>>>>>> why i >>>>>>> suggested .10 was rather confusing as a name for it. >>>>>>> >>>>>> >>>>>> I took the name 0.10 from a suggestion by Ben (long ago) to deal >>>>>> with the >>>>>> fact that we may need more point-releases between 0.9 and 1.0. >>>>>> >>>>>> I agree that 0.10 is a *bit* confusing, but Im hoping that this >>>>>> release >>>>>> has about a 6-week lifetime from 12/20 to 1/31. >>>>>> >>>>>> Sound OK? >>>>>> >>>>>> - Mike >>>>>> >>>>>> thoughts? >>>>>>> >>>>>>> ~sk >>>>>>> >>>>>>> >>>>>>> On Mon, Dec 6, 2010 at 12:01 PM, Michael Wilde< wilde at mcs.anl.gov> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Im loosing track, but I thought trunk will become branch 0.10? >>>>>>> >>>>>>> >>>>>>> I wanted to name it based on what we're trying to say to the user >>>>>>> community: this next release I feel is still pre-1.0 quality. >>>>>>> After >>>>>>> more doc cleanup and usability cleanup and web polishing, I feel >>>>>>> we're >>>>>>> ready to try to make a broader announcement and call it 1.0. Im >>>>>>> thinking end of this January for that. >>>>>>> >>>>>>> >>>>>>> - Mike >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> feel free, justin. i'm currently editing stuff that i think should >>>>>>> go >>>>>>> into doc for the 12/20 release (e.g. describing features that >>>>>>> exist >>>>>>> but aren't documented, etc.). 
>>>>>>> >>>>>>> so, branch 1.0 will become release 0.10...seems a bit confusing to >>>>>>> me...also considering the differences between 0.9 and what we're >>>>>>> releasing doesn't calling it 1.0 make sense? just a thought... >>>>>>> >>>>>>> ~sk >>>>>>> >>>>>>> >>>>>>> On Mon, Dec 6, 2010 at 7:51 AM, Justin M Wozniak< >>>>>>> wozniak at mcs.anl.gov >>>>>>> >>>>>>>> >>>>>>>> wrote: >>>>>>>> >>>>>>> >>>>>>> >>>>>>> Sounds great- I was actually thinking about setting up the >>>>>>> branch-specific docs later this week, do you already have a start >>>>>>> on >>>>>>> that? >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Mon, 6 Dec 2010, Sarah Kenny wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> so, my expectation for the release, as we've discussed somewhat on >>>>>>> the >>>>>>> list >>>>>>> already, is to put out swift 1.0 on 12/20 which, as i see it, >>>>>>> involves >>>>>>> primarily editing of the documentation/web content more so than >>>>>>> anything >>>>>>> else since all new code (and documentation associated with the new >>>>>>> code) >>>>>>> going into trunk is expected to be in the 1.1. release--which >>>>>>> hopefully we >>>>>>> can have out in the next few months. i'm also assuming we're >>>>>>> sticking >>>>>>> with >>>>>>> the plan to allow each release to have its own doc version along >>>>>>> with >>>>>>> the >>>>>>> code. >>>>>>> >>>>>>> let me know if anyone thinks there are other things that >>>>>>> can/should go >>>>>>> into >>>>>>> the 12/20 release. >>>>>>> >>>>>>> ~sk >>>>>>> >>>>>>> On Tue, Nov 23, 2010 at 2:10 PM, Michael Wilde< wilde at mcs.anl.gov> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> All, >>>>>>> >>>>>>> Sarah is going to take the lead in producing the next Swift >>>>>>> release, >>>>>>> and >>>>>>> will propose a release definition and plan. We want to have the >>>>>>> release done >>>>>>> by Dec 20. 
>>>>>>>
>>>>>>> - Mike
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Swift-devel mailing list
>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>>
>>>>>>>
>>>>>>>
>>>> --
>>>> Allan M. Espinosa
>>>> PhD student, Computer Science
>>>> University of Chicago
>>>> >
>>>>
>>>
>>>
>>>
>> --
>> Justin M Wozniak
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>
>
>
>
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>

From aespinosa at cs.uchicago.edu Mon Dec 6 17:48:00 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Mon, 6 Dec 2010 17:48:00 -0600
Subject: [Swift-devel] data channel reuse (was Re: GridFTP small-file optimizations in Swift)
Message-ID:

Is there a property to configure channel reuse?

2009/5/2 Mihael Hategan :
>
> Data channels are re-used. Clients/connections are cached based not on
> jobs but on site and time (i.e. they have a maximum idle time).
>
>>

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago

From hategan at mcs.anl.gov Mon Dec 6 21:26:06 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 06 Dec 2010 19:26:06 -0800
Subject: [Swift-devel] Next Swift release
In-Reply-To: <623912472.178112.1291665684520.JavaMail.root@zimbra.anl.gov>
References: <623912472.178112.1291665684520.JavaMail.root@zimbra.anl.gov>
Message-ID: <1291692366.27727.0.camel@blabla2.none>

I am of the opinion that if we have a choice of releasing branches/1.0 now or releasing trunk at the end of January, we should release branches/1.0 now.
Mihael On Mon, 2010-12-06 at 14:01 -0600, Michael Wilde wrote: > Im loosing track, but I thought trunk will become branch 0.10? > > > I wanted to name it based on what we're trying to say to the user > community: this next release I feel is still pre-1.0 quality. After > more doc cleanup and usability cleanup and web polishing, I feel we're > ready to try to make a broader announcement and call it 1.0. Im > thinking end of this January for that. > > > - Mike > > > ______________________________________________________________________ > feel free, justin. i'm currently editing stuff that i think > should go into doc for the 12/20 release (e.g. describing > features that exist but aren't documented, etc.). > > so, branch 1.0 will become release 0.10...seems a bit > confusing to me...also considering the differences between 0.9 > and what we're releasing doesn't calling it 1.0 make sense? > just a thought... > > ~sk > > On Mon, Dec 6, 2010 at 7:51 AM, Justin M Wozniak > wrote: > > Sounds great- I was actually thinking about setting up > the branch-specific docs later this week, do you > already have a start on that? > > > > On Mon, 6 Dec 2010, Sarah Kenny wrote: > > so, my expectation for the release, as we've > discussed somewhat on the list > already, is to put out swift 1.0 on 12/20 > which, as i see it, involves > primarily editing of the documentation/web > content more so than anything > else since all new code (and documentation > associated with the new code) > going into trunk is expected to be in the 1.1. > release--which hopefully we > can have out in the next few months. i'm also > assuming we're sticking with > the plan to allow each release to have its own > doc version along with the > code. > > let me know if anyone thinks there are other > things that can/should go into > the 12/20 release. 
> > ~sk > > On Tue, Nov 23, 2010 at 2:10 PM, Michael Wilde > wrote: > > All, > > Sarah is going to take the lead in > producing the next Swift release, and > will propose a release definition and > plan. We want to have the release done > by Dec 20. > > - Mike > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > -- > Justin M Wozniak > > > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Mon Dec 6 21:27:03 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 06 Dec 2010 19:27:03 -0800 Subject: [Swift-devel] Next Swift release In-Reply-To: References: <1435030155.178494.1291668580454.JavaMail.root@zimbra.anl.gov> Message-ID: <1291692423.27727.1.camel@blabla2.none> I like what Glen says. Mihael On Mon, 2010-12-06 at 16:05 -0500, Glen Hocky wrote: > Been following along. Just a random suggestion but perhaps if you > called this next release *0.10.0* people would realize that it's > zero-point-ten-point-oh as in 0.10.0> 0.9 not zero-point-one-oh as in > 0.10<0.9 > > > -Glen > > On Mon, Dec 6, 2010 at 3:49 PM, Michael Wilde > wrote: > > here's how i understand it (feel free to correct me): > > > > 1.0 is the most recent stable branch ready for release--it's > probably > > what most people *should* be downloading now if they want to > start > > using swift, though our web site still has the 1.5 yr old .9 > listed as > > the release download. > > > Right - and thus almost no users know about or use the 1.0 > branch. > I only use trunk, as do all the users that I'm working with. > > I believe trunk should be the basis for the 12/20 release. 
> > I do not feel we should release test what's in any of the > "stable" branches. > > Instead I feel we should "save" the 1.0 branch for when we are > ready for doing a 1.0 release: say Jan 31 2011. > > I propose we create an 0.10 stable branch as the release > candidate for a Dec 20 0.10, and that we use tags to mark > release candidates in this branch. > > > trunk contains 'bleeding edge' code. for a 12/20 > > release we'd want to release something that does not have > any new > > features currently being added to it (just bug fixes). > > > Yes - but just bug fixes over current trunk. No new features, > just bug fixes from tests and any user-reported bugs. If we > can make a release candidate this week, we can have users > starting to test thus 0.10 RC in parallel with our testing. > > > i'm suggesting > > that we do add *some* new doc since that won't break > anything and we > > need to do some cleanup there. > > > Doc improvements for 0.10 sound good to me, but need to > balance the effort required vs testing 0.10. > > > but documenation for new features > > should go into the latest trunk doc. > > > Agreed. But with "new features" defined as features beyond > whats in trunk as of this moment. > > > if we want to look at releasing what's in trunk RIGHT NOW, > it seems to > > be it should be brached and go into testing mode if we want > to get it > > to a point where it's stable enough to release (?) > > > Yes, I agree, per above. Lets branch it asap. > > Does tagging releae candidates on this branch seem the way to > go? > > > that said, .9 vs branch 1.0 is a pretty significant > upgrade...is why i > > suggested .10 was rather confusing as a name for it. > > > I took the name 0.10 from a suggestion by Ben (long ago) to > deal with the fact that we may need more point-releases > between 0.9 and 1.0. > > I agree that 0.10 is a *bit* confusing, but Im hoping that > this release has about a 6-week lifetime from 12/20 to 1/31. > > Sound OK? 
> > - Mike > > > > thoughts? > > > > ~sk > > > > > > On Mon, Dec 6, 2010 at 12:01 PM, Michael Wilde < > wilde at mcs.anl.gov > > > wrote: > > > > > > > > > > Im loosing track, but I thought trunk will become branch > 0.10? > > > > > > I wanted to name it based on what we're trying to say to the > user > > community: this next release I feel is still pre-1.0 > quality. After > > more doc cleanup and usability cleanup and web polishing, I > feel we're > > ready to try to make a broader announcement and call it 1.0. > Im > > thinking end of this January for that. > > > > > > - Mike > > > > > > > > > > > > > > > > feel free, justin. i'm currently editing stuff that i think > should go > > into doc for the 12/20 release (e.g. describing features > that exist > > but aren't documented, etc.). > > > > so, branch 1.0 will become release 0.10...seems a bit > confusing to > > me...also considering the differences between 0.9 and what > we're > > releasing doesn't calling it 1.0 make sense? just a > thought... > > > > ~sk > > > > > > On Mon, Dec 6, 2010 at 7:51 AM, Justin M Wozniak < > wozniak at mcs.anl.gov > > > wrote: > > > > > > > > Sounds great- I was actually thinking about setting up the > > branch-specific docs later this week, do you already have a > start on > > that? > > > > > > > > > > On Mon, 6 Dec 2010, Sarah Kenny wrote: > > > > > > > > so, my expectation for the release, as we've discussed > somewhat on the > > list > > already, is to put out swift 1.0 on 12/20 which, as i see > it, involves > > primarily editing of the documentation/web content more so > than > > anything > > else since all new code (and documentation associated with > the new > > code) > > going into trunk is expected to be in the 1.1. > release--which > > hopefully we > > can have out in the next few months. i'm also assuming we're > sticking > > with > > the plan to allow each release to have its own doc version > along with > > the > > code. 
> > > > let me know if anyone thinks there are other things that > can/should go > > into > > the 12/20 release. > > > > ~sk > > > > On Tue, Nov 23, 2010 at 2:10 PM, Michael Wilde < > wilde at mcs.anl.gov > > > wrote: > > > > > > > > All, > > > > Sarah is going to take the lead in producing the next Swift > release, > > and > > will propose a release definition and plan. We want to have > the > > release done > > by Dec 20. > > > > - Mike > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > -- > > Justin M Wozniak > > > > > > > > > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Mon Dec 6 21:28:10 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 06 Dec 2010 19:28:10 -0800 Subject: [Swift-devel] Next Swift release In-Reply-To: References: <286224047.179006.1291671288684.JavaMail.root@zimbra.anl.gov> Message-ID: <1291692490.27727.2.camel@blabla2.none> Ok. I also like what Justin says. Mihael On Mon, 2010-12-06 at 15:36 -0600, Justin M Wozniak wrote: > I think 0.91 makes sense. > > On Mon, 6 Dec 2010, Michael Wilde wrote: > > > Im much more focused on the content than on what we call it. 
> > > > Im happy to call it anything except 1.0 > > > > Thus 0.91 would be fine by me - to leave some headroom for more point releases if we're not ready for 1.0 at end of Jan. > > > > Do people like that better? > > > > Lets decide this asap so we can make the branch and testable RC. > > > > - Mike > > > > > > ----- Original Message ----- > >> I would think 0.95 > 0.90 >> 0.10 > 0.9 if i turn off the scientist > >> part and turn on the software engineer part of my brain :) > >> > >> just like how GNOME is now 2.28 , 2.30 , etc. > >> > >> But i do agree this type of numbering confuses scientists who are the > >> main users of this software. > >> > >> -Allan > >> > >> 2010/12/6 Ioan Raicu : > >>> How about 0.91, or 0.95? > >>> > >>> > >>> > >>> > >>> On 12/6/2010 2:49 PM, Michael Wilde wrote: > >>>>> > >>>>> here's how i understand it (feel free to correct me): > >>>>> > >>>>> 1.0 is the most recent stable branch ready for release--it's > >>>>> probably > >>>>> what most people *should* be downloading now if they want to start > >>>>> using swift, though our web site still has the 1.5 yr old .9 > >>>>> listed as > >>>>> the release download. > >>>> > >>>> Right - and thus almost no users know about or use the 1.0 branch. > >>>> I only use trunk, as do all the users that I'm working with. > >>>> > >>>> I believe trunk should be the basis for the 12/20 release. > >>>> > >>>> I do not feel we should release test what's in any of the "stable" > >>>> branches. > >>>> > >>>> Instead I feel we should "save" the 1.0 branch for when we are > >>>> ready for > >>>> doing a 1.0 release: say Jan 31 2011. > >>>> > >>>> I propose we create an 0.10 stable branch as the release candidate > >>>> for a > >>>> Dec 20 0.10, and that we use tags to mark release candidates in > >>>> this branch. > >>>> > >>>>> trunk contains 'bleeding edge' code. 
for a 12/20 > >>>>> release we'd want to release something that does not have any new > >>>>> features currently being added to it (just bug fixes). > >>>> > >>>> Yes - but just bug fixes over current trunk. No new features, just > >>>> bug > >>>> fixes from tests and any user-reported bugs. If we can make a > >>>> release > >>>> candidate this week, we can have users starting to test thus 0.10 > >>>> RC in > >>>> parallel with our testing. > >>>> > >>>>> i'm suggesting > >>>>> that we do add *some* new doc since that won't break anything and > >>>>> we > >>>>> need to do some cleanup there. > >>>> > >>>> Doc improvements for 0.10 sound good to me, but need to balance the > >>>> effort > >>>> required vs testing 0.10. > >>>> > >>>>> but documenation for new features > >>>>> should go into the latest trunk doc. > >>>> > >>>> Agreed. But with "new features" defined as features beyond whats in > >>>> trunk > >>>> as of this moment. > >>>> > >>>>> if we want to look at releasing what's in trunk RIGHT NOW, it > >>>>> seems to > >>>>> be it should be brached and go into testing mode if we want to get > >>>>> it > >>>>> to a point where it's stable enough to release (?) > >>>> > >>>> Yes, I agree, per above. Lets branch it asap. > >>>> > >>>> Does tagging releae candidates on this branch seem the way to go? > >>>> > >>>>> that said, .9 vs branch 1.0 is a pretty significant upgrade...is > >>>>> why i > >>>>> suggested .10 was rather confusing as a name for it. > >>>> > >>>> I took the name 0.10 from a suggestion by Ben (long ago) to deal > >>>> with the > >>>> fact that we may need more point-releases between 0.9 and 1.0. > >>>> > >>>> I agree that 0.10 is a *bit* confusing, but Im hoping that this > >>>> release > >>>> has about a 6-week lifetime from 12/20 to 1/31. > >>>> > >>>> Sound OK? > >>>> > >>>> - Mike > >>>> > >>>>> thoughts? 
> >>>>> > >>>>> ~sk > >>>>> > >>>>> > >>>>> On Mon, Dec 6, 2010 at 12:01 PM, Michael Wilde< wilde at mcs.anl.gov> > >>>>> wrote: > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> Im loosing track, but I thought trunk will become branch 0.10? > >>>>> > >>>>> > >>>>> I wanted to name it based on what we're trying to say to the user > >>>>> community: this next release I feel is still pre-1.0 quality. > >>>>> After > >>>>> more doc cleanup and usability cleanup and web polishing, I feel > >>>>> we're > >>>>> ready to try to make a broader announcement and call it 1.0. Im > >>>>> thinking end of this January for that. > >>>>> > >>>>> > >>>>> - Mike > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> feel free, justin. i'm currently editing stuff that i think should > >>>>> go > >>>>> into doc for the 12/20 release (e.g. describing features that > >>>>> exist > >>>>> but aren't documented, etc.). > >>>>> > >>>>> so, branch 1.0 will become release 0.10...seems a bit confusing to > >>>>> me...also considering the differences between 0.9 and what we're > >>>>> releasing doesn't calling it 1.0 make sense? just a thought... > >>>>> > >>>>> ~sk > >>>>> > >>>>> > >>>>> On Mon, Dec 6, 2010 at 7:51 AM, Justin M Wozniak< > >>>>> wozniak at mcs.anl.gov > >>>>>> > >>>>>> wrote: > >>>>> > >>>>> > >>>>> Sounds great- I was actually thinking about setting up the > >>>>> branch-specific docs later this week, do you already have a start > >>>>> on > >>>>> that? > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> On Mon, 6 Dec 2010, Sarah Kenny wrote: > >>>>> > >>>>> > >>>>> > >>>>> so, my expectation for the release, as we've discussed somewhat on > >>>>> the > >>>>> list > >>>>> already, is to put out swift 1.0 on 12/20 which, as i see it, > >>>>> involves > >>>>> primarily editing of the documentation/web content more so than > >>>>> anything > >>>>> else since all new code (and documentation associated with the new > >>>>> code) > >>>>> going into trunk is expected to be in the 1.1. 
release--which > >>>>> hopefully we > >>>>> can have out in the next few months. i'm also assuming we're > >>>>> sticking > >>>>> with > >>>>> the plan to allow each release to have its own doc version along > >>>>> with > >>>>> the > >>>>> code. > >>>>> > >>>>> let me know if anyone thinks there are other things that > >>>>> can/should go > >>>>> into > >>>>> the 12/20 release. > >>>>> > >>>>> ~sk > >>>>> > >>>>> On Tue, Nov 23, 2010 at 2:10 PM, Michael Wilde< wilde at mcs.anl.gov> > >>>>> wrote: > >>>>> > >>>>> > >>>>> > >>>>> All, > >>>>> > >>>>> Sarah is going to take the lead in producing the next Swift > >>>>> release, > >>>>> and > >>>>> will propose a release definition and plan. We want to have the > >>>>> release done > >>>>> by Dec 20. > >>>>> > >>>>> - Mike > >>>>> > >>>>> _______________________________________________ > >>>>> Swift-devel mailing list > >>>>> Swift-devel at ci.uchicago.edu > >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>> > >>>>> > >> > >> -- > >> Allan M. Espinosa > >> PhD student, Computer Science > >> University of Chicago > > > > > From skenny at uchicago.edu Tue Dec 7 16:15:33 2010 From: skenny at uchicago.edu (Sarah Kenny) Date: Tue, 7 Dec 2010 14:15:33 -0800 Subject: [Swift-devel] Next Swift release In-Reply-To: <1291692366.27727.0.camel@blabla2.none> References: <623912472.178112.1291665684520.JavaMail.root@zimbra.anl.gov> <1291692366.27727.0.camel@blabla2.none> Message-ID: mike wilde was suggesting that trunk and branches/1.0 are equally stable so he wants to do a release of what's currently in trunk (which means we need to branch it). the users i'm supporting only use branches/1.0 and have been doing so reliably for many months...the actual 'usability' of trunk i'm less familiar with but mike was hoping to put effort towards releasing that rather than spending time/effort on releasing branches/1.0 since it lacks many of the newer fixes and features. 
~sk

On Mon, Dec 6, 2010 at 7:26 PM, Mihael Hategan wrote:

> I am of the opinion that if we have a choice of releasing branches/1.0
> now or releasing trunk at the end of January, we should release
> branches/1.0 now.
>
> Mihael

From wilde at mcs.anl.gov Tue Dec 7 16:23:46 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 7 Dec 2010 16:23:46 -0600 (CST)
Subject: [Swift-devel] Next Swift release
In-Reply-To:
Message-ID: <1052475130.186211.1291760626125.JavaMail.root@zimbra.anl.gov>

That's correct - from experience I feel that trunk is usable and likely on par with stable-1.0 in terms of reliability, especially for basic applications.

So what I'm eager to do here is to develop and run the tests we've long needed to validate a release on a set of site configs, and then run Trunk -> Stable 0.91 through those tests, and make a release by 12/20.

- Mike

----- Original Message -----
mike wilde was suggesting that trunk and branches/1.0 are equally stable so he wants to do a release of what's currently in trunk (which means we need to branch it). the users i'm supporting only use branches/1.0 and have been doing so reliably for many months...the actual 'usability' of trunk i'm less familiar with but mike was hoping to put effort towards releasing that rather than spending time/effort on releasing branches/1.0 since it lacks many of the newer fixes and features.

~sk

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Tue Dec 7 21:36:09 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 07 Dec 2010 19:36:09 -0800
Subject: [Swift-devel] Next Swift release
In-Reply-To:
References: <1047657812.179283.1291672992687.JavaMail.root@zimbra.anl.gov>
Message-ID: <1291779369.4811.2.camel@blabla2.none>

On Mon, 2010-12-06 at 15:45 -0800, Sarah Kenny wrote:
> alrighty, so looks like we should branch as soon as possible. mihael
> and justin would you be ok with branching the current trunk into a
> release candidate tomorrow?

Sorry. I should have mentioned this, but this is finals week. On the up side (though that depends on perspective), I'll be in Chicago next week and stay there until Jan 1st.

> also, mike wilde and i were just discussing that it would be good to
> know if there are any significant features available in 'stable branch
> 1.0' that are NOT currently in trunk so that we could include
> them...any ideas on a good way to determine that?

We (I) would need to go through it and merge any relevant bug fixes that went into the branch since it was created.

Mihael

From wilde at mcs.anl.gov Wed Dec 8 09:52:31 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 8 Dec 2010 09:52:31 -0600 (CST)
Subject: [Swift-devel] Next Swift release
In-Reply-To: <1291779369.4811.2.camel@blabla2.none>
Message-ID: <4623501.188986.1291823551039.JavaMail.root@zimbra.anl.gov>

Mihael, of course feel free to set all this aside till finals are past. We'll push forward on the discussion (and on other devel issues) fully understanding that you'll be focused on school this week.

One item I would *like* to address next week is validating the current SGE provider on the many diverse SGE systems that users are asking for. We now also have a request to support a machine in Edinburgh ("Eddie" ;) with PEs that don't match any we've seen before.

Eddie's PEs are described at:

https://www.wiki.ed.ac.uk/display/ecdfwiki/Documentation (Eddie home page)

This web post alludes to a method to pretty much bypass PE processing:

https://www.wiki.ed.ac.uk/display/ecdfwiki/Parallel+Environments
http://gridengine.info/2005/09/19/parallel-environments-pes-loose-vs-tight-integration

It's possible that the Swift SGE provider already does this and just needs to get a PE spec that gets the right number of nodes or cores allocated. If not, it seems desirable to have it behave in that manner.

We should ideally plan on testing SGE on:

Ranger (local + GT2); IBI; Godzilla; sisboombah (the latter two are UC PSD clusters)

Does anyone on this list know of an SGE guru we can turn to for advice?

- Mike

----- Original Message -----
> On Mon, 2010-12-06 at 15:45 -0800, Sarah Kenny wrote:
> > alrighty, so looks like we should branch as soon as possible. mihael
> > and justin would you be ok with branching the current trunk into a
> > release candidate tomorrow?
>
> Sorry. I should have mentioned this, but this is finals week. On the up
> side (though that depends on perspective), I'll be in Chicago next week
> and stay there until Jan 1st.
>
> > also, mike wilde and i were just discussing that it would be good to
> > know if there are any significant features available in 'stable branch
> > 1.0' that are NOT currently in trunk so that we could include
> > them...any ideas on a good way to determine that?
>
> We (I) would need to go through it and merge any relevant bug fixes that
> went into the branch since it was created.
>
> Mihael

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From skenny at uchicago.edu Wed Dec 8 14:38:56 2010
From: skenny at uchicago.edu (Sarah Kenny)
Date: Wed, 8 Dec 2010 12:38:56 -0800
Subject: [Swift-devel] Next Swift release
In-Reply-To: <1291779369.4811.2.camel@blabla2.none>
References: <1047657812.179283.1291672992687.JavaMail.root@zimbra.anl.gov> <1291779369.4811.2.camel@blabla2.none>
Message-ID:

On Tue, Dec 7, 2010 at 7:36 PM, Mihael Hategan wrote:

> On Mon, 2010-12-06 at 15:45 -0800, Sarah Kenny wrote:
> > alrighty, so looks like we should branch as soon as possible. miheal
> > and justin would you be ok with branching the current trunk into a
> > release candidate tomorrow?
>
> Sorry. I should have mentioned this, but this is finals week. On the up
> side (though that depends on perspective), I'll be in Chicago next week
> and stay there until Jan 1st.

fyi, i'll be in chicago 12/22 thru 1/3

also i think we should really confuse our users and name the next swift release VDL2.91 (kidding ;)

> > also, mike wilde and i were just discussing that it would be good to
> > know if there are any significant features available in 'stable branch
> > 1.0' that are NOT currently in trunk so that we could include
> > them...any ideas on a good way to determine that?
>
> We (I) would need to go through it and merge any relevant bug fixes that
> went into the branch since it was created.
>
> Mihael

From hategan at mcs.anl.gov Thu Dec 9 00:48:42 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 08 Dec 2010 22:48:42 -0800
Subject: [Swift-devel] Next Swift release
In-Reply-To:
References: <623912472.178112.1291665684520.JavaMail.root@zimbra.anl.gov> <1291692366.27727.0.camel@blabla2.none>
Message-ID: <1291877322.8057.10.camel@blabla2.none>

So I am currently of the opinion that we should first release branches/1.0. In that process, we should try to see if we can streamline our release process such that future releases are easier to do.

Traditionally, Ben was the one to do releases and that worked so I didn't ask questions.

I say that because of the amount of testing that 1.0 has received. It may be that trunk is equally stable, but the uncertainty about that is higher, and I think that the point of having stable branches is to reduce that uncertainty.

There is probably going to be a conflict between stability and features when it comes to stable branch vs. trunk, and I think that it boils down to the weight we put on each of those qualities.

But it's ultimately up to you (plural).

Mihael

On Tue, 2010-12-07 at 14:15 -0800, Sarah Kenny wrote:
> mike wilde was suggesting that trunk and branches/1.0 are equally
> stable so he wants to do a release of what's currently in trunk (which
> means we need to branch it). the users i'm supporting only use
> branches/1.0 and have been doing so reliably for many months...the
> actual 'usability' of trunk i'm less familiar with but mike was hoping
> to put effort towards releasing that rather than spending time/effort
> on releasing branches/1.0 since it lacks many of the newer fixes and
> features.
>
> ~sk
>
> On Mon, Dec 6, 2010 at 7:26 PM, Mihael Hategan wrote:
> I am of the opinion that if we have a choice of releasing branches/1.0
> now or releasing trunk at the end of January, we should release
> branches/1.0 now.
>
> Mihael

From wilde at mcs.anl.gov Thu Dec 9 08:33:23 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 9 Dec 2010 08:33:23 -0600 (CST)
Subject: [Swift-devel] Next Swift release
In-Reply-To: <1291877322.8057.10.camel@blabla2.none>
Message-ID: <400599527.338.1291905203161.JavaMail.root@zimbra.anl.gov>

My thinking on this is:

- trunk has many fixes that the users I'm working with depend on, in particular to coasters.
- most users I've worked with are running trunk from builds I've placed on a range of machines
- I *suspect* that trunk is as stable as branches/1.0
- the main point of this release effort is to test a release on a set of site configs, in addition to the localhost language tests
- if we're doing that, we should push forward and test a branch of today's trunk (call it branches/0.91) and make trunk pass those tests.
- if we see that this 0.91 does not readily pass site-based stress tests, we can fall back and run those same tests against branches/0.91. In this case I would do some re-naming: 1.0 to 0.91, and trunk to 0.92 - i.e.
saving the "1.0" number for a release that does more cleanup of docs and config tools, and is timed for a point when we can do more effective publicity for it.

That's what I would like to do. I suspect that the work to get trunk stable (i.e. passing site-based stress tests) vs the current branches/1.0 stable will be about equal. Even if testing on a trunk-based release takes us longer (i.e. stretches into January) I feel it will be worth the effort.

Mihael, I'd see two major tasks you'd need to take on in this: integration of branches/1.0 fixes and trunk fixes into 0.91 (or 0.92 if we want to allow a fallback), and fixing bugs deemed to be release showstoppers.

Can we all live with this plan and move forward on it now?

- Mike

----- Original Message -----
> So I am currently of the opinion that we should first release
> branches/1.0. In that process, we should try to see if we can streamline
> our release process such that future releases are easier to do.
>
> Traditionally, Ben was the one to do releases and that worked so I
> didn't ask questions.
>
> I say that because of the amount of testing that 1.0 has received.
> It may be that trunk is equally stable, but the uncertainty about that
> is higher, and I think that the point of having stable branches is to
> reduce that uncertainty.
>
> There is probably going to be a conflict between stability and features
> when it comes to stable branch vs. trunk, and I think that it boils down
> to the weight we put on each of those qualities.
>
> But it's ultimately up to you (plural).
>
> Mihael

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From iraicu at cs.iit.edu Thu Dec 9 13:13:55 2010
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Thu, 09 Dec 2010 13:13:55 -0600
Subject: [Swift-devel] CFP: Workshop on Data Intensive Computing in the Clouds (DataCloud) 2011, deadline extended to January 3rd, 2011
Message-ID: <4D012A73.7060503@cs.iit.edu>

---------------------------------------------------------------------------------
*** Call for Papers ***
WORKSHOP ON DATA INTENSIVE COMPUTING IN THE CLOUDS (DATACLOUD 2011)
In conjunction with IPDPS 2011, May 16, Anchorage, Alaska
http://www.cse.buffalo.edu/faculty/tkosar/datacloud2011/index.php
---------------------------------------------------------------------------------

The First International Workshop on Data Intensive Computing in the Clouds (DataCloud2011) will be held in conjunction with the 25th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2011), in Anchorage, Alaska.

Applications and experiments in all areas of science are becoming increasingly complex and more demanding in terms of their computational and data requirements. Some applications generate data volumes reaching hundreds of terabytes and even petabytes.
As scientific applications become more data intensive, the management of data resources and dataflow between the storage and compute resources is becoming the main bottleneck. Analyzing, visualizing, and disseminating these large data sets has become a major challenge and data intensive computing is now considered as the "fourth paradigm" in scientific discovery after theoretical, experimental, and computational science. DataCloud2011 will provide the scientific community a dedicated forum for discussing new research, development, and deployment efforts in running data-intensive computing workloads on Cloud Computing infrastructures. The DataCloud2011 workshop will focus on the use of cloud-based technologies to meet the new data intensive scientific challenges that are not well served by the current supercomputers, grids or compute-intensive clouds. We believe the workshop will be an excellent place to help the community define the current state, determine future goals, and present architectures and services for future clouds supporting data intensive computing. 
TOPICS --------------------------------------------------------------------------------- - Data-intensive cloud computing applications, characteristics, challenges - Case studies of data intensive computing in the clouds - Performance evaluation of data clouds, data grids, and data centers - Energy-efficient data cloud design and management - Data placement, scheduling, and interoperability in the clouds - Accountability, QoS, and SLAs - Data privacy and protection in a public cloud environment - Distributed file systems for clouds - Data streaming and parallelization - New programming models for data-intensive cloud computing - Scalability issues in clouds - Social computing and massively social gaming - 3D Internet and implications - Future research challenges in data-intensive cloud computing IMPORTANT DATES --------------------------------------------------------------------------------- Paper submission: January 3rd, 2011 Acceptance notification: February 1st, 2011 Final papers due: February 15th, 2011 PAPER SUBMISSION --------------------------------------------------------------------------------- DataCloud2011 invites authors to submit original and unpublished technical papers. All submissions will be peer-reviewed and judged on correctness, originality, technical strength, significance, quality of presentation, and relevance to the workshop topics of interest. Submitted papers may not have appeared in or be under consideration for another workshop, conference or a journal, nor may they be under review or submitted to another forum during the DataCloud2011 review process. Submitted papers may not exceed 10 single-spaced double-column pages using 10-point size font on 8.5x11 inch pages (IEEE conference style, document templates can be found at ftp://pubftp.computer.org/Press/Outgoing/proceedings/instruct8.5x11.pdf and ftp://pubftp.computer.org/Press/Outgoing/proceedings/instruct8.5x11.doc), including figures, tables, and references. 
The final 10 page papers (PDF format) must be submitted online at https://cmt.research.microsoft.com/DataCloud2011/ before the deadline of January 3rd, 2011 at 11:59PM PST. Authors of the selected DataCloud2011 papers will be invited to submit extended versions of their workshop papers to the Journal of Grid Computing (published by Springer), Special Issue on "Data Intensive Computing in the Clouds." WORKSHOP and PROGRAM CHAIRS --------------------------------------------------------------------------------- Tevfik Kosar, University at Buffalo Ioan Raicu, Illinois Institute of Technology STEERING COMMITTEE --------------------------------------------------------------------------------- Ian Foster, Univ of Chicago & Argonne National Lab Geoffrey Fox, Indiana University James Hamilton, Amazon Web Services Manish Parashar, Rutgers University & NSF Dan Reed, Microsoft Research Rich Wolski, University of California, Santa Barbara Liang-Jie Zhang, IBM Research PROGRAM COMMITTEE --------------------------------------------------------------------------------- David Abramson, Monash University, Australia Roger Barga, Microsoft Research John Bent, Los Alamos National Laboratory Umit Catalyurek, Ohio State University Abhishek Chandra, University of Minnesota Rong N. 
Chang, IBM Research Alok Choudhary, Northwestern University Brian Cooper, Google Ewa Deelman, University of Southern California Murat Demirbas, University at Buffalo Adriana Iamnitchi, University of South Florida Maria Indrawan, Monash University, Australia Alexandru Iosup, Delft University of Technology, Netherlands Peter Kacsuk, Hungarian Academy of Sciences, Hungary Dan Katz, University of Chicago Steven Ko, University at Buffalo Gregor von Laszewski, Rochester Institute of Technology Erwin Laure, CERN, Switzerland Ignacio Llorente, Universidad Complutense de Madrid, Spain Reagan Moore, University of North Carolina Lavanya Ramakrishnan, Lawrence Berkeley National Laboratory Florian Schintke, Zuse Institute Berlin Ian Taylor, Cardiff University, UK Douglas Thain, University of Notre Dame Bernard Traversat, Oracle Yong Zhao, Univ of Electronic Science & Tech of China -- ================================================================= Ioan Raicu, Ph.D. Assistant Professor ================================================================= Computer Science Department Illinois Institute of Technology 10 W. 
31st Street
Stuart Building, Room 237D
Chicago, IL 60616
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
=================================================================
=================================================================

From iraicu at cs.iit.edu Thu Dec 9 15:53:54 2010
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Thu, 09 Dec 2010 15:53:54 -0600
Subject: [Swift-devel] CFP: 2nd ACM Workshop on Scientific Cloud Computing (ScienceCloud) 2011, co-located with HPDC 2011
Message-ID: <4D014FF2.9000306@cs.iit.edu>

---------------------------------------------------------------------------------
*** Call for Papers ***
2nd Workshop on Scientific Cloud Computing (ScienceCloud) 2011
In conjunction with ACM HPDC 2011, June 8th, 2011, San Jose, California
http://www.cs.iit.edu/~iraicu/ScienceCloud2011/
---------------------------------------------------------------------------------

The advent of computation can be compared, in terms of the breadth and depth of its impact on research and scholarship, to the invention of writing and the development of modern mathematics. Scientific Computing has already begun to change how science is done, enabling scientific breakthroughs through new kinds of experiments that would have been impossible only a decade ago. Today's science is generating datasets that are increasing exponentially in both complexity and volume, making their analysis, archival, and sharing one of the grand challenges of the 21st century. The support for data intensive computing is critical to advancing modern science as storage systems have experienced an increasing gap between their capacity and bandwidth by more than 10-fold over the last decade. There is an emerging need for advanced techniques to manipulate, visualize and interpret large datasets.
Scientific computing involves a broad range of technologies, from high-performance computing (HPC) which is heavily focused on compute-intensive applications, high-throughput computing (HTC) which focuses on using many computing resources over long periods of time to accomplish its computational tasks, many-task computing (MTC) which aims to bridge the gap between HPC and HTC by focusing on using many resources over short periods of time, to data-intensive computing which is heavily focused on data distribution and harnessing data locality by scheduling of computations close to the data. The 2nd workshop on Scientific Cloud Computing (ScienceCloud) will provide the scientific community a dedicated forum for discussing new research, development, and deployment efforts in running these kinds of scientific computing workloads on Cloud Computing infrastructures. The ScienceCloud workshop will focus on the use of cloud-based technologies to meet new compute intensive and data intensive scientific challenges that are not well served by the current supercomputers, grids or commercial clouds. What architectural changes to the current cloud frameworks (hardware, operating systems, networking and/or programming models) are needed to support science? Dynamic information derived from remote instruments and coupled simulation and sensor ensembles are both important new science pathways and tremendous challenges for current HPC/HTC/MTC technologies. How can cloud technologies enable these new scientific approaches? How are scientists using clouds? Are there scientific HPC/HTC/MTC workloads that are suitable candidates to take advantage of emerging cloud computing resources with high efficiency? What benefits exist by adopting the cloud model, over clusters, grids, or supercomputers? What factors are limiting clouds use or would make them more usable/efficient? 
This workshop encourages interaction and cross-pollination between those developing applications, algorithms, software, hardware and networking, emphasizing scientific computing for such cloud platforms. We believe the workshop will be an excellent place to help the community define the current state, determine future goals, and define architectures and services for future science clouds. For more information about the workshop, please see http://www.cs.iit.edu/~iraicu/ScienceCloud2011/. To see last year's workshop program agenda, and accepted papers and presentations, please see http://dsl.cs.uchicago.edu/ScienceCloud2010/. TOPICS --------------------------------------------------------------------------------- # scientific computing applications * case studies on public, private and open source cloud computing * case studies comparing between cloud computing and cluster, grids, and/or supercomputers * performance evaluation # performance evaluation * real systems * cloud computing benchmarks * reliability of large systems # programming models and tools * map-reduce and its generalizations * many-task computing middleware and applications * integrating parallel programming frameworks with storage clouds * message passing interface (MPI) * service-oriented science applications # storage cloud architectures and implementations * distributed file systems * content distribution systems for large data * data caching frameworks and techniques * data management within and across data centers * data streaming applications * data-aware scheduling * data-intensive computing applications * eventual-consistency storage usage and management # compute resource management * dynamic resource provisioning * scheduling * techniques to manage many-core resources and/or GPUs # high-performance computing * high-performance I/O systems * interconnect and network interface architectures for HPC * multi-gigabit wide-area networking * scientific computing tradeoffs between 
clusters/grids/supercomputers and clouds * parallel file systems in dynamic environments # models, frameworks and systems for cloud security * implementation of access control and scalable isolation IMPORTANT DATES --------------------------------------------------------------------------------- Abstract submission: January 25th, 2011 Paper submission: February 1st, 2011 Acceptance notification: February 28th, 2011 Final papers due: March 24th, 2011 Workshop date: June 8th, 2011 PAPER SUBMISSION --------------------------------------------------------------------------------- Authors are invited to submit papers with unpublished, original work of not more than 10 pages of double column text using single spaced 10 point size on 8.5 x 11 inch pages (including all text, figures, and references), as per ACM 8.5 x 11 manuscript guidelines (http://www.acm.org/publications/instructions_for_proceedings_volumes); document templates can be found at http://www.acm.org/sigs/publications/proceedings-templates. A 250 word abstract (PDF format) must be submitted online at https://cmt.research.microsoft.com/ScienceCloud2011/ before the deadline of January 25th, 2011 at 11:59PM PST; the final 5/10 page papers in PDF format will be due on February 1st, 2011 at 11:59PM PST. Papers will be peer-reviewed, and accepted papers will be published in the workshop proceedings as part of the ACM digital library. Notifications of the paper decisions will be sent out by February 28th, 2011. Selected excellent work will be invited to submit extended versions of the workshop paper to a special issue journal. Submission implies the willingness of at least one of the authors to register and present the paper. For more information, please visit http://www.cs.iit.edu/~iraicu/ScienceCloud2011/. 
WORKSHOP GENERAL CHAIRS --------------------------------------------------------------------------------- * Ioan Raicu, Illinois Institute of Technology * Pete Beckman, University of Chicago & Argonne National Laboratory * Ian Foster, University of Chicago & Argonne National Laboratory PROGRAM CHAIR --------------------------------------------------------------------------------- Yogesh Simmhan, University of Southern California STEERING COMMITTEE --------------------------------------------------------------------------------- * Dennis Gannon, Microsoft Research, USA * Robert Grossman, University of Chicago, USA * Kate Keahey, Nimbus, University of Chicago, Argonne National Laboratory, USA * Ed Lazowska, University of Washington & Computing Community Consortium, USA * Ignacio Llorente, Open Nebula, Universidad Complutense de Madrid, Spain * David O'Hallaron, Carnegie Mellon University & Intel Labs, USA * Jack Dongarra, University of Tennessee, USA * Geoffrey Fox, Indiana University, USA PROGRAM COMMITTEE --------------------------------------------------------------------------------- * Remzi Arpaci-Dusseau, University of Wisconsin, Madison * Roger Barga, Microsoft Research * Jeff Broughton, Lawrence Berkeley National Lab. * Rajkumar Buyya, University of Melbourne, Australia * Roy Campbell, Univ. of Illinois at Urbana Champaign * Henri Casanova, University of Hawaii at Manoa * Jeff Chase, Duke University * Alok Choudhary, Northwestern University * Bill Howe, University of Washington * Alexandru Iosup, Delft University of Technology, Netherlands * Shantenu Jha, Louisiana State University * Tevfik Kosar, Louisiana State University * Shiyong Lu, Wayne State University * Joe Mambretti, Northwestern University * David Martin, Argonne National Laboratory * Paolo Missier, University of Manchester, UK * Ruben Montero, Univ. Complutense de Madrid, Spain * Reagan Moore, Univ. 
of North Carolina, Chapel Hill
* Jose Moreira, IBM Research
* Jim Myers, NCSA
* Viktor Prasanna, University of Southern California
* Lavanya Ramakrishnan, Lawrence Berkeley Nat. Lab.
* Matei Ripeanu, University of British Columbia, Canada
* Josh Simons, VMWare
* Marc Snir, University of Illinois at Urbana Champaign
* Ion Stoica, University of California Berkeley
* Daniel Zinn, University of California at Davis

-- 
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor
=================================================================
Computer Science Department
Illinois Institute of Technology
10 W. 31st Street
Stuart Building, Room 237D
Chicago, IL 60616
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
=================================================================
=================================================================

From bugzilla-daemon at mcs.anl.gov Thu Dec 9 16:04:30 2010
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Thu, 9 Dec 2010 16:04:30 -0600 (CST)
Subject: [Swift-devel] [Bug 237] New: swift command argument parsing yields misleading error messages
Message-ID: 

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=237

Summary: swift command argument parsing yields misleading error messages
Product: Swift
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: SwiftScript language
AssignedTo: skenny at uchicago.edu
ReportedBy: wilde at mcs.anl.gov

[mwilde at master lab]$ swift -confg cf -tc.file tc -sites.file sgecoast.xml catsn.swift -n=1
SwiftScript program does not exist: -confg

For usage information:
swift -help

# two common issues:

1. Any unrecognized flag makes the command assume that the Swift script name follows, so the flag itself is reported as a missing program (shown above, where the misspelled -confg triggers the message). The error message should instead be "unknown arg".

2.
non-accepted @arg() options are silently ignored. This could be fixed with something like @options("a","b","n") etc. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter. From iraicu at cs.iit.edu Thu Dec 9 16:45:37 2010 From: iraicu at cs.iit.edu (Ioan Raicu) Date: Thu, 09 Dec 2010 16:45:37 -0600 Subject: [Swift-devel] CFP: The 20th International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC) 2011 Message-ID: <4D015C11.2040909@cs.iit.edu> Call For Papers The 20th International ACM Symposium on High-Performance Parallel and Distributed Computing http://www.hpdc.org/2011/ San Jose, California, June 8-11, 2011 The ACM International Symposium on High-Performance Parallel and Distributed Computing is the premier conference for presenting the latest research on the design, implementation, evaluation, and use of parallel and distributed systems for high end computing. The 20th installment of HPDC will take place in San Jose, California, in the heart of Silicon Valley. This year, HPDC is affiliated with the ACM Federated Computing Research Conference, consisting of fifteen leading ACM conferences all in one week. HPDC will be held on June 9-11 (Thursday through Saturday) with affiliated workshops taking place on June 8th (Wednesday). Submissions are welcomed on all forms of high performance parallel and distributed computing, including but not limited to clusters, clouds, grids, utility computing, data-intensive computing, multicore and parallel computing. All papers will be reviewed by a distinguished program committee, with a strong preference for rigorous results obtained in operational parallel and distributed systems. All papers will be evaluated for correctness, originality, potential impact, quality of presentation, and interest and relevance to the conference. 
In addition to traditional technical papers, we also invite experience papers. Such papers should present operational details of a production high end system or application, and draw out conclusions gained from operating the system or application. The evaluation of experience papers will place a greater weight on the real-world impact of the system and the value of conclusions to future system designs. Topics of interest include, but are not limited to: ------------------------------------------------------------------------------- # Applications of parallel and distributed computing. # Systems, networks, and architectures for high end computing. # Parallel and multicore issues and opportunities. # Virtualization of machines, networks, and storage. # Programming languages and environments. # I/O, file systems, and data management. # Data intensive computing. # Resource management, scheduling, and load-balancing. # Performance modeling, simulation, and prediction. # Fault tolerance, reliability and availability. # Security, configuration, policy, and management issues. # Models and use cases for utility, grid, and cloud computing. Authors are invited to submit technical papers of at most 12 pages in PDF format, including all figures and references. Papers should be formatted in the ACM Proceedings Style and submitted via the conference web site. Accepted papers will appear in the conference proceedings, and will be incorporated into the ACM Digital Library. Papers must be self-contained and provide the technical substance required for the program committee to evaluate the paper's contribution. Papers should thoughtfully address all related work, particularly work presented at previous HPDC events. Submitted papers must be original work that has not appeared in and is not under consideration for another conference or a journal. See the ACM Prior Publication Policy for more details. 
Workshops ------------------------------------------------------------------------------- Seven workshops affiliated with HPDC will be held on Wednesday, June 8th. For more information, see the Workshops page at http://www.hpdc.org/2011/workshops.php. # ScienceCloud: 2nd Workshop on Scientific Cloud Computing # MapReduce: The Second International Workshop on MapReduce and its Applications # VTDC: Virtual Technologies in Distributed Computing # ECMLS: The Second International Emerging Computational Methods for the Life Sciences Workshop # LSAP: Workshop on Large-Scale System and Application Performance # DIDC: The Fourth International Workshop on Data-Intensive Distributed Computing # 3DAPAS: Workshop on Dynamic Distributed Data-Intensive Applications, Programming Abstractions, and Systems Important Dates ------------------------------------------------------------------------------- Technical Papers Due: 17 January 2011 PAPER DEADLINE EXTENDED: 24 January 2011 at 12:01 PM (NOON) Eastern Time Author Notifications: 28 February 2011 Final Papers Due: 24 March 2011 Conference Dates: 8-11 June 2011 Organization ------------------------------------------------------------------------------- General Chair Barney Maccabe, Oak Ridge National Laboratory Program Chair Douglas Thain, University of Notre Dame Workshops Chair Mike Lewis, Binghamton University Local Arrangements Chair Nick Wright, Lawrence Berkeley National Laboratory Student Activities Chairs Huaiming Song, Illinois Institute of Technology Hui Jin, Illinois Institute of Technology Publicity Chairs Alexandru Iosup, Delft University John Lange, University of Pittsburgh Ioan Raicu, Illinois Institute of Technology Yong Zhao, Microsoft Program Committee Kento Aida, National Institute of Informatics Henri Bal, Vrije Universiteit Roger Barga, Microsoft Jim Basney, NCSA John Bent, Los Alamos National Laboratory Ron Brightwell, Sandia National Laboratories Shawn Brown, Pittsburgh Supercomputer Center Claris Castillo, 
IBM Andrew A. Chien, UC San Diego and SDSC Ewa Deelman, USC Information Sciences Institute Peter Dinda, Northwestern University Scott Emrich, University of Notre Dame Dick Epema, Delft University of Technology Gilles Fedak, INRIA Renato Figuierdo, University of Florida Ian Foster, University of Chicago and Argonne National Laboratory Gabriele Garzoglio, Fermi National Accelerator Laboratory Rong Ge, Marquette University Sebastien Goasguen, Clemson University Kartik Gopalan, Binghamton University Dean Hildebrand, IBM Almaden Adriana Iamnitchi, University of South Florida Alexandru Iosup, Delft University of Technology Keith Jackson, Lawrence Berkeley Shantenu Jha, Louisiana State University Daniel S. Katz, University of Chicago and Argonne National Laboratory Thilo Kielmann, Vrije Universiteit Charles Killian, Purdue University Tevfik Kosar, Louisiana State University John Lange, University of Pittsburgh Mike Lewis, Binghamton University Barney Maccabe, Oak Ridge National Laboratory Grzegorz Malewicz, Google Satoshi Matsuoka, Tokyo Institute of Technology Jarek Nabrzyski, University of Notre Dame Manish Parashar, Rutgers University Beth Plale, Indiana University Ioan Raicu, Illinois Institute of Technology Philip Rhodes, University of Mississippi Matei Ripeanu, University of British Columbia Philip Roth, Oak Ridge National Laboratory Karsten Schwan, Georgia Tech Martin Swany, University of Delaware Jon Weissman, University of Minnesota Dongyan Xu, Purdue University Ken Yocum, UC San Diego Yong Zhao, Microsoft Steering Committee Henri Bal, Vrije Universiteit Andrew A. Chien, UC San Diego and SDSC Peter Dinda, Northwestern University Ian Foster, Argonne National Laboratory and University of Chicago Dennis Gannon, Microsoft Salim Hariri, University of Arizona Dieter Kranzlmueller, Ludwig-Maximilians-Univ. 
Muenchen
Satoshi Matsuoka, Tokyo Institute of Technology
Manish Parashar, Rutgers University
Karsten Schwan, Georgia Tech
Jon Weissman, University of Minnesota (Chair)

-- 
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor
=================================================================
Computer Science Department
Illinois Institute of Technology
10 W. 31st Street
Stuart Building, Room 237D
Chicago, IL 60616
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
=================================================================
=================================================================

From aespinosa at cs.uchicago.edu Fri Dec 10 10:17:32 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Fri, 10 Dec 2010 10:17:32 -0600
Subject: [Swift-devel] Re: worker.pl IDLETIMEOUT
In-Reply-To: <20101012210458.GE2510@origin>
References: <20101012210458.GE2510@origin>
Message-ID: 

When the worker hits the idle timeout it exits with a non-zero exit code, which generated a lot of "JOB FAILED" stats in OSG. This skews their usage report in a weird fashion. I made some modifications before, but my upgrade to the latest trunk code somehow broke them.

2010/10/12 Allan Espinosa :
> Poking at worker.pl, I see that it accepts a third argument for idle time. Is
> this in seconds?
>
> Also, I'm using swift to driver a number of passive workers. The worker jobs
> fail due to this timeout. I may have to modify things to suit this kind of
> setup.
>
> Thanks,
> -Allan
>

-- 
Allan M.
Espinosa PhD student, Computer Science University of Chicago From wilde at mcs.anl.gov Fri Dec 10 10:20:02 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 10 Dec 2010 10:20:02 -0600 (CST) Subject: [Swift-devel] Re: worker.pl IDLETIMEOUT In-Reply-To: Message-ID: <1109883466.8235.1291998002949.JavaMail.root@zimbra.anl.gov> I added that idle timeout arg to worker.pl I think. But in recent changes I think Mihael removed the idle timeout entirely. Are you using a recent trunk version with those changes? That seemed to work best for me in my latest tests using passive persistent coaster servers. ----- Original Message ----- > The idle timeout having a non-zero exitcode generated a lot of "JOB > FAILED" stats in OSG . this skews their usage report in a weird > fashion. I made some modifications before but my upgrade to the > latest trunk code somehow broke it. > > 2010/10/12 Allan Espinosa : > > Poking at worker.pl, I see that it accepts a third argument for idle > > time. Is > > this in seconds? > > > > Also, I'm using swift to driver a number of passive workers. The > > worker jobs > > fail due to this timeout. I may have to modify things to suit this > > kind of > > setup. > > > > Thanks, > > -Allan > > > > > -- > Allan M. 
Espinosa
> PhD student, Computer Science
> University of Chicago

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Fri Dec 10 12:30:29 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 10 Dec 2010 12:30:29 -0600 (CST)
Subject: [Swift-devel] Format of site tests
In-Reply-To: <1960190283.9263.1292005593600.JavaMail.root@zimbra.anl.gov>
Message-ID: <705480754.9307.1292005829243.JavaMail.root@zimbra.anl.gov>

Sarah,

Justin and I use a format for site testing that is roughly as shown below: a single script that emits swift.property settings, tc, sites.xml, and the swift script.

It can and should create any input data and external mappers as needed.

I hope this helps you get started creating and running these tests. You and Justin should schedule a call to discuss how to integrate these tests into the test suite.

This will take some discussion, which we should do on this list, but hopefully this is a good starting point for the site tests.

I think Ben has something similar in the existing tests, but I have not looked at those yet.

- Mike

[mwilde at master tests]$ cat t1.sh
cat >tc <<EOF

sge cat /bin/cat null null null

EOF

cat >sites.xml <<EOF
[XML element markup lost in the list archive; only these values survive]
shm
00:01:00
10000
.20
$PWD
EOF

cat >cf <<EOF

wrapperlog.always.transfer=true
sitedir.keep=true
execution.retries=0
lazy.errors=false
status.mode=provider
use.provider.staging=false
provider.staging.pin.swiftfiles=false

EOF

cat >catsn.swift <<EOF

type file;

app (file o) cat (file i)
{
  cat @i stdout=@o;
}

file out[]<[mapper name lost in the list archive]; prefix="f.",suffix=".out">;
foreach j in [1:@toint(@arg("n","1"))] {
  file data<"data.txt">;
  out[j] = cat(data);
}

EOF

echo Hi There Swift! >data.txt

swift -config cf -tc.file tc -sites.file sites.xml catsn.swift -n=100

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wozniak at mcs.anl.gov Fri Dec 10 12:40:00 2010
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Fri, 10 Dec 2010 12:40:00 -0600 (CST)
Subject: [Swift-devel] Format of site tests
In-Reply-To: <705480754.9307.1292005829243.JavaMail.root@zimbra.anl.gov>
References: <705480754.9307.1292005829243.JavaMail.root@zimbra.anl.gov>
Message-ID: 

In the new nightly.sh, each test group has an associated set of configuration file templates that is lightly processed by sed.

On Fri, 10 Dec 2010, Michael Wilde wrote:

> Sarah,
>
> Justin and I use a format for site testing that is roughly as shown
> below: a single script that emits swift.property settings, tc,
> sites.xml, the swift script.
>
> It can and should create any input data and external mappers as needed.
>
> I hope this helps you get started creating and running these tests. You
> and Justin should schedule a call to discuss how to integrate these
> tests into the test suite.
>
> This will take some discussion, which we should do on this list, but
> hopefully this is a good starting point for the site tests.
>
> I think Ben has something similar in the existing tests, but I have not
> looked at those yet.
> > - Mike > > [mwilde at master tests]$ cat t1.sh > cat >tc < > sge cat /bin/cat null null null > > EOF > > cat >sites.xml < > > > shm > 00:01:00 > 10000 > .20 > > $PWD > > > > EOF > > cat >cf < > wrapperlog.always.transfer=true > sitedir.keep=true > execution.retries=0 > lazy.errors=false > status.mode=provider > use.provider.staging=false > provider.staging.pin.swiftfiles=false > > EOF > > cat >catsn.swift < > type file; > > app (file o) cat (file i) > { > cat @i stdout=@o; > } > > file out[]; > foreach j in [1:@toint(@arg("n","1"))] { > file data<"data.txt">; > out[j] = cat(data); > } > > EOF > > echo Hi There Swift! >data.txt > > swift -config cf -tc.file tc -sites.file sites.xml catsn.swift -n=100 > > > -- Justin M Wozniak From wilde at mcs.anl.gov Fri Dec 10 14:20:28 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 10 Dec 2010 14:20:28 -0600 (CST) Subject: [Swift-devel] Format of site tests In-Reply-To: Message-ID: <987422099.10926.1292012428483.JavaMail.root@zimbra.anl.gov> Does that work well for site testing, where each test will likely change subset of config settings? I assume any test script could also opt to provide its own config files as in the example I posted? If you can specify a style, Justin, then Sarah and the rest of us can develop and run tests, and then plug them into the framework. - Mike ----- Original Message ----- > In the new nightly.sh, each test group has an associated set of > configuration file templates that is lightly processed by sed. > > On Fri, 10 Dec 2010, Michael Wilde wrote: > > > Sarah, > > > > Justin and I use a format for site testing that is roughly as shown > > below: a single script that emits swift.property settings, tc, > > sites.xml, the swift script. > > > > It can and should create any input data and external mappers as > > needed. > > > > I hope this helps you get started creating and running these tests. 
> > You > > and Justin should schedule a call to discuss how to integrate these > > tests into the test suite. > > > > This will take some discussion, which we should do on this list, but > > hopefully this is a good starting point for the site tests. > > > > I think Ben has something similar in the existing tests, but I have > > not > > looked at those yet. > > > > - Mike > > > > [mwilde at master tests]$ cat t1.sh > > cat >tc < > > > sge cat /bin/cat null null null > > > > EOF > > > > cat >sites.xml < > > > > > > > shm > > 00:01:00 > > 10000 > > .20 > > > > $PWD > > > > > > > > EOF > > > > cat >cf < > > > wrapperlog.always.transfer=true > > sitedir.keep=true > > execution.retries=0 > > lazy.errors=false > > status.mode=provider > > use.provider.staging=false > > provider.staging.pin.swiftfiles=false > > > > EOF > > > > cat >catsn.swift < > > > type file; > > > > app (file o) cat (file i) > > { > > cat @i stdout=@o; > > } > > > > file out[] > prefix="f.",suffix=".out">; > > foreach j in [1:@toint(@arg("n","1"))] { > > file data<"data.txt">; > > out[j] = cat(data); > > } > > > > EOF > > > > echo Hi There Swift! >data.txt > > > > swift -config cf -tc.file tc -sites.file sites.xml catsn.swift > > -n=100 > > > > > > > > -- > Justin M Wozniak -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Fri Dec 10 17:07:07 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Fri, 10 Dec 2010 17:07:07 -0600 Subject: [Swift-devel] Re: worker.pl IDLETIMEOUT In-Reply-To: <1109883466.8235.1291998002949.JavaMail.root@zimbra.anl.gov> References: <1109883466.8235.1291998002949.JavaMail.root@zimbra.anl.gov> Message-ID: Looking at the worker.pl I use, yes there is no more IDLE timeout cases. Then this will leave pilot jobs failing when it exceeds the maxwalltime. This is another explanation for the large amount of job failures in OSG as well. 
Before the changes, I simply changed the IDLE timeout to exit cleanly (exit 0 instead of die)

-Allan

2010/12/10 Michael Wilde :
> I added that idle timeout arg to worker.pl I think. But in recent changes I think Mihael removed the idle timeout entirely. Are you using a recent trunk version with those changes? That seemed to work best for me in my latest tests using passive persistent coaster servers.
>
>
>
> ----- Original Message -----
>> The idle timeout having a non-zero exitcode generated a lot of "JOB
>> FAILED" stats in OSG . this skews their usage report in a weird
>> fashion. I made some modifications before but my upgrade to the
>> latest trunk code somehow broke it.
>>
>> 2010/10/12 Allan Espinosa :
>> > Poking at worker.pl, I see that it accepts a third argument for idle
>> > time. Is
>> > this in seconds?
>> >
>> > Also, I'm using swift to driver a number of passive workers. The
>> > worker jobs
>> > fail due to this timeout. I may have to modify things to suit this
>> > kind of
>> > setup.
>> >
>> > Thanks,
>> > -Allan
>> >
>>
>>
>> --
>> Allan M. Espinosa
>> PhD student, Computer Science
>> University of Chicago
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>
>

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago

From wilde at mcs.anl.gov Fri Dec 10 17:16:23 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 10 Dec 2010 17:16:23 -0600 (CST)
Subject: [Swift-devel] Re: worker.pl IDLETIMEOUT
In-Reply-To:
Message-ID: <555961577.12352.1292022983780.JavaMail.root@zimbra.anl.gov>

Since your pilot jobs are scripts that launch worker.pl, you could put a timer in those scripts to kill worker.pl and exit cleanly.
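
The timer idea can be sketched roughly as follows. This is only an illustration, not code from the Swift tree: it assumes GNU coreutils `timeout` is available on the node where the pilot job runs, and `run_with_limit` is a hypothetical helper name. The function runs a command for at most a given number of seconds and reports success when the limit is reached, so the scheduler would not record the pilot job as failed.

```shell
#!/bin/sh
# Hypothetical pilot-job helper (not part of Swift).
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
# Assumes GNU coreutils `timeout`, which sends SIGTERM to the command when
# the limit expires and then exits with status 124.
run_with_limit() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        # The time limit was reached: treat this as a clean shutdown.
        return 0
    fi
    # Otherwise pass the command's own exit status through unchanged.
    return "$status"
}

# Example call in a pilot script (the worker arguments are left out here):
#   run_with_limit 3000 perl worker.pl <worker arguments>
```

The limit would presumably be set somewhat below the wall time requested for the pilot job. Note that jobs still running on the worker when the signal arrives are interrupted, so this only suits cases where that is acceptable.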
If you set maxtime in the pool entry to be somewhat less than the Condor jobtime setting for the pilot job, will Swift, even in the case of persistent coasters, (a) not start a job whose maxwalltime is > than the maxtime remaining, and (b) shut down workers when no queued job has fit into the remaining time of the worker for some idle timeout period? (I.e., I thought the reason IDLETIMEOUT could be removed from the worker was that the client (or the service) has similar logic. - Mike ----- Original Message ----- > Looking at the worker.pl I use, yes there is no more IDLE timeout > cases. Then this will leave pilot jobs failing when it exceeds the > maxwalltime. This is another explanation for the large amount of job > failures in OSG as well. > > Before the changes, I simply changed the IDLE timeout to exit cleanly > (exit 0 instead of die) > > -Allan > > 2010/12/10 Michael Wilde : > > I added that idle timeout arg to worker.pl I think. But in recent > > changes I think Mihael removed the idle timeout entirely. Are you > > using a recent trunk version with those changes? That seemed to work > > best for me in my latest tests using passive persistent coaster > > servers. > > > > > > > > ----- Original Message ----- > >> The idle timeout having a non-zero exitcode generated a lot of "JOB > >> FAILED" stats in OSG . this skews their usage report in a weird > >> fashion. I made some modifications before but my upgrade to the > >> latest trunk code somehow broke it. > >> > >> 2010/10/12 Allan Espinosa : > >> > Poking at worker.pl, I see that it accepts a third argument for > >> > idle > >> > time. Is > >> > this in seconds? > >> > > >> > Also, I'm using swift to driver a number of passive workers. The > >> > worker jobs > >> > fail due to this timeout. I may have to modify things to suit > >> > this > >> > kind of > >> > setup. > >> > > >> > Thanks, > >> > -Allan > >> > > >> > >> > >> -- > >> Allan M. 
Espinosa > >> PhD student, Computer Science > >> University of Chicago > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > > > > > > > -- > Allan M. Espinosa > PhD student, Computer Science > University of Chicago -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Fri Dec 10 18:26:09 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Fri, 10 Dec 2010 18:26:09 -0600 Subject: [Swift-devel] Re: worker.pl IDLETIMEOUT In-Reply-To: <555961577.12352.1292022983780.JavaMail.root@zimbra.anl.gov> References: <555961577.12352.1292022983780.JavaMail.root@zimbra.anl.gov> Message-ID: I am not sure about passive workers though. Since swift is not involved in the creation of the workers, it has no idea when to issue the SHUTDOWN command to the workers (and service). -Allan 2010/12/10 Michael Wilde : > Since your pilot jobs are scripts that launch worker.pl, you could put a timer in those scripts to kill worker.pl and exit cleanly. > > If you set maxtime in the pool entry to be somewhat less than the Condor jobtime setting for the pilot job, will Swift, even in the case of persistent coasters, (a) not start a job whose maxwalltime is > than the maxtime remaining, and (b) shut down workers when no queued job has fit into the remaining time of the worker for some idle timeout period? (I.e., I thought the reason IDLETIMEOUT could be removed from the worker was that the client (or the service) has similar logic. > > - Mike > > > ----- Original Message ----- >> Looking at the worker.pl I use, yes there is no more IDLE timeout >> cases. 
Then this will leave pilot jobs failing when it exceeds the >> maxwalltime. This is another explanation for the large amount of job >> failures in OSG as well. >> >> Before the changes, I simply changed the IDLE timeout to exit cleanly >> (exit 0 instead of die) >> >> -Allan >> >> 2010/12/10 Michael Wilde : >> > I added that idle timeout arg to worker.pl I think. But in recent >> > changes I think Mihael removed the idle timeout entirely. Are you >> > using a recent trunk version with those changes? That seemed to work >> > best for me in my latest tests using passive persistent coaster >> > servers. >> > >> > >> > >> > ----- Original Message ----- >> >> The idle timeout having a non-zero exitcode generated a lot of "JOB >> >> FAILED" stats in OSG . this skews their usage report in a weird >> >> fashion. I made some modifications before but my upgrade to the >> >> latest trunk code somehow broke it. >> >> >> >> 2010/10/12 Allan Espinosa : >> >> > Poking at worker.pl, I see that it accepts a third argument for >> >> > idle >> >> > time. Is >> >> > this in seconds? >> >> > >> >> > Also, I'm using swift to driver a number of passive workers. The >> >> > worker jobs >> >> > fail due to this timeout. I may have to modify things to suit >> >> > this >> >> > kind of >> >> > setup. >> >> > >> >> > Thanks, >> >> > -Allan -- Allan M. Espinosa PhD student, Computer Science University of Chicago From wilde at mcs.anl.gov Fri Dec 10 19:56:57 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 10 Dec 2010 19:56:57 -0600 (CST) Subject: [Swift-devel] Re: worker.pl IDLETIMEOUT In-Reply-To: Message-ID: <496816489.12613.1292032617166.JavaMail.root@zimbra.anl.gov> I would think the service knows when each worker registered and how long the worker has been idle, regardless of whether the server started the worker itself. 
We should be able to test this readily in a small controlled setup, and validate the results with Mihael regarding whats supposed to happen and what we'd like to have happen. - Mike ----- Original Message ----- > I am not sure about passive workers though. Since swift is not > involved in the creation of the workers, it has no idea when to issue > the SHUTDOWN command to the workers (and service). > > -Allan > > 2010/12/10 Michael Wilde : > > Since your pilot jobs are scripts that launch worker.pl, you could > > put a timer in those scripts to kill worker.pl and exit cleanly. > > > > If you set maxtime in the pool entry to be somewhat less than the > > Condor jobtime setting for the pilot job, will Swift, even in the > > case of persistent coasters, (a) not start a job whose maxwalltime > > is > than the maxtime remaining, and (b) shut down workers when no > > queued job has fit into the remaining time of the worker for some > > idle timeout period? (I.e., I thought the reason IDLETIMEOUT could > > be removed from the worker was that the client (or the service) has > > similar logic. > > > > - Mike > > > > > > ----- Original Message ----- > >> Looking at the worker.pl I use, yes there is no more IDLE timeout > >> cases. Then this will leave pilot jobs failing when it exceeds the > >> maxwalltime. This is another explanation for the large amount of > >> job > >> failures in OSG as well. > >> > >> Before the changes, I simply changed the IDLE timeout to exit > >> cleanly > >> (exit 0 instead of die) > >> > >> -Allan > >> > >> 2010/12/10 Michael Wilde : > >> > I added that idle timeout arg to worker.pl I think. But in recent > >> > changes I think Mihael removed the idle timeout entirely. Are you > >> > using a recent trunk version with those changes? That seemed to > >> > work > >> > best for me in my latest tests using passive persistent coaster > >> > servers. 
> >> > > >> > > >> > > >> > ----- Original Message ----- > >> >> The idle timeout having a non-zero exitcode generated a lot of > >> >> "JOB > >> >> FAILED" stats in OSG . this skews their usage report in a weird > >> >> fashion. I made some modifications before but my upgrade to the > >> >> latest trunk code somehow broke it. > >> >> > >> >> 2010/10/12 Allan Espinosa : > >> >> > Poking at worker.pl, I see that it accepts a third argument > >> >> > for > >> >> > idle > >> >> > time. Is > >> >> > this in seconds? > >> >> > > >> >> > Also, I'm using swift to driver a number of passive workers. > >> >> > The > >> >> > worker jobs > >> >> > fail due to this timeout. I may have to modify things to suit > >> >> > this > >> >> > kind of > >> >> > setup. > >> >> > > >> >> > Thanks, > >> >> > -Allan > > -- > Allan M. Espinosa > PhD student, Computer Science > University of Chicago -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Fri Dec 10 22:36:25 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 10 Dec 2010 20:36:25 -0800 Subject: [Swift-devel] Re: worker.pl IDLETIMEOUT In-Reply-To: References: <20101012210458.GE2510@origin> Message-ID: <1292042185.3760.6.camel@blabla2.none> Idle timeout of the workers? That should be disabled. On Fri, 2010-12-10 at 10:17 -0600, Allan Espinosa wrote: > The idle timeout having a non-zero exitcode generated a lot of "JOB > FAILED" stats in OSG . this skews their usage report in a weird > fashion. I made some modifications before but my upgrade to the > latest trunk code somehow broke it. > > 2010/10/12 Allan Espinosa : > > Poking at worker.pl, I see that it accepts a third argument for idle time. Is > > this in seconds? > > > > Also, I'm using swift to driver a number of passive workers. The worker jobs > > fail due to this timeout. I may have to modify things to suit this kind of > > setup. 
> > > > Thanks, > > -Allan > > > > From hategan at mcs.anl.gov Fri Dec 10 22:38:27 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 10 Dec 2010 20:38:27 -0800 Subject: [Swift-devel] Re: worker.pl IDLETIMEOUT In-Reply-To: References: <1109883466.8235.1291998002949.JavaMail.root@zimbra.anl.gov> Message-ID: <1292042307.3760.8.camel@blabla2.none> On Fri, 2010-12-10 at 17:07 -0600, Allan Espinosa wrote: > Looking at the worker.pl I use, yes there is no more IDLE timeout > cases. Then this will leave pilot jobs failing when it exceeds the > maxwalltime. This is another explanation for the large amount of job > failures in OSG as well. If the server dies, the worker should eventually die due to lack of heartbeats. Theoretically. So I'm not sure what the circumstances are that cause the maxwalltime to be exceeded. Can you give some more details? Mihael > > Before the changes, I simply changed the IDLE timeout to exit cleanly > (exit 0 instead of die) > > -Allan > > 2010/12/10 Michael Wilde : > > I added that idle timeout arg to worker.pl I think. But in recent changes I think Mihael removed the idle timeout entirely. Are you using a recent trunk version with those changes? That seemed to work best for me in my latest tests using passive persistent coaster servers. > > > > > > > > ----- Original Message ----- > >> The idle timeout having a non-zero exitcode generated a lot of "JOB > >> FAILED" stats in OSG . this skews their usage report in a weird > >> fashion. I made some modifications before but my upgrade to the > >> latest trunk code somehow broke it. > >> > >> 2010/10/12 Allan Espinosa : > >> > Poking at worker.pl, I see that it accepts a third argument for idle > >> > time. Is > >> > this in seconds? > >> > > >> > Also, I'm using swift to driver a number of passive workers. The > >> > worker jobs > >> > fail due to this timeout. I may have to modify things to suit this > >> > kind of > >> > setup. 
> >> > > >> > Thanks, > >> > -Allan > >> > > >> > >> > >> -- > >> Allan M. Espinosa > >> PhD student, Computer Science > >> University of Chicago > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > > > > > > From hategan at mcs.anl.gov Fri Dec 10 22:39:56 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 10 Dec 2010 20:39:56 -0800 Subject: [Swift-devel] Re: worker.pl IDLETIMEOUT In-Reply-To: References: <555961577.12352.1292022983780.JavaMail.root@zimbra.anl.gov> Message-ID: <1292042396.3760.10.camel@blabla2.none> On Fri, 2010-12-10 at 18:26 -0600, Allan Espinosa wrote: > I am not sure about passive workers though. Since swift is not > involved in the creation of the workers, it has no idea when to issue > the SHUTDOWN command to the workers (and service). Well, the problem with passive workers is that it becomes your responsibility not only to start them, but also to shut them down. > > -Allan > > 2010/12/10 Michael Wilde : > > Since your pilot jobs are scripts that launch worker.pl, you could put a timer in those scripts to kill worker.pl and exit cleanly. > > > > If you set maxtime in the pool entry to be somewhat less than the Condor jobtime setting for the pilot job, will Swift, even in the case of persistent coasters, (a) not start a job whose maxwalltime is > than the maxtime remaining, and (b) shut down workers when no queued job has fit into the remaining time of the worker for some idle timeout period? (I.e., I thought the reason IDLETIMEOUT could be removed from the worker was that the client (or the service) has similar logic. 
> > > > - Mike > > > > > > ----- Original Message ----- > >> Looking at the worker.pl I use, yes there is no more IDLE timeout > >> cases. Then this will leave pilot jobs failing when it exceeds the > >> maxwalltime. This is another explanation for the large amount of job > >> failures in OSG as well. > >> > >> Before the changes, I simply changed the IDLE timeout to exit cleanly > >> (exit 0 instead of die) > >> > >> -Allan > >> > >> 2010/12/10 Michael Wilde : > >> > I added that idle timeout arg to worker.pl I think. But in recent > >> > changes I think Mihael removed the idle timeout entirely. Are you > >> > using a recent trunk version with those changes? That seemed to work > >> > best for me in my latest tests using passive persistent coaster > >> > servers. > >> > > >> > > >> > > >> > ----- Original Message ----- > >> >> The idle timeout having a non-zero exitcode generated a lot of "JOB > >> >> FAILED" stats in OSG . this skews their usage report in a weird > >> >> fashion. I made some modifications before but my upgrade to the > >> >> latest trunk code somehow broke it. > >> >> > >> >> 2010/10/12 Allan Espinosa : > >> >> > Poking at worker.pl, I see that it accepts a third argument for > >> >> > idle > >> >> > time. Is > >> >> > this in seconds? > >> >> > > >> >> > Also, I'm using swift to driver a number of passive workers. The > >> >> > worker jobs > >> >> > fail due to this timeout. I may have to modify things to suit > >> >> > this > >> >> > kind of > >> >> > setup. 
> >> >> > > >> >> > Thanks, > >> >> > -Allan > From hategan at mcs.anl.gov Fri Dec 10 22:41:22 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 10 Dec 2010 20:41:22 -0800 Subject: [Swift-devel] Re: worker.pl IDLETIMEOUT In-Reply-To: <496816489.12613.1292032617166.JavaMail.root@zimbra.anl.gov> References: <496816489.12613.1292032617166.JavaMail.root@zimbra.anl.gov> Message-ID: <1292042482.3760.12.camel@blabla2.none> On Fri, 2010-12-10 at 19:56 -0600, Michael Wilde wrote: > I would think the service knows when each worker registered and how > long the worker has been idle, regardless of whether the server > started the worker itself. See my previous email. Essentially, no. The whole point of the passive workers was to allow the user to control the workers entirely. The server will not try to shut them down, regardless of how much work there is/isn't. > > We should be able to test this readily in a small controlled setup, > and validate the results with Mihael regarding whats supposed to > happen and what we'd like to have happen. > > - Mike > > > ----- Original Message ----- > > I am not sure about passive workers though. Since swift is not > > involved in the creation of the workers, it has no idea when to issue > > the SHUTDOWN command to the workers (and service). > > > > -Allan > > > > 2010/12/10 Michael Wilde : > > > Since your pilot jobs are scripts that launch worker.pl, you could > > > put a timer in those scripts to kill worker.pl and exit cleanly. > > > > > > If you set maxtime in the pool entry to be somewhat less than the > > > Condor jobtime setting for the pilot job, will Swift, even in the > > > case of persistent coasters, (a) not start a job whose maxwalltime > > > is > than the maxtime remaining, and (b) shut down workers when no > > > queued job has fit into the remaining time of the worker for some > > > idle timeout period? 
(I.e., I thought the reason IDLETIMEOUT could > > > be removed from the worker was that the client (or the service) has > > > similar logic. > > > > > > - Mike > > > > > > > > > ----- Original Message ----- > > >> Looking at the worker.pl I use, yes there is no more IDLE timeout > > >> cases. Then this will leave pilot jobs failing when it exceeds the > > >> maxwalltime. This is another explanation for the large amount of > > >> job > > >> failures in OSG as well. > > >> > > >> Before the changes, I simply changed the IDLE timeout to exit > > >> cleanly > > >> (exit 0 instead of die) > > >> > > >> -Allan > > >> > > >> 2010/12/10 Michael Wilde : > > >> > I added that idle timeout arg to worker.pl I think. But in recent > > >> > changes I think Mihael removed the idle timeout entirely. Are you > > >> > using a recent trunk version with those changes? That seemed to > > >> > work > > >> > best for me in my latest tests using passive persistent coaster > > >> > servers. > > >> > > > >> > > > >> > > > >> > ----- Original Message ----- > > >> >> The idle timeout having a non-zero exitcode generated a lot of > > >> >> "JOB > > >> >> FAILED" stats in OSG . this skews their usage report in a weird > > >> >> fashion. I made some modifications before but my upgrade to the > > >> >> latest trunk code somehow broke it. > > >> >> > > >> >> 2010/10/12 Allan Espinosa : > > >> >> > Poking at worker.pl, I see that it accepts a third argument > > >> >> > for > > >> >> > idle > > >> >> > time. Is > > >> >> > this in seconds? > > >> >> > > > >> >> > Also, I'm using swift to driver a number of passive workers. > > >> >> > The > > >> >> > worker jobs > > >> >> > fail due to this timeout. I may have to modify things to suit > > >> >> > this > > >> >> > kind of > > >> >> > setup. > > >> >> > > > >> >> > Thanks, > > >> >> > -Allan > > > > -- > > Allan M. 
Espinosa
> > PhD student, Computer Science
> > University of Chicago
>

From wilde at mcs.anl.gov Sat Dec 11 09:25:30 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 11 Dec 2010 09:25:30 -0600 (CST)
Subject: [Swift-devel] Re: worker.pl IDLETIMEOUT
In-Reply-To: <1292042482.3760.12.camel@blabla2.none>
Message-ID: <1903353837.13109.1292081130243.JavaMail.root@zimbra.anl.gov>

----- Original Message -----
> On Fri, 2010-12-10 at 19:56 -0600, Michael Wilde wrote:
> > I would think the service knows when each worker registered and how
> > long the worker has been idle, regardless of whether the server
> > started the worker itself.
>
> See my previous email. Essentially, no. The whole point of the passive
> workers was to allow the user to control the workers entirely. The
> server will not try to shut them down, regardless of how much work
> there
> is/isn't.

Yes, I recall the intent now - that makes perfect sense.

- Mike

From wilde at mcs.anl.gov Sat Dec 11 09:32:24 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 11 Dec 2010 09:32:24 -0600 (CST)
Subject: [Swift-devel] 2 coaster issues
In-Reply-To: <392787186.13107.1292081064617.JavaMail.root@zimbra.anl.gov>
Message-ID: <1375372607.13122.1292081544887.JavaMail.root@zimbra.anl.gov>

Mihael, can you comment on two new coaster questions that have come up this week:

1. worker.pl seems to consume a fair bit of time polling. (We need to verify and quantify, but I think Justin is seeing this on Intrepid). If that's correct, can the polling be done with a select that waits on both the service socket and on signals of SIGCHLD child process termination events?

2. On SGE machines (or other schedulers that honor requests for multiple slots in a packed/fill manner) we typically get a varying number of slots assigned per node. For manually started workers for persistent servers, can each worker have its own setting of WorkersPerNode, so that a single worker can run the right number of jobs on each node? This relates a bit to question 1, as we can solve the problem by using WorkersPerNode=1 and starting Nslots workers per node, but if each is polling that's somewhat undesirable.

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wozniak at mcs.anl.gov Sat Dec 11 12:20:46 2010
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Sat, 11 Dec 2010 12:20:46 -0600 (Central Standard Time)
Subject: [Swift-devel] 2 coaster issues
In-Reply-To: <1375372607.13122.1292081544887.JavaMail.root@zimbra.anl.gov>
References: <1375372607.13122.1292081544887.JavaMail.root@zimbra.anl.gov>
Message-ID:

On Sat, 11 Dec 2010, Michael Wilde wrote:

> 1. worker.pl seems to consume a fair bit of time polling.

Just to be clear, I haven't seen that the current implementation is a problem, it's just a thought. I tried different delay times in the loop and didn't get much of a performance difference.

--
Justin M Wozniak

From hategan at mcs.anl.gov Sat Dec 11 15:03:03 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 11 Dec 2010 13:03:03 -0800
Subject: [Swift-devel] 2 coaster issues
In-Reply-To:
References: <1375372607.13122.1292081544887.JavaMail.root@zimbra.anl.gov>
Message-ID: <1292101383.5369.1.camel@blabla2.none>

On Sat, 2010-12-11 at 12:20 -0600, Justin M Wozniak wrote:
> On Sat, 11 Dec 2010, Michael Wilde wrote:
>
> > 1. worker.pl seems to consume a fair bit of time polling.
>
> Just to be clear, I haven't seen that the current implementation is a
> problem, it's just a thought. I tried different delay times in the loop
> and didn't get much of a performance difference.
>

Right. There shouldn't be. The maximum number of jobs is small compared to the scales at which polling for a process status would become a problem.

From wilde at mcs.anl.gov Mon Dec 13 12:01:49 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 13 Dec 2010 12:01:49 -0600 (CST)
Subject: [Swift-devel] Using multicore servers as Swift pools
In-Reply-To: <1820609121.18056.1292262639864.JavaMail.root@zimbra.anl.gov>
Message-ID: <100795199.18204.1292263309508.JavaMail.root@zimbra.anl.gov>

Luiz,

An example of the config files you will need in order to use the 10 8-core 64-bit MCS compute servers as Swift pools is on the CI net under /home/wilde/swift/lab/{coasters.xml,auth.defaults.sample}

The servers (10 x 64 bit and 3 x 32 bit) are listed at:
http://wiki.mcs.anl.gov/IT/index.php/General_MCS_Questions#computeservers

Since these machines are behind a firewall, you can use the ~/.ssh/config example below (adapted as needed) to make them accessible as if they permitted direct login. You log in to login.mcs.anl.gov using your local ssh key, and then ssh ports are forwarded to each of the target machines that you want on the MCS network. The technique is explained at:
http://articles.techrepublic.com.com/5100-10878_11-6155832.html

This is one of the configurations we should test with new test scripts for Swift 0.91.

I've pasted the files below as well.

- Mike

=== auth.defaults.sample ===

xlogin1.pads.ci.uchicago.edu.type=password
xlogin1.pads.ci.uchicago.edu.username=wilde

login.pads.ci.uchicago.edu.type=key
login.pads.ci.uchicago.edu.username=wilde
login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
login.pads.ci.uchicago.edu.passphrase=mypassphrasegoeshere

login.mcs.anl.gov.type=key
login.mcs.anl.gov.username=wilde
login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa
login.mcs.anl.gov.passphrase=mypassphrasegoeshere

=== ~/.ssh/config set up for forwarding one compute server, "crush"

Host *
  ServerAliveInterval 15
  #ControlMaster auto
  #ControlPath ~/.ssh/ssh-connections/%r@%h:%p

# COMPUTEHOSTS='crush thwomp stomp crank steamroller grind churn trounce thrash vanquish'

Host mcs login.mcs.anl.gov
  Hostname login.mcs.anl.gov
  ForwardAgent yes
  ForwardX11 no
  LocalForward 19001 140.221.8.62:22

Host crush thwomp stomp crank steamroller grind churn trounce thrash vanquish
  ForwardAgent yes
  ForwardX11 no
  Hostname localhost
  NoHostAuthenticationForLocalhost yes

Host crush
  Port 19001

=== coasters.xml ===

8 3500 1 1 1 .07 10000 /home/wilde/swiftwork/crush
8 3500 1 1 1 .31 10000 /home/wilde/swiftwork/thwomp

...etc for the rest of the 10 compute servers...
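
Only the forwarded port for crush appears in the ssh config above. A small sketch of how the remaining per-host stanzas might be generated is given here. It is an assumption, not part of the posted files: `gen_host_ports` is a hypothetical helper, and numbering the ports upward from 19001 in list order is just one possible convention.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print one
# "Host <name> / Port <n>" stanza per compute server, assigning local
# ports upward from a base port in the order the hosts are given.
# Usage: gen_host_ports BASE_PORT HOST [HOST...]
gen_host_ports() {
    port=$1
    shift
    for host in "$@"; do
        printf 'Host %s\n    Port %s\n' "$host" "$port"
        port=$((port + 1))
    done
}

# Example: gen_host_ports 19001 crush thwomp stomp >> ~/.ssh/config
```

Each generated stanza would still need a matching `LocalForward <port> <address>:22` line under the `Host mcs` entry; only the address for crush is given above, so the others have to be looked up. With the tunnel open (for example via `ssh -f -N mcs`), `ssh crush` should then connect through the forwarded local port.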
- Mike From bugzilla-daemon at mcs.anl.gov Mon Dec 13 19:56:25 2010 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 13 Dec 2010 19:56:25 -0600 (CST) Subject: [Swift-devel] [Bug 93] document URIs in mappers In-Reply-To: References: Message-ID: <20101214015625.56033563FD@wind-2.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=93 skenny changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED CC| |skenny at uchicago.edu AssignedTo|benc at hawaga.org.uk |skenny at uchicago.edu Status|ASSIGNED |RESOLVED Resolution| |FIXED -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching the reporter. From bugzilla-daemon at mcs.anl.gov Mon Dec 13 19:56:25 2010 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 13 Dec 2010 19:56:25 -0600 (CST) Subject: [Swift-devel] [Bug 94] document use of dcache with swift In-Reply-To: References: Message-ID: <20101214015625.98830563F9@wind-2.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=94 Bug 94 depends on bug 93, which changed state. Bug 93 Summary: document URIs in mappers http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=93 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Status|ASSIGNED |RESOLVED Resolution| |FIXED -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching the reporter. 

From benc at hawaga.org.uk Tue Dec 14 03:41:36 2010
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 14 Dec 2010 09:41:36 +0000 (GMT)
Subject: [Swift-devel] Format of site tests
In-Reply-To: <705480754.9307.1292005829243.JavaMail.root@zimbra.anl.gov>
References: <705480754.9307.1292005829243.JavaMail.root@zimbra.anl.gov>
Message-ID:

> I think Ben has something similar in the existing tests, but I have not
> looked at those yet.

The site tests as I left them were pretty much:

  for each site file in $SOMEDIR:
    run some subset of the local tests with that site file

I think I made the subset of the local tests be one that would test "site-like functionality", e.g. file transfer and job submission, rather than "language-like functionality", e.g. the boolean not-operator.

That structure didn't allow some things that would have been good to regularly test, e.g. running a 1000 NOP job test through coasters on a site, while not running that test on a pure GRAM2 site, where such a run would be hilarious/tragic.

--

From wilde at mcs.anl.gov Wed Dec 15 12:09:42 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 15 Dec 2010 12:09:42 -0600 (CST)
Subject: [Swift-devel] Integration of prototype Globus Online interface
Message-ID: <1439933143.7930.1292436582286.JavaMail.root@zimbra.anl.gov>

Allan,

Justin will look at both improving the prototype Globus Online interface (especially the hard-coded calls to external.sh) and integrating it into trunk (assuming that can be done with little or no risk to Swift reliability, perhaps as a separate vdl-int.go.k for now, iff needed).

This will probably happen later in the week; for now, I think we should just live with the hacks and hard-coded shortcuts for the sake of getting some larger-scale measurements and reliability tests.

We'll discuss with Mihael on Friday the best way to proceed with long-term support for a Globus Online provider.
- Mike From wilde at mcs.anl.gov Wed Dec 15 12:22:17 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 15 Dec 2010 12:22:17 -0600 (CST) Subject: [Swift-devel] Re: Probing running jobs In-Reply-To: <1238880872.8212.13.camel@localhost> Message-ID: <1711421966.8093.1292437337332.JavaMail.root@zimbra.anl.gov> Mihael, I never tried this feature but have a good use for it now for Swift-R debugging. Is the console monitor you refer to below the -tui or -monitor? Do you know if this feature is currently working in trunk? In the same vein, I think that the -tui option was broken last I tried it in trunk (I think it was just silent: nothing showed up on the screen). Whoever has a chance to try both of these interfaces next, can you report back to the list if they worked or failed for you? Thanks, - Mike ----- Original Message ----- > On Fri, 2009-04-03 at 08:38 -0500, Michael Wilde wrote: > > Following up on Mihael's question about a feature I listed in the > > to-do > > list I proposed for coasters: > > > > On 4/2/09 11:17 PM, Mihael Hategan wrote: > > > On Thu, 2009-04-02 at 21:01 -0500, Michael Wilde wrote: > > >>>> - some way to probe a job thats running on a coaster? > > >>> Define "probe". > > >> - ps -f on the running process. > > >> - probe its resource usage (/proc, also ps, etc) > > >> - ls -lR of its jobdir (as these will more often be on /tmp) > > >> > > >> We have these needs today; on the BGP under falkon we manually > > >> login to > > >> the node, but thats cumbersome: hard to find the node; 2-stage > > >> login > > >> process. > > >> > > >> Low prio, a pipe dream. But theoretically do-able. > > > > > > It should be possible (and somewhat interesting) to have a simple > > > shell > > > that can execute stuff on the workers while the job is running, so > > > that > > > you can issue your own commands. > > > > > > The question of how to find the right worker remains. Can you go a > > > bit > > > deeper into the details? 
How do you find the node currently (be as > > > specific as you can be)? > > > > In the oops workflow, I recall these cases at the moment: > > > > 1) Have my (large set of similar) jobs started? > > > > 2) Most jobs have finished. Are the remaining ones hung, or > > proceeding > > normally but slower for some application- or data-specific reason? > [...] > > In swift r2821 cog r2365 (I think), there is such a feature. > > If you start with the console monitor, you can go to the list of jobs. > Then select desired job, and push enter to display a detail pane. If > the > job is in the active state and if it's running on a coaster worker, > that > detail pane will have an extra button named "Worker Terminal". > Pressing > that will pop up a simple terminal that can be used to run relatively > arbitrary commands on the worker that the job is running on. > > It won't run commands that require console input (e.g., vi), so don't > try. > > It won't start you in the job directory, but the swift workflow > directory. That's because at some point we stopped using the GRAM > directory attribute for setting the initial job dir because some silly > site on OSG doesn't honor it. I think we should revisit the issue (I > suspect there is a solution that works in both cases). -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Wed Dec 15 15:27:02 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Wed, 15 Dec 2010 15:27:02 -0600 Subject: [Swift-devel] extracting the properties channel in vdl-ink.k Message-ID: Hi, How do you extract some properties in a host like the workdirectory? thanks, -Allan -- Allan M. 
Espinosa PhD student, Computer Science University of Chicago

From aespinosa at cs.uchicago.edu Wed Dec 15 17:59:56 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 15 Dec 2010 17:59:56 -0600
Subject: [Swift-devel] Re: Running the GO Swift prototype
In-Reply-To: <412080246.19717.1292272548199.JavaMail.root@zimbra.anl.gov>
References: <112892479.18799.1292267180081.JavaMail.root@zimbra.anl.gov> <412080246.19717.1292272548199.JavaMail.root@zimbra.anl.gov>
Message-ID:

Hi Mike,

I got the basic functionality working from your sample external.sh scripts. I was able to synthesize a workload of 200 transfers. I'll email you and Raj about that separately.

The basic scripts start to break at 3000 transfers. I set my number of files to 10k and foreach.maxthread to 3000. I guess at this point external.sh already creates too many files at a time for the ready queue to handle. The number of processes forked is probably too high as well; communicado is already crawling at this point. Even though Swift reported 3000 files staged in, the logs in external.sh only reported 758 transfers initiated to Globus Online.

A CDM external handler will probably blow up in general, as it will fork a process / shell script for each transfer. If foreach is set to 10000, we can't scale.

I guess a more scalable solution for Swift is to make a native call (Karajan/Java) to a queueing service (something like Stork in Condor) for data transfer.

-Allan

2010/12/13 Michael Wilde :
> Allan, the code is on PADS login1 under /scratch and seems to work.
>
> You will need to look into the swift/src/trunk.gomods src tree to see what I changed in there. Some but perhaps not all the diffs in that tree are for supporting globus online.
>
> Let me know if you can replicate the example test below.
>
> Justin, it would be good if we can integrate the mods for this into trunk in some non-invasive way as a way to share these tests, even if we do it as a separate vdl-int.GO.k that the user/experimenter needs to manually copy to vdl-int.k
>
> Or I guess we could put them in an experimental branch?
>
> - Mike
>
> ---
>
> 1 ) gorunner.sh >& gorunner.out
> 2 ) PATH=/scratch/local/wilde/swift/src/trunk.gomods/cog/modules/swift/dist/swift-svn/bin:$PATH
> 3 ) swift -config cf -tc.file tc.data -sites.file sites.xml -cdm.file fs.ftponly gcat2.swift
> Thats it.
> In gorunner.out, should see:
> ...
> ./gorunner.sh: joblist is empty
> ./gorunner.sh: joblist is empty
> ./gorunner.sh: joblist is:
> cp-yuptfz2k.job.in
> ./gorunner.sh: started transfer task 4dfe903e-06f7-11e0-aa30-1231350018b1
> /home/wilde/swift/lab/go/gowaiter.sh: waiting on 4dfe903e-06f7-11e0-aa30-1231350018b1
> ./gorunner.sh: joblist is empty
> ./gorunner.sh: joblist is empty
> ./gorunner.sh: joblist is empty
> /home/wilde/swift/lab/go/gowaiter.sh: 4dfe903e-06f7-11e0-aa30-1231350018b1 has completed
> /home/wilde/swift/lab/go/gowaiter.sh: marked cp-yuptfz2k.job.in transferred
> ./gorunner.sh: joblist is empty
> ./gorunner.sh: joblist is empty
> ...
> On swift stdout/err should see:
> login1$ swift -config cf -tc.file tc.data -sites.file sites.xml -cdm.file fs.ftponly gcat2.swift
> CDM file: fs.ftponly
> Swift svn swift-r3707 (swift modified locally) cog-r2932 (cog modified locally)
>
> RunID: 20101213-1426-dff3my97
> Progress:
> /home/wilde/swift/lab/go/external.sh: running in /home/wilde/swift/lab/go
> /home/wilde/swift/lab/go/external.sh: running in /home/wilde/swift/lab/go
> Progress: Submitting:1
> in /home/wilde/swift/lab/go/cp.sh: wd=/home/wilde/swift/lab/go/work/gcat2-20101213-1426-dff3my97/jobs/y/cp-yuptfz2k arg1=etc/group arg2=output/plainoutput.txt
> in /home/wilde/swift/lab/go/cp.sh: rc=0
> Progress: Checking status:1
> Final status: Finished successfully:1
> login1$
> Thats it.
From wilde at mcs.anl.gov Wed Dec 15 18:10:43 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 15 Dec 2010 18:10:43 -0600 (CST)
Subject: [Swift-devel] Re: Running the GO Swift prototype
In-Reply-To:
Message-ID: <1297944413.13100.1292458243030.JavaMail.root@zimbra.anl.gov>

Hi Allan,

This is at least partial good news, and nice progress.

First step we can try on scaling (maybe easy, maybe not) is to cut down on external processes. I'll take a quick look and see if I can spot another strategy.

The obvious strategy would be to bite the bullet and move to a true provider, that can keep a huge number of requests pending without consuming heavy resources.

I'm fearful that external.sh needs to be synchronous, but maybe we can use a slightly different interface to separate the requests from the notifications.

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Wed Dec 15 18:42:06 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 15 Dec 2010 18:42:06 -0600 (CST)
Subject: [Swift-devel] Re: Running the GO Swift prototype
In-Reply-To: <1297944413.13100.1292458243030.JavaMail.root@zimbra.anl.gov>
Message-ID: <235522224.13184.1292460126184.JavaMail.root@zimbra.anl.gov>

One possibility for scaling:

- instead of calling task:execute(external.sh), have dostagein() put the request into the "ready/" queue/dir, just like external.sh does now, but direct from Karajan (ie, dont fork a process and shell).

- then wait on a future in a map of futures, with the map key being the request id.

- a listener reads the done/ queue periodically, posting the futures of the completed requests based on keys that remain associated with the requests in the queue

This approach does not replace a true data provider: its still just a prototype to learn more about how to do staging efficiently using external interfaces.
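
[Editorial sketch] The map-of-futures idea in the three steps above might look roughly like the following in plain Java. This is only an illustration under stated assumptions, not Karajan code: the class name PendingTransfers and its methods are hypothetical, and it uses the JDK's CompletableFuture rather than Karajan's own future type. The stage-in side would register the request id before writing the request into ready/, and the done/ listener would call markDone() for each completed id.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: tracks pending stage-in requests, keyed by request id. */
class PendingTransfers {
    // request id -> future that is completed once the transfer is reported done
    private final Map<String, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();

    /**
     * Registers a request. Call this BEFORE the request file is written to ready/,
     * so that a fast completion cannot arrive for an unknown id.
     */
    CompletableFuture<Void> submit(String requestId) {
        return pending.computeIfAbsent(requestId, id -> new CompletableFuture<>());
    }

    /**
     * Called by the listener for each request id it finds in done/.
     * Returns false if no such request is pending.
     */
    boolean markDone(String requestId) {
        CompletableFuture<Void> future = pending.remove(requestId);
        if (future == null) {
            return false;
        }
        future.complete(null);
        return true;
    }

    /** Number of requests still waiting for a completion notice. */
    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingTransfers transfers = new PendingTransfers();
        CompletableFuture<Void> f = transfers.submit("cp-yuptfz2k.job.in");
        transfers.markDone("cp-yuptfz2k.job.in"); // normally done by the listener thread
        f.join();                                 // returns immediately: already completed
        System.out.println("pending=" + transfers.pendingCount());
    }
}
```

The point of the design is that a pending request costs one map entry instead of one forked process and shell, which is what the 3000-transfer run seemed to be choking on.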
But its possible that if you know Karajan well the above logic is pretty easy, while writing a real data provider is more like a week of work or more, just to learn the mechanics. (And I think working out the interface logic to the external tool or service, as above, will help in building the provider).

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Wed Dec 15 19:38:40 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 15 Dec 2010 19:38:40 -0600 (CST)
Subject: [Swift-devel] Re: Running the GO Swift prototype
In-Reply-To: <235522224.13184.1292460126184.JavaMail.root@zimbra.anl.gov>
Message-ID: <1789816446.13305.1292463520728.JavaMail.root@zimbra.anl.gov>

Hmmm - looking briefly at the Karajan library, I see elements that will *perhaps* let us write out request files. We'd need to read notifications in in units of a file at a time, and parse them. Somewhat primitive, but maybe workable for prototyping.

Writing a regular provider is starting to look like a better idea.
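
[Editorial sketch] Reading notifications "a file at a time" and parsing them could be as small as the following plain-Java sketch. Everything here is an assumption made for illustration: the class name DoneQueueReader, the one-pair-per-line "requestId status" file format, and the status word "transferred" (borrowed from the gowaiter.sh log message) are hypothetical, and the code uses java.nio rather than any Karajan file element.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: reads whole notification files from a done/ directory. */
class DoneQueueReader {
    /**
     * Parses the lines of one notification file.
     * Assumed format: one "requestId status" pair per line;
     * only lines whose status is "transferred" count as completed.
     */
    static List<String> completedIds(List<String> lines) {
        List<String> ids = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2 && parts[1].equals("transferred")) {
                ids.add(parts[0]);
            }
        }
        return ids;
    }

    /**
     * Reads every file currently in doneDir, deletes each file after reading it,
     * and returns the ids of all completed requests found.
     */
    static List<String> drain(Path doneDir) throws IOException {
        List<String> ids = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(doneDir)) {
            for (Path file : files) {
                ids.addAll(completedIds(Files.readAllLines(file)));
                Files.delete(file);
            }
        }
        return ids;
    }
}
```

A listener thread would call drain() on a timer and hand each returned id to whatever holds the pending requests. It is primitive in exactly the way described above, which is part of the argument for a regular provider.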
- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From aespinosa at cs.uchicago.edu Fri Dec 17 19:56:01 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Fri, 17 Dec 2010 19:56:01 -0600
Subject: [Swift-devel] Re: coaster service log4j.properties
In-Reply-To:
References:
Message-ID:

I'm bumping this thread to swift-devel. I'm starting to need (at a higher priority) the timing information in the persistent logs.
-Allan 2010/10/21 Allan Espinosa : > Hi, > > I set my > > log4j.logger.org.globus.cog.abstraction.coaster.rlog=DEBUG > > > but the persistent coaster-service still seems to be in INFO mode. > Does bin/coaster-service still look at etc/log4j.properties ? Or do i > need to specify the log4j.properties file in the bin/coaster-service > script itself as a java flag? > > Thanks. > -Allan -- Allan M. Espinosa PhD student, Computer Science University of Chicago From hategan at mcs.anl.gov Sat Dec 18 00:17:35 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 18 Dec 2010 00:17:35 -0600 Subject: [Swift-devel] Re: coaster service log4j.properties In-Reply-To: References: Message-ID: <1292653055.322.13.camel@blabla2.none> Whatever goes to coaster.log is governed by the log4j.properties in provider-coaster/resources and that is stuck into the coaster jar file at compile time (so modifications to that require recompilation). The rlog stuff, that's the coaster service forwarding some specific log messages to the client. It does not currently forward all log4j messages, but some select messages distinct from what goes to log4j. They are forwarded as INFO level (given that that's just a sample thing for now), so you won't see anything else than INFO from the rlog package. Given that the actual calls to log4j are done on the client side, the swift log4j.properties would decide the level there. I think in the long run we would probably want to have all log4j stuff forwarded, at least in the case in which the service is deployed automatically, but that's not there now. I think it should also be done seamlessly, such that log messages appear to come from the original classes rather than RemoteLogHandler. Mihael On Fri, 2010-12-17 at 19:56 -0600, Allan Espinosa wrote: > I'm bumping this thread to swift-devel. I'm starting to need (at a > higher priority) the timing information in the persistent logs. 
> > -Allan > > 2010/10/21 Allan Espinosa : > > Hi, > > > > I set my > > > > log4j.logger.org.globus.cog.abstraction.coaster.rlog=DEBUG > > > > > > but the persistent coaster-service still seems to be in INFO mode. > > Does bin/coaster-service still look at etc/log4j.properties ? Or do i > > need to specify the log4j.properties file in the bin/coaster-service > > script itself as a java flag? > > > > Thanks. > > -Allan > > > From ketancmaheshwari at gmail.com Mon Dec 20 06:37:40 2010 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 20 Dec 2010 13:37:40 +0100 Subject: [Swift-devel] Understanding the Swift Codebase Message-ID: Hello Swift Developers, My name is Ketan. I am initiating this conversation to ask some questions to begin understanding the Swift code and the overall structure of the codebase. Is the codebase suited to an IDE such as Eclipse or Netbeans? Is there a recommended point from where I can start understanding the code. My special interest lies in the aspects of workflow translation to low level representation and classes handling the input data and dataflow across the activities in a Swift workflow. Regards, Ketan -------------- next part -------------- An HTML attachment was scrubbed... URL: From wozniak at mcs.anl.gov Mon Dec 20 11:14:00 2010 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Mon, 20 Dec 2010 11:14:00 -0600 (Central Standard Time) Subject: [Swift-devel] Understanding the Swift Codebase In-Reply-To: References: Message-ID: Hi Ketan I think the best thing to do is check out the code using command-line tools and open things up in Eclipse. Project files for Eclipse are in the repos to make things a bit easier. Let us know how that goes and if there are ... As far as understanding the code, I would look at the tests directories and work your way through those. The entry points to the language level functionality are in classes named Loader. 
Using Eclipse to step through the first few operations is a great way to start. I actually started a wiki page about Swift/Eclipse/SVN back in the Spring but kind of ran out of ideas on what to add to it- feel free to contribute: http://www.ci.uchicago.edu/wiki/bin/view/SWFT/EclipseNotes Justin On Mon, 20 Dec 2010, Ketan Maheshwari wrote: > Hello Swift Developers, > > My name is Ketan. > > I am initiating this conversation to ask some questions to begin > understanding the Swift code and the overall structure of the codebase. > > Is the codebase suited to an IDE such as Eclipse or Netbeans? > > Is there a recommended point from where I can start understanding the code. > > My special interest lies in the aspects of workflow translation to low level > representation and classes handling the input data and dataflow across the > activities in a Swift workflow. > > Regards, > Ketan > -- Justin M Wozniak From skenny at uchicago.edu Wed Dec 29 12:11:06 2010 From: skenny at uchicago.edu (Sarah Kenny) Date: Wed, 29 Dec 2010 10:11:06 -0800 Subject: [Swift-devel] branching for stabilization of release .95 Message-ID: hey all, i was planning to branch the current trunk tomorrow so it can be stabilized for release .95 unless anyone thinks there's a reason to hold off on this (?) ~sk -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Wed Dec 29 12:15:15 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 29 Dec 2010 12:15:15 -0600 Subject: [Swift-devel] branching for stabilization of release .95 In-Reply-To: References: Message-ID: <1293646515.31270.0.camel@blabla2.none> I have yet to merge the stable branch to trunk. This may involve some manual work and it might take a while. Mihael On Wed, 2010-12-29 at 10:11 -0800, Sarah Kenny wrote: > hey all, i was planning to branch the current trunk tomorrow so it can > be stabilized for release .95 unless anyone thinks there's a reason to > hold off on this (?) 
> > ~sk > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From skenny at uchicago.edu Wed Dec 29 12:16:30 2010 From: skenny at uchicago.edu (Sarah Kenny) Date: Wed, 29 Dec 2010 10:16:30 -0800 Subject: [Swift-devel] branching for stabilization of release .95 In-Reply-To: <1293646515.31270.0.camel@blabla2.none> References: <1293646515.31270.0.camel@blabla2.none> Message-ID: ah, ok, no problem. On Wed, Dec 29, 2010 at 10:15 AM, Mihael Hategan wrote: > I have yet to merge the stable branch to trunk. This may involve some > manual work and it might take a while. > > Mihael > > On Wed, 2010-12-29 at 10:11 -0800, Sarah Kenny wrote: > > hey all, i was planning to branch the current trunk tomorrow so it can > > be stabilized for release .95 unless anyone thinks there's a reason to > > hold off on this (?) > > > > ~sk > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aespinosa at cs.uchicago.edu Wed Dec 29 15:28:08 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Wed, 29 Dec 2010 15:28:08 -0600 Subject: [Swift-devel] Some remote workers + provider staging logs (ReplyTimeouts on large workflows) Message-ID: Run cat jobs (with 2.3 MB data file) to 6 remote sites. The coaster service is run in communicado. I attached the log file and the service log file of one of the services that show the exception. 
sites file is coaster_osg.xml provider.staging=true (default proxy) Snippet of error messages (log): 10.000(0.039):2623/3 overload: 1, 0.1 2010-12-29 14:52:34,092-0600 INFO vdl:execute Exception in cat: Arguments: [RuptureVariations/100/5/100_5.txt.variation-s0004-h0005] Host: USCMS-FNAL-WC1__cmsosgce3.fnal.gov Directory: catsall-20101229-1449-7rs3j584/jobs/2/cat-239zrp3kTODO: outs ---- Caused by: Job failed with an exit code of 521 Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 521 2010-12-29 14:52:34,092-0600 DEBUG WeightedHostScoreScheduler multiplyScore(USCMS-FNAL-WC1__cmsosgce3.fnal.gov:-9.900(0.039):2623/3 overload: 1, -0.5) from service log: Congestion queue size: 0 Plan time: 1 Sender 315976503 queue size: 0 Command(5, GET): handling reply timeout; sendReqTime=101229-144933.031, sendTime=101229-144933.033, now=101229-145133 .036 Command(5, GET): re-sending Command(5, GET)fault was: Reply timeout org.globus.cog.karajan.workflow.service.ReplyTimeoutException at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:283) at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:288) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) Sending Command(5, GET) on MetaChannel: 1286943672[832416103: {}] -> GSSSChannel-null(3)[832416103: {}] Command(28, GET): handling reply timeout; sendReqTime=101229-144933.109, sendTime=101229-144933.126, now=101229-145133.129 Command(28, GET): re-sending Command(28, GET)fault was: Reply timeout org.globus.cog.karajan.workflow.service.ReplyTimeoutException at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:283) at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:288) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) Sending Command(28, GET) on 
MetaChannel: 1286943672[832416103: {}] -> GSSSChannel-null(3)[832416103: {}] Command(29, GET): handling reply timeout; sendReqTime=101229-144933.109, sendTime=101229-144933.127, now=101229-145133.136 ... ... ... USCMS-FNAL-WC1__cmsosgce3.fnal.gov:101 pull Sending Command(1, SUBMITJOB) on SC-USCMS-FNAL-WC1__cmsosgce3.fnal.gov-000101 USCMS-FNAL-WC1__cmsosgce3.fnal.gov:102 pull java.lang.IllegalStateException: Timer already cancelled. at java.util.Timer.sched(Timer.java:354) at java.util.Timer.schedule(Timer.java:170) at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:156) at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:150) at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:253) Sending Command(1, SUBMITJOB) on SC-USCMS-FNAL-WC1__cmsosgce3.fnal.gov-000102 Sending Command(682, JOBSTATUS) on GSSSChannel-null(3)[832416103: {}] java.lang.IllegalStateException: Timer already cancelled. at java.util.Timer.sched(Timer.java:354) at java.util.Timer.schedule(Timer.java:170) at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:156) at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:150) at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:253) The site USCMS-fnal-wc1 has a throttle of 68.86 = 6888 job capacity. But currently it only has 100 workers available. The log reports that it received 2.6k jobs from the workflow. Does the timeout occur because the jobs stay too long in the coaster service queue? I ran the same workflow on PADS only (the site throttle limits it to a maximum of 400 jobs).
I got the same errors at some point when my workers failed at a time less than the timeout period: The last line shows the worker.pl message when it exited: rmdir /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/RuptureVariations/111/5 rmdir /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/RuptureVariations/111 rmdir /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/RuptureVariations unlink /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/wrapper.log unlink /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/stdout.txt rmdir /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k Failed to process data: at /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl line 639. -Allan -- Allan M. Espinosa PhD student, Computer Science University of Chicago -------------- next part -------------- A non-text attachment was scrubbed... Name: catsall-20101229-1449-7rs3j584.log.gz Type: application/x-gzip Size: 1053260 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: service-6.log.gz Type: application/x-gzip Size: 171561 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: coaster_osg.xml Type: text/xml Size: 4368 bytes Desc: not available URL: From bugzilla-daemon at mcs.anl.gov Wed Dec 29 20:46:58 2010 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 29 Dec 2010 20:46:58 -0600 (CST) Subject: [Swift-devel] [Bug 239] New: Java 1.5 compatibility issue - @Override Message-ID: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=239 Summary: Java 1.5 compatibility issue - @Override Product: Swift Version: unspecified Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: General AssignedTo: hategan at mcs.anl.gov ReportedBy: dk0966 at cs.ship.edu When trying to build swift with Java 1.5, I receive errors related to the use of @Override. In Java 1.5 you can only use the @Override annotation when overriding methods of a class. If the method is defined in an interface rather than a superclass, a compilation error will occur. (1.6 allows @Override for both classes and interfaces) I ran into this when trying to build swift on sisboombah. Patch attached. 
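As an illustration of the rule described in this report, here is a minimal sketch. It is not taken from the attached patch, and the names OverrideCompat, Sized, Fixed and Named are hypothetical. A method that only implements an interface method carries no @Override, which is what javac 1.5 requires, while a method that overrides a superclass method keeps the annotation.

```java
class OverrideCompat {
    // Hypothetical interface standing in for an interface such as java.util.List.
    interface Sized {
        int size();
    }

    // Implements an interface method. javac 1.5 rejects @Override here
    // ("method does not override a method from its superclass"), so the
    // annotation is left off; Java 6 and later accept the method either way.
    static class Fixed implements Sized {
        public int size() {
            return 3;
        }
    }

    // Overrides a method inherited from a superclass (Object), so @Override
    // is accepted by every compiler that supports annotations, including 1.5.
    static class Named {
        @Override
        public String toString() {
            return "named";
        }
    }

    public static void main(String[] args) {
        System.out.println(new Fixed().size());
        System.out.println(new Named());
    }
}
```

Dropping the annotation from interface implementations loses the compile-time check on newer compilers, so this is a trade-off made only to keep the source building under 1.5.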
java version "1.5.0_14" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_14-b03) Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_14-b03, mixed mode) Ant output compile: [echo] [util]: COMPILE [mkdir] Created dir: /home/dk0966/cog/modules/util/build [javac] Compiling 53 source files to /home/dk0966/cog/modules/util/build [javac] /home/dk0966/cog/modules/util/src/org/globus/cog/util/CopyOnWriteArrayList.java:71: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] /home/dk0966/cog/modules/util/src/org/globus/cog/util/CopyOnWriteArrayList.java:105: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] /home/dk0966/cog/modules/util/src/org/globus/cog/util/CopyOnWriteArrayList.java:144: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] /home/dk0966/cog/modules/util/src/org/globus/cog/util/CopyOnWriteArrayList.java:227: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] /home/dk0966/cog/modules/util/src/org/globus/cog/util/CopyOnWriteArrayList.java:232: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] /home/dk0966/cog/modules/util/src/org/globus/cog/util/CopyOnWriteArrayList.java:237: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] /home/dk0966/cog/modules/util/src/org/globus/cog/util/CopyOnWriteArrayList.java:242: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] /home/dk0966/cog/modules/util/src/org/globus/cog/util/CopyOnWriteArrayList.java:247: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] /home/dk0966/cog/modules/util/src/org/globus/cog/util/CopyOnWriteArrayList.java:252: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] 
/home/dk0966/cog/modules/util/src/org/globus/cog/util/CopyOnWriteArrayList.java:257: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. [javac] 10 errors -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. From hategan at mcs.anl.gov Thu Dec 30 00:01:50 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Dec 2010 00:01:50 -0600 Subject: [Swift-devel] Some remote workers + provider staging logs (ReplyTimeouts on large workflows) In-Reply-To: References: Message-ID: <1293688910.24531.7.camel@blabla2.none> On Wed, 2010-12-29 at 15:28 -0600, Allan Espinosa wrote: > Does the timeout occur from the jobs being to long in the coaster > service queue? No. The coaster protocol requires each command sent on a channel to be acknowledged (pretty much like TCP does). Either the worker was very busy (unlikely by design) or it has a fault that disturbed its main event loop or there was an actual networking problem (also unlikely). > > > I did the same workflow on PADS only (site throttle makes it receive > only a maximum of 400 jobs). 
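The acknowledge-or-time-out behaviour described in this reply (every command must be acknowledged, otherwise a reply timeout fires and the command is re-sent, as in the "re-sending" lines of the service log) might look roughly like the sketch below. It is not the coaster code: AckTimeoutSketch, Channel, sendWithRetry and dropFirst are hypothetical names, and the 50 ms timeout and three attempts are arbitrary values chosen for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class AckTimeoutSketch {
    // Hypothetical channel: the returned future completes when the peer acknowledges.
    interface Channel {
        CompletableFuture<String> send(String command);
    }

    // Send a command and wait for its acknowledgement; on a reply timeout, re-send.
    static String sendWithRetry(Channel channel, String command,
                                long timeoutMillis, int maxAttempts) throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return channel.send(command).get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                System.out.println(command + ": reply timeout on attempt " + attempt);
            }
        }
        throw new TimeoutException(command + ": no reply after " + maxAttempts + " attempts");
    }

    // Test double: never acknowledges the first command, acknowledges every later one.
    static Channel dropFirst() {
        final int[] sent = {0};
        return new Channel() {
            public CompletableFuture<String> send(String command) {
                sent[0]++;
                if (sent[0] == 1) {
                    return new CompletableFuture<String>(); // never completed
                }
                return CompletableFuture.completedFuture("ACK " + command);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sendWithRetry(dropFirst(), "GET", 50, 3));
    }
}
```

In this model the time a job spends queued plays no part. Only a peer that stops acknowledging, for example a worker whose event loop has died, causes the timeout to fire.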
I got the same errors at some point when > my workers failed at a time less than the timeout period: > > The last line shows the worker.pl message when it exited: > > rmdir /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/RuptureVariations/111/5 > rmdir /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/RuptureVariations/111 > rmdir /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/RuptureVariations > unlink /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/wrapper.log > unlink /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/stdout.txt > rmdir /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k > Failed to process data: at > /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl > line 639. I wish perl had a stack trace. Can you enable TRACE on the worker and re-run and send me the log for the failing worker? Mihael From aespinosa at cs.uchicago.edu Thu Dec 30 11:51:40 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 30 Dec 2010 11:51:40 -0600 Subject: [Swift-devel] Some remote workers + provider staging logs (ReplyTimeouts on large workflows) In-Reply-To: <1293688910.24531.7.camel@blabla2.none> References: <1293688910.24531.7.camel@blabla2.none> Message-ID: I redid the OSG run with only 1 worker per coaster service and the same workflow finished without problems. I'll investigate if there are problems on multiple workers by making a testbed case in PADS as well. 2010/12/30 Mihael Hategan : > On Wed, 2010-12-29 at 15:28 -0600, Allan Espinosa wrote: > >> Does the timeout occur from the jobs being to long in the coaster >> service queue? > > No. The coaster protocol requires each command sent on a channel to be > acknowledged (pretty much like TCP does). 
Either the worker was very > busy (unlikely by design) or it has a fault that disturbed its main > event loop or there was an actual networking problem (also unlikely). > >> >> >> I did the same workflow on PADS only (site throttle makes it receive >> only a maximum of 400 jobs). I got the same errors at some point when >> my workers failed at a time less than the timeout period: >> >> The last line shows the worker.pl message when it exited: >> >> rmdir /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/RuptureVariations/111/5 >> rmdir /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/RuptureVariations/111 >> rmdir /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/RuptureVariations >> unlink /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/wrapper.log >> unlink /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/stdout.txt >> rmdir /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k >> Failed to process data: at >> /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl >> line 639. > > I wish perl had a stack trace. Can you enable TRACE on the worker and > re-run and send me the log for the failing worker? > > Mihael > > > > -- Allan M. Espinosa PhD student, Computer Science University of Chicago From wilde at mcs.anl.gov Thu Dec 30 12:51:43 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 30 Dec 2010 12:51:43 -0600 (CST) Subject: [Swift-devel] Some remote workers + provider staging logs (ReplyTimeouts on large workflows) In-Reply-To: Message-ID: <763241375.13842.1293735103760.JavaMail.root@zimbra.anl.gov> Hi Allan, It would be good to get client, service and worker logs for a reasonably small failing case - I suspect Mihael could diagnose the problem from that. I will try to join you by Skype at 2PM if that's convenient for you and Dan.
- Mike ----- Original Message ----- > I redid the OSG run with only 1 worker per coaster service and the > same workflow finished without problems. I'll investigate if there > are problems on multiple workers by making a testbed case in PADS as > well. > > 2010/12/30 Mihael Hategan : > > On Wed, 2010-12-29 at 15:28 -0600, Allan Espinosa wrote: > > > >> Does the timeout occur from the jobs being to long in the coaster > >> service queue? > > > > No. The coaster protocol requires each command sent on a channel to > > be > > acknowledged (pretty much like TCP does). Either the worker was very > > busy (unlikely by design) or it has a fault that disturbed its main > > event loop or there was an actual networking problem (also > > unlikely). > > > >> > >> > >> I did the same workflow on PADS only (site throttle makes it > >> receive > >> only a maximum of 400 jobs). I got the same errors at some point > >> when > >> my workers failed at a time less than the timeout period: > >> > >> The last line shows the worker.pl message when it exited: > >> > >> rmdir > >> /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/RuptureVariations/111/5 > >> rmdir > >> /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/RuptureVariations/111 > >> rmdir > >> /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/RuptureVariations > >> unlink > >> /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/wrapper.log > >> unlink > >> /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k/stdout.txt > >> rmdir > >> /gpfs/pads/swift/aespinosa/swift-runs/catsall-20101229-1501-x92u64yc-0-cat-0asfsp3k > >> Failed to process data: at > >> /home/aespinosa/swift/cogkit/modules/provider-coaster/resources/worker.pl > >> line 639. > > > > I wish perl had a stack trace. 
Can you enable TRACE on the worker > > and > > re-run and send me the log for the failing worker? > > > > Mihael > > > > > > > > > > > > -- > Allan M. Espinosa > PhD student, Computer Science > University of Chicago > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Fri Dec 31 16:03:59 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Fri, 31 Dec 2010 16:03:59 -0600 Subject: [Swift-devel] unknown handler CANCELJOB Message-ID: I just updated to the latest trunk. The CANCELJOBs here are issues because of replications in my workflow. Swift r3830 and Cog 2988 Snippet dump from the persistent coaster service: Plan time: 1 Plan time: 1 Unknown handler: CANCELJOB. Available handlers: {CHMOD=class org.globus.cog.abstraction.impl.file.coaster.handlers.ChmodHandler, ISDIR=class org.globus.cog.abstraction.impl.file.coaster.handlers.IsDirectoryHandler, LIST=class org.globus.cog.abstraction.impl.file.coaster.handlers.ListHandler, SUBMITJOB=class org.globus.cog.abstraction.coaster.service.SubmitJobHandler, MKDIR=class org.globus.cog.abstraction.impl.file.coaster.handlers.MkdirHandler, PUT=class org.globus.cog.abstraction.impl.file.coaster.handlers.PutFileHandler, DEL=class org.globus.cog.abstraction.impl.file.coaster.handlers.DeleteHandler, HEARTBEAT=class org.globus.cog.karajan.workflow.service.handlers.HeartBeatHandler, CONFIGSERVICE=class org.globus.cog.abstraction.coaster.service.ServiceConfigurationHandler, FILEINFO=class org.globus.cog.abstraction.impl.file.coaster.handlers.FileInfoHandler, SHUTDOWNSERVICE=class org.globus.cog.abstraction.coaster.service.ServiceShutdownHandler, SHUTDOWN=class org.globus.cog.karajan.workflow.service.handlers.ShutdownHandler, EXISTS=class 
org.globus.cog.abstraction.impl.file.coaster.handlers.ExistsHandler, CHANNELCONFIG=class org.globus.cog.karajan.workflow.service.handlers.ChannelConfigurationHandler, RMDIR=class org.globus.cog.abstraction.impl.file.coaster.handlers.RmdirHandler, RENAME=class org.globus.cog.abstraction.impl.file.coaster.handlers.RenameHandler, VERSION=class org.globus.cog.karajan.workflow.service.handlers.VersionHandler, WORKERSHELLCMD=class org.globus.cog.abstraction.coaster.service.WorkerShellHandler, GET=class org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler} -Allan -- Allan M. Espinosa PhD student, Computer Science University of Chicago From bugzilla-daemon at mcs.anl.gov Fri Dec 31 17:47:23 2010 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Fri, 31 Dec 2010 17:47:23 -0600 (CST) Subject: [Swift-devel] [Bug 31] error message should not refer to java exception classes In-Reply-To: References: Message-ID: <20101231234723.5FBF9563FD@wind-2.mcs.anl.gov> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=31 skenny changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED CC| |swift-devel at ci.uchicago.edu Resolution| |FIXED --- Comment #3 from skenny 2010-12-31 17:47:22 --- [skenny at martini tests]$ swift mapperparam.swift Swift svn swift-r3834 (swift modified locally) cog-r2988 RunID: 20101231-1546-c4gymxze Execution failed: CSV mapper must have a file parameter. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug. You are watching the reporter.