From ketan at mcs.anl.gov Mon Dec 1 15:15:51 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Mon, 1 Dec 2014 15:15:51 -0600
Subject: [Swift-user] round floating point numbers

Hi,

Is it possible to round floating point numbers to just 1 or 2 digits of precision, as opposed to the arbitrary precision I am getting by default? E.g.:

foreach i in [0.0:0.9:0.1]{
    trace(i);
}

is giving:

SwiftScript trace: 0.0
SwiftScript trace: 0.8999999999999999
SwiftScript trace: 0.6
SwiftScript trace: 0.7999999999999999
SwiftScript trace: 0.4
SwiftScript trace: 0.30000000000000004
SwiftScript trace: 0.2
SwiftScript trace: 0.7
SwiftScript trace: 0.1
SwiftScript trace: 0.5

I need to round off values like 0.8999999 to 0.8 and so on.

Thanks,
Ketan

From ketan at mcs.anl.gov Tue Dec 2 14:28:23 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Tue, 2 Dec 2014 14:28:23 -0600
Subject: [Swift-user] reverse of regexp-mapper

Hi,

Multiple runs of an app call are producing output files with the same name. Is it possible to tell Swift to rename them before staging them into the results directory? In effect, something opposite to regexp_mapper's transform, wherein the produced file is transformed into a new name and brought to the results directory.

Thanks,
Ketan

From jozik at anl.gov Wed Dec 3 10:50:40 2014
From: jozik at anl.gov (Ozik, Jonathan)
Date: Wed, 3 Dec 2014 16:50:40 +0000
Subject: [Swift-user] Block task failed: Connection to worker lost

Hi all,

I'm trying to run a large set of simulations on Midway using Swift 0.95-RC5. 768 of the 2187 tasks completed successfully and then I got the exception:

exception @ swift-int.k, line: 530
Caused by: Block task failed: Connection to worker lost
org.globus.cog.coaster.TimeoutException: Channel timed out. lastTime=141203-145449.325, now=141203-145649.844, channel=TCPChannel [type: server, contact: 1202-5410010-000072-000000]
    at org.globus.cog.coaster.channels.AbstractCoasterChannel.checkTimeouts(AbstractCoasterChannel.java:133)
    at org.globus.cog.coaster.channels.AbstractCoasterChannel$1.run(AbstractCoasterChannel.java:124)
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505)

Progress: Wed, 03 Dec 2014 14:59:51+0000 Submitted:651 Failed:6 Finished successfully:768 Failed but can retry:762
Progress: Wed, 03 Dec 2014 14:59:52+0000 Submitted:651 Failed:44 Finished successfully:768 Failed but can retry:724

And the process seems to have stopped.

What log file would be helpful for diagnosing this?

Jonathan

From yadunand at uchicago.edu Wed Dec 3 11:04:36 2014
From: yadunand at uchicago.edu (Yadu Nand Babuji)
Date: Wed, 03 Dec 2014 11:04:36 -0600
Subject: [Swift-user] Block task failed: Connection to worker lost
Message-ID: <547F42A4.3090101@uchicago.edu>

Hi Jonathan,

The issue you are seeing sounds pretty close to what David reported a while back. Could you send us a tarball of your run directory from a failed run?

Could you also check whether you've set lowOverAllocation and highOverAllocation in your sites definition?

Thanks,
Yadu

_______________________________________________
Swift-user mailing list
Swift-user at ci.uchicago.edu
https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

From xio247 at gmail.com Wed Dec 3 13:16:13 2014
From: xio247 at gmail.com (Jonathan Ozik)
Date: Wed, 3 Dec 2014 13:16:13 -0600
Subject: [Swift-user] Block task failed: Connection to worker lost
In-Reply-To: <547F42A4.3090101@uchicago.edu>
References: <547F42A4.3090101@uchicago.edu>
Message-ID: <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com>

Hi Yadu,

The tar.gz archive is here: https://www.dropbox.com/s/tt3ewapzaf0ygac/run001.tar.gz?dl=0
I'm also attaching the swift.properties file that I used below.

Thank you,

Jonathan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: swift.properties
Type: application/octet-stream
Size: 3052 bytes
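Ketan's rounding question above gets no reply in this part of the thread. The printed values match what you get by building the range with repeated additions of 0.1 in binary floating point. That is an assumption based on the trace output, not on Swift's source. The hypothetical Java class FloatRange below reproduces that drift and shows two remedies: computing each element from its index, and rounding to a fixed number of places with BigDecimal. Neither helper is a Swift built-in. This is a sketch, not how Swift itself handles ranges.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper, not part of Swift: reproduces the drift seen in the
// trace output and shows two ways to get clean one-decimal values.
final class FloatRange {

    // Number of elements in an inclusive range start..end with the given step.
    static int count(double start, double end, double step) {
        return (int) Math.round((end - start) / step) + 1;
    }

    // Naive range: keep adding the step, so rounding error accumulates
    // (the assumed cause of 0.8999999999999999 in the trace).
    static double[] accumulate(double start, double end, double step) {
        double[] out = new double[count(start, end, step)];
        double x = start;
        for (int i = 0; i < out.length; i++) {
            out[i] = x;
            x += step;
        }
        return out;
    }

    // Index-based range: each element is computed independently,
    // so the error does not grow with the index.
    static double[] byIndex(double start, double end, double step) {
        double[] out = new double[count(start, end, step)];
        for (int i = 0; i < out.length; i++) {
            out[i] = start + i * step;
        }
        return out;
    }

    // Round half-up to a fixed number of decimal places. BigDecimal.valueOf
    // goes through Double.toString, so 0.8999999999999999 rounds to 0.9.
    static double round(double x, int places) {
        return BigDecimal.valueOf(x).setScale(places, RoundingMode.HALF_UP).doubleValue();
    }

    public static void main(String[] args) {
        double[] drifted = accumulate(0.0, 0.9, 0.1);
        double[] indexed = byIndex(0.0, 0.9, 0.1);
        for (int i = 0; i < drifted.length; i++) {
            System.out.println(drifted[i] + " -> " + round(drifted[i], 1)
                    + "   (index-based: " + indexed[i] + ")");
        }
    }
}
```

The index-based range alone does not guarantee clean output: 3 * 0.1 still prints as 0.30000000000000004. Rounding is therefore still needed for display. Rounding to nearest turns 0.8999999999999999 into 0.9. Truncating instead (RoundingMode.DOWN) would turn 0.7999999999999999 into 0.7, so truncating drifted values is probably not what is wanted.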
From yadunand at uchicago.edu Wed Dec 3 19:03:30 2014
From: yadunand at uchicago.edu (Yadu Nand Babuji)
Date: Wed, 03 Dec 2014 19:03:30 -0600
Subject: [Swift-user] Block task failed: Connection to worker lost
In-Reply-To: <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com>
References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com>
Message-ID: <547FB2E2.7040102@uchicago.edu>

Hi Jonathan,

I believe some of the timeout-related issues seen in your logs are fixed (or less likely) in trunk, and I would recommend that you try a run with it. I've also converted your swift.properties to the new swift.conf format. You can get a tested .conf file along with a small test case from here:

http://users.rcc.uchicago.edu/~yadunand/test_configs_package.tar.gz

Here are some changes I've made to the conf:
- lazyErrors: true and executionRetries: 0, so that long-running jobs are not retried.
- staging set to direct, since you are running on the shared FS.
- added worker logging and an app definition for debugging.

You can get the latest trunk build from here: http://users.rcc.uchicago.edu/~yadunand/swift-trunk-latest.tar.gz

Thanks,
Yadu

From xio247 at gmail.com Thu Dec 4 10:48:41 2014
From: xio247 at gmail.com (Jonathan Ozik)
Date: Thu, 4 Dec 2014 10:48:41 -0600
Subject: [Swift-user] Block task failed: Connection to worker lost
In-Reply-To: <547FB2E2.7040102@uchicago.edu>
References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu>

Thanks Yadu,

I have a few questions.
- How do I invoke swift and pass it the new swift.conf?
- What is the "restart" procedure?
- Is there a module I can load to use the latest swift trunk?

Jonathan
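The TimeoutException in Jonathan's first report carries two timestamps: lastTime=141203-145449.325 and now=141203-145649.844. The hypothetical helper ChannelGap below computes how long the channel had been silent when it was declared timed out. It assumes the stamps use a yyMMdd-HHmmss.SSS layout, which is inferred from the values themselves and not from coaster documentation. It is only a worked version of that arithmetic.

```java
// Hypothetical helper: the yyMMdd-HHmmss.SSS stamp layout is an assumption
// inferred from the values in the coaster "Channel timed out" message.
final class ChannelGap {

    // Milliseconds since midnight for a stamp like "141203-145449.325".
    static long millisOfDay(String stamp) {
        String time = stamp.substring(stamp.indexOf('-') + 1); // e.g. "145449.325"
        int hh = Integer.parseInt(time.substring(0, 2));
        int mm = Integer.parseInt(time.substring(2, 4));
        int ss = Integer.parseInt(time.substring(4, 6));
        int ms = Integer.parseInt(time.substring(7, 10));      // digits after the '.'
        return ((hh * 60L + mm) * 60L + ss) * 1000L + ms;
    }

    // Gap in milliseconds; only handles two stamps from the same day.
    static long gapMillis(String last, String now) {
        if (!last.substring(0, 6).equals(now.substring(0, 6))) {
            throw new IllegalArgumentException("stamps are on different days");
        }
        return millisOfDay(now) - millisOfDay(last);
    }

    public static void main(String[] args) {
        long gap = gapMillis("141203-145449.325", "141203-145649.844");
        System.out.println(gap + " ms"); // prints "120519 ms"
    }
}
```

For these stamps the gap is 120519 ms, just over two minutes. The thread does not say whether two minutes is the configured coaster channel timeout. Treat the number as a hint of what to look for in the run-directory logs, not as a diagnosis.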
From yadunand at uchicago.edu Thu Dec 4 11:14:51 2014
From: yadunand at uchicago.edu (Yadu Nand Babuji)
Date: Thu, 04 Dec 2014 11:14:51 -0600
Subject: [Swift-user] Block task failed: Connection to worker lost
References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu>
Message-ID: <5480968B.7060108@uchicago.edu>

Hi Jonathan,

If your config file is named swift.conf and is in the current directory, it will be selected automatically and you needn't specify the file on the command line. Otherwise, specify the config file using the -config option:
swift -config

To resume from a log, e.g. the restart.log in your run001 folder, specify it using the -resume option:
swift -resume run001/restart.log ...
The restart log is from a 0.95 run, and I'm not quite sure whether it will work correctly with trunk.

There is no trunk module available on Midway, since we rebuild from source to keep up to date with changes in the codebase.

Generally, you can always get the latest trunk builds here (at most a week older than the last commit):
http://users.rcc.uchicago.edu/~yadunand/swift-trunk-latest.tar.gz

Thanks,
Yadu
From jozik at anl.gov Thu Dec 4 12:33:23 2014
From: jozik at anl.gov (Ozik, Jonathan)
Date: Thu, 4 Dec 2014 18:33:23 +0000
Subject: [Swift-user] Block task failed: Connection to worker lost
In-Reply-To: <5480968B.7060108@uchicago.edu>
References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu> <5480968B.7060108@uchicago.edu>

Hi Yadu,

I've tried running with trunk and am getting a strange Java error this time:

No method: getProperty in java.lang.System with parameter types[class java.lang.String]
swiftscript:java @ repast, line: 267

    at org.griphyn.vdl.karajan.lib.swiftscript.Java.getMethod(Java.java:192)
    at org.griphyn.vdl.karajan.lib.swiftscript.Java.function(Java.java:162)
    at org.griphyn.vdl.karajan.lib.SwiftFunction.runBody(SwiftFunction.java:77)
    at org.globus.cog.karajan.compiled.nodes.InternalFunction.run(InternalFunction.java:175)
    at org.globus.cog.karajan.compiled.nodes.CompoundNode.runChild(CompoundNode.java:110)
    at org.globus.cog.karajan.compiled.nodes.InternalFunction.run(InternalFunction.java:165)
    at org.globus.cog.karajan.compiled.nodes.CompoundNode.runChild(CompoundNode.java:110)
    at org.globus.cog.karajan.compiled.nodes.InternalFunction.run(InternalFunction.java:165)
    at org.globus.cog.karajan.compiled.nodes.CompoundNode.runChild(CompoundNode.java:110)
    at org.globus.cog.karajan.compiled.nodes.Sequential.run(Sequential.java:41)
    at org.globus.cog.karajan.compiled.nodes.CompoundNode.runChild(CompoundNode.java:110)
    at org.globus.cog.karajan.compiled.nodes.UParallel$1.run(UParallel.java:91)
    at k.thr.LWThread.run(LWThread.java:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Execution failed:
    Error attempting to use: java.lang.System
    swiftscript:java @ repast, line: 267

I think this is being triggered by the call:

string s = strcat(java("java.lang.System","getProperty","user.dir"),"/");

which worked just fine with 0.95 RC5.

Any thoughts?

Jonathan

From hategan at mcs.anl.gov Thu Dec 4 13:01:38 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 4 Dec 2014 11:01:38 -0800
Subject: [Swift-user] Block task failed: Connection to worker lost
References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu> <5480968B.7060108@uchicago.edu>
Message-ID: <1417719698.15860.3.camel@echo>

Hi Jonathan,

I fixed this in GIT. Yadu, can you compile the latest GIT please?

Mihael

From yadunand at uchicago.edu Thu Dec 4 13:23:30 2014
From: yadunand at uchicago.edu (Yadu Nand Babuji)
Date: Thu, 04 Dec 2014 13:23:30 -0600
Subject: [Swift-user] Block task failed: Connection to worker lost
In-Reply-To: <1417719698.15860.3.camel@echo>
References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu> <5480968B.7060108@uchicago.edu> <1417719698.15860.3.camel@echo>
Message-ID: <5480B4B2.6040407@uchicago.edu>

Hi Jonathan,

I rebuilt the trunk package with Mihael's fixes; you can get it from here: http://users.rcc.uchicago.edu/~yadunand/swift-trunk-latest.tar.gz

-Yadu

On 12/04/2014 01:01 PM,
Mihael Hategan wrote: > Hi Jonathan, > > I fixed this in GIT. Yadu, can you compile the latest GIT please? > > Mihael From wilde at anl.gov Thu Dec 4 13:33:24 2014 From: wilde at anl.gov (Michael Wilde) Date: Thu, 4 Dec 2014 13:33:24 -0600 Subject: [Swift-user] Block task failed: Connection to worker lost In-Reply-To: <5480B4B2.6040407@uchicago.edu> References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu> <5480968B.7060108@uchicago.edu> <1417719698.15860.3.camel@echo> <5480B4B2.6040407@uchicago.edu> Message-ID: <5480B704.6050707@anl.gov> We should (and will) add a getcwd( ) library function to eliminate this particular need for java( ), though. - Mike On 12/4/14 1:23 PM, Yadu Nand Babuji wrote: > Hi Jonathan, > > I rebuilt the trunk package with Mihael's fixes, and you can get it from > here : > http://users.rcc.uchicago.edu/~yadunand/swift-trunk-latest.tar.gz > > -Yadu > > On 12/04/2014 01:01 PM, Mihael Hategan wrote: >> Hi Jonathan, >> >> I fixed this in GIT. Yadu, can you compile the latest GIT please?
>> >> Mihael -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From yadudoc1729 at gmail.com Thu Dec 4 14:02:27 2014 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Thu, 4 Dec 2014 14:02:27 -0600 Subject: [Swift-user] round floating point numbers In-Reply-To: References: Message-ID: This is not a rounding solution, but it should give the same results you are looking for: foreach i in [1.0:9.0:1.0]{ float foo = i/10; tracef("%f\n", foo); } Let me know if this doesn't work for you. -Yadu On Mon, Dec 1, 2014 at 3:15 PM, Ketan Maheshwari wrote: > Hi, > > Is it possible to round floating point numbers to just 1 or 2 digits of > precision as opposed to the arbitrary precision as I am getting by default, > eg: > > foreach i in [0.0:0.9:0.1]{ > trace(i); > } > > is giving: > > SwiftScript trace: 0.0 > SwiftScript trace: 0.8999999999999999 > SwiftScript trace: 0.6 > SwiftScript trace: 0.7999999999999999 > SwiftScript trace: 0.4 > SwiftScript trace: 0.30000000000000004 > SwiftScript trace: 0.2 > SwiftScript trace: 0.7 > SwiftScript trace: 0.1 > SwiftScript trace: 0.5 > > I need to round off values like 0.8999999 to 0.8 and so on.
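The stray digits in the trace are ordinary IEEE-754 double-precision rounding error. Adding 0.1 to 0.0 repeatedly produces exactly the values shown (0.30000000000000004 after three steps, 0.7999999999999999 after eight, 0.8999999999999999 after nine), which suggests, though nothing in the thread confirms it, that the range is built by accumulation. The Python sketch below is an illustration, not Swift code. It reproduces that drift, applies the integer-index idea from the reply above (loop over integers and divide once per value), and rounds to one decimal only when formatting. The function names are hypothetical. Note that rounding 0.8999999999999999 to one decimal gives 0.9.

```python
"""Reproduce float-range drift and two ways to avoid it (illustrative sketch)."""


def accumulated_range(start, stop, step):
    """Build a range by repeated addition, so rounding error accumulates.

    Assumption: this mimics how the [start:stop:step] range appears to
    behave, judging only from the trace values in the question.
    """
    values = []
    x = start
    while x <= stop:
        values.append(x)
        x += step
    return values


def indexed_range(count, divisor):
    """Integer-index workaround: derive each value as i / divisor."""
    return [i / divisor for i in range(count)]


def format_tenths(x):
    """Round to one decimal place for display; the float itself is unchanged."""
    return f"{x:.1f}"


if __name__ == "__main__":
    drifted = accumulated_range(0.0, 0.9, 0.1)
    print(drifted)  # contains 0.30000000000000004 and 0.8999999999999999
    print(indexed_range(10, 10))  # 0.0 through 0.9 with clean reprs
    print([format_tenths(x) for x in drifted])  # '0.0' through '0.9'
```

Formatting with a fixed precision only changes what is printed. If the values feed later arithmetic, deriving each one from an integer index avoids accumulating the error in the first place.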
> > Thanks, > Ketan -- Yadu Nand B From jozik at anl.gov Thu Dec 4 14:58:38 2014 From: jozik at anl.gov (Ozik, Jonathan) Date: Thu, 4 Dec 2014 20:58:38 +0000 Subject: [Swift-user] Block task failed: Connection to worker lost In-Reply-To: <5480B704.6050707@anl.gov> References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu> <5480968B.7060108@uchicago.edu> <1417719698.15860.3.camel@echo> <5480B4B2.6040407@uchicago.edu> <5480B704.6050707@anl.gov> Message-ID: <4CAC1E7A-0379-439E-8523-21D1B4CA1663@anl.gov> Thank you all, The job is queued up now. I'll update on the results. Jonathan > On Dec 4, 2014, at 1:33 PM, Michael Wilde wrote: > > We should (and will) add a getcwd( ) library function to eliminate this > particular need for java( ), though. > > - Mike From jozik at anl.gov Thu Dec 4 17:57:01 2014 From: jozik at anl.gov (Ozik, Jonathan)
Date: Thu, 4 Dec 2014 23:57:01 +0000 Subject: [Swift-user] Block task failed: Connection to worker lost In-Reply-To: <4CAC1E7A-0379-439E-8523-21D1B4CA1663@anl.gov> References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu> <5480968B.7060108@uchicago.edu> <1417719698.15860.3.camel@echo> <5480B4B2.6040407@uchicago.edu> <5480B704.6050707@anl.gov> <4CAC1E7A-0379-439E-8523-21D1B4CA1663@anl.gov> Message-ID: <9129BAF5-6732-4670-9199-46FDAB2619FA@anl.gov> The "staging: direct" option that's included in the swift.conf file Yadu provided, I don't seem to see a definition for it in the user guide. I'm having a path name issue and I suspect it could be something to do with the staging, but I'm not sure. If I use a "-upf=filename.txt" command line argument to a swift script that includes the lines: string upf_str = @arg("upf","unrolledParamFile.txt"); file params_file ; If I use the filename(params_file) command, would I get "filename.txt" with the default staging and the full path of the filename.txt file with the "direct" staging? Or is this a change between 0.95 RC5 and trunk? Jonathan > On Dec 4, 2014, at 2:58 PM, Ozik, Jonathan wrote: > > Thank you all, > > The job is queued up now. I'll update on the results. > > Jonathan
_______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user From jozik at anl.gov Thu Dec 4 22:11:36 2014 From: jozik at anl.gov (Ozik, Jonathan) Date: Fri, 5 Dec 2014 04:11:36 +0000 Subject: [Swift-user] Block task failed: Connection to worker lost In-Reply-To: <9129BAF5-6732-4670-9199-46FDAB2619FA@anl.gov> References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu> <5480968B.7060108@uchicago.edu> <1417719698.15860.3.camel@echo> <5480B4B2.6040407@uchicago.edu> <5480B704.6050707@anl.gov> <4CAC1E7A-0379-439E-8523-21D1B4CA1663@anl.gov> <9129BAF5-6732-4670-9199-46FDAB2619FA@anl.gov> Message-ID: <1E1CB289-1960-45F6-A782-DB0B7E9DE087@anl.gov> I've looked a bit closer into the differences between the different staging options, and chose the "local" option for now, even though this is probably not the most efficient in terms of creating unnecessarily large amounts of copies of the input files needed for each app invocation. Speaking of which, in the User Guide (http://swift-lang.org/guides/trunk/userguide/userguide.html), there is a section that states "The wrapper script creates the application workspace directory; places the input files for that job into the application workspace directory using either cp or ln -s (depending on a configuration option)...", but I couldn't find any more information on enabling the symlinking of input files. Is this associated with a specific type of staging or configuration? Jonathan On Dec 4, 2014, at 5:57 PM, Ozik, Jonathan > wrote: The "staging: direct" option that's included in the swift.conf file Yadu provided, I don't seem to see a definition for it in the user guide. I'm having a path name issue and I suspect it could be something to do with the staging, but I'm not sure. If I use a "-upf=filename.txt"
command line argument to a swift script that includes the lines: string upf_str = @arg("upf","unrolledParamFile.txt"); file params_file ; If I use the filename(params_file) command, would I get "filename.txt" with the default staging and the full path of the filename.txt file with the "direct" staging? Or is this a change between 0.95 RC5 and trunk? Jonathan _______________________________________________ Swift-user mailing list Swift-user at ci.uchicago.edu
https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user From hategan at mcs.anl.gov Fri Dec 5 00:24:19 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 4 Dec 2014 22:24:19 -0800 Subject: [Swift-user] Block task failed: Connection to worker lost In-Reply-To: <1E1CB289-1960-45F6-A782-DB0B7E9DE087@anl.gov> References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu> <5480968B.7060108@uchicago.edu> <1417719698.15860.3.camel@echo> <5480B4B2.6040407@uchicago.edu> <5480B704.6050707@anl.gov> <4CAC1E7A-0379-439E-8523-21D1B4CA1663@anl.gov> <9129BAF5-6732-4670-9199-46FDAB2619FA@anl.gov> <1E1CB289-1960-45F6-A782-DB0B7E9DE087@anl.gov> Message-ID: <1417760659.32006.1.camel@echo> On Fri, 2014-12-05 at 04:11 +0000, Ozik, Jonathan wrote: > I've looked a bit closer into the differences between the different staging options, and chose the "local" option for now, even though this is probably not the most efficient in terms of creating unnecessarily large amounts of copies of the input files needed for each app invocation. > Speaking of which, in the User Guide > (http://swift-lang.org/guides/trunk/userguide/userguide.html), there > is a section that states "The wrapper script creates the application > workspace directory; places the input files for that job into the > application workspace directory using either cp or ln -s (depending on > a configuration option)...", but I couldn't find any more information on > enabling the symlinking of input files. Is this associated with a > specific type of staging or configuration? I think that only applies to "swift" staging. Symlinking is the default. If you specify a scratch directory (site.x.scratch: "/blabla"), then copying is done.
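A minimal Python sketch of the placement rule described above may make the distinction concrete: an input file is symlinked into the job workspace by default, and copied when a scratch directory is configured. It is an illustration under that assumption only, not Swift's actual wrapper script; the function `place_input` and its `scratch_configured` flag are hypothetical stand-ins for the `site.x.scratch` setting.

```python
import os
import shutil


def place_input(src: str, workdir: str, scratch_configured: bool) -> str:
    """Hypothetical sketch: put one input file into a job workspace.

    Symlinks by default; copies when a scratch directory
    (the site.x.scratch setting) is assumed to be configured.
    """
    os.makedirs(workdir, exist_ok=True)
    dest = os.path.join(workdir, os.path.basename(src))
    if scratch_configured:
        # Scratch workspace: make a real, independent copy of the file.
        shutil.copy2(src, dest)
    else:
        # Default: link back to the original so no data is duplicated.
        os.symlink(os.path.abspath(src), dest)
    return dest


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "params.txt")
        with open(src, "w") as f:
            f.write("alpha=1\n")
        linked = place_input(src, os.path.join(tmp, "job1"), scratch_configured=False)
        copied = place_input(src, os.path.join(tmp, "job2"), scratch_configured=True)
        print(os.path.islink(linked), os.path.islink(copied))  # True False
```

On a shared file system the symlink branch avoids the per-invocation copies that the "local" option was producing; the copy branch presumably only pays off when the scratch directory is on faster, node-local storage.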
Mihael From jozik at anl.gov Fri Dec 5 10:37:52 2014 From: jozik at anl.gov (Ozik, Jonathan) Date: Fri, 5 Dec 2014 16:37:52 +0000 Subject: [Swift-user] Block task failed: Connection to worker lost In-Reply-To: <1417760659.32006.1.camel@echo> References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu> <5480968B.7060108@uchicago.edu> <1417719698.15860.3.camel@echo> <5480B4B2.6040407@uchicago.edu> <5480B704.6050707@anl.gov> <4CAC1E7A-0379-439E-8523-21D1B4CA1663@anl.gov> <9129BAF5-6732-4670-9199-46FDAB2619FA@anl.gov> <1E1CB289-1960-45F6-A782-DB0B7E9DE087@anl.gov> <1417760659.32006.1.camel@echo> Message-ID: <865E1ECF-4F8C-4908-99F2-EC13C2E5062B@anl.gov> Mihael, Thanks. Is there a document that explains the details of each of the staging options? As reference, when I tried to use the wrapper or swift staging or omitted the staging specifier I got errors. The direct, local, and shared-fs staging didn't throw errors. For the swift staging I got the error: Could not initialize shared directory on midway_debug exception @ swift-int.k, line: 303 Caused by: Could not find a suitable service/provider for host midway_debug Caused by: org.globus.cog.karajan.scheduler.NoSuchResourceException: Could not find a suitable service/provider for host midway_debug k:assign @ swift.k, line: 174 Caused by: Could not initialize shared directory on midway_debug exception @ swift-int.k, line: 303 Caused by: Could not find a suitable service/provider for host midway_debug Caused by: org.globus.cog.karajan.scheduler.NoSuchResourceException: Could not find a suitable service/provider for host midway_debug Final status: Fri, 05 Dec 2014 00:38:21+0000 Failed:1 The following errors have occurred: 1.
Could not initialize shared directory on midway_debug exception @ swift-int.k, line: 303 Caused by: Could not find a suitable service/provider for host midway_debug Caused by: org.globus.cog.karajan.scheduler.NoSuchResourceException: Could not find a suitable service/provider for host midway_debug Execution failed: Execution completed with errors throw @ swift.k, line: 116 Jonathan From hategan at mcs.anl.gov Fri Dec 5 13:18:41 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 5 Dec 2014 11:18:41 -0800 Subject: [Swift-user] Block task failed: Connection to worker lost In-Reply-To: <865E1ECF-4F8C-4908-99F2-EC13C2E5062B@anl.gov> References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu> <5480968B.7060108@uchicago.edu> <1417719698.15860.3.camel@echo> <5480B4B2.6040407@uchicago.edu> <5480B704.6050707@anl.gov> <4CAC1E7A-0379-439E-8523-21D1B4CA1663@anl.gov> <9129BAF5-6732-4670-9199-46FDAB2619FA@anl.gov> <1E1CB289-1960-45F6-A782-DB0B7E9DE087@anl.gov> <1417760659.32006.1.camel@echo> <865E1ECF-4F8C-4908-99F2-EC13C2E5062B@anl.gov> Message-ID: <1417807121.5911.3.camel@echo> On Fri, 2014-12-05 at 10:37 -0600, Ozik, Jonathan wrote: > Mihael, > > > Thanks. Is there a document that explains the details of each of the > staging options? http://swift-lang.org/guides/trunk/userguide/userguide.html#table-staging-methods > > > As reference, when I tried to use the wrapper or swift staging or > omitted the staging specifier I got errors. The direct, local, and > shared-fs staging didn't throw errors. For the swift staging I got the > error: [...] > Caused by: Could not find a suitable service/provider for host > midway_debug Right. For swift staging you need to specify a filesystem type that swift will use to do the actual copying between the swift side and wherever the jobs are submitted to. This is the legacy way of dealing with data in swift. "Wrapper" you might not want to use. 
We are still working on the finer details of that.
Mihael From jozik at anl.gov Fri Dec 5 13:32:08 2014 From: jozik at anl.gov (Ozik, Jonathan) Date: Fri, 5 Dec 2014 19:32:08 +0000 Subject: [Swift-user] Block task failed: Connection to worker lost In-Reply-To: <1417807121.5911.3.camel@echo> References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu> <5480968B.7060108@uchicago.edu> <1417719698.15860.3.camel@echo> <5480B4B2.6040407@uchicago.edu> <5480B704.6050707@anl.gov> <4CAC1E7A-0379-439E-8523-21D1B4CA1663@anl.gov> <9129BAF5-6732-4670-9199-46FDAB2619FA@anl.gov> <1E1CB289-1960-45F6-A782-DB0B7E9DE087@anl.gov> <1417760659.32006.1.camel@echo> <865E1ECF-4F8C-4908-99F2-EC13C2E5062B@anl.gov> <1417807121.5911.3.camel@echo> Message-ID: <4B908933-8467-4280-AAAD-5301252DD8FB@anl.gov> Mihael, See my responses below. Jonathan > On Dec 5, 2014, at 1:18 PM, Mihael Hategan wrote: > > On Fri, 2014-12-05 at 10:37 -0600, Ozik, Jonathan wrote: >> Mihael, >> >> >> Thanks. Is there a document that explains the details of each of the >> staging options? > > http://swift-lang.org/guides/trunk/userguide/userguide.html#table-staging-methods Yes I did see the Swift Staging Methods table. Because I didn't see any reference to the "direct" staging I was wondering if I'd missed any other document that was out there. But I'm guessing this means no. > >> >> >> As reference, when I tried to use the wrapper or swift staging or >> omitted the staging specifier I got errors. The direct, local, and >> shared-fs staging didn't throw errors. For the swift staging I got the >> error: > > [...] >> Caused by: Could not find a suitable service/provider for host >> midway_debug > > Right. For swift staging you need to specify a filesystem type that > swift will use to do the actual copying between the swift side and > wherever the jobs are submitted to. This is the legacy way of dealing > with data in swift. > > "Wrapper" you might not want to use. We are still working on the finer
We are still working on the finer > details of that. So for now, for running on a shared file system resource like Midway I should focus on local, direct or shared-fs staging? > > Mihael > From hategan at mcs.anl.gov Fri Dec 5 14:05:38 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 5 Dec 2014 12:05:38 -0800 Subject: [Swift-user] Block task failed: Connection to worker lost In-Reply-To: <4B908933-8467-4280-AAAD-5301252DD8FB@anl.gov> References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu> <5480968B.7060108@uchicago.edu> <1417719698.15860.3.camel@echo> <5480B4B2.6040407@uchicago.edu> <5480B704.6050707@anl.gov> <4CAC1E7A-0379-439E-8523-21D1B4CA1663@anl.gov> <9129BAF5-6732-4670-9199-46FDAB2619FA@anl.gov> <1E1CB289-1960-45F6-A782-DB0B7E9DE087@anl.gov> <1417760659.32006.1.camel@echo> <865E1ECF-4F8C-4908-99F2-EC13C2E5062B@anl.gov> <1417807121.5911.3.camel@echo> <4B908933-8467-4280-AAAD-5301252DD8FB@anl.gov> Message-ID: <1417809938.6576.4.camel@echo> Hi Jonathan, Inline... On Fri, 2014-12-05 at 13:32 -0600, Ozik, Jonathan wrote: > [...] > > http://swift-lang.org/guides/trunk/userguide/userguide.html#table-staging-methods > Yes I did see the Swift Staging Methods table. Because I didn?t see > any reference to the ?direct? staging I was wondering if I?d missed > any other document that was out there. But I?m guessing this means no. Direct staging (or more accurately the bypassing of staging) is a new feature that we are still testing and refining. > > > [...] > So for now, for running on a shared file system resource like Midway I should focus on local, direct or shared-fs staging? And "swift". This is what you were using with 0.95. 
Mihael From jozik at anl.gov Fri Dec 5 14:13:23 2014 From: jozik at anl.gov (Ozik, Jonathan) Date: Fri, 5 Dec 2014 20:13:23 +0000 Subject: [Swift-user] Block task failed: Connection to worker lost In-Reply-To: <1417809938.6576.4.camel@echo> References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu> <5480968B.7060108@uchicago.edu> <1417719698.15860.3.camel@echo> <5480B4B2.6040407@uchicago.edu> <5480B704.6050707@anl.gov> <4CAC1E7A-0379-439E-8523-21D1B4CA1663@anl.gov> <9129BAF5-6732-4670-9199-46FDAB2619FA@anl.gov> <1E1CB289-1960-45F6-A782-DB0B7E9DE087@anl.gov> <1417760659.32006.1.camel@echo> <865E1ECF-4F8C-4908-99F2-EC13C2E5062B@anl.gov> <1417807121.5911.3.camel@echo> <4B908933-8467-4280-AAAD-5301252DD8FB@anl.gov> <1417809938.6576.4.camel@echo> Message-ID: Great, very helpful. If in my swift.conf I use: staging: swift filesystem: local Will this result in symlinks or cp? Jonathan From hategan at mcs.anl.gov Fri Dec 5 14:54:08 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 5 Dec 2014 12:54:08 -0800 Subject: [Swift-user] Block task failed: Connection to worker lost In-Reply-To: References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu> <5480968B.7060108@uchicago.edu> <1417719698.15860.3.camel@echo> <5480B4B2.6040407@uchicago.edu> <5480B704.6050707@anl.gov> <4CAC1E7A-0379-439E-8523-21D1B4CA1663@anl.gov> <9129BAF5-6732-4670-9199-46FDAB2619FA@anl.gov> <1E1CB289-1960-45F6-A782-DB0B7E9DE087@anl.gov> <1417760659.32006.1.camel@echo> <865E1ECF-4F8C-4908-99F2-EC13C2E5062B@anl.gov> <1417807121.5911.3.camel@echo> <4B908933-8467-4280-AAAD-5301252DD8FB@anl.gov> <1417809938.6576.4.camel@echo> Message-ID: <1417812848.7296.6.camel@echo> On Fri, 2014-12-05 at 14:13 -0600, Ozik, Jonathan wrote: > Great, very helpful. > > If in my swift.conf I use: > staging: swift > filesystem: local > > Will this result in symlinks or cp? 
Symlinks. Well, an initial copy to a "site directory" and then symlink from there. Again, this is what you had with 0.95. My suggestions are based on you getting something working, although perhaps not optimally, but working. And then optimizing I/O if needed with fancier staging schemes.
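The behaviour described above for `staging: swift` with `filesystem: local` (an initial copy into a site directory, then a symlink from each job's workspace) can be sketched in Python as follows. It is a hypothetical illustration, not Swift's implementation. The helpers `stage_to_site` and `link_into_job`, and the check that skips the copy when the file is already staged, are assumptions.

```python
import os
import shutil


def stage_to_site(src: str, site_dir: str) -> str:
    """Copy an input into the shared site directory once (hypothetical sketch)."""
    os.makedirs(site_dir, exist_ok=True)
    staged = os.path.join(site_dir, os.path.basename(src))
    if not os.path.exists(staged):
        shutil.copy2(src, staged)  # the single physical copy
    return staged


def link_into_job(staged: str, job_dir: str) -> str:
    """Symlink the site-directory copy into one job's workspace."""
    os.makedirs(job_dir, exist_ok=True)
    link = os.path.join(job_dir, os.path.basename(staged))
    os.symlink(os.path.abspath(staged), link)
    return link


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "unrolledParamFile.txt")
        with open(src, "w") as f:
            f.write("run=1\n")
        site = os.path.join(tmp, "site")
        links = []
        for job in ("job1", "job2", "job3"):
            staged = stage_to_site(src, site)  # copies only on the first call
            links.append(link_into_job(staged, os.path.join(tmp, job)))
        # Three workspaces, one physical copy in the site directory.
        print(len(os.listdir(site)), all(os.path.islink(p) for p in links))  # 1 True
```

With many app invocations reading the same input, this keeps one physical copy per site rather than one per job.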
Mihael From jozik at anl.gov Sat Dec 6 12:02:52 2014 From: jozik at anl.gov (Ozik, Jonathan) Date: Sat, 6 Dec 2014 18:02:52 +0000 Subject: [Swift-user] Block task failed: Connection to worker lost In-Reply-To: <1417812848.7296.6.camel@echo> References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu> <5480968B.7060108@uchicago.edu> <1417719698.15860.3.camel@echo> <5480B4B2.6040407@uchicago.edu> <5480B704.6050707@anl.gov> <4CAC1E7A-0379-439E-8523-21D1B4CA1663@anl.gov> <9129BAF5-6732-4670-9199-46FDAB2619FA@anl.gov> <1E1CB289-1960-45F6-A782-DB0B7E9DE087@anl.gov> <1417760659.32006.1.camel@echo> <865E1ECF-4F8C-4908-99F2-EC13C2E5062B@anl.gov> <1417807121.5911.3.camel@echo> <4B908933-8467-4280-AAAD-5301252DD8FB@anl.gov> <1417809938.6576.4.camel@echo> <1417812848.7296.6.camel@echo> Message-ID: <77B84461-27D7-4F33-9E9B-69855A2C2F94@anl.gov> All the simulation runs completed successfully. Thanks! Let me know if you'd like to see any of the worker logs. Jonathan From hategan at mcs.anl.gov Sat Dec 6 12:26:50 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 6 Dec 2014 10:26:50 -0800 Subject: [Swift-user] Block task failed: Connection to worker lost In-Reply-To: <77B84461-27D7-4F33-9E9B-69855A2C2F94@anl.gov> References: <547F42A4.3090101@uchicago.edu> <040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com> <547FB2E2.7040102@uchicago.edu> <5480968B.7060108@uchicago.edu> <1417719698.15860.3.camel@echo> <5480B4B2.6040407@uchicago.edu> <5480B704.6050707@anl.gov> <4CAC1E7A-0379-439E-8523-21D1B4CA1663@anl.gov> <9129BAF5-6732-4670-9199-46FDAB2619FA@anl.gov> <1E1CB289-1960-45F6-A782-DB0B7E9DE087@anl.gov> <1417760659.32006.1.camel@echo> <865E1ECF-4F8C-4908-99F2-EC13C2E5062B@anl.gov> <1417807121.5911.3.camel@echo> <4B908933-8467-4280-AAAD-5301252DD8FB@anl.gov> <1417809938.6576.4.camel@echo> <1417812848.7296.6.camel@echo> <77B84461-27D7-4F33-9E9B-69855A2C2F94@anl.gov> Message-ID: <1417890410.17415.1.camel@echo> Hi Jonathan, I wouldn't mind the swift log to run it through the log analysis tools. 
Mihael From ketan at mcs.anl.gov Wed Dec 10 15:11:36 2014 From: ketan at mcs.anl.gov (Ketan Maheshwari) Date: Wed, 10 Dec 2014 15:11:36 -0600 Subject: [Swift-user] __root__ in filename Message-ID: Hi, In an app that needs input files with the same name but a different extension, I am trying to use regexp on the filename() of a mapped file. However, Swift adds '__root__' to the path of the file, which causes the file not to be found in the new location, resulting in this error: org.griphyn.vdl.mapping.MissingDataException: File not found for variable 'in_sto': file://localhost/__root__/lcrc/project/NEXTGENOPT/DSP_old/examples/smps/dcap/dcap332_500.sto Relevant code is: file timfiles[]; file in_sto ; Thanks for any workaround suggestions for this, Ketan From hategan at mcs.anl.gov Wed Dec 10 15:21:53 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 10 Dec 2014 13:21:53 -0800 Subject: [Swift-user] __root__ in filename In-Reply-To: References: Message-ID: <1418246513.21329.2.camel@echo> Hi, In 0.95 this is a bit of a known problem. This should be fixed in trunk. A backport of the fix may be possible and should probably be done. Can you easily switch to trunk? If not, I'll try to do the backport sooner. Mihael On Wed, 2014-12-10 at 15:11 -0600, Ketan Maheshwari wrote: > Hi, > > In an app that needs input files with same name but different extension, I > am trying to use regexp on the filename() of a mapped file.
However, Swift > adds '__root__' to the path of file which cause the file to be not found in > the new location resulting in error as: > > org.griphyn.vdl.mapping.MissingDataException: File not found for > variable 'in_sto': > file://localhost/__root__/lcrc/project/NEXTGENOPT/DSP_old/examples/smps/dcap/dcap332_500.sto > > Relevant code is: > > file timfiles[] location="/lcrc/project/NEXTGENOPT/DSP_old/examples/smps/dcap", > pattern="*.tim">; > file in_sto "sto")>; > > Thanks for any workaround suggestions for this, > Ketan > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user From wilde at anl.gov Wed Dec 10 15:52:00 2014 From: wilde at anl.gov (Michael Wilde) Date: Wed, 10 Dec 2014 15:52:00 -0600 Subject: [Swift-user] __root__ in filename In-Reply-To: <1418246513.21329.2.camel@echo> References: <1418246513.21329.2.camel@echo> Message-ID: <5488C080.4010604@anl.gov> This example may be helpful: $ cat -n regexpmap.swift 1 type file; 2 3 string fnames[] = [ 4 "/data/dir/000001/f01.dat", 5 "/data/dir/000001/f02.dat", 6 "/data/dir/000002/f03.dat", 7 "/data/dir/000002/f04.dat", 8 "/data/dir/000003/f05.dat", 9 "/data/dir/000003/f06.dat"]; 10 11 file data[]; 12 13 file image[] 14 source=data, 15 match="(/data/dir/[0-9][0-9]*/.*?)dat$", 16 transform="\\1img">; 17 18 iterate i { 19 tracef(" data[%i] = %s\nimage[%i] = %s\n", 20 i, filename(data[i]), 21 i, filename(image[i])); 22 } until (i==6); $ swift regexpmap.swift Swift 0.95 RC5 swift-r7605 cog-r3874 RunID: run067 Progress: Wed, 10 Dec 2014 21:50:42+0000 data[0] = __root__/data/dir/000001/f01.dat image[0] = __root__/data/dir/000001/f01.img data[1] = __root__/data/dir/000001/f02.dat image[1] = __root__/data/dir/000001/f02.img data[2] = __root__/data/dir/000002/f03.dat image[2] = __root__/data/dir/000002/f03.img data[3] = __root__/data/dir/000002/f04.dat image[3] = __root__/data/dir/000002/f04.img data[4] = __root__/data/dir/000003/f05.dat image[4] = __root__/data/dir/000003/f05.img
data[5] = __root__/data/dir/000003/f06.dat image[5] = __root__/data/dir/000003/f06.img Final status:Wed, 10 Dec 2014 21:50:42+0000 -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From wilde at anl.gov Wed Dec 10 16:16:41 2014 From: wilde at anl.gov (Michael Wilde) Date: Wed, 10 Dec 2014 16:16:41 -0600 Subject: [Swift-user] __root__ in filename In-Reply-To: <5488C080.4010604@anl.gov> References: <1418246513.21329.2.camel@echo> <5488C080.4010604@anl.gov> Message-ID: <5488C649.9060807@anl.gov> Here's a few more
examples of simpler approaches. These *should* work correctly for both
relative and absolute filename mappings, even with the __root__ convention.

- Mike

$ cat regexpmap1.swift

type file;

file data[];

file image[] source=data,
  match="^(.*)dat$",
  transform="\\1img">;

foreach j, i in data {
  tracef(" data[%i] = %s\nimage[%i] = %s\n",
         i, filename(data[i]),
         i, filename(image[i]));
}

$ swift ./regexpmap1.swift
Swift 0.95 RC5 swift-r7605 cog-r3874
RunID: run077
Progress: Wed, 10 Dec 2014 22:10:29+0000
 data[0] = f1.dat
image[0] = f1.img
 data[3] = f4.dat
image[3] = f4.img
 data[1] = f2.dat
image[1] = f2.img
 data[2] = f3.dat
image[2] = f3.img
Final status:Wed, 10 Dec 2014 22:10:29+0000

The simplest approach is to just map one array to each file suffix. The
filesys mapper will return the files in the same lexicographic order for
each pattern:

$ cat ./regexpmap3.swift

type file;

file data[] ;
file image[] ;

foreach j, i in data {
  tracef(" data[%i] = %s\nimage[%i] = %s\n",
         i, filename(data[i]),
         i, filename(image[i]));
}

$ swift ./regexpmap3.swift
Swift 0.95 RC5 swift-r7605 cog-r3874
RunID: run078
Progress: Wed, 10 Dec 2014 22:10:56+0000
 data[2] = f3.dat
image[2] = f3.img
 data[3] = f4.dat
image[3] = f4.img
 data[0] = f1.dat
image[0] = f1.img
 data[1] = f2.dat
image[1] = f2.img
Final status:Wed, 10 Dec 2014 22:10:56+0000
swift$

-- 
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago

From ketan at mcs.anl.gov Wed Dec 10 18:52:01 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Wed, 10 Dec 2014 18:52:01 -0600
Subject: [Swift-user] env.HOME in sites file not working
Message-ID:

Hi,

Running 0.95 swift-r8301 cog-r4045, the Swift run crashes when trying to
set workdirectory to {env.HOME}/swiftwork but works if set with constant
path.

The error message is:

Execution failed:
Exception in dsp:
Arguments: [-f,
__root__/lcrc/project/NEXTGENOPT/DSP_old/examples/smps/dcap/dcap243_200,
-a, 1, -n, 1, -p, 1, -I, 10]
Host: localblues
Directory: dsp.blues-run001/jobs/j/dsp-j4e17i1m
exception @ swift-int.k, line: 530
Caused by: null
Caused by: org.globus.cog.abstraction.impl.common.execution.JobException:
Job failed with an exit code of 127

Attached are two identical runs: run001.tgz which failed and run002.tgz
which worked.

How can I make it work so that the Swift workdir is not hardcoded?
Thanks,
Ketan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run002.tgz
Type: application/x-gzip
Size: 8656 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run001.tgz
Type: application/x-gzip
Size: 4036 bytes
Desc: not available

From ketan at mcs.anl.gov Wed Dec 10 19:04:03 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Wed, 10 Dec 2014 19:04:03 -0600
Subject: [Swift-user] command line arg conflict
Message-ID:

Hi,

One of the commandline arg to the Swift application I am working with is
"p" which seems to be conflicting with "-p" arg of Swift which seems to be
for a properties file:

swift -sites.file sites.local.xml -config cf -tc.file apps dsp.blues.swift -a=1 -loc=./dcap -p=2 -I=10 -n=2

Is there a way to override the Swift's default arg?

I am using version 0.95.

Thanks,
Ketan

From ketan at mcs.anl.gov Wed Dec 10 19:37:15 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Wed, 10 Dec 2014 19:37:15 -0600
Subject: [Swift-user] __root__ in filename
In-Reply-To: <5488C649.9060807@anl.gov>
References: <1418246513.21329.2.camel@echo> <5488C080.4010604@anl.gov> <5488C649.9060807@anl.gov>
Message-ID:

Thanks! The separate files mapping approach seems to be working.

From hategan at mcs.anl.gov Wed Dec 10 20:41:49 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 10 Dec 2014 18:41:49 -0800
Subject: [Swift-user] command line arg conflict
In-Reply-To:
References:
Message-ID: <1418265709.23614.0.camel@echo>

This looks like a bug. Can you put this in bugzilla please?
Mihael

From hategan at mcs.anl.gov Wed Dec 10 20:49:20 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 10 Dec 2014 18:49:20 -0800
Subject: [Swift-user] env.HOME in sites file not working
In-Reply-To:
References:
Message-ID: <1418266160.23767.1.camel@echo>

Variable substitution was done only in attributes but not in xml text
nodes. This should now be fixed in 0.95 swift r8324.

Mihael

From ketan at mcs.anl.gov Thu Dec 11 11:59:05 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Thu, 11 Dec 2014 11:59:05 -0600
Subject: [Swift-user] env.HOME in sites file not working
In-Reply-To: <1418266160.23767.1.camel@echo>
References: <1418266160.23767.1.camel@echo>
Message-ID:

Mihael,

I updated Swift to:
Swift 0.95 swift-r8326 cog-r4045

It still does not seem to work. The rundir is attached.

Thanks,
Ketan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run001.tgz
Type: application/x-gzip
Size: 4043 bytes
Desc: not available

From hategan at mcs.anl.gov Thu Dec 11 12:08:01 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Dec 2014 10:08:01 -0800
Subject: [Swift-user] env.HOME in sites file not working
In-Reply-To:
References: <1418266160.23767.1.camel@echo>
Message-ID: <1418321281.650.1.camel@echo>

Right. My bad. I don't think {env.XYZ} is something that is supported in
sites files. Substitution works only for java system properties (e.g.
{user.home}). Where did you see {env.XYZ}?

Mihael

From hategan at mcs.anl.gov Thu Dec 11 12:17:28 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Dec 2014 10:17:28 -0800
Subject: [Swift-user] env.HOME in sites file not working
In-Reply-To: <1418321281.650.1.camel@echo>
References: <1418266160.23767.1.camel@echo> <1418321281.650.1.camel@echo>
Message-ID: <1418321848.915.0.camel@echo>

That said, if you replace {env.HOME} with {user.home}, it should work.

Mihael

From ketan at mcs.anl.gov Thu Dec 11 12:31:28 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Thu, 11 Dec 2014 12:31:28 -0600
Subject: [Swift-user] env.HOME in sites file not working
In-Reply-To: <1418321848.915.0.camel@echo>
References: <1418266160.23767.1.camel@echo> <1418321281.650.1.camel@echo> <1418321848.915.0.camel@echo>
Message-ID:

Thanks Mihael! I will use {user.home}, I remember to have seen {env.HOME}
being used in sites file in the past. A quick look in the SwiftApps svn
repo yields this:

$ find . -iname "*.xml" -exec grep 'env.HOME' {} \; -print
{env.HOME}/swiftwork
./Scattering/paintgrid/beagle.xml
{env.HOME}/swiftwork
{env.HOME}/swiftwork
./Scattering/paintgrid/orthros.xml
{env.HOME}/swiftwork
{env.HOME}/swiftwork
./Scattering/paintgrid/sites.xml

Thanks,
Ketan

From hategan at mcs.anl.gov Thu Dec 11 13:16:25 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Dec 2014 11:16:25 -0800
Subject: [Swift-user] env.HOME in sites file not working
In-Reply-To:
References: <1418266160.23767.1.camel@echo> <1418321281.650.1.camel@echo> <1418321848.915.0.camel@echo>
Message-ID: <1418325385.1554.1.camel@echo>

Yes, I see it. 0.94 used to set all environment variables as java
properties prefixed with "env.". This behavior can probably be
replicated in 0.95.

Mihael

From hategan at mcs.anl.gov Thu Dec 11 13:26:44 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Dec 2014 11:26:44 -0800
Subject: [Swift-user] env.HOME in sites file not working
In-Reply-To: <1418325385.1554.1.camel@echo>
References: <1418266160.23767.1.camel@echo>
<1418321281.650.1.camel@echo> <1418321848.915.0.camel@echo> <1418325385.1554.1.camel@echo> Message-ID: <1418326004.1554.4.camel@echo> Fixed in swift r8329 (0.95 branch). Mihael On Thu, 2014-12-11 at 11:16 -0800, Mihael Hategan wrote: > Yes, I see it. 0.94 used to set all environment variables as java > properties prefixed with "env.". This behavior can probably be > replicated in 0.95. > > Mihael > > On Thu, 2014-12-11 at 12:31 -0600, Ketan Maheshwari wrote: > > Thanks Mihael! I will use {user.home}, I remember to have seen {env.HOME} > > being used in sites file in the past. A quick look in the SwiftApps svn > > repo yields this: > > > > $ find . -iname "*.xml" -exec grep 'env.HOME' {} \; -print > > {env.HOME}/swiftwork > > ./Scattering/paintgrid/beagle.xml > > {env.HOME}/swiftwork > > {env.HOME}/swiftwork > > ./Scattering/paintgrid/orthros.xml > > {env.HOME}/swiftwork > > {env.HOME}/swiftwork > > ./Scattering/paintgrid/sites.xml > > > > Thanks, > > Ketan > > > > On Thu, Dec 11, 2014 at 12:17 PM, Mihael Hategan > > wrote: > > > > > > That said, if you replace {env.HOME} with {user.home}, it should work. > > > > > > Mihael > > > > > > On Thu, 2014-12-11 at 10:08 -0800, Mihael Hategan wrote: > > > > Right. My bad. I don't think {env.XYZ} is something that is supported in > > > > sites files. Substitution works only for java system properties (e.g. > > > > {user.home}). Where did you see {env.XYZ}? > > > > > > > > Mihael > > > > > > > > On Thu, 2014-12-11 at 11:59 -0600, Ketan Maheshwari wrote: > > > > > Mihael, > > > > > > > > > > I updated Swift to: > > > > > Swift 0.95 swift-r8326 cog-r4045 > > > > > > > > > > It still does not seem to work. The rundir is attached. > > > > > > > > > > Thanks, > > > > > Ketan > > > > > > > > > > On Wed, Dec 10, 2014 at 8:49 PM, Mihael Hategan > > > wrote: > > > > > > > > > > > > Variable substitution was done only in attributes but not in xml text > > > > > > nodes. This should now be fixed in 0.95 swift r8324. 
> > > > > > > > > > > > Mihael > > > > > > > > > > > > On Wed, 2014-12-10 at 18:52 -0600, Ketan Maheshwari wrote: > > > > > > > Hi, > > > > > > > > > > > > > > Running 0.95 swift-r8301 cog-r4045, the Swift run crashes when > > > trying to > > > > > > > set workdirectory to {env.HOME}/swiftwork but works if set with > > > constant > > > > > > > path. > > > > > > > > > > > > > > The error message is: > > > > > > > > > > > > > > Execution failed: > > > > > > > Exception in dsp: > > > > > > > Arguments: [-f, > > > > > > > > > > __root__/lcrc/project/NEXTGENOPT/DSP_old/examples/smps/dcap/dcap243_200, > > > > > > > -a, 1, -n, 1, -p, 1, -I, 10] > > > > > > > Host: localblues > > > > > > > Directory: dsp.blues-run001/jobs/j/dsp-j4e17i1m > > > > > > > exception @ swift-int.k, line: 530 > > > > > > > Caused by: null > > > > > > > Caused by: > > > org.globus.cog.abstraction.impl.common.execution.JobException: > > > > > > > Job failed with an exit code of 127 > > > > > > > > > > > > > > Attached are two identical runs: run001.tgz which failed and > > > run002.tgz > > > > > > > which worked. > > > > > > > > > > > > > > How can I make it work so that the Swift workdir us not hardcoded? 
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Ketan
> > > > > > > _______________________________________________
> > > > > > > Swift-user mailing list
> > > > > > > Swift-user at ci.uchicago.edu
> > > > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

From iraicu at cs.iit.edu Fri Dec 12 12:51:30 2014
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Fri, 12 Dec 2014 12:51:30 -0600
Subject: [Swift-user] CFP: The 24th Int. ACM Symp. on High-Performance Parallel and Distributed Computing (HPDC) 2015 -- Abstracts due 01/12/15
Message-ID: <548B3932.5000503@cs.iit.edu>

**** CALL FOR PAPERS ****

The 24th International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC-2015)
Portland, Oregon, USA - June 15-19, 2015
http://www.hpdc.org/2015

The ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) is the premier annual conference for presenting the latest research on the design, implementation, evaluation, and the use of parallel and distributed systems for high-end computing.
The 24th HPDC will take place in the city of roses, Portland, Oregon on June 15-19, 2015. (Workshops on June 15-16, and the main conference on June 17-19.)

**** IMPORTANT DATES ****

Abstracts (required) due: January 12, 2015
Full Papers due: January 19, 2015 (no extensions)
Author rebuttal period: March 4-7, 2015
Author notifications: March 16, 2015
Final Manuscripts: April 1, 2015

**** SCOPE AND TOPICS ****

Submissions are welcomed on high-performance parallel and distributed computing topics including but not limited to: clouds, clusters, grids, big data, massively multicore, and global-scale computing systems. Submissions that focus on the architectures, systems, and networks of cloud infrastructures are particularly encouraged, as are experience reports of operational deployments that can provide insights for future research on HPDC applications and systems.

All papers will be evaluated for their originality, technical depth and correctness, potential impact, relevance to the conference, and quality of presentation. Research papers must clearly demonstrate research contributions and novelty, while experience reports must clearly describe lessons learned and demonstrate impact.

In the context of high-performance parallel and distributed computing, the topics of interest include, but are not limited to:

- Systems, networks, and architectures
- Massively multicore systems
- Resource virtualization
- Programming languages and environments
- File and storage systems, I/O, and data management
- Resource management and scheduling, including energy-aware techniques
- Performance modeling and analysis
- Fault tolerance, reliability, and availability
- Data-intensive computing
- Applications and services that depend upon high-end computing

**** PAPER SUBMISSION GUIDELINES ****

Authors are invited to submit technical papers of at most 12 pages in PDF format, including figures and references.
Papers should be formatted in the ACM Proceedings Style and submitted via the conference web site. No changes to the margins, spacing, or font sizes as specified by the style file are allowed. Accepted papers will appear in the conference proceedings, and will be incorporated into the ACM Digital Library. A limited number of papers will be accepted as posters.

Papers must be self-contained and provide the technical substance required for the program committee to evaluate their contributions. Submitted papers must be original work that has not appeared in and is not under consideration for another conference or a journal. See the ACM Prior Publication Policy for more details. Papers can be submitted at https://ssl.linklings.net/conferences/hpdc/.

**** HPDC'15 GENERAL CHAIR ****
Thilo Kielmann, VU University Amsterdam, The Netherlands

**** HPDC'15 PROGRAM CO-CHAIRS ****
Dean Hildebrand, IBM Research Almaden, USA
Michela Taufer, University of Delaware, USA

**** HPDC'15 WORKSHOP CHAIRS ****
Abhishek Chandra, University of Minnesota, Twin Cities, USA
Ioan Raicu, Illinois Institute of Technology and Argonne National Laboratory, USA

**** HPDC'15 POSTERS CHAIR ****
Ana-Maria Oprescu, VU University Amsterdam, The Netherlands

**** HPDC'15 PUBLICITY CHAIRS ****
Ioan Raicu, Illinois Institute of Technology and Argonne National Laboratory, USA
Torsten Hoefler, ETH Zurich, Switzerland
Naoya Maruyama, RIKEN Advanced Institute for Computational Science, Japan

**** HPDC'15 PUBLICATIONS CHAIR ****
Antonino Tumeo, Pacific Northwest National Laboratory, USA

**** HPDC'15 TRAVEL AWARD CHAIR ****
Ming Zhao, Florida International University, USA

**** HPDC'15 SPONSORSHIP CHAIR ****
Martin Swany, Indiana University, USA

**** HPDC'15 WEBMASTER ****
Kaveh Razavi, VU University Amsterdam, The Netherlands

**** HPDC'15 PROGRAM COMMITTEE ****
David Abramson, The University of Queensland, Australia
Dong Ahn, Lawrence Livermore National Laboratory, USA
Gabriel Antoniu, INRIA, France
Henri Bal, VU University Amsterdam, The Netherlands
Pavan Balaji, Argonne National Laboratory, USA
Michela Becchi, University of Missouri, USA
John Bent, EMC, USA
Greg Bronevetsky, Lawrence Livermore National Laboratory, USA
Ali Butt, Virginia Tech, USA
Franck Cappello, Argonne National Lab, USA
Abhishek Chandra, University of Minnesota, USA
Andrew A. Chien, University of Chicago, USA
Paolo Costa, Microsoft Research Cambridge, UK
Kei Davis, Los Alamos National Laboratory, USA
Peter Dinda, Northwestern University, USA
Dick Epema, Delft and Eindhoven University of Technology, The Netherlands
Gilles Fedak, INRIA, France
Wuchun Feng, Virginia Tech, USA
Renato Figueiredo, University of Florida, USA
Clemens Grelck, University of Amsterdam, The Netherlands
Adriana Iamnitchi, University of South Florida, USA
Larry Kaplan, Cray Inc., USA
Kate Keahey, Argonne National Laboratory, USA
Dries Kimpe, Argonne National Laboratory, USA
Alice Koniges, Lawrence Berkeley National Laboratory, USA
Zhiling Lan, Illinois Institute of Technology, USA
John (Jack) Lange, University of Pittsburgh, USA
Gary Liu, Oak Ridge National Laboratory, USA
Jay Lofstead, Sandia National Laboratories, USA
Arthur Barney Maccabe, Oak Ridge National Laboratory, USA
Carlos Maltzahn, University of California, Santa Cruz, USA
Naoya Maruyama, RIKEN Advanced Institute for Comp. Science, Japan
Satoshi Matsuoka, Tokyo Inst. Technology, Japan
Timothy Mattson, Intel, USA
Kathryn Mohror, Lawrence Livermore National Laboratory, USA
Bogdan Nicolae, IBM Research, Ireland
Sangmi Pallickara, Colorado State University, USA
Manish Parashar, Rutgers University, USA
Lavanya Ramakrishnan, Lawrence Berkeley National Laboratory, USA
Raju Rangaswami, Florida International University, USA
Matei Ripeanu, University of British Columbia, Canada
Nagiza F. Samatova, North Carolina State University, USA
Prasenjit Sarkar, Independent Consultant, USA
Karsten Schwan, Georgia Institute of Technology, USA
Vasily Tarasov, IBM Research, USA
Kenjiro Taura, University of Tokyo, Japan
Douglas Thain, University of Notre Dame, USA
Ana Varbanescu, University of Amsterdam, The Netherlands
Richard Vuduc, Georgia Institute of Technology, USA
Jon Weissman, University of Minnesota, USA
Dongyan Xu, Purdue University, USA
Rui Zhang, IBM Research, USA

**** HPDC STEERING COMMITTEE ****
Franck Cappello, Argonne National Lab, USA and INRIA, France
Andrew A. Chien, University of Chicago, USA
Peter Dinda, Northwestern University, USA
Dick Epema, Delft and Eindhoven University of Technology, The Netherlands
Renato Figueiredo, University of Florida, USA
Salim Hariri, University of Arizona, USA
Thilo Kielmann, VU University Amsterdam, The Netherlands
Arthur "Barney" Maccabe, Oak Ridge National Laboratory, USA
Manish Parashar, Rutgers University, USA
Matei Ripeanu, University of British Columbia, Canada
Karsten Schwan, Georgia Tech, USA
Doug Thain, University of Notre Dame, USA
Jon Weissman, University of Minnesota, USA (Chair)
Dongyan Xu, Purdue University, USA

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Editor: IEEE TCC, Springer Cluster, Springer JoCCASA
Chair: IEEE/ACM MTAGS, ACM ScienceCloud
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
LinkedIn: http://www.linkedin.com/in/ioanraicu
Google: http://scholar.google.com/citations?user=jE73HYAAAAAJ
=================================================================
=================================================================

From ketan at mcs.anl.gov Fri Dec 12 16:55:31 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Fri, 12 Dec 2014 16:55:31 -0600
Subject: [Swift-user] __root__ in filename
In-Reply-To: <5488C649.9060807@anl.gov>
References: <1418246513.21329.2.camel@echo> <5488C080.4010604@anl.gov> <5488C649.9060807@anl.gov>
Message-ID:

>
> The simplest approach is to just map one array to each file suffix. The
> filesys mapper will return the files in the same lexicographic order for
> each pattern:
>
> $ cat ./regexpmap3.swift
>
> type file;
>
> file data[] ;
> file image[] ;
>
> foreach j, i in data {
>   tracef(" data[%i] = %s\nimage[%i] = %s\n",
>          i, filename(data[i]),
>          i, filename(image[i]));
> }
>
> $ swift ./regexpmap3.swift
> Swift 0.95 RC5 swift-r7605 cog-r3874
> RunID: run078
> Progress: Wed, 10 Dec 2014 22:10:56+0000
>  data[2] = f3.dat
> image[2] = f3.img
>  data[3] = f4.dat
> image[3] = f4.img
>  data[0] = f1.dat
> image[0] = f1.img
>  data[1] = f2.dat
> image[1] = f2.img
> Final status:Wed, 10 Dec 2014 22:10:56+0000
>

After some more experiments it was determined that this does not hold true.
That is, independent filesys_mappers on a given directory will not pick the
files in the same order resulting in irregular combinations in the
resulting tuple.

From hategan at mcs.anl.gov Fri Dec 12 18:02:50 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 12 Dec 2014 16:02:50 -0800
Subject: [Swift-user] __root__ in filename
In-Reply-To:
References: <1418246513.21329.2.camel@echo> <5488C080.4010604@anl.gov> <5488C649.9060807@anl.gov>
Message-ID: <1418428970.675.1.camel@echo>

On Fri, 2014-12-12 at 16:55 -0600, Ketan Maheshwari wrote:
> >
> After some more experiments it was determined that this does not hold true.
> That is, independent filesys_mappers on a given directory will not pick the
> files in the same order resulting in irregular combinations in the
> resulting tuple.

So it really needs filename to be context aware. I'll let you know when I
have a fix.

Mihael

From hategan at mcs.anl.gov Sat Dec 13 15:54:43 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 13 Dec 2014 13:54:43 -0800
Subject: [Swift-user] __root__ in filename
In-Reply-To:
References: <1418246513.21329.2.camel@echo> <5488C080.4010604@anl.gov> <5488C649.9060807@anl.gov>
Message-ID: <1418507683.11981.3.camel@echo>

Oh, so I think Mike was saying an entirely different thing. You don't
use two separate filesys mappers. That won't work.

You use one filesys mapper for the initial data and then a structured
regexp mapper, which preserves the order. That has to work, and it is a
feature supported since swift version zero.

In other news, filename(), if not invoked from an app body, should now
avoid doing a remote pathname expansion, so your initial scheme should
work. However, I would still recommend Mike's solution.

Mihael

From ketan at mcs.anl.gov Sat Dec 13 17:59:50 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Sat, 13 Dec 2014 17:59:50 -0600
Subject: [Swift-user] __root__ in filename
In-Reply-To:
References: <1418246513.21329.2.camel@echo> <5488C080.4010604@anl.gov> <5488C649.9060807@anl.gov>
Message-ID:

On Sat, Dec 13, 2014 at 3:54 PM, Hategan-Marandiuc, Philip M. <hategan at mcs.anl.gov> wrote:
>
> Oh, so I think Mike was saying an entirely different thing. You don't
> use two separate filesys mappers. That won't work.
>

Right. We assumed this should work. And Mike tested this with a small
dataset. Later on we found that it was working by chance and not by design.
The previous message was to update the discussion with this finding.

>
> You use one filesys mapper for the initial data and then a structured
> regexp mapper, which preserves the order. That has to work, and it is a
> feature supported since swift version zero.
>
> In other news, filename(), if not invoked from an app body, should now
> avoid doing a remote pathname expansion, so your initial scheme should
> work. However, I would still recommend Mike's solution.
>

Thanks! I am using the structured regexp mapper. Seems to be sufficient for
the needs of this app.
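Mihael's suggestion (one filesys mapper for the primary files, plus a structured regexp mapper whose entries are derived from them) keeps the two arrays aligned because each companion name is computed from the entry at the same index, rather than coming from a second, independently ordered directory listing. The Java sketch below only illustrates that idea; it is not the mapper's implementation, and the names CompanionNames, derive, match and transform are chosen for the example. The .dat/.img suffixes are taken from the regexpmap3.swift example quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: derive companion file names from an already
// ordered list, so index i of both lists refers to the same base name.
final class CompanionNames {
    // Each source name must fully match `match`; it is then rewritten with
    // `transform`, written in Java replacement syntax (e.g. "$1.img").
    static List<String> derive(List<String> sources, String match, String transform) {
        Pattern pattern = Pattern.compile(match);
        List<String> out = new ArrayList<>(sources.size());
        for (String source : sources) {
            Matcher m = pattern.matcher(source);
            if (!m.matches()) {
                throw new IllegalArgumentException(source + " does not match " + match);
            }
            StringBuffer sb = new StringBuffer();
            m.appendReplacement(sb, transform); // the whole name is the match
            m.appendTail(sb);
            out.add(sb.toString());
        }
        return out;
    }
}
```

Called as derive(dataNames, "(.*)\\.dat", "$1.img") on the .dat names in whatever order the first listing returned them, it yields the matching .img names in that same order, so data[i] and image[i] always share a base name.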
From hategan at mcs.anl.gov Sat Dec 13 18:16:26 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 13 Dec 2014 16:16:26 -0800
Subject: [Swift-user] __root__ in filename
In-Reply-To:
References: <1418246513.21329.2.camel@echo> <5488C080.4010604@anl.gov> <5488C649.9060807@anl.gov>
Message-ID: <1418516186.15544.1.camel@echo>

On Sat, 2014-12-13 at 17:59 -0600, Ketan Maheshwari wrote:
> On Sat, Dec 13, 2014 at 3:54 PM, Hategan-Marandiuc, Philip M. <
> hategan at mcs.anl.gov> wrote:
> >
> > Oh, so I think Mike was saying an entirely different thing. You don't
> > use two separate filesys mappers. That won't work.
> >
>
> Right. We assumed this should work. And Mike tested this with a small
> dataset. Later on we found that it was working by chance and not by design.
> The previous message was to update the discussion with this finding.
>

Ah, sorry. I was missing some context.

Mihael

From ketan at mcs.anl.gov Wed Dec 17 14:26:40 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Wed, 17 Dec 2014 14:26:40 -0600
Subject: [Swift-user] output file array
Message-ID:

Hi,

I am dealing with a workflow pattern where an app expects multiple output
files with a pattern.

The app signature is:

app (file[] _wrfout, file _out, file _err) wrf_app (file _wrf_in, file[] _tbl, file[] _ozone, ...)
{
    wrf stdout=@_out stderr=@_err;
}

The _wrfout files are the app result files, which follow a pattern: wrfout_*

So, I am invoking the application in a foreach loop as:

foreach i in [0:2]{
    file[] wrfout;
    file wrfstdout;
    file wrfstderr;

    (wrfout, wrfstdout, wrfstderr) = wrf_app (wrfin, tbl, ozone, tr, data,
        gribmap, namelist, co2_trans, input_sounding);
}

The script hangs at runtime with the following messages:

No events in 1s.
Finding dependency loops...
Waiting threads:
Thread: R-6-0-4, waiting on wrfout (declared on line 50)
    swift:stageOut, wf.edison, line 134
    swift:execute, wf.edison, line 123
    wrf_app, wf.edison, line 242

Thread: R-6-2-4, waiting on wrfout (declared on line 50)
    swift:stageOut, wf.edison, line 134
    swift:execute, wf.edison, line 123
    wrf_app, wf.edison, line 242

Thread: R-6-1-4, waiting on wrfout (declared on line 50)
    swift:stageOut, wf.edison, line 134
    swift:execute, wf.edison, line 123
    wrf_app, wf.edison, line 242

Any suggestions?

Thanks,
Ketan

From ketan at mcs.anl.gov Thu Dec 18 09:00:36 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Thu, 18 Dec 2014 09:00:36 -0600
Subject: [Swift-user] cray (Edison)
Message-ID:

Hi,

I am trying to submit to a cray machine (Edison) with 24 cores per node. I
am looking to submit a 25 node 600 tasks job but with my sites
configuration, it results in a 25 node 25 tasks submission.

The sites file bits are:

24
pbs.aprun;pbs.mpp;depth=24
1000
00:30:00
1
25
25

The resulting job is:

                                                               Req'd  Req'd      Elap
Job ID           Username Queue Jobname         SessID NDS TSK Memory Time     S Time
---------------- -------- ----- --------------- ------ --- --- ------ -------- - ----
2204800.edique02 ketan    debug B1218-5106290-0 --      25  25 --     00:16:00 C --

The resulting submit script is:

#PBS -S /bin/bash
#PBS -N B1218-5106290-0
#PBS -m n
#PBS -l mppwidth=25,mppnppn=1,mppdepth=24
#PBS -l walltime=00:16:00
#PBS -q debug
#PBS -o /scratch2/scratchdirs/ketan/WRF_LES/WRFV3/ketanrun/run001/scripts/PBS8090417396448920648.submit.stdout
#PBS -e /scratch2/scratchdirs/ketan/WRF_LES/WRFV3/ketanrun/run001/scripts/PBS8090417396448920648.submit.stderr
export WORKER_LOGGING_LEVEL=NONE
#PBS -v WORKER_LOGGING_LEVEL
cd / && aprun -n 25 -N 1 -cc none -d 24 -F exclusive /bin/sh -c '/usr/bin/perl /global/homes/k/ketan/.globus/coasters/cscript6966010727500767046.pl http://10.10.20.170:58984,http://10.100.100.52:58984,http://10.141.1.2:58984,http://127.0.0.2:58984,http://128.55.34.2:58984,http://128.55.72.100:58984,http://128.55.72.22:58984 1218-5106290-000000 NOLOGGING'
/bin/echo $? >/scratch2/scratchdirs/ketan/WRF_LES/WRFV3/ketanrun/run001/scripts/PBS8090417396448920648.submit.exitcode

I am looking to get mppwidth and -n switch of aprun to 600

Thanks for any suggestions.

Ketan

From hategan at mcs.anl.gov Sun Dec 21 14:40:31 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 21 Dec 2014 12:40:31 -0800
Subject: [Swift-user] output file array
In-Reply-To:
References:
Message-ID: <1419194431.20752.1.camel@echo>

Hi Ketan,

Sorry for the delay. Is this trunk or 0.95?

Mihael

From ketan at mcs.anl.gov Mon Dec 22 15:37:59 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Mon, 22 Dec 2014 15:37:59 -0600
Subject: [Swift-user] output file array
In-Reply-To:
References:
Message-ID:

Hi Mihael,

This is with Swift 0.95.

Thanks,
Ketan

From hategan at mcs.anl.gov Mon Dec 22 17:14:16 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 22 Dec 2014 15:14:16 -0800
Subject: [Swift-user] output file array
In-Reply-To:
References:
Message-ID: <1419290056.31348.0.camel@echo>

Hi,

I don't think 0.95 supports dynamic arrays output from apps. You will need
trunk/0.96 for that.

Mihael

From iraicu at cs.iit.edu Mon Dec 22 19:12:18 2014
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Mon, 22 Dec 2014 19:12:18 -0600
Subject: [Swift-user] CFP: IEEE International Conference on Cluster Computing 2015 -- Papers due 02/27/15
Message-ID: <5498C172.5060108@cs.iit.edu>

IEEE International Conference on Cluster Computing
September 8-11, 2015
Chicago, IL, USA
http://www.mcs.anl.gov/ieeecluster2015/
----------------------------------------------
...Follow us on Facebook at https://www.facebook.com/ieee.cluster
...Follow us on Twitter at https://twitter.com/IEEECluster
...Follow us on Linkedin at https://www.linkedin.com/groups/IEEE-International-Conference-on-Cluster-7428925
...Follow us on RenRen at http://page.renren.com/601871401
----------------------------------------------

CALL FOR PAPERS

Following the successes of the series of Cluster conferences, for 2015 we solicit high-quality original papers presenting work that advances the state-of-the-art in clusters and closely related fields. All papers will be rigorously peer-reviewed for their originality, technical depth and correctness, potential impact, relevance to the conference, and quality of presentation. Research papers must clearly demonstrate research contributions and novelty, while papers reporting experience must clearly describe lessons learned and impact, along with the utility of the approach compared to the ones in the past.

PAPER TRACKS

* Applications, Algorithms, and Libraries
* Architecture, Networks/Communication, and Management
* Programming and Systems Software
* Data, Storage, and Visualization

SUBMISSION GUIDELINES

Authors are invited to submit papers electronically in PDF format.
Submitted manuscripts should be structured as technical papers and may not exceed 10 letter-size (8.5 x 11) pages including figures, tables and references using the IEEE format for conference proceedings. Submissions not conforming to these guidelines may be returned without review. Authors should make sure that their file will print on a printer that uses letter-size (8.5 x 11) paper. The official language of the conference is English. All manuscripts will be reviewed and will be judged on correctness, originality, technical strength, significance, quality of presentation, and interest and relevance to the conference attendees.

Paper submissions are limited to 10 pages in 2-column IEEE format including all figures and references. Submitted manuscripts exceeding this limit will be returned without review. For the final camera-ready version, authors with accepted papers may purchase additional pages at the following rates: 200 USD for each of two additional pages. See formatting templates for details:

* LaTeX Package http://datasys.cs.iit.edu/events/CCGrid2014/IEEECS_confs_LaTeX.zip
* Word Template http://datasys.cs.iit.edu/events/CCGrid2014/instruct8.5x11x2.doc and http://datasys.cs.iit.edu/events/CCGrid2014/instruct8.5x11x2.pdf

Submitted papers must represent original unpublished research that is not currently under review for any other conference or journal. Papers not following these guidelines will be rejected without review and further action may be taken, including (but not limited to) notifications sent to the heads of the institutions of the authors and sponsors of the conference. Submissions received after the due date, exceeding the page limit, or not appropriately structured may not be considered. Authors may contact the conference chairs for more information. The proceedings will be published through the IEEE Computer Society Conference Publishing Services. Please submit your paper via the EasyChair submission system (Not open yet).
JOURNAL SPECIAL ISSUE The best papers of Cluster 2015 will be included in a Special Issue on advances in topics related to cluster computing of the Elsevier International Journal of Parallel Computing (PARCO), edited by Pavan Balaji, Satoshi Matsuoka, and Michela Taufer. This special issue is dedicated for the papers accepted in the Cluster 2015 conference. The submission to this special issue is by invitation only. IMPORTANT DATES January 1, 2015 ........... Submissions open for Papers February 27, 2015 ....... Papers Submission Deadline April 23, 2015 ............... Papers Acceptance Notification August 1, 2015 ............ Camera-ready Copy Deadline for Papers See other important dates here http://www.mcs.anl.gov/ieeecluster2015/author-information/important-dates/. CLUSTER 2015 PROGRAM CHAIR Satoshi Matsuoka, Tokyo Institute of Technology (matsu AT is.titech.ac.jp). -- ================================================================= Ioan Raicu, Ph.D. Assistant Professor, Illinois Institute of Technology (IIT) Guest Research Faculty, Argonne National Laboratory (ANL) ================================================================= Data-Intensive Distributed Systems Laboratory, CS/IIT Distributed Systems Laboratory, MCS/ANL ================================================================= Editor: IEEE TCC, Springer Cluster, Springer JoCCASA Chair: IEEE/ACM MTAGS, ACM ScienceCloud ================================================================= Cel: 1-847-722-0876 Office: 1-312-567-5704 Email: iraicu at cs.iit.edu Web: http://www.cs.iit.edu/~iraicu/ Web: http://datasys.cs.iit.edu/ LinkedIn: http://www.linkedin.com/in/ioanraicu Google: http://scholar.google.com/citations?user=jE73HYAAAAAJ ================================================================= ================================================================= From ketan at mcs.anl.gov Tue Dec 23 15:15:25 2014 From: ketan at mcs.anl.gov (Ketan Maheshwari) Date: Tue, 23 Dec 2014 15:15:25 -0600 Subject: 
[Swift-user] Edison: pbs job starts but Swift unresponsive Message-ID: Hi, Trying to run an application on Edison (Cray). I see with qsub that the job has started but on the Swift side, I still see status as submitted. This happens with local:pbs coasters with both 0.95 and latest trunk using a simple catsn example. Any suggestions? Thanks, Ketan -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Tue Dec 23 16:35:53 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 23 Dec 2014 14:35:53 -0800 Subject: [Swift-user] cray (Edison) In-Reply-To: References: Message-ID: <1419374153.9912.4.camel@echo> Hi Ketan, I would try nodeGranularity/maxNodes = 600 and mppppn = 24. The term "node" there is a slight misnomer. Mihael On Thu, 2014-12-18 at 09:00 -0600, Ketan Maheshwari wrote:
> Hi,
>
> I am trying to submit to a Cray machine (Edison) with 24 cores per node. I am looking to submit a 25-node, 600-task job, but with my sites configuration it results in a 25-node, 25-task submission.
>
> The sites file bits are:
>
>     <profile namespace="globus" key="jobsPerNode">24</profile>
>     <profile namespace="globus" key="providerAttributes">pbs.aprun;pbs.mpp;depth=24</profile>
>     <profile namespace="globus" key="maxTime">1000</profile>
>     <profile namespace="globus" key="wallTime">00:30:00</profile>
>     <profile namespace="globus" key="slots">1</profile>
>     <profile namespace="globus" key="nodeGranularity">25</profile>
>     <profile namespace="globus" key="maxNodes">25</profile>
>
> The resulting job is:
>
>                                                                                   Req'd  Req'd       Elap
> Job ID                  Username    Queue    Jobname          SessID   NDS    TSK Memory Time      S Time
> ----------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
> 2204800.edique02        ketan       debug    B1218-5106290-0      --    25     25     -- 00:16:00  C        --
>
> The resulting submit script is:
>
> #PBS -S /bin/bash
> #PBS -N B1218-5106290-0
> #PBS -m n
> #PBS -l mppwidth=25,mppnppn=1,mppdepth=24
> #PBS -l walltime=00:16:00
> #PBS -q debug
> #PBS -o /scratch2/scratchdirs/ketan/WRF_LES/WRFV3/ketanrun/run001/scripts/PBS8090417396448920648.submit.stdout
> #PBS -e /scratch2/scratchdirs/ketan/WRF_LES/WRFV3/ketanrun/run001/scripts/PBS8090417396448920648.submit.stderr
> export WORKER_LOGGING_LEVEL=NONE
> #PBS -v WORKER_LOGGING_LEVEL
> cd / && aprun -n 25 -N 1 -cc none -d 24 -F exclusive /bin/sh -c '/usr/bin/perl /global/homes/k/ketan/.globus/coasters/cscript6966010727500767046.pl http://10.10.20.170:58984,http://10.100.100.52:58984,http://10.141.1.2:58984,http://127.0.0.2:58984,http://128.55.34.2:58984,http://128.55.72.100:58984,http://128.55.72.22:58984 1218-5106290-000000 NOLOGGING'
> /bin/echo $? >/scratch2/scratchdirs/ketan/WRF_LES/WRFV3/ketanrun/run001/scripts/PBS8090417396448920648.submit.exitcode
>
> I am looking to get mppwidth and the -n switch of aprun to 600.
>
> Thanks for any suggestions.
>
> Ketan
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
From hategan at mcs.anl.gov Tue Dec 23 16:37:45 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 23 Dec 2014 14:37:45 -0800 Subject: [Swift-user] Edison: pbs job starts but Swift unresponsive In-Reply-To: References: Message-ID: <1419374265.9912.6.camel@echo> Hi Ketan, The PBS job having started means that the workers have started. However, there are some steps that need to happen between that and the workers being able to receive jobs. I'm guessing some of those steps might either be failing or they take some time. If you could send a log, that might help figure out what's happening. Mihael On Tue, 2014-12-23 at 15:15 -0600, Ketan Maheshwari wrote: > Hi, > > Trying to run an application on Edison (Cray). > > I see with qsub that the job has started but on the Swift side, I still see status as submitted. > > This happens with local:pbs coasters with both 0.95 and latest trunk using a simple catsn example. > > Any suggestions? > > Thanks, > Ketan From ketan at mcs.anl.gov Tue Dec 30 12:49:33 2014 From: ketan at mcs.anl.gov (Ketan Maheshwari) Date: Tue, 30 Dec 2014 12:49:33 -0600 Subject: [Swift-user] Edison: pbs job starts but Swift unresponsive In-Reply-To: <1419374265.9912.6.camel@echo> References: <1419374265.9912.6.camel@echo> Message-ID: Hi Mihael, It takes about 8-9 minutes after the workers start (i.e., after the queue shows running status) for the Swift progress text to show active status. In the active status, one wave of tasks finishes and the status goes back to the submitted state, but now no job shows up in the queue. Attached is a rundir of one such run. Thanks for any input, Ketan On Tue, Dec 23, 2014 at 4:37 PM, Mihael Hategan wrote: > Hi Ketan, > > The PBS job having started means that the workers have started. However, there are some steps that need to happen between that and the workers being able to receive jobs. I'm guessing some of those steps might either be failing or they take some time. > > If you could send a log, that might help figure out what's happening. > > Mihael > > On Tue, 2014-12-23 at 15:15 -0600, Ketan Maheshwari wrote: > > Hi, > > > > Trying to run an application on Edison (Cray). > > > > I see with qsub that the job has started but on the Swift side, I still see status as submitted. > > > > This happens with local:pbs coasters with both 0.95 and latest trunk using a simple catsn example. > > > > Any suggestions? > > > > Thanks, > > Ketan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: run001.tgz Type: application/x-gzip Size: 39789 bytes Desc: not available URL:
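A note on the "cray (Edison)" thread above: Mihael suggests raising nodeGranularity/maxNodes to 600 and setting the per-node process count to 24, remarking that "node" is a misnomer there. The generated submit script hints at why: with maxNodes and nodeGranularity both at 25, Swift appears to have written mppwidth=25 and `aprun -n 25`, with mppnppn=1 and `-N 1`, so each Swift "node" became one processing element (PE). The bash sketch below shows only the arithmetic, assuming standard aprun semantics (`-n` is the total PE count, `-N` is PEs per node): the physical node count is ceil(mppwidth / mppnppn). The helper name `aprun_nodes` is hypothetical and not part of Swift or the Cray tools, and the sketch ignores mppdepth.

```shell
#!/bin/bash
# Hypothetical helper (not part of Swift or the Cray tools).
# Given a PE count (mppwidth / aprun -n) and PEs per node
# (mppnppn / aprun -N), print how many physical nodes the request
# occupies: ceil(width / ppn), computed with integer arithmetic.
aprun_nodes() {
    local width=$1 ppn=$2
    echo $(( (width + ppn - 1) / ppn ))
}

# Request generated in the thread: mppwidth=25, mppnppn=1.
echo "mppwidth=25,mppnppn=1   -> $(aprun_nodes 25 1) nodes, 25 PEs"

# Request after the suggested change: mppwidth=600, mppnppn=24.
echo "mppwidth=600,mppnppn=24 -> $(aprun_nodes 600 24) nodes, 600 PEs"
```

Both lines report 25 nodes: the suggested settings keep the same physical allocation on 24-core nodes while raising the PE count that aprun launches from 25 to 600.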