From zhaozhang at uchicago.edu Mon Mar 9 12:07:39 2009
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Mon, 09 Mar 2009 12:07:39 -0500
Subject: [Swift-user] class not found error from swift
Message-ID: <49B54CDB.6060908@uchicago.edu>

Hi,

I got an error message for a sanity test of a latest swift:

zzhang at login6.surveyor:~/new_dock6> swift first.swift
Execution failed:
Fatal: Class not found: org.griphyn.vdl.karajan.lib.ThrottledParallelFor

I check out the latest code, and build it, set PATH to the right dir, then I try to run this example as I usually did, but it failed. Below is the error message. Any ideas on this?

zhao

zzhang at login6.surveyor:~/new_dock6> cat first-20090309-1204-8ybqqe9f.log
2009-03-09 12:04:14,695-0500 DEBUG Loader kmlversion is >d1035a25-6ebd-404a-9b15-b03c27dc3bee<
2009-03-09 12:04:14,695-0500 DEBUG Loader build version is >f1b24b29-cc83-4ebd-affd-dc63ea733a1d<
2009-03-09 12:04:14,695-0500 INFO Loader first.swift: source file was compiled with a different version of Swift. Recompiling.
2009-03-09 12:04:16,665-0500 INFO Karajan Validation of XML intermediate file was successful
2009-03-09 12:04:16,675-0500 INFO VariableScope New scope 663980386 with no parent.
2009-03-09 12:04:16,706-0500 INFO VariableScope New scope 1395853665 with no parent.
2009-03-09 12:04:16,706-0500 INFO VariableScope New scope 1358006625 with no parent.
2009-03-09 12:04:16,710-0500 INFO VariableScope Adding variable t of type messagefile to scope 1358006625
2009-03-09 12:04:16,845-0500 INFO VariableScope Adding variable outfile of type messagefile to scope 663980386
2009-03-09 12:04:16,852-0500 INFO VariableScope thats the declaration for outfile
2009-03-09 12:04:19,089-0500 DEBUG VDL2ExecutionContext Fatal: Class not found: org.griphyn.vdl.karajan.lib.ThrottledParallelFor
Fatal: Class not found: org.griphyn.vdl.karajan.lib.ThrottledParallelFor
	kernel:elementdef @ vdl-lib.xml, line: 94
	kernel:export @ vdl-lib.xml, line: 94
	kernel:namespace @ vdl-lib.xml, line: 30
	kernel:import @ vdl-lib.xml, line: 4
	kernel:import @ scheduler.xml, line: 3
	kernel:project @ first.kml, line: 2
first-20090309-1204-8ybqqe9f
Caused by: org.globus.cog.karajan.workflow.KarajanRuntimeException: Fatal: Class not found: org.griphyn.vdl.karajan.lib.ThrottledParallelFor
	at org.globus.cog.karajan.workflow.JavaElement.<init>(JavaElement.java:25)
	at org.globus.cog.karajan.workflow.nodes.ElementDefNode.post(ElementDefNode.java:57)
	at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java(Compiled Code))
	at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27)
	at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:239)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:285)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:397)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java(Compiled Code))
	at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
	at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java(Compiled Code))
	at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java(Compiled Code))
	at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)
Caused by: java.lang.ClassNotFoundException: org.griphyn.vdl.karajan.lib.ThrottledParallelFor
	at java.net.URLClassLoader.findClass(URLClassLoader.java:378)
	at java.lang.ClassLoader.loadClass(ClassLoader.java(Compiled Code))
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:442)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:502)
	at java.lang.Class.forName1(Native Method)
	at java.lang.Class.forName(Class.java(Compiled Code))
	at org.globus.cog.karajan.workflow.JavaElement.<init>(JavaElement.java:22)
	... 12 more

Caused by: java.lang.ClassNotFoundException: org.griphyn.vdl.karajan.lib.ThrottledParallelFor
	at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:36)
	at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:42)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:151)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:255)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:285)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:397)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java(Compiled Code))
	at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
	at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java(Compiled Code))
	at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java(Compiled Code))
	at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)
Caused by: org.globus.cog.karajan.workflow.KarajanRuntimeException: Fatal: Class not found: org.griphyn.vdl.karajan.lib.ThrottledParallelFor
	at org.globus.cog.karajan.workflow.JavaElement.<init>(JavaElement.java:25)
	at org.globus.cog.karajan.workflow.nodes.ElementDefNode.post(ElementDefNode.java:57)
	at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java(Compiled Code))
	at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27)
	at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:239)
	... 7 more
Caused by: java.lang.ClassNotFoundException: org.griphyn.vdl.karajan.lib.ThrottledParallelFor
	at java.net.URLClassLoader.findClass(URLClassLoader.java:378)
	at java.lang.ClassLoader.loadClass(ClassLoader.java(Compiled Code))
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:442)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:502)
	at java.lang.Class.forName1(Native Method)
	at java.lang.Class.forName(Class.java(Compiled Code))
	at org.globus.cog.karajan.workflow.JavaElement.<init>(JavaElement.java:22)
	... 12 more
2009-03-09 12:04:19,118-0500 INFO ExecutionContext Detailed exception:
Fatal: Class not found: org.griphyn.vdl.karajan.lib.ThrottledParallelFor
	kernel:elementdef @ vdl-lib.xml, line: 94
	kernel:export @ vdl-lib.xml, line: 94
	kernel:namespace @ vdl-lib.xml, line: 30
	kernel:import @ vdl-lib.xml, line: 4
	kernel:import @ scheduler.xml, line: 3
	kernel:project @ first.kml, line: 2
first-20090309-1204-8ybqqe9f
Caused by: org.globus.cog.karajan.workflow.KarajanRuntimeException: Fatal: Class not found: org.griphyn.vdl.karajan.lib.ThrottledParallelFor
	at org.globus.cog.karajan.workflow.JavaElement.<init>(JavaElement.java:25)
	at org.globus.cog.karajan.workflow.nodes.ElementDefNode.post(ElementDefNode.java:57)
	at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java(Compiled Code))
	at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27)
	at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:239)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:285)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:397)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java(Compiled Code))
	at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
	at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java(Compiled Code))
	at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java(Compiled Code))
	at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)
Caused by: java.lang.ClassNotFoundException: org.griphyn.vdl.karajan.lib.ThrottledParallelFor
	at java.net.URLClassLoader.findClass(URLClassLoader.java:378)
	at java.lang.ClassLoader.loadClass(ClassLoader.java(Compiled Code))
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:442)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:502)
	at java.lang.Class.forName1(Native Method)
	at java.lang.Class.forName(Class.java(Compiled Code))
	at org.globus.cog.karajan.workflow.JavaElement.<init>(JavaElement.java:22)
	... 12 more

Caused by: java.lang.ClassNotFoundException: org.griphyn.vdl.karajan.lib.ThrottledParallelFor
	at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:36)
	at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:42)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:151)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:255)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:285)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:397)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java(Compiled Code))
	at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
	at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java(Compiled Code))
	at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java(Compiled Code))
	at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)
Caused by: org.globus.cog.karajan.workflow.KarajanRuntimeException: Fatal: Class not found: org.griphyn.vdl.karajan.lib.ThrottledParallelFor
	at org.globus.cog.karajan.workflow.JavaElement.<init>(JavaElement.java:25)
	at org.globus.cog.karajan.workflow.nodes.ElementDefNode.post(ElementDefNode.java:57)
	at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java(Compiled Code))
	at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27)
	at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:239)
	... 7 more
Caused by: java.lang.ClassNotFoundException: org.griphyn.vdl.karajan.lib.ThrottledParallelFor
	at java.net.URLClassLoader.findClass(URLClassLoader.java:378)
	at java.lang.ClassLoader.loadClass(ClassLoader.java(Compiled Code))
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:442)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:502)
	at java.lang.Class.forName1(Native Method)
	at java.lang.Class.forName(Class.java(Compiled Code))
	at org.globus.cog.karajan.workflow.JavaElement.<init>(JavaElement.java:22)
	... 12 more
2009-03-09 12:04:19,120-0500 DEBUG Loader Swift finished with errors

From benc at hawaga.org.uk Mon Mar 9 12:14:05 2009
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 9 Mar 2009 17:14:05 +0000 (GMT)
Subject: [Swift-user] class not found error from swift
In-Reply-To: <49B54CDB.6060908@uchicago.edu>
References: <49B54CDB.6060908@uchicago.edu>
Message-ID:

On Mon, 9 Mar 2009, Zhao Zhang wrote:

> Fatal: Class not found:
> org.griphyn.vdl.karajan.lib.ThrottledParallelFor

Looks like r2665 didn't add that file when it should. Mihael will probably add it when he sees this message.

In the meantime, you can force SVN to checkout version r2664 of Swift by using the parameter -r 2665 with the checkout command (or use an older installation).
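A "Class not found" diagnosis like this one can be checked before rolling back by looking for the class file in the jars that the build produced. The following is a minimal sketch, assuming the build places its jars somewhere under a single directory. The function name and the `lib_dir` argument are hypothetical and not part of Swift.

```python
import zipfile
from pathlib import Path


def find_class_in_jars(lib_dir, class_name):
    """Return the jars under lib_dir that contain the given Java class.

    lib_dir is an assumption: point it at wherever your build puts its jars.
    """
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            # Skip files named *.jar that are not valid archives
            continue
    return hits
```

Calling it with `org.griphyn.vdl.karajan.lib.ThrottledParallelFor` and getting an empty list back would be consistent with the source file never having been committed.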
--

From benc at hawaga.org.uk Mon Mar 9 12:15:48 2009
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 9 Mar 2009 17:15:48 +0000 (GMT)
Subject: [Swift-user] class not found error from swift
In-Reply-To:
References: <49B54CDB.6060908@uchicago.edu>
Message-ID:

> In the meantime, you can force SVN to checkout version r2664 of Swift by
> using the parameter -r 2665 with the checkout command (or use an older
> installation).

-r 2664 I mean

--

From hategan at mcs.anl.gov Mon Mar 9 12:19:00 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 09 Mar 2009 12:19:00 -0500
Subject: [Swift-user] class not found error from swift
In-Reply-To: <49B54CDB.6060908@uchicago.edu>
References: <49B54CDB.6060908@uchicago.edu>
Message-ID: <1236619140.5856.0.camel@localhost>

I added the missing file in r2671. Sorry.

On Mon, 2009-03-09 at 12:07 -0500, Zhao Zhang wrote:
> Hi,
>
> I got an error message for a sanity test of a latest swift:
>
> zzhang at login6.surveyor:~/new_dock6> swift first.swift
> Execution failed:
> Fatal: Class not found:
> org.griphyn.vdl.karajan.lib.ThrottledParallelFor

From zhaozhang at uchicago.edu Mon Mar 9 13:14:46 2009
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Mon, 09 Mar 2009 13:14:46 -0500
Subject: [Swift-user] class not found error from swift
In-Reply-To: <1236619140.5856.0.camel@localhost>
References: <49B54CDB.6060908@uchicago.edu> <1236619140.5856.0.camel@localhost>
Message-ID: <49B55C96.7080506@uchicago.edu>

got it, thanks guys!

zhao

Mihael Hategan wrote:
> I added the missing file in r2671. Sorry.

From wilde at mcs.anl.gov Sun Mar 15 23:44:34 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 15 Mar 2009 23:44:34 -0500
Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output?
Message-ID: <49BDD932.9000800@mcs.anl.gov>

Does the env var SWIFT_JOBDIR_PATH set the jobdir for each job in a script to this value, and use it for output as well as input? Or is it only used for input?
I have it set in tc.data like this:

bgp015 runrama /intrepid-fs0/users/wilde/persistent/oops/oops-r055/bin/runrama.sh INSTALLED INTEL32::LINUX env::SWIFT_JOBDIR_PATH="/dev/shm"

...to use the local ramdisk on BGP nodes.

But as my jobs run, I can see that they are writing their log data, line by
line, to the output directories of the shared work directory. That's the
overhead I was hoping to avoid with SWIFT_JOBDIR_PATH.

Or, do I have it set wrong?

- Mike

SWIFT_JOBDIR_PATH - set in env namespace profiles. If set, then Swift will
use the path specified here as a worker-node local temporary directory to
copy input files to before running a job. If unset, Swift will keep input
files on the site-shared filesystem. In some cases, copying to a
worker-node local directory can be much faster than having applications
access the site-shared filesystem directly.

From benc at hawaga.org.uk Mon Mar 16 04:07:26 2009
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 16 Mar 2009 09:07:26 +0000 (GMT)
Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output?
In-Reply-To: <49BDD932.9000800@mcs.anl.gov>
References: <49BDD932.9000800@mcs.anl.gov>
Message-ID:

On Sun, 15 Mar 2009, Michael Wilde wrote:

> Does the env var SWIFT_JOBDIR_PATH set the jobdir for each job in a script to
> this value, and use it for output as well as input? Or is it only used for
> input?

It sets the root under which the per-job working directories are created;
otherwise that root is the run's shared directory with /jobs/ on the end.

> But as my jobs run, I can see that they are writing their log data, line by
> line, to the output directories of the shared workdirectory. Thats the
> overhead I was hoping to avoid with SWIFT_JOBDIR_PATH.

You mean the wrapper logs are being written to the wrapper log directory
info/ ?

That will happen. Zhao has hacked things in the past to not store the log.
Another thing that can be done is to create the log in the job directory
and copy it at the end.
However, in certain failure modes you then won't get any log data at all.

I can implement an option for that.

--

From wilde at mcs.anl.gov Mon Mar 16 06:50:07 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 16 Mar 2009 06:50:07 -0500
Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output?
In-Reply-To:
References: <49BDD932.9000800@mcs.anl.gov>
Message-ID: <49BE3CEF.6000703@mcs.anl.gov>

On 3/16/09 4:07 AM, Ben Clifford wrote:
> On Sun, 15 Mar 2009, Michael Wilde wrote:
>
>> Does the env var SWIFT_JOBDIR_PATH set the jobdir for each job in a script to
>> this value, and use it for output as well as input? Or is it only used for
>> input?
>
> It sets the root under which the per-job working directories exist, that
> is otherwise the run shared directory with /jobs/ on the end

OK, that's exactly what I was hoping it did.

>> But as my jobs run, I can see that they are writing their log data, line by
>> line, to the output directories of the shared workdirectory. Thats the
>> overhead I was hoping to avoid with SWIFT_JOBDIR_PATH.
>
> You mean the wrapper logs are being written to the wrapper log directory
> info/ ?

No, this is an application log file - just one of the app's outputs. I
could see these ".log" files being created in the workdir as soon as the
script started, and was able to tail -f these files to see the app's
progress.

So if this feature works as above, then something is wrong. Possibly I
didn't specify it correctly, or it's broken. I'll need to investigate.

> That will happen. Zhao has hacked things in the past to not store the log.
> Another thing that can be done is to create the log in the job directory
> and copy it at the end. However, in certain failure modes you then won't
> get any log data at all.
>
> I can implement an option for that.
>

From benc at hawaga.org.uk Mon Mar 16 06:54:22 2009
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 16 Mar 2009 11:54:22 +0000 (GMT)
Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output?
In-Reply-To: <49BE3CEF.6000703@mcs.anl.gov>
References: <49BDD932.9000800@mcs.anl.gov> <49BE3CEF.6000703@mcs.anl.gov>
Message-ID:

On Mon, 16 Mar 2009, Michael Wilde wrote:

> No, this is an application log file - just one of the app's outputs. I could
> see these ".log" files being created in the workdir as soon as the script
> started, and was able to tail -f these files to see the app's progress.

OK, those should all appear under the SWIFT_JOBDIR_PATH directory.

I've never seen that profile specified in tc.data; only in sites.xml. It
might be that something is wrong there - try it in sites.xml.

In the -info log for a job, you should always see one of two log messages:

"Job directory mode is: local copy"

or

"Job directory mode is: link on shared filesystem"

with the former meaning that SWIFT_JOBDIR_PATH is being detected in the
wrapper environment, and the latter meaning that it is not.

Looking for those lines in your present setup would be useful.

--

From wilde at mcs.anl.gov Mon Mar 16 07:01:45 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 16 Mar 2009 07:01:45 -0500
Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output?
In-Reply-To:
References: <49BDD932.9000800@mcs.anl.gov> <49BE3CEF.6000703@mcs.anl.gov>
Message-ID: <49BE3FA9.8030106@mcs.anl.gov>

I see:

Job directory mode is: link on shared filesystem

I'll try moving the env var to sites.xml. In this case, I wanted only this
job to get this behavior, as a later job in the script analyzes all the
files that are produced in parallel, and might eventually exceed the
limited space available on the local BG/P filesystem (ramdisk).

But for the tests I'm doing now this is not a problem.
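
The check discussed above can be scripted. What follows is a minimal sketch, assuming each per-job -info file is plain text and contains one of the two "Job directory mode is:" lines quoted in this thread. The function name jobdir_mode and the labels it prints are hypothetical; they are not part of Swift or wrapper.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of Swift): report which job directory mode
# the wrapper recorded in a given -info file.
jobdir_mode() {
    if grep -q "Job directory mode is: local copy" "$1"; then
        # SWIFT_JOBDIR_PATH was seen in the wrapper environment.
        echo "local copy"
    elif grep -q "Job directory mode is: link on shared filesystem" "$1"; then
        # SWIFT_JOBDIR_PATH was not seen in the wrapper environment.
        echo "shared filesystem"
    else
        # Neither message found; the file may be incomplete.
        echo "unknown"
    fi
}
```

It could be applied to every info file of a run with something like `for f in info/*/*-info; do echo "$f: $(jobdir_mode "$f")"; done`, where the info/ layout is assumed from the paths shown elsewhere in this thread.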
On 3/16/09 6:54 AM, Ben Clifford wrote: > On Mon, 16 Mar 2009, Michael Wilde wrote: > >> No, this is an application log file - just one of the app's outputs. I could >> see these ".log" files being created in the workdir as soon as the script >> started, and was able to tail -f these files to see the app's progress. > > OK, those should all appear under the SWIFT_JOBDIR_PATH directory. > > I've never seen that profile specified in tc.data; only in sites.xml. It > might be that something is wrong there - try it in sites.xml > > In the -info log for a job, you always should see one of two log messages: > > "Job directory mode is: local copy" > > or > > "Job directory mode is: link on shared filesystem" > > with the former meaning that SWIFT_JOBDIR_PATH is being detected in the > wrapper environment, and the latter meaning that it is not. > > Looking for those lines in your present setup would be useful. > From benc at hawaga.org.uk Mon Mar 16 07:43:34 2009 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 16 Mar 2009 12:43:34 +0000 (GMT) Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output? In-Reply-To: <49BE3FA9.8030106@mcs.anl.gov> References: <49BDD932.9000800@mcs.anl.gov> <49BE3CEF.6000703@mcs.anl.gov> <49BE3FA9.8030106@mcs.anl.gov> Message-ID: On Mon, 16 Mar 2009, Michael Wilde wrote: > But for the tests Im doing now this is not a problem. Please indicate what you discover anyway. -- From wilde at mcs.anl.gov Mon Mar 16 08:03:48 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 16 Mar 2009 08:03:48 -0500 Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output? In-Reply-To: References: <49BDD932.9000800@mcs.anl.gov> <49BE3CEF.6000703@mcs.anl.gov> <49BE3FA9.8030106@mcs.anl.gov> Message-ID: <49BE4E34.2010501@mcs.anl.gov> I discovered I misspelled the var name, and am retesting specifying it from both tc.data and sites.xml. 
On 3/16/09 7:43 AM, Ben Clifford wrote: > On Mon, 16 Mar 2009, Michael Wilde wrote: > >> But for the tests Im doing now this is not a problem. > > Please indicate what you discover anyway. > From wilde at mcs.anl.gov Mon Mar 16 08:35:31 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 16 Mar 2009 08:35:31 -0500 Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output? In-Reply-To: <49BE4E34.2010501@mcs.anl.gov> References: <49BDD932.9000800@mcs.anl.gov> <49BE3CEF.6000703@mcs.anl.gov> <49BE3FA9.8030106@mcs.anl.gov> <49BE4E34.2010501@mcs.anl.gov> Message-ID: <49BE55A3.6070702@mcs.anl.gov> Im still unable to get this to work. For the test case of specifying in sites.xml I have: /home/wilde/swiftwork/ /intrepid-fs0/users/wilde/persistent/oops/swift/work 8 1000 /dev/shm and in the info file I see whats pasted below. I *thought* this info file is a full wrapper.sh log and includes a section listing all the environment vars, so I could verify if SWIFT_JOBDIR_PATH made it into the environment. Can you remind me how to turn that on? Cant find it in swift.properties, will check the wrapper.sh code. 
- Mike


int$ cat ./info/1/runrama-1in2318j-info
Progress 2009-03-16 08:24:53.%N-0500 LOG_START

_____________________________________________________________________________

Wrapper
_____________________________________________________________________________

Job directory mode is: link on shared filesystem
DIR=jobs/1/runrama-1in2318j
EXEC=/intrepid-fs0/users/wilde/persistent/oops/oops-r055/bin/runrama.sh
STDIN=
STDOUT=output/T1af7/0000_0001.log
STDERR=stderr.txt
DIRS=input/secseq^input/native^input/fasta^secseq/T1af7^input/rama^output/T1af7
INF=input/secseq/T1af7.secseq^input/native/T1af7.pdb^input/fasta/T1af7.fasta^secseq/T1af7/T1af7.0000.secseq^input/rama/T1af7.rama_index^input/rama/T1af7.rama_map
OUTF=output/T1af7/0000_0001.log^output/T1af7/0000_0001.rmsd^output/T1af7/0000_0001.pdt
KICKSTART=
ARGS=input/fasta/T1af7.fasta secseq/T1af7/T1af7.0000.secseq input/native/T1af7.pdb input/rama/T1af7.rama_map output/T1af7/0000_0001.pdt output/T1af7/0000_0001.rmsd 1 KILL_TIME_=_1
ARGC=8
Progress 2009-03-16 08:24:54.%N-0500 CREATE_JOBDIR
Created job directory: jobs/1/runrama-1in2318j

---

On 3/16/09 8:03 AM, Michael Wilde wrote:
> I discovered I misspelled the var name, and am retesting specifying it
> from both tc.data and sites.xml.
>
> On 3/16/09 7:43 AM, Ben Clifford wrote:
>> On Mon, 16 Mar 2009, Michael Wilde wrote:
>>
>>> But for the tests Im doing now this is not a problem.
>>
>> Please indicate what you discover anyway.
>>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From wilde at mcs.anl.gov Mon Mar 16 08:40:27 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 16 Mar 2009 08:40:27 -0500
Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output?
In-Reply-To: <49BE55A3.6070702@mcs.anl.gov> References: <49BDD932.9000800@mcs.anl.gov> <49BE3CEF.6000703@mcs.anl.gov> <49BE3FA9.8030106@mcs.anl.gov> <49BE4E34.2010501@mcs.anl.gov> <49BE55A3.6070702@mcs.anl.gov> Message-ID: <49BE56CB.5090802@mcs.anl.gov> Ah, I see: in wrapper.sh the "info" function which logs the env etc is only invoked on failure, it seems. Unless you see an error in my sites.xml below, I'll add extra logging to wrapper.sh to see what my env is. Or test with printenv as the app. On 3/16/09 8:35 AM, Michael Wilde wrote: > Im still unable to get this to work. > > For the test case of specifying in sites.xml I have: > > > > > > /home/wilde/swiftwork/ > > > url="http://172.16.8.122:50001/wsrf/services/GenericPortal/core/WS/GPFactoryService"/> > > > > /intrepid-fs0/users/wilde/persistent/oops/swift/work > > 8 > 1000 > /dev/shm > > > > and in the info file I see whats pasted below. > > I *thought* this info file is a full wrapper.sh log and includes a > section listing all the environment vars, so I could verify if > SWIFT_JOBDIR_PATH made it into the environment. Can you remind me how > to turn that on? Cant find it in swift.properties, will check the > wrapper.sh code. 
> > - Mike > > > int$ cat ./info/1/runrama-1in2318j-info > Progress 2009-03-16 08:24:53.%N-0500 LOG_START > > _____________________________________________________________________________ > > > Wrapper > _____________________________________________________________________________ > > > Job directory mode is: link on shared filesystem > DIR=jobs/1/runrama-1in2318j > EXEC=/intrepid-fs0/users/wilde/persistent/oops/oops-r055/bin/runrama.sh > STDIN= > STDOUT=output/T1af7/0000_0001.log > STDERR=stderr.txt > DIRS=input/secseq^input/native^input/fasta^secseq/T1af7^input/rama^output/T1af7 > > INF=input/secseq/T1af7.secseq^input/native/T1af7.pdb^input/fasta/T1af7.fasta^secseq/T1af7/T1af7.0000.secseq^input/rama/T1af7.rama_index^input/rama/T1af7.rama_map > > OUTF=output/T1af7/0000_0001.log^output/T1af7/0000_0001.rmsd^output/T1af7/0000_0001.pdt > > KICKSTART= > ARGS=input/fasta/T1af7.fasta secseq/T1af7/T1af7.0000.secseq > input/native/T1af7.pdb input/rama/T1af7.rama_map > output/T1af7/0000_0001.pdt output/T1af7/0000_0001.rmsd 1 KILL_TIME_=_1 > ARGC=8 > Progress 2009-03-16 08:24:54.%N-0500 CREATE_JOBDIR > Created job directory: jobs/1/runrama-1in2318j > > --- > > > > On 3/16/09 8:03 AM, Michael Wilde wrote: >> I discovered I misspelled the var name, and am retesting specifying it >> from both tc.data and sites.xml. >> >> On 3/16/09 7:43 AM, Ben Clifford wrote: >>> On Mon, 16 Mar 2009, Michael Wilde wrote: >>> >>>> But for the tests Im doing now this is not a problem. >>> >>> Please indicate what you discover anyway. 
>>> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From benc at hawaga.org.uk Mon Mar 16 08:48:34 2009 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 16 Mar 2009 13:48:34 +0000 (GMT) Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output? In-Reply-To: <49BE56CB.5090802@mcs.anl.gov> References: <49BDD932.9000800@mcs.anl.gov> <49BE3CEF.6000703@mcs.anl.gov> <49BE3FA9.8030106@mcs.anl.gov> <49BE4E34.2010501@mcs.anl.gov> <49BE55A3.6070702@mcs.anl.gov> <49BE56CB.5090802@mcs.anl.gov> Message-ID: On Mon, 16 Mar 2009, Michael Wilde wrote: > Ah, I see: in wrapper.sh the "info" function which logs the env etc is only > invoked on failure, it seems. Unless you see an error in my sites.xml below, > I'll add extra logging to wrapper.sh to see what my env is. Or test with > printenv as the app. Yes, using env/printenv should be easy to do. Are you using an unmodified Swift? Please can you send a log from running something like first.swift, and its corresponding info file. Also, try with that exact same Swift installation, but running on localhost using the local provider and specifying SWIFT_JOBDIR_PATH. -- From wilde at mcs.anl.gov Mon Mar 16 08:49:03 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 16 Mar 2009 08:49:03 -0500 Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output? 
In-Reply-To: <49BE56CB.5090802@mcs.anl.gov>
References: <49BDD932.9000800@mcs.anl.gov> <49BE3CEF.6000703@mcs.anl.gov> <49BE3FA9.8030106@mcs.anl.gov> <49BE4E34.2010501@mcs.anl.gov> <49BE55A3.6070702@mcs.anl.gov> <49BE56CB.5090802@mcs.anl.gov>
Message-ID: <49BE58CF.2010704@mcs.anl.gov>

Printenv indicates that SWIFT_JOBDIR_PATH is not getting into the app's
environment:

int$ cat pe.swift
type file;

app (file o) printenv () {
  printenv stdout=@o;
}

file ofile <"pe.out">;
ofile = printenv();
int$

int$ cat pe.out
PLOTICUS_HOME=/home/falkon/users/wilde/0922/ploticus
FALKON_CLIENT_HOME=/home/falkon/users/wilde/0922/client
GLOBUS_OPTIONS_MISC=-Xms512M -Xmx512M -Xss128K
GLOBUS_LOCATION=/home/falkon/users/wilde/0922/container
FALKON_CONFIG=/home/falkon/users/wilde/0922/config
GLOBUS_PATH=/home/falkon/users/wilde/0922/container
OLDPWD=/fuse/intrepid-fs0/users/wilde/persistent/oops/swift/work/pe-20090316-0845-sqceiryc
ANT_HOME=/home/falkon/users/wilde/0922/apache-ant-1.7.0
LD_LIBRARY_PATH=.:/lib:/fuse/lib:/fuse/usr/lib:/home/falkon/users/wilde/0922/container/lib:/lib:/fuse/lib:/fuse/usr/lib
FALKON_LOGS=/home/falkon/users/wilde/0922/logs
FALKON_CLIENT_WORKLOADS_HOME=/home/falkon/users/wilde/0922/workloads
FALKON_HOME=/home/falkon/users/wilde/0922
PATH=/home/falkon/users/wilde/0922/bin:/home/falkon/users/wilde/0922/service:/home/falkon/users/wilde/0922/worker:/home/falkon/users/wilde/0922/client:/home/falkon/users/wilde/0922/monitor:/home/falkon/users/wilde/0922/webserver:/home/falkon/users/wilde/0922/ploticus/src:/home/falkon/users/wilde/0922/apache-ant-1.7.0:/home/falkon/users/wilde/0922/apache-ant-1.7.0/bin:/home/falkon/users/wilde/0922/ibm-java2-ppc-50/jre:/home/falkon/users/wilde/0922/ibm-java2-ppc-50/jre/bin:/home/falkon/users/wilde/0922/container:/home/falkon/users/wilde/0922/container/bin:/home/falkon/users/wilde/0922/cog/modules/vdsk/dist/vdsk-svn/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:.:/bin:/usr/bin
PWD=/fuse/intrepid-fs0/users/wilde/persistent/oops/swift/work/pe-20090316-0845-sqceiryc/jobs/r/printenv-rx6v318j
JAVA_HOME=/home/falkon/users/wilde/0922/ibm-java2-ppc-50/jre
COBALT_JOBID=113638
SWIFT_HOME=/home/falkon/users/wilde/0922/cog/modules/vdsk/dist/vdsk-svn
SHLVL=2
GLOBUS_TCP_PORT_RANGE=50000,59999
FALKON_WWW_HOME=/home/falkon/users/wilde/0922/webserver
BG_SIZE=64
FALKON_MONITOR_HOME=/home/falkon/users/wilde/0922/monitor
FALKON_WORKER_HOME=/home/falkon/users/wilde/0922/worker
FALKON_SERVICE_HOME=/home/falkon/users/wilde/0922/service
CONTROL_INIT=4195440,8,64,62,48
FALKON_ROOT=/home/falkon/users/wilde
_=/usr/bin/printenv

int$ cat sites.xml
/home/wilde/swiftwork/
/intrepid-fs0/users/wilde/persistent/oops/swift/work
8
1000
/dev/shm

On 3/16/09 8:40 AM, Michael Wilde wrote:
> Ah, I see: in wrapper.sh the "info" function which logs the env etc is
> only invoked on failure, it seems. Unless you see an error in my
> sites.xml below, I'll add extra logging to wrapper.sh to see what my env
> is. Or test with printenv as the app.
>
> On 3/16/09 8:35 AM, Michael Wilde wrote:
>> Im still unable to get this to work.
>>
>> For the test case of specifying in sites.xml I have:
>>
>>
>>
>>
>>
>> /home/wilde/swiftwork/
>>
>>
>>
> url="http://172.16.8.122:50001/wsrf/services/GenericPortal/core/WS/GPFactoryService"/>
>>
>>
>>
>> /intrepid-fs0/users/wilde/persistent/oops/swift/work
>>
>> 8
>> 1000
>> /dev/shm
>>
>>
>>
>> and in the info file I see whats pasted below.
>>
>> I *thought* this info file is a full wrapper.sh log and includes a
>> section listing all the environment vars, so I could verify if
>> SWIFT_JOBDIR_PATH made it into the environment. Can you remind me how
>> to turn that on? Cant find it in swift.properties, will check the
>> wrapper.sh code.
>> >> - Mike >> >> >> int$ cat ./info/1/runrama-1in2318j-info >> Progress 2009-03-16 08:24:53.%N-0500 LOG_START >> >> _____________________________________________________________________________ >> >> >> Wrapper >> _____________________________________________________________________________ >> >> >> Job directory mode is: link on shared filesystem >> DIR=jobs/1/runrama-1in2318j >> EXEC=/intrepid-fs0/users/wilde/persistent/oops/oops-r055/bin/runrama.sh >> STDIN= >> STDOUT=output/T1af7/0000_0001.log >> STDERR=stderr.txt >> DIRS=input/secseq^input/native^input/fasta^secseq/T1af7^input/rama^output/T1af7 >> >> INF=input/secseq/T1af7.secseq^input/native/T1af7.pdb^input/fasta/T1af7.fasta^secseq/T1af7/T1af7.0000.secseq^input/rama/T1af7.rama_index^input/rama/T1af7.rama_map >> >> OUTF=output/T1af7/0000_0001.log^output/T1af7/0000_0001.rmsd^output/T1af7/0000_0001.pdt >> >> KICKSTART= >> ARGS=input/fasta/T1af7.fasta secseq/T1af7/T1af7.0000.secseq >> input/native/T1af7.pdb input/rama/T1af7.rama_map >> output/T1af7/0000_0001.pdt output/T1af7/0000_0001.rmsd 1 KILL_TIME_=_1 >> ARGC=8 >> Progress 2009-03-16 08:24:54.%N-0500 CREATE_JOBDIR >> Created job directory: jobs/1/runrama-1in2318j >> >> --- >> >> >> >> On 3/16/09 8:03 AM, Michael Wilde wrote: >>> I discovered I misspelled the var name, and am retesting specifying >>> it from both tc.data and sites.xml. >>> >>> On 3/16/09 7:43 AM, Ben Clifford wrote: >>>> On Mon, 16 Mar 2009, Michael Wilde wrote: >>>> >>>>> But for the tests Im doing now this is not a problem. >>>> >>>> Please indicate what you discover anyway. 
>>>> >>> _______________________________________________ >>> Swift-user mailing list >>> Swift-user at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From wilde at mcs.anl.gov Mon Mar 16 08:57:33 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 16 Mar 2009 08:57:33 -0500 Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output? In-Reply-To: References: <49BDD932.9000800@mcs.anl.gov> <49BE3CEF.6000703@mcs.anl.gov> <49BE3FA9.8030106@mcs.anl.gov> <49BE4E34.2010501@mcs.anl.gov> <49BE55A3.6070702@mcs.anl.gov> <49BE56CB.5090802@mcs.anl.gov> Message-ID: <49BE5ACD.8020009@mcs.anl.gov> On 3/16/09 8:48 AM, Ben Clifford wrote: > On Mon, 16 Mar 2009, Michael Wilde wrote: > >> Ah, I see: in wrapper.sh the "info" function which logs the env etc is only >> invoked on failure, it seems. Unless you see an error in my sites.xml below, >> I'll add extra logging to wrapper.sh to see what my env is. Or test with >> printenv as the app. > > Yes, using env/printenv should be easy to do. > > Are you using an unmodified Swift? Please can you send a log from running > something like first.swift, and its corresponding info file. The "only" mod I made to this swift (from trunk, rev# in log) is to change the arg separator from | to ^ in wrapper.sh and vdl-int.k, to be compatible with Falkon. Its possible that broke env passing. 
The log and info file for the pe.swift run, which is one job, is in: http://www.ci.uchicago.edu/~wilde/pe-20090316-0845-sqceiryc.log http://www.ci.uchicago.edu/~wilde/printenv-rx6v318j-info > Also, try with that exact same Swift installation, but running on > localhost using the local provider and specifying SWIFT_JOBDIR_PATH. will do. From wilde at mcs.anl.gov Mon Mar 16 08:59:04 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 16 Mar 2009 08:59:04 -0500 Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output? In-Reply-To: <49BE5ACD.8020009@mcs.anl.gov> References: <49BDD932.9000800@mcs.anl.gov> <49BE3CEF.6000703@mcs.anl.gov> <49BE3FA9.8030106@mcs.anl.gov> <49BE4E34.2010501@mcs.anl.gov> <49BE55A3.6070702@mcs.anl.gov> <49BE56CB.5090802@mcs.anl.gov> <49BE5ACD.8020009@mcs.anl.gov> Message-ID: <49BE5B28.9040305@mcs.anl.gov> Its very possible that Falkon is loosing the environment. The local test will show if thats likely. On 3/16/09 8:57 AM, Michael Wilde wrote: > > > On 3/16/09 8:48 AM, Ben Clifford wrote: >> On Mon, 16 Mar 2009, Michael Wilde wrote: >> >>> Ah, I see: in wrapper.sh the "info" function which logs the env etc >>> is only >>> invoked on failure, it seems. Unless you see an error in my >>> sites.xml below, >>> I'll add extra logging to wrapper.sh to see what my env is. Or test >>> with >>> printenv as the app. >> >> Yes, using env/printenv should be easy to do. >> >> Are you using an unmodified Swift? Please can you send a log from >> running something like first.swift, and its corresponding info file. > > The "only" mod I made to this swift (from trunk, rev# in log) is to > change the arg separator from | to ^ in wrapper.sh and vdl-int.k, to be > compatible with Falkon. > > Its possible that broke env passing. 
> > The log and info file for the pe.swift run, which is one job, is in: > > http://www.ci.uchicago.edu/~wilde/pe-20090316-0845-sqceiryc.log > http://www.ci.uchicago.edu/~wilde/printenv-rx6v318j-info > >> Also, try with that exact same Swift installation, but running on >> localhost using the local provider and specifying SWIFT_JOBDIR_PATH. > > will do. > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From benc at hawaga.org.uk Mon Mar 16 09:14:38 2009 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 16 Mar 2009 14:14:38 +0000 (GMT) Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output? In-Reply-To: <49BE5B28.9040305@mcs.anl.gov> References: <49BDD932.9000800@mcs.anl.gov> <49BE3CEF.6000703@mcs.anl.gov> <49BE3FA9.8030106@mcs.anl.gov> <49BE4E34.2010501@mcs.anl.gov> <49BE55A3.6070702@mcs.anl.gov> <49BE56CB.5090802@mcs.anl.gov> <49BE5ACD.8020009@mcs.anl.gov> <49BE5B28.9040305@mcs.anl.gov> Message-ID: On Mon, 16 Mar 2009, Michael Wilde wrote: > Its very possible that Falkon is loosing the environment. The local test will > show if thats likely. Far be it for me to make such an accusation. The test suite does test environment passing, in tests/misc/path-prefix.sh, at least as far as the local provider. You should be able to run tests/misc/path-prefix.sh against your local provider; and if you run a falkon worker on the same system as you submit from (so it ends up doing local execution through falkon) you should be able to run that test successfully too. -- From benc at hawaga.org.uk Mon Mar 16 09:29:22 2009 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 16 Mar 2009 14:29:22 +0000 (GMT) Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output? 
In-Reply-To:
References: <49BDD932.9000800@mcs.anl.gov> <49BE3CEF.6000703@mcs.anl.gov> <49BE3FA9.8030106@mcs.anl.gov> <49BE4E34.2010501@mcs.anl.gov> <49BE55A3.6070702@mcs.anl.gov> <49BE56CB.5090802@mcs.anl.gov> <49BE5ACD.8020009@mcs.anl.gov> <49BE5B28.9040305@mcs.anl.gov>
Message-ID: <49BE75ED.5040404@mcs.anl.gov>

On 3/16/09 9:29 AM, Ben Clifford wrote:
> Actually if you get that far, you should be able to run the entire test
> suite that way, by running ./run in the tests/ directory from SVN. Then
> go and drink coffee for an hour or something; but it would give much
> greater confidence in the integration of falkon and swift.

Testing locally on the bgp, I see that env passing works, and wrapper.sh
picks up SWIFT_JOBDIR_PATH and does the right thing. So the problem
seems to be somewhere in the Falkon path. We'll investigate, and can for
the right jobdir handling in the meantime.

A side note: when I set an env profile in tc.data, my (localhost) job
gets a very limited environment:

SWIFT_JOBDIR_PATH=/dev/shm
PWD=/dev/shm/h/printenv-hzj1818j
SHLVL=1
OLDPWD=/var/tmp/pe-20090316-1029-cgkuddnd
_=/usr/bin/printenv

while with no env profile in the tc.data entry, my job gets the full
environment from the shell in which I ran swift.

I'm not going to look into that unless it becomes an issue, but it's curious.
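
The difference between the two environments described above can be reproduced outside Swift. The sketch below is only an illustration using the POSIX env utility; it is not how Swift or its local provider launches jobs. DEMO_PARENT_VAR and both function names are made up for the example. `env -i` starts the child with exactly the variables listed, while plain `env NAME=value` adds one variable to the inherited environment.

```shell
#!/bin/sh
# A variable set in the parent shell, standing in for the "full environment
# from the shell in which I ran swift". The name is hypothetical.
DEMO_PARENT_VAR=from_parent
export DEMO_PARENT_VAR

# Exact environment: the child sees only what is listed on the command line,
# so DEMO_PARENT_VAR is missing.
exact_env() {
    env -i SWIFT_JOBDIR_PATH=/dev/shm /bin/sh -c \
        'echo "${DEMO_PARENT_VAR:-unset} $SWIFT_JOBDIR_PATH"'
}

# Inherited environment plus one addition: the child sees both variables.
augmented_env() {
    env SWIFT_JOBDIR_PATH=/dev/shm /bin/sh -c \
        'echo "${DEMO_PARENT_VAR:-unset} $SWIFT_JOBDIR_PATH"'
}

exact_env       # prints: unset /dev/shm
augmented_env   # prints: from_parent /dev/shm
```

The first case matches the short listing seen with an env profile set; the second matches the full environment seen without one.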
On 3/16/09 9:14 AM, Ben Clifford wrote: > On Mon, 16 Mar 2009, Michael Wilde wrote: > >> Its very possible that Falkon is loosing the environment. The local test will >> show if thats likely. > > Far be it for me to make such an accusation. > > The test suite does test environment passing, in > tests/misc/path-prefix.sh, at least as far as the local provider. > > You should be able to run tests/misc/path-prefix.sh against your local > provider; and if you run a falkon worker on the same system as you submit > from (so it ends up doing local execution through falkon) you should be > able to run that test successfully too. > From hategan at mcs.anl.gov Mon Mar 16 11:08:36 2009 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 16 Mar 2009 11:08:36 -0500 Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output? In-Reply-To: <49BE75ED.5040404@mcs.anl.gov> References: <49BDD932.9000800@mcs.anl.gov> <49BE3CEF.6000703@mcs.anl.gov> <49BE3FA9.8030106@mcs.anl.gov> <49BE4E34.2010501@mcs.anl.gov> <49BE55A3.6070702@mcs.anl.gov> <49BE56CB.5090802@mcs.anl.gov> <49BE5ACD.8020009@mcs.anl.gov> <49BE5B28.9040305@mcs.anl.gov> <49BE75ED.5040404@mcs.anl.gov> Message-ID: <1237219716.4248.11.camel@localhost> On Mon, 2009-03-16 at 10:53 -0500, Michael Wilde wrote: > Testing locally on the bgp, I see that env passing works, and wrapper.sh > picks up SWIFT_JOBDIR_PATH and does the right thing. So the problem > seems to be somewhere in the Falkon path. We'll investigate, and can for > the right jobdir handling in the meantime. > > A side note: when I set an env profile in tc.data, my (localhost) job > gets a very limited environment: > > SWIFT_JOBDIR_PATH=/dev/shm > PWD=/dev/shm/h/printenv-hzj1818j > SHLVL=1 > OLDPWD=/var/tmp/pe-20090316-1029-cgkuddnd > _=/usr/bin/printenv > > while with no env profile in the tc.data entry, my job gets the full > environment from the shell in which I ran swift. > > I'm not going to look into that unless it becomes an issue, but its curious. 
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Runtime.html#exec(java.lang.String[],%20java.lang.String[])

It seems that there is no (obvious) way to add variables to the
environment in Java 1.4. You can either let the child process inherit
the full environment or specify the exact environment.

In Java 5, there is a better scheme: ProcessBuilder.

So I'm contemplating the idea of dropping the Java 1.4 restriction and
moving forward.

From benc at hawaga.org.uk Mon Mar 16 13:21:05 2009
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 16 Mar 2009 18:21:05 +0000 (GMT)
Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output?
In-Reply-To: <1237219716.4248.11.camel@localhost>
References: <49BDD932.9000800@mcs.anl.gov> <49BE3CEF.6000703@mcs.anl.gov> <49BE3FA9.8030106@mcs.anl.gov> <49BE4E34.2010501@mcs.anl.gov> <49BE55A3.6070702@mcs.anl.gov> <49BE56CB.5090802@mcs.anl.gov> <49BE5ACD.8020009@mcs.anl.gov> <49BE5B28.9040305@mcs.anl.gov> <49BE75ED.5040404@mcs.anl.gov> <1237219716.4248.11.camel@localhost>
Message-ID:

On Mon, 16 Mar 2009, Mihael Hategan wrote:

> It seems that there is no (obvious) way to add variables to the
> environment from java 1.4. You can either let the child process inherit
> the full environment or specify the exact environment.
>
> In Java5, there is a better scheme: ProcessBuilder.
>
> So I'm contemplating the idea of dropping the Java 1.4 restriction and
> moving forward.

The same has bugged me in the past (enough that I have a patch in a local
checkout that uses 1.5 stuff for that).

I guess 1.5 is pretty widespread on machines that people are likely to
want to run locally on.

--

From benc at hawaga.org.uk Mon Mar 16 13:27:02 2009
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 16 Mar 2009 18:27:02 +0000 (GMT)
Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output?
In-Reply-To: <49BE75ED.5040404@mcs.anl.gov> References: <49BDD932.9000800@mcs.anl.gov> <49BE3CEF.6000703@mcs.anl.gov> <49BE3FA9.8030106@mcs.anl.gov> <49BE4E34.2010501@mcs.anl.gov> <49BE55A3.6070702@mcs.anl.gov> <49BE56CB.5090802@mcs.anl.gov> <49BE5ACD.8020009@mcs.anl.gov> <49BE5B28.9040305@mcs.anl.gov> <49BE75ED.5040404@mcs.anl.gov> Message-ID: On Mon, 16 Mar 2009, Michael Wilde wrote: > Testing locally on the bgp, I see that env passing works, and wrapper.sh picks > up SWIFT_JOBDIR_PATH and does the right thing. So the problem seems to be > somewhere in the Falkon path. We'll investigate, and can for the right jobdir > handling in the meantime. Try out the whole test suite while you are debugging that - if you find other problems in there, you'll be less frustrated than when you find them in your application later. -- From hategan at mcs.anl.gov Mon Mar 16 13:42:42 2009 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 16 Mar 2009 13:42:42 -0500 Subject: [Swift-user] Does SWIFT_JOBDIR_PATH work for output? In-Reply-To: References: <49BDD932.9000800@mcs.anl.gov> <49BE3CEF.6000703@mcs.anl.gov> <49BE3FA9.8030106@mcs.anl.gov> <49BE4E34.2010501@mcs.anl.gov> <49BE55A3.6070702@mcs.anl.gov> <49BE56CB.5090802@mcs.anl.gov> <49BE5ACD.8020009@mcs.anl.gov> <49BE5B28.9040305@mcs.anl.gov> <49BE75ED.5040404@mcs.anl.gov> <1237219716.4248.11.camel@localhost> Message-ID: <1237228962.8617.0.camel@localhost> On Mon, 2009-03-16 at 18:21 +0000, Ben Clifford wrote: > On Mon, 16 Mar 2009, Mihael Hategan wrote: > > > It seems that there is no (obvious) way to add variables to the > > environment from java 1.4. You can either let the child process inherit > > the full environment or specify the exact environment. > > > > In Java5, there is a better scheme: ProcessBuilder. > > > > So I'm contemplating the idea of dropping the Java 1.4 restriction and > > moving forward. 
>
> The same has bugged me in the past (enough that I have a patch in a local
> checkout that uses 1.5 stuff for that).
>
> I guess 1.5 is pretty widespread on machines that people are likely to
> want to run locally on.

Alternatively, there could be this "smart" provider that uses
ProcessBuilder on >=1.5 and the existing scheme on 1.4.

From skenny at uchicago.edu Mon Mar 16 21:06:29 2009
From: skenny at uchicago.edu (skenny at uchicago.edu)
Date: Mon, 16 Mar 2009 21:06:29 -0500 (CDT)
Subject: [Swift-user] Re: workflow in science
Message-ID: <20090316210629.BUD97852@m4500-02.uchicago.edu>

hey uri, i thought i'd post this to the swift list to see if
others are familiar w/these...both are new to me, though the
microsoft tools seem to be filling in the gap that SIDGrid had
been aiming for as far as provenance/metadata, particularly
http://research.microsoft.com/en-us/projects/trident/

~sarah

---- Original message ----
>Date: Thu, 5 Mar 2009 19:42:25 +0100
>From: Uri Hasson
>Subject: workflow in science
>To: Sarah Kenny
>
> Hi Sarah,
>
> Hope all is well! I have a quick question if you
> have a sec..
>
> What do you think of Ninf-G http://ninf.apgrid.org/
> and this system (link below) as science workflows?
>
> http://research.microsoft.com/en-us/collaboration/tools/workflows.aspx
>
> Have you thought about them with respect to SWIFT?
>

From foster at anl.gov Mon Mar 16 21:08:19 2009
From: foster at anl.gov (Ian Foster)
Date: Mon, 16 Mar 2009 21:08:19 -0500
Subject: [Swift-user] Re: workflow in science
In-Reply-To: <20090316210629.BUD97852@m4500-02.uchicago.edu>
References: <20090316210629.BUD97852@m4500-02.uchicago.edu>
Message-ID: <0CE271A3-028F-4FE3-A59F-8013E10AEAEA@anl.gov>

Last I looked, Ninf-G was just a remote procedure call tool.
On Mar 16, 2009, at 9:06 PM, wrote: > hey uri, i thought i'd post this to the swift list to see if > others are familiar w/these...both are new to me, though the > microsoft tools seem to be filling in the gap that SIDGrid had > been aiming for as far as provenance/metadata, particularly > http://research.microsoft.com/en-us/projects/trident/ > > ~sarah > > ---- Original message ---- >> Date: Thu, 5 Mar 2009 19:42:25 +0100 >> From: Uri Hasson >> Subject: workflow in science >> To: Sarah Kenny >> >> Hi Sarah, >> >> Hope all is well! I have a quick question if you >> have a sec.. >> >> What do you think of Ninf-G http://ninf.apgrid.org/ >> and this system (link below) as science workflows? >> >> > http://research.microsoft.com/en-us/collaboration/tools/workflows.aspx >> >> Have you thought about them with respect to SWIFT? >> > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From yuechen at bsd.uchicago.edu Wed Mar 18 18:32:46 2009 From: yuechen at bsd.uchicago.edu (Yue, Chen - BMD) Date: Wed, 18 Mar 2009 18:32:46 -0500 Subject: [Swift-user] swift execution problem Message-ID: Hi, I'm new to Swift programming. I was able to run a swift script before, but I couldn't run it now. I'm wondering if someone can help me figure out why. The swift script, sites.xml, tc.data, and all the error messages are copied in this email. Thank you! 
Regards,

Chen, Yue

*********************
Swift script
*********************
type Fasta {}
type PTMapOut {}
type Solution {}
type Inputfile {}

app (PTMapOut ofile) PTMap (Solution sfile, Fasta fastafile, Inputfile input, Inputfile parameter)
{
    PTMap @filename(sfile) @filename(fastafile) @filename(input) @filename(parameter) stdout=@filename(ofile );
}

Fasta texts[] ;

doall(Fasta texts[])
{
    Solution sfile <"BSASolution.mzXML">;
    Inputfile input <"inputs.txt">;
    Inputfile parameter <"parameters.txt">;
    foreach p in texts {
        PTMapOut r ,
            match="fasta(.*)",
            transform="\\1.out "
        >;
        r = PTMap(sfile, p, input, parameter);
    }
}

// Main
doall(texts);

**************
sites.xml
**************
/var/tmp
0

**************
tc.data
**************
localhost   echo    /bin/echo                   INSTALLED   INTEL32::LINUX   null
localhost   cat     /bin/cat                    INSTALLED   INTEL32::LINUX   null
localhost   ls      /bin/ls                     INSTALLED   INTEL32::LINUX   null
localhost   grep    /bin/grep                   INSTALLED   INTEL32::LINUX   null
localhost   sort    /bin/sort                   INSTALLED   INTEL32::LINUX   null
localhost   paste   /bin/paste                  INSTALLED   INTEL32::LINUX   null
localhost   PTMap   /home/yuechen/PTMap/PTMap   INSTALLED   INTEL32::LINUX   null

**************
Error messages
**************
[yuechen at communicado PTMap]$ swift PTMap.swift
Execution failed:
java.lang.NullPointerException
        at org.globus.cog.abstraction.impl.common.task.ServiceImpl.toString(ServiceImpl.java:156)
        at java.lang.String.valueOf(String.java:2577)
        at java.lang.StringBuffer.append(StringBuffer.java:220)
        at org.globus.cog.karajan.workflow.nodes.grid.GridNode.function(GridNode.java:31)
        at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:45)
        at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
        at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335)
        at
org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) at org.globus.cog.karajan.workflow.nodes.ExecuteFile.notificationEvent(ExecuteFile.java:163) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51) at org.globus.cog.karajan.workflow.nodes.Sequential.childCompleted(Sequential.java:45) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.childCompleted(UserDefinedElement.java:283) at 
org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutionUDE.childCompleted(SequentialImplicitExecutionUDE.java:85) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) at org.globus.cog.karajan.workflow.nodes.If.childCompleted(If.java:30) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:46) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) at 
org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
        at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:46)
        at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51)
        at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27)
        at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.executeChildren(AbstractFunction.java:40)
        at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:240)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:281)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:393)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:332)
        at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
        at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125)
        at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99)
        at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)

This email is intended only for the use of the individual or entity to
which it is addressed and may contain information that is privileged and
confidential. If the reader of this email message is not the intended
recipient, you are hereby notified that any dissemination, distribution,
or copying of this communication is prohibited. If you have received
this email in error, please notify the sender and destroy/delete all
copies of the transmittal. Thank you.

From ajboyce at jacks.sdstate.edu Thu Mar 19 02:42:25 2009
From: ajboyce at jacks.sdstate.edu (Andrew Boyce)
Date: Thu, 19 Mar 2009 02:42:25 -0500
Subject: [Swift-user] Swift/PBS Scheduler Slow to Report "Finished"?
Message-ID: Hello, I am currently running Swift in conjunction with the PBS scheduler. My annoyance at the moment is this: When running any script, even a simple script such as first.swift (which normally finishes almost instantaneously), Swift always takes precisely five minutes to tell me that my job Finished successfully and copy the files back to the appropriate folder. It is always almost exactly five minutes; I've checked many logs - it polls the scheduler for five minutes. When I run a script (like first.swift) without using the PBS scheduler, everything happens as normal; execution and "Finished successfully" are nearly immediate. I think I know what the problem is: even after the scheduler says that the job is 'completed,' (which is generally right away) the scheduler keeps the job up on qstat and such for 5 minutes after (this setting is a PBS server attribute known as 'keep_completed', and I have checked that it is indeed set to 300 seconds; unfortunately I don't have permissions to change it). So when Swift polls the scheduler, the job is still up on qstat, and Swift must think that the task has not yet "Finished successfully." My question is this: Am I indeed right that Swift does not "understand" that when the PBS scheduler says a job is 'completed', the job really has "Finished successfully"? Can this be changed so that Swift does "understand" that a 'completed' job has "Finished successfully"? I have not included any files because I think I have narrowed the problem down to a question that does not require those that I would usually provide, but if I am wrong, then I can provide. Thank you and sorry for the length. Regards, Andrew Boyce From wilde at mcs.anl.gov Thu Mar 19 07:26:19 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 19 Mar 2009 07:26:19 -0500 Subject: [Swift-user] Swift/PBS Scheduler Slow to Report "Finished"? 
In-Reply-To:
References:
Message-ID: <49C239EB.9010008@mcs.anl.gov>

The Swift developers will need to look into this issue, which seems to
be with the Karajan PBS provider. I don't think we see this delay on our
local PBS cluster here.

In the meantime, you might want to try the fairly new "coaster" provider:

http://www.ci.uchicago.edu/swift/guides/userguide.php#coasters

This starts "worker" jobs in the target cluster which stay up for the
duration of a script, into which Swift sends jobs directly without
involving the scheduler.

If your scheduler has this 5-minute "linger" setting, the overall script
will still wait at the end (I think), but all the jobs in the script
should finish very quickly.

If you're interested, preliminary notes on the design of coasters are at:
http://wiki.cogkit.org/wiki/Coasters

A few cautions:

- coasters are a new feature, code is changing rapidly, and they are not
yet sufficiently tested.

- we'd welcome your help in evaluating them

- you'll want to run from a source release

- I think they are a good base for a lot of interesting projects and
studies on scheduling and resource allocation algorithms and approaches.

I tested a simple 10-echo foreach loop with this sites.xml file on our
local PBS cluster:

--

/home/wilde/swiftwork

-- which gave:

tp$ swift hellos.swift -sites.file sites.xml -tc.file tc.data
Swift svn swift-r2701 cog-r2332

RunID: 20090319-0658-3ejpl9xc
Progress:
Progress: Submitting:9 Submitted:1
Progress: Submitted:9 Active:1
Progress: Submitted:4 Active:3 Stage out:1 Finished successfully:2
Final status: Finished successfully:10
Cleaning up...
Shutting down service at https://128.135.125.117:50002
Got channel MetaChannel: 101224864 -> GSSSChannel-null(1)
- Done

--

On 3/19/09 2:42 AM, Andrew Boyce wrote:
> Hello,
>
> I am currently running Swift in conjunction with the PBS scheduler.
My > annoyance at the moment is this: > > When running any script, even a simple script such as first.swift (which > normally finishes almost instantaneously), Swift always takes precisely > five minutes to tell me that my job Finished successfully and copy the > files back to the appropriate folder. It is always almost exactly five > minutes; I've checked many logs - it polls the scheduler for five > minutes. When I run a script (like first.swift) without using the PBS > scheduler, everything happens as normal; execution and "Finished > successfully" are nearly immediate. > > I think I know what the problem is: even after the scheduler says that > the job is 'completed,' (which is generally right away) the scheduler > keeps the job up on qstat and such for 5 minutes after (this setting is > a PBS server attribute known as 'keep_completed', and I have checked > that it is indeed set to 300 seconds; unfortunately I don't have > permissions to change it). So when Swift polls the scheduler, the job is > still up on qstat, and Swift must think that the task has not yet > "Finished successfully." > > My question is this: > Am I indeed right that Swift does not "understand" that when the PBS > scheduler says a job is 'completed', the job really has "Finished > successfully"? > Can this be changed so that Swift does "understand" that a 'completed' > job has "Finished successfully"? > > I have not included any files because I think I have narrowed the > problem down to a question that does not require those that I would > usually provide, but if I am wrong, then I can provide. > > Thank you and sorry for the length. 
>
> Regards,
>
> Andrew Boyce
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From wilde at mcs.anl.gov Thu Mar 19 07:35:56 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 19 Mar 2009 07:35:56 -0500
Subject: [Swift-user] Swift/PBS Scheduler Slow to Report "Finished"?
In-Reply-To: <49C239EB.9010008@mcs.anl.gov>
References: <49C239EB.9010008@mcs.anl.gov>
Message-ID: <49C23C2C.1050507@mcs.anl.gov>

Sorry, one clarification:

> - you'll want to run from a source release

By this I meant that the coaster code is changing frequently due to
ongoing testing. So you'll typically want to run the latest svn
revision. Checkout and build is simple and described under "Building
Swift" at:

http://www.ci.uchicago.edu/swift/downloads/index.php

On 3/19/09 7:26 AM, Michael Wilde wrote:
> The Swift developers will need to look into this issue, which seems to
> be with the Karajan PBS provider. I don't think we see this delay on our
> local PBS cluster here.
>
> In the meantime, you might want to try the fairly new "coaster" provider:
>
> http://www.ci.uchicago.edu/swift/guides/userguide.php#coasters
>
> This starts "worker" jobs in the target cluster which stay up for the
> duration of a script, into which Swift sends jobs directly without
> involving the scheduler.
>
> If your scheduler has this 5-minute "linger" setting, the overall script
> will still wait at the end (I think), but all the jobs in the script
> should finish very quickly.
>
> If you're interested, preliminary notes on the design of coasters are at:
> http://wiki.cogkit.org/wiki/Coasters
>
> A few cautions:
>
> - coasters are a new feature, code is changing rapidly, and they are not
> yet sufficiently tested.
> > - we'd welcome your help in evaluating them > > - you'll want to run from a source release > > - I think they are a good base for a lot of interesting projects and > studies on scheduling and resource allocation algorithms and approaches. > > I tested a simple 10-echo foreach loop with this sites.xml file on our > local pbs cluster: > > -- > > > > > > /home/wilde/swiftwork > > > > -- which gave: > > tp$ swift hellos.swift -sites.file sites.xml -tc.file tc.data > Swift svn swift-r2701 cog-r2332 > > RunID: 20090319-0658-3ejpl9xc > Progress: > Progress: Submitting:9 Submitted:1 > Progress: Submitted:9 Active:1 > Progress: Submitted:4 Active:3 Stage out:1 Finished successfully:2 > Final status: Finished successfully:10 > Cleaning up... > Shutting down service at https://128.135.125.117:50002 > Got channel MetaChannel: 101224864 -> GSSSChannel-null(1) > - Done > > -- > > > On 3/19/09 2:42 AM, Andrew Boyce wrote: >> Hello, >> >> I am currently running Swift in conjunction with the PBS scheduler. My >> annoyance at the moment is this: >> >> When running any script, even a simple script such as first.swift >> (which normally finishes almost instantaneously), Swift always takes >> precisely five minutes to tell me that my job Finished successfully >> and copy the files back to the appropriate folder. It is always almost >> exactly five minutes; I've checked many logs - it polls the scheduler >> for five minutes. When I run a script (like first.swift) without using >> the PBS scheduler, everything happens as normal; execution and >> "Finished successfully" are nearly immediate. >> >> I think I know what the problem is: even after the scheduler says that >> the job is 'completed,' (which is generally right away) the scheduler >> keeps the job up on qstat and such for 5 minutes after (this setting >> is a PBS server attribute known as 'keep_completed', and I have >> checked that it is indeed set to 300 seconds; unfortunately I don't >> have permissions to change it). 
So when Swift polls the scheduler, the
>> job is still up on qstat, and Swift must think that the task has not
>> yet "Finished successfully."
>>
>> My question is this:
>> Am I indeed right that Swift does not "understand" that when the PBS
>> scheduler says a job is 'completed', the job really has "Finished
>> successfully"?
>> Can this be changed so that Swift does "understand" that a 'completed'
>> job has "Finished successfully"?
>>
>> I have not included any files because I think I have narrowed the
>> problem down to a question that does not require those that I would
>> usually provide, but if I am wrong, then I can provide.
>>
>> Thank you and sorry for the length.
>>
>> Regards,
>>
>> Andrew Boyce
>> _______________________________________________
>> Swift-user mailing list
>> Swift-user at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From wilde at mcs.anl.gov Thu Mar 19 08:27:02 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 19 Mar 2009 08:27:02 -0500
Subject: [Swift-user] swift execution problem
In-Reply-To:
References:
Message-ID: <49C24826.4090302@mcs.anl.gov>

Yue, what version of Swift are you using?

Please send the first few lines of your output file, where it says
something like:

Swift svn swift-r2701 cog-r2332

RunID: 20090319-0820-19zttiq9

(in fact please send the whole output file, stdout/err)

I've tried to run your code in a near-identical test and I can't reproduce
the failure. I've tried with both Swift 0.8 and the latest svn rev, and
both seem to work.

Also please can you post the pathname of the directory in which you are
testing (I assume you are running this on a CI machine?) so that I can
look at your logfile? And make it publicly accessible?

Thanks,

- Mike

On 3/18/09 6:32 PM, Yue, Chen - BMD wrote:
> Hi,
>
> I'm new to Swift programming.
I was able to run a swift script before, > but I couldn't run it now. I'm wondering if someone can help me figure > out why. The swift script, sites.xml, tc.data, and all the error > messages are copied in this email. Thank you! > > Regards, > > Chen, Yue > > ********************* > Swift script > ********************* > type Fasta {} > type PTMapOut {} > type Solution {} > type Inputfile {} > app (PTMapOut ofile) PTMap (Solution sfile, Fasta fastafile, Inputfile > input, Inputfile parameter) > { > PTMap @filename(sfile) @filename(fastafile) @filename(input) > @filename(parameter) stdout=@filename(ofile > ); > } > Fasta texts[] ; > > doall(Fasta texts[]) > { > Solution sfile <"BSASolution.mzXML">; > Inputfile input <"inputs.txt">; > Inputfile parameter <"parameters.txt">; > foreach p in texts { > PTMapOut r source=@p , > match="fasta(.*)", > transform="\\1.out " > >; > r = PTMap(sfile, p, input, parameter); > } > } > // Main > doall(texts); > ************** > sites.xml > ************** > > > > /var/tmp > 0 > > ************** > tc.data > ************** > localhost echo /bin/echo INSTALLED > INTEL32::LINUX null > localhost cat /bin/cat INSTALLED > INTEL32::LINUX null > localhost ls /bin/ls INSTALLED > INTEL32::LINUX null > localhost grep /bin/grep INSTALLED > INTEL32::LINUX null > localhost sort /bin/sort INSTALLED > INTEL32::LINUX null > localhost paste /bin/paste INSTALLED > INTEL32::LINUX null > localhost PTMap /home/yuechen/PTMap/PTMap INSTALLED > INTEL32::LINUX null > ************** > Error messages > ************** > [yuechen at communicado PTMap]$ swift PTMap.swift > Execution failed: > java.lang.NullPointerException > at > org.globus.cog.abstraction.impl.common.task.ServiceImpl.toString(ServiceImpl.java:156) > at java.lang.String.valueOf(String.java:2577) > at java.lang.StringBuffer.append(StringBuffer.java:220) > at > org.globus.cog.karajan.workflow.nodes.grid.GridNode.function(GridNode.java:31) > at > 
org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:45) > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) > at > org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) > at > org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) > at > org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) > at > org.globus.cog.karajan.workflow.nodes.ExecuteFile.notificationEvent(ExecuteFile.java:163) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) > at > org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) > at > org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51) > at > org.globus.cog.karajan.workflow.nodes.Sequential.childCompleted(Sequential.java:45) > at > org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) > at > org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) > at > org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) > at > 
org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) > at > org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.childCompleted(UserDefinedElement.java:283) > at > org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutionUDE.childCompleted(SequentialImplicitExecutionUDE.java:85) > at > org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) > at > org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) > at > org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) > at > org.globus.cog.karajan.workflow.nodes.If.childCompleted(If.java:30) > at > org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) > at > org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) > at > org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:46) > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) > at > 
org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) > at > org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) > at > org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:46) > at > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51) > at > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27) > at > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.executeChildren(AbstractFunction.java:40) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:240) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:281) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:393) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:332) > at > org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227) > at > org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) > at > org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) > at > org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69) > > > > > > This email is intended only for the use of the individual or entity to > which it is addressed and may contain information that is privileged and > confidential. 
If the reader of this email message is not the intended
> recipient, you are hereby notified that any dissemination, distribution,
> or copying of this communication is prohibited. If you have received
> this email in error, please notify the sender and destroy/delete all
> copies of the transmittal. Thank you.
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From wilde at mcs.anl.gov Thu Mar 19 09:17:43 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 19 Mar 2009 09:17:43 -0500
Subject: [Swift-user] swift execution problem
In-Reply-To: <49C24826.4090302@mcs.anl.gov>
References: <49C24826.4090302@mcs.anl.gov>
Message-ID: <49C25407.7010000@mcs.anl.gov>

Yue, I found the log for the failing run in your working dir. I'll post
it to the swift-devel list for the developers to look at.

The problem seems related to your specific data (your 375 fasta files),
but even when I copy your data I can't reproduce the problem, at least
not when using "echo" instead of "ptmap".

- Mike

On 3/19/09 8:27 AM, Michael Wilde wrote:
> Yue, what version of Swift are you using?
>
> Please send the first few lines of your output file, where it says
> something like:
>
> Swift svn swift-r2701 cog-r2332
>
> RunID: 20090319-0820-19zttiq9
>
> (in fact please send the whole output file, stdout/err)
>
> I've tried to run your code in a near-identical test and I can't reproduce
> the failure. I've tried with both Swift 0.8 and the latest svn rev, and
> both seem to work.
>
> Also please can you post the pathname of the directory in which you are
> testing (I assume you are running this on a CI machine?) so that I can
> look at your logfile? And make it publicly accessible?
>
> Thanks,
>
> - Mike
>
>
> On 3/18/09 6:32 PM, Yue, Chen - BMD wrote:
>> Hi,
>>
>> I'm new to Swift programming.
I was able to run a swift script before, >> but I couldn't run it now. I'm wondering if someone can help me figure >> out why. The swift script, sites.xml, tc.data, and all the error >> messages are copied in this email. Thank you! >> >> Regards, >> >> Chen, Yue >> >> ********************* >> Swift script >> ********************* >> type Fasta {} >> type PTMapOut {} >> type Solution {} >> type Inputfile {} >> app (PTMapOut ofile) PTMap (Solution sfile, Fasta fastafile, Inputfile >> input, Inputfile parameter) >> { >> PTMap @filename(sfile) @filename(fastafile) @filename(input) >> @filename(parameter) stdout=@filename(ofile >> ); >> } >> Fasta texts[] ; >> >> doall(Fasta texts[]) >> { >> Solution sfile <"BSASolution.mzXML">; >> Inputfile input <"inputs.txt">; >> Inputfile parameter <"parameters.txt">; >> foreach p in texts { >> PTMapOut r <regexp_mapper; >> source=@p , >> match="fasta(.*)", >> transform="\\1.out " >> >; >> r = PTMap(sfile, p, input, parameter); >> } >> } >> // Main >> doall(texts); >> ************** >> sites.xml >> ************** >> >> >> >> /var/tmp >> 0 >> >> ************** >> tc.data >> ************** >> localhost echo /bin/echo INSTALLED >> INTEL32::LINUX null >> localhost cat /bin/cat INSTALLED >> INTEL32::LINUX null >> localhost ls /bin/ls INSTALLED >> INTEL32::LINUX null >> localhost grep /bin/grep INSTALLED >> INTEL32::LINUX null >> localhost sort /bin/sort INSTALLED >> INTEL32::LINUX null >> localhost paste /bin/paste INSTALLED >> INTEL32::LINUX null >> localhost PTMap /home/yuechen/PTMap/PTMap >> INSTALLED INTEL32::LINUX null >> ************** >> Error messages >> ************** >> [yuechen at communicado PTMap]$ swift PTMap.swift >> Execution failed: >> java.lang.NullPointerException >> at >> org.globus.cog.abstraction.impl.common.task.ServiceImpl.toString(ServiceImpl.java:156) >> >> at java.lang.String.valueOf(String.java:2577) >> at java.lang.StringBuffer.append(StringBuffer.java:220) >> at >> 
org.globus.cog.karajan.workflow.nodes.grid.GridNode.function(GridNode.java:31) >> >> at >> org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:45) >> >> at >> org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) >> >> at >> org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) >> at >> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) >> at >> org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) >> >> at >> org.globus.cog.karajan.workflow.nodes.ExecuteFile.notificationEvent(ExecuteFile.java:163) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) >> at >> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) >> at >> org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) >> >> at >> org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51) >> >> at >> org.globus.cog.karajan.workflow.nodes.Sequential.childCompleted(Sequential.java:45) >> >> at >> org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) >> at >> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) >> at >> 
org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) >> >> at >> org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) >> >> at >> org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.childCompleted(UserDefinedElement.java:283) >> >> at >> org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutionUDE.childCompleted(SequentialImplicitExecutionUDE.java:85) >> >> at >> org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) >> at >> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) >> at >> org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) >> >> at >> org.globus.cog.karajan.workflow.nodes.If.childCompleted(If.java:30) >> at >> org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) >> at >> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) >> at >> org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) >> 
>> at >> org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:46) >> >> at >> org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) >> >> at >> org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) >> at >> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) >> at >> org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) >> >> at >> org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:46) >> >> at >> org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51) >> >> at >> org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27) >> >> at >> org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.executeChildren(AbstractFunction.java:40) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:240) >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:281) >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:393) >> >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:332) >> at >> org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227) >> >> at >> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) >> at >> org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) >> >> at >> 
org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69) From benc at hawaga.org.uk Thu Mar 19 09:19:08 2009 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 19 Mar 2009 14:19:08 +0000 (GMT) Subject: [Swift-user] Swift/PBS Scheduler Slow to Report "Finished"? In-Reply-To: References: Message-ID: Is that with the cog pbs provider (with in your sites file) or through GRAM (with or in the sites file?) -- From benc at hawaga.org.uk Thu Mar 19 10:02:42 2009 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 19 Mar 2009 15:02:42 +0000 (GMT) Subject: [Swift-user] swift execution problem In-Reply-To: References: Message-ID: It looks like this comes from having a malformed sites.xml file - you need to have your single <pool> element inside a <config> element (see the example in the Swift release). I am able to get the same error as you here by removing <config>. This is a terrible error message, so I will put in a bug report for it to be made more meaningful. 
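For reference, a minimal sketch of the file shape described above: a single site entry wrapped in one top-level element. The element and attribute names used here (config, pool, handle, gridftp, execution, workdirectory) are assumptions recalled from the sample sites.xml of Swift releases of this period and are not copied from this thread; only the /var/tmp work directory comes from the poster's file. Compare it against the sites.xml shipped with your own release before relying on it.

```xml
<!-- Sketch only: names are assumed from the Swift sample sites.xml, not taken from this thread. -->
<config>
  <!-- One site entry; the handle is expected to match the site name used in tc.data (localhost). -->
  <pool handle="localhost">
    <gridftp url="local://localhost" />
    <execution provider="local" url="none" />
    <!-- Work directory as shown in the poster's file. -->
    <workdirectory>/var/tmp</workdirectory>
  </pool>
</config>
```

A file that contains only the inner site entry is still well-formed XML, so a missing wrapper would not be caught as a parse error, which seems consistent with the failure surfacing later as a NullPointerException.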
-- From yuechen at bsd.uchicago.edu Thu Mar 19 10:07:59 2009 From: yuechen at bsd.uchicago.edu (Yue, Chen - BMD) Date: Thu, 19 Mar 2009 10:07:59 -0500 Subject: [Swift-user] swift execution problem References: Message-ID: Hi Ben, Thank you very much! I guess I accidentally deleted the <config> tag. Now it is working for me. Regards, Chen, Yue ________________________________ From: Ben Clifford [mailto:benc at hawaga.org.uk] Sent: Thu 3/19/2009 10:02 AM To: Yue, Chen - BMD Cc: swift-user at ci.uchicago.edu Subject: Re: [Swift-user] swift execution problem It looks like this comes from having a malformed sites.xml file - you need to have your single <pool> element inside a <config> element (see the example in the Swift release). I am able to get the same error as you here by removing <config>. This is a terrible error message, so I will put in a bug report for it to be made more meaningful. -- From ajboyce at jacks.sdstate.edu Thu Mar 19 10:58:57 2009 From: ajboyce at jacks.sdstate.edu (Andrew Boyce) Date: Thu, 19 Mar 2009 10:58:57 -0500 Subject: [Swift-user] Swift/PBS Scheduler Slow to Report "Finished"? 
In-Reply-To: References: Message-ID: It is with the cog pbs provider And just to clarify for sure, the job is finished right away - the output files are ready; but Swift says it is still submitted or active, and doesn't do all of the things it normally does when a job has "finished successfully" (like copy the output files back into the folder with the input files, or delete the temporary work directory, for example). Thank you all for your assistance. Regards, Andrew From fedorov at cs.wm.edu Thu Mar 19 13:55:21 2009 From: fedorov at cs.wm.edu (Andriy Fedorov) Date: Thu, 19 Mar 2009 14:55:21 -0400 Subject: [Swift-user] Swift/PBS Scheduler Slow to Report "Finished"? Message-ID: <82f536810903191155k114d2cecu9035627630304c64@mail.gmail.com> Andrew, I had the same problem with some of the TeraGrid sites. You might want to see the relevant discussion here: http://mail.ci.uchicago.edu/pipermail/swift-user/2008-June/000440.html Andrey Fedorov > Date: Thu, 19 Mar 2009 02:42:25 -0500 > From: Andrew Boyce > Subject: [Swift-user] Swift/PBS Scheduler Slow to Report "Finished"? > To: swift-user at ci.uchicago.edu > Message-ID: > Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes > > Hello, > > I am currently running Swift in conjunction with the PBS scheduler. My > annoyance at the moment is this: > > When running any script, even a simple script such as first.swift > (which normally finishes almost instantaneously), Swift always takes > precisely five minutes to tell me that my job Finished successfully > and copy the files back to the appropriate folder. It is always almost > exactly five minutes; I've checked many logs - it polls the scheduler > for five minutes. When I run a script (like first.swift) without using > the PBS scheduler, everything happens as normal; execution and > "Finished successfully" are nearly immediate. 
> > I think I know what the problem is: even after the scheduler says that > the job is 'completed,' (which is generally right away) the scheduler > keeps the job up on qstat and such for 5 minutes after (this setting > is a PBS server attribute known as 'keep_completed', and I have > checked that it is indeed set to 300 seconds; unfortunately I don't > have permissions to change it). So when Swift polls the scheduler, the > job is still up on qstat, and Swift must think that the task has not > yet "Finished successfully." > > My question is this: > Am I indeed right that Swift does not "understand" that when the PBS > scheduler says a job is 'completed', the job really has "Finished > successfully"? > Can this be changed so that Swift does "understand" that a 'completed' > job has "Finished successfully"? > > I have not included any files because I think I have narrowed the > problem down to a question that does not require those that I would > usually provide, but if I am wrong, then I can provide. > > Thank you and sorry for the length. > > Regards, > > Andrew Boyce > > > ------------------------------ > > Message: 2 > Date: Thu, 19 Mar 2009 07:26:19 -0500 > From: Michael Wilde > Subject: Re: [Swift-user] Swift/PBS Scheduler Slow to Report > ? ? ? ?"Finished"? > To: Andrew Boyce > Cc: swift-user at ci.uchicago.edu > Message-ID: <49C239EB.9010008 at mcs.anl.gov> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > The Swift developers will need to look into this issue, which seems to > be with the Karajan PBS provider. I dont think we see this delay on our > local PBS cluster here. > > In the meantime, you might want to try the fairly new "coaster" provider: > > http://www.ci.uchicago.edu/swift/guides/userguide.php#coasters > > This starts "worker" jobs in the target cluster which stay up for the > duration of a script, into which Swift sends jobs directly without > involving the scheduler. 
> > If your scheduler has this 5-minute "linger" setting, the overall script > will still wait at the end (I think), but all the jobs in the script > should finish very quickly. > > If you're interested, preliminary notes on the design of coasters is at: > http://wiki.cogkit.org/wiki/Coasters > > A few cautions: > > - coasters are a new feature, code is changing rapidly, and they are not > yet suffciently tested. > > - we'd welcome your help in evaluating them > > - you'll want to run from a source release > > - I think they are a good base for a lot of interesting projects and > studies on scheduling and resource allocation algorithms and approaches. > > I tested a simple 10-echo foreach loop with this sites.xml file on our > local pbs cluster: > > -- > > > > ? > ? > ? /home/wilde/swiftwork > > > > -- which gave: > > tp$ swift hellos.swift -sites.file sites.xml -tc.file tc.data > Swift svn swift-r2701 cog-r2332 > > RunID: 20090319-0658-3ejpl9xc > Progress: > Progress: ?Submitting:9 Submitted:1 > Progress: ?Submitted:9 Active:1 > Progress: ?Submitted:4 Active:3 Stage out:1 Finished successfully:2 > Final status: ?Finished successfully:10 > Cleaning up... > Shutting down service at https://128.135.125.117:50002 > Got channel MetaChannel: 101224864 -> GSSSChannel-null(1) > - Done > > -- > > > On 3/19/09 2:42 AM, Andrew Boyce wrote: >> Hello, >> >> I am currently running Swift in conjunction with the PBS scheduler. My >> annoyance at the moment is this: >> >> When running any script, even a simple script such as first.swift (which >> normally finishes almost instantaneously), Swift always takes precisely >> five minutes to tell me that my job Finished successfully and copy the >> files back to the appropriate folder. It is always almost exactly five >> minutes; I've checked many logs - it polls the scheduler for five >> minutes. 
When I run a script (like first.swift) without using the PBS >> scheduler, everything happens as normal; execution and "Finished >> successfully" are nearly immediate. >> >> I think I know what the problem is: even after the scheduler says that >> the job is 'completed,' (which is generally right away) the scheduler >> keeps the job up on qstat and such for 5 minutes after (this setting is >> a PBS server attribute known as 'keep_completed', and I have checked >> that it is indeed set to 300 seconds; unfortunately I don't have >> permissions to change it). So when Swift polls the scheduler, the job is >> still up on qstat, and Swift must think that the task has not yet >> "Finished successfully." >> >> My question is this: >> Am I indeed right that Swift does not "understand" that when the PBS >> scheduler says a job is 'completed', the job really has "Finished >> successfully"? >> Can this be changed so that Swift does "understand" that a 'completed' >> job has "Finished successfully"? >> >> I have not included any files because I think I have narrowed the >> problem down to a question that does not require those that I would >> usually provide, but if I am wrong, then I can provide. >> >> Thank you and sorry for the length. >> >> Regards, >> >> Andrew Boyce >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > ------------------------------ > > Message: 3 > Date: Thu, 19 Mar 2009 07:35:56 -0500 > From: Michael Wilde > Subject: Re: [Swift-user] Swift/PBS Scheduler Slow to Report > ? ? ? ?"Finished"? > To: Andrew Boyce > Cc: swift-user at ci.uchicago.edu > Message-ID: <49C23C2C.1050507 at mcs.anl.gov> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Sorry, one clarification: > > ?> - you'll want to run from a source release > > By this I meant that the coaster code is changing frequently due to > ongoing testing. 
So you'll typically want to run the latest svn > revision. Checkout and build is simple and described under "Building > Swift" at: http://www.ci.uchicago.edu/swift/downloads/index.php > > > On 3/19/09 7:26 AM, Michael Wilde wrote: >> The Swift developers will need to look into this issue, which seems to >> be with the Karajan PBS provider. I dont think we see this delay on our >> local PBS cluster here. >> >> In the meantime, you might want to try the fairly new "coaster" provider: >> >> http://www.ci.uchicago.edu/swift/guides/userguide.php#coasters >> >> This starts "worker" jobs in the target cluster which stay up for the >> duration of a script, into which Swift sends jobs directly without >> involving the scheduler. >> >> If your scheduler has this 5-minute "linger" setting, the overall script >> will still wait at the end (I think), but all the jobs in the script >> should finish very quickly. >> >> If you're interested, preliminary notes on the design of coasters is at: >> http://wiki.cogkit.org/wiki/Coasters >> >> A few cautions: >> >> - coasters are a new feature, code is changing rapidly, and they are not >> yet suffciently tested. >> >> - we'd welcome your help in evaluating them >> >> - you'll want to run from a source release >> >> - I think they are a good base for a lot of interesting projects and >> studies on scheduling and resource allocation algorithms and approaches. >> >> I tested a simple 10-echo foreach loop with this sites.xml file on our >> local pbs cluster: >> >> -- >> >> >> >> ? >> ? >> ? /home/wilde/swiftwork >> >> >> >> -- which gave: >> >> tp$ swift hellos.swift -sites.file sites.xml -tc.file tc.data >> Swift svn swift-r2701 cog-r2332 >> >> RunID: 20090319-0658-3ejpl9xc >> Progress: >> Progress: ?Submitting:9 Submitted:1 >> Progress: ?Submitted:9 Active:1 >> Progress: ?Submitted:4 Active:3 Stage out:1 Finished successfully:2 >> Final status: ?Finished successfully:10 >> Cleaning up... 
>> Shutting down service at https://128.135.125.117:50002 >> Got channel MetaChannel: 101224864 -> GSSSChannel-null(1) >> - Done >> >> -- >> >> >> On 3/19/09 2:42 AM, Andrew Boyce wrote: >>> Hello, >>> >>> I am currently running Swift in conjunction with the PBS scheduler. My >>> annoyance at the moment is this: >>> >>> When running any script, even a simple script such as first.swift >>> (which normally finishes almost instantaneously), Swift always takes >>> precisely five minutes to tell me that my job Finished successfully >>> and copy the files back to the appropriate folder. It is always almost >>> exactly five minutes; I've checked many logs - it polls the scheduler >>> for five minutes. When I run a script (like first.swift) without using >>> the PBS scheduler, everything happens as normal; execution and >>> "Finished successfully" are nearly immediate. >>> >>> I think I know what the problem is: even after the scheduler says that >>> the job is 'completed,' (which is generally right away) the scheduler >>> keeps the job up on qstat and such for 5 minutes after (this setting >>> is a PBS server attribute known as 'keep_completed', and I have >>> checked that it is indeed set to 300 seconds; unfortunately I don't >>> have permissions to change it). So when Swift polls the scheduler, the >>> job is still up on qstat, and Swift must think that the task has not >>> yet "Finished successfully." >>> >>> My question is this: >>> Am I indeed right that Swift does not "understand" that when the PBS >>> scheduler says a job is 'completed', the job really has "Finished >>> successfully"? >>> Can this be changed so that Swift does "understand" that a 'completed' >>> job has "Finished successfully"? >>> >>> I have not included any files because I think I have narrowed the >>> problem down to a question that does not require those that I would >>> usually provide, but if I am wrong, then I can provide. >>> >>> Thank you and sorry for the length. 
>>> >>> Regards, >>> >>> Andrew Boyce >>> _______________________________________________ >>> Swift-user mailing list >>> Swift-user at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > ------------------------------ > > Message: 4 > Date: Thu, 19 Mar 2009 08:27:02 -0500 > From: Michael Wilde > Subject: Re: [Swift-user] swift execution problem > To: "Yue, Chen - BMD" > Cc: swift-user at ci.uchicago.edu > Message-ID: <49C24826.4090302 at mcs.anl.gov> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Yue, what version of Swift are you using? > > Please send the first few lines of your output file, where it says > something like: > > Swift svn swift-r2701 cog-r2332 > > RunID: 20090319-0820-19zttiq9 > > (in fact please send the whole output file, stdout/err) > > Ive tried to run your code in a near-identical test and I cant reproduce > the failure. Ive tried with both swift0.8 and the latest svn rev, and > both seem to work. > > Also please can you post the pathname of the directory in which you are > testing (I assume you are running this on a CI machine?) so that I can > look at your logfile? And make it publicly accessible? > > Thanks, > > - Mike > > > On 3/18/09 6:32 PM, Yue, Chen - BMD wrote: >> Hi, >> >> I'm new to Swift programming. I was able to run a swift script before, >> but I couldn't run it now. I'm wondering if someone can help me figure >> out why. The swift script, sites.xml, tc.data, and all the error >> messages are copied in this email. Thank you! >> >> Regards, >> >> Chen, Yue >> >> ********************* >> Swift script >> ********************* >> type Fasta {} >> type PTMapOut {} >> type Solution {} >> type Inputfile {} >> app (PTMapOut ofile) PTMap (Solution sfile, Fasta fastafile, Inputfile >> input, Inputfile parameter) >> { >> ? 
?PTMap ?@filename(sfile) @filename(fastafile) @filename(input) >> @filename(parameter) stdout=@filename(ofile >> ); >> } >> Fasta texts[] ; >> >> doall(Fasta texts[]) >> { >> ? Solution sfile <"BSASolution.mzXML">; >> ? Inputfile input <"inputs.txt">; >> ? Inputfile parameter <"parameters.txt">; >> ? foreach p in texts { >> ? ? PTMapOut r > ? ? ? ? ? ? ?source=@p , >> ? ? ? ? ? ? ?match="fasta(.*)", >> ? ? ? ? ? ? ?transform="\\1.out " >> ? ? >; >> ? ?r = PTMap(sfile, p, input, parameter); >> ? } >> } >> // Main >> doall(texts); >> ************** >> sites.xml >> ************** >> >> ? >> ? >> ? /var/tmp >> ? 0 >> >> ************** >> tc.data >> ************** >> localhost ? ? ? echo ? ? ? ? ? ?/bin/echo ? ? ? INSTALLED >> INTEL32::LINUX ?null >> localhost ? ? ? cat ? ? ? ? ? ? /bin/cat ? ? ? ?INSTALLED >> INTEL32::LINUX ?null >> localhost ? ? ? ls ? ? ? ? ? ? ?/bin/ls ? ? ? ? INSTALLED >> INTEL32::LINUX ?null >> localhost ? ? ? grep ? ? ? ? ? ?/bin/grep ? ? ? INSTALLED >> INTEL32::LINUX ?null >> localhost ? ? ? sort ? ? ? ? ? ?/bin/sort ? ? ? INSTALLED >> INTEL32::LINUX ?null >> localhost ? ? ? paste ? ? ? ? ? /bin/paste ? ? ?INSTALLED >> INTEL32::LINUX ?null >> localhost ? ? ? PTMap ? /home/yuechen/PTMap/PTMap ? ? ? INSTALLED >> INTEL32::LINUX ?null >> ************** >> Error messages >> ************** >> [yuechen at communicado PTMap]$ swift PTMap.swift >> Execution failed: >> ? ? ? ? java.lang.NullPointerException >> ? ? ? ? at >> org.globus.cog.abstraction.impl.common.task.ServiceImpl.toString(ServiceImpl.java:156) >> ? ? ? ? at java.lang.String.valueOf(String.java:2577) >> ? ? ? ? at java.lang.StringBuffer.append(StringBuffer.java:220) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.grid.GridNode.function(GridNode.java:31) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:45) >> ? ? ? ? 
at >> org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.ExecuteFile.notificationEvent(ExecuteFile.java:163) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.Sequential.childCompleted(Sequential.java:45) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) >> ? ? ? ? 
at >> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.childCompleted(UserDefinedElement.java:283) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutionUDE.childCompleted(SequentialImplicitExecutionUDE.java:85) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.If.childCompleted(If.java:30) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) >> ? ? ? ? 
at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:46) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:46) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.executeChildren(AbstractFunction.java:40) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:240) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:281) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:393) >> ? ? ? ? at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:332) >> ? ? ? ? 
at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
>>         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125)
>>         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99)
>>         at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)
>>
>>
>> This email is intended only for the use of the individual or entity to
>> which it is addressed and may contain information that is privileged and
>> confidential. If the reader of this email message is not the intended
>> recipient, you are hereby notified that any dissemination, distribution,
>> or copying of this communication is prohibited. If you have received
>> this email in error, please notify the sender and destroy/delete all
>> copies of the transmittal. Thank you.
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Swift-user mailing list
>> Swift-user at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>
>
> ------------------------------
>
> Message: 5
> Date: Thu, 19 Mar 2009 09:17:43 -0500
> From: Michael Wilde
> Subject: Re: [Swift-user] swift execution problem
> To: "Yue, Chen - BMD"
> Cc: swift-user at ci.uchicago.edu
> Message-ID: <49C25407.7010000 at mcs.anl.gov>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Yue, I found the log for the failing run in your working dir.
> I'll post it to the swift-devel list for the developers to look at.
>
> The problem seems related to your specific data (your 375 fasta files)
> but even when I copy your data I can't reproduce the problem, at least not
> using "echo" instead of "ptmap".
>
> - Mike
>
>
> On 3/19/09 8:27 AM, Michael Wilde wrote:
>> Yue, what version of Swift are you using?
>>
>> Please send the first few lines of your output file, where it says
>> something like:
>>
>> Swift svn swift-r2701 cog-r2332
>>
>> RunID: 20090319-0820-19zttiq9
>>
>> (in fact please send the whole output file, stdout/err)
>>
>> I've tried to run your code in a near-identical test and I can't reproduce
>> the failure. I've tried with both swift0.8 and the latest svn rev, and
>> both seem to work.
>>
>> Also please can you post the pathname of the directory in which you are
>> testing (I assume you are running this on a CI machine?) so that I can
>> look at your logfile? And make it publicly accessible?
>>
>> Thanks,
>>
>> - Mike
>>
>>
>> On 3/18/09 6:32 PM, Yue, Chen - BMD wrote:
>>> Hi,
>>>
>>> I'm new to Swift programming. I was able to run a swift script before,
>>> but I couldn't run it now. I'm wondering if someone can help me figure
>>> out why. The swift script, sites.xml, tc.data, and all the error
>>> messages are copied in this email. Thank you!
>>>
>>> Regards,
>>>
>>> Chen, Yue
>>>
>>> *********************
>>> Swift script
>>> *********************
>>> type Fasta {}
>>> type PTMapOut {}
>>> type Solution {}
>>> type Inputfile {}
>>> app (PTMapOut ofile) PTMap (Solution sfile, Fasta fastafile, Inputfile
>>> input, Inputfile parameter)
>>> {
>>>    PTMap  @filename(sfile) @filename(fastafile) @filename(input)
>>> @filename(parameter) stdout=@filename(ofile
>>> );
>>> }
>>> Fasta texts[] ;
>>>
>>> doall(Fasta texts[])
>>> {
>>>   Solution sfile <"BSASolution.mzXML">;
>>>   Inputfile input <"inputs.txt">;
>>>   Inputfile parameter <"parameters.txt">;
>>>   foreach p in texts {
>>>     PTMapOut r
>>>              source=@p ,
>>>              match="fasta(.*)",
>>>              transform="\\1.out "
>>>     >;
>>>    r = PTMap(sfile, p, input, parameter);
>>>   }
>>> }
>>> // Main
>>> doall(texts);
>>> **************
>>> sites.xml
>>> **************
>>>
>>>
>>>
>>>   /var/tmp
>>>   0
>>>
>>> **************
>>> tc.data
>>> **************
>>> localhost
echo            /bin/echo       INSTALLED       INTEL32::LINUX  null
>>> localhost       cat             /bin/cat        INSTALLED       INTEL32::LINUX  null
>>> localhost       ls              /bin/ls         INSTALLED       INTEL32::LINUX  null
>>> localhost       grep            /bin/grep       INSTALLED       INTEL32::LINUX  null
>>> localhost       sort            /bin/sort       INSTALLED       INTEL32::LINUX  null
>>> localhost       paste           /bin/paste      INSTALLED       INTEL32::LINUX  null
>>> localhost       PTMap           /home/yuechen/PTMap/PTMap       INSTALLED       INTEL32::LINUX  null
>>> **************
>>> Error messages
>>> **************
>>> [yuechen at communicado PTMap]$ swift PTMap.swift
>>> Execution failed:
>>>         java.lang.NullPointerException
>>>         at org.globus.cog.abstraction.impl.common.task.ServiceImpl.toString(ServiceImpl.java:156)
>>>         at java.lang.String.valueOf(String.java:2577)
>>>         at java.lang.StringBuffer.append(StringBuffer.java:220)
>>>         at org.globus.cog.karajan.workflow.nodes.grid.GridNode.function(GridNode.java:31)
>>>         at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:45)
>>>         at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
>>>         at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335)
>>>         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125)
>>>         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299)
>>>
at org.globus.cog.karajan.workflow.nodes.ExecuteFile.notificationEvent(ExecuteFile.java:163)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335)
>>>         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125)
>>>         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
>>>         at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51)
>>>         at org.globus.cog.karajan.workflow.nodes.Sequential.childCompleted(Sequential.java:45)
>>>         at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335)
>>>         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125)
>>>         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
>>>         at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
>>>         at org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.childCompleted(UserDefinedElement.java:283)
>>>
at org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutionUDE.childCompleted(SequentialImplicitExecutionUDE.java:85)
>>>         at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335)
>>>         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125)
>>>         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299)
>>>         at org.globus.cog.karajan.workflow.nodes.If.childCompleted(If.java:30)
>>>         at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335)
>>>         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125)
>>>         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
>>>         at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:46)
>>>         at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
>>>         at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
>>>
at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335)
>>>         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125)
>>>         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
>>>         at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:46)
>>>         at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51)
>>>         at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27)
>>>         at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.executeChildren(AbstractFunction.java:40)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:240)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:281)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:393)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:332)
>>>         at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
>>>         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125)
>>>         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99)
>>>
at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)
>>>
>>>
>>> This email is intended only for the use of the individual or entity to
>>> which it is addressed and may contain information that is privileged
>>> and confidential. If the reader of this email message is not the
>>> intended recipient, you are hereby notified that any dissemination,
>>> distribution, or copying of this communication is prohibited. If you
>>> have received this email in error, please notify the sender and
>>> destroy/delete all copies of the transmittal. Thank you.
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Swift-user mailing list
>>> Swift-user at ci.uchicago.edu
>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>> _______________________________________________
>> Swift-user mailing list
>> Swift-user at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>
>
> ------------------------------
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>
>
> End of Swift-user Digest, Vol 24, Issue 6
> *****************************************
>

From benc at hawaga.org.uk Thu Mar 19 14:10:29 2009
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 19 Mar 2009 19:10:29 +0000 (GMT)
Subject: [Swift-user] Swift/PBS Scheduler Slow to Report "Finished"?
In-Reply-To: <82f536810903191155k114d2cecu9035627630304c64@mail.gmail.com>
References: <82f536810903191155k114d2cecu9035627630304c64@mail.gmail.com>
Message-ID:

On Thu, 19 Mar 2009, Andriy Fedorov wrote:

> I had the same problem with some of the TeraGrid sites.
You might want
> to see the relevant discussion here:
>
> http://mail.ci.uchicago.edu/pipermail/swift-user/2008-June/000440.html

In that case, the problem was GRAM2's job manager (not a part of Swift)
being slow to detect. In Andrew's case, it's the client-side PBS provider
showing a similar problem. This latter problem may be easier for us to fix
and deploy.

--

From yuechen at bsd.uchicago.edu Thu Mar 19 15:57:03 2009
From: yuechen at bsd.uchicago.edu (Yue, Chen - BMD)
Date: Thu, 19 Mar 2009 15:57:03 -0500
Subject: [Swift-user] Teraport question
Message-ID:

Hi,

I was testing the PTMap Swift script on Teraport and I have a question about
the usage of sites.xml. I think my configuration may have a problem.

The path to my Swift script is /home/yuechen/PTMap/PTMap.swift.

If I log onto Teraport and run the script on Teraport locally, the
sites.xml is:

/home/yuechen/swiftwork
0

and the tc.data is:

localhost PTMap /home/yuechen/PTMap/PTMap INSTALLED INTEL32::LINUX null

When I run PTMap.swift, the output of the first ten runs looks like:

Swift 0.8 swift-r2448 cog-r2261

RunID: 20090319-1527-1ad6vxja
Progress:
Progress: uninitialized:1
Progress: Selecting site:374 Initializing site shared directory:1
Progress: Selecting site:373 Stage in:1 Submitting:1
Progress: Selecting site:373 Active:1 Stage out:1
Progress: Selecting site:372 Stage in:1 Active:1 Finished successfully:1
Progress: Selecting site:372 Active:1 Stage out:1 Finished successfully:1
Progress: Selecting site:371 Active:1 Stage out:1 Finished successfully:2
Progress: Selecting site:370 Stage in:1 Active:1 Finished successfully:3
Progress: Selecting site:370 Active:1 Stage out:1 Finished successfully:3
Progress: Selecting site:369 Stage in:1 Active:1 Finished successfully:4
Progress: Selecting site:369 Active:1 Stage out:1 Finished successfully:4
Progress: Selecting site:368 Stage in:1 Active:1 Finished successfully:5
Progress: Selecting site:368 Active:1 Stage out:1 Finished successfully:5
Progress: Selecting
site:367 Stage in:1 Active:1 Finished successfully:6
Progress: Selecting site:367 Active:1 Stage out:1 Finished successfully:6
Progress: Selecting site:366 Submitting:1 Stage out:1 Finished successfully:7

This output seems normal.

But if I log on to Teraport and use the following configuration in
sites.xml:

/home/yuechen/swiftwork

or

/home/yuechen/swiftwork

and tc.data is:

teraport PTMap /home/yuechen/PTMap/PTMap INSTALLED INTEL32::LINUX null

the output looks like:

Swift 0.8 swift-r2448 cog-r2261

RunID: 20090319-1520-5ycz7ta9
Progress:
Progress: uninitialized:1
Progress: Selecting site:374 Initializing site shared directory:1
Progress: Selecting site:373 Submitting:1 Submitted:1
Progress: Selecting site:373 Submitted:2
Progress: Selecting site:373 Submitted:2
Progress: Selecting site:373 Submitted:2
Progress: Selecting site:373 Submitted:2
Progress: Selecting site:373 Submitted:2
Progress: Selecting site:373 Submitted:2
Progress: Selecting site:373 Submitted:2
Progress: Selecting site:373 Submitted:2
Progress: Selecting site:373 Submitted:2
Progress: Selecting site:373 Submitted:2
Progress: Selecting site:373 Submitted:2
Progress: Selecting site:373 Submitted:2

I feel that this output is not normal. Do I have a configuration problem?

Thank you very much!

Chen, Yue


This email is intended only for the use of the individual or entity to
which it is addressed and may contain information that is privileged and
confidential. If the reader of this email message is not the intended
recipient, you are hereby notified that any dissemination, distribution, or
copying of this communication is prohibited. If you have received this
email in error, please notify the sender and destroy/delete all copies of
the transmittal. Thank you.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From benc at hawaga.org.uk Thu Mar 19 16:07:58 2009
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 19 Mar 2009 21:07:58 +0000 (GMT)
Subject: [Swift-user] Teraport question
In-Reply-To:
References:
Message-ID:

You can run the qstat command to see if your jobs are being queued, what
state they are in, and which queue they went to. For example:

878629.tp-mgt null yuechen 0 Q extended

shows a job submitted by you, that is in state Q (meaning queued) and that
it is in the extended queue.

There are a lot of other jobs running on teraport at the moment. You might
have more luck using a different queue, such as 'fast' which you can
specify by adding a line something like this:

fast

to your site definition.

However, you'll likely still have to wait some time for your jobs to run -
that is the nature of using a job queue...

--

From wilde at mcs.anl.gov Thu Mar 19 16:34:56 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 19 Mar 2009 16:34:56 -0500
Subject: [Swift-user] Teraport question
In-Reply-To:
References:
Message-ID: <49C2BA80.2080402@mcs.anl.gov>

The TeraPort queue policies are described at:
http://www.ci.uchicago.edu/wiki/bin/view/Teraport/QueuePolicies

which says the fast queue gives you up to an hour.

--

Separate from that, further clarification of queue and time specs would be
helpful.

There was much discussion on the devel list about how times are treated in
coaster scheduling. This left me confused as to whether time specs for
coaster jobs were working as desired or still in flux. Specifically:

1 - it is unclear if profile time specs are all in hh:mm:ss. I think I
reported to swift-devel a coaster case where they were not interpreted in
that standard manner.

2 - it was unclear whether the time is for the pbs (i.e., coaster-worker)
job or for the swift apps that run on them. I.e., how does
"coasterWorkerMaxwalltime", which is mentioned in the user guide, interact
with maxwalltime in the globus profile?

3 - is maxwalltime treated the same or different if specified in tc.data vs
sites.xml, with respect to coasters?

4 - if I specify only a queue but no time limit, do I get the max for that
queue? (would hope so...)

5 - is job time handling now working as the developers currently intend or
are there outstanding issues in this area?


On 3/19/09 4:07 PM, Ben Clifford wrote:
> You can run the qstat command to see if your jobs are being queued, what
> state they are in, and which queue they went to. For example:
>
> 878629.tp-mgt null yuechen 0 Q extended
>
> shows a job submitted by you, that is in state Q (meaning queued) and that
> it is in the extended queue.
>
> There are a lot of other jobs running on teraport at the moment. You might
> have more luck using a different queue, such as 'fast' which you can
> specify by adding a line something like this:
>
> fast
>
> to your site definition.
>
> However, you'll likely still have to wait some time for your jobs to run -
> that is the nature of using a job queue...
>

From wilde at mcs.anl.gov Thu Mar 19 16:42:43 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 19 Mar 2009 16:42:43 -0500
Subject: [Swift-user] Teraport question
In-Reply-To: <49C2BA80.2080402@mcs.anl.gov>
References: <49C2BA80.2080402@mcs.anl.gov>
Message-ID: <49C2BC53.60401@mcs.anl.gov>

I may have asked too soon, as I just found this in the user guide:

"coasterWorkerMaxwalltime specifies the maxwalltime to be used when
submitting coaster workers. This profile entry is used by the coaster
execution provider. If this entry is not specified, the coaster provider
will compute a maxwalltime based on the maxwalltime of jobs submitted.
(since Swift 0.9) "

Which seems to leave a lot of room for interpretation, but is the following
reasonable?

If you can, just specify coasterWorkerMaxwalltime, and that will go to the
resource manager scheduler in the absence of any other times.

But, if you are running jobs on many sites, some of which specify coasters
and some of which don't, then I take the above to mean that coasters will
compute a max walltime based on whatever jobs it sees for a given site at
the moment? And how does it treat maxWalltime for an app (from tc.data)
vs. a site?

Can you provide some simple guidelines to make this whole issue of queues
and times easy to understand and specify?


On 3/19/09 4:34 PM, Michael Wilde wrote:
> The TeraPort queue policies are described at:
> http://www.ci.uchicago.edu/wiki/bin/view/Teraport/QueuePolicies
>
> which says the fast queue gives you up to an hour.
>
> --
>
> Separate from that, further clarification of queue and time specs would
> be helpful.
>
> There was much discussion on the devel list about how times are treated
> in coaster scheduling. This left me confused as to whether time specs
> for coaster jobs were working as desired or still in flux. Specifically:
>
> 1 - it is unclear if profile time specs are all in hh:mm:ss. I think I
> reported to swift-devel a coaster case where they were not interpreted
> in that standard manner.
>
> 2 - it was unclear whether the time is for the pbs (i.e.,
> coaster-worker) job or for the swift apps that run on them. Ie how does
> "coasterWorkerMaxwalltime" which is mentioned in the user guide interact
> with maxwalltime in the globus profile?
>
> 3 - is maxwalltime treated the same or different if specified in tc.data
> vs sites.xml, with respect to coasters?
>
> 4 - if I specify only a queue but no time limit, do I get the max for
> that queue? (would hope so...)
>
> 5 - is job time handling now working as the developers currently intend
> or are there outstanding issues in this area?
>
>
> On 3/19/09 4:07 PM, Ben Clifford wrote:
>> You can run the qstat command to see if your jobs are being queued,
>> what state they are in, and which queue they went to.
For example:
>>
>> 878629.tp-mgt null yuechen 0 Q extended
>>
>> shows a job submitted by you, that is in state Q (meaning queued) and
>> that it is in the extended queue.
>>
>> There are a lot of other jobs running on teraport at the moment. You
>> might have more luck using a different queue, such as 'fast' which you
>> can specify by adding a line something like this:
>>
>> fast
>>
>> to your site definition.
>>
>> However, you'll likely still have to wait some time for your jobs to
>> run - that is the nature of using a job queue...
>>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From hategan at mcs.anl.gov Thu Mar 19 16:43:16 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 19 Mar 2009 16:43:16 -0500
Subject: [Swift-user] Swift/PBS Scheduler Slow to Report "Finished"?
In-Reply-To:
References:
Message-ID: <1237498996.31133.2.camel@localhost>

On Thu, 2009-03-19 at 10:58 -0500, Andrew Boyce wrote:
> It is with the cog pbs provider
>
> And just to clarify for sure, the job is finished right away - the
> output files are ready; but Swift says it is still submitted or
> active, and doesn't do all of the things it normally does when a job
> has "finished successfully" (like copy the output files back into the
> folder with the input files, or delete the temporary work directory,
> for example).

The provider needs fixing to properly detect completed states instead of
waiting for the job to be removed from the queue.

From ajboyce at jacks.sdstate.edu Thu Mar 19 16:49:01 2009
From: ajboyce at jacks.sdstate.edu (Andrew Boyce)
Date: Thu, 19 Mar 2009 16:49:01 -0500
Subject: [Swift-user] Swift/PBS Scheduler Slow to Report "Finished"?
In-Reply-To: <1237498996.31133.2.camel@localhost>
References: <1237498996.31133.2.camel@localhost>
Message-ID:

So I was right. OK.
I suppose there isn't much I can do about that, besides change my provider
configuration/use another provider/change keep_completed?

From hategan at mcs.anl.gov Thu Mar 19 16:54:07 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 19 Mar 2009 16:54:07 -0500
Subject: [Swift-user] Teraport question
In-Reply-To: <49C2BC53.60401@mcs.anl.gov>
References: <49C2BA80.2080402@mcs.anl.gov> <49C2BC53.60401@mcs.anl.gov>
Message-ID: <1237499647.31133.12.camel@localhost>

On Thu, 2009-03-19 at 16:42 -0500, Michael Wilde wrote:
> I may have asked too soon, as I just found this in the user guide:
>
> "coasterWorkerMaxwalltime specifies the maxwalltime to be used when
> submitting coaster workers. This profile entry is used by the coaster
> execution provider. If this entry is not specified, the coaster provider
> will compute a maxwalltime based on the maxwalltime of jobs submitted.
> (since Swift 0.9) "
>
> Which seems to leave a lot of room for interpretation, but is the
> following reasonable?
>
> If you can, just specify coasterWorkerMaxwalltime, and that will go to
> the resource manager scheduler in the absence of any other times.
>
> But, if you are running jobs on many sites, some of which specify
> coasters and some of which don't, then I take the above to mean that
> coasters will compute a max walltime based on whatever jobs it sees for a
> given site at the moment? And how does it treat maxWalltime for an
> app (from tc.data) vs. a site?
>
> Can you provide some simple guidelines to make this whole issue of
> queues and times easy to understand and specify?

I guess the explanation above in the user guide is a little unclear.

The main use for coasterWorkerMaxwalltime is to put a cap on the walltime
for the workers, which may otherwise get a walltime that would cause the
queuing system to bark.
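
[Editorial sketch, not part of the original message.] The two settings discussed in this thread, the PBS queue and the cap on coaster worker walltime, are both profile entries in a site definition. The fragment below is a minimal sketch under stated assumptions: that `coasterWorkerMaxwalltime` lives in the `globus` profile namespace alongside `queue`, and that the value is written as hh:mm:ss (the thread itself notes that the time format is unclear). The one-hour value is chosen only to match the limit quoted for the 'fast' queue.

```xml
<!-- Sketch of profile entries placed inside a <pool> element of sites.xml. -->
<!-- Assumption: both keys belong to the "globus" profile namespace. -->
<profile namespace="globus" key="queue">fast</profile>
<!-- Assumption: walltime written as hh:mm:ss; verify against your Swift version. -->
<profile namespace="globus" key="coasterWorkerMaxwalltime">01:00:00</profile>
```

With a cap like this the worker job should not request more time than the queue allows, which is the situation described above as making the queuing system "bark".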

From hategan at mcs.anl.gov Thu Mar 19 16:55:51 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 19 Mar 2009 16:55:51 -0500
Subject: [Swift-user] Swift/PBS Scheduler Slow to Report "Finished"?
In-Reply-To:
References: <1237498996.31133.2.camel@localhost>
Message-ID: <1237499751.31133.14.camel@localhost>

On Thu, 2009-03-19 at 16:49 -0500, Andrew Boyce wrote:
> So I was right. OK. I suppose there isn't much I can do about that,
> besides change my provider configuration/use another provider/change
> keep_completed?

If you have control over keep_completed, that would be a reasonable
workaround until the provider gets fixed.

From yuechen at bsd.uchicago.edu Thu Mar 19 17:21:21 2009
From: yuechen at bsd.uchicago.edu (Yue, Chen - BMD)
Date: Thu, 19 Mar 2009 17:21:21 -0500
Subject: [Swift-user] Teraport question
References:
Message-ID:

Hi Ben,

Thank you very much for the answer. After I changed to "fast" queue, it
runs very quickly now.

Regards,

Chen, Yue

________________________________

From: Ben Clifford [mailto:benc at hawaga.org.uk]
Sent: Thu 3/19/2009 4:07 PM
To: Yue, Chen - BMD
Cc: swift user
Subject: Re: [Swift-user] Teraport question

You can run the qstat command to see if your jobs are being queued, what
state they are in, and which queue they went to. For example:

878629.tp-mgt null yuechen 0 Q extended

shows a job submitted by you, that is in state Q (meaning queued) and that
it is in the extended queue.

There are a lot of other jobs running on teraport at the moment. You might
have more luck using a different queue, such as 'fast' which you can
specify by adding a line something like this:

fast

to your site definition.

However, you'll likely still have to wait some time for your jobs to run -
that is the nature of using a job queue...

--


This email is intended only for the use of the individual or entity to
which it is addressed and may contain information that is privileged and
confidential.
If the reader of this email message is not the intended recipient, you are
hereby notified that any dissemination, distribution, or copying of this
communication is prohibited. If you have received this email in error,
please notify the sender and destroy/delete all copies of the transmittal.
Thank you.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wilde at mcs.anl.gov Thu Mar 19 17:24:33 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 19 Mar 2009 17:24:33 -0500
Subject: [Swift-user] Swift/PBS Scheduler Slow to Report "Finished"?
In-Reply-To:
References: <1237498996.31133.2.camel@localhost>
Message-ID: <49C2C621.2080203@mcs.anl.gov>

I think using coasters direct to PBS would be a good approach for you to
try. Your jobs will finish fast and send their data back. I'm not sure if
the Swift engine itself will exit when all the jobs are done, or if it will
wait for the coaster-worker PBS jobs to terminate. Even if so, that should
still enable you to do a lot of productive workflow testing and
experimentation. You'll know all the jobs are done from the progress report
on stdout.

On 3/19/09 4:49 PM, Andrew Boyce wrote:
> So I was right. OK. I suppose there isn't much I can do about that,
> besides change my provider configuration/use another provider/change
> keep_completed?
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From hategan at mcs.anl.gov Thu Mar 19 17:29:00 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 19 Mar 2009 17:29:00 -0500
Subject: [Swift-user] Swift/PBS Scheduler Slow to Report "Finished"?
In-Reply-To: <49C2C621.2080203@mcs.anl.gov>
References: <1237498996.31133.2.camel@localhost> <49C2C621.2080203@mcs.anl.gov>
Message-ID: <1237501740.32387.1.camel@localhost>

On Thu, 2009-03-19 at 17:24 -0500, Michael Wilde wrote:
> I think using coasters direct to PBS would be a good approach for you to
> try. Your jobs will finish fast and send their data back. I'm not sure if
> the Swift engine itself will exit when all the jobs are done, or if it
> will wait for the coaster-worker PBS jobs to terminate.

It will try to shut the service (and workers) down when the client JVM
exits. It may fail to do so, but it will not wait around.

> Even if so, that
> should still enable you to do a lot of productive workflow testing and
> experimentation. You'll know all the jobs are done from the progress
> report on stdout.
>
> On 3/19/09 4:49 PM, Andrew Boyce wrote:
> > So I was right. OK. I suppose there isn't much I can do about that,
> > besides change my provider configuration/use another provider/change
> > keep_completed?
> > _______________________________________________
> > Swift-user mailing list
> > Swift-user at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From ajboyce at jacks.sdstate.edu Thu Mar 19 17:35:37 2009
From: ajboyce at jacks.sdstate.edu (Andrew Boyce)
Date: Thu, 19 Mar 2009 17:35:37 -0500
Subject: [Swift-user] Swift/PBS Scheduler Slow to Report "Finished"?
In-Reply-To: <49C2C621.2080203@mcs.anl.gov>
References: <1237498996.31133.2.camel@localhost> <49C2C621.2080203@mcs.anl.gov>
Message-ID:

I will definitely have to try that. Now I think I understand the benefits
of that approach. Sending the data back is important.

I do have a question though. In the Swift user guide, it states: "CoG
coasters provide a low-overhead job submission and file transfer mechanism
suited for the execution of short jobs (on the order of a few seconds) and
the transfer of small files (on the order of a few kilobytes)."
If I want to use coasters with longer jobs, with the transfer of large files, will that be a problem/be ill-suited with this approach? I know that our goal is to work with much, much larger files down the road, and much longer jobs. Thank you both, Michael and Mihael, for your assistance in this matter. I really do appreciate it very much. Regards, Andrew Boyce -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Thu Mar 19 17:41:49 2009 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 19 Mar 2009 17:41:49 -0500 Subject: [Swift-user] Swift/PBS Scheduler Slow to Report "Finished"? In-Reply-To: References: <1237498996.31133.2.camel@localhost> <49C2C621.2080203@mcs.anl.gov> Message-ID: <1237502509.316.3.camel@localhost> On Thu, 2009-03-19 at 17:35 -0500, Andrew Boyce wrote: > I will definitely have to try that. Now I think I understand the > benefits of that approach. Sending the data back is important. > I do have a question though. In the Swift user guide, it states: "CoG > coasters provide a low-overhead job submission and > file transfer mechanism suited for the execution of short jobs (on the > order of a few seconds) and the transfer of small files > (on the order of a few kilobytes)." > > > If I want to use coasters with longer jobs, with the transfer of large > files, will that be a problem/be ill-suited with this approach? > I know that our goal is to work with much, much larger files down the > road, and much longer jobs. You may not want to use the coaster file provider for large files, but you could still use the coaster execution provider for running the jobs. It should support larger jobs, but the benefit of using coasters for such jobs may not be worth the extra layers of code (though there is this specific case where coasters can help running multiple jobs per node that has multiple cores/CPUs). 
> From wilde at mcs.anl.gov Thu Mar 19 18:04:05 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 19 Mar 2009 18:04:05 -0500 Subject: [Swift-user] Swift/PBS Scheduler Slow to Report "Finished"? In-Reply-To: <1237502509.316.3.camel@localhost> References: <1237498996.31133.2.camel@localhost> <49C2C621.2080203@mcs.anl.gov> <1237502509.316.3.camel@localhost> Message-ID: <49C2CF65.3040000@mcs.anl.gov> On 3/19/09 5:41 PM, Mihael Hategan wrote: > On Thu, 2009-03-19 at 17:35 -0500, Andrew Boyce wrote: >> I will definitely have to try that. Now I think I understand the >> benefits of that approach. Sending the data back is important. >> I do have a question though. In the Swift user guide, it states: "CoG >> coasters provide a low-overhead job submission and >> file transfer mechanism suited for the execution of short jobs (on the >> order of a few seconds) and the transfer of small files >> (on the order of a few kilobytes)." >> >> >> If I want to use coasters with longer jobs, with the transfer of large >> files, will that be a problem/be ill-suited with this approach? >> I know that our goal is to work with much, much larger files down the >> road, and much longer jobs. > > You may not want to use the coaster file provider for large files, but > you could still use the coaster execution provider for running the jobs. > It should support larger jobs, but the benefit of using coasters for > such jobs may not be worth the extra layers of code (though there is > this specific case where coasters can help running multiple jobs per > node that has multiple cores/CPUs). Let me offer a slight variation on this. As the implementation stabilizes, coaster-providers would ideally become the predominant way of running all jobs on all sites. True, they involve more code layers than the "plain" job providers, but that will be transparent once they are solid. 
And they seem to have reached a level of stability where they are quite usable, and readily fixable when new, unexpected cases are encountered. Once they work for a given site, they tend to be reliable. Making their startup transparent across many environments has been devilishly hard, though. This makes the trunk version prone to breakage as fixes are applied. So, Andrew, I would suggest you use them for all configurations where they work, and any help you can provide in testing and hardening them will be greatly appreciated. As Mihael suggests, for data, there is likely to be a crossover point where, as file size grows, gridftp becomes more efficient. I would hazard a guess: if your file is <10 KB, try coasters; if it is >1 MB, use gridftp. From wilde at mcs.anl.gov Thu Mar 19 18:13:02 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 19 Mar 2009 18:13:02 -0500 Subject: [Swift-user] Teraport question In-Reply-To: <1237499647.31133.12.camel@localhost> References: <49C2BA80.2080402@mcs.anl.gov> <49C2BC53.60401@mcs.anl.gov> <1237499647.31133.12.camel@localhost> Message-ID: <49C2D17E.4030200@mcs.anl.gov> On 3/19/09 4:54 PM, Mihael Hategan wrote: > I guess the explanation above in the user guide is a little unclear. The > main use for coasterWorkerMaxwalltime is to put a cap on the walltime > for the workers, which may otherwise get a walltime that would cause the > queuing system to bark. So is the following summary correct (as far as it goes)? "If coasterWorkerMaxwalltime is specified for a site (which must be using coasters) then it determines the wall time that will be requested from the local scheduler when the coaster worker job is submitted. This will override all other time specifications for the worker job, and is typically the easiest way to manage the time requests for these jobs. "If coasterWorkerMaxwalltime is *not* specified then the maxWallTime computed for the worker is based on ??? - app jobs queued, and their tc.data maxwalltime specs?
- site maxwalltime? multiplied by some scale factor? "If no times are specified, but a queue is specified for the site, then the wall time allowed the worker job is determined by the local scheduler, based on the requested queue." I'm looking for some spec like this to clarify the whole issue. It's that spec in the middle that seems complicated, but if coasterWorkerMaxwalltime overrides all other specs, I think that will meet most users' (and my) needs. From hategan at mcs.anl.gov Thu Mar 19 18:22:12 2009 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 19 Mar 2009 18:22:12 -0500 Subject: [Swift-user] Teraport question In-Reply-To: <49C2D17E.4030200@mcs.anl.gov> References: <49C2BA80.2080402@mcs.anl.gov> <49C2BC53.60401@mcs.anl.gov> <1237499647.31133.12.camel@localhost> <49C2D17E.4030200@mcs.anl.gov> Message-ID: <1237504932.316.8.camel@localhost> On Thu, 2009-03-19 at 18:13 -0500, Michael Wilde wrote: > On 3/19/09 4:54 PM, Mihael Hategan wrote: > > > I guess the explanation above in the user guide is a little unclear. The > > main use for coasterWorkerMaxwalltime is to put a cap on the walltime > > for the workers, which may otherwise get a walltime that would cause the > > queuing system to bark. > > So is the following summary correct (as far as it goes)? > > "If coasterWorkerMaxwalltime is specified for a site (which must be > using coasters) then it determines the wall time that will be requested > from the local scheduler when the coaster worker job is submitted. This > will override all other time specifications for the worker job, and is > typically the easiest way to manage the time requests for these jobs. The worker job walltime is otherwise unspecified (i.e. the software governing the workers can choose anything it pleases). > > "If coasterWorkerMaxwalltime is *not* specified then the maxWallTime > computed for the worker is based on ??? Nothing.
It shall remain undisclosed (current implementation says walltime_of_job_that_started_the_worker * 10 + 1 minute) > - app jobs queued, and their tc.data maxwalltime specs? > - site maxwalltime? multiplied by some scale factor? > > "If no times are specified, but a queue is specified for the site, then > the wall time allowed the worker job is determined by the local > scheduler, based on the requested queue." I have no idea where that came from. From wilde at mcs.anl.gov Thu Mar 19 18:29:25 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 19 Mar 2009 18:29:25 -0500 Subject: [Swift-user] Teraport question In-Reply-To: <1237504932.316.8.camel@localhost> References: <49C2BA80.2080402@mcs.anl.gov> <49C2BC53.60401@mcs.anl.gov> <1237499647.31133.12.camel@localhost> <49C2D17E.4030200@mcs.anl.gov> <1237504932.316.8.camel@localhost> Message-ID: <49C2D555.7060904@mcs.anl.gov> On 3/19/09 6:22 PM, Mihael Hategan wrote: >> "If no times are specified, but a queue is specified for the site, then >> the wall time allowed the worker job is determined by the local >> scheduler, based on the requested queue." > > I have no idea where that came from. It came from this reasoning: Ben suggested to Yue a sites.xml entry with a queue spec, and I assumed the intent was to use that without a wall time. I then further assumed that if this worked (and Yue indicated that it did: his jobs wound up in the fast queue), then the jobs must be getting some default time spec. I've always in the past found it a nuisance to have to get the maxwalltime spec compatible with the queue name, as you have to hunt down this data for every site. If you knew the queue names, and could just specify a queue name without a wall time, that seemed handy. (Although I suppose it can lead to the pitfall of unexpectedly exceeding wall time limits).
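The worker walltime rule discussed in this thread can be sketched in a few lines. This is only an illustration, not the coaster source: it assumes the default Mihael quotes (walltime of the job that started the worker, times 10, plus 1 minute) and assumes that coasterWorkerMaxwalltime acts as a cap on that value, as he describes it, rather than as an unconditional override. The function and parameter names are hypothetical.

```python
from typing import Optional


def worker_walltime_minutes(job_walltime_minutes: float,
                            coaster_worker_max_walltime: Optional[float] = None) -> float:
    """Estimate the walltime (in minutes) requested for a coaster worker.

    Hypothetical helper: the default follows the formula quoted in the
    thread, and the optional cap is an assumed reading of what
    coasterWorkerMaxwalltime does.
    """
    # Default quoted in the thread: job walltime * 10 + 1 minute.
    default = job_walltime_minutes * 10 + 1
    if coaster_worker_max_walltime is None:
        return default
    # Assumption: the setting caps the request instead of replacing it.
    return min(default, coaster_worker_max_walltime)
```

Under these assumptions a 5-minute job would start a worker asking for 51 minutes, which a 30-minute cap would reduce to 30; a site whose queue limit is below the default is exactly the case where the scheduler would otherwise "bark".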
From hategan at mcs.anl.gov Thu Mar 19 19:06:37 2009 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 19 Mar 2009 19:06:37 -0500 Subject: [Swift-user] Teraport question In-Reply-To: <49C2D555.7060904@mcs.anl.gov> References: <49C2BA80.2080402@mcs.anl.gov> <49C2BC53.60401@mcs.anl.gov> <1237499647.31133.12.camel@localhost> <49C2D17E.4030200@mcs.anl.gov> <1237504932.316.8.camel@localhost> <49C2D555.7060904@mcs.anl.gov> Message-ID: <1237507597.2071.10.camel@localhost> On Thu, 2009-03-19 at 18:29 -0500, Michael Wilde wrote: > > On 3/19/09 6:22 PM, Mihael Hategan wrote: > > >> "If no times are specified, but a queue is specified for the site, then > >> the wall time allowed the worker job is determined by the local > >> scheduler, based on the requested queue." > > > > I have no idea where that came from. I thought the paragraph was in the user guide. My answer was to that. If it's your paragraph, then that's not happening because I don't know how that could be done (i.e. figuring out the local queues and their walltime limits). > > It came from this reasoning: > > Ben suggested to Yue a sites.xml entry with a queue spec, and I assumed > the intent was to use that without a wall time. I then further assumed > that if this worked (and Yue indicated that it did: his jobs wound uop > in the fast queue), then the jobs must be getting some default time spec. > > Ive always in the past found it a nuisance to have to get the > maxwalltime spec compatible with the queue name, as you have to hunt > town this data for every site. Yes. This is an issue that The Grid introduces. If you have one cluster, you read the documentation for it. If you have a bunch of clusters coming and going, you hit keyboard with hammer. > > If you knew the queue names, and could just specify a queue name without > a wall time, that seemed handy. (Although I suppose can lead to the > pitfall of unexpectedly exceeding wall time limits). 
This "let's maybe or like perhaps put job in possibly queue" seems a little fishy, given that coasters rely on knapsacking job walltimes into worker walltimes to get things done. From wilde at mcs.anl.gov Fri Mar 20 01:11:55 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 20 Mar 2009 01:11:55 -0500 Subject: [Swift-user] Problems in complex mapping situation Message-ID: <49C333AB.9040304@mcs.anl.gov> I get this:
--
RunID: 20090320-0048-id8ot1y9
Progress:
Execution failed:
java.lang.RuntimeException: Data set initialization failed for result.[0][]/1. It should have been closed.
Caused by:
Handle open: OOPSOut[] result.[0][]/1
SwiftScript trace: T1af7, Round, 0, Sim, 0, StartTemp, , TempUpdate,
Command exited with non-zero status 2
--
from the fairly lengthy script at www.ci.uchicago.edu/~wilde/oops.swift This script was working fine, and I was trying to improve its performance on large datasets on the BG/P by using an ext mapper instead of simple_mapper to map a large 2D array of structures (OOPSOut result) so that it spreads across multiple directories (to avoid the GPFS locking issue). (I have 4K cores writing 14,000 files to one directory, as I see no way with simple mapper to use the array index to, for example, insert more directory entries in the prefix or suffix.) So I am trying now to map the entire 2D array of structs "result" up front (just handling the simple case where the first dimension has only a single entry.) The log is at the same URL, file oops-20090320-0048-id8ot1y9.log In the simple first test case I'm trying here (i.e., map for one job), I think my mapper is being called with, and returning, this:
int$ ./OOPSOutAll.map.sh -p 1foo -d out.n -r 1 -s 1 -t "" -u ""
[0][0].pdt out.n/1foo/0/00/00/ST.TU.pdt
[0][0].rmsd out.n/1foo/0/00/00/ST.TU.rmsd
[0][0].log out.n/1foo/0/00/00/ST.TU.log
int$
I'll debug this further, but am hoping you can give some clues as to what the error above means.
I don't get the idea of the dataset needing to be closed at initialization. Does initialization here mean "mapping"? The state notion implied by this message is confusing. Thanks. From benc at hawaga.org.uk Fri Mar 20 03:00:41 2009 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 20 Mar 2009 08:00:41 +0000 (GMT) Subject: [Swift-user] Problems in complex mapping situation In-Reply-To: <49C333AB.9040304@mcs.anl.gov> References: <49C333AB.9040304@mcs.anl.gov> Message-ID: That error message is Swift trying to do some consistency checking on what has been mapped, and failing. Initialization in this context means initializing the mapper for OOPSOut. I'll poke around. -- From benc at hawaga.org.uk Fri Mar 20 04:15:36 2009 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 20 Mar 2009 09:15:36 +0000 (GMT) Subject: [Swift-user] Problems in complex mapping situation In-Reply-To: <49C333AB.9040304@mcs.anl.gov> References: <49C333AB.9040304@mcs.anl.gov> Message-ID: On Fri, 20 Mar 2009, Michael Wilde wrote: > This script was working fine, and I was trying to improve its performance on > large datasets on the BG/P by using an ext mapper instead of simple_mapper to > map a large 2D array of structures (OOPSOut result) so that it spreads across > multiple directories (to avoid the GPFS locking issue. (I have 4K cores > writing 14,000 files to one directory, as I see no way with simple mapper to > use the array index to, for example, insert more directory entries in the > prefix or suffix.) The concurrent mapper (used to map variables that have no explicitly declared mapper) makes a tree of directories for large arrays. This is done in a deliberately unspecified manner, but at present (according to the source) /** determines how many directories and element files are permitted in each directory. There will be no more than DIRECTORY_LOAD_FACTOR element files and no more than DIRECTORY_LOAD_FACTOR directories, so there could be up to 2 * DIRECTORY_LOAD_FACTOR elements.
*/ public final static int DIRECTORY_LOAD_FACTOR=25; So if you declare an array as myarray foo[]; with no mapper, you'll end up with a tree of directories, each directory having no more than 50 entries in it. You lose the ability to specify the filenames at all here, which may or may not be a problem for your applications. It might be that this functionality could be made more general so that you can (for example) specify some option to the simple mapper to get automatically-computed hierarchical directories. It seems like a common enough use case. -- From wilde at mcs.anl.gov Fri Mar 20 11:05:19 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 20 Mar 2009 11:05:19 -0500 Subject: [Swift-user] Problems in complex mapping situation In-Reply-To: References: <49C333AB.9040304@mcs.anl.gov> Message-ID: <49C3BEBF.1000406@mcs.anl.gov> Cool. I forgot about this. I can certainly try it for perf measurements. A mapper-like external script can walk the output tree of the concurrent mapper and create links with the desired filenames. I'll try that. Some questions (but I'll find this out in a moment when I try it): - does this behavior start as soon as the directory hits 50 elements? - if the first element I store is, say 1000, is it triggered? - does it map arrays of structures? I'm assuming yes to all these, and will experiment. Thanks! On 3/20/09 4:15 AM, Ben Clifford wrote: > On Fri, 20 Mar 2009, Michael Wilde wrote: > >> This script was working fine, and I was trying to improve its performance on >> large datasets on the BG/P by using an ext mapper instead of simple_mapper to >> map a large 2D array of structures (OOPSOut result) so that it spreads across >> multiple directories (to avoid the GPFS locking issue. (I have 4K cores >> writing 14,000 files to one directory, as I see no way with simple mapper to >> use the array index to, for example, insert more directory entries in the >> prefix or suffix.)
> > The concurrent mapper (used to map variables that have no explicitly > declared mapper) makes a tree of directories for large arrays. > > This is done in a deliberately unspecified manner, but at present > (according to the source) /** determines how many directories and element > files are permitted > in each directory. There will be no more than > DIRECTORY_LOAD_FACTOR element files and no more than > DIRECTORY_LOAD_FACTOR directories, so there could be up to > 2 * DIRECTORY_LOAD_FACTOR elements. */ > public final static int DIRECTORY_LOAD_FACTOR=25; > > so if you declare an array as myarray foo[]; with no mapper, you'll end > up with a tree of directories, each directory having no more than 50 > entries in it. > > You lose the ability to specify the filenames at all here, which may or > may not be a problem for your applications. > > It might be that this functionality could be more made more general so > that you can (for example) specify some option to the simple mapper to get > automatically-computed hierarchical directories. It seems like a common > enough use case. > From wilde at mcs.anl.gov Fri Mar 20 11:13:36 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 20 Mar 2009 11:13:36 -0500 Subject: [Swift-user] Problems in complex mapping situation In-Reply-To: <49C3BEBF.1000406@mcs.anl.gov> References: <49C333AB.9040304@mcs.anl.gov> <49C3BEBF.1000406@mcs.anl.gov> Message-ID: <49C3C0B0.7040109@mcs.anl.gov> Very nice. 
This script:
--
type file;

type struct {
  file mem1;
  file mem2;
}

app (file o) echo (string s) {
  echo s stdout=@o;
}

file a[];
a[0] = echo("0");
a[49] = echo("49");
a[1000] = echo("1000");
a[20000] = echo("20000");

struct s[][];
s[123][456].mem1 = echo("s[123][456].mem1");
s[12300][987].mem1 = echo("s[12300][987].mem1");
--
gives:
sur$ find _concurrent
_concurrent
_concurrent/s-3b2cbe1c-4a65-4452-ba62-11becc80935c--array
_concurrent/s-3b2cbe1c-4a65-4452-ba62-11becc80935c--array/h0
_concurrent/s-3b2cbe1c-4a65-4452-ba62-11becc80935c--array/h0/h17
_concurrent/s-3b2cbe1c-4a65-4452-ba62-11becc80935c--array/h0/h17/elt-12300_-array
_concurrent/s-3b2cbe1c-4a65-4452-ba62-11becc80935c--array/h0/h17/elt-12300_-array/h12
_concurrent/s-3b2cbe1c-4a65-4452-ba62-11becc80935c--array/h0/h17/elt-12300_-array/h12/h14
_concurrent/s-3b2cbe1c-4a65-4452-ba62-11becc80935c--array/h0/h17/elt-12300_-array/h12/h14/elt-987.-field
_concurrent/s-3b2cbe1c-4a65-4452-ba62-11becc80935c--array/h0/h17/elt-12300_-array/h12/h14/elt-987.-field/mem1
_concurrent/s-3b2cbe1c-4a65-4452-ba62-11becc80935c--array/h23
_concurrent/s-3b2cbe1c-4a65-4452-ba62-11becc80935c--array/h23/elt-123_-array
_concurrent/s-3b2cbe1c-4a65-4452-ba62-11becc80935c--array/h23/elt-123_-array/h6
_concurrent/s-3b2cbe1c-4a65-4452-ba62-11becc80935c--array/h23/elt-123_-array/h6/elt-456.-field
_concurrent/s-3b2cbe1c-4a65-4452-ba62-11becc80935c--array/h23/elt-123_-array/h6/elt-456.-field/mem1
_concurrent/a-cfcdbe0b-ed93-47fe-865f-5d27efb33f3f--array
_concurrent/a-cfcdbe0b-ed93-47fe-865f-5d27efb33f3f--array/h0
_concurrent/a-cfcdbe0b-ed93-47fe-865f-5d27efb33f3f--array/h0/h0
_concurrent/a-cfcdbe0b-ed93-47fe-865f-5d27efb33f3f--array/h0/h0/h7
_concurrent/a-cfcdbe0b-ed93-47fe-865f-5d27efb33f3f--array/h0/h0/h7/elt-20000
_concurrent/a-cfcdbe0b-ed93-47fe-865f-5d27efb33f3f--array/h0/h15
_concurrent/a-cfcdbe0b-ed93-47fe-865f-5d27efb33f3f--array/h0/h15/elt-1000
_concurrent/a-cfcdbe0b-ed93-47fe-865f-5d27efb33f3f--array/h24
_concurrent/a-cfcdbe0b-ed93-47fe-865f-5d27efb33f3f--array/h24/elt-49
_concurrent/a-cfcdbe0b-ed93-47fe-865f-5d27efb33f3f--array/elt-0
sur$
On 3/20/09 11:05 AM, Michael Wilde wrote: > Cool. I forgot about this. I can certainly try it for perf measurements. > > A mapper-like external script can walk the output tree of the concurrent > mapper and create links with the desired filenames. I'll try that. > > Some questions (but I'll find this out in a moment when I try it): > > - does this behavior start as soon as the directory hits 50 elements? > - if the first element I store is, say 1000, is it triggered? > - does it map arrays of structures? > > Im assuming yes to all these, and will experiment. Thanks! > > On 3/20/09 4:15 AM, Ben Clifford wrote: >> On Fri, 20 Mar 2009, Michael Wilde wrote: >> >>> This script was working fine, and I was trying to improve its >>> performance on >>> large datasets on the BG/P by using an ext mapper instead of >>> simple_mapper to >>> map a large 2D array of structures (OOPSOut result) so that it >>> spreads across >>> multiple directories (to avoid the GPFS locking issue. (I have 4K cores >>> writing 14,000 files to one directory, as I see no way with simple >>> mapper to >>> use the array index to, for example, insert more directory entries in >>> the >>> prefix or suffix.) >> >> The concurrent mapper (used to map variables that have no explicitly >> declared mapper) makes a tree of directories for large arrays. >> >> This is done in a deliberately unspecified manner, but at present >> (according to the source) /** determines how many directories and >> element files are permitted >> in each directory. There will be no more than >> DIRECTORY_LOAD_FACTOR element files and no more than >> DIRECTORY_LOAD_FACTOR directories, so there could be up to >> 2 * DIRECTORY_LOAD_FACTOR elements.
*/ >> public final static int DIRECTORY_LOAD_FACTOR=25; >> >> so if you declare an array as myarray foo[]; with no mapper, you'll >> end up with a tree of directories, each directory having no more than >> 50 entries in it. >> >> You lose the ability to specify the filenames at all here, which may >> or may not be a problem for your applications. >> >> It might be that this functionality could be more made more general so >> that you can (for example) specify some option to the simple mapper to >> get automatically-computed hierarchical directories. It seems like a >> common enough use case. >> > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From benc at hawaga.org.uk Fri Mar 20 11:13:15 2009 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 20 Mar 2009 16:13:15 +0000 (GMT) Subject: [Swift-user] Problems in complex mapping situation In-Reply-To: <49C3BEBF.1000406@mcs.anl.gov> References: <49C333AB.9040304@mcs.anl.gov> <49C3BEBF.1000406@mcs.anl.gov> Message-ID: On Fri, 20 Mar 2009, Michael Wilde wrote: > - does this behavior start as soon as the directory hits 50 elements? It doesn't start/stop; it's always there. Something like a base-25 representation of the array index is constructed and used to name directories, one base-25 digit being one directory name. If you store a small array (under 25 at the moment) you'll get a single base-25 digit in your index and so a single level. You should see what happens in the present implementation if you make a bunch of entries and look in the _concurrent/ directory. > - if the first element I store is, say 1000, is it triggered? n/a > - does it map arrays of structures?
yes -- From benc at hawaga.org.uk Fri Mar 20 11:17:04 2009 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 20 Mar 2009 16:17:04 +0000 (GMT) Subject: [Swift-user] Problems in complex mapping situation In-Reply-To: <49C333AB.9040304@mcs.anl.gov> References: <49C333AB.9040304@mcs.anl.gov> Message-ID: I've recreated this in a much smaller test case (pretty much whenever multidimensional arrays are used with ext (and probably other mappers)). I made what appeared to be an obvious fix but it's causing some surprising behaviour in other tests so I haven't committed it. I'll keep playing though. -- From yuechen at bsd.uchicago.edu Fri Mar 20 13:07:29 2009 From: yuechen at bsd.uchicago.edu (Yue, Chen - BMD) Date: Fri, 20 Mar 2009 13:07:29 -0500 Subject: [Swift-user] Teragrid execution problem Message-ID: Hi, I was testing a Swift script on Teragrid clusters. Earlier this morning the test was successful. Now it shows the following communication error, although the test with globus-job-run was successful. Is this a queuing issue? I tried the qstat command but I cannot find my job on the NCSA clusters. Thank you very much!
Chen, Yue sites.xml --- /home/ac/yuechen/tmp fast tc.data --- NCSA_Abe_prews_pbs PTMap /home/yuechen/PTMap/PTMap INSTALLED INTEL32::LINUX null Error message and globus-job-run test --- [yuechen at communicado PTMap]$ swift PTMap-unmod.swift -sites.file sites.xml -tc.file tc.data Swift 0.8 swift-r2448 cog-r2261 RunID: 20090320-1255-aaszhfkg Progress: Progress: Selecting site:9 Initializing site shared directory:2 Progress: Selecting site:7 Initializing site shared directory:4 Progress: Selecting site:5 Initializing site shared directory:6 Progress: Selecting site:4 Initializing site shared directory:7 Progress: Selecting site:3 Initializing site shared directory:8 Progress: Selecting site:2 Initializing site shared directory:9 Progress: Selecting site:1 Initializing site shared directory:10 Progress: Initializing site shared directory:11 Progress: Initializing site shared directory:11 Progress: Initializing site shared directory:11 Progress: Initializing site shared directory:11 Progress: Initializing site shared directory:11 Progress: Initializing site shared directory:11 Progress: Initializing site shared directory:11 Progress: Initializing site shared directory:11 Progress: Initializing site shared directory:11 Progress: Initializing site shared directory:11 Progress: Initializing site shared directory:11 Progress: Initializing site shared directory:11 Progress: Initializing site shared directory:11 Progress: Initializing site shared directory:11 Progress: Initializing site shared directory:11 Progress: Initializing site shared directory:11 Progress: Initializing site shared directory:11 Execution failed: Could not initialize shared directory on NCSA_Abe_prews_pbs Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: Error communicating with the GridFTP server Caused by: Server refused performing the request. Custom message: Bad password. (error code 1) [Nested exception message: Custom message: Unexpected reply: 530-Login incorrect. 
: globus_gss_assist: Gridmap lookup failure: Could not map /DC=org/DC=doegrids/OU=People/CN=Yue Chen 509341 530- 530 End.] [yuechen at communicado PTMap]$ globus-job-run tg-login.ncsa.teragrid.org /usr/bin/id uid=40836(yuechen) gid=13381(bdd) groups=13381(bdd),202(ac) From benc at hawaga.org.uk Fri Mar 20 13:14:21 2009 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 20 Mar 2009 18:14:21 +0000 (GMT) Subject: [Swift-user] Teragrid execution problem In-Reply-To: References: Message-ID: That is an error coming from the gridftp server. Check that you can reliably (several times) use globus-url-copy to copy files from the gridftp server that you specify in the sites file: gsiftp://gridftp-abe.ncsa.teragrid.org:2811 From wilde at mcs.anl.gov Fri Mar 20 14:37:15 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 20 Mar 2009 14:37:15 -0500 Subject: [Swift-user] Problems in complex mapping situation In-Reply-To: References: <49C333AB.9040304@mcs.anl.gov> Message-ID: <49C3F06B.7060703@mcs.anl.gov> On 3/20/09 11:17 AM, Ben Clifford wrote: > I've recreated this in a much smaller test case (pretty much whenever > multidimensional arrays are used with ext (and probably other mappers)). > > I made what appeared to be an obvious fix but its causing some surprising > behaviour in other tests so I haven't committed it. I'll keep playing > though. Great, that would be helpful.
I just tried the concurrent mapper on the real app, and it's not going to work as easily as I hoped. The underlying app code (the actual "oops" executable in this case) is picky about the naming of output files: it wants to see a path ending in something like proteinname.pdt (e.g. T1af7.pdt), and will generate several other output files based on that: pname.Energy, .lib, .rmsd etc. The code is brittle and is getting a segfault, I think due to the unexpected form of the output filename from the concurrent mapper. I can try to deal with this in the shell script that wraps the app (or the app itself), but the ext mapper solution would also be handy, if that's close. From benc at hawaga.org.uk Fri Mar 20 14:44:19 2009 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 20 Mar 2009 19:44:19 +0000 (GMT) Subject: [Swift-user] Problems in complex mapping situation In-Reply-To: References: <49C333AB.9040304@mcs.anl.gov> Message-ID: On Fri, 20 Mar 2009, Ben Clifford wrote: > I've recreated this in a much smaller test case (pretty much whenever > multidimensional arrays are used with ext (and probably other mappers)). Swift r2713 should fix this. -- From yuechen at bsd.uchicago.edu Fri Mar 20 14:53:59 2009 From: yuechen at bsd.uchicago.edu (Yue, Chen - BMD) Date: Fri, 20 Mar 2009 14:53:59 -0500 Subject: [Swift-user] Teragrid execution problem References: Message-ID: Hi Ben, Thanks for answering my question. I tried the following command: [yuechen at communicado PTMap]$ globus-url-copy file:///fasta334 gsiftp://gridftp-abe.ncsa.teragrid.org:2811/yuechen/tmp GlobusUrlCopy error: UrlCopy transfer failed. [Caused by: Server refused performing the request. Custom message: Bad password. (error code 1) [Nested exception message: Custom message: Unexpected reply: 530-Login incorrect.
: globus_gss_assist: Gridmap lookup failure: Could not map /DC=org/DC=doegrids/OU=People/CN=Yue Chen 509341 530- 530 End.]] But I don't understand why it may fail, since I have performed grid-proxy-init and gx-request commands. Is it because the ftp server too busy? I also tried SDSC clusters, but I got different error message. Is this also because of gsiftp issue? sites.xml --- /gpfs-wan/scratch/yuechen fast tc.data --- SDSC_dtf_prews_pbs PTMap /home/yuechen/PTMap/PTMap INSTALLED INTEL32::LINUX null Error messages and globus-job-run test: [yuechen at communicado PTMap]$ swift PTMap-unmod.swift -sites.file sites.xml -tc.file tc.data Swift 0.8 swift-r2448 cog-r2261 RunID: 20090320-1433-oc8ufsv3 Progress: Progress: Selecting site:9 Stage in:1 Initializing site shared directory:1 Progress: Selecting site:9 Stage in:2 Progress: Selecting site:9 Stage in:2 Progress: Selecting site:9 Stage in:2 Progress: Selecting site:9 Stage in:2 Progress: Selecting site:9 Stage in:1 Submitting:1 Progress: Selecting site:9 Submitting:2 Failed to transfer wrapper log from PTMap-unmod-20090320-1433-oc8ufsv3/info/8 on SDSC_dtf_prews_pbs Progress: Selecting site:8 Stage in:1 Submitting:1 Failed but can retry:1 Progress: Selecting site:8 Submitting:1 Submitted:1 Failed but can retry:1 Failed to transfer wrapper log from PTMap-unmod-20090320-1433-oc8ufsv3/info/9 on SDSC_dtf_prews_pbs Progress: Selecting site:7 Stage in:1 Submitting:1 Failed but can retry:2 Progress: Selecting site:7 Submitting:1 Submitted:1 Failed but can retry:2 Failed to transfer wrapper log from PTMap-unmod-20090320-1433-oc8ufsv3/info/d on SDSC_dtf_prews_pbs Progress: Selecting site:7 Submitted:1 Failed but can retry:3 Failed to transfer wrapper log from PTMap-unmod-20090320-1433-oc8ufsv3/info/b on SDSC_dtf_prews_pbs Progress: Selecting site:7 Failed but can retry:4 Progress: Selecting site:7 Failed but can retry:4 Progress: Selecting site:6 Stage in:1 Failed but can retry:4 Progress: Selecting site:6 Submitting:1 Failed 
but can retry:4
Progress: Selecting site:6 Submitted:1 Failed but can retry:4
Failed to transfer wrapper log from PTMap-unmod-20090320-1433-oc8ufsv3/info/g on SDSC_dtf_prews_pbs
Progress: Selecting site:6 Failed but can retry:5
[yuechen at communicado PTMap]$ globus-job-run tg-login1.sdsc.teragrid.org /usr/bin/id
uid=502857(yuechen) gid=5195(anl100) groups=5195(anl100)

________________________________
From: Ben Clifford [mailto:benc at hawaga.org.uk]
Sent: Fri 3/20/2009 1:14 PM
To: Yue, Chen - BMD
Cc: swift user
Subject: Re: [Swift-user] Teragrid execution problem

That is an error coming from the gridftp server.

Check that you can reliably (several times) use globus-url-copy to copy files from the gridftp server that you specify in the sites file:

gsiftp://gridftp-abe.ncsa.teragrid.org:2811

This email is intended only for the use of the individual or entity to which it is addressed and may contain information that is privileged and confidential. If the reader of this email message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is prohibited. If you have received this email in error, please notify the sender and destroy/delete all copies of the transmittal. Thank you.

-------------- next part --------------
An HTML attachment was scrubbed...
From benc at hawaga.org.uk Fri Mar 20 15:00:22 2009
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 20 Mar 2009 20:00:22 +0000 (GMT)
Subject: [Swift-user] Teragrid execution problem
In-Reply-To:
References:
Message-ID:

On Fri, 20 Mar 2009, Yue, Chen - BMD wrote:

> But I don't understand why it may fail, since I have performed
> grid-proxy-init and gx-request commands. Is it because the ftp server
> too busy?

I doubt it is because the server is busy. Email the teragrid helpdesk at help at teragrid.org with your globus-url-copy demonstration above.

>
> I also tried SDSC clusters, but I got different error message. Is this also because of gsiftp issue?
>

Those error messages look different. For that, please can you put a log file from that run somewhere where I can see it? (for example on the CI NFS)

--

From wilde at mcs.anl.gov Fri Mar 20 15:02:07 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 20 Mar 2009 15:02:07 -0500
Subject: [Swift-user] Teragrid execution problem
In-Reply-To:
References:
Message-ID: <49C3F63F.4080702@mcs.anl.gov>

Yuechen,

I think your pathnames to globus-url-copy are not correct, but I don't *think* that's causing this problem: there is likely a problem on abe with the mapping of your certs. I'll take a look. (OK, I see Ben you are looking into it, will wait, but let me know if I should help Yue)

But, also fix the pathnames. In:

globus-url-copy file:///fasta334 gsiftp://gridftp-abe.ncsa.teragrid.org:2811/yuechen/tmp

these should be:

file:///home/yuechen/whateverdirs/fasta334

and similarly, whatever is after gsiftp://gridftp-abe.ncsa.teragrid.org:2811/ should be a valid full path name on abe (e.g., /home/users/something/yuechen/tmp, or whatever the directory structure is on abe).

On 3/20/09 2:53 PM, Yue, Chen - BMD wrote:
> Hi Ben,
>
> Thanks for answering my question.
I tried the following command: > > [yuechen at communicado PTMap]$ globus-url-copy file:///fasta334 > gsiftp://gridftp-abe.ncsa.teragrid.org:2811/yuechen/tmp > GlobusUrlCopy error: UrlCopy transfer failed. [Caused by: Server refused > performing the request. Custom message: Bad password. (error code 1) > [Nested exception message: Custom message: Unexpected reply: 530-Login > incorrect. : globus_gss_assist: Gridmap lookup failure: Could not map > /DC=org/DC=doegrids/OU=People/CN=Yue Chen 509341 > 530- > 530 End.]] > But I don't understand why it may fail, since I have performed > grid-proxy-init and gx-request commands. Is it because the ftp server > too busy? > > I also tried SDSC clusters, but I got different error message. Is this > also because of gsiftp issue? > > sites.xml --- > > > > url="tg-login1.sdsc.teragrid.org:2119/jobmanager-pbs" major="2" /> > /gpfs-wan/scratch/yuechen > fast > > tc.data --- > > SDSC_dtf_prews_pbs PTMap /home/yuechen/PTMap/PTMap > INSTALLED INTEL32::LINUX null > > Error messages and globus-job-run test: > > [yuechen at communicado PTMap]$ swift PTMap-unmod.swift -sites.file > sites.xml -tc.file tc.data > Swift 0.8 swift-r2448 cog-r2261 > RunID: 20090320-1433-oc8ufsv3 > Progress: > Progress: Selecting site:9 Stage in:1 Initializing site shared directory:1 > Progress: Selecting site:9 Stage in:2 > Progress: Selecting site:9 Stage in:2 > Progress: Selecting site:9 Stage in:2 > Progress: Selecting site:9 Stage in:2 > Progress: Selecting site:9 Stage in:1 Submitting:1 > Progress: Selecting site:9 Submitting:2 > Failed to transfer wrapper log from > PTMap-unmod-20090320-1433-oc8ufsv3/info/8 on SDSC_dtf_prews_pbs > Progress: Selecting site:8 Stage in:1 Submitting:1 Failed but can retry:1 > Progress: Selecting site:8 Submitting:1 Submitted:1 Failed but can retry:1 > Failed to transfer wrapper log from > PTMap-unmod-20090320-1433-oc8ufsv3/info/9 on SDSC_dtf_prews_pbs > Progress: Selecting site:7 Stage in:1 Submitting:1 Failed but can retry:2 
> Progress: Selecting site:7 Submitting:1 Submitted:1 Failed but can retry:2 > Failed to transfer wrapper log from > PTMap-unmod-20090320-1433-oc8ufsv3/info/d on SDSC_dtf_prews_pbs > Progress: Selecting site:7 Submitted:1 Failed but can retry:3 > Failed to transfer wrapper log from > PTMap-unmod-20090320-1433-oc8ufsv3/info/b on SDSC_dtf_prews_pbs > Progress: Selecting site:7 Failed but can retry:4 > Progress: Selecting site:7 Failed but can retry:4 > Progress: Selecting site:6 Stage in:1 Failed but can retry:4 > Progress: Selecting site:6 Submitting:1 Failed but can retry:4 > Progress: Selecting site:6 Submitted:1 Failed but can retry:4 > Failed to transfer wrapper log from > PTMap-unmod-20090320-1433-oc8ufsv3/info/g on SDSC_dtf_prews_pbs > Progress: Selecting site:6 Failed but can retry:5 > [yuechen at communicado PTMap]$ globus-job-run tg-login1.sdsc.teragrid.org > /usr/bin/id > uid=502857(yuechen) gid=5195(anl100) groups=5195(anl100) > > ------------------------------------------------------------------------ > *From:* Ben Clifford [mailto:benc at hawaga.org.uk] > *Sent:* Fri 3/20/2009 1:14 PM > *To:* Yue, Chen - BMD > *Cc:* swift user > *Subject:* Re: [Swift-user] Teragrid execution problem > > > That is an error coming from the gridftp server. > > Check that you van reliably (sveral times) use globus-url-copy to copy > files from the gridftp server that you specify in the sites file: > > gsiftp://gridftp-abe.ncsa.teragrid.org:2811 > > > ------------------------------------------------------------------------ > *From:* Ben Clifford [mailto:benc at hawaga.org.uk] > *Sent:* Fri 3/20/2009 1:14 PM > *To:* Yue, Chen - BMD > *Cc:* swift user > *Subject:* Re: [Swift-user] Teragrid execution problem > > > That is an error coming from the gridftp server. 
>
> Check that you can reliably (several times) use globus-url-copy to copy
> files from the gridftp server that you specify in the sites file:
>
> gsiftp://gridftp-abe.ncsa.teragrid.org:2811
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From wilde at mcs.anl.gov Fri Mar 20 15:22:41 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 20 Mar 2009 15:22:41 -0500
Subject: [Swift-user] Teragrid execution problem
In-Reply-To:
References:
Message-ID: <49C3FB11.5020707@mcs.anl.gov>

Yue, I *suspect* that it's simply that you are not mapped on abe.

If you are around, stop by my office, and I can help you debug this, it will go faster than email.

Looking on abe, I don't see your DOEGrids cert DN in the mapfile.

Did you do a gx-request on abe?

Can you do a globus-job-run of /usr/bin/id to the abe gatekeeper?

On 3/20/09 3:00 PM, Ben Clifford wrote:
> On Fri, 20 Mar 2009, Yue, Chen - BMD wrote:
>
>> But I don't understand why it may fail, since I have performed
>> grid-proxy-init and gx-request commands. Is it because the ftp server
>> too busy?
>
> I doubt it is because the server is busy. Email the teragrid helpdesk at
> help at teragrid.org with your globus-url-copy demonstration above.
>
>>
>> I also tried SDSC clusters, but I got different error message. Is this also because of gsiftp issue?
>> > > Those error messages look different. For that, please can you put a log > file from that run somewhere where I can see it? (for example on the CI > NFS) > From wilde at mcs.anl.gov Fri Mar 20 15:32:25 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 20 Mar 2009 15:32:25 -0500 Subject: [Swift-user] Problems in complex mapping situation In-Reply-To: References: <49C333AB.9040304@mcs.anl.gov> Message-ID: <49C3FD59.9090406@mcs.anl.gov> Initial testing locally indicates the fix works great. About to test on the BG/P at larger scale. Thanks! On 3/20/09 2:44 PM, Ben Clifford wrote: > On Fri, 20 Mar 2009, Ben Clifford wrote: > >> I've recreated this in a much smaller test case (pretty much whenever >> multidimensional arrays are used with ext (and probably other mappers)). > > Swift r2713 should fix this. > From wilde at mcs.anl.gov Fri Mar 20 17:30:30 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 20 Mar 2009 17:30:30 -0500 Subject: [Swift-user] Teragrid execution problem In-Reply-To: <49C3FB11.5020707@mcs.anl.gov> References: <49C3FB11.5020707@mcs.anl.gov> Message-ID: <49C41906.4000102@mcs.anl.gov> that was indeed the case; this particular problem is solved. On 3/20/09 3:22 PM, Michael Wilde wrote: > Yue, I *suspect* that its simply that you are not mapped on abe. > > If you are around, stop by my office, and I can help you debug this, it > will go faster than email. > > Looking on abe, I dont see your DOEGrids cert dn in the mapfile. > > Did you do a gx-request on abe? > > Can you do a globus-job-run of /usr/bin/id to the abe gatekeeper? > > On 3/20/09 3:00 PM, Ben Clifford wrote: >> On Fri, 20 Mar 2009, Yue, Chen - BMD wrote: >> >>> But I don't understand why it may fail, since I have performed >>> grid-proxy-init and gx-request commands. Is it because the ftp server >>> too busy? >> >> I doubt it is because the server is busy. Email the teragrid helpdesk >> at help at teragrid.org with your globus-url-copy demonstration above. 
>>
>>>
>>
>>> I also tried SDSC clusters, but I got different error message. Is
>>> this also because of gsiftp issue?
>>>
>>
>> Those error messages look different. For that, please can you put a
>> log file from that run somewhere where I can see it? (for example on
>> the CI NFS)
>>

From aespinosa at cs.uchicago.edu Tue Mar 31 20:39:42 2009
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 31 Mar 2009 20:39:42 -0500
Subject: [Swift-user] how to map arrays of structs?
Message-ID: <50b07b4b0903311839t4632356ld81242bd9d068360@mail.gmail.com>

I know i should be using the csv_mapper. but can an extern mapper also handle structs?

of course the script below does not compile :)

type output;
type error;
type BlastDatabase;
type BlastQuery;
type BlastResult {
  output out;
  error err;
}

(BlastResult out) blastall(BlastQuery i, BlastDatabase db[]) {
  app {
    blastall "-p" "blastp" "-F" "F" "-d" @filename(db[10]) "-i" @filename(i) "-v" "300" "-b" "300" "-m8" "-o" @filename(out) stderr=@filename(err);
  }
}

BlastDatabase pir[] ;
BlastResult out[];
out[].out;
out[].err ;
BlastQuery input[] ;

BlastResult out_test <"test.out">;
BlastQuery in_test <"test.in">;

-Allan

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago

From hategan at mcs.anl.gov Tue Mar 31 20:54:29 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 31 Mar 2009 20:54:29 -0500
Subject: [Swift-user] how to map arrays of structs?
In-Reply-To: <50b07b4b0903311839t4632356ld81242bd9d068360@mail.gmail.com>
References: <50b07b4b0903311839t4632356ld81242bd9d068360@mail.gmail.com>
Message-ID: <1238550869.7897.3.camel@localhost>

On Tue, 2009-03-31 at 20:39 -0500, Allan Espinosa wrote:
> I know i should be using the csv_mapper.
Actually, no, you probably shouldn't be using the csv mapper :) > but can an extern mapper also > handle structs? Yes. > > of course the script below does not compile :) It's not that obvious. Swift compilation has little to do with mappers. Care to share the error that you're getting? From hockyg at uchicago.edu Tue Mar 31 17:40:50 2009 From: hockyg at uchicago.edu (Glen Hocky) Date: Tue, 31 Mar 2009 17:40:50 -0500 Subject: [Swift-user] possible coasters problem Message-ID: <49D29BF2.6060101@uchicago.edu> Hi Guys, Do you think this is a problem with coasters or just the way i'm using it... Thanks, Glen > Exception in runoops: > Arguments: [input/fasta/T1ubq.fasta, > teraportoutdir.100/T1ubq/T1ubq.ST50.TU200.0000.secseq, > input/native/T1ubq.pdb, > teraportoutdir.100/T1ubq//ST50.TU200/0000/00/06/T1ubq.ST50.TU200.0000.0006.pdt, > teraportoutdir.100/T1ubq//ST50.TU200/0000/00/06/T1ubq.ST50.TU200.0000.0006.rmsd, > 6, DEFAULT_INIT_TEMP_=_50, TEMP_UPDATE_INTERVAL_=_200, > MAX_NUMBER_OF_ANNEALING_STEPS_=_0, KILL_TIME_=_55] > Host: teraport > Directory: oops-20090331-1701-fpuie7be/jobs/d/runoops-dsmccq8j > stderr.txt: > > stdout.txt: > > ---- > > Caused by: > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > java.lang.IllegalArgumentException: No worker with id=1956306968 > at > org.globus.cog.abstraction.coaster.service.job.manager.CoasterTaskHandler.submit(CoasterTaskHandler.java:85) > at > org.globus.cog.abstraction.coaster.service.job.manager.CoasterQueueProcessor.run(CoasterQueueProcessor.java:71) > Caused by: java.lang.IllegalArgumentException: No worker with > id=1956306968 > at > org.globus.cog.abstraction.coaster.service.job.manager.WorkerManager.getChannelContext(WorkerManager.java:483) > at > org.globus.cog.abstraction.coaster.service.job.manager.CoasterTaskHandler.submit(CoasterTaskHandler.java:78) > ... 1 more > > Cleaning up... 
> Shutting down service at https://128.135.125.118:55513 > Got channel MetaChannel: 22129174 -> GSSSChannel-null(1) > - Done > Command exited with non-zero status 2 > real 1628.27 > user 169.87 From benc at hawaga.org.uk Tue Mar 31 21:30:57 2009 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 1 Apr 2009 02:30:57 +0000 (GMT) Subject: [Swift-user] how to map arrays of structs? In-Reply-To: <50b07b4b0903311839t4632356ld81242bd9d068360@mail.gmail.com> References: <50b07b4b0903311839t4632356ld81242bd9d068360@mail.gmail.com> Message-ID: On Tue, 31 Mar 2009, Allan Espinosa wrote: > can an extern mapper also > handle structs? yes, but not like this: > out[].out; > out[].err ; Look at tests/language-behaviour/0755-ext-mapper.swift --
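
As a rough illustration of the ext-mapper route mentioned above (a sketch only, not taken from the referenced test file): an external mapper is a script that prints one mapping per line, a Swift path followed by a file name. The version below assumes that struct members inside an array can be addressed with paths of the form `[0].out`. That path syntax, the function name `emit_mappings`, and the `results/blast-NNNN` file names are all illustrative assumptions, so check them against the test file named above before relying on them.

```shell
#!/bin/sh
# Hypothetical external mapper script for an array of BlastResult structs.
# Assumption: the ext mapper reads "<path> <filename>" lines from stdout and
# accepts member paths such as "[0].out"; verify against the Swift test suite.

# Print one mapping line per struct member for the first $1 array elements.
emit_mappings() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        # File names are made up for illustration.
        printf '[%d].out results/blast-%04d.out\n' "$i" "$i"
        printf '[%d].err results/blast-%04d.err\n' "$i" "$i"
        i=$((i + 1))
    done
}

# Default to two elements when no count is given on the command line.
emit_mappings "${1:-2}"
```

If the assumption about the path syntax holds, a script like this would replace the `out[].out; out[].err;` lines in the original question, with the array declared through the ext mapper and pointed at the script.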