From tiejing at gmail.com Sat Aug 18 18:24:20 2007
From: tiejing at gmail.com (Jing Tie)
Date: Sat, 18 Aug 2007 18:24:20 -0500
Subject: [Swift-user] Kickstart executable not found
Message-ID:

Hi,

I am working on the SID application now. Job cwtsmall is the script
wavelet.sh on the AGLT2 site. In wavelet.sh, R runs runWaveletsAvg.R
on the input data 101_FB-epochs.Rdata and should output 28 files,
101-FBchannel1_cwt-avgResults.Rdata through
101-FBchannel28_cwt-avgResults.Rdata.

But when I ran the Swift client with kickstart.enabled = false, the job
failed with exit code 1024, and stderr.txt said: Kickstart executable
(101-FBchannel18_cwt-avgResults.Rdata) not found. Details below:

site: AGLT2
gatekeeper: gate01.aglt2.org
app_dir: /atlas/data08/OSG/APP/SIDGrid
data_dir: /atlas/data08/OSG/DATA
condor_dir: /opt/condor/bin
R_dir: /atlas/data08/OSG/APP/R-2.5.1/bin/R

output:
Application exception: Job cwtsmall failed with an exit code of 1024
sys:throw @ vdl-int.k, line: 109
vdl:checkexitcode @ vdl-int.k, line: 370
vdl:execute2 @ execute-default.k, line: 22
vdl:execute @ sid-wf1.kml, line: 20
wavelettransf @ sid-wf1.kml, line: 362
batchtrials @ sid-wf1.kml, line: 402
vdl:mains @ sid-wf1.kml, line: 399
cwtsmall failed
Provenance graph saved in sid-wf1-8cnxmo0qetg10.dot
The following errors have occurred:
1. Application "cwtsmall" failed (Job cwtsmall failed with an exit code of 1024)
Arguments: "scripts/runWaveletsAvg.R, 101, FB"
Host: NWICG_NotreDame
Directory: sid-wf1-8cnxmo0qetg10/cwtsmall-zeb72rfi
STDERR: Kickstart executable (101-FBchannel18_cwt-avgResults.Rdata) not found
STDOUT:
Errors detected. Cleanup not done.
Execution completed with errors
sys:throw @ vdl.k, line: 140
vdl:mains @ sid-wf1.kml, line: 399
at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:413)
at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:417)
at org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:28)
at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted
at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172)
at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298)
at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.executeChildren(AbstractFunction.java:37)
at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:239)
at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:280)
at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:392)
at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:331)
at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)

I found that about 8 OSG sites have this problem.
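Later in this thread the failure is attributed to the Condor job manager not properly quoting job arguments. The Python sketch below is a hedged illustration of one way that could yield a message like the one above. It assumes, hypothetically, that the wrapper receives the Kickstart path as the value of an option that is left empty when kickstart.enabled = false; an unquoted join-and-split drops the empty value, so the next token (here an output file name) is read as the Kickstart executable. The option names and argument layout are invented for illustration and are not Swift's actual wrapper interface.

```python
import shlex


def unquoted_roundtrip(args):
    """Hypothetical model of a job manager that joins arguments with
    spaces and re-splits on whitespace without quoting them."""
    return " ".join(args).split()


def quoted_roundtrip(args):
    """Quote each argument for a POSIX shell, then parse it back."""
    return shlex.split(shlex.join(args))


def option_value(argv, flag):
    """Return the token that follows `flag`, as a positional parser would."""
    return argv[argv.index(flag) + 1]


# Hypothetical wrapper arguments: "-k" carries the Kickstart path, which is
# empty because Kickstart is disabled; an output file name follows it.
args = ["-e", "R", "-k", "", "101-FBchannel18_cwt-avgResults.Rdata"]

broken = unquoted_roundtrip(args)  # the empty string disappears
intact = quoted_roundtrip(args)    # '' survives as an empty argument

print(option_value(broken, "-k"))        # the output file name, not ""
print(repr(option_value(intact, "-k")))  # ''
```

Quoting every argument (as `shlex.join` does here) preserves the empty value; the fix reported later in the thread, switching from jobmanager-condor to jobmanager, avoids the Condor job manager entirely.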

Many thanks,
Jing

From hategan at mcs.anl.gov Sun Aug 19 20:51:04 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 19 Aug 2007 20:51:04 -0500
Subject: [Swift-user] Kickstart executable not found
In-Reply-To:
References:
Message-ID: <1187574664.6412.0.camel@blabla.mcs.anl.gov>

On Sat, 2007-08-18 at 18:24 -0500, Jing Tie wrote:
> But when I ran the Swift client with kickstart.enabled = false,

Where did you set this?

Mihael

> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From hategan at mcs.anl.gov Mon Aug 20 00:06:49 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 20 Aug 2007 00:06:49 -0500
Subject: [Swift-user] Kickstart executable not found
In-Reply-To:
References: <1187574664.6412.0.camel@blabla.mcs.anl.gov>
Message-ID: <1187586409.13110.0.camel@blabla.mcs.anl.gov>

It puzzles me. Can you attach that file?

On Sun, 2007-08-19 at 21:37 -0500, Jing Tie wrote:
> in $SWIFT_HOME/etc/swift.properties
>
> Jing

From tiejing at gmail.com Mon Aug 20 00:43:08 2007
From: tiejing at gmail.com (Jing Tie)
Date: Mon, 20 Aug 2007 00:43:08 -0500
Subject: [Swift-user] Kickstart executable not found
In-Reply-To: <1187586409.13110.0.camel@blabla.mcs.anl.gov>
References: <1187574664.6412.0.camel@blabla.mcs.anl.gov>
	<1187586409.13110.0.camel@blabla.mcs.anl.gov>
Message-ID:

Sure.

On 8/20/07, Mihael Hategan wrote:
> It puzzles me. Can you attach that file?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: swift.properties
Type: application/octet-stream
Size: 5931 bytes
Desc: not available
URL:

From tiejing at gmail.com Sun Aug 19 21:37:14 2007
From: tiejing at gmail.com (Jing Tie)
Date: Sun, 19 Aug 2007 21:37:14 -0500
Subject: [Swift-user] Kickstart executable not found
In-Reply-To: <1187574664.6412.0.camel@blabla.mcs.anl.gov>
References: <1187574664.6412.0.camel@blabla.mcs.anl.gov>
Message-ID:

in $SWIFT_HOME/etc/swift.properties

Jing

On 8/19/07, Mihael Hategan wrote:
> On Sat, 2007-08-18 at 18:24 -0500, Jing Tie wrote:
> > But when I ran the Swift client with kickstart.enabled = false,
>
> Where did you set this?
>
> Mihael

From hategan at mcs.anl.gov Mon Aug 20 09:32:56 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 20 Aug 2007 09:32:56 -0500
Subject: [Swift-user] Kickstart executable not found
In-Reply-To:
References: <1187574664.6412.0.camel@blabla.mcs.anl.gov>
	<1187586409.13110.0.camel@blabla.mcs.anl.gov>
Message-ID: <1187620376.14708.2.camel@blabla.mcs.anl.gov>

Right. The Condor job manager has a bug: it does not properly quote
arguments, so you'll see strange things like this if you use it.

Mihael

On Mon, 2007-08-20 at 00:43 -0500, Jing Tie wrote:
> Sure.

From tiejing at gmail.com Mon Aug 20 11:13:03 2007
From: tiejing at gmail.com (Jing Tie)
Date: Mon, 20 Aug 2007 11:13:03 -0500
Subject: [Swift-user] Kickstart executable not found
In-Reply-To: <1187620376.14708.2.camel@blabla.mcs.anl.gov>
References: <1187574664.6412.0.camel@blabla.mcs.anl.gov>
	<1187586409.13110.0.camel@blabla.mcs.anl.gov>
	<1187620376.14708.2.camel@blabla.mcs.anl.gov>
Message-ID:

Right, it's a Condor problem. After replacing jobmanager-condor with
jobmanager, the job finished successfully.

Thanks,
Jing

On 8/20/07, Mihael Hategan wrote:
> Right. The Condor job manager has a bug: it does not properly quote
> arguments, so you'll see strange things like this if you use it.
>
> Mihael

From tiejing at gmail.com Mon Aug 20 12:07:46 2007
From: tiejing at gmail.com (Jing Tie)
Date: Mon, 20 Aug 2007 12:07:46 -0500
Subject: [Swift-user] Exception in getFile
Message-ID:

Hi,

Here is another problem. It seems like something is wrong with the GFS
system.

site: MIT_CMS
gatekeeper: ce01.cmsaf.mit.edu
app_dir: /osg/app
data_dir: /osg/data
condor_dir: /usr/local/condor/bin
R_dir: /osg/app/R-2.5.1/bin/R

output:
Application exception: Exception in getFile
task:transfer @ vdl-int.k, line: 235
vdl:dostageout @ vdl-int.k, line: 378
vdl:execute2 @ execute-default.k, line: 22
vdl:execute @ sid-wf1.kml, line: 20
wavelettransf @ sid-wf1.kml, line: 362
batchtrials @ sid-wf1.kml, line: 402
vdl:mains @ sid-wf1.kml, line: 399
Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: Exception in getFile
Caused by: org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: (error code 1)
cwtsmall failed
Provenance graph saved in sid-wf1-7thy5mbfh09e1.dot
The following errors have occurred:
1. Application "cwtsmall" failed (Exception in getFile
Caused by:
Server refused performing the request. Custom message: (error code 1) [Nested exception message: Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException : Custom message: Unexpected reply: 500-Command failed. : globus_gridftp_server_file.c:globus_l_gfs_file_send:2190:
500-globus_l_gfs_file_open failed.
500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694: 500-globus_xio_register_open failed. 500-globus_xio_file_driver.c:globus_l_xio_file_open:438: 500-Unable to open file /osgfs/data/sid-wf1-7thy5mbfh09e1/shared//101-FBchannel16_cwt- avgResults.Rdata 500-globus_xio_file_driver.c:globus_l_xio_file_open:381: 500-System error in open: No such file or directory 500-globus_xio: A system call failed: No such file or directory 500 End.]) Arguments: "scripts/runWaveletsAvg.R, 101, FB" Host: UCSDT2 Directory: sid-wf1-7thy5mbfh09e1/cwtsmall-mb3l3rfi STDERR: STDOUT: Errors detected. Cleanup not done. Execution completed with errors sys:throw @ vdl.k, line: 140 vdl:mains @ sid-wf1.kml, line: 399 at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java :413) at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java :417) at org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post ( GenerateErrorNode.java:28) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent( Sequential.java :33) at org.globus.cog.karajan.workflow.nodes.FlowNode.event( FlowNode.java:334) at org.globus.cog.karajan.workflow.events.EventBus.send( EventBus.java:123) at org.globus.cog.karajan.workflow.events.EventBus.sendHooked ( EventBus.java:97) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent( FlowNode.java:172) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete( FlowNode.java:298) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.executeChildren (AbstractFunction.java:37) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute( FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart ( FlowNode.java:239) at org.globus.cog.karajan.workflow.nodes.FlowNode.start( FlowNode.java:280) at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent( FlowNode.java:392) at 
org.globus.cog.karajan.workflow.nodes.FlowNode.event ( FlowNode.java:331) at org.globus.cog.karajan.workflow.FlowElementWrapper.event( FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.send( EventBus.java:123) at org.globus.cog.karajan.workflow.events.EventBus.sendHooked ( EventBus.java:97) at org.globus.cog.karajan.workflow.events.EventWorker.run( EventWorker.java:69) Many thanks, Jing -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Mon Aug 20 12:17:24 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 20 Aug 2007 12:17:24 -0500 Subject: [Swift-user] Exception in getFile In-Reply-To: References: Message-ID: <1187630244.20900.0.camel@blabla.mcs.anl.gov> Not much we can do if the filesystem is broken. Did you check to confirm that the file is not there? Mihael On Mon, 2007-08-20 at 12:07 -0500, Jing Tie wrote: > Hi, > > Here is another problem. It seems like something wrong with GFS > system. > > site: MIT_CMS > gatekeeper: ce01.cmsaf.mit.edu > app_dir: /osg/app > data_dir: /osg/data > condor_dir: /usr/local/condor/bin > R_dir: /osg/app/R-2.5.1/bin/R > > output: > Application exception: Exception in getFile > task:transfer @ vdl-int.k, line: 235 > vdl:dostageout @ vdl-int.k, line: 378 > vdl:execute2 @ execute-default.k, line: 22 > vdl:execute @ sid-wf1.kml, line: 20 > wavelettransf @ sid-wf1.kml, line: 362 > batchtrials @ sid-wf1.kml, line: 402 > vdl:mains @ sid-wf1.kml, line: 399 > Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: > Exception in getFile > Caused by: org.globus.ftp.exception.ServerException: Server refused > performing the request. Custom message: (error code 1) cwtsmall > failed > Provenance graph saved in sid-wf1-7thy5mbfh09e1.dot > The following errors have occurred: > 1. Application "cwtsmall" failed (Exception in getFile > Caused by: > Server refused performing the request. 
Custom message: (error code > 1) > [Nested exception message: Nested exception is > org.globus.ftp.exception.UnexpectedReplyCodeException : > Custom message: Unexpected reply: > 500-Command failed. : > globus_gridftp_server_file.c:globus_l_gfs_file_send:2190: > 500-globus_l_gfs_file_open failed. > 500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694: > 500-globus_xio_register_open failed. > 500-globus_xio_file_driver.c:globus_l_xio_file_open:438: > 500-Unable to open > file /osgfs/data/sid-wf1-7thy5mbfh09e1/shared//101-FBchannel16_cwt-avgResults.Rdata > 500-globus_xio_file_driver.c:globus_l_xio_file_open:381: > 500-System error in open: No such file or directory > 500-globus_xio: A system call failed: No such file or directory > 500 End.]) > Arguments: "scripts/runWaveletsAvg.R, 101, FB" > Host: UCSDT2 > Directory: sid-wf1-7thy5mbfh09e1/cwtsmall-mb3l3rfi > STDERR: > STDOUT: > Errors detected. Cleanup not done. > Execution completed with errors > sys:throw @ vdl.k, line: 140 > vdl:mains @ sid-wf1.kml, line: 399 > at > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:413) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:417) > at > org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post > (GenerateErrorNode.java:28) > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted > at > org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java :33) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334) > at > org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123) > at org.globus.cog.karajan.workflow.events.EventBus.sendHooked > (EventBus.java:97) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298) > at > 
org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.executeChildren(AbstractFunction.java:37) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > at org.globus.cog.karajan.workflow.nodes.FlowNode.restart > (FlowNode.java:239) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:280) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:392) > at org.globus.cog.karajan.workflow.nodes.FlowNode.event > (FlowNode.java:331) > at > org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227) > at > org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123) > at org.globus.cog.karajan.workflow.events.EventBus.sendHooked > (EventBus.java:97) > at > org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69) > > Many thanks, > Jing > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From tiejing at gmail.com Mon Aug 20 12:21:21 2007 From: tiejing at gmail.com (Jing Tie) Date: Mon, 20 Aug 2007 12:21:21 -0500 Subject: [Swift-user] Exception in getFile In-Reply-To: <1187630244.20900.0.camel@blabla.mcs.anl.gov> References: <1187630244.20900.0.camel@blabla.mcs.anl.gov> Message-ID: Yes. There is no *avgResults.Rdata under shared directory, only input file, scripts, wrapper.sh and seq.sh. Jing On 8/20/07, Mihael Hategan wrote: > > Not much we can do if the filesystem is broken. > Did you check to confirm that the file is not there? > > Mihael > > On Mon, 2007-08-20 at 12:07 -0500, Jing Tie wrote: > > Hi, > > > > Here is another problem. It seems like something wrong with GFS > > system. 
> > > > site: MIT_CMS > > gatekeeper: ce01.cmsaf.mit.edu > > app_dir: /osg/app > > data_dir: /osg/data > > condor_dir: /usr/local/condor/bin > > R_dir: /osg/app/R-2.5.1/bin/R > > > > output: > > Application exception: Exception in getFile > > task:transfer @ vdl-int.k, line: 235 > > vdl:dostageout @ vdl-int.k, line: 378 > > vdl:execute2 @ execute-default.k, line: 22 > > vdl:execute @ sid-wf1.kml, line: 20 > > wavelettransf @ sid-wf1.kml, line: 362 > > batchtrials @ sid-wf1.kml, line: 402 > > vdl:mains @ sid-wf1.kml, line: 399 > > Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: > > Exception in getFile > > Caused by: org.globus.ftp.exception.ServerException: Server refused > > performing the request. Custom message: (error code 1) cwtsmall > > failed > > Provenance graph saved in sid-wf1-7thy5mbfh09e1.dot > > The following errors have occurred: > > 1. Application "cwtsmall" failed (Exception in getFile > > Caused by: > > Server refused performing the request. Custom message: (error code > > 1) > > [Nested exception message: Nested exception is > > org.globus.ftp.exception.UnexpectedReplyCodeException : > > Custom message: Unexpected reply: > > 500-Command failed. : > > globus_gridftp_server_file.c:globus_l_gfs_file_send:2190: > > 500-globus_l_gfs_file_open failed. > > 500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694: > > 500-globus_xio_register_open failed. > > 500-globus_xio_file_driver.c:globus_l_xio_file_open:438: > > 500-Unable to open > > file /osgfs/data/sid-wf1-7thy5mbfh09e1/shared//101-FBchannel16_cwt- > avgResults.Rdata > > 500-globus_xio_file_driver.c:globus_l_xio_file_open:381: > > 500-System error in open: No such file or directory > > 500-globus_xio: A system call failed: No such file or directory > > 500 End.]) > > Arguments: "scripts/runWaveletsAvg.R, 101, FB" > > Host: UCSDT2 > > Directory: sid-wf1-7thy5mbfh09e1/cwtsmall-mb3l3rfi > > STDERR: > > STDOUT: > > Errors detected. Cleanup not done. 
> > Execution completed with errors > > sys:throw @ vdl.k, line: 140 > > vdl:mains @ sid-wf1.kml, line: 399 > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:413) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:417) > > at > > org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post > > (GenerateErrorNode.java:28) > > at > > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent( > Sequential.java :33) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334) > > at > > org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123) > > at org.globus.cog.karajan.workflow.events.EventBus.sendHooked > > (EventBus.java:97) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent( > FlowNode.java:172) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java > :298) > > at > > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.executeChildren > (AbstractFunction.java:37) > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute( > FlowContainer.java:63) > > at org.globus.cog.karajan.workflow.nodes.FlowNode.restart > > (FlowNode.java:239) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:280) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent( > FlowNode.java:392) > > at org.globus.cog.karajan.workflow.nodes.FlowNode.event > > (FlowNode.java:331) > > at > > org.globus.cog.karajan.workflow.FlowElementWrapper.event( > FlowElementWrapper.java:227) > > at > > org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123) > > at org.globus.cog.karajan.workflow.events.EventBus.sendHooked > > (EventBus.java:97) > > at > > org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java > :69) > > > > Many thanks, > > Jing > > 
_______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tiejing at gmail.com Mon Aug 20 12:22:03 2007 From: tiejing at gmail.com (Jing Tie) Date: Mon, 20 Aug 2007 12:22:03 -0500 Subject: [Swift-user] Kickstart executable not found In-Reply-To: References: <1187574664.6412.0.camel@blabla.mcs.anl.gov> <1187586409.13110.0.camel@blabla.mcs.anl.gov> <1187620376.14708.2.camel@blabla.mcs.anl.gov> Message-ID: Hi, There is one site running the application successfully with jobmanager-condor: site: GLOW gatekeeper: cmsgrid01.hep.wisc.edu app_dir: /afs/hep.wisc.edu/osg/app data_dir: /afs/hep.wisc.edu/osg/data condor_dir: /condor/bin R_dir: /afs/hep.wisc.edu/osg/app/R-2.5.1/bin/R Maybe it has some special configurations or arguments. Jing On 8/20/07, Jing Tie wrote: > > Right, it's the problem of condor. After replacing jobmanager-condor > with jobmanager, the job finished successfully. > > Thanks, > Jing > > On 8/20/07, Mihael Hategan < hategan at mcs.anl.gov> wrote: > > Right. The condor job manager has a bug. It does not properly quote > > arguments. So you'll see strange things like this if you use it. > > > > Mihael > > > > On Mon, 2007-08-20 at 00:43 -0500, Jing Tie wrote: > > > Sure. > > > > > > On 8/20/07, Mihael Hategan wrote: > > > > It puzzles me. Can you attach that file? > > > > > > > > On Sun, 2007-08-19 at 21:37 -0500, Jing Tie wrote: > > > > > in $SWIFT_HOME/etc/swift.properties > > > > > > > > > > > > > > > Jing > > > > > > > > > > On 8/19/07, Mihael Hategan wrote: > > > > > > On Sat, 2007-08-18 at 18:24 -0500, Jing Tie wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I am working on SID application now. Job cwtsmall is a script > > > > > > > wavelet.sh on AGLT2 site. 
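Mihael's explanation quoted above is that the Condor job manager does not quote job arguments properly. One plausible way this can produce an error like "Kickstart executable (101-FBchannel18_cwt-avgResults.Rdata) not found" is sketched below in Python, using only the standard `shlex` module. If an argument vector is flattened without quoting and then re-split on whitespace, an empty argument disappears and a space-separated file list breaks apart, so later positional slots pick up the wrong values. The `-out`/`-k`/`-e` layout and `parse_wrapper_argv` are hypothetical and are not Swift's actual wrapper.sh interface; the thread does not confirm this exact mechanism.

```python
import shlex


# Hypothetical wrapper convention (NOT Swift's real wrapper.sh interface):
#   -out "<space-separated output files>" -k "<kickstart path, empty if disabled>"
#   -e <executable> <application args...>
def parse_wrapper_argv(argv):
    # Reads values purely by position, as a simple shell wrapper might.
    return {
        "outputs": argv[1].split(),
        "kickstart": argv[3],
        "executable": argv[5],
        "app_args": argv[6:],
    }


def naive_join(argv):
    # Flattens the vector with no quoting: when the string is re-parsed,
    # empty arguments vanish and space-containing arguments are split apart.
    return " ".join(argv)


def quoted_join(argv):
    # Quotes every argument so the vector survives a round trip through a shell.
    return " ".join(shlex.quote(a) for a in argv)


outputs = " ".join(f"101-FBchannel{n}_cwt-avgResults.Rdata" for n in (1, 2, 3))
argv = ["-out", outputs, "-k", "", "-e", "/path/to/R",
        "scripts/runWaveletsAvg.R", "101", "FB"]

good = parse_wrapper_argv(shlex.split(quoted_join(argv)))
bad = parse_wrapper_argv(shlex.split(naive_join(argv)))

print("quoted:  ", repr(good["kickstart"]), good["executable"])
print("unquoted:", repr(bad["kickstart"]), bad["executable"])
```

With quoting, the kickstart slot stays empty and the executable is `/path/to/R`; without it, the kickstart slot receives `101-FBchannel3_cwt-avgResults.Rdata` and every later field shifts by one or more positions.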
From hategan at mcs.anl.gov Mon Aug 20 12:23:44 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 20 Aug 2007 12:23:44 -0500
Subject: [Swift-user] Exception in getFile
In-Reply-To:
References: <1187630244.20900.0.camel@blabla.mcs.anl.gov>
Message-ID: <1187630624.21340.0.camel@blabla.mcs.anl.gov>

On Mon, 2007-08-20 at 12:21 -0500, Jing Tie wrote:
> Yes. There is no *avgResults.Rdata under shared directory, only input
> file, scripts, wrapper.sh and seq.sh.

Did the job actually run?
From tiejing at gmail.com Mon Aug 20 12:28:53 2007
From: tiejing at gmail.com (Jing Tie)
Date: Mon, 20 Aug 2007 12:28:53 -0500
Subject: [Swift-user] Exception in getFile
In-Reply-To: <1187630624.21340.0.camel@blabla.mcs.anl.gov>
References: <1187630244.20900.0.camel@blabla.mcs.anl.gov> <1187630624.21340.0.camel@blabla.mcs.anl.gov>
Message-ID:

Yes. I saw the 28 output files 101-FBchannel1_cwt-avgResults.Rdata to 101-FBchannel28_cwt-avgResults.Rdata on the Swift client, but all of them were empty.

Jing

On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>
> On Mon, 2007-08-20 at 12:21 -0500, Jing Tie wrote:
> > Yes. There is no *avgResults.Rdata under shared directory, only input
> > file, scripts, wrapper.sh and seq.sh.
>
> Did the job actually run?
From hategan at mcs.anl.gov Mon Aug 20 12:52:34 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 20 Aug 2007 12:52:34 -0500
Subject: [Swift-user] Exception in getFile
In-Reply-To:
References: <1187630244.20900.0.camel@blabla.mcs.anl.gov> <1187630624.21340.0.camel@blabla.mcs.anl.gov>
Message-ID: <1187632354.22920.0.camel@blabla.mcs.anl.gov>

But those are not from the same job.

On Mon, 2007-08-20 at 12:28 -0500, Jing Tie wrote:
> Yes. I saw 101-FBchannel1_cwt-avgResults.Rdata to
> 101-FBchannel28_cwt-avgResults.Rdata 28 output files on the swift
> client, but all the files were empty.
>
> Jing
From tiejing at gmail.com Mon Aug 20 13:43:11 2007
From: tiejing at gmail.com (Jing Tie)
Date: Mon, 20 Aug 2007 13:43:11 -0500
Subject: [Swift-user] Exception in getFile
In-Reply-To: <1187632354.22920.0.camel@blabla.mcs.anl.gov>
References: <1187630244.20900.0.camel@blabla.mcs.anl.gov> <1187630624.21340.0.camel@blabla.mcs.anl.gov> <1187632354.22920.0.camel@blabla.mcs.anl.gov>
Message-ID:

I think these files were from the job, because I deleted all the *Results.Rdata files before submitting the job and found these empty files after the execution.
output of the process of execution:
RunID: 3szhlhvg4seu0
cwtsmall started
Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) setting status to Active
Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) setting status to Completed
Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Submitted
Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Active
Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Completed
...
Task(type=2, identity=urn:0-0-0-1-0-1-0-1-1187633646453) setting status to Completed
Staged in scripts/runWaveletsAvg.R to sid-wf1-3szhlhvg4seu0/shared/scripts/ on MIT_CMS
Running job cwtsmall-gt3062gi cwtsmall with arguments [scripts/runWaveletsAvg.R, 101, FB] in sid-wf1-3szhlhvg4seu0/cwtsmall-gt3062gi on MIT_CMS
Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Submitted
Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Active
Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Completed
Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) setting status to Active
Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) setting status to Completed
Completed job cwtsmall-gt3062gi cwtsmall with arguments [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS
Staging out sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt-avgResults.Rdata to 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS
Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) setting status to Active
Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) setting status to Completed
......
Task(type=2, identity=urn:0-0-0-1-0-1-0-23-1187633646557) setting status to Active
Task(type=2, identity=urn:0-0-0-1-0-1-0-22-1187633646554) setting status to Failed Exception in getFile
Task(type=2, identity=urn:0-0-0-1-0-1-0-2-1187633646560) setting status to Submitted
......
Thanks,
Jing

On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> But those are not from the same job.

From hategan at mcs.anl.gov Mon Aug 20 14:03:09 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 20 Aug 2007 14:03:09 -0500
Subject: [Swift-user] Exception in getFile
In-Reply-To:
References: <1187630244.20900.0.camel@blabla.mcs.anl.gov> <1187630624.21340.0.camel@blabla.mcs.anl.gov> <1187632354.22920.0.camel@blabla.mcs.anl.gov>
Message-ID: <1187636589.25702.1.camel@blabla.mcs.anl.gov>

Local empty files may be created even if the remote files don't exist, so don't take them as a sign that the application has run.

In the meantime, I'll try to convince it not to create empty local files if they don't exist remotely.

Mihael

On Mon, 2007-08-20 at 13:43 -0500, Jing Tie wrote:
> I think these files were from the job, because I deleted all the
> *Results.Rdata files before submitting the job and found these empty
> files after the execution.
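Following the point above that an empty local file is not evidence the application ran, the staged-out results can be checked on the client side. The sketch below is a hypothetical helper, not part of Swift. It assumes the files land in the current directory under the names mentioned in this thread (101-FBchannel1_cwt-avgResults.Rdata through 101-FBchannel28_cwt-avgResults.Rdata), and it reports each one as missing, empty, or containing data.

```python
import os


def expected_outputs(subject="101", condition="FB", channels=28):
    """Names of the per-channel result files described in the thread,
    e.g. 101-FBchannel1_cwt-avgResults.Rdata ... 101-FBchannel28_cwt-avgResults.Rdata."""
    return [f"{subject}-{condition}channel{ch}_cwt-avgResults.Rdata"
            for ch in range(1, channels + 1)]


def check_staged_outputs(directory, names):
    """Classify each expected file as 'missing', 'empty' or 'ok'.

    A zero-length file is flagged rather than trusted, because stage-out
    may leave an empty local file even when the remote file never existed."""
    report = {}
    for name in names:
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            report[name] = "missing"
        elif os.path.getsize(path) == 0:
            report[name] = "empty"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    names = expected_outputs()
    report = check_staged_outputs(".", names)
    suspect = [n for n in names if report[n] != "ok"]
    for name in suspect:
        print(f"{report[name]:8} {name}")
    print(f"{len(names) - len(suspect)} of {len(names)} outputs contain data")
```

Run from the client directory after a workflow finishes. "empty" entries point toward a remote file that was never produced, though they do not say why.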
From hategan at mcs.anl.gov Mon Aug 20 14:43:54 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 20 Aug 2007 14:43:54 -0500
Subject: [Swift-user] Exception in getFile
In-Reply-To:
References: <1187630244.20900.0.camel@blabla.mcs.anl.gov> <1187630624.21340.0.camel@blabla.mcs.anl.gov> <1187632354.22920.0.camel@blabla.mcs.anl.gov> <1187636589.25702.1.camel@blabla.mcs.anl.gov>
Message-ID: <1187639034.28035.1.camel@blabla.mcs.anl.gov>

No. Swift will always try to stage out the output files if it has no indication that something went wrong with the job. But if the filesystem is broken, and the files are not actually there, well, that's what you seem to be observing.

On Mon, 2007-08-20 at 14:36 -0500, Jing Tie wrote:
> I see. Could this output be viewed as a sign?
>
> Completed job cwtsmall-gt3062gi cwtsmall with arguments
> [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS
> Staging out
> sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt-avgResults.Rdata to
> 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS
>
> Thanks,
> Jing
>
> On 8/20/07, Mihael Hategan wrote:
> > Local empty files may be created even if the remote files don't exist.
> > So don't take that as a sign that the application has run.
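One way to give Swift the "indication that something went wrong" described above is to make the job itself fail when its outputs are absent. The following is a hypothetical sketch, not something Swift provides. It assumes Python 3 is available on the worker node and that a script like this (called, say, check_outputs.py) runs at the end of wavelet.sh in the directory where R writes its results. It exits non-zero if any named file is missing or empty. A non-zero exit status is what produces the "failed with an exit code" errors seen earlier in this archive, so the job would be reported as failed instead of appearing to succeed.

```python
import os
import sys


def missing_or_empty(paths):
    """Return the paths that are not regular files or have zero length."""
    return [p for p in paths if not os.path.isfile(p) or os.path.getsize(p) == 0]


def main(paths):
    """Report unusable outputs on stderr; return 1 if there are any, else 0."""
    bad = missing_or_empty(paths)
    for p in bad:
        print(f"expected output not produced: {p}", file=sys.stderr)
    return 1 if bad else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only act when file names are given. A non-zero exit status makes the
    # job report a failure instead of appearing to have completed.
    sys.exit(main(sys.argv[1:]))
```

In bash, brace expansion produces all 28 names whether or not the files exist, e.g. `python3 check_outputs.py 101-FBchannel{1..28}_cwt-avgResults.Rdata || exit 1` at the end of wavelet.sh. A plain `*` glob would silently skip the missing files. Whether wavelet.sh runs under bash on every site is itself an assumption.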
> > > > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298) > > > > > at > > > > > > > > > > > > > > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.executeChildren (AbstractFunction.java :37) > > > > > at > > > > > > > > > > > > > > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute > (FlowContainer.java:63) > > > > > at > > > > > > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.restart > > > > > ( FlowNode.java :239) > > > > > at > > > > > > > > > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.start > > ( FlowNode.java :280) > > > > > at > > > > > > > > > > > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent > (FlowNode.java:392) > > > > > at > > > > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.event > > > > > ( FlowNode.java:331) > > > > > at > > > > > > > > > > > > > > > org.globus.cog.karajan.workflow.FlowElementWrapper.event > (FlowElementWrapper.java:227) > > > > > at > > > > > > > > > > > > > > org.globus.cog.karajan.workflow.events.EventBus.send > (EventBus.java:123) > > > > > at > > > > > > > > > > org.globus.cog.karajan.workflow.events.EventBus.sendHooked > > > > > ( EventBus.java:97) > > > > > at > > > > > > > > > > > org.globus.cog.karajan.workflow.events.EventWorker.run > > > > ( EventWorker.java:69) > > > > > > > > > > Many thanks, > > > > > Jing > > > > > > > _______________________________________________ > > > > > Swift-user mailing list > > > > > Swift-user at ci.uchicago.edu > > > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > > > > > > > > > > > > > > > > From tiejing at gmail.com Mon Aug 20 14:55:15 2007 From: tiejing at gmail.com (Jing Tie) Date: Mon, 20 Aug 2007 14:55:15 -0500 Subject: [Swift-user] Exception in getFile In-Reply-To: <1187639034.28035.1.camel@blabla.mcs.anl.gov> References: <1187630244.20900.0.camel@blabla.mcs.anl.gov> <1187630624.21340.0.camel@blabla.mcs.anl.gov> 
<1187632354.22920.0.camel@blabla.mcs.anl.gov> <1187636589.25702.1.camel@blabla.mcs.anl.gov> <1187639034.28035.1.camel@blabla.mcs.anl.gov> Message-ID: I see. So at this point, the problem could be caused by two reasons: 1. GFS system is broken, and missed the output files; 2. Swift has problem to create output files. Is it right? Thanks, Jing On 8/20/07, Mihael Hategan wrote: > > No. Swift will always try to stage out the output files if it has no > indication that something went wrong with the job. But if the filesystem > is broken, and the files are not actually there, well, that's what you > seem to be observing. > > On Mon, 2007-08-20 at 14:36 -0500, Jing Tie wrote: > > I see. Could this output be viewed as a sign? > > > > Completed job cwtsmall-gt3062gi cwtsmall with arguments > > [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS > > Staging out > > sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt-avgResults.Rdata to > > 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS > > > > Thanks, > > Jing > > > > On 8/20/07, Mihael Hategan wrote: > > Local empty files may be created even if the remote files > > don't exist. > > So don't take that as a sign that the application has run. > > > > In the mean time I'll try to convince it to not create empty > > local > > files, if they don't exist remotely. > > > > Mihael > > > > On Mon, 2007-08-20 at 13:43 -0500, Jing Tie wrote: > > > I think these files were from the job. Because I deleted all > > the > > > *Results.Rdata before the job submitting, and found these > > empty files > > > after the execution. 
> > > > > > output of the process of execution: > > > RunID: 3szhlhvg4seu0 > > > cwtsmall started > > > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) > > setting status > > > to Active > > > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) > > setting status > > > to Completed > > > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) > > setting status > > > to Submitted > > > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) > > setting status > > > to Active > > > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) > > setting status > > > to Completed > > > ... > > > Task(type=2, identity=urn:0-0-0-1-0-1-0-1-1187633646453) > > setting > > > status to Completed > > > Staged in scripts/runWaveletsAvg.R to > > > sid-wf1-3szhlhvg4seu0/shared/scripts/ on MIT_CMS > > > Running job cwtsmall-gt3062gi cwtsmall with arguments > > > [scripts/runWaveletsAvg.R, 101, FB] in > > > sid-wf1-3szhlhvg4seu0/cwtsmall-gt3062gi on MIT_CMS > > > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) > > setting status > > > to Submitted > > > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) > > setting status > > > to Active > > > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) > > setting status > > > to Completed > > > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) > > setting status > > > to Active > > > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) > > setting status > > > to Completed > > > Completed job cwtsmall-gt3062gi cwtsmall with arguments > > > [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS > > > Staging out > > > > > sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt- > avgResults.Rdata to > > > 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS > > > Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) > > setting > > > status to Active > > > Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) > > setting > > > status to Completed > > > ...... 
> > > Task(type=2, identity=urn:0-0-0-1-0-1-0-23-1187633646557) setting status to Active
> > > Task(type=2, identity=urn:0-0-0-1-0-1-0-22-1187633646554) setting status to Failed Exception in getFile
> > > Task(type=2, identity=urn:0-0-0-1-0-1-0-2-1187633646560) setting status to Submitted
> > > ......
> > >
> > > Thanks,
> > > Jing
> > >
> > > On 8/20/07, Mihael Hategan < hategan at mcs.anl.gov> wrote:
> > > But those are not from the same job.
> > >
> > > On Mon, 2007-08-20 at 12:28 -0500, Jing Tie wrote:
> > > > Yes. I saw 101-FBchannel1_cwt-avgResults.Rdata to
> > > > 101-FBchannel28_cwt-avgResults.Rdata 28 output files on the swift
> > > > client, but all the files were empty.
> > > >
> > > > Jing
> > > >
> > > >
> > > > On 8/20/07, Mihael Hategan < hategan at mcs.anl.gov> wrote:
> > > > On Mon, 2007-08-20 at 12:21 -0500, Jing Tie wrote:
> > > > > Yes. There is no * avgResults.Rdata under shared directory, only input
> > > > > file, scripts, wrapper.sh and seq.sh.
> > > >
> > > > Did the job actually run?
> > > >
> > > > >
> > > > > Jing
> > > > >
> > > > > On 8/20/07, Mihael Hategan < hategan at mcs.anl.gov> wrote:
> > > > > Not much we can do if the filesystem is broken.
> > > > > Did you check to confirm that the file is not there?
> > > > >
> > > > > Mihael
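Whether the outputs were ever written on the remote site can be checked directly, independent of the (possibly empty) local copies the Swift client leaves behind. The shell sketch below is a hypothetical helper, not part of Swift: `check_outputs` and `expected_outputs` are illustrative names, and the file-name pattern `<subject>-<cond>channel<N>_cwt-avgResults.Rdata` is inferred from the 28 outputs named in this thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of Swift: list expected outputs that are
# missing or empty in a run's shared directory on the execution site.

# expected_outputs SUBJECT COND COUNT
# Prints SUBJECT-CONDchannelN_cwt-avgResults.Rdata for N = 1..COUNT
# (naming pattern assumed from the files mentioned in the thread).
expected_outputs() {
    local subject="$1" cond="$2" count="$3" i=1
    while [ "$i" -le "$count" ]; do
        echo "${subject}-${cond}channel${i}_cwt-avgResults.Rdata"
        i=$((i + 1))
    done
}

# check_outputs DIR FILE...
# Prints "MISSING name" or "EMPTY name" for each problem file and
# returns 0 only if every file exists and is non-empty.
check_outputs() {
    local dir="$1" status=0 f
    shift
    for f in "$@"; do
        if [ ! -e "$dir/$f" ]; then
            echo "MISSING $f"
            status=1
        elif [ ! -s "$dir/$f" ]; then
            echo "EMPTY $f"
            status=1
        fi
    done
    return "$status"
}
```

Run where the site's data_dir is mounted, for example `check_outputs /mnt/hep/osg/sid-wf1-lyk35d4m9l2y0/shared $(expected_outputs 101 FB 28)`. If every file comes back MISSING, stage-out is not the culprit; the result points back at the two possibilities discussed in the thread (the job not producing the files, or a filesystem problem on the site).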
From hategan at mcs.anl.gov Mon Aug 20 14:58:47 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 20 Aug 2007 14:58:47 -0500
Subject: [Swift-user] Exception in getFile
In-Reply-To:
References: <1187630244.20900.0.camel@blabla.mcs.anl.gov>
<1187630624.21340.0.camel@blabla.mcs.anl.gov>
<1187632354.22920.0.camel@blabla.mcs.anl.gov>
<1187636589.25702.1.camel@blabla.mcs.anl.gov>
<1187639034.28035.1.camel@blabla.mcs.anl.gov>
Message-ID: <1187639927.28827.1.camel@blabla.mcs.anl.gov>

On Mon, 2007-08-20 at 14:55 -0500, Jing Tie wrote:
> I see. So at this point, the problem could be caused by two reasons:
> 1. GFS system is broken, and missed the output files;
> 2. Swift has problem to create output files.
>
> Is it right?

Swift doesn't really create output files. It's the application that
does. So I don't see how (2) can be the problem.

There are other possibilities, including the application not actually
having run correctly, and thus not having produced the output files.

>
> Thanks,
> Jing
>
> On 8/20/07, Mihael Hategan wrote:
> No. Swift will always try to stage out the output files if it has no
> indication that something went wrong with the job. But if the filesystem
> is broken, and the files are not actually there, well, that's what you
> seem to be observing.
>
> On Mon, 2007-08-20 at 14:36 -0500, Jing Tie wrote:
> > I see. Could this output be viewed as a sign?
> >
> > Completed job cwtsmall-gt3062gi cwtsmall with arguments
> > [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS
> > Staging out
> > sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt-avgResults.Rdata to
> > 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS
> >
> > Thanks,
> > Jing
> >
> > On 8/20/07, Mihael Hategan < hategan at mcs.anl.gov> wrote:
> > Local empty files may be created even if the remote files
> > don't exist.
> > So don't take that as a sign that the application has run.
> >
> > In the mean time I'll try to convince it to not create empty
> > local
> > files, if they don't exist remotely.
> >
> > Mihael
> >
> > On Mon, 2007-08-20 at 13:43 -0500, Jing Tie wrote:
> > > I think these files were from the job. Because I deleted all the
> > > *Results.Rdata before the job submitting, and found these empty files
> > > after the execution.
> > >
> > > output of the process of execution:
> > > RunID: 3szhlhvg4seu0
> > > cwtsmall started
> > > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) setting status to Active
> > > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) setting status to Completed
> > > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Submitted
> > > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Active
> > > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Completed
> > > ...
> > > Task(type=2, identity=urn:0-0-0-1-0-1-0-1-1187633646453) setting status to Completed
> > > Staged in scripts/runWaveletsAvg.R to
> > > sid-wf1-3szhlhvg4seu0/shared/scripts/ on MIT_CMS
> > > Running job cwtsmall-gt3062gi cwtsmall with arguments
> > > [scripts/runWaveletsAvg.R, 101, FB] in
> > > sid-wf1-3szhlhvg4seu0/cwtsmall-gt3062gi on MIT_CMS
> > > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Submitted
> > > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Active
> > > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Completed
> > > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) setting status to Active
> > > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) setting status to Completed
> > > Completed job cwtsmall-gt3062gi cwtsmall with arguments
> > > [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS
> > > Staging out
> > > sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt-avgResults.Rdata to
> > > 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS
> > > Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) setting status to Active
> > > Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) setting status to Completed
> > > ......
> > > Task(type=2, identity=urn:0-0-0-1-0-1-0-23-1187633646557) setting status to Active
> > > Task(type=2, identity=urn:0-0-0-1-0-1-0-22-1187633646554) setting status to Failed Exception in getFile
> > > Task(type=2, identity=urn:0-0-0-1-0-1-0-2-1187633646560) setting status to Submitted
> > > ......
> > >
> > > Thanks,
> > > Jing
> > >
> > > On 8/20/07, Mihael Hategan < hategan at mcs.anl.gov> wrote:
> > > But those are not from the same job.
> > >
> > > On Mon, 2007-08-20 at 12:28 -0500, Jing Tie wrote:
> > > > Yes. I saw 101-FBchannel1_cwt-avgResults.Rdata to
> > > > 101-FBchannel28_cwt-avgResults.Rdata 28 output files on the swift
> > > > client, but all the files were empty.
> > > >
> > > > Jing
> > > >
> > > >
> > > > On 8/20/07, Mihael Hategan < hategan at mcs.anl.gov> wrote:
> > > > On Mon, 2007-08-20 at 12:21 -0500, Jing Tie wrote:
> > > > > Yes. There is no * avgResults.Rdata under shared directory, only input
> > > > > file, scripts, wrapper.sh and seq.sh.
> > > >
> > > > Did the job actually run?
> > > >
> > > > >
> > > > > Jing
> > > > >
> > > > > On 8/20/07, Mihael Hategan < hategan at mcs.anl.gov> wrote:
> > > > > Not much we can do if the filesystem is broken.
> > > > > Did you check to confirm that the file is not there?
> > > > >
> > > > > Mihael

From tiejing at gmail.com Tue Aug 21 12:18:48 2007
From: tiejing at gmail.com (Jing Tie)
Date: Tue, 21 Aug 2007 12:18:48 -0500
Subject: [Swift-user] cannot find R executable on WNs
Message-ID:

Hi,

When running SID application on GROW-UNI-P site, the WN got the wrong
R directory. Here is the detail:

site: GROW-UNI-P
gatekeeper: grow.cs.uni.edu
app_dir: /mnt/nfs/user/worker/app
data_dir: /mnt/nfs/user/worker/data
pbs_dir: /usr/local/bin

output:
-----------------------
Application exception: Job cwtsmall failed with an exit code of 1
sys:throw @ vdl-int.k, line: 109
vdl:checkexitcode @ vdl-int.k, line: 370
vdl:execute2 @ execute-default.k, line: 22
vdl:execute @ sid-wf1.kml, line: 20
wavelettransf @ sid-wf1.kml, line: 362
batchtrials @ sid-wf1.kml, line: 402
vdl:mains @ sid-wf1.kml, line: 399
cwtsmall failed
Provenance graph saved in sid-wf1-9cs3c2egp8fi1.dot
The following errors have occurred:
1. Application "cwtsmall" failed (Job cwtsmall failed with an exit code of 1)
Arguments: "scripts/runWaveletsAvg.R, 101, FB"
Host: GROW-UNI-P
Directory: sid-wf1-9cs3c2egp8fi1/cwtsmall-33i5bqfi
STDERR:
STDOUT:
Errors detected. Cleanup not done.
Execution completed with errors
...
----------------------
globus-job-run cat /mnt/nfs/user/worker/data/sid-wf1-9cs3c2egp8fi1/cwtsmall-33i5bqfi/runWaveletsAvg.Rout:
/mnt/nfs/user/worker/app/R-2.5.1/lib64/R/bin/R: line 199: /mnt/nfs/user/worker/app/R-2.5.1/lib64/R/bin/exec/R: cannot execute binary file
/mnt/nfs/user/worker/app/R-2.5.1/lib64/R/bin/R: line 199: /mnt/nfs/user/worker/app/R-2.5.1/lib64/R/bin/exec/R: Success
----------------------

OSG admin has confirmed that there are two different clusters on the
back of this CE. So this might be the reason not finding R execute binary
file on WNs. Should I do some mappings on the CE/WN?

Many thanks,
Jing
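The `cannot execute binary file` lines in the `.Rout` output above are what bash typically prints when asked to run a binary built for a different CPU architecture (for example, a 64-bit R build on a 32-bit node), which would fit the admin's note about two different clusters behind the same gatekeeper. One way to handle that is to pick an R build per node architecture at run time. The sketch below assumes one R build per architecture under app_dir; the directory names `R-2.5.1-x86_64` and `R-2.5.1-i686` and the function name `pick_r` are hypothetical, since the thread only shows a single build under `R-2.5.1/lib64`.

```shell
#!/bin/sh
# Hypothetical layout: one R build per architecture under APP_DIR, e.g.
#   $APP_DIR/R-2.5.1-x86_64/bin/R  and  $APP_DIR/R-2.5.1-i686/bin/R
# (these directory names are assumptions, not taken from the site).

# pick_r APP_DIR [ARCH]
# Prints the path of an R binary matching ARCH (defaults to this node's
# `uname -m`); prints a message on stderr and returns 1 otherwise.
pick_r() {
    local app_dir="$1"
    local arch="${2:-$(uname -m)}"
    local candidate
    case "$arch" in
        x86_64) candidate="$app_dir/R-2.5.1-x86_64/bin/R" ;;
        i?86)   candidate="$app_dir/R-2.5.1-i686/bin/R" ;;
        *)      echo "pick_r: unsupported architecture: $arch" >&2
                return 1 ;;
    esac
    if [ -x "$candidate" ]; then
        echo "$candidate"
    else
        echo "pick_r: no R build for $arch at $candidate" >&2
        return 1
    fi
}
```

A job script such as wavelet.sh could then use `R_BIN="$(pick_r /mnt/nfs/user/worker/app)" || exit 1` and run `"$R_BIN"` in place of a hard-coded R path, so that a job landing on a node without a matching build stops with an explicit message on stderr.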
From hategan at mcs.anl.gov Tue Aug 21 12:21:55 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 21 Aug 2007 12:21:55 -0500
Subject: [Swift-user] cannot find R executable on WNs
In-Reply-To:
References:
Message-ID: <1187716915.21800.1.camel@blabla.mcs.anl.gov>

On Tue, 2007-08-21 at 12:18 -0500, Jing Tie wrote:
> Hi,
>
> When running SID application on GROW-UNI-P site, the WN got the wrong
> R directory. Here is the detail:
> [...]
> OSG admin has confirmed that there are two different clusters on the
> back of this CE. So this might be the reason not finding R execute binary
> file on WNs. Should I do some mappings on the CE/WN?

What's a CE and what do you mean by doing some mappings?

>
> Many thanks,
> Jing
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From tiejing at gmail.com Tue Aug 21 12:38:21 2007
From: tiejing at gmail.com (Jing Tie)
Date: Tue, 21 Aug 2007 12:38:21 -0500
Subject: [Swift-user] cannot find R executable on WNs
In-Reply-To: <1187716915.21800.1.camel@blabla.mcs.anl.gov>
References: <1187716915.21800.1.camel@blabla.mcs.anl.gov>
Message-ID:

I don't know whether I am using the right concepts. Compute Element is
the gatekeeper of the site, while Worker Nodes are the workers of the
site. The mapping means to help the worker finding out the right R
directory if R exists. I am sorry if my understanding is wrong.

Thanks,
Jing

On 8/21/07, Mihael Hategan < hategan at mcs.anl.gov> wrote:
>
> On Tue, 2007-08-21 at 12:18 -0500, Jing Tie wrote:
> > Hi,
> >
> > When running SID application on GROW-UNI-P site, the WN got the wrong
> > R directory. Here is the detail:
> > [...]
> > OSG admin has confirmed that there are two different clusters on the
> > back of this CE. So this might be the reason not finding R execute binary
> > file on WNs. Should I do some mappings on the CE/WN?
>
> What's a CE and what do you mean by doing some mappings?
>
> >
> > Many thanks,
> > Jing
> > _______________________________________________
> > Swift-user mailing list
> > Swift-user at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>
>

From tiejing at gmail.com Tue Aug 21 16:42:33 2007
From: tiejing at gmail.com (Jing Tie)
Date: Tue, 21 Aug 2007 16:42:33 -0500
Subject: [Swift-user] Exception in getFile
Message-ID:

Hi,

I tried SID application on TTU-ANTAEUS site. It works fine with
jobmanager, but has "exception in getFile" problem with
jobmanager-lsf. btw: globus-job-run
antaeus.hpcc.ttu.edu/jobmanager-lsf /bin/hostname works fine.

site: TTU-ANTAEUS
gatekeeper: antaeus.hpcc.ttu.edu
app_dir: /mnt/lustre/antaeus/apps
data_dir: /mnt/hep/osg
lsf_dir: /opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/bin
R_dir: /mnt/lustre/antaeus/apps/R-2.5.1/bin

--------------
output:
cwtsmall failed
Provenance graph saved in sid-wf1-lyk35d4m9l2y0.dot
The following errors have occurred:
1. Application "cwtsmall" failed (Exception in getFile
Caused by:
Server refused performing the request. Custom message: (error code 1) [Nested exception message: Custom message: Unexpected reply:
500-Command failed. :
globus_gridftp_server_file.c:globus_l_gfs_file_send:2190:
500-globus_l_gfs_file_open failed.
500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694:
500-globus_xio_register_open failed.
500-globus_xio_file_driver.c:globus_l_xio_file_open:438:
500-Unable to open file /mnt/hep/osg/sid-wf1-lyk35d4m9l2y0/shared//101-FBchannel15_cwt-avgResults.Rdata
500-globus_xio_file_driver.c:globus_l_xio_file_open:381:
500-System error in open: No such file or directory
500-globus_xio: A system call failed: No such file or directory
500 End.])
Arguments: "scripts/runWaveletsAvg.R, 101, FB"
Host: TTU-ANTAEUS
Directory: sid-wf1-lyk35d4m9l2y0/cwtsmall-91u714gi
STDERR:
STDOUT:
----------------------------

But there is only one directory under
$data_dir/sid-wf1-lyk35d4m9l2y0/, i.e. shared, and no output files are
found.

Do I miss some special configurations for LSF?

Thanks a lot,
Jing

From hategan at mcs.anl.gov Tue Aug 21 17:00:34 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 21 Aug 2007 17:00:34 -0500
Subject: [Swift-user] cannot find R executable on WNs
In-Reply-To:
References: <1187716915.21800.1.camel@blabla.mcs.anl.gov>
Message-ID: <1187733634.31763.1.camel@blabla.mcs.anl.gov>

On Tue, 2007-08-21 at 12:38 -0500, Jing Tie wrote:
> I don't know whether I am using the right concepts. Compute Element is
> the gatekeeper of the site, while Worker Nodes are the workers of the
> site. The mapping means to help the worker finding out the right R
> directory if R exists. I am sorry if my understanding is wrong.

Not sure what the "right" thing to do here is, but you do need the script
to find R in order to run it.

Mihael

>
> Thanks,
> Jing
>
> On 8/21/07, Mihael Hategan < hategan at mcs.anl.gov> wrote:
> On Tue, 2007-08-21 at 12:18 -0500, Jing Tie wrote:
> > Hi,
> >
> > When running SID application on GROW-UNI-P site, the WN got the wrong
> > R directory. Here is the detail:
> > [...]
> > OSG admin has confirmed that there are two different clusters on the
> > back of this CE. So this might be the reason not finding R execute binary
> > file on WNs.
> > Should I do some mappings on the CE/WN?
>
> What's a CE and what do you mean by doing some mappings?
>
> >
> > Many thanks,
> > Jing
> > _______________________________________________
> > Swift-user mailing list
> > Swift-user at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>
>

From hategan at mcs.anl.gov Tue Aug 21 17:06:38 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 21 Aug 2007 17:06:38 -0500
Subject: [Swift-user] Exception in getFile
In-Reply-To:
References:
Message-ID: <1187733998.31763.6.camel@blabla.mcs.anl.gov>

It doesn't look like the application runs.

Can you try a globusrun or cog-job-submit with some dummy sleep job and
see if it works (i.e. if you can see it with bjobs/bhist)?

On Tue, 2007-08-21 at 16:42 -0500, Jing Tie wrote:
> Hi,
>
> I tried SID application on TTU-ANTAEUS site. It works fine with
> jobmanager, but has "exception in getFile" problem with
> jobmanager-lsf. btw: globus-job-run
> antaeus.hpcc.ttu.edu/jobmanager-lsf /bin/hostname works fine.
>
> site: TTU-ANTAEUS
> gatekeeper: antaeus.hpcc.ttu.edu
> app_dir: /mnt/lustre/antaeus/apps
> data_dir: /mnt/hep/osg
> lsf_dir: /opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/bin
> R_dir: /mnt/lustre/antaeus/apps/R-2.5.1/bin
>
> --------------
> output:
> cwtsmall failed
> Provenance graph saved in sid-wf1-lyk35d4m9l2y0.dot
> The following errors have occurred:
> 1. Application "cwtsmall" failed (Exception in getFile
> Caused by:
> Server refused performing the request. Custom message: (error
> code 1) [Nested exception message: Custom message: Unexpected reply:
> 500-Command failed. :
> globus_gridftp_server_file.c:globus_l_gfs_file_send:2190:
> 500-globus_l_gfs_file_open failed.
> 500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694:
> 500-globus_xio_register_open failed.
> 500-globus_xio_file_driver.c:globus_l_xio_file_open:438:
> 500-Unable to open file /mnt/hep/osg/sid-wf1-lyk35d4m9l2y0/shared//101-FBchannel15_cwt-avgResults.Rdata
> 500-globus_xio_file_driver.c:globus_l_xio_file_open:381:
> 500-System error in open: No such file or directory
> 500-globus_xio: A system call failed: No such file or directory
> 500 End.])
> Arguments: "scripts/runWaveletsAvg.R, 101, FB"
> Host: TTU-ANTAEUS
> Directory: sid-wf1-lyk35d4m9l2y0/cwtsmall-91u714gi
> STDERR:
> STDOUT:
> ----------------------------
>
> But there is only one directory under
> $data_dir/sid-wf1-lyk35d4m9l2y0/, i.e. shared, and no output files are
> found.
>
> Do I miss some special configurations for LSF?
>
> Thanks a lot,
> Jing
>
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From tiejing at gmail.com Wed Aug 22 15:48:52 2007
From: tiejing at gmail.com (Jing Tie)
Date: Wed, 22 Aug 2007 15:48:52 -0500
Subject: [Swift-user] Exception in getFile
In-Reply-To: <1187733998.31763.6.camel@blabla.mcs.anl.gov>
References: <1187733998.31763.6.camel@blabla.mcs.anl.gov>
Message-ID:

Hi,

I have tried globusrun. It succeed.

globusrun -b -r antaeus.hpcc.ttu.edu/jobmanager-lsf -f HelloRSL

HelloRSL:
&
(executable = /mnt/lustre/antaeus/apps/test_files/myscript.sh)
(stdout = /mnt/lustre/antaeus/apps/test_files/HelloRSL.output)
(stderr = /mnt/lustre/antaeus/apps/test_files/HelloRSL.output)

myscript.sh:
#! /bin/bash
echo "I'm process id $$ on" `hostname`
date
echo "Running as binary $0" "$@"
echo "Done."

output:
Successfully completed.

Resource usage summary:

CPU time : 0.18 sec.
Max Memory : 2 MB
Max Swap : 11 MB

Max Processes : 1
Max Threads : 1

The output (if any) follows:

I'm process id 4538 on compute-10-16.local
Wed Aug 22 15:33:30 CDT 2007
Running as binary /mnt/lustre/antaeus/apps/test_files/myscript.sh
Done.
Thanks,
Jing

On 8/21/07, Mihael Hategan wrote:
> It doesn't look like the application runs.
>
> Can you try a globusrun or cog-job-submit with some dummy sleep job and
> see if it works (i.e. if you can see it with bjobs/bhist)?
>
> On Tue, 2007-08-21 at 16:42 -0500, Jing Tie wrote:
> > Hi,
> >
> > I tried SID application on TTU-ANTAEUS site. It works fine with
> > jobmanager, but has "exception in getFile" problem with
> > jobmanager-lsf. btw: globus-job-run
> > antaeus.hpcc.ttu.edu/jobmanager-lsf /bin/hostname works fine.
> >
> > site: TTU-ANTAEUS
> > gatekeeper: antaeus.hpcc.ttu.edu
> > app_dir: /mnt/lustre/antaeus/apps
> > data_dir: /mnt/hep/osg
> > lsf_dir: /opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/bin
> > R_dir: /mnt/lustre/antaeus/apps/R-2.5.1/bin
> >
> > --------------
> > output:
> > cwtsmall failed
> > Provenance graph saved in sid-wf1-lyk35d4m9l2y0.dot
> > The following errors have occurred:
> > 1. Application "cwtsmall" failed (Exception in getFile
> > Caused by:
> > Server refused performing the request. Custom message: (error
> > code 1) [Nested exception message: Custom message: Unexpected reply:
> > 500-Command failed. :
> > globus_gridftp_server_file.c:globus_l_gfs_file_send:2190:
> > 500-globus_l_gfs_file_open failed.
> > 500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694:
> > 500-globus_xio_register_open failed.
> > 500-globus_xio_file_driver.c:globus_l_xio_file_open:438:
> > 500-Unable to open file /mnt/hep/osg/sid-wf1-lyk35d4m9l2y0/shared//101-FBchannel15_cwt-avgResults.Rdata
> > 500-globus_xio_file_driver.c:globus_l_xio_file_open:381:
> > 500-System error in open: No such file or directory
> > 500-globus_xio: A system call failed: No such file or directory
> > 500 End.])
> > Arguments: "scripts/runWaveletsAvg.R, 101, FB"
> > Host: TTU-ANTAEUS
> > Directory: sid-wf1-lyk35d4m9l2y0/cwtsmall-91u714gi
> > STDERR:
> > STDOUT:
> > ----------------------------
> >
> > But there is only one directory under
> > $data_dir/sid-wf1-lyk35d4m9l2y0/, i.e. shared, and no output files are
> > found.
> >
> > Do I miss some special configurations for LSF?
> >
> > Thanks a lot,
> > Jing
> >
> >
> > _______________________________________________
> > Swift-user mailing list
> > Swift-user at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>
>

From tiejing at gmail.com Tue Aug 28 14:00:45 2007
From: tiejing at gmail.com (Jing Tie)
Date: Tue, 28 Aug 2007 14:00:45 -0500
Subject: [Swift-user] Exception in getFile
In-Reply-To: <1187639927.28827.1.camel@blabla.mcs.anl.gov>
References: <1187630624.21340.0.camel@blabla.mcs.anl.gov>
<1187632354.22920.0.camel@blabla.mcs.anl.gov>
<1187636589.25702.1.camel@blabla.mcs.anl.gov>
<1187639034.28035.1.camel@blabla.mcs.anl.gov>
<1187639927.28827.1.camel@blabla.mcs.anl.gov>
Message-ID:

Hi,

Could we know whether the problem is cause by 1 or 2 now?
1. GFS system is broken, and missed the output files;
2. the application not actually having run correctly, and thus not having produced the output files.

Thanks,
Jing

On 8/20/07, Mihael Hategan wrote:
> On Mon, 2007-08-20 at 14:55 -0500, Jing Tie wrote:
> > I see. So at this point, the problem could be caused by two reasons:
> > 1. GFS system is broken, and missed the output files;
> > 2. Swift has problem to create output files.
> >
> > Is it right?
>
> Swift doesn't really create output files. It's the application that
> does. So I don't see how (2) can be the problem.
>
> There are other possibilities, including the application not actually
> having run correctly, and thus not having produced the output files.
>
> >
> > Thanks,
> > Jing
> >
> > On 8/20/07, Mihael Hategan wrote:
> > No. Swift will always try to stage out the output files if it has no
> > indication that something went wrong with the job. But if the filesystem
> > is broken, and the files are not actually there, well, that's what you
> > seem to be observing.
> >
> > On Mon, 2007-08-20 at 14:36 -0500, Jing Tie wrote:
> > > I see. Could this output be viewed as a sign?
> > >
> > > Completed job cwtsmall-gt3062gi cwtsmall with arguments
> > > [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS
> > > Staging out
> > > sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt-avgResults.Rdata to
> > > 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS
> > >
> > > Thanks,
> > > Jing
> > >
> > > On 8/20/07, Mihael Hategan < hategan at mcs.anl.gov> wrote:
> > > Local empty files may be created even if the remote files
> > > don't exist.
> > > So don't take that as a sign that the application has run.
> > >
> > > In the mean time I'll try to convince it to not create empty
> > > local
> > > files, if they don't exist remotely.
> > >
> > > Mihael
> > >
> > > On Mon, 2007-08-20 at 13:43 -0500, Jing Tie wrote:
> > > > I think these files were from the job. Because I deleted all the
> > > > *Results.Rdata before the job submitting, and found these empty files
> > > > after the execution.
> > > >
> > > > output of the process of execution:
> > > > RunID: 3szhlhvg4seu0
> > > > cwtsmall started
> > > > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) setting status to Active
> > > > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) setting status to Completed
> > > > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Submitted
> > > > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Active
> > > > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Completed
> > > > ...
> > > > Task(type=2, identity=urn:0-0-0-1-0-1-0-1-1187633646453) setting status to Completed
> > > > Staged in scripts/runWaveletsAvg.R to
> > > > sid-wf1-3szhlhvg4seu0/shared/scripts/ on MIT_CMS
> > > > Running job cwtsmall-gt3062gi cwtsmall with arguments
> > > > [scripts/runWaveletsAvg.R, 101, FB] in
> > > > sid-wf1-3szhlhvg4seu0/cwtsmall-gt3062gi on MIT_CMS
> > > > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Submitted
> > > > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Active
> > > > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Completed
> > > > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) setting status to Active
> > > > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) setting status to Completed
> > > > Completed job cwtsmall-gt3062gi cwtsmall with arguments
> > > > [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS
> > > > Staging out
> > > > sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt-avgResults.Rdata to
> > > > 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS
> > > > Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) setting status to Active
> > > > Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) setting status to Completed
> > > > ......
> > > > Task(type=2, identity=urn:0-0-0-1-0-1-0-23-1187633646557) setting status to Active
> > > > Task(type=2, identity=urn:0-0-0-1-0-1-0-22-1187633646554) setting status to Failed Exception in getFile
> > > > Task(type=2, identity=urn:0-0-0-1-0-1-0-2-1187633646560) setting status to Submitted
> > > > ......
> > > >
> > > > Thanks,
> > > > Jing
> > > >
> > > > On 8/20/07, Mihael Hategan < hategan at mcs.anl.gov> wrote:
> > > > But those are not from the same job.
> > > >
> > > > On Mon, 2007-08-20 at 12:28 -0500, Jing Tie wrote:
> > > > > Yes. I saw 101-FBchannel1_cwt-avgResults.Rdata to
> > > > > 101-FBchannel28_cwt-avgResults.Rdata 28 output files on the swift
> > > > > client, but all the files were empty.
> > > > >
> > > > > Jing
> > > > >
> > > > >
> > > > > On 8/20/07, Mihael Hategan < hategan at mcs.anl.gov> wrote:
> > > > > On Mon, 2007-08-20 at 12:21 -0500, Jing Tie wrote:
> > > > > > Yes. There is no * avgResults.Rdata under shared directory, only input
> > > > > > file, scripts, wrapper.sh and seq.sh.
> > > > >
> > > > > Did the job actually run?
> > > > >
> > > > > >
> > > > > > Jing
> > > > > >
> > > > > > On 8/20/07, Mihael Hategan < hategan at mcs.anl.gov> wrote:
> > > > > > Not much we can do if the filesystem is broken.
> > > > > > Did you check to confirm that the file is not there?
> > > > > >
> > > > > > Mihael

From wilde at mcs.anl.gov Wed Aug 29 14:58:49 2007
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 29 Aug 2007 14:58:49 -0500
Subject: [Swift-user] Re: Swift script throws exception in mapping
In-Reply-To: <46D5CF78.1060009@mcs.anl.gov>
References: <46D5CF78.1060009@mcs.anl.gov>
Message-ID: <46D5CFF9.3010207@mcs.anl.gov>

Also: this was run on tg-viz-login1 from the directory ~wilde/angle/data
using awf1.swift and the swift dist in ~wilde/swift1730/vdsk-0.2-dev

- Mike

Michael Wilde wrote:
> The swift script:
>
> type pcapfile;
> type angleout;
> type anglecenter;
>
> (angleout ofile, anglecenter cfile) angle4 (pcapfile ifile)
> {
> app { angle4 @ifile @ofile @cfile; }
> }
>
> pcapfile pcapfiles[];
>
> foreach pf in pcapfiles {
> angleout of;
> anglecenter cf;
> (of,cf) = angle4(pf);
> }
>
>
> throws the error below using trunk, 1730, updated around noon today:
>
> awf2.swift: source file is new. Recompiling.
> Using sites file: /home/wilde/swift1730/vdsk-0.2-dev/bin/../etc/sites.xml
> Using tc.data: /home/wilde/swift1730/vdsk-0.2-dev/bin/../etc/tc.data
>
> Swift v0.2-dev
>
> RunID: amf8l37tq0rh1
> java.lang.ClassCastException: org.griphyn.vdl.mapping.RootDataNode
> java.lang.ClassCastException: org.griphyn.vdl.mapping.RootDataNode
> vdl:new @ awf2.kml, line: 47
> Caused by: java.lang.ClassCastException:
> org.griphyn.vdl.mapping.RootDataNode
> at
> org.griphyn.vdl.mapping.RootArrayDataNode.init(RootArrayDataNode.java:22)
> at org.griphyn.vdl.karajan.lib.New.function(New.java:88)
> ...
>
> The source, data and out/log files are attached.
>
> This code seemed to work circa 8/1 with 0.2 (or with a patch that Ben
> made to 0.2 around that date).
>
> I will continue to fiddle with it and try simpler code using 1730 but
> I'd appreciate it Ben and/or Mihael if you could investigate.
>
> Thanks,
>
> - Mike

From hategan at mcs.anl.gov Wed Aug 29 15:06:32 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 29 Aug 2007 15:06:32 -0500
Subject: [Swift-user] Re: Swift script throws exception in mapping
In-Reply-To: <46D5CFF9.3010207@mcs.anl.gov>
References: <46D5CF78.1060009@mcs.anl.gov> <46D5CFF9.3010207@mcs.anl.gov>
Message-ID: <1188417992.4372.1.camel@blabla.mcs.anl.gov>

Are you sure that's the most recent stack trace?
RootArrayDataNode.java:22 is a declaration not an instruction.

Mihael

On Wed, 2007-08-29 at 14:58 -0500, Michael Wilde wrote:
> Also: this was run on tg-viz-login1 from the directory ~wilde/angle/data
> using awf1.swift and the swift dist in ~wilde/swift1730/vdsk-0.2-dev
>
> - Mike
>
> Michael Wilde wrote:
> > The swift script:
> >
> > type pcapfile;
> > type angleout;
> > type anglecenter;
> >
> > (angleout ofile, anglecenter cfile) angle4 (pcapfile ifile)
> > {
> > app { angle4 @ifile @ofile @cfile; }
> > }
> >
> > pcapfile pcapfiles[];
> >
> > foreach pf in pcapfiles {
> > angleout of;
> > anglecenter cf;
> > (of,cf) = angle4(pf);
> > }
> >
> >
> > throws the error below using trunk, 1730, updated around noon today:
> >
> > awf2.swift: source file is new. Recompiling.
> > Using sites file: /home/wilde/swift1730/vdsk-0.2-dev/bin/../etc/sites.xml
> > Using tc.data: /home/wilde/swift1730/vdsk-0.2-dev/bin/../etc/tc.data
> >
> > Swift v0.2-dev
> >
> > RunID: amf8l37tq0rh1
> > java.lang.ClassCastException: org.griphyn.vdl.mapping.RootDataNode
> > java.lang.ClassCastException: org.griphyn.vdl.mapping.RootDataNode
> > vdl:new @ awf2.kml, line: 47
> > Caused by: java.lang.ClassCastException:
> > org.griphyn.vdl.mapping.RootDataNode
> > at
> > org.griphyn.vdl.mapping.RootArrayDataNode.init(RootArrayDataNode.java:22)
> > at org.griphyn.vdl.karajan.lib.New.function(New.java:88)
> > ...
> >
> > The source, data and out/log files are attached.
> >
> > This code seemed to work circa 8/1 with 0.2 (or with a patch that Ben
> > made to 0.2 around that date).
> >
> > I will continue to fiddle with it and try simpler code using 1730 but
> > I'd appreciate it Ben and/or Mihael if you could investigate.
> >
> > Thanks,
> >
> > - Mike
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>

From benc at hawaga.org.uk Thu Aug 30 07:30:40 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 30 Aug 2007 12:30:40 +0000 (GMT)
Subject: [Swift-user] Re: Swift script throws exception in mapping
In-Reply-To: <46D5CFF9.3010207@mcs.anl.gov>
References: <46D5CF78.1060009@mcs.anl.gov> <46D5CFF9.3010207@mcs.anl.gov>
Message-ID:

What SVN version is that? (type svn info into your root build directory)

On Wed, 29 Aug 2007, Michael Wilde wrote:

> Also: this was run on tg-viz-login1 from the directory ~wilde/angle/data
> using awf1.swift and the swift dist in ~wilde/swift1730/vdsk-0.2-dev
>
> - Mike
>
> Michael Wilde wrote:
> > The swift script:
> >
> > type pcapfile;
> > type angleout;
> > type anglecenter;
> >
> > (angleout ofile, anglecenter cfile) angle4 (pcapfile ifile)
> > {
> > app { angle4 @ifile @ofile @cfile; }
> > }
> >
> > pcapfile pcapfiles[];
> >
> > foreach pf in pcapfiles {
> > angleout of;
> > anglecenter cf;
> > (of,cf) = angle4(pf);
> > }
> >
> >
> > throws the error below using trunk, 1730, updated around noon today:
> >
> > awf2.swift: source file is new. Recompiling.
> > Using sites file: /home/wilde/swift1730/vdsk-0.2-dev/bin/../etc/sites.xml > > Using tc.data: /home/wilde/swift1730/vdsk-0.2-dev/bin/../etc/tc.data > > > > Swift v0.2-dev > > > > RunID: amf8l37tq0rh1 > > java.lang.ClassCastException: org.griphyn.vdl.mapping.RootDataNode > > java.lang.ClassCastException: org.griphyn.vdl.mapping.RootDataNode > > vdl:new @ awf2.kml, line: 47 > > Caused by: java.lang.ClassCastException: > > org.griphyn.vdl.mapping.RootDataNode > > at > > org.griphyn.vdl.mapping.RootArrayDataNode.init(RootArrayDataNode.java:22) > > at org.griphyn.vdl.karajan.lib.New.function(New.java:88) > > ... > > > > The source, data and out/log files are attached. > > > > This code seemed to work circa 8/1 with 0.2 (or with a patch that Ben made > > to 0.2 around that date). > > > > I will continue to fiddle with it and try simpler code using 1730 but I'd > > appreciate it Ben and/or Mihael if you could investigate. > > > > Thanks, > > > > - Mike > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > From benc at hawaga.org.uk Thu Aug 30 07:53:14 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 30 Aug 2007 12:53:14 +0000 (GMT) Subject: [Swift-user] Re: Swift script throws exception in mapping In-Reply-To: <46D5CFF9.3010207@mcs.anl.gov> References: <46D5CF78.1060009@mcs.anl.gov> <46D5CFF9.3010207@mcs.anl.gov> Message-ID: awf1.swift or awf2.swift? The error says awf2 but your description says awf1. 
On Wed, 29 Aug 2007, Michael Wilde wrote: > Also: this was run on tg-viz-login1 from the directory ~wilde/angle/data > using awf1.swift and the swift dist in ~wilde/swift1730/vdsk-0.2-dev > > - Mike > > Michael Wilde wrote: > > The swift script: > > > > type pcapfile; > > type angleout; > > type anglecenter; > > > > (angleout ofile, anglecenter cfile) angle4 (pcapfile ifile) > > { > > app { angle4 @ifile @ofile @cfile; } > > } > > > > pcapfile pcapfiles[]; > > > > foreach pf in pcapfiles { > > angleout of; > > anglecenter cf; > > (of,cf) = angle4(pf); > > } > > > > > > throws the error below using trunk, 1730, updated around noon today: > > > > awf2.swift: source file is new. Recompiling. > > Using sites file: /home/wilde/swift1730/vdsk-0.2-dev/bin/../etc/sites.xml > > Using tc.data: /home/wilde/swift1730/vdsk-0.2-dev/bin/../etc/tc.data > > > > Swift v0.2-dev > > > > RunID: amf8l37tq0rh1 > > java.lang.ClassCastException: org.griphyn.vdl.mapping.RootDataNode > > java.lang.ClassCastException: org.griphyn.vdl.mapping.RootDataNode > > vdl:new @ awf2.kml, line: 47 > > Caused by: java.lang.ClassCastException: > > org.griphyn.vdl.mapping.RootDataNode > > at > > org.griphyn.vdl.mapping.RootArrayDataNode.init(RootArrayDataNode.java:22) > > at org.griphyn.vdl.karajan.lib.New.function(New.java:88) > > ... > > > > The source, data and out/log files are attached. > > > > This code seemed to work circa 8/1 with 0.2 (or with a patch that Ben made > > to 0.2 around that date). > > > > I will continue to fiddle with it and try simpler code using 1730 but I'd > > appreciate it Ben and/or Mihael if you could investigate. 
> > > > Thanks, > > > > - Mike > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > From wilde at mcs.anl.gov Thu Aug 30 07:55:35 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 30 Aug 2007 07:55:35 -0500 Subject: [Swift-user] Re: Swift script throws exception in mapping In-Reply-To: References: <46D5CF78.1060009@mcs.anl.gov> <46D5CFF9.3010207@mcs.anl.gov> Message-ID: <46D6BE47.8030807@mcs.anl.gov> This was solved. Apparently an svn update done within the cog directory doesn't extend into the cog/modules/vdsk directory. That's surprising, and I don't know if it's normal SVN behavior or some mis-config of mine. Also, once I had done the vdsk update, an additional ant clean and ant distclean were needed before the dist worked. - Mike Ben Clifford wrote: > > What SVN version is that? (type svn info into your root build directory) > > On Wed, 29 Aug 2007, Michael Wilde wrote: > >> Also: this was run on tg-viz-login1 from the directory ~wilde/angle/data >> using awf1.swift and the swift dist in ~wilde/swift1730/vdsk-0.2-dev >> >> - Mike >> >> Michael Wilde wrote: >>> The swift script: >>> >>> type pcapfile; >>> type angleout; >>> type anglecenter; >>> >>> (angleout ofile, anglecenter cfile) angle4 (pcapfile ifile) >>> { >>> app { angle4 @ifile @ofile @cfile; } >>> } >>> >>> pcapfile pcapfiles[]; >>> >>> foreach pf in pcapfiles { >>> angleout of; >>> anglecenter cf; >>> (of,cf) = angle4(pf); >>> } >>> >>> >>> throws the error below using trunk, 1730, updated around noon today: >>> >>> awf2.swift: source file is new. Recompiling.
>>> Using sites file: /home/wilde/swift1730/vdsk-0.2-dev/bin/../etc/sites.xml >>> Using tc.data: /home/wilde/swift1730/vdsk-0.2-dev/bin/../etc/tc.data >>> >>> Swift v0.2-dev >>> >>> RunID: amf8l37tq0rh1 >>> java.lang.ClassCastException: org.griphyn.vdl.mapping.RootDataNode >>> java.lang.ClassCastException: org.griphyn.vdl.mapping.RootDataNode >>> vdl:new @ awf2.kml, line: 47 >>> Caused by: java.lang.ClassCastException: >>> org.griphyn.vdl.mapping.RootDataNode >>> at >>> org.griphyn.vdl.mapping.RootArrayDataNode.init(RootArrayDataNode.java:22) >>> at org.griphyn.vdl.karajan.lib.New.function(New.java:88) >>> ... >>> >>> The source, data and out/log files are attached. >>> >>> This code seemed to work circa 8/1 with 0.2 (or with a patch that Ben made >>> to 0.2 around that date). >>> >>> I will continue to fiddle with it and try simpler code using 1730 but I'd >>> appreciate it Ben and/or Mihael if you could investigate. >>> >>> Thanks, >>> >>> - Mike >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >> >> > > From tiejing at gmail.com Mon Aug 20 14:36:03 2007 From: tiejing at gmail.com (Jing Tie) Date: Mon, 20 Aug 2007 14:36:03 -0500 Subject: [Swift-user] Exception in getFile In-Reply-To: <1187636589.25702.1.camel@blabla.mcs.anl.gov> References: <1187630244.20900.0.camel@blabla.mcs.anl.gov> <1187630624.21340.0.camel@blabla.mcs.anl.gov> <1187632354.22920.0.camel@blabla.mcs.anl.gov> <1187636589.25702.1.camel@blabla.mcs.anl.gov> Message-ID: I see. Could this output be viewed as a sign? Completed job cwtsmall-gt3062gi cwtsmall with arguments [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS Staging out sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt- avgResults.Rdata to 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS Thanks, Jing On 8/20/07, Mihael Hategan wrote: > > Local empty files may be created even if the remote files don't exist. 
> So don't take that as a sign that the application has run. > > In the mean time I'll try to convince it to not create empty local > files, if they don't exist remotely. > > Mihael > > On Mon, 2007-08-20 at 13:43 -0500, Jing Tie wrote: > > I think these files were from the job. Because I deleted all the > > *Results.Rdata before the job submitting, and found these empty files > > after the execution. > > > > output of the process of execution: > > RunID: 3szhlhvg4seu0 > > cwtsmall started > > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) setting status > > to Active > > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) setting status > > to Completed > > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status > > to Submitted > > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status > > to Active > > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status > > to Completed > > ... > > Task(type=2, identity=urn:0-0-0-1-0-1-0-1-1187633646453) setting > > status to Completed > > Staged in scripts/runWaveletsAvg.R to > > sid-wf1-3szhlhvg4seu0/shared/scripts/ on MIT_CMS > > Running job cwtsmall-gt3062gi cwtsmall with arguments > > [scripts/runWaveletsAvg.R, 101, FB] in > > sid-wf1-3szhlhvg4seu0/cwtsmall-gt3062gi on MIT_CMS > > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status > > to Submitted > > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status > > to Active > > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status > > to Completed > > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) setting status > > to Active > > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) setting status > > to Completed > > Completed job cwtsmall-gt3062gi cwtsmall with arguments > > [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS > > Staging out > > sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt-avgResults.Rdata to > > 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS 
> > Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) setting > > status to Active > > Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) setting > > status to Completed > > ...... > > Task(type=2, identity=urn:0-0-0-1-0-1-0-23-1187633646557) setting > > status to Active > > Task(type=2, identity=urn:0-0-0-1-0-1-0-22-1187633646554) setting > > status to Failed Exception in getFile > > Task(type=2, identity=urn:0-0-0-1-0-1-0-2-1187633646560) setting > > status to Submitted > > ...... > > > > Thanks, > > Jing > > > > On 8/20/07, Mihael Hategan < hategan at mcs.anl.gov> wrote: > > But those are not from the same job. > > > > On Mon, 2007-08-20 at 12:28 -0500, Jing Tie wrote: > > > Yes. I saw 101-FBchannel1_cwt-avgResults.Rdata to > > > 101-FBchannel28_cwt-avgResults.Rdata 28 output files on the > > swift > > > client, but all the files were empty. > > > > > > Jing > > > > > > > > > On 8/20/07, Mihael Hategan wrote: > > > On Mon, 2007-08-20 at 12:21 -0500, Jing Tie wrote: > > > > Yes. There is no * avgResults.Rdata under shared > > directory, > > > only input > > > > file, scripts, wrapper.sh and seq.sh. > > > > > > Did the job actually run? > > > > > > > > > > > Jing > > > > > > > > On 8/20/07, Mihael Hategan < hategan at mcs.anl.gov> > > wrote: > > > > Not much we can do if the filesystem is > > broken. > > > > Did you check to confirm that the file is > > not > > > there? > > > > > > > > Mihael > > > > > > > > On Mon, 2007-08-20 at 12:07 -0500, Jing > > Tie wrote: > > > > > Hi, > > > > > > > > > > Here is another problem. It seems like > > something > > > wrong with > > > > GFS > > > > > system. 
> > > > > > > > > > site: MIT_CMS > > > > > gatekeeper: ce01.cmsaf.mit.edu > > > > > app_dir: /osg/app > > > > > data_dir: /osg/data > > > > > condor_dir: /usr/local/condor/bin > > > > > R_dir: /osg/app/R- 2.5.1/bin/R > > > > > > > > > > output: > > > > > Application exception: Exception in > > getFile > > > > > task:transfer @ vdl-int.k, line: > > 235 > > > > > vdl:dostageout @ vdl-int.k, > > line: 378 > > > > > vdl:execute2 @ > > execute-default.k, line: 22 > > > > > vdl:execute @ sid-wf1.kml, line: > > 20 > > > > > wavelettransf @ sid-wf1.kml, > > line: 362 > > > > > batchtrials @ sid-wf1.kml, line: > > 402 > > > > > vdl:mains @ sid-wf1.kml , line: > > 399 > > > > > Caused by: > > > > > > > > > org.globus.cog.abstraction.impl.file.FileResourceException: > > > > > Exception in getFile > > > > > Caused by: > > > org.globus.ftp.exception.ServerException : Server > > > > refused > > > > > performing the request. Custom > > message: (error > > > code > > > > 1) cwtsmall > > > > > failed > > > > > Provenance graph saved in > > > sid-wf1-7thy5mbfh09e1.dot > > > > > The following errors have occurred: > > > > > 1. Application "cwtsmall" failed > > (Exception in > > > getFile > > > > > Caused by: > > > > > Server refused performing the request. > > Custom > > > > message: (error code > > > > > 1) > > > > > [Nested exception message: Nested > > exception is > > > > > > > > > > org.globus.ftp.exception.UnexpectedReplyCodeException : > > > > > Custom message: Unexpected reply: > > > > > 500-Command failed. : > > > > > > > > > > globus_gridftp_server_file.c:globus_l_gfs_file_send:2190: > > > > > 500-globus_l_gfs_file_open failed. > > > > > > > > > > > > > > 500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694: > > > > > 500-globus_xio_register_open failed. 
> > > > > > > > > > 500-globus_xio_file_driver.c:globus_l_xio_file_open:438: > > > > > 500-Unable to open > > > > > > > > > > > > > > file > /osgfs/data/sid-wf1-7thy5mbfh09e1/shared//101-FBchannel16_cwt- > avgResults.Rdata > > > > > > > > > > 500-globus_xio_file_driver.c:globus_l_xio_file_open:381: > > > > > 500-System error in open: No such file > > or > > > directory > > > > > 500-globus_xio: A system call failed: No > > such file > > > or > > > > directory > > > > > 500 End.]) > > > > > Arguments: > > "scripts/runWaveletsAvg.R, 101, > > > FB" > > > > > Host: UCSDT2 > > > > > Directory: > > > sid-wf1-7thy5mbfh09e1/cwtsmall-mb3l3rfi > > > > > STDERR: > > > > > STDOUT: > > > > > Errors detected. Cleanup not done. > > > > > Execution completed with errors > > > > > sys:throw @ vdl.k, line: 140 > > > > > vdl:mains @ sid-wf1.kml, line: > > 399 > > > > > at > > > > > > > > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail > > (FlowNode.java:413) > > > > > at > > > > > > > > > > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.fail( > FlowNode.java:417) > > > > > at > > > > > > > > > > org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post > > > > > (GenerateErrorNode.java:28) > > > > > at > > > > > > > > > > > > > > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted > > > > > at > > > > > > > > > > > > > > > org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent( > Sequential.java :33) > > > > > at > > > > > > > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.event > > > (FlowNode.java:334) > > > > > at > > > > > > > > > > > > org.globus.cog.karajan.workflow.events.EventBus.send > > (EventBus.java:123) > > > > > at > > > > > > > > > org.globus.cog.karajan.workflow.events.EventBus.sendHooked > > > > > (EventBus.java:97) > > > > > at > > > > > > > > > > > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent ( > FlowNode.java:172) > > > > > at > > > > > > > > > > > > > > 
org.globus.cog.karajan.workflow.nodes.FlowNode.complete( > FlowNode.java:298) > > > > > at > > > > > > > > > > > > > > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.executeChildren( > AbstractFunction.java:37) > > > > > at > > > > > > > > > > > > > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute( > FlowContainer.java:63) > > > > > at > > > > > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.restart > > > > > (FlowNode.java :239) > > > > > at > > > > > > > > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.start > > ( FlowNode.java :280) > > > > > at > > > > > > > > > > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent( > FlowNode.java:392) > > > > > at > > > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.event > > > > > (FlowNode.java:331) > > > > > at > > > > > > > > > > > > > > org.globus.cog.karajan.workflow.FlowElementWrapper.event( > FlowElementWrapper.java:227) > > > > > at > > > > > > > > > > > > > > org.globus.cog.karajan.workflow.events.EventBus.send( > EventBus.java:123) > > > > > at > > > > > > > > > org.globus.cog.karajan.workflow.events.EventBus.sendHooked > > > > > ( EventBus.java:97) > > > > > at > > > > > > > > > > org.globus.cog.karajan.workflow.events.EventWorker.run > > > > ( EventWorker.java:69) > > > > > > > > > > Many thanks, > > > > > Jing > > > > > > > _______________________________________________ > > > > > Swift-user mailing list > > > > > Swift-user at ci.uchicago.edu > > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wilde at mcs.anl.gov Wed Aug 29 14:56:40 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 29 Aug 2007 14:56:40 -0500 Subject: [Swift-user] Swift script throws exception in mapping Message-ID: <46D5CF78.1060009@mcs.anl.gov> The swift script: type pcapfile; type angleout; type anglecenter; (angleout ofile, anglecenter cfile) angle4 (pcapfile ifile) { app { angle4 @ifile @ofile @cfile; } } pcapfile pcapfiles[]; foreach pf in pcapfiles { angleout of; anglecenter cf; (of,cf) = angle4(pf); } throws the error below using trunk, 1730, updated around noon today: awf2.swift: source file is new. Recompiling. Using sites file: /home/wilde/swift1730/vdsk-0.2-dev/bin/../etc/sites.xml Using tc.data: /home/wilde/swift1730/vdsk-0.2-dev/bin/../etc/tc.data Swift v0.2-dev RunID: amf8l37tq0rh1 java.lang.ClassCastException: org.griphyn.vdl.mapping.RootDataNode java.lang.ClassCastException: org.griphyn.vdl.mapping.RootDataNode vdl:new @ awf2.kml, line: 47 Caused by: java.lang.ClassCastException: org.griphyn.vdl.mapping.RootDataNode at org.griphyn.vdl.mapping.RootArrayDataNode.init(RootArrayDataNode.java:22) at org.griphyn.vdl.karajan.lib.New.function(New.java:88) ... The source, data and out/log files are attached. This code seemed to work circa 8/1 with 0.2 (or with a patch that Ben made to 0.2 around that date). I will continue to fiddle with it and try simpler code using 1730, but I'd appreciate it if Ben and/or Mihael could investigate. Thanks, - Mike -------------- next part -------------- A non-text attachment was scrubbed... Name: bug2.tar Type: application/x-tar Size: 51200 bytes Desc: not available URL: From wilde at mcs.anl.gov Wed Aug 29 23:57:07 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 29 Aug 2007 23:57:07 -0500 Subject: [Swift-user] I/O errors in swift script Message-ID: <46D64E23.70705@mcs.anl.gov> I'm progressing on the angle runs.
Previous errors were due to problems with svn update, and then apparently needing ant clean and distclean. Now I'm executing but getting I/O errors. I've attached all the logs and output from this run. My result files are coming back zero-length and I'm seeing I/O errors in the logs (e.g., in swift.out): ... Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to SubmittedTask(type=2, identity=urn:0-0-6-0-1-1188429807121) setting status to Active Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to Active Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to Failed Exception in getFile ... My suspicion is that the app is failing and not producing an expected output file. Perhaps there's a clean error in the log that says this, but I haven't found it yet. I think I saw error 500s from gridftp in the log. While I debug further, if anyone sees a different or obvious cause, I'd appreciate your eyeballs on it. Thanks, Mike -------------- next part -------------- A non-text attachment was scrubbed... Name: bug3.tar Type: application/x-tar Size: 245760 bytes Desc: not available URL: From benc at hawaga.org.uk Thu Aug 30 07:59:33 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 30 Aug 2007 12:59:33 +0000 (GMT) Subject: [Swift-user] Re: Swift script throws exception in mapping In-Reply-To: <46D6BE47.8030807@mcs.anl.gov> References: <46D5CF78.1060009@mcs.anl.gov> <46D5CFF9.3010207@mcs.anl.gov> <46D6BE47.8030807@mcs.anl.gov> Message-ID: On Thu, 30 Aug 2007, Michael Wilde wrote: > This was solved. Apparently a svn update done within the cog directory doesnt extend > into the cog/modules/vdsk directory. Thats surprising, and it dont know > if its normal SVN behavior or some mis-config of mine. SVN doesn't recurse across checkouts. The vdsk directory is a separate checkout that gets grafted into the cog/modules/ directory; likewise for provider-deef. This is a fairly unconventional way to build things.
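Ben's point is that each checkout has to be updated on its own. The minimal Python sketch below is not part of the Swift build tooling. It lists every working-copy root under a build tree so that each one can get its own `svn update`. It assumes the SVN 1.7+ working-copy format, where a checkout has a single `.svn` directory at its root. In the older format in use in 2007, every directory carries `.svn`, so there you would need to compare the URL reported by `svn info` instead. The `cog` path and the function names are illustrative only.

```python
import os
from pathlib import Path


def find_checkout_roots(top):
    """Return every directory under `top` (inclusive) that holds a `.svn`
    administrative directory, i.e. the root of a separate working copy
    in the SVN >= 1.7 layout (assumption; older layouts differ)."""
    roots = []
    for dirpath, dirnames, _filenames in os.walk(top):
        if ".svn" in dirnames:
            roots.append(Path(dirpath))
            # Do not descend into the administrative area itself.
            dirnames.remove(".svn")
        dirnames.sort()  # deterministic walk order
    return roots


def update_commands(top):
    """Build one `svn update` command per working copy, outermost first
    (os.walk is top-down, so parents are visited before children)."""
    return [["svn", "update", str(root)] for root in find_checkout_roots(top)]


if __name__ == "__main__":
    # Hypothetical layout: cog/ is one checkout, cog/modules/vdsk another.
    for cmd in update_commands("cog"):
        print(" ".join(cmd))
```

Consider a tree shaped like the one Ben describes: `cog/`, with `cog/modules/vdsk` and `cog/modules/provider-deef` as separate checkouts. On that tree the sketch yields three commands, starting with the outer checkout. The commands are only built here and are never executed.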
> Also, once I had done the vdsk update, an additional ant clean and ant > distclean was needed before the dist worked. Yes, that's pretty standard behaviour. -- From wilde at mcs.anl.gov Thu Aug 30 07:59:56 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 30 Aug 2007 07:59:56 -0500 Subject: [Swift-user] Re: Swift script throws exception in mapping In-Reply-To: References: <46D5CF78.1060009@mcs.anl.gov> <46D5CFF9.3010207@mcs.anl.gov> Message-ID: <46D6BF4C.2040000@mcs.anl.gov> Sorry, we're out of sync; this one's resolved. Ben Clifford wrote: > awf1.swift or awf2.swift? The error says awf2 but your description says > awf1. > > On Wed, 29 Aug 2007, Michael Wilde wrote: > >> Also: this was run on tg-viz-login1 from the directory ~wilde/angle/data >> using awf1.swift and the swift dist in ~wilde/swift1730/vdsk-0.2-dev >> >> - Mike >> >> Michael Wilde wrote: >>> The swift script: >>> >>> type pcapfile; >>> type angleout; >>> type anglecenter; >>> >>> (angleout ofile, anglecenter cfile) angle4 (pcapfile ifile) >>> { >>> app { angle4 @ifile @ofile @cfile; } >>> } >>> >>> pcapfile pcapfiles[]; >>> >>> foreach pf in pcapfiles { >>> angleout of; >>> anglecenter cf; >>> (of,cf) = angle4(pf); >>> } >>> >>> >>> throws the error below using trunk, 1730, updated around noon today: >>> >>> awf2.swift: source file is new. Recompiling. >>> Using sites file: /home/wilde/swift1730/vdsk-0.2-dev/bin/../etc/sites.xml >>> Using tc.data: /home/wilde/swift1730/vdsk-0.2-dev/bin/../etc/tc.data >>> >>> Swift v0.2-dev >>> >>> RunID: amf8l37tq0rh1 >>> java.lang.ClassCastException: org.griphyn.vdl.mapping.RootDataNode >>> java.lang.ClassCastException: org.griphyn.vdl.mapping.RootDataNode >>> vdl:new @ awf2.kml, line: 47 >>> Caused by: java.lang.ClassCastException: >>> org.griphyn.vdl.mapping.RootDataNode >>> at >>> org.griphyn.vdl.mapping.RootArrayDataNode.init(RootArrayDataNode.java:22) >>> at org.griphyn.vdl.karajan.lib.New.function(New.java:88) >>> ...
>>> >>> The source, data and out/log files are attached. >>> >>> This code seemed to work circa 8/1 with 0.2 (or with a patch that Ben made >>> to 0.2 around that date). >>> >>> I will continue to fiddle with it and try simpler code using 1730 but I'd >>> appreciate it Ben and/or Mihael if you could investigate. >>> >>> Thanks, >>> >>> - Mike >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >> >> > > From wilde at mcs.anl.gov Thu Aug 30 08:31:37 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 30 Aug 2007 08:31:37 -0500 Subject: [Swift-user] Resending: I/O errors in swift script Message-ID: <46D6C6B9.6020708@mcs.anl.gov> Resending this after changing the list to take larger attachments. The previous message seems to have gotten lost (I must have pressed the wrong button in the list manager?) --- I'm progressing on the angle runs. Previous errors were due to problems with svn update, and then apparently needing ant clean and distclean. Now I'm executing but getting I/O errors. I've attached all the logs and output from this run. My result files are coming back zero-length and I'm seeing I/O errors in the logs (e.g., in swift.out): ... Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to SubmittedTask(type=2, identity=urn:0-0-6-0-1-1188429807121) setting status to Active Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to Active Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to Failed Exception in getFile ... My suspicion is that the app is failing and not producing an expected output file. Perhaps there's a clean error in the log that says this, but I haven't found it yet. I think I saw error 500s from gridftp in the log. While I debug further, if anyone sees a different or obvious cause, I'd appreciate your eyeballs on it.
Thanks, Mike -------------- next part -------------- A non-text attachment was scrubbed... Name: bug3.tar Type: application/x-tar Size: 245760 bytes Desc: not available URL: From hategan at mcs.anl.gov Thu Aug 30 09:22:18 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Aug 2007 09:22:18 -0500 Subject: [Swift-user] Resending: I/O errors in swift script In-Reply-To: <46D6C6B9.6020708@mcs.anl.gov> References: <46D6C6B9.6020708@mcs.anl.gov> Message-ID: <1188483739.27541.8.camel@blabla.mcs.anl.gov> Ok. You have a bunch of errors, mainly of two types: 1. Missing output file (we should add a rule in error.properties to make that verbose message a little more readable). This may be because the application didn't run or because the filesystem is broken. Right now an exit code file is produced by the wrapper only if the exit code of the application is not 0. This makes it impossible to tell whether the application completed successfully or the filesystem is broken. I believe that a stamp file should also be created by the wrapper in order to distinguish between the two. The reason for the stamp file instead of always having an exit code file is that it is more efficient to check the existence of a file than to stage it out and look at its contents. 2. Exit code != 0. Looks like some issues with R. Mihael On Thu, 2007-08-30 at 08:31 -0500, Michael Wilde wrote: > Resending this after changing list to take larger attachments. > Previous message seems to have gotten lost (I musta pressed the wrong > button in the list manager?) > > --- > > I'm progressing on the angle runs. Previous errors were due to problems > with svn update, and then apparently needing ant clean and distclean. > > Now I'm executing but getting I/O errors. Ive attached all the logs and > output from this run. > > My result files are coming back zero-length and Im seeing I/O errors in > the logs (eg, in swift.out): > > ...
> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to > SubmittedTask(type=2, identity=urn:0-0-6-0-1-1188429807121) setting > status to Active > Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to Active > Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to > Failed Exception in getFile > > ... > > My suspcion is that the app is failing and not proucing an expected > output file. Perhaps theres a clean error in the log that says this but > I havent found it yet. I think I saw error #500's from gridftp in the log. > > While I debug further, if anyone sees a different or obvious cause, I'd > appreciate your eyeballs on it. > > Thanks, > > Mike > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From wilde at mcs.anl.gov Thu Aug 30 10:56:02 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 30 Aug 2007 10:56:02 -0500 Subject: [Swift-user] Resending: I/O errors in swift script In-Reply-To: <1188483739.27541.8.camel@blabla.mcs.anl.gov> References: <46D6C6B9.6020708@mcs.anl.gov> <1188483739.27541.8.camel@blabla.mcs.anl.gov> Message-ID: <46D6E892.5000706@mcs.anl.gov> Great, thanks. That was indeed the problem: my application script had a typo and was trying to run the 32-bit binary regardless of what processor type it wound up on. When I last ran successfully, I was getting mostly or entirely i686 machines; this time I was getting ia64 machines. I'll try to re-run it without debug and see if the messages need improvement. Kickstart would have helped here; it would have told me that I'm running on ia64. This is the kind of problem that would have been recognizable instantly on a local machine, but on a remote machine, reached through swift, karajan, globus and PBS, it is a much greater challenge to diagnose.
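The bug Mike describes, an application script that always launches the 32-bit build, can be guarded against in two ways: choose the executable from the node's reported architecture, and fail loudly on anything unrecognized. Below is a minimal Python sketch of that idea. The `bin/angle4.*` paths and the `select_binary` helper are hypothetical. `platform.machine()` is the standard-library call that returns strings such as `i686` or `ia64` on Linux.

```python
import platform

# Hypothetical per-architecture builds; the real names and locations of
# the application binaries on the grid sites are not given in the thread.
BINARIES = {
    "i686": "bin/angle4.i686",
    "ia64": "bin/angle4.ia64",
}


def select_binary(machine=None):
    """Pick the executable matching the node's architecture, raising an
    error instead of silently falling back to the 32-bit build."""
    machine = machine or platform.machine()
    try:
        return BINARIES[machine]
    except KeyError:
        raise RuntimeError(f"no angle4 build for architecture {machine!r}") from None
```

Raising an explicit error writes the architecture name into the job's stderr. That is the kind of information Mike notes Kickstart would have surfaced.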
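Mihael's proposal in this thread has the wrapper write an exit-code file only on failure, plus a stamp file whenever the application finishes. A minimal Python sketch of it follows. The marker names `job.stamp` and `job.exitcode`, and both functions, are hypothetical, because the actual Swift wrapper script is not shown in the thread. What the sketch illustrates is that the outcome can be decided from file existence alone, without staging out any file contents.

```python
import subprocess
from pathlib import Path

# Hypothetical marker names; the thread does not specify them.
STAMP = "job.stamp"        # written by the wrapper whenever the app finished
EXITCODE = "job.exitcode"  # written only when the app exited non-zero


def run_wrapped(cmd, job_dir):
    """Run `cmd` in `job_dir`, leaving marker files behind as proposed:
    an exit-code file only on failure, plus a stamp file in every case."""
    job_dir = Path(job_dir)
    rc = subprocess.run(cmd, cwd=job_dir).returncode
    if rc != 0:
        (job_dir / EXITCODE).write_text(f"{rc}\n")
    # Written last, so a present stamp means any exit-code file is already there.
    (job_dir / STAMP).touch()
    return rc


def classify(job_dir):
    """Decide the outcome from file existence only (no contents are read)."""
    job_dir = Path(job_dir)
    if (job_dir / EXITCODE).exists():
        return "application failed"
    if (job_dir / STAMP).exists():
        return "application succeeded"
    # Neither marker: the job never ran, or the shared filesystem lost its files.
    return "did not run or filesystem broken"
```

Without the stamp, the second and third outcomes look identical. That ambiguity is the one Mihael points out in the missing-output-file errors.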
We should think in terms of how to make that long pipeline to the remote execution environment much more transparent to the user. Think: "what would I see if I ran this locally?" and "how do I bring that environment to the swift user?" Also noted that: - the retry logic here did more harm than good. Maybe we want the default for this to be off, especially during debugging. - in my latest run, which succeeded, the final job completion was excessively delayed. The output files were all back on the submit host, 4 of 5 jobs were logged as completed, and the completion of the final job seemed to take a few minutes longer. I'll work through the error logs more closely and file an enhancement request in bugz. I can batch these for later discussion or bring them up as I encounter things, whatever people prefer. I don't want to distract anyone at the moment with long discussions on these; I'll organize them into bug reports and enhancement requests and file them for discussion when we next review priorities. Ian was suggesting that this be soon; now is when we need to pick the next features for you to work on, Ben and Mihael. Maybe a review of bugs and requests next week, which can be started by email discussion, and we'll note which topics need voice or f2f discussion. - Mike Mihael Hategan wrote: > Ok. You have a bunch of errors, mainly of two types: > 1. Missing output file (we should add a rule in error.properties to make > that verbose message a little more readable). This may be because the > application didn't run or because the filesystem is broken. Right now an > exit code file is produced by the wrapper only if the exit code of the > application is not 0. This does not allow telling between the > application having completed successfully or the filesystem being > broken. I believe that a stamp file should also be created by the > wrapper in order to distinguish between the two.
The reason for the > stamp file instead of always having an exit code file is that it is more > efficient to check the existence of a file than to stage it out and look > at its contents. > > 2. Exit code != 0. Looks like some issues with R. > > Mihael > > On Thu, 2007-08-30 at 08:31 -0500, Michael Wilde wrote: >> Resending this after changing list to take larger attachments. >> Previous message seems to have gotten lost (I musta pressed the wrong >> button in the list manager?) >> >> --- >> >> I'm progressing on the angle runs. Previous errors were due to problems >> with svn update, and then apparently needing ant clean and distclean. >> >> Now I'm executing but getting I/O errors. Ive attached all the logs and >> output from this run. >> >> My result files are coming back zero-length and Im seeing I/O errors in >> the logs (eg, in swift.out): >> >> ... >> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to >> SubmittedTask(type=2, identity=urn:0-0-6-0-1-1188429807121) setting >> status to Active >> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to Active >> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to >> Failed Exception in getFile >> >> ... >> >> My suspcion is that the app is failing and not proucing an expected >> output file. Perhaps theres a clean error in the log that says this but >> I havent found it yet. I think I saw error #500's from gridftp in the log. >> >> While I debug further, if anyone sees a different or obvious cause, I'd >> appreciate your eyeballs on it. 
>> > >> Thanks, >> >> Mike >> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > From tiejing at gmail.com Fri Aug 31 13:10:01 2007 From: tiejing at gmail.com (Jing Tie) Date: Fri, 31 Aug 2007 13:10:01 -0500 Subject: [Swift-user] Kickstart executable not found In-Reply-To: References: <1187574664.6412.0.camel@blabla.mcs.anl.gov> <1187586409.13110.0.camel@blabla.mcs.anl.gov> <1187620376.14708.2.camel@blabla.mcs.anl.gov> Message-ID: Hi Michael, You said that this problem is caused by a bug in condor. But the site GLOW (see below) can run the job successfully with the condor jobmanager. Could you explain this? Many thanks, Jing On 8/20/07, Jing Tie wrote: > Hi, > > There is one site running the application successfully with > jobmanager-condor: > > site: GLOW > gatekeeper: cmsgrid01.hep.wisc.edu > app_dir: /afs/hep.wisc.edu/osg/app > data_dir: /afs/hep.wisc.edu/osg/data > condor_dir: /condor/bin > R_dir: /afs/hep.wisc.edu/osg/app/R-2.5.1/bin/R > > Maybe it has some special configurations or arguments. > > Jing > > > On 8/20/07, Jing Tie wrote: > > Right, it's the problem of condor. After replacing jobmanager-condor > > with jobmanager, the job finished successfully. > > > > Thanks, > > Jing > > > > On 8/20/07, Mihael Hategan < hategan at mcs.anl.gov> wrote: > > > Right. The condor job manager has a bug. It does not properly quote > > > arguments. So you'll see strange things like this if you use it. > > > > > > Mihael > > > > > > On Mon, 2007-08-20 at 00:43 -0500, Jing Tie wrote: > > > > Sure. > > > > > > > > On 8/20/07, Mihael Hategan < hategan at mcs.anl.gov> wrote: > > > > > It puzzles me. Can you attach that file?

From hategan at mcs.anl.gov  Fri Aug 31 13:18:11 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 31 Aug 2007 13:18:11 -0500
Subject: [Swift-user] Kickstart executable not found
In-Reply-To:
References: <1187574664.6412.0.camel@blabla.mcs.anl.gov>
    <1187586409.13110.0.camel@blabla.mcs.anl.gov>
    <1187620376.14708.2.camel@blabla.mcs.anl.gov>
Message-ID: <1188584292.20777.1.camel@blabla.mcs.anl.gov>

On Fri, 2007-08-31 at 13:10 -0500, Jing Tie wrote:
> Hi Michael,
>
> You said that this problem is caused by condor's bug. But the site
> GLOW(see below) can run the job successfully with condor jobmanager.
> Could you explain this?

I can't. Perhaps this site has the problem fixed in some way.

Mihael

From tiejing at gmail.com  Fri Aug 31 14:35:33 2007
From: tiejing at gmail.com (Jing Tie)
Date: Fri, 31 Aug 2007 14:35:33 -0500
Subject: [Swift-user] Kickstart executable not found
In-Reply-To: <1188584292.20777.1.camel@blabla.mcs.anl.gov>
References: <1187574664.6412.0.camel@blabla.mcs.anl.gov>
    <1187586409.13110.0.camel@blabla.mcs.anl.gov>
    <1187620376.14708.2.camel@blabla.mcs.anl.gov>
    <1188584292.20777.1.camel@blabla.mcs.anl.gov>
Message-ID:

Hi Mihael,

The OSG troubleshooting group would like to help me with some issues
running on OSG sites. Is it possible for me to see the submit file that
Swift generated?

Thanks,
Jing

From hategan at mcs.anl.gov  Fri Aug 31 14:49:09 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 31 Aug 2007 14:49:09 -0500
Subject: [Swift-user] Kickstart executable not found
In-Reply-To:
References: <1187574664.6412.0.camel@blabla.mcs.anl.gov>
    <1187586409.13110.0.camel@blabla.mcs.anl.gov>
    <1187620376.14708.2.camel@blabla.mcs.anl.gov>
    <1188584292.20777.1.camel@blabla.mcs.anl.gov>
Message-ID: <1188589750.22221.13.camel@blabla.mcs.anl.gov>

On Fri, 2007-08-31 at 14:35 -0500, Jing Tie wrote:
> Hi Mihael,
>
> OSG troubleshooting group would like to help me with some running
> issues on OSG sites. Is it possible for me to see the submit file that
> swift generated?

If you're referring to a condor submit file, then no, because Swift
doesn't use those. It makes a direct GRAM call.
You can, however, see the RSL specs that are submitted by adding the
following to etc/log4j.properties:

log4j.logger.org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler=DEBUG

The relevant information will then be in the log file. Grep for "RSL:".

You can also try the following incantation for the OSG troubleshooting
group: "An empty string argument is not the same as no argument. Please
make sure empty string arguments make it to the executable."

Mihael
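To make the "empty string argument" point concrete, here is a small Python sketch that builds a GT2-style RSL string. It assumes the GT2 RSL convention that literals are wrapped in double quotes with any embedded double quote doubled; the helpers rsl_literal and build_rsl are hypothetical and are not how Swift or the CoG job submission handler actually constructs its RSL. The point it shows is that an empty argument has to appear explicitly as "" inside the arguments relation; a job manager that silently drops it shifts every later argument by one position.

```python
def rsl_literal(value: str) -> str:
    """Quote a value as a GT2 RSL literal (assumed convention: wrap in
    double quotes and double any embedded double quote)."""
    return '"' + value.replace('"', '""') + '"'


def build_rsl(executable, arguments, directory=None):
    """Build a simple GT2-style RSL conjunction (hypothetical helper)."""
    relations = [f"(executable={rsl_literal(executable)})"]
    if arguments:
        # Every argument is quoted, so "" stays a distinct, empty argument
        # instead of collapsing into the surrounding whitespace.
        quoted = " ".join(rsl_literal(a) for a in arguments)
        relations.append(f"(arguments={quoted})")
    if directory is not None:
        relations.append(f"(directory={rsl_literal(directory)})")
    return "&" + "".join(relations)


if __name__ == "__main__":
    print(build_rsl("/bin/sh", ["wrapper.sh", "", "out.Rdata"], directory="/tmp/job"))
```

Comparing a string like this against the "RSL:" lines in the Swift log is one way to check whether the empty argument is present in what was submitted, independently of what the job manager later passes to the executable.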