[Swift-user] Kickstart executable not found

Mihael Hategan hategan at mcs.anl.gov
Fri Aug 31 13:18:11 CDT 2007


On Fri, 2007-08-31 at 13:10 -0500, Jing Tie wrote:
> Hi Mihael,
> 
> You said that this problem is caused by a Condor bug. But the site
> GLOW (see below) runs the job successfully with the Condor jobmanager.
> Could you explain this?

I can't. Perhaps this site has fixed the problem in some way.

Mihael

> 
> Many thanks,
> Jing
> 
> On 8/20/07, Jing Tie <tiejing at gmail.com> wrote:
> > Hi,
> >
> > There is one site running the application successfully with
> > jobmanager-condor:
> >
> > site: GLOW
> > gatekeeper: cmsgrid01.hep.wisc.edu
> > app_dir: /afs/hep.wisc.edu/osg/app
> > data_dir: /afs/hep.wisc.edu/osg/data
> > condor_dir: /condor/bin
> > R_dir: /afs/hep.wisc.edu/osg/app/R-2.5.1/bin/R
> >
> > Maybe it has some special configuration or arguments.
> >
> > Jing
> >
> >
> >  On 8/20/07, Jing Tie <tiejing at gmail.com> wrote:
> > > Right, it's a Condor problem. After replacing jobmanager-condor
> > > with jobmanager, the job finished successfully.
> > >
> > > Thanks,
> > > Jing
> > >
> > > On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > > Right. The Condor job manager has a bug: it does not quote
> > > > arguments properly, so you'll see strange things like this if you use it.
> > > >
> > > > Mihael
> > > >
> > > > On Mon, 2007-08-20 at 00:43 -0500, Jing Tie wrote:
> > > > > Sure.
> > > > >
> > > > > On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > > > > It puzzles me. Can you attach that file?
> > > > > >
> > > > > > On Sun, 2007-08-19 at 21:37 -0500, Jing Tie wrote:
> > > > > > > in $SWIFT_HOME/etc/swift.properties
> > > > > > >
> > > > > > >
> > > > > > > Jing
> > > > > > >
> > > > > > > On 8/19/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > > > > > > On Sat, 2007-08-18 at 18:24 -0500, Jing Tie wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I am working on the SID application now. Job cwtsmall runs the script
> > > > > > > > > wavelet.sh on the AGLT2 site. In wavelet.sh, R runs runWaveletsAvg.R
> > > > > > > > > on the input data 101_FB-epochs.Rdata and should output 28 files,
> > > > > > > > > 101-FBchannel1_cwt-avgResults.Rdata through
> > > > > > > > > 101-FBchannel28_cwt-avgResults.Rdata.
> > > > > > > > >
> > > > > > > > > But when I ran the Swift client with kickstart.enabled = false,
> > > > > > > >
> > > > > > > > Where did you set this?
> > > > > > > >
> > > > > > > > Mihael
> > > > > > > >
> > > > > > > > > it failed with exit code 1024, and stderr.txt said: Kickstart
> > > > > > > > > executable (101-FBchannel18_cwt-avgResults.Rdata) not found. Details
> > > > > > > > > below:
> > > > > > > > >
> > > > > > > > > site: AGLT2
> > > > > > > > > gatekeeper: gate01.aglt2.org
> > > > > > > > > app_dir: /atlas/data08/OSG/APP/SIDGrid
> > > > > > > > > data_dir: /atlas/data08/OSG/DATA
> > > > > > > > > condor_dir: /opt/condor/bin
> > > > > > > > > R_dir: /atlas/data08/OSG/APP/R-2.5.1/bin/R
> > > > > > > > >
> > > > > > > > > output:
> > > > > > > > > Application exception: Job cwtsmall failed with an exit code of 1024
> > > > > > > > >         sys:throw @ vdl-int.k, line: 109
> > > > > > > > >         vdl:checkexitcode @ vdl-int.k, line: 370
> > > > > > > > >         vdl:execute2 @ execute-default.k , line: 22
> > > > > > > > >         vdl:execute @ sid-wf1.kml, line: 20
> > > > > > > > >         wavelettransf @ sid-wf1.kml, line: 362
> > > > > > > > >         batchtrials @ sid-wf1.kml, line: 402
> > > > > > > > >         vdl:mains @ sid-wf1.kml, line: 399
> > > > > > > > > cwtsmall failed
> > > > > > > > > Provenance graph saved in sid-wf1-8cnxmo0qetg10.dot
> > > > > > > > > The following errors have occurred:
> > > > > > > > > 1. Application "cwtsmall" failed (Job cwtsmall failed with an exit code of 1024)
> > > > > > > > >         Arguments: "scripts/runWaveletsAvg.R, 101, FB"
> > > > > > > > >         Host: NWICG_NotreDame
> > > > > > > > >         Directory: sid-wf1-8cnxmo0qetg10/cwtsmall-zeb72rfi
> > > > > > > > >         STDERR: Kickstart executable
> > > > > > > > > (101-FBchannel18_cwt-avgResults.Rdata) not found
> > > > > > > > >         STDOUT:
> > > > > > > > > Errors detected. Cleanup not done.
> > > > > > > > > Execution completed with errors
> > > > > > > > >         sys:throw @ vdl.k, line: 140
> > > > > > > > >         vdl:mains @ sid-wf1.kml, line: 399
> > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:413)
> > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:417)
> > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:28)
> > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted
> > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
> > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
> > > > > > > > >         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
> > > > > > > > >         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
> > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172)
> > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298)
> > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.executeChildren(AbstractFunction.java:37)
> > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
> > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:239)
> > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:280)
> > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:392)
> > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:331)
> > > > > > > > >         at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
> > > > > > > > >         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
> > > > > > > > >         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
> > > > > > > > >         at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)
> > > > > > > > >
> > > > > > > > > I found that about 8 sites in OSG have this problem.
> > > > > > > > >
> > > > > > > > > Many thanks,
> > > > > > > > > Jing
> > > > > > > > >
> > > > > > > > > _______________________________________________
> > > > > > > > > Swift-user mailing list
> > > > > > > > > Swift-user at ci.uchicago.edu
> > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > >
> > > >
> > >
> >
> >
> 
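The failure mode Mihael describes (the Condor job manager not quoting arguments) can be illustrated with a small Python sketch. It is a hypothetical model, not Swift's actual wrapper interface: it assumes the wrapper receives the kickstart path as the value after a `-k` flag, that an empty value means kickstart is disabled, and that output file names follow it. Joining the arguments without quoting and re-splitting them drops the empty value, so the next argument (here an output file name) is read as the kickstart executable, which has the same shape as the error above.

```python
import shlex


def naive_roundtrip(args):
    # Join without quoting, then split on whitespace, as a buggy
    # job manager might: empty arguments silently disappear.
    return " ".join(args).split()


def quoted_roundtrip(args):
    # Quote each argument before joining; shlex.split restores the list exactly,
    # including empty strings (shlex.quote("") yields "''").
    return shlex.split(" ".join(shlex.quote(a) for a in args))


def kickstart_path(argv):
    # Hypothetical wrapper convention: the token after "-k" names the
    # kickstart executable; an empty string means kickstart is disabled.
    return argv[argv.index("-k") + 1]


# Hypothetical wrapper arguments with kickstart disabled (empty -k value),
# followed by an output file name.
args = ["-k", "", "101-FBchannel18_cwt-avgResults.Rdata"]

print(kickstart_path(naive_roundtrip(args)))         # 101-FBchannel18_cwt-avgResults.Rdata
print(repr(kickstart_path(quoted_roundtrip(args))))  # ''
```

Quoting each argument preserves empty arguments across the round trip, which is consistent with the workaround in this thread of switching from jobmanager-condor to jobmanager.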



