[Swift-user] Kickstart executable not found

Jing Tie tiejing at gmail.com
Fri Aug 31 14:35:33 CDT 2007


Hi Mihael,

The OSG troubleshooting group would like to help me with some problems
running jobs on OSG sites. Is it possible for me to see the submit file
that Swift generated?

Thanks,
Jing

On 8/31/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> On Fri, 2007-08-31 at 13:10 -0500, Jing Tie wrote:
> > Hi Mihael,
> >
> > You said that this problem is caused by a bug in Condor. But the GLOW
> > site (see below) can run the job successfully with the condor
> > jobmanager. Could you explain this?
>
> I can't. Perhaps this site has the problem fixed in some way.
>
> Mihael
>
> >
> > Many thanks,
> > Jing
> >
> > On 8/20/07, Jing Tie <tiejing at gmail.com> wrote:
> > > Hi,
> > >
> > > There is one site running the application successfully with
> > > jobmanager-condor:
> > >
> > > site: GLOW
> > > gatekeeper: cmsgrid01.hep.wisc.edu
> > > app_dir: /afs/hep.wisc.edu/osg/app
> > > data_dir: /afs/hep.wisc.edu/osg/data
> > > condor_dir: /condor/bin
> > > R_dir: /afs/hep.wisc.edu/osg/app/R-2.5.1/bin/R
> > >
> > > Maybe it has some special configurations or arguments.
> > >
> > > Jing
> > >
> > >
> > >  On 8/20/07, Jing Tie <tiejing at gmail.com> wrote:
> > > > > Right, it's a problem with condor. After replacing jobmanager-condor
> > > > > with jobmanager, the job finished successfully.
> > > >
> > > > Thanks,
> > > > Jing
> > > >
> > > > > On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > > > Right. The condor job manager has a bug. It does not properly quote
> > > > > arguments. So you'll see strange things like this if you use it.
> > > > >
> > > > > Mihael
> > > > >
> > > > > On Mon, 2007-08-20 at 00:43 -0500, Jing Tie wrote:
> > > > > > Sure.
> > > > > >
> > > > > > > On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > > > > > It puzzles me. Can you attach that file?
> > > > > > >
> > > > > > > On Sun, 2007-08-19 at 21:37 -0500, Jing Tie wrote:
> > > > > > > > in $SWIFT_HOME/etc/swift.properties
> > > > > > > >
> > > > > > > >
> > > > > > > > Jing
> > > > > > > >
> > > > > > > > On 8/19/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > > > > > > > On Sat, 2007-08-18 at 18:24 -0500, Jing Tie wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > > I am working on the SID application now. Job cwtsmall is a script,
> > > > > > > > > > > wavelet.sh, on the AGLT2 site. In wavelet.sh, R runs runWaveletsAvg.R
> > > > > > > > > > > on the input data 101_FB-epochs.Rdata, and should output 28 files,
> > > > > > > > > > > 101-FBchannel1_cwt-avgResults.Rdata through
> > > > > > > > > > > 101-FBchannel28_cwt-avgResults.Rdata.
> > > > > > > > > > >
> > > > > > > > > > > But when I ran the swift client with kickstart.enabled = false,
> > > > > > > > >
> > > > > > > > > Where did you set this?
> > > > > > > > >
> > > > > > > > > Mihael
> > > > > > > > >
> > > > > > > > > > >  it had
> > > > > > > > > > > the exit code 1024 error. And the stderr.txt said: Kickstart
> > > > > > > > > > > executable (101-FBchannel18_cwt-avgResults.Rdata) not found.
> > > > > > > > > > > Details below:
> > > > > > > > > >
> > > > > > > > > > site: AGLT2
> > > > > > > > > > gatekeeper: gate01.aglt2.org
> > > > > > > > > > app_dir: /atlas/data08/OSG/APP/SIDGrid
> > > > > > > > > > data_dir: /atlas/data08/OSG/DATA
> > > > > > > > > > condor_dir: /opt/condor/bin
> > > > > > > > > > R_dir: /atlas/data08/OSG/APP/R-2.5.1/bin/R
> > > > > > > > > >
> > > > > > > > > > output:
> > > > > > > > > > > Application exception: Job cwtsmall failed with an exit code of 1024
> > > > > > > > > > >         sys:throw @ vdl-int.k, line: 109
> > > > > > > > > > >         vdl:checkexitcode @ vdl-int.k, line: 370
> > > > > > > > > > >         vdl:execute2 @ execute-default.k, line: 22
> > > > > > > > > > >         vdl:execute @ sid-wf1.kml, line: 20
> > > > > > > > > > >         wavelettransf @ sid-wf1.kml, line: 362
> > > > > > > > > > >         batchtrials @ sid-wf1.kml, line: 402
> > > > > > > > > > >         vdl:mains @ sid-wf1.kml, line: 399
> > > > > > > > > > > cwtsmall failed
> > > > > > > > > > > Provenance graph saved in sid-wf1-8cnxmo0qetg10.dot
> > > > > > > > > > > The following errors have occurred:
> > > > > > > > > > > 1. Application "cwtsmall" failed (Job cwtsmall failed with an exit code of 1024)
> > > > > > > > > > >         Arguments: "scripts/runWaveletsAvg.R, 101, FB"
> > > > > > > > > > >         Host: NWICG_NotreDame
> > > > > > > > > > >         Directory: sid-wf1-8cnxmo0qetg10/cwtsmall-zeb72rfi
> > > > > > > > > > >         STDERR: Kickstart executable (101-FBchannel18_cwt-avgResults.Rdata) not found
> > > > > > > > > > >         STDOUT:
> > > > > > > > > > > Errors detected. Cleanup not done.
> > > > > > > > > > > Execution completed with errors
> > > > > > > > > > >         sys:throw @ vdl.k, line: 140
> > > > > > > > > > >         vdl:mains @ sid-wf1.kml, line: 399
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:413)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:417)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:28)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.executeChildren(AbstractFunction.java:37)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:239)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:280)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:392)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:331)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
> > > > > > > > > > >         at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)
> > > > > > > > > >
> > > > > > > > > > > I found that about 8 sites in OSG have the same problem.
> > > > > > > > > >
> > > > > > > > > > Many thanks,
> > > > > > > > > > Jing
> > > > > > > > > >
> > > > > > > > > > > _______________________________________________
> > > > > > > > > > > Swift-user mailing list
> > > > > > > > > > > Swift-user at ci.uchicago.edu
> > > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
> > > > > > > > > >
>
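
For anyone following the thread, here is a minimal Python sketch of how a
job manager that does not quote arguments can shift positional arguments,
so that a later value lands where a wrapper expects the executable. The
argument layout is hypothetical: the empty first argument and its position
are assumptions for illustration and are not taken from Swift's wrapper or
from the Condor job manager. Only wavelet.sh and the three application
arguments come from the output quoted above.

```python
import shlex


def join_unquoted(args):
    # What a launcher effectively does when it pastes arguments together
    # without quoting: empty or space-containing values are not protected.
    return " ".join(args)


def join_quoted(args):
    # Quote each argument so it survives being re-split by a shell-like parser.
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical layout: an empty option value followed by the executable
# and its arguments (the empty slot is an assumption, see the note above).
args = ["", "wavelet.sh", "scripts/runWaveletsAvg.R", "101", "FB"]

naive = shlex.split(join_unquoted(args))  # empty argument silently disappears
safe = shlex.split(join_quoted(args))     # argument list is preserved

print("unquoted:", naive)
print("quoted:  ", safe)
```

With the unquoted join, the empty argument is lost and every later value
moves one position to the left, which is the kind of shift that could make
a wrapper treat an unrelated value as the executable name.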


