[Swift-user] Exception in getFile
Mihael Hategan
hategan at mcs.anl.gov
Mon Aug 20 14:43:54 CDT 2007
No. Swift will always try to stage out the output files if it has no
indication that something went wrong with the job. But if the filesystem
is broken, and the files are not actually there, well, that's what you
seem to be observing.
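
Since an empty local file is not proof that the application ran, one way to catch this after a run is to scan the client directory for zero-byte outputs. The sketch below is only an illustration and not part of Swift: the helper name find_empty_outputs and the default pattern "*Results.Rdata" (taken from the file names in this thread) are assumptions.

```python
import glob
import os


def find_empty_outputs(directory, pattern="*Results.Rdata"):
    """Return the sorted paths of files in `directory` that match
    `pattern` and are zero bytes long.

    Hypothetical helper: a zero-byte local file may only mean that the
    stage-out created it, not that the remote job produced any output.
    """
    matches = glob.glob(os.path.join(directory, pattern))
    return sorted(path for path in matches if os.path.getsize(path) == 0)
```

Calling find_empty_outputs(".") in the directory where the workflow was started would list the suspicious files; any file it reports should be checked against the remote shared directory before it is treated as a real result.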
On Mon, 2007-08-20 at 14:36 -0500, Jing Tie wrote:
> I see. Could this output be viewed as a sign?
>
> Completed job cwtsmall-gt3062gi cwtsmall with arguments
> [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS
> Staging out
> sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt-avgResults.Rdata to
> 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS
>
> Thanks,
> Jing
>
> On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> Local empty files may be created even if the remote files
> don't exist.
> So don't take that as a sign that the application has run.
>
> In the meantime I'll try to convince it not to create empty local
> files if they don't exist remotely.
>
> Mihael
>
> On Mon, 2007-08-20 at 13:43 -0500, Jing Tie wrote:
> > I think these files were from the job, because I deleted all the
> > *Results.Rdata files before submitting the job, and found these
> > empty files after the execution.
> >
> > output of the process of execution:
> > RunID: 3szhlhvg4seu0
> > cwtsmall started
> > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) setting status to Active
> > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) setting status to Completed
> > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Submitted
> > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Active
> > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Completed
> > ...
> > Task(type=2, identity=urn:0-0-0-1-0-1-0-1-1187633646453) setting status to Completed
> > Staged in scripts/runWaveletsAvg.R to
> > sid-wf1-3szhlhvg4seu0/shared/scripts/ on MIT_CMS
> > Running job cwtsmall-gt3062gi cwtsmall with arguments
> > [scripts/runWaveletsAvg.R, 101, FB] in
> > sid-wf1-3szhlhvg4seu0/cwtsmall-gt3062gi on MIT_CMS
> > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Submitted
> > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Active
> > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Completed
> > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) setting status to Active
> > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) setting status to Completed
> > Completed job cwtsmall-gt3062gi cwtsmall with arguments
> > [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS
> > Staging out
> > sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt-avgResults.Rdata to
> > 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS
> > Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) setting status to Active
> > Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) setting status to Completed
> > ......
> > Task(type=2, identity=urn:0-0-0-1-0-1-0-23-1187633646557) setting status to Active
> > Task(type=2, identity=urn:0-0-0-1-0-1-0-22-1187633646554) setting status to Failed Exception in getFile
> > Task(type=2, identity=urn:0-0-0-1-0-1-0-2-1187633646560) setting status to Submitted
> > ......
> >
> > Thanks,
> > Jing
> >
> > On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > But those are not from the same job.
> >
> > On Mon, 2007-08-20 at 12:28 -0500, Jing Tie wrote:
> > > Yes. I saw 28 output files on the Swift client,
> > > 101-FBchannel1_cwt-avgResults.Rdata to
> > > 101-FBchannel28_cwt-avgResults.Rdata, but all the files were empty.
> > >
> > > Jing
> > >
> > >
> > > On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > On Mon, 2007-08-20 at 12:21 -0500, Jing Tie wrote:
> > > > Yes. There is no *avgResults.Rdata under the shared directory,
> > > > only the input file, scripts, wrapper.sh and seq.sh.
> > >
> > > Did the job actually run?
> > >
> > > >
> > > > Jing
> > > >
> > > > On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > > Not much we can do if the filesystem is broken.
> > > > Did you check to confirm that the file is not there?
> > > >
> > > > Mihael
> > > >
> > > > On Mon, 2007-08-20 at 12:07 -0500, Jing Tie wrote:
> > > > > Hi,
> > > > >
> > > > > Here is another problem. It seems like something is wrong with
> > > > > the GFS system.
> > > > >
> > > > > site: MIT_CMS
> > > > > gatekeeper: ce01.cmsaf.mit.edu
> > > > > app_dir: /osg/app
> > > > > data_dir: /osg/data
> > > > > condor_dir: /usr/local/condor/bin
> > > > > R_dir: /osg/app/R-2.5.1/bin/R
> > > > >
> > > > > output:
> > > > > Application exception: Exception in getFile
> > > > > task:transfer @ vdl-int.k, line: 235
> > > > > vdl:dostageout @ vdl-int.k, line: 378
> > > > > vdl:execute2 @ execute-default.k, line: 22
> > > > > vdl:execute @ sid-wf1.kml, line: 20
> > > > > wavelettransf @ sid-wf1.kml, line: 362
> > > > > batchtrials @ sid-wf1.kml, line: 402
> > > > > vdl:mains @ sid-wf1.kml, line: 399
> > > > > Caused by: org.globus.cog.abstraction.impl.file.FileResourceException:
> > > > > Exception in getFile
> > > > > Caused by: org.globus.ftp.exception.ServerException: Server refused
> > > > > performing the request. Custom message: (error code 1) cwtsmall
> > > > > failed
> > > > > Provenance graph saved in sid-wf1-7thy5mbfh09e1.dot
> > > > > The following errors have occurred:
> > > > > 1. Application "cwtsmall" failed (Exception in getFile
> > > > > Caused by:
> > > > > Server refused performing the request. Custom message: (error code 1)
> > > > > [Nested exception message: Nested exception is
> > > > > org.globus.ftp.exception.UnexpectedReplyCodeException:
> > > > > Custom message: Unexpected reply:
> > > > > 500-Command failed. :
> > > > > globus_gridftp_server_file.c:globus_l_gfs_file_send:2190:
> > > > > 500-globus_l_gfs_file_open failed.
> > > > > 500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694:
> > > > > 500-globus_xio_register_open failed.
> > > > > 500-globus_xio_file_driver.c:globus_l_xio_file_open:438:
> > > > > 500-Unable to open file /osgfs/data/sid-wf1-7thy5mbfh09e1/shared//101-FBchannel16_cwt-avgResults.Rdata
> > > > > 500-globus_xio_file_driver.c:globus_l_xio_file_open:381:
> > > > > 500-System error in open: No such file or directory
> > > > > 500-globus_xio: A system call failed: No such file or directory
> > > > > 500 End.])
> > > > > Arguments: "scripts/runWaveletsAvg.R, 101, FB"
> > > > > Host: UCSDT2
> > > > > Directory: sid-wf1-7thy5mbfh09e1/cwtsmall-mb3l3rfi
> > > > > STDERR:
> > > > > STDOUT:
> > > > > Errors detected. Cleanup not done.
> > > > > Execution completed with errors
> > > > > sys:throw @ vdl.k, line: 140
> > > > > vdl:mains @ sid-wf1.kml, line: 399
> > > > > at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:413)
> > > > > at org.globus.cog.karajan.workflow.nodes.FlowNode.fail(FlowNode.java:417)
> > > > > at org.globus.cog.karajan.workflow.nodes.GenerateErrorNode.post(GenerateErrorNode.java:28)
> > > > > at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted
> > > > > at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
> > > > > at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
> > > > > at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
> > > > > at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
> > > > > at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172)
> > > > > at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298)
> > > > > at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.executeChildren(AbstractFunction.java:37)
> > > > > at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
> > > > > at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:239)
> > > > > at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:280)
> > > > > at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:392)
> > > > > at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:331)
> > > > > at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
> > > > > at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
> > > > > at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
> > > > > at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)
> > > > >
> > > > > Many thanks,
> > > > > Jing
> > > > >