[Swift-user] Exception in getFile

Mihael Hategan hategan at mcs.anl.gov
Mon Aug 20 14:58:47 CDT 2007


On Mon, 2007-08-20 at 14:55 -0500, Jing Tie wrote:
> I see. So at this point, the problem could have one of two causes:
> 1. The GFS system is broken and is missing the output files;
> 2. Swift has a problem creating the output files.
> 
> Is that right?

Swift doesn't really create output files. It's the application that
does. So I don't see how (2) can be the problem.

There are other possibilities, including the application not actually
having run correctly, and thus not having produced the output files.

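One quick client-side check is to look for zero-byte output files, since an
empty local file is not evidence that the application ran. A minimal sketch,
assuming the outputs match the *avgResults.Rdata pattern from this thread and
sit in the directory you ran swift from; the function name find_empty_outputs
is made up for illustration and is not part of Swift:

```python
from pathlib import Path


def find_empty_outputs(directory=".", pattern="*avgResults.Rdata"):
    """Return the sorted names of matching files that are zero bytes long.

    Both defaults are assumptions taken from the file names in this thread.
    """
    return sorted(
        p.name
        for p in Path(directory).glob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    for name in find_empty_outputs():
        # An empty file only says stage-out produced nothing useful; it does
        # not say whether the filesystem or the application is at fault.
        print("empty output:", name)
```

If this lists every output file, the next thing to look at is the job
directory on the remote site, not the local copies.
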
> 
> Thanks,
> Jing
> 
> On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>         No. Swift will always try to stage out the output files if it
>         has no indication that something went wrong with the job. But
>         if the filesystem is broken, and the files are not actually
>         there, well, that's what you seem to be observing.
>         
>         On Mon, 2007-08-20 at 14:36 -0500, Jing Tie wrote:
>         > I see. Could this output be viewed as a sign?
>         >
>         > Completed job cwtsmall-gt3062gi cwtsmall with arguments [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS
>         > Staging out sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt-avgResults.Rdata to 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS
>         >
>         > Thanks,
>         > Jing
>         >
>         > On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>         >         Local empty files may be created even if the remote
>         >         files don't exist. So don't take that as a sign that
>         >         the application has run.
>         >
>         >         In the meantime I'll try to convince it to not create
>         >         empty local files, if they don't exist remotely.
>         >
>         >         Mihael
>         >
>         >         On Mon, 2007-08-20 at 13:43 -0500, Jing Tie wrote: 
>         >         > I think these files were from the job, because I
>         >         > deleted all the *Results.Rdata files before submitting
>         >         > the job, and found these empty files after the execution.
>         >         >
>         >         > output of the process of execution:
>         >         > RunID: 3szhlhvg4seu0
>         >         > cwtsmall started
>         >         > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) setting status to Active
>         >         > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) setting status to Completed
>         >         > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Submitted
>         >         > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Active
>         >         > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Completed
>         >         > ...
>         >         > Task(type=2, identity=urn:0-0-0-1-0-1-0-1-1187633646453) setting status to Completed
>         >         > Staged in scripts/runWaveletsAvg.R to sid-wf1-3szhlhvg4seu0/shared/scripts/ on MIT_CMS
>         >         > Running job cwtsmall-gt3062gi cwtsmall with arguments [scripts/runWaveletsAvg.R, 101, FB] in sid-wf1-3szhlhvg4seu0/cwtsmall-gt3062gi on MIT_CMS
>         >         > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Submitted
>         >         > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Active
>         >         > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Completed
>         >         > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) setting status to Active
>         >         > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) setting status to Completed
>         >         > Completed job cwtsmall-gt3062gi cwtsmall with arguments [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS
>         >         > Staging out sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt-avgResults.Rdata to 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS
>         >         > Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) setting status to Active
>         >         > Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) setting status to Completed
>         >         > ......
>         >         > Task(type=2, identity=urn:0-0-0-1-0-1-0-23-1187633646557) setting status to Active
>         >         > Task(type=2, identity=urn:0-0-0-1-0-1-0-22-1187633646554) setting status to Failed Exception in getFile
>         >         > Task(type=2, identity=urn:0-0-0-1-0-1-0-2-1187633646560) setting status to Submitted
>         >         > ......
>         >         >
>         >         > Thanks,
>         >         > Jing
>         >         >
>         >         > On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>         >         >         But those are not from the same job.
>         >         >
>         >         >         On Mon, 2007-08-20 at 12:28 -0500, Jing Tie wrote:
>         >         >         > Yes. I saw 28 output files on the Swift client
>         >         >         > (101-FBchannel1_cwt-avgResults.Rdata to
>         >         >         > 101-FBchannel28_cwt-avgResults.Rdata), but all
>         >         >         > the files were empty.
>         >         >         >
>         >         >         > Jing
>         >         >         >
>         >         >         >
>         >         >         > On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>         >         >         >         On Mon, 2007-08-20 at 12:21 -0500, Jing Tie wrote:
>         >         >         >         > Yes. There is no *avgResults.Rdata under the
>         >         >         >         > shared directory, only the input file, scripts,
>         >         >         >         > wrapper.sh and seq.sh.
>         >         >         >
>         >         >         >         Did the job actually run?
>         >         >         >
>         >         >         >         >
>         >         >         >         > Jing 
>         >         >         >         >
>         >         >         >         > On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>         >         >         >         >         Not much we can do if the filesystem is broken.
>         >         >         >         >         Did you check to confirm that the file is not there?
>         >         >         >         >
>         >         >         >         >         Mihael
>         >         >         >         >
>         
> 