[Swift-user] Exception in getFile

Jing Tie tiejing at gmail.com
Tue Aug 28 14:00:45 CDT 2007


Hi,

Could we know whether the problem is caused by 1 or 2 now?

1. GFS system is broken, and missed the output files;
2. the application not actually having run correctly, and thus not
having produced the output files.
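
As a first check on the client side, something like the sketch below could
list the zero-byte results files. It is only an illustration: the function
name find_empty_outputs and the default pattern are my own assumptions, not
part of Swift. Since empty local files may be created even when the remote
files do not exist, an empty file here says nothing about which of the two
causes applies; it only shows which outputs never arrived with data.

```python
from pathlib import Path


def find_empty_outputs(directory, pattern="*Results.Rdata"):
    """Return the sorted names of zero-byte files in `directory` matching `pattern`.

    Hypothetical helper: the name and the default pattern are assumptions
    chosen to match the file names mentioned in this thread.
    """
    return sorted(
        p.name
        for p in Path(directory).glob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    import sys

    # Directory to scan; defaults to the current working directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in find_empty_outputs(target):
        print("empty:", name)
```

Any file it reports should not be taken as a sign that the application ran;
the remote shared directory still has to be checked by hand.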

Thanks,
Jing

On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> On Mon, 2007-08-20 at 14:55 -0500, Jing Tie wrote:
> > I see. So at this point, the problem could be caused by two reasons:
> > 1. GFS system is broken, and missed the output files;
> > 2. Swift has problem to create output files.
> >
> > Is it right?
>
> Swift doesn't really create output files. It's the application that
> does. So I don't see how (2) can be the problem.
>
> There are other possibilities, including the application not actually
> having run correctly, and thus not having produced the output files.
>
> >
> > Thanks,
> > Jing
> >
> > On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> >         No. Swift will always try to stage out the output files if it has
> >         no indication that something went wrong with the job. But if the
> >         filesystem is broken, and the files are not actually there, well,
> >         that's what you seem to be observing.
> >
> >         On Mon, 2007-08-20 at 14:36 -0500, Jing Tie wrote:
> >         > I see. Could this output be viewed as a sign?
> >         >
> >         > Completed job cwtsmall-gt3062gi cwtsmall with arguments
> >         > [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS
> >         > Staging out
> >         > sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt-avgResults.Rdata
> >         > to 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS
> >         >
> >         > Thanks,
> >         > Jing
> >         >
> >         > On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> >         >         Local empty files may be created even if the remote files
> >         >         don't exist. So don't take that as a sign that the
> >         >         application has run.
> >         >
> >         >         In the meantime I'll try to convince it to not create
> >         >         empty local files, if they don't exist remotely.
> >         >
> >         >         Mihael
> >         >
> >         >         On Mon, 2007-08-20 at 13:43 -0500, Jing Tie wrote:
> >         >         > I think these files were from the job, because I deleted
> >         >         > all the *Results.Rdata files before submitting the job
> >         >         > and found these empty files after the execution.
> >         >         >
> >         >         > output of the process of execution:
> >         >         > RunID: 3szhlhvg4seu0
> >         >         > cwtsmall started
> >         >         > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) setting status to Active
> >         >         > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) setting status to Completed
> >         >         > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Submitted
> >         >         > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Active
> >         >         > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Completed
> >         >         > ...
> >         >         > Task(type=2, identity=urn:0-0-0-1-0-1-0-1-1187633646453) setting status to Completed
> >         >         > Staged in scripts/runWaveletsAvg.R to sid-wf1-3szhlhvg4seu0/shared/scripts/ on MIT_CMS
> >         >         > Running job cwtsmall-gt3062gi cwtsmall with arguments [scripts/runWaveletsAvg.R, 101, FB] in sid-wf1-3szhlhvg4seu0/cwtsmall-gt3062gi on MIT_CMS
> >         >         > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Submitted
> >         >         > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Active
> >         >         > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Completed
> >         >         > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) setting status to Active
> >         >         > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) setting status to Completed
> >         >         > Completed job cwtsmall-gt3062gi cwtsmall with arguments [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS
> >         >         > Staging out sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt-avgResults.Rdata to 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS
> >         >         > Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) setting status to Active
> >         >         > Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) setting status to Completed
> >         >         > ......
> >         >         > Task(type=2, identity=urn:0-0-0-1-0-1-0-23-1187633646557) setting status to Active
> >         >         > Task(type=2, identity=urn:0-0-0-1-0-1-0-22-1187633646554) setting status to Failed Exception in getFile
> >         >         > Task(type=2, identity=urn:0-0-0-1-0-1-0-2-1187633646560) setting status to Submitted
> >         >         > ......
> >         >         >
> >         >         > Thanks,
> >         >         > Jing
> >         >         >
> >         >         > On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> >         >         >         But those are not from the same job.
> >         >         >
> >         >         >         On Mon, 2007-08-20 at 12:28 -0500, Jing Tie wrote:
> >         >         >         > Yes. I saw 101-FBchannel1_cwt-avgResults.Rdata to
> >         >         >         > 101-FBchannel28_cwt-avgResults.Rdata 28 output files
> >         >         >         > on the swift client, but all the files were empty.
> >         >         >         >
> >         >         >         > Jing
> >         >         >         >
> >         >         >         >
> >         >         >         > On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> >         >         >         >         On Mon, 2007-08-20 at 12:21 -0500, Jing Tie wrote:
> >         >         >         >         > Yes. There is no *avgResults.Rdata under shared
> >         >         >         >         > directory, only input file, scripts, wrapper.sh
> >         >         >         >         > and seq.sh.
> >         >         >         >
> >         >         >         >         Did the job actually run?
> >         >         >         >
> >         >         >         >         >
> >         >         >         >         > Jing
> >         >         >         >         >
> >         >         >         >         > On 8/20/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> >         >         >         >         >         Not much we can do if the filesystem is broken.
> >         >         >         >         >         Did you check to confirm that the file is not
> >         >         >         >         >         there?
> >         >         >         >         >
> >         >         >         >         >         Mihael
> >         >         >         >         >
> >
> >
>
>