I see. So at this point, the problem could have one of two causes:<br>1. The GFS system is broken, and the output files are missing;<br>2. Swift has a problem creating the output files.<br><br>Is that right?<br><br>Thanks,<br>Jing<br><br>
<div><span class="gmail_quote">On 8/20/07, <b class="gmail_sendername">Mihael Hategan</b> <<a href="mailto:hategan@mcs.anl.gov">hategan@mcs.anl.gov</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
No. Swift will always try to stage out the output files if it has no<br>indication that something went wrong with the job. But if the filesystem<br>is broken, and the files are not actually there, well, that's what you
<br>seem to be observing.<br><br>On Mon, 2007-08-20 at 14:36 -0500, Jing Tie wrote:<br>> I see. Could this output be viewed as a sign?<br>><br>> Completed job cwtsmall-gt3062gi cwtsmall with arguments<br>> [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS
<br>> Staging out<br>> sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt-avgResults.Rdata to<br>> 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS<br>><br>> Thanks,<br>> Jing<br>><br>> On 8/20/07, Mihael Hategan <
<a href="mailto:hategan@mcs.anl.gov">hategan@mcs.anl.gov</a>> wrote:<br>> Local empty files may be created even if the remote files<br>> don't exist.<br>> So don't take that as a sign that the application has run.
<br>><br>>         In the meantime I'll try to convince it not to create empty<br>>         local files if they don't exist remotely.<br>><br>>         Mihael<br>><br>>         On Mon, 2007-08-20 at 13:43 -0500, Jing Tie wrote:
<br>>         > I think these files were from the job, because I deleted all<br>>         > the *Results.Rdata files before submitting the job and found<br>>         > these empty files after the execution.
<br>>         ><br>>         > output of the process of execution:<br>>         > RunID: 3szhlhvg4seu0<br>>         > cwtsmall started
<br>>         > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) setting status to Active
<br>>         > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646429) setting status to Completed
<br>>         > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Submitted
<br>>         > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Active
<br>>         > Task(type=2, identity=urn:0-0-0-1-0-1-0-1187633646432) setting status to Completed
<br>>         > ...
<br>>         > Task(type=2, identity=urn:0-0-0-1-0-1-0-1-1187633646453) setting status to Completed
<br>>         > Staged in scripts/runWaveletsAvg.R to<br>>         > sid-wf1-3szhlhvg4seu0/shared/scripts/ on MIT_CMS
<br>>         > Running job cwtsmall-gt3062gi cwtsmall with arguments<br>>         > [scripts/runWaveletsAvg.R, 101, FB] in<br>>         > sid-wf1-3szhlhvg4seu0/cwtsmall-gt3062gi on MIT_CMS
<br>>         > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Submitted
<br>>         > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Active
<br>>         > Task(type=1, identity=urn:0-0-0-1-0-1-0-1187633646457) setting status to Completed
<br>>         > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) setting status to Active
<br>>         > Task(type=4, identity=urn:0-0-0-1-0-1-0-1187633646459) setting status to Completed
<br>>         > Completed job cwtsmall-gt3062gi cwtsmall with arguments
<br>>         > [scripts/runWaveletsAvg.R, 101, FB] on MIT_CMS<br>>         > Staging out<br>>         > sid-wf1-3szhlhvg4seu0/shared//101-FBchannel15_cwt-avgResults.Rdata to<br>>         > 101-FBchannel15_cwt-avgResults.Rdata from MIT_CMS
<br>>         > Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462) setting status to Active<br>>         > Task(type=4, identity=urn:0-0-0-1-0-1-0-7-1187633646462)
setting status to Completed<br>>         > ......
<br>>         > Task(type=2, identity=urn:0-0-0-1-0-1-0-23-1187633646557) setting status to Active
<br>>         > Task(type=2, identity=urn:0-0-0-1-0-1-0-22-1187633646554) setting status to Failed Exception in getFile
<br>>         > Task(type=2, identity=urn:0-0-0-1-0-1-0-2-1187633646560) setting status to Submitted
<br>>         > ......<br>>         ><br>>         > Thanks,<br>>         > Jing<br>>         ><br>>         > On 8/20/07, Mihael Hategan <
<a href="mailto:hategan@mcs.anl.gov">hategan@mcs.anl.gov</a>> wrote:<br>> > But those are not from the same job.<br>> ><br>> > On Mon, 2007-08-20 at 12:28 -0500, Jing Tie wrote:
<br>>         >         > Yes. I saw 28 output files on the Swift client,<br>>         >         > 101-FBchannel1_cwt-avgResults.Rdata to<br>>         >         > 101-FBchannel28_cwt-avgResults.Rdata,
<br>>         >         > but all the files were empty.<br>>         >         ><br>>         >         > Jing<br>>         >         ><br>>         >         ><br>>         >         > On 8/20/07, Mihael Hategan <
<a href="mailto:hategan@mcs.anl.gov">hategan@mcs.anl.gov</a>> wrote:<br>>         >         >         On Mon, 2007-08-20 at 12:21 -0500, Jing Tie wrote:<br>>         >         >         > Yes. There is no *avgResults.Rdata under the shared
<br>>         >         >         > directory, only the input file, scripts, wrapper.sh and seq.sh.
<br>>         >         ><br>>         >         >         Did the job actually run?<br>>         >         ><br>>         >         >         ><br>>         >         >         > Jing
<br>> > > ><br>> > > > On 8/20/07, Mihael Hategan <<br>> <a href="mailto:hategan@mcs.anl.gov">hategan@mcs.anl.gov</a>><br>> > wrote:
<br>>         >         >         >         Not much we can do if the filesystem is broken.<br>>         >         >         >         Did you check to confirm that the file is not there?
<br>>         >         >         ><br>>         >         >         >         Mihael<br>
> > > ><br><br></blockquote></div><br>