[Swift-devel] Provider staging is failing

Mihael Hategan hategan at mcs.anl.gov
Fri Sep 3 21:34:33 CDT 2010


This should be fixed in trunk (swift r3600).

Two issues were addressed:
1. Don't bother creating input file dirs in the wrapper, since the
stage-in process, which happens before the wrapper gets invoked, should
take care of that. If it does not, then the model is broken to begin
with.
2. Don't add empty directories to the directory list (the wrapper's -d
argument).
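
To illustrate the empty-directory problem discussed in the quoted thread
below, here is a minimal sketch, not the code committed in r3600 and not
taken from _swiftwrap.staging. It assumes the -d value is a "|"-separated
list of directories (as in the '|outdir' argument seen in the logs); the
function name make_dirs and its variable names are hypothetical.

```shell
# Hypothetical helper: create every non-empty directory named in a
# "|"-separated list, skipping empty entries such as the one before the
# "|" in '|outdir'. An empty entry would otherwise reach mkdir -p as "".
make_dirs() {
    rest=$1
    while [ -n "$rest" ]; do
        # Take everything up to the first "|" as the next entry.
        dir=${rest%%"|"*}
        case $rest in
            *"|"*) rest=${rest#*"|"} ;;   # drop the entry and its separator
            *)     rest= ;;               # last entry, nothing left
        esac
        # Skip empty entries instead of passing them to mkdir.
        if [ -z "$dir" ]; then
            continue
        fi
        mkdir -p "$dir" || return 1
    done
}
```

Called as make_dirs '|outdir', this creates only outdir. The variables are
deliberately not declared local so the sketch stays within POSIX sh.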

Please test.

Mihael

On Mon, 2010-08-30 at 22:52 -0600, wilde at mcs.anl.gov wrote:
> Nope - I was wrong again. The "-d |outdir" form has been generated all along. The problem was that this causes a mkdir -p in _swiftwrap.staging to be invoked with a null value. This was obscured in _swiftwrap, which had a jobdir in front of the null input dir, and was thus silently ignored by mkdir -p.
> 
> I committed a fix (skip mkdir if dir is null), but please keep an eye on _swiftwrap.staging in case it causes other issues.
> 
> There was also a typo in a var $STDER -> STDERR.
> 
> - Mike
> 
> 
> 
> ----- wilde at mcs.anl.gov wrote:
> 
> > ----- "Justin M Wozniak" <wozniak at mcs.anl.gov> wrote:
> > 
> > > I think that's ok.
> > 
> > Right: Mihael pointed out to me in IM that the exec'ed program is
> > /bin/bash with _swiftwrap.staging as an arg.
> > 
> > Digging deeper it looks like _swiftwrap.staging is getting run with
> > this command line:
> > 
> > /bin/bash _swiftwrap.staging -e /bin/cat -out outdir/f.0001.out -err
> > stderr.txt -i -d '|outdir' -if data.txt -of outdir/f.0001.out -k
> > -cdmfile  -status provider -a data.txt
> > 
> > and the extra "|" separator in the -d '|outdir' arg (quotes mine) is
> > causing a spurious mkdir to get invoked for what would have been the
> > "in dirs" argument. That in turn is causing the ret code 254.
> > 
> > I think that extra | separator is not supposed to be there when there
> > are no input directories (as in this case). vdl-int.staging has:
> >   "-d", flatten(each(fileDirs)),
> > and I now suspect a null value for the dirs of stagein is not being
> > handled right, somewhere around:
> >    fileDirs := fileDirs(stagein, stageout)
> > 
> > - Mike
> > 
> > 
> > 
> > 
> > > Do you have the wrapper.log/info files?
> > > 
> > > On Mon, 30 Aug 2010, Michael Wilde wrote:
> > > 
> > > > > _swiftwrap.staging didn't seem to get marked executable:
> > > 
> > > 
> > > > ----- "Michael Wilde" <wilde at mcs.anl.gov> wrote:
> > > >
> > > >> With proxy the stageins seem to complete. Then I get a 254 when it
> > > >> tries to run; I'm looking at that now:
> > > >>
> > > >> 1283218480.397 DEBUG 000000 CWD: /
> > > >> 1283218480.397 DEBUG 000000 Running /bin/bash
> > > >> 1283218480.397 DEBUG 000000 Directory: /home/wilde/swiftwork/catsn-20100830-2034-hotqv61h-o-cat-oyun22yj
> > > >> 1283218480.397 DEBUG 000000 Command: _swiftwrap.staging -e /bin/cat -out outdir/f.0001.out -err stderr.txt -i -d |outdir -if data.txt -of outdir/f.0001.out -k -cdmfile -status provider -a data.txt
> > > >> 1283218480.397 DEBUG 000000 Command: /bin/bash _swiftwrap.staging -e /bin/cat -out outdir/f.0001.out -err stderr.txt -i -d |outdir -if data.txt -of outdir/f.0001.out -k -cdmfile  -status provider -a data.txt
> > > >> 1283218480.397 DEBUG 000000 1283218479990 Forked process 17949. Waiting for its completion
> > > >> 1283218480.408 DEBUG 000000 Checking jobs status (1 active)
> > > >> 1283218480.408 DEBUG 000000 1283218479990 Checking pid 17949
> > > >> 1283218480.408 DEBUG 000000 1283218479990 Job 17949 still running
> > > >> 1283218480.408 TRACE 000000  IN: len=2, actuallen=2, tag=4, flags=3, OK
> > > >> 1283218480.408 DEBUG 000000 Fin flag set
> > > >> 1283218480.508 DEBUG 000000 Checking jobs status (1 active)
> > > >> 1283218480.508 DEBUG 000000 1283218479990 Checking pid 17949
> > > >> 1283218480.508 DEBUG 000000 1283218479990 Child process 17949 terminated. Status is 254.
> > > >>
> > > >>
> > > >> - Mike
> > > >>
> > > >> ----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:
> > > >>
> > > >>> On Mon, 2010-08-30 at 19:26 -0600, wilde at mcs.anl.gov wrote:
> > > >>>> I turned on the TRACE output level in worker.pl. I need to dig
> > > >>>> deeper but it looks to me that the pathnames it's trying to fetch
> > > >>>> are getting mangled/confused with the file:// portion of the URI:
> > > >>>>
> > > >>>> org.globus.cog.karajan.workflow.service.ProtocolException:
> > > >>>> java.io.FileNotFoundException:
> > > >>>> /autonfs/home/wilde/./file:/localhost/home/wilde/swift/rev/trunk/bin/../libexec/_swiftwrap.staging
> > > >>>> (No such file or directory)
> > > >>>>
> > > >>>> The file "/home/wilde/swift/rev/trunk/bin/../libexec/_swiftwrap.staging"
> > > >>>> does exist on the client side.
> > > >>>
> > > >>> Seems to. I gather "file" is broken.
> > > >>>
> > > >>> Can you try "proxy", and see if it fails? If not, I'll know a bit
> > > >>> better where to look.
> > > >>>
> > > >>> Mihael
> > > >>
> > > >> --
> > > >> Michael Wilde
> > > >> Computation Institute, University of Chicago
> > > >> Mathematics and Computer Science Division
> > > >> Argonne National Laboratory
> > > >
> > > >
> > > 
> > > -- 
> > > Justin M Wozniak
> > 
> > -- 
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> > 
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> 




