[Swift-devel] Provider staging is failing
wilde at mcs.anl.gov
Sat Sep 4 06:51:48 CDT 2010
Thanks, Mihael - I will test.
Can you clarify how to use provider staging? My understanding is:
- set use.provider.staging=true in swift.properties
(this turns it on for all sites as far as I know; that's not desirable moving forward, but fine for now for testing. Perhaps it should be controlled solely by the stagingMethod tag in sites.xml?)
- set stagingMethod in sites.xml. At the moment, the file method is broken and the proxy method works.
- set workdirectory in sites.xml to a *node local* directory
- <scratch> element in sites.xml is ignored (as far as I could tell from tests), so there is no way to specify that the jobdirectory should be local. Instead, one does this by setting the workdirectory to a node-local dir.
- the <filesystem> element in sites.xml is ignored
Is that correct, and is anything else needed to make provider staging work correctly?
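
For reference, the settings listed above could be laid out roughly as follows. This is only a sketch written into a scratch directory. The property name and the workdirectory element come from the list above; the pool handle, the /tmp/swiftwork path, and the exact form of the stagingMethod entry (shown here as a swift-namespace profile) are assumptions and not verified syntax.

```shell
#!/bin/bash
# Sketch only: write the provider-staging settings into a scratch directory
# so that no real Swift installation is touched.
demo_dir=$(mktemp -d)

# From the list above: turns provider staging on (for all sites).
cat > "$demo_dir/swift.properties" <<'EOF'
use.provider.staging=true
EOF

# Assumed sites.xml fragment. The handle "localhost", the profile form of
# stagingMethod, and the /tmp/swiftwork path are illustrative guesses.
cat > "$demo_dir/sites.xml" <<'EOF'
<pool handle="localhost">
  <!-- "proxy" works at the moment; "file" is reported broken -->
  <profile namespace="swift" key="stagingMethod">proxy</profile>
  <!-- should be a node-local directory; <scratch> appears to be ignored -->
  <workdirectory>/tmp/swiftwork</workdirectory>
</pool>
EOF

echo "wrote $demo_dir/swift.properties and $demo_dir/sites.xml"
```

The fragment leaves out <scratch> and <filesystem> on purpose, since both appear to be ignored when provider staging is on.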
- Mike
----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:
> This should be fixed in trunk (swift r3600).
>
> Two issues were addressed:
> 1. don't bother with creating input file dirs in the wrapper since the
> stage-in process, which happens before the wrapper gets invoked, should
> take care of that. If it does not, then the model is broken to begin with.
> 2. don't add empty directories to the list
>
> Please test.
>
> Mihael
>
> On Mon, 2010-08-30 at 22:52 -0600, wilde at mcs.anl.gov wrote:
> > Nope - I was wrong again. The "-d |outdir" form has been generated all
> > along. The problem was that this causes a mkdir -p in _swiftwrap.staging
> > to be invoked with a null value. This was obscured in _swiftwrap, which
> > had a jobdir in front of the null input dir, and was thus silently
> > ignored by mkdir -p.
> >
> > I committed a fix (skip mkdir if dir is null), but please keep an eye
> > on _swiftwrap.staging in case it causes other issues.
> >
> > There was also a typo in a var $STDER -> STDERR.
> >
> > - Mike
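
The null-directory problem described above can be reproduced in isolation. The following is a minimal sketch, not the actual _swiftwrap.staging code: make_dirs is a hypothetical helper, and the "|"-separated value stands in for the "-d" argument. It shows mkdir -p failing on an empty name, the same empty name going unnoticed behind a job directory prefix, and a guard that skips null entries.

```shell
#!/bin/bash
# Sketch of the failure mode; make_dirs is a hypothetical helper.

make_dirs() {
    local dirs=$1 d rc=0
    local -a parts
    # Splitting "|outdir" on "|" yields an empty field followed by "outdir".
    IFS='|' read -r -a parts <<< "$dirs"
    for d in "${parts[@]}"; do
        if [ -z "$d" ]; then
            continue            # the fix: skip mkdir when the dir is null
        fi
        mkdir -p "$d" || rc=1
    done
    return $rc
}

work=$(mktemp -d)
cd "$work" || exit 1

# Unprefixed null dir: mkdir -p "" fails.
if mkdir -p "" 2>/dev/null; then null_rc=0; else null_rc=1; fi

# Prefixed with a job dir: "jobdir/" is a valid path, so the null input
# dir goes unnoticed.
if mkdir -p "jobdir/"; then prefixed_rc=0; else prefixed_rc=1; fi

# Guarded version: the empty first field is skipped, outdir is created.
make_dirs '|outdir'
guarded_rc=$?

echo "null=$null_rc prefixed=$prefixed_rc guarded=$guarded_rc"
```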
> >
> >
> >
> > ----- wilde at mcs.anl.gov wrote:
> >
> > > ----- "Justin M Wozniak" <wozniak at mcs.anl.gov> wrote:
> > >
> > > > I think that's ok.
> > >
> > > Right: Mihael pointed out to me in IM that the exec'ed program is
> > > /bin/bash with _swiftwrap.staging as an arg.
> > >
> > > > Digging deeper it looks like _swiftwrap.staging is getting run with
> > > > this command line:
> > > >
> > > > /bin/bash _swiftwrap.staging -e /bin/cat -out outdir/f.0001.out -err stderr.txt -i -d '|outdir' -if data.txt -of outdir/f.0001.out -k -cdmfile -status provider -a data.txt
> > > >
> > > > and the extra "|" separator in the -d 'outdir' arg (quotes mine) is
> > > > causing a spurious mkdir to get invoked for what would have been the
> > > > "in dirs" argument. That in turn is causing the ret code 254.
> > > >
> > > > I think that extra | separator is not supposed to be there when there
> > > > are no input directories (as in this case). vdl-int.staging has:
> > > > "-d", flatten(each(fileDirs)),
> > > > and I now suspect a null value for the dirs of stagein is not being
> > > > handled right, somewhere around:
> > > > fileDirs := fileDirs(stagein, stageout)
> > >
> > > - Mike
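
On the generating side, the stray separator is what a plain join produces when one of the entries is empty. The sketch below is a bash analogue only, not the Karajan code in vdl-int.staging; naive_join and join_dirs are hypothetical names. It contrasts a join that keeps empty entries with one that drops them.

```shell
#!/bin/bash
# Bash analogue of the directory-list join; both helpers are hypothetical.

# Joins every argument with "|", including empty ones.
naive_join() {
    local IFS='|'
    printf '%s\n' "$*"
}

# Joins only the non-empty arguments, so an empty stage-in dir never
# produces a leading separator.
join_dirs() {
    local out="" d
    for d in "$@"; do
        [ -n "$d" ] || continue
        if [ -n "$out" ]; then
            out="$out|$d"
        else
            out=$d
        fi
    done
    printf '%s\n' "$out"
}

# Empty stage-in dir, one stage-out dir, as in the failing job.
with_empty=$(naive_join "" "outdir")
filtered=$(join_dirs "" "outdir")

echo "naive:    $with_empty"
echo "filtered: $filtered"
```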
> > >
> > >
> > >
> > >
> > > > Do you have the wrapper.log/info files?
> > > >
> > > > On Mon, 30 Aug 2010, Michael Wilde wrote:
> > > >
> > > > > > _swiftwrap.staging didn't seem to get marked executable:
> > > >
> > > >
> > > > > ----- "Michael Wilde" <wilde at mcs.anl.gov> wrote:
> > > > >
> > > > > >> With proxy the stageins seem to complete. Then I get a 254 when it
> > > > > >> tries to run; I'm looking at that now:
> > > > >>
> > > > > >> 1283218480.397 DEBUG 000000 CWD: /
> > > > > >> 1283218480.397 DEBUG 000000 Running /bin/bash
> > > > > >> 1283218480.397 DEBUG 000000 Directory: /home/wilde/swiftwork/catsn-20100830-2034-hotqv61h-o-cat-oyun22yj
> > > > > >> 1283218480.397 DEBUG 000000 Command: _swiftwrap.staging -e /bin/cat -out outdir/f.0001.out -err stderr.txt -i -d |outdir -if data.txt -of outdir/f.0001.out -k -cdmfile -status provider -a data.txt
> > > > > >> 1283218480.397 DEBUG 000000 Command: /bin/bash _swiftwrap.staging -e /bin/cat -out outdir/f.0001.out -err stderr.txt -i -d |outdir -if data.txt -of outdir/f.0001.out -k -cdmfile -status provider -a data.txt
> > > > > >> 1283218480.397 DEBUG 000000 1283218479990 Forked process 17949. Waiting for its completion
> > > > > >> 1283218480.408 DEBUG 000000 Checking jobs status (1 active)
> > > > > >> 1283218480.408 DEBUG 000000 1283218479990 Checking pid 17949
> > > > > >> 1283218480.408 DEBUG 000000 1283218479990 Job 17949 still running
> > > > > >> 1283218480.408 TRACE 000000 IN: len=2, actuallen=2, tag=4, flags=3, OK
> > > > > >> 1283218480.408 DEBUG 000000 Fin flag set
> > > > > >> 1283218480.508 DEBUG 000000 Checking jobs status (1 active)
> > > > > >> 1283218480.508 DEBUG 000000 1283218479990 Checking pid 17949
> > > > > >> 1283218480.508 DEBUG 000000 1283218479990 Child process 17949 terminated. Status is 254.
> > > > >>
> > > > >>
> > > > >> - Mike
> > > > >>
> > > > >> ----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:
> > > > >>
> > > > >>> On Mon, 2010-08-30 at 19:26 -0600, wilde at mcs.anl.gov wrote:
> > > > > >>>> I turned on the TRACE output level in worker.pl. I need to dig
> > > > > >>>> deeper but it looks to me that the pathnames it's trying to fetch
> > > > > >>>> are getting mangled/confused with the file:// portion of the URI:
> > > > > >>>>
> > > > > >>>> org.globus.cog.karajan.workflow.service.ProtocolException: java.io.FileNotFoundException: /autonfs/home/wilde/./file:/localhost/home/wilde/swift/rev/trunk/bin/../libexec/_swiftwrap.staging (No such file or directory)
> > > > > >>>>
> > > > > >>>> The file "/home/wilde/swift/rev/trunk/bin/../libexec/_swiftwrap.staging" does exist on the client side.
> > > > >>>
> > > > >>> Seems to. I gather "file" is broken.
> > > > >>>
> > > > > >>> Can you try "proxy", and see if it fails? If not, I'll know a bit
> > > > > >>> better where to look.
> > > > >>>
> > > > >>> Mihael
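
The mangled path in the exception has the shape you get when a file URI is appended to a base directory as if it were a relative path. The sketch below only illustrates that shape and one way to avoid it; it is not the worker.pl or service code. local_path_of is a hypothetical helper, and the base directory and URI are modeled on the values in the error message.

```shell
#!/bin/bash
# Illustration only; local_path_of is a hypothetical helper that strips a
# file:// scheme and optional host before the path is used on disk.
local_path_of() {
    local uri=$1 rest
    case $uri in
        file://*)
            rest=${uri#file://}           # "host/path" or "/path"
            printf '/%s\n' "${rest#*/}"   # drop the host part, keep the path
            ;;
        *)
            printf '%s\n' "$uri"          # already a plain path
            ;;
    esac
}

base="/autonfs/home/wilde/."
uri="file://localhost/home/wilde/swift/rev/trunk/bin/../libexec/_swiftwrap.staging"

# Treating the URI as a relative path reproduces the mangled shape
# (a later path normalization would likely collapse "//" to "/").
mangled="$base/$uri"

# Stripping scheme and host first gives the path that exists on the client.
clean=$(local_path_of "$uri")

echo "mangled: $mangled"
echo "clean:   $clean"
```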
> > > > >>
> > > > >> --
> > > > >> Michael Wilde
> > > > >> Computation Institute, University of Chicago
> > > > >> Mathematics and Computer Science Division
> > > > >> Argonne National Laboratory
> > > > >
> > > > >
> > > >
> > > > --
> > > > Justin M Wozniak
> > >
> > > --
> > > Michael Wilde
> > > Computation Institute, University of Chicago
> > > Mathematics and Computer Science Division
> > > Argonne National Laboratory
> > >
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory