[Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349

Michael Wilde wilde at mcs.anl.gov
Tue May 22 10:34:29 CDT 2012


Isn't this line problematic if you don't know where the wrapper script has cd'ed you to:

cp -v home/ketan/ketan_mars/MARS-LIC .
      ^^^

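The relative path doesn't seem safe: without a leading slash it resolves against whatever directory the wrapper happens to be running in.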
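
A minimal sketch of a wrapper that does not depend on where it is started, assuming the licence file and binary live at the absolute paths shown in the quoted wrapper. The run_app function and the LIC/APP variables are hypothetical names introduced here for illustration; they are not part of Swift:

```shell
#!/bin/sh
# Sketch only: LIC and APP default to the paths from this thread and can be
# overridden from the environment. run_app is a hypothetical helper name.
LIC="${LIC:-/home/ketan/ketan_mars/MARS-LIC}"
APP="${APP:-/home/ketan/ketan_mars/marsMain}"

run_app() {
    # Report the directory the wrapper was actually started in (to stderr).
    echo "wrapper cwd: $(pwd)" >&2
    # Absolute source path, so the copy does not depend on the cwd.
    # cp's verbose output goes to stderr to keep stdout for the application.
    cp -v "$LIC" . >&2 || return 1
    # Run the application from the same directory the licence was copied into.
    "$APP" "$@"
}
```

In the real wrapper the last line would be run_app "$@". The printed cwd should make it easy to see whether the job is started inside or outside the job directory.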

- Mike


----- Original Message -----
> From: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> To: "Michael Wilde" <wilde at mcs.anl.gov>
> Cc: "Swift User" <swift-user at ci.uchicago.edu>
> Sent: Tuesday, May 22, 2012 10:18:11 AM
> Subject: Re: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349
> Looking into this further, I now have a wrapper in place which copies the
> licence file into the cwd before running the executable. However, the
> executable still fails as if the licence file were not
> present.
> 
> 
> When I cd into this dir (swift.workdir/mars-20120519-1203-3l....) and
> manually run the executable, it works.
> 
> 
> So, the question is: does _swiftwrap.staging do some internal
> cd'ing before calling the executable? I will take a look inside, but
> it would be useful if someone already knows this.
> 
> 
> The wrapper script is simply the following two lines:
> 
> 
> """
> cp -v home/ketan/ketan_mars/MARS-LIC .
> /home/ketan/ketan_mars/marsMain $1
> """
> 
> 
> Regards,
> Ketan
> 
> 
> On Mon, May 21, 2012 at 7:51 PM, Michael Wilde < wilde at mcs.anl.gov >
> wrote:
> 
> 
> I'm surprised that Swift isn't setting the current working dir (cwd) to
> be the job dir, but perhaps that's controlled by this property:
> 
> # Determines if Swift remote wrappers will be executed by specifying an
> # absolute path, or a path relative to the job initial working directory
> #
> # valid values: absolute, relative
> # wrapper.invocation.mode=absolute
> 
> Can you try your script with this property set to "relative"?
> 
> ...but looking at this further: I see that if you're using coasters
> with provider staging, the logic for job launch is quite different. We
> need to study this and get back to you. For now, best to force the
> right cd's with a wrapper. You might be able to remove the wrapper
> later, once we resolve how the job dir management should work in these
> various cases.
> 
> 
> - Mike
> 
> 
> ----- Original Message -----
> > From: "Ketan Maheshwari" < ketancmaheshwari at gmail.com >
> > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > Cc: "Swift User" < swift-user at ci.uchicago.edu >
> > Sent: Monday, May 21, 2012 4:28:02 PM
> > Subject: Re: [Swift-user] Deep recursion on subroutine
> > "main::stageout" at /home/ketan/work/worker.pl line 1349
> > Thanks Mike. Indeed the recursion was a warning.
> >
> >
> > I found the problem was that the binary could not find the licence in
> > the cwd from where it was being called. It is an application
> > requirement that the licence file must be present in the cwd from
> > where the call is made.
> >
> >
> > However, Swift makes a dirtree in the workdir, stages the files and
> > calls the binary from *outside* of this tree. Is it possible to make
> > Swift stage the licence file and put it at the top level without
> > writing a wrapper to do a cp? Again, the point of not wrapping the
> > binary in a script is to mimic the Hadoop setup as closely as
> > possible.
> >
> >
> > On Mon, May 21, 2012 at 3:35 PM, Michael Wilde < wilde at mcs.anl.gov >
> > wrote:
> >
> >
> > Ketan, as far as I can tell, that message, coming from worker.pl, is
> > just a warning.
> >
> > Programming Perl, sec. 33, Diagnostic Messages: "Deep recursion on
> > subroutine "%s"
> >
> > (W recursion) This subroutine has called itself (directly or
> > indirectly) 100 times more than it has returned. This probably
> > indicates an infinite recursion, unless you're writing strange
> > benchmark programs, in which case it indicates something else."
> >
> > The stageout code in worker.pl is indeed recursive, and the warning
> > could be suppressed:
> >
> > "Try placing
> >
> > no warnings 'recursion';
> >
> > within the same scope as that code ..."
> >
> > Can you try a simple mod to catsn, using your ext mapper, to see if it
> > is indeed failing due to the deeply recursive stageout?
> >
> > If you could dig a bit deeper into this, and see whether it's really
> > failing when staging back so many files or failing for some other, or
> > related, reason, that would be great.
> >
> > Thanks,
> >
> > - Mike
> >
> >
> >
> > ----- Original Message -----
> > > From: "Ketan Maheshwari" < ketancmaheshwari at gmail.com >
> > > To: "Swift User" < swift-user at ci.uchicago.edu >
> > > Sent: Monday, May 21, 2012 1:54:34 PM
> > > Subject: [Swift-user] Deep recursion on subroutine "main::stageout"
> > > at /home/ketan/work/worker.pl line 1349
> > > Hi,
> > >
> > >
> > > I am trying to run the GE mars script on a bag of workstations. I
> > > tested the script for a sufficient number of tasks and it seems to be
> > > working fine on localhost.
> > >
> > >
> > > However, it fails in this setup. I get the following error message
> > > after a seemingly correct invocation:
> > >
> > >
> > >
> > >
> > > Find: keepalive(120), reconnect - http://128.84.97.46:41287
> > > Progress: time: Mon, 21 May 2012 14:43:18 -0400 Stage in:7 Submitted:3
> > > Progress: time: Mon, 21 May 2012 14:43:19 -0400 Stage in:8 Active:2
> > > Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349.
> > > Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349.
> > > Progress: time: Mon, 21 May 2012 14:43:20 -0400 Active:3 Stage out:7
> > >
> > >
> > > Obviously the staging out of results fails, and it seems that the number
> > > of files in the stageout stage is causing the error. The application
> > > needs to stage out about 120 files.
> > >
> > >
> > > One solution I could quickly think of is to wrap the app in a shell
> > > script and zip the outputs, making it just one staged-out file.
> > >
> > >
> > > However, the current setup would still be useful since we are trying
> > > to compare the existing Hadoop solution with the Swift one.
> > >
> > >
> > > Is there any possible workaround, some env setting or so, that I could
> > > try to get the stageout going?
> > >
> > >
> > > The logs are:
> > > http://www.mcs.anl.gov/~ketan/mars-20120521-1443-d6q9lr0a.log
> > > and http://www.mcs.anl.gov/~ketan/workerlogs.tgz
> > >
> > >
> > >
> > >
> > > Regards,
> > > --
> > > Ketan
> > >
> > >
> > >
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory