<div>The cp line works fine because Swift recreates the dir tree, starting at /home, inside the swift.workdir, so the relative path resolves. With -v, I could see that the file gets copied to the cwd and is present there.</div><div><br></div><div>So I assume that the wrapper script is not cd'ing me anywhere. It is still a mystery why the app complains that the file is not present when run from the wrapper, yet works when run manually in the same dir.</div>
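<div><br></div><div>One way to narrow this down might be a more verbose wrapper that logs the cwd and confirms the licence file is really in it just before the executable starts. The following is only a sketch: run_with_license is a hypothetical helper name (nothing in Swift provides it), and it assumes the only requirement is that the licence file sits in the cwd at launch time.</div>

```shell
#!/bin/bash
# Diagnostic wrapper sketch. run_with_license is a hypothetical helper,
# not part of Swift or of the application.
# Usage: run_with_license LICENSE_SRC APP [APP_ARGS...]
run_with_license() {
    local license_src="$1" app="$2"
    shift 2

    # Record where the wrapper is actually running.
    echo "wrapper cwd: $(pwd)" 1>&2

    # Copy the licence into the cwd; stop if the source path does not resolve.
    cp -v "$license_src" . 1>&2 || return 1

    # Confirm the copy is visible in the cwd before launching the app.
    local license_name
    license_name="$(basename "$license_src")"
    if [ ! -f "./$license_name" ]; then
        echo "licence missing from cwd: $license_name" 1>&2
        return 1
    fi
    ls -l "./$license_name" 1>&2

    # Launch the application from the same cwd, passing the remaining args.
    "$app" "$@"
}

# With the paths from this thread, the wrapper body would become:
# run_with_license home/ketan/ketan_mars/MARS-LIC /home/ketan/ketan_mars/marsMain "$1"
```

<div>All diagnostics go to stderr, so they should not mix with whatever the app writes to stdout; comparing the logged cwd with the dir used for the manual run would show whether the two really match.</div>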
<br><div class="gmail_quote">On Tue, May 22, 2012 at 11:34 AM, Michael Wilde <span dir="ltr"><<a href="mailto:wilde@mcs.anl.gov" target="_blank">wilde@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Isn't this line problematic if you don't know where the wrapper script has you cd'ed to:<br>
<br>
cp -v home/ketan/ketan_mars/MARS-LIC .<br>
^^^<br>
<br>
The relative path doesn't seem safe.<br>
<div class="im"><br>
- Mike<br>
<br>
<br>
----- Original Message -----<br>
> From: "Ketan Maheshwari" <<a href="mailto:ketancmaheshwari@gmail.com">ketancmaheshwari@gmail.com</a>><br>
> To: "Michael Wilde" <<a href="mailto:wilde@mcs.anl.gov">wilde@mcs.anl.gov</a>><br>
> Cc: "Swift User" <<a href="mailto:swift-user@ci.uchicago.edu">swift-user@ci.uchicago.edu</a>><br>
</div><div><div class="h5">> Sent: Tuesday, May 22, 2012 10:18:11 AM<br>
> Subject: Re: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/<a href="http://worker.pl" target="_blank">worker.pl</a> line 1349<br>
> Looking into this further, I now have a wrapper in place which copies the<br>
> licence file into the cwd before running the executable. However, the<br>
> executable still fails as if the licence file were not<br>
> present.<br>
><br>
><br>
> When I cd into this dir (swift.workdir/mars-20120519-1203-3l....) and<br>
> manually run the executable, it works.<br>
><br>
><br>
> So, the question is: does _swiftwrap.staging do some internal<br>
> cd'ing before calling the executable? I will take a look inside, but<br>
> it would be useful if someone knows this.<br>
><br>
><br>
> The wrapper script is simply the following two lines:<br>
><br>
><br>
> """<br>
> cp -v home/ketan/ketan_mars/MARS-LIC .<br>
> /home/ketan/ketan_mars/marsMain $1<br>
> """<br>
><br>
><br>
> Regards,<br>
> Ketan<br>
><br>
><br>
> On Mon, May 21, 2012 at 7:51 PM, Michael Wilde < <a href="mailto:wilde@mcs.anl.gov">wilde@mcs.anl.gov</a> ><br>
> wrote:<br>
><br>
><br>
> I'm surprised that Swift isn't setting the current working dir (cwd) to<br>
> be the job dir, but perhaps that's controlled by this property:<br>
><br>
> # Determines if Swift remote wrappers will be executed by specifying<br>
> an<br>
> # absolute path, or a path relative to the job initial working<br>
> directory<br>
> #<br>
> # valid values: absolute, relative<br>
> # wrapper.invocation.mode=absolute<br>
><br>
> Can you try your script with this property set to "relative"?<br>
><br>
> ...but looking at this further: I see that if you're using coasters<br>
> with provider staging, the logic for job launch is quite different. We<br>
> need to study this and get back to you. For now, best to force the<br>
> right cd's with a wrapper. You might be able to remove the wrapper<br>
> later, once we resolve how the job dir management should work in these<br>
> various cases.<br>
><br>
><br>
> - Mike<br>
><br>
><br>
> ----- Original Message -----<br>
> > From: "Ketan Maheshwari" < <a href="mailto:ketancmaheshwari@gmail.com">ketancmaheshwari@gmail.com</a> ><br>
> > To: "Michael Wilde" < <a href="mailto:wilde@mcs.anl.gov">wilde@mcs.anl.gov</a> ><br>
> > Cc: "Swift User" < <a href="mailto:swift-user@ci.uchicago.edu">swift-user@ci.uchicago.edu</a> ><br>
> > Sent: Monday, May 21, 2012 4:28:02 PM<br>
> > Subject: Re: [Swift-user] Deep recursion on subroutine<br>
</div></div>> > "main::stageout" at /home/ketan/work/ <a href="http://worker.pl" target="_blank">worker.pl</a> line 1349<br>
<div class="HOEnZb"><div class="h5">> > Thanks Mike. Indeed the recursion was a warning.<br>
> ><br>
> ><br>
> > I found the problem was that the binary could not find the licence<br>
> > in<br>
> > the cwd from where it was being called. This is an application<br>
> > requirement that the licence file must be present in the cwd from<br>
> > where the call is made.<br>
> ><br>
> ><br>
> > However, Swift makes a dirtree in the workdir, stages the files and<br>
> > calls the binary from *outside* of this tree. Is it possible to make<br>
> > Swift stage the licence file and put it at the top level without<br>
> > writing a wrapper to do a cp? Again, the point of not wrapping the<br>
> > binary in a script is to mimic the Hadoop setup as closely as<br>
> > possible.<br>
> ><br>
> ><br>
> > On Mon, May 21, 2012 at 3:35 PM, Michael Wilde < <a href="mailto:wilde@mcs.anl.gov">wilde@mcs.anl.gov</a> ><br>
> > wrote:<br>
> ><br>
> ><br>
> > Ketan, as far as I can tell, that message, coming from <a href="http://worker.pl" target="_blank">worker.pl</a>, is<br>
> > just a warning.<br>
> ><br>
> > Programming Perl, sec. 33, Diagnostic Messages: "Deep recursion on<br>
> > subroutine "%s"<br>
> ><br>
> > (W recursion) This subroutine has called itself (directly or<br>
> > indirectly) 100 times more than it has returned. This probably<br>
> > indicates an infinite recursion, unless you're writing strange<br>
> > benchmark programs, in which case it indicates something else."<br>
> ><br>
> > The stageout code in <a href="http://worker.pl" target="_blank">worker.pl</a> is indeed recursive, and the warning<br>
> > could be suppressed:<br>
> ><br>
> > "Try placing<br>
> ><br>
> > no warnings 'recursion';<br>
> ><br>
> > within the same scope as that code ..."<br>
> ><br>
> > Can you try a simple mod to catsn, using your ext mapper, to see if<br>
> > it<br>
> > is indeed failing due to the deeply recursive stageout?<br>
> ><br>
> > If you could dig a bit deeper into this, and see whether it's really<br>
> > failing when staging back so many files or failing for some other,<br>
> > or<br>
> > related, reason, that would be great.<br>
> ><br>
> > Thanks,<br>
> ><br>
> > - Mike<br>
> ><br>
> ><br>
> ><br>
> > ----- Original Message -----<br>
> > > From: "Ketan Maheshwari" < <a href="mailto:ketancmaheshwari@gmail.com">ketancmaheshwari@gmail.com</a> ><br>
> > > To: "Swift User" < <a href="mailto:swift-user@ci.uchicago.edu">swift-user@ci.uchicago.edu</a> ><br>
> > > Sent: Monday, May 21, 2012 1:54:34 PM<br>
> > > Subject: [Swift-user] Deep recursion on subroutine<br>
> > > "main::stageout" at /home/ketan/work/<a href="http://worker.pl" target="_blank">worker.pl</a> line 1349<br>
> > > Hi,<br>
> > ><br>
> > ><br>
> > > I am trying to run the GE mars script on a bag of workstations. I<br>
> > > tested the script with a sufficient number of tasks and it seems to<br>
> > > work fine on localhost.<br>
> > ><br>
> > ><br>
> > > However, it fails in this setup. I get the following error message<br>
> > > after a seemingly correct invocation:<br>
> > ><br>
> > ><br>
> > ><br>
> > ><br>
> > > Find: keepalive(120), reconnect - <a href="http://128.84.97.46:41287" target="_blank">http://128.84.97.46:41287</a><br>
> > > Progress: time: Mon, 21 May 2012 14:43:18 -0400 Stage in:7<br>
> > > Submitted:3<br>
> > > Progress: time: Mon, 21 May 2012 14:43:19 -0400 Stage in:8<br>
> > > Active:2<br>
> > > Deep recursion on subroutine "main::stageout" at /home/ketan/work/<br>
> > > <a href="http://worker.pl" target="_blank">worker.pl</a> line 1349.<br>
> > > Deep recursion on subroutine "main::stageout" at /home/ketan/work/<br>
> > > <a href="http://worker.pl" target="_blank">worker.pl</a> line 1349.<br>
> > > Progress: time: Mon, 21 May 2012 14:43:20 -0400 Active:3 Stage<br>
> > > out:7<br>
> > ><br>
> > ><br>
> > > Obviously the staging out of results fails, and it seems that the<br>
> > > number of files in the stageout stage is causing the error. The<br>
> > > application needs to stage out about 120 files.<br>
> > ><br>
> > ><br>
> > > One solution I could quickly think of is to wrap the app in a<br>
> > > shell script and zip the outputs, making it just one staged-out file.<br>
> > ><br>
> > ><br>
> > > However, the current setup would still be useful since we are<br>
> > > trying<br>
> > > to compare the existing Hadoop solution with the Swift one.<br>
> > ><br>
> > ><br>
> > > Is there any possible workaround, some env setting or so that I<br>
> > > could<br>
> > > try and get the stageout going?<br>
> > ><br>
> > ><br>
> > > The logs are:<br>
> > > <a href="http://www.mcs.anl.gov/~ketan/mars-20120521-1443-d6q9lr0a.log" target="_blank">http://www.mcs.anl.gov/~ketan/mars-20120521-1443-d6q9lr0a.log</a><br>
> > > and <a href="http://www.mcs.anl.gov/~ketan/workerlogs.tgz" target="_blank">http://www.mcs.anl.gov/~ketan/workerlogs.tgz</a><br>
> > ><br>
> > ><br>
> > ><br>
> > ><br>
> > > Regards, --<br>
> > > Ketan<br>
> > ><br>
> > ><br>
> > ><br>
> > > _______________________________________________<br>
> > > Swift-user mailing list<br>
> > > <a href="mailto:Swift-user@ci.uchicago.edu">Swift-user@ci.uchicago.edu</a><br>
> > > <a href="https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user" target="_blank">https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user</a><br>
> ><br>
> > --<br>
> > Michael Wilde<br>
> > Computation Institute, University of Chicago<br>
> > Mathematics and Computer Science Division<br>
> > Argonne National Laboratory<br>
> ><br>
> ><br>
> ><br>
> ><br>
> ><br>
> > --<br>
> > Ketan<br>
><br>
> --<br>
> Michael Wilde<br>
> Computation Institute, University of Chicago<br>
> Mathematics and Computer Science Division<br>
> Argonne National Laboratory<br>
><br>
><br>
><br>
><br>
><br>
> --<br>
> Ketan<br>
<br>
--<br>
Michael Wilde<br>
Computation Institute, University of Chicago<br>
Mathematics and Computer Science Division<br>
Argonne National Laboratory<br>
<br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><font face="'courier new', monospace">Ketan</font><br><br><br>