Looking into this further, I now have a wrapper in place that copies the licence file into the cwd before running the executable. However, the executable still fails as if the licence file were not present. <div><br></div>
<div>When I cd into this dir (swift.workdir/mars-20120519-1203-3l....) and manually run the executable, it works.</div><div><br></div><div>So the question is: does _swiftwrap.staging do some internal cd'ing before calling the executable? I will take a look inside, but it would be useful if someone already knows.</div>
<div><br></div><div>The wrapper script is simply the following two lines:</div><div><br></div><div>"""</div><div>cp -v home/ketan/ketan_mars/MARS-LIC .</div><div>/home/ketan/ketan_mars/marsMain $1<br>"""</div>
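<div><br></div><div>To check where the wrapper actually runs, a variant along these lines might help. This is only a sketch: the function name run_with_licence and its argument order are made up for illustration, and it assumes nothing about what _swiftwrap.staging does internally. It prints the cwd, copies the licence there, and then starts the executable, so the job's stdout shows the directory the executable was launched from. The licence is passed as an absolute path, because a relative source path would be resolved against whatever the cwd happens to be.</div><div><br></div>

```shell
#!/bin/sh
# Hypothetical helper (name and argument order are illustrative only).
# Usage: run_with_licence LICENCE_FILE EXECUTABLE [ARGS...]
run_with_licence() {
    licence=$1
    exe=$2
    shift 2

    # Show which directory the wrapper was started in.
    echo "wrapper cwd: $(pwd)"

    # Copy the licence into the directory the executable will be started from;
    # give up if the copy fails.
    cp "$licence" . || return 1

    # Run the executable from this same directory, passing the remaining args.
    "$exe" "$@"
}
```

<div><br></div><div>The wrapper would then end with a call such as run_with_licence /home/ketan/ketan_mars/MARS-LIC /home/ketan/ketan_mars/marsMain "$1".</div>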
<div><br></div><div>Regards,</div><div>Ketan</div><div><br><div class="gmail_quote">On Mon, May 21, 2012 at 7:51 PM, Michael Wilde <span dir="ltr"><<a href="mailto:wilde@mcs.anl.gov" target="_blank">wilde@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I'm surprised that Swift isn't setting the current working dir (cwd) to be the job dir, but perhaps that's controlled by this property:<br>
<br>
# Determines if Swift remote wrappers will be executed by specifying an<br>
# absolute path, or a path relative to the job initial working directory<br>
#<br>
# valid values: absolute, relative<br>
# wrapper.invocation.mode=absolute<br>
<br>
Can you try your script with this property set to "relative"?<br>
<br>
...but looking at this further: I see that if you're using coasters with provider staging, the logic for job launch is quite different. We need to study this and get back to you. For now, it's best to force the right cd's with a wrapper. You might be able to remove the wrapper later, once we resolve how the job dir management should work in these various cases.<br>
<div class="im"><br>
- Mike<br>
<br>
<br>
----- Original Message -----<br>
> From: "Ketan Maheshwari" <<a href="mailto:ketancmaheshwari@gmail.com">ketancmaheshwari@gmail.com</a>><br>
</div><div class="im">> To: "Michael Wilde" <<a href="mailto:wilde@mcs.anl.gov">wilde@mcs.anl.gov</a>><br>
> Cc: "Swift User" <<a href="mailto:swift-user@ci.uchicago.edu">swift-user@ci.uchicago.edu</a>><br>
> Sent: Monday, May 21, 2012 4:28:02 PM<br>
> Subject: Re: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349<br>
> Thanks Mike. Indeed the recursion was a warning.<br>
><br>
><br>
> I found the problem was that the binary could not find the licence in<br>
> the cwd from where it was being called. This is an application<br>
> requirement that the licence file must be present in the cwd from<br>
> where the call is made.<br>
><br>
><br>
> However, Swift makes a dirtree in the workdir, stages the files and<br>
> calls the binary from *outside* of this tree. Is it possible to make<br>
> swift stage the licence file and put it on the top level without<br>
> writing a wrapper to do a cp? Again, the point of not wrapping the<br>
> binary into a script is to mimic the Hadoop setup as closely as<br>
> possible.<br>
><br>
><br>
> On Mon, May 21, 2012 at 3:35 PM, Michael Wilde < <a href="mailto:wilde@mcs.anl.gov">wilde@mcs.anl.gov</a> ><br>
> wrote:<br>
><br>
><br>
</div>> Ketan, as far as I can tell, that message, coming from worker.pl, is<br>
<div class="im HOEnZb">> just a warning.<br>
><br>
> Programming Perl sec 33, Diagnostic Messages: "Deep recursion on<br>
> subroutine "%s"<br>
><br>
> (W recursion) This subroutine has called itself (directly or<br>
> indirectly) 100 times more than it has returned. This probably<br>
> indicates an infinite recursion, unless you're writing strange<br>
> benchmark programs, in which case it indicates something else."<br>
><br>
> The stageout code in worker.pl is indeed recursive, and the warning<br>
> could be suppressed:<br>
><br>
> "Try placing<br>
><br>
> no warnings 'recursion';<br>
><br>
> within the same scope as that code ..."<br>
><br>
> Can you try a simple mod to catsn, using your ext mapper, to see if it<br>
> is indeed failing due to the deeply recursive stageout?<br>
><br>
> If you could dig a bit deeper into this, and see whether it's really<br>
> failing when staging back so many files or failing for some other, or<br>
> related, reason, that would be great.<br>
><br>
> Thanks,<br>
><br>
> - Mike<br>
><br>
><br>
><br>
> ----- Original Message -----<br>
> > From: "Ketan Maheshwari" < <a href="mailto:ketancmaheshwari@gmail.com">ketancmaheshwari@gmail.com</a> ><br>
> > To: "Swift User" < <a href="mailto:swift-user@ci.uchicago.edu">swift-user@ci.uchicago.edu</a> ><br>
> > Sent: Monday, May 21, 2012 1:54:34 PM<br>
> > Subject: [Swift-user] Deep recursion on subroutine "main::stageout"<br>
</div><div class="HOEnZb"><div class="h5">> > at /home/ketan/work/worker.pl line 1349<br>
> > Hi,<br>
> ><br>
> ><br>
> > I am trying to run the GE mars script on a bag of workstations. I<br>
> > tested the script for a sufficient number of tasks and it seems to be<br>
> > working fine on localhost.<br>
> ><br>
> ><br>
> > However, it fails in this setup. I get the following error message<br>
> > after a seemingly correct invocation:<br>
> ><br>
> ><br>
> ><br>
> ><br>
> > Find: keepalive(120), reconnect - <a href="http://128.84.97.46:41287" target="_blank">http://128.84.97.46:41287</a><br>
> > Progress: time: Mon, 21 May 2012 14:43:18 -0400 Stage in:7<br>
> > Submitted:3<br>
> > Progress: time: Mon, 21 May 2012 14:43:19 -0400 Stage in:8 Active:2<br>
> > Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349.<br>
> > Deep recursion on subroutine "main::stageout" at /home/ketan/work/worker.pl line 1349.<br>
> > Progress: time: Mon, 21 May 2012 14:43:20 -0400 Active:3 Stage out:7<br>
> ><br>
> ><br>
> > Obviously the staging out of results fails, and it seems that the number<br>
> > of files in the stageout stage is causing the error. The application<br>
> > needs to stage out about 120 files.<br>
> ><br>
> ><br>
> > One solution I could quickly think of is to wrap the app in a shell<br>
> > and zip the outputs making it just one staged out file.<br>
> ><br>
> ><br>
> > However, the current setup would still be useful since we are trying<br>
> > to compare the existing Hadoop solution with the Swift one.<br>
> ><br>
> ><br>
> > Is there any possible workaround, some env setting or so that I<br>
> > could<br>
> > try and get the stageout going?<br>
> ><br>
> ><br>
> > The logs are:<br>
> > <a href="http://www.mcs.anl.gov/~ketan/mars-20120521-1443-d6q9lr0a.log" target="_blank">http://www.mcs.anl.gov/~ketan/mars-20120521-1443-d6q9lr0a.log</a><br>
> > and <a href="http://www.mcs.anl.gov/~ketan/workerlogs.tgz" target="_blank">http://www.mcs.anl.gov/~ketan/workerlogs.tgz</a><br>
> ><br>
> ><br>
> ><br>
> ><br>
> > Regards, --<br>
> > Ketan<br>
> ><br>
> ><br>
> ><br>
><br>
> --<br>
> Michael Wilde<br>
> Computation Institute, University of Chicago<br>
> Mathematics and Computer Science Division<br>
> Argonne National Laboratory<br>
><br>
><br>
><br>
><br>
><br>
> --<br>
> Ketan<br>
<br>
--<br>
Michael Wilde<br>
Computation Institute, University of Chicago<br>
Mathematics and Computer Science Division<br>
Argonne National Laboratory<br>
<br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><font face="'courier new', monospace">Ketan</font><br><br><br>
</div>