<div>Mike,</div><div><br></div><div>The jobdir and the workdir are the same, right? At least that is what the pwd in my marswrapper shows.</div><div><br></div><div>The following is the stdout section of swiftwrap:</div><div>
_____________________________________________________________________________</div><div><br></div><div> stdout</div><div>_____________________________________________________________________________</div><div><br></div>
<div># pwd</div><div>/amd/camel/b/ketan/ketan_mars/swift.workdir/mars-20120522-1702-j6gtml62-k-marswrap-kcj9rork</div><div><br></div><div># cp -v home/ketan/ketan_mars/MARS-LIC .</div><div>`home/ketan/ketan_mars/MARS-LIC' -> `./MARS-LIC'</div>
<div><br></div><div># The error message thrown by MARS</div><div> &lt;**&gt; ERROR: *** Unable to open License Date File MARS-LIC ***</div><div>===================</div><div><br></div>This is why I said MARS is running as if the licence file is not present, even though it is present. <div>
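<div>One way to narrow this down from inside the wrapper is to record how MARS-LIC looks from the cwd just before the executable starts. A minimal sketch, assuming a POSIX shell; the function name describe_license is hypothetical and not part of Swift or MARS:</div>

```shell
# Hypothetical helper: report how a license file looks from the current directory.
# The symlink test comes first because -f follows symlinks.
describe_license() {
    lic="$1"
    if [ -L "$lic" ]; then
        echo "symlink"
    elif [ -f "$lic" ]; then
        if [ -r "$lic" ]; then
            echo "regular-readable"
        else
            echo "regular-unreadable"
        fi
    else
        echo "missing"
    fi
}
```

<div>Calling describe_license MARS-LIC on the line before marsMain prints one of symlink, regular-readable, regular-unreadable, or missing, which should show up next to the pwd output above.</div>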
<br></div><div>Also, I do not see any symlinks here in the workdir. They are all real files.<br><div><br><div class="gmail_quote">On Tue, May 22, 2012 at 1:24 PM, Michael Wilde <span dir="ltr"><<a href="mailto:wilde@mcs.anl.gov" target="_blank">wilde@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">If that path home/ketan/ketan_mars/MARS-LIC is being correctly copied to the workdir (and I stand corrected: that's exactly what should happen), then another possibility is that the program doesn't like getting a symlink for the license file. Can you test that case externally (outside of Swift) before we go further?<br>
<br>
You reported the problem as "...the executable still gets into error as if the licence file is not present."<br>
<br>
The license file will appear to the MARS executable (and the wrapper script) as a symlink (from the jobdir to the workdir, to use the terminology of the Swift User Guide).<br>
<br>
If that is indeed the problem, your wrapper script might be able to get around this with:<br>
cp MARS-LIC tmplic<br>
rm MARS-LIC<br>
mv tmplic MARS-LIC<br>
<br>
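A slightly more defensive sketch of those three steps, assuming a POSIX shell; the function name materialize is hypothetical and not part of Swift:<br>
<br>

```shell
# Hypothetical helper: if the given path is a symlink, replace it with a real copy.
# cp without -R or -P follows the link, so the copy holds the target's contents.
materialize() {
    f="$1"
    tmp="$f.tmp.$$"
    if [ -L "$f" ]; then
        if cp "$f" "$tmp"; then
            rm "$f"
            mv "$tmp" "$f"
        else
            # Copy failed (for example a dangling link): leave the link in place.
            return 1
        fi
    fi
}
```

<br>
Calling materialize MARS-LIC at the top of the wrapper leaves a regular file untouched and only rewrites a symlink.<br>
<br>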
Exactly what error is MARS generating for this problem?<br>
<div class="im"><br>
- Mike<br>
<br>
----- Original Message -----<br>
> From: "Ketan Maheshwari" <<a href="mailto:ketancmaheshwari@gmail.com">ketancmaheshwari@gmail.com</a>><br>
> To: "Michael Wilde" <<a href="mailto:wilde@mcs.anl.gov">wilde@mcs.anl.gov</a>><br>
> Cc: "Swift User" <<a href="mailto:swift-user@ci.uchicago.edu">swift-user@ci.uchicago.edu</a>><br>
</div><div class="im">> Sent: Tuesday, May 22, 2012 12:01:49 PM<br>
> Subject: Re: [Swift-user] Deep recursion on subroutine "main::stageout" at /home/ketan/work/<a href="http://worker.pl" target="_blank">worker.pl</a> line 1349<br>
</div><div><div class="h5">> The line works fine because Swift creates the dir tree starting at<br>
> /home but in the swift.workdir. With -v, I could see the file gets<br>
> copied to the cwd and is present there.<br>
><br>
><br>
> So, I assume that the wrapper script is not cd'ing me anywhere. So, it<br>
> still is a mystery why the app complains that the file is not present<br>
> when run from the wrapper, yet works when run manually in the same dir.<br>
><br>
> On Tue, May 22, 2012 at 11:34 AM, Michael Wilde < <a href="mailto:wilde@mcs.anl.gov">wilde@mcs.anl.gov</a> ><br>
> wrote:<br>
><br>
><br>
> Isn't this line problematic if you don't know where the wrapper script<br>
> has you cd'ed to:<br>
><br>
> cp -v home/ketan/ketan_mars/MARS-LIC .<br>
> ^^^<br>
><br>
> The relative path doesn't seem safe.<br>
><br>
><br>
> - Mike<br>
><br>
><br>
> ----- Original Message -----<br>
> > From: "Ketan Maheshwari" < <a href="mailto:ketancmaheshwari@gmail.com">ketancmaheshwari@gmail.com</a> ><br>
> > To: "Michael Wilde" < <a href="mailto:wilde@mcs.anl.gov">wilde@mcs.anl.gov</a> ><br>
> > Cc: "Swift User" < <a href="mailto:swift-user@ci.uchicago.edu">swift-user@ci.uchicago.edu</a> ><br>
><br>
><br>
> > Sent: Tuesday, May 22, 2012 10:18:11 AM<br>
> > Subject: Re: [Swift-user] Deep recursion on subroutine<br>
</div></div>> > "main::stageout" at /home/ketan/work/ <a href="http://worker.pl" target="_blank">worker.pl</a> line 1349<br>
<div class="HOEnZb"><div class="h5">> > Looking this further, I now have a wrapper in place which copies the<br>
> > licence file in the cwd before running the executable. However, the<br>
> > executable still gets into error as if the licence file is not<br>
> > present.<br>
> ><br>
> ><br>
> > When I cd into this dir (swift.workdir/mars-20120519-1203-3l....)<br>
> > and<br>
> > manually run the executable, it works.<br>
> ><br>
> ><br>
> > So, the question is: does _swiftwrap.staging do some internal<br>
> > cd'ing before calling the executable? I will take a look inside, but<br>
> > it would be useful if someone knows this.<br>
> ><br>
> ><br>
> > The wrapper script is simply the following two lines:<br>
> ><br>
> ><br>
> > """<br>
> > cp -v home/ketan/ketan_mars/MARS-LIC .<br>
> > /home/ketan/ketan_mars/marsMain $1<br>
> > """<br>
> ><br>
> ><br>
> > Regards,<br>
> > Ketan<br>
> ><br>
> ><br>
> > On Mon, May 21, 2012 at 7:51 PM, Michael Wilde < <a href="mailto:wilde@mcs.anl.gov">wilde@mcs.anl.gov</a> ><br>
> > wrote:<br>
> ><br>
> ><br>
> > I'm surprised that Swift isn't setting the current working dir (cwd)<br>
> > to<br>
> > be the job dir, but perhaps that's controlled by this property:<br>
> ><br>
> > # Determines if Swift remote wrappers will be executed by specifying<br>
> > an<br>
> > # absolute path, or a path relative to the job initial working<br>
> > directory<br>
> > #<br>
> > # valid values: absolute, relative<br>
> > # wrapper.invocation.mode=absolute<br>
> ><br>
> > Can you try your script with this property set to "relative"?<br>
> ><br>
> > ...but looking at this further: I see that if you're using coasters<br>
> > with provider staging, the logic for job launch is quite different.<br>
> > We<br>
> > need to study this and get back to you. For now, best to force the<br>
> > right cd's with a wrapper. You might be able to remove the wrapper<br>
> > later, once we resolve how the job dir management should work in<br>
> > these<br>
> > various cases.<br>
> ><br>
> ><br>
> > - Mike<br>
> ><br>
> ><br>
> > ----- Original Message -----<br>
> > > From: "Ketan Maheshwari" < <a href="mailto:ketancmaheshwari@gmail.com">ketancmaheshwari@gmail.com</a> ><br>
> ><br>
> > > To: "Michael Wilde" < <a href="mailto:wilde@mcs.anl.gov">wilde@mcs.anl.gov</a> ><br>
> > > Cc: "Swift User" < <a href="mailto:swift-user@ci.uchicago.edu">swift-user@ci.uchicago.edu</a> ><br>
> > > Sent: Monday, May 21, 2012 4:28:02 PM<br>
> > > Subject: Re: [Swift-user] Deep recursion on subroutine<br>
> > > "main::stageout" at /home/ketan/work/ <a href="http://worker.pl" target="_blank">worker.pl</a> line 1349<br>
><br>
><br>
> > > Thanks Mike. Indeed the recursion was a warning.<br>
> > ><br>
> > ><br>
> > > I found the problem was that the binary could not find the licence<br>
> > > in<br>
> > > the cwd from where it was being called. This is an application<br>
> > > requirement that the licence file must be present in the cwd from<br>
> > > where the call is made.<br>
> > ><br>
> > ><br>
> > > However, Swift makes a dirtree in the workdir, stages the files<br>
> > > and<br>
> > > calls the binary from *outside* of this tree. Is it possible to<br>
> > > make<br>
> > > swift stage the licence file and put it on the top level without<br>
> > > writing a wrapper to do a cp? Again, the point of not wrapping the<br>
> > > binary into a script is to mimic the Hadoop setup as close as<br>
> > > possible.<br>
> > ><br>
> > ><br>
> > > On Mon, May 21, 2012 at 3:35 PM, Michael Wilde < <a href="mailto:wilde@mcs.anl.gov">wilde@mcs.anl.gov</a><br>
> > > ><br>
> > > wrote:<br>
> > ><br>
> > ><br>
> > > Ketan, as far as I can tell, that message, coming from <a href="http://worker.pl" target="_blank">worker.pl</a>, is<br>
> > > just a warning.<br>
> > ><br>
> > > Programming Perl, sec. 33, Diagnostic Messages: "Deep recursion on<br>
> > > subroutine "%s"<br>
> > ><br>
> > > (W recursion) This subroutine has called itself (directly or<br>
> > > indirectly) 100 times more than it has returned. This probably<br>
> > > indicates an infinite recursion, unless you're writing strange<br>
> > > benchmark programs, in which case it indicates something else."<br>
> > ><br>
> > > The stageout code in <a href="http://worker.pl" target="_blank">worker.pl</a> is indeed recursive, and the<br>
> > > warning<br>
> > > could be suppressed:<br>
> > ><br>
> > > "Try placing<br>
> > ><br>
> > > no warnings 'recursion';<br>
> > ><br>
> > > within the same scope as that code ..."<br>
> > ><br>
> > > Can you try a simple mod to catsn, using your ext mapper, to see if it<br>
> > > is indeed failing due to the deeply recursive stageout?<br>
> > ><br>
> > > If you could dig a bit deeper into this, and see whether it's really<br>
> > > failing when staging back so many files or failing for some other, or<br>
> > > related, reason, that would be great.<br>
> > ><br>
> > > Thanks,<br>
> > ><br>
> > > - Mike<br>
> > ><br>
> > ><br>
> > ><br>
> > > ----- Original Message -----<br>
> > > > From: "Ketan Maheshwari" < <a href="mailto:ketancmaheshwari@gmail.com">ketancmaheshwari@gmail.com</a> ><br>
> > > > To: "Swift User" < <a href="mailto:swift-user@ci.uchicago.edu">swift-user@ci.uchicago.edu</a> ><br>
> > > > Sent: Monday, May 21, 2012 1:54:34 PM<br>
> > > > Subject: [Swift-user] Deep recursion on subroutine<br>
> > > > "main::stageout"<br>
> ><br>
> ><br>
> > > > at /home/ketan/work/ <a href="http://worker.pl" target="_blank">worker.pl</a> line 1349<br>
> > > > Hi,<br>
> > > ><br>
> > > ><br>
> > > > I am trying to run the GE mars script on a bag of workstations.<br>
> > > > I<br>
> > > > tested the script for a sufficient number of tasks and seems to<br>
> > > > be<br>
> > > > working fine on localhost.<br>
> > > ><br>
> > > ><br>
> > > > However, it fails in this setup. I get the error message as<br>
> > > > follows<br>
> > > > after seemingly right invocation:<br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > > Find: keepalive(120), reconnect - <a href="http://128.84.97.46:41287" target="_blank">http://128.84.97.46:41287</a><br>
> > > > Progress: time: Mon, 21 May 2012 14:43:18 -0400 Stage in:7<br>
> > > > Submitted:3<br>
> > > > Progress: time: Mon, 21 May 2012 14:43:19 -0400 Stage in:8<br>
> > > > Active:2<br>
> > > > Deep recursion on subroutine "main::stageout" at<br>
> > > > /home/ketan/work/<br>
> > > > <a href="http://worker.pl" target="_blank">worker.pl</a> line 1349.<br>
> > > > Deep recursion on subroutine "main::stageout" at<br>
> > > > /home/ketan/work/<br>
> > > > <a href="http://worker.pl" target="_blank">worker.pl</a> line 1349.<br>
> > > > Progress: time: Mon, 21 May 2012 14:43:20 -0400 Active:3 Stage<br>
> > > > out:7<br>
> > > ><br>
> > > ><br>
> > > > Obviously the staging out of results fails, and it seems that the<br>
> > > > number<br>
> > > > of files in the stageout stage is causing the error. The<br>
> > > > application<br>
> > > > needs to stage out about 120 files.<br>
> > > ><br>
> > > ><br>
> > > > One solution I could quickly think of is to wrap the app in a<br>
> > > > shell<br>
> > > > and zip the outputs making it just one staged out file.<br>
> > > ><br>
> > > ><br>
> > > > However, the current setup would still be useful since we are<br>
> > > > trying<br>
> > > > to compare the existing Hadoop solution with the Swift one.<br>
> > > ><br>
> > > ><br>
> > > > Is there any possible workaround, some env setting or so that I<br>
> > > > could<br>
> > > > try and get the stageout going?<br>
> > > ><br>
> > > ><br>
> > > > The logs are:<br>
> > > > <a href="http://www.mcs.anl.gov/~ketan/mars-20120521-1443-d6q9lr0a.log" target="_blank">http://www.mcs.anl.gov/~ketan/mars-20120521-1443-d6q9lr0a.log</a><br>
> > > > and <a href="http://www.mcs.anl.gov/~ketan/workerlogs.tgz" target="_blank">http://www.mcs.anl.gov/~ketan/workerlogs.tgz</a><br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > > Regards, --<br>
> > > > Ketan<br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > > _______________________________________________<br>
> > > > Swift-user mailing list<br>
> > > > <a href="mailto:Swift-user@ci.uchicago.edu">Swift-user@ci.uchicago.edu</a><br>
> > > > <a href="https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user" target="_blank">https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user</a><br>
> > ><br>
> > > --<br>
> > > Michael Wilde<br>
> > > Computation Institute, University of Chicago<br>
> > > Mathematics and Computer Science Division<br>
> > > Argonne National Laboratory<br>
> > ><br>
> > ><br>
> > ><br>
> > ><br>
> > ><br>
> > > --<br>
> > > Ketan<br>
<br>
--<br>
Michael Wilde<br>
Computation Institute, University of Chicago<br>
Mathematics and Computer Science Division<br>
Argonne National Laboratory<br>
<br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><font face="'courier new', monospace">Ketan</font><br><br><br>
</div></div>