I am now trying with CDM. My CDM policy file contains a single line, as follows:<div><br></div><div>rule .* DEFAULT /</div><div><br></div><div>This seems to be working at stage-in, because I immediately see my jobs starting. However, the run fails immediately afterwards with this message:</div>
<div><div>"Execution failed:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>The following output files were not created by the application:"</div></div><div><br></div><div>This is followed by a list of outputs. I recall this could happen if absolute pathnames are not provided, so I updated my mappers.sh scripts to use absolute pathnames, including a leading double slash (//), without success.</div>
<div><br></div><div>The run log does not show any specific indicators of error other than the above message.</div><div><br></div><div>I see a bunch of CDM_POLICY and CDM_ACTION lines in the wrapper.log in one of the many jobdirs, as follows:</div>
<div><div>CDM_POLICY: /home/train07/ketan_mars/swift/result52/mars.ot48 -> DEFAULT /</div><div>CDM_ACTION: /home/train07/ketan_mars/swift/swift.workdir/mars-20121023-1240-vbptd8i9/jobs/g/mars-gtln0yzk OUTPUT /home/train07/ketan_mars/swift/result52/mars.ot48 DEFAULT /</div>
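<div><br></div><div>One way to check which policy each file actually matched is to tally the CDM_POLICY lines in wrapper.log. This is only a minimal sketch, assuming every such line has the form shown above (path, "->", policy name, optional arguments). The name summarize_policies is hypothetical and is not part of Swift.</div>
<pre>
```python
import re
from collections import Counter

# Assumed line shape, taken from the wrapper.log excerpt above:
#   CDM_POLICY: PATH -> POLICY [ARGS]
CDM_POLICY_RE = re.compile(r"^CDM_POLICY:\s+(\S+)\s+->\s+(\S+)\s*(.*)$")


def summarize_policies(log_text):
    """Count how many files were assigned each CDM policy name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CDM_POLICY_RE.match(line.strip())
        if match:
            # group(2) is the policy name that follows the arrow.
            counts[match.group(2)] += 1
    return counts


sample = "CDM_POLICY: /home/train07/ketan_mars/swift/result52/mars.ot48 -> DEFAULT /"
print(summarize_policies(sample))
```
</pre>
<div>If every file is tallied under a single policy, that policy name can then be compared with the one the user guide section on CDM direct describes.</div>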
</div><div><br></div><div>Not sure if something could've gone wrong here.<br><br>Attaching the log file and one of the job dirs.<br><br>Regards,</div><div>Ketan</div><div><br><div class="gmail_quote">On Tue, Oct 23, 2012 at 3:02 PM, Ketan Maheshwari <span dir="ltr"><<a href="mailto:ketancmaheshwari@gmail.com" target="_blank">ketancmaheshwari@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Mike,<div><br></div><div>Thank you for your answers.</div><div><br></div><div>I tried catsnsleep with n=100 and s=10 and indeed the number of parallel jobs corresponded to the jobthrottle value. </div>
<div>Surprisingly, when I started the mars application immediately after this, it also started 32 jobs in parallel. However, the run failed with a "too many open files" error after a while.</div>
<div><br></div><div>Now I am trying the CDM method. I will keep you posted.<div><div class="h5"><br><br><div class="gmail_quote">On Tue, Oct 23, 2012 at 2:36 PM, Michael Wilde <span dir="ltr"><<a href="mailto:wilde@mcs.anl.gov" target="_blank">wilde@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Ketan, looking further I see that your app has a large number of output files, O(100). Depending on their size and the speed of the filesystem on which you are testing, that reinforces my suspicion that the low concurrency you are seeing is due to staging I/O.<br>
<br>
If this is a local 32-core host, try running with your input data, output data, and work directory all on a local hard disk (or even /dev/shm if it has sufficient RAM/space). Then try using CDM direct as explained at:<br>
<br>
<a href="http://www.ci.uchicago.edu/swift/guides/trunk/userguide/userguide.html#_specific_use_cases" target="_blank">http://www.ci.uchicago.edu/swift/guides/trunk/userguide/userguide.html#_specific_use_cases</a><br>
<div><br>
- Mike<br>
<br>
----- Original Message -----<br>
</div><div><div>> From: "Michael Wilde" <<a href="mailto:wilde@mcs.anl.gov" target="_blank">wilde@mcs.anl.gov</a>><br>
> To: "Ketan Maheshwari" <<a href="mailto:ketancmaheshwari@gmail.com" target="_blank">ketancmaheshwari@gmail.com</a>><br>
> Cc: "Swift Devel" <<a href="mailto:swift-devel@ci.uchicago.edu" target="_blank">swift-devel@ci.uchicago.edu</a>><br>
> Sent: Tuesday, October 23, 2012 12:23:34 PM<br>
> Subject: Re: [Swift-devel] jobthrottle value does not correspond to number of parallel jobs on local provider<br>
> Hi Ketan,<br>
><br>
> In the log you attached I see this:<br>
><br>
> <profile key="jobThrottle" namespace="karajan">0.10</profile><br>
> <profile namespace="karajan" key="initialScore">100000</profile><br>
><br>
> You should leave initialScore constant, and set it to a large number, no<br>
> matter what level of manual throttling you want to specify via<br>
> sites.xml. We always use 10000 for this value. Don't attempt to vary<br>
> the initialScore value for manual throttle: just use jobThrottle to<br>
> set what you want.<br>
><br>
> A jobThrottle value of 0.10 should run 11 jobs in parallel, i.e.<br>
> (jobThrottle * 100) + 1 (the extra 1 is there for historical reasons<br>
> related to the automatic throttling algorithm).<br>
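<div><br></div><div>The arithmetic above can be sketched as follows. This is only an illustration of the stated formula, (jobThrottle * 100) + 1, and its inverse. The function names parallel_jobs and throttle_for are hypothetical and are not part of Swift. The targets 8, 16, 24 and 32 are the concurrency levels from the original question.</div>
<pre>
```python
def parallel_jobs(job_throttle):
    """Expected number of concurrent jobs: (jobThrottle * 100) + 1."""
    # round() absorbs floating-point noise such as 0.07 * 100.
    return round(job_throttle * 100) + 1


def throttle_for(jobs):
    """Inverse of the formula: the jobThrottle that should yield `jobs`."""
    return (jobs - 1) / 100


# Throttle values for the concurrency levels asked about in this thread.
for target in (8, 16, 24, 32):
    print(target, throttle_for(target))
```
</pre>
<div>Under this formula, the values 0.07 and 0.15 from the original question correspond to 8 and 16 jobs.</div>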
><br>
> If you are seeing fewer than that, one common cause is that the ratio<br>
> of your input staging times to your job run times is so high as to<br>
> make it impossible for Swift to keep the expected/desired number of<br>
> jobs in active state at once.<br>
><br>
> I suggest you test the throttle behavior with a simple app script like<br>
> "catsnsleep" (catsn with an artificial sleep to increase job<br>
> duration). If your settings (sites + cf) work for that test, then they<br>
> should work for the real app, within the staging constraints. Using<br>
> CDM "direct" mode is likely what you want here to eliminate<br>
> unnecessary staging on a local cluster.<br>
><br>
> In your test, what was this ratio? Can you also post your cf file and<br>
> the progress log from stdout/stderr?<br>
><br>
> - Mike<br>
><br>
> ----- Original Message -----<br>
> > From: "Ketan Maheshwari" <<a href="mailto:ketancmaheshwari@gmail.com" target="_blank">ketancmaheshwari@gmail.com</a>><br>
> > To: "Swift Devel" <<a href="mailto:swift-devel@ci.uchicago.edu" target="_blank">swift-devel@ci.uchicago.edu</a>><br>
> > Sent: Tuesday, October 23, 2012 10:34:25 AM<br>
> > Subject: [Swift-devel] jobthrottle value does not correspond to<br>
> > number of parallel jobs on local provider<br>
> > Hi,<br>
> ><br>
> ><br>
> > I am trying to run an experiment on a 32-core machine with the hope of<br>
> > running 8, 16, 24 and 32 jobs in parallel. I am trying to control<br>
> > these numbers of parallel jobs by setting the Karajan jobthrottle<br>
> > values in sites.xml to 0.07, 0.15, and so on.<br>
> ><br>
> ><br>
> > However, it seems that the values are not corresponding to what I see<br>
> > in the Swift progress text.<br>
> ><br>
> ><br>
> > Initially, when I set jobthrottle to 0.07, only 2 jobs started in<br>
> > parallel. Then I added the line setting the "initialScore" value to 10000,<br>
> > which increased the number of parallel jobs to 5. After this, a 10-fold<br>
> > increase in "initialScore" did not increase the job count any further.<br>
> ><br>
> ><br>
> > Furthermore, a new batch of 5 jobs gets started only when *all* jobs<br>
> > from the old batch are over, as opposed to a continuous supply of jobs<br>
> > from the "site selection" to the "stage out" state, which happens in the<br>
> > case of coaster and other providers.<br>
> ><br>
> ><br>
> > The behavior is the same in Swift 0.93.1 and the latest trunk.<br>
> ><br>
> ><br>
> ><br>
> > Thank you for any clues on how to set the expected number of parallel<br>
> > jobs to these values.<br>
> ><br>
> ><br>
> > Please find attached one such log of this run.<br>
> > Thanks, --<br>
> > Ketan<br>
> ><br>
> ><br>
> ><br>
> > _______________________________________________<br>
> > Swift-devel mailing list<br>
> > <a href="mailto:Swift-devel@ci.uchicago.edu" target="_blank">Swift-devel@ci.uchicago.edu</a><br>
> > <a href="https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel" target="_blank">https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel</a><br>
><br>
> --<br>
> Michael Wilde<br>
> Computation Institute, University of Chicago<br>
> Mathematics and Computer Science Division<br>
> Argonne National Laboratory<br>
><br>
<br>
<br>
</div></div></blockquote></div><br><br clear="all"><div><br></div></div></div><span class="HOEnZb"><font color="#888888">-- <br><font face="'courier new', monospace">Ketan</font><br><br><br>
</font></span></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><font face="'courier new', monospace">Ketan</font><br><br><br>
</div>