<div dir="ltr"><div>Awesome, glad to hear.<br><br></div><div>I feel like it would be nice to have a lightweight fork mechanism that gives user processes some isolation in situations like yours, since user code doesn't always behave nicely.<br>
</div><div><br></div>- Tim<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Apr 15, 2014 at 9:22 AM, Ketan Maheshwari <span dir="ltr"><<a href="mailto:ketan@mcs.anl.gov" target="_blank">ketan@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Tim,<div><br></div><div>I think I found the issue and got past it: in my C code, I forgot to close a file. In the new version, the file gets closed after it is read, and this version seems to scale well. So far I have scaled to 10K processes on 625 nodes on Vesta without any issues.</div>
<div><br></div><div>Thanks,</div><div>Ketan</div></div><div class="gmail_extra"><br><br><div class="gmail_quote"><div class="">On Mon, Apr 14, 2014 at 8:45 PM, Tim Armstrong <span dir="ltr"><<a href="mailto:tim.g.armstrong@gmail.com" target="_blank">tim.g.armstrong@gmail.com</a>></span> wrote:<br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><div class="">
<div dir="ltr">
<div>It's hard to narrow it down from that information; the script seems fairly unlikely to cause problems.<br>
</div>
<div><br>
What optimisation level? Which STC/Turbine version? How many processes? How many ADLB servers? Does it happen every time you run, or just intermittently?<br>
<br>
Can you also confirm that it's not just getting stuck in the leaf function? For example, log when it enters and exits.<br>
<br>
There is a rare race condition that can deadlock things, which I'm working on now, but it seems unlikely that you would encounter it with that script.<br>
<br>
</div>
- Tim<br>
</div>
</div><div class="gmail_extra"><br>
<br>
<div class="gmail_quote"><div class=""><div><div>On Mon, Apr 14, 2014 at 6:09 PM, Ketan Maheshwari <span dir="ltr">
<<a href="mailto:ketan@mcs.anl.gov" target="_blank">ketan@mcs.anl.gov</a>></span> wrote:<br>
</div></div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5"><div><div>
<div dir="ltr">Hi,
<div><br>
</div>
<div>I'm trying to scale up a simple leaf function on Vesta. The leaf function seems to run at most 259 times; beyond that, it either returns no results or crashes, but I do not see any error messages or other indications.</div>
<div><br>
</div>
<div>On Vesta, an example is at /home/ketan/turbine-output/2014/04/14/23/04/06</div>
<div><br>
</div>
<div>Any clue on this?</div>
<div><br>
</div>
<div>The Swift source looks as follows:</div>
<div><br>
</div>
<div>
<div>import io;</div>
<div><br>
</div>
<div>@dispatch=WORKER</div>
<div>(int v) leaf_main(string A[]) "leaf_main" "0.0" "leaf_main_wrap";</div>
<div>main</div>
<div>{</div>
<div> int rc[];</div>
<div> foreach i in [0:9999:1]{</div>
<div> rc[i] = leaf_main([fromint(i)]);</div>
<div> }</div>
<div>}</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div>Thanks,</div>
<div>Ketan</div>
</div>
<br></div></div></div></div><div class="">
_______________________________________________<br>
ExM-user mailing list<br>
<a href="mailto:ExM-user@lists.mcs.anl.gov" target="_blank">ExM-user@lists.mcs.anl.gov</a><br>
<a href="https://lists.mcs.anl.gov/mailman/listinfo/exm-user" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/exm-user</a><br>
<br>
</div></blockquote>
</div>
<br>
</div>
</div>
</blockquote></div><br></div>
</blockquote></div><br></div>