<div dir="ltr">Hi Tim,<div><br></div><div>I think I found the issue and got past it: my C code forgot to close a file. In the new version the file is closed after it is read, and this version seems to scale well. So far, on Vesta, I have been able to scale to 10K processes on 625 nodes without any issues.</div>
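<div><br></div><div>For anyone who hits the same symptom, here is a minimal sketch of the fix pattern. It is not the actual leaf code: the helper name <code>read_first_line</code> and its arguments are hypothetical and only illustrate closing the file on every path once it has been opened. A handle leaked on each call can eventually exhaust the per-process limit on open files, which would be consistent with calls stalling after a few hundred invocations, though I have not confirmed that this is the exact mechanism here.</div>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the actual leaf code): reads the first line of
 * `path` into `buf` and returns 0 on success, -1 on failure. */
int read_first_line(const char *path, char *buf, size_t len)
{
    if (buf == NULL || len == 0)
        return -1;

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;                      /* nothing was opened, nothing to close */

    int rc = 0;
    if (fgets(buf, (int)len, fp) == NULL) {
        buf[0] = '\0';
        rc = -1;                        /* empty file or read error */
    } else {
        buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline, if any */
    }

    /* The step that was missing: release the handle on every path. */
    if (fclose(fp) != 0)
        rc = -1;
    return rc;
}
```

<div>Calling such a helper in a loop of a few thousand iterations on one process is a quick way to check for a leak: with the <code>fclose</code> removed, <code>fopen</code> would be expected to start returning NULL once the open-file limit (see <code>ulimit -n</code>) is reached.</div>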
<div><br></div><div>Thanks,</div><div>Ketan</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Apr 14, 2014 at 8:45 PM, Tim Armstrong <span dir="ltr">&lt;<a href="mailto:tim.g.armstrong@gmail.com" target="_blank">tim.g.armstrong@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div dir="ltr">
<div>It's hard to narrow it down from the info - that script seems fairly unlikely to cause problems.<br>
</div>
<div><br>
What optimisation level? STC/Turbine version? How many processes? How many ADLB servers? Is it every time you run or just intermittently?<br>
<br>
Can you confirm that it's not just getting stuck in the leaf function as well? E.g. log when it enters and exits.<br>
<br>
There is a rare race condition that can deadlock things that I'm just working on now, but it seems unlikely that you would be encountering that with that script.<br>
<br>
</div>
- Tim<br>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote"><div><div class="h5">On Mon, Apr 14, 2014 at 6:09 PM, Ketan Maheshwari <span dir="ltr">
&lt;<a href="mailto:ketan@mcs.anl.gov" target="_blank">ketan@mcs.anl.gov</a>&gt;</span> wrote:<br>
</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">
<div dir="ltr">Hi,
<div><br>
</div>
<div>I am trying to scale up a simple leaf function on Vesta. The leaf function seems to run at most 259 times; beyond that it either returns no results or crashes, and I do not see any error messages or other indications of what went wrong.</div>
<div><br>
</div>
<div>On Vesta, an example is at /home/ketan/turbine-output/2014/04/14/23/04/06</div>
<div><br>
</div>
<div>Any clue on this?</div>
<div><br>
</div>
<div>The Swift source looks as follows:</div>
<div><br>
</div>
<div>
<div>import io;</div>
<div><br>
</div>
<div>@dispatch=WORKER</div>
<div>(int v) leaf_main(string A[]) "leaf_main" "0.0" "leaf_main_wrap";</div>
<div>main</div>
<div>{</div>
<div> int rc[];</div>
<div> foreach i in [0:9999:1]{</div>
<div> rc[i] = leaf_main([fromint(i)]);</div>
<div> }</div>
<div>}</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div>Thanks,</div>
<div>Ketan</div>
</div>
<br></div></div>
_______________________________________________<br>
ExM-user mailing list<br>
<a href="mailto:ExM-user@lists.mcs.anl.gov" target="_blank">ExM-user@lists.mcs.anl.gov</a><br>
<a href="https://lists.mcs.anl.gov/mailman/listinfo/exm-user" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/exm-user</a><br>
<br>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote></div><br></div>