[ExM Users] scaling Turbine on Vesta
Ketan Maheshwari
ketan at mcs.anl.gov
Tue Apr 15 09:22:37 CDT 2014
Hi Tim,
I think I found the issue and got past it: my C code forgot to close a
file. The new version closes the file after reading it, and it seems to
scale well. So far on Vesta I have scaled to 10K processes on 625 nodes
without any issue.
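
For anyone hitting the same thing, here is a minimal sketch of the pattern, not the actual leaf code from this run. The helper name read_first_line and its signature are made up for illustration. A descriptor leaked on every call would eventually exhaust the per-process limit on open files. That would be consistent with calls stopping after a few hundred invocations, though I have not verified that this was the exact mechanism here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the code from this thread): reads the first
 * line of `path` into `buf` and always closes the file before returning.
 * Returns 0 on success, -1 on failure. */
int read_first_line(const char *path, char *buf, size_t buflen)
{
    if (buf == NULL || buflen == 0)
        return -1;

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;                      /* nothing opened, nothing to close */

    int rc = 0;
    if (fgets(buf, (int)buflen, fp) == NULL) {
        buf[0] = '\0';
        rc = -1;                        /* empty file or read error */
    } else {
        buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
    }

    /* The step that is easy to forget: without it, every call leaks
     * one open file descriptor. */
    if (fclose(fp) != 0)
        rc = -1;

    return rc;
}
```

With the fclose in place, calling the helper in a loop many more times than a typical per-process descriptor limit should keep succeeding. Without it, fopen would be expected to start returning NULL once the limit is reached.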
Thanks,
Ketan
On Mon, Apr 14, 2014 at 8:45 PM, Tim Armstrong <tim.g.armstrong at gmail.com> wrote:
> It's hard to narrow it down from the info - that script seems fairly
> unlikely to cause problems.
>
> What optimisation level? STC/Turbine version? How many processes? How
> many ADLB servers? Is it every time you run or just intermittently?
>
> Can you confirm that it's not just getting stuck in the leaf function as
> well? E.g. log when it enters and exits.
>
> There is a rare race condition that can cause a deadlock, which I'm
> working on now, but it seems unlikely that you would be hitting it with
> that script.
>
> - Tim
>
>
> On Mon, Apr 14, 2014 at 6:09 PM, Ketan Maheshwari <ketan at mcs.anl.gov> wrote:
>
>> Hi,
>>
>> I am trying to scale up a simple leaf function on Vesta. The leaf
>> function runs at most 259 times; beyond that it either returns no
>> results or crashes, and I do not see any error messages or other
>> indications.
>>
>> On Vesta, an example is
>> at /home/ketan/turbine-output/2014/04/14/23/04/06
>>
>> Any clue on this?
>>
>> The Swift source looks as follows:
>>
>> import io;
>>
>> @dispatch=WORKER
>> (int v) leaf_main(string A[]) "leaf_main" "0.0" "leaf_main_wrap";
>>
>> main
>> {
>>   int rc[];
>>   foreach i in [0:9999:1] {
>>     rc[i] = leaf_main([fromint(i)]);
>>   }
>> }
>>
>>
>> Thanks,
>> Ketan