[Swift-devel] Progress on Swift RAM usage problem?

David Kelly davidkelly at uchicago.edu
Fri Feb 7 10:38:46 CST 2014


For those interested in this problem, here is the latest heap plot of
Jason's long (and still running) Beagle job.


On Mon, Feb 3, 2014 at 3:29 AM, David Kelly <davidkelly at uchicago.edu> wrote:

> Hello,
>
> I've spent the weekend working on the popdiagts script. I looked around on
> Geyser's filesystem and was able to find some input files that I can use.
> Once I found the data and got the 39 arguments correct, I was able to
> reproduce the problem.
>
> I see a result that looks very similar to the initial report:
>
> Progress:  time: Mon, 03 Feb 2014 01:20:00 -0700  Active:1  Finished
> successfully:3
>
> /glade/u/home/davkelly/swift-0.94/cog/modules/swift/dist/swift-svn/bin/swift:
> line 177: 31567 Killed                  java -Xmx8096M
> -XX:+HeapDumpOnOutOfMemoryError
> -Djava.endorsed.dirs=/glade/u/home/davkelly/swift-0.94/cog/modules/s...
>
> To start, I ran Swift with the default of 1G heap size and within a few
> minutes I was able to see Swift being killed. A heap plot of a failing run:
>
> http://web.ci.uchicago.edu/~davidk/popdiagts-20140201-1458-i9hmaf0e.png
>
> I tried bumping up the max heap size, but I ran into the same problem
> within a few minutes. The amount of memory used never seems to get very
> high. Here is a plot with 8G:
>
> http://web.ci.uchicago.edu/~davidk/popdiagts-20140203-0059-g6a11m24.png
>
> I used jmap to generate several heap dumps during the run. They are about
> 100MB compressed, 400MB uncompressed, located at:
>
> http://web.ci.uchicago.edu/~davidk/heap1.gz
> http://web.ci.uchicago.edu/~davidk/heap2.gz
> http://web.ci.uchicago.edu/~davidk/heap3.gz
> http://web.ci.uchicago.edu/~davidk/heap4.gz
> http://web.ci.uchicago.edu/~davidk/heap5.gz
> http://web.ci.uchicago.edu/~davidk/heap6.gz
> http://web.ci.uchicago.edu/~davidk/heap7.gz
> http://web.ci.uchicago.edu/~davidk/heap8.gz
> http://web.ci.uchicago.edu/~davidk/heap9.gz
> http://web.ci.uchicago.edu/~davidk/heap10.gz
> http://web.ci.uchicago.edu/~davidk/heap11.gz
>
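
For anyone who wants to collect a comparable series of dumps, here is a minimal Python sketch of how repeated jmap dumps could be scripted. The helper names, the .hprof file names, and the interval are assumptions for illustration and are not how the dumps listed above were produced; only the `jmap -dump:format=b,file=...` form is the standard JDK invocation.

```python
import subprocess
import time


def jmap_dump_command(pid, out_path, live_only=False):
    """Build the jmap command line for a binary heap dump of process `pid`."""
    options = ["format=b", f"file={out_path}"]
    if live_only:
        # "live" restricts the dump to reachable objects (forces a full GC first)
        options.insert(0, "live")
    return ["jmap", "-dump:" + ",".join(options), str(pid)]


def dump_periodically(pid, count, interval_s, prefix="heap"):
    """Take `count` dumps, `interval_s` seconds apart: heap1.hprof, heap2.hprof, ..."""
    paths = []
    for n in range(1, count + 1):
        path = f"{prefix}{n}.hprof"
        subprocess.run(jmap_dump_command(pid, path), check=True)
        paths.append(path)
        if n < count:
            time.sleep(interval_s)
    return paths
```

For example, `dump_periodically(12345, 11, 60)` would attempt eleven dumps of a (hypothetical) Java process 12345, one minute apart. Dumps taken without `live_only` include unreachable objects, which can matter when comparing histograms.
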
> I used Eclipse Memory Analyzer to look at the heaps. You can view an html
> histogram of the objects at:
>
> http://web.ci.uchicago.edu/~davidk/heap-histogram/index.html
>
> It's possible that there was a sudden spike in memory at the end that the
> logs missed, but I don't think that's what's going on here.
>
> As I was running the script, I opened top and saw the Swift CPU usage on
> the Geyser head node get extremely high, up to 700%. I think it's getting
> killed due to a kernel CPU throttle.
>
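
A note on the "Killed" line in the log above: the shell prints that when a child process is terminated by SIGKILL, whereas a JVM that exhausts its heap normally reports an OutOfMemoryError instead of dying from a signal. A minimal Python sketch for telling the two apart from an exit status follows. The function name and the wrapper usage are hypothetical, and the 128+N case assumes the wrapper script passes the killed child's status through, which may not hold for every launcher.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Summarize a subprocess return code (POSIX semantics)."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number
        return "killed by " + signal.Signals(-returncode).name
    if returncode > 128:
        # Shells conventionally report 128 + N for a child killed by signal N
        try:
            name = signal.Signals(returncode - 128).name
        except ValueError:
            return f"exited with status {returncode}"
        return f"exited with status {returncode} (shell convention for {name})"
    return f"exited with status {returncode}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python describe_exit.py swift myscript.swift
    result = subprocess.run(sys.argv[1:])
    print(describe_exit(result.returncode))
```

A SIGKILL result points at something outside the JVM (the kernel or a site policy daemon) ending the process, which fits the observation that heap usage stayed low.
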
> I went through the script line by line to narrow down where the
> problem was, whittling away at it until I had a small, readable,
> and data-independent test script that shows the problem.
>
> Here it is:
> ----
> type file;
> app (file out) createFile() {
>    createFile @filename(out);
> }
>
> app (file out) createFileGivenArray (file fileArray[]) {
>    createFile @filename(out);
> }
>
> file myArray[];
> file myFile;
>
> foreach f,i in [1:2] {
>    myArray[i] = createFile();
> }
>
> myFile = createFileGivenArray(myArray);
> ----
>
> On Midway you'll see the CPU usage on the head node jump to about 200%
> while the first app runs. If you repeat that pattern many times (like the
> original script does) you'll see CPU usage go even higher.
>
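
To make "repeat that pattern many times" concrete, here is a small Python sketch that generates the test script with the array-then-consumer pattern repeated N times. It assumes the repetitions are independent stages with numbered variable names (myArray1, myFile1, ...), which is a guess; the original popdiagts script may chain its stages differently. No CPU measurements are claimed for the generated scripts.

```python
# App definitions copied from the test script in the quoted message.
APP_DEFS = """type file;
app (file out) createFile() {
   createFile @filename(out);
}

app (file out) createFileGivenArray (file fileArray[]) {
   createFile @filename(out);
}
"""

# One stage: fill an array in a foreach, then pass the whole array to an app.
# Doubled braces are literal braces; {n} is the stage number.
STAGE = """
file myArray{n}[];
file myFile{n};

foreach f,i in [1:2] {{
   myArray{n}[i] = createFile();
}}

myFile{n} = createFileGivenArray(myArray{n});
"""


def repeated_script(stages):
    """Return Swift script text with the pattern repeated `stages` times."""
    return APP_DEFS + "".join(STAGE.format(n=n) for n in range(1, stages + 1))


if __name__ == "__main__":
    # Hypothetical output file name
    with open("repeat.swift", "w") as out:
        out.write(repeated_script(10))
```

Running the generated file with increasing stage counts while watching top would be one way to check whether CPU usage scales with the number of repetitions.
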
> I've filed this as Bug 1195 (
> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=1195 ). The package to
> reproduce this is at http://web.ci.uchicago.edu/~davidk/popdiag.tar.gz.
>
>
>
> On Tue, Jan 28, 2014 at 3:29 PM, Wilde, Michael J. <wilde at mcs.anl.gov> wrote:
>
>> From: David Kelly [davidkelly at uchicago.edu]
>> Sent: Tuesday, January 28, 2014 2:47 PM
>> ...
>>   I don't have too many updates on Sheri's problem. I was able to run
>> the older standalone example I had on Geyser and did not see any issues
>> with excessive amounts of resident memory being used.
>>   ...
>>
>>
>>  I think the failure was exceeding the Java heap size, not an RSS
>> problem, right?
>>
>>     I think we might be better off shifting the way we approach this
>> problem. It's difficult to run these apps, and to run them in the same way
>> the users do. There's also a long delay getting responses. I think we'd be
>> better off focusing on adding comprehensive memory tests to the test suite,
>> measuring, plotting, and then documenting solutions/strategies into the
>> user guide. It will take some time, but I think it's the best approach
>> since everything would be under our own control, and it would provide
>> solutions for all users.
>>
>>     That sounds good, while we are waiting for debugging info from
>> users. But we should still strive to reproduce problems that users are
>> encountering, and to give them code updates with additional debugging
>> hooks or possible remedies to test.
>>
>>  - Mike
>>
>>
>> On Tue, Jan 28, 2014 at 12:36 PM, Wilde, Michael J. <wilde at mcs.anl.gov> wrote:
>>
>>>  Yadu, David, can you send updates on this to Swift devel, and let's
>>> talk this afternoon at 3PM to discuss?
>>>
>>>  Thanks,
>>>
>>>    - Mike
>>>
>>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: heap-plot.png
Type: image/png
Size: 12661 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20140207/ccaa7637/attachment.png>