[Swift-devel] Tracking swift heap overflow culprit

Michael Wilde wilde at mcs.anl.gov
Thu Jan 30 19:02:37 CST 2014


(Moving this to my MCS email addr and to swift-devel)

Mihael, what we are trying to do here is not (initially) to change anything
about Swift's memory usage.

We just want to understand the memory costs of normal Swift operations,
e.g., calling a function with N args and M returns; mapping a file; creating
an array of 1000 1MB strings; etc.
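As a rough sketch of the kind of experiment I have in mind (the script name
and timing below are just placeholders), one could run a small Swift script
that builds such an array and then read per-class byte counts off the live
JVM:

  swift stress_array.swift &            # hypothetical script that builds the array
  sleep 30                              # wait until the array is populated
  jps -l                                # find the pid of the Swift JVM
  jmap -histo:live <pid> | head -n 20   # per-class instance and byte counts

Repeating that for each kind of operation would give a rough per-operation
cost.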

Then, for any program execution, we want to be able to trace - at some
useful level of granularity - the consumption of Java memory caused by
these normal Swift activities.

For example, if a user writes a function that is going to create - and hold
- 10MB of memory, due to its local variables, then having 10,000 of those
active at once would consume - and hold - 100GB of RAM.

My suspicion is that this is exactly what e.g. Sheri's code is doing.  And
I further suspect that once we identify which procedures are using most of
the memory, and in what way, we can tune the user code to use much less
memory.

We can - by experiment - develop a cost table for common Swift operations.
But Sheri's codes are the most complex Swift scripts that exist; each has a
few thousand lines of Swift code.  Without some automated memory usage stats
that correlate memory consumption to source code, it will be hard to find
the culprits.

So the question is not how to make Swift use less memory (although that's
always desirable), but rather, first, just to create the tools to know how
much a given program run uses, and for what.

Can you suggest affordable ways to get this info?

Thanks,

- Mike



On Thu, Jan 30, 2014 at 5:22 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:

> Weeeellll,
>
> Stuff eats memory. Some stuff can be made to take less memory, some
> stuff can be made to take less memory at the expense of performance, and
> some stuff just needs to be there. And then once in a while there's
> stuff that doesn't need to be there at all.
>
> I somewhat routinely have to deal with the first two. It's not an easy
> problem, because it only becomes obvious what eats memory at large
> scales when you actually have a large scale run, and that's difficult to
> analyze both because of technicalities (such as it takes lots of ram to
> analyze things) and because it's hard to distinguish signal from noise
> when there's a lot of stuff. But, again, that's something I generally
> keep in mind with every commit.
>
> It is, however, mostly attributable to design choices. We sacrificed
> scalability for convenience initially, because juggling with concurrency
> was difficult, and the scales we were looking at were generally pretty
> small. Things change though.
>
> There's the last possibility also: that we have a situation that doesn't
> normally occur and shouldn't occur, that is probably a bug, and that
> happened this once. If that's the case we should find and fix it. So, is
> that the case?
>
> Mihael
>
>
> On Thu, 2014-01-30 at 17:06 -0600, Yadu Nand wrote:
> > Hi Mike, Mihael
> >
> > I talked to Mihael about the RAM issue, and he said that having heap
> > dumps can help, but he wasn't sure if that alone is sufficient to
> > pinpoint what is using memory excessively.
> >
> > Here's what I did:
> > * Force the apps to dump the heap and analyse it offline with jhat.
> >   I've used jhat on one such dump from a memory stress test. If the
> >   dump is very large, the user could just start jhat, which starts a
> >   webserver on port 7000 that we can access.
> >
> > Here's one of the dump analysis from jhat :
> > http://swift.rcc.uchicago.edu:7000/histo/
> > http://swift.rcc.uchicago.edu:7000/showInstanceCounts/includePlatform/
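> >
> > As a rough sketch, the commands involved look something like this (the
> > pid, file name and heap size below are just illustrative):
> >
> >   jmap -dump:live,format=b,file=swift-heap.hprof <pid>   # write a binary heap dump
> >   jhat -J-Xmx4g swift-heap.hprof                          # serves the analysis on port 7000 by default
> >
> > For runs that die with an OutOfMemoryError, the dump can also be
> > produced automatically by adding -XX:+HeapDumpOnOutOfMemoryError (and
> > optionally -XX:HeapDumpPath=<dir>) to the JVM options.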
> >
> > * jmap can be used to get heap histograms of the JVM while it is
> >   running. Here's a snapshot from a stress run with 10^6 + 1 ints held
> >   in a Swift array:
> >
> > [yadunand at midway001 data_stress]$ jmap -histo:live 31135 | head -n 10
> >
> >  num     #instances         #bytes  class name
> > ----------------------------------------------
> >    1:       1000001       56000056  org.griphyn.vdl.mapping.DataNode
> >    2:       1030601       32979232  java.util.HashMap$Entry
> >    3:       2014652       32234432  java.lang.Integer
> >    4:       1000015       24000360  org.griphyn.vdl.type.impl.FieldImpl
> >    5:         14672        4903992  [Ljava.util.HashMap$Entry;
> >    6:         29872        4149184  <constMethodKlass>
> >    7:         29872        3831968  <methodKlass>
> >
> > These, together with the live heap tracking commit from Mihael, should
> > give a better picture of what is going on with the user's run. This
> > again would require the user to run an extra script.
> >
> > As for Sheri's case, there was a core dump, and if we could get her to
> > run jhat on her side, I think that would give us some extra detail on
> > what is consuming the memory.
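> >
> > One caveat (the file name and heap size here are just illustrative): if
> > her dump is several GB, jhat itself needs a correspondingly large heap
> > to load it, e.g.
> >
> >   jhat -J-Xmx8g -port 7000 java_pid12345.hprof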
> >
> > Please let me know if this might be something that is worth a shot.
> >
> > Thanks,
> > Yadu
>
>
>