[mpich-discuss] [announce] corefilter tool
saffroy at gmail.com
Sat Dec 6 03:57:43 CST 2008
This is rather off-topic for this list, but I think some people here,
in particular HPC developers who struggle with huge core files on
Linux clusters, might find this tool useful:
Below is a copy of the README. Enjoy!
The goal of corefilter is to limit the size of core dumps on Linux,
while preserving some usefulness for debugging purposes.
The problem we're trying to solve is as follows: sometimes one does
not want to let their programs produce full-sized core dumps, i.e. what
you get with "ulimit -c unlimited" (in the bash shell).
There can be several reasons; one I encountered was that when a large
parallel program crashes, it can result in tens (or hundreds) of
compute nodes writing hundreds of gigabytes to a file server with a
single gigabit pipe, and that can take a long time to complete. And on
a cluster running this kind of program, time costs a lot! So when your
program crashes, whenever possible you want just enough information to
debug it. Some programs produce a simple stack trace: that's better
than nothing, but it can be hard to use, and many programs don't have
such facilities.
So, if the core dump is too large, why not simply use "ulimit -c" with
a certain fixed size? In theory, that's exactly what we want; but in
practice, the resulting core dumps cannot be used for debugging!
That's because of how the core file limit is used: the kernel will
simply write process memory pages to the core dump file in order,
until the limit is reached, and then just stop. The end result is that
the core dump usually does not contain the stack! And some other
valuable information about loaded libraries is lost as well.
Enter core_pattern: this pseudo-file (under /proc/sys/kernel) gives
the administrator some control over the generation of core dumps. Of
particular interest is the feature described in this post:
Simply said, a user program can be piped each and every core dump
produced on the system, and can write out whatever it wants, wherever
it wants.
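The handler convention is simple enough to sketch. Here is a minimal, hypothetical pipe handler written as a shell function (the name, paths, and COREDIR knob are illustrative, not part of corefilter): it takes the crashing PID as an argument and copies the dump from stdin to a file.

```shell
# Hypothetical minimal core_pattern pipe handler, as a shell function.
# The kernel would invoke the real script with the core image on stdin,
# e.g. after registering it with:
#   echo '|/usr/local/bin/save-core %p' > /proc/sys/kernel/core_pattern
# Usage: save_core PID < coredump
save_core() {
    pid="$1"
    # COREDIR is an illustrative override; default to /var/tmp
    out="${COREDIR:-/var/tmp}/core.$pid"
    cat > "$out"
}
```

A real handler must be careful: it runs as root, with stdin connected to the kernel, so it should not block indefinitely or trust its environment.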
And the corefilter tool is one such program, with the specific goal of
reducing the size of core dumps in a smarter way, which was inspired
by the coredumper project:
As mentioned above, corefilter is meant to be used through
core_pattern; typically the administrator will do:
# gcc -Wall -O corefilter.c -o corefilter
# cp corefilter /bin/
# echo '|/bin/corefilter -c %c -p %p -g %g -u %u' > \
    /proc/sys/kernel/core_pattern
(and later on, this can be made permanent in sysctl.conf)
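The sysctl.conf step mentioned above might look like this (a sketch only; the pattern string must match exactly what was echoed into core_pattern):

```shell
# Persist the handler across reboots (run as root); sketch only.
echo 'kernel.core_pattern=|/bin/corefilter -c %c -p %p -g %g -u %u' \
    >> /etc/sysctl.conf
sysctl -p   # apply the settings from sysctl.conf immediately
```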
So, what happens now? Every time a program crashes and the kernel
decides a core dump should be written, corefilter is invoked, and it
finds the full core dump written to its standard input (courtesy of
the kernel). Then corefilter takes into account the user's core file
limit for the crashing program (remember limits are per-process, and
inherited), and writes (approximately) the same amount of data to a
core file in the program's current working directory, said data being
a carefully chosen subset of the full core dump read on stdin.
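The size-capping half of that behavior (though not the smart selection of which pages to keep, which is corefilter's whole point) can be illustrated with a toy shell function; this is purely a sketch, not corefilter's code:

```shell
# Toy illustration of the size cap only: copy at most LIMIT bytes of
# the dump from stdin into core.PID in the current directory.
# Usage: cap_core LIMIT PID < coredump
cap_core() {
    limit="$1"
    pid="$2"
    head -c "$limit" > "core.$pid"
}
```

corefilter's added value over this naive truncation is choosing *which* bytes survive the cut, so the stack and library mappings needed by a debugger are kept.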
In other words, once installed, corefilter "fixes" the way core dumps
are generated by the kernel when the user sets a limit to their size.
This program is free software, distributed under the terms of the
GNU General Public License version 2.
Jean-Marc Saffroy <saffroy at gmail.com>