[petsc-users] question about partitioning usage

Wed Jan 25 14:55:40 CST 2012

Jed, you are generally right, although I do not see how your remarks
apply to this particular case.
Maybe the context you are missing is that on my own Linux box things
run smoothly; I only start getting problems on platforms like Windows or
Cray, where I cannot (easily) run a debugger.
Regarding the ParMETIS assertion failure, I have run the failing code in
the debugger and submitted the stack trace, as reported previously.
My question here was where else PETSc uses ParMETIS internally, apart
from my explicit call, and why it still does so after the
MatPartitioning object has been destroyed. I can of course find it all
out myself in a few days by studying the code or using a debugger, but
since this is an internal PETSc matter I thought I would just ask.

Thanks for your understanding and patience,
Dominik

On Wed, Jan 25, 2012 at 9:41 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> On Wed, Jan 25, 2012 at 13:20, Dominik Szczerba <dominik at itis.ethz.ch>
> wrote:
>>
>> I create it to partition my input unstructured mesh, then I destroy
>> it. It is so separated from the rest of the code, that I could even
>> delegate it to a separate program.
>> But then, as shown in the other thread, I somehow get an assertion
>> fail from parmetis much later in the code, when creating solver
>> contexts. Why is this happening if the MatPartitioning object was
>> destroyed?
>
>
> Dominik, you've been at this game for a while now, so you should be familiar
> with your tools. In any software project, when you encounter an error, you
> should ask the question "how did I get here?" With languages that have a
> call stack, the first step to answering that question is to look at the
> stack trace. There is a well-known tool for producing stack traces and
> related tasks: it's called a debugger.
>
> Now PETSc noticed early on that teaching every user to use a debugger (and
> dealing with the issues on clusters) is too much work; consequently, PETSc
> manages its own call stack so that it can produce stack traces
> automatically. Third-party packages usually don't do this, so they usually
> just return error codes. Asserts are useful to a developer because they take
> very little typing and the debugger catches them, but they are horrible in
> production because the program just exits without telling you why there was
> an error. If the programmer was nice, they would have placed a comment at
> that line of code explaining what might cause the assertion to fail, but you
> still have to dig up the source code and go to that line, and many
> programmers aren't that careful.
>
> Of course, failing asserts don't usually exit forcefully; they call abort(),
> which raises SIGABRT and, as it turns out, you can catch SIGABRT (though you
> cannot block it). However, SIGABRT is the same signal that is usually used
> to instruct a program to dump core, so in order not to interfere with the
> ability to get a core dump, PETSc does not catch SIGABRT. (At least I think
> this is the rationale, perhaps it should be reconsidered.) Consequently,
> PETSc doesn't automatically give stack traces when third-party libraries
> call abort().
>
> Note that the only thing worse than calling assert()/abort() from a library
> is to call exit() in an error condition, and unfortunately, this is more
> common than you might think. Again, PETSc could register a callback with
> atexit(), but this would interfere with users' ability to exit intentionally
> (before calling PetscFinalize()) without seeing a PETSc error message. In
> any case, the current decision was not to use atexit() either.
>
> What this all adds up to for you is that you should use a debugger to get a
> stack trace if you want to find out how you reached a failed assert() in
> some third-party package. You know that PETSc doesn't require that package,
> so it's not using it by default for anything. Presumably, you also know that
> "somehow get an assertion fail from parmetis much later in the code, when
> creating solver contexts" is pretty vague. How did you determine that it was
> when creating solver contexts? Even if that claim is correct, are we
> expected to guess what kind of solvers you have and how they are configured,
> such that a partitioner might be called? You have to help us help you.
>
> So don't be afraid to use the debugger. Read enough of the documentation
> that you can fluently work with conditional breakpoints and with watchpoints
> (both values and locations). Other times, use valgrind with --db-attach=yes
> to get the debugger to the correct place. I can understand that it looks
> overwhelming if you are starting out, but if you have made it through your
> first year of serious development and aren't familiar with these tools yet,
> you have already lost more time than it takes to learn the tools.
>
> When developing new code, have a hard rule that there always be a run-time
> parameter to change the problem size, and always make the smallest problem
> size something that runs in less than 10 seconds without optimization. I
> have occasionally ended up in circumstances where these rules were not
> followed and I have regretted it every time. Similarly, always develop code
> in a friendly development environment. To me, that means that debuggers and
> valgrind must work, disk latency must be low enough that I don't notice it
> in Emacs, and the source code has to be indexed so that I can move around
> quickly. It also means that the build system has to be fast: waiting more
> than a few seconds for compilation when you make a simple change to a C file
> is unacceptable.
>
> If you follow these guidelines, I think you will end up answering your own
> questions in less time than it takes to ask them, and when you find that you
> need to ask, you will have plenty of relevant information, so that hopefully
> we won't need several rounds of email ping-pong to get oriented.
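
The quoted point that SIGABRT can be caught suggests one way to get a stack
trace out of a failed assert() on a machine where attaching a debugger is
awkward. What follows is only a minimal sketch under stated assumptions, not
something PETSc does: the names abort_handler and install_abort_handler are
hypothetical, and backtrace() / backtrace_symbols_fd() from <execinfo.h> are
glibc extensions that may not exist on every platform (for example Windows).

```c
#define _POSIX_C_SOURCE 200809L  /* exposes sigaction() under strict -std=c99/c11 */
#include <execinfo.h>  /* backtrace(), backtrace_symbols_fd(): glibc extension (assumption) */
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: print the call stack to stderr and return.
 * When a SIGABRT handler returns, abort() still terminates the process,
 * so a core dump can still be produced afterwards. */
static void abort_handler(int sig)
{
    static const char msg[] = "SIGABRT caught; call stack follows:\n";
    void *frames[64];
    ssize_t written;
    int depth;

    (void)sig;
    written = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)written;

    /* backtrace() is not formally async-signal-safe; it is used here only
     * as a last-gasp diagnostic just before the process dies. */
    depth = backtrace(frames, 64);
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
}

/* Hypothetical helper: install the handler. Returns 0 on success, -1 on error. */
int install_abort_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = abort_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGABRT, &sa, NULL);
}
```

Calling install_abort_handler() once near the start of main(), before any
third-party library is used, would be enough for this sketch. With gcc on
Linux, linking with -rdynamic usually makes the printed frames show function
names rather than bare addresses.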