<div class="gmail_quote">On Wed, Jan 25, 2012 at 13:20, Dominik Szczerba <span dir="ltr"><<a href="mailto:dominik@itis.ethz.ch">dominik@itis.ethz.ch</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div id=":17f">I create it to partition my input unstructured mesh, then I destroy<br>
it. It is so separated from the rest of the code, that I could even<br>
delegate it to a separate program.<br>
But then, as shown in the other thread, I somehow get an assertion<br>
fail from parmetis much later in the code, when creating solver<br>
contexts. Why is this happening if the MatPartitioning object was<br>
destroyed?</div></blockquote></div><br><div>Dominik, you've been at this game for a while now, so you should be familiar with your tools. In any software project, when you encounter an error, you should ask the question "how did I get here?" With languages that have a call stack, the first step to answering that question is to look at the stack trace. There is a well-known tool for producing stack traces and performing related tasks: the debugger.</div>
<div><br></div><div>Now PETSc noticed early on that teaching every user to use a debugger (and dealing with the issues on clusters) is too much work. Consequently, PETSc manages its own call stack so that it can produce stack traces automatically. Third-party packages usually don't do this, so they usually just return error codes. Asserts are useful to developers because they take very little typing and the debugger catches them, but they are horrible in production because the program just exits without telling you why there was an error. If the programmer was nice, they would have placed a comment at that line of code explaining what might cause the assertion to fail, but you still have to dig up the source code and go to that line, and many programmers aren't that careful.</div>
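<div><br></div><div>To illustrate the general idea of a library keeping its own call stack, here is a minimal sketch in plain C. It is not PETSc's actual implementation; the names TRACE_BEGIN, TRACE_RETURN, trace_print, inner_step and outer_solve are all hypothetical.</div>
```c
#include <stdio.h>

#define TRACE_MAX_DEPTH 64

static const char *trace_names[TRACE_MAX_DEPTH]; /* names of active functions */
static int trace_depth = 0;                      /* number of active frames */

/* Record entry into the current function. */
#define TRACE_BEGIN() \
    do { \
        if (trace_depth < TRACE_MAX_DEPTH) trace_names[trace_depth] = __func__; \
        trace_depth++; \
    } while (0)

/* Record exit from the current function and return a value. */
#define TRACE_RETURN(value) \
    do { trace_depth--; return (value); } while (0)

/* Print the recorded frames, innermost first. */
static void trace_print(FILE *out)
{
    int top = trace_depth < TRACE_MAX_DEPTH ? trace_depth : TRACE_MAX_DEPTH;
    for (int i = top - 1; i >= 0; i--)
        fprintf(out, "#%d %s()\n", top - 1 - i, trace_names[i]);
}

/* Hypothetical failing routine: reports how it was reached, then
 * returns an error code instead of aborting. */
static int inner_step(int n)
{
    TRACE_BEGIN();
    if (n < 0) {
        fputs("error: negative size\n", stderr);
        trace_print(stderr);
        TRACE_RETURN(1);
    }
    TRACE_RETURN(0);
}

/* Hypothetical caller: forwards the error code upwards. */
static int outer_solve(int n)
{
    TRACE_BEGIN();
    int err = inner_step(n);
    TRACE_RETURN(err);
}
```
<div>Calling outer_solve(-1) prints the two recorded frames to stderr and returns a nonzero error code, so the caller learns both that something failed and how the program got there, without a debugger.</div>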
<div><br></div><div>Of course, failing asserts don't usually exit directly; they call abort(), which raises SIGABRT, and, as it turns out, you can catch SIGABRT (though you cannot block it). However, SIGABRT is the same signal that is usually used to instruct a program to dump core, so in order not to interfere with the ability to get a core dump, PETSc does not catch SIGABRT. (At least I think this is the rationale; perhaps it should be reconsidered.) Consequently, PETSc doesn't automatically give stack traces when third-party libraries call abort().</div>
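<div><br></div><div>For reference, catching SIGABRT in your own application might look roughly like the sketch below. This is not something PETSc does. The names abort_trace_handler and install_abort_trace are hypothetical, and backtrace() and backtrace_symbols_fd() are a glibc/macOS extension from execinfo.h, so I am assuming such a platform.</div>
```c
#define _GNU_SOURCE     /* assumption: glibc; harmless elsewhere */
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd(): not ISO C */
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: print the raw stack frames, then restore the
 * default action and re-raise so a core dump is still produced.
 * Caveat: backtrace() is not guaranteed to be async-signal-safe. */
static void abort_trace_handler(int sig)
{
    static const char msg[] = "Caught SIGABRT, stack trace follows:\n";
    void *frames[64];
    int n = backtrace(frames, 64);

    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* nothing sensible to do if stderr is gone */
    }
    backtrace_symbols_fd(frames, n, STDERR_FILENO);

    signal(sig, SIG_DFL);   /* back to the default action (terminate, core) */
    raise(sig);             /* delivered once this handler returns */
}

/* Hypothetical helper: returns 0 on success, -1 on failure. */
int install_abort_trace(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = abort_trace_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGABRT, &sa, NULL);
}
```
<div>Call install_abort_trace() once near the start of main(). The frames are most readable if the program is built with debugging symbols, and a debugger will still tell you far more than this does.</div>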
<div><br></div><div>Note that the only thing worse than calling assert()/abort() from a library is to call exit() in an error condition, and unfortunately, this is more common than you might think. Again, PETSc could register a callback with atexit(), but this would interfere with users' ability to exit intentionally (before calling PetscFinalize()) without seeing a PETSc error message. In any case, the current decision was not to use atexit() either.</div>
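<div><br></div><div>A sketch of what such an atexit() callback could look like in application code follows. Again, this is not PETSc code; the flag and the names warn_on_early_exit, install_exit_check and mark_finalized are hypothetical.</div>
```c
#include <stdio.h>
#include <stdlib.h>

static int finalized = 0;   /* hypothetical flag, set by mark_finalized() */

/* Runs at process exit; complains if cleanup never happened. */
static void warn_on_early_exit(void)
{
    if (!finalized)
        fputs("warning: process exited before finalization\n", stderr);
}

/* Register the check; returns 0 on success, as atexit() does. */
int install_exit_check(void)
{
    return atexit(warn_on_early_exit);
}

/* Call this right after the normal shutdown path has completed. */
void mark_finalized(void)
{
    finalized = 1;
}
```
<div>The callback cannot tell a library bailing out with exit() from a user who exits early on purpose, which is exactly the interference described above.</div>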
<div><br></div><div>What this all adds up to for you is that you should use a debugger to get a stack trace if you want to find out how you reached a failed assert() in some third-party package. You know that PETSc doesn't require that package, so it's not using it by default for anything. Presumably, you also know that "somehow get an assertion fail from parmetis much later in the code, when creating solver contexts" is pretty vague. How did you determine that it was when creating solver contexts? Even if that claim is correct, are we expected to guess what kind of solvers you have and how they are configured, such that a partitioner might be called? You have to help us help you.</div>
<div><br></div><div>So don't be afraid to use the debugger. Read enough of the documentation that you can fluently work with conditional breakpoints and with watchpoints (both values and locations). Other times, use valgrind with --db-attach=yes to get the debugger to the correct place. I can understand that it looks overwhelming if you are starting out, but if you have made it through your first year of serious development and aren't familiar with these tools yet, you have already lost more time than it takes to learn the tools.</div>
<div><br></div><div>When developing new code, have a hard rule that there always be a run-time parameter to change the problem size, and always make the smallest problem size something that runs in less than 10 seconds without optimization. I have occasionally ended up in circumstances where these rules were not followed and I have regretted it every time. Similarly, always develop code in a friendly development environment. To me, that means that debuggers and valgrind must work, disk latency must be low enough that I don't notice it in Emacs, and the source code has to be indexed so that I can move around quickly. It also means that the build system has to be fast; waiting more than a few seconds for compilation when you make a simple change to a C file is unacceptable.</div>
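<div><br></div><div>As a minimal illustration of the problem-size rule, here is one way to read the size from the command line in plain C, falling back to a small default. The helper name problem_size_from_args and the convention that the size is the first argument are assumptions made for this sketch.</div>
```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: take the problem size from argv[1] if it is a
 * positive integer, otherwise fall back to default_n (keep that small
 * enough for a quick unoptimized run). */
long problem_size_from_args(int argc, char **argv, long default_n)
{
    char *end = NULL;
    long n;

    if (argc < 2)
        return default_n;

    errno = 0;
    n = strtol(argv[1], &end, 10);
    if (errno != 0 || end == argv[1] || *end != '\0' || n <= 0)
        return default_n;   /* not a usable number: use the default */
    return n;
}
```
<div>In main(), something like long n = problem_size_from_args(argc, argv, 8); then lets you run the tiny case by default and scale up only when you mean to.</div>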
<div><br></div><div>If you follow these guidelines, I think you will end up answering your own questions in less time than it takes to ask them, and when you do find that you need to ask, you will have plenty of relevant information so that, hopefully, we won't need several rounds of email ping-pong to get oriented.</div>