[petsc-dev] use of hash table vs array in various places in PETSc

Mon Sep 19 10:31:32 CDT 2011

On Sep 19, 2011, at 9:03 AM, Jed Brown wrote:

> On Mon, Sep 19, 2011 at 14:43, Barry Smith <bsmith at mcs.anl.gov> wrote:
>   In our case of MatGetSubMatrices() there are two main use cases.
> 
>   1) something like 80+ percent of the indices are non-null, maybe even 99 percent. Clearly an array makes sense here and there is no reason to use a hash table
> 
>    2) something like at most for large problems a couple percent of the indices are non-null. An array is not scalable for large problems.
> 
> Can we declare the expected sizes and make an automatic choice, perhaps with manual override?

  Yes, of course, that was my plan.

>  
> 
>    3) have two different (but almost identical routines), one that is hardwired with the array lookup one hardwired with hash table look up (yuck)
> 
>   In order for the (1) case to be efficient (that is many lookups are done and are successful) I don't want to have "two" implementations handled by the usual virtual function business (too much overhead is introduced)
> 
> It is not necessarily the case that an if statement is less overhead than a virtual function. It depends whether the compiler decides to inline the whole thing and which branch of the if statement is taken. (The if statement implicitly makes a choice about which branch will be preferred by which jumps it inserts in the assembly. The virtual function is more symmetric. Now _PETSc_ virtual functions with string lookup are way expensive, and PetscFunctionBegin is significant when running in debug mode.

   Of course we would not have a string lookup but it would be the object->function(...) type of virtual function we use; I don't see how any compiler would ever be able to inline this; it seems to me that this might trash the instruction cache or perhaps not, if it doesn't trash the instruction cache then maybe it would not be anymore expensive then the if () test? But it does have to push the arguments on the stack before the function call and clean them off afterwards (though presumable cleaning them off is basically free) or will it be smart enough to use registers for the arguments (but then would it back up the previous values in the registers which takes time, or would it ..... or would it ..... . Hmm so maybe just using the two classes/virtual function approach is fine and I shouldn't do the uglier if () stuff??? My gut feeling (which is likely wrong) is that replacing an array access with a function call is always a bad idea :-(. 

   Barry

>