[petsc-dev] PetscHashIJ scaling problem?

Jed Brown jedbrown at mcs.anl.gov
Sun Nov 10 13:13:25 CST 2013


Dmitry Karpeyev <dkarpeev at gmail.com> writes:

> On Fri, Nov 8, 2013 at 11:56 AM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>>   PetscHashICreate(table);
>>   for (i = 0; i < N; ++i) {
>>     for (j = 0; j < N; ++j) {
>>       PetscInt key = (PetscMin(i,j) << 16) + PetscMax(i,j);
>>       khint_t ret,idx = kh_put(HASHI,table,key,&ret);
>>       if (ret == 1) kh_val(table,idx) = newp++;
>>     }
>>   }
>>   PetscHashIDestroy(table);
>>
>> we get numbers that are about twice as good (no weird chaining and only
>> one lookup):
>>
> One of the original use cases for PetscHash was pseudograph manipulation
> (multiple edges allowed),
> which required the 'multivalued' option that might slow things down
> unnecessarily in the simpler
> use cases.

How do you distinguish the multiple edges?  Perhaps that indicator
should become part of the key.  Alternatively, the hash value can be a
"set" data structure.  In any case, this is not the right interface for
a plain (i,j) -> value hash.

> PetscHash also tried to hide the idiosyncrasies of the khash API
> anticipating that it might have to be
> replaced at the backend with another hash table implementation. 

Okay, though in that case, the equivalent of kh_put() would just be
implemented in terms of the lookup followed by an add.  I don't see a
reason to pessimize around a useful function that is trivial to
implement in terms of other primitives.

> It's possible both the PetscHash API and the implementation need to be
> revised.  The API itself could probably be simplified somewhat.  If we
> were to stick to khash long-term, maybe then the PetscHash API should
> more directly reflect khash's preferred use pattern - put-test-val.
>
> Initially PetscHash was all macros, but then some of it was switched
> over to PETSC_STATIC_INLINE functions.  Even if these are being
> inlined by the compiler, there may be some call overhead, which may be
> avoidable.  

That should be removed, though not in debug mode.

> Besides, it may be more natural to have macros to wrap the use case in
> the snippet above.  And there are no error codes being returned from
> inside PetscHash at the moment.  That might have to change, however,
> if the bare realloc() calls in khash are replaced by something like
> PetscRealloc().

Upstream khash now has macros kcalloc, kmalloc, krealloc, and kfree,
which you can define before including the header.  If we have any other
portability changes, I would be inclined to contribute them upstream and
define the allocation routines so we can use our tracing malloc.  The
library is still developed upstream, e.g.,

  2013-05-02 (0.2.8):

       * Use quadratic probing. When the capacity is power of 2, stepping function
         i*(i+1)/2 guarantees to traverse each bucket. It is better than double
         hashing on cache performance and is more robust than linear probing.

         In theory, double hashing should be more robust than quadratic probing.
         However, my implementation is probably not for large hash tables, because
         the second hash function is closely tied to the first hash function,
         which reduce the effectiveness of double hashing.

       Reference: http://research.cs.vt.edu/AVresearch/hashing/quadratic.php

> If the implementation of PetscHash is to be replaced, it would be a
> good idea to do some representative testing of the proposed new
> implementation.  In particular, I don't have a good feel for how
> double hashing compares to linear probing or to chaining.  The fastest
> implementation above uses a smallish fixed-size table, so that might
> not be an entirely fair comparison with khash et al. I don't think the
> author claims the test to be comprehensive anyhow.

Agreed, it's not trying to be representative and the performance could
depend significantly on that specific workflow.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 835 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20131110/49d92cc1/attachment.sig>


More information about the petsc-dev mailing list