[petsc-dev] PetscHashIJ scaling problem?

Fri Nov 8 11:56:33 CST 2013

Jed Brown <jedbrown at mcs.anl.gov> writes:
> $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 1000                                                                           
> 0.870 real   0.857 user   0.010 sys   99.67 cpu
>
> $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 2000                                                                           
> 2.762 real   2.690 user   0.073 sys   100.03 cpu
> 0.870 * 2^2 = 3.48
>
> $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 3000                                                                           
> 5.176 real   4.963 user   0.213 sys   100.01 cpu
> 0.870 * 3^2 = 7.83
> 2.762 * (3/2)^2 = 4.59
>
> $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 4000
> 8.183 real   7.893 user   0.280 sys   99.88 cpu
> 0.870 * 4^2 = 13.92
> 2.762 * (4/2)^2 = 7.63
> 5.176 * (4/3)^2 = 8.95
>
> This looks stable at about 2 million lookups and insertions per second
> (or 4M hash operations).

If we use the khash API the way it was intended (via kh_put and kh_val;
ignore the ugly API inconsistency)

  PetscHashICreate(table);
  for (i = 0; i < N; ++i) {
    for (j = 0; j < N; ++j) {
      PetscInt key = (PetscMin(i,j) << 16) + PetscMax(i,j);
      khint_t ret,idx = kh_put(HASHI,table,key,&ret);
      if (ret == 1) kh_val(table,idx) = newp++;
    }
  }
  PetscHashIDestroy(table);

we get numbers that are about twice as good (no weird chaining and only
one lookup):

$ for n in 1000 2000 3000 4000; do time mpich-clang-optg/lib/ex26-obj/ex26 -N $n; done
0.462 real   0.453 user   0.007 sys   99.59 cpu
1.476 real   1.457 user   0.020 sys   100.01 cpu
2.791 real   2.767 user   0.023 sys   99.96 cpu
3.927 real   3.893 user   0.037 sys   100.06 cpu

The debug-mode performance is also tons better.

I see another hash table shootout by a good developer:

http://preshing.com/20110603/hash-table-performance-tests/

The fastest in his comparison uses simple chaining instead of double
hashing (which khash uses).  I don't have a direct comparison of the
two, and Jeff's version is hashing strings, so it would have to be
modified/simplified to compare directly.

Does your algorithm need lots better than 4M hash lookups per second, or
do we just need to optimize the current implementation?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 835 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20131108/66aa6757/attachment.sig>