<div dir="ltr">On Fri, Nov 8, 2013 at 11:56 AM, Jed Brown <span dir="ltr"><<a href="mailto:jedbrown@mcs.anl.gov" target="_blank">jedbrown@mcs.anl.gov</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im">Jed Brown <<a href="mailto:jedbrown@mcs.anl.gov">jedbrown@mcs.anl.gov</a>> writes:<br>


> $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 1000<br>

> 0.870 real   0.857 user   0.010 sys   99.67 cpu<br>

><br>

> $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 2000<br>

> 2.762 real   2.690 user   0.073 sys   100.03 cpu<br>

> 0.870 * 2^2 = 3.48<br>

><br>

> $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 3000<br>

> 5.176 real   4.963 user   0.213 sys   100.01 cpu<br>

> 0.870 * 3^2 = 7.83<br>

> 2.762 * (3/2)^2 = 4.59<br>

><br>

> $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 4000<br>

> 8.183 real   7.893 user   0.280 sys   99.88 cpu<br>

> 0.870 * 4^2 = 13.92<br>

> 2.762 * (4/2)^2 = 7.63<br>

> 5.176 * (4/3)^2 = 8.95<br>

><br>

> This looks stable at about 2 million lookups and insertions per second<br>

> (or 4M hash operations).<br>

<br>

</div>If we use the khash API the way it was intended (via kh_put and kh_val;<br>

ignore the ugly API inconsistency)<br>

<br>

  PetscHashICreate(table);<br>

  for (i = 0; i < N; ++i) {<br>

    for (j = 0; j < N; ++j) {<br>

      PetscInt key = (PetscMin(i,j) << 16) + PetscMax(i,j);<br>

      khint_t ret,idx = kh_put(HASHI,table,key,&ret);<br>

      if (ret == 1) kh_val(table,idx) = newp++;<br>

    }<br>

  }<br>

  PetscHashIDestroy(table);<br>

<br>

we get numbers that are about twice as good (no weird chaining and only<br>

one lookup):<br>

<br>

$ for n in 1000 2000 3000 4000; do time mpich-clang-optg/lib/ex26-obj/ex26 -N $n; done<br>

0.462 real   0.453 user   0.007 sys   99.59 cpu<br>

1.476 real   1.457 user   0.020 sys   100.01 cpu<br>

2.791 real   2.767 user   0.023 sys   99.96 cpu<br>

3.927 real   3.893 user   0.037 sys   100.06 cpu<br>

<br>

The debug-mode performance is also tons better.<br>

<br>

I see another hash table shootout by a good developer:<br>

<br>

<a href="http://preshing.com/20110603/hash-table-performance-tests/" target="_blank">http://preshing.com/20110603/hash-table-performance-tests/</a><br>

<br>

The fastest in his comparison uses simple chaining instead of double<br>

hashing (which khash uses).  I don't have a direct comparison of the<br>

two, and Jeff's version is hashing strings, so it would have to be<br>

modified/simplified to compare directly.<br>

<br>

Does your algorithm need lots better than 4M hash lookups per second, or<br>

do we just need to optimize the current implementation?<br>

</blockquote></div><br>Okay, here is the clearly quadratic performance, and its in next. Build SNES ex12 and run using</div><div class="gmail_extra"><br></div><div class="gmail_extra">/PETSc3/petsc/petsc-dev/arch-c-opencl-opt-next/lib/ex12-obj/ex12 -run_type perf -refinement_limit 0.00000625 -variable_coefficient field -petscspace_order 1 -mat_petscspace_order 0 -show_initial 0 -show_solution 0 -petscfe_type basic -log_summary -interpolate</div>

<div class="gmail_extra"><br></div><div class="gmail_extra"><div class="gmail_extra">-refinement_limit 0.000625     DMPlexInterpolate       4 1.0 1.7443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 12  0  0  0 24  12  0  0  0 24     0</div>

<div class="gmail_extra">-refinement_limit 0.0003125    DMPlexInterpolate       4 1.0 3.3111e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 13  0  0  0 24  13  0  0  0 24     0</div><div class="gmail_extra">-refinement_limit 0.0000625    DMPlexInterpolate       4 1.0 2.3465e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 21  0  0  0 24  21  0  0  0 24     0</div>

<div class="gmail_extra">-refinement_limit 0.00003125   DMPlexInterpolate       4 1.0 7.7508e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 30  0  0  0 24  30  0  0  0 24     0</div><div class="gmail_extra">-refinement_limit 0.000015625  DMPlexInterpolate       4 1.0 2.7267e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 42  0  0  0 24  42  0  0  0 24     0</div>

<div class="gmail_extra">-refinement_limit 0.0000078125 DMPlexInterpolate       4 1.0 1.0175e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 58  0  0  0 24  58  0  0  0 24     0</div><div class="gmail_extra">-refinement_limit 0.00000625   DMPlexInterpolate       4 1.0 3.8912e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 72  0  0  0 24  72  0  0  0 24     0</div>

<div><br></div><div>You can see the quadratic complexity kick in, and this is what people are complaining about. Does this mean</div><div>we have to ditch kh_hash, or have I done something wrong.</div><div><br></div><div>

   Matt</div><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener

</div></div>