[ccan] ccan/htable: Multiple questions

Ahmed Samy f.fallen45 at gmail.com
Thu Jun 12 17:37:55 EST 2014


Hey Rusty,

Thanks a lot for the explanation, it helped me write a simple code that
simulates your hash table:
https://gist.github.com/decltype/49f08c78ac40bf0124cc

Cheers!


On Wed, Jun 11, 2014 at 7:09 AM, Rusty Russell <rusty at rustcorp.com.au>
wrote:

> Ahmed Samy <f.fallen45 at gmail.com> writes:
> > Hey Rusty,
> >
> > I am having difficulties understanding your hashtable implementation and
> > have some questions for you if you don't mind:
>
> Hi Ahmed,
>
>         Sure, they're a bit obscure.  I've cc'd the mailing list so
> this gets archived...
>
> > - The following bit twiddling hacks are kind of unfamiliar to me, I
> > couldn't make sense out of them:
> >
> > static inline uintptr_t get_extra_ptr_bits(const struct htable *ht,
> >    uintptr_t e)
> > {
> > return e & ht->common_mask;
> > }
> >
> > static inline void *get_raw_ptr(const struct htable *ht, uintptr_t e)
> > {
> > return (void *)((e & ~ht->common_mask) | ht->common_bits);
> > }
> >
> > static inline uintptr_t make_hval(const struct htable *ht,
> >   const void *p, uintptr_t bits)
> > {
> > return ((uintptr_t)p & ~ht->common_mask) | bits;
> > }
>
> These three are related.  The idea behind htable (and why it's so fast)
> is because we use redundant bits in the pointers to hide more hash bits.
>
> In a normal hash table to find an entry, you hash it to find the bucket
> it would be in, then if that bucket's pointer isn't NULL, follow it to
> see if it's the entry we want.  This involves loading two cache lines:
> the bucket and the entry.
>
> By stashing some hash bits in the pointer, we can often know that an
> entry isn't correct without dereferencing the pointer.  eg. with 10
> extra hash bits, we'll eliminate 1023 of 1024 failed lookups.
>
> But what bits of the pointer can we use?  To do that, we track what bits
> are common to all the pointers in the hash (ht->common_mask).  For a
> 64-bit system that can be a lot of bits, but even for 32-bit pointers we
> can expect a few bits: the lower bits and some upper bits if everything
> in the hash table is on the heap.
>
> > static inline uintptr_t get_hash_ptr_bits(const struct htable *ht,
> >   size_t hash)
> > {
> > /* Shuffling the extra bits (as specified in mask) down the
> >  * end is quite expensive.  But the lower bits are redundant, so
> >  * we fold the value first. */
> > return (hash ^ (hash >> ht->bits))
> > & ht->common_mask & ~ht->perfect_bit;
> > }
>
> This is a bit trickier.  It decides *which* hash bits we try to stash in
> the pointer.  The lower bits are redundant, since they already determine
> which hash bucket it's in.  But spreading the rest of the bits out
> according to the common_mask is computationally expensive, so we simply
> xor the hash and the shifted hash, which mixes a little and is cheap
> (and means that those high pointer bits are still useful).
>
> ht->perfect_bit is another bit we steal from the pointer.  If a hash
> bucket is full, we try the next hash bucket, until we find an empty
> one.  ht->perfect_bit is set if a hash entry is in the first bucket it
> can be (see commit 78e983a7a6e5250ebf963d5d93fe34c1d27d5a39).  Knowing
> this means we don't have to move the entry when we do a delete.
>
> This costs a little on insert, but really helps on deletion from a
> crowded table.  I hacked it out (patch below) and re-benchmarked.  The
> most noticable improvement is on a table with many deleted markers:
>
> 64 bit with perfect bit:
>   Initial re-inserting: 76-92(77.5867+/-2.9) ns
> 32 bit with perfect bit:
>   Initial re-inserting: 74-81(75.35+/-1.7) ns
>
> 64 bit without perfect bit:
>   Initial re-inserting: 107-111(107.75+/-0.94) ns
> 32 bit without perfect bit:
>   Initial re-inserting: 102-110(105.45+/-2.1) ns
>
> Hope that helps!
> Rusty.
>
> Subject: Remove perfect_bit.
>
> diff --git a/ccan/htable/htable.c b/ccan/htable/htable.c
> index b7e65d1..687b141 100644
> --- a/ccan/htable/htable.c
> +++ b/ccan/htable/htable.c
> @@ -39,7 +39,7 @@ static inline uintptr_t get_hash_ptr_bits(const struct
> htable *ht,
>          * end is quite expensive.  But the lower bits are redundant, so
>          * we fold the value first. */
>         return (hash ^ (hash >> ht->bits))
> -               & ht->common_mask & ~ht->perfect_bit;
> +               & ht->common_mask;
>  }
>
>  void htable_init(struct htable *ht,
> @@ -49,12 +49,12 @@ void htable_init(struct htable *ht,
>         *ht = empty;
>         ht->rehash = rehash;
>         ht->priv = priv;
> -       ht->table = &ht->perfect_bit;
> +       ht->table = &ht->common_bits;
>  }
>
>  void htable_clear(struct htable *ht)
>  {
> -       if (ht->table != &ht->perfect_bit)
> +       if (ht->table != (void *)&ht->table)
>                 free((void *)ht->table);
>         htable_init(ht, ht->rehash, ht->priv);
>  }
> @@ -65,9 +65,9 @@ static size_t hash_bucket(const struct htable *ht,
> size_t h)
>  }
>
>  static void *htable_val(const struct htable *ht,
> -                       struct htable_iter *i, size_t hash, uintptr_t
> perfect)
> +                       struct htable_iter *i, size_t hash)
>  {
> -       uintptr_t h2 = get_hash_ptr_bits(ht, hash) | perfect;
> +       uintptr_t h2 = get_hash_ptr_bits(ht, hash);
>
>         while (ht->table[i->off]) {
>                 if (ht->table[i->off] != HTABLE_DELETED) {
> @@ -75,7 +75,6 @@ static void *htable_val(const struct htable *ht,
>                                 return get_raw_ptr(ht, ht->table[i->off]);
>                 }
>                 i->off = (i->off + 1) & ((1 << ht->bits)-1);
> -               h2 &= ~perfect;
>         }
>         return NULL;
>  }
> @@ -84,14 +83,14 @@ void *htable_firstval(const struct htable *ht,
>                       struct htable_iter *i, size_t hash)
>  {
>         i->off = hash_bucket(ht, hash);
> -       return htable_val(ht, i, hash, ht->perfect_bit);
> +       return htable_val(ht, i, hash);
>  }
>
>  void *htable_nextval(const struct htable *ht,
>                      struct htable_iter *i, size_t hash)
>  {
>         i->off = (i->off + 1) & ((1 << ht->bits)-1);
> -       return htable_val(ht, i, hash, 0);
> +       return htable_val(ht, i, hash);
>  }
>
>  void *htable_first(const struct htable *ht, struct htable_iter *i)
> @@ -116,15 +115,13 @@ void *htable_next(const struct htable *ht, struct
> htable_iter *i)
>  static void ht_add(struct htable *ht, const void *new, size_t h)
>  {
>         size_t i;
> -       uintptr_t perfect = ht->perfect_bit;
>
>         i = hash_bucket(ht, h);
>
>         while (entry_is_valid(ht->table[i])) {
> -               perfect = 0;
>                 i = (i + 1) & ((1 << ht->bits)-1);
>         }
> -       ht->table[i] = make_hval(ht, new, get_hash_ptr_bits(ht,
> h)|perfect);
> +       ht->table[i] = make_hval(ht, new, get_hash_ptr_bits(ht, h));
>  }
>
>  static COLD bool double_table(struct htable *ht)
> @@ -143,17 +140,7 @@ static COLD bool double_table(struct htable *ht)
>         ht->max = ((size_t)3 << ht->bits) / 4;
>         ht->max_with_deleted = ((size_t)9 << ht->bits) / 10;
>
> -       /* If we lost our "perfect bit", get it back now. */
> -       if (!ht->perfect_bit && ht->common_mask) {
> -               for (i = 0; i < sizeof(ht->common_mask) * CHAR_BIT; i++) {
> -                       if (ht->common_mask & ((size_t)1 << i)) {
> -                               ht->perfect_bit = (size_t)1 << i;
> -                               break;
> -                       }
> -               }
> -       }
> -
> -       if (oldtable != &ht->perfect_bit) {
> +       if (oldtable != &ht->common_bits) {
>                 for (i = 0; i < oldnum; i++) {
>                         if (entry_is_valid(e = oldtable[i])) {
>                                 void *p = get_raw_ptr(ht, e);
> @@ -181,7 +168,7 @@ static COLD void rehash_table(struct htable *ht)
>                         continue;
>                 if (e == HTABLE_DELETED)
>                         ht->table[h] = 0;
> -               else if (!(e & ht->perfect_bit)) {
> +               else {
>                         void *p = get_raw_ptr(ht, e);
>                         ht->table[h] = 0;
>                         ht_add(ht, p, ht->rehash(p, ht->priv));
> @@ -208,7 +195,6 @@ static COLD void update_common(struct htable *ht,
> const void *p)
>
>                 ht->common_mask = ~((uintptr_t)1 << i);
>                 ht->common_bits = ((uintptr_t)p & ht->common_mask);
> -               ht->perfect_bit = 1;
>                 return;
>         }
>
> @@ -227,10 +213,9 @@ static COLD void update_common(struct htable *ht,
> const void *p)
>                 ht->table[i] |= bitsdiff;
>         }
>
> -       /* Take away those bits from our mask, bits and perfect bit. */
> +       /* Take away those bits from our mask, bits. */
>         ht->common_mask &= ~maskdiff;
>         ht->common_bits &= ~maskdiff;
> -       ht->perfect_bit &= ~maskdiff;
>  }
>
>  bool htable_add(struct htable *ht, size_t hash, const void *p)
> diff --git a/ccan/htable/htable.h b/ccan/htable/htable.h
> index ed668e7..f5cb336 100644
> --- a/ccan/htable/htable.h
> +++ b/ccan/htable/htable.h
> @@ -19,7 +19,6 @@ struct htable {
>         size_t elems, deleted, max, max_with_deleted;
>         /* These are the bits which are the same in all pointers. */
>         uintptr_t common_mask, common_bits;
> -       uintptr_t perfect_bit;
>         uintptr_t *table;
>  };
>
> @@ -40,7 +39,7 @@ struct htable {
>   *     static struct htable ht = HTABLE_INITIALIZER(ht, rehash, NULL);
>   */
>  #define HTABLE_INITIALIZER(name, rehash, priv)                         \
> -       { rehash, priv, 0, 0, 0, 0, 0, -1, 0, 0, &name.perfect_bit }
> +       { rehash, priv, 0, 0, 0, 0, 0, -1, 0, &name.common_bits }
>
>  /**
>   * htable_init - initialize an empty hash table.
> diff --git a/ccan/htable/test/run.c b/ccan/htable/test/run.c
> index 7fc05e2..ac1d2d4 100644
> --- a/ccan/htable/test/run.c
> +++ b/ccan/htable/test/run.c
> @@ -98,14 +98,13 @@ static bool check_mask(struct htable *ht, uint64_t
> val[], unsigned num)
>  int main(int argc, char *argv[])
>  {
>         unsigned int i, weight;
> -       uintptr_t perfect_bit;
>         struct htable ht;
>         uint64_t val[NUM_VALS];
>         uint64_t dne;
>         void *p;
>         struct htable_iter iter;
>
> -       plan_tests(29);
> +       plan_tests(25);
>         for (i = 0; i < NUM_VALS; i++)
>                 val[i] = i;
>         dne = i;
> @@ -175,20 +174,6 @@ int main(int argc, char *argv[])
>         find_vals(&ht, val, NUM_VALS);
>         ok1(!htable_get(&ht, hash(&dne, NULL), objcmp, &dne));
>
> -       /* Corner cases: wipe out the perfect bit using bogus pointer. */
> -       htable_clear(&ht);
> -       htable_add(&ht, 0, (void *)((uintptr_t)&val[NUM_VALS-1]));
> -       ok1(ht.perfect_bit);
> -       perfect_bit = ht.perfect_bit;
> -       htable_add(&ht, 0, (void *)((uintptr_t)&val[NUM_VALS-1]
> -                                  | perfect_bit));
> -       ok1(ht.perfect_bit == 0);
> -       htable_del(&ht, 0, (void *)((uintptr_t)&val[NUM_VALS-1] |
> perfect_bit));
> -
> -       /* Enlarging should restore it... */
> -       add_vals(&ht, val, 0, NUM_VALS-1);
> -
> -       ok1(ht.perfect_bit != 0);
>         htable_clear(&ht);
>
>         return exit_status();
> diff --git a/ccan/htable/tools/speed.c b/ccan/htable/tools/speed.c
> index dce3fdf..cb24f1d 100644
> --- a/ccan/htable/tools/speed.c
> +++ b/ccan/htable/tools/speed.c
> @@ -74,8 +74,6 @@ static size_t perfect(const struct htable *ht)
>                         continue;
>                 if (hash_bucket(ht, ht->rehash(get_raw_ptr(ht,
> ht->table[i]),
>                                                ht->priv)) == i) {
> -                       assert((ht->table[i] & ht->perfect_bit)
> -                              == ht->perfect_bit);
>                         placed_perfect++;
>                 }
>         }
>



Ahmed, C/C++/Java/Assembly/Python/Lua Developer
https://github.com/decltype/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ozlabs.org/pipermail/ccan/attachments/20140612/224e687d/attachment-0001.html>


More information about the ccan mailing list