Skip to main content
Background Image

nCine Dev Update 7

Updates from the second half of December 2018

·3 mins
Dev Update - This article is part of a series.
Part 7: This Article

Just a few weeks after the last update here I’m again to write the next one in which I’m going to show you the performance of my new hash table implementation.

It is based on the Leapfrog Probing article by Jeff Preshing, where the author explains a new probing algorithm for collision resolution when using open addressing.

My version is not multi-threaded but in return it is able to remove elements, a task that turned out to be quite more challenging than what I expected in the beginning. 😅 I also took the opportunity to change the default hash function to FNV-1a.

My previous implementation was based on separate chaining with list head cells, a pretty common technique that allows for unlimited number of stored elements in the face of many potential cache misses when the load factor increases.

The following results are a courtesy of the Google Benchmark library, the latest third-party software integrated in the nCine. They have been recorded on an Intel i7-8550U, running Arch Linux with the performance scaling governor and a fixed frequency of 800 MHz.

The tests have been performed on different implementations, they all have in common an unsigned int for the key and the value and 1024 buckets:

  • StaticHashMap is the new open addressing hash map in a version that takes the number of buckets as a template argument, similarly to std::array
  • HashMap is the new open addressing hash map in a more classic version that allocates on the heap
  • HashMapList is the old hash map with separate chaining
  • unordered_map is the standard hash table implementation from the STL

Some tests have a number after their name representing either the number of elements already in the hash table, like for Copy, Retrieve or Clear, or the number of times the operation is repeated, like for Insert, Remove or ReverseRemove.

BenchmarkStaticHashMapHashMapHashMapListunordered_map
Creation443 ns34106 ns86589 ns13143 ns
Copy/2562859 ns31501 ns78531 ns1656125 ns
Copy/5122912 ns31004 ns745993 ns3600840 ns
Copy/7682895 ns34312 ns1075128 ns4931709 ns
Insert/2564725 ns9670 ns9313 ns1581495 ns
Insert/5128423 ns19566 ns674695 ns3169524 ns
Insert/76813896 ns33050 ns1007072 ns6504032 ns
Retrieve/25666 ns67 ns75 ns72 ns
Retrieve/51271 ns74 ns81 ns72 ns
Retrieve/76876 ns77 ns83 ns73 ns
Clear/2561733 ns22215 ns42527 ns1541229 ns
Clear/5121733 ns21789 ns701556 ns3078237 ns
Clear/7681734 ns22076 ns1025812 ns4609838 ns
Remove/2565083 ns25895 ns35680 ns1565479 ns
Remove/5129459 ns38793 ns703407 ns3136167 ns
Remove/76814828 ns53607 ns1040247 ns4662149 ns
ReverseRemove/2564906 ns25564 ns35849 ns1560456 ns
ReverseRemove/5129620 ns38567 ns700163 ns3113046 ns
ReverseRemove/76814527 ns51159 ns1031670 ns4669154 ns

Last but not least, I have replaced all the occurrences of the classic rand() function with a new PCG32 random number generator based on the official minimal C implementation.

It has a higher statistical quality while also being a lot faster. In my microbenchmarks, with the same CPU settings as the hash map ones, it is able to generate 1024 integers in around 14203 ns, as opposed to 74303 ns when using simple rand() calls.

Dev Update - This article is part of a series.
Part 7: This Article