nCine Dev Update 7 | encelo.github.io

Just a few weeks after the last update here I’m again to write the next one in which I’m going to show you the performance of my new hash table implementation.

It is based on the Leapfrog Probing article by Jeff Preshing, where the author explains a new probing algorithm for collision resolution when using open addressing.

My version is not multi-threaded but in return it is able to remove elements, a task that turned out to be quite more challenging than what I expected in the beginning. I also took the opportunity to change the default hash function to FNV-1a.

My previous implementation was based on separate chaining with list head cells, a pretty common technique that allows for unlimited number of stored elements in the face of many potential cache misses when the load factor increases.

The following results are a courtesy of the Google Benchmark library, the latest third-party software integrated in the nCine. They have been recorded on an Intel i7-8550U, running Arch Linux with the performance scaling governor and a fixed frequency of 800 MHz.

The tests have been performed on different implementations, they all have in common an unsigned int for the key and the value and 1024 buckets:

StaticHashMap is the new open addressing hash map in a version that takes the number of buckets as a template argument, similarly to std::array
HashMap is the new open addressing hash map in a more classic version that allocates on the heap
HashMapList is the old hash map with separate chaining
unordered_map is the standard hash table implementation from the STL

Some tests have a number after their name representing either the number of elements already in the hash table, like for Copy, Retrieve or Clear, or the number of times the operation is repeated, like for Insert, Remove or ReverseRemove.

Benchmark	StaticHashMap	HashMap	HashMapList	unordered_map
Creation	443 ns	34106 ns	86589 ns	13143 ns
Copy/256	2859 ns	31501 ns	78531 ns	1656125 ns
Copy/512	2912 ns	31004 ns	745993 ns	3600840 ns
Copy/768	2895 ns	34312 ns	1075128 ns	4931709 ns
Insert/256	4725 ns	9670 ns	9313 ns	1581495 ns
Insert/512	8423 ns	19566 ns	674695 ns	3169524 ns
Insert/768	13896 ns	33050 ns	1007072 ns	6504032 ns
Retrieve/256	66 ns	67 ns	75 ns	72 ns
Retrieve/512	71 ns	74 ns	81 ns	72 ns
Retrieve/768	76 ns	77 ns	83 ns	73 ns
Clear/256	1733 ns	22215 ns	42527 ns	1541229 ns
Clear/512	1733 ns	21789 ns	701556 ns	3078237 ns
Clear/768	1734 ns	22076 ns	1025812 ns	4609838 ns
Remove/256	5083 ns	25895 ns	35680 ns	1565479 ns
Remove/512	9459 ns	38793 ns	703407 ns	3136167 ns
Remove/768	14828 ns	53607 ns	1040247 ns	4662149 ns
ReverseRemove/256	4906 ns	25564 ns	35849 ns	1560456 ns
ReverseRemove/512	9620 ns	38567 ns	700163 ns	3113046 ns
ReverseRemove/768	14527 ns	51159 ns	1031670 ns	4669154 ns

Last but not least, I have replaced all the occurrences of the classic rand() function with a new PCG32 random number generator based on the official minimal C implementation.

It has a higher statistical quality while also being a lot faster. In my microbenchmarks, with the same CPU settings as the hash map ones, it is able to generate 1024 integers in around 14203 ns, as opposed to 74303 ns when using simple rand() calls.

Tags: nCine