cuckoo-filter alternatives and similar packages
Based on the "Data Structures" category.
Alternatively, view cuckoo-filter alternatives based on common mentions on social networks and blogs.
-
gods
GoDS (Go Data Structures) - Sets, Lists, Stacks, Maps, Trees, Queues, and much more -
go-datastructures
A collection of useful, performant, and threadsafe Go datastructures. -
golang-set
A simple, battle-tested and generic set type for the Go language. Trusted by Docker, 1Password, Ethereum and Hashicorp. -
willf/bloom
Go package implementing Bloom filters, used by Milvus and Beego. -
roaring
Roaring bitmaps in Go (golang), used by InfluxDB, Bleve, DataDog -
gocache
☔️ A complete Go cache library that brings you multiple ways of managing your caches -
boomfilters
Probabilistic data structures for processing continuous, unbounded streams. -
gostl
Data structure and algorithm library for go, designed to provide functions similar to C++ STL -
hyperloglog
HyperLogLog with lots of sugar (Sparse, LogLog-Beta bias correction and TailCut space reduction) brought to you by Axiom -
trie
Data structure and relevant algorithms for extremely fast prefix/fuzzy string searching. -
ttlcache
An in-memory cache with item expiration and generics [Moved to: https://github.com/jellydator/ttlcache] -
go-geoindex
Go native library for fast point tracking and K-Nearest queries -
go-adaptive-radix-tree
Adaptive Radix Trees implemented in Go -
Bloomfilter
Face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go -
hilbert
Go package for mapping values to and from space-filling curves, such as Hilbert and Peano curves. -
goconcurrentqueue
Go concurrent-safe, goroutine-safe, thread-safe queue -
levenshtein
Go implementation to calculate Levenshtein Distance. -
ring
Package ring provides a high performance and thread safe Go implementation of a bloom filter. -
go-rquad
:pushpin: State of the art point location and neighbour finding algorithms for region quadtrees, in Go -
memlog
A Kafka log inspired in-memory and append-only data structure -
set
A simple Set data structure implementation in Go (Golang) using LinkedHashMap. -
nan
Zero allocation Nullable structures in one library with handy conversion functions, marshallers and unmarshallers -
goset
Set is a useful collection but there is no built-in implementation in Go lang. -
hide
ID type with marshalling to/from hash to prevent sending IDs to clients.
InfluxDB - Power Real-Time Data Analytics at Scale
Do you think we are missing an alternative of cuckoo-filter or a related project?
README
cuckoo-filter
cuckoo-filter go implement. Config by you
transplant from efficient/cuckoofilter
[中文文档](./README_ZH.md)
Overview
Cuckoo filter is a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space.
Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%).
For details about the algorithm and citations please use:
"Cuckoo Filter: Practically Better Than Bloom" in proceedings of ACM CoNEXT 2014 by Bin Fan, Dave Andersen and Michael Kaminsky
Implementation details
The paper cited above leaves several parameters to choose.
- Bucket size(b): Number of fingerprints per bucket
- Fingerprints size(f): Fingerprints bits size of hashtag
In other implementation:
- seiflotfy/cuckoofilter use b=4, f=8 bit, which correspond to a false positive rate of
r ~= 0.03
. - panmari/cuckoofilter use b=4, f=16 bit, which correspond to a false positive rate of
r ~= 0.0001
. - irfansharif/cfilter can adjust b and f, but only can adjust f to 8x, which means it is in Bytes.
In this implementation, you can adjust b and f to any value you want, and the Semi-sorting Buckets mentioned in paper is also available, which can save 1 bit per item.
Why custom is important?
According to paper
- Different bucket size result in different filter loadfactor, which means occupancy rate of filter
- Different bucket size is suitable for different target false positive rate
- To keep a false positive rate, bigger bucket size, bigger fingerprint size
Given a target false positive rate of r
when r > 0.002, having two entries per bucket yields slightly better results than using four entries per bucket; when decreases to 0.00001 < r ≤ 0.002, four entries per bucket minimizes space.
with a bucket size b
, they suggest choosing the fingerprint size f
using
f >= log2(2b/r) bits
as the same time, notice that we got loadfactor 84%, 95% or 98% when using bucket size b = 2, 4 or 8
To know more about parameter choosing, refer to paper's section 5
Note: generally b = 8 is enough, without more data support, we suggest you choosing b from 2, 4 or 8. And f is max 32 bits
Example usage:
package main
import (
"fmt"
"github.com/linvon/cuckoo-filter"
)
func main() {
cf := cuckoo.NewFilter(4, 9, 3900, cuckoo.TableTypePacked)
fmt.Println(cf.Info())
fmt.Println(cf.FalsePositiveRate())
a := []byte("A")
cf.Add(a)
fmt.Println(cf.Contain(a))
fmt.Println(cf.Size())
b := cf.Encode()
ncf, _ := cuckoo.Decode(b)
fmt.Println(ncf.Contain(a))
cf.Delete(a)
fmt.Println(cf.Size())
}