# Notes on Bloom Filter

Here are some notes after listening to a coursera lecture of Algorithm 1 by Professor Roughgarden at Stanford.

The goal of a Bloom filter is to provide a space efficient hash for some data entries. The insertion and validation of an entry are very efficient. But a trade-off is that deletion is not possible and it is possible to have a small but finite rate of false positive during validation.

The main idea of a Bloom filter is very simple. Its core is composed of a number (let say $k$) of hash functions and an array of $n$ binary numbers $A$. The hash functions will map any data entry to a number from $1$ to $n$. Given an entry $d$, a bloom filter simply set the $h_i(d)$-bit of $A$ as one during insertion ($i=1,\cdots,k$). During validation, a bloom filter will decide an input entry $e$ has been inserted before if $h_i(e)=1$ for $i=1,\cdots,k$.

One can easily see that it is impossible to have false negative but it can have false positive. Given an input $e$ during validation, without loss of generality, let say the $h_i(e)=i$ for $i=1,\cdots,k$. Let say if we have already $|S|$ entries inserted to the bloom filter, then $Pr(A(1)=1)=1-Pr(A(1)=0)=1-((n-1)/n)^{k|S|}$$\approx 1-e^{k|S|/n}$ for large $n$. Let us denote $b\triangleq n/|S|$ as the number of bit used per entry. Then, $Pr(A(1)=1)=1-e^{k/b}$ and the false positive rate is simply $Pr(A(1)=1\quad \&\quad A(2)=1 \quad\&\cdots \&\quad A(k)=1)=(1-e^{k/b})^k$.

##### Update on May 20, 2018

I came across reviewing a paper on Bloom filter and so went check out this note written years back. I always have difficulty remember the concept of Bloom filter. I guess mostly because I am from the Electrical Engineering (signal processing background) and Bloom filter is very much different from the “filters” I learned from in school. But looking from the original meaning of “filter”, i.e., trying to sift through something. After all Bloom filter is indeed a “filter”. I guess even more so than the filters in signal processing.

And as for “Bloom”, I originally thought that it has something to do with “broom”. No kidding. I am not a native speaker and I mixed up the two words for years and didn’t realize that. It turns out “Bloom” just came from the last name of the original creator.