Notes on Bloom Filter

Here are some notes after listening to a coursera lecture of Algorithm 1 by Professor Roughgarden at Stanford.

The goal of a Bloom filter is to provide a space efficient hash for some data entries. The insertion and validation of an entry are very efficient. But a trade-off is that deletion is not possible and it is possible to have a small but finite rate of false positive during validation.

The main idea of a Bloom filter is very simple. Its core is composed of a number (let say k) of hash functions and an array of n binary numbers A. The hash functions will map any data entry to a number from 1 to n. Given an entry d, a bloom filter simply set the h_i(d)-bit of A as one during insertion (i=1,\cdots,k). During validation, a bloom filter will decide an input entry e has been inserted before if h_i(e)=1 for i=1,\cdots,k.

One can easily see that it is impossible to have false negative but it can have false positive. Given an input e during validation, without loss of generality, let say the h_i(e)=i for i=1,\cdots,k. Let say if we have already |S| entries inserted to the bloom filter, then Pr(A(1)=1)=1-Pr(A(1)=0)=1-((n-1)/n)^{k|S|}\approx 1-e^{k|S|/n} for large n. Let us denote b\triangleq n/|S| as the number of bit used per entry. Then, Pr(A(1)=1)=1-e^{k/b} and the false positive rate is simply Pr(A(1)=1\quad \&\quad A(2)=1 \quad\&\cdots \&\quad A(k)=1)=(1-e^{k/b})^k.

Update on May 20, 2018

I came across reviewing a paper on Bloom filter and so went check out this note written years back. I always have difficulty remember the concept of Bloom filter. I guess mostly because I am from the Electrical Engineering (signal processing background) and Bloom filter is very much different from the “filters” I learned from in school. But looking from the original meaning of “filter”, i.e., trying to sift through something. After all Bloom filter is indeed a “filter”. I guess even more so than the filters in signal processing.

And as for “Bloom”, I originally thought that it has something to do with “broom”. No kidding. I am not a native speaker and I mixed up the two words for years and didn’t realize that. It turns out “Bloom” just came from the last name of the original creator.

Leave a Reply

Your email address will not be published. Required fields are marked *