Here are some notes after listening to a coursera lecture of Algorithm 1 by Professor Roughgarden at Stanford.
The goal of a Bloom filter is to provide a space efficient hash for some data entries. The insertion and validation of an entry are very efficient. But a trade-off is that deletion is not possible and it is possible to have a small but finite rate of false positive during validation.
The main idea of a Bloom filter is very simple. Its core is composed of a number (let say ) of hash functions and an array of binary numbers . The hash functions will map any data entry to a number from to . Given an entry , a bloom filter simply set the -bit of as one during insertion (). During validation, a bloom filter will decide an input entry has been inserted before if for .
One can easily see that it is impossible to have false negative but it can have false positive. Given an input during validation, without loss of generality, let say the for . Let say if we have already entries inserted to the bloom filter, then for large . Let us denote as the number of bit used per entry. Then, and the false positive rate is simply .
Update on May 20, 2018
I came across reviewing a paper on Bloom filter and so went check out this note written years back. I always have difficulty remember the concept of Bloom filter. I guess mostly because I am from the Electrical Engineering (signal processing background) and Bloom filter is very much different from the “filters” I learned from in school. But looking from the original meaning of “filter”, i.e., trying to sift through something. After all Bloom filter is indeed a “filter”. I guess even more so than the filters in signal processing.
And as for “Bloom”, I originally thought that it has something to do with “broom”. No kidding. I am not a native speaker and I mixed up the two words for years and didn’t realize that. It turns out “Bloom” just came from the last name of the original creator.