Lecture 12

We reviewed the usual hashing paradigm, familiar from ECS 110. Recall the vocabulary: universe, hash table, slots, collisions, collision resolution by chaining, and open addressing. We discussed the uniform hashing assumption, under which one models the hash function as random. We described one use of hash functions: implementing the Dictionary ADT, which supports Lookup, Insert and Delete with the semantics we specified. Under the uniform hashing model, you can implement a dictionary in O(1) expected time per operation, provided the number of slots in the hash table, M, is Theta(n), where n is the number of items inserted into the table. We reviewed how to maintain M = Theta(n) as n changes.
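A minimal Python sketch of a chained hash table implementing the Dictionary ADT. It assumes Lookup returns None for an absent key and Insert overwrites an existing key; the lecture's exact semantics may differ. It keeps M = Theta(n) by doubling the table when n exceeds M and halving it when n drops below M/4. The class name ChainedDict and the use of Python's built-in hash() as the hash function are illustrative assumptions; hash() is not a random or universal hash function, so this sketch only shows the table mechanics.

```python
from typing import Any, Hashable, Optional


class ChainedDict:
    """Dictionary ADT via chaining; resizes so that M = Theta(n)."""

    def __init__(self, min_slots: int = 8) -> None:
        self._min_slots = min_slots
        self._slots: list = [[] for _ in range(min_slots)]
        self._n = 0  # number of stored items

    @property
    def num_slots(self) -> int:
        return len(self._slots)

    def _index(self, key: Hashable) -> int:
        # Stand-in hash function (assumption): built-in hash, reduced mod M.
        return hash(key) % len(self._slots)

    def lookup(self, key: Hashable) -> Optional[Any]:
        for k, v in self._slots[self._index(key)]:
            if k == key:
                return v
        return None  # assumed semantics for an absent key

    def insert(self, key: Hashable, value: Any) -> None:
        chain = self._slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # key present: overwrite
                return
        chain.append((key, value))
        self._n += 1
        if self._n > len(self._slots):  # load factor above 1: double M
            self._resize(2 * len(self._slots))

    def delete(self, key: Hashable) -> None:
        chain = self._slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self._n -= 1
                # Load factor below 1/4: halve M (never below min_slots).
                if self._n < len(self._slots) // 4 and len(self._slots) > self._min_slots:
                    self._resize(len(self._slots) // 2)
                return

    def _resize(self, new_m: int) -> None:
        old = self._slots
        self._slots = [[] for _ in range(new_m)]
        for chain in old:  # rehash every item into the new table
            for k, v in chain:
                self._slots[self._index(k)].append((k, v))
```

For example, after inserting the keys 0..99 into a fresh ChainedDict, the table has grown from 8 to 128 slots, and each operation scans a single chain. Doubling and halving each cost O(n) but happen rarely enough that the amortized cost per operation stays O(1).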

We discussed a problem with this approach: the uniform hashing model is just an abstraction. What we want is a concrete way to get O(1) expected time per operation, for any sequence of Dictionary operations. We could realize the uniform hashing assumption by taking the family of all functions from the universe U to the M slots and choosing one at random, but even naming a random element of this family takes too many bits: there are M^{|U|} such functions, so naming one takes |U| log_2 M bits. We want a small, concrete set of hash functions that still gives O(1) expected time per Dictionary operation.
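To make "too many bits" concrete, here is a small Python calculation of |U| log_2 M for one choice of sizes. The sizes (32-bit keys, so |U| = 2^32, and a table with M = 2^20 slots) and the function name are assumptions chosen only for illustration.

```python
import math


def bits_to_name_random_function(universe_size: int, num_slots: int) -> float:
    """Bits needed to name one function chosen from ALL functions
    U -> {0, ..., M-1}: there are M ** |U| of them, so naming one takes
    log2(M ** |U|) = |U| * log2(M) bits."""
    return universe_size * math.log2(num_slots)


# Assumed sizes for illustration: 32-bit keys, a table with 2**20 slots.
bits = bits_to_name_random_function(2**32, 2**20)
print(bits / 8 / 2**30, "GiB")  # prints: 10.0 GiB
```

Ten gibibytes just to describe the hash function, before storing a single key, is why we look for a family whose members can be named in a few machine words.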

To this end, we defined a Universal-2 family of hash functions and an epsilon-Universal-2 family of hash functions. The latter means that for all distinct x, x' in the universe, the probability that h(x) = h(x'), when h is chosen at random from the family, is at most epsilon. Universal-2 means (1/M)-Universal-2, which is the best you can hope for. We proved that if you use a Universal-2 family and keep M = Theta(n), you get O(1) expected time per operation. We emphasized that this bound holds for any sequence of keys you insert into the hash table: the expectation is over the choice of hash function, not over the keys being hashed.
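The key step of that proof can be written as a short derivation. Assume the set S of n keys in the table is fixed in advance, h is drawn at random from a Universal-2 family, and x is any key. An operation on x with chaining takes O(1 + L_x) time, where L_x is the number of other keys of S in x's slot. By linearity of expectation:

```latex
\mathbb{E}_h[L_x]
  = \sum_{y \in S,\ y \neq x} \Pr_h\bigl[h(y) = h(x)\bigr]
  \le \sum_{y \in S,\ y \neq x} \frac{1}{M}
  \le \frac{n}{M},
\qquad\text{so}\qquad
\mathbb{E}_h\bigl[O(1 + L_x)\bigr] = O\!\left(1 + \frac{n}{M}\right) = O(1)
\ \text{ when } M = \Theta(n).
```

Only the pairwise collision bound is used, so for an epsilon-Universal-2 family the same argument gives an expected chain length of at most epsilon times n.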

Finally, we gave a concrete construction of an epsilon-Universal-2 hash family: the linear map. To hash a key x = x_1 ... x_N, where each x_i is a 32-bit word, take its inner product with a = a_1 ... a_N, where a names the particular hash function from the family. We showed that this family is 2^{-32}-Universal-2. I sketched why it remains N 2^{-32}-Universal-2 even if you ignore the carry bits and do all the additions using 64-bit numbers.
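A minimal Python sketch of the linear map. The notes do not say which arithmetic the inner product uses; this sketch assumes arithmetic modulo the prime P = 2^32 + 15, chosen so that every 32-bit word is a distinct residue. For distinct keys x and x', pick an index j with x_j != x'_j; since 0 < |x_j - x'_j| < 2^32 < P, that difference is invertible mod P, so for any fixed values of the other a_i exactly one choice of a_j makes h_a(x) = h_a(x'). The collision probability is therefore exactly 1/P, just under 2^{-32}. The names P, random_hash_key and linear_hash are hypothetical.

```python
import random

P = 2**32 + 15  # assumed modulus: the smallest prime above 2**32


def random_hash_key(num_words: int, rng: random.Random) -> list[int]:
    """Choose a = a_1..a_N uniformly from Z_P^N; a names one hash function."""
    return [rng.randrange(P) for _ in range(num_words)]


def linear_hash(a: list[int], x: list[int]) -> int:
    """h_a(x) = (a_1*x_1 + ... + a_N*x_N) mod P, for x a list of 32-bit words."""
    if len(a) != len(x):
        raise ValueError("key and hash-function index must have the same length")
    acc = 0
    for ai, xi in zip(a, x):
        if not 0 <= xi < 2**32:
            raise ValueError("each x_i must be a 32-bit word")
        acc = (acc + ai * xi) % P  # exact modular arithmetic, no dropped carries
    return acc


rng = random.Random(2024)
a = random_hash_key(4, rng)          # pick one member of the family
print(linear_hash(a, [1, 2, 3, 4]))  # some value in range(P)
```

Note the output lies in range(P), slightly more than 32 bits; using it as an index into a table of M slots needs a further reduction step, which this sketch does not cover. Python integers never overflow, so this version computes the exact inner product rather than the faster 64-bit variant that drops carries.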