The Pigeonhole Principle and Hash Tables

Overview

This blog post describes what the pigeonhole principle states, demonstrates how to prove its correctness by contrapositive, then applies the pigeonhole principle to the hash table data structure. After reading this post, you should understand how to prove the pigeonhole principle, and recognize situations in computer science where it applies.

Statement of the Principle

The pigeonhole principle states: if you try to put pigeons into holes, but have one more pigeon than you have holes, then at least one hole will have at least two pigeons in it. Stated more formally:


Assume you have $n \ge 1$ pigeonholes. If you have $n+1$ pigeons in these holes, then there exists at least one hole that contains two or more pigeons.


The pigeonhole principle uses the phrase “at least”, because it doesn’t dictate an exact distribution of the pigeons. Consider if you have 7 pigeons and 6 holes:

  • You could have with 5 holes containing 1 pigeon each, and the final hole containing 2 pigeons.
  • You could also have 1 hole containing 4 pigeons, a second hole containing 3 pigeons, and 4 empty holes.

Those are just two possibilities among many. But, in all of them, at least one hole contains two or more pigeons.

Breaking Down the Principle

While the principle does seem self-evident, let’s set up a proof by contrapositive to clearly demonstrate its truth. We’ll begin by breaking the statement of the principle into two parts. First:


Assume you have $n \ge 1$ pigeonholes.


When we want to prove a principle or theorem that starts with an assumption, that assumption is a statement of truth about our world. We don’t need to prove that we have n ≥ 1 holes; we simply do.

The second part of the principle is an “if-then” statement:


If you have $n+1$ pigeons in these holes, then there exists at least one hole that contains two or more pigeons.


This is the part of the principle that requires proof.

Considering a Direct Proof

It’s worth considering the possibility of proving this “if-then” statement directly: assume the “if” part of the statement is true, and from that assumption show that the “then” part follows. That is, start by assuming there are n+1 pigeons in the holes, and show — however the pigeons are arranged — that there’s at least one hole with two or more pigeons.

Unfortunately, as we discussed above, there are a very large number of ways of placing our n+1 pigeons into n holes (it would make a good exercise to determine just how many ways). The difficulty of enumerating all these possibilities makes demonstrating that the “then” part of the statement follows from the “if” part very difficult. Let’s look for a different way to structure our proof.

Proof by Contrapositive

When proving an “if-then” statement directly is overly complex, it’s worth considering if it’s easier to prove the contrapositive. Let’s rephrase the part of the principle requiring proof in the contrapositive — instead of trying to prove “if A then B”, we’ll instead prove “if not B, then not A”:


$\textit{Original}$: If you have $n+1$ pigeons in these holes, then there exists at least one hole that contains two or more pigeons.

$\textit{Contrapositive}$: If there is no hole that has two or more pigeons, then there cannot be $n+1$ pigeons among those holes.


Let’s prove the contrapositive statement (which is itself an “if-then” statement):


Assume that there is no hole with two or more pigeons. That is equivalent to assuming that every hole has 0 pigeons or 1 pigeon.

Recall, as a statement of fact, that there are $n$ pigeon holes. Since every hole has 0 pigeons or 1 pigeon, the number of pigeons in these holes is between $0 \cdot n$ and $1 \cdot n$.

More formally, let $p$ be the number of pigeons we have. We have bounds on $p$: $0 \cdot n \le p \le 1 \cdot n$. Note, $1 \cdot n = n$, and for any integer $n$ we know $n < n+1$. That is, $p \le 1 \cdot n = n < n+1$. We have $p < n+1$, which means there are fewer than $n+1$ pigeons among the holes.

Application to Hash Tables

Now that we’ve proved the correctness of the pigeonhole principle (if you have one more pigeon than you have holes, at least one hole has two or more pigeons), let’s consider it’s application to a hash table.

In a hash table, data values are stored in a finite number of buckets that are each looked up by a unique key. As a practical example, a hash table could contain statistics about daily traffic at airports. Data gets looked up using the unique three-letter code of the airport as the key (e.g., “YVR” for Vancouver International), and the value returned from the hash table is a document containing statistics on daily traffic at Vancouver International.

A mapping from the string “YVR” to a bucket number in which to store the statistics data is performed by a hash function, which attempts to place the value (the statistics) into a unique bucket for each possible key (the airport code). In practice, no hash function is perfect, and there will be collisions (multiple values placed into the same bucket). Hash table implementations often deal with this issue by growing the hash table — that is, increasing the number of buckets that the hash function can point to when given any key.

But the pigeon hole principle tells us: even with a perfect hash function, which has mapped n ≥ 1 different keys to n buckets with no collisions, adding one more value with a new key to the hash table will cause a collision. That’s because you can’t map n+1 keys to n buckets without at least two keys being mapped to one bucket.

Summary

The pigeonhole principle is a statement of something relatively self-evident: you can’t put n+1 pigeons into n pigeonholes without at least one hole having more than one pigeon. Despite it seeming obvious on its face, we can still prove that it’s correct using a proof by contrapositive.

We also explored how the pigeonhole principle dictates that there will necessarily be collisions in a hash table if there are more keys used than buckets for the values. While we focused on hash tables in this post, the pigeonhole principle appears in other areas of computer science as well.

For more tips, and to arrange for personalized tutoring for yourself or your study group, check out Vancouver Computer Science Tutoring.