Mathematical Security of Complex Passwords - Vancouver Computer Science Tutoring

I thought I’d do something a little different with this month’s blog post. Instead of discussing a programming or logic tip, I thought we could dive into an interesting problem I came across. The problem is about passwords, so naturally it touches on concepts in computer security, and it also involves some combinatorics. But I don’t want that to seem daunting — it’s a fun problem, and we’ll break it down together step by step.

The problem centres around complex password requirements. We’ve probably all seen some variant of the “mix of characters” requirement when we’ve created a new account online. One of its simpler incarnations is this one:

Passwords must contain at least one uppercase letter, one lowercase letter, one number, and one symbol.

The question we’ll explore is: how secure, mathematically, are passwords with this complexity requirement? That is, how many bits of security do they provide? (This concept is analogous to how many bits of security are provided by a 128- or 256-bit cryptographic key.) And, how does that compare to passwords that don’t have this complexity requirement?

Password Construction Details

Let’s get precise with just how we’re going to construct our passwords. We’ll construct our passwords using the following characters:

The lowercase letters, [a-z];
The uppercase letters, [A-Z];
The numbers, [0-9]; and,
Some set of special symbols.

We could use whatever set of special symbols we like. For example, we could use the 32 symbols in the ASCII standard. But, it’ll make a few of our upcoming equations easier to read if we choose exactly 10 special symbols for our character set.

To make things a little less abstract, we could imagine the symbols in our passwords being the 10 symbols we can get from pressing “shift” and hitting a number key. For me, an anglophone Canadian typing on my laptop keyboard, that’s:

! @ # $ % ^ & * ( )

But whatever symbols we might choose, what’s actually important is the count. Our passwords will be made up of the 26 lowercase letters, 26 uppercase letters, 10 numbers, and 10 symbols — a total of 72 possible characters.

In order to discuss the security impact of the password complexity requirement — that there be at least one of each of the four types of character — we need some names for passwords generated with or without this requirement. Let’s call them:

Simple passwords: these are passwords created from the 72 characters, where there are no additional requirements about which characters are used. A simple password could contain only lowercase letters, for example, or a mix of all four types of character.
Complex passwords: these are passwords created from the 72 characters, but there’s a special requirement that’s enforced. Specifically, there must be at least one lowercase letter, one uppercase letter, one number, and one symbol. So, you couldn’t have a complex password of all lowercase letters.

Our goal will be to compare the mathematical level of security offered by complex passwords to that offered by simple passwords.

Bits of Security

When we talk about the level of security offered by a password or by a cryptographic key, we’re fundamentally looking at how many different values that password or that key could have. Consider a cryptographic key that could have any of $2^{128}$ possible values, chosen in a way that’s unbiased (uniformly random over all possible values) and completely unpredictable. The level of security provided by this key, measured as the number of bits of security it provides, is the $\log_2{}$ of how many possible keys there are: $\log_2{\left(2^{128}\right)} = 128$ bits of security.

We can generalize this idea to any sort of password of length n. Let’s say we have an algorithm in a password-generator app that takes length n as input, and outputs a password that’s n characters long. Let’s assume the output of the algorithm is unbiased (all possible passwords of length n have equal probability of being generated) and completely unpredictable. The number of bits of security provided by a password generated in this manner is the $\log_2{}$ of how many possible passwords there are.

Let’s apply that idea to two specific algorithms that generate our two password types. The first algorithm will take input n and output an n-character unpredictable, unbiased simple password (that is, all possible n-character simple passwords are equally likely to be generated). If we define

$\textrm{p}_{\textrm{s}}\left(n\right)$ = the number of simple passwords of length $n$,

then the number of bits of security provided by a randomly generated simple password of length $n \ge 1$ is $\log_2{\left[\textrm{p}_{\textrm{s}}\left(n\right)\right]}$.

Our second algorithm also takes length n as input, but it outputs an n-character unpredictable, unbiased complex password (in this case, all n-character complex passwords are equally likely to be generated). We’ll restrict ourselves to $n \ge 4$, since it’s not possible for a complex password to have fewer than four characters. If we also define

$\textrm{p}_{\textrm{c}}\left(n\right)$ = the number of complex passwords of length $n$,

then the number of bits of security provided by a complex password of length $n \ge 4$ is $\log_2{\left[\textrm{p}_{\textrm{c}}\left(n\right)\right]}$.

Our goal, ultimately, will be to compare $\log_2{\left[\textrm{p}_{\textrm{s}}\left(n\right)\right]}$ and $\log_2{\left[\textrm{p}_{\textrm{c}}\left(n\right)\right]}$ — that is, compare how many bits of security simple vs. complex passwords of length n provide. Naturally, to do that, we’re going to have to compute $\textrm{p}_{\textrm{s}}\left(n\right)$ and $\textrm{p}_{\textrm{c}}\left(n\right)$.

The Paradox of Fewer Complex Passwords

Before we delve into computations, though, let’s stop to observe something unexpected. According to our definition of bits of security, complex passwords are less secure than simple passwords.

We’ll restrict ourselves to password lengths $n \ge 4$, so that both complex and simple passwords exist — that is, $\textrm{p}_{\textrm{c}}\left(n\right)>0$ and $\textrm{p}_{\textrm{s}}\left(n\right)>0$. In this case, complex passwords provide $\log_2{\left[\textrm{p}_{\textrm{c}}\left(n\right)\right]}$ bits of security, and simple passwords offer $\log_2{\left[\textrm{p}_{\textrm{s}}\left(n\right)\right]}$ bits of security. But, $\textrm{p}_{\textrm{c}}\left(n\right) < \textrm{p}_{\textrm{s}}\left(n\right)$ for all password lengths $n \ge 4$, meaning complex passwords provide fewer bits of security.

Why is $\textrm{p}_{\textrm{c}}\left(n\right) < \textrm{p}_{\textrm{s}}\left(n\right)$ for all $n \ge 4$? It’s because every valid complex password is itself a simple password; but, there are some simple passwords that are not valid complex passwords. A string of all lowercase letters could be generated as a simple password, but it could never be generated as a complex password.

Paradoxically, complex passwords offer fewer bits of security, because there are fewer possible passwords.

So, why do so many websites require complex passwords? It’s (an attempt) to prevent people from using easily guessable passwords. The tradeoff for complex passwords providing fewer bits of security is that no one can use their dog’s name as their password.

Whether complexity requirements actually make people choose better passwords than their dog’s name, though, is a different question that we won’t tackle in this blog post. Bruce Schneier wrote back in 2014 about complex passwords still being easily guessable, and what he wrote back then is still relevant today. Modern NIST password standards even dictate that complexity requirements not be used. But, since complexity requirements are still in place pretty much everywhere, let’s run the math on their security impact anyway.

Security of Simple Passwords

Let’s start with the math on simple passwords. They’re constructed from 72 characters with no complexity requirements. So, given password length n, there are

$\textrm{p}_{\textrm{s}}\left(n\right) = 72^n$

possible simple passwords. An unbiased simple password of length n, therefore, offers

$\log_2{\left[\textrm{p}_{\textrm{s}}\left(n\right)\right]}=\log_2{\left(72^n\right)}=n\cdot\log_2{\left(72\right)}$

bits of security. To give some concrete examples for simple passwords of lengths 8, 21, and 42:

$\log_2{\left[\textrm{p}_{\textrm{s}}\left(8\right)\right]} = 8\cdot \log_2{\left(72\right)}\approx 49.359$ bits of security
$\log_2{\left[\textrm{p}_{\textrm{s}}\left(21\right)\right]} = 21\cdot \log_2{\left(72\right)}\approx 129.568$ bits of security
$\log_2{\left[\textrm{p}_{\textrm{s}}\left(42\right)\right]} = 42\cdot \log_2{\left(72\right)}\approx 259.136$ bits of security

So, a simple password of length 8 offers a paltry 49.359 bits of security. A simple password of length 21 is required to get at least 128 bits of security, and a simple password that’s 42 characters long achieves 256 bits of security.

Counting Valid Complex Passwords

In comparison to working with simple passwords, it’s a more involved problem to compute how many valid complex passwords of length $n \ge 4$ there are. To do so, we’ll consider all possible simple passwords of length n, then figure out how many of them we have to discard as invalid.

Let’s start by defining two sets:

$\mathcal{U}$: the universal set of all simple passwords of length n, where $\left|\mathcal{U}\right| = 72^n$
$\mathcal{D}$: the discard set, $\mathcal{D} \subsetneq \mathcal{U}$, of all simple passwords of length n that are not valid complex passwords.

The value we’re trying to compute — the number of valid complex passwords — is:

$\textrm{p}_{\textrm{c}}\left(n\right) = \left|\mathcal{U}\right| – \left|\mathcal{D}\right| = 72^n – \left|\mathcal{D}\right|$.

Let’s work out what the size of $\mathcal{D}$ is by considering what elements are in $\mathcal{D}$. There are four reasons we might discard a password as invalid: it lacks an uppercase character, a lowercase character, a number, or a symbol. Let’s take a closer look at the four subsets of $\mathcal{D}$ created by those criteria:

$\mathcal{P}$: simple passwords that lack any of the 26 uppercase letters
$\mathcal{Q}$: simple passwords that lack any of the 26 lowercase letters
$\mathcal{R}$: simple passwords that lack any of the 10 numbers
$\mathcal{S}$: simple passwords that lack any of the 10 symbols

Every discarded password in $\mathcal{D}$ belongs to at least one of $\mathcal{P}$, $\mathcal{Q}$, $\mathcal{R}$, or $\mathcal{S}$. So,

$\mathcal{D}=\mathcal{P}\cup\mathcal{Q}\cup\mathcal{R}\cup\mathcal{S}$.

Note that there’s no simple password that lacks all of an uppercase letter, a lowercase letter, a number, and a symbol. Those are the only possible characters a simple password can have. So, $\mathcal{P}\cap\mathcal{Q}\cap\mathcal{R}\cap\mathcal{S}=\emptyset$.

However, any individual simple password could still be discarded as non-complex for multiple reasons. For example, it could lack both a lowercase letter and a number. That means, unfortunately, that

$\left|\mathcal{D}\right|\neq\left|\mathcal{P}\right|+\left|\mathcal{Q}\right|+\left|\mathcal{R}\right|+\left|\mathcal{S}\right|$.

To compute $\left|\mathcal{D}\right|$, we instead have to apply the inclusion-exclusion principle. That principle, from combinatorics, says that to compute the size of $\mathcal{D}$, we have to:

Sum the sizes of the four subsets:

$\left|\mathcal{P}\right|+\left|\mathcal{Q}\right|+\left|\mathcal{R}\right|+\left|\mathcal{S}\right|$

Subtract the sizes of the two-way intersections:

$-\left|\mathcal{P}\cap\mathcal{Q}\right|-\left|\mathcal{P}\cap\mathcal{R}\right|-\left|\mathcal{P}\cap\mathcal{S}\right|-\left|\mathcal{Q}\cap\mathcal{R}\right|-\left|\mathcal{Q}\cap\mathcal{S}\right|-\left|\mathcal{R}\cap\mathcal{S}\right|$

Add back in the sizes of the three-way intersections:

$+\left|\mathcal{P}\cap\mathcal{Q}\cap\mathcal{R}\right|+\left|\mathcal{P}\cap\mathcal{Q}\cap\mathcal{S}\right|+\left|\mathcal{P}\cap\mathcal{R}\cap\mathcal{S}\right|+\left|\mathcal{Q}\cap\mathcal{R}\cap\mathcal{S}\right|$

And finally, subtract off the size of the four-way intersection:

$-\left|\mathcal{P}\cap\mathcal{Q}\cap\mathcal{R}\cap\mathcal{Q}\right|$

Introducing some symbols to represent those three sums, we can express that computation as:

$\left|D\right|=\Sigma_\textrm{one}-\Sigma_\textrm{two}+\Sigma_\textrm{three}-\left|\mathcal{P}\cap\mathcal{Q}\cap\mathcal{R}\cap\mathcal{S}\right|$,

where

\begin{align*}
\Sigma_\textrm{one} &= \left|\mathcal{P}\right|+\left|\mathcal{Q}\right|+\left|\mathcal{R}\right|+\left|\mathcal{S}\right| \\
\Sigma_\textrm{two} &= \left|\mathcal{P}\cap\mathcal{Q}\right|+\left|\mathcal{P}\cap\mathcal{R}\right|+\left|\mathcal{P}\cap\mathcal{S}\right|+\left|\mathcal{Q}\cap\mathcal{R}\right|+\left|\mathcal{Q}\cap\mathcal{S}\right|+\left|\mathcal{R}\cap\mathcal{S}\right| \\
\Sigma_\textrm{three} &= \left|\mathcal{P}\cap\mathcal{Q}\cap\mathcal{R}\right|+\left|\mathcal{P}\cap\mathcal{Q}\cap\mathcal{S}\right|+\left|\mathcal{P}\cap\mathcal{R}\cap\mathcal{S}\right|+\left|\mathcal{Q}\cap\mathcal{R}\cap\mathcal{S}\right|.
\end{align*}

Because the four-way intersection is empty, $\left|\mathcal{P}\cap\mathcal{Q}\cap\mathcal{R}\cap\mathcal{S}\right|=0$. So,

$\left|D\right|=\Sigma_\textrm{one}-\Sigma_\textrm{two}+\Sigma_\textrm{three}$.

That leaves us three sums to work out, starting with

$\Sigma_\textrm{one}=\left|\mathcal{P}\right|+\left|\mathcal{Q}\right|+\left|\mathcal{R}\right|+\left|\mathcal{S}\right|$.

These subsets, $\mathcal{P}$, $\mathcal{Q}$, $\mathcal{R}$, and $\mathcal{S}$, are those simple passwords that aren’t valid complex passwords because they are missing one of the four requisite character types. Considering each in turn:

$\mathcal{P}$ consists of passwords made up of anything but the 26 uppercase characters, so $\left|\mathcal{P}\right|=\left(72-26\right)^n=46^n$
$\mathcal{Q}$ similarly excludes the 26 lowercase characters, so $\left|\mathcal{Q}\right|=\left(72-26\right)^n=46^n$
$\mathcal{R}$ excludes the 10 numbers, so $\left|\mathcal{R}\right|=\left(72-10\right)^n=62^n$
$\mathcal{S}$ excludes the 10 symbols, so $\left|\mathcal{S}\right|=\left(72-10\right)^n=62^n$

Adding those terms together gives us:

$\Sigma_\textrm{one} = 2 \cdot 62^n+2\cdot 46^n$.

Next, let’s look at the second sum,

$\Sigma_\textrm{two}=\left|\mathcal{P}\cap\mathcal{Q}\right|+\left|\mathcal{P}\cap\mathcal{R}\right|+\left|\mathcal{P}\cap\mathcal{S}\right|+\left|\mathcal{Q}\cap\mathcal{R}\right|+\left|\mathcal{Q}\cap\mathcal{S}\right|+\left|\mathcal{R}\cap\mathcal{S}\right|$.

These are the two-way intersections — simple passwords that cannot be complex passwords because they are missing two of the requisite character types. Going through each term in the sum:

$\mathcal{P}\cap\mathcal{Q}$ excludes the 26 uppercase and 26 lowercase letters, so $\left|\mathcal{P}\cap\mathcal{Q}\right|=(72-26-26)^n=20^n$
$\mathcal{P}\cap\mathcal{R}$ excludes the 26 uppercase letters and 10 numbers, so $\left|\mathcal{P}\cap\mathcal{R}\right|=(72-26-10)^n=36^n$
$\mathcal{P}\cap\mathcal{S}$ excludes the 26 uppercase letters and 10 symbols, so $\left|\mathcal{P}\cap\mathcal{S}\right|=(72-26-10)^n=36^n$
$\mathcal{Q}\cap\mathcal{R}$ excludes the 26 lowercase letters and 10 numbers, so $\left|\mathcal{Q}\cap\mathcal{R}\right|=(72-26-10)^n=36^n$
$\mathcal{Q}\cap\mathcal{S}$ excludes the 26 lowercase letters and 10 symbols, so $\left|\mathcal{Q}\cap\mathcal{S}\right|=(72-26-10)^n=36^n$
$\mathcal{R}\cap\mathcal{S}$ excludes the 10 numbers and 10 symbols, so $\left|\mathcal{R}\cap\mathcal{S}\right|=(72-10-10)^n=52^n$

Bringing those terms together, we have:

$\Sigma_\textrm{two}=52^n+4\cdot 36^n+20^n$.

Finally, the three-way intersections, which lack three of the requisite character types. These are the elements in the sum

$\Sigma_\textrm{three}=\left|\mathcal{P}\cap\mathcal{Q}\cap\mathcal{R}\right|+\left|\mathcal{P}\cap\mathcal{Q}\cap\mathcal{S}\right|+\left|\mathcal{P}\cap\mathcal{R}\cap\mathcal{S}\right|+\left|\mathcal{Q}\cap\mathcal{R}\cap\mathcal{S}\right|$.

Going through each:

$\mathcal{P}\cap\mathcal{Q}\cap\mathcal{R}$ excludes the 26 uppercase and 26 lowercase letters, and the 10 numbers, so $\left|\mathcal{P}\cap\mathcal{Q}\cap\mathcal{R}\right|=(72-26-26-10)^n=10^n$
$\mathcal{P}\cap\mathcal{Q}\cap\mathcal{S}$ excludes the 26 uppercase and 26 lowercase letters, and the 10 symbols, so $\left|\mathcal{P}\cap\mathcal{Q}\cap\mathcal{S}\right|=(72-26-26-10)^n=10^n$
$\mathcal{P}\cap\mathcal{R}\cap\mathcal{S}$ excludes the 26 uppercase letters, 10 numbers, and 10 symbols, so $\left|\mathcal{P}\cap\mathcal{R}\cap\mathcal{S}\right|=(72-26-10-10)^n=26^n$
$\mathcal{Q}\cap\mathcal{R}\cap\mathcal{S}$ excludes the 26 lowercase letters, 10 numbers, and 10 symbols, so $\left|\mathcal{Q}\cap\mathcal{R}\cap\mathcal{S}\right|=(72-26-10-10)^n=26^n$

Putting together those terms, we get:

$\Sigma_\textrm{three} = 2\cdot 26^n+2\cdot 10^n$.

With those three sums computed, we can return to computing the size of $\mathcal{D}$:

\begin{align*}
\left|\mathcal{D}\right| &= \Sigma_\textrm{one}-\Sigma_\textrm{two}+\Sigma_\textrm{three} \\
&= \left(2\cdot 62^n+2\cdot 46^n\right)-\left(52^n+4\cdot 36^n+20^n\right)+\left(2\cdot 26^n+2\cdot 10^n\right) \\
&= 2\cdot 62^n-52^n+2\cdot 46^n-4\cdot 36^n+2\cdot 26^n-20^n+2\cdot 10^n
\end{align*}

Finally, we can use that size to figure out $\textrm{p}_{\textrm{c}}\left(n\right)$, the number of valid complex passwords:

\begin{align*}
\textrm{p}_{\textrm{c}}\left(n\right) &= \left|\mathcal{U}\right| – \left|\mathcal{D}\right| \\
&= 72^n-\left|\mathcal{D}\right| \\
&= 72^n-\left(2\cdot 62^n-52^n+2\cdot 46^n-4\cdot 36^n+2\cdot 26^n-20^n+2\cdot 10^n\right) \\
&= 72^n-2\cdot 62^n+52^n-2\cdot 46^n+4\cdot 36^n-2\cdot 26^n+20^n-2\cdot 10^n
\end{align*}

for $n \ge 4$.

Comparing Actual Numbers

We now have two equations that describe the number of simple and complex passwords of length $n \ge 4$:

\begin{align*}
\textrm{p}_{\textrm{s}}\left(n\right) &= 72^n \\
\textrm{p}_{\textrm{c}}\left(n\right) &= 72^n-2\cdot 62^n+52^n-2\cdot 46^n+4\cdot 36^n-2\cdot 26^n+20^n-2\cdot 10^n
\end{align*}

It’s surprising, when the password length is short, just how much the complexity requirement reduces the number of available passwords. Let’s consider the number of passwords of length 8:

\begin{align*}
\textrm{p}_{\textrm{s}}\left(8\right) &= 722,\!204,\!136,\!308,\!736 \\
\textrm{p}_{\textrm{c}}\left(8\right) &= 309,\!780,\!614,\!707,\!200
\end{align*}

There are more than twice as many simple passwords of length 8 as there are complex passwords of the same length. When we measure the bits of security provided by a password (the $\log_2$ of how many possible passwords there are), a multiplicative factor of 2 times as many passwords corresponds to 1 additional bit of security. So this difference in the number of simple and complex passwords — a factor of slightly more than 2 — is reflected in a simple password of length 8 offering roughly 1.221 extra bits of security over a complex password:

\begin{align*}
\log_2\left[\textrm{p}_{\textrm{s}}\left(8\right)\right] &\approx 49.359 \\
\log_2\left[\textrm{p}_{\textrm{c}}\left(8\right)\right] &\approx 48.138
\end{align*}

But as the password length n gets larger, $\log_2\left[\textrm{p}_{\textrm{c}}\left(n\right)\right]$ approaches $\log_2\left[\textrm{p}_{\textrm{s}}\left(n\right)\right]$. Let’s return to the simple password lengths that provided 128 and 256 bits of security. With $n=21$, a simple password provides approximately 0.129 more bits of security than a complex one:

\begin{align*}
\log_2\left[\textrm{p}_{\textrm{s}}\left(21\right)\right] &\approx 129.568 \\
\log_2\left[\textrm{p}_{\textrm{c}}\left(21\right)\right] &\approx 129.439
\end{align*}

If we further increase our password length to $n=42$, $\log_2\left[\textrm{p}_{\textrm{c}}\left(n\right)\right]$ continues to approach $\log_2\left[\textrm{p}_{\textrm{s}}\left(n\right)\right]$. Now, the difference in the level of security provided by simple and complex passwords falls to roughly 0.005 bits:

\begin{align*}
\log_2\left[\textrm{p}_{\textrm{s}}\left(42\right)\right] &\approx 259.136 \\
\log_2\left[\textrm{p}_{\textrm{c}}\left(42\right)\right] &\approx 259.131
\end{align*}

Proof of Security-Level Convergence

It seems like $\log_2\left[\textrm{p}_{\textrm{c}}\left(n\right)\right]$ and $\log_2\left[\textrm{p}_{\textrm{s}}\left(n\right)\right]$ converge as n increases, so let’s confirm that they actually do. We’ll show that by demonstrating $\textrm{p}_{\textrm{c}}\left(n\right)$ and $\textrm{p}_{\textrm{s}}\left(n\right)$ themselves converge. Consider the ratio of the two functions as the password length, n, goes to infinity:

\begin{align*}
\lim_{n\to\infty}{\cfrac{\textrm{p}_{\textrm{c}}\left(n\right)}{\textrm{p}_{\textrm{s}}\left(n\right)}}.
\end{align*}

That limit converges to 1 because the dominant term of both polynomial equations is $72^n$. All the remaining terms in $\textrm{p}_{\textrm{c}}\left(n\right)$, when divided by $72^n$, go to 0:

\begin{align*}
\lim_{n\to\infty}{\cfrac{\textrm{p}_{\textrm{c}}\left(n\right)}{\textrm{p}_{\textrm{s}}\left(n\right)}} &= \lim_{n\to\infty}{\left[\cfrac{72^n-2\cdot 62^n+52^n-2\cdot 46^n+4\cdot 36^n-2\cdot 26^n+20^n-2\cdot 10^n}{72^n}\right]} \\
&= \lim_{n\to\infty}{\left[\left(\frac{72}{72}\right)^n-2\cdot\left(\frac{62}{72}\right)^n+\cdots-2\cdot\left(\frac{10}{72}\right)^n\right]} \\
&= \lim_{n\to\infty}{\left[\left(\frac{72}{72}\right)^n\right]}-2\cdot\lim_{n\to\infty}{\left[\left(\frac{62}{72}\right)^n\right]}+\cdots-2\cdot\lim_{n\to\infty}{\left[\left(\frac{10}{72}\right)^n\right]} \\
&= \left(1\right)-2\cdot\left(0\right)+\cdots-2\cdot\left(0\right) \\
&= 1
\end{align*}

Because the ratio of the two functions goes to 1, the two functions $\textrm{p}_{\textrm{c}}\left(n\right)$ and $\textrm{p}_{\textrm{s}}\left(n\right)$ converge. So too, then, do $\log_2\left[\textrm{p}_{\textrm{c}}\left(n\right)\right]$ and $\log_2\left[\textrm{p}_{\textrm{s}}\left(n\right)\right]$.

Meaning of Convergence

Let’s step back and take a broader view of that math. What does it actually tell us, that $\textrm{p}_{\textrm{c}}\left(n\right)$ and $\textrm{p}_{\textrm{s}}\left(n\right)$ converge as n increases? It means that, as our password length increases, the number of complex passwords that can be generated approaches the number of simple passwords that can be generated.

Consider if we were to generate a short simple password, say with length $n=8$. The odds that it has one lowercase letter, one uppercase letter, one number, and one symbol are less that 50% $\left(\textrm{p}_{\textrm{c}}\left(8\right)/\textrm{p}_{\textrm{s}}\left(8\right)\approx 0.429\right)$. Because so few simple passwords are also complex passwords, there are substantially fewer complex passwords; hence, they provide substantially less security.

But, as n gets large, the odds that a randomly generated simple password meets the complexity requirement get higher. If we were to generate a simple password with length $n=21$, the probability that it’s also a complex password is more than 90% $\left(\textrm{p}_{\textrm{c}}\left(21\right)/\textrm{p}_{\textrm{s}}\left(21\right)\approx 0.914\right)$. With length $n=42$, the probability is more than 99% $\left(\textrm{p}_{\textrm{c}}\left(42\right)/\textrm{p}_{\textrm{s}}\left(42\right)\approx 0.996\right)$. Intuitively, if we generate 42 random characters for a password, odds are really good that we’ve generated one of each of the four types of character somewhere in there.

As password length increases, the number of complex passwords approaches the number of simple ones. As that happens, the security levels (the $\log_2$ of the number of passwords) of the two types of password approach each other. What this tells us, ultimately, is that if we use sufficiently long passwords, the mathematical security advantage that simple passwords have over complex passwords disappears.

That said, complexity requirements weren’t introduced to protect users who already use sufficiently long random passwords, like a 42-character simple password. Good security policies also have to protect the well-intentioned user who just wants to use their dog’s name as their password, so they can get back to whatever actual work they were doing before they had to fuss with passwords. With that in mind, and considering everything you’ve read about the mathematics of complex passwords, what are your thoughts: do complexity requirements actually contribute anything to user security? There’s probably a great survey paper in that question, if you were looking for a paper topic in a computer security course!

Practical Exercise

The utility bc(1) is an arbitrary-precision calculator on the UNIX command line, and you can use it to compute $\textrm{p}_{\textrm{c}}\left(n\right)$ and $\log_2\left[\textrm{p}_{\textrm{c}}\left(n\right)\right]$ for large values of n. As an exercise, define a function in bc that computes $\textrm{p}_{\textrm{c}}\left(n\right)$. The “define” keyword (used to define functions) is described in the man page, and the body of this function can be written in one line. Once you’ve done that, you can use the built-in l2() function to compute $\log_2\left[\textrm{p}_{\textrm{c}}\left(n\right)\right]$.

Note: you’ll have to launch bc with the -l flag in order to access the logarithm functions like l2() for computing $\log_2$:

$ bc -l

I encourage all students to try out bc and run a few calculations with it. It’s one more quick utility you can add your UNIX toolkit.

For more discussions, and to arrange for personalized tutoring for yourself or your study group, check out Vancouver Computer Science Tutoring.