Ever since Euclid first proved that there are infinitely many prime numbers, mathematicians have found ever more creative ways to prove the same result, and also various stronger theorems that imply it. Dirichlet’s Theorem, for example, states that ifm and n are relatively prime integers, then there are infinitely many prime numbers of the form mk + n for some integer k. It is also known that the sum of the reciprocals of the prime numbers diverges, that the sum

\displaystyle \sum_{\substack{p \leq n \\ p \text{ prime}}} \frac{1}{p} \sim \log(\log(n))

and that the number of prime numbers less than n is asymptotically equal to \displaystyle \frac{n}{\log(n)}. In this blog post, we will continue this proud tradition by proving that there are infinitely many prime numbers which have your phone number somewhere in their digits, and which simultaneously have a prime number of digits.

To do so, we will look at the convergence of two different sums: that of the reciprocals of the primes with a prime number of digits, and that of the reciprocals of the natural numbers which do not contain your phone number amongst their digits. As we shall show, the former diverges, while the latter converges.

Amongst the first examples of divergent series which students are introduced to is the harmonic series

\displaystyle \sum_{n=1}^\infty \frac{1}{n}

This can be shown to diverge by grouping the terms into groups of powers of 2: We have that for n lying between 2^k and 2^{k+1}, the reciprocal of n is at least 2^{-1 -k}. Thus it follows that

\displaystyle \sum_{n=2^k+1}^{2^{k+1}} \frac{1}{n} \geq \sum_{n=2^k+1}^{2^{k+1}} \frac{1}{2^{k+1}} = \frac{2^k}{2^{k+1}} = \frac{1}{2}

From this it follows that

\displaystyle \sum_{n=1}^{2^k} \frac{1}{n} \geq 1 + \frac{k}{2}

and so the sum

\displaystyle \sum_{n=1}^\infty \frac{1}{n}

diverges.

We will use a similar argument to show that a slightly modified sequence of reciprocals converges. We will again group numbers based on their relative sizes, but will show that there are few enough numbers of each size that the series will converge.

We prove the following proposition:

Proposition 1

Let S be any string of (decimal) digits, and consider the set T of natural numbers which do not contain S in their decimal expansion. Then the sum of the reciprocals of the elements of T converges.

To prove this, we consider for each whole number k the sum of the reciprocal of those elements of T which have a number of digits from mk+1 to m(k+1), where m is the number of digits in the string S. The digits of such a natural number can be broken up into k+1 blocks of m consecutive digits. There are 10^m possible strings of m digits, and each of the blocks of digits could potentially be any one of these 10^m strings, except for S. Thus there are at most 10^m - 1 options for each block of digits, and so at most (10^m - 1)^{k+1} elements of T which have from mk+1 to m(k+1) digits.

A natural number with at least mk+1 digits is greater than or equal to 10^{mk}, and so its reciprocal is at most 10^{-mk}. Thus if we consider the sum of the reciprocals of the elements of T which have from mk+1 to m(k+1) digits, we see that it is bounded above by

\displaystyle \frac{(10^m - 1)^{k+1}}{10^{mk}} = (10^m - 1) \cdot \left(\frac{10^m - 1}{10^m}\right)^k

The sum of the reciprocals of all of the elements of T is then bounded above by the geometric series

\displaystyle \sum_{k=0}^\infty (10^m - 1) \cdot \left(\frac{10^m - 1}{10^m}\right)^k

which converges to 10^{m} (10^m - 1).

\square

It is perhaps surprising that we can make the harmonic series converge merely by removing the reciprocals of those numbers which do not contain some string of digits S. At least as surprising is that if we were instead to remove the reciprocals of all of the numbers which are not prime, then the resulting series would diverge. As stated at the start of the post, it is even true that the sum of the reciprocals of the prime numbers with a prime number of digits diverges. We will in fact prove an even stronger result. To do so, we will make use of the Prime Number Theorem which was proven independently by Hadamard and by de la Vallée-Poussin in 1896.

Prime Number Theorem

Let \pi(n) be the number of prime numbers which are less than n. Then

\displaystyle \pi(n) \sim \frac{n}{\log(n)}

That is to say

\displaystyle \lim_{n \to \infty} \pi(n) \Big\slash \frac{n}{\log(n)} = 1

This allows us to prove the following proposition:

Proposition 2

Let S_0 be the set of natural numbers, and for each n, let S_{n+1} be the set of prime numbers which have a number of digits which is an element of S_n. Then for each n, the sum of the reciprocals of the elements of S_n diverges.

The proof is by induction on the number n.

The proposition holds for n = 0, since the sum of the reciprocals of the elements of S_0 is just the harmonic series, which we earlier showed to diverge. Now suppose that for some number n, the sum of the reciprocals of the elements of S_n diverges. We will show that the same is true for n+1.

By the Prime Number Theorem, there is some natural number N such that for all n \geq N, we have that

\displaystyle \frac{1}{2} \cdot \frac{n}{\log(n)} \leq \pi(n) \leq \frac{3}{2} \cdot \frac{n}{\log(n)}

 Consider those prime numbers with m digits. The number of such prime numbers is \pi(10^m) - \pi(10^{m-1}). Provided that 10^{m-1} \geq N, we see that this is equal to at least

\displaystyle \frac{1}{2} \cdot \frac{10^m}{m \log(10)} - \frac{3}{2} \cdot \frac{10^{m-1}}{(m-1) \log(10)}

\displaystyle = \frac{10^{m-1}}{2\log(10)} \cdot \frac{10(m-1) - 3m}{m(m-1)}

Such a prime number is at most 10^m, and so its reciprocal is at least 10^{-m}. It follows that the sum of the reciprocals of all such prime numbers is equal to at least

\displaystyle \frac{1}{20 \log(10)} \cdot \frac{7m - 10}{m(m-1)}

For m \geq 4, we have that 7m - 10 \geq 6(m-1), and so this is bounded below by

\displaystyle \frac{3}{10 \log(10)} \cdot \frac{1}{m}

Thus provided that 10^{m-1} \geq N, and m \geq 4, we have that the sum of the reciprocals of the primes with m digits is at least \frac{c}{m}, where c is the constant

\displaystyle c = \frac{3}{10 \log(10)}

The elements of S_{n+1} have a number of digits which is in S_n, and so it follows that the sum of the reciprocals of the elements of S_{n+1} is bounded below by

\displaystyle \sum_{\substack{m \in S_n\\ 10^{m-1} \geq 1000N}} \frac{c}{m}

which diverges by the inductive hypothesis. Thus the sum of the reciprocals of the elements of S_{n+1} diverges, and the result is proven.

\square

It now follows easily that there are infinitely many prime numbers which contain your phone number amongst their digits, and which simultaneously have a prime number of digits.

Suppose that this were not the case. Let P be the set of prime numbers with a prime number of digits which do not contain your phone number amongst their digits. By taking the string S to be the digits of your phone number in the first proposition, we see that the sum of the reciprocals of the elements of P converges. But P contains all but finitely many of the elements of the set S_2 from the second proposition, and so the sum of the reciprocals of the elements of P diverges, a contradiction.

Of course, there is nothing special about decimal notation. It should be clear that the results above hold for any base.

In the special case in which the string of digits S is taken to be the string ``9", the sequence described in the first proposition above is known as the Kempner Series. The second proposition was brought to my attention by this thread on reddit.

How clear is this post?