Who remembers the Mills number?

One of the undetermined numbers in Les nombres remarquables is the Mills number (or numbers; this is not uniquely defined, as will be clear from the description below). I had somehow forgotten all about it, although I now remember that it was quite popular in the olden days (it seemed to crop up in every other conversation back when I was reading that book, twenty or more years ago), and I had not heard anything about it for about that long.

So, the Mills number is that (or any of the) amazing real number A>1 with the property that

\lfloor A^{3^n}\rfloor

is a prime number for every positive integer n.

As one can expect, the doubly-exponential growth means that it would be pointless to try to use this to produce prime numbers. And one may guess that the proof of the existence of such a number has little to do with primes, and should apply to many other sequences of positive integers.

This is indeed so, but not in a completely trivial manner. More precisely, what the proof shows is that, given an infinite subset S of integers, and a real number c>1, one can find B, depending of course on the set and on c, such that

\lfloor B^{c^n}\rfloor \in S\text{ for all } n\geq 1,

provided the set S has the property that, for some real number

\theta<1,

and all large enough x, the intersection

[x,x+x^{\theta}]\cap S

is not empty. In other words, since θ<1, there must be some element of the set in all “short” intervals (from some point on), where “short” has the usual meaning in analytic number theory: the length is a power less than 1 of the left-hand extremity.

(Note that the relation between c and θ shows that, if we know a suitable value of θ for S, then we can always find a value of c that works, always assuming θ<1.)

What about primes, then? Do primes exist in short intervals? The answer is, indeed, yes, and it has been known to be so since the work of Hoheisel in 1930, but this is by no means a triviality! Indeed, if one looks at the problem from too far away, analyzing the number of primes in such an interval with the “explicit formulas” in terms of zeros of the Riemann zeta function, then one gets the impression that one will prove

\pi(x+x^{\theta})-\pi(x)\sim \frac{x^{\theta}}{\log x}

(which is the expected answer, because of the Prime Number Theorem) only for θ>1/2, and only by knowing that

\zeta(s)=0\Rightarrow \mathrm{Re}(s)<\theta,

which we know only for θ=1. This means that, from the point of view of immediate consequences of the location of zeros of the zeta function, having primes in short intervals is comparable with having a zero-free strip.

From this point of view, we see that the existence of the Mills number is quite an interesting fact. Moreover, the smaller the value of c one can take, the shorter the intervals we manage to find primes in. The value c=3 which I quoted at the beginning is possible because the current best result about primes in short intervals states that, for x large enough,

[x,x+x^{7/12}]
contains the “right number” of primes. (In fact, this allows any c>12/5). This result is due to Huxley, and hasn’t been improved since 1972; however, if one wants only the existence of a positive proportion among the right number of primes, Baker and Harman have the record value 0.534 (this was in 1996, and allows c>2.14…).
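To see this on actual numbers, here is a quick sketch (my own, not taken from any of the papers cited) counting primes in an interval of length x^{7/12} above a modest x, and comparing with the expected count x^θ/log x:

```python
# Quick empirical check: count primes in the short interval
# (x, x + x^(7/12)] (Huxley's exponent) and compare with x^θ / log x.
import math

x = 10**6
h = int(x ** (7 / 12))  # interval length, about 3162 here

def is_prime(n):
    # simple trial division; fine at this scale
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

actual = sum(1 for n in range(x + 1, x + h + 1) if is_prime(n))
expected = h / math.log(x)
print(actual, round(expected))  # counts of the same order of magnitude
```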

All the proofs since Hoheisel’s time depend crucially on a way to get around the Riemann Hypothesis known as “density theorems” for the zeros. This is a fairly inconvenient name, since “density” might suggest “lots and lots of zeros everywhere”, whereas the intent and purpose of density theorems is to show that, although there might be zeros off the critical line, or even close to 1 (which is where they would fight against the pole of the Riemann zeta function, which is the White Knight that tries to produce primes, glorious primes), there cannot be too many. The precise argument is presented in Chapter 10 of my book with H. Iwaniec. Note that density theorems have many other applications: certain particularly subtle ones for Dirichlet characters (“log-free density theorems”), the first of which was proved by Linnik, are crucial to the known proofs of his marvelous theorem according to which, for some absolute constant C>0, the smallest prime P(q,a) congruent to a modulo q, for a coprime with q, satisfies

P(q,a)\leq q^C.

(The best result here, due to Heath-Brown, allows you to take any C>5.5 for q large enough; the Generalized Riemann Hypothesis gives this for any C>2). If this remains too mundane — some people do not like primes in arithmetic progressions — note that you need similar theorems for cusp forms to give an upper bound of the right order of magnitude for the rank of the Jacobian J0(q) of the modular curve X0(q) for q prime, a result of P. Michel and myself.
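Just to illustrate the statement (this is my own toy computation, unrelated to the actual proofs), one can check how small the least primes in arithmetic progressions are in practice, say for q=13:

```python
# Toy illustration: the least prime P(q, a) in each class a mod q,
# for q = 13; in practice the worst case sits far below q^C.
import math

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def least_prime(q, a):
    # smallest prime congruent to a modulo q
    p = a
    while p < 2 or not is_prime(p):
        p += q
    return p

q = 13
worst = max(least_prime(q, a) for a in range(1, q) if math.gcd(a, q) == 1)
print(worst, q**2)  # 103 169: already below q^2, let alone q^5.5
```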

Now for the proof of the existence of the Mills number, in the generality of a set S containing elements in short intervals. I won’t give all details, but here’s a sketch:

(1) define b(1) to be the smallest element of S above the point after which all short intervals contain at least one element of S;

(2) define inductively b(n+1) to be such that

b(n)^c\leq b(n+1)<(1+b(n))^c-1,\quad\text{with}\quad b(n+1)\in S;
(3) show, using the condition

c(1-\theta)>1,

that if we define

x_n=b(n)^{c^{-n}},\ y_n=(1+b(n))^{c^{-n}},

we then have

x_n\leq x_{n+1}<y_{n+1}\leq y_n,
and deduce that the limit B of xn exists, and gives the desired general Mills number…
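For concreteness, here is a small sketch (my own) of steps (1)-(3) in the original case S = primes, c = 3; the primes it produces are the classical Mills primes, and the iterated roots give increasingly good lower approximations of a Mills number A:

```python
# Sketch of the construction with S = primes and c = 3: given b(n),
# take b(n+1) to be the least prime in [b(n)^3, (1+b(n))^3); the numbers
# b(n)^(3^-n) then increase to a Mills number A.

def is_prime(n):
    """Deterministic Miller-Rabin (valid far beyond the range used here)."""
    if n < 2:
        return False
    small = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in small:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in small:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

seq = [2]  # b(1) = 2, the smallest prime
for _ in range(3):
    b = seq[-1]
    # the short-interval theorem guarantees a prime in this range
    seq.append(next(m for m in range(b**3, (1 + b)**3) if is_prime(m)))

print(seq)                    # [2, 11, 1361, 2521008887]: the Mills primes
print(seq[-1] ** (3.0**-4))   # ≈ 1.3063778838..., approximating A
```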

Problems from the archive

(The title of this post is based on WBGO’s nice late-Sunday show “Jazz from the archive”, which I used to follow when I was at Rutgers; the physical archive in question was that of the Institute of Jazz Studies of Rutgers University; the post itself is partly motivated by seeing Gil Kalai’s examples of his own “early” problems…)

The following problem of classical analysis of functions of one complex variable has puzzled me for a long time (between 15 and 20 years), though I never really spent a lot of time on it — it has never had anything to do with my “real” research:

Let f,g be entire functions, and assume that for all r>0 we have

\sup_{|z|=r}{|f(z)|}=\sup_{|z|=r}{|g(z)|}.

What can we say about f and g? Specifically, do there exist two real numbers a and b such that

g(z)=e^{ia}f(e^{ib}z)

for all z?

(Actually, the “natural” conclusion to ask for would be: does there exist a linear operator T, from the space of entire functions to itself, such that Tf=g and such that

\sup_{|z|=r}{|(T\varphi)(z)|}=\sup_{|z|=r}{|\varphi(z)|}\quad\text{for all}\ r>0,

for all entire functions φ; indeed, the function g=Tf satisfies the condition for any such “joint isometry” of the vector space of entire functions equipped with the sup-norms on all circles centered at the origin; but I have the impression that it is known that any such operator is of the form stated above).

My guess has been that the answer is Yes, but I must say the evidence is not outstandingly strong; mostly, the analogue is known when one uses L2 norms on the circles, since

\frac{1}{2\pi}\int_0^{2\pi}{|f(re^{it})|^2dt}=\sum_{n\geq 0}{|a_n|^2r^{2n}}

if an are the Taylor coefficients of f. If these coincide with the corresponding values for another entire function g (with coefficients bn), we get by uniqueness of Taylor expansions that

|a_n|=|b_n|,\ \text{so}\ b_n=e^{i\theta_n}a_n

for all n, and the linear operator

\sum_{n\geq 0}{c_nz^n}\mapsto \sum_{n\geq 0}{c_ne^{i\theta_n}z^n}

is of course a joint isometry for the L2 norms on all circles, mapping f to g.
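A quick numerical sanity check of this Parseval-type identity (a sketch of mine, with arbitrary coefficients): for a polynomial, the uniform-grid average of |f|² on the circle is exact up to rounding, so the two sides agree to machine precision.

```python
# Numerical check of the identity: for a polynomial with a few arbitrary
# coefficients, the mean of |f|^2 on |z| = r equals the coefficient sum
# sum |a_n|^2 r^(2n); the uniform grid average is exact here.
import cmath

coeffs = [1.0, -2.0, 0.5 + 1.0j, 3.0]  # a_0, ..., a_3 (arbitrary choice)
r, M = 0.7, 4096

def f(z):
    return sum(a * z**n for n, a in enumerate(coeffs))

lhs = sum(abs(f(r * cmath.exp(2j * cmath.pi * k / M)))**2
          for k in range(M)) / M
rhs = sum(abs(a)**2 * r**(2 * n) for n, a in enumerate(coeffs))
print(abs(lhs - rhs))  # essentially zero
```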

The only result I’ve seen that seems potentially helpful (though I never looked particularly hard; it was mentioned in an obituary notice of the Bulletin of the LMS, I think, that I read rather by chance around 1999) is a result of O. Blumenthal — who is today much better known for his work on Hilbert modular forms, his name surviving in Hilbert-Blumenthal abelian varieties. In this paper from 1907, Blumenthal studies the structure of the sup norms in general, and computes them explicitly for (some) polynomials of degree 2. Although the uniqueness question above is not present in his paper, one can deduce by inspection that it holds for these polynomials at least.

(I wouldn’t be surprised at all if this question had in fact been solved around that period, when complex functions were extensively studied in France and Germany in particular; that the few persons to whom I have mentioned it had not heard of it is not particularly surprising, since this is one mathematical topic where what was once the height of fashion has now become very obscure).

Linear operators which you can write down are continuous

This is a follow-up to a comment I made on Tim Gowers’s post concerning the use of Zorn’s lemma (which I encourage students, in particular, to read if they have not done so yet). The issue was whether it is possible to write down concretely an unbounded (or in other words, not continuous) linear operator on a Banach space. I mentioned that it had been explained to me a few years ago that this is not in fact possible, in the same sense that it is not possible to “write down” a non-measurable function: indeed, any measurable linear map

T\ :\ U\rightarrow V

where U is a Banach space and V is a normed vector space, is automatically continuous.

Incidentally, I had asked this to my colleague É. Matheron in Bordeaux because the question had arisen while translating W. Appel’s book of mathematics for physicists: in the chapters on unbounded linear operators (which are important in Quantum Mechanics), he had observed that those operators could often be written down, but only in the sense of a partially defined operator, defined on a dense subspace, and we wondered if the dichotomy “either unbounded and not everywhere defined, or everywhere defined, and continuous”, was a real theorem or not. In the sense that measurability is much weaker than the (not well defined) notion of “concretely given”, it is indeed a theorem.
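To make the “unbounded, but only densely defined” side of the dichotomy concrete, here is a standard example (my choice, not from Appel’s book): differentiation on the span of the functions sin(nπt) inside a discretized L²(0,1), where the norm ratios ||f′||/||f|| = nπ grow without bound, so no continuous extension to the whole space can exist.

```python
# A classical example of a densely defined unbounded operator:
# differentiation, on the span of sin(n*pi*t) in a discretized L^2(0,1);
# the norm ratios ||f'|| / ||f|| = n*pi grow without bound.
import math

M = 10000  # grid points for a crude L^2 norm (midpoint rule)
ts = [(i + 0.5) / M for i in range(M)]

def l2_norm(vals):
    return math.sqrt(sum(v * v for v in vals) / M)

ratios = []
for n in (1, 4, 16, 64):
    f = [math.sin(n * math.pi * t) for t in ts]
    df = [n * math.pi * math.cos(n * math.pi * t) for t in ts]
    ratios.append(l2_norm(df) / l2_norm(f))
print(ratios)  # ≈ [pi, 4*pi, 16*pi, 64*pi]
```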

Not only did Matheron tell me of this automatic continuity result, he gave me a copy of a short note of his (“A useful lemma concerning subseries convergence”, Bull. Austral. Math. Soc. 63 (2001), no. 2, 273–277), where this result is proved very quickly, as a consequence of a simple lemma which also implies a number of other well-known facts of functional analysis (the Banach-Steinhaus theorem, Schur’s theorem on the coincidence of weak and norm convergence for series in l1, and a few others). On the other hand, I don’t know who first proved the continuity result (Matheron says it is well-known but gives no reference).

The proof is short enough that I will present it; it is a nice source of exercises for a first course in functional analysis, provided some integration theory has been seen before (which I guess is always the case).

Here is the main lemma, due to Matheron, in a probabilistic rephrasing, and a slightly weaker version:

Main Lemma: Let G be a topological abelian group, and let An be an arbitrary sequence of measurable (Borel) subsets of G, and (gn) a sequence of elements of G. Assume that for every n and every g in G, either g is in An, or g-gn is in An.

Let moreover be given a sequence of independent Bernoulli random variables (Xn), defined on some auxiliary probability space.

Then, under the condition that the series

\sum_{n\geq 1}{X_n g_n}

converges almost surely in G, there exists a subsequence (hn) of (gn) such that

\sum_{n\geq 1}{h_n}

converges, and its sum belongs to infinitely many An.

This is probably not easy to assimilate immediately, so let’s give the application to automatic continuity before sketching the proof. First, we recall that Bernoulli random variables are such that

\mathbf{P}(X_n=0)=\mathbf{P}(X_n=1)=\frac{1}{2}.
Now, let T be as above, measurable. We argue by contradiction, assuming that T is not continuous. This implies in particular that for all n, T is not bounded on the ball with radius 2^{-n}, so there exists a sequence (xn) in U such that

\|x_n\|<2^{-n},\ \text{and}\ \|T(x_n)\|>n.

We apply the lemma with

G=U,\text{ written additively},\ g_n=x_n,\ A_n=\{x\in U\ |\ \|T(x)\|>n/2\}.

The sets An are measurable, because T is assumed to be so. The triangle inequality shows that if x is not in An, then

\|T(x-x_n)\|\geq \|T(x_n)\|-\|T(x)\|>n-\frac{n}{2}=\frac{n}{2},
so that x-xn is in An. (This shows where sequences of sets satisfying the condition of the Lemma arise naturally).

Finally, the series formed with the xn is absolutely convergent by construction, so the series “twisted” with Bernoulli coefficients are also absolutely convergent. Hence, all the conditions of the Main Lemma are satisfied, and we can conclude that there is a subsequence (yn) of the (xn) such that

y=\sum_{n\geq 1}{y_n}

exists, and is in An infinitely often; this means that

\|T(y)\|>\frac{n}{2}
for infinitely many n. But this is impossible since T is defined everywhere!

Now here is the proof of the lemma. Consider the series

Y=\sum_{n\geq 1}{X_ng_n}

as a random variable, which is defined almost surely by assumption. Note that any value of Y is nothing but a sum of a subseries of the original series with terms gn. Let

B_n=\{Y\in A_n\}

so that the previous observation means that the desired conclusion is certainly implied by the condition

\mathbf{P}(Y\text{ in infinitely many } A_n)>0.

The event to study is

I=\bigcap_{N\geq 1}{C_N}\ \text{with}\ C_N=\bigcup_{n\geq N}{B_n}.

The sets CN are decreasing, so the probability of I is the limit of the probabilities of the CN, and each CN contains BN (hence has probability at least equal to that of BN). So if we can show that

\mathbf{P}(B_n)\geq 1/4\qquad\qquad (*)

(or any other positive constant) for all n, we will get

\mathbf{P}(C_N)\geq 1/4,\ \text{and hence}\ \mathbf{P}(I)\geq 1/4>0,

which gives the desired result. (In other words, we argue from a particularly simple case of the “difficult” direction in the Borel-Cantelli lemma).

Now let’s prove (*). We start with the identity

\{\sum_{m}{X_mg_m}\in g_n+A_n\ \text{and}\ X_n=1\}=\{\sum_{m\not=n}{X_mg_m}\in A_n\ \text{and}\ X_n=1\}

(for any n), which is a tautology. From the yet-unused assumption

A_n\cup (g_n+A_n)=G,

we then conclude that

\{X_n=1\}\subset \{\sum_{m}{X_mg_m}\in A_n\}\cup \{\sum_{m\not=n}{X_mg_m}\in A_n\ \text{and}\ X_n=1\}=B_n\cup S_n,

say. Therefore

1/2=\mathbf{P}(X_n=1)\leq \mathbf{P}(B_n)+\mathbf{P}(S_n).

But we claim that

\mathbf{P}(S_n)\leq \mathbf{P}(B_n).
Indeed, consider the random variables defined by

Z_m=X_m\ \text{if}\ m\not=n,\qquad Z_n=1-X_n.

Then we obtain

S_n=\{\sum_{m}{Z_mg_m}\in A_n\ \text{and}\ Z_n=0\},

but clearly the sequence (Zm) is also a sequence of independent Bernoulli random variables, so that

\mathbf{P}(\sum_{m}{Z_mg_m}\in A_n\ \text{and}\ Z_n=0)=\mathbf{P}(\sum_{m}{X_mg_m}\in A_n\ \text{and}\ X_n=0)\leq\mathbf{P}(Y\in A_n)=\mathbf{P}(B_n)

as desired. We are now done, since we have found that

1/2\leq 2\mathbf{P}(B_n)

which is (*).
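A toy simulation may help see the bound (*) in action; all the choices below are mine, just to make the lemma concrete. Take G=R, gn=2^{-n}, and An the set of x whose n-th binary digit is 0, so that An ∪ (gn+An)=G as the lemma requires; then Y is uniform on [0,1] and P(Bn) is about 1/2, comfortably above 1/4.

```python
# Toy Monte-Carlo check of P(B_n) >= 1/4: G = R, g_n = 2^{-n}, and
# A_n = {x : n-th binary digit of x is 0}, so A_n ∪ (g_n + A_n) = G.
# Here Y = sum X_m g_m is uniform on [0,1] and P(Y in A_n) ≈ 1/2.
import random

rng = random.Random(1)
TRIALS, DEPTH = 20000, 12

def sample_Y():
    # truncated version of the random series Y
    return sum(rng.getrandbits(1) * 2.0**-m for m in range(1, DEPTH + 1))

def in_A(x, n):
    # n-th binary digit of x is 0  <=>  floor(x * 2^n) is even
    return int(x * 2**n) % 2 == 0

freqs = []
for n in (1, 2, 3):
    freqs.append(sum(in_A(sample_Y(), n) for _ in range(TRIALS)) / TRIALS)
print(freqs)  # each entry ≈ 0.5, comfortably above the bound 1/4
```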

(In probabilistic terms, I think the trick of using Zm has something to do with “exchangeable pairs”, but I’m not entirely sure; in analytic terms, it translates to an instance of the invariance of Haar measure by translation on the compact group (Z/2Z)N, as can be seen in the original write-up of Matheron).

More on the Chebotarev invariant (and the field with 6 elements)

Continuing the series of previous posts on what I called the Chebotarev invariant, here are some (easy) results on cyclic groups, which are certainly the simplest groups to consider!

Starting from the formula for c(G)

c(G)=\sum_{\emptyset\not=I\subset \max(G)}{\frac{(-1)^{|I|+1}}{1-\nu(\cap_{H\in I}{H^*})}}

we obtain by a straightforward translation between maximal subgroups of  Z/nZ and prime divisors of n (the subgroup corresponding to some p|n being given by pZ/nZ), the formula

c(\mathbf{Z}/n\mathbf{Z})=-\sum_{2\leq d\mid n}{ \frac{\mu(d)}{1-d^{-1}}},

which is not without some interest: it is familiar-looking (a sum over divisors with a Möbius function involved), yet subtly different (the factor 1/(1-1/d) is not multiplicative in d).
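The formula is easy to test numerically; here is a sketch (mine) comparing it, for n=6, with a direct Monte-Carlo estimate of the underlying waiting time (draw uniform elements of Z/nZ until they generate the group, i.e., until their gcd with n reaches 1):

```python
# Numerical check of the divisor-sum formula for c(Z/nZ) at n = 6,
# against a Monte-Carlo estimate of the waiting time until uniform
# random elements generate the group.
import math, random

def mu(d):
    # Moebius function by trial factorization
    res, m, p = 1, d, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            res = -res
        p += 1
    return -res if m > 1 else res

def c_formula(n):
    return -sum(mu(d) / (1 - 1 / d) for d in range(2, n + 1) if n % d == 0)

def c_montecarlo(n, trials, rng):
    total = 0
    for _ in range(trials):
        g, steps = n, 0  # g = gcd of the elements drawn so far
        while g != 1:
            g = math.gcd(g, rng.randrange(n))
            steps += 1
        total += steps
    return total / trials

c_exact = c_formula(6)
c_mc = c_montecarlo(6, 20000, random.Random(7))
print(c_exact, c_mc)  # 2.3 and a nearby empirical value
```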

To go further, the natural step is to expand this last term in geometric series, since each of the terms dk which then appear is multiplicative (this is the analogue of obtaining directly the formula explained in T. Tao’s comment on one of the earlier posts on combinatorial cancellation); exchanging the resulting sums and transforming the inner sums using multiplicativity, we get

c(\mathbf{Z}/n\mathbf{Z})=\sum_{k\geq 0}{\Bigl(1-\prod_{p\mid n}{(1-p^{-k})}\Bigr)}

which is rewritten most conveniently by isolating the first two terms:

c(\mathbf{Z}/n\mathbf{Z})=2-\frac{\varphi(n)}{n}+\sum_{k\geq 2}{\Bigl(1-\prod_{p\mid n}{(1-p^{-k})}\Bigr)}

and the point is that a trivial estimate follows

1\leq c(\mathbf{Z}/n\mathbf{Z})\leq 2+\sum_{k\geq 2}{\Bigl(1-\frac{1}{\zeta(k)}\Bigr)}

where the last series is convergent.

In fact, this last series, plus 1 (which can be considered to be the natural term k=1 since the Riemann zeta function has a pole at s=1), is a “known” constant, called the Niven constant N: its defining property is that it is the average value of the maximal power of a prime dividing n:

N=\lim_{X\rightarrow+\infty}{\frac{1}{X}\sum_{1\leq n\leq X}{\alpha(n)}},\ \text{where}

\alpha(n)=\max\{r\ \mid\ p^r\ \text{divides}\ n\ \text{for some prime}\ p\}.

It is more or less obvious that the upper estimate above is best possible: if we consider n given by the product of the first k primes, and let k tend to infinity, we get a sequence of integers where c(Z/nZ) converges to 1+N.

The appearance of N here is, a posteriori at least, easy to explain: it is because

1-\frac{1}{\zeta(k)}
is both the probability that a “random” vector in Zk not be primitive (i.e., its coordinates are not coprime in their totality; this is what lies behind our computation), and the probability that a “random” integer is not k-power free (i.e., it is divisible by a k-th power; this is why N appears in the average of α(n)).

The numerical value of N is very easy to compute (using Pari for instance):

N=1.7052111401\ldots
(and to show how mathematical knowledge has degenerated, I did not recognize the value N from a well-rounded knowledge of analytic number theory, but by searching for 0.70521, 1.70521 and 2.70521 with Google…)
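For those without Pari at hand, the same computation is easy in Python (a sketch of mine, truncating the Dirichlet series for ζ(k) with a first-order tail correction; the tolerances are rough):

```python
# Approximation of Niven's constant N = 1 + sum_{k>=2} (1 - 1/zeta(k)),
# with zeta(k) computed from a truncated series plus an integral tail term.
def zeta(k, M=2000):
    # zeta(k) ≈ sum_{m<=M} m^{-k} + M^{1-k}/(k-1), error O(M^{-k})
    return sum(m**-k for m in range(1, M + 1)) + M**(1 - k) / (k - 1)

N = 1.0 + sum(1.0 - 1.0 / zeta(k) for k in range(2, 80))
print(N)  # ≈ 1.7052111...
```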

So, the conclusion is that, for a cyclic group with many subgroups, we may have to get close to 3 elements before we can conclude that we have generators of the whole group. Whether that is surprising or not will of course depend on the reader.

Now, it is natural to go somewhat further if we consider the definition of the Chebotarev invariant as the expectation of a random variable. As is emphasized in any probability theory course, the expectation is very far from sufficient to get a good idea, in general, of a random variable: the basic question is whether its values are rather concentrated around the average, or if instead they tend to spread out a lot. A much better feel for the variable (though still allowing a lot of variation among all possible distributions) arises if the variance V (or the second moment M2) is known; recall that

M_2(X)=\mathbf{E}(X^2),\ V(X)=\mathbf{E}(X^2)-\mathbf{E}(X)^2=M_2-\mathbf{E}(X)^2.

One can compute, in principle, the second moment of the waiting time whose expectation is the Chebotarev invariant by a formula similar to the one described above. In the special case of a cyclic group, and assuming I did not make a mistake in the computation (which I have yet to check by consulting probabilists), we get

c_2(\mathbf{Z}/n\mathbf{Z})=\sum_{1<d\mid n}{\sum_{1<e\mid n}{\frac{\mu(d)\mu(e)}{1-[d,e]^{-1}}\Bigl(\frac{1}{1-d^{-1}}+\frac{1}{1-e^{-1}}-1\Bigr)}}

(where I write c2(G) for the second moment, and where [d,e] is the lcm of d and e).

This sum is not much more difficult to approach than the previous one, and after some computations it reduces to a single divisor sum

c_2(\mathbf{Z}/n\mathbf{Z})=\sum_{1<d\mid n}{\frac{\mu(d)}{1-d^{-1}}\Bigl(1-\frac{2}{1-d^{-1}}\Bigr)}

from which the estimates

1\leq c_2(\mathbf{Z}/n\mathbf{Z})\leq 4+\sum_{k\geq 2}{(2k+1)\Bigl(1-\frac{1}{\zeta(k)}\Bigr)}

follow; as before, they are best possible, but this time the new constant

N_2=3+\sum_{k\geq 2}{(2k+1)\Bigl(1-\frac{1}{\zeta(k)}\Bigr)}=7.7117246805241021285577922472877917633\ldots

does not seem to have appeared before — at least, according to Google.
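As a sanity check (my own), the single-divisor-sum formula for c2 can be compared with a Monte-Carlo estimate of the second moment of the same waiting time, again at n=6, where the formula evaluates to exactly 7.32:

```python
# Check of the single-divisor-sum formula for the second moment
# c_2(Z/nZ) at n = 6, against a Monte-Carlo estimate of E(T^2) for the
# waiting time T until random elements generate the group.
import math, random

def mu(d):
    # Moebius function by trial factorization
    res, m, p = 1, d, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            res = -res
        p += 1
    return -res if m > 1 else res

def c2_formula(n):
    return sum(mu(d) / (1 - 1 / d) * (1 - 2 / (1 - 1 / d))
               for d in range(2, n + 1) if n % d == 0)

def c2_montecarlo(n, trials, rng):
    total = 0
    for _ in range(trials):
        g, steps = n, 0
        while g != 1:
            g = math.gcd(g, rng.randrange(n))
            steps += 1
        total += steps**2
    return total / trials

c2_exact = c2_formula(6)
c2_mc = c2_montecarlo(6, 40000, random.Random(3))
print(c2_exact, c2_mc)  # 7.32 and a nearby empirical value
```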

All this gives a variance, in the case of n with many small prime factors, which is close to

(1+N_2)-(1+N)^2=1.3935\ldots
(hence a standard deviation around 1.18, which is therefore not negligible at all compared with the expectation).

Now what about the field with 6 elements? Well, the point is that  both for the Chebotarev invariant and for the secondary one,  the expression as a divisor sum is an inclusion-exclusion (or sieve) formula over the non-trivial subgroups of Z/nZ, where the terms corresponding to each d are the same as if we computed the corresponding invariant for a hypothetical group with d elements having no subgroup except the trivial one and the whole group, such as the additive groups of finite prime fields do. So for n=6, things behave in this respect as if a field with 6 elements existed. (Yes, this is far-fetched…)

Another bad reference, and another exercise

My name turns out to be involved in another case of problematic bibliographic reference, but in contrast with the previous case, I am completely blameless. The book Mathematical Constants turns out to contain, as an “Added in press” note, the statement that the “probability” that two integers belonging to the ring Z[j] (where j is a primitive root of unity of order 3) are coprime is equal to

\frac{6}{\pi^2 H}

where
H=\sum_{k\geq 0}{\Bigl(\frac{1}{(3k+1)^2}-\frac{1}{(3k+2)^2}\Bigr)}

(probability is in the usual not-quite-strictly-correct sense of looking at the limit of the proportion in larger and larger balls or boxes centered at 0). For this, a reference is given to a putative unpublished note of mine entitled “Coprimality and squarefreeness within quadratic fields”.

However, this note does not exist, and never did! The only reason for this (somewhat casual) claim is that I had explained how to prove this (it is really an exercise, in line with this one) on the NMBRTHRY mailing list back in 2003.
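For what it’s worth, the stated density is easy to evaluate numerically from the formulas above (a quick sketch of mine, with a generous truncation of the series for H):

```python
# Numerical evaluation of the density 6/(pi^2 H), summing the series
# H = sum_{k>=0} (1/(3k+1)^2 - 1/(3k+2)^2) with a large truncation.
import math

H = sum(1 / (3 * k + 1)**2 - 1 / (3 * k + 2)**2 for k in range(10**5))
density = 6 / (math.pi**2 * H)
print(density)  # ≈ 0.7781
```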