More on the Chebotarev invariant (and the field with 6 elements)

Continuing the series of previous posts on what I called the Chebotarev invariant, here are some (easy) results on cyclic groups, which are certainly the simplest groups to consider!

Starting from the formula for c(G)

c(G)=\sum_{\emptyset\not=I\subset \max(G)}{\frac{(-1)^{|I|+1}}{1-\nu(\cap_{H\in I}{H^*})}}

we obtain, by a straightforward translation between maximal subgroups of Z/nZ and prime divisors of n (the maximal subgroup corresponding to p|n being pZ/nZ), the formula

c(\mathbf{Z}/n\mathbf{Z})=-\sum_{2\leq d\mid n}{ \frac{\mu(d)}{1-d^{-1}}},

which is not without interest: it is familiar-looking (a sum over divisors involving the Möbius function), yet subtly different, since the factor 1/(1-1/d) is not multiplicative in d.
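For readers who want to experiment, here is a quick numerical sketch of this divisor sum in Python (the function names chebotarev_cyclic and mobius are of course mine, and the naive factorization is only meant for small n):

```python
def mobius(d):
    # Möbius function by trial division (fine for the small d we need)
    mu, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0  # a square divides the original d
            mu = -mu
        p += 1
    if d > 1:
        mu = -mu  # one leftover prime factor
    return mu

def chebotarev_cyclic(n):
    # c(Z/nZ) = -sum over divisors d > 1 of n of mu(d) / (1 - 1/d)
    return -sum(mobius(d) / (1 - 1 / d)
                for d in range(2, n + 1) if n % d == 0)
```

For instance, chebotarev_cyclic(2) returns 2, as it should: for Z/2Z we are just waiting for the first appearance of the non-zero element, a geometric variable with success probability 1/2.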

To go further, the natural step is to expand this last factor in a geometric series, since each of the terms d^{-k} which then appears is multiplicative in d (this is the analogue of obtaining directly the formula explained in T. Tao’s comment on one of the earlier posts on combinatorial cancellation); exchanging the resulting sums and evaluating the inner sums using multiplicativity, we get

c(\mathbf{Z}/n\mathbf{Z})=\sum_{k\geq 0}{\Bigl(1-\prod_{p\mid n}{(1-p^{-k})}\Bigr)}

which is rewritten most conveniently by isolating the first two terms:

c(\mathbf{Z}/n\mathbf{Z})=2-\frac{\varphi(n)}{n}+\sum_{k\geq 2}{\Bigl(1-\prod_{p\mid n}{(1-p^{-k})}\Bigr)}

and the point is that a trivial estimate follows

1\leq c(\mathbf{Z}/n\mathbf{Z})\leq 2+\sum_{k\geq 2}{\Bigl(1-\frac{1}{\zeta(k)}\Bigr)}

where the last series is convergent.

In fact, this last series, plus 1 (which can be considered to be the natural term k=1, since the Riemann zeta function has a pole at s=1), is a “known” constant, called the Niven constant N: its defining property is that it is the average value of the maximal exponent of a prime power dividing n:

N=\lim_{X\rightarrow+\infty}{\frac{1}{X}\sum_{1\leq n\leq X}{\alpha(n)}}

where

\alpha(n)=\max\{r\ |\ p^r\ divides\ n\ for\ some\ prime\ p\}

It is more or less obvious that the upper estimate above is best possible: if we consider n given by the product of the first k primes, and let k tend to infinity, we get a sequence of integers where c(Z/nZ) converges to 1+N.
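This constant is easy to approximate numerically; here is a sketch in Python (function names mine), computing N from the series 1 + Σ_{k≥2}(1 − 1/ζ(k)), with ζ(k) evaluated by a truncated Dirichlet series plus a standard tail correction:

```python
def zeta(k, M=2000):
    # zeta(k) for integer k >= 2: truncated series, plus the first two
    # Euler-Maclaurin tail terms M^(1-k)/(k-1) + M^(-k)/2
    return sum(m ** -k for m in range(1, M)) + M ** (1 - k) / (k - 1) + 0.5 * M ** -k

def niven_constant(K=60):
    # N = 1 + sum over k >= 2 of (1 - 1/zeta(k)); the terms decay like 2^(-k),
    # so truncating at K = 60 is far more than enough for double precision
    return 1.0 + sum(1.0 - 1.0 / zeta(k) for k in range(2, K))
```

This reproduces the decimal expansion of N below to about ten digits.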

The appearance of N here is, a posteriori at least, easy to explain: it is because

1-\frac{1}{\zeta(k)}

is both the probability that a “random” vector in Z^k is not primitive (i.e., its coordinates are not coprime in their totality; this is what lies behind our computation), and the probability that a “random” integer is not k-power free (i.e., it is divisible by a non-trivial k-th power; this is why N appears in the average of α(n)).

The numerical value of N is very easy to compute (using Pari for instance):

N=1.7052111401053677642885514534345081585\ldots

(and to show how mathematical knowledge has degenerated, I did not recognize the value N from a well-rounded knowledge of analytic number theory, but by searching for 0.70521, 1.70521 and 2.70521 with Google…)

So, the conclusion is that, for a cyclic group with many subgroups, we may have to get close to 3 elements before we can conclude that we have generators of the whole group. Whether that is surprising or not will of course depend on the reader.

Now, it is natural to go somewhat further if we consider the definition of the Chebotarev invariant as the expectation of a random variable. As is emphasized in any probability theory course, the expectation is very far from sufficient to get a good idea, in general, of a random variable: the basic question is whether its values are rather concentrated around the average, or whether instead they tend to spread out a lot.  A much better feel for the variable (though still allowing a lot of variation among all possible distributions) arises if the variance V (or the second moment M2) is known; recall that

M_2(X)=\mathbf{E}(X^2),\ V(X)=\mathbf{E}(X^2)-\mathbf{E}(X)^2=M_2-\mathbf{E}(X)^2.

One can compute, in principle, the second moment of the waiting time whose expectation is the Chebotarev invariant by a formula similar to the one described above. In the special case of a cyclic group, and assuming I did not make a mistake in the computation (which I have yet to check by consulting probabilists), we get

c_2(\mathbf{Z}/n\mathbf{Z})=\sum_{1<d\mid n}{\sum_{1<e\mid n}{\frac{\mu(d)\mu(e)}{1-[d,e]^{-1}}\Bigl(\frac{1}{1-d^{-1}}+\frac{1}{1-e^{-1}}-1\Bigr)}}

(where I write c2(G) for the second moment, and where [d,e] is the lcm of d and e).

This sum is not much more difficult to approach than the previous one, and after some computations it reduces to a single divisor sum

c_2(\mathbf{Z}/n\mathbf{Z})=\sum_{1<d\mid n}{\frac{\mu(d)}{1-d^{-1}}\Bigl(1-\frac{2}{1-d^{-1}}\Bigr)}

from which the estimates

1\leq c_2(\mathbf{Z}/n\mathbf{Z})\leq 4+\sum_{k\geq 2}{(2k+1)\Bigl(1-\frac{1}{\zeta(k)}\Bigr)}

follow; as before, they are best possible, but this time the new constant

N_2=3+\sum_{k\geq 2}{(2k+1)\Bigl(1-\frac{1}{\zeta(k)}\Bigr)}=7.7117246805241021285577922472877917633\ldots

does not seem to have appeared before — at least, according to Google.
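Again for experimentation, here is a sketch of both divisor sums in Python (names c1, c2, mobius mine; naive loops, only sensible for small n). As a sanity check on the second-moment formula, one finds c2(2) = 6, which is indeed the second moment (2−p)/p² of a geometric variable with success probability p = 1/2:

```python
def mobius(d):
    # Möbius function by trial division (fine for small d)
    mu, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0
            mu = -mu
        p += 1
    return -mu if d > 1 else mu

def c1(n):
    # Chebotarev invariant c(Z/nZ) as a divisor sum
    return -sum(mobius(d) / (1 - 1 / d)
                for d in range(2, n + 1) if n % d == 0)

def c2(n):
    # second moment c_2(Z/nZ) as a single divisor sum
    return sum(mobius(d) / (1 - 1 / d) * (1 - 2 / (1 - 1 / d))
               for d in range(2, n + 1) if n % d == 0)

def variance(n):
    return c2(n) - c1(n) ** 2
```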

All this gives a variance, in the case of n with many small prime factors, which is close to

(1+N_2)-(1+N)^2=1.3935573679739184290461227489481785625\ldots

(hence a standard deviation around 1.18, which is not negligible at all compared with the expectation).
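The new constant and the limiting variance can be approximated along the same lines as the Niven constant (a sketch with my own function names; ζ is computed by a truncated series with a tail correction):

```python
def zeta(k, M=2000):
    # zeta(k) for integer k >= 2: truncated series plus tail correction
    return sum(m ** -k for m in range(1, M)) + M ** (1 - k) / (k - 1) + 0.5 * M ** -k

def niven_N(K=60):
    # N = 1 + sum over k >= 2 of (1 - 1/zeta(k))
    return 1.0 + sum(1.0 - 1.0 / zeta(k) for k in range(2, K))

def N2(K=80):
    # N_2 = 3 + sum over k >= 2 of (2k+1)(1 - 1/zeta(k))
    return 3.0 + sum((2 * k + 1) * (1.0 - 1.0 / zeta(k)) for k in range(2, K))

def limiting_variance():
    # (1 + N_2) - (1 + N)^2: the variance for n with many small prime factors
    return (1.0 + N2()) - (1.0 + niven_N()) ** 2
```

This reproduces the decimal expansions of N_2 and of the variance given above.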

Now what about the field with 6 elements? Well, the point is that  both for the Chebotarev invariant and for the secondary one,  the expression as a divisor sum is an inclusion-exclusion (or sieve) formula over the non-trivial subgroups of Z/nZ, where the terms corresponding to each d are the same as if we computed the corresponding invariant for a hypothetical group with d elements having no subgroup except the trivial one and the whole group, as is the case for the additive groups of finite prime fields. So for n=6, things behave in this respect as if a field with 6 elements existed. (Yes, this is far-fetched…)

Finding life beyond the Central Limit Theorem

One of my son’s favorite nature documentaries explains how deep-sea explorers have found there, within the last few decades, forms of life which are completely different from anything seen on the surface of the earth, or at more accessible depths in the ocean.

Now, in this spirit of adventure, I will discuss a recent preprint of J. Jacod, A. Nikeghbali and myself, where we try to find meaningful behavior in probabilistic phenomena “beyond the standard limit theorems” — in particular, beyond the Central Limit Theorem. Moreover (and this is where, as a number-theorist, I find it most interesting), we find traces of these forms of life in a number of arithmetic situations. One was expected: it involves the analogy between Random Matrices and the Riemann zeta function, and it was the basic motivation behind this work (already explored previously in a slightly different way by A. Nikeghbali and M. Yor); the second was also natural: it is the finite field analogue of the previous one, where the Katz-Sarnak philosophy and techniques can be applied; and the third one was a little more unexpected: it lies in the context of the Erdös-Kác Theorem on the approximate normal distribution of the number of prime divisors of integers.

As often happens in mathematics, this example which we noticed last is rather simpler to introduce and explain than the others, so I’ll start with it. The Erdös-Kác Theorem states that for any real numbers a and b, with a<b we have

\lim_{N\rightarrow +\infty}\ \frac{1}{N}|\{n\leq N\ |\ a<(\omega(n)-\log\log N)/\sqrt{\log\log N}<b \}|=\frac{1}{\sqrt{2\pi}}\int_a^b{e^{-t^2/2}\,dt},

where ω(n) is the number of distinct prime divisors of n. This is phrased, informally, as saying that ω(n) is asymptotically distributed like a normal random variable with mean and variance both equal to (log log n), but more properly it is really the convergence in law of a normalized version

\frac{\omega(n)-\log\log N}{\sqrt{\log\log N}}\ \ \ \ for\ n\leq N

of ω(n) to a standard normal random variable, with mean 0 and variance 1. As such, this looks similar to the Central Limit Theorem, and there are indeed proofs which proceed explicitly using this idea (see this for instance).

Our main point is that, in this example and in others, one can extract meaningful behavior without doing the renormalization. And this is where a new type of convergence is necessary, since ω(n) in itself does not have a limiting distribution (when looked at in the sense implicit above: taking n<N with N going to infinity).

This new type of convergence is expressed in a form involving the characteristic function (i.e., Fourier transform) of random variables. For the special case of ω(n), it is the following formula: there exists a continuous function Φ(u), defined for all real numbers u, with Φ(0)=1, and real numbers λN such that the limit

(\diamond)\ \ \ \ \ \ \ \lim_{N\rightarrow +\infty}\ \ \exp(\lambda_N(1-e^{iu}))\ \ \frac{1}{N}\sum_{n\leq N}{\exp(iu\omega(n))}=\Phi(u)

exists for all u, and is uniform on compact subsets. In fact, this is proved with an explicit formula for the limiting function and with

\lambda_N=\log\log N.
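One can watch (◊) numerically: the following sketch (function names mine) sieves ω(n) for n ≤ N and forms the left-hand side for a given u. Convergence is of course very slow, the natural scale being log log N, so this is an illustration rather than a test of the theorem:

```python
import cmath
import math

def omega_upto(N):
    # om[n] = number of distinct prime divisors of n, by a sieve
    om = [0] * (N + 1)
    for p in range(2, N + 1):
        if om[p] == 0:  # p is prime
            for m in range(p, N + 1, p):
                om[m] += 1
    return om

def mod_poisson_lhs(u, N):
    # exp(lambda_N (1 - e^{iu})) * (1/N) * sum over n <= N of exp(iu omega(n))
    om = omega_upto(N)
    lam = math.log(math.log(N))
    avg = sum(cmath.exp(1j * u * k) for k in om[1:N + 1]) / N
    return cmath.exp(lam * (1 - cmath.exp(1j * u))) * avg
```

At u = 0 both factors are exactly 1, and for moderate u the values stabilize slowly as N grows.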

From this result, an easy argument with normalization shows that the normalized characteristic functions

\frac{1}{N}\sum_{n\leq N}{\exp(iu(\omega(n)-\log\log N)/\sqrt{\log\log N})}

do converge to that of a standard normal random variable, and this recovers exactly the Erdös-Kác theorem, using the characterization of convergence in law by means of characteristic functions (P. Lévy’s theorem).

The first factor of the left-hand side of the expression (◊) is the inverse of the characteristic function of a Poisson random variable with parameter λN, in other words of a random variable XN such that

\mathbf{P}(X_N=k)=e^{-\lambda_N}\frac{\lambda_N^k}{k!}

for non-negative integers k. So, moving this factor heuristically to the other side, we have approximately

\frac{1}{N}\sum_{n\leq N}{\exp(iu\omega(n))}\simeq \exp(\lambda_N(e^{iu}-1)) \Phi(u)=\mathbf{E}(e^{iuX_N})\Phi(u)

and using the basic properties of characteristic functions, things behave as if ω(n), for n<N, were a random variable which is the sum of the Poisson variable XN with parameter λN (which tends to infinity) and some (independent) term which has a limit as N goes to infinity. However, this is not literally true, because Φ(u) is not the characteristic function of a random variable. But because of this intuition, we say that there is mod-Poisson convergence, i.e., there is convergence, intuitively, modulo addition of Poisson random variables. (Note that this is also not literally correct because the set of Poisson random variables defined on a probability space is not a vector space, so the “modulo” has to be taken with a grain of salt).
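As a small sanity check on this heuristic, the following sketch confirms numerically that exp(λ(1 − e^{iu})) is indeed the inverse of the characteristic function of a Poisson variable, by summing E[e^{iuX}] directly from the probabilities above (a generic computation, not tied to the preprint; the function name is mine):

```python
import cmath
import math

def poisson_cf(lam, u, K=100):
    # E[exp(iuX)] for X Poisson with parameter lam, summed from the pmf
    term = math.exp(-lam)          # P(X = 0)
    z = cmath.exp(1j * u)
    total = 0j
    for k in range(K):
        total += term * z ** k
        term *= lam / (k + 1)      # P(X = k+1) from P(X = k)
    return total
```

One checks that poisson_cf(lam, u) agrees with exp(lam(e^{iu} − 1)) to machine precision.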

This example gives the paradigm of what we may want to look at: starting with a limit theorem (in distribution) which involves some random variables (Yn) of interest, after normalization

Z_n=\frac{Y_n-\mathbf{E}(Y_n)}{\sqrt{\mathbf{V}(Y_n)}}

to have mean 0 and variance 1, we wonder whether a new stronger limiting theorem exists, where the normalization is avoided, but where the characteristic function of Yn is not considered “bare” but is divided by other suitable characteristic functions (which compensate for the possibly increasing variance of the original random variables in a different way than by looking at the standard normalization Zn).

In the case of the number of prime divisors, those were of Poisson type. But in fact, the first case where this behavior was explicitly noticed involved Gaussian random variables (though it is probable that, historically, the case of ω(n) was the first one: the computation of the limit above, but not its interpretation, is already contained in the proof of the Erdös-Kác theorem due to Rényi and Turán; nowadays, it is most easily obtained by a direct application of the Delange-Selberg method).

This other example is a reinterpretation of a famous formula due to J. Keating and N. Snaith, in the setting of Random Matrix theory: let Yn be a random variable distributed like

\det(1-A)

where A is a random unitary matrix distributed according to Haar measure on the compact unitary group U(n) of size n; then, it follows from the work of Keating-Snaith that the following formula holds for all real u:

\lim_{n\rightarrow +\infty}\ \ \exp(u^2\log n) \mathbf{E}(\exp(iu\log |Y_n|^2))=\frac{G(1+iu)^2}{G(1+2iu)},

where G is the Barnes (double gamma) function. Here, the first factor on the left-hand side is the inverse of the characteristic function of a normal random variable with mean 0 and variance 2 log n, and so we speak of mod-Gaussian convergence. The variance is increasing, so the heuristic idea is that Yn is distributed like a Gaussian, which is more and more widely spread out, plus some independent converging “core”.

There are by now many proofs of this formula, often stated in terms of the Mellin transform instead of the characteristic function, and again without the interpretation we give; see for instance the probabilistic proof of Bourgade, Hughes, Nikeghbali and Yor.

It is not yet entirely clear what the meaning and generality of the type of limiting phenomenon found in those two examples are. One can’t help noticing (and being encouraged by) the richness of the environment where it is found: in addition to the Random Matrix setting, the limiting function in the Keating and Snaith formula is one of the conjectural terms for moments of the Riemann zeta function on the critical line. In our preprint, we also prove quite general versions involving the equally rich and classical probabilistic setting of infinitely divisible distributions: I won’t say more because for this part I am much less competent than my co-authors.

We feel that there is a lot to be done to understand and explain these limit theorems, and we hope that this form of convergence can also be directly useful for deriving new results, in particular in arithmetic.

The Chebotarev invariant of W(E_8)

A quick follow-up to the previous post: D. Zywina has found the value of the Chebotarev invariant for the Weyl group of E8, which is

c(W(E_8))= 4.19424758989300188767793949869\ldots

So you would typically expect to need to look at a bit more than four Frobenius conjugacy classes, on average, to check that a Galois extension with group contained in W(E8) is not smaller. As we managed to do it with only two, with F. Jouve, in our particular case, this means that we were indeed somewhat lucky.

(Thanks to Nguyen Ngoc Dong Quan for also proposing to do the computation with Magma).

The Chebotarev invariant, or Wait-and-See

So here’s the promised explanation for the sums I described in an earlier post about combinatorial cancellation. Those are particular cases of a formula for the evaluation of an invariant of a finite group G, in the case where G is the additive group F_p^k. The definition of this invariant, in turn, is motivated by the approach to computing the Galois group of the splitting field of a polynomial which is based on finding an a priori upper bound G, and then computing the Frobenius automorphisms modulo successive primes until one obtains in this way enough conjugacy classes represented by the Galois group to conclude that it must coincide with G (of course, this can only work if the upper bound was a correct guess…)

This method is not particularly convenient in general but it may be quite useful for theoretical purposes (it combines well with sieve methods, for instance, as first exploited by Gallagher), and also for particular cases where the computations turn out to work very well. F. Jouve, D. Zywina and myself succeeded in using it to construct our surf-shaped polynomial with Galois group W(E8), as I described earlier. For this, we only needed to look at two primes, and at that time we were somewhat surprised that it should have worked so well. The question is then: is this a justified surprise, or is it simply that it is intrinsically “easy” to check that a subgroup of W(E8) is the whole group by this method?

To analyze this, one is led rather naturally to investigate the properties of the following probabilistic model: the finite group G being given and fixed, consider a sequence (Xn) of independent random variables, with values in G, each uniformly distributed. Then define the “waiting time”

\tau_G=\min\{n\geq 1\ |\ X_1^*,\ldots, X_n^*\ generate\ G\}

where the superscript * indicates that we look at the conjugacy classes, and where a set C of conjugacy classes generates G if and only if there is no proper subgroup H of G which intersects each class in C.

(Note that this notion requires a little care: it is not even entirely obvious that the set of all conjugacy classes generates G, and indeed this is false for certain infinite groups, like SU(n), where every matrix can be diagonalized, i.e., the subgroup of diagonal matrices intersects every conjugacy class; but it is well-known to be the case for a finite group, by an easy counting argument).

The waiting time is somewhat unwieldy: it is a random variable defined on an unspecified probability space. To get a more concrete invariant, consider its expectation:

c(G)=\mathbf{E}(\tau_G)

This is what I call the “Chebotarev invariant”, because of the arithmetic motivation: one may expect that, for a given Galois extension K/Q, the Frobenius conjugacy classes σp modulo primes are “close”, in some sense, to the probabilistic model of a sequence of conjugacy classes associated to independent random variables as above. Indeed, the Chebotarev density theorem says that, at least, the frequency of appearance is consistent with the Law of Large Numbers:

\lim_{X\rightarrow +\infty}\frac{1}{\pi(X)}|\{p\leq X\ |\ \sigma_p=C \}|=\frac{|C|}{|G|}

for any conjugacy class C in G.

So one may hope that c(G) is close, usually, to the number of primes needed to conclude that the Galois group is G and not a proper subgroup of it. (Note however that for an individual extension, trying to quantify how quickly individual conjugacy classes appear as Frobenius conjugacy classes leads immediately to open questions which are intricately linked with the Generalized Riemann Hypothesis, even for the simplest case G=Z/2Z).

In any case, the link with our famous sums is the expression

c(G)=\sum_{\emptyset\not=I\subset \max(G)}{\frac{(-1)^{|I|+1}}{1-\nu(\cap_{H\in I}{H^*})}}

where max(G) is the set of conjugacy classes of (proper) maximal subgroups of G, ν(C), for any set C of conjugacy classes, is the density |C|/|G| of the union of the classes in C, and H* is the set of conjugacy classes which intersect a given (conjugacy class of a) maximal subgroup. For the case of F_p^k, where conjugacy is trivial, this leads to the sums of the earlier post. This formula is not hard to prove: it is an example of inclusion-exclusion combined with the computation of the expectation of a geometric random variable.

(Probabilistically, it is a non-standard variant of a general “coupon collector problem”, where the types of coupons correspond roughly to the various conjugacy classes of maximal subgroups, and what is slightly unusual is that a single coupon, i.e., a single sample Xi, may succeed in giving many different types of coupons, namely all classes of maximal subgroups which do not intersect the conjugacy class of Xi).
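For a cyclic group, where conjugacy is trivial, the waiting time itself is easy to simulate, which gives an independent check of the divisor-sum formula; here is a Monte Carlo sketch in Python (names mine), using the fact that a set of residues generates Z/nZ exactly when their gcd together with n is 1:

```python
import random
from math import gcd

def estimate_c_cyclic(n, trials=100000, seed=12345):
    # Monte Carlo estimate of c(Z/nZ): sample uniform elements of Z/nZ
    # until they generate the whole group, i.e. until gcd(x_1,...,x_t, n) = 1,
    # and average the waiting time over many trials
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        g, t = n, 0
        while g != 1:
            g = gcd(g, rng.randrange(n))
            t += 1
        total += t
    return total / trials
```

For n = 6 this returns approximately 2.3, matching c(Z/6Z) = 2 + 3/2 − 6/5 from the divisor sum.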

Now, here I would have liked to give the value of c(W(E8)), and checked if it was or not significantly larger than 2, but unfortunately, GAP seems not to be up to the job on my laptop, and I don’t have access to Magma at the moment (which, I suspect, would be able to do it, since it was able to compute the set of maximal subgroups very quickly, as described in the two posts here and there concerning my paper with Jouve and Zywina).

So this series of posts is not finished, and I hope to come back with this value, and more, in another episode…

The unity of mathematics, preserved

If you were asked to name two domains of mathematics which are as far apart as possible today, so far as to be essentially disconnected and living witnesses of the impossibility of believing in the unity of mathematics, it is quite possible that the answer “Applied statistics” and “Category theory” would spring to mind.

But, lo and behold: two wonderful papers by Guy Romier in the Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, published in 1969, give the missing link between these two estranged cousins (see here and there; thanks are again due to the Numdam project for making the archives of many French mathematics journals available freely). There, the foundations of statistical experiments are established on the firm footing of a whole battery of categories; in particular (Définition 1 in the first paper, page 278), the “experimental structure” of a statistical problem is defined as a subcategory of one of those basic categories, satisfying two axioms, one of which involves projective limits of some kind.

(I must say that I rather wonder how such papers were received in the statistics community; of course, my knowledge of statistics is essentially inexistent, my excellent probability teacher having decided to devote as much time as possible in his course to Brownian motion, but somehow I don’t think this approach has stuck…).