Archive for July, 2008
Finding life beyond the Central Limit Theorem
One of my son’s favorite nature documentary explains how deep-sea explorers have found there, within the last few decades, forms of life which are completely different from anything seen on the surface of the earth, or at more accessible depths in the ocean.
Now, in this spirit of adventure, I will discuss a recent preprint of J. Jacod, A. Nikeghbali and myself, where we try to find meaningful behavior in probabilistic phenomenon “beyond the standard limit theorems” — in particular, beyond the Central Limit Theorem. Moreover (and this is where, as a number-theorist, I find it most interesting) we find traces of these forms of life in a number of arithmetic situations. One was expected: it involves the analogy between Random Matrices and the Riemann zeta function, and it was the basic motivation behind this work (already explored previously in a slightly different way by A. Nikeghbali and M. Yor); the second was also natural: it is the finite field analogue of the previous one, where the Katz-Sarnak philosophy and techniques can be applied; and the third one was a little more unexpected: it lies in the context of the Erdös-Kác Theorem on the approximate normal distribution of the number of prime divisors of integers.
As often happens in mathematics, this example which we noticed last is rather simpler to introduce and explain than the others, so I’ll start with it. The Erdös-Kác Theorem states that for any real numbers a and b, with a<b we have
where ω(n) is the number of distinct prime divisors of n. This is phrased, informally, as saying that ω(n) is asymptotically distributed like a normal random variable with mean and variance both equal to (log log n), but more properly it is really the convergence in law of a normalized version
of ω(n) to a standard normal random variable, with mean 0 and variance 1. As such, this looks similar to the Central Limit Theorem, and there are indeed proofs which proceed explicitly using this idea (see this for instance).
Our main point is that, in this example and in others, one can extract meaningful behavior without doing the renormalization. And this is where a new type of convergence is necessary, since ω(n) in itself does not have a limiting distribution (when looked in the same sense as implicit above: taking n< N with N going to infinity).
This new type of convergence is expressed in a form involving the characteristic function (i.e., Fourier transform) of random variables. For the special case of ω(n), it is the following formula: there exists a continuous function Φ(u), defined for all real numbers u, with Φ(0)=1, and real numbers λN such that the limit
exists for all u, and is uniform on compact subsets. In fact, this is proved with an explicit formula for the limiting function and with
From this result, an easy argument with normalization shows that the normalized characteristic functions
do converge to that of a standard normal random variable, and this recovers exactly the Erdös-Kác theorem, using the characterization of convergence in law by means of characteristic functions (P. Lévy’s theorem).
The first factor of the left-hand side of the expression (◊) is the inverse of the characteristic function of a Poisson random variable with parameter λN, in other words of a random variable XN such that
for non-negative integers k. So, moving heuristically this factor to the other side, we have approximately
and using the basic properties of characteristic functions, things behave as if ω(n), for n<N, was a random variable which is the sum of the Poisson variable XN with parameter λN (which tends to infinity) and some (independent) term which has a limit as N goes to infinity. However, this is not literally true, because Φ(u) is not a characteristic function of a random variable. But because of this intuition, we say that there is mod-Poisson convergence, i.e., there is convergence, intuitively, modulo addition of Poisson random variables. (Note that this is not literally correct also because the set of Poisson random variables defined on a probability space is not a vector space, so the “modulo” has to be taken with a grain of salt).
This example gives the paradigm of what we may want to look at: starting with limiting theorem (in distribution) which involve some random variables (Yn) of interest, after normalization
to have mean 0 and variance 1, we wonder whether a new stronger limiting theorem exists, where the normalization is avoided, but where the characteristic function of Yn is not considered “bare” but is divided by other suitable characteristic functions (which compensate for the possibly increasing variance of the original random variables in a different way than by looking at the standard normalization Zn).
In the case of the number of prime divisors, those were of Poisson type. But in fact, the first case where this behavior was explicitly noticed involved Gaussian random variables (though it is probable that, historically, the case of ω(n) was the first one: the computation of the limit above, but not its interpretation, is already contained in the proof of the Erdös-Kác theorem due to Rényi and Turán; nowadays, it is most easily obtained by a direct application of the Delange-Selberg method).
This other example is a reinterpretation of a famous formula due to J. Keating and N. Snaith, in the setting of Random Matrix theory: let Yn be a random variable distributed like
where A is a random unitary matrix distributed according to Haar measure on the compact unitary group U(n) of size n; then, it follows from the work of Keating-Snaith that the following formula holds for all real u:
where G is the Barnes (double gamma) function. Here, the first factor on the left-hand side is the inverse of the characteristic function of a normal random variable with mean 0 and variance 2 log n, and so we speak of mod-Gaussian convergence. The variance is increasing, so the heuristic idea is that Yn is distributed like a Gaussian, which is more and more widely spread out, plus some independent converging “core”.
There are by now many proofs of this formula, often stated in terms of the Mellin transform instead of the characteristic function, and again without the interpretation we give; see for instance the probabilistic proof of Bourgade, Hughes, Nikeghbali and Yor.
It is not yet entirely clear what is the meaning and generality of the type of limiting phenomenon found in those two examples. One can’t help noticing (and being encouraged by) the richness of the environment where it is found: in addition to the Random Matrix setting, the limiting function in the Keating and Snaith formula is one of the conjectural terms for moments of the Riemann zeta function on the critical line. In our preprint, we also prove quite general versions involving the equally rich and classical probabilistic setting of infinitely divisible distribution: I won’t say more because for this part I am much less competent than my co-authors.
We feel that there is a lot to be done to understand and explain these limit theorems, and hopefully that this form of convergence can also be directly useful to derive new results, in particular in arithmetic.
The Chebotarev invariant of W(E_8)
A quick follow-up to the previous post: D. Zywina has found the value of the Chebotarev invariant for the Weyl group of E8, which is
So you would typically expect to need to look at a bit more than four Frobenius conjugacy classe, on average, to check that a Galois extension with group a subgroup of W(E8) is not smaller. As we managed to do it with only two with F. Jouve in our particular case, this means that we were indeed somewhat lucky.
(Thanks to Nguyen Ngoc Dong Quan for also proposing to do the computation with Magma).
The Chebotarev invariant, or Wait-and-See
So here’s the promised explanation for the sums I described in an earlier post about combinatorial cancellation. Those are particular cases of a formula for the evaluation of an invariant of a finite group G, in the case where G is Fpk. The definition of this invariant, in turn, is motivated by the approach to computing the Galois group of the splitting field of a polynomial which is based on finding some a-priori upper bound G, and then computing the Frobenius automorphisms modulo successive primes until one obtains this way enough conjugacy classes represented by the Galois group to conclude that it must coincide with G (of course, this can only work if the upper-bound was a correct guess…)
This method is not particularly convenient in general but it may be quite useful for theoretical purposes (it combines well with sieve methods, for instance, as first exploited by Gallagher), and also for particular cases where the computations turn out to work very well. F. Jouve, D. Zywina and myself succeeded in using it to construct our surf-shaped polynomial with Galois group W(E8), as I described earlier. For this, we only needed to look at two primes, and at that time we were somewhat surprised that it should have worked so well. The question is then: is this a justified surprise, or is it simply that it is intrinsically “easy” to check that a subgroup of W(E8) is the whole group by this method?
To analyze this, one is led rather naturally to investigate the properties of the following probabilistic model: the finite group G being given and fixed, consider a sequence (Xn) of independent random variables, with values in G, each uniformly distributed. Then define the “waiting time”
where the superscript * indicates that we look at the conjugacy classes, and where a set C of conjugacy classes generates G if and only if there is no proper subgroup H of G which intersects each class in C.
(Note that this notion requires a little care: it is not even entirely obvious that the set of all conjugacy classes generates G, and indeed this is false for certain infinite groups, like SU(n), where every matrix can be diagonalized, i.e., the subgroup of diagonal matrices intersects every conjugacy class; but it is well-known to be the case for a finite group, by an easy counting argument).
The waiting time is somewhat unwieldy: it is a random variable defined on an unspecified probability space. To get a more concrete invariant, consider its expectation:
This is what I call the “Chebotarev invariant”, because of the arithmetic motivation: one may expect that, for a given Galois extension K/Q, the Frobenius conjugacy classes σp modulo primes are “close”, in some sense, to the probabilistic model of a sequence of conjugacy classes associated to independent random variables as above. Indeed, the Chebotarev density theorem says that, at least, the frequency of apparition is consistent with the Law of Large Numbers:
for any conjugacy class C in G.
So one may hope that c(G) is close, usually, to the number of primes needed to conclude that the Galois group is G and not a proper subgroup of it. (Note however that for an individual extension, trying to quantify how quickly individual conjugacy classes appear as Frobenius conjugacy classes leads immediately to open questions which are intricately linked with the Generalized Riemann Hypothesis, even for the simplest case G=Z/2Z).
In any case, the link with our famous sums is the expression
where max(G) is the set of conjugacy classes of (proper) maximal subgroups in G, ν(C), for any set of conjugacy classes is |C|/|G|, and H* is the set of conjugacy classes which intersect a given (conjugacy class of) maximal subgroup. For the case of Fpk, where conjugacy is trivial, this leads to the sums of the earlier post. This formula is not hard to prove: it is some example of inclusion-exclusion and the computation of the expectation of a geometric random variable.
(Probabilistically, it is a non-standard variant of a general “coupon collector problem”, where the types of coupons correspond roughly to the various conjugacy classes of maximal subgroups, and what is slightly unusual is that a single coupon, i.e., a single sample Xi, may succeed in giving many different types of coupons, namely all classes of maximal subgroups which do not intersect the conjugacy class of Xi).
Now, here I would have liked to give the value of c(W(E8)), and checked if it was or not significantly larger than 2, but unfortunately, GAP seems not to be up to the job on my laptop, and I don’t have access to Magma at the moment (which, I suspect, would be able to do it, since it was able to compute the set of maximal subgroups very quickly, as described in the two posts here and there concerning my paper with Jouve and Zywina).
So this series of posts is not finished, and I hope to come back with this value, and more, in another episode…
The unity of mathematics, preserved
If you were asked to name two domains of mathematics which are as far apart as possible today, so far as to be essentially disconnected and living witnesses of the impossibility of believing in the unity of mathematics, it is quite possible that the answer “Applied statistics” and “Category theory” would spring to mind.
But, lo and behold: two wonderful papers by Guy Romier in the Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, published in 1969, give the missing link between these two estranged cousins (see here and there; thanks are again due to the Numdam project for making the archives of many French mathematics journals available freely). There, the foundations of statistics experiments are established on the firm footing of a whole battery of categories; in particular (Définition 1 in the first paper, page 278), the “experimental structure” of of a statistical problem is defined as a subcategory of one of those basic categories, satisfying two axioms, one of which involves projective limits of some kind.
(I must say that I rather wonder how such papers were received in the statistics community; of course, my knowledge of statistics is essentially inexistent, my excellent probability teacher having decided to devote as much time as possible in his course to Brownian motion, but somehow I don’t think this approach has stuck…).
Combinatorial cancellation
Here is the example of sums exhibiting a large amount of combinatorial cancellation that I mentioned in a comment on my previous post. First, some notation: p is a prime number, k is a positive integer, and G is the set of hyperplanes in the vector space Fpk of dimension k over the finite field with p elements. Then consider
where, for any subset I of G, we let
(the dimension of the intersection of the hyperplanes in I).
The terms of this sum are of absolute value of size roughly 1, if p is large (since I is non-empty, the intersection is contained in any of the hyperplanes in I, so j(I)<k), but there are many of them, namely
so one may suspect that the overall sum is quite large.
And yet, it turns out that, for fixed k and p going to infinity, the sum converges to k. My explanation for this is rather indirect, and I will explain it in another post soon, but maybe some readers can see a direct reason?