The Chebotarev invariant, or Wait-and-See

So here’s the promised explanation for the sums I described in an earlier post about combinatorial cancellation. Those are particular cases of a formula for the evaluation of an invariant of a finite group G, in the case where G is Fpk. The definition of this invariant, in turn, is motivated by the approach to computing the Galois group of the splitting field of a polynomial which is based on finding some a-priori upper bound G, and then computing the Frobenius automorphisms modulo successive primes until one obtains this way enough conjugacy classes represented by the Galois group to conclude that it must coincide with G (of course, this can only work if the upper-bound was a correct guess…)

This method is not particularly convenient in general but it may be quite useful for theoretical purposes (it combines well with sieve methods, for instance, as first exploited by Gallagher), and also for particular cases where the computations turn out to work very well. F. Jouve, D. Zywina and myself succeeded in using it to construct our surf-shaped polynomial with Galois group W(E8), as I described earlier. For this, we only needed to look at two primes, and at that time we were somewhat surprised that it should have worked so well. The question is then: is this a justified surprise, or is it simply that it is intrinsically “easy” to check that a subgroup of W(E8) is the whole group by this method?

To analyze this, one is led rather naturally to investigate the properties of the following probabilistic model: the finite group G being given and fixed, consider a sequence (Xn) of independent random variables, with values in G, each uniformly distributed. Then define the “waiting time”

\tau_G=\min\{n\geq 1\ |\ X_1^*,\ldots, X_n^*\ generate\ G\}

where the superscript * indicates that we look at the conjugacy classes, and where a set C of conjugacy classes generates G if and only if there is no proper subgroup H of G which intersects each class in C.

(Note that this notion requires a little care: it is not even entirely obvious that the set of all conjugacy classes generates G, and indeed this is false for certain infinite groups, like SU(n), where every matrix can be diagonalized, i.e., the subgroup of diagonal matrices intersects every conjugacy class; but it is well-known to be the case for a finite group, by an easy counting argument).

The waiting time is somewhat unwieldy: it is a random variable defined on an unspecified probability space. To get a more concrete invariant, consider its expectation:

c(G)=\mathbf{E}(\tau_G)

This is what I call the “Chebotarev invariant”, because of the arithmetic motivation: one may expect that, for a given Galois extension K/Q, the Frobenius conjugacy classes σp modulo primes are “close”, in some sense, to the probabilistic model of a sequence of conjugacy classes associated to independent random variables as above. Indeed, the Chebotarev density theorem says that, at least, the frequency of apparition is consistent with the Law of Large Numbers:

\lim_{X\rightarrow +\infty}\frac{1}{\pi(X)}|\{p\leq X\ |\ \sigma_p=C \}|=\frac{|C|}{|G|}

for any conjugacy class C in G.

So one may hope that c(G) is close, usually, to the number of primes needed to conclude that the Galois group is G and not a proper subgroup of it. (Note however that for an individual extension, trying to quantify how quickly individual conjugacy classes appear as Frobenius conjugacy classes leads immediately to open questions which are intricately linked with the Generalized Riemann Hypothesis, even for the simplest case G=Z/2Z).

In any case, the link with our famous sums is the expression

c(G)\ \ \ \ \ =\ \ \ \ \sum_{\emptyset\not=I\subset \max(G)}{\frac{(-1)^{|I|+1}}{1-\nu(\cap_{H\in I}{H^*})}}

where max(G) is the set of conjugacy classes of (proper) maximal subgroups in G, ν(C), for any set of conjugacy classes is |C|/|G|, and H* is the set of conjugacy classes which intersect a given (conjugacy class of) maximal subgroup. For the case of Fpk, where conjugacy is trivial, this leads to the sums of the earlier post. This formula is not hard to prove: it is some example of inclusion-exclusion and the computation of the expectation of a geometric random variable.

(Probabilistically, it is a non-standard variant of a general “coupon collector problem”, where the types of coupons correspond roughly to the various conjugacy classes of maximal subgroups, and what is slightly unusual is that a single coupon, i.e., a single sample Xi, may succeed in giving many different types of coupons, namely all classes of maximal subgroups which do not intersect the conjugacy class of Xi).

Now, here I would have liked to give the value of c(W(E8)), and checked if it was or not significantly larger than 2, but unfortunately, GAP seems not to be up to the job on my laptop, and I don’t have access to Magma at the moment (which, I suspect, would be able to do it, since it was able to compute the set of maximal subgroups very quickly, as described in the two posts here and there concerning my paper with Jouve and Zywina).

So this series of posts is not finished, and I hope to come back with this value, and more, in another episode…

The unity of mathematics, preserved

If you were asked to name two domains of mathematics which are as far apart as possible today, so far as to be essentially disconnected and living witnesses of the impossibility of believing in the unity of mathematics, it is quite possible that the answer “Applied statistics” and “Category theory” would spring to mind.

But, lo and behold: two wonderful papers by Guy Romier in the Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, published in 1969, give the missing link between these two estranged cousins (see here and there; thanks are again due to the Numdam project for making the archives of many French mathematics journals available freely). There, the foundations of statistics experiments are established on the firm footing of a whole battery of categories; in particular (Définition 1 in the first paper, page 278), the “experimental structure” of of a statistical problem is defined as a subcategory of one of those basic categories, satisfying two axioms, one of which involves projective limits of some kind.

(I must say that I rather wonder how such papers were received in the statistics community; of course, my knowledge of statistics is essentially inexistent, my excellent probability teacher having decided to devote as much time as possible in his course to Brownian motion, but somehow I don’t think this approach has stuck…).

Combinatorial cancellation

Here is the example of sums exhibiting a large amount of combinatorial cancellation that I mentioned in a comment on my previous post. First, some notation: p is a prime number, k is a positive integer, and G is the set of hyperplanes in the vector space Fpk of dimension k over the finite field with p elements. Then consider

\sum_{\emptyset\not=I\subset \mathbf{G}}{\frac{(-1)^{|I|+1} }{1-p^{j(I)-k}}}

where, for any subset I of G, we let

j(I)=\mathrm{dim} \bigcap_{H\in I}{H}

(the dimension of the intersection of the hyperplanes in I).

The terms of this sum are of absolute value of size roughly 1, if p is large (since I is non-empty, the intersection is contained in any of the hyperplanes in I, so j(I)<k), but there are many of them, namely

2^{|\mathbf{G}|}-1=2^{p^k+\cdots+p+1}-1,

so one may suspect that the overall sum is quite large.

And yet, it turns out that, for fixed k and p going to infinity, the sum converges to k. My explanation for this is rather indirect, and I will explain it in another post soon, but maybe some readers can see a direct reason?

A charming little textbook

Like many others, the library of the mathematics department in Bordeaux used, every once in a while, to put in a bin somewhere duplicate copies of journals for anyone to take home; more rarely books also appeared there. This way, I found one some day which I find particularly fascinating.

It is not a research book, but an old French textbook for the program of the last year of high-school (the “Terminale”), published in 1971. It is part of a series probably encompassing the whole high-school curriculum. It is written by M. Debray, M. Queysanne, and D. Revuz (yes, the one of Revuz-Yor).

More properly, this is only Volume 1 of the Terminale textbook, since there exist two additional ones (which I do not have). The main topics of this first volume are “Nombres, probabilités” (numbers, probability theory), and one can see what the two others treated by looking at the official curriculum for the Terminale, which is reproduced at the beginning of the text: one was for analysis (basic calculus) and the other for geometry.

So the first striking impression is that of sheer size: if the other two volumes are of comparable length, this amounts to about 1000 pages for a single year of study, and this must have been combined with classes in French, Philosophy, Physics and Chemistry, History and Geography and Biology, and at least one foreign language, with Latin as a bougie sur le gâteau for quite a few students (probably; I can’t seem to find the general official program for 1971).

So what did French students know when finishing high-school in 1971? Before going into this, let’s look at what they were already supposed to know when entering the Terminale (as indicated by various Rappels de première in the text). The first chapter is entitled Entiers naturels (integers), which is reasonable enough; Section 1.1 is

Éléments réguliers pour une loi de composition interne. (“Regular elements for an internal operation”).

So an internal operation on a set is a concept taken for granted. Later, we learn that binary relations are just as old hat (hence also equivalence relations and quotient sets), and similarly for vector spaces (over the reals; not necessarily finite-dimensional). We learn that the study of the reals was, in Première, based on the Propriété des segments emboîtés (the fact that a countable decreasing intersection of non-empty compact intervals is non-empty), but it is taken anew now with the axiom that a non-empty set of reals which is bounded above as a least upper bound.

Then here are some highlights of the book:

  • In agreement with the official curriculum, the most important structures (groups, rings, vector spaces, and homomorphisms) are pointed out as frequently as possible whenever they occur;
  • The complex numbers are constructed as a subring of the 2×2 matrices over the reals (the product of such matrices being another remnant of the previous year);
  • The rationals and the integers are reconstructed as subrings of the reals, and shown to satisfy all the properties previously proved about them from the preliminary chapters where they were studied independently;
  • But the construction of Z from the integers, by a good old Grothendieck group construction, is relegated to an Appendix;
  • Is N really taken for granted? No, another Appendix sketches the construction of the (nonnegative) integers from set theory (!), through cardinals; this is for “interested readers”;
  • Limits are (of course) presented in clean and robust ε / δ fashion.

One may get from this the impression of Bourbakism running rampant among schoolchildren, but I confess to having selected these topics just to give such an impression. In fact, the text is not at all a distilled version of Nicolas’s great treatise for younger readers, but its influence is probably present in two respects: an emphasis on structures (and homomorphisms), and on rigor. Where things are not proved, this is stated clearly: nous avons admis les principales propriétés de… (“We have assumed valid the main properties of…”).

The style itself is much more readable for a textbook: most new definitions come after a short description of why they might be interesting or natural, often together with a preliminary example, which may then be the subject of a quick follow-up after the definition itself. In addition, there are many excellently chosen exercises, ranging from direct computational ones to more adventurous escapades (the arithmetic-geometric mean, for instance).

But more importantly perhaps, there is not only an emphasis on abstraction and definitions: the discussion of the reals contains a detailed investigation of approximations (decimal expansions, interval arithmetic, relative and absolute errors), and there is a really nice probability chapter. There, the definitions, as before, are rigorous and correspond to the general theory: although the theory is of course built on finite sets, there are tribus (“σ-algebras”), random variables, distribution laws and probability measures, and this chapter concludes with a version of the weak law of large numbers.

In the end, I get a mixed feeling from reading this book. It seems a fair guess that, in its time and for its intended audience, it must have been a complete disaster: the amount of the material to assimilate, and its abstract sophistication, must have made it a subject of hatred (or of jest…) for most students. But, as a “platonic” incarnation of the concept of a mathematical textbook, independently of its time and place, it is really not bad. If one or more of my children, one day, develop a taste for mathematics, and if they start looking eagerly, at a young and impressionable age, for almost any book to assuage their thirst for mathematics (as I did, and as most other mathematicians have done and still do), then at a reasonable time I will give them this textbook with pleasure.

(Note: I do not know at what rate the curriculum changed; I myself remember learning equivalence relations well before high-school, since vectors in the plane were introduced as equivalence classes of “bipoints équipollents”, but by the time I was finishing high-school, around 1986-87, they had been removed, though I remember that my mathematics teacher, in the first post-high-school year, started using them during the very first lesson without so much as a word of explanation… Nowadays, judging from what the students know when entering the first semester of university in Bordeaux, the program has probably been reduced by at least 90 percent, and all abstraction has vanished).

Another bad reference, and another exercise

My name turns out to be involved in another case of problematic bibliographic reference, but in contrast with the previous case, I am completely blameless. The book Mathematical Constants turns out to contain, as Added in Press, the statement that the “probability” that two integers belonging to the ring Z[j] (where j is a primiteve root of unity of order 3) are coprime is equal to

\frac{6}{\pi^2 H}

where

H=\sum_{k\geq 0}{\Bigl(\frac{1}{(3k+1)^2}-\frac{1}{(3k+2)^2}\Bigr)}

(probability is in the usual not-quite-strictly-correct sense of looking at the limit of the proportion in larger and larger balls or boxes centered at 0). For this, a reference is given to a putative unpublished note of mine entitled “Coprimality and squarefreeness within quadratic fields”.

However, this note does not exist, and never did! The only reason for this (somewhat casual) claim is that I had explained how to prove this (it is really an exercise, in line with this one) on the NMBRTHRY mailing list back in 2003.