Finally Sage!

In previous posts (such as this old one), I’ve mentioned a few computer programs that I use fairly frequently, but one particularly interesting one was missing: SAGE, which packages together many different pieces of scientific software and libraries behind a convenient interface built around the Python language. The reason was that SAGE can be quite a bit more difficult to install than, for instance, GAP or Pari/GP (both of which it actually includes). Or at least, it was for me: my current laptop is quite nice but also quite recent, and so is Fedora 9, which is my Linux distribution. Some SAGE binary packages exist for Fedora 8, but not for Fedora 9 at the moment, and unfortunately there is a subtle packaging issue of some kind with some cryptographic routines that prevents the Fedora 8 package from working. So the current solution is to download and compile SAGE from source. But here another problem arises: the Core 2 CPU is misdetected by one of the packages involved in today’s version of SAGE (the ATLAS linear algebra package, version 3.8.1), which sees it as some kind of freaky Pentium III, and then takes forever to build.

As is often the case with open source software, the solution is not hard to find after a bit of searching. In this case, the issue is known, as is the workaround: commenting out two lines in one of the ATLAS source files, and modifying a third one. Once this was done, the rest of the build process worked perfectly. However, one must notice (and not forget, contrary to what I first did) that the ATLAS build process used by SAGE starts by installing and working from a pristine copy of the source code, contained in a compressed file, which will therefore erase the patched version of the delinquent code if one is not careful…

What must really be done is to create a new archive containing the modified file. In case anyone else has the same problem and finds this post, here is a copy of such an archive: it is the file $SAGE_DIR/spkg/standard/atlas-3.8.1.p3.spkg, valid for release 3.0.6 of SAGE (where $SAGE_DIR is the base directory where the SAGE files are found), and it should be copied (with that name) to that location; this is a 2.8MB file.

(All this will soon be obsolete, of course; within a month at most, the necessary patch will have found a permanent place in all the required software.)

Now, this being done, I have started reading the documentation in order to actually use SAGE. Learning it is made easier by the concrete project of adapting the GAP script I have been using to compute Chebotarev invariants (at least, I find this type of manageable project a good way to adapt to a new programming environment).

For illustration, here is the resulting SAGE program:

def chebotarev(G):
    # Transfer the group to GAP and precompute what we need: the order,
    # the maximal subgroups (up to conjugacy) and the conjugacy classes
    # with their sizes.
    G1 = gap(G)
    g = G1.Order()._sage_()
    M = G1.ConjugacyClassesMaximalSubgroups()
    M1 = [m.Representative() for m in M] # Python list of maximal subgroups
    C = G1.ConjugacyClasses()            # GAP list
    O = [c.Size()._sage_() for c in C]   # Python list of class sizes
    print O
    print "Done precomputing"

    cheb = 0.0
    scheb = 0.0
    # ar[i,j] records whether the (i+1)-th conjugacy class meets the
    # j-th maximal subgroup (equivalently, the union of its conjugates).
    ar = matrix(len(C), len(M1))
    # C is a GAP list, hence its indices start at 1.
    for i in range(1, len(C)+1):
        for j in range(len(M1)):
            if C[i].Intersection(M1[j]) != gap([]):
                ar[i-1, j] = 1

    # Inclusion-exclusion over all non-empty subsets of maximal subgroups.
    for S in subsets(range(len(M1))):
        if len(S) != 0:
            # density: number of elements lying in (a conjugate of)
            # every subgroup indexed by S
            density = 0.0
            for j in range(1, len(C)+1):
                if all(ar[j-1, x] == 1 for x in S):
                    density += O[j-1]
            cheb += (-1)^(len(S)+1)/(1-density/g)
            scheb += (-1)^(len(S))/(1-density/g)*(1-2/(1-density/g))
    return cheb, scheb

(Since this is actually Python code, remember that the indentation is significant if you copy the script.)
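As a quick illustration of how one would call this (my own hypothetical usage, not from the original post; the numerical output depends on the group, so I don’t reproduce any here):

# Hypothetical usage: any Sage group accepted by the GAP interface will do.
G = SymmetricGroup(4)
cheb, scheb = chebotarev(G)
print cheb, scheb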

Compared with the earlier script, notice that while the “mathematical heart” of the computations is still done in GAP (the maximal subgroups and the conjugacy classes are computed in GAP from SAGE), the logic is written in the Python language. This has a number of advantages in this case: (1) it is easy to loop over all subsets of the maximal subgroups (GAP doesn’t have a command to get all subsets of a finite set; I had to implement it by interpreting subsets as bit patterns of integers, as sketched below, and the annoyance of this was compounded by the absence of a bitwise “and” operator in GAP…); (2) it is easy to get a real approximation instead of a rational number with an enormous denominator (the Chebotarev invariants are rational numbers of this type as soon as the group is not ridiculously small); GAP doesn’t implement any floating point operations, and earlier I transferred the rationals to Pari/GP to get such real approximations… (Note that Magma does not have the two defects mentioned here, and as I said before, it is currently more efficient for some of these computations.)
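For the record, here is a minimal Python sketch (my reconstruction, not the original GAP code) of the bit-pattern trick mentioned in (1); Python’s bitwise operators make it painless:

def bit_subsets(S):
    # Enumerate all subsets of the list S by interpreting the integers
    # 0, 1, ..., 2^len(S)-1 as bit patterns selecting elements of S.
    n = len(S)
    for mask in range(2**n):
        yield [S[i] for i in range(n) if mask & (1 << i)]

# list(bit_subsets([1, 2, 3])) returns all 8 subsets of {1, 2, 3}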

Searching for numbers

In my last post, I mentioned finding the significance of the Niven constant

N=1+\sum_{k\geq 2}{(1-\zeta(k)^{-1})}=1.70521114\ldots

using Google. In fact, I searched first for “0.70521”, then for “2.70521” (which is the value of real interest in my case), and last for “1.70521”.

Yesterday, I had the apparently subtle insight that, since the integral part of the constant was unclear, I should search for “.70521” instead. But it doesn’t work! Instead of prominently showing pages with the Niven constant, we get a lot of information on the 70521 ZIP code and other irrelevant results. Increasing the precision with 705211 does not help (one gets various types of objects with component number 705211, from light bulbs to UV filters). Going to 7052111 gives a number of patents and transaction numbers, 70521114 is similar, and 705211140 gives… nothing. (Until this post appears in the Google index, of course — this effect has been noticed already by I. Lugo.)

To get the Niven constant, one must therefore search correctly, and not too precisely. It starts working at 1.70521, which is the precision I had chosen (randomly) to see if my constant was already known. Amusingly, searching for 1.705211 finds a reference to an Advanced Problem in the American Math. Monthly in 1975, which asks precisely for a proof of the result of Niven, going back to 1969, where the constant first occurred. (It should also find the MathWorld page on the Niven constant, where the value is given with this precision, but there it is hidden in an image containing the formula, and Google apparently does not index the ALT text of the image tag.) But from 1.7052111 on, it is all terra incognita.

I wonder whether this “optimal” precision is typical when searching for other constants on the web. For e and π, we can certainly go much further, but what about other, less glamorous, numbers?

More on the Chebotarev invariant (and the field with 6 elements)

Continuing the series of previous posts on what I called the Chebotarev invariant, here are some (easy) results on cyclic groups, which are certainly the simplest groups to consider!

Starting from the formula for c(G)

c(G)=\sum_{\emptyset\not=I\subset \max(G)}{\frac{(-1)^{|I|+1}}{1-\nu(\cap_{H\in I}{H^*})}}

we obtain, by a straightforward translation between maximal subgroups of Z/nZ and prime divisors of n (the subgroup corresponding to some p|n being given by pZ/nZ), the formula

c(\mathbf{Z}/n\mathbf{Z})=-\sum_{2\leq d\mid n}{ \frac{\mu(d)}{1-d^{-1}}},

which is not without some interest: it is both familiar-looking (a sum over divisors with a Möbius function involved), yet subtly different (the factor 1/(1-1/d) is not multiplicative in d).
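As a sanity check (a small sketch of my own, not part of the original post), this divisor sum is immediate to evaluate in SAGE:

# Minimal Sage sketch: evaluate the divisor-sum formula for c(Z/nZ).
def cheb_cyclic(n):
    return -sum(moebius(d)/(1 - 1/d) for d in divisors(n) if d > 1)

# e.g. cheb_cyclic(6) gives c(Z/6Z) as an exact rational number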

To go further, the natural step is to expand this last term in a geometric series, since each of the terms d^{-k} which then appear is multiplicative (this is the analogue of obtaining directly the formula explained in T. Tao’s comment on one of the earlier posts on combinatorial cancellation); exchanging the resulting sums and transforming the inner sums using multiplicativity, we get

c(\mathbf{Z}/n\mathbf{Z})=\sum_{k\geq 0}{\Bigl(1-\prod_{p\mid n}{(1-p^{-k})}\Bigr)}

which is rewritten most conveniently by isolating the first two terms:

c(\mathbf{Z}/n\mathbf{Z})=2-\frac{\varphi(n)}{n}+\sum_{k\geq 2}{\Bigl(1-\prod_{p\mid n}{(1-p^{-k})}\Bigr)}

and the point is that a trivial estimate follows

1\leq c(\mathbf{Z}/n\mathbf{Z})\leq 2+\sum_{k\geq 2}{\Bigl(1-\frac{1}{\zeta(k)}\Bigr)}

where the last series is convergent.

In fact, this last series, plus 1 (which can be considered to be the natural term k=1 since the Riemann zeta function has a pole at s=1), is a “known” constant, called the Niven constant N: its defining property is that it is the average value of the maximal power of a prime dividing n:

N=\lim_{X\rightarrow+\infty}{\frac{1}{X}\sum_{1\leq n\leq X}{\alpha(n)}}

where

\alpha(n)=\max\{r\ \mid\ p^r\ \text{divides}\ n\ \text{for some prime}\ p\}

It is more or less obvious that the upper estimate above is best possible: if we consider n given by the product of the first k primes, and let k tend to infinity, we get a sequence of integers where c(Z/nZ) converges to 1+N.

The appearance of N here is, a posteriori at least, easy to explain: it is because

1-\frac{1}{\zeta(k)}

is both the probability that a “random” vector in Z^k is not primitive (i.e., its coordinates are not coprime in their totality; this is what lies behind our computation), and the probability that a “random” integer is not k-th power free (i.e., it is divisible by a k-th power; this is why N appears in the average of α(n)).

The numerical value of N is very easy to compute (using Pari for instance):

N=1.7052111401053677642885514534345081585\ldots
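To reproduce this (a sketch of my own, along the lines of the Pari computation just mentioned), one can truncate the series at a point where the tail, which decays roughly like 2^{-k}, is negligible:

# Minimal Sage sketch: N = 1 + sum_{k >= 2} (1 - 1/zeta(k)), truncated.
# The k-th term is about 2^(-k), so 150 terms suffice for ~38 digits.
R = RealField(150)
N = R(1) + sum(1 - 1/zeta(R(k)) for k in range(2, 150))
print N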

(and to show how mathematical knowledge has degenerated, I did not recognize the value N from a well-rounded knowledge of analytic number theory, but by searching for 0.70521, 1.70521 and 2.70521 with Google…)

So, the conclusion is that, for a cyclic group with many subgroups, we may need on average close to 3 random elements before we can conclude that they generate the whole group. Whether that is surprising or not will of course depend on the reader.

Now, it is natural to go somewhat further if we consider the definition of the Chebotarev invariant as the expectation of a random variable. As is emphasized in any probability theory course, the expectation is very far from sufficient to get a good idea, in general, of a random variable: the basic question is whether its values are rather concentrated around the average, or whether instead they tend to spread out a lot. A much better feel for the variable (though still allowing a lot of variation among all possible distributions) arises if the variance V (or the second moment M_2) is known; recall that

M_2(X)=\mathbf{E}(X^2),\ V(X)=\mathbf{E}(X^2)-\mathbf{E}(X)^2=M_2-\mathbf{E}(X)^2.

One can compute, in principle, the second moment of the waiting time whose expectation is the Chebotarev invariant by a formula similar to the one described above. In the special case of a cyclic group, and assuming I did not make a mistake in the computation (which I have yet to check by consulting probabilists), we get

c_2(\mathbf{Z}/n\mathbf{Z})=\sum_{1<d\mid n}{\sum_{1<e\mid n}{\frac{\mu(d)\mu(e)}{1-[d,e]^{-1}}\Bigl(\frac{1}{1-d^{-1}}+\frac{1}{1-e^{-1}}-1\Bigr)}}

(where I write c_2(G) for the second moment, and where [d,e] is the lcm of d and e).

This sum is not much more difficult to approach than the previous one, and after some computations it reduces to a single divisor sum

c_2(\mathbf{Z}/n\mathbf{Z})=\sum_{1<d\mid n}{\frac{\mu(d)}{1-d^{-1}}\Bigl(1-\frac{2}{1-d^{-1}}\Bigr)}

from which the estimates

1\leq c_2(\mathbf{Z}/n\mathbf{Z})\leq 4+\sum_{k\geq 2}{(2k+1)\Bigl(1-\frac{1}{\zeta(k)}\Bigr)}

follow; as before, they are best possible, but this time the new constant

N_2=3+\sum_{k\geq 2}{(2k+1)\Bigl(1-\frac{1}{\zeta(k)}\Bigr)}=7.7117246805241021285577922472877917633\ldots

does not seem to have appeared before — at least, according to Google.
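Here again is a small check of my own (not in the original post): the truncated series for N_2, together with the single divisor sum for c_2(Z/nZ):

# Minimal Sage sketch: the constant N_2, truncated as for N above,
# and the second moment c_2(Z/nZ) as a single divisor sum.
R = RealField(150)
N2 = R(3) + sum((2*k+1)*(1 - 1/zeta(R(k))) for k in range(2, 150))
print N2

def cheb2_cyclic(n):
    return sum(moebius(d)/(1 - 1/d)*(1 - 2/(1 - 1/d))
               for d in divisors(n) if d > 1)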

All this gives a variance, in the case of n with many small prime factors, which is close to

(1+N_2)-(1+N)^2=1.3935573679739184290461227489481785625\ldots

(hence a standard deviation around 1.18, which is therefore not at all negligible compared with the expectation).

Now what about the field with 6 elements? Well, the point is that, both for the Chebotarev invariant and for the secondary one, the expression as a divisor sum is an inclusion-exclusion (or sieve) formula over the non-trivial subgroups of Z/nZ, where the terms corresponding to each d are the same as if we computed the corresponding invariant for a hypothetical group with d elements having no subgroups except the trivial one and the whole group, as is the case for the additive groups of finite prime fields. So for n=6, things behave in this respect as if a field with 6 elements existed. (Yes, this is far-fetched…)

Finding life beyond the Central Limit Theorem

One of my son’s favorite nature documentaries explains how deep-sea explorers have found there, within the last few decades, forms of life which are completely different from anything seen on the surface of the earth, or at more accessible depths in the ocean.

Now, in this spirit of adventure, I will discuss a recent preprint of J. Jacod, A. Nikeghbali and myself, where we try to find meaningful behavior in probabilistic phenomena “beyond the standard limit theorems”, in particular beyond the Central Limit Theorem. Moreover (and this is where, as a number theorist, I find it most interesting), we find traces of these forms of life in a number of arithmetic situations. One was expected: it involves the analogy between Random Matrices and the Riemann zeta function, and it was the basic motivation behind this work (already explored previously, in a slightly different way, by A. Nikeghbali and M. Yor); the second was also natural: it is the finite field analogue of the previous one, where the Katz-Sarnak philosophy and techniques can be applied; and the third one was a little more unexpected: it lies in the context of the Erdös-Kác Theorem on the approximate normal distribution of the number of prime divisors of integers.

As often happens in mathematics, this example, which we noticed last, is rather simpler to introduce and explain than the others, so I’ll start with it. The Erdös-Kác Theorem states that for any real numbers a and b, with a<b, we have

\lim_{N\rightarrow +\infty}\ \frac{1}{N}|\{n\leq N\ \mid\ a<(\omega(n)-\log\log N)/\sqrt{\log\log N}<b \}|=\frac{1}{\sqrt{2\pi}}\int_a^b{e^{-t^2/2}\,dt},

where ω(n) is the number of distinct prime divisors of n. This is phrased, informally, as saying that ω(n) is asymptotically distributed like a normal random variable with mean and variance both equal to (log log n), but more properly it is really the convergence in law of a normalized version

\frac{\omega(n)-\log\log N}{\sqrt{\log\log N}}\quad\text{for}\ n\leq N

of ω(n) to a standard normal random variable, with mean 0 and variance 1. As such, this looks similar to the Central Limit Theorem, and there are indeed proofs which proceed explicitly using this idea (see this for instance).
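To make this concrete (a numerical illustration of my own, not from the paper or the post; note that the convergence is extremely slow, since the normalization involves log log N), one can compare the empirical proportion with the Gaussian mass in SAGE:

# Sketch: empirical Erdos-Kac count for n <= N versus the Gaussian mass.
N = 10**5
a, b = -1, 1
L = log(log(N)).n()
cnt = sum(1 for n in range(2, N+1)
          if a < (len(factor(n)) - L)/sqrt(L) < b)
print cnt/float(N)
print numerical_integral(exp(-x^2/2)/sqrt(2*pi), a, b)[0]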

Our main point is that, in this example and in others, one can extract meaningful behavior without doing the renormalization. And this is where a new type of convergence is necessary, since ω(n) in itself does not have a limiting distribution (when looked at in the same sense as implicitly above: taking n<N with N going to infinity).

This new type of convergence is expressed in a form involving the characteristic function (i.e., Fourier transform) of random variables. For the special case of ω(n), it is the following formula: there exists a continuous function Φ(u), defined for all real numbers u, with Φ(0)=1, and real numbers λ_N such that the limit

(\diamond)\qquad \lim_{N\rightarrow +\infty}\ \exp(\lambda_N(1-e^{iu}))\ \frac{1}{N}\sum_{n\leq N}{\exp(iu\omega(n))}=\Phi(u)

exists for all u, uniformly on compact subsets. In fact, this is proved with an explicit formula for the limiting function, and with

\lambda_N=\log\log N.
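One can watch this limit forming numerically (again a sketch of my own; the drift is very slow, since λ_N = log log N grows so slowly):

# Sketch: evaluate the left-hand side of (\diamond) at a fixed u in Sage.
from cmath import exp as cexp

def mod_poisson_lhs(N, u):
    lam = float(log(log(N)))
    S = sum(cexp(1j*u*len(factor(n))) for n in range(2, N+1))/N
    return cexp(lam*(1 - cexp(1j*u)))*S

# Comparing mod_poisson_lhs(10**4, 1.0) with mod_poisson_lhs(10**6, 1.0)
# should show the values drifting slowly towards Phi(1.0).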

From this result, an easy argument with normalization shows that the normalized characteristic functions

\frac{1}{N}\sum_{n\leq N}{\exp(iu(\omega(n)-\log\log N)/\sqrt{\log\log N})}

do converge to that of a standard normal random variable, and this recovers exactly the Erdös-Kác theorem, using the characterization of convergence in law by means of characteristic functions (P. Lévy’s theorem).

The first factor of the left-hand side of the expression (◊) is the inverse of the characteristic function of a Poisson random variable with parameter λ_N, in other words of a random variable X_N such that

\mathbf{P}(X_N=k)=e^{-\lambda_N}\frac{\lambda_N^k}{k!}

for non-negative integers k. So, heuristically moving this factor to the other side, we have approximately

\frac{1}{N}\sum_{n\leq N}{\exp(iu\omega(n))}\simeq \exp(\lambda_N(e^{iu}-1)) \Phi(u)=\mathbf{E}(e^{iuX_N})\Phi(u)

and using the basic properties of characteristic functions, things behave as if ω(n), for n<N, were a random variable which is the sum of the Poisson variable X_N with parameter λ_N (which tends to infinity) and some (independent) term which has a limit as N goes to infinity. However, this is not literally true, because Φ(u) is not the characteristic function of a random variable. But because of this intuition, we say that there is mod-Poisson convergence, i.e., there is convergence, intuitively, modulo addition of Poisson random variables. (Note that this is also not literally correct because the set of Poisson random variables defined on a probability space is not a vector space, so the “modulo” has to be taken with a grain of salt.)

This example gives the paradigm of what we may want to look at: starting with a limit theorem (in distribution) which involves some random variables (Y_n) of interest, after the normalization

Z_n=\frac{Y_n-\mathbf{E}(Y_n)}{\sqrt{\mathbf{V}(Y_n)}}

to have mean 0 and variance 1, we ask whether a new, stronger limit theorem exists, where the normalization is avoided, but where the characteristic function of Y_n is not considered “bare” but is divided by other suitable characteristic functions (which compensate for the possibly increasing variance of the original random variables in a different way than by looking at the standard normalization Z_n).

In the case of the number of prime divisors, those were of Poisson type. But in fact, the first case where this behavior was explicitly noticed involved Gaussian random variables (though it is probable that, historically, the case of ω(n) was the first one: the computation of the limit above, but not its interpretation, is already contained in the proof of the Erdös-Kác theorem due to Rényi and Turán; nowadays, it is most easily obtained by a direct application of the Delange-Selberg method).

This other example is a reinterpretation of a famous formula due to J. Keating and N. Snaith, in the setting of Random Matrix theory: let Y_n be a random variable distributed like

\det(1-A)

where A is a random unitary matrix distributed according to Haar measure on the compact unitary group U(n) of size n; then, it follows from the work of Keating-Snaith that the following formula holds for all real u:

\lim_{n\rightarrow +\infty}\ \ \exp(u^2\log n) \mathbf{E}(\exp(iu\log |Y_n|^2))=\frac{G(1+iu)^2}{G(1+2iu)},

where G is the Barnes (double gamma) function. Here, the first factor on the left-hand side is the inverse of the characteristic function of a normal random variable with mean 0 and variance 2 log n, and so we speak of mod-Gaussian convergence. The variance is increasing, so the heuristic idea is that log |Y_n|^2 is distributed like a Gaussian, which is more and more widely spread out, plus some independent converging “core”.
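This too can be checked by simulation (a Monte Carlo sketch of my own, assuming scipy’s Haar sampler unitary_group is available; the Barnes function is not evaluated here, only the stabilization of the left-hand side is visible):

# Sketch: Monte Carlo estimate of exp(u^2 log n) E(exp(iu log|Y_n|^2)).
import numpy as np
from scipy.stats import unitary_group

def keating_snaith_lhs(n, u, samples=2000):
    vals = []
    for _ in range(samples):
        A = unitary_group.rvs(n)              # Haar-distributed in U(n)
        y = np.linalg.det(np.eye(n) - A)      # Y_n = det(1 - A)
        vals.append(np.exp(1j*u*np.log(abs(y)**2)))
    return np.exp(u**2*np.log(n))*np.mean(vals)

# keating_snaith_lhs(20, 0.5) and keating_snaith_lhs(50, 0.5) should be
# roughly comparable, approaching G(1+iu)^2/G(1+2iu) as n grows.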

There are by now many proofs of this formula, often stated in terms of the Mellin transform instead of the characteristic function, and again without the interpretation we give; see for instance the probabilistic proof of Bourgade, Hughes, Nikeghbali and Yor.

It is not yet entirely clear what the meaning and generality of the type of limiting phenomenon found in these two examples is. One can’t help noticing (and being encouraged by) the richness of the environment where it is found: in addition to the Random Matrix setting, the limiting function in the Keating-Snaith formula is one of the conjectural terms for moments of the Riemann zeta function on the critical line. In our preprint, we also prove quite general versions involving the equally rich and classical probabilistic setting of infinitely divisible distributions: I won’t say more because, for this part, I am much less competent than my co-authors.

We feel that there is a lot to be done to understand and explain these limit theorems, and hopefully that this form of convergence can also be directly useful to derive new results, in particular in arithmetic.

The Chebotarev invariant of W(E_8)

A quick follow-up to the previous post: D. Zywina has found the value of the Chebotarev invariant for the Weyl group of E_8, which is

c(W(E_8))= 4.19424758989300188767793949869\ldots

So you would typically expect to need to look at a bit more than four Frobenius conjugacy classes, on average, to check that a Galois extension with group contained in W(E_8) is not smaller. Since F. Jouve and I managed to do it with only two in our particular case, this means that we were indeed somewhat lucky.

(Thanks to Nguyen Ngoc Dong Quan for also proposing to do the computation with Magma).