Problems from the archive

(The title of this post is based on WBGO’s nice late-Sunday show “Jazz from the archive”, which I used to follow when I was at Rutgers; the physical archive in question was that of the Institute of Jazz Studies of Rutgers University; the post itself is partly motivated by seeing Gil Kalai’s examples of his own “early” problems…)

The following problem of classical analysis of functions of one complex variable has puzzled me for a long time (between 15 and 20 years), though I never really spent a lot of time on it — it has never had anything to do with my “real” research:

Let f,g be entire functions, and assume that for all r>0 we have
\max_{|z|=r}{|f(z)|}=\max_{|z|=r}{|g(z)|}
What can we say about f and g? Specifically, do there exist two real numbers a and b such that
g(z)=e^{ia}f(e^{ib}z)
for all z?

(Actually, the “natural” conclusion to ask for would be: does there exist a linear operator T, from the space of entire functions to itself, such that Tf=g and such that

\max_{|z|=r}{|T(\phi)(z)|}=\max_{|z|=r}{|\phi(z)|}

for all entire functions φ; indeed, the function g=Tf satisfies the condition above for any such “joint isometry” of the vector space of entire functions equipped with the sup norms on all circles centered at the origin; but I have the impression that it is known that any such operator is of the form stated above.)

My guess has been that the answer is Yes, but I must say the evidence is not outstandingly strong; mostly, the analogue is known when one uses L^2 norms on the circles, since

\frac{1}{2\pi}\int_0^{2\pi}{|f(re^{it})|^2dt}=\sum_{n\geq 0}{|a_n|^2r^{2n}}

if the a_n are the Taylor coefficients of f. If those quantities coincide with the corresponding ones for another entire function g (with Taylor coefficients b_n), we get by uniqueness of Taylor expansions that

|a_n|=|b_n|,\ so\ b_n=e^{i\theta_n}a_n

for all n, and the linear operator

\sum_{n\geq 0}{c_nz^n}\mapsto \sum_{n\geq 0}{c_ne^{i\theta_n}z^n}

is of course a joint isometry for the L^2 norms on all circles, mapping f to g.
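
(For readers who like to see such identities numerically, here is a quick Python check of this formula and of the resulting joint isometry; the coefficients, the radius and the truncation degree are arbitrary choices of mine, made only for illustration.)

# Numerical sanity check of the identity above for a power series truncated
# at degree 4, and of the fact that twisting the coefficients by phases
# preserves all the L^2 norms.
import numpy as np

a = np.array([1.0, -2.0, 0.5, 1.0j, 0.25])        # Taylor coefficients a_0..a_4
r = 0.7
t = np.linspace(0, 2*np.pi, 4000, endpoint=False)
f = sum(a[n] * (r*np.exp(1j*t))**n for n in range(len(a)))

lhs = np.mean(np.abs(f)**2)                        # (1/2pi) times the integral
rhs = sum(abs(a[n])**2 * r**(2*n) for n in range(len(a)))
print(lhs, rhs)                                    # the two values agree

theta = 2*np.pi*np.random.rand(len(a))             # arbitrary phases theta_n
b = a * np.exp(1j*theta)                           # twisted coefficients b_n
g = sum(b[n] * (r*np.exp(1j*t))**n for n in range(len(b)))
print(np.mean(np.abs(g)**2))                       # same value again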

The only result I’ve seen that seems potentially helpful (though I never looked particularly hard; it was mentioned in an obituary notice of the Bulletin of the LMS, I think, that I read rather by chance around 1999) is a result of O. Blumenthal — who is today much better known for his work on Hilbert modular forms, his name surviving in Hilbert-Blumenthal abelian varieties. In this paper from 1907, Blumenthal studies the structure of the sup norms in general, and computes them explicitly for (some) polynomials of degree 2. Although the uniqueness question above is not present in his paper, one can deduce by inspection that it holds for these polynomials at least.

(I wouldn’t be surprised at all if this question had in fact been solved around that period of time, when complex functions were extensively studied, in France and Germany in particular; that the few people to whom I have mentioned it had not heard of it is not particularly surprising, since this is one mathematical topic where what was once the height of fashion has now become very obscure.)

Independence of zeros of L-functions over function fields

My paper “The large sieve, monodromy, and zeta functions of algebraic curves, II: independence of the zeros” has just been published online by International Math. Research Notices (the title really should be shorter… but I wanted to emphasize the link with the earlier paper in the series, though that one didn’t know it would have a little brother, so its title does not mention that it is number 1). Note in passing that a paper is masculine for me simply because this is so in French (“un article, un papier”).

The motivation of this work is to try to understand a bit better a famous conjecture about the zeros of the Riemann zeta function, and about those of other L-functions more generally. In the “simplest” case of ζ(s), it is expected that the Riemann Hypothesis holds (of course), so that the non-trivial zeros (counted with multiplicity) can be written

\rho=\frac{1}{2}\pm i\gamma_n

where for convenience we list the ordinates in increasing order

0\leq \gamma_1\leq \gamma_2\leq \gamma_3\leq\cdots

(choosing an arbitrary ordering of multiple zeros, in case it happens that a zero is not simple). Then, the conjecture is that

\gamma_1,\ \gamma_2,\ \ldots,\ \gamma_n,\ \ldots

are Q-linearly independent. This implies in particular that all zeros are simple, but it is of course much stronger.

If you think about it, this may look like a strange and somewhat arbitrary conjecture. The point is of course that it does turn up naturally in various problems. I know of at least two instances:

(1) Ingham showed that it is incompatible with the conjecture

|\sum_{n\leq N}{\mu(n)}|<\sqrt{N}\ \ \ \ for\ all\ N\geq 2,

apparently stated by Mertens; this inequality was of some interest because it implies (very easily) the Riemann Hypothesis. Of course, Ingham’s result made it very doubtful that it could hold, and later Odlyzko and te Riele succeeded in proving that it does not. (Here, μ is the Möbius function). For further closely related recent results, see for instance the work of Ng.
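
(As a small aside, here is a brute-force Python check, with an arbitrary cutoff at 10^5, that the inequality does hold in the modest range accessible this way; this is of course no evidence at all, since the failures whose existence Odlyzko and te Riele proved occur far beyond anything reachable by such a computation.)

# Check |M(N)| < sqrt(N) for 2 <= N <= 10^5 by sieving the Moebius function.
def mobius_sieve(N):
    mu = [1] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(2 * p, N + 1, p):
                is_prime[m] = False
            for m in range(p, N + 1, p):
                mu[m] = -mu[m]            # one sign change per prime factor
            for m in range(p * p, N + 1, p * p):
                mu[m] = 0                 # not squarefree
    return mu

N = 10**5
mu = mobius_sieve(N)
M, ok = 0, True
for n in range(1, N + 1):
    M += mu[n]
    if n >= 2 and M * M >= n:             # i.e. |M(n)| >= sqrt(n)
        ok = False
        break
print(ok)                                  # True: no violation up to 10^5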

(2) The conjecture has been extended to state that all non-negative ordinates of zeros of primitive Dirichlet L-functions are Q-linearly independent; this was used (I don’t know if it had been introduced before) by Rubinstein and Sarnak in their very nice paper studying the Chebychev bias and generalizations of it. The Chebychev bias refers to the apparent fact that, usually, there are more primes p<X which are congruent to 3 modulo 4 than congruent to 1 modulo 4; it is a strange property, which is not literally true for all X, but should hold only in a sense of logarithmic density, as explained by Rubinstein and Sarnak. It has its unlikely source in the fact that the primes are best counted with a weight log p, and together with all their powers p^k; the counting function incorporating those two changes exhibits no bias, but moving from it to the counting function for primes leads to a discrepancy having to do with squares of primes, and, for any modulus q, to a recurrent excess of primes which are non-squares modulo q compared with those which are squares modulo q.
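
(Here is a tiny Python illustration of the bias, purely my own toy computation and not taken from the Rubinstein-Sarnak paper: it counts, for a few arbitrary cutoffs X, how many more primes below X are congruent to 3 than to 1 modulo 4.)

# Count primes p < X in the two residue classes modulo 4; the difference is
# usually positive, though famously not for every X.
from sympy import primerange

for X in (10**3, 10**4, 10**5, 10**6):
    counts = {1: 0, 3: 0}
    for p in primerange(3, X):
        counts[p % 4] += 1
    print(X, counts[3] - counts[1])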

The way I tried to understand this somewhat better is well established: look at what happens for zeta and L-functions over finite fields. There we have two advantages, which have already been used quite often: first, the Riemann Hypothesis is known (by the work of Deligne in general), and second, the L-functions are polynomials (in p^{-s}) with integral coefficients, instead of analytic functions with infinitely many zeros. In addition, these polynomials have a spectral interpretation: this is the basis of the Katz-Sarnak study of zeta functions and symmetry, motivated by the conjectures relating Random Matrix Theory and the zeta function.

The basic examples of zeta functions over finite fields are those of algebraic curves; to be concrete, say we have a squarefree polynomial h of degree 2g+1 and an odd prime p; we can then look at the curve C with affine equation

y^2=h(x)

over the field with p elements. Its zeta function (or rather that of the projective version of this curve) is defined by the formal power series

\exp\left(\sum_{n\geq 1}{(1+|C(\mathbf{F}_{p^n})|)T^n/n}\right)

involving the number of points on the curve over all finite fields of characteristic p. Then it is known that this zeta function is the Taylor expansion of a rational function given by

\frac{P_C(T)}{(1-T)(1-pT)}

where the polynomial P_C is of degree 2g (g is the genus of the curve), has integral coefficients, and can be factored as follows:

P_C(T)=\prod_{1\leq j\leq 2g}{(1-\alpha_jT)}

where

\alpha_j\alpha_{2g+1-j}=p,\ \ \ \ \ \ \ |\alpha_j|=\sqrt{p}.

The first identity corresponds to the functional equation, and the last fact is the Riemann Hypothesis in this context (to see why this is so, write T=p^{-s} and look at the real parts of the complex zeros s); it was first proved for curves by André Weil, though special cases go all the way back to Gauss!
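
(To make this concrete, here is a deliberately naive Python sketch for the genus 1 case; the prime and the polynomial are arbitrary choices of mine. It counts points by brute force, forms the numerator P_C(T)=1-aT+pT^2, and checks numerically that both inverse roots have modulus √p.)

# Brute-force illustration of the Riemann Hypothesis for a curve y^2 = h(x)
# of genus 1 over F_p (p and h are arbitrary choices).
import cmath

def affine_points(h_coeffs, p):
    """Count affine solutions of y^2 = h(x) over F_p, h given by its coefficients."""
    count = 0
    for x in range(p):
        hx = sum(c * pow(x, i, p) for i, c in enumerate(h_coeffs)) % p
        if hx == 0:
            count += 1                     # the single solution y = 0
        elif pow(hx, (p - 1) // 2, p) == 1:
            count += 2                     # hx is a nonzero square modulo p
    return count

p = 23
h = [1, 2, 0, 1]                   # h(x) = x^3 + 2x + 1, degree 2g+1 with g = 1
n1 = 1 + affine_points(h, p)       # projective points: add the point at infinity
a = p + 1 - n1                     # so that P_C(T) = 1 - a*T + p*T^2
disc = cmath.sqrt(complex(a * a - 4 * p))
alphas = [(a + disc) / 2, (a - disc) / 2]    # inverse roots of P_C
print([abs(al) for al in alphas], p ** 0.5)  # all three values coincide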

So what is the analogue of the Q-linear independence conjecture here? It is not hard to convince oneself that the right analogue is that, if we write

\alpha_j=\sqrt{p}\exp(2i\pi \theta_j)\ \ \ with\ \theta_j\in [0,1[

for all j, then the question is whether

(1,\theta_1,\ldots,\theta_g)

are (or not) Q-linearly independent. The restriction to only half the values of α is because of the “functional equation” recalled above that relates αj with α2g-j, while the appearance of 1, has to do with the “transfer” from multiplicative to additive forms of the variable.

Now, the point of my paper is that one can indeed show that this independence holds “for most curves”, in a certain sense. Such a restriction is necessary: it is perfectly possible to construct curves where the independence does not hold (I give various examples in the paper). Moreover, I show that one can look at independence involving roots of more than a single curve: so not only are the zeros independent, but they tend to be independent of those of other curves. This implies, in particular, that there is typically no bias between the number of points of one curve and another: if C and D are two “generically chosen” algebraic curves of the same genus over the same finite field with p elements, then

\lim_{N\rightarrow +\infty}{\frac{1}{N}|\{n\leq N\ |\ |C(\mathbf{F}_{p^n})|<|D(\mathbf{F}_{p^n})|\}|}=1/2

(in fact, properly normalized, the difference is asymptotically normally distributed).
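
(To see this numerically, here is a small simulation, entirely mine and only a model of the limiting behaviour: writing the point counts in terms of the angles θ_j as above, the comparison of |C(F_{p^n})| and |D(F_{p^n})| comes down to comparing the cosine sums attached to the angles of the two curves, and choosing the angles uniformly at random is a stand-in for “generically chosen” curves, since random angles are almost surely Q-linearly independent together with 1.)

# Model the comparison |C(F_{p^n})| < |D(F_{p^n})| for two "generic" curves of
# genus g: it amounts to comparing the cosine sums of their angles, which
# enter the point counts with a minus sign.
import numpy as np

rng = np.random.default_rng(0)
g, N = 3, 10**5
theta = rng.random(g)                    # model angles for C
phi = rng.random(g)                      # model angles for D

n = np.arange(1, N + 1)
sC = np.cos(2*np.pi*np.outer(n, theta)).sum(axis=1)
sD = np.cos(2*np.pi*np.outer(n, phi)).sum(axis=1)
print(np.mean(sC > sD))                  # the frequency is close to 1/2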

I will just say a few words about the techniques involved: as the title indicates, the basic tool is the large sieve (in a suitable form, introduced in the older brother paper), and the main arithmetico-geometric input comes from monodromy computations. In the cases I treat in detail, those come from an unpublished result of J-K. Yu, recently reproved by C. Hall. The main idea is to first show that one can deduce the desired independence statements for a given curve (or tuple of curves), provided the splitting field of the zeta function(s) is as large as possible. This part of the argument involves fairly elementary (but cute) theory of representations of finite groups (which goes back to results of Girstmair and others). Then one is reduced to showing that the maximality of splitting fields occurs most of the time, and this is provided by the sieve (qualitative forms of such a statement were first proved around 1994 by N. Chavdarov).

Actually, N. Katz pointed out (after the first version was written) that one could use, bypassing any consideration of the splitting field, the properties of the so-called Frobenius tori of Serre. This essentially means that the large sieve, per se, can be avoided, and replaced by an application of the uniform and effective Chebotarev density theorem. Still, I kept the original approach (in addition to discussing briefly this other method) for two reasons:

(1) It also leads quickly to the fact that the α_j themselves are Q-linearly independent (Frobenius tori cannot detect this condition); I don’t know any application of this fact, but it seems interesting in principle.

(2) Frobenius tori depend rather strongly on p-adic properties of the zeros; this means that the method does not work if, instead, we try to answer the following question: given a “random” element of SL(n,Z), do the roots of its characteristic polynomial satisfy any multiplicative relation? The “splitting field” technique, in suitable senses of the word “random” (e.g., random walks with respect to a finite generating set), can be made to show that generically they do not, using the large sieve technique developed in my recent book.

(3) OK, here’s a third reason that may not apply so much given the possible use of the Frobenius tori: N. Katz had asked a number of times (including, in a truly epiphanic moment for me, at the end of his lecture during the Newton Institute workshop on Random Matrices and L-functions in July 2004) about the possible analogue and meaning, for the usual L-functions over number fields, of the generic maximality of splitting fields of L-functions of algebraic curves, as then known from the qualitative results of Chavdarov; I thought that maybe this link with independence could be a partial answer — but whether it is or not is not so clear now.

Linear operators which you can write down are continuous

This is a follow-up to a comment I made on Tim Gowers’s post concerning the use of Zorn’s lemma (which I encourage students, in particular, to read if they have not done so yet). The issue was whether it is possible to write down concretely an unbounded (in other words, not continuous) linear operator on a Banach space. I mentioned that it had been explained to me a few years ago that this is not in fact possible, in the same sense that it is not possible to “write down” a non-measurable function: indeed, any measurable linear map

T\ :\ U\rightarrow V

where U is a Banach space and V is a normed vector space, is automatically continuous.

Incidentally, I had asked this question of my colleague É. Matheron in Bordeaux because it had arisen while translating W. Appel’s book of mathematics for physicists: in the chapters on unbounded linear operators (which are important in Quantum Mechanics), he had observed that those operators could often be written down, but only in the sense of a partially defined operator, defined on a dense subspace, and we wondered whether the dichotomy “either unbounded and not everywhere defined, or everywhere defined and continuous” was a real theorem or not. In the sense that measurability is much weaker than the (not well defined) notion of “concretely given”, it is indeed a theorem.
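
(The standard concrete illustration is differentiation: on the dense subspace of C^1 functions inside C([0,1]) with the sup norm, the operator of differentiation is perfectly explicit but unbounded, since

\|f_n'\|_\infty=n\ \ \ while\ \ \ \|f_n\|_\infty=1,\ \ \ \ for\ f_n(x)=x^n;

and by the automatic continuity statement above, any everywhere-defined linear extension of it to all of C([0,1]) (such extensions exist by Zorn’s lemma) must therefore fail to be measurable.)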

Not only did Matheron tell me of this automatic continuity result, he also gave me a copy of a short note of his (“A useful lemma concerning subseries convergence”, Bull. Austral. Math. Soc. 63 (2001), no. 2, 273–277), where this result is proved very quickly, as a consequence of a simple lemma which also implies a number of other well-known facts of functional analysis (the Banach-Steinhaus theorem, Schur’s theorem on the coincidence of weak and norm convergence for series in l^1, and a few others). On the other hand, I don’t know who first proved the continuity result (Matheron says it is well-known but gives no reference).

The proof is short enough that I will present it; it is a nice source of exercises for a first course in functional analysis, provided some integration theory has been seen before (which I guess is always the case).

Here is the main lemma, due to Matheron, in a probabilistic rephrasing and in a slightly weaker version:

Main Lemma: Let G be a topological abelian group, let (A_n) be an arbitrary sequence of measurable (Borel) subsets of G, and let (g_n) be a sequence of elements of G. Assume that for every n and every g in G, either g is in A_n, or g-g_n is in A_n.

Moreover, let there be given a sequence of independent Bernoulli random variables (X_n), defined on some auxiliary probability space.

Then, under the condition that the series

\sum_{n\geq 1}{X_n g_n}

converges almost surely in G, there exists a subsequence (h_n) of (g_n) such that

\sum_{n\geq 1}{h_n}

converges and belongs to infinitely many A_n.

This is probably not easy to assimilate immediately, so let’s give the application to automatic continuity before sketching the proof. First, we recall that Bernoulli random variables are such that

\mathbf{P}(X_n=0)=\mathbf{P}(X_n=1)=1/2.

Now, let T be as above, measurable. We argue by contradiction, assuming that T is not continuous. This implies in particular that for all n, T is not bounded on the ball with radius 2^{-n}, so there exists a sequence (x_n) in U such that

\|x_n\|<2^{-n},\ \text{and}\ \|T(x_n)\|>n.

We apply the lemma with

G=U,\text{ written additively},\ g_n=x_n,\ A_n=\{x\in U\ |\ \|T(x)\|>n/2\}.

The sets A_n are measurable, because T is assumed to be so. The triangle inequality shows that if x is not in A_n, then

\|T(x-x_n)\|=\|T(x_n)-T(x)\|\geq \|T(x_n)\|-\|T(x)\|>n-n/2=n/2

so that x-x_n is in A_n. (This shows where sequences of sets satisfying the condition of the Lemma arise naturally).

Finally, the series formed with the x_n is absolutely convergent by construction, so any series “twisted” with Bernoulli coefficients is also absolutely convergent, hence converges almost surely. Thus all the conditions of the Main Lemma are satisfied, and we can conclude that there is a subsequence (y_n) of the (x_n) such that

y=\sum_{n\geq 1}{y_n}

exists, and is in A_n infinitely often; this means that

\|T(y)\|>n/2

for infinitely many n. But this is impossible since T is defined everywhere!

Now here is the proof of the lemma. Consider the series

Y=\sum_{n\geq 1}{X_ng_n}

as a random variable, which is defined almost surely by assumption. Note that any value of Y is nothing but a sum of a subseries of the original series with terms g_n. Let

B_n=\{Y\in A_n\}

so that the previous observation means that the desired conclusion is certainly implied by the condition

\mathbf{P}(Y\text{ in infinitely many } A_n)>0.

The event to study is

I=\bigcap_{N\geq 1}{C_N}\ with\ C_N=\bigcup_{n\geq N}{B_n}

The sets C_N are decreasing, so the probability of I is the limit of the probabilities of the C_N, and each C_N contains (hence has probability at least equal to that of) B_N. So if we can show that

\mathbf{P}(B_n)\geq 1/4\ \ \ \ \ \ \ \ \ \ \ \ \ (*)

(or any other positive constant) for all n, we will get

\mathbf{P}(C_N)\geq 1/4,\ \text{and hence}\ \mathbf{P}(I)\geq 1/4>0,

which gives the desired result. (In other words, we argue from a particularly simple case of the “difficult” direction in the Borel-Cantelli lemma).

Now let’s prove (*). We start with the identity

\{\sum_{m}{X_mg_m}\in g_n+A_n\ and\ X_n=1\}=\{\sum_{m\not=n}{X_mg_m}\in A_n\ and\ X_n=1\}

(for any n), which is a tautology. From the yet-unused assumption

A_n\cup (g_n+A_n)=G,

we then conclude that

\{X_n=1\}\subset \{\sum_{m}{X_mg_m}\in A_n\}\cup \{\sum_{m\not=n}{X_mg_m}\in A_n\ and\ X_n=1\}=B_n\cup S_n,

say. Therefore

1/2=\mathbf{P}(X_n=1)\leq \mathbf{P}(B_n)+\mathbf{P}(S_n).

But we claim that

\mathbf{P}(S_n)\leq\mathbf{P}(B_n).

Indeed, consider the random variables defined by

Z_m=X_m\ if\ m\not=n,\ \ \ Z_n=1-X_n

Then we obtain

S_n=\{\sum_{m}{Z_mg_m}\in A_n\ and\ Z_n=0\}

but clearly the sequence (Z_m) is also a sequence of independent Bernoulli random variables, so that

\mathbf{P}(\sum_{m}{Z_mg_m}\in A_n\ and\ Z_n=0)=\mathbf{P}(\sum_{m}{X_mg_m}\in A_n\ and\ X_n=0)\leq\mathbf{P}(Y\in A_n)=\mathbf{P}(B_n)

as desired. We are now done, since we have found that

1/2\leq 2\mathbf{P}(B_n)

which is (*).

(In probabilistic terms, I think the trick of using Z_m has something to do with “exchangeable pairs”, but I’m not entirely sure; in analytic terms, it translates to an instance of the invariance of Haar measure by translation on the compact group (Z/2Z)^N, as can be seen in the original write-up of Matheron).

Sage at last!

In previous posts (such as this old one), I’ve mentioned a few computer programs that I use fairly frequently, but one particularly interesting one was missing: SAGE, which packages together many different pieces of scientific software and libraries with a convenient interface around the Python language. The reason was that SAGE may be quite a bit more difficult to install than, for instance, GAP or Pari/GP (which it actually includes). Or at least, it was for me: my current laptop is quite nice but also quite recent, and so is Fedora 9, which is my Linux distribution. Some SAGE binary packages exist for Fedora 8, but not for Fedora 9 at the moment, and unfortunately there is a subtle packaging issue of some kind with some cryptographic routines that prevents the Fedora 8 package from working. So the current solution is to download and compile SAGE from source. But here another problem arises: the Core 2 CPU is misdetected by one of the packages involved in today’s version of SAGE (the ATLAS linear algebra package, version 3.8.1), which sees it as some kind of freaky Pentium III, and then takes forever to build.

As is often the case where open source software is involved, the solution is not far away after a bit of searching. In this case, the issue is known, as well as the workaround: commenting out two lines in one of the Atlas source files, and modifying a third one. Once this was done, the rest of the build process worked perfectly. However, one must notice (which I failed to do at first) that the build process of Atlas used by SAGE starts by unpacking and working from a pristine copy of the source code, contained in a compressed file, which will therefore erase the patched version of the offending code if one is not careful…

What must really be done is to create a new archive with the modified file. And in case anyone else has the same problem and finds this post, here is a copy of such an archive: it is the file $SAGE_DIR/spkg/standard/atlas-3.8.1.p3.spkg, valid for release 3.0.6 of SAGE (where $SAGE_DIR is the base directory where the SAGE files are found), and should be copied (with that name) to that location; this is a 2.8MB file.

(All this will soon be obsolete of course; within a month at most, the necessary patch will have found a permanent place in all the relevant software).

Now, this being done, I have started reading the documentation to actually use SAGE. Learning it is easier because of the concrete project of adapting the GAP script I have been using to compute Chebotarev invariants (at least, I find this type of manageable project is a good way to adapt to a new programming environment).

For illustration, here is the resulting SAGE program:

def chebotarev(G):
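    # Compute two "Chebotarev invariants" of the finite group G (given in any
    # form that gap() accepts), by inclusion-exclusion over subsets of the
    # set of conjugacy classes of maximal subgroups.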
    G1=gap(G)
    g=G1.Order()._sage_()
    M=G1.ConjugacyClassesMaximalSubgroups()
    M1=[i.Representative() for i in M] # Python list
    nbM=len(M1)
    C=G1.ConjugacyClasses() # Gap list
    O=[c.Size()._sage_() for c in C] # Python list
    print O
    print "Done precomputing"

    cheb=0.0
    scheb=0.0
    ar=matrix(len(C),len(M1))
    # C is a GAP list, hence starts at 1.
    for i in range(1,len(C)+1):
        for j in range(0,len(M1)):
            #print i,j
            if C[i].Intersection(M1[j]) != gap([]):
                ar[i-1,j]=1

    for i in subsets(range(0,len(M1))):
        if len(i) != 0:
            density=0.0
            for j in range(1,len(C)+1):
                isin=True
                for x in i:
                    if ar[j-1,x]==0:
                         isin=False
                if isin==True:
                    density=density+O[j-1]
                    #if len(i)==1: print "---", i, C[j], density
            #if len(i)==1: print i, density
            cheb=cheb+(-1)^(len(i)+1)/(1-density/g)
            scheb=scheb+(-1)^(len(i))/(1-density/g)*(1-2/(1-density/g))
    return cheb, scheb

(Since this is actually Python code, remember that the indentation is significant if copying the script).

Compared with the earlier script, notice that while the “mathematical heart” of the computations is still done in GAP (the maximal subgroups and the conjugacy classes are computed in GAP from SAGE), the logic is written in the Python language. This has a number of advantages in this case: (1) it is easy to loop over all subsets of the maximal subgroups (GAP doesn’t have a command to get all subsets of a finite set; I had to implement it by interpreting subsets as bit patterns of integers, and the annoyance of this was compounded by the absence of a bitwise “and” operator in GAP…); (2) it is easy to get a real approximation instead of a rational number with an enormous denominator (the Chebotarev invariants are rational numbers of this type as soon as the group is not ridiculously small); GAP doesn’t implement any floating point operations, and earlier I had to transfer the rationals to Pari/GP to get such real approximations… (Note that Magma does not have the two defects mentioned here, and as I said before, it is currently more efficient for some of these computations).
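
For anyone who wants to try it, here is a hypothetical invocation from the Sage prompt (the group is just an arbitrary example; any description that gap() accepts should work the same way):

# Hypothetical example of use; SymmetricGroup(4) is an arbitrary choice.
cheb, scheb = chebotarev('SymmetricGroup(4)')
# cheb and scheb now hold real approximations of the two invariants.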

Searching for numbers

In my last post, I mentioned finding the significance of the Niven constant

N=1+\sum_{k\geq 2}{(1-\zeta(k)^{-1})}=1.70521114\ldots

using Google. In fact, I searched first for “0.70521”, then for “2.70521” (which is the value of real interest in my case), and finally for “1.70521”.
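
(For what it is worth, the numerical value is easy to confirm; here is a quick Python check using mpmath, with an arbitrary cutoff at k=60, which is more than enough at this precision since the terms decay roughly like 2^{-k}.)

# Recompute the Niven constant from the series above.
from mpmath import mp, zeta

mp.dps = 20
N = 1 + sum(1 - 1/zeta(k) for k in range(2, 60))
print(N)    # agrees with 1.70521114... as quoted above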

Yesterday, I had the apparently subtle insight that, since the integral part of the constant was unclear, I should search for “.70521” instead. But it doesn’t work! Instead of prominently showing pages with the Niven constant, it returns a lot of results about the 70521 ZIP code and other irrelevant information. Increasing the precision with 705211 does not help (one gets various types of objects with component number 705211, from light bulbs to UV filters). Going to 7052111 gives a number of patents and transaction numbers, 70521114 is similar, and 705211140 gives… nothing. (Until this post appears in the Google index, of course — this effect has been noticed already by I. Lugo).

To get the Niven constant, one must therefore search correctly, and not too precisely. It starts working at 1.70521, which is the precision I had chosen (randomly) to see if my constant was already known. Amusingly, searching for 1.705211 finds a reference to an Advanced Problem in the American Math. Monthly in 1975, which asks precisely for a proof of the result of Niven, going back to 1969, in which the constant first occurred. (It should also find the MathWorld page on the Niven constant, where the value is given with this precision, but there it is hidden in an image containing the formula, and Google apparently does not index the ALT text in the HTML tag). But from 1.7052111 on, it is all terra incognita.

I wonder whether this “optimal” precision is typical when searching for other constants on the web. For e and π, we can certainly go much further, but what about other, less glamorous, numbers?