Demystifying (a bit) the arithmetic large sieve

In the development of the large sieve, leading to its arithmetic applications, there is one step which — I must admit shamefully — I never completely understood as a natural argument, rather than as a somewhat mysterious clever trick. Or at least, not until now: just this week-end, one vaguely related thing leading to another, I stumbled on a very transparent proof of this result.

I’ve written a short note with the details, but here is the outline. For background, it may be useful to read my earlier post on the subject on T. Tao’s blog, though I’ll use the classical setting of the sieve for integers below (as I did in the note, although it extends mutatis mutandis to the general setting described in my recent book).

The point is to provide the link between the analytic and arithmetic large sieve inequalities. The first one of these is the inequality

 (*)\ \ \ \ \ \ \ \ \ \ \sum_{q\leq Q}{\sum_{(a,q)=1}{|\sum_{n\leq N}{a_n\exp(2i\pi na/q)}|^2}}\leq (N-1+Q^2)\sum_{n\leq N}{|a_n|^2}

which is valid for all complex numbers (a_n); here the sum over a runs over all residue classes modulo q that are coprime to q, and the exponential is independent of the choice of representatives of those classes. This inequality is discussed in great detail in Montgomery’s survey article; I’ll just say here that the constant N-1+Q^2, which is not particularly easy to obtain, is of no importance for our purpose, and I will just write

\Delta=N-1+Q^2

below to emphasize this.

The second inequality, the arithmetic large sieve inequality, states that, for any choice of subsets Ωp for primes p, we have

(**)\ \ \ \ \ \ \ \ \ \ |\{n\leq N\ |\ n\ \text{mod}\ p\ \notin \Omega_p\ \text{for}\ p\leq Q\}|\leq \Delta H^{-1}

where

H=\sum_{q\leq Q}{\mu(q)^2\prod_{p\mid q}{\frac{|\Omega_p|}{p-|\Omega_p|}}}.

This is thus, recognizably, a sieve estimate: we bound from above the number of integers remaining in a segment after removing all those which fail to satisfy some constraints on their reductions modulo primes, constraints which are imposed for all primes p up to Q. This Q is a parameter which is adjusted for applications, depending on N.

—————————————————————–

Example. If sieve theory is completely unfamiliar, here is one of the most traditional examples: let Ωp consist of the residue classes of 0 and -2 modulo p. Among the integers surviving the sieve process are then all prime numbers r larger than Q such that r+2 is also prime. Hence the arithmetic large sieve inequality implies an upper bound for the number of twin primes up to N, namely

\pi_2(N)\leq  \Delta H^{-1},\ \ \ \ \ \text{where}\ \ \ \ \ H=\sum_{q\leq Q}{\mu(q)^2\prod_{p\mid q}{\frac{2}{p-2}}}

(where in fact the factor 2 must be replaced by 1 when p=2, since 0 and -2 coincide modulo 2).
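To make the example concrete, here is a quick numerical illustration (a sketch, not from the original post; the helper functions and the choice N = 10^5 are ad hoc), computing H and the resulting bound Δ/H, and comparing with a brute-force twin prime count:

```python
def primes_up_to(n):
    # simple sieve of Eratosthenes
    flags = [True] * (n + 1)
    flags[0] = flags[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            flags[i * i :: i] = [False] * len(flags[i * i :: i])
    return [p for p, f in enumerate(flags) if f]

def squarefree_prime_factors(q):
    # prime factors of q, or None if q is not squarefree
    fs, d = [], 2
    while d * d <= q:
        if q % d == 0:
            q //= d
            if q % d == 0:
                return None
            fs.append(d)
        d += 1
    if q > 1:
        fs.append(q)
    return fs

def H_twin(Q):
    # H = sum over squarefree q <= Q of prod over p | q of |Omega_p|/(p - |Omega_p|),
    # with |Omega_p| = 2 for odd p and |Omega_p| = 1 for p = 2
    total = 0.0
    for q in range(1, Q + 1):
        fs = squarefree_prime_factors(q)
        if fs is None:
            continue
        term = 1.0
        for p in fs:
            term *= 1.0 if p == 2 else 2.0 / (p - 2)
        total += term
    return total

N = 10 ** 5
Q = int(N ** 0.5)            # the usual choice balancing Delta against H
Delta = N - 1 + Q * Q
bound = Delta / H_twin(Q)

prime_set = set(primes_up_to(N + 2))
pi2 = sum(1 for p in prime_set if p <= N and p + 2 in prime_set)
print(f"pi_2({N}) = {pi2}, large sieve bound ~ {bound:.0f}")
```

The bound overshoots the true count only by a constant factor, as one expects from a sieve upper bound of the right order of magnitude.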

Using elementary techniques (see this treatment by Ben Green for example), or general results on sums of multiplicative functions, it follows that

H\geq c(\log Q)^2

for Q>2 and some constant c>0. Taking Q=\sqrt{N} (so that Δ is of size about 2N), this leads to an estimate for the number of twin primes which is (conjecturally) of the right order of magnitude

\pi_2(N)\ll \frac{N}{(\log N)^2}

and, by partial summation, to the (weaker, but still spectacular) conclusion that the series of inverses of twin primes converges:

\sum_{p,p+2\ \text{primes}}{\frac{1}{p}}<+\infty,

which was one of V. Brun’s first achievements in building the beginnings of the modern theory of sieve methods in the early 20th Century.

————————————————————————–

So I will now describe how to derive (**) from (*). Or rather, I will derive (**) from the dual inequality

 (***)\ \ \ \ \ \ \sum_{n\leq N}{|\sum_{q\leq Q}{\sum_{(a,q)=1}{\beta(q,a)\exp(\frac{2i\pi na}{q})}}|^2}\leq \Delta\sum_{q\leq Q}{\sum_{(a,q)=1}{|\beta(q,a)|^2}}

where the complex coefficients β(q,a) are still arbitrary, and Δ has the same value as before. The equivalence of (*) and (***) is a consequence of the elementary duality theory in finite-dimensional Hilbert spaces. But more importantly, (*) is often proved by means of (***) and this duality principle; in more general contexts, this is because (***) is usually easier to approach. So it is a natural starting point to argue towards the arithmetic inequality (**).
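To spell out the duality principle in one line (a standard Hilbert-space fact; the matrix notation here is mine, not from the post): if T denotes the matrix with entries exp(2iπna/q), rows indexed by n up to N and columns by the pairs (q,a), then (*) bounds the operator norm of T* and (***) bounds that of T, and the two coincide since

```latex
\|T\beta\|^2=\langle T^*T\beta,\beta\rangle\leq \|T^*(T\beta)\|\,\|\beta\|\leq \|T^*\|\,\|T\beta\|\,\|\beta\|,
```

which gives ‖T‖ ≤ ‖T*‖, and the symmetric computation gives the reverse inequality.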

But before we begin, a notational convention: below, q (and sums over q) will always refer to squarefree (positive) integers.

The idea is quite simple, in particular if you’ve ever seen the Chebychev inequality in probability: let S denote the sifted set

S=\{n\leq N\ |\ n\ \text{mod}\ p\ \notin \Omega_p\ \text{for}\ p\leq Q\}.

Comparing with the structure of (***), we are going to describe an “amplifier” A(n) which is of the form

A(n)=\sum_{q\leq Q}{\sum_{(a,q)=1}{\beta(q,a)\exp(\frac{2i\pi na}{q})}}

and has the property that |A(n)| is “large” whenever n is in S. If we have

|A(n)|\geq B,

for n in S, then by positivity we deduce

B^2|S|\leq \Delta \sum_{q}{\sum_{a}{|\beta(q,a)|^2}}

and if the last sum can be expressed conveniently and is not too large, we get an upper bound for |S| in this manner.

To build the amplifier, consider first a single prime p at most Q: if n is in S, we know that the reduction of n modulo p is constrained to not be in Ωp. We can express this analytically by saying that the characteristic function of this set, evaluated at n, is zero. But now expand this characteristic function (say fp) in terms of the additive characters (the “discrete Fourier transform”): we have

f_p(x)=\sum_{a\ mod\ p}{\alpha(p,a)\exp(2i\pi ax/p)}

with

\alpha(p,a)=\frac{1}{p}\sum_{x\in\Omega_p}{\exp(-2i\pi ax/p)}.

Thus for n in S, we have

0=f_p(n)=\sum_{a\ mod\ p}{\alpha(p,a)\exp(2i\pi an/p)}

and we rewrite this by isolating the contribution of the 0-th harmonic (the constant function 1, whose coefficient α(p,0)=|Ωp|/p is the probability that an integer modulo p lies in Ωp):

\frac{|\Omega_p|}{p}=\alpha(p,0)=\sum_{(a,p)=1}{(-\alpha(p,a))\exp(2i\pi an/p)}.

This is the basis of our detector! First, this defines the coefficients

\beta(p,a)=-\alpha(p,a)

and then, to extend this to all squarefree integers q up to Q, we use the Chinese Remainder Theorem and multiply the detectors for primes dividing q, getting coefficients β(q,a) for a coprime with q, such that

\prod_{p\mid q}{\frac{|\Omega_p|}{p}}=\sum_{(a,q)=1}{\beta(q,a)\exp(2i\pi an/q)}.
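As a quick sanity check (not part of the argument), one can verify the prime-modulus detector identity numerically; the prime p and the set Ωp below are arbitrary choices:

```python
import cmath

p = 11
omega = {0, p - 2}     # the twin-prime choice {0, -2} modulo p, as an example

def alpha(a):
    # Fourier coefficient of the characteristic function f_p of omega:
    # alpha(p, a) = (1/p) sum_{x in omega} exp(-2 i pi a x / p)
    return sum(cmath.exp(-2j * cmath.pi * a * x / p) for x in omega) / p

def detector(n):
    # sum over the nonzero harmonics, with the sign flipped: sum (-alpha) e(an/p)
    return sum(-alpha(a) * cmath.exp(2j * cmath.pi * a * n / p)
               for a in range(1, p))

for n in range(p):
    # equals |omega|/p when n survives sieving at p, and |omega|/p - 1 otherwise
    expected = len(omega) / p - (1 if n in omega else 0)
    assert abs(detector(n) - expected) < 1e-9
print("detector identity verified mod", p)
```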

For notational simplicity, we write

c_p=\frac{|\Omega_p|}{p}.

Thus the final detector for S is

A(n)=\sum_{q\leq Q}{\sum_{(a,q)=1}{\beta(q,a)\exp(2i\pi an/q)}},

and, for n in S, we have

|A(n)|=\sum_{q\leq Q}{\prod_{p\mid q}{c_p}}.

Calling this quantity B, as previously, the resulting inequality after applying (***), from the sketch above, is

B^2|S|\leq \Delta A

with

A=\sum_{q\leq Q}{\sum_{(a,q)=1}{|\beta(q,a)|^2}}.

To compute this, we invoke the orthogonality of additive characters, and their compatibility with the Chinese Remainder Theorem: we have first

A=\sum_{q\leq Q}{\prod_{p\mid q}{\sum_{(a,p)=1}{|\beta(p,a)|^2}}},

and then, using the (discrete) Parseval formula

\sum_{a\ mod\ p}{|\alpha(p,a)|^2}=\frac{1}{p}\sum_{x\ mod\ p}{|f_p(x)|^2}=c_p,

we get by removing the contribution of a=0 that

A=\sum_{q\leq Q}{\prod_{p\mid q}{c_p(1-c_p)}}.

Now we have an inequality

|S|\leq \Delta \frac{A}{B^2},

which is similar, but not quite identical, to (**). In general, it is somewhat weaker — though barely so for some problems like the twin-prime example above, where the sets Ωp are quite small (the situation of what is called a small sieve…)

To improve this inequality and get (**), we notice that we can still try other amplifiers. In particular, it is natural to observe that the inequality we obtained is not homogeneous when the coefficients β(q,a) are multiplied by constants (depending only on q, multiplicatively in q). Indeed, if we consider now

\gamma(q,a)=(\prod_{p\mid q}{\lambda_p})\beta(q,a)

we replace B with

B_1=\sum_{q\leq Q}{\prod_{p\mid q}{\lambda_pc_p}}

while A is replaced with

A_1=\sum_{q\leq Q}{\prod_{p\mid q}{\lambda_p^2c_p(1-c_p)}}

and we can try to minimize the ratio A_1/B_1^2 as the coefficients λ_p are allowed to vary. This is the same classical problem that occurs in the Selberg sieve (see for instance the write-up by Ben Green of the application of the Selberg sieve to the twin-prime problem, already mentioned, for an introduction): minimizing a quadratic form subject to a linear constraint (involving a different quadratic form). It is easily and elegantly solved: we casually apply Cauchy’s inequality,

B_1^2\leq (\sum_{q\leq Q}{\prod_{p\mid q}{\lambda_p^2c_p(1-c_p)}})\times (\sum_{q\leq Q}{\prod_{p\mid q}{c_p/(1-c_p)}})=A_1H,

so that, first of all, we can not do better (i.e., make the ratio smaller) than

\frac{A_1}{B_1^2}=\frac{1}{H}

and second, by the equality case in Cauchy’s inequality, we can do something this good by taking

\lambda_p=\frac{1}{1-c_p}.

This leads to the optimal value of the ratio and therefore proves the arithmetic large sieve inequality (**), as described previously.
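The optimization can be checked numerically (a sketch with made-up densities c_p; none of these values come from the post): with λ_p = 1/(1−c_p) the ratio A_1/B_1^2 equals 1/H exactly, while the unweighted first attempt can only do worse:

```python
from itertools import combinations
from math import prod, isclose

Q = 30
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
c = {p: 1.0 / p for p in primes}                 # hypothetical c_p = |Omega_p|/p
lam = {p: 1.0 / (1.0 - c[p]) for p in primes}    # the optimal weights

# sums over squarefree q <= Q, represented as subsets of primes with product <= Q
A1 = B1 = H = A = B = 0.0
for r in range(len(primes) + 1):
    for qs in combinations(primes, r):
        if prod(qs) > Q:
            continue
        B1 += prod(lam[p] * c[p] for p in qs)
        A1 += prod(lam[p] ** 2 * c[p] * (1.0 - c[p]) for p in qs)
        H += prod(c[p] / (1.0 - c[p]) for p in qs)
        B += prod(c[p] for p in qs)                     # lambda_p = 1 throughout
        A += prod(c[p] * (1.0 - c[p]) for p in qs)

# equality case of Cauchy: the ratio A1/B1^2 reaches its minimum 1/H...
assert isclose(A1 / B1 ** 2, 1.0 / H, rel_tol=1e-9)
# ...while the trivial weights give a larger (or equal) ratio
assert A / B ** 2 >= 1.0 / H
```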

A final remark: the simplicity of the argument suggests that it is probably not new. There are (in Montgomery’s survey article, for instance) a number of earlier derivations of the arithmetic large sieve inequality from the dual form of the analytic inequality. However, those I have seen (I must say I had not really looked at any of them until after writing the gist of my argument…) are based on, or inspired by, the connection with the Selberg sieve, and are therefore less elementary (and motivated). Still, the ingredients look always much the same, and I think this write-up is more a question of re-arranging them, instead of a really new proof.

Can L^p be the same as L^q?

A fairly standard question that students are asked after a course in real analysis (involving Lebesgue integration) is to show that there is no inclusion between Lp spaces on R with respect to Lebesgue measure. This is usually done (sometimes by hinting or prompting) by considering for which values of α and p the functions

x\mapsto x^{\alpha}

are in Lp (after truncating either for |x|<1 or for |x|>1, to avoid divergences).

Once I was present as observer at an oral examination for such a course, and the professor in charge had raised this question more or less by asking

Is it possible that Lp is the same as Lq, if p is not equal to q?

where the meaning was clearly, implicitly, the one above; and I wondered (aloud) if the answer was the same in a more abstract way: is it possible that Lp(X,μ) be isomorphic to Lq(X,μ) if p is different from q? This is a question that makes sense for general measure spaces, of course, and one where one must be careful to specify what “isomorphic” is taken to mean. In the algebraic sense, this is a question of cardinality only, and doesn’t seem very interesting, but isomorphism as topological vector spaces seems much more natural. Note that the answer is not immediately clear for the classical Lebesgue spaces over R: even if there is no inclusion, there might exist some clever linear renormalization map that sends (for instance) a square-integrable function bijectively — and continuously — to an integrable one.

But still, the answer is (not surprisingly) “No”, provided the only obvious reservation is made: if Lp and Lq are finite dimensional, then they are isomorphic as topological vector spaces, as is well known (they are not necessarily isometric, of course). But otherwise, we have

Theorem. Let (X,μ) be a measure space, and let p and q be real numbers at least 1 such that Lp(X,μ) has infinite dimension. Then Lp(X,μ) and Lq(X,μ) are isomorphic, as topological vector spaces, if and only if p=q.

I think this goes back to Banach, at least for the classical spaces of functions on R (if I understand correctly Chapter XII of his book Théorie des opérations linéaires). For the general case, although I didn’t find a reference, this must be well-known, since it is a direct consequence of the computation of the type and cotype invariants of such Banach spaces. (The only reason I had the idea to look at these is that I was browsing pleasurably through the nice book of Li and Queffélec on the geometry of Banach spaces, where type and cotype are described in detail; this is overall very far from anything I can claim expertise in…)

Indeed, for an infinite dimensional Banach space of the form Lp(X,μ), it is known that

type(L^p(X,\mu))=\min(2,p),\ \ \ \ \ \ cotype(L^p(X,\mu))=\max(2,p)

where the (best) type (denoted type(E)) of a Banach space E is defined to be the largest real number p such that

\int_0^1{||\sum_{j=1}^n{r_j(t)x(j)}||dt}\leq M\left(\sum_{j=1}^n{||x(j)||^p}\right)^{1/p}

for any n>1, any finite sequence of vectors x(j) in E, and some constant M>0 (independent of n), where

r_j(t)=\mathrm{sign}\sin 2^j\pi t

denotes the sequence of Rademacher functions. Dually, the cotype is the smallest real number q such that

\int_0^1{||\sum_{j=1}^n{r_j(t)x(j)}||dt}\geq m\left(\sum_{j=1}^n{||x(j)||^q}\right)^{1/q}

for some constant m>0. The results above on the type and cotype of Lp spaces are explained in Section III.3 of the book of Li and Queffélec.

These definitions show that the type and cotype are preserved under continuous linear isomorphisms, so if we have infinite-dimensional spaces Lp(X,μ) and Lq(X,μ) which are isomorphic, their types and cotypes must coincide, i.e., we must have

\min(2,p)=\min(2,q),\ \ \ and\ \ \ \max(2,p)=\max(2,q),

which means p=q.

Problems from the archive

(The title of this post is based on WBGO‘s nice late-Sunday show “Jazz from the archive”, which I used to follow when I was at Rutgers; the physical archive in question was that of the Institute of Jazz Studies of Rutgers University; the post itself is partly motivated by seeing Gil Kalai’s examples of his own “early” problems…)

The following problem of classical analysis of functions of one-complex variable has puzzled me for a long time (between 15 and 20 years), though I never really spent a lot of time on it — it has never had anything to do with my “real” research:

Let f,g be entire functions, and assume that for all r>0 we have
\max_{|z|=r}{|f(z)|}=\max_{|z|=r}{|g(z)|}
What can we say about f and g? Specifically, do there exist two real numbers a and b such that
g(z)=e^{ia}f(e^{ib}z)
for all z?

(Actually, the “natural” conclusion to ask would be: does there exist a linear operator T, from the space of entire functions to itself, such that Tf=g and such that

\max_{|z|=r}{|T(\phi)(z)|}=\max_{|z|=r}{|\phi(z)|}

for all entire functions φ; indeed, the function g=Tf then satisfies the original condition, for any such “joint isometry” of the vector space of entire functions equipped with the sup-norms on all circles centered at the origin; but I have the impression that it is known that any such operator is of the form stated above).

My guess has been that the answer is Yes, but I must say the evidence is not outstandingly strong; mostly, the analogue is known when one uses L2 norms on the circles, since

\frac{1}{2\pi}\int_0^{2\pi}{|f(re^{it})|^2dt}=\sum_{n\geq 0}{|a_n|^2r^{2n}}

if a_n are the Taylor coefficients of f. If those coincide with the corresponding values for another entire function g (with coefficients b_n), we get, by uniqueness of power series expansions, that

|a_n|=|b_n|,\ so\ b_n=e^{i\theta_n}a_n

for all n, and the linear operator

\sum_{n\geq 0}{c_nz^n}\mapsto \sum_{n\geq 0}{c_ne^{i\theta_n}z^n}

is of course a joint isometry for the L2 norms on all circles, mapping f to g.
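This L^2 statement is easy to test numerically (a quick sketch; the random coefficients and the truncation to polynomials are of course just for illustration):

```python
import cmath, math, random

def mean_square_on_circle(coeffs, r, samples=1024):
    # approximates (1/2pi) * integral over [0,2pi] of |f(r e^{it})|^2 dt;
    # for a polynomial of low degree this uniform sampling is exact up to rounding
    total = 0.0
    for k in range(samples):
        z = r * cmath.exp(2j * math.pi * k / samples)
        total += abs(sum(c * z ** n for n, c in enumerate(coeffs))) ** 2
    return total / samples

random.seed(0)
a = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(6)]
thetas = [random.uniform(0, 2 * math.pi) for _ in range(6)]
b = [c * cmath.exp(1j * t) for c, t in zip(a, thetas)]   # b_n = e^{i theta_n} a_n

for r in (0.5, 1.0, 2.0):
    # Parseval: the mean square on |z| = r equals sum |a_n|^2 r^{2n}
    parseval = sum(abs(c) ** 2 * r ** (2 * n) for n, c in enumerate(a))
    assert abs(mean_square_on_circle(a, r) - parseval) < 1e-9 * parseval
    # the diagonal operator a_n -> e^{i theta_n} a_n preserves every such norm
    assert abs(mean_square_on_circle(b, r) - parseval) < 1e-9 * parseval
```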

The only result I’ve seen that seems potentially helpful (though I never looked particularly hard; it was mentioned in an obituary notice of the Bulletin of the LMS, I think, that I read rather by chance around 1999) is a result of O. Blumenthal — who is today much better known for his work on Hilbert modular forms, his name surviving in Hilbert-Blumenthal abelian varieties. In this paper from 1907, Blumenthal studies the structure of the sup norms in general, and computes them explicitly for (some) polynomials of degree 2. Although the uniqueness question above is not present in his paper, one can deduce by inspection that it holds for these polynomials at least.

(I wouldn’t be surprised at all if this question had in fact been solved around that period, when complex functions were extensively studied in France and Germany in particular; that the few persons to whom I have mentioned it had not heard of it is not particularly surprising, since this is one mathematical topic where what was once the height of fashion has now become very obscure.)

Independence of zeros of L-functions over function fields

My paper “The large sieve, monodromy, and zeta functions of algebraic curves, II: independence of the zeros” has just been published online by International Math. Research Notices (the title really should be shorter… but I wanted to emphasize the link with the earlier paper in the series, though that one didn’t know it would have a little brother, so its title does not mention that it is number 1). Note in passing that a paper is masculine for me simply because this is so in French (“un article, un papier”).

The motivation of this work is to try to understand a bit more one famous conjecture about the zeros of the Riemann zeta function, and about those of other L-functions more generally. In the “simplest” case of ζ(s), it is expected that the Riemann Hypothesis holds (of course), so that the zeros (counted with multiplicity) can be written

\rho=\frac{1}{2}\pm i\gamma_n

where we order for convenience the ordinates in increasing order

0\leq \gamma_1\leq \gamma_2\leq \gamma_3\leq\ldots

(choosing an arbitrary ordering of multiple zeros, in case it happens that a zero is not simple). Then, the conjecture is that

\gamma_1,\ \gamma_2,\ \ldots,\ \gamma_n,\ \ldots

are Q-linearly independent. This implies in particular that all zeros are simple, but it is of course much stronger.

If you think about it, this may look like a strange and somewhat arbitrary conjecture. The point is of course that it does turn out naturally in various problems. I know of at least two instances:

(1) Ingham showed that it is incompatible with the conjecture

|\sum_{n\leq N}{\mu(n)}|<\sqrt{N}\ \ \ \ \text{for all}\ N\geq 2,

apparently stated by Mertens; this inequality was of some interest because it implies (very easily) the Riemann Hypothesis. Of course, Ingham’s result made it very doubtful that it could hold, and later Odlyzko and te Riele succeeded in proving that it does not. (Here, μ is the Möbius function). For further closely related recent results, see for instance the work of Ng.

(2) The conjecture has been extended to state that all non-negative ordinates of zeros of primitive Dirichlet L-functions are Q-linearly independent; this was used (I don’t know if it had been introduced before) by Rubinstein and Sarnak in their very nice paper studying the Chebychev bias and generalizations of it. The Chebychev bias refers to the apparent fact that, usually, there are more primes p<X which are congruent to 3 modulo 4 than congruent to 1 modulo 4; it is a strange property, which is not literally true for all X, but should be so only in some sense of logarithmic density, as explained by Rubinstein and Sarnak. It has its unlikely source in the fact that the primes are best counted with a weight log p, and together with all their powers pk; the counting function incorporating those two changes exhibits no bias, but moving from it to the counting function for primes leads to a discrepancy having to do with squares of primes, and for any modulus q, to a recurrent excess of primes which are non-squares modulo q compared with those which are squares modulo q.

The way I tried to understand this somewhat better is well established: look at what happens for zeta and L-functions over finite fields. There we have two advantages, which have already been used quite often: first, the Riemann Hypothesis is known (by the work of Deligne in general), and second, the L-functions are polynomials (in p^{-s}) with integral coefficients, instead of analytic functions with infinitely many zeros. In addition, these polynomials have a spectral interpretation: this is the basis of the Katz-Sarnak study of zeta functions and symmetry, motivated by the conjectures relating Random Matrix Theory and the zeta function.

The basic examples of zeta functions over finite fields are those of algebraic curves; to be concrete, say we have a squarefree polynomial h of degree 2g+1, and an odd prime p; we can then look at the curve C with affine equation

y^2=h(x)

over the field with p elements. Its zeta function (or rather the one of the projective version of this curve) is defined by the formal power series

\exp\Bigl(\sum_{n\geq 1}{(1+|C(\mathbf{F}_{p^n})|)T^n/n}\Bigr)

involving the number of points on the curve over all finite fields of characteristic p. Then it is known that this zeta function is the Taylor expansion of a rational function given by

\frac{P_C(T)}{(1-T)(1-pT)}

where the polynomial PC is of degree 2g (g is the genus of the curve), has integral coefficients, and can be factored as follows:

P_C(T)=\prod_{1\leq j\leq 2g}{(1-\alpha_jT)}

where

\alpha_j\alpha_{2g+1-j}=p,\ \ \ \ \ \ \ |\alpha_j|=\sqrt{p}.

The first identity corresponds to the functional equation, and the last fact is the Riemann Hypothesis in this context (to see why this is so, write T=p^{-s} and look for the real parts of complex zeros s); it was first proved for curves by André Weil, though special cases go all the way back to Gauss!
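These facts are easy to observe numerically in the simplest case g = 1 (a sketch, not from the paper; the prime and the cubic h are arbitrary choices, with h squarefree mod p so the curve is smooth):

```python
import cmath

p = 23
def h(x):
    # a hypothetical squarefree cubic, so y^2 = h(x) has genus g = 1
    return (x ** 3 + 4 * x + 1) % p

# number of solutions y of y^2 = v mod p, for each residue v
sqrt_count = {v: 0 for v in range(p)}
for y in range(p):
    sqrt_count[(y * y) % p] += 1

affine_points = sum(sqrt_count[h(x)] for x in range(p))
N1 = affine_points + 1          # plus the point at infinity
a = p + 1 - N1                  # so P_C(T) = 1 - a*T + p*T^2

# the two inverse roots alpha_1, alpha_2 of P_C
disc = a * a - 4 * p            # negative, by the Riemann Hypothesis (Hasse)
alpha1 = (a + cmath.sqrt(disc)) / 2
alpha2 = (a - cmath.sqrt(disc)) / 2

assert abs(alpha1 * alpha2 - p) < 1e-9       # the functional-equation pairing
assert abs(abs(alpha1) - p ** 0.5) < 1e-9    # |alpha_j| = sqrt(p)
print(f"a = {a}; both roots lie on the circle of radius sqrt({p})")
```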

So what is the analogue of the Q-linear independence conjecture here? It is not hard to convince oneself that the right analogue is that, if we write

\alpha_j=\sqrt{p}\exp(2i\pi \theta_j)\ \ \ with\ \theta_j\in [0,1[

for all j, then the question is whether

(1,\theta_1,\ldots,\theta_g)

are (or not) Q-linearly independent. The restriction to only half the values of α comes from the “functional equation” recalled above, which relates α_j with α_{2g+1-j}, while the appearance of 1 has to do with the “transfer” from multiplicative to additive forms of the variable.

Now, the point of my paper is that one can indeed show that this independence holds “for most curves”, in a certain sense. Such a restriction is necessary: it is perfectly possible to construct curves where the independence does not hold (I give various examples in the paper). Moreover, I show that one can look at independence involving roots of more than a single curve: so not only are the zeros independent, but they tend to be independent of those of other curves. This implies, in particular, that there is typically no bias between the number of points of one curve and another: if C and D are two “generically chosen” algebraic curves of the same genus over the same finite field with p elements, then

\lim_{N\rightarrow +\infty}{\frac{1}{N}|\{n\leq N\ |\ |C(\mathbf{F}_{p^n})|<|D(\mathbf{F}_{p^n})|\}|}=1/2

(in fact, properly normalized, the difference is asymptotically normally distributed).

I will just say a few words about the techniques involved: as the title indicates, the basic tool is the large sieve (in a suitable form, introduced in the older brother paper), and the main arithmetico-geometric input comes from monodromy computations. In the cases I treat in detail, those come from an unpublished result of J-K. Yu, recently reproved by C. Hall. The main idea is to first show that one can deduce the desired independence statements, for a given curve (or tuple of curves), provided the splitting field of the zeta function(s) is as large as possible. This part of the argument involves fairly elementary (but cute) theory of representations of finite groups (going back to results of Girstmair and others). Then one is reduced to showing that this maximality of splitting fields occurs most of the time, and this is provided by the sieve (qualitative forms of such a statement were first proved around 1994 by N. Chavdarov).

Actually, N. Katz pointed out (after the first version was written) that one could use, bypassing any consideration of the splitting field, the properties of the so-called Frobenius tori of Serre. This essentially means that the large sieve, per se, can be avoided, and replaced by an application of the uniform and effective Chebotarev density theorem. Still, I kept the original approach (in addition to discussing briefly this other method) for two reasons:

(1) It also leads quickly to the fact that the αj themselves are Q-linearly independent (Frobenius tori can not detect this condition); I don’t know any application of this fact, but it seems interesting in principle.

(2) Frobenius tori depend rather strongly on p-adic properties of the zeros; this means that the method does not work if, instead, we try to answer the following question: given a “random” element of SL(n,Z), do the roots of its characteristic polynomial satisfy any multiplicative relation? The “splitting field” technique, in suitable senses of the word “random” (e.g., random walks with respect to a finite generating set), can be made to prove such a statement, using the large sieve technique developed in my recent book.

(3) OK, here’s a third reason that may not apply so much given the possible use of the Frobenius tori: N. Katz had asked a number of times (including, in a truly epiphanic moment for me, at the end of his lecture during the Newton Institute workshop on Random Matrices and L-functions in July 2004) about the possible analogue and meaning, for the usual L-functions over number fields, of the generic maximality of splitting fields of L-functions of algebraic curves, as then known from the qualitative results of Chavdarov; I thought that maybe this link with independence could be a partial answer — but whether it is or not, it is not so clear now.

Linear operators which you can write down are continuous

This is a follow-up to a comment I made on Tim Gowers’s post concerning the use of Zorn’s lemma (which I encourage students, in particular, to read if they have not done so yet). The issue was whether it is possible to write down concretely an unbounded (in other words, not continuous) linear operator on a Banach space. I mentioned that it had been explained to me a few years ago that this is not in fact possible, in the same sense that it is not possible to “write down” a non-measurable function: indeed, any measurable linear map

T\ :\ U\rightarrow V

where U is a Banach space and V is a normed vector space, is automatically continuous.

Incidentally, I had asked my colleague É. Matheron in Bordeaux about this because the question had arisen while translating W. Appel’s book of mathematics for physicists: in the chapters on unbounded linear operators (which are important in Quantum Mechanics), he had observed that those operators can often be written down, but only as partially defined operators, defined on a dense subspace, and we wondered whether the dichotomy “either unbounded and not everywhere defined, or everywhere defined and continuous” was a real theorem or not. In the sense that measurability is much weaker than the (not well-defined) notion of “concretely given”, it is indeed a theorem.

Not only did Matheron tell me of this automatic continuity result, he gave me a copy of a short note of his (“A useful lemma concerning subseries convergence”, Bull. Austral. Math. Soc. 63 (2001), no. 2, 273–277), where this result is proved very quickly, as a consequence of a simple lemma which also implies a number of other well-known facts of functional analysis (the Banach-Steinhaus theorem, Schur’s theorem on the coincidence of weak and norm convergence for series in l1, and a few others). On the other hand, I don’t know who first proved the continuity result (Matheron says it is well-known but gives no reference).

The proof is short enough that I will present it; it is a nice source of exercises for a first course in functional analysis, provided some integration theory has been seen before (which I guess is always the case).

Here is the main lemma, due to Matheron, in a probabilistic rephrasing and in a slightly weaker version:

Main Lemma: Let G be a topological abelian group, and let An be an arbitrary sequence of measurable (Borel) subsets of G, and (gn) a sequence of elements of G. Assume that for every n and every g in G, either g is in An, or g-gn is in An.

Let moreover be given a sequence of independent Bernoulli random variables (Xn), defined on some auxiliary probability space.

Then, under the condition that the series

\sum_{n\geq 1}{X_n g_n}

converges almost surely in G, there exists a subsequence (hn) of (gn) such that

\sum_{n\geq 1}{h_n}

converges, and its sum belongs to infinitely many An.

This is probably not easy to assimilate immediately, so let’s give the application to automatic continuity before sketching the proof. First, we recall that Bernoulli random variables are such that

\mathbf{P}(X_n=0)=\mathbf{P}(X_n=1)=1/2.

Now, let T be as above, measurable. We argue by contradiction, assuming that T is not continuous. This implies in particular that, for each n, T is not bounded on the ball of radius 2^{-n}, so there exists a sequence (x_n) in U such that

\|x_n\|<2^{-n},\ \text{and}\ \|T(x_n)\|>n.

We apply the lemma with

G=U,\text{ written additively},\ g_n=-x_n,\ A_n=\{x\in U\ |\ \|T(x)\|>n/2\}.

The sets An are measurable, because T is assumed to be so. The triangle inequality shows that if x is not in An, then

\|T(x-x_n)\|=\|T(x_n)-T(x)\|\geq \|T(x_n)\|-\|T(x)\|>n/2

so that x-xn is in An. (This shows where sequences of sets satisfying the condition of the Lemma arise naturally).

Finally, the series formed with the xn is absolutely convergent by construction, so the series “twisted” with Bernoulli coefficients are also absolutely convergent. Hence, all the conditions of the Main Lemma are satisfied, and we can conclude that there is a subsequence (yn) of the (xn) such that

y=\sum_{n\geq 1}{y_n}

exists, and is in An infinitely often; this means that

\|T(y)\|>n/2

for infinitely many n. But this is impossible since T is defined everywhere!

Now here is the proof of the lemma. Consider the series

Y=\sum_{n\geq 1}{X_ng_n}

as a random variable, which is defined almost surely by assumption. Note that any value of Y is nothing but a sum of a subseries of the original series with terms gn. Let

B_n=\{Y\in A_n\}

so that the previous observation means that the desired conclusion is certainly implied by the condition

\mathbf{P}(Y\text{ in infinitely many } A_n)>0.

The event to study is

I=\bigcap_{N\geq 1}{C_N}\ \text{with}\ C_N=\bigcup_{n\geq N}{B_n}.

The sets CN are decreasing, so the probability of I is the limit of the probabilities of the CN, and each CN contains BN (hence has probability at least equal to that of BN). So if we can show that

\mathbf{P}(B_n)\geq 1/4\ \ \ \ \ \ \ \ \ \ \ \ \ (*)

(or any other positive constant) for all n, we will get

\mathbf{P}(C_N)\geq 1/4,\ \text{and hence}\ \mathbf{P}(I)\geq 1/4>0,

which gives the desired result. (In other words, we argue from a particularly simple case of the “difficult” direction in the Borel-Cantelli lemma).

Now let’s prove (*). We start with the identity

\{\sum_{m}{X_mg_m}\in g_n+A_n\ and\ X_n=1\}=\{\sum_{m\not=n}{X_mg_m}\in A_n\ and\ X_n=1\}

(for any n), which is a tautology. From the yet-unused assumption

A_n\cup (g_n+A_n)=G,

we then conclude that

\{X_n=1\}\subset \{\sum_{m}{X_mg_m}\in A_n\}\cup \{\sum_{m\not=n}{X_mg_m}\in A_n\ and\ X_n=1\}=B_n\cup S_n,

say. Therefore

1/2=\mathbf{P}(X_n=1)\leq \mathbf{P}(B_n)+\mathbf{P}(S_n).

But we claim that

\mathbf{P}(S_n)\leq\mathbf{P}(B_n).

Indeed, consider the random variables defined by

Z_m=X_m\ if\ m\not=n,\ \ \ Z_n=1-X_n

Then we obtain

S_n=\{\sum_{m}{Z_mg_m}\in A_n\ and\ Z_n=0\}

but clearly the sequence (Zm) is also a sequence of independent Bernoulli random variables, so that

\mathbf{P}(\sum_{m}{Z_mg_m}\in A_n\ and\ Z_n=0)=\mathbf{P}(\sum_{m}{X_mg_m}\in A_n\ and\ X_n=0)\leq\mathbf{P}(Y\in A_n)=\mathbf{P}(B_n)

as desired. We are now done, since we have found that

1/2\leq 2\mathbf{P}(B_n)

which is (*).

(In probabilistic terms, I think the trick of using Zm has something to do with “exchangeable pairs”, but I’m not entirely sure; in analytic terms, it translates to an instance of the invariance of Haar measure by translation on the compact group (Z/2Z)N, as can be seen in the original write-up of Matheron).