Shortlisted for prestigious literary award!

Glory, it has been said, may come in many guises. It seems I have (at last) a decent (1 in 6) shot at literary honor, as my book The large sieve and its applications has been shortlisted for the Diagram Prize (see also the earlier pre-shortlist announcement).

One may quibble that being in the running for the award more fully known as the “Diagram Prize for Oddest Book Title of the Year” is no guarantee of literary quality, but still, I’m sure the book will sell better if Cambridge University Press decides to adorn it with a nice sticker reading “Shortlisted for the Diagram Prize 2009” — after all, many are those who would think this is a prize for mathematical books (what are diagrams for, after all?). I wonder how ethical it would be to mention this in my CV, if I similarly omit the full name of the prize…

In any case, after I cast my vote online, the browser displayed the current state of the poll, and it seems I will be a long shot (if I had no personal interest, I would probably have voted for The 2009-2014 World Outlook for 60-milligram Containers of Fromage Frais, where the wonderful language switch is hard to beat — although this one seems to be aimed at the cognoscenti, with a price of $795). Maybe I would have a better chance if the subtitle were included in mine, actually (this being, I recall, Arithmetic geometry, random walks and discrete groups).

(Thanks to K. Khuri-Makdisi for pointing this out).

Gamma function

It’s of course a coincidence, but it’s nice to learn that the first reference concerning the well-beloved Gamma function is in a paper from 1729. It’s by Euler, of course, and a copy of the original Latin paper can be found here, as well as a link to an English translation.

Whittaker and Watson state that the first definition was the following infinite product

\frac{1}{z\Gamma(z)}=\prod_{n\geq 1}{\Bigl(1+\frac{z}{n}\Bigr)\Bigl(1+\frac{1}{n}\Bigr)^{-z}}

for all complex numbers z, except at the poles of the Gamma function, which are

z=0,\ -1,\ -2,\ \ldots

In fact, Euler writes it slightly differently, namely (in a form equivalent to)

\Gamma(z+1)=\prod_{k\geq 1}{\frac{k^{1-z}(k+1)^z}{k+z}}.
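
As a quick sanity check (a small script of my own, not from any of the sources mentioned; the truncation level N and the test point z = 0.5 are arbitrary choices), one can compare the truncated product with a library value of the Gamma function:

```python
# Numerical sanity check of Euler's product for Gamma(z+1),
# truncated at N factors and compared with math.gamma (illustration only).
import math

def euler_product_gamma(z, N=100000):
    """Partial product  prod_{k=1}^{N} k^(1-z) * (k+1)^z / (k+z)."""
    p = 1.0
    for k in range(1, N + 1):
        p *= k ** (1 - z) * (k + 1) ** z / (k + z)
    return p

z = 0.5
print(euler_product_gamma(z))   # close to Gamma(1.5) = sqrt(pi)/2 = 0.8862269...
print(math.gamma(z + 1))
```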

I did not remember having seen Euler’s formula before, though I guess I probably had it in front of me at least once (I will check whether it appears in Titchmarsh’s book, which would not be surprising). It looks a bit like, but is different from, the Weierstrass product

\frac{1}{z\Gamma(z)}=e^{\gamma z}\prod_{n\geq 1}{\Bigl(1+\frac{z}{n}\Bigr)e^{-z/n}}\quad ,

where γ is the Euler constant. (This definition has the problem of requiring one to first say what this constant is.)

It turns out that this original definition was exactly suited to some purpose for which the Weierstrass product, frustratingly, just barely didn’t quite work. So I was happy when I found it in the big tables of Gradshteyn and Ryzhik. However, those tables are sometimes unfortunately wobbly when it comes to the range of validity of their formulas, and in this case they stated that the product was valid for

\text{Re}(z)>0,

(which did not suit me), so it was rather more satisfactory to see it in Whittaker and Watson. In fact, it is easy to derive it from the Weierstrass product and the definition of the Euler constant γ.
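
For completeness, here is a sketch of that computation: write H_N = 1 + 1/2 + … + 1/N, so that γ is the limit of H_N − log N; replacing e^{γz} by e^{z(H_N − log N)} in the N-th partial product of the Weierstrass formula, we get

e^{z(H_N-\log N)}\prod_{n=1}^{N}\Bigl(1+\frac{z}{n}\Bigr)e^{-z/n}=N^{-z}\prod_{n=1}^{N}\Bigl(1+\frac{z}{n}\Bigr)=\Bigl(1+\frac{1}{N}\Bigr)^{z}\prod_{n=1}^{N}\Bigl(1+\frac{z}{n}\Bigr)\Bigl(1+\frac{1}{n}\Bigr)^{-z},

since the product of the e^{-z/n} over n ≤ N is e^{-zH_N}, and the telescoping product of the (1+1/n)^z is (N+1)^z. Letting N tend to infinity, the factor (1+1/N)^z goes to 1, and we recover the Whittaker and Watson form of the product, which was seen above to be equivalent to Euler’s.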

By the way, there is a recurrent conflict about the Gamma function: is the “right” function really Γ(z), or Γ(z+1), which interpolates the factorial rather better? In analytic number theory, various reasons suggest the standard definition is indeed the right one, but I’ve recently started vacillating, and my current conclusion is that there are two canonical (Gamma) functions which just happen to be related by translation. Or maybe three, where the third is 1/Γ(z). Or four, with also 1/Γ(z+1).

Quoting the great unknown

Most scientists, and mathematicians in particular, must have been confronted at some point with the problem of properly quoting a theorem (or a theory, or a conjecture, …) that they do not truly understand. It may be that they have not read the proof in enough detail to check its correctness independently, or that the result involves concepts, objects and theories with which they are not familiar. Now, how should this be phrased? Surely, this is part of the information to convey to the reader of a scientific paper or book?

Well, as far as I understand (this being one of the many things that one is supposed to learn on the job), this type of doubt or indeterminacy is supposed to be hidden. The only accepted style of citing, at least judging from the typical mathematical text, is both steadfastly Olympian and touchingly Gimpelian. It implies, at the same time, a complete mastery of whatever is involved in the result mentioned, however incorrectly it may be phrased (“Wiles has proved that every modular elliptic curve is associated with a cusp form with integral coefficients”), and a childlike belief that whatever is claimed in a written source must be true (“Grothendieck has proved [SGA V, Exp. 12, Section 4, Théorème 3.1] that 27 is the only prime number which is not squarefree”). I’ve very rarely seen a quote come with a comment that the author does not claim to have understood it (which, it is true, would be somewhat embarrassing for a result which is crucial to one’s proof…).

Personally, I often feel a definite awkwardness in citing a result I don’t “know” in the deepest sense of the word, and of course with the current profusion of preprint sources, things tend to become worse as more results are available for use and quotation than ever. One can try being careful (“Grothendieck has announced a proof that 27 is prime”), but what may seem reasonable when speaking of a result barely out on the arXiv quickly becomes insulting when referring to a result which is published and is really true, but which, to make no bones about it, one has simply not checked personally. Indeed, referees may take this badly, especially if they happened to prove the theorem in question.


To give a concrete example, I have no doubt that the Riemann Hypothesis over finite fields is true, but although I have really done a lot of reading about it, and can claim to have gone through Deligne’s first proof in great detail, I cannot yet claim to have mastered the second — which I’ve used much more often in my work (and even with the first, I certainly do not have full mastery of the total amount of background material, such as the complete proof of the Grothendieck-Lefschetz trace formula). This example is not academic at all: many results in analytic number theory depend on estimates for exponential sums which are not accessible at all without Deligne’s work and its extensions, but very few are the analytic number theorists who understand the full proof. (My own PhD advisor said that this was essentially the only result he had ever used that he could not, if needed, reprove from scratch.)

I think most uses of the Riemann Hypothesis over finite fields are fair, however: many people who use it for analytic number theory have a highly sensitive sense of which estimates should be correct, and indeed they quite often suggest precise conjectures about what they expect (which are then hopefully confirmed by people like N. Katz, using the deep algebro-geometric framework underlying the Riemann Hypothesis): the point is that there is really an interplay between the analytic number theorists (who suggest applications of the Riemann Hypothesis, quite frequently in highly non-trivial ways) and the algebraic geometers who prove that the required estimates hold. (One of the nicest examples of this interaction is the beautiful paper of Fouvry and Katz on applications of the Katz-Laumon theory of “stratification of estimates” for families of exponential sums.)

Another fairly common situation (at least recently), for analytic number theory, is to use the modularity of elliptic curves. Here, the gap between my understanding and what would be required to claim that I can vouch for the proof is larger: although I’ve been exposed a lot to the general strategy, I’ve never understood the crucial final steps in any depth. But again, what I (and many others) usually want to use this theorem for is to state a corollary of a result we have proved for more general modular forms. Even if the case of elliptic curves might be the best motivation for the work, the general case is usually interesting enough that it doesn’t feel like cheating, especially if this point is explained in the introduction.

These are cases where it’s possible to phrase things professionally enough. But sometimes, it’s much more complicated. Indeed, what should be true may be much less obvious, and the result we prove might depend completely on a fact about which, in all honesty, we can only say that it is nice that it is true, because otherwise (and it could have been otherwise, for all we know) we would be stuck.

Thus I’ve had the occasion to use results that ultimately depend on the classification of finite simple groups (through the type of “strong approximation theorems” which say that the reduction modulo most primes p of a Zariski dense subgroup of SL(n,Z), for instance, is the full group SL(n,Z/pZ)), and here it’s quite difficult for me to know how to quote this honestly. It is true that the corollaries of the classification which are involved do not really depend on its finer aspects (e.g., they do not depend on how many exceptional groups there are, provided there are only finitely many), but on the other hand, I don’t really have enough personal experience and intuition about finite groups to feel that the classification should be as it is, with a few known infinite families and finitely many exceptions.


Note that I really have no suggestion about this. Would it be good advice to suggest, say, that any analytic number theorist working on modular forms should spend a few weeks getting acquainted enough with the proof of the Ramanujan-Petersson conjecture, in order to feel better about using it? This might be what I would want to say, but how far can it get? If one starts from scratch — knowing no algebraic geometry, for instance — much more than a few weeks will be needed to feel any confidence about the correctness of the proof.

In the prevailing scientific mindset, this is an investment of time that may well be out of the question for most people (with, possibly, the exception of beginning graduate students — they, I think, should at least spend enough time acquiring the basics of many “languages”, so that they can go further later on if needed in their work; it is much harder to learn the principles of probability theory, or combinatorics, or algebraic number theory, as a post-doc or young professor, than as a student). I must confess that, at the current time, I certainly do not think I will be able to invest significant efforts in trying to understand the classification of finite simple groups…

Banach or not Banach?

In the basic theory of normed vector spaces and Banach spaces, there are a number of important theorems involving one or two such spaces, and each of them may, or may not, need to be complete (i.e., really a Banach space, and not only a normed space). Of course, at some point the statements should be engraved in memory and not require any thought, but if one hasn’t worked on the topic for a while (say, a few years), it’s easy to forget. Fortunately, it’s also easy to recover the assumptions needed, by using elementary arguments to check how the theorems, if assumed to hold when all the spaces are complete, sometimes extend trivially to cases where one or more of them is not.

(1) The Hahn-Banach Theorem: here, the statement is about extending a continuous linear functional

\lambda\,:\, W\rightarrow \mathbf{C}

(assuming everything is over the base field C for simplicity) from a subspace W to the whole space V. Should V be complete? Well, even if we only know the theorem in that case, note that we could use it, given a non-complete space V, to extend λ to the completion Y of V, and then we can trivially restrict it again to V. So there’s no need for V to be complete! Similarly, even if W is not closed, we can extend λ by continuity to its closure, and if the theorem holds for a closed subspace, we can then extend further to V. So W need not be closed.

(2) The Banach-Steinhaus Theorem: now we have a family of continuous linear maps

T_i\,:\, V\rightarrow W

between normed vector spaces, which are assumed to be pointwise bounded:

\sup_{i\in I}{||T_i(v)||}<+\infty

for all v in V, and the conclusion is that the operators are uniformly bounded over the unit ball of V:

\sup_{i\in I}{||T_i||}<+\infty

(recall the operator norm is the supremum over the unit ball). Should V, or W, be complete? Well, clearly, if the theorem holds for Banach spaces, and W is not complete, we can define a family by composition

U_i\,:\, V\rightarrow W\rightarrow Y

where Y is the completion of W, and the pointwise bound is still trivially true for this family. The uniform conclusion for it is also the same as for the original one (for the same obvious reason that the images of the Ti and Ui are the same).
So completeness of W is clearly not necessary.

However, if we try similar arguments for V, we can say (indeed) that the Ti extend to the completion of V, but it is not obvious that the pointwise bound holds for those extensions, since they must be evaluated at more points than those of V. This can lead quickly to doubts and the conclusion that it’s certainly safer to assume V is complete; or it can lead to constructing a counter-example to confirm this impression: maybe the simplest non-complete normed vector space is the space

d=\{(x_n)\,\mid\, x_n=0\text{ for } n \text{ sufficiently large}\}

of sequences with only finitely many non-zero coefficients, with the supremum norm of the coefficients. The coordinate mappings (the most obvious operators on this space!) cannot serve as the counterexample family, since they are all of norm 1. But take

T_n(x)=nx_n,\text{ for which } ||T_n||=n\rightarrow +\infty,

for instance. Because the coefficients of a sequence in d vanish far away, we have

T_n(x)=0\text{ for all } n \text{ sufficiently large}

and so the pointwise bound holds, despite the operators not being uniformly bounded.
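
Here is a small numerical illustration of this counterexample (a throwaway script of my own; the particular sequence, the truncation of the family, and the function names are arbitrary choices):

```python
# Illustration: on finitely supported sequences, T_n(x) = n * x_n is pointwise
# bounded (each fixed x kills all but finitely many T_n), while the operator
# norms ||T_n|| = n are unbounded.

def T(n, x):
    """Apply T_n to a finitely supported sequence x, given as a Python list
    implicitly extended by zeros (n is a 1-based index)."""
    return n * x[n - 1] if n <= len(x) else 0.0

x = [1.0, -2.0, 0.5]                               # an element of d
print(max(abs(T(n, x)) for n in range(1, 100)))    # finite: 4.0 is the pointwise bound for this x

def operator_norm(n):
    """||T_n|| = n, attained at the n-th basis sequence e_n, which has sup norm 1."""
    e_n = [0.0] * (n - 1) + [1.0]
    return abs(T(n, e_n))

print([operator_norm(n) for n in (1, 10, 100)])    # [1.0, 10.0, 100.0]: unbounded
```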

(One can also remember vaguely that the proof of the theorem uses the Baire category theorem somewhere, so some completeness is needed, and it must be that of V.)

(3) The Open Mapping Theorem: here we have two spaces and a surjective continuous linear map

T\,:\, V\rightarrow W

and the outcome is that T is open: it maps open subsets of V to open subsets of W. Which of the two should be Banach spaces for this to hold?

If we try to assume the result for Banach spaces and extend it, we run into problems at both ends this time: if we replace T by the map S obtained by post-composing with the embedding of W in its completion, the assumption of surjectivity fails, since the image of S is only W and not the whole completion. If we assume that W is complete and extend T by continuity to the completion Y of V, the assumptions still hold, but the conclusion we obtain is that this extension is open: it maps open subsets of Y to open subsets of W, and there is no reason that the restriction to V should be open. So this leads us to suspect that both source and target spaces have to be Banach spaces for this result to have a chance to hold. This is indeed the case.

For counterexamples, one can take for instance

V=C([0,1])\text{ with } ||f||=\sup|f(t)|,\quad\quad W=C([0,1])\text{ with } ||f||_1=\int_0^1{|f(t)|dt}

and T the identity map. Then W is not complete, and T is linear, continuous and surjective, but not open (that would make it an isomorphism, and then W would be complete…). This works more generally for any space V with two comparable norms, only one of which is complete.
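
A small numerical illustration of why this identity map cannot be open (again a throwaway script of my own, with an arbitrary grid size): the tent functions f_n(t) = max(0, 1 − nt) keep sup norm 1 while their L¹ norm tends to 0, so the inverse map from W to V is not continuous, and an open continuous bijection would have a continuous inverse.

```python
# Illustration: the tent functions f_n(t) = max(0, 1 - n*t) have sup norm 1
# but L^1 norm 1/(2n), so the two norms on C([0,1]) are not equivalent and
# the identity map (sup norm) -> (L^1 norm) cannot be open.
import numpy as np

t = np.linspace(0.0, 1.0, 200001)          # grid on [0,1] for a crude quadrature
for n in (1, 10, 100):
    f = np.maximum(0.0, 1.0 - n * t)
    sup_norm = np.max(np.abs(f))
    l1_norm = np.mean(np.abs(f))           # Riemann-sum approximation of the integral
    print(n, sup_norm, round(l1_norm, 5))  # sup norm stays 1.0, L^1 norm is about 1/(2n)
```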

An example where the source space is not complete is given by first taking a non-continuous linear map

S\,:\,Y\rightarrow W

where Y and W are Banach spaces, with Y infinite-dimensional (note that the existence of such a map essentially depends on the Axiom of Choice!), and then defining

V=\text{graph of } S=\{(v,w)\in Y\times W\,\mid\, w=S(v)\}

with the norm induced from the product norm on Y × W. The non-continuity of S ensures that this space is not complete (if it were, the graph would be closed in Y × W, and the Closed Graph Theorem would force S to be continuous), but the map

T\,:\, V\rightarrow Y,\quad\quad (v,S(v))\mapsto v

is continuous, surjective, but not open.

A smoother Brownian motion

I have mentioned one type of fractal already, but nature’s favorite (or at least, my own favorite…) is probably the curve formed by the path of a standard Brownian Motion, which is mathematically thought of as being a “random” continuous function

[0,+\infty[\rightarrow \mathbf{R}

and is visualized by the graph of this function. One can find online plenty of pictures of “typical” paths, which are immediately noticeable by virtue of being very rough: indeed, it is a theorem of Paley, Wiener and Zygmund that, almost surely, the function

t\mapsto B(t)

is nowhere differentiable, so that violent oscillations must occur arbitrarily close to every point of the graph.

Hence the typical pictures of simulated Brownian motion are not as striking or beautiful as those of approximations of other fractal objects can be. This seems a shame, and things don’t have to be this way: since we can only draw approximations, the problem is partly that Brownian motion is usually approximated by rescaled random walks with smaller and smaller steps, with affine interpolation between the discrete steps of the walk, which creates the spiky look at each change of direction.

But here’s my own version:

[Figure: sample Brownian approximation]

Isn’t it much nicer? Or, at least, more “readable”?

The theoretical background is the theory of Bernstein polynomials (S. Bernstein, not J. Bernstein, who also has his own polynomials). They were used by S. Bernstein to provide a nice constructive proof of the Weierstrass approximation theorem: for any continuous function

f\,:\, [0,1]\rightarrow \mathbf{R}

the polynomials

B_n(x;f)=\sum_{0\leq i\leq n}{{n\choose i}f(i/n)x^i(1-x)^{n-i}}

converge uniformly to f as n goes to infinity. Since these are polynomials, they are very smooth approximations of an arbitrary continuous function. And if one applies this to a path of Brownian motion, using the known distribution of the random variable B(i/n) (namely, a centered Gaussian with variance i/n), one gets nice polynomial approximations of Brownian motion! The one above has degree 40000, and it should be said that trying to compute it from the formula doesn’t work (binomial coefficients such as “40000 choose i” are not so easy to compute when i is reasonably big…): what is used instead is the probabilistic interpretation of

{n\choose i}x^i(1-x)^{n-i}

as the probability that a binomial distribution with parameters n and x (i.e., the sum of n Bernoulli random variables, each equal to 1 with probability x) takes the value i; this is approximated nicely by packages such as GSL. On the other hand, it takes much longer to output a graph like the one above, compared with a random walk of similar complexity (interpreted, roughly speaking, as how complicated the graph looks — how many changes of direction, for instance).
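
For anyone wanting to play with this, here is a rough sketch in Python of the whole construction; this is not the code used for the picture above (which relied on GSL): scipy.stats.binom.pmf plays the role of the binomial probabilities, the degree is much smaller than 40000 to keep things fast, and the variable names are my own.

```python
# Rough sketch of the construction: sample a Brownian path at the points i/n,
# then form the degree-n Bernstein polynomial, computing the binomial weights
# with binom.pmf(i, n, x) instead of the raw binomial coefficients.
import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n = 2000                                   # degree of the Bernstein polynomial
# Values B(i/n): cumulative sums of independent N(0, 1/n) increments, B(0) = 0.
B = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n), size=n))))

i = np.arange(n + 1)
xs = np.linspace(0.0, 1.0, 800)
# B_n(x) = sum_i C(n, i) x^i (1-x)^(n-i) B(i/n)
smooth = np.array([np.dot(binom.pmf(i, n, x), B) for x in xs])

plt.plot(np.linspace(0.0, 1.0, n + 1), B, lw=0.4, label="interpolated random walk")
plt.plot(xs, smooth, lw=1.5, label="Bernstein polynomial approximation")
plt.legend()
plt.show()
```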

All of this is described in the first part of my paper on Bernstein polynomials and Brownian motion (note that the last section of it, which tries to study zeros of random Bernstein polynomials, contains a serious mistake…); in particular, it is shown there that these approximations do indeed converge (in law) to Brownian motion, and in fact can be used to give an alternative proof of its mathematical existence.