Orthogonality of columns of integral unitary operators: a challenge

Given a unitary matrix A=(a_{i,j}) of finite size, it is a tautology that the column vectors of A are orthonormal, and in particular that
\sum_{i} a_{i,j} \overline{a_{i,k}} =0
for any j\not=k. This has an immediate analogue for a unitary operator U\,:\, H\rightarrow H, if H is a separable Hilbert space: given any orthonormal basis (e_n)_{n\geq 1} of H, we can define the “matrix” (a_{i,j})_{i,j\geq 1} representing U by
U(e_j)=\sum_{i\geq 1}a_{i,j}e_i,
and the “column vectors” (a_{i,j})_{i\geq 1}, for distinct indices j, are orthogonal in the \ell_2-sense: we have
0=\langle e_j,e_k\rangle = \langle U(e_j),U(e_k)\rangle=\sum_{i}a_{i,j}\overline{a_{i,k}}
if j\not=k.
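For readers who like to see the finite-dimensional statement in action, here is a minimal Python sketch. It is only an illustration: the choice of the discrete Fourier matrix of size n=7 as the unitary matrix A, and the helper names dft_matrix and column_inner_product, are my own assumptions and not part of the argument above.

```python
import cmath

def dft_matrix(n):
    """Unitary discrete Fourier matrix a[i][j] = exp(2*pi*i*(i*j)/n) / sqrt(n)."""
    return [[cmath.exp(2j * cmath.pi * i * j / n) / n ** 0.5 for j in range(n)]
            for i in range(n)]

def column_inner_product(a, j, k):
    """Return sum_i a[i][j] * conj(a[i][k]), the inner product of columns j and k."""
    return sum(row[j] * row[k].conjugate() for row in a)

n = 7  # illustrative size
A = dft_matrix(n)
# Gram matrix of the columns: should be the identity up to rounding errors.
gram = [[column_inner_product(A, j, k) for k in range(n)] for j in range(n)]
```

The off-diagonal entries of gram are the sums \sum_i a_{i,j}\overline{a_{i,k}}, which vanish up to floating-point error.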

Now assume that H is some L^2 space, say H=L^2(X,\mu), and U is an integral operator on H given by a kernel k\,:\, X\times X\rightarrow \mathbf{C}, so that
U(\varphi)(x)=\int_{X}\varphi(y)k(x,y)d\mu(y)
for \varphi \in L^2(X,\mu).
Intuitively, the values k(x,y) of the kernel form a kind of “continuous matrix” representing U. The question is: are its columns orthogonal? In other words, given y\not=z in X, do we have
\int_{X}k(x,y)\overline{k(x,z)}d\mu(x)=0?

If one remembers the fact that “nice” kernels define trace class integral operators in such a way that the trace can be recovered as the integral
\int_{X}k(x,x)d\mu(x)
over the diagonal (the basis of the trace formula for automorphic forms…), this sounds rather reasonable. There is however a difficulty: it is not so easy to write kernels k(x,y) which both define a unitary operator, and are such that the integrals
(\star)\quad\quad\quad\quad \int_{X}k(x,y)\overline{k(x,z)}d\mu(x)
are well-defined in the usual sense! For instance, the most important unitary integral operator is certainly the Fourier transform, defined on L^2(\mathbf{R},dx), and its kernel is
k(x,y)=e^{2i\pi xy},
for which the integrals above are all undefined in the Lebesgue sense. This is natural: if the kernel k(x,y) were square integrable on X\times X, for instance, the corresponding integral operator on L^2(X,\mu) would be compact, and its spectrum could not be contained in the unit circle (excluding the degenerate case of a finite-dimensional L^2-space.)

This probably explains why this question of orthogonality of column vectors is not to be found in standard textbooks. There are some examples however where things do work.

We consider the space H=L^2(\mathbf{R}^*,|x|^{-1}dx), and as in the previous post, we look at the unitary operator
T=\rho\Bigl(\begin{pmatrix}0&-1\\1&0\end{pmatrix}\Bigr),
where \rho is the principal series representation with eigenvalue 1/4 of \mathrm{PGL}_2(\mathbf{R}). The result of Cogdell and Piatetski-Shapiro already mentioned there shows that T is, indeed, a unitary operator given by a smooth kernel k(x,y)=j(xy) for some function j on \mathbf{R}^*. This function is explicit, and (as expected) not very integrable: we have
j(x)=\begin{cases}-2\pi \sqrt{x}Y_0(4\pi\sqrt{x})\text{ for } x>0,\\4\sqrt{|x|}K_0(4\pi\sqrt{|x|})\text{ for } x<0.\end{cases}

Since it is classical that Y_0(x)\approx x^{-1/2} for x\rightarrow +\infty, this function is neither integrable nor square-integrable. But, the function K_0 on [0,+\infty[ decays exponentially at infinity! This means that the integrals (\star), which are given by
\int_{\mathbf{R}^*}j(xy)\overline{j(xz)}\frac{dx}{|x|},
make perfect sense when y and z have opposite sign (this requires also knowing that there is no problem at 0, but that is indeed the case, because the Bessel functions here have just a logarithmic singularity there, and the factors \sqrt{|x|} eliminate the |x|^{-1} in the integral.)

It should not be a surprise then that we have
\int_{\mathbf{R}^*}j(xy)\overline{j(xz)}\frac{dx}{|x|}=0
for yz<0. This boils down to an identity for integrals of Bessel functions that can be found in (combinations of) standard tables, or it can be proved more conceptually by viewing
j(xy)=k(x,y)
as limit of
\frac{1}{2\epsilon}\int_{|u-y|<\epsilon} k(x,u)du,
which is T(f_{y,\epsilon}) for the function f_{y,\epsilon} which is the normalized characteristic function of the interval of radius \epsilon around y, and similarly for z. Since
\langle f_{y,\epsilon},f_{z,\epsilon}\rangle =0
when \epsilon is small enough, the unitarity gives
\int_{\mathbf{R}^*} Tf_{y,\epsilon}(x)\overline{Tf_{z,\epsilon}(x)}\frac{dx}{|x|}=0,
and one must take the limit \epsilon\rightarrow 0, which is made relatively easy by the exponential decay of K_0 at infinity…

This is nice, but here comes a challenge: if one spells out this identity in terms of Bessel functions, what needs to be done is equivalent to showing that the function
K(a, b)=\int_{0}^{+\infty}{Y_0(ax)K_0(bx)xdx}
defined for a,b>0, is antisymmetric: we have
K(a,b)=-K(b,a).
Now, this fact is an “elementary” property of classical functions. Can one prove it directly? (By which I mean, without using the operator interpretation, but also without using an explicit formula for the integral…) For the moment, I have not succeeded…

I’ll conclude by correcting a mistake in my previous post (it should not be a surprise to anyone that if I attempt to be as clever as Euler, I may stumble rather badly, and the correction is in some sense rather small compared with what one might expect)… There I claimed that the integral transform w\mapsto W appearing in the Voronoi formula for the divisor function is given by
|y|^{1/2}W(y)=T(|x|^{1/2}w(|x|)).
But this is not the case: the proper formula is
|y|^{1/2}W(y)=T(|x|^{1/2}\tilde{w}(x)),
where \tilde{w}(x)=w(x) if x>0, but \tilde{w}(x)=0 if x<0. This affects the final formula: we have
\|W\|^2=\|w\|^2,
instead of the claimed
\|W\|^2=2\|w\|^2
(the "proof" using the Fourier transform has the same mistake of using w(|xy|) instead of \tilde{w}(xy), so there is no contradiction between the informal argument and the rigorous one.)

Trace functions, II: Examples

Continuing after my last post, this one will be a list of examples of trace functions modulo some prime number p. For each of the examples, I will give a bound for its conductor, which I recall is the main numerical invariant that allows us to measure the complexity of the trace function K(n) (formally, the conductor is attached to the object \mathcal{F} that gives rise to K, but we can define the conductor of a trace function to be the minimal conductor of such a \mathcal{F}.) These objects \mathcal{F} will be called sheaves, since this is the language used in the paper(s) of Fouvry, Michel and myself, but one doesn’t need to know anything about sheaves to understand the examples.

I will start with a list of concrete functions which are trace functions, and then explain some of the basic operations one can perform on known trace functions to obtain new ones. All these examples will be (I hope) very natural, but it is usually a deep theorem that the functions come from sheaves.

Throughout, p is a fixed prime number. Generically, \psi denotes a non-trivial additive character modulo p, for instance
\psi(x)=e^{2i\pi x/p},
(which may also be viewed casually as an \ell-adic character), and \chi denotes a multiplicative character modulo p (non-trivial, unless specified otherwise.)

(1) Characters and mixed characters

Let f and g be non-zero rational functions in \mathbf{F}_p(T). Let
K(x)=\psi(f(x))\chi(g(x)),
for x which is not a pole of f, or a zero or pole of g, and K(x)=0 otherwise. Then K is a trace weight. The (or an) associated sheaf is of rank 1, and its conductor is bounded by the sum of degrees of numerators and denominators of f and g. However, the size of the conductor arises for different reasons for f and g: for the “additive” component f, singularities are poles of f, and the contribution of each pole x_0 comes from the Swan conductor, which is bounded by the order of the pole at x_0; for the “multiplicative” component g, the singularities are zeros and poles of g, and each only contributes 1 to the conductor: the Swan conductors for K_g=\chi(g(x)) are all zero.

For analytic applications, the main point is that, by fixing f and g over \mathbf{Q}, one obtains for each p large enough (so that the reduction modulo p makes sense), and each choice of characters \psi and \chi, a trace weight associated to f and g which has conductor uniformly bounded (depending on f and g only). Thus any estimates valid for all primes with implied constants depending only on the conductor of the trace functions involved will become an interesting estimate concerning f and g. This applies to the main theorem of my paper with Fouvry and Michel concerning orthogonality of Fourier coefficients of modular forms and trace functions…

These examples are the most classical, and are very useful. Even the simple case g=1 and f(X)=X^{-1} is full of surprises.
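To make the definition concrete, here is a small Python sketch of one mixed character. The specific choices are assumptions made for illustration only: p=13, f(X)=X^{-1}, g(X)=X^2+1, \psi(x)=e^{2i\pi x/p} and \chi the Legendre symbol; the function names are hypothetical.

```python
import cmath

def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, as an integer in {-1, 0, 1}."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def mixed_character(x, p):
    """K(x) = psi(1/x) * chi(x^2 + 1); equal to 0 at the pole of f and the zeros of g."""
    x %= p
    g = (x * x + 1) % p
    if x == 0 or g == 0:
        return 0
    inv = pow(x, p - 2, p)  # inverse of x modulo p, by Fermat's little theorem
    return cmath.exp(2j * cmath.pi * inv / p) * legendre(g, p)

p = 13  # illustrative prime
values = [mixed_character(x, p) for x in range(p)]
total = sum(values)  # the complete sum of K over F_p
```

Away from the three singular points (x=0 and the two roots of X^2+1 modulo 13), the values have modulus 1, as expected for a sheaf of rank 1.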

(2) Fiber-counting functions

Another very useful example comes from a fixed non-constant rational function f\in \mathbf{F}_p(T), which is viewed as defining a morphism
f\,:\, \mathbf{P}^1\rightarrow \mathbf{P}^1.
Consider then
K(x)=|\{y\in \mathbf{P}^1\,\mid\, f(y)=x\}|.
This is a trace weight, associated to the direct image sheaf
\mathcal{F}=f_*\bar{\mathbf{Q}}_{\ell},
which in representation theoretic terms is an induced representation from a finite-index subgroup, so that it remains relatively simple.
Here the rank r of the sheaf is the degree \deg(f) of f as a morphism (i.e., the generic number of pre-images of a point x); the singularities are the finitely many x in \mathbf{P}^1 such that the equation
f(y)=x
has fewer than r solutions (in \mathbf{P}^1(\bar{\mathbf{F}}_p)) and, at least if p>\deg(f), the Swan conductors vanish everywhere, so that the conductor is bounded in terms of the degrees of the numerator and denominator of f only. In particular, if f is defined over \mathbf{Q}, varying p (large enough) will provide a family of trace functions modulo primes with uniformly bounded conductor, similar to the characters of the previous example with fixed rational functions as arguments.

The main reason this function is useful is that, for any other (arbitrary) function \varphi on \mathbf{P}^1(\mathbf{F}_p), we have tautologically
\sum_{y}{\varphi(f(y))}=\sum_{x}{K(x)\varphi(x)}
(in other words, it is maybe better to interpret K as the image measure of the uniform measure on the finite set \mathbf{P}^1(\mathbf{F}_p) under f, and this formula is the classical “integration” formula for an image measure…)

One also often takes the function
\tilde{K}(x)=K(x)-1,
where 1 is the average of K over \mathbf{F}_p. This is also a trace function (the sheaf corresponding to K contains a trivial quotient, and this is the trace function of the kernel of the map to this trivial quotient). We now have
\sum_{x}{\tilde{K}(x)\varphi(x)}=\sum_{y}{\varphi(f(y))}-\sum_{x}{\varphi(x)}.
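The two summation identities can be checked by brute force. The sketch below makes simplifying assumptions that are mine: f is a polynomial (so that the point at infinity maps to itself and one can work on \mathbf{F}_p only), p=11, and both f and the test function \varphi are arbitrary illustrative choices.

```python
from collections import Counter

def fiber_counts(f, p):
    """K(x) = number of y in F_p with f(y) = x, for a polynomial map f."""
    counts = Counter(f(y) % p for y in range(p))
    return [counts.get(x, 0) for x in range(p)]

p = 11                             # illustrative prime
f = lambda y: y ** 3 + y           # hypothetical choice of polynomial
phi = lambda x: (x * x + 3) % 7    # arbitrary test function on F_p

K = fiber_counts(f, p)
lhs = sum(phi(f(y) % p) for y in range(p))             # sum_y phi(f(y))
rhs = sum(K[x] * phi(x) for x in range(p))             # sum_x K(x) phi(x)
centred = sum((K[x] - 1) * phi(x) for x in range(p))   # sum_x (K(x) - 1) phi(x)
```

Here lhs and rhs agree exactly, and centred equals lhs minus the plain sum of \varphi over \mathbf{F}_p.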

(3) Number of points on families of algebraic varieties

More generally, we can count points on one-parameter families of algebraic varieties of dimension d\geq 1. For instance, families of elliptic curves or of more general curves are quite common. To be concrete, one may have a polynomial f\in \mathbf{F}_p[T,X,Y], where T is seen as the parameter, and consider the curves
C_t\,:\, f(t,X,Y)=0.
Usually, it is not so much the number of points as the correction term that is most interesting. For instance, if the curves are generically geometrically irreducible, and have a single point at infinity, the size of C_t(\mathbf{F}_p) is (for all but finitely many t) of the form
|C_t(\mathbf{F}_p)|=p-a(C_t),
where a(C_t) satisfies the Weil bound
|a(C_t)|\leq 2g(C_t)\sqrt{p},
in terms of the genus of C_t. In fact, once one ensures that the family of curves is such that the genus of the curves is the same g\geq 0 (for all but finitely many t), the function
K(t)=a(C_t)
is a trace function on the corresponding dense open set of \mathbf{A}^1, for some sheaf which has rank 2g. For the other values of t, the trace function of the corresponding middle-extension sheaf might differ from the value a(C_t) defined as above using the number of points, but since the number of those singularities is bounded by the conductor, one can usually (analytically at least) not worry too much about this. Similarly, in many cases the sheaf is tamely ramified everywhere (i.e., all Swan conductors vanish), and so the conductor is well-controlled.

In contrast with the first two examples, the construction of a sheaf with this trace function is not elementary: it is an example of the so-called “higher direct image sheaves” (with compact support). Since, for every “good” t, the Riemann Hypothesis for curves shows that
a(C_t)=\sqrt{p}(\theta_{1,t}+\cdots+\theta_{2g,t}),
where the \theta_{i,t} are complex numbers of modulus 1, we can interpret the existence of this sheaf as saying that the algebraic variation of the “eigenvalues” \theta_{i,t} is itself controlled by an algebraic object. This is one of the main insights that algebraic geometry (and étale cohomology in particular) brings to analytic number theory.
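As a hedged illustration of the function t\mapsto a(C_t), the following Python sketch counts points by brute force on a family chosen by me (it is not taken from the text): the Legendre family y^2=x(x-1)(x-t) of genus 1, with p=101. It then compares the values with the Weil bound 2g\sqrt{p}=2\sqrt{p}.

```python
import math

def affine_point_count(t, p):
    """Number of (x, y) in F_p^2 with y^2 = x (x - 1) (x - t)."""
    square_roots = [0] * p           # square_roots[v] = number of y with y^2 = v
    for y in range(p):
        square_roots[y * y % p] += 1
    return sum(square_roots[x * (x - 1) * (x - t) % p] for x in range(p))

def a_value(t, p):
    """a(C_t), defined by (number of affine points) = p - a(C_t)."""
    return p - affine_point_count(t, p)

p = 101                                           # illustrative prime
trace = {t: a_value(t, p) for t in range(2, p)}   # skip the singular fibres t = 0, 1
worst = max(abs(a) for a in trace.values())       # to compare with 2 * sqrt(p)
```

The excluded parameters t=0 and t=1 are exactly the “bad” values where the curve degenerates, in line with the remark above about finitely many exceptional t.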

The family of elliptic curves
x+x^{-1}+y+y^{-1}+t=0
in my bijective challenge is of this type.

(4) Families of Kloosterman sums

One of the great examples, for analytic number theory, is given by families of Kloosterman sums: for an integer m\geq 1, and a non-zero a\in\mathbf{F}_p, we let
Kl_m(a)=\frac{(-1)^{m-1}}{p^{(m-1)/2}}\sum_{x_1\cdots x_m=a}e\Bigl(\frac{x_1+\cdots +x_m}{p}\Bigr).
The Weil bound for m=2, and the even deeper work of Deligne for larger m, prove that
|Kl_m(a)|\leq m
for all a invertible modulo p. Further work, relying once more on the powerful formalism of étale sheaves and higher direct images in particular, shows that the function
K(a)=Kl_m(a),
is (the restriction to invertible a of) a trace function for an irreducible sheaf, with conductor bounded in terms of m only.
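For m=2 the bound |Kl_2(a)|\leq 2 is easy to observe numerically. The sketch below is only an experiment under assumptions of mine: e(z)=e^{2i\pi z}, p=101, and the sum over x_1x_2=a is rewritten as a sum over non-zero x with x_2=a/x.

```python
import cmath

def kloosterman2(a, p):
    """Normalized Kl_2(a) = -(1/sqrt(p)) * sum over x != 0 of e((x + a/x)/p)."""
    total = 0
    for x in range(1, p):
        x_inv = pow(x, p - 2, p)  # inverse of x modulo p
        total += cmath.exp(2j * cmath.pi * ((x + a * x_inv) % p) / p)
    return -total / p ** 0.5

p = 101  # illustrative prime
values = [kloosterman2(a, p) for a in range(1, p)]
largest = max(abs(v) for v in values)
```

The values come out real up to rounding (the terms for x and -x are complex conjugates), and largest stays below 2.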

(5) The Fourier transform

If we have a function K(x) modulo p, we define its Fourier transform by
\hat{K}(t)=\frac{1}{\sqrt{p}}\sum_{x\in \mathbf{F}_p}{K(x)e\Bigl(\frac{xt}{p}\Bigr)}
for t\in\mathbf{F}_p (the normalization here is convenient, as I will explain). It is now a very deep fact that, if K comes from a sheaf, then so does -\hat{K} (the minus sign is natural, but this has to do with rather deep algebraic geometry…) More precisely, one has to be careful because of the fact that the Fourier transform of an additive character (as a function) is a multiple of a delta function. The latter does fit nicely in the framework of étale sheaves, but not as a middle-extension sheaf or Galois representation (because it is zero on a dense open set, so it would have to be zero to be a middle-extension sheaf or to come from a Galois representation). There is a geometric solution to this issue, but it involves speaking of perverse sheaves and related machinery, which we have barely started to understand: the Fourier transform works perfectly well at the level of perverse sheaves, and one can use their trace functions just as well as those of Galois representations. Since, in our current applications, we can always deal separately with additive characters (or delta functions), we have avoided having to deal with perverse sheaves (up to now…)

The existence of the \ell-adic Fourier transform of sheaves was first proved by Deligne, but the theory of the sheaf-theoretic Fourier transform was largely built by Laumon (with further contributions, in particular, from Brylinski and Katz). To illustrate how powerful it is, consider
K(x)=e\Bigl(\frac{x^{-1}}{p}\Bigr),
a relatively simple case of Example (1). We then have
-\hat{K}(x)=Kl_2(x)\quad\quad\text{ for } x\not=0,
so that the existence of the Fourier transform at the level of sheaves implies the existence of the Kloosterman sheaf parameterizing classical Kloosterman sums as in the previous example.
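This relation can be tested directly at the level of functions. In the Python sketch below (an experiment, with p=31 and all helper names chosen by me), I take K(0)=0 at the pole of x^{-1} and use the normalizations of \hat{K} and Kl_2 exactly as written above; with these conventions the comparison that works numerically at non-zero arguments is between -\hat{K} and Kl_2, matching the minus sign mentioned earlier.

```python
import cmath

def e_p(n, p):
    """Additive character e(n/p) = exp(2*pi*i*n/p)."""
    return cmath.exp(2j * cmath.pi * (n % p) / p)

def fourier(K, p):
    """hat K(t) = (1/sqrt(p)) * sum_x K(x) e(x t / p), for K a list of length p."""
    return [sum(K[x] * e_p(x * t, p) for x in range(p)) / p ** 0.5
            for t in range(p)]

def kloosterman2(a, p):
    """Normalized Kl_2(a) = -(1/sqrt(p)) * sum over x != 0 of e((x + a/x)/p)."""
    return -sum(e_p(x + a * pow(x, p - 2, p), p) for x in range(1, p)) / p ** 0.5

p = 31  # illustrative prime
K = [0] + [e_p(pow(x, p - 2, p), p) for x in range(1, p)]  # K(x) = e(x^{-1}/p), K(0) = 0
K_hat = fourier(K, p)
max_gap = max(abs(-K_hat[t] - kloosterman2(t, p)) for t in range(1, p))
```

The quantity max_gap is zero up to rounding error; one can also check the Plancherel identity \sum|\hat{K}|^2=\sum|K|^2=p-1, which is the reason for the normalization by 1/\sqrt{p}.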

Other examples that arise from our previous examples are many families of exponential sums, for instance
K(t)=\frac{1}{\sqrt{p}}\sum_{x\in\mathbf{F}_p}{\psi(f(x)+tx)\chi(g(x))},
(arising from Example (1); one must assume either that f(x) is not a polynomial of degree \leq 1 or that \chi is non-trivial to have a well-defined sheaf), or
K(t)=\frac{1}{\sqrt{p}}\sum_{x}{e\Bigl(\frac{tf(x)}{p}\Bigr)},
for t\not=0 with K(0) equal to the number of poles of f (the sum over x is over values where the rational function f is defined), that arises from Example (2) (applied with the function \tilde{K}).

This operation of Fourier transform has one last crucial feature for applications to the analysis of trace functions: the conductor of \hat{K} is bounded in terms of that of K only. This is something we prove in our paper using Laumon’s analysis of the singularities of the Fourier transform, and in fact we show that if the conductor of K is at most M\geq 1, then the conductor of \hat{K} is at most 10M^2. Hence the examples above, if the rational functions f (and/or g) are fixed in \mathbf{Q}(T) and then reduced modulo various primes, always have conductor bounded uniformly for all p.

(6) Change of variable

Given a non-constant rational function f\in\mathbf{F}_p(T) seen as a morphism
\mathbf{P}^1\rightarrow \mathbf{P}^1,
and a trace function K(x), one can form the function
f^*K(x)=K(f(x)).
This is again, essentially, a trace function: as in Example (3), one may have to tweak the values of f^*K at some singularities (because pull-backs of middle-extension sheaves do not always remain so), but this is fairly easily controlled. Moreover, one can also control the conductor of f^*K in terms of that of K, taking into account the degree of f. An especially simple case of great importance is when f is a homography
f(x)=\frac{ax+b}{cx+d},\quad\quad\quad ad-bc\not=0,
(an automorphism of \mathbf{P}^1) in which case no tweaking is necessary to define f^*K, and the conductor is the same as that of K (which certainly seems natural!)

We can now compose these various operations. One construction is the following (a finite-field Bessel transform): start with K, apply the Fourier transform, change the variable t to t^{-1}, apply again the Fourier transform. If we call \check{K} the resulting function, the examples above show that if K is a trace function with conductor \leq M, then \check{K} will also be one, and its conductor will be bounded solely in terms of M (in fact, it will be \leq 10(10M^2)^2=1000M^4, by applying the bound discussed in Example (5) twice).
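At the level of functions, this composition is easy to write down. The following Python sketch is one possible reading, under assumptions that are mine: the change of variable t\mapsto t^{-1} is applied on non-zero t and the value at 0 is simply set to 0 (the text does not specify this), and the input K(x)=e(x^3/p) with p=13 is an arbitrary example.

```python
import cmath

def e_p(n, p):
    """Additive character e(n/p) = exp(2*pi*i*n/p)."""
    return cmath.exp(2j * cmath.pi * (n % p) / p)

def fourier(K, p):
    """hat K(t) = (1/sqrt(p)) * sum_x K(x) e(x t / p)."""
    return [sum(K[x] * e_p(x * t, p) for x in range(p)) / p ** 0.5
            for t in range(p)]

def invert_variable(K, p):
    """t -> K(t^{-1}) for t != 0; the value at t = 0 is set to 0 (assumed convention)."""
    return [0] + [K[pow(t, p - 2, p)] for t in range(1, p)]

def bessel_transform(K, p):
    """Fourier transform, then t -> t^{-1}, then Fourier transform again."""
    return fourier(invert_variable(fourier(K, p), p), p)

p = 13                                  # illustrative prime
K = [e_p(x ** 3, p) for x in range(p)]  # hypothetical input function
K_check = bessel_transform(K, p)
norm_in = sum(abs(v) ** 2 for v in K)
norm_out = sum(abs(v) ** 2 for v in K_check)
```

Since the normalized Fourier transform preserves the \ell^2-norm and the middle step can only drop the value at 0, norm_out never exceeds norm_in.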


Trailer! In the next post in this series, I will discuss the Riemann Hypothesis for trace functions and its applications. But probably before I will discuss the more recent works of Fouvry, Michel and myself, since we now have three further papers in our series — two small, and one big.

On Weyl groups and gaussians

Am I the last person to notice that for k\geq 0, the even moment
m_{2k}=\frac{(2k)!}{2^kk!}
of a standard gaussian random variable (with expectation zero and variance one) is the same as the index of the Weyl group of \mathrm{Sp}_{2k} inside the Weyl group of \mathrm{GL}_{2k} (in other words, the index of the group of permutations of 2k elements commuting with a fixed-point free involution among all permutations)?

If “Yes”, what else have I been missing in the same spirit?
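For the skeptical reader, here is a brute-force check in Python for small k. It is a sketch only: the involution (0 1)(2 3)\cdots on \{0,\ldots,2k-1\} and the function names are my own choices.

```python
from itertools import permutations
from math import factorial

def gaussian_even_moment(k):
    """m_{2k} = (2k)! / (2^k k!)."""
    return factorial(2 * k) // (2 ** k * factorial(k))

def centralizer_index(k):
    """Index in S_{2k} of the permutations commuting with the involution i <-> i xor 1."""
    n = 2 * k
    tau = [i ^ 1 for i in range(n)]  # fixed-point free involution (0 1)(2 3)...
    commuting = sum(
        1 for s in permutations(range(n))
        if all(s[tau[i]] == tau[s[i]] for i in range(n))
    )
    return factorial(n) // commuting

# (k, gaussian moment m_{2k}, index of the centralizer) for k = 1, ..., 4
checks = [(k, gaussian_even_moment(k), centralizer_index(k)) for k in range(1, 5)]
```

The last two entries of each triple coincide, giving 1, 3, 15, 105 for k=1,\ldots,4.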

Euler style

Courtesy of the divisor function, here is another fun example of reasoning in the great style of Euler (the last installment is rather old…) A classical tool to study the distribution of values of d(n) (the number of positive divisors of n) is the Voronoi summation formula, which expresses a sum

S(w,c,a)=\sum_{n\geq 1}d(n)w(n)e\Bigl(\frac{an}{c}\Bigr),

for a nice test function w, some positive integer c\geq 1, and some integer a coprime to c, in terms of a “dual sum”

S(W,c,\bar{a})=\sum_{m\in \mathbf{Z}-\{0\}}{d(|m|)W(m/c^2)e\Bigl(\frac{\bar{a}m}{c}\Bigr)},

where \bar{a} is the inverse of a modulo c, and

W(y)=\int w(|x|) k(xy)dx

is some integral transform of w, with kernel k(y) involving the classical Bessel functions Y_0 and K_0. Precisely, we have

k(y)=\begin{cases} -2\pi  Y_0(4\pi \sqrt{y})&\text{ if } y>0\\ 4 K_0(4\pi\sqrt{|y|})&\text{ if } y<0\end{cases},

and one should add that there is also a main term in the Voronoi formula, but it is irrelevant for today's story. A classical application of this formula is to improve the error term in Dirichlet's asymptotic evaluation of

\sum_{n\leq X}d(n),

which was done indeed by Voronoi.
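To get a feeling for the size of the error term in question, here is a small numerical experiment in Python. It relies on the standard form of Dirichlet's result, \sum_{n\leq X}d(n)=X\log X+(2\gamma-1)X+O(\sqrt{X}), which is not spelled out in this post; the function names and the sample values of X are my own choices.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def divisor_sum(X):
    """D(X) = sum_{n <= X} d(n), computed as sum_{k <= X} floor(X / k)."""
    return sum(X // k for k in range(1, X + 1))

def dirichlet_main_term(X):
    """Main term X log X + (2 gamma - 1) X in Dirichlet's asymptotic formula."""
    return X * math.log(X) + (2 * EULER_GAMMA - 1) * X

# Difference between the exact sum and the main term, for a few sample X.
errors = {X: divisor_sum(X) - dirichlet_main_term(X) for X in (100, 10_000, 100_000)}
```

For these sample values the differences stay well below \sqrt{X}, which is consistent with the room for improvement over Dirichlet's bound that Voronoi exploited.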

In an ongoing work with É. Fouvry, S. Ganguly and Ph. Michel, we needed to know some unitarity property of the transformation

w \mapsto W.

This is an entirely classical question, but we didn't find a ready-made statement in Watson’s book on Bessel functions. There is however a formal argument that suggests the answer: if we consider the function g(x,y) of two real variables defined by

g(x,y)=w(|xy|),

then it turns out that we have

\hat{g}(u,v)=W(uv),

where \hat{g} is the standard Fourier transform of g (this is contained in Section 4.5 of the book of H. Iwaniec and myself.) Hence we have, by the unitarity of the Fourier transform, the identity

\int \int |w(|xy|)|^2dxdy = \int\int |W(uv)|^2dudv.

Offhandedly, by changing variables, this means that

\int |w(|t|)|^2 dt \times I = \int |W(s)|^2 ds \times I,

which would give

2\|w\|^2= \|W\|^2\quad\quad\quad\quad\quad\quad (\star)

(the factor 2 comes from the fact that w is extended to an even function on \mathbf{R} from its original source as a function defined for non-negative real numbers), if not for the fact that the “constant” I is the integral

I=\int \frac{dx}{|x|}.

Alas, it diverges, although probably Euler would write it as I=4\log (\infty) (two infinities from the divergence at 0^{\pm}, the other two from the divergence at \pm \infty), and be happy with the outcome.

One can then prove rigorously the formula (\star) by truncation arguments, but here is a more conceptual argument (which offers the advantage of being something we can just quote), which follows from the interpretation of the Voronoi formula in terms of the representation theory of G=\mathrm{SL}_2(\mathbf{R}). What happens is that there exists a unitary representation \rho of G (the principal series with Casimir eigenvalue 1/4) which can be represented as acting on the Hilbert space H=L^2(\mathbf{R},|x|^{-1}dx) (the Kirillov model) in such a way that the unitary operator

T=\rho\Bigl(\begin{pmatrix}0&-1\\1&0\end{pmatrix}\Bigr)

is given by an integral operator

(T\varphi)(x)=\int \varphi(y) j(xy)\frac{dy}{|y|}

for some function j, which Cogdell and Piatetski-Shapiro called the Bessel function of \rho (see this note of Cogdell for a short explanation of this, with the analogues for finite fields and p-adic fields). Now, by direct inspection of the formula for j(y) that Cogdell and Piatetski-Shapiro computed, and comparison with the kernel k(y) in the Voronoi formula, one finds that

W(y)=|y|^{-1/2} T( x\mapsto \sqrt{|x|} w(|x|) )

(in this other short note, Cogdell explains why it is no coincidence that this abstract Bessel function appears in the Voronoi summation formula). Now, from

\int |\varphi(x)|^2 \frac{dx}{|x|}=\int |T(\varphi)(x)|^2\frac{dx}{|x|},

which holds for all \varphi\in H because T is unitary on H, we deduce exactly (\star).

Remark. There is a completely similar story where the circles x^2+y^2=a replace the hyperbolas xy=a, or in other words, if one defines
g(x,y)=w(x^2+y^2).

Then the Fourier transform of g is still a radial function W(u^2+v^2), and the map w\mapsto W is a Hankel transform (it involves the Bessel function J_0). Its unitarity follows then immediately from that of the Fourier transform, since the analogue of the divergent integral I is now, indeed, a finite constant.

In terms of representation-theory, the story is the same as above, except that the representation \rho is replaced with a discrete series representation. One can also deal similarly with radial functions in higher-dimensional euclidean spaces, which involves other discrete series representations.

Trace functions, I

This is again the first of a series of a few posts in which I will explain (as promised a very long while ago, and as far as I can…) the trace weights that are used in my paper with É. Fouvry and Ph. Michel (henceforth, this paper will be referred to as FKM). Given a prime number p, these are certain specific functions

K\,:\, \mathbf{F}_p\rightarrow \mathbf{C}

that “come from algebraic geometry”, and that can be studied using both a very rich formalism, and such extraordinarily deep results as Deligne’s “Weil 2” form of the Riemann Hypothesis over finite fields.

In fact, each function of this type is really a kind of “shadow” of a more intrinsic (more algebraic, more geometric, more arithmetic, as you wish) object, and it is rather these objects which algebraic geometry studies. In general, K does not determine this other object: if I call \mathcal{F} the latter, it may well be the case that two distinct objects \mathcal{F}_1 and \mathcal{F}_2 give rise to the same trace function K. However, there is also a basic complexity invariant c(\mathcal{F})\geq 1 defined for a given \mathcal{F} (which is called its “conductor”), and one can show (this uses the Riemann Hypothesis…) that, given p, there is a bound T(p) (which grows with p) such that a given function K can come from at most one object \mathcal{F} with complexity at most T(p). I will come back to this in a later post, since I consider the question of determining precisely T(p) to be quite fundamental and fascinating, but for the basic purpose of FKM, this issue does not really arise.

As a terminological aside, we tend to call these functions K either “trace weights” or “trace functions”. Maybe a better word might be well-deserved for this notion, but we’re not quite sure what might work, though possibly we might use “tracic function”, a good translation of the French fonction tracique that we’ve found ourselves using; this has, at least, some classic ring.

In this first post, I will outline the three possible definitions (or interpretations) of the class of trace functions, going from what is possibly the most closely related to notions known to analytic number theorists, and ending with the most flexible, but maybe least familiar one.

Special Hecke eigenvalues of automorphic forms. In the first picture, one looks at automorphic forms related to the field F=\mathbf{F}_p(T) of rational functions over the finite field \mathbf{F}_p. As is the case for classical modular forms, there are Hecke operators associated to each place of F, in particular to the irreducible polynomials P_x=X-x for x\in\mathbf{F}_p. Given an automorphic form \phi, one can then define a function
K_{\phi}(x)=\lambda_{X-x}(\phi),
the corresponding Hecke eigenvalue for these particular Hecke operators. The complexity of \phi can then be defined as the sum of the “traditional” automorphic conductor and the rank r. Indeed, it is essential here to consider automorphic forms on all groups \mathrm{GL}_r(F), and not just on \mathrm{GL}_1 or \mathrm{GL}_2.

As examples, imitating the correspondence from Dirichlet characters to Hecke characters for \mathrm{GL}_1 over the field \mathbf{Q}, it is not too difficult to construct explicitly some automorphic forms (of rank 1) for which the associated functions are given by
K(x)=e(P(x)/p),\quad\quad\text{ or }\quad\quad K(x)=\chi(P(x)),
for some polynomial P\in\mathbf{Z}[X] and some multiplicative Dirichlet character \chi. These are certainly the most natural-looking “functions of algebraic origin” on a finite field, and indeed this construction of (analogues of) Dirichlet characters is the original, and easiest, way to prove the rationality and functional equation for the associated L-functions over F (since, in order to prove this, one does not even need to mention automorphic forms, the whole argument happening within the realm of Dirichlet characters.)

Despite their many fine qualities, automorphic forms are however a bit inflexible from the point of view of defining generalizations of these basic functions K(x). For instance, it is rather difficult to write down concretely the function attached to an automorphic form of rank at least 2. In fact, I don’t really know how to do it (except for automorphic forms built from the case r=1, like analogues of Eisenstein series) without first applying one of the two other definitions, constructing some object \mathcal{F} and associated trace function K, and then invoking some version of the Langlands correspondence to claim the existence of some automorphic form \phi with Hecke eigenvalues K_{\phi} coinciding with the original K.

Similarly, given two functions K_1(x), K_2(x) arising as Hecke eigenvalues of some automorphic forms \phi_1 and \phi_2, it is a rather big theorem to show that there exists another automorphic form with eigenvalues

K(x)=K_1(x)K_2(x),

(for x unramified for both \phi_1 and \phi_2): this is the general theory of the Rankin-Selberg convolution.

Another serious drawback (which I will amplify later) is that this is — as far as I know, and at current time — strictly a one-variable story. There is no simple definition (that I know) that can be used to easily package a family of automorphic forms \phi_t and, for instance, create a new automorphic form \Phi with Hecke eigenvalues related to some average of the eigenvalues of \phi_t.

Galois representations of function fields. The first alternative to automorphic representation is given by Galois representations, and it is again a customary picture on the side of number fields. The base field is still F=\mathbf{F}_p(T), but we now consider the Galois group
G=\mathrm{Gal}(F^{sep}/F)
of some separable closure of F, and finite-dimensional representations
\rho\,:\, G\rightarrow \mathrm{GL}(V).
Then, as is customary in algebraic number theory, for any x\in \mathbf{F}_p, we have the associated decomposition and inertia group at the place corresponding to x, and the Frobenius automorphism Fr_x which acts on V if x is unramified for \rho (i.e., if the inertia group at x acts trivially on V) and which acts on the invariants V^{I_x} otherwise. In all cases we can define a function
K(x)=\mathrm{Tr}(\rho(Fr_x)\mid V^{I_x}).
It is immediately clear that such a definition gives a very flexible formalism, because we are now dealing largely with linear algebra. So formally, we can add these functions (taking direct sums of representations), multiply them (taking tensor product; because this operation does not always commute with invariants, the corresponding trace function coincides with the product of the two factors at the unramified x, but may differ at the others.) There is a non-trivial difficulty having to do with topology: to obtain a good theory, since G is an infinite profinite group, we want to consider continuous representations. But then, if V is a \mathbf{C}-vector space with its usual topology, we have the difficulty that there are too few representations: any continuous representation then has finite image. One works around this issue by the well-known device of picking some auxiliary prime number \ell\not=p, and considering continuous representations into \bar{\mathbf{Q}}_{\ell}-vector spaces. There are many representations in that case (in particular, many with large infinite image), but of course the trace function now takes values in an \ell-adic field. Qu’à cela ne tienne (or, as Katz says, ell-adic, schmell-adic), one can pick (with some effort or help from a friendly axiom) an isomorphism
\iota\,:\, \bar{\mathbf{Q}}_{\ell}\rightarrow \mathbf{C},
and consider the function
x\mapsto \iota(\mathrm{Tr}(\rho(Fr_x)\mid V^{I_x})),
which is complex-valued.

The complexity is, here also, easy to define: there is a notion of Artin conductor for such a representation, and we add the dimension of V to take the latter into account.

For applications to constructing interesting functions, this business involving \ell shouldn’t be considered as too problematic. In fact, to a large extent, it turns out that the theory is rather independent of \ell. Without wanting to develop this too much, one can already see it by noticing that for any \ell\not=p, one can rather easily construct Galois representations with trace functions equal to
K(x)=e(P(x)/p),\quad\quad K(x)=\chi(P(x)),
the basic examples already considered. In fact, this is rather simpler than the corresponding construction of Dirichlet characters of F, and in particular, it is very easy to go from the construction of representations \rho_a and \rho_m with respective trace functions
K_a(x)=e(x/p),\quad\quad K_m(x)=\chi(x),
to the case involving a general polynomial: we have a map F\rightarrow F by T\mapsto P(T), hence a map of Galois groups P^*\,:\, G\rightarrow G, and we can “just” consider the composites
\rho_a\circ P^*,\quad\quad \rho_m\circ P^*,
to get the desired representations. (This is really a restriction of representations.)
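To make the two basic examples concrete on the level of functions, here is a minimal Python sketch that tabulates K(x)=e(P(x)/p) and K(x)=\chi(P(x)) by pulling back K_a and K_m along x\mapsto P(x). It says nothing about the Galois representations themselves. The function names are mine, \chi is taken to be the Legendre symbol purely as an illustration, and the convention \chi(0)=0 at the zeros of P is an assumption of the sketch.

```python
import cmath

def e(t):
    """Additive character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)

def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)  # r is 0, 1 or p-1
    return -1 if r == p - 1 else r

def poly_eval(coeffs, x, p):
    """Evaluate P(x) mod p by Horner's rule (coefficients, highest degree first)."""
    acc = 0
    for c in coeffs:
        acc = (acc * x + c) % p
    return acc

def additive_trace(coeffs, p):
    """Values K(x) = K_a(P(x)) = e(P(x)/p) for x = 0, ..., p-1."""
    return [e(poly_eval(coeffs, x, p) / p) for x in range(p)]

def multiplicative_trace(coeffs, p):
    """Values K(x) = K_m(P(x)) = chi(P(x)), with chi the Legendre symbol
    (illustrative choice) and chi(0) = 0 (assumed convention)."""
    return [legendre(poly_eval(coeffs, x, p), p) for x in range(p)]

# Example: p = 11 and P(T) = T^2 + 1.
p = 11
P = [1, 0, 1]
K_add = additive_trace(P, p)
K_mult = multiplicative_trace(P, p)
```

Taking P(T)=T (that is, `coeffs = [1, 0]`) recovers K_a and K_m themselves.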

This theory also has fairly natural extensions to higher-dimensional varieties (though one must assume some smoothness for the theory to work decently). To a large extent, FKM might have been written in this language, as far as the definitions of trace weights are concerned. But we use instead the third approach…

Middle-extension sheaves on the affine line. This last theory is closer in terms of formalism to the previous one, but more geometric in spirit, and it is the most flexible. Indeed, it is the one we use in FKM. But the counterpart to this geometric flexibility is that the basic flavor of the definition is least familiar to analytic number theorists. (Here, I am reminded of Cyrano de Bergerac who, having described six different ways of going to the moon, and being asked “Which one did you choose”, replied “A seventh”; or, in proper subjunctive French, –Mais voilà six moyens excellents !. . .Quel système Choisîtes-vous des six, Monsieur ? — Un septième !)

Here the basic object is an \ell-adic étale sheaf on the affine line over \mathbf{F}_p, with an added “regularity” property. It is a consequence of basic properties of such objects that, for any x\in\mathbf{F}_p, we can look at the “stalk” at x, which is a finite-dimensional \bar{\mathbf{Q}}_{\ell}-vector space \mathcal{F}_x, and that the Frobenius automorphism (in some incarnation) acts on this vector space, allowing us to define a trace function
K(x)=\mathrm{Tr}(Fr\mid \mathcal{F}_x),
and this is how we get our trace weights from this point of view.

To get a feeling for the actual meaning of this, I would like first to refer to my old expository text on Deligne’s first proof of the Riemann Hypothesis over finite fields, where the first part is an introduction to étale cohomology, which might be useful for readers with some basic background in elliptic curves over finite fields, but who haven’t studied the étale topology yet. But here is a more down-to-earth way of seeing things, which mixes fish and fowl to some extent.

A middle-extension sheaf \mathcal{F} on the affine line over \mathbf{F}_p, whatever the actual definition is, comes concretely with some data. One piece of this data is a finite set S\subset \bar{\mathbf{F}}_p of singularities, which is defined over \mathbf{F}_p (in other words, it is the zero set of some non-zero polynomial in \mathbf{F}_p[T]). On the complement U of this set, the sheaf is what is called lisse, which is equivalent to saying that there is a representation of the étale fundamental group \pi_1(U) of U in some finite-dimensional \bar{\mathbf{Q}}_{\ell}-vector space which is “equivalent” to the restriction of the sheaf to U. But this étale fundamental group is, in fact, nothing but a (canonical) quotient of the Galois group G=\mathrm{Gal}(F^{sep}/F) of the previous description, namely the one corresponding to the extensions of F that are unramified at the points of U. And in fact, if we view the representation corresponding to \mathcal{F} as a representation of G, the trace functions are the same.

This allows us at least to describe how one can define the complexity of a middle-extension sheaf: one just takes the complexity of the associated Galois representation (the dimension of the vector space, plus the Artin conductor.)

What is the point then of thinking in terms of sheaves? To my mind, here are some important advantages:

  • The geometric picture that arises is often the easiest way to “see” how to manipulate trace functions to construct new ones;
  • There are different ways of extending a lisse sheaf on U to a sheaf on the affine line, and the “middle-extension” is just one of them. It is, in some sense, the best one, but there are others. In the general theory, these may come out because some construction goes outside of the realm of middle-extension sheaves: for instance, the tensor product of two middle-extension sheaves is not one in general; this accounts in a precise way for the way the product of two trace functions may not be one exactly;
  • The theory of sheaves extends handily to higher-dimensional varieties, where more types of singularities and other phenomena arise because there is “more room” for the dimension of the various sets where different behaviors occur (so sheaves on a surface might be supported on a curve, etc). Here it is important to see middle-extension sheaves as just some of the étale sheaves, and to allow more general ones;
  • The formalism is by far the most powerful. Especially crucial to the proofs of the deepest results (including the Riemann Hypothesis) is the existence of the étale cohomology groups of a sheaf, and of so-called “higher-direct images” (with compact support or not), which make sense for étale sheaves, but in general do not preserve such regularity properties as being lisse or middle-extension.
  • As a consequence of the above, this is the language in which the sources concerning the properties of étale sheaves are written; for FKM, this means especially the books of N. Katz, which we have consulted and referenced extensively…

To conclude this first post, here is a concrete illustration of what the sheaf formalism gives that is important to analytic number theorists, and which is completely mysterious (as far as I know, at least) on the level of Galois representations or automorphic forms: the existence of the Fourier transform. In fact, given a trace weight K(x) associated to some sheaf \mathcal{F}, a construction of Deligne delivers another sheaf \mathcal{G}, which is still a middle-extension sheaf, and is such that the associated trace function is
 \hat{K}(x)=-\frac{1}{\sqrt{p}}\sum_{y\in\mathbf{F}_p}K(y)e\Bigl(\frac{xy}{p}\Bigr).
This construction is not obvious; in fact, it involves (1) the fact that sheaves make sense on higher-dimensional varieties, with a wide variety of “functorial” properties; (2) the fact that higher-direct images exist: this is what is needed to obtain results of the type “a sum over y of some trace functions parametrized by x is itself a trace function”…
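On the level of functions (and only there: the sheaf-theoretic construction is the real content), the normalized Fourier transform is easy to experiment with. The following is a minimal sketch, with names of my own choosing, that computes \hat{K} for K(x)=e(x^2/p) and records two classical facts one can check numerically: the transform preserves the \ell_2-norm, and for this particular K every value of \hat{K} has modulus 1 (it is a normalized Gauss sum).

```python
import cmath
import math

def e(t):
    """Additive character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)

def fourier_transform(K, p):
    """Return the list of K^(x) = -(1/sqrt(p)) * sum_y K(y) e(xy/p), x in F_p."""
    return [
        -sum(K[y] * e((x * y % p) / p) for y in range(p)) / math.sqrt(p)
        for x in range(p)
    ]

# Example: p = 13 and K(x) = e(x^2/p).
p = 13
K = [e((x * x % p) / p) for x in range(p)]
K_hat = fourier_transform(K, p)

# Squared l^2-norms of K and of its transform (equal, by Plancherel).
norm_K = sum(abs(v) ** 2 for v in K)
norm_K_hat = sum(abs(v) ** 2 for v in K_hat)
```

The direct summation costs about p^2 operations, which is fine for experiments with small primes.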

If we assume the existence of this construction (and most analytic number theorists would argue that, whatever a theory of functions of algebraic origin might do, it should be compatible with Fourier transform…) we immediately expand our range of examples with some highly interesting ones, starting with the basic cases
K(x)=e(P(x)/p),\quad\quad K(x)=\chi(P(x)),
whose Fourier transforms are extremely interesting: they are values of families of exponential sums in one variable.

For instance, take
K(x)=e(\bar{x}/p),\text{ for } x\not=0\pmod{p},
where we denote by \bar{x} the inverse of x modulo p. Then we find that
\hat{K}(x)=-\frac{1}{\sqrt{p}}\sum_{y\not=0}{e\Bigl(\frac{xy+\bar{y}}{p}\Bigr)}
is a trace weight! In other words, the family of Kloosterman sums S(x,1;p), as a function of x, is a function of algebraic origin modulo p.
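For readers who like to see numbers, here is a small sketch (the function name is hypothetical) that evaluates this normalized Kloosterman weight directly from the formula, taking K(0)=0 so that the sum runs over y\not=0 as above. Two classical facts can be observed on the output: the values are real, and they are bounded by 2 in absolute value, which is Weil's bound.

```python
import cmath
import math

def e(t):
    """Additive character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)

def kloosterman_weight(x, p):
    """-(1/sqrt(p)) * sum over y != 0 of e((x*y + ybar)/p), ybar = inverse of y mod p."""
    total = 0
    for y in range(1, p):
        y_inv = pow(y, p - 2, p)  # Fermat's little theorem gives the inverse of y mod p
        total += e(((x * y + y_inv) % p) / p)
    return -total / math.sqrt(p)

# Example: all values for p = 101.
p = 101
values = [kloosterman_weight(x, p) for x in range(p)]
max_imag = max(abs(v.imag) for v in values)  # numerically zero: the sums are real
max_abs = max(abs(v) for v in values)        # at most 2 by Weil's bound
```

At x=0 the sum degenerates to a sum of e(z/p) over z\not=0, which equals -1, so the weight there is 1/\sqrt{p}.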

Trailer! In the next posts, I will probably first describe many examples of trace functions, and discuss the formalism that allows us to manipulate them conveniently. After this, I will come to their analytic properties, where the key point is the Riemann Hypothesis over finite fields…