A ternary divisor variation

Here is a sketch of the argument mentioned in the previous post (which arose from discussions with Étienne Fouvry, Philippe Michel, Paul Nelson, etc., but presentation mistakes are fully mine…).

Theorem. We have
\frac{1}{Q}\sum_{r\sim R}\sum_{s\sim S}\sum_{a\bmod q,\ P(a)= 0\bmod q}\Bigl|S(a,q)-MT_q\Bigr|=O\Bigl(\frac{X}{Q\log^A X}\Bigr)
provided
Q^{3/2}S^{1/2}<N^{1-\epsilon},\quad\quad R<S,
where
(1) we put
S(a,q)=\sum_{m\sim M}\alpha_m\sum_{mn_1n_2n_3\equiv a\bmod{q},\ n_i\sim N_i}1,
and denote by MT_q the expected main term;
(2) the parameters are X=MN=MN_1N_2N_3, Q=RS, the modulus is q=rs, the moduli r and s are coprime and squarefree, and P\in \mathbf{Z}[X] is the usual polynomial associated to an admissible tuple.

If we take R=Q^{1/2-\epsilon} and S=Q^{1/2+\epsilon}, we get a non-trivial result for Q as large as N^{4/7}X^{-\epsilon}.

(In fact, the special shape of P will play no role in this argument, and any non-constant polynomial will work just as well.)

More precisely, I will give a proof which is — except for its terseness — essentially complete for r and s of special type, and we anticipate only technical adjustments to cover the general case — we will write this down carefully of course.

Before starting, a natural question may come to mind: given that qui peut le plus, peut le moins, can one give an analogous result for the usual divisor function? Recall that, for the latter, the (individual) exponent of distribution has been known to be at least 2/3 for a long time (by work of Linnik and Selberg independently, both using the Weil bound for Kloosterman sums). This exponent has not been improved, even on average over q, despite much effort (although Fouvry succeeded, on average over q, in covering the range X^{2/3+\epsilon}\leq q\leq X^{1-\epsilon}). However, Fouvry and Iwaniec (with an Appendix by Katz to treat yet another complete exponential sum over finite fields) proved already twenty years ago that one could improve it to 2/3+1/48 if one averages for a fixed r \leq X^{3/8} over special moduli q=rs with rs^2\leq X^{1-\epsilon}; this gives in particular a nice earlier illustration of the usefulness of factorable moduli for this type of question.
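As a quick sanity check on the exponent 2/3+1/48 (a worked computation under the simplifying assumption that r is taken as large as allowed and that epsilons are ignored):

```latex
r = X^{3/8},\quad rs^2 = X
\;\Longrightarrow\;
s = X^{\frac{1}{2}\left(1-\frac{3}{8}\right)} = X^{5/16},
\qquad
q = rs = X^{\frac{3}{8}+\frac{5}{16}} = X^{11/16},
\qquad
\frac{11}{16} = \frac{32}{48}+\frac{1}{48} = \frac{2}{3}+\frac{1}{48}.
```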


So, to work. For fixed r and s, we begin by applying the Poisson summation formula to the three “smooth” variables n_i (the smoothing is hidden in the notation n_i\sim N_i); the simultaneous zero frequencies (h_1,h_2,h_3)=(0,0,0) give the main term, as they should, and the other degenerate cases are easier to handle than the contribution of the non-zero h_i, so the main secondary term for a given q and a is given by
S_2=\frac{N}{q^2}\sum_m \alpha_m\sum_{1\leq |h_i|\ll H_i} K_3(a\bar{m}h_1h_2h_3;q),
where the dual lengths are H_i=Q/N_i, so that the total number of frequencies is H_1H_2H_3=Q^3/N, and where K_3 is a normalized hyper-Kloosterman sum modulo rs:
K_3(u;q)=\frac{1}{q}\sum_{xyz=u}\psi_q(x+y+z),\quad\quad \psi_q(x)=e(x/q).
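To get a feeling for this normalization, here is a small brute-force sketch for a prime modulus. The function name and the choice p=11 are only illustrative; the check against 3 is Deligne's bound for hyper-Kloosterman sums in three variables, and the modular inverse via `pow(..., -1, p)` assumes Python 3.8 or later.

```python
import cmath

def hyper_kloosterman_3(u, p):
    """Normalized K_3(u; p) = (1/p) * sum over xyz = u (mod p) of e((x+y+z)/p)."""
    total = 0
    for x in range(1, p):
        for y in range(1, p):
            # z is determined by x, y and the constraint xyz = u (mod p)
            z = (u * pow(x * y, -1, p)) % p
            total += cmath.exp(2j * cmath.pi * (x + y + z) / p)
    return total / p

p = 11  # small prime, chosen only for illustration
values = [hyper_kloosterman_3(u, p) for u in range(1, p)]
largest = max(abs(v) for v in values)  # Deligne: at most 3 with this normalization
```

Summing the values over all non-zero u gives -1/p, since the unnormalized sums add up to the cube of a Ramanujan sum equal to -1; this is a convenient test that the normalization is the intended one.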

(Below I will usually not repeat the range of summations when they are unchanged from one line to the next.)

Now we sum over r and s, move the sum over m outside, and apply the Cauchy-Schwarz inequality to the r sum, for a fixed (s,a_s,m), where a_s denotes the residue class of a modulo s. To prepare for this step, we use the Chinese Remainder Theorem to split the condition P(a)=0\bmod rs, and to factor the hyper-Kloosterman sum as a sum modulo r times one modulo s.
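The factorization in question is the usual twisted multiplicativity, K_3(u;rs)=K_3(u\bar{s}^3;r)K_3(u\bar{r}^3;s), where \bar{s} is the inverse of s modulo r and \bar{r} the inverse of r modulo s. A minimal numerical check, with the small coprime moduli r=5 and s=7 chosen purely for illustration (the helper names are hypothetical):

```python
import cmath
from math import gcd

def k3(u, q):
    """Normalized K_3(u; q): (1/q) * sum over units x, y, with z = u/(xy) mod q."""
    units = [x for x in range(1, q) if gcd(x, q) == 1]
    total = 0
    for x in units:
        for y in units:
            z = (u * pow(x * y, -1, q)) % q
            total += cmath.exp(2j * cmath.pi * (x + y + z) / q)
    return total / q

r, s = 5, 7
s_inv_r = pow(s, -1, r)  # inverse of s modulo r
r_inv_s = pow(r, -1, s)  # inverse of r modulo s

def factorization_error(u):
    """|K_3(u; rs) - K_3(u * sbar^3; r) * K_3(u * rbar^3; s)| for u coprime to rs."""
    lhs = k3(u, r * s)
    rhs = k3(u * s_inv_r**3 % r, r) * k3(u * r_inv_s**3 % s, s)
    return abs(lhs - rhs)

worst = max(factorization_error(u) for u in range(1, r * s) if gcd(u, r * s) == 1)
```

The quantity `worst` should vanish up to floating-point error; the cubes of the inverses are what produce the arguments a\bar{s}^3\bar{m}h and a_s\bar{r}^3\bar{m}h appearing after Cauchy-Schwarz.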

The contribution of a fixed (s,a_s,m) is
T=\frac{1}{R}\sum_{r}\sum_{P(a_r)=0\bmod{r}}\Bigl|\sum_{h_i}K_3(a\bar{m}h_1h_2h_3;rs)\Bigr|
and we can bound
T\ll H^{1/2+\epsilon} U^{1/2},
where H=H_1H_2H_3 and
U=\frac{1}{R^2}\sum_{r_1,r_2} \lambda(r_1,a_1)\lambda(r_2,a_2) \sum_{1\leq |h|\ll H} \overline{K_3(a_1\bar{s}^3\bar{m}h;r_1)}K_3(a_2\bar{s}^3\bar{m}h;r_2)\overline{K_3(a_s\bar{r}_1^3\bar{m}h;s)}K_3(a_s\bar{r}_2^3\bar{m}h;s)
for some coefficients \lambda(\cdot,\cdot) which are bounded.
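A possible reconstruction of the passage from T to U, which is not spelled out in the sketch: here \tau(h) denotes the number of representations h=h_1h_2h_3 with 1\leq |h_i|\ll H_i (so \tau(h)\ll H^{\epsilon}), and \lambda(r,a_r) is a unimodular factor removing the absolute value; both notations are introduced only for this illustration.

```latex
T = \sum_{1\leq |h|\ll H}\tau(h)\,
    \frac{1}{R}\sum_{r}\sum_{P(a_r)=0 \bmod r}\lambda(r,a_r)\,K_3(a\bar{m}h;rs)
\;\leq\;
\Bigl(\sum_{1\leq |h|\ll H}\tau(h)^2\Bigr)^{1/2}
\Bigl(\sum_{1\leq |h|\ll H}\Bigl|\frac{1}{R}\sum_{r}\sum_{a_r}
      \lambda(r,a_r)\,K_3(a\bar{m}h;rs)\Bigr|^2\Bigr)^{1/2}
\;\ll\; H^{1/2+\epsilon}\,U^{1/2}.
```

Expanding the square in the second factor and using K_3(u;rs)=K_3(u\bar{s}^3;r)K_3(u\bar{r}^3;s) gives a sum over pairs (r_1,r_2) of the shape displayed for U.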

The point of this is that we have smoothed the variable h=h_1h_2h_3 by eliminating its multiplicity, and that the range of this variable can be quite long; as long as H>(R^2S)^{1/2}, completing the sum in the Pólya-Vinogradov (or Poisson) style will be useful.

Now we continue with U. It is here that it simplifies matters to have R<S and to argue as if r and s were primes (this is a technicality which experience shows should give no loss in the final, complete, analysis).

So we consider the sum over r_1, r_2. The diagonal contribution where r_1=r_2 is \ll H^{1+\epsilon}R^{-1+\epsilon}.

In the non-diagonal terms, we distinguish whether r_1\equiv r_2\bmod s or not. If not, we complete the two hyper-Kloosterman sums. This gives two complete exponential sums modulo r_1, r_2 and modulo s. The latter is the Friedlander-Iwaniec sum, in its incarnation as "Borel" correlations of hyper-Kloosterman sums (see the remark at the end of our note on these sums; this identification is already in Heath-Brown's paper, and Philippe realized recently that he had also encountered them in a paper on lower bounds for exponential sums.)

Both sums give square root cancellation (using Deligne’s work, of course) except for the s sum if r_1^3\equiv \pm r_2^3\bmod s. But we may just push these to the second case. Thus, the contribution U_0 of these non-exceptional terms gives
U_0\ll (R^2S)^{1/2+\epsilon}\Bigl(\frac{H}{R^2S}+1\Bigr)
(counting the number of complete sums in the h-interval).
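For the reader who wants the completion step in formulas, here is the standard statement in a form adapted to this situation (a sketch: F stands for any function periodic modulo q, and \widehat{F} for its normalized discrete Fourier transform; these notations are assumptions of this illustration).

```latex
\widehat{F}(t)=\frac{1}{\sqrt{q}}\sum_{x \bmod q}F(x)\,e\Bigl(-\frac{tx}{q}\Bigr),
\qquad
\Bigl|\sum_{1\leq |h|\leq H}F(h)\Bigr|
\;\ll\; \sqrt{q}\,\Bigl(\frac{H}{q}+\log q\Bigr)\max_{t \bmod q}\bigl|\widehat{F}(t)\bigr|.
```

Taking q=r_1r_2s\asymp R^2S and F the product of the four hyper-Kloosterman sums, square root cancellation in the complete sums means \widehat{F}(t)\ll 1, which is consistent with the bound for U_0; it also shows why the completion only beats the trivial bound H when H>(R^2S)^{1/2}.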

On the other hand, the exceptional (r_1,r_2) are still controlled by the diagonal terms (because of the condition R<S; there is a minor trick involved here if the cubic roots of unity exist modulo s, but I'll gloss over that.)

Now we can gather everything, and one checks that we end up with a bound
\sum_{r,s}\sum_a S_2 \ll X^{\epsilon} QM\Bigl(\frac{1}{R^{1/2}}+\frac{R^{1/4}N^{1/2}}{Q^{5/4}}\Bigr).

We need this to be \ll MNQ^{-1} (\log MN)^{-A} for any A, and we see that we succeed as long as
Q^2R^{-1/2}<N^{1-\epsilon},\quad\quad Q^{3/2}R^{1/2}<N^{1-\epsilon},
as stated in the theorem (the second condition is implied by the first if R<S). And as mentioned just afterwards, if we have R=Q^{1/2-\epsilon}, this gives a good distribution up to X^{-\epsilon}N^{4/7} (note that epsilons may change from one inequality to the next.)
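The exponent bookkeeping can be checked mechanically. In the sketch below, Q=N^\theta and R=Q^\rho, all epsilons are ignored, and the function name is only illustrative:

```python
from fractions import Fraction

def max_level_exponent(rho):
    """Largest theta (with Q = N^theta, R = Q^rho) allowed by both conditions."""
    rho = Fraction(rho)
    first = 1 / (2 - rho / 2)                # from Q^2 R^{-1/2} < N
    second = 1 / (Fraction(3, 2) + rho / 2)  # from Q^{3/2} R^{1/2} < N
    return min(first, second)

balanced = max_level_exponent(Fraction(1, 2))    # R = S = Q^{1/2}
unbalanced = max_level_exponent(Fraction(1, 4))  # R = Q^{1/4}, S = Q^{3/4}
```

With \rho=1/2 both conditions give \theta<4/7, and for smaller \rho the first condition is the binding one (for instance \theta<8/15 when \rho=1/4), which is why one wants R as close to Q^{1/2} as the condition R<S permits.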

Remark. As the reader can see, we do not use either Weyl shifts, or cancellation in Ramanujan sums. The latter might appear in a more precise analysis, however, and give some extra gain.

Bounded gaps between primes: the dawn of (some) enlightenment

Thanks to the recent conference celebrating the 25th anniversary of the Zahlentheorie Seminar, and even more to this week’s conference for Fouvry’s 60th Birthday in Marseille, I have been able to talk with a number of people about Zhang’s result on bounded gaps between primes, especially Philippe Michel, Paul Nelson and Étienne Fouvry. All generic “our”s and “we”s below refer to these discussions.

We have concentrated exclusively on the critical “ternary-divisor” part of the argument, and attempted many variants and reconfigurations. Our goal is not really to do better than Zhang (in terms of exponents), but to see by direct experience what works and what doesn’t.

Quite a few of these attacks failed, which of course helps in building some understanding. But two or three directions are promising, and one definitely does seem to lead to a different version of the argument. We have not yet written full details, but the approach succeeds (i.e., it definitely breaks the barrier 1/2 of the Riemann Hypothesis, which is the whole game) in getting an exponent of distribution 4/7 for the ternary divisor function with nicely factorable moduli, at a level of informality which inspires great confidence. (When applied in the context of my previous papers with Fouvry and Michel, this type of sketch gives fully consistent exponents and uniformity in comparison with the final papers.)

Here are just some remarks. We view the basic problem as understanding

\sum_r\sum_s \sum_{mn_1n_2n_3\equiv a \bmod{rs}} 1

where q=rs is the squarefree modulus, which should therefore (to beat RH) be of size a bit larger than X^{1/2} (by a small positive power of X), and m is fixed (though one can try to exploit average over it, and it will be important in the end that it be relatively small). In fact, quite a few attempts purposely express this as a special case of

\sum_r\sum_s \frac{1}{\sqrt{rs}}\sum_{n} d_3(n) K(n)

for some general “trace function” (these are more general than usually discussed, both because the modulus is not prime, and because the L^2-normalized characteristic function of a residue class modulo a prime is only the trace function of a perverse sheaf, and not a standard lisse sheaf, but that last point is not particularly problematic). The idea is that we want to exploit the general context and insights that we have developed about these objects.

In contrast to what I mentioned in my last post, it does seem that the factorization of moduli in the ternary case is crucial, but the cancellation from Ramanujan sums might not be (although of course extra cancellation can not hurt…)

On the other hand, in trying to work with a generalish K, we understand now this extra cancellation at a higher level: it is not a super-specific fact about Ramanujan sums, but the latter is a concrete illustration of a fundamental phenomenon concerning trace functions of the so-called (pointwise pure) “middle-extension” sheaves, due to Deligne: at a singularity of such a sheaf, the Frobenius eigenvalues have lower weights than at “generic” points (see Lemma 1.8.1 in Weil II), and often strictly lower. (On the automorphic side, it corresponds to the well-known fact that the Satake parameters at a ramified prime are smaller in absolute value than the Ramanujan-Petersson bound at the unramified primes.)
In this picture, the Ramanujan sums correspond to the singularity at 0 of the basic Kloosterman sheaf. (Of course one doesn’t need Deligne to estimate Ramanujan sums, but we can now confidently play complicated games with the trace functions without being afraid of having lost a very special property of these particular sums…)

The working technique also shows that one can argue without the Weyl-type shifts in the original paper; this was not entirely surprising since neither Heath-Brown’s improvement of the Friedlander-Iwaniec exponent for the bare d_3(n) nor ours involves such shifts.

Finally, all our current attempts to play games to avoid Deligne-level estimates for exponential sums have hit barriers…

There is of course still much to be done.

Fouvry 60

We are currently enjoying in Marseille the warmth and delights of a French Mediterranean Bouillabaisse while celebrating analytic number theory and the achievements of É. Fouvry, on the occasion of his 60th birthday.

I think everyone who has been in contact with any of his papers has immense respect for his scientific work. All those of us who have been fortunate enough to talk with him beyond purely scientific matters will also attest to his exemplary intellectual honesty, rectitude, generosity and — also important to my mind — to his sense of humor.

In analytic number theory, we play day to day in a wild down-to-earth jungle. We also all know that somewhere there is a Garden of Eden, where the Riemann Hypothesis roams free, and we hope to go there one day. Fewer know that there is a place even beyond, a Nirvana where even the Riemann Hypothesis is but a shadow of a deeper truth. And fewer still are those who have set foot in this special place. É. Fouvry did, and he was among the very first ones, if not the very first; and more people have walked on the moon than been there.

A few years ago, I wrote a nominating letter for Étienne’s application to the Institut Universitaire de France. There is one sentence that I wrote which still seems to me to summarize best my feelings about this part of his work: Rarely in history was so much owed by so many arithmeticians to so few. This is even truer today than it was then. Reader, if you care at all about prime numbers, recall that without É. Fouvry and very few others (two of whom are with us in Marseille), you might well never have known that the gaps between successive primes do not grow to infinity.

Yet another remark on the Friedlander-Iwaniec sum

I just remembered a point I had intended to make concerning the exponential sum of Friedlander and Iwaniec that is crucial in Zhang’s work on gaps between primes, but which slipped my mind. This may present the argument in my note with Fouvry and Michel in a more enlightening way, although it does not simplify the proof (I’ve now added this as a remark in the PDF file.)

We view the sum as
S(\alpha,\beta)=\sum_{x\in \mathbf{F}_p} a_1(x)\overline{a_2(x)}
where p is a prime and the coefficients a_1(x) and a_2(x) are values of Kloosterman sums:
a_1(x)=K_2(x),\quad\quad a_2(x)=K_2(\beta x/(x+\alpha))
for some parameters \alpha and \beta, non-zero elements of \mathbf{F}_p, putting
K_2(x)=-\frac{1}{\sqrt{p}}\sum_y e\Bigl(\frac{xy+1/y}{p}\Bigr).

Now the point is that, from the “automorphic” view of trace functions (as discussed in one of my previous posts), both a_1(x) and a_2(x) can be seen as the Hecke eigenvalues, at the prime T-x, of a cusp form on GL_2(\mathbf{F}_p(T)), i.e., of the analogue of a classical cusp form, living however over a function field instead of \mathbf{Q} (so the polynomial ring \mathbf{F}_p[T] plays its role of cousin of \mathbf{Z}). This result (the existence of these cusp forms) is by no means obvious, but it follows from Deligne’s construction of Kloosterman sheaves and Drinfeld’s proof of the Langlands correspondence for GL(2) over function fields.

In any case, if one admits this, the sum above is clearly an exact analogue of a sum of Hecke eigenvalues at primes of a Rankin-Selberg L-function associated to two cusp forms! If we know the Riemann Hypothesis for this Rankin-Selberg L-function, we can then very classically establish a bound
S(\alpha,\beta)\ll \sqrt{p},
in full analogy with conditional results for Rankin-Selberg L-functions over number fields, provided the two cusp forms are not the same (otherwise there is a pole with a larger contribution). But here the two cusp forms are not the same simply because their “conductors” (in the arithmetic sense: in fact, just the location of ramified primes matters, i.e., the analogue of the divisors of the level of a primitive cusp form) are not equal!
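A small numerical experiment illustrates the contrast between the two situations. This is only a sketch: the prime p=101 and the parameters \alpha=1, \beta=2 are arbitrary, the point x=-\alpha (where a_2 is not defined) is simply skipped, and the function names are hypothetical.

```python
import cmath
from math import sqrt

def k2(x, p):
    """Normalized Kloosterman sum K_2(x) = -(1/sqrt(p)) * sum_{y != 0} e((xy + 1/y)/p)."""
    total = sum(cmath.exp(2j * cmath.pi * ((x * y + pow(y, -1, p)) % p) / p)
                for y in range(1, p))
    return -total / sqrt(p)

def correlation(alpha, beta, p):
    """S(alpha, beta), omitting x = -alpha (an assumption of this sketch)."""
    total = 0
    for x in range(p):
        if (x + alpha) % p == 0:
            continue
        arg = (beta * x * pow(x + alpha, -1, p)) % p
        total += k2(x, p) * k2(arg, p).conjugate()
    return total

p = 101
ratio = abs(correlation(1, 2, p)) / sqrt(p)           # |S| / sqrt(p); the bound predicts O(1)
diagonal = sum(abs(k2(x, p)) ** 2 for x in range(p))  # "same form" case: exactly p - 1
```

The diagonal sum, where a_1 is correlated with itself, equals p-1 by orthogonality of additive characters: this is the function-field shadow of the pole. The bound S(\alpha,\beta)\ll\sqrt{p} says that the twisted correlation is smaller by a factor of about \sqrt{p}.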

Finally, the implied constant in applying the Riemann Hypothesis is uniformly bounded in terms of the conductor of the two cusp forms, and these are bounded independently of the prime and parameters, “explaining” the Birch-Bombieri result.

I emphasize again that this is not really a good way of writing a proof, because showing that the Rankin-Selberg L-functions over function fields satisfy GRH is harder than proving Deligne’s Riemann Hypothesis in the more geometric language of sheaves and their trace functions, simply because the argument reduces to Deligne’s result (this was, I think, first done by Lafforgue, though maybe Drinfeld had proved it for GL(2)). But this interpretation should make the argument very readable and natural to an analytic number theorist.

Bounded gaps between primes: some grittier details

Because I was teaching a course on prime numbers this semester and had just finished a chapter on Vinogradov’s method when Yitang Zhang’s paper appeared, I promptly switched my plans for the last classes in order to present some aspects of his theorem on bounded gaps between primes. In addition, one of the speakers of this week’s conference celebrating 25 years of the “Zahlentheorie Seminar” of ETH had to cancel at short notice, and I replaced her and gave yesterday another survey-style talk. The notes for the latter (such as they are…) can be downloaded in scanned form.

My insights into Zhang’s work remain clearly superficial, but here are some remarks going a bit beyond what I mentioned in the previous post, coming after these lectures, and some discussions with Ph. Michel and P. Nelson.

(1) The most delicate estimates seem to be those for the “Type III” sums. These concern the “good” distribution in invertible residue classes of an arithmetic function f(n) for integers N<n\leq 2N, modulo a "large modulus" q, where f(n) is of the very special type
f(n)=\sum_{m_1n_1n_2n_3=n}\alpha(m_1),
and the variables m_1, n_1, n_2 and n_3 are, roughly, of size M_1, N_i with M_1N_1N_2N_3=N, and (crucially) q is a bit larger than N^{1/2}: one needs to handle these for q up to N^{1/2+\delta} for some \delta>0 in order to obtain bounded gaps between primes.

The lengths M_1 and N_i are constrained in various ways, and the most critical case seems to be when M_1\approx q^{1/8}, N_i\approx q^{5/8} (the \approx means that one must be able to go a bit beyond such a case, since N is a bit beyond q^2).

(2) Another point is that Zhang manages to bound these sums for each individual residue class a\pmod q (coprime to q). In other words, denoting
\Delta_f(N,q,a)=\sum_{N<n\leq 2N: n\equiv a\pmod{q}}f(n)-\frac{1}{\varphi(q)}\sum_{N<n\leq 2N}f(n),
he proves individual bounds for \Delta_f(N,q,a), instead of average bounds over q (as in the other main part of this argument).
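To make the definition concrete, here is a brute-force sketch computing \Delta_f(N,q,a) for the plain ternary divisor function f=d_3, i.e., the special case where \alpha is supported at m_1=1 (an assumption made only to have explicit values). The main term is taken exactly as displayed, and N=2000, q=7 are arbitrary small choices.

```python
from fractions import Fraction
from math import gcd

def ternary_divisor_table(limit):
    """d3[n] = number of ordered factorizations n = n1*n2*n3, for 1 <= n <= limit."""
    d2 = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for n in range(d, limit + 1, d):
            d2[n] += 1          # d2 = usual divisor function
    d3 = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for n in range(d, limit + 1, d):
            d3[n] += d2[d]      # d3(n) = sum of d2(d) over divisors d of n
    return d3

def discrepancy(f, N, q, a):
    """Delta_f(N, q, a) as displayed above, computed exactly."""
    in_class = sum(f[n] for n in range(N + 1, 2 * N + 1) if n % q == a % q)
    total = sum(f[n] for n in range(N + 1, 2 * N + 1))
    phi = sum(1 for b in range(1, q + 1) if gcd(b, q) == 1)
    return in_class - Fraction(total, phi)

N, q = 2000, 7
d3 = ternary_divisor_table(2 * N)
deltas = {a: discrepancy(d3, N, q, a) for a in range(1, q)}
```

Of course such an experiment says nothing about the range q>N^{1/2} that matters for Zhang's theorem; it only fixes the meaning of the notation.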

Also, he does not need to use the variable m_1 at all (but since the \alpha(m_1) are mostly unknown coefficients, and the sum is rather short, exploiting it does not seem easy). Hence the result looks enormously like controlling the distribution in residue classes of the ternary divisor function. This is exactly the question that Friedlander and Iwaniec had studied in the famous paper where they proved that the exponent of distribution is at least 1/2+1/230, but their argument is not quite sufficient for Zhang's purpose.

(3) One of the last tricks is Zhang's second use of the structure of the moduli q that are involved in his argument (these were chosen to have only prime factors \leq z=N^{\delta} for some small positive \delta), and Zhang exploits the "granular" (or friable) structure of such moduli in order to obtain flexibility in the possibility of factoring them as q=rs with r of size determined up to a factor at most z. This is particularly important for the “other” sums (it gives bilinear structure and makes it possible to use the dispersion method of Linnik, as already done by Fouvry-Iwaniec and Bombieri-Friedlander-Iwaniec in their works on primes in arithmetic progressions). For the "type III" case, it does not seem to be so much of the essence, but Zhang needs to gain a very small amount compared with Friedlander-Iwaniec, and does so by factoring q=rs with r rather small. He then gains from r a factor r^{1/2} which is essential, by exploiting the fact that a Ramanujan sum modulo r is bounded (so he gets more than square-root cancellation from r…)! This is an extremely special situation, and right now, it is what seems the most “miraculous” about this proof (at least to me).
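The boundedness of Ramanujan sums is easy to see numerically. In the sketch below (the function name and the prime 1009 are only illustrative), c_r(n) is the sum of e(an/r) over a coprime to r; for r prime and n coprime to r it equals -1, much smaller than the square-root size \sqrt{r}.

```python
import cmath
from math import gcd, sqrt

def ramanujan_sum(r, n):
    """c_r(n) = sum over a mod r with gcd(a, r) = 1 of e(a*n/r); the value is an integer."""
    total = sum(cmath.exp(2j * cmath.pi * a * n / r)
                for a in range(1, r + 1) if gcd(a, r) == 1)
    return round(total.real)

r = 1009  # a prime modulus, chosen only for illustration
generic = [ramanujan_sum(r, n) for n in range(1, 50)]  # n coprime to r
square_root_size = sqrt(r)  # what a generic "square-root cancellation" bound would allow
```

For squarefree r and n coprime to r the value is \mu(r), so the sum has absolute value 1 whatever the size of r; only when n shares a factor with r does it become large (for instance c_7(7)=6).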

It is for the contribution of the complementary divisor s that Zhang manages to position himself into applying the estimate of Birch and Bombieri for the exponential sums which Friedlander-Iwaniec had also encountered.

(4) This use of Deligne’s work is also very delicate: one can not relax the requirement of square-root cancellation, except by very tiny amounts. For instance, obtaining a bound of size p^2 for the three-variable sum modulo p is useless; in fact, the bound p^2 can be considered here as the trivial estimate, since the sum can be written as an average of one-variable Kloosterman sums. With Zhang’s parameters, one needs an estimate for the sum which is no larger than p^{3/2+1/2000} (or so) in order to get the desired gain. However, as I explained in the last five minutes of my talk today (and as is explained in this note with Fouvry and Michel) the Birch-Bombieri bound is very well understood from a conceptual point of view.

(5) I was very curious, when first looking at the paper, to see how Zhang would handle the residue classes in the Goldston, Pintz, Yıldırım method, since the most uniform results on primes in arithmetic progressions (those of Fouvry-Iwaniec and Bombieri-Friedlander-Iwaniec) are constrained to use (essentially) a single residue class. What happens is that Zhang detects these classes by inserting a factor corresponding to their characteristic function, and by avoiding the Kloostermania approaches that rely on spectral theory of automorphic forms. The important properties of these residue classes are their multiplicative structure (coming from the Chinese Remainder Theorem) and the fact that, on average over moduli, there are not too many of them (the average is bounded by a power of \log N). In particular, his use of the dispersion method is in fact closer in spirit to some of its earliest uses by Fouvry and Iwaniec (for integers without small prime factors instead of primes), which also involved, in the final steps, the classical Weil bound for Kloosterman sums instead of (seemingly stronger) results on sums of Kloosterman sums.