Demystifying (a bit) the arithmetic large sieve

In the development of the large sieve, leading to its arithmetic applications, there is one step which — I must admit shamefully — I never completely understood as a natural argument instead of one with a somewhat mysterious clever trick. Or, at least, until now: just this week-end, one vaguely related thing leading to another, I’ve just stumbled on a very transparent proof of this result.

I’ve written a short note with the details, but here is the outline. For background, it may be useful to read my earlier post on the subject on T. Tao’s blog, though I’ll use the classical setting of the sieve for integers below (as I did in the note, although it extends mutatis mutandis to the general setting described in my recent book).

The point is to provide the link between the analytic and arithmetic large sieve inequalities. The first one of these is the inequality

$(*)\ \ \ \ \ \ \ \ \ \ \sum_{q\leq Q}{\sum_{(a,q)=1}{|\sum_{n\leq N}{a_n\exp(2i\pi na/q)}|^2}}\leq (N-1+Q^2)\sum_{n\leq N}{|a_n|^2}$

which is valid for all complex numbers (a_n); here the sum over a is over all residue classes modulo q, coprime with q, and the resulting exponential is independent of the choice of representatives of those classes. This inequality is discussed in great detail in Montgomery’s survey article; I’ll just say here that the constant N-1+Q², which is not particularly easy to obtain, is of no importance for our purpose, and I will just write

$\Delta=N-1+Q^2$

below to emphasize this.

The second inequality, the arithmetic large sieve inequality, states that, for any choice of subsets Ω_p for primes p, we have

$(**)\ \ \ \ \ \ \ \ \ \ |\{n\leq N\ |\ n\ \text{mod}\ p\ \notin \Omega_p\ \text{for}\ p\leq Q\}|\leq \Delta H^{-1}$

where

$H=\sum_{q\leq Q}{\mu(q)^2\prod_{p\mid q}{\frac{|\Omega_p|}{p-|\Omega_p|}}}.$

This is thus, recognizably, a sieve estimate: we bound from above the number of integers remaining in a segment after removing all those which fail to satisfy some constraints on their reductions modulo primes, constraints which are imposed for all primes p up to Q. This Q is a parameter which is adjusted for applications, depending on N.

—————————————————————–

Example. If sieve is completely unfamiliar, here is one of the most traditional examples: let Ω_p be the residue classes 0 and -2 modulo p. Among the integers surviving the sieve process are then all prime numbers r larger than Q such that r+2 is also prime. Hence the arithmetic large sieve inequality implies an upper bound for the number of twin primes up to N, namely

$\pi_2(N)\leq \Delta H^{-1},\ \ \ \ \ \text{where}\ \ \ \ \ H=\sum_{q\leq Q}{\mu(q)^2\prod_{p\mid q}{\frac{2}{p-2}}}$

(where in fact the coefficients 2 must be replaced by 1 when p=2).

Using elementary techniques (see this treatment by Ben Green for example), or general results on sums of multiplicative functions, it follows that

$H\geq c(\log Q)^2$

for Q> 2 and some constant c>0. This leads to an estimate for the number of twin primes which is (conjecturally) of the right order of magnitude

$\pi_2(N)\ll \frac{N}{(\log N)^2}$

and by a partial summation, to the (weaker, but still spectacular) conclusion that the series of inverses of twin primes converges:

$\sum_{p,p+2\ \text{primes}}{\frac{1}{p}}<+\infty,$

which was one of V. Brun’s first achievements in building the beginnings of the modern theory of sieve methods in the early 20th Century.

————————————————————————–

So I will now describe how to derive (*) from (**). Or rather, I will derive (**) from the dual inequality

$(***)\ \ \ \ \ \ \sum_{n\leq N}{|\sum_{q\leq Q}{\sum_{(a,q)=1}{\beta(q,a)\exp(\frac{2i\pi na}{q})}}|^2}\leq \Delta\sum_{q\leq Q}{\sum_{(a,q)=1}{|\beta(q,a)|^2}}$

where the complex coefficients β(q,a) are still arbitrary, and Δ has the same value as before. The equivalence of (*) and (***) is a consequence of the elementary duality theory in finite-dimensional Hilbert spaces. But more importantly, (*) is often proved by means of (***) and this duality principle; in more general contexts, this is because (***) is usually easier to approach. So it is a natural starting point to argue towards the arithmetic inequality (**).

But before we begin, a notational convention: below, q (and sums over q) will always refer to squarefree (positive) integers.The idea is quite simple, in particular if you’ve ever seen the Chebychev inequality in probability: let S denote the sifted set

$S=\{n\leq N\ |\ n\ \text{mod}\ p\ \notin \Omega_p\ for\ p\leq Q\}$ .

Comparing with the structure of (***), we are going to describe an “amplifier” A(n) which is of the form

$A(n)=\sum_{q\leq Q}{\sum_{(a,q)=1}{\beta(q,a)\exp(\frac{2i\pi na}{q})}}$

and has the property that |A(n)| is “large” whenever n is in S. If we have

$|A(n)|\geq B,$

for n in S, then by positivity we deduce

$B^2|S|\leq \Delta \sum_{q}{\sum_{a}{|\beta(q,a)|^2}}$

and if the last sum can be expressed conveniently and is not too large, we get an upper bound for |S| in this manner.

To build the amplifier, consider first a single prime p less than Q: if n is in S, we know that the reduction of n modulo p is constrained to not be in Ω_p. We can express this analytically by saying that the characteristic function of this set, evaluated at n, is zero. But now expand this characteristic function (say f_p) in terms of the additive characters (the “discrete Fourier transform”): we have

$f_p(x)=\sum_{a\ mod\ p}{\alpha(p,a)\exp(2i\pi ax/p)}$

with

$\alpha(p,a)=\frac{1}{p}\sum_{x\in\Omega_p}{\exp(-2i\pi ax/p)}.$

Thus for n in S, we have

$0=f_p(n)=\sum_{a\ mod\ p}{\alpha(p,a)\exp(2i\pi an/p)}$

and we rewrite this by isolating the contribution of the 0-th harmonic (which is the constant function 1, which encodes the probability that an integer modulo p is in Ω_p):

$\frac{|\Omega_p|}{p}=\alpha(p,0)=\sum_{(a,p)=1}{(-\alpha(p,a))\exp(2i\pi an/p)}.$

This is the basis of our detector! First, this defines the coefficients

$\beta(p,a)=-\alpha(p,a)$

and then, to extend this to all squarefree integers q up to Q, we use the Chinese Remainder Theorem and multiply the detectors for primes dividing q, getting coefficients β(q,a) for a coprime with q, such that

$\prod_{p\mid q}{\frac{|\Omega_p|}{p}}=\sum_{(a,q)=1}{\beta(q,a)\exp(2i\pi an/q)}.$

For notational simplicity, we write

$c_p=\frac{|\Omega_p|}{p}.$

Thus the final detector for S is

$A(n)=\sum_{q\leq Q}{\sum_{(a,q)=1}{\beta(q,a)\exp(2i\pi an/q)}},$

and we have

$|A(n)|=\sum_{q\leq Q}{\prod_{p\mid q}{c_p}}.$

Calling this quantity B, as previously, the resulting inequality after applying (***), from the sketch above, is

$B^2|S|\leq \Delta A$

with

$A=\sum_{q\leq Q}{\sum_{(a,q)=1}{|\beta(q,a)|^2}}.$

To compute this, we invoke the orthogonality of additive characters, and their compatibility with the Chinese Remainder Theorem: we have first

$A=\sum_{q\leq Q}{\prod_{p\mid q}{\sum_{(a,p)=1}{|\beta(p,a)|^2}}},$

and then, using the (discrete) Parseval formula

$\sum_{a\ mod\ p}{|\alpha(p,a)|^2}=\frac{1}{p}\sum_{x\ mod\ p}{|f_p(x)|^2}=c_p,$

we get by removing the contribution of a=0 that

$A=\sum_{q\leq Q}{\prod_{p\mid q}{c_p(1-c_p)}}.$

Now we have an inequality

$|S|\leq \Delta \frac{A}{B^2},$

which is similar, but not quite identical, to (**). In general, it is somewhat weaker — though barely so for some problems like the twin-prime example above, where the sets Ω_p are quite small (i.e., what are called small sieves…)

To improve this inequality and get (**), we notice that we still can try other amplifiers. In particular, it is natural to notice that the inequality we got is not homogeneous if we replace the coefficients β(q,a) by multipliying by constants (depending only on q, multiplicatively). Indeed, if we consider now

$\gamma(q,a)=(\prod_{p\mid q}{\lambda_p})\beta(q,a)$

we replace B with

$B_1=\sum_{q\leq Q}{\prod_{p\mid q}{\lambda_pc_p}}$

while A is replaced with

$A_1=\sum_{q\leq Q}{\prod_{p\mid q}{\lambda_p^2c_p(1-c_p)}}$

and we can try to minimize the ratio A₁/B₁² when the coefficients λ_p are allowed to vary. This is the same classical problem that occurs in the Selberg sieve (see for instance the write-up by Ben Green of the application of the Selberg sieve to the twin-prime problem, already mentioned, for an introduction): minimize a quadratic form with respect to a linear constraint (with a different quadratic form), and it is easily and elegantly solved: we casually Cauchy

$B_1^2\leq (\sum_{q\leq Q}{\prod_{p\mid q}{\lambda_p^2c_p(1-c_p)}})\times (\sum_{q\leq Q}{\prod_{p\mid q}{c_p/(1-c_p)}})=A_1H,$

so that, first of all, we can not do better (i.e., make the ratio smaller) than

$\frac{A_1}{B_1^2}=\frac{1}{H}$

and second, by the equality case in Cauchy’s inequality, we can do something this good by taking

$\lambda_q=\prod_{p\mid q}{1/(1-c_p)}.$

This leads to the optimal value of the ratio and therefore proves the arithmetic large sieve inequality (**), as described previously.

A final remark: the simplicity of the argument suggests that it is probably not new. There are (in Montgomery’s survey article, for instance) a number of earlier derivations of the arithmetic large sieve inequality from the dual form of the analytic inequality. However, those I have seen (I must say I had not really looked at any of them until after writing the gist of my argument…) are based on, or inspired by, the connection with the Selberg sieve, and are therefore less elementary (and motivated). Still, the ingredients look always much the same, and I think this write-up is more a question of re-arranging them, instead of a really new proof.

Demystifying (a bit) the arithmetic large sieve

Published by

Kowalski