Historical distributions

Until I recently borrowed it from a friend, I hadn’t looked at the autobiography of L. Schwartz. As it turns out, his memories of the invention of the theory of distributions are quite fascinating (some other parts, like learning that he did not immediately dismiss the possibility of becoming head of some Trotskyist party after the war, were amusing; as Wikipedia delicately puts it, he was “leaning towards socialism”).

He explains that he was looking for a definition of “generalized function” in order, roughly, to put on a better footing the notion of weak solutions of certain partial differential equations (such notions already existed in the work of Leray, among others): the problem was that there was a way to define what it means for a non-smooth function f to satisfy a PDE

$\sum_{p}a_p D^p f=0$

but there was no definition or proper meaning of the various terms in such a sum!

Then he says that, during one night in November 1944 (la plus belle nuit de ma vie), he had a flash of insight; he worked feverishly (fiévreusement) until the end of 1944, having described the solution to Henri Cartan and Bourbaki and received their enthusiastic endorsement.

Then (and this is the fascinating part I had never heard about) he realized his definition was not the right one! It was more complicated than the definition which he found in Grenoble in early 1945, and from almost any rational analysis (after the fact), it should never have come before.

Basically, his former definition was the following: an 11/44-distribution was supposed to generalize (and encompass) the notion of a smooth function φ (defined on the real line) by generalizing the convolution operator $T_{\phi}$ associated to φ:

$T_{\phi}\,:\, f(x)\mapsto f\star \phi (x)=\int_{\mathbf{R}}{f(y)\phi(x-y)dy}$

which he saw (already a new step) as defined on the space of test functions (i.e., smooth functions with compact support), and with image in the space of smooth functions. So he considered all such operators, say

$T\,:\, f\mapsto T(f)$

satisfying some additional properties (T must commute with convolution operators associated to smooth functions with compact support, and T must be continuous in some suitable sense). Because of the well-known property

$(f\star \phi^{\prime})(x)=(f\star \phi)^{\prime}(x)=(f^{\prime}\star \phi)(x)$

linking convolution and differentiation, he could easily define a derivative for all such operators by defining

$T^{\prime}(f)=T(f^{\prime}),$

and thus give a meaning to all the terms at least in a constant coefficient PDE (such as the Laplace equation) applied to such an operator.
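For readers who like to see such identities numerically, here is a minimal Python sketch (not taken from Schwartz's book) of the rule T′(f) = T(f′) for a convolution operator. Everything specific in it is an assumption made for illustration: a Gaussian for φ, the standard bump function for f, the evaluation point `x0 = 0.3`, and the helper names `integrate`, `derivative_of`, `conv_operator` and `operator_derivative`. Integrals are approximated by a midpoint rule and f′ by a central difference, so the three quantities should agree only up to small numerical error.

```python
import math

def bump(y):
    """Test function: smooth, with support in [-1, 1]."""
    if abs(y) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - y * y))

def integrate(g, a=-1.0, b=1.0, n=4000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def derivative_of(g, h=1e-5):
    """Central-difference approximation to g'."""
    return lambda x: (g(x + h) - g(x - h)) / (2.0 * h)

def gauss(x):
    """A smooth function phi (illustrative choice)."""
    return math.exp(-x * x)

def gauss_prime(x):
    """Exact derivative of gauss."""
    return -2.0 * x * math.exp(-x * x)

def conv_operator(phi):
    """T_phi : f -> f * phi, for f supported in [-1, 1]."""
    def T(f):
        return lambda x: integrate(lambda y: f(y) * phi(x - y))
    return T

def operator_derivative(T):
    """Derivative of an operator, defined by T'(f) = T(f')."""
    return lambda f: T(derivative_of(f))

x0 = 0.3
T = conv_operator(gauss)
via_definition = operator_derivative(T)(bump)(x0)      # (f' * phi)(x0)
via_phi_prime = conv_operator(gauss_prime)(bump)(x0)   # (f * phi')(x0)
via_output = derivative_of(T(bump))(x0)                # (f * phi)'(x0)

print(via_definition, via_phi_prime, via_output)
```

The point of the sketch is only that `operator_derivative` never looks at φ: it needs nothing but the operator T and the ability to differentiate test functions, which is what lets the definition extend to operators that do not come from a function.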

What was wrong? As Schwartz tells it, the first “bitter taste” (goût amer) was the fact that defining the product of an 11/44-distribution with a smooth function u required a rather convoluted definition (emberlificotée). The second was that he didn’t succeed at all in defining a Fourier transform.

And then, he says, he suddenly realized that he could have used a much simpler point of view; and this is the one which is currently universally used: instead of generalizing convolution operators, a distribution generalizes the linear functional

$F_{\phi}\,:\, f\mapsto \int_{\mathbf{R}}{f(x)\phi(x)dx}$

(there is a link however, since one has

$F_{\phi}(f)=f\star \tilde{\phi}(0),\quad\quad\text{where}\quad\quad \tilde{\phi}(x)=\phi(-x)$

for all f). The derivative is now obtained by generalizing the classical integration by parts:

$F_{\phi^{\prime}}(f)=\int_{\mathbf{R}}{\phi^{\prime}(x)f(x)dx}=-\int_{\mathbf{R}}{\phi(x)f^{\prime}(x)dx}=-F_{\phi}(f^{\prime})$

(because f is smooth, we can differentiate it; because it has compact support, the boundary terms vanish), and the product of a distribution with a smooth function u poses no problem:

$(uF)(f)=F(uf)$

which is well-defined since uf is again smooth with compact support. So linear partial differential operators can immediately be applied to distributions. The Fourier transform requires some care: the right space of test functions (the “Schwartz space”…) is needed to make sense of the definition suggested by the Plancherel formula:

$\hat{F}(f)=F(\hat{f}),$

and the Schwartz space is just what is needed so that this definition works perfectly.
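The three definitions above (derivative, product with a smooth function, Fourier transform) are short enough to be written out as a small Python sketch, where a distribution is simply a function that eats a test function and returns a number. This is an illustration under stated assumptions, not a serious implementation: the helper names (`from_function`, `D`, `times`, `fourier`), the shifted bump used as test function, the midpoint quadrature, the central difference for f′, and the normalization of the Fourier transform with the factor 2π in the exponent are all my own choices.

```python
import cmath
import math

def bump(x):
    """Smooth function with support in [-1, 1]."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def integrate(g, a=-2.0, b=2.0, n=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def derivative_of(f, h=1e-5):
    """Central-difference approximation to f'."""
    return lambda x: (f(x + h) - f(x - h)) / (2.0 * h)

def from_function(phi):
    """The functional F_phi : f -> integral of f(x) phi(x) dx."""
    return lambda f: integrate(lambda x: f(x) * phi(x))

def D(F):
    """Derivative of a distribution: F'(f) = -F(f')."""
    return lambda f: -F(derivative_of(f))

def times(u, F):
    """Product with a smooth function u: (uF)(f) = F(uf)."""
    return lambda f: F(lambda x: u(x) * f(x))

def fourier_transform(f):
    """Fourier transform of a test function (assumed normalization)."""
    return lambda xi: integrate(
        lambda x: f(x) * cmath.exp(-2j * math.pi * x * xi))

def fourier(F):
    """Fourier transform of a distribution: F^(f) = F(f^)."""
    return lambda f: F(fourier_transform(f))

def delta(f):
    """Dirac mass at 0: not of the form F_phi for any function phi."""
    return f(0.0)

heaviside = from_function(lambda x: 1.0 if x > 0 else 0.0)

def f(x):
    """Test function: a bump centred at 0.3, supported in [-0.7, 1.3]."""
    return bump(x - 0.3)

dH_f = D(heaviside)(f)                    # should be close to delta(f)
x_delta_f = times(lambda x: x, delta)(f)  # x * delta = 0
dsin_f = D(from_function(math.sin))(f)    # derivative of F_sin ...
cos_f = from_function(math.cos)(f)        # ... should match F_cos
delta_hat_f = fourier(delta)(f)           # delta^ is the constant 1 ...
one_f = integrate(f)                      # ... i.e. F_1(f) = integral of f

print(dH_f, delta(f))
print(x_delta_f)
print(dsin_f, cos_f)
print(delta_hat_f, one_f)
```

In each pair the two numbers should agree up to quadrature error. The first pair is the classical statement that the derivative of the Heaviside step function is the Dirac mass, which makes sense here even though the step function is not differentiable at 0; the last one says that the Fourier transform of the Dirac mass is the constant function 1.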

As Schwartz points out, this point of view of linear functionals should probably have been the “obvious” one: it is also a generalization of the approach to measure theory and integration by duality with continuous functions, which was popular in France at the time (after A. Weil’s book on integration in topological groups): a measure μ is defined as a (suitably positive and continuous) linear map,

$f\mapsto \mu(f)$

where f is assumed to be continuous and compactly supported. The parallel is clear… at least a posteriori.