Version control

When writing my papers (and for many other things), I have been using some kind of version control software since about 1999. Among other things, what version control does is to allow you to preserve as many intermediate versions of your work as you want; any time I reach a point that I may want to preserve, a suitable command indicates that the current state of things must be remembered. Later on, it is also easy to recover those intermediate stages, and/or to compare one of them to the current state of the work (or indeed to compare two of the previous stages directly). In particular, it is useful to tag a paper as soon as it is submitted for publication. Then, even if I make changes or corrections while waiting for the refereeing process to slowly snail itself to send a report back, it remains very easy to look back (one, two, three or four years later) to check what was the exact text I had sent to the journal.

Version control is of course also particularly useful when collaborating because no amount of mistyping can lead to a situation where local changes by me erase a brilliant ten page development by a co-author (well, not really, since it is a theorem that for any computer there is a finite combination of letters which, typed properly, will erase everything on it…). It was particularly useful when I was translating into English a friend’s book on methods of mathematics for physicists. We both had the software configured so that any changes could be communicated to the other by asking the software for what had changed between our local copy and the reference version it controls. Since there were around a hundred source files for this book, including copies of the French original and of the (sometimes partially translated) English versions, plus figures, control files, etc, this was really convenient.

It is unfortunately not so trivial to set up version control, which is more commonly used by computer scientists for their projects (source code replacing LaTeX files). It should be fairly easy for a university to offer this as a service to its faculty (ETH, for instance, has something like this for software — anyone can register a software project at origo.ethz.ch and this platform provides all that is needed, including version control, bug tracker, forums for interaction with users, access control for closed-source projects, wiki,…) but it doesn’t seem to be common for more academic work. (It could of course be a good project to adapt the system to this type of situation).

After moving to ETH, the setup I had been using in Bordeaux for about 8 years was not easily reproducible, and it took me a while here in Zürich to arrive at a smooth operation. (It’s still not quite as nice as before, but it’s getting there). I use the svn software for version control (with Linux), mostly for historical reasons (I know there are many other options). For local work on a single computer, the setup would be quite simple, but part of the point of this exercise is that I can keep multiple copies of my (past and present) work on different computers, and continue writing any paper on any of them, submitting the changes to a central server from which they can be recovered on all the others (this is useful also for backup, and during travels; if I make changes on some important files on my laptop during a conference or a visit to wonders of the earth, I can usually expect to be able to save it to the central server when I have internet access, and thus avoid losing too much if the laptop is destroyed or stolen — which has happened to me once…).

The main configuration issue is to have this server accessible from other computers in a semi-permanent manner. My setup in Bordeaux did not translate well to Zürich (partly for reasons of complexity and having forgotten part of the operating procedure…) However, the following works fine for the moment: I use a VPN connection to ETH to connect to the internet, which provides an IP address visible from the outside, even from behind a router/firewall. Then dyndns allows me to keep a permanent URL to access the server (which runs on an Apache web server with the required SVN module to permit access, and fortunately I just had to copy the old configuration file to make this work…)

An exercise with orthonormal basis

While writing the general case of the large sieve, one question of minor interest arose, which provides a nice exercise (or exam problem) for a course on finite-dimensional Hilbert spaces.

Since it’s not yet possible (as far as I can see, but I will try to investigate the issue) to include either LaTeX formulas (in the style used in a number of WordPress math blogs) or MathML formulas in the ETH blogs, I’ve resorted to the rather embarassing option of using dvipng to produce an image with the LaTeX content of this post…

[LaTeX text]

Coming back to regular HTML, where one can make links, here’s one to the short note I wrote on this, with the proof of the result indicated. Note that I would be surprised is this were really at all new. There’s one lingering question that I haven’t answered at the moment: does there exist a proof by pure thought that, for the uniform density, there is an orthonormal basis of functions with constant modulus 1?

Mathematical terminology of yore…

While browsing through some issues of Annals of Math. a few days ago, I read the following rather spectacular example of long-forgotten mathematical terminology. It is in a paper by A. Borel, “Groupes linéaires algébriques”, Annals of Math. 64, 1956, p. 20 to 82, which is one of the foundational papers of the whole theory of linear algebraic groups. In this paper, Borel found what groups could play in algebraic geometry the role that “tori” played in the theory of Lie groups (tori being there groups isomorphic to Rn/Zn). Reasonably enough, he decides to call them by the same name (“nous nous permettrons de leur donner le même nom”, or “we allow ourselves the right to give them the same name”). But they had been introduced also by Kolchin earlier (in papers on differential Galois theory from the later 40’s), for whom they were apparently “connected quasicompact (commutative) algebraic groups” (the adjective “commutative” being optional by virtue of a theorem of Kolchin). Similarly, it seems that unipotent groups were called “anticompact groups” by Kolchin.

In the same vein, it is also amusing to see Borel write what translates to “Algebraic linear groups” instead of the now dominating terminology “Linear algebraic groups”; the change of emphasis is quite interesting. But Kolchin’s earlier works spoke of “Algebraic matric groups”, which looks like a misprint, but is not. In fact, the Oxford English Dictionary (which we are lucky to have available online here at ETH…) does confirm that “matric” is an English word, as an adjective, meaning “Of, or relating to a matrix or matrices”. The earliest quotation given is from 1921, and the latest is from 1994 (in a paper in the Rendiconti del Circolo Matematico di Palerma; now I would have bet that this latest example was a misprint, as many occurences of the word “matric” in Mathscinet undoubtedly are, but it doesn’t seem so, since the word occurs in the title and in the body of the article. Interestingly, the OED does not gives the names of the authors for this citation (they are I. Bajo and J. Torres Lopera). I wonder (idly) how many mathematicians (in particular, how many born after 1900) have the honor of being quoted in the OED…

Beyond the example of tori (which shouldn’t be taken too seriously, and certainly not as a negative comment on Kolchin!), maybe there is some lesson in this. Often, in particular when a new theory emerges, it’s not clear which concepts should be emphasized and which are less important (or more derivative; for Kolchin, the pivotal role of algebraic tori was probably not obvious at all). However, once the picture clarifies, the terminology should also bring this into focus, and the most basic concepts should have basic (and hopefully, short) names.

For instance, one reason that modern algebraic geometry may seem appealing (for budding mathematicians of a certain kind in particular, whatever their technical inclinations might be) is that quite a lot of the vocabulary not only sounds nice (even almost poetic, of a sort), but is also well considered (from sheaves and schemes to étale and crystals…). On the other hand, I’ve tried a few times to read books on C*-algebras, and always find the terminology painful.

There’s probably not much mystery on why bad terminology should be dominating. To find good terminology requires a particular type of creativity; to create great mathematics another type, and there’s no reason the two should usually be combined in the same persons. Except in very few cases, creating the terminology is left to the creators of the theory; if they have no ear for poetry, or no interest in spending some time looking for the right words when new mathematics calls them instead, one may well be left to deal with terrible choices. (It’s a good thing that someone – Weil maybe? or Chevalley? – decided that “valuation vector”, as used by Tate in his thesis, was an awful name for an adèle…) Of course, to avoid the worse, there is the usual escape route of naming a concept after someone, creating bad blood and long-running misunderstandings instead. Maybe Borel should have also tried hard to find a nice name for maximal connected solvable subgroups…

One may wonder why to bother at all about such terminological issues. After all, in terms of mathematical content, a torus is a torus is a connected quasicompact commutative algebraic group. I must say I can’t really explain why I find this of any importance, but I do!

What mathematical objects do computers really know? (Part I)

In this first post, I am coming back to the first of the two papers already mentioned in my (second) first post, which is joint work with F. Jouve and D. Zywina, and titled “An explicit integral polynomial whose splitting field has Galois group W(E8)“.

The paper’s avowed purpose is very straightforward to state: we give an explicit polynomial (call it P), monic and with integral coefficients (in fact, palindromic), of degree 240, with the stated property in the title. Here’s a PDF file containing it (if you print this file and stitch the pages together, any resemblance of the resulting outline shape with a surfboard, when viewed from afar, is entirely coincidental).

In fact, I am not going to spend much time discussing the mathematical arguments in the paper. Rather, since the proof that P does indeed have the stated properties depends on computer help, I’m going to describe how this comes about, and more importantly explain why I believe we do have a proof of our statement, which is a valid mathematical proof of it (except maybe for the most paranoid among mathematicians). Note that I’m essentially a full-blooded mathématicien platonicien, and so it is a fairly serious matter to me to be sure that there is no cheating.

Of course, there has been a lot of discussion of the use of computers in mathematical proofs, involving much more important results, but as is often the case, things can become a lot sharper in one’s mind when dealing with a specific problem, and for me this was the first time the question really arose.

[Just a few words on the mathematical background: the result is a particular case of the “explicit” Inverse Galois Problem, and the explicitness is indeed what is of interest; it has been known for a long time (though a formal statement is not easy to find) that infinitely many integral polynomials exist with splitting field the group W=W(E8), and in particular Shioda described a way to find, in principle, all such polynomials. Várilly-Alvarado and Zywina have managed to create concrete examples from Shioda’s method, but the coefficients of the ones they obtained are huge. Ours can be seen as rather reasonable in a way (the largest coefficients have about 100 decimal digits). Our construction is based on completely different ideas, where the origin of the “low complexity” is pretty clear: the principle is that, given a “generic” matrix in some (algebraic) matrix group G, with rational coefficients, the Weyl group of G is typically going to be the Galois group of the splitting field of the characteristic polynomial of this matrix. Although we haven’t found formal statements of this type, it’s actually unlikely that this is very new. For instance, for GL(n), where the Weyl group is the symmetric group on n letters, this principle is more or less obvious (but not necessarily easy to prove!). We apply this to the exceptional group of type E8 (which can be seen in a natural way as a subgroup of GL(248)), and to get a sufficiently generic element, we follow the idea of random walks: we take some elements in E8(Q), multiply them, and manage to check that the resulting matrix works. We multiply very simple matrices (analogues of the elementary matrices in GL(n), as I’ll describe a bit in more detail in the second part), and few (in fact, 16) of them, so the resulting matrix has small coefficients, which explains the low complexity of our polynomial.]

To describe how our result depends on computer help, recall first that the Galois group (say G) of P can be seen in a natural way as a group of permutations of its 240 roots. Similarly, it turns out that W can be seen as a permutation group of 240 objects, which may be interpreted either as the vectors of minimal (non-zero) length in the lattice (which is often also denoted E8, though here this would only create confusion) of which it is the automorphism group, or as the so-called roots of the exceptional group E8 with respect to a chosen maximal torus (and you may find confusing this use of the same word “root” for two apparently very different things, but in fact it is precisely because of the link with zeros of characteristic polynomials that the terminology was introduced in Lie theory…)

So, roughly, we must check that the Galois group (say G) of P is (1) not smaller than W; (2) not bigger than W. Both facts require some computer help, but in very different ways. I’ll discuss (1) now, and (2), which is more significant, in another post.

For (1), the Magma mathematical software is used as some kind of database holding extensive information about W, and able to perform computations in it. Since W has order about 700,000,000, some mechanical help may seem indispensible, but in fact there is some chance that a pure thought proof of this first part could exist. And even if some computations are needed, one shouldn’t think it is impossible to get one’s hand (usefully) dirty with W: Élie Cartan, in 1896, computed its Jordan-Hölder series (there is a normal subgroup of index 2, and the quotient has a normal subgroup of order 2, with quotient simple, in fact isomorphic to an orthogonal group of dimension 8 over the field with 2 elements), and it’s safe to assume that he couldn’t just use his laptop for this purpose.

We get information on the “size” of G by looking at the factorization pattern of P modulo primes: it is a standard, and very useful, fact of algebraic number theory that if P has no repeated root modulo a prime p, and factors as a product of n1 distinct linear factors, n2 distinct irreducible quadratic factors, and so on, then as a subgroup of the permutation group of the roots, G contains a permutation which is product of n1 1-cycles, n2 (disjoint) 2-cycles, and so on.

Now, the computer knows about all (conjugacy classes of proper) maximal subgroups of W, and for each of them (say H), knows the cycle types of all conjugacy classes of elements of H. If by reduction modulo primes we find cycle types which do not all occur in such a maximal subgroup, then it is not possible that G be isomorphic to a subgroup of W. This is what happens when reducing modulo 7 and 11 (no more!). Note that here it is assumed implicitly that the cycle types which arise from reduction of P correspond to cycle types of element of W; this is not obvious a priori, and is in fact linked with the second part of the proof, and so I’m leaving some explanation for the second post. (For some context: there are only 112 conjugacy classes in W; they can be described in an abstract way which makes them manageable, as done by Carter for all Weyl groups).

At first, this fact looked just like a lucky coincidence, highly dependent on being able to list the data mentioned above (maximal subgroups and their conjugacy classes). However, a closer look shows that we got (by luck, maybe) some not so random conjugacy classes. The first one is the square of the so-called Coxeter class c. Now, the latter is very canonical (it can be defined for all finite reflection groups, at least, and for the symmetric group on n letters, for instance, it is the class of the n-cycles), and it has many interesting properties.

The second conjugacy class is less remarkable, but is in fact the biggest in W (which certainly explains why it comes out so easily!), and the only property we need is that it is an odd permutation (any other odd permutation would do). Now, of course, the other class c2 is an even permutation, and so what we actually need about W is the following:

Fact of life: Let G be a proper subgroup of W, the Weyl group of E8, containing an element in the conjugacy class c2. Then G is contained in the kernel of the signature homomorphism.

(Note that the signature can be defined on W without referring to the embedding in the permutation group of 240 objects).

This statement (to me) seems reasonable enough that a direct proof could well exist (it does seem somewhat surprising, on the other hand). And, in a way, such a proof already exists: using the Jordan-Hölder series of W, one can check this using the printed Atlas of Finite Groups, and hence depend only on standard mathematical arguments and published literature, which is at least the usual standard of mathematical proof. But maybe there could be an argument even more conceptual.
Now, with this done, let’s wait until the next post to discuss the second part of the proof…

On Murphy’s law…

Welcome to the second version of my first blog. The first version turned out to be quite ephemeral since the ETH blog database failed on Friday in some way, leading to its utter destruction. Hopefully this one will last longer, and at least I will try to backup more carefully (by chance, I didn’t lose anything, but that’s only because I had an open window with all the content at work…)

Anyway, here’s what I said in the first post of the first blog:
Before writing a “real” post, I will just point out my home page at ETH, and in particular the list of papers and preprints, where I have recently added two articles, which I will probably describe later on:

(1) An explicit integral polynomial whose splitting field has Galois group W(E8), (joint work with F. Jouve and D. Zywina) which does pretty much what the title says…

(2) The large sieve, monodromy and zeta functions of algebraic curves, II: independence of the zeros, where the issue of relations between zeros of L-functions is investigated in the case of finite fields, with the help of sieve methods and the Riemann Hypothesis of Deligne.