In the “never too late to learn that” section…

As a number theorist interested in probability, I have accumulated my knowledge of that field (such as it is…) in a rather haphazard fashion, as is probably (pun intended) typical. The gaps in my understanding are wide and not really far between, and I can’t generally expect to take advantage of teaching a class to fill them (we have very good probabilists in Zürich), though I did teach introductory probability for two years in Bordeaux. So it’s not so surprising that I stumbled this morning in my lecture on expanders when defining a Markov chain (which deserves some “tsk, tsk” from my ancestors, since my mathematical genealogy does trace back to Markov…)

More precisely, given a graph \Gamma, say k-regular, I defined a random walk on \Gamma as a sequence (X_n) of random variables taking values in the vertex set V and satisfying the conditional probability identity
\mathbf{P}(X_{n+1}=y\mid X_n=x)=\frac{a(x,y)}{k}
for all n\geq 0, and all vertices x, y, where a(x,y) is the number of edges (unoriented, oeuf corse) between x and y.
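This transition rule is easy to simulate; here is a minimal sketch in Python, where the choice of the cycle graph on \mathbb{Z}/d\mathbb{Z} (so k=2 and a(x,y)\in\{0,1\}) and all names are illustrative choices of mine:

    import random

    d, k = 6, 2  # the cycle on Z/dZ is 2-regular, with a(x, x±1) = 1

    def a(x, y):
        # Edge multiplicity of the cycle graph on Z/dZ.
        return 1 if (y - x) % d in (1, d - 1) else 0

    def step(x):
        # Move from x to y with probability a(x, y)/k.
        r, cum = random.random(), 0.0
        for y in range(d):
            cum += a(x, y) / k
            if r < cum:
                return y
        return x  # not reached, since sum_y a(x, y)/k = 1

    def walk(x0, n):
        # Sample a trajectory (X_0, ..., X_n) with X_0 = x0.
        xs = [x0]
        for _ in range(n):
            xs.append(step(xs[-1]))
        return xs

    print(walk(0, 10))

Note that each step here is drawn using only the current position and fresh randomness, so a walk simulated this way automatically satisfies the stronger condition discussed below.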

But, as one of the CS people attending the lecture pointed out, this is wrong! The definition of a Markov chain requires that one be able to condition further back in the past: for any choices of x_0, x_1,\ldots, x_n, one asks that
\mathbf{P}(X_{n+1}=y\mid X_0=x_0,\ldots, X_n=x_n)=\mathbf{P}(X_{n+1}=y\mid X_n=x_n)=\frac{a(x_n,y)}{k}
which corresponds to the intuition that the behavior of the walk after time n should be independent of its past trajectory, except of course for the actual value at time n.

What puzzled me no end was that I had failed to notice the problem, simply because this more general condition did not play any role in what I was doing, namely in proving the convergence to equilibrium of the random walk! The difference, which was clarified by my colleague M. Einsiedler, is that although the “wrong” definition allows you to compute uniquely the distribution of each step X_n as a function of the distribution of X_0 and of the transition probabilities a(x,y)/k, one needs the actual Markov condition in order to compute the joint distribution of the whole process (X_n), i.e., in order to express probabilities such as
\mathbf{P}(X_n=x_n,\ldots,X_0=x_0)
as functions of the distribution of X_0 and the transition probabilities.
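Concretely, the Markov condition allows one to condition step by step (the chain rule for conditional probabilities) and obtain the product formula
\mathbf{P}(X_0=x_0,\ldots,X_n=x_n)=\mathbf{P}(X_0=x_0)\prod_{i=0}^{n-1}\frac{a(x_i,x_{i+1})}{k},
whereas the “wrong” definition only gives the one-step recursion for the marginal distributions,
\mathbf{P}(X_{n+1}=y)=\sum_{x\in V}\mathbf{P}(X_n=x)\frac{a(x,y)}{k},
which is all that the proof of convergence to equilibrium uses.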

(My confusion arose partly from the fact that, up to this class, I had exclusively used random walks on groups as Markov chains, defined using independent steps, and these are automatically Markovian; I’ve actually exploited the “strong” Markov property in a recent paper with F. Jouve and D. Zywina, but in that specialized context, the point above — I’m not sure if it deserves to be called “subtle”! — didn’t register…)


One thought on “In the “never too late to learn that” section…”

  1. That must have been an unsettling experience indeed…

    A funny thing, as you noticed, is that the difference between the two hypotheses is invisible when one looks at the distribution of any given position X_n, and even when one looks at two successive positions (X_n,X_{n+1}), but that it can become visible when one looks at *three* successive positions (X_n,X_{n+1},X_{n+2}).

    For a simple example of this, fix a positive parameter a<1/2 and consider two different processes on a discrete circle S=\mathbb Z/d\mathbb Z with d\ge 3. The Markov version is the Markov chain on S such that, when at x, one stays at x with probability 1-2a, one jumps to x+1 with probability a and one jumps to x-1 with probability a.

    In the non-Markov version, one first draws a sign U=\pm1 at random, with equal probabilities for U=+1 and for U=-1. The value of U decides whether the process moves clockwise or anticlockwise. More precisely, when at x, one stays at x with probability 1-2a, and one jumps to x+U with probability 2a.

    Finally, assume in both cases that the initial position X_0 is uniformly distributed on S. Then, for every time n, one can check that in both cases X_n is uniformly distributed on S, and that the pair (X_n,X_{n+1}) has the same law in both cases: X_n is uniform and, given X_n=x, the next position X_{n+1} equals x with probability 1-2a and x\pm1 with probability a each.

    But the event [X_n=X_{n+2}\ne X_{n+1}] (aka *doing a U-turn*) has positive probability in the Markov case and zero probability in the non-Markov case.
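A quick simulation makes this contrast concrete; here is a minimal sketch in Python, where the values d=5 and a=0.3 and all names are arbitrary illustrative choices of mine:

    import random

    d, a = 5, 0.3  # discrete circle Z/dZ (d >= 3), step probability a < 1/2

    def markov_path(n):
        # Markov chain: at each step, stay with probability 1-2a,
        # jump to x+1 with probability a, to x-1 with probability a.
        x = random.randrange(d)  # X_0 uniform on S
        path = [x]
        for _ in range(n):
            u = random.random()
            if u < a:
                x = (x + 1) % d
            elif u < 2 * a:
                x = (x - 1) % d
            path.append(x)
        return path

    def non_markov_path(n):
        # Non-Markov version: a direction U = ±1 is drawn once and
        # for all; then one stays with probability 1-2a, or jumps to x+U.
        x = random.randrange(d)
        U = random.choice([1, -1])
        path = [x]
        for _ in range(n):
            if random.random() < 2 * a:
                x = (x + U) % d
            path.append(x)
        return path

    def u_turn_freq(sampler, trials=100000):
        # Estimate P(X_0 = X_2 != X_1), the probability of a U-turn.
        hits = 0
        for _ in range(trials):
            p = sampler(2)
            hits += (p[0] == p[2] != p[1])
        return hits / trials

    print(u_turn_freq(markov_path))      # about 2*a*a = 0.18
    print(u_turn_freq(non_markov_path))  # exactly 0

The Markov walk makes a U-turn with probability 2a^2 (move away, then immediately move back), while the non-Markov walk never does, since its direction U is fixed once and for all; the uniform distribution of X_n and of the pairs (X_n,X_{n+1}) can be checked empirically in the same way.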
