On the Impact of Hurricane Sandy on the Internet

On the evening of October 29, 2012, the storm surge caused by hurricane Sandy hit the east coast of North America, in particular New Jersey and New York. Wikipedia reports that in total 22 US states and a large portion of Eastern Canada are suffering from the direct consequences of the storm. The implications for the people living in these areas can hardly be imagined, and we send them our sympathies and our best hopes for a fast recovery from the damage. Indeed, the damage done is large enough to have global impact: even Internet reachability is impaired, and the effects can be perceived from far-away Europe, as this blog post will detail.

Dominik Schatzmann, a Ph.D. student in Bernhard Plattner's group (EDIT 2012-11-30: now Dr Schatzmann), has been investigating Internet reachability problems for his thesis for the last 4 years. His approach is orthogonal to conventional techniques, which rely on control plane information or active probing to find outages. In contrast, Dominik uses the information contained in NetFlow data from live traffic to detect which destinations could not be reached. The rationale behind this approach is two-fold. First, by using passive measurements that are conducted for billing purposes anyway, there is no additional load on the Internet. Second, as the outage detection is driven by actual connection attempts, it is very simple to see which of the unreachable portions of the Internet matter most to the users. Both of these properties are desirable from an ISP's perspective, as this reduces measurement cost and makes it possible to solve the most important problems first. A summary of the Flow-Based Approach for Connectivity Tracking (FACT) used to analyze the impact of the hurricane on the Internet was recently presented at the Passive and Active Measurement (PAM) conference. In the meantime, a master's student of ours (thesis to be published) reduced the noise level of the approach by priming FACT with "stable server sockets".
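To give a rough idea of how such flow-based detection can work, here is a minimal sketch in Python (emphatically not Dominik's actual implementation): outgoing flows from local clients that never see an answering flow are treated as failed connection attempts and aggregated per destination prefix. The flow format, the example local network, and the /24 prefix aggregation are all simplifying assumptions made for illustration; FACT itself works on the NetFlow records exported at the SWITCH border and matches destinations against real BGP prefixes.

```python
# Minimal sketch of a FACT-style pass over flow records (illustrative only,
# not the actual FACT implementation). Each flow is assumed to be a dict
# like {"src": "192.0.2.17", "dst": "203.0.113.7", "dport": 443}.
from collections import defaultdict
from ipaddress import ip_address, ip_network

def prefix_of(ip):
    """Stand-in for a longest-prefix match against a BGP table;
    here we simply aggregate to /24 for illustration."""
    return ip_network(f"{ip}/24", strict=False)

def failed_attempts(flows, local_net="192.0.2.0/24"):
    """Per external prefix, the set of local clients whose outgoing flows
    saw no answering flow -- a crude 'destination unreachable' signal."""
    local = ip_network(local_net)
    outgoing, answered = set(), set()
    for f in flows:
        src, dst = ip_address(f["src"]), ip_address(f["dst"])
        if src in local and dst not in local:      # client -> external server
            outgoing.add((f["src"], f["dst"]))
        elif dst in local and src not in local:    # external server -> client
            answered.add((f["dst"], f["src"]))
    unreachable = defaultdict(set)
    for client, server in outgoing - answered:
        unreachable[prefix_of(server)].add(client)
    return unreachable
```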

Remarkably, the technique is sensitive enough to detect data center and routing outages on a global level, based on measurement data collected at the border of the SWITCH network, the Swiss national research and education network. The plot below shows how many BGP prefixes have been found "unreachable" within 5-minute time bins by a certain number of local clients. The grey area corresponds to at least a single client, the red area to at least ten clients, and the blue area to at least ten clients, but filtered for US destinations only. The picture actually shows two independent incidents: first, the impact of Sandy on US reachability, and second, a router outage several days later.
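Concretely, the plotted counts can be thought of as the result of bucketing the per-prefix client sets from the sketch above into 5-minute bins and applying the two client thresholds plus the US filter. The following is again only a sketch with made-up data structures, not the real pipeline:

```python
# Sketch: derive the three plotted time series from (timestamp, prefix, client)
# events. us_prefixes is assumed to be a pre-computed set of prefixes that
# geolocate to the US (e.g. via MaxMind).
from collections import defaultdict

BIN = 300  # seconds, i.e. 5-minute bins

def bin_counts(events, us_prefixes, min_clients=10):
    per_bin = defaultdict(lambda: defaultdict(set))
    for ts, prefix, client in events:
        per_bin[int(ts) // BIN * BIN][prefix].add(client)
    series = []
    for t in sorted(per_bin):
        clients_per_prefix = per_bin[t]
        any_client = len(clients_per_prefix)   # at least one client (grey)
        many = sum(1 for c in clients_per_prefix.values()
                   if len(c) >= min_clients)   # at least ten clients (red)
        many_us = sum(1 for p, c in clients_per_prefix.items()
                      if len(c) >= min_clients and p in us_prefixes)  # blue
        series.append((t, any_client, many, many_us))
    return series
```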

As can be seen, a certain level of permanent background noise manifests in the grey area, showing a time-of-day pattern induced by the local client population. However, when looking at the outage periods that affect at least 10 clients, we see that the noise is mostly canceled out and that almost all of the affected prefixes are located in the US. The outages started only a few hours after Sandy made landfall and continued for a couple of days, loosely following the time-of-day pattern with peaks around noon.

On Nov 2nd around midnight UTC, a different kind of incident happened: a router inside the SWITCH network crashed due to a bug in its firmware. This pattern looks quite different. First, the fraction of US-based prefixes affected is a lot lower (note the log scale!). Second, the effect manifests in the early morning hours, when the time-of-day pattern is typically at its minimum. Yet the overall magnitude is higher than for the outages induced by Sandy, which highlights the effect of locality.

We further broke down the locations of the affected US prefixes on the first day of the Sandy outage using the MaxMind geolocation database. The result is shown in the picture below.

There are two interesting observations. First, the prefixes hosted in Bermuda – completely covered by the large red dot in the picture – caused a lot of discontent among SWITCH clients. One may wonder why this is the case. Indeed, there are two factors involved: the QuoVadis certificate service, part of which is hosted in Bermuda, was no longer reachable, and connectivity to Bermuda is relayed through New York, either directly, via New Jersey, or, a few hops longer, through Brazil. Second, other locations in the US were affected as well, including the center of the country and the west coast. Looking at the global picture, even some reachability impairment towards China could be observed.
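For readers who want to reproduce this kind of breakdown, the geolocation step boils down to looking up one representative address per affected prefix. Below is a sketch using MaxMind's current geoip2 Python library and a GeoLite2 City database; the 2012 analysis used the MaxMind database available at the time, but the idea is the same, and the database path and the prefix objects here are assumptions for illustration.

```python
# Sketch: map each affected prefix (an ipaddress network object) to a country
# code and coordinates via MaxMind GeoLite2. Accuracy is whatever the database
# offers, which is why the text speaks of "locations" rather than exact sites.
import geoip2.database
import geoip2.errors

def locate_prefixes(prefixes, db_path="GeoLite2-City.mmdb"):
    points = {}
    with geoip2.database.Reader(db_path) as reader:
        for prefix in prefixes:
            first_ip = str(next(prefix.hosts()))   # one representative address
            try:
                record = reader.city(first_ip)
            except geoip2.errors.AddressNotFoundError:
                continue
            points[prefix] = (record.country.iso_code,
                              record.location.latitude,
                              record.location.longitude)
    return points
```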

Overall, we think it is fair to claim that Sandy had a global impact on Internet reachability. FACT can be used to detect such problems and quantify their importance from an ISP’s perspective. As such it is complementary to control plane analysis and to active measurement based approaches.

Dominik will defend his Ph.D. thesis tomorrow. We all wish him the best of luck! Moreover, we want to thank SWITCH and in particular Simon Leinen for supporting us and making this research possible.

EDITED 2012-12-18: Gave Dr Schatzmann his proper title — neuhaus


The Axiomatic Method and Security Metrics

[Note: this blog post is a commentary and does not necessarily reflect the opinion of the Communication Systems Group.]

I have just returned from MetriSec 2012, which was a complete success in my opinion. Peter Gutmann delivered an excellent keynote, the participants had a great roundtable discussion, and the refereed papers and the invited talk were of high quality.

I did, however, have a point of (passionate but polite) disagreement with Riccardo Scandariato. As far as I could tell, Ric advocated the use of the axiomatic method in security metrics, by which he meant defining the properties of a security metric beforehand and axiomatically, and then looking for actual metrics that satisfy the axioms. I am assuming that this is not mere requirements engineering, which is of course a sensible thing to do, but an honest-to-god axiomatic approach. The reasoning behind this is that usually you poke around in the dark, take whatever metrics you find useful, and then figure out the properties of the metrics you found. This will usually not lead to metrics that have desirable properties; therefore much energy is wasted because you have to throw away many metrics. Another plus is that you can build a theory of metrics and find out their properties simply by following the axioms, combining them, and seeing where logic takes you.

All of this is true, but the axiomatic method simply cannot be used for security metrics. Axiomatic methods work for mathematical objects, and what we have here may look like a mathematical object, but it doesn't quack like one. The reason is that security is not an abstract property, but a property that only holds in the real world, because it is intimately connected with actual machines, an actual environment, and actual humans. If you axiomatise, you might (and, I would argue, will) discard metrics that tell you something useful about the system under consideration, simply because they don't have some nice theoretical, axiomatic properties. In fact, I would argue that because details matter in security, a metric that is good for one system might be complete rubbish for another, similar system. Axiomatisations, being necessarily rather short, would be unable to capture the minute differences between systems that decide on the usefulness of a particular metric.

I will go even further and bet that for every (short and general) axiomatisation that Ric finds, I can come up with (a) a metric that satisfies the axioms, and (b) a plausible system for which this metric is rubbish.

But even if I could not find counterexamples to an axiomatisation: no discipline that I know, and in which measurement plays a role, uses an a priori axiomatisation to find metrics. Physics, Chemistry, Mechanical Engineering, and even Psychology and other softer sciences use the “poke around in the dark” approach until they find something that gives useful results. I believe that the reason is that you need a good theory of the thing you’re studying before you can come up with a reasonable axiomatisation. For example, Newton’s laws can be used as axioms, but they were empirically observed facts before. In security, we simply do not have such a body of knowledge that would allow us to formulate a theory of secure systems. Trying to find metrics this way seems to me to put the cart before the horse.

Case in point: Riccardo mentioned that axiomatic approaches could be used to find out composition rules for metrics. That means that if I have a metric for system A and one for system B, and if I know how the two systems are to be put together, I can then find a metric for this combination of A and B. Now, security is a property that is at best brittle with respect to composition. (At the conference, I actually said that “security is not composable”, but this is of course wrong; what I meant, and what I should have said, is that security is not necessarily composable.) The composition of A and B might well have security properties that are neither a function of A, nor of B, nor of the particular manner in which they are connected. Details matter, and the environment matters. I find it impossible to believe that a reasonably general axiomatisation of such a composition could be small enough to lead the search for metrics in a useful direction.

A perfect example of this can be found in a paper that takes thirty security protocols and combines them.  From the abstract: “Formal modeling and verification of security protocols typically assumes that a protocol is executed in isolation, without other protocols sharing the network. We investigate the existence of multi-protocol attacks on protocols described in literature. Given two or more protocols, that share key structures and are executed in the same environment, are new attacks possible? Out of 30 protocols from literature, we find that 23 are vulnerable to multi-protocol attacks.” (Cas Cremers, Feasibility of Multi-Protocol Attacks, First International Conference on Availability, Reliability and Security (ARES’06), April 2006, p.287. Hat tip: Peter Gutmann.)

The only way to find good security metrics is to study actual systems in great detail, not to invent axioms. That is an activity best left to mathematicians.

[Edit: Changed “Riccardo wants axiomatically to find metrics that can be composed” to “Riccardo mentioned that axiomatic approaches could be used to find out composition rules for metrics” Thanks for the clarification, Ric!]


Come to MetriSec 2012 (Part 2)!

Yesterday’s post was about the exciting keynote at this year’s MetriSec. Today’s post is about another highlight, the panel.

One of the biggest problems in empirical studies about computer security is the data. Usually you can’t control the data acquisition process yourself; instead, you need to take other people’s work and use that. For example, you could be using Mozilla Foundation Security Advisories, or the National Vulnerability Database. Then the question is, to what extent can you trust this information to be complete and unbiased?

The answer is that you cannot, at least not without knowing the process by which these databases are created. For example, many researchers have for years believed that the NVD constitutes some kind of ground truth. If that were true, then one would expect that entries that have been in the NVD for some time will in general not change. Work currently being done here at ETH indicates, however, that the amount of change, or churn, in the NVD is quite high, and that even very old entries get changed!
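Measuring this kind of churn is conceptually simple. Assuming two NVD snapshots taken some time apart have already been parsed into dictionaries mapping CVE identifiers to their full records (the exact feed format does not matter for the idea), something like the following sketch would do; the actual analysis being done here is of course more careful than this.

```python
# Sketch: fraction of entries that existed in an older NVD snapshot and whose
# records differ in a newer snapshot. Both arguments are assumed to be dicts
# keyed by CVE identifier, e.g. {"CVE-2010-1234": {...}, ...}.
def churn(old_snapshot, new_snapshot):
    common = old_snapshot.keys() & new_snapshot.keys()
    if not common:
        return 0.0
    changed = sum(1 for cve in common if old_snapshot[cve] != new_snapshot[cve])
    return changed / len(common)
```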

The panel at MetriSec will discuss these problems. I will moderate, and the panelists will include at least Laurie Williams, Peter Gutmann, and Fabio Massacci. All three have much experience with empirical work, so I expect a high-class discussion.

And this is why you should come to MetriSec 2012, too!

[Edit 2012-07-18: Added Fabio as panelist.]


Come to MetriSec 2012 (Part 1)!

This post is not a technical article, but in-house advertising.  I am a proud co-chair of MetriSec 2012, an international workshop on security metrics and related topics.  This year’s programme is a bit unusual. Sure, we have papers, but we also have two more big attractions: the keynote speech and a panel discussion. This post is about the keynote speech; the panel discussion will be the subject of a future post.

This year's keynote will be given by Peter Gutmann, whose book Cryptographic Security Architecture: Design and Verification should be on every security practitioner's bookshelf. I know Peter from way back, when we were members of the PGP 2.0 development team. That must have been in 1992 or so. Peter is one of those rare people who have deep knowledge at magnifications spanning several orders of magnitude. You can talk to him about X.509 minutiae or about human factors in security engineering (a topic about which, in his own words, he occasionally grumbles).

For most of us mere mortals, even a single paper at Usenix Security is a lifetime dream that often goes unfulfilled. Peter has seven papers published at Usenix Security, the first in 1996 and then in six straight years from 1998 to 2003. (One wonders what happened in 1997.) His most often cited paper is at the same time his most misunderstood: Secure Deletion of Data from Magnetic and Solid-State Memory (Usenix Security 1996) showed, first, how overwriting data on MFM- or RLL-encoded hard drives leaves enough traces of the original data to allow its eventual recovery, and, second, how to choose bit patterns that will cause the magnetic fields generated by the write head to fluctuate in just the right manner that deletion will be “deep” and recovery of the original data all but impossible. The paper gave 35 such bit patterns that resulted from an analysis of the specific data encodings, but of course, what happened later was that the article was very badly misunderstood. It’s worth quoting from Peter’s epilogue to his paper:

In the time since this paper was published, some people have treated the 35-pass overwrite technique described in it more as a kind of voodoo incantation to banish evil spirits than the result of a technical analysis of drive encoding techniques. As a result, they advocate applying the voodoo to PRML and EPRML drives even though it will have no more effect than a simple scrubbing with random data. In fact performing the full 35-pass overwrite is pointless for any drive since it targets a blend of scenarios involving all types of (normally-used) encoding technology, which covers everything back to 30+-year-old MFM methods (if you don’t understand that statement, re-read the paper). If you’re using a drive which uses encoding technology X, you only need to perform the passes specific to X, and you never need to perform all 35 passes. For any modern PRML/EPRML drive, a few passes of random scrubbing is the best you can do. As the paper says, “A good scrubbing with random data will do about as well as can be expected”. This was true in 1996, and is still true now.

Indeed, when you use Apple’s Disk Utility to securely delete data on a drive, you will find three options: the first option is to use one pass (probably with zeroes or pseudorandom data), the second option uses seven passes (probably a variation of DoD 5220.22-M), and the third option, called “Most Secure”, uses, you guessed it, thirty-five passes. And this despite the fact that “you never need to perform all 35 passes”.
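For the curious, "a few passes of random scrubbing" is as simple as it sounds. Here is a small, purely illustrative Python sketch that overwrites a file a few times with random data before unlinking it; it says nothing about SSDs, wear levelling, file-system journals, or backups, so please do not mistake it for a real secure-deletion tool.

```python
# Illustrative only: overwrite a file in place with random data a few times,
# flushing to disk after each pass, then unlink it.
import os

def scrub(path, passes=3, chunk=1 << 20):
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk, remaining)
                f.write(os.urandom(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())
    os.remove(path)
```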

Peter’s keynote, From Revenue Assurance to Assurance: The Importance of Measurement in Computer Security, will draw lessons from the way telecoms used metrics to bill for mobile phone usage and apply them to the field of security. I am greatly looking forward to this talk, and if you are involved with security or measurement or both, I dare say you could benefit greatly from it.

So, do consider registering for MetriSec; I'm sure it will be worth your while.

[Edited 2012-07-09: Re-worded intro, fixed small language issues.]


Happy 2012!

A happy 2012 to all, from the Communication Systems Group!


Read the Classics!

I have recently read a book about the history of statistics, and the author made me aware of several books by R. A. Fisher, or, to give his full name, Sir Ronald A. Fisher, Sc.D., FRS, one of the giants of the field.

Fisher produced three books in particular that are the main source of his fame. These are “Statistical Methods for Research Workers”, “The Design of Experiments”, and “Statistical Methods and Scientific Inference”, and all three are available in an omnibus edition for a reasonable price.

Continue reading


Sonya Peter’s Face

Recently I was contacted on Facebook by a good-looking girl called Sonya Peter, roughly my age. Like me, she works at ETH Zurich, and we even have a friend in common. Her taste in music also matched mine quite well. But no matter how hard I tried, I couldn't remember meeting her. Continue reading


The Value of Scientific Presentations

At conferences, I often come across great science: the results are amazing, people have done something that was thought not to be doable, or they have built a system with incredible properties.

However, that same science is often presented in a way that makes it clear that the presenter has more confidence in his science than in the presentation.

Continue reading


Johnny Still Can’t Encrypt

There once was a very good article about user interface issues in PGP 5.0, called "Why Johnny Can't Encrypt". At this year's Usenix Security Symposium (simply called "Security" by those in the know), there was an article called "Why (Special Agent) Johnny (Still) Can't Encrypt", and I, being a fan of the former paper, immediately downloaded and read the latter, especially because it had won an outstanding paper award. Continue reading


Stats Tip #4: Make Use of Nonparametric Tests

In statistical hypothesis testing, you often have the choice between tests that assume a certain distribution of the underlying data and tests that don't make such assumptions. For example, when evaluating a drug trial, you can choose between, e.g., the t-test, which assumes that the experimental data is normally distributed, and the Mann-Whitney U test, otherwise known as the Wilcoxon rank-sum test, which makes no assumption about that distribution. As Rebecca Black might say, "which test should you take?"
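To make the choice concrete, here is a quick illustration with SciPy; the numbers are made up and only the mechanics matter. The same two skewed samples are run through Welch's t-test and through the Mann-Whitney U test.

```python
# Two lognormal (hence non-normal) samples, compared with a parametric and a
# nonparametric test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
treatment = rng.lognormal(mean=0.3, sigma=0.8, size=40)
control = rng.lognormal(mean=0.0, sigma=0.8, size=40)

t_stat, t_p = stats.ttest_ind(treatment, control, equal_var=False)   # Welch's t-test
u_stat, u_p = stats.mannwhitneyu(treatment, control, alternative="two-sided")

print(f"t-test:       p = {t_p:.3f}")
print(f"Mann-Whitney: p = {u_p:.3f}")
```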

Continue reading
