In my last articles, I argued that next-generation Internet security requires collaboration and that privacy concerns are the main road block for such cooperative solutions. I’ve also discussed network trace anonymization as a potential solution to the privacy issues with network data. Unfortunately, the delicate privacy-utility tradeoff involved in anonymization makes it impractical for real-world use.
In the current article, I’d like to explore a completely different approach to privacy-preserving collaboration. Instead of further tuning the ill-fated privacy-utility tradeoff for anonymization, we set out for a very ambitious goal:
- No compromise on privacy!
- No compromise on utility!
What sounds impossible at first sight, has in fact been studied in cryptography under the name of secure multiparty computation (MPC) for almost three decades. However, as we know, there is no such thing as a free lunch. The price to pay for this miracle is a substantial overhead in computation and communication costs.
Kissing the Sleeping Beauty
For almost thirty years, MPC techniques have been studied for solving the problem of jointly running computations on data distributed among multiple parties, while provably preserving data privacy without relying on a trusted third party. In theory, any computable function on a distributed data set is also securely computable using MPC techniques. However, designing solutions that are practical in terms of running time and communication overhead is far from trivial. For this reason, MPC techniques have mainly attracted theoretical interest in the last decades.
Adopting MPC techniques to network monitoring and security problems introduces the additional challenge of having to deal with voluminous input data that require online processing. For example, anomaly detection techniques typically require online monitoring of how traffic is distributed over ports or IP addresses. Such input data impose stricter requirements on the performance of MPC protocols than, for example, bids in a distributed auction. In particular, network monitoring protocols must process potentially thousands of input values while meeting near real-time guarantees.
One reason why MPC operations are inefficient for networking purposes is that the prevalent design paradigm aims at reducing the number of synchronization rounds between participants at almost all costs. The overhead of synchronization rounds is believed to be the main source of delay for MPC protocols, while the contribution of local computation time is considered neglectable. In our USENIX Security 2010 paper, we show that with many MPC operations running in parallel, as required by our scenarios, the time for starting synchronization rounds is amortized over all running operations and is not dominating the running time anymore. On the contrary, running time is dominated by local computations and the number of intermediate values to be exchanged over the network.
Using this insight, we developed the SEPIA library, which provides MPC operations optimized for parallel execution. Compared to state-of-the-art alternatives, SEPIA is much faster, executing, for instance, around 800 times more equality checks per second than similar frameworks.
A Deployment Scenario for SEPIA
A typical deployment scenario for SEPIA is shown below.
On the left, a number of independent network domains monitor their networks, e.g., using NetFlow data export on a router or IDS alert logs. Although the network data are sensitive and therefore kept secret, network operators want to aggregrate their data with the other networks. For example, they might be interested in whether other networks see similar intrusion alerts or not. Of course, they would never just hand over their alerts to another operator. But if they knew that indeed, the other guy is likely to be attacked by the same group of hackers, he might be willing to cooperate in order to come up with a more effective collaborative defense.
So what these operators could do, is installing SEPIA input peers in their premises. These input peers take sensitive local data and share it over the group of SEPIA privacy peers. The privacy peers (PPs) together simulate a trusted third party (TTP). That is, each privacy peer gets only one share of each data item. From this share, the PP cannot derive any information about the original data. Only if a majority of PPs comes together and combines their shares, they can reconstruct the information. In addition, the PPs can perform computations on the shared secrets without reconstructing intermediate values. When the computation has finished, the PPs reconstruct only the final result (for instance all similar intrusion alerts reported by three or more networks) and distribute it to the input peers.
Since the basic MPC operations, such as private addition, multiplication and comparison are not very useful by themselves, we implemented a number of composite protocols for distributed multi-domain network monitoring and security applications.
The entropy protocol allows input peers to aggregate local histograms and to compute the entropy of the aggregate histogram, which is commonly used for network anomaly detection. Similarly, the distinct count protocol finds and reveals the number of distinct, non-zero aggregate histogram bins. The most general protocol is our event correlation protocol, which correlates arbitrary events across networks and only reveals the events that appear in a minimum number of input peers and have aggregate frequency above a configurable threshold. Finally, our top-k protocol PPTKS estimates the global top-k items over private input lists using a novel approach based on a sketch data structure that trades off a slightly lower estimation accuracy for lower MPC computational overhead.
We evaluated our protocols in realistic settings and with real traffic data from the SWITCH network. Our evaluation shows that the protocols are indeed practical and can be used in various scenarios. For instance, we demonstrate that aggregating port histograms with 65,535 bins, computing the port entropy, or generating top-k reports from 180,000 distinct IP addresses is possible within 1-5 minutes, even if input data are distributed among 25 participants. This demonstrates, that MPC-based solutions for collaboration can indeed be practical, even for network data.
In my next article, I will report on our experiences from applying these protocols to real network traces of six SWITCH customers. After all, developing fancy MPC protocols and even demonstrating that they run in near realtime is not useful for the network practitioner unless the protocols solve real-world problems.