SRECon Europe 2016
USENIX, the Advanced Computing Systems Association, gives engineers opportunities to meet and discuss developments in all aspects of computing systems. These opportunities take the form of conferences on different topics related to computing systems. One of them is SRECon, the Site Reliability Engineering Conference.
SRECon is a conference that brings together engineers to talk about site reliability, systems engineering, and complex distributed systems at scale. The conference started in the US and, since last year, has also been held in Europe. It took place in Dublin in both 2015 and 2016, since the city hosts several important IT companies.
The program (https://www.usenix.org/conference/srecon16europe/program) was very broad, with several tracks running in parallel. The focus of the companies attending also differs from ours, since most of them work on web and distributed systems at very high scale.
I mostly attended talks on monitoring and alerting. Documentation and dealing with incidents on large-scale systems were also part of my schedule. In summary, from those talks:
- Monitoring is essential for any company that provides infrastructure or services, and most companies have some monitoring in place. In recent years, however, monitoring systems have started to become more dynamic to meet the needs of microservice architectures. Designs and tools that follow this trend were presented and discussed in the talks.
- Alerting was a big topic at this conference. Most discussions focused on what should trigger an alert and what form the alert should take. In short, alerting too much creates noise and is likely to hide real and important issues. An efficient alerting system is crucial to any business, and newer alerting tools such as Alertmanager from Prometheus are becoming essential.
- Documentation is a difficult topic anywhere, especially for engineers (software engineers, DevOps, etc.). No one wants to document code or services, because most of the time it is boring. However, the talks showed that good documentation at all levels (code, service, incidents) reduces the time needed to resolve a problem and gives much more power to the engineers dealing with it. New documentation tools are needed, though. The existing ones either do not meet all the requirements of current architectures or impose too much work on engineers.
Besides the talks, there were exhibitions from big companies such as Google, Amazon and Facebook. Hands-on workshops, where participants could design or learn a system in detail, were also part of the conference.
Post from Dr. Christiane Pousa
High Performance Computing, Scientific IT Services (ITS SIS) ETH Zurich http://www.id.ethz.ch/