BioIT World 2013

Last week I attended BioIT World in Boston again, for the second time since we started running SyBIT. This is the main U.S. edition of the BioIT World conference series; there are spin-offs in Europe and Asia, but they are much less well attended.

There were no fewer than 12 parallel tracks, covering topics from IT infrastructure and bioinformatics through clinical omics and visualization – see the conference site for details. For me, the highlights were definitely the talks by the BioTeam people, Chris Dagdigian and Chris Dwan. Dag gave some insight into their project with Sanofi, and Chris Dwan into the setup of the new New York Genome Center. The key take-home messages they gave were:

  • If you are an infrastructure resource provider and you do not have a cloud strategy yet, it is almost too late – researchers are simply using their credit cards on Amazon behind your back. Your pricing needs to be competitive with that.
  • Private clouds make sense in certain environments, especially for flexibility and manageability, but always in conjunction with a local cluster and data infrastructure. Every cluster should have a virtualization component now. This is great when users want to do their own processing close to the data – just give them an IaaS to run their own VM, but with very close ties to the storage and cluster, delivering the necessary punch. It also makes cloudbursting into the public clouds easier to control.
  • Cloud APIs need to be carefully evaluated – beware of vendor lock-in.
  • DevOps is the future – ‘infrastructure is code’. IT infrastructure needs to be programmable through APIs; otherwise scalability cannot be achieved. This means that the classical sysadmin job disappears, and IT managers need to be very good at scripting with Chef, Puppet, and similar tools. A lot of tooling for this comes out of the cloud communities. Multi-vendor and hybrid cloud usage will be the rule, not the exception.
  • Small local clusters are disappearing, superseded by fat nodes with a lot of memory. This makes sense: a single fat node now has more CPU and memory than an entire small cluster used to have put together.
  • Storage is still the biggest expense in the life sciences. Consequence: ‘data flows downhill’, i.e. to wherever the cheapest storage can be found, and is left in ‘puddles’ there.
  • Storage arrays are getting smarter now, able to run applications close to the storage – a development to watch.
  • Hadoop/HDFS needs a different hardware design! A classical cluster with lightweight local nodes and a parallel file system is NOT suitable for Hadoop – it needs relatively large local disks, a return to the pizza-box model and away from blades again. But real Hadoop use cases are rare; do not adopt it without a real need!
  • Software defined networking is still ‘absolutely useless hype’ at this stage.
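The ‘infrastructure is code’ point above is worth unpacking: tools like Chef and Puppet let you *declare* the state a machine should be in, and a convergence engine works out which actions close the gap. As a purely hypothetical sketch (the package names and the `reconcile` function are made up for illustration, not taken from any real tool's API), the core idea looks like this:

```python
# Hypothetical sketch of the declarative, desired-state model behind
# configuration-management tools like Chef and Puppet.

# Desired state of a node, declared as data rather than performed by hand.
desired = {
    "nginx": "installed",
    "postgresql": "installed",
    "telnetd": "absent",
}

# Actual state, as a real tool would discover it by inspecting the node.
actual = {
    "nginx": "installed",
    "telnetd": "installed",
}

def reconcile(desired, actual):
    """Return the list of actions needed to converge actual onto desired."""
    actions = []
    for pkg, state in desired.items():
        if state == "installed" and actual.get(pkg) != "installed":
            actions.append(("install", pkg))
        elif state == "absent" and actual.get(pkg) == "installed":
            actions.append(("remove", pkg))
    return actions

print(reconcile(desired, actual))
# -> [('install', 'postgresql'), ('remove', 'telnetd')]
```

Because the desired state is data, it can be version-controlled, reviewed, and replayed across thousands of nodes – which is exactly why the sysadmin job is turning into a programming job.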

But there were a lot of other interesting talks, for example the keynote by Atul Butte from Stanford, who showcased how treatments for various diseases can be found just by mining the known properties of existing drugs and finding new uses for them. He had some spectacular results. What also struck me was that he gave several examples of U.S. companies providing not only human tissue samples for diseases online (in a shopping cart!), but also analysis and even lab-rat experiments on demand. He could basically outsource all clinical trials, testing the impact of an ‘old’ drug on a ‘new’ disease, in record time, and is now working on FDA approval.

On the various panels there were several interesting people. One worth mentioning was Andrea Norris, CIO of the NIH, who gave very interesting insights into how the NIH is thinking about data sharing and the publication thereof in the near future. They intend to start funding centers of excellence that are able to sustain data sharing, working on turning big data into knowledge (BD2K) and implementing the policies set by the funding agencies.
