BioIT World Europe

I attended the BioIT World Europe conference in Hannover, Germany. It was co-hosted with Biotechnica, one of the biggest biotech trade fairs in Europe. Although the topics were interesting, the conference was not very well attended and was relatively expensive for the content. Also, most speakers came only for their talks and disappeared shortly afterwards, leaving just a short window to network with them; given the low attendance, though, this was not a problem.

In more detail:

  • Andrew Lyall: ELIXIR. The ELIXIR project is a very large endeavor whose aim is to provide comprehensive access to public research data for all of Europe, federating all data providers. In Switzerland the SIB is a member (naturally, as the producers of Swiss-Prot) and also aims to become one of the major data providers, but I did not see anything on Andrew Lyall’s slides indicating that CH has signed the initial agreements yet. Maybe Ioannis can tell us the status of this later.
  • Etzard Stolte, formerly Roche, now HP ‘CTO Life Science’: He gave a state-of-the-art technology talk and voiced the opinion that in the future we will simply ask high-level questions about a dataset in natural language and the computer will give a set of logical answers, like the Watson Jeopardy project or Wolfram’s new Alpha platform. I personally think that science questions cannot be properly voiced in natural language and that this will not work in that context. I will not be able to tell it to write my Nobel Prize publication for me…
  • Chris Taylor, EBI: He discussed standards in general. There are many Minimum Information metadata standards now, almost too many, so he set up the MIBBI project to provide an overview of them all. I asked him whether I should use it already, and he advised waiting a couple more years for things to settle down.
  • HP Cloud: The first commercial sponsored talk was by HP. They offer private cloud solutions and also have their own HP public cloud into which people can scale. The public cloud is still in beta and will remain so until at least April of next year, the speaker said.
  • Folker Meyer, Argonne. He gave a reasonable overview of bioinformatics using cloud resources. They built their own Argonne Workflow to manage their jobs.
  • DDN: Second vendor presentation. Their Web Object Scaler is a very nice technology; if it works as advertised, it would save us copying data back and forth over the WAN.
  • Arrysisbio: Vendor presentation of a genomics platform for data analysis, nicely done. The webpage is a good example of eye candy.
  • Carole Goble, UManchester, on using clouds for research. They did a simple, ‘naive’ port of a job to the cloud, but they had lots of technology they could simply reuse, like a ‘Smart Data Upload’. The web interface is built with Ruby on Rails. They also reuse a self-made orchestrator that decides when to ramp up or tear down instances (a sketch of this kind of scaling logic follows after this list). She says they built it in 4 days: Taverna was put on the cloud in that time, including tests etc. Development cost around $600, and one run of this Taverna workflow costs around $5. BUT: their large data was not shipped! A lot of preprocessing was done locally, and only compressed, necessary data was shipped, which kept the costs down. Pre- and postprocessing happen on the local data server, which then submits jobs to the cloud. The reference dataset (Ensembl) needs to be available on the cloud; it was in the US Amazon region but not the EU region, so they had to run their jobs in the US. She did experience failures and performance variations. She also says that you need to scale to many CPUs if you really want to profit from a time boost by using the cloud, i.e. getting your job done more quickly. Interesting comment on the side: you cannot just prototype at home using Eucalyptus and expect it to work on Amazon! The mechanisms are very different, even if the public API is identical.
  • Misha Kapushesky, EBI: “Mapping identifiers is bioinformatics’ biggest problem.” The Expression Atlas holds transcriptomics data (ebi.ac.uk/gxa), and you can use any ID to look things up (a toy illustration follows after this list). Anyone can use the R Cloud at EBI: http://www.ebi.ac.uk/tools/rcloud . They also have the ability to roll in anyone else’s analysis on their internal cloud.
  • Ola Spjuth, Uppsala University: UPPNEX – A Solution for Next Generation Sequencing Data Management and Analysis. The community webpages look nice.
  • Reinhard Schneider, Head of the Bioinformatics Core Facility, Luxembourg Centre for Systems Biomedicine, University of Luxembourg. He talked about ‘exaflops’ in biology, or what the applications could be in the future. His point is that today’s supercomputers are not really suitable for day-to-day analysis of bench data, because bench scientists and bioinformaticians have no time to port their codes to the complex supercomputing platforms. High throughput is needed rather than high capacity. He argues that bioinformatics will not become a supercomputing community. He also gave an interesting reference to a high-throughput I/O system, ParaMEDIC. His points are:
    • ..problems are data-driven, with high I/O requirements
    • ..problems do not scale beyond a few hundred cores
    • ..applications are constantly under development
    • ..codes are scripts more often than not
    • ..bioinformaticians are not software engineers
  • Hermann Lederer: European Exascale Software Initiative. On life science demands on supercomputing, he basically argued against the previous speaker, saying that there are a lot of life science use cases on supercomputers. He showed the results of the DEISA project and further plans with PRACE: mostly structure prediction, but also tissue simulation etc. The previous speaker pointed out that these are not data-intensive applications, and that his point had been that for I/O-bound workloads supercomputers do not provide much added value.
  • BlueArc commercial presentation. BlueArc makes use of FPGAs to scale their metadata head servers. He made the point that NFS (NAS) and pNFS (cluster) usage put very different strains and requirements on the system, and that it is important to balance the setup between these two usages, since the end user will not care about the distinction.
  • Daniel MacLean, The Sainsbury Laboratory, Norwich, bioinformatics team. They have a small ’embedded bioinformatician’ team model that works pretty well for them locally. He makes the point that this is an important service to the scientists. He had a nice slide on the required know-how for certain tasks that I think is reasonable:
    • Understanding Biology: Scientist 100%, Bioinformatician 50%
    • Understanding Analysis: Scientist 50%, Bioinformatician 50%
    • Understanding Management tools: Scientist 20%, Bioinformatician 100%
  • His point is to enable the scientists and make their lives easy. They mount their local NetApp on the users’ desktops using SSHFS. People can change things, but monitoring scripts send people email if they do not adhere to certain rules and conventions, like annotating data in a README file in each data directory, whose structure is also predefined (a sketch of this idea follows after this list). They make heavy use of Galaxy and scripting (Ruby, Perl, etc.). They have built a nice education site and activities around it, training the scientists in the usage of their tools.
  • Daniel James: the Fiji and ImageJ2 projects. See http://fiji.sc . ImageJ is very successful and many people use it, but it is ill-engineered. ImageJ2 is an effort to redo it, and this time do it cleanly. It is now done, and it is possible to extend it with tools and plugins; this is the Fiji project. It is a collaborative development approach that works very well. They communicate through IRC, have dedicated mailing lists, a YouTube channel, and Git as the code repository. Sharing code is easy: just write a plugin and people will receive it through the automatic updater (a small scripting sketch follows after this list). Very nice.
  • Urban Liebel, Karlsruhe Institute of Technology (KIT). They built a dedicated local ‘HPC server’ that processes the data output immediately where the instrument, in this case a microscope, is located. This gives them real-time processing and feedback to the researcher; the analysis engine needs to be sized such that the real-time experience is not lost (a sketch of this pattern follows after this list). Interestingly, they have developed some auxiliary services as well:
    • Harvester, which can query many online resources through one portal
    • Image Finder, which searches for the images in publications instead of the text – nice if you are interested in the diagrams in a paper
    • Sciencenet, a social-media-like science publication network – vote on the papers you read
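
As an aside on Carole Goble’s self-made orchestrator: the heart of such a component is just a scaling decision plus calls into the cloud API. Below is a minimal Python sketch, entirely my own illustration and not their code; launch_instance() and terminate_instance() are hypothetical stand-ins for real cloud calls (e.g. EC2 RunInstances/TerminateInstances), and the capacity numbers are made up.

    # Decide how many workers the job queue needs, then ramp up or tear
    # down instances until the pool matches. All names/numbers illustrative.
    JOBS_PER_INSTANCE = 10   # assumed jobs one worker handles at a time
    MAX_INSTANCES = 20       # assumed cost ceiling

    def launch_instance():
        """Hypothetical stand-in for a real cloud API call; returns a handle."""
        return object()

    def terminate_instance(instance):
        """Hypothetical stand-in for a real cloud API call."""
        pass

    def desired_instances(queued_jobs):
        needed = -(-queued_jobs // JOBS_PER_INSTANCE)  # ceiling division
        return min(needed, MAX_INSTANCES)

    def reconcile(queued_jobs, running):
        """Grow or shrink the list of running instances to match demand."""
        target = desired_instances(queued_jobs)
        while len(running) < target:
            running.append(launch_instance())
        while len(running) > target:
            terminate_instance(running.pop())
        return running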
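
Kapushesky’s quip about identifier mapping is easy to illustrate: the same gene carries different IDs in different databases, so any lookup service has to resolve synonyms first. A toy Python example, just to show the shape of the problem:

    # Map any known synonym to one canonical identifier before lookup.
    # The table entries are shown for illustration only.
    SYNONYMS = {
        "ENSG00000139618": "BRCA2",  # an Ensembl-style gene ID
        "675": "BRCA2",              # an Entrez-style numeric ID
        "P51587": "BRCA2",           # a UniProt-style accession
    }

    def canonical(any_id):
        """Resolve an arbitrary ID to the canonical gene symbol."""
        return SYNONYMS.get(any_id, any_id)

    assert canonical("P51587") == canonical("675") == "BRCA2"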
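
The MacLean team’s convention-monitoring idea can be pictured as a small cron job. A minimal sketch, not their actual script: I am assuming one project per directory under the SSHFS/NFS-mounted share, a plain README as the annotation file, and made-up mail addresses.

    import os
    import smtplib
    from email.mime.text import MIMEText

    DATA_ROOT = "/mnt/netapp/projects"   # assumed mount point of the share

    def missing_readme(root):
        """Return project directories that lack the required README."""
        bad = []
        for name in sorted(os.listdir(root)):
            path = os.path.join(root, name)
            if os.path.isdir(path) and \
               not os.path.exists(os.path.join(path, "README")):
                bad.append(path)
        return bad

    def notify(address, paths):
        """Mail a user the list of their non-conforming directories."""
        msg = MIMEText("Please add a README to:\n" + "\n".join(paths))
        msg["Subject"] = "data directory conventions"
        msg["From"] = "monitor@example.org"   # assumed sender address
        msg["To"] = address
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)

    if __name__ == "__main__":
        offenders = missing_readme(DATA_ROOT)
        if offenders:
            notify("data-owners@example.org", offenders)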
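
On how approachable Fiji is: besides plugins, it lets you script ImageJ in several languages, Jython among them. A minimal sketch to run inside Fiji’s script editor; the file path is a placeholder:

    # Open an image, blur it, show it: plain ImageJ calls from Jython.
    from ij import IJ

    imp = IJ.openImage("/path/to/image.tif")    # placeholder path
    IJ.run(imp, "Gaussian Blur...", "sigma=2")  # standard ImageJ command
    imp.show()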
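
The basic pattern behind the KIT setup, processing data right where the instrument writes it, can be sketched as a simple directory watcher. WATCH_DIR and analyze() are placeholders for the real output directory and analysis engine:

    import os
    import time

    WATCH_DIR = "/data/microscope"   # placeholder instrument output directory

    def analyze(path):
        """Stand-in for the real analysis engine; it must keep up with the
        instrument's data rate or the real-time feedback is lost."""
        print("processing", path)

    def watch(poll_seconds=2):
        """Hand every newly appearing file to analyze() as soon as it shows up."""
        seen = set(os.listdir(WATCH_DIR))
        while True:
            current = set(os.listdir(WATCH_DIR))
            for name in sorted(current - seen):
                analyze(os.path.join(WATCH_DIR, name))
            seen = current
            time.sleep(poll_seconds)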

1 Comment

  1. Ioannis says:

    For ELIXIR, here is the first news: http://www.isb-sib.ch/news-a-events/news/515.html
