Tech Day on Proteomics

The SyBIT Tech Day on Proteomics was held yesterday in Bern. I’d like to thank everyone who contributed and attended, especially our friends from Geneva who had the longest trip!

We covered a very broad spectrum of topics. It was especially interesting to hear about the latest status of the standardization efforts at Swissprot and SIB. We also had a good share of technical talks and more detailed presentations on proteomics bioinformatics.

Posted in Updates. No Comments »

Autumn School

We will organize the autumn school this year, focusing on HCS and proteomics data analysis. We have also invited KNIME to teach their tool to the students. It is potentially a very useful integrator for any of our tools and lowers the barrier to their use by non-expert bioinformaticians.

For details see

IEEE eScience Workshop Stockholm

This week I attended the annual eScience workshop, held this time in Stockholm. The program was shared with the Microsoft Research eScience workshop. There were also several side workshops; I attended one on computing advances in life sciences.

The idea was to get a current snapshot of the international landscape in eScience, which is really what SyBIT is also doing. There were a lot of references to Jim Gray, who advocated the ‘embedded’ model very early on. Many people referred to his slides, which he had already shown at the ETH in 2006 when I invited him to give a talk. The ideas and problems he referred to are still the same, and not much has changed in the past 5 years. But at least the whole community now repeats his statements; I remember that people were much more skeptical at the time.

Still, there are voices who say that all this support given to other sciences is not what computer scientists should do (since this is not computer science anymore). True, but I personally view computer science like mathematics: super useful if applicable, but most of the research is done only for the pure beauty of it and stays in the ivory tower. Luckily, most want to do things that are applicable. But real examples of working with researchers are still rare. Everyone was quoting the SDSS work of Alex Szalay and Jim Gray, which I had the luck of being part of as well, but that was 1999-2001, followed by refinement until 2004 and some crowdsourcing (Galaxy Zoo). So really not much has happened in the past 5 years in terms of collaborations. The UK eScience program was of course quite successful in the early 2000s, and Tony Hey gave a nice talk on that.

It was, however, a bit disheartening that many of the papers presented in the research tracks of the conference were just about yet another workflow, yet another automation, yet another data mining effort. When asked why they did not use existing tools, the answer was always ‘interesting, I did not know about that’. To me this means that at least the review process is broken.

Still, I have collected a few pointers that can be useful in sustainability discussions about SyBIT’s future (in addition to Jim’s slides).

There were also interesting discussions about clouds, grids and e-Infrastructures in general. Basically, what we try to do with SyBIT is very much in line with what people are planning to do, so we are already ahead of the crowd. But almost everyone, including the EGI, is hitting the same problem of sustained funding. There is money to develop and build something, but there are no business models available to fund sustained support and operation of the developed tools. My personal conclusion is that it is a misguided effort to try to unify all domains (well, maybe some, see the MIT Convergence text) on the e-Infrastructure level. It is much more efficient to let the scientific domains sort out their problems individually. And then, if they want to converge further, let them sort it out. Imposing it from the outside is not working. So we need a life-science-specific effort to support data-intensive life science research.

Retreat with IMSB

This week we had a small, focused retreat or ‘code camp’ with Lars Malmström’s group, which works very closely with SyBIT on providing the tools and services for proteomics analysis in PhosphoNetX in the Aebersold lab.

We were 11 people: 5 from Lars’ group, 5 from SyBIT, and myself. The idea was to get everyone up to speed with the status of all available components of the toolbox at IMSB and to write usable workflows that can be put to use immediately.

We were able to go through all components of the toolbox, and we documented a ‘test’ workflow, which is a simple genomics search using bowtie. We identified the weak points in the integration of the components and improved them on the spot. Work on other workflows has also started and will now continue back in the office.

The retreat surpassed my expectations of what can be achieved. One can make a lot of progress this way, but we were also at it from morning till late. Of course we also made a couple of excursions into the beautiful landscape around the Vierwaldstätter See.

SCI-BUS Project Kickoff

SCI-BUS is an EU FP7 framework project that started on October 1st, 2011. The ETH is a partner and has signed up to further develop and enhance the proteomics portal solution that is already in place. Since this is work we need to do in SyBIT anyway, this is a perfect match: we get an extra person from the EU to perform a technology upgrade of our installation. We will do most of the work in the first 6-8 months; after that it is about extending and operating the portal and helping others set it up for their own purposes. The project is to produce 11 different gateways, of which ours is just one. The others come from different scientific domains, from physics to citizen web gateways.

See for more information on the project.

SSDBM 2010 in Heidelberg, 30th June-2nd July

The Scientific and Statistical Database Management (SSDBM) conference has a long tradition of over 20 years. The conference started with an excellent keynote by Daniel Abadi from Yale, who talked about HadoopDB. Hadoop is a data management and query solution for very large data sets, based on the MapReduce (MR) paradigm. It is excellent at splitting a query over many processing nodes and then assembling the results. It is free and scalable. However, for some types of searches it is not as fast as commercial relational database products. HadoopDB tries to get the best of both worlds: a fast storage layer with indexing and compression from relational platforms, and parallelisation for free from Hadoop. The future of scientific data processing may well lie in hybrid solutions, such as HadoopDB or openBIS, which is being developed by the CISD and SyBIT in Basel. MR is a paradigm that might be adapted to the type of data processing we support in our scientific workflows.
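To make the MR idea concrete, here is a minimal sketch in plain Python. It is not Hadoop or HadoopDB code; all function names and the peptide strings are made up for illustration, and the "nodes" are simply list partitions processed one after another in a single process. It only shows the shape of the paradigm: split the input, map each partition independently, group the intermediate pairs by key, then reduce each group.

```python
from collections import defaultdict
from itertools import chain


def split(records, n_nodes):
    """Distribute the records round-robin over n_nodes partitions."""
    return [records[i::n_nodes] for i in range(n_nodes)]


def map_phase(partition):
    """Emit a (key, 1) pair per record; each partition is handled independently."""
    return [(record, 1) for record in partition]


def shuffle(mapped_pairs):
    """Group all intermediate values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key into one result (here: a count)."""
    return {key: sum(values) for key, values in groups.items()}


def map_reduce_count(records, n_nodes=3):
    """Count occurrences of each record using the split/map/shuffle/reduce steps."""
    partitions = split(records, n_nodes)
    mapped = chain.from_iterable(map_phase(p) for p in partitions)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical peptide identifications, purely illustrative.
    peptides = ["AAK", "LLR", "AAK", "GGK", "LLR", "AAK"]
    print(map_reduce_count(peptides))
```

In a real cluster the map calls would run on different machines and the shuffle would move data over the network; the point of the sketch is only that the map step needs no coordination between partitions.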

Another important topic in scientific databases is the development and management of workflows. Two conference sessions were devoted to this problem, presenting both research issues and existing solutions and innovations. I particularly liked the talk about Taverna given by Paolo Missier, who debunked a lot of the myths surrounding Taverna. In particular, Taverna now supports workflow provenance via its ProvenanceDB. The structure of Taverna is similar to the division of roles in the systems we are building, openBIS and iBRAIN2. One database is normally responsible for the management of experimental results data, both raw and derived data sets, and another database manages provenance, i.e. the workflow history of a data set.
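The division of roles between a data store and a provenance store can be sketched roughly as follows. This is not the schema of Taverna's ProvenanceDB, openBIS or iBRAIN2; the table names, column names and data set identifiers are assumptions made up for this illustration, and the sketch assumes each derived data set has exactly one input. It uses two separate in-memory SQLite databases from the Python standard library.

```python
import sqlite3


def open_stores():
    """Create two independent databases: one for data sets, one for provenance."""
    data_db = sqlite3.connect(":memory:")
    prov_db = sqlite3.connect(":memory:")
    data_db.execute("CREATE TABLE datasets (id TEXT PRIMARY KEY, kind TEXT)")
    prov_db.execute("CREATE TABLE steps (output_id TEXT, input_id TEXT, tool TEXT)")
    return data_db, prov_db


def register(data_db, prov_db, dataset_id, kind, derived_from=None, tool=None):
    """Store a data set; for derived data, also record which step produced it."""
    data_db.execute("INSERT INTO datasets VALUES (?, ?)", (dataset_id, kind))
    if derived_from is not None:
        prov_db.execute(
            "INSERT INTO steps VALUES (?, ?, ?)", (dataset_id, derived_from, tool)
        )


def history(prov_db, dataset_id):
    """Walk back from a data set to its raw origin; newest step comes first."""
    steps = []
    current = dataset_id
    while True:
        row = prov_db.execute(
            "SELECT input_id, tool FROM steps WHERE output_id = ?", (current,)
        ).fetchone()
        if row is None:
            return steps
        input_id, tool = row
        steps.append((tool, input_id))
        current = input_id


if __name__ == "__main__":
    data_db, prov_db = open_stores()
    register(data_db, prov_db, "raw-1", "raw")
    register(data_db, prov_db, "seg-1", "derived", "raw-1", "segmentation")
    register(data_db, prov_db, "feat-1", "derived", "seg-1", "feature_extraction")
    print(history(prov_db, "feat-1"))
```

The data store never needs to know how a data set was produced, and the provenance store never holds the results themselves; the data set identifier is the only link between the two.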

Visit to Broad Institute

Today I had a short conversation with Anne Carpenter and Thouis Ray Jones at the Broad Institute in Cambridge (Boston). I gave Anne a short overview of our Swiss ‘landscape’, with SyBIT and all the institutions working with us, and of who is working on what, and in what context, among the people they have already interacted with. I also gave her a short description of the functionality of openBIS and its relation to iBRAIN.

She said that large-scale batch analysis of images was not such a high priority for them, although they do have a preliminary web service for cluster submission. Ray later said that they sometimes do run into scaling problems with file storage and large numbers of files, but it is not a burning problem yet, and they have not thought about it so far.

They are interested in ensuring that their tools can at least interface with our iBRAIN modules and would be willing to spend time to make sure that this is the case. They also reiterated their interest in hosting Pauli and Ela for a couple of days to discuss this.

PerkinElmer – Columbus Presentation

Today we had a presentation by PerkinElmer about their Columbus image management system. Columbus builds on the Open Microscopy Environment (OME) set of tools, especially OMERO and the Bio-Formats library. It is capable of dealing with HTS data as well. Columbus provides image management by plate barcode, a web interface showing thumbnails, the Acapella cell finding tool, etc. They claim to also provide Columbus as a ‘Software as a Service’ solution, but cannot give numbers about remote data upload and remote data storage yet.

Basically, the way this looked to me is that Columbus is a very nice, well-made tool for looking at individual plates and analyzing them using some cell classification written in their own tool, which depends on Pipeline Pilot.

They depend on OMERO to keep track of their data. Metadata on images is stored in XML files that are also tracked by OMERO. Since the metadata is not kept in a database, they do not support filtering and searching on it, which is a definite and big weakness.

Concerning large batch analysis of many plates, the presenter gave me the impression that they have not considered that use case yet and do not intend to do so in the near future.

SyBIT Tech Day

Today we had a very productive day in Bern. This was the first instance of a ‘tech day’, where we discussed very technical aspects of our work. The CISD team presented in-depth details about how to interact with openBIS, and Tomek Pylak even held a hands-on session on coding against the public API. It was very informative and gave everyone much better insight into how to set up and use the system.

In the afternoon we had a very engaged discussion about the use of workflows in our context. The Biozentrum team showed their concept for automating screening analysis in the iBRAIN2 project, and Jacques Rougemont gave very detailed insights into the work they do at the BBCF and VitalIT for deep sequencing.

Meeting with the SIB Education Board

Ela and I also had a meeting with members of the SIB education board: Marc Robinson-Rechavi, Patricia Palagi and Frederique Lisacek. They showed us what the SIB has organized until now. They have several pages where the education activities in bioinformatics in all of Switzerland are easily accessible:

There are basically three levels of courses: the regular Masters classes (per semester), PhD courses (semester or summer schools) and courses for further education, which can last just a couple of days or a week.

The agreement was that for bioinformatics we (SyBIT and in general) simply attach ourselves to these pages, i.e. if we plan courses we will advertise them there, and if someone asks for courses we will direct them there as well. We are going to actively suggest to our partners that they do the same.

We will keep in close contact to exchange information about education needs. I offered our help in disseminating information about these pages in Zurich and Basel, and in preparing courses should a need arise that we could address.
