SSDBM 2010 in Heidelberg, 30th June-2nd July

The Scientific and Statistical Database Management Conference (SSDBM) has a tradition of over 20 years. The conference started with an excellent keynote by Daniel Abadi from Yale, who talked about HadoopDB. Hadoop is a data management and query solution for very large data sets based on the MapReduce (MR) paradigm. It excels at splitting a query over many processing nodes and then assembling the results, and it is free and scalable. However, for some types of searching it is not as fast as commercial relational database products. HadoopDB tries to get the best of both worlds: a fast storage layer with indexing and compression from relational platforms, and parallelisation for free from Hadoop. The future of scientific data processing may well lie in hybrid solutions such as HadoopDB, or openBIS, which is being developed by the CISD and SyBIT in Basel. MR is a paradigm that might be adapted to the type of data processing we support in our scientific workflows.
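The split-and-assemble idea behind MapReduce described above can be sketched in a few lines of Python. This is a single-process illustration only, not Hadoop or HadoopDB code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the word-count task are assumptions chosen for the example, and on a real cluster each chunk would be handled by a different node.

```python
from collections import defaultdict


def map_phase(chunks, mapper):
    """Apply the mapper to every chunk and collect the (key, value) pairs.

    Here the chunks are processed one after another; on a cluster each
    chunk would be sent to a separate processing node.
    """
    pairs = []
    for chunk in chunks:
        pairs.extend(mapper(chunk))
    return pairs


def shuffle(pairs):
    """Group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups, reducer):
    """Assemble one result per key from its grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def map_reduce(chunks, mapper, reducer):
    """Run the three phases in order: map, shuffle, reduce."""
    return reduce_phase(shuffle(map_phase(chunks, mapper)), reducer)


# Hypothetical example task: count words across several text chunks.
def count_words(chunk):
    return [(word.lower(), 1) for word in chunk.split()]


def sum_counts(key, values):
    return sum(values)


if __name__ == "__main__":
    chunks = ["raw image data", "derived image data", "raw data"]
    print(map_reduce(chunks, count_words, sum_counts))
```

The mapper and reducer are plain functions, so the same skeleton could in principle be reused for other per-chunk analyses by swapping them out.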

Another important topic in scientific databases is the development and management of workflows. Two conference sessions were devoted to this problem, presenting research issues as well as existing solutions and innovations. I particularly liked the talk about Taverna given by Paolo Missier, who debunked a lot of the myths surrounding Taverna. In particular, Taverna now supports workflow provenance via its ProvenanceDB. The structure of Taverna is similar to the division of roles in the systems we are building – openBIS and iBRAIN2. One database is normally responsible for the management of experimental results data, both raw and derived data sets, and another database manages provenance, i.e. the workflow history for a data set.

OME Users Meeting, Paris 15-16 June 2010

I attended the European Open Microscopy Environment (OME) User Meeting in Paris. There were around 50 attendees showcasing their tools, giving demos and gathering new user requirements. OME produce OMERO, a database and visualisation software suite for microscopy, and the new release of this software was presented in detail. OME software is being developed by a team from Dundee, led by Jason Swedlow and funded by the Wellcome Trust. The two main areas of application for OME software are high content screening and large 3D imaging of organs, tissues and organisms. The current focus is on image storage and processing, but this is slowly shifting towards support for workflows which can automate data analysis, and towards results storage in OMERO. More information is available at the OME website.

Posted in Trip Reports.

SysMO2 meeting, near Noordwijkerhout, Netherlands, June 7-9th 2010

As a member of the SysMO2 scientific advisory board (see http://www.sysmo.net), I attended the presentations given by the consortia funded by SysMO (Systems Biology of Model Organisms). The aim of SysMO is the development of system models. In most cases the consortia reported on new metabolomics datasets and the resulting models.

From the data management perspective this translates into two main deliverables supported by the SysMODB project, http://www.sysmo-db.org. First, the data are deposited in spreadsheets based on shared templates designed in collaboration between the SysMODB team and each project, and then either submitted to a central repository called SysMO-SEEK or linked to that database. Second, models are deposited in a model database called JWS, http://jjj.biochem.sun.ac.za/database/index.html. This website supports simulation.

Projects interact with the SysMODB team via PALs (Project Area Liaisons), who attend consortium-wide meetings where they jointly discuss their data and processing requirements. Current work focuses on making SysMO-SEEK available as a toolbox for local installation and on extending JWS to cater for a wider variety of models.

Visit to Broad Institute

Today I had a short conversation with Anne Carpenter and Thouis Ray Jones at the Broad Institute in Cambridge (Boston). I gave her a short overview of our Swiss ‘landscape’ with SystemsX.ch, SyBIT and all the institutions working with us, and of who is working on what, and in what context, among the people they have already interacted with. I also gave her a short description of the functionality of openBIS and its relation to iBRAIN.

She said that large-scale batch analysis of images was not such a high priority for them, although they do have a preliminary web service for cluster submission. Ray later said that they sometimes run into scaling problems with file storage and large numbers of files, but that it is not yet a pressing problem. They have not thought about it yet.

They are interested in ensuring that their tools can at least interface with our iBRAIN modules and would be willing to spend time to make sure that this is the case. They also reiterated their interest in hosting Pauli and Ela for a couple of days to discuss this.

Posted in Updates.

Perkin Elmer – Columbus Presentation

Today we had a presentation by PerkinElmer about their Columbus image management system. Columbus builds on the Open Microscopy Environment (OME) set of tools, especially OMERO and the Bio-Formats library. It is capable of dealing with HTS data as well. Columbus provides image management by plate barcode, a web interface showing thumbnails, the Acapella cell finding tool, etc. They claim to provide Columbus also as a ‘Software as a Service’ solution, but cannot yet give numbers about remote data upload and remote data storage.

My impression is that Columbus is a very nice, well-made tool for looking at individual plates and analyzing them using cell classification written in their own tool, which depends on Pipeline Pilot.

They depend on OMERO to keep track of their data. Metadata on images is stored in XML files that are also tracked by OMERO. Since the metadata is not kept in a database, they do not support filtering and searching on it, which is a definite weakness.

Concerning large batch analysis of many plates, the presenter gave me the impression that they have not considered that use case yet and do not intend to do so in the near future.

Posted in Updates.

SyBIT Tech Day

Today we had a very productive day in Bern. This was the first instance of a ‘tech day’ where we discussed very technical aspects of our work. The CISD team presented in-depth details about how to interact with openBIS, and Tomek Pylak even held a hands-on session on coding against the public API. It was very informative and gave everyone a much better insight into how to set up and use the system.

In the afternoon we had a very engaged discussion about the usage of workflows in our context. The Biozentrum team showed their concept for automating screening analysis in the iBRAIN2 project, and Jacques Rougemont gave very detailed insights into the work they do at the BBCF and Vital-IT for deep sequencing.

Posted in Updates.

Meeting with the SIB Education Board

Ela and I also had a meeting with members of the SIB education board: Marc Robinson-Rechavi, Patricia Palagi and Frederique Lisacek. They showed us what the SIB has organized so far. They have several pages where the bioinformatics education activities in all of Switzerland are easily accessible:

There are basically three levels of courses: the regular Master's classes (per semester), PhD courses (semester or summer schools) and courses for further education, which can last just a couple of days or a week.

The agreement was that for bioinformatics we (SyBIT and SystemsX.ch in general) simply attach ourselves to these pages, i.e. if we plan courses we will advertise them there, and if someone asks for courses we will direct them there as well. We are going to actively suggest to our partners that they do the same.

We will keep in close contact to exchange information about education needs. I offered our help in disseminating information about these pages in Zurich and Basel, and in preparing courses should a need arise that we could address.

Posted in Updates.

SyBIT Presentation, Lausanne

Today I gave a presentation in Lausanne for Sebastian Maerkl’s group (DynamiX project). It was the same kind of presentation as in Bern earlier the same week, describing what SyBIT is about and what we are doing. Jacques Rougemont and most of his group were also there, as was Ela.

Sebastian gave us a tour of his lab after the talk and explained how their data acquisition and analysis works. Their data management challenge is very similar to that of the screening community: they produce large numbers of images and videos that need to be analyzed using image analysis methods. They have their own cluster in the laboratory for the analysis. We will look at possibilities to help them with the metadata management in a follow-up meeting next week.

Posted in Updates.

EDBT/ICDT 2010

I attended these two conferences in Lausanne, 22-26 March 2010. ICDT is a database theory conference, while EDBT focuses on practical database topics, with less theory and more papers involving the benchmarking of new algorithms and solutions which at some later stage get incorporated into software products. The conference featured keynotes, regular papers and demos, and several workshops. The most interesting keynotes in my opinion were by Ian Horrocks on scalable semantic systems – those systems are not currently scalable at all – and by Pierre Fraigniaud on the importance of succinct labeling in XML data storage in databases. The demo session featured a B-Fabric demo by Fuat Akal et al., ‘B-Fabric: The Swiss Army Knife for Life Sciences’. This was a very comprehensive presentation of the rich set of databases supporting biological data analysis at the FGCZ in Zurich. I also attended some talks on the efficient querying of workflows, data provenance and data integration. New trends in databases also include the use of new architectures. I found a tutorial by the ETH Systems group on the use of FPGAs in databases to be of particular interest, as well as a paper on the use of multicores in suffix tree construction. The PhD workshop had an interesting talk on web page archiving (by French Television/Radio) and also a talk on research funding by Moira Norrie of the ETH. Via informal discussions I also learned about some interesting databases and tools which could be used for prototyping in SystemsX.ch work. In particular, I liked BaseX from Konstanz.

Posted in Trip Reports.

SWITCH Storage WG in Bern

Today I attended the SWITCH Storage Working Group meeting. It was very well attended: many of our SyBIT partners were present (FMI, Biozentrum, UZH, ETHZ, EPFL), as were infrastructure people from other universities and universities of applied sciences. In the morning some sites showed what they have in terms of storage and how they do things. Except for our partners, most focussed on storage infrastructure for the ‘commodity’ services like administration (SAP db, finance files), email, users’ home directories and backup. We certainly have the largest amount of scientific research data, and it easily surpasses all of the other kinds of data. But in terms of the technology being used there was a lot of overlap, and several people suggested cooperating when acquiring hardware and software and exchanging information.

There were presentations on cloud storage, on ideas for distributed archiving and by SWITCH on network pricing. Cloud storage in the sense of private cloud storage was seen as useful, but using commercial providers for long-term storage is obviously too expensive. The technology, however, is also interesting for sharing data and resources among universities, especially since with SWITCH we would not have any network costs. Small universities could profit from bigger ones, and the big ones might also exchange resources among each other.

The distributed archiving idea is about duplicating data (read-only archive data) across several sites for safety. Each site can have its own long-term archive solution, but for certain data that needs more security and several off-site copies, this is a proposed mechanism to achieve it (storage broker presentation). For the copy mechanism they suggested BitTorrent.
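The duplication idea can be illustrated with a minimal Python sketch. It is not the storage broker that was presented: local directories stand in for the remote sites, a plain file copy replaces the suggested BitTorrent transfer, and the names `sha256_of` and `replicate` are hypothetical. The one point it shows is that each copy is verified against the checksum of the original before it is counted as a valid replica.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def replicate(source: Path, site_dirs: List[Path]) -> Dict[str, bool]:
    """Copy a read-only archive file to every site and verify each copy.

    Assumption: each "site" is just a local directory here; a real setup
    would transfer the file over the network instead.
    Returns a mapping from site directory to whether its copy matches
    the checksum of the original.
    """
    expected = sha256_of(source)
    status = {}
    for site in site_dirs:
        site.mkdir(parents=True, exist_ok=True)
        copy = site / source.name
        shutil.copy2(source, copy)
        status[str(site)] = sha256_of(copy) == expected
    return status
```

Calling `replicate(Path("dataset.tar"), [Path("site_a"), Path("site_b")])` with a hypothetical archive file would return one entry per site, and any site whose value is `False` would need the copy to be repeated.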

Posted in Infrastructure.