Discussion with HP

I also had a phone conversation with Joel Jankow, Business Development Manager for Pharma & Life Sciences at HP Switzerland. He works closely with Dominique Gillot but is more on the application side and less on the IT infrastructure. HP has also done projects with other academic and non-academic institutions in our domain, and they organize visits by people who have built service solutions with them. In December someone from MDAnderson will visit Switzerland and could be asked to give a presentation to us as well.

Posted in Updates.

Discussion with FGCZ

We had a discussion with Ralph Schlapbach about the involvement of the FGCZ. We were very much on the same page concerning the challenges of SyBIT and how to address them. Ralph stressed that the FGCZ has long experience with the very same problems and that they would be very interested in cooperating more closely. I asked him to nominate someone to represent the FGCZ in the coordination team. The FGCZ has not been involved in SyBIT so far because it had no direct involvement with a SystemsX.ch RTD, but this changes now because the FGCZ is directly involved in BattleX. The coordination meeting for BattleX will be this Friday in Basel.

Ralph also promised to look at the SyBIT community meeting at the all-SystemsX-day in detail and make sure that the right people are there. He also expressed interest in many of the community projects proposed by Dean in the wiki.

Posted in Updates.

ELN Discussion

Electronic Lab Notebooks (ELNs) are mostly not in use today in the SystemsX.ch labs. However, many would like to introduce a system in order to address the following needs:

  • Track work in the lab electronically, find out more easily what has been done
  • Ease lab automation process
  • Facilitate usage of standards
  • Make past effort searchable
  • Documentation, logging, reproducibility, protection against fraud

SyBIT has started an e-notebook project together with interested parties to explore what would need to be done. We had a first discussion on the 23rd with Ernst Hafen (WingX), Fritjof Helmchen and Dominik Langer (Neurochoice). From SyBIT there were Peter, Ela, Dean and Claus Hultschig, who just started at CISD.

The detailed minutes of the meeting are in the wiki. We discussed how to address this difficult problem pragmatically and agreed that the two labs involved (Hafen, Helmchen) would try simply documenting their work in their respective wikis and share their experience a month from now. From our side we also said we would research existing solutions and show what is around and feasible. The next meeting is December 9th.

Posted in Updates.

CINA – SyBIT Meeting

We had a CINA-SyBIT meeting in Basel yesterday. Attending were from CINA: Henning Stahlberg, Bryant Gipson and Thomas Braun. From our side we had Bernd, Michael, Dean, Ela and myself. We were given a tour of the instruments in the basement of the BSSE building, very impressive. The instruments can produce very large amounts of imaging data at a range of resolutions, working even on the same sample.

Prof. Stahlberg gave us an overview of what CINA is about and how the IT is planned to be operated. Currently they store and analyze all the data using the in-house resources of the BSSE (Janos Palinkas’ team). There is a database and service called EMEN2 (nothing can be found on the web unfortunately) that is developed by the group of Steve Ludtke at the Baylor College of Medicine, Houston, Texas. It will be installed to serve the CINA data to the users of the technology platform. The person who will do this starts in November and will interact closely both with the developers in Texas and with CISD.

The CISD has a dedicated 30% of a person for CINA. This person has been found: Chandrasekhar Ramakrishnan will start in November and will also contribute to SyBIT for probably 50% of his time. As the immediate next step, Chandrasekhar and the new person at CINA will cooperate closely to set up an openBIS instance that can communicate with EMEN2 to track the metadata and provide the data tracking functionality described in the CINA proposal.

In addition we touched on several other topics, with the following outcomes and next steps:

  • Request management system: There is no fancy system there but the number of users is low so it is quite manageable. Bryant has seen the FMI system and quite liked it. SyBIT will keep CINA informed about future developments in this context.
  • Data Integration: The prerequisite here is an overarching registry that can issue unique tracking numbers to each entity (project, sample, whatever makes sense). Here too, this is not currently the most urgent need in CINA, but there is interest in adding such IDs to EMEN2 if they come into use.
  • Image browsing: Currently there are no tools available to browse result sets in a ‘Google Earth’-like fashion. The images of the same sample cover a wide range of resolutions, 2D and 3D, but only at select locations rather than everywhere. If suitable tools for this exist, SyBIT will come back to CINA with pointers.
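To make the data integration point above a bit more concrete, here is a minimal sketch of what a registry issuing unique tracking numbers could look like. Nothing like this was specified in the meeting: the class name, the `SX` prefix and the ID format are all assumptions for illustration, and a real registry would need persistent storage and a service interface.

```python
from dataclasses import dataclass, field


@dataclass
class TrackingRegistry:
    """Hypothetical in-memory registry that issues unique tracking IDs."""

    prefix: str = "SX"  # assumed prefix, not an agreed convention
    _counters: dict = field(default_factory=dict)
    _entities: dict = field(default_factory=dict)

    def register(self, kind: str, description: str) -> str:
        """Issue the next ID for an entity kind (e.g. project, sample)."""
        number = self._counters.get(kind, 0) + 1
        self._counters[kind] = number
        tracking_id = f"{self.prefix}-{kind.upper()}-{number:06d}"
        self._entities[tracking_id] = description
        return tracking_id

    def lookup(self, tracking_id: str) -> str:
        """Return the description stored for a previously issued ID."""
        return self._entities[tracking_id]


registry = TrackingRegistry()
first_sample = registry.register("sample", "grid 12, low-resolution overview")
second_sample = registry.register("sample", "grid 12, high-resolution detail")
first_project = registry.register("project", "example project")
```

The IDs are counted per entity kind, so a system such as EMEN2 could store the ID alongside its own records without needing to know how the registry works internally.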

I have also set up a Wiki Space for future CINA-SyBIT interaction. All CINA and SyBIT people can access and write into this space (currently empty). I will give it some structure and fill it with a few pointers. The openBIS work and EMEN2 integration will also be documented here.

Posted in Updates.

SME Workshop

SystemsX.ch held an SME Workshop in Zurich together with the Swiss Biotech Association and toolpoint. I gave a presentation (also in pdf) at this workshop. There were around 30 attendees, mostly from industry. Three of us from SyBIT were present (Ela, Bernd and myself). The talks were mostly in German since the whole audience was local. It was interesting giving a talk in German with the slides being in English…

The most interesting contact here was with a company called Xavo, which is involved in standardizing the interfaces of many instruments under the name SiLA. They also gave a presentation on this topic and were very interested in staying in contact with SyBIT. I believe it is worth keeping tabs on this and, if possible (and if we are involved in such things), making use of the SiLA standards.

Posted in Updates.

HiBi Workshop Trento

I attended the HiBi09 – High Performance Computational Systems Biology workshop in Trento, Italy last week. It had an interesting program, mostly on modeling and on optimizing computation, either for parallel processing or for CUDA. For me most of it was very theoretical and I could not judge the quality or relevance of the approaches. Some of it struck me, however, as relatively ‘low-scale’, i.e. not yet targeting real ‘high performance’ in the supercomputing sense of the word (tens of thousands of processors or terabytes of memory). A lot of it had to do with mathematical and statistical models, which is not my domain of expertise.

An interesting talk was given on Thursday about the James II framework for modeling and simulation from Rostock. This might be something for SyBIT to look at in the future.

What I learned about Microsoft Research in Trento is that they are really only focussed on research and mathematical models, with a bit of programming. It is attached to the University of Trento and has many young scientists. So this is a relevant point of contact for the modelers in SystemsX.ch. For SyBIT, it turned out that the MSR Cambridge labs are more relevant and I got a contact there to follow up on. The SIB proteomics group in Geneva (Frederique Lisacek) already has a collaboration with the Cambridge team.

Posted in Updates.

SyBIT people at IMSB

The two SyBIT posts have been filled at IMSB. Our new colleagues are Adam Srebniak and Emanuel Schmid.

Adam started at IMSB on October 1st. He is a young software engineer from Katowice (Poland), holds a Master's degree in Computer Science and has previously worked in Poland and Finland. His strengths are in OO, Java and web technologies.

Emanuel will start October 19th. Emanuel is a senior software developer and holds a Master of Science in Mathematics (1993) and more recently also in Biology (2009), both from the University of Zurich. Due to his recent studies in Biology, Emanuel is familiar with the technologies and techniques and will act as the primary interface to the biologists at the IMSB.

Both Adam and Emanuel hold contracts with the SIB with P.K. as direct supervisor but are placed at the IMSB. Their first tasks will be to address the immediate needs of PhosphoNetX and YeastX, but they will also participate in the Lab Notebook project with WingX and Neurochoice.

Posted in Administrative.

iBRAIN project kickoff meeting

We had a discussion today about the next steps with iBRAIN. Attending were Michael Podvinec, Ela and myself from SyBIT; Berend Snijder and Lucas Pelkmans from the IMSB; and Gàbor Csucs and Péter Horvàth from the LMC.

Berend gave an overview of the design of the system and of how it is actually being operated. We tried to understand what can be abstracted and automated. What users really like is that they have ‘full control’ over their image data: they can browse and look at them through their browser directly on their laptop. They can launch MATLAB jobs etc. What is also great is that data dropped into directories managed by iBRAIN are automatically submitted to the cluster for analysis, so people immediately get feedback. The actual file structure on disk is also updated directly. These things can be kept as is.
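As an illustration of the drop-directory behaviour described above, here is a minimal polling sketch. It is not how iBRAIN is actually implemented (we did not look at the code). The function names are made up, and `fake_submit` is only a stand-in for a real cluster submission.

```python
import os
import tempfile


def find_new_files(drop_dir, seen):
    """Return files in drop_dir that were not seen before and remember them."""
    new_paths = []
    with os.scandir(drop_dir) as entries:
        for entry in entries:
            if entry.is_file() and entry.path not in seen:
                seen.add(entry.path)
                new_paths.append(entry.path)
    return sorted(new_paths)


def poll_once(drop_dir, seen, submit):
    """Submit each newly arrived file exactly once; return the job handles."""
    return [submit(path) for path in find_new_files(drop_dir, seen)]


def fake_submit(path):
    # Stand-in for a real batch submission; it only builds a label.
    return "job:" + os.path.basename(path)


# Small demonstration in a temporary directory.
drop_dir = tempfile.mkdtemp()
seen_files = set()
before = poll_once(drop_dir, seen_files, fake_submit)
open(os.path.join(drop_dir, "plate1.tif"), "w").close()
after = poll_once(drop_dir, seen_files, fake_submit)
repeat = poll_once(drop_dir, seen_files, fake_submit)
```

A daemon would call `poll_once` in a loop with a sleep in between. Keeping the set of seen files outside the function makes it easy to persist that state later, so that a restart does not resubmit everything.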

The ‘skeleton’ functionality that is needed can be summarized as follows:

Data Management:

  • Raw data – Write
  • Cluster access to data – r/w
  • Human access to data over NAS from the laptop – r/w
  • Web server access – Read only
  • Archive – Read only
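The access rules in this list could be written down as a small table, for example like this. The key names are my own labels, and reading "Raw data – Write" as write-only access for the acquisition side is an assumption.

```python
# Hypothetical access table mirroring the data management list above.
ACCESS = {
    "raw_data_acquisition": {"write"},   # assumed: instruments only write
    "cluster": {"read", "write"},
    "human_nas": {"read", "write"},
    "web_server": {"read"},
    "archive": {"read"},
}


def is_allowed(actor, operation):
    """Return True if the actor may perform the operation; unknown actors get nothing."""
    return operation in ACCESS.get(actor, set())


cluster_can_write = is_allowed("cluster", "write")
web_can_write = is_allowed("web_server", "write")
```

Denying everything to unknown actors by default keeps the table safe to extend later.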

Computation

  • Batch processing of MATLAB pipeline
  • Batch processing of pre- and post-computing steps (image stitching, metadata extraction, etc)

Automation

  • Copying data among storage servers – right now the implementation relies on the fact that everything is accessible on the given NAS, which has severe security and performance implications. The users don’t care as long as it looks like it is all there, so it can be implemented as a mirroring service or sync service as well.
  • Automatic extraction of human readable images for quick eyeballing to see errors
  • Reporting of errors during processing over the web interface

New functionality that would be interesting to have once we have duplicated everything iBRAIN does today:

  • Setting up new projects, creation of file structure etc over a web interface
  • Data browsing, viewing features that have been extracted by the pipeline using various techniques
  • Access to more modules that can be used for automatic analysis
  • Access to public databases for comparison purposes
  • Standard formats in the image files, including some metadata
  • Standard settings of the instruments
  • Sharing of workflows

We have also discussed that there would be three types of people interacting with the system:

  • End-Users: as described above, they would interact directly through Windows Explorer and the like, but also through the web interface. They are allowed to run small-scale analysis
  • Power-Users: All large-scale analysis would need to go through them (at least they need to know about it). They should also be able to create new workflows and add new modules
  • Supporters: They control the whole workflow and iBRAIN daemon. Administration and support of the web server, the metadata catalogs etc.

The slides and document Berend has prepared are available through the wiki.

Posted in Updates.

ETH Zurich IT – D-BIOL Coordination

We had a very constructive meeting between the following actors:

  • ETH Informatikdienste (ID): People from the Systemdienste, i.e. those running the large cluster and storage installations for scientists – Jürgen Winkelmann, Hans Hiltbrunner, Olivier Byrde, Tilo Steiger
  • The Institute of Molecular Systems Biology (IMSB) big users: Lars for Ruedi Aebersold’s lab, Nicola for Uwe Sauer’s lab, Berend and Lucas Pelkmans
  • The Light Microscopy Center (LMC) – Karol Kozak
  • The Institute for Molecular Chemistry (IMC) who actually have a coordinator for IT – Nico Graf
  • ETH Vice President’s office, Mrs Arangeh who heads the ETH Storage strategy group

I set up this meeting because there seemed to be several misunderstandings and established ‘bad practices’ on both the ID and the users’ side concerning computing and storage. We had presentations by Lars, Nicola and Nico on the individual needs and experience of their respective institutes, as well as suggestions for improvement on the ID side.

We talked for three hours and I believe we clarified many details. Among the issues addressed were:

  • NAS size and firewall: Currently the NAS sizes are too small and the connection to the large cluster goes through a firewall, severely limiting performance. Brutus now has its own large storage array where people can buy shares (currently 250TB, with plans to go to 1PB soon). We still need to understand how the data will get from the instruments to this array once it is in use, but the data rate here is not high, so there should be no bottleneck. The firewall issue is nevertheless being investigated, and faster networks are being looked at for this purpose. D-BIOL can act as a pioneer user in this context. Mirroring schemes for the NAS are also of interest.
  • Virtualization: the pricing scheme is still too expensive for most. Lars said his VMs would not be running all the time; as things stand he would buy a big Dell box and run over 60 VMs on it. The ID said they would look into usage-based pricing models, and they have already lowered the prices for the current VMs massively.
  • Archiving solutions: Upgrades are planned here too for the HSM system so that automatic backup can happen.
  • Still missing is a large transactional storage space for very large databases (TB range).
  • The scientists will define their data flow in more detail, including which data is to be kept on expensive storage, which is to be archived and which is to be deleted.

Next steps are for the ID to come back with suggestions for solutions. The D-BIOL is also working on a general strategy, and the related information will be distributed among the parties present; SystemsX.ch will be invited to participate. This had already happened by the time of writing, and the documents describing the IT strategy for the D-BIOL have been provided to us by Nico Graf (to be kept confidential).

The presentations are available on the ID Sharepoint and also on the SyBIT Wiki: Slides by Lars, Nicola and Nico Graf.

Posted in Infrastructure.

HP – Visit of Dominique Gillot

I had a visit from Dominique Gillot of Hewlett Packard. Dominique has long experience in supporting scientific communities. He has managed the CERN account for HP and is also very much involved with VitalIT.

HP has a dedicated team in the US working on services for the Life Science community and they are well connected with Accelrys and other ISVs like Gaussian. They also work on tooling for the new sequencing hardware, so they have experience in supporting such a community.

HP is of course the most successful vendor in the TOP500, especially for high-density clusters and blades, but also for storage devices. They also have new lines of cheap disk storage where you get the most PB/$, especially for US government organizations. The future of their storage solution with Lustre (SFS) is no longer assured since Oracle bought Sun, but they recently bought another company providing a global FS called iBrix. This is not a high-speed solution but a very robust one.

He also gave an account of what was interesting at the BioIT Europe conference:

  • Phil Butcher from the EBI said they would soon go towards 30PB storage with a cluster of 30’000 cores for data analysis, which interestingly is a ratio of 1TB/core. They do have a very high-speed storage solution; what they lack is a large, non-high-speed workhorse (middle tier) that is not too expensive. Their new data center (3 years) is already full.
  • Okarina gave a presentation on data reduction for measurement data. So far biologists have been very reluctant to throw away data the way physicists do, but now they really have little choice. Okarina presented a few ideas on how to reduce the data size by 30%.
  • Virtualization is of course a very hot topic now, and with it cloud computing. However, clouds are still geared towards scaling ‘simple’ web servers and not yet towards the scientific community. He said that this might change soon and that HP is also looking for partners to this end. He understood the concerns some people have about keeping data on US servers.
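The storage-per-core ratio mentioned for the EBI is easy to check. A quick sketch, assuming decimal units (1 PB = 1000 TB):

```python
TB_PER_PB = 1000  # decimal units assumed


def storage_per_core_tb(storage_pb, cores):
    """Return the storage available per core in TB."""
    return storage_pb * TB_PER_PB / cores


ebi_ratio = storage_per_core_tb(30, 30_000)  # 1.0 TB per core
```

The same function can be used to compare other installations once their storage and core counts are known.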

He also said that HP regularly organizes visits by partnering industries and academic institutions, next time by MDAnderson, and that VitalIT should also be in the loop here.

Posted in Updates.