ZORA

I have an open task from the SEB to suggest policies for SystemsX.ch concerning publications and their access. I have been surveying what is available in the Swiss and international landscape for a while now (hence also the visits to the NEERI and APA conferences; see the other blog posts). Simultaneously I have been trying to find out what is already available in CH; see also the links in the APA blog post.

So today I had a discussion with Roberto Mazzoni of the IT department of the University of Zurich. He heads the project through which the UZH implements its open access policy. The system is called Zurich Open Repository and Archive (ZORA). It is very lightweight: essentially a collection of the PDFs of all publications by University of Zurich researchers. The UZH makes it mandatory to send a PDF copy of every paper to ZORA. In return, the researchers get a unique identifier that they can refer their colleagues to. For publications whose copyright has been transferred to a publisher, the idea is to protect the PDF with a password so that only UZH researchers can access it, thus honoring the contract. So, for example, all SystemsX.ch papers with UZH participation will end up in ZORA.
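
For illustration, here is a minimal sketch of what such a deposit could look like if it were scripted. The endpoint URL, form field names, and response format below are pure assumptions for the sake of the example; the real ZORA submission goes through its web interface.

    import requests  # third-party HTTP library

    # Hypothetical deposit endpoint -- the URL and all field names are
    # made up; the real ZORA submission goes through a web form.
    DEPOSIT_URL = "https://www.zora.uzh.ch/api/deposit"

    def deposit_paper(pdf_path, title, authors, copyright_transferred):
        """Upload a PDF and return the unique identifier assigned by the repository."""
        with open(pdf_path, "rb") as pdf:
            response = requests.post(
                DEPOSIT_URL,
                files={"fulltext": pdf},
                data={
                    "title": title,
                    "authors": "; ".join(authors),
                    # Papers whose copyright was transferred to a publisher are
                    # flagged so access can be restricted to UZH members.
                    "access": "restricted" if copyright_transferred else "open",
                },
            )
        response.raise_for_status()
        return response.json()["identifier"]  # the ID to refer colleagues to

    paper_id = deposit_paper("paper.pdf", "A SystemsX.ch result",
                             ["A. Author", "B. Author"], copyright_transferred=True)
    print("Refer colleagues to:", paper_id)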

I found this system very nice and asked Roberto whether it might be possible for them to host other collections as well (like all the papers of SystemsX.ch). The answer is yes; they have already done that once in the past for an Asian Studies collection. Of course we would need to add access control so that all SystemsX.ch partners can get in, but this could be done through SWITCH-AAI.
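
To sketch what the SWITCH-AAI-based restriction could look like on the application side: the web server's Shibboleth module typically passes federation attributes on to the application, which can then decide per collection. The header name and the list of organizations below are assumptions, since the actual attribute mapping depends on the deployment.

    # Application-side access check, assuming the Shibboleth module exposes
    # SWITCH-AAI attributes as request headers (deployment-specific assumption).
    ALLOWED_HOME_ORGS = {"uzh.ch", "ethz.ch", "unibas.ch"}  # example partner institutions

    def may_access_collection(request_headers):
        """Grant access if the user's home organization is a SystemsX.ch partner."""
        home_org = request_headers.get("Shib-HomeOrganization", "")
        return home_org in ALLOWED_HOME_ORGS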

So a possible SystemsX.ch policy could be to mandate depositing a copy of every publication into ZORA. All papers would receive a unique identifier that can then be referred to in other places, especially in the data collections that we will need to publish as well in the future.

I am taking this up now with the SyBIT MB and also the SEB; they will need to decide whether this is what SystemsX.ch will want to implement or not.


Swiss Grid Day 2009

The Swiss Grid Day is the annual meeting of the Swiss National Grid Association (SwiNG). The focus is on the interoperable sharing of computing resources for research in Switzerland and in Europe. The origins of this association lie with the High Energy Physics community, which has built a world-wide computing grid for the LHC data analysis; Switzerland participates with one Tier-2 and several Tier-3 centers. However, SwiNG would like to extend the concept to support more communities, among them the Life Sciences, but also others such as environmental sciences and chemistry. SwiNG coordinates several projects that are funded through the SWITCH AAA mechanism, one of which is a 'multi-science computing grid' whose aim is to provide resources to all these communities, accessible through a single Grid middleware.

We had updates on how the other Grid projects in Europe do similar things and what the EU plans for the future. This can become interesting for SystemsX.ch once we have mapped out our resources. What Grid middleware can help with is scaling user management across several sites. We first need to understand what we actually need and whether we would want such a relatively complex system at all. We will solve the user management issue in our SWITCH AAA portal project, which can also submit jobs to the multi-science computing grid. The outcome of that project will tell us how to proceed.
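
To give a feel for what submitting to such a grid looks like, here is a sketch using an ARC-style command-line client and an xRSL job description. The client command, job attributes, and cluster endpoint are assumptions for illustration; which middleware the multi-science grid actually exposes is exactly what we still need to find out.

    import subprocess
    import tempfile

    # Minimal xRSL-style job description. The attribute names follow
    # NorduGrid ARC conventions; the exact dialect depends on the
    # middleware actually deployed (an assumption here).
    JOB_XRSL = (
        '&(executable="/bin/echo")'
        '(arguments="hello from the multi-science grid")'
        '(stdout="out.txt")'
        '(jobName="sybit-test")'
    )

    with tempfile.NamedTemporaryFile("w", suffix=".xrsl", delete=False) as f:
        f.write(JOB_XRSL)
        job_file = f.name

    # 'arcsub' is the ARC client's submission command; the cluster URL is hypothetical.
    subprocess.run(["arcsub", "-c", "grid.example.ch", job_file], check=True)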


Alliance for Permanent Access Conference

Today I am at the annual conference of the Alliance for Permanent Access. I hope they will make the slides available, as some reflect my own findings very well.

In summary, everyone is having the same issues, and we in CH are no less advanced than anyone else. Quite the opposite: SyBIT is a unique project in many respects in that we work on a very specific solution for a very specific community, where the take-up of what we do is there by construction.

People here struggle with

  • Policies coming from authorities to enforce preservation and data reusability
  • Funding for such new infrastructure, which seems impossible to get
  • Changing the culture of publishing, i.e. establishing that good data is also worth publishing
  • Tools exist, but almost all are proof-of-concepts, unmaintained after the project finishes
  • What does it mean to hand over data to a data archive or library? Does it make sense?
  • There are a lot of 'recommendations', but actually nobody DOES much

At the same time, people realize that NOT addressing data preservation issues will have very serious repercussions as well. As usual, I have again learned a lot about what is going on in Switzerland by talking to our European friends.


Talk Highlights

David Giaretta – PARSE.Insight plenary says the focus should be on services. Infrastructure is essential. Chokepoints need to be identified early and addressed, since they prevent progress.

Salvatore Mele – CERN plenary says that NOT doing data curation, archiving and preservation would be more expensive than building the LHC. First, the data can be reused many times. Also, scientists who were asked know that the data will be reused for things that the constructors of the system did not think of, which makes the data of even larger value than the LHC itself. But he also says that researchers will not deal with data curation unless the funding agencies make them. Incentives are missing.

Veronica Guidetti – ESA plenary also shows her community's need for data preservation and access to historic data. She stresses the need for end-to-end solutions, as users don't care about details. Support and education for the tools is mandatory for them to be useful. She also identifies and explicitly spells out many barriers to take-up (see slides).

Andreas Rauber – Univ. of Vienna, digital preservation, has given an overview of tools for the archives. There are many such tools, but most are just proof-of-concepts. Tools are not sustainable: many prototypes are built and not followed up, research stops at the proof-of-concept, and the rest is 'engineering' and not publishable. Everyone does a metadata proof-of-concept for common image file formats (EXIF readers), and all stop there. Training is needed to involve more people, and awareness needs to be raised. But experts also have to be trained for the challenges in their respective domains, and expertise on the internals of preservation systems and methods has to be developed everywhere. He recommends in fact what SyBIT tries to do: identify existing tools, take over support, and develop the specific additions that are needed. Only then can 'toy examples' be put out to the community and actually be used and trusted. But that is not enough: he says we also need to research tomorrow's problems and prepare for them, since current solutions will not last long. The domains should focus on their core competences and not build generic do-it-alls.

Chris Rusbridge – DCC, Blue Ribbon Task Force. Lots of interesting points on his slides. He also highlights that the value of data is different for everyone (example by Brian Lawrence – Brian's 'fishy' example). Currently data is used only within a community, i.e. by the same people who deposit it. For research data, IPR is not an issue (at least for the communities they have dealt with so far).

Barbara Sierman – Koninklijke Bibliotheek, Manager for Digital Preservation. She talks about the policies needed to preserve data. How does one measure (audit) and certify preservation entities? She says that an audit is needed for the organization, the digital object management, and the repository itself. There are many projects discussing how to set up digital repositories, what policies are needed, how they should operate, etc. DRAMBORA is a site that helps with such audits. She also elaborates on the importance of establishing trust among all stakeholders: the digital repository's users, its funders, and the organization running it. Many projects claim to establish 'best practices', but what are those really? Based on what authority? This also needs to be put into policies, according to her (if I understood correctly). Standards: common file formats may be easily addressed, but obscure formats need to be addressed by the organizations that need them.

Discussion: R&D and technical tools

  • Common formats for future data
  • Data management is important for preserving data; educating young researchers is still possible
  • A weakness of library tools is their focus on published data. Datasets do not have a publishing model.
  • Setting up templates for publishing is important
  • Referring to a point inside a dataset needs to be possible – as a structure
  • The thin metadata layer that is necessary to publish data should be given, but every domain should be allowed to specify its own data publishing schema (see the sketch after this list)
  • Standards will only work up to a point
  • Tools need to be flexible; people should be able to come up with their own tools
  • Libraries need to come up with 'container standards', just like AVI for video: inside there can be anything. However, only living standards can converge; a standard that exists only for archiving is not good. So libraries need to update constantly.
  • Longevity: there needs to be a migration to professionally maintained services.
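
To illustrate the 'thin metadata layer plus domain-specific schema' point from the list above, here is a minimal sketch. The envelope fields are loosely inspired by Dublin Core; all field names, identifiers, and the domain payload are invented for illustration.

    import json

    # Thin, generic envelope that every published dataset would carry;
    # the field names are loosely Dublin-Core-inspired assumptions.
    dataset_record = {
        "identifier": "sysx-2009-0001",  # hypothetical unique ID, citable from papers
        "title": "Phosphoproteome measurements, run 42",
        "creator": ["Example Lab, ETH Zurich"],
        "date": "2009-11-05",
        "format": "application/x-mzxml",
        # Below the thin layer, each domain specifies its own schema
        # instead of forcing one global standard on everybody.
        "domain_schema": "proteomics-ms/0.1",  # invented schema name
        "domain_metadata": {
            "instrument": "LTQ-FT",
            "organism": "S. cerevisiae",
        },
    }

    print(json.dumps(dataset_record, indent=2))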

Aebersold Group Meeting

Today we had a very good meeting with the 'new' SyBIT Aebersold team. We went through all of the needs of PhosphoNetX and made real progress in drawing up an actual architecture and the detailed components we want to put in place. There is a more detailed account of the meeting here.


IT Department UZH

I had lunch with Christian Bolliger of the IT Department of the University of Zurich. He is in charge of the university's new cluster 'Schrödinger', and I got a nice tour of the machine. It is one of the largest clusters in Switzerland (see tech data). The cooling is done solely with air, so it is extremely loud in the machine room!

The model for who is allowed on the cluster is much more open than at the ETHZ: basically any researcher at the UZH can apply for compute time. A large fraction, however, is dedicated to truly parallel jobs that can really exploit the size and the excellent interconnect of Schrödinger.

We also discussed other topics concerning data storage strategies and data policies, and Christian was very open to a further exchange of ideas. He also introduced me to one of his colleagues (Roberto Mazzoni), who is working together with the large libraries in the context of digital data repositories. I'll follow up on that topic later, stay tuned.
