e-Science Curation Study

 

Objectives of the study

The Digital Archiving Consultancy has been asked to carry out an audit of the current “curation” of primary research data and to identify the requirements for its future curation. The study covers primary research data generally, including data generated within the e-Science core programme in the UK.

 

We are entering an era in which digital data resources are becoming a central pillar of scientific research.  Data volumes are growing, in many cases massively, as is the complexity of the data itself.

 

This will be magnified by the spread of “Grid” infrastructure and technologies.  The Grid will allow the efficient manipulation of vast amounts of information, such as that contained in the human genome or the results from experiments in CERN's new Large Hadron Collider. It will also make it possible to mine data again and again, comparing existing data sets collected for one purpose with new and previously unrelated information, and so generating new knowledge.

 

Implications for future “curation”

There will be significant implications for the future curation of primary research data if we wish to ensure that such data can continue to be accessed and re-used over time.  Digital information is now enabling new methods of research, dissemination and collaboration in areas ranging from environmental science to genomics.  Digital technology also offers us the ability to exploit data more deeply and more broadly, and to build on existing data.  In all these areas there is a considerable amount of research and development work in progress, developing new tools and technologies to support increasingly powerful and sophisticated use and re-use of digital data, enabling inter alia easy collaboration (including between disciplines) and recognition of source (see the e-Science links below).

 

Requirements for data curation vary between disciplines but persistence of this information is increasingly important: not only for validation of research but because it contributes to dynamic knowledge bases or future research. Already in the US, many are predicting that the major science driver of high-end computing will soon be data.  Digital preservation is now being seen as a core requirement for the US Cyberinfrastructure and the Research Grid in the UK.

 

Many issues

However, there are many questions.  For instance, how much and what should be kept?  Who should keep it?  How do we pay for it?  Who pays?  How do we keep it?  Preserving and keeping digital data appropriately is not straightforward - for instance, data formats come, vary, and go within just a few years. 

 

The Programme of work

The work being undertaken is summarised below:

•     A desk review of Grid and non-Grid literature.

•     Questionnaire surveys to relevant populations to establish current curation practice and requirements for the future. The populations being canvassed are:

      -     Data generators

      -     Policy makers and funders

      -     Service providers (such as libraries, computer centres)

•     Interviews with key individuals.

 

The UK’s e-Science Core Programme

The DTI and the Research Councils committed an initial £118M to a government-industry programme on e-Science.  The reason for this investment is that Grid technology is seen as the natural successor to the World Wide Web, and the UK wants to take a leading role in order to develop solutions for its scientists and opportunities for its industry.

 

The e-Science core programme has been established to co-ordinate the research effort on e-Science in the UK.  Within the programme, each research council is funding a number of pilot projects in its own application areas. There are 24 pilot projects, among a total of approximately 80 projects.

 

Links

 

For a good introduction to digital preservation issues, see the article by Jeff Rothenberg in Scientific American: http://www.kb.nl/kb/ict/dea/download/dig-info-paper.rothenberg.pdf

 

 

For further information about the JISC Continuing Access and Digital Preservation Strategy for 2002-5 see:

http://www.jisc.ac.uk/dner/preservation/dpstrategy2002b.html

 

 

For further information about the e-Science programme see:

http://www.escience-grid.org.uk/

and

http://umbriel.dcs.gla.ac.uk/NeSC/general/

 

 

For further information about the JISC Committee for the Support of Research see:

http://www.jisc.ac.uk/jcsr/index.html

 

 

For further information about the US National Science Foundation blue-ribbon committee see: http://www.cise.nsf.gov/evnt/reports/atkins_annc_020303.htm

 

 

For further information about the Digital Archiving Consultancy see:

http://www.philiplord.com/index.html   (Please note this site is undergoing major reconstruction!)

 

If you have any further questions, please contact Philip Lord (telephone  020-8607-9102) or Alison Macdonald (020-8744-9322), or e-mail us at pwl@d-archive.co.uk.