IASSIST 2008 Conference Session: Moving Research Data Into and Out of Institutional Repositories

Robin Rice of EDINA and Edinburgh University Data Library gave an excellent overview of open access to data. She quoted Peter Buneman of the Digital Curation Centre who said, “The best way to preserve your data is to publish it!”

She also gave an overview of open data licences, including the Science Commons Open Data Protocol and the Public Domain Dedication & License (PDDL), which avoids the attribution stacking problems that may occur with Creative Commons licences.

Robin raised the following issues:

  • What are the incentives for researchers to manage and share data?
  • How can researchers meet funders’ requirements to define how they are going to share their data, or to explain why they cannot?
  • Do higher education institutions have the capacity to provide data management services?

Finally, she gave an overview of the DISC-UK DataShare project and its work tracking the tools and guidelines available for open data. DISC-UK is a collaboration between Southampton University, Edinburgh University, Oxford University and the London School of Economics.

Katherine McNeill from MIT Libraries, DSpace and the Harvard-MIT Data Center gave a talk concerning interoperability between MIT’s institutional repository (IR) and data repositories.

MIT has multiple locations for depositing data – DSpace, Harvard-MIT Data Center (HMDC) and ICPSR.

This presents challenges for searching, unifying collections and archiving. It is also difficult to advise faculty where to deposit their data as each location has its own advantages and disadvantages.

Therefore, there is a need for interoperability.

Opportunity: PLEDGE Project – designed to foster the use of data grid technologies, replicating content across multiple systems for the purpose of preservation.

An ‘agent’ was developed so that the DSpace software and the HMDC Dataverse software can interoperate. Developer of the agent: Mark Diggory.

Goal: to archive, preserve and provide access in DSpace to MIT-authored studies in HMDC.

This raises several issues:

(1) Workflow for selecting and processing studies

  • Currently this is a manual process and an informal service

(2) Updating of studies

  • Currently, if content is updated in HMDC, there is no ‘flag’ to notify repository managers that the copy in DSpace needs to be updated

(3) License agreements and terms of use

  • DSpace has licensing screens which give DSpace permission to disseminate etc.
  • With the new system, content is loaded in the ‘back end’, so the researcher never sees the licensing screens
  • The repository manager has to obtain permission via email instead
  • How to deal with this in the future?
  • Further, end-users are usually told what they can do with data, but with back-end loading these terms are less visible – what are the implications?

(4) Keeping the agent up to date
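
The missing update ‘flag’ noted under (2) could, in principle, be approximated by comparing a fingerprint of each study as it currently stands in HMDC with the fingerprint recorded when the study was last copied into DSpace. The sketch below is an illustration only, under assumed names: `StudyRecord`, `fingerprint` and `find_stale_studies` are hypothetical and are not part of DSpace, the Dataverse software or the PLEDGE agent, and the talk did not describe how such a check might actually be built.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class StudyRecord:
    """Hypothetical minimal view of a study: an identifier plus its files."""
    study_id: str
    files: dict  # file name (str) -> file contents (bytes)


def fingerprint(record):
    """Return a SHA-256 hex digest over the study's file names and contents."""
    digest = hashlib.sha256()
    for name in sorted(record.files):  # sorted so dict ordering cannot change the digest
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(record.files[name])
        digest.update(b"\0")
    return digest.hexdigest()


def find_stale_studies(source_records, archived_fingerprints):
    """List the IDs of studies whose source copy no longer matches the archive.

    source_records: StudyRecord objects as currently held in the source repository.
    archived_fingerprints: study_id -> fingerprint recorded at the last copy.
    Studies with no recorded fingerprint are reported as well.
    """
    stale = []
    for record in source_records:
        if archived_fingerprints.get(record.study_id) != fingerprint(record):
            stale.append(record.study_id)
    return stale
```

A repository manager could run a check of this kind on a schedule and review only the studies it reports, rather than re-inspecting every study by hand.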

IASSIST 2008 Conference – Technology of Data: Collection, Communication, Access and Preservation

From 28th to 30th May I attended the IASSIST 2008 conference, Technology of Data: Collection, Communication, Access and Preservation at Stanford University.

I was there representing the Legal Framework for e-Research Project, which is hosted at QUT. Under this project, we have been examining and developing legal and management frameworks for data access, sharing and reuse.

I appeared to be the only lawyer or legal academic at the IASSIST conference, which was somewhat surprising considering the number of times that presenters raised legal questions or concerns in their sessions. The primary concern seemed to be how to determine ownership of data given the vast number of researchers, database managers and other interested parties that may assert ownership interests in the data. Copyright was a concern (does it attach? how do we deal with it so as to provide wide access to the data?), as was privacy. Finally, even where ownership rights could be determined, the big question was: how do we get our researchers to share? It was generally agreed that researchers are notoriously protective (overprotective?) of their data.

My thoughts on these matters were enthusiastically received. In brief, I advocated the use of Data Management Plans, prepared from the conception of a research project, which set out:

  • the different parties with an interest in the data collected by the research project;
  • who owns the data and/or who may control the data;
  • who is responsible for managing the data;
  • any legal controls applying to the data, including contractual conditions (arising in a funding agreement, employment agreement or any other agreement), copyright, confidentiality or privacy restrictions;
  • how data collected by the research project will be integrated with existing data from other sources in a way that complies with all responsibilities imposed by law;
  • how data will be disseminated;
  • how data will be attributed;
  • what uses other researchers may make of the data; and
  • data preservation and sustainability.
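
As a rough illustration only, the headings above could be captured in a machine-readable record so that a repository or research office can see at a glance which parts of a plan are still blank. The class and field names below are hypothetical, chosen simply to mirror the list; they do not follow any funder's template or metadata standard.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DataManagementPlan:
    """Hypothetical record mirroring the plan headings listed above."""
    interested_parties: list = field(default_factory=list)
    owner_or_controller: str = ""
    data_manager: str = ""
    # Contractual conditions, copyright, confidentiality or privacy restrictions.
    legal_controls: list = field(default_factory=list)
    integration_with_existing_data: str = ""
    dissemination: str = ""
    attribution: str = ""
    permitted_uses: str = ""
    preservation_and_sustainability: str = ""

    def missing_sections(self):
        """Return the names of the sections that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

For example, a plan created with only `owner_or_controller` and `data_manager` filled in would report the other seven sections as missing.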

Whether privacy will be an issue will depend on the type of data collected and whether it can identify an individual. Whether copyright law will apply will depend again on the type of data collected and the jurisdiction in which the data is collected. In Australia, databases may attract copyright protection, but this is unlikely to be the case in the United States. Where data or a data compilation does attract copyright protection, licensing mechanisms can be employed to ensure wide distribution and reuse of the data. One option is applying a Creative Commons licence to the copyrightable elements of the data or database. Alternatively, Science Commons has developed an Open Data Protocol.

I have written more about the legal frameworks surrounding data here (with Professor Anne Fitzgerald).

Another pervasive concern of conference participants (most of whom were – I gathered – data librarians and database/repository managers) was obtaining accurate and reliable metadata from researchers, who usually feel that they have a million better things to do than enter metadata into a computer system. Librarians dealing with thesis and dissertation repositories and journal article repositories have faced this problem for years. Different institutions have different ways of dealing with reluctant academics. Some institutions have taken it upon themselves to enter the metadata on behalf of the academic. However, I still feel that the best approach is consistent advocacy, education and demonstration to show academics the enormous benefits to them of having their work easily searchable, findable and citable online.

The following posts comprise my notes from some of the conference sessions that I found most interesting. Apologies if the notes are a little rough. If you attended the conference and have any corrections, please let me know.

Congratulations to IASSIST for organising a fabulous 2008 conference and to Stanford for hosting it.

Hello world

Well, here it is – my inaugural post! I have never kept a blog before, so I am still figuring out much of the process and conventions of blogging. My hope is that this blog will help me to improve my analysis of what is going on in the world around me, refine my voice and open me up to online discussions.

This blog will likely cover a wide range of topics corresponding to my work and my interests. I work as a legal researcher at QUT primarily in the field of open access. Therefore, I intend that many posts will be about the law in Australia and about the open access movement worldwide. However, I hope to also touch on politics and society where I feel motivated to comment.

In addition to my law degree, I also have a degree in Creative Writing. So expect some literary references to creep in. The name of this blog is inspired by Italo Calvino’s Invisible Cities, one of my all-time favourite novels. You can find the relevant extract in the sidebar. I chose this extract because of the multiple metaphors that can be read into it. In venturing to host a blog and expose my thoughts and opinions to the world on the internet, I feel a little like I am tightrope-walking over an abyss! But the image of Octavia – of a community suspended, never knowing when the rope will break – can apply equally to the state of copyright law in the digital age, or more generally to the world in which we live today, where the nature of our fundamental human rights is at times uncertain. Read into the name and extract what you will. Read into my blog what you will and please feel free to comment on any posts that interest you.