To follow on from my copyright and data presentation post –
Professor Anne Fitzgerald and I have produced two short guides for the Australian National Data Service (ANDS): one on Copyright and Data and the other on Creative Commons and Data. The Copyright and Data guide is now available (in HTML and PDF formats) from the ANDS website; the Creative Commons and Data guide should (hopefully) be available next week.
On Thursday 2 September, I attended the Australian National Data Service (ANDS) Workshop at the eResearch Australasia Conference 2008. This was a full-day workshop, but the ANDS team did a great job of keeping it interesting and highly interactive, and the day went very quickly.
In the morning, there were a few brief presentations, notably from Andrew Treloar of Monash University and the ANDS Establishment Project, and from Tracey Hind of CSIRO. I particularly enjoyed Tracey’s presentation which, at a conference that seemed dominated by IT issues, focused on the social and governance issues involved in managing and sharing research data. My notes from Tracey’s talk are below.
The rest of the day was spent in small round-table discussions. The liveliest discussion centred on what institutions and research bodies need to help them manage and share their data, and how ANDS could help. The group found that there was a need for:
- an openly accessible registry of ontologies for metadata of datasets, so that institutions can start using common and enduring metadata to describe their data;
- training for researchers, repository managers, research management staff, librarians, archivists and IT staff in data management (including the legal issues surrounding data management), database/repository infrastructure (how to make the database easy to use and sustainable), open access (why should you share your data?) and metadata. It was agreed that the training materials might have a generic introductory component that could be used by all groups, followed by different materials that provide the relevant detail for each group (e.g. research management staff will have different concerns to IT staff; science researchers may have different concerns to humanities researchers);
- developing conventions for the citation of data, so that researchers can get credit for sharing their data; and
- proper and comprehensive data management plans (DMPs).
There was a consensus that data management plans were particularly important and that it would be useful to develop template DMPs which included specific sections that could be added or deleted as appropriate (for example, a section about compliance with privacy laws might be relevant to medical research but not to astronomy research). It was also thought that ANDS could select a few research projects from different disciplines and assist these projects in formulating a DMP. The resulting DMPs could then be made available online for other projects to use and adapt.
On the question of how ANDS should select particular projects (“engagement targets”) to assist more broadly with their data management and release, in the hope that these projects might then serve as “exemplar projects” for other groups, the group considered that appropriate selection criteria might be:
- breadth of audience and impact;
- potential for reuse of data and the ongoing reusability/sustainability of the data;
- the project’s willingness to assist others to develop their data management skills;
- wide inter-disciplinary appeal;
- willingness to transfer data around; and
- projects which will have good exemplary value to attract other communities.
I believe that ANDS will make the notes taken from the workshop available online.
Here are my notes from Tracey’s talk:
Tracey Hind – CSIRO
- ownership of data should stay with researcher
- but still need to manage CSIRO’s data at a higher level – maybe provide an “enabling” service for this rather than dictate a “one size fits all” approach
- As of now, CSIRO still does not formally recognise the idea of data management
- Real challenges are not technological – they are human factors – issues of acceptance, understanding, people being prepared to share their data, IP, etc.
- High demand for storage, but storage is not management
- Scientists are not working across disciplines as well as the Flagship vision had hoped; much of this is because “you don’t know what you don’t know” – and it’s hard getting insight into other research disciplines
- Making data easily discoverable is the key to achieving multi-disciplinary outcomes
- Lesson is that data is a complex issue – especially when researchers don’t understand the potential benefits – you need exemplar projects to demonstrate the benefits of data management to get buy-in.
- CSIRO’s data management vision (eSIM) – CSIRO scientists will be able to…gather, analyse and share scientific information securely and efficiently, leading to greater scientific outcomes for Australia
- Four layers – people, processes, technology and governance
- People challenges = incentives for deposit into a repository
- Process challenges = making sure that the workflows created actually support the technology and make things easy
- Governance = making sure all of this is properly funded and that data management is a part of the decision making (i.e. make sure researchers have a DMP before they are awarded funding)
- CSIRO’s exemplar projects = AuScope project; Atlas of Living Australia; Corporate Communications
Dr Andrew Treloar – ANDS Establishment Project
Blueprint for ANDS = Towards the Australian Data Commons (TADC) – developed during 2007 by the ANDS Technical Working Group
TADC: Why data? Why now? – increasing data-intensive research; almost all data is now born digital; “Consequently, increasingly effort and therefore funding will necessarily be diverted to data and data management over time”
TADC: Role of data federations – with more data online, more can be done; increasing focus on cross-disciplinary science
Changing Data, Changing Research – e.g. Hubble data has to be released 6 months after creation
ANDS Goal = to deliver greater access, easier and more effective data use and reuse
ANDS Implementation assumptions:
- ANDS doesn’t have enough money to fund storage, and so is predicated on institutionally supported solutions
- Not all data shared by ANDS will be open
- ANDS aims to leverage existing activity, and coordinate/fund new activity
- ANDS will only start to build the Australian Data Commons
- ANDS governance and management arrangements are sized for the current funding
Realising the goal – need to:
- Seed the commons by connecting existing stores
- Increase (human) capability across the sector in data management and integration
ANDS structure = four programs:
- Developing Frameworks (Monash) – about policies, national understandings of data management, and research-intensive organisations = assisting open access (OA) by encouraging moves in favour of discipline-acceptable default data sharing practices
- Providing Utilities (ANU) – Services Roadmap, national discovery service, collection registry, persistent identifier minting and management = assisting OA by improving discoverability, particularly across disciplines (ISO 2146)
- Seeding the Commons (Monash) – recruit data into the research data commons = assisting OA by increasing the amount of content available, much of it (hopefully) OA
- Building Capabilities (ANU) – improving human capability for research data management and research access to data, especially among early-career researchers, teaching them good data management practices from the beginning = assisting OA by advocating to researchers for changed practices