Category Archives: data management

Telstra Corporation Limited v Phone Directories Company Pty Ltd [2010] FCA 44

In the 2009 IceTV case, Gummow, Hayne and Heydon JJ of the High Court of Australia remarked that there was a need to treat with caution the emphasis in Desktop Marketing v Telstra upon labour and expense per se when determining whether copyright subsists in a compilation. Following this decision, I expected that we would see, in the next few years, some judicial consideration of the High Court’s remarks in a subsequent compilation case. I didn’t expect that it would come so soon or be so on point.

Telstra Corporation Limited v Phone Directories Company Pty Ltd [2010] FCA 44 is an important decision: not only is it the first major case to consider subsistence of copyright in data compilations since IceTV, but it also reconsiders the question of whether copyright subsists in Telstra’s Yellow Pages and White Pages telephone directories.

Some background

Desktop Marketing Systems Pty Ltd v Telstra Corporation Ltd [2002] FCAFC 112 considered the question of whether Telstra held copyright in their Yellow Pages and White Pages directories. The court approached this as a question of originality – were the directories (which were essentially just compilations of name, address and phone number data, arranged alphabetically) sufficiently original to attract copyright protection? The court held that copyright can subsist in a compilation produced as a result of the exercise of skill, judgment or knowledge in the selection, presentation or arrangement of materials or where substantial labour or expense has been invested in collecting the materials included in the compilation (at [409]). Telstra had undertaken substantial labour and incurred significant expense in compiling the Yellow Pages and White Pages directories, and therefore owned copyright in the directories as compilations.

In my blog post on the IceTV decision in 2009, I wrote:

Since Desktop Marketing v Telstra, there has been significant uncertainty around a user’s ability to reproduce material contained in a copyrighted data compilation because the test for originality was so wide. This meant that a copyright holder could assert such control over a database that at times they appeared (and often purported) to be able to control use of what essentially amounted to mere facts and information in circumstances where copyright law should not extend. The [IceTV decision], while not bringing us in line with the US decision of Feist v Rural Telephone Services in regards to whether copyright should subsist in a compilation that lacks creative input, at least takes a step in the right direction of tightening the originality threshold to provide that reproduction of the purely informational material within a compilation will not constitute a substantial part sufficient to give rise to an infringement claim.

The Telstra Corp v Phone Directories decision

In Telstra Corp v Phone Directories, the question again arose as to whether copyright subsists in the White Pages (WPDs) and Yellow Pages (YPDs) directories.

Gordon J considered that the proper starting point was the Copyright Act ([7]) and that the completion of four steps could assist in determining whether copyright subsists in a work ([28]):

  1. Identify the work
  2. Identify the author/s of the work
  3. Determine when first publication of the work occurred
  4. Identify how the work is original.

Her Honour placed significant weight on the necessity of being able to identify an author before copyright can be held to subsist in a work. She states at [20], “The centrality of authorship is self evident”. She then sets out ten points in support (and elaboration) of this statement. I reproduce these in full because they are central to Justice Gordon’s reasoning in this case:

  1. The “theoretical underpinnings” of the Copyright Act strike a balance between rewarding authors of original literary works against policy considerations concerning “the public interest in maintaining a robust public domain in which further works are produced”: IceTV [2009] HCA 14; 254 ALR 386 at [24] and [71]. The genesis of copyright legislation in England was to protect the rights of authors of work from the reproduction of their work without their consent: see IceTV [2009] HCA 14; 254 ALR 386 at [25].
  2. The Copyright Act fixes on the author: ss 32, 33, 35 and 127 of the Copyright Act; IceTV [2009] HCA 14; 254 ALR 386 at [22]-[25] and [96]-[97] and Vawdrey Australia Pty Ltd v Krueger Transport Equipment Pty Ltd (2009) 83 IPR 1 at [147] per Lindgren J.
  3. The author is the person or persons who bring the work into existence in its material form: s 10(1), 31 and 32 of the Copyright Act and IceTV [2009] HCA 14; 254 ALR 386 at [26], [33] and [98]-[99]. To be considered as an author of a literary work the person or persons must have exercised “independent intellectual effort” (IceTV [2009] HCA 14; 254 ALR 386 at [33] and [48]) and / or “sufficient effort of a literary nature” (IceTV [2009] HCA 14; 254 ALR 386 at [99]).
  4. The Copyright Act provides for the possibility of joint authors: s 10(1) of the Copyright Act and IceTV [2009] HCA 14; 254 ALR 386 at [23] and [100]. A “work of joint authorship” requires that the literary work in question “has been produced by the collaboration of two or more authors and in which the contribution of each author is not separate from the contribution of the other author or the contributions of the other authors”: s 10(1) of the Copyright Act; see also Levy v Rutley (1871) LR 6 CP 523 at 529 per Keating J; Cala Homes (South) Ltd v Alfred McAlpine Homes East Ltd (No. 1) [1995] FSR 818 at 835-836 per Laddie J; Prior v Lansdowne Press Pty Ltd (1975) 12 ALR 685 at 688 per Gowans J.
  5. The Copyright Act also provides for compilations – the bringing into existence of a literary work which gathers and organises material from various sources: IceTV [2009] HCA 14; 254 ALR 386 at [72], quoting William Hill (Football) Ltd v Ladbroke (Football) Ltd [1980] RPC 539 at 550 per Diplock LJ. The fact a work is a compilation will itself inform the issues of authorship to be considered: IceTV [2009] HCA 14; 254 ALR 386 at [99]. The author or authors will be those who gather or organise the collection of material and who select, order and arrange its fixation in material form: ss 10(1), 31 and 32 of the Copyright Act and of IceTV [2009] HCA 14; 254 ALR 386 at [73]-[74] and [99]. However, it is a question of fact and degree which one or more of them will have expended “sufficient effort of a literary nature” to be considered an author under the Copyright Act: IceTV [2009] HCA 14; 254 ALR 386 at [99].
  6. Original works emanate from authors: ss 32, 33 and 35 of the Copyright Act and IceTV [2009] HCA 14; 254 ALR 386 at [22], [24], [33], [48] and [96]. Authorship and originality are correlatives: IceTV [2009] HCA 14; 254 ALR 386 at [33], [34], [47]-[49], [52] and [54]. In that context, as mentioned in [20(3)] above, “originality” under the Copyright Act “means that the creation (ie the production) of the work required some independent intellectual effort” and / or the exercise of “sufficient effort of a literary nature”: IceTV [2009] HCA 14; 254 ALR 386 at [33], [47]-[48] and [99]; see also at [187]-[188] and discussion of the need for some “creative spark” and exercise of “skill and judgment”. The phrases adopted are different. However, each phrase confirms that for a work to be sufficiently original for the subsistence of copyright, “substantial labour” and / or “substantial expense” is not alone sufficient. More is required. What that more is will, of course, vary from case to case but must involve “originality” by an identified author in an identified work. Where the expression of the work is dictated by the nature of the information the subject of expression without such effort, it will go against a finding of originality: IceTV [2009] HCA 14; 254 ALR 386 at [42] and [170].
  7. The Copyright Act does not protect facts, ideas or information contained in a work, to ensure a balance is struck between the interests of authors and those in society: IceTV [2009] HCA 14; 254 ALR 386 at [28] and the cases cited therein. The Copyright Act does not provide protection for skill and labour alone: IceTV [2009] HCA 14; 254 ALR 386 at [49], [52], [54] and [131].
  8. The Copyright Act protects the particular form of expression of the information: IceTV [2009] HCA 14; 254 ALR 386 at [26], [28], [40], [70], [102] and [160]; Hollinrake v Truswell [1894] 3 Ch 420 at 424 per Lord Herschell LC; Victoria v Pacific Technologies (Australia) Pty Ltd (No 2) (2009) 177 FCR 61 at [17] per Emmett J; see also Larrikin Music Publishing Pty Ltd v EMI Songs Australia Pty Limited [2010] FCA 29 at [40], [41] and [212]. Copyright is not given to reward work distinct from the production of a particular form of expression: IceTV [2009] HCA 14; 254 ALR 386 at [28] and [31]. Accordingly, it is “unhelpful to refer to the ‘commercial value’ of the information, because that directs attention to the information itself rather than to the particular form of expression”: IceTV [2009] HCA 14; 254 ALR 386 at [31] and [166].
  9. As compilations often contain facts and information, it is necessary to focus on the nature of the skill and labour required to create the work and ask whether it is directed to the originality of the particular form of expression: IceTV [2009] HCA 14; 254 ALR 386 at [31], [33], [47]-[48], [52] and [54].
  10. “Fixation” or identification of the original work is essential: ss 8 and 31-35 of the Copyright Act and IceTV [2009] HCA 14; 254 ALR 386 at [15], [24]-[28] and [102]-[105]. Copyright does not subsist in a work unless and until the work takes a material form: IceTV [2009] HCA 14; 254 ALR 386 at [26] and [103].

A number of things become very clear from Gordon J’s enumeration of these points. Firstly, she has relied strongly on the reasoning in IceTV in her judgment, preferring that to the Desktop Marketing decision even though it was argued by the Applicants that Desktop Marketing had greater relevance. In fact, Gordon J addresses this argument directly at paragraph [46]:

Before turning to the facts, mention must be made of the decision of the Full Court of the Federal Court in Desktop Marketing Systems Pty Ltd v Telstra Corporation Ltd [2002] FCAFC 112; (2002) 119 FCR 491 (Desktop Marketing). In that decision, copyright was found to subsist in certain editions of WPDs and YPDs. The Applicants submitted that the resolution of the present case remains governed by the outcome in Desktop Marketing [2002] FCAFC 112; 119 FCR 491 and that the High Court’s comments on copyright subsistence in IceTV [2009] HCA 14; 254 ALR 386 should be regarded as obiter dicta. I reject that contention. Firstly, IceTV [2009] HCA 14; 254 ALR 386 is binding authority on the proper interpretation of the Copyright Act. The reasoning of both plurality judgments establishes principles of law beyond copyright infringement. Secondly, the High Court directly warned of the need to treat Desktop Marketing 119 FCR 491 with particular care: see IceTV [2009] HCA 14; 254 ALR 386 at [52], [134], [157] and [188]. Thirdly, Desktop Marketing [2002] FCAFC 112; 119 FCR 491 did not deal directly with the issue of authorship. Rather, all issues in respect of copyright had been conceded other than that of originality. In fact, Finkelstein J (at first instance) questioned the assumptions the parties had made about authorship: Telstra Corporation Ltd v Desktop Marketing Systems Pty Ltd [2001] FCA 612; (2001) 51 IPR 257 at [4]. Finally, the facts of this case are significantly different. The WPDs and YPDs in question are different. Moreover, the Genesis Computer System which stored the relational database and which was used in the production of some of the WPDs and YPDs in issue in these proceedings (after September 2001 in the case of YPDs and late 2003 in the case of WPDs) was not in use in Desktop Marketing [2002] FCAFC 112; 119 FCR 491. (The Genesis Computer System is considered in detail at [60]ff below).

Secondly, consistent with IceTV, she rejects the notion that skill, labour or expense alone can give rise to copyright protection (see also [341]). There must be something more, and that something more is the exercise of “independent intellectual effort” and/or “sufficient effort of a literary nature”.

Finally, she draws strong correlations between the concepts of “authorship” and “originality” in copyright law, such that a consideration of the latter is dependent on identification of the former. She says at [45]: “It would be absurd to assume that I am bound only to determine whether copyright subsists in the Works whilst ignoring any question of ownership. Copyright is a form of property created by statute for the benefit of the author or authors who, in the absence of some other arrangement, is the owner or are the owners of the work.”

Ultimately, the case turned on two factors. Firstly, Telstra’s inability to identify with any degree of certainty the “authors” of the WPDs and YPDs. Gordon J found that there were numerous people who had contributed in part to the production of the directories – some of these people were employees and some were independent contractors; some were still in the Applicants’ employ but others were not; and some had played only minor roles whereas others had played more significant roles. The exact number of contributors was unknown and the Applicants had not identified who the contributors were.

Secondly, Gordon J held that even if the authors could be identified with sufficient clarity and certainty (which they could not), the people suggested to be the authors of the works did not exercise “independent intellectual effort” and/or “sufficient effort of a literary nature” ([338]). Most of the processes used to create the WPDs and YPDs were heavily automated. A system of computer-imposed “Rules” (which the judge considered extensively at [88]-[166]) controlled the content and prescribed the form of expression of the works (see [163]). Any human discretion had to be exercised in accordance with the Rules ([164]). The system was designed to limit originality, not provide for it ([341]). The tasks performed by individuals applying the Rules were mechanical and could often be completed swiftly and in large numbers ([341]). Very few people had any part to play in the final presentation of the works or the particular form of the expression of the information ([338]). Gordon J rejected the Applicants’ contention that the relevant intellectual effort involved was understanding and applying the Rules, holding that the independent intellectual effort required must be directed to the creation of the work and that the independent effort claimed by the Applicants was not ([165]). Consequently, Gordon J held that none of the works were original and none of the people said to be authors of the works exercised “independent intellectual effort” or “sufficient effort of a literary nature” in creating the works ([340]).

Conclusion

At paragraphs [343]-[344], Gordon J summarised her position:

It is not sufficient to demonstrate the subsistence of copyright by asserting that someone (and I do not accept that such a person has been found in this matter), who may in certain broad circumstances, in an unspecified number of relevant instances, have done an act that constitutes some unknown contribution to a work in question “no matter how unimpressive” will be enough to make good the Applicants’ claim.

Authorship and originality are correlatives. The question of whether copyright subsists is concerned with the particular form of expression of the work. You must identify authors, and those authors must direct their contribution (assessed as either an “independent intellectual effort” or a “sufficient effort of a literary nature”) to the particular form of expression of the work. Start with the work. Find its authors. They must have done something, howsoever defined, that can be considered original. The Applicants have failed to satisfy these conditions. Whether originality be the product of some “independent intellectual effort” and / or the exercise of “sufficient effort of a literary nature”, or involve a “creative spark” or the exercise of “skill and judgment”, it is not evident in the claim made by the Applicants.

Accordingly, Gordon J held that copyright did not subsist in Telstra’s WPDs and YPDs.

My thoughts

The broad test of originality set down in Desktop Marketing only served to create uncertainty for both creators and users in this area (especially users). It blurred the boundaries between facts, which are not protectable by copyright, and compilations of facts or data, which could be protected (raising questions of what constituted a “substantial part” of a factual database or compilation – some of those non-copyrighted facts?). It extended copyright protection to a realm of works not previously contemplated as falling within copyright’s scope, and encouraged overly broad copyright assertions by opportunistic compilers of data.

In my opinion, the result reached by Justice Gordon in Telstra Corp v Phone Directories continues the sensible approach of the IceTV High Court in reining in the overreach of copyright in this area.

The Future of Data Policy

The Microsoft External Research Division has launched a book entitled The Fourth Paradigm: Data-Intensive Scientific Discovery (2009), edited by Tony Hey, Stewart Tansley, and Kristin Tolle. The book was launched on the opening day of the Microsoft eScience Workshop that took place in Pittsburgh, USA from 15-17 October 2009. The book includes a chapter, ‘The Future of Data Policy’ (pp 201-208), authored by Professor Anne Fitzgerald, Professor Brian Fitzgerald and myself. The book is licensed under a Creative Commons Attribution-Share Alike 3.0 United States licence, and can be downloaded in its entirety or by chapter at The Fourth Paradigm.

Copyright protection and data compilations

Over on the Digital Curation Blog, Chris Rusbridge has an interesting post entitled “Are research data facts and does it matter?”

I have just posted a response, which I am reproducing here:

I would like to take this opportunity to explain some of the research we have undertaken in the OAK Law Project and conclusions we have reached regarding copyright protection of data compilations in Australia. We have two primary publications addressing this area: Building the Infrastructure for Data Access and Reuse in Collaborative Research: An Analysis of the Legal Context and Practical Data Management: A Legal and Policy Guide.

s10(1) of the Australian Copyright Act 1968 defines a literary work to include a “compilation”. This is where protection for data compilations under Australian law derives from. Any data that is collected, arranged, organised and presented in a logical fashion will usually be regarded as a compilation.

Chris makes a good point that many data compilations will require a great deal of effort, analysis and creativity. In the US, creativity is a requirement before a data compilation can be protected by copyright. In Australia, creativity is not required. All that is required is that the compilation is a result of the exercise of skill, knowledge or judgment in the arrangement of the data, or the investment of substantial labour or expense in collecting the material (Desktop Marketing v Telstra).

It can often be difficult to tell whether a compilation is one that would attract copyright protection. In our work, we have tended to err on the side of caution and assume that most compilations will attract copyright protection. This is because the threshold in Australia is so low. The main case in this area, Desktop Marketing v Telstra, involved the copying of a telephone directory. A telephone directory is merely a compilation of names and numbers listed in alphabetical order. If this is a compilation that attracts copyright, then most other compilations are likely to be protected by copyright under Australian law as well.

Copyright law does not protect mere facts or information. Rather, it protects the expression of facts or information in a material form. This means that generally there would not be a problem with copying some of the basic facts contained in a compilation. For example, if I were to list the names and numbers of a small collection of my colleagues on my website, that would not usually be a problem. I have extracted the data that I need, in a fairly “random” fashion (in that I have not just copied a few pages of names and numbers in alphabetical order directly from the White Pages). I have not copied the way that the data is arranged in the telephone directory (the “expression”).

With regard to Science Commons’ decision to discontinue advocating the application of Creative Commons licences to data compilations, my understanding is that they came to this decision for two reasons:
(1) It was not always clear in the US whether the relevant compilation attracted copyright. If it did not but a person had put a CC licence on the compilation in the mistaken belief that it did, then restrictions would have been imposed on that dataset (e.g. that it could only be used non-commercially) which actually had no legal basis for being imposed; and
(2) CC licences all contain an attribution requirement and Science Commons were concerned about what they call “attribution stacking” – i.e. where a dataset is compiled from data contributed by many different researchers, it would be extremely difficult for a user to attribute all of those researchers.

At OAK Law, we still believe that CC licences can be applied to datasets in Australia because the concerns noted by Science Commons do not arise to the same degree in Australia. Firstly, we have a lower threshold test for copyright protection, meaning that copyright will more readily attach to datasets in Australia and the first problem noted by Science Commons is less likely to occur. Nevertheless, to be sure, we usually advocate that the widest CC licence – the attribution only licence – be applied to datasets. Secondly, unlike in the US, Australian copyright law includes Moral Rights, meaning that creators have to be attributed anyway, regardless of whether a CC licence is applied or not. We think there are various ways of getting around the “attribution stacking” problem – for example, a group of researchers could agree on a common way to be attributed (e.g. we could be attributed as “the OAK Law Project”), or the data could be attributed using a URL, which an interested party can visit and which can list all the contributors (and this list can be added to over time). The advantage of applying CC licences to data, in our view, is that it provides some certainty to users about what they can and cannot do with that data.

If you are interested in this issue, I would also advise reading these posts by Robin Rice and Rufus Pollock.
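The attribution-by-URL idea mentioned above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class name `DatasetCredit`, its fields and the example.org address are hypothetical and are not taken from any OAK Law or Creative Commons tool. The point it shows is that the credit line a reuser must reproduce stays the same while the list of contributors behind the URL grows.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetCredit:
    """Hypothetical record for crediting a dataset compiled by many people."""
    title: str
    group_name: str          # agreed common name, e.g. "the OAK Law Project"
    contributors_url: str    # page that lists every contributor
    contributors: list = field(default_factory=list)

    def add_contributor(self, name):
        # The list behind the URL can grow over time without duplicates.
        if name not in self.contributors:
            self.contributors.append(name)

    def attribution(self):
        # One short credit line, independent of how many people contributed.
        return f"{self.title}, by {self.group_name} (contributors: {self.contributors_url})"


credit = DatasetCredit(
    title="Example survey data",
    group_name="the OAK Law Project",
    contributors_url="https://example.org/contributors",  # placeholder address
)
line_before = credit.attribution()
credit.add_contributor("Researcher A")
credit.add_contributor("Researcher B")
line_after = credit.attribution()
```

Here `line_before` and `line_after` are identical, which is the property that avoids “attribution stacking”: users cite the group and the URL, and the full list is maintained in one place.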

Access to Victorian fire data

Yesterday afternoon an interesting story appeared on ZDNet Australia: “Vic Govt limited Google’s bushfire map”. I encourage you to read the full post on ZDNet Australia, but in summary, the post documents Google’s trouble in gaining access to Victorian Government data about the movement of bushfires in Victoria.

According to the post, Google has been working with the Country Fire Authority, which manages fires on private lands, to overlay the Authority’s data onto Google Maps to produce a real-time map of the locations of the fires. The map also uses a colour scheme to convey the seriousness of the fires: green (safe), yellow (controlled), orange (contained) and red (ongoing).

Naturally, this map is immensely beneficial to those in Victoria and elsewhere who are attempting to track the bushfires.

However, Google has run into some problems gaining access to data to plot fires on public lands. This data is owned and controlled by the Victorian Department of Sustainability and Environment, and is covered by Crown copyright. As such, permission is required from the government before the data can be used, and for Google this permission has not been forthcoming. The result is that Google has been unable to plot this data onto their map.

As noted in the ZDNet Australia post, this is not the first time Google has had trouble accessing and using Australian government data. They were expressly denied permission from the Commonwealth Department of Health and Ageing to overlay data from the National Public Toilet Map onto a Google Map.

Why is the government so unwilling to share its data? My guess is that there are two possible reasons. The first is that in some cases, the government has a misguided idea that data can be used to build online systems or services (usually these will be geospatial systems or services) which can be used to generate revenue by charging for access. The other is that the government is naturally risk-averse and would prefer to control its data as tightly as possible.

What the government is forgetting is that it is a representative of the people and the government-owned data has been collected using public funds. We, the Australian public, have paid for that data through our taxes and as such, we should have the benefit of that data. Surely it is most beneficial for the public if we can have ready access to that data in the most efficient and convenient way possible. And if that is through a Google Map, then the government should enable this. There can be no argument that in the face of tragedy such as the Victorian bushfires, the government should not hinder our ability to access as much information as possible about that tragedy. This includes the ability to easily track those bushfires via a Google Map.

Arguments have been made that as the access and use issue can be traced back to Crown copyright, then Crown copyright should be removed, as is the case in the United States where government data and publications are held to be in the public domain. I do not believe that this is the answer. Rather than remove Crown copyright completely, the government should be encouraged to release its material where possible under open licences such as the Creative Commons Attribution licence. This should be the default position, unless access to the material must be restricted due to privacy or national security concerns. The government must engage in a “push” model – where it systematically “pushes” its material out to the community – rather than a “pull” model – where members of the public must seek permission or lodge a Freedom of Information request to access that material. Crown copyright can serve an important purpose, if only through the operation of the requirement of attribution (a requirement imposed through the Creative Commons licence, similar to moral rights), which requires the author of the material (in this case, the government) to be attributed wherever the material is reproduced. The requirement of attribution for government copyright material can serve a two-fold purpose – (1) it allows the government to retain some control over the material it produces; and (2) it verifies to the public that the material has come from a reliable source.

Our research group at QUT has done some work on this area. See the auPSI website for more information.

ARROW Repository Day

On 14 October 2008, I attended the ARROW Repository Day held in Customs House in Brisbane. I presented on the legal issues surrounding management of data for inclusion in a repository. You can access my slides here.

Chris Rusbridge of the Digital Curation Centre in the UK also presented. Some brief notes from his talk are below. Chris was live blogging the day, so if you are interested I suggest you read his notes at the Digital Curation Blog.

Chris Rusbridge (Digital Curation Centre) – Moving the repository upstream

The resistant scholar

  • Uncertainty, risk – about copyright; about Ingelfinger Rule
  • Change
  • Too busy
  • Doesn’t fit into the way they do things now
  • Not well motivated by advantages to others
  • Little in it for them!

Research workflow

  • many different tasks in parallel
  • all different stages
  • teaching (several), research (several), writing up research, writing grant proposals, reviewing papers, administrative tasks etc

On negative clicks

Asked – how many extra clicks are you willing to make to ensure preservation of your record?

Answer – zero

Negative click repository?

Can the repository help rather than hinder?
Towards a Research Repository System? [diagram]

Maybe we could…

  • help with publisher liaison
  • support multiple authoring across several institutions
  • more permissive identity management
  • support multiple versions
  • fine grained access control
  • checkpointing
  • support supplementary data
  • provide basic data management capability
  • provide simple, cross-platform, persistent storage
  • provide some longevity
  • provide additional benefits

ANDS Workshop at eResearch Australasia Conference

On Thursday 2 October, I attended the Australian National Data Service (ANDS) Workshop at the eResearch Australasia Conference 2008. This was a full day workshop, but the ANDS team did a great job of keeping the workshop interesting and highly interactive, and the day went very quickly.

In the morning, there were a few brief presentations – notably from Andrew Treloar of Monash University and the ANDS Establishment Project and Tracey Hind from CSIRO. I particularly enjoyed Tracey’s presentation, which, at a conference that seemed dominated by IT issues, focused on the social and governance issues involved in data management and sharing research data. My notes from Tracey’s talk are below.

The rest of the day was spent in small round-table discussions. The most lively discussion surrounded questions about what institutions and research bodies need to help them in managing and sharing their data, and how ANDS could help. The group found that there was a need for:

  • an openly accessible registry of ontologies for metadata of datasets, so that institutions can start using common and enduring metadata to describe their data;
  • training for researchers, repository managers, research management staff, librarians, archivists and IT staff about data management (including the legal issues surrounding data management), database/repository infrastructure (how to make the database easy to use and sustainable), open access (why should you share your data?) and metadata. It was agreed that the training materials might have a generic introduction component that could be used by all groups, but then there should be different kinds of training materials that provide relevant detail to different groups (e.g. research management staff will have different concerns from IT staff; science researchers may have different concerns from humanities researchers);
  • developing conventions for the citation of data, so that researchers can get credit for sharing their data; and
  • proper and comprehensive data management plans (DMP).

There was a consensus that data management plans were particularly important and that it would be useful to develop template DMPs which included specific sections that could be added or deleted as appropriate (for example, a section about compliance with privacy laws might be relevant to medical research but not to astronomy research). It was also thought that ANDS could select a few research projects from different disciplines and assist these projects in formulating a DMP. The resulting DMPs could then be made available online for other projects to use and adapt.
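To make the template idea concrete, here is a minimal Python sketch of a DMP template with optional sections. It is an illustration only, not anything ANDS produced: the `Section` class, the trait label `human_subjects` and every section title other than the privacy one are my own assumptions. The only part drawn from the workshop discussion is that a privacy-compliance section applies to medical research but not to astronomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """One section of a hypothetical DMP template."""
    title: str
    # Traits that make the section relevant; empty means "always include".
    only_if: frozenset = frozenset()


# Assumed example sections; a real template would be agreed by the community.
TEMPLATE = [
    Section("Data description"),
    Section("Storage and backup"),
    Section("Licensing and reuse"),
    Section("Compliance with privacy laws", frozenset({"human_subjects"})),
]


def build_dmp(project_traits):
    """Return the section titles that apply to a project with these traits."""
    traits = set(project_traits)
    return [
        section.title
        for section in TEMPLATE
        if not section.only_if or section.only_if & traits
    ]


medical_dmp = build_dmp({"human_subjects"})
astronomy_dmp = build_dmp(set())
```

With these assumptions, the medical project’s plan keeps the privacy section and the astronomy project’s plan drops it, while both share the core sections. Exemplar DMPs published online could be adapted in the same way, by switching sections on or off per project.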

In relation to ANDS selecting particular projects to assist, in a broader way, with their data management and release (“engagement targets”) in the hope that these projects might then appear as “exemplar projects” for other groups, it was considered that appropriate selection criteria might be:

  • broadness of audience and impact;
  • potential for reuse of data and the ongoing reusability/sustainability of the data;
  • the project’s willingness to assist others to develop their data management skills;
  • wide inter-disciplinary appeal;
  • willingness to transfer data around; and
  • projects which will have good exemplary value to attract other communities.

I believe that ANDS will make the notes taken from the workshop available online.

Here are my notes from Tracey’s talk:

Tracey Hind – CSIRO

  • ownership of data should stay with researcher
  • but still need to manage CSIRO’s data at a higher level – maybe provide an “enabling” service for this rather than dictate a “one size fits all” approach
  • As of now, CSIRO still does not formally recognise the idea of data management
  • Real challenges are not technology – it is the human factors – issues of acceptance, understanding, people being prepared to share their data, IP etc
  • High demand for storage, but storage is not management
  • Scientists are not working as well across disciplines as the Flagship vision had hoped; much of this is because “you don’t know what you don’t know” – and it’s hard getting insight into other research disciplines
  • Making data easily discoverable is the key to achieving multi-disciplinary outcomes
  • Lesson is that data is a complex issue – especially when researchers don’t understand the potential benefits – you need exemplar projects to demonstrate the benefits of data management to get buy in.
  • CSIRO’s data management vision (eSIM) – CSIRO scientists will be able to…gather, analyse and share scientific information securely and efficiently, leading to greater scientific outcomes for Australia
  • Four layers – people, processes, technology and governance
  • People challenges = incentives for deposit into a repository
  • Process challenges = making sure that the workflows created actually support the technology and make things easy
  • Governance = making sure all of this is properly funded and that data management is a part of the decision making (i.e. make sure researchers have a DMP before they are awarded funding)
  • CSIRO’s exemplar projects = Auscope project; Atlas of Living Australia; Corporate Communications

eResearch Australasia Conference 2008 – Tuesday morning (30 September)

John Wilbanks – Uncommon Knowledge and e-Research

Once again, John Wilbanks gave an informative and dynamic presentation. It was geared towards the audience in attendance here at the eResearch Australasia Conference (who are somewhat more IT and science focused than the audience at the OAR conference last week) and so described in detail many aspects of the NeuroCommons Project. If you are interested, I suggest that you see the Neurocommons website. I don’t think any summary that I could provide here would do the project justice. But here are some notes from the beginning of John’s presentation:

Why “eResearch”?

1. eResearch is a requirement imposed on us by the flood of data

  • the web doesn’t give us the same results for science as it does for culture
  • so what can we do?
  • We can…collaborate
  • Eg – Watson and Crick – their success was composed by building on a series of blocks of knowledge that were available to them from a range of sources
  • But humans can’t build models to scale anymore
  • We need to utilize digital resources

One way to think about eResearch is that it is about:

  • Finding the right collaborator;
  • making big discoveries;
  • getting credit for one’s work

2. We need to convert what we know into digital formats that support model building

  • “the web” – no organising topics – hyperlinking allows us to organise things in a dynamic way
  • all the data and all the ideas: building blocks
  • open access attempts to solve the legal problems – giving credit where credit is due; allows humans to read the papers; allows publicly funded research to be accessed by the public
  • but it doesn’t solve the technical problem of paper-based formats that cannot be read by machines
  • we need to develop machine-searchable formats

Kerstin Lehnert, Columbia University – New Science Communities for Cyberinfrastructure: The Example of Geochemistry

Kerstin described eResearch as a vision to provide a genuine infrastructure of highly reliable, widely accessible ICT capabilities to assist researchers in their work – ultimately, it is about people.

She discussed the cultural issues involved in sharing data. She identified data citation (what I would call “attribution”) as a big problem. How can all scientists and contributors be cited? Many want to be attributed personally (not just by a project), but there are so many contributors and this quickly becomes a big and messy problem. This observation reflects the problem that we at the OAK Law and Legal Framework to eResearch Projects identified in assessing whether Creative Commons licences could be applied to data compilations. Attribution is an important condition of the CC licence. Researchers and research projects need to decide and identify (before applying a CC licence) how the data compilation is to be attributed, otherwise users could run into all sorts of problems and confusion.

Jane Hunter (UQ) – National Committee for Data in Science (NCDS)

A committee of the Australian Academy of Science – established in February 2008; member of CODATA

Mission – to promote enduring access to Australia’s scientific data assets in order to drive national research and innovation
And to provide a National Data Science voice
Encourage and facilitate cross-fertilisation between specific science disciplines and other data generation/management disciplines

Future activities include engaging with Chairs of other national committees, including looking at what role they can play within ANDS (Australian National Data Service) to support their goals.

APSR Workshop – The Data Management Plan: Putting Policy into Practice

On Friday 8 August 2008, I attended the Australian Partnership for Sustainable Repositories (APSR) Workshop, “The Data Management Plan: Putting Policy into Practice” at the University of Melbourne.

Professor Anne Fitzgerald, with whom I work at QUT, gave an excellent and very well received presentation on the legal issues surrounding data management. Her slides can be viewed here.

Here are my notes from the workshop (made roughly during the day):

Data management plans: from idea to reality (10:15am – 10:45am)

Dr Markus Buchhorn (ANU) for Karen Visser

  • We need enduring systems that outlive projects and programs
  • Individuals are human – seven deadly fears:
  1. fear of missed “nuggets” in their data – milk it for everything, for ever and ever
  2. fear of missed errors
  3. fear of unknown custodians/stewards
  4. fear of inappropriate leaks (privacy/ethics) – can ruin trust relationships with others
  5. fear of the cost of effort
  6. fear of lack of recognition
  7. fear of trusting someone else’s data

Plan ahead – help researchers to help themselves as far as possible
Build relationships of trust with researchers – engage with researchers as early as possible

Mark Euston (ANU – Information Literacy Program)

  • tasked with developing a training course, workshop and online, for early to mid career researchers, on Data Management Plans (DMP)
  • Objectives of the course –
  1. what is Data Management (DM)?
  2. benefits and requirements
  3. raising awareness of DM services
  4. DMP
  • Manual based on Guidance on Data Management (UK) and Guide to Social Science Data Preparation and Archiving
  • get researchers in by stressing how they can work with their data more effectively and efficiently

What’s happening at… (11:10am – 12:30pm)

Belinda Weaver (UQ)

Issues for the data survey:

  • no ‘joined up’ services
  • no help
  • inequity – not fair – nothing works etc.
  • costs
  • lack of training (people felt insecure about what they were doing)
  • uncertainty
  • no incentive, no rewards

Recommendations from focus group:

  • standardised DM template for funding applications
  • legal advice centralised and accessible
  • service focused support teams for research projects – specific to the discipline
  • survey of all existing data
  • central data storage system
  • develop a clear UQ data management policy
  • templates

Central management of research data – issues:

  • trust
  • data integrity
  • accidental disclosure
  • control
  • sharing
  • re-use (want to know what use has been made of their data – auditing – and if they give data to a person for a particular purpose, they want to know if the person ends up not using the data, or not using it for that purpose)
  • the long term

Wish-lists:

  • clear policy and guidelines
  • account manager
  • specialists on teams (want to know who to go to for advice)
  • career path?
  • rewards
  • templates for everything
  • funding to do it properly
  • advice and consultancy
  • institutional support
  • tools (but they want to be told only when they want to be told, and be told how they want to be told)

presentations from workshop available at: http://www.library.uq.edu.au/escholarship/orca.html
UQ developing an expert curation advice service

Lyle Winton (Uni of Melbourne)

  • Uni of Melbourne have a research DMP template
  • looking at training for undergrad students
  • looking at how to keep this up to date
  • possible data management registries
  • from 500 charges of research misconduct, 40% could have been avoided by good data management

http://www.esrc.unimelb.edu.au/dmp/references.html

Suzanne Clarke (Monash)

  • Monash has a Data Management Committee
  • Research Data Management Toolkit for librarians so they know what to talk about to researchers
  • Identified needs: more education required for researchers on statutory requirements for data, IP and the ownership of research data

Gillian Elliot (University of Otago – NZ)

  • As far as she is aware, NZ has no policies surrounding data management
  • so NZ is in quite a different position to Australia
  • Survey in 2007 – researchers in NZ had a lot of data, and a lot of loosely assembled material that was unpublished and hard to classify – they need help with data management
  • data management and copyright concerned researchers – 48% of survey respondents
  • Atlas of Living Australia; Convention on Biological Diversity; Department of Conservation and Land Information New Zealand; Land Care NZ; National Vegetation Survey Databank

Dr Ashley Buckle (TARDIS – Monash University)

  • TARDIS is a multi-institutional collaborative venture that aims to facilitate the archiving and sharing of raw X-ray diffraction images
  • Protein Data Bank – growing exponentially – too much data?
  • Benefits to making raw data available – experiment reproducibility/validation

Discussion Groups: Group 2 – Processes for Data Management Planning (1:15pm-2:45pm)

How do we make DM part of the usual research practice?

How can we make raw data count as a citation? – for funding etc. – this is very important. If there is greater recognition of the value of data in itself as a citable object, then researchers will be more willing to manage their data properly.

Ashley Buckle – we need “data journals” – essentially the same as a database but greater recognition

DM needs to give you a reward at the end that is at the same level as rewards from publication

Better tools – build researchers tools that are so good that they do not actually realise that they are managing their data.

Reporting back to main group and discussion (2:45pm-4:00pm)

1. Roles, rights and responsibilities

  • Anne Fitzgerald’s domains of responsibility
  • Policy plus principles
  • disseminate research data as widely as possible
  • develop practical toolkits
  • risk management for universities
  • simple for universities to complete
  • ongoing legal and policy advice
  • insert data management requirements into research proposals and grants
  • get recognition via NHMRC, ARC and ERA to provide regulatory and reward structure
  • need for national centre for legal policy and advice in regard to the data lifecycle including reuse
  • universities to incorporate data management into risk management strategies
  • provide pragmatic family of licences/responsibility statements (like CC) to identify roles and policies
  • DMPs to be built into research project formulation and management

2. Processes for data management planning

  • better tools and incentives: build better workflows
  • allowing data management in their modelling: harness tools onto repositories
  • citation: make sure that citation of datasets happens and is rewarded, as incentive for researchers to create good data
  • persuade ARC to make an explicit expression of intent in ERA to credit data citation eventually (at least down the road). This to go forward as a formal submission from this workshop
  • infrastructure: development of a COHERENT NATIONAL NETWORK of repositories, emphasis on discipline specific repositories (though institutionally supported) as a centre for research activity

3. Making it work

  • know what you don’t know
  • each institution needs to:
  1. identify the needs of its researchers (possible role for ANDS here)
  2. map the available services (needs to happen locally)
  3. strategically target the gaps
  4. identify candidate services to drop to fund this
  • Make it easy
  • provide a visible point of contact for the users
  • not necessarily through one channel only
  • not necessarily a one size fits all solution
  • embed regular formal training in how to use services
  • needs to be as easy to use as “MyFlickBook”
  • outreach, marketing, publicity
  • Start small and scale
  1. seed the service and gradually expand it as understanding grows
  2. start with young researchers and use peer group pressure over time
  3. get good examples going first to generate some quick wins
  4. use growth in tandem with policy
  • Reward innovators in shared services
  1. provide annual performance incentives for going beyond meeting strategic goals
  2. encourage shared services staff to learn new skills
  3. create new job descriptions for new people in management