Copyright protection and data compilations

Over on the Digital Curation Blog, Chris Rusbridge has an interesting post entitled “Are research data facts and does it matter?”

I have just posted a response, which I am reproducing here:

I would like to take this opportunity to explain some of the research we have undertaken in the OAK Law Project and conclusions we have reached regarding copyright protection of data compilations in Australia. We have two primary publications addressing this area: Building the Infrastructure for Data Access and Reuse in Collaborative Research: An Analysis of the Legal Context and Practical Data Management: A Legal and Policy Guide.

Section 10(1) of the Australian Copyright Act 1968 defines a literary work to include a “compilation”. This is the source of protection for data compilations under Australian law. Any data that is collected, arranged, organised and presented in a logical fashion will usually be regarded as a compilation.

Chris makes a good point that many data compilations will require a great deal of effort, analysis and creativity. In the US, creativity is a requirement before a data compilation can be protected by copyright. In Australia, creativity is not required. It is enough that the compilation results from the exercise of skill, knowledge or judgment in the arrangement of the data, or from the investment of substantial labour or expense in collecting the material (Desktop Marketing v Telstra).

It can often be difficult to tell whether a compilation is one that would attract copyright protection. In our work, we have tended to err on the side of caution and assume that most compilations will attract copyright protection. This is because the threshold in Australia is so low. The main case in this area, Desktop Marketing v Telstra, involved the copying of a telephone directory. A telephone directory is merely a compilation of names and numbers listed in alphabetical order. If this is a compilation that attracts copyright, then most other compilations are likely to be protected by copyright under Australian law as well.

Copyright law does not protect mere facts or information. Rather, it protects the expression of facts or information in a material form. This means that generally there would not be a problem with copying some of the basic facts contained in a compilation. For example, if I were to list the names and numbers of a small collection of my colleagues on my website, that would not usually be a problem. I have extracted the data that I need, in a fairly “random” fashion (in that I have not just copied a few pages of names and numbers in alphabetical order directly from the White Pages). I have not copied the way that the data is arranged in the telephone directory (the “expression”).

With regard to Science Commons’ decision to discontinue advocating the application of Creative Commons licences to data compilations, my understanding is that they came to this decision for two reasons:
(1) It was not always clear in the US whether the relevant compilation attracted copyright. If it did not but a person had put a CC licence on the compilation in the mistaken belief that it did, then restrictions would have been imposed on that dataset (e.g. that it could only be used non-commercially) which actually had no legal basis for being imposed; and
(2) CC licences all contain an attribution requirement and Science Commons were concerned about what they call “attribution stacking” – i.e. where a dataset is compiled from data contributed by many different researchers, it would be extremely difficult for a user to attribute all of those researchers.

At OAK Law, we still believe that CC licences can be applied to datasets in Australia because the concerns noted by Science Commons do not arise to the same degree in Australia. Firstly, we have a lower threshold test for copyright protection, meaning that copyright will more readily attach to datasets in Australia and the first problem noted by Science Commons is less likely to occur. Nevertheless, to be sure, we usually advocate that the widest CC licence – the attribution-only licence – be applied to datasets. Secondly, unlike in the US, Australian copyright law includes moral rights, meaning that creators have to be attributed anyway, regardless of whether a CC licence is applied or not. We think there are various ways of getting around the “attribution stacking” problem – for example, a group of researchers could agree on a common way to be attributed (e.g. we could be attributed as “the OAK Law Project”), or the data could be attributed using a URL, which an interested party can visit and which can list all the contributors (and this list can be added to over time). The advantage of applying CC licences to data, in our view, is that it provides some certainty to users about what they can and cannot do with that data.

If you are interested in this issue, I would also advise reading these posts by Robin Rice and Rufus Pollock.