Data and text mining

168. The growth of digital technology and online social networking has seen increasing amounts of data—text, images, numbers—stored in databases and repositories.[195] The UK Government has defined data and text mining as:

Automated analytical techniques … [that] work by copying existing electronic information, for instance articles in scientific journals and other works, and analysing the data they contain for patterns, trends and other useful information.[196]

169. Data mining is used across a number of research sectors, including medicine, business, marketing, academic publishing and genomics. Some examples include:

  • mining of human DNA sequences to discover the individual risk of developing diseases;

  • systematic reviews of literature and text to establish the current state of knowledge in a particular field;[197] and

  • mining Twitter feeds to gain knowledge about ‘consumer sentiment’.[198]

170. The Terms of Reference refer to the general interests of Australians to ‘access, use and interact with content in the advancement of education, research and culture’. Researchers and research institutions have highlighted the value of data mining in paving the way for novel discoveries, increased research output and early identification of problems.[199]

171. At the commercial level, the ability to extract value from data is an increasingly important feature of the digital economy. For example, the McKinsey Global Institute suggests that data has the potential to generate significant financial value across commercial and other sectors, and become a key basis of competition, underpinning new waves of productivity growth, innovation and consumer surplus.[200]

Current law

172. There is no specific exception in the Copyright Act for data mining. Where the data mining process involves the copying, digitisation, or reformatting of copyright materials without permission, it may give rise to copyright infringement.[201] For example, a researcher who seeks to mine the data from a back catalogue of a journal may need to copy entire works (individual articles) as part of the technical process, but cannot do so without the permission of the copyright owner and publisher.

173. One issue is whether data mining, if done for the purposes of research or study, would be covered by the fair dealing exceptions. The reach of the fair dealing exceptions may not extend to data mining if the whole dataset needs to be copied and converted into a suitable format. Such copying would be more than a ‘reasonable portion’ of the work concerned.[202] Nor is it clear whether copying for data mining would fall under the exception relating to temporary reproduction of works as part of a technical process, under s 43B of the Copyright Act.

174. Data mining overlaps with the issue of database protection. The High Court’s decision in IceTV Pty Ltd v Nine Network Australia Pty Ltd emphasises that copyright protection does not subsist in the underlying data that forms a database, but rather in the particular form of expression.[203] The Court referred to the apparent lack of protection of databases as a gap in the law.[204] This concern raises arguments, for example, that an unremunerated exception would remove incentives to convert data into the right forms, or to develop or provide services to the research sector. However, the scope of copyright protection of databases is outside the ALRC’s Terms of Reference.

Reform options

175. The need for a specific data mining exception has been hotly contested in the UK. The Hargreaves Report recommended that the UK Government ‘press at EU level for the introduction of an exception allowing uses of a work enabled by technology which do not directly trade on the underlying creative and expressive purpose of the work’.[205] The report also recommended that the Government ensure that such an exception cannot be overridden by contract.[206]

176. As a result of the Hargreaves Report, the Joint Information Systems Committee examined the value and benefit of data and text mining to the UK higher education sector. Its report broadly affirmed the potential value of text mining to the UK economy, and found that the potential benefits are limited by current copyright law.[207]

177. A follow-up report of the Business, Innovation and Skills Committee of the UK Parliament, in response to the Hargreaves Report, did not endorse an exception to deal with data mining for research. Rather, it hinted that licensing models may be the appropriate response:

We believe that policy … should also recognise the potential benefits of content mining, the core contribution of researchers and the need for ready access. We believe that publishers should seek rapidly to offer models in which licences are readily available at realistic rates to all bona fide licensees.[208]

178. No jurisdiction appears to have an existing exception specifically dealing with data mining. However, any new broad flexible exception based on a concept of ‘fair’ or ‘reasonable’ use may also be expected to cover some data mining processes.[209]

Discussion

179. It appears that copyright issues related to data mining are most prominent in the academic and scientific arenas. The Hargreaves Report suggested that an exception is particularly appropriate to facilitate non-commercial research, because

the technology provides a substitute for someone reading all the documents—these uses do not compete with the normal exploitation of the work itself—indeed, they may facilitate it. Nor is copyright intended to restrict the use of facts.[210]

180. The legal uncertainty and the transaction costs involved in rights clearance may impede researchers’ access to data, which may in turn reduce research output. The lack of a data mining exception may also act as a disincentive to the uptake of innovative data mining technology.

181. The ALRC is interested in stakeholder views about how data mining tools are being used in Australia and whether such uses are impeded by the Copyright Act. If a specific exception to allow data mining is needed, how should such an exception be framed? Should it be confined to non-commercial research? Or are there other, better ways of providing for the legitimate use of data mining and data analytics software?

Question 25. Are uses of data and text mining tools being impeded by the Copyright Act 1968 (Cth)? What evidence, if any, is there of the value of data mining to the digital economy?

Question 26. Should the Copyright Act 1968 (Cth) be amended to provide for an exception for the use of copyright material for text, data mining and other analytical software? If so, how should this exception be framed?

Question 27. Are there any alternative solutions that could support the growth of text and data mining technologies and access to them?

[195] S Sirmakessis, Text Mining and its Applications: Results of the Nemis Launch Conference (2004).

[196] As defined by UK Government Intellectual Property Office, Consultation on Copyright (2011), 80. See also S Džeroski, ‘Data Mining in a Nutshell’ in S Džeroski and N Lavrač (eds), Relational Data Mining (2001). Data mining programs are often called data-analytics software.

[197] Joint Information Systems Committee, The Value and Benefit of Text Mining to UK Further and Higher Education (2012), 15.

[198] F Filloux, Datamining Twitter: Making Sense of the Twitter Noise is About to get Easier (2011) <www.guardian.co.uk/technology/2011/dec/05/monday-note-twitter> at 26 July 2012, referring to companies such as DataSift and Lexalytics that provide data mining software.

[199] United Kingdom Government, Consultation on Copyright: Summary of Responses (2012), 17.

[200] McKinsey Global Institute, Big Data: The Next Frontier for Innovation, Competition and Productivity (2011), Executive Summary. The report suggests that big data equates to financial value of $300 billion (US health care); 250 billion Euros (EU public sector administration); and, for global personal location data, $100 billion in revenue for service providers and $700 billion for end users.

[201] See Copyright Act 1968 (Cth) s 31, giving the copyright owner the exclusive rights to the work.

[202] Ibid s 40(5) setting out what is a ‘reasonable portion’ with respect to different works.

[203] IceTV Pty Ltd v Nine Network Australia Pty Ltd (2009) 239 CLR 458.

[204] Ibid, [137]–[139]. Compare, for example, Directive 96/9/EC of the European Parliament and the Council of 11 March 1996 on the legal protection of databases, OJ L 77, 27.3.1996 (entered into force on 16 April 1996).

[205] I Hargreaves, Digital Opportunity: A Review of Intellectual Property and Growth (2011), 47.

[206] Ibid, 51. See the section below, ‘Copyright and Contracts’.

[207] Joint Information Systems Committee, The Value and Benefit of Text Mining to UK Further and Higher Education (2012), 49.

[208] House of Commons Business, Innovation and Skills Committee, The Hargreaves Review of Intellectual Property: Where next? (2012), 19.

[209] See the section ‘Fair use’.

[210] I Hargreaves, Digital Opportunity: A Review of Intellectual Property and Growth (2011), 47.