Data-matching and data-mining

9.48 Rapid advances in information and communication technology since the 1970s have enabled agencies and organisations to collect and store vast amounts of personal information. This information is often generated by individuals conducting everyday activities, such as:

withdrawing cash from ATMs; paying with debit or credit cards; using loyalty cards; borrowing money; writing cheques; renting a car or a video; making a telephone call or an insurance claim; and, increasingly, sending or receiving e-mail and surfing the Net.[93]

9.49 In addition, some technologies enable large amounts of personal information to be organised and analysed. Two methods of processing and analysing information are discussed in this section—data-matching and data-mining. This chapter discusses data-matching and data-mining outside the health and research context. A number of models that enable the linking of non-identifiable personal information for the purposes of health and medical research are discussed in Chapter 66.

9.50 Data-matching is ‘the large scale comparison of records or files … collected or held for different purposes, with a view to identifying matters of interest’.[94] Developments in information technology in the 1970s made data-matching economically feasible and it is conducted regularly in Australia, particularly by government agencies.[95] Data-matching can be conducted for a number of purposes, including to detect errors and illegal behaviour, locate individuals, ascertain whether a particular individual is eligible to receive a benefit, and facilitate debt collection.[96]
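By way of illustration only, the comparison of records described in paragraph 9.50 can be sketched in a few lines of Python. This is a minimal sketch and not a description of any agency's actual program: the two files, the field names (`name`, `dob`, `benefit`, `employer`), the individuals and the matching rule (exact match on a normalised name and date of birth) are all hypothetical assumptions. Real data-matching programs generally use more careful identity resolution.

```python
def normalise(name, dob):
    """Build a comparison key (hypothetical rule: lower-cased name plus date of birth)."""
    return (" ".join(name.lower().split()), dob)


def match_records(file_a, file_b):
    """Return pairs of records from two files that share the same normalised key."""
    # Index the second file by key so each record in the first file is looked up once.
    index = {}
    for rec in file_b:
        index.setdefault(normalise(rec["name"], rec["dob"]), []).append(rec)

    matches = []
    for rec in file_a:
        for other in index.get(normalise(rec["name"], rec["dob"]), []):
            matches.append((rec, other))
    return matches


# Two invented files, collected for different purposes.
benefit_file = [
    {"name": "Alex  Nguyen", "dob": "1980-02-14", "benefit": "unemployment"},
    {"name": "Sam Taylor", "dob": "1975-09-30", "benefit": "unemployment"},
]
payroll_file = [
    {"name": "alex nguyen", "dob": "1980-02-14", "employer": "Acme Pty Ltd"},
    {"name": "Jo Smith", "dob": "1990-01-01", "employer": "Acme Pty Ltd"},
]

matches = match_records(benefit_file, payroll_file)
for benefit_rec, payroll_rec in matches:
    print(benefit_rec["name"], "->", payroll_rec["employer"])
```

A match of this kind is only a 'matter of interest' for further inquiry. It does not itself establish an error or illegal behaviour, and two different individuals could share a name and date of birth.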

9.51 Data-mining has been defined as ‘a set of automated techniques used to extract buried or previously unknown pieces of information from large databases’.[97] Data-mining can be used in different contexts to achieve different goals. For example, it is increasingly used by organisations to enable them to ‘design effective sales campaigns, precision targeted marketing plans, and develop products to increase sales and profitability’.[98] Data-mining can also be used by law enforcement agencies to investigate criminal activities. For example, in 2006 it was reported that the National Security Agency in the United States was collecting telephone records of millions of Americans to analyse calling patterns in an effort to detect terrorist activities.[99]

9.52 There are three main steps in the data-mining process: (1) the data are prepared (or ‘scrubbed’) for use in the data-mining process; (2) a data-mining algorithm is used to process the data; and (3) the results of the data-mining process are evaluated.[100]
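The three steps in paragraph 9.52 can be sketched as follows. This is a simplified illustration under stated assumptions and not the method described in the cited sources: the purchase data are invented, the 'algorithm' is a simple count of items that appear together, and the function names (`scrub`, `count_pairs`, `evaluate`) and the `min_support` threshold are hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations


def scrub(raw_baskets):
    """Step 1: prepare the data by dropping blanks, normalising case and removing unusable records."""
    cleaned = []
    for basket in raw_baskets:
        items = {item.strip().lower() for item in basket if item and item.strip()}
        if len(items) >= 2:  # a record with fewer than two items cannot show a pairing
            cleaned.append(items)
    return cleaned


def count_pairs(baskets):
    """Step 2: run a (very simple) mining algorithm that counts items occurring together."""
    counts = Counter()
    for items in baskets:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


def evaluate(counts, n_baskets, min_support):
    """Step 3: evaluate the results, keeping only pairs seen in a large enough share of records."""
    return {
        pair: count / n_baskets
        for pair, count in counts.items()
        if count / n_baskets >= min_support
    }


# Invented purchase records, including the kind of debris that scrubbing removes.
raw = [
    ["Bread", "milk ", "eggs"],
    ["bread", "Milk"],
    ["milk", None, ""],
    ["bread", "eggs"],
]

baskets = scrub(raw)
patterns = evaluate(count_pairs(baskets), len(baskets), min_support=0.6)
print(patterns)
```

The choice of threshold in the evaluation step is a matter of judgement: a lower value surfaces more patterns, but more of them are likely to be coincidental.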

9.53 Data-matching and data-mining practices that involve personal information raise a number of privacy concerns. A major concern is that the practices can reveal large amounts of previously unknown personal information about individuals.[101] This concern is exacerbated by the fact that data-matching or data-mining can occur without the knowledge or consent of the data subject, thereby limiting the ability of the data subject to seek access to information derived from a data-matching or data-mining program.[102]

9.54 Another concern relates to the accuracy of the data derived from a data-matching or data-mining process. Data-matching and data-mining involve using information collected for different purposes and in different contexts.[103] If information is incorrect or incomplete at the time of collection, or ceases to be accurate some time after collection, the information generated by the data-matching or data-mining process will be inaccurate. In the case of data-mining, an additional concern is that it is often difficult to inform the data subject of the exact purpose for which his or her personal information is to be collected or used. This is because data-mining activities aim to discover previously unknown information. Further, there is concern about the storage of large amounts of personal information gathered for the purpose of data-matching or data-mining.[104]

[93] Information and Privacy Commissioner Ontario, Data Mining: Staking a Claim on Your Privacy (1998), 1.

[94] Office of the Federal Privacy Commissioner, The Use of Data Matching in Commonwealth Administration—Guidelines (1998), [14].

[95] A Caine, E-government: Legal and Administrative Obstacles to Sharing Data Held by Australian Government Agencies (2004) Australian Government Information Management Office.

[96] R Clarke, ‘Computer Matching by Government Agencies: The Failure of Cost/Benefit Analysis as a Control Mechanism’ (1995) 4 Information Infrastructure and Policy 29, 33.

[97] Information and Privacy Commissioner Ontario, Data Mining: Staking a Claim on Your Privacy (1998), 4.

[98] Ibid, 1.

[99] L Cauley, ‘NSA has Massive Database of Americans’ Phone Calls’, USA Today, 10 May 2006, <www.usatoday.com>.

[100] J Bigus, Data Mining with Neural Networks (1996), 10–11, cited in Information and Privacy Commissioner Ontario, Data Mining: Staking a Claim on Your Privacy (1998), 5.

[101] V Estivill-Castro, L Brankovic and D Dowe, ‘Privacy in Data Mining’ (1999) 6 Privacy Law & Policy Reporter 33, 34.

[102] See, eg, Information and Privacy Commissioner Ontario, Data Mining: Staking a Claim on Your Privacy (1998), 14.

[103] See, eg, Ibid, 10–11.

[104] Office of the Privacy Commissioner, Getting in on the Act: The Review of the Private Sector Provisions of the Privacy Act 1988 (2005), 240.