17.11.2011

Introduction

The Australian Law Reform Commission (ALRC) released its National Classification Scheme Review Issues Paper on 19 May, 2011. The ALRC received over 2,300 submissions in response, over five times as many submissions as it has received for any previous inquiry.

Of the submissions received, the ALRC had 819 requests that the submission remain confidential. It is the policy of the ALRC to make submissions publicly available on its website unless otherwise requested. In all but two cases, the confidentiality request came from a member of the public not wishing to have their identity made available on the website, rather than from an organisation presenting material that was commercial-in-confidence or restricted for other reasons. Two submissions were not able to be made public in full, due to the graphic nature of some content.

In addition, some submissions could not be made public because they divulged personal details, were not in a format suited to online posting, or were blank.

Over 95 per cent of submissions responding to the Issues Paper came from individuals, with the balance from organisations, industry or professional associations, or companies. The category of “Organisation” includes government agencies, religious organisations, lobby and interest groups, political parties and other entities with an official status. Among the “Individual” submissions were expert submissions from academics, former classifiers, parliamentarians and recognised experts in the field. Nonetheless, the vast majority of submissions came from members of the general public.

The ALRC makes an online submission form available and encourages its use by submitters. The online form includes a template response that invites comment on each of the 29 questions that the ALRC raised in its Issues Paper. Where the online submission form was not used, most respondents nonetheless took the opportunity to address questions individually, often adding some overall commentary on classification issues from their own perspective. It was not obligatory to answer all questions, so some questions received considerably more responses than others.

Reflecting its commitment to transparent inquiry processes, the ALRC has provided a preliminary analysis of responses to seven key questions, as well as a general overview of submissions received. This analysis will be updated to include the full range of responses as they become available.

Analysing “big data”

It is estimated that the submissions received by the ALRC constituted about 1.4 million words in total, generating what is known as “big data”. The ALRC has made use of Leximancer software to represent the data in graphical form, in order to provide a picture of the overall pattern of submissions.

Leximancer is a text analytics program that enables the user to form concepts out of clusters of words that appear alongside each other in texts, and to visualise and interrogate their inter-connectedness and co-occurrence. It enables larger themes to be identified, and to be represented in graphical form, for data users.

This is not simply a matter of using computing tools, nor is it about substituting automated processes for expert analysis and judgement. Rather, it is about using intelligent computing systems to: fuse large amounts of data into succinct meanings; process meanings in contextually relative ways; generate new insights from large amounts of data; infer hypotheses and relationships; provide easier access to the intuitions of others; and present information in relevant ways that complement expert knowledge. ^[1] Computational research tools such as Leximancer can be used to generate information graphics (infographics) that capture key concepts emerging from large amounts of data.

In this preliminary overview of responses to the Issues Paper, the ALRC has provided an overview of all submissions, as well as responses to individual questions 1, 2, 3, 12, 16, 24 and 25, using the graphical formats enabled by the Leximancer software.

Key terms

The Leximancer software makes use of certain key terms in data analysis:

Theme: a theme is a cluster of related concepts that are often mentioned together in the sampled text. Themes are represented graphically as the large circles (e.g. the red circle titled “children” on p. 5).
Concept: a concept is a group of words that travel together in the text, within a space that was set for this exercise at two sentences. The software draws together related words in the text and suggests these as concepts, e.g. “child” and “children”, “rating” and “ratings”. Concepts are represented as the smaller coloured dots. The larger the dot, the more the concept was used.
Connectivity: the connectivity of a concept is measured by the number of times it occurs alongside other concepts, within a space set for this exercise at two sentences. Connectivity is represented graphically by the lines that link dots.
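
The idea of counting concepts and their connectivity within a two-sentence space can be illustrated with a short sketch. This is not Leximancer's own algorithm, which is not described in this paper. It is a simplified illustration that assumes a hand-written list of concepts and their word variants, a naive sentence splitter, and non-overlapping blocks of two sentences. The function name concept_stats and the sample text are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def concept_stats(text, concepts, window=2):
    """Count concepts and their co-occurrence in blocks of `window` sentences.

    `concepts` maps a concept name to the set of word variants that signal it,
    e.g. {"children": {"child", "children"}}. This mapping is an assumption of
    the sketch; Leximancer derives its concepts from the text itself.
    """
    sentences = split_sentences(text)
    # Group sentences into consecutive, non-overlapping blocks.
    blocks = [
        " ".join(sentences[i:i + window]).lower()
        for i in range(0, len(sentences), window)
    ]

    counts = Counter()  # number of blocks in which each concept appears
    pairs = Counter()   # number of blocks in which two concepts appear together
    for block in blocks:
        words = set(re.findall(r"[a-z]+", block))
        present = sorted(name for name, variants in concepts.items()
                         if words & variants)
        counts.update(present)
        pairs.update(combinations(present, 2))

    # Connectivity: how often a concept occurs alongside any other concept.
    connectivity = Counter()
    for (a, b), n in pairs.items():
        connectivity[a] += n
        connectivity[b] += n
    return counts, pairs, connectivity


sample = (
    "Children need protection. Ratings help parents guide a child. "
    "The states enforce classification. Each state has its own ratings rules."
)
sample_concepts = {
    "children": {"child", "children"},
    "rating": {"rating", "ratings"},
    "states": {"state", "states"},
}
counts, pairs, connectivity = concept_stats(sample, sample_concepts)
print(counts)
print(pairs)
print(connectivity)
```

In this toy example, “rating” appears in both two-sentence blocks and co-occurs once with “children” and once with “states”, so it would be drawn as the largest dot with lines to both of the other concepts.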

Another relevant concept is that of proximity. Where concepts and themes appear close to one another in an infographic, it indicates that they tended to be used together. On p. 4, for instance, the theme “states” appears close to that of “classification”, suggesting that respondents tended to refer to states in relation to their role in the current classification scheme. At the same time, it has some distance from the theme of “children”, suggesting that statements relating to children and classification did not typically make reference to the role played by states in the current scheme.

^[1] Gary Klein, Brian Moon and Robert Hoffman (2006), “Making Sense of Sense Making”, IEEE Intelligent Systems 21(4): 70-73; Terry Flew, Christina Spurgeon, Anna Daniel and Adam Swift (2012), “The Promise of Computational Journalism”, Journalism Practice (forthcoming), available at http://dx.doi.org/10.1080/17512786.2011.616655.