Introduction
When publishing data, Statistics New Zealand has an obligation to protect the information of individuals and businesses who have been surveyed. At the same time, the point of collecting the information is to make use of it for statistical purposes, so the aim is to make available as much useful information as possible while maintaining the confidentiality of respondents. This report provides an overview of how data can be changed in the application of confidentiality techniques.
The term confidentiality refers to protecting data that is accessed by anyone other than the publishing agency. Confidentiality can be defined as "the agreement, explicit or implicit, made between the data subject and the data collector regarding the extent to which access by others to personal information is allowed" (National Research Council and Social Science Research Council, 1993:22). In practice, protection ranges from safeguarding published output tables to modifying microdata for access by other government departments or researchers. Confidentiality methods are applied to reduce the risk of disclosures*.
Statistics New Zealand data originates from sample surveys, population censuses and the collection of administrative data. All three produce unit record datasets, otherwise known as microdata datasets. In these, every individual (person, enterprise, event, etc.) has one record in the dataset.
Statistics New Zealand releases information from microdata datasets in two distinct ways. The first is by publishing tables; the second is by removing identifiers and granting researchers access, under strict conditions, to modified versions of the datasets.
The confidentiality techniques applied to a table depend on the type of data it contains. A table consists of cells defined by categories, each usually containing aggregated (combined) responses. The two main types of output tables, each with its own specific confidentiality risks and particular modification methods, are:
- Count data tables
- Magnitude data tables.
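To make the distinction concrete, the following minimal sketch builds both kinds of table from a small unit record dataset. It is an illustration only, not Statistics New Zealand's production method: the records, the field names (region, industry, turnover) and the helper functions count_table and magnitude_table are all hypothetical. It assumes a count cell holds the number of records falling in a category combination, while a magnitude cell holds the total of a numeric value over those records.

```python
from collections import Counter, defaultdict

# Hypothetical unit records (microdata): one record per enterprise.
# All field names and values are invented for illustration.
records = [
    {"region": "North", "industry": "Retail", "turnover": 120.0},
    {"region": "North", "industry": "Retail", "turnover": 80.0},
    {"region": "North", "industry": "Farming", "turnover": 300.0},
    {"region": "South", "industry": "Retail", "turnover": 50.0},
    {"region": "South", "industry": "Farming", "turnover": 210.0},
    {"region": "South", "industry": "Farming", "turnover": 90.0},
]


def count_table(rows, row_key, col_key):
    """Count data table: each cell is the number of records in that cell."""
    return dict(Counter((r[row_key], r[col_key]) for r in rows))


def magnitude_table(rows, row_key, col_key, value_key):
    """Magnitude data table: each cell is the sum of a numeric value."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r[row_key], r[col_key])] += r[value_key]
    return dict(totals)


counts = count_table(records, "region", "industry")
totals = magnitude_table(records, "region", "industry", "turnover")

for cell in sorted(counts):
    print(cell, "count:", counts[cell], "total turnover:", totals[cell])
```

In this invented data the (North, Farming) cell contains a single enterprise, so the published total for that cell would equal that enterprise's own turnover. Cells like this are the kind of risk that confidentiality methods are intended to reduce.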
*Disclosure: recognition of confidential information.