1 The statistics in this publication were compiled from the 2016 Australian Census and Migrants Integrated Dataset (ACMID).
2 The statistics in this publication relate to people who have migrated to Australia under a permanent Skill, Family, Humanitarian or Other Permanent visa stream and arrived in Australia between 1 January 2000 and 9 August 2016 (see Note 8). In this publication, this population is referred to as Permanent Migrants.
3 The 2016 Australian Census and Migrants Integrated Dataset (ACMID) Project linked the 2016 Census of Population and Housing dataset with Department of Social Services (DSS) Permanent Migrant Data.
Permanent Migrant Data
4 The Permanent Migrant Data (PMD) is administrative data pertaining to permanent settlers in Australia from various departmental systems and a number of external sources, including the Department of Home Affairs (Home Affairs) and Department of Human Services (Medicare Australia). The Department of Social Services (DSS) is the custodian of the data. The data provides information for the evaluation and planning of settlement services within DSS and for other government and community agencies involved in the settlement of migrants.
2016 Census of Population and Housing
6 The scope of the 2016 Australian Census and Migrants Integrated Dataset (ACMID) is restricted to people who responded to the 9 August 2016 Census of Population and Housing and who had a permanent migrant settlement record with a date of arrival between 1 January 2000 and 9 August 2016 (inclusive).
7 The 2016 ACMID excludes:
- Persons whose Census record indicated that they were an overseas visitor
- Persons on a Temporary or Bridging visa
8 The date of arrival on which the scope is based reflects an individual's latest arrival pertaining to their latest permanent visa. For an offshore applicant, the arrival date is when the applicant arrives in Australia on that permanent visa. However, for a person who applies onshore for a permanent visa, the date of arrival listed is the date of their last entry into Australia.
9 Statistical data integration involves combining information from different data sources such as administrative, survey and/or Census to provide new datasets for statistical and research purposes.
10 Data linking is a key part of statistical data integration and involves combining records from different source datasets using variables that are shared between the sources. Data linkage is performed on unit records that represent individual persons.
Linkage between the Permanent Migrant Data and the 2016 Census
11 The 2016 Permanent Migrant Data records were linked to the 2016 Census of Population and Housing data using a combination of deterministic and probabilistic linkage methodologies.
12 Deterministic data linkage, also known as rule-based linkage, involves assigning record pairs across two datasets that match exactly or closely on common variables. This type of linkage is most applicable where the records from different sources consistently report sufficient information to efficiently identify links. It is less applicable in instances where there are issues with data quality or where there are limited characteristics. The deterministic linkage method used in this project is considered a silver standard linkage because encoded name and address information was used in this phase of the linkage.
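The rule-based idea can be illustrated with a minimal sketch. It assumes exact agreement on a small set of shared variables and accepts only one-to-one matches. The field names (`name_code`, `dob`, `sex`), the record identifiers and the sample values are hypothetical and are not taken from the ACMID linkage specification.

```python
from collections import defaultdict

def deterministic_link(pmd_records, census_records, keys):
    """Link records that agree exactly on every key and are unique on both sides."""
    def index(records):
        idx = defaultdict(list)
        for rec in records:
            # Records with a missing key value cannot be matched by an exact rule.
            if all(rec.get(k) is not None for k in keys):
                idx[tuple(rec[k] for k in keys)].append(rec["id"])
        return idx

    pmd_idx, census_idx = index(pmd_records), index(census_records)
    links = []
    for key, pmd_ids in pmd_idx.items():
        census_ids = census_idx.get(key, [])
        # Accept only one-to-one matches; ambiguous keys are left unlinked here.
        if len(pmd_ids) == 1 and len(census_ids) == 1:
            links.append((pmd_ids[0], census_ids[0]))
    return sorted(links)

# Hypothetical sample records (encoded name, date of birth, sex).
pmd = [
    {"id": "P1", "name_code": "a91f", "dob": "1985-03-02", "sex": "F"},
    {"id": "P2", "name_code": "77c0", "dob": "1990-11-15", "sex": "M"},
    {"id": "P3", "name_code": "3e2d", "dob": None, "sex": "M"},
]
census = [
    {"id": "C1", "name_code": "a91f", "dob": "1985-03-02", "sex": "F"},
    {"id": "C2", "name_code": "77c0", "dob": "1990-11-16", "sex": "M"},
    {"id": "C3", "name_code": "3e2d", "dob": "1979-07-30", "sex": "M"},
]

links = deterministic_link(pmd, census, keys=("name_code", "dob", "sex"))
print(links)
```

In this toy data only the first pair agrees on every key. The other two records are left unlinked because of an inconsistent or missing date of birth, which is the kind of case a probabilistic pass is better suited to.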
13 Probabilistic linking allows links to be assigned in spite of missing or inconsistent information, providing there is enough agreement on other variables to offset any disagreement. In probabilistic data linkage, records from two datasets are compared and brought together using several variables common to each dataset (Fellegi & Sunter, 1969).
14 A key feature of the methodology is the ability to handle a variety of linking variables and record comparison methods to produce a single numerical measure of how well two particular records match, referred to as the 'linkage weight'. This allows ranking of all possible links and optimal assignment of the link or non-link status (Solon and Bishop, 2009). The probabilistic linkage method used in this project is considered a silver standard linkage because it also used encoded names and addresses, date of birth, country of birth, year of arrival and codes representing small geographic areas. Further information about name and address encoding can be found in Information paper: Name encoding method for Census 2016.
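As an illustration of how a linkage weight can be formed, the following sketch uses the standard Fellegi and Sunter agreement and disagreement weights. The m-probabilities (chance of agreement for a true match) and u-probabilities (chance of agreement for a non-match), the field names and the sample values are all hypothetical; the parameters actually used for ACMID are not given in these notes.

```python
import math

# Hypothetical (m, u) probabilities per linking variable.
FIELDS = {
    "name_code": (0.95, 0.001),
    "dob": (0.97, 0.003),
    "country_of_birth": (0.98, 0.05),
    "year_of_arrival": (0.90, 0.06),
}

def linkage_weight(rec_a, rec_b, fields=FIELDS):
    """Sum log2 agreement/disagreement weights over the linking variables."""
    total = 0.0
    for field, (m, u) in fields.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value adds no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total

pmd_rec = {"name_code": "a91f", "dob": "1985-03-02",
           "country_of_birth": "7103", "year_of_arrival": 2009}
# Candidate 1 agrees on three variables and is missing year of arrival.
cand_1 = {"name_code": "a91f", "dob": "1985-03-02",
          "country_of_birth": "7103", "year_of_arrival": None}
# Candidate 2 agrees only on country of birth and year of arrival.
cand_2 = {"name_code": "0b7e", "dob": "1985-09-21",
          "country_of_birth": "7103", "year_of_arrival": 2009}

w1 = linkage_weight(pmd_rec, cand_1)
w2 = linkage_weight(pmd_rec, cand_2)
print(f"candidate 1: {w1:.2f}, candidate 2: {w2:.2f}")
```

Candidate 1 receives a high weight despite the missing value, while candidate 2 scores below zero because agreement on two common characteristics does not offset disagreement on name and date of birth. Ranking candidate pairs by weight and applying a cut-off is then what assigns link or non-link status.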
15 At the completion of the linkage process, 1,924,551 (89%) of the 2,166,014 records from the Permanent Migrant Data were linked to the 2016 Census data. The overall linkage accuracy (precision) for this project was estimated to be around 99%. Of the final 1,924,551 linked records, 549,361 (29%) were linked using the deterministic linkage method and 1,375,190 (71%) were linked using the probabilistic method.
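The linkage rate and the split between the two methods follow directly from the record counts in Note 15. The short check below recomputes them to one decimal place; only the variable names are introduced here.

```python
# Record counts quoted in Note 15.
pmd_records = 2_166_014
linked = 1_924_551
deterministic = 549_361
probabilistic = 1_375_190

# The two linkage methods together account for every linked record.
assert deterministic + probabilistic == linked

link_rate = linked / pmd_records
deterministic_share = deterministic / linked
probabilistic_share = probabilistic / linked

print(f"linked:        {link_rate:.1%}")
print(f"deterministic: {deterministic_share:.1%}")
print(f"probabilistic: {probabilistic_share:.1%}")
```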
16 While the linkage is of high quality, there is a small chance of linkage error: false links and missed links. False links are influenced by the similarity of linking information in records that actually represent different individuals. This may be due to random chance but is primarily driven by low-quality information in linking variables: the less information available to discriminate between two individuals, the more likely they will match by chance. Missed links are primarily influenced by the absence of an individual from the Census and a lack of sufficient quality in linking variables.
17 The estimates in this publication are obtained by assigning a "weight" to each linked record. The weight is a value which indicates how many Permanent Migrant Dataset records are represented by the linked record. Weights aim to adjust for the fact that the linked Permanent Migrant Dataset records may not be representative of all the Permanent Migrant Data records. The weights on the Australian Census and Migrants Integrated Dataset, 2016 (ACMID 2016) range from 1.0 to 3.2.
18 A person-level file containing 2,167,501 records was used in a two-step calibration process. The file consisted of information about a permanent migrant’s basic demographics (age, sex, marital status, etc.), migration characteristics (visa subclass, applicant status, location of visa grant, etc.) and a history of address changes.
19 The first step of the calibration process adjusted for non-response. The methodology adopted was developed to adjust for non-response in sample surveys. Concepts of non-response and non-links differ in that the former is a result of an action by a person selected in a sample, and the latter is the failure to link a record likely as a result of the quality of its linking variables. However, both situations may result in under/over representation, and as such the methodology developed to adjust for non-response is suitable to apply to adjust for non-links. Like its 2011 counterpart, ACMID 2016 is unique in that many characteristics of the non-linked records are known, and these characteristics can therefore be used as inputs into an adjustment for unlinked records.
20 The propensity of a Permanent Migrant Data record to be linked to a Census record was modelled using a logistic regression, which outputs the probability of linking for each record based on that record’s characteristics. Each record was then assigned an initial weight given by the inverse of this probability.
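A minimal sketch of this first step is given below, assuming a single hypothetical binary characteristic (`onshore`, whether the visa was granted onshore) and a toy file of 20 records. The logistic regression is fitted with plain gradient ascent so that the example needs only the standard library; the covariates and fitting software used for ACMID are not described in these notes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, n_iter=5000):
    """Fit an unregularised logistic regression by gradient ascent."""
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Toy data: each row is [intercept, onshore]; y = 1 if the record was linked.
# Offshore grants: 9 of 10 linked. Onshore grants: 6 of 10 linked.
X = [[1, 0]] * 10 + [[1, 1]] * 10
y = [1] * 9 + [0] * 1 + [1] * 6 + [0] * 4

beta = fit_logistic(X, y)

# Initial weight for each linked record = 1 / modelled probability of linking.
initial_weights = [
    1.0 / sigmoid(sum(b * v for b, v in zip(beta, xi)))
    for xi, yi in zip(X, y) if yi == 1
]
print(round(sum(initial_weights), 3), "weighted records from", len(X), "originals")
```

Records from the group that links less often receive larger initial weights, so the linked records taken together again represent all 20 original records.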
21 The second step of the calibration process used the weights derived from the first step as an input into the calibration to known Permanent Migrant Dataset subpopulation totals, such as visa group, location of visa grant, applicant status and state/territory of residence. Calibration was conducted to the following benchmark totals from the Permanent Migrant Data file:
- Visa Stream by Location by Principal Flag
- Visa Sub Group
- Refugee Status (visa subclass 200) by Location of visa grant
- Country of Birth (Major group - 1 digit level)
- Country of Birth (Top 15 countries - 4 digit level)
- Arrival Year
- State by Visa Stream (Skill, Family, Humanitarian)
- Sex by Age group (10 year level)
22 The two-step calibration process then weighted the original 1,924,551 linked records up to the 2,166,014 in-scope records from the Permanent Migrant Data population.
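The calibration step can be sketched with raking (iterative proportional fitting), which repeatedly scales the weights so that the weighted totals match each set of benchmarks in turn. Raking is used here only as a simple stand-in, since these notes do not state which calibration algorithm was applied. The two benchmark variables, the records and the totals are hypothetical toy values.

```python
def rake(records, weights, benchmarks, n_iter=50):
    """Scale weights so weighted totals match each benchmark margin in turn."""
    w = list(weights)
    for _ in range(n_iter):
        for var, totals in benchmarks.items():
            # Current weighted total for each category of this variable.
            current = {}
            for rec, wi in zip(records, w):
                current[rec[var]] = current.get(rec[var], 0.0) + wi
            # Ratio-adjust every weight to hit the benchmark for its category.
            w = [wi * totals[rec[var]] / current[rec[var]]
                 for rec, wi in zip(records, w)]
    return w

def weighted_total(records, weights, var, value):
    return sum(wi for rec, wi in zip(records, weights) if rec[var] == value)

# Eight linked records standing in for a population of ten.
linked_records = [
    {"stream": "Skill", "sex": "F"}, {"stream": "Skill", "sex": "F"},
    {"stream": "Skill", "sex": "M"}, {"stream": "Skill", "sex": "M"},
    {"stream": "Skill", "sex": "M"}, {"stream": "Family", "sex": "F"},
    {"stream": "Family", "sex": "F"}, {"stream": "Family", "sex": "M"},
]
initial = [1.2] * len(linked_records)  # e.g. weights from the propensity step
benchmarks = {
    "stream": {"Skill": 6.0, "Family": 4.0},
    "sex": {"F": 5.0, "M": 5.0},
}

final = rake(linked_records, initial, benchmarks)
print(round(sum(final), 6))
```

After raking, the weights sum to the benchmark population and reproduce both sets of totals, which mirrors how the linked records are weighted up to the in-scope Permanent Migrant Data population.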
23 Estimates in this publication are obtained by summing the weights of persons with the characteristic of interest. Cells in this publication have been randomly adjusted to avoid the release of confidential data. Discrepancies may occur between sums of the component items and totals.
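Summing weights to form an estimate can be shown in a few lines. The records, weights and characteristic names below are hypothetical, and the sketch leaves out the random adjustment applied to published cells.

```python
def weighted_estimate(records, has_characteristic):
    """Estimate a population count by summing the weights of matching records."""
    return sum(rec["weight"] for rec in records if has_characteristic(rec))

# Hypothetical linked records with their final calibrated weights.
sample = [
    {"stream": "Skill", "employed": True, "weight": 1.1},
    {"stream": "Skill", "employed": False, "weight": 1.4},
    {"stream": "Family", "employed": True, "weight": 1.0},
    {"stream": "Humanitarian", "employed": True, "weight": 2.5},
]

employed = weighted_estimate(sample, lambda r: r["employed"])
skill_employed = weighted_estimate(
    sample, lambda r: r["stream"] == "Skill" and r["employed"]
)
print(round(employed, 1), round(skill_employed, 1))
```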
Reliability of estimates
24 Error in estimates produced using the 2016 Australian Census and Migrants Integrated Dataset may occur due to false links and the non-random distribution of unlinked records.
25 The calibration process does not mitigate the error introduced by false links or other error introduced in the statistical linking process. Due to the quality issues mentioned above, estimates should generally be treated with caution.
26 Error introduced by under/over representation of characteristic based groups in unlinked records has been mitigated to some extent by the two-step calibration process.
Measures of error
27 In survey data, sampling error is estimated using a measure of Relative Standard Error (RSE). Whilst RSEs can be produced for this data, they would not represent the error introduced by false links or error introduced in the statistical linking process, and have therefore not been included in this publication.
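For reference, the usual definition of the relative standard error expresses the standard error of an estimate as a percentage of the estimate itself. The symbols are introduced here for illustration: \hat{Y} is the estimate and SE(\hat{Y}) is its standard error.

```latex
\mathrm{RSE}(\hat{Y}) = \frac{\mathrm{SE}(\hat{Y})}{\hat{Y}} \times 100\%
```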
28 Statements made in the text of this publication that compare proportions between two population groups have not been tested for significance. Statistical significance testing requires an estimate of the magnitude of the error for each statistical estimate, which is not yet available for statistical estimates produced using the 2016 Australian Census and Migrants Integrated Dataset.
Interpretation of results
29 There are several variables common to the two source datasets which have definitional differences.
Year of arrival
30 Estimates in this publication are produced using the 2016 Census year of arrival variable (YARP). The year of arrival question on the Census asks overseas-born people to report the year they first arrived in Australia with the intention of staying for at least one year. The year the person first arrived in Australia to live here for one year or more may have occurred many years before their 'arrival date' as reported in the Permanent Migrant Data.
31 The 'Prior to 2000' year of arrival group represents those permanent migrants (whose Permanent Migrant Data arrival date is 1 January 2000 to 9 August 2016) who reported on the Census that they first came to Australia to live for one year or more prior to 2000. For some individuals, their year of arrival as reported on the Census is different to their Permanent Migrant Data arrival date pertaining to their permanent visa. The Permanent Migrant Data arrival date reflects an individual’s latest arrival pertaining to their latest permanent visa (see Note 8). Where the Census year of arrival precedes that of the Permanent Migrant Data, it is likely that the person was a temporary migrant for a period of time before attaining permanent resident status.
32 Due to the conceptual differences discussed (Notes 30-31), the year of arrival estimates in this publication will not reflect the Department of Home Affairs reported migrant intake for individual years of arrival, nor will they reflect year of arrival estimates from the 2016 Census of Population and Housing.
Country of birth
33 Estimates in this publication are produced using the 2016 Census country of birth variable (BPLP). The concept measured for country of birth is the same for both the Census and Permanent Migrant Data. However, the Census variable was coded using the Standard Australian Classification of Countries (SACC 1269.0) as it was on 9 August 2016, whilst the Permanent Migrant Data country of birth variable has been coded at the time of record creation over a 16-year period and is therefore based on a classification that has evolved over time.
34 For a substantial number of records, the 4 digit country of birth reported on the Census is different to the 4 digit country of birth recorded on the Permanent Migrant Data. For the majority of these records the 2 digit country of birth code is the same and the difference at the 4 digit level is due to differences in coding and the classifications.
35 Due to the differences described in Notes 33 and 34, estimates for an individual 4 digit country of birth may not necessarily reflect the Department of Home Affairs reported migrant intake from that country of birth.
Comparability with other data
36 Estimates from the 2016 Australian Census and Migrants Integrated Dataset will differ from the estimates produced from other ABS collections and estimates produced from the Permanent Migrant Data for several reasons. The estimates are a result of integrating data from two data sources, one an administrative dataset and the other a census. The linked records have been calibrated to known population totals from the Permanent Migrant Data, and the resulting dataset is distinct from both the Census and the Permanent Migrant Data. Due to the quality issues mentioned in Notes 24 to 28, estimates should generally be treated with caution.
37 The ABS respects individuals' rights to privacy and is committed to keeping information safe and secure. The ABS is subject to strong legislation protecting the confidentiality of information, including the Census and Statistics Act 1905, which makes it a criminal offence to breach secrecy provisions.
38 We handle personal information in accordance with the Privacy Act 1988 and the Australian Privacy Principles, and abide by the High Level Principles for Data Integration Involving Commonwealth Data for Statistical and Research Purposes.
39 In accordance with the Census and Statistics Act 1905, data are subject to a confidentiality process before release as noted above. This confidentiality process is undertaken to avoid releasing information that may allow the identification of particular individuals, families, households, dwellings or businesses.
Perturbation of data
40 To minimise the risk of identifying individuals in aggregate statistics, a technique is used to randomly adjust cell values. This technique is called perturbation. Perturbation involves small random adjustments of the statistics and is considered the most satisfactory technique for avoiding the release of identifiable statistics while maximising the range of information that can be released. These adjustments have a negligible impact on the underlying pattern of the statistics.
41 The introduction of these random adjustments results in tables not adding up. While some datasets apply a technique called additivity to give internally consistent results, additivity has not been implemented on the 2016 ACMID. As a result, randomly adjusted individual cells will be consistent across tables, but the totals in any table will not necessarily be the sum of the individual cell values. The size of the difference between summed cells and the relevant total will generally be very small.
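The behaviour described in Notes 40 and 41 can be imitated with a toy sketch in which the adjustment for a cell is derived from a hash of the cell's definition. This keeps a given cell identical wherever it appears, while totals are adjusted independently of their components. This is an assumption made purely for illustration and is not the ABS perturbation method; the cell labels, counts and the `max_shift` parameter are hypothetical.

```python
import hashlib

def perturb(cell_label, true_count, max_shift=2):
    """Add a small pseudo-random integer shift determined only by the cell label."""
    digest = hashlib.sha256(cell_label.encode("utf-8")).digest()
    shift = digest[0] % (2 * max_shift + 1) - max_shift  # integer in [-2, 2]
    return max(0, true_count + shift)

# Hypothetical true counts for a small table and its total.
true_cells = {"Skill|NSW": 1203, "Family|NSW": 874, "Humanitarian|NSW": 311}
true_total = sum(true_cells.values())

table_a = {label: perturb(label, n) for label, n in true_cells.items()}
total_a = perturb("Total|NSW", true_total)

# The same cell requested again in a second table receives the same adjustment.
table_b = {"Skill|NSW": perturb("Skill|NSW", true_cells["Skill|NSW"])}

print(table_a, total_a, sum(table_a.values()))
```

Because the total is adjusted on its own rather than rebuilt from the adjusted cells, the published total and the sum of the published cells can differ by a small amount, as the note describes.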