Archive:EU statistics on income and living conditions (EU-SILC) methodology – sampling

This article has been archived.

This article is part of the Eurostat online publication EU statistics on income and living conditions (EU-SILC) methodology.

EU-SILC is a sample survey. The legislation specifies that data shall be based on nationally representative probability samples and prescribes minimum effective sample sizes, but leaves the choice of a specific sampling design to each country. This article describes the main characteristics of the sampling: sampling frame, sample design and sample size. It also contains information on sampling errors. Finally, the tracing rules are explained.

Full article

Sampling frame

A major strength of EU-SILC is that it uses the best sampling frame available in each National Statistical Institute (NSI). According to the EU-SILC Framework Regulation, data are to be based on a nationally representative probability sample of the population residing in private households within the country, irrespective of language, nationality or legal residence status. All private households and all persons aged 16 and over within the household are eligible for the operation. Persons living in collective households and in institutions are generally excluded from the target population. The sampling frame and the methods of sample selection should ensure that every individual and household in the target population is assigned a known, non-zero probability of selection. As shown in Table 1, for the 2013 EU-SILC operation the vast majority of countries used population registers, the national census or a master sample derived from the census.

Table 1: Source of the sampling frame (2013)
Source: National Quality Reports 2013

Coverage errors

Coverage errors are caused by the imperfections of a sampling frame for the target population of the survey. The target population is the set of elements for which estimates are desired while the frame population is composed of the units which are eligible for inclusion through a given sampling procedure. Ideally, there must be a one-to-one relation between target and frame population elements. If not, the following frame imperfections can be encountered:

over-coverage which relates either to wrongly classified units that are in fact out of the scope, or to units that do not exist in practice;
under-coverage which refers to units not included in the sampling frame;
misclassification which refers to incorrect classification of units that belong to the target population.
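As a minimal sketch of the first two imperfections, over-coverage and under-coverage can be expressed as set differences between the frame population and the target population. The function name and the unit identifiers below are hypothetical and serve only as an illustration; in practice the target population is not fully known, so these quantities have to be estimated.

```python
def coverage_summary(target_ids, frame_ids):
    """Compare a target population with a frame population by unit identifier."""
    target, frame = set(target_ids), set(frame_ids)
    over = frame - target   # on the frame, but out of scope or non-existent
    under = target - frame  # in the target population, but missing from the frame
    return {
        "over_coverage": sorted(over),
        "under_coverage": sorted(under),
        "over_coverage_rate": len(over) / len(frame) if frame else 0.0,
        "under_coverage_rate": len(under) / len(target) if target else 0.0,
    }


# Hypothetical identifiers: units 6 and 7 are over-coverage, unit 1 is under-coverage.
summary = coverage_summary(target_ids=[1, 2, 3, 4, 5], frame_ids=[2, 3, 4, 5, 6, 7])
print(summary)
```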

Where available, detailed information on coverage problems and errors for each country can be found in the national quality reports.

Sampling design

The sample design describes all the steps to be carried out when selecting a sample of households or persons. It aims to improve the quality of the estimates produced and to control costs, and countries pursue this objective with different strategies. Table 2 summarizes the sampling design used in each country for the 2013 operation. Countries choose a specific sampling design according to the structure of the country and its population, the information available and budgetary constraints. The most widely used sampling design is stratified multi-stage sampling. Only five countries do not use stratification criteria to draw their sample: Malta, Denmark, Iceland and Norway use a simple random sample design and Sweden uses a systematic sample. All the remaining countries apply one or more stratification criteria, mainly geographical stratification. Most of them use multi-stage sampling, with the exception of Luxembourg, Germany, Cyprus, Slovakia, Switzerland, Austria and Lithuania, which use a stratified simple random sample. Estonia uses a systematic stratified sample, and Hungary is the only country to apply a different sampling design for drawing each rotational group. Every year countries send Eurostat general information on the sampling design used, together with detailed information at micro-data level on the strata and Primary Sampling Unit (PSU) from which each household is drawn. The efficiency of the sampling design has a large impact on standard errors and should be monitored over time. On the other hand, changing the design is extremely costly.

Table 2: Main characteristics of countries’ sampling designs
Source: National Quality Reports 2013

Integrated design

Although flexibility in sampling design is one characteristic of EU-SILC, Eurostat recommends a rotational design with four sub-samples or replications. For their 2013 operation, all countries adopted the four-year rotational design recommended by Eurostat, with the exception of France and Norway, where a longer panel duration (eight and nine years, respectively) was used.
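The rotation can be illustrated with a short sketch. It assumes a steady state in which one sub-sample is replaced each year and each sub-sample stays in the survey for as many years as there are sub-samples; the function name is hypothetical and the start-up phase of a panel is not modelled.

```python
def rotation_groups(year, n_groups=4):
    """Return (year of selection, wave number) for each sub-sample interviewed in `year`.

    Assumes a steady-state rotation: the oldest sub-sample is dropped each
    year and replaced by a newly selected one.
    """
    return [(start, year - start + 1) for start in range(year - n_groups + 1, year + 1)]


# Sub-samples contributing to the 2013 cross-section under the four-year design.
for selected_in, wave in rotation_groups(2013):
    print(f"sub-sample selected in {selected_in}: wave {wave}")
```

Under these assumptions, three quarters of the sample overlaps between two consecutive years, which is why estimates for adjacent years are correlated.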

Sampling unit

The sampling unit can be the address/dwelling, the household or the individual, according to the design chosen by the country. In a sample of dwellings/addresses, if more than one household shares the same dwelling, dwellings must be regarded as clusters of households. Households are clusters of individuals, and all members of the household aged 16 and over at the end of the income reference period are eligible for inclusion in the sample. Countries that sample individuals, by contrast, select only persons aged 16 and over, and the household is defined as the household of which the selected person is a member at the beginning of the survey. As shown in Table 3, the Nordic countries as well as Slovenia and the Netherlands select a sample of individuals, thirteen other countries select a sample of dwellings or addresses, and eleven countries select a sample of households.

Table 3: Sampling units

Sample size

Concerning the sample size, three different definitions can be applied:

the actual sample size, that is the number of sampling units selected in the sample;
the achieved sample size which is the number of observed sampling units (household or individual) with an accepted interview;
and finally, the effective sample size which is defined as the achieved sample size divided by the design effect.

The Framework Regulation of EU-SILC and its updates define the minimum effective sample size, which is the size that would be required if the survey were based on simple random sampling (see Table 4). To the extent that the design effect exceeds 1.0, the actual sample sizes have to be larger in order to compensate for the loss of efficiency caused by the use of a complex sampling design. The design effect is the ratio of the actual variance, under the sampling method actually used, to the variance computed under the assumption of simple random sampling. The different concepts used when defining the sample size, and the relation between them, are presented below.
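The relation between these quantities can be sketched in a few lines. The function names and all figures are hypothetical and chosen only to illustrate the arithmetic; they are not taken from Table 4 or from any country's data.

```python
import math


def design_effect(var_actual_design, var_srs):
    """Ratio of the variance under the actual design to the variance under simple random sampling."""
    return var_actual_design / var_srs


def effective_sample_size(achieved_size, deff):
    """Achieved sample size divided by the design effect."""
    if deff <= 0:
        raise ValueError("design effect must be positive")
    return achieved_size / deff


def required_achieved_size(min_effective_size, deff):
    """Smallest achieved sample size that meets a minimum effective sample size."""
    # Rounding before ceil guards against floating-point noise such as 4950.000000000001.
    return math.ceil(round(min_effective_size * deff, 9))


# Hypothetical example: a design effect of 1.5 and a minimum effective size of 4 000.
deff = 1.5
print(effective_sample_size(6000, deff))   # effective size of 6 000 achieved interviews
print(required_achieved_size(4000, deff))  # achieved interviews needed to reach 4 000 effective
```

With a design effect above 1.0 the required achieved size always exceeds the minimum effective size, and the actual (selected) sample must be larger still to allow for non-response.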

Table 4: Minimum effective sample sizes for countries

Sampling errors

Given the high policy relevance of EU-SILC, there is increasing demand from stakeholders for accuracy measures of the published indicators and for measures of the significance of net changes in indicators over time, so that the evolution of social exclusion phenomena can be monitored correctly. As seen above, EU-SILC is a complex survey involving different sampling designs in different countries. For this reason, standard textbook methods for calculating accuracy measures are not directly applicable. Eurostat, with a substantial contribution from Net-SILC2, has put in place a simple method for standard error estimation based on linearization coupled with the ultimate cluster approach. Linearization is a technique that uses a linear approximation to reduce non-linear statistics to a linear form, justified by the asymptotic properties of the estimator. It can encompass a wide variety of indicators, including the EU-SILC indicators. The "ultimate cluster" approach is a simplification that consists in calculating the variance taking into account only the variation among Primary Sampling Unit (PSU) totals. It requires first-stage sampling fractions to be small, which is nearly always the case. The method allows great flexibility and simplifies the calculation of variances. It can also be generalized to calculate the variance of differences from one year to another.
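The following is a minimal sketch of how linearization and the ultimate cluster approach combine for a weighted proportion. It is a generic textbook-style illustration, not Eurostat's implementation: the function name, the record layout (stratum, PSU, weight, 0/1 indicator) and the example data are all assumptions made for this example, and calibration is ignored.

```python
from collections import defaultdict
from math import sqrt


def proportion_with_se(records):
    """Weighted proportion and its standard error.

    records: iterable of (stratum, psu, weight, y) tuples with y equal to 0 or 1.
    The variance uses the linearized variable of a ratio and only the
    variation among weighted PSU totals within each stratum.
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    p = sum(w * y for _, _, w, y in records) / total_w

    # Weighted PSU totals of the linearized variable u = (y - p) / total_w.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - p) / total_w

    variance = 0.0
    for totals in psu_totals.values():
        z = list(totals.values())
        n_h = len(z)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        mean_z = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((v - mean_z) ** 2 for v in z)
    return p, sqrt(variance)


# Hypothetical records: two strata with three PSUs each.
example_records = [
    ("A", 1, 120.0, 1), ("A", 1, 120.0, 0), ("A", 2, 95.0, 0), ("A", 3, 110.0, 1),
    ("B", 4, 80.0, 0), ("B", 4, 80.0, 0), ("B", 5, 105.0, 1), ("B", 6, 90.0, 0),
]
estimate, std_error = proportion_with_se(example_records)
print(f"proportion = {estimate:.4f}, standard error = {std_error:.4f}")
```

For a design without stratification, a single constant stratum label can be passed; for a one-stage design, the household identifier can play the role of the PSU.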

For further details on this method for standard error estimation, please consult the working paper Standard error estimation for the EU-SILC indicators.

Sampling error calculations for main EU-SILC indicators

The method for estimating the standard error described above has already been applied to many EU-SILC indicators. To show how it can be implemented in practice, its application to the AROPE indicator (at risk of poverty or social exclusion) is described here. This indicator is the proportion of persons who are in one or more of the following three situations: at risk of poverty, i.e. below the national poverty threshold (60% of the national median equivalized income); severely materially deprived; living in a household with very low work intensity. The indicator has been treated as a proportion, on the assumption that the poverty threshold is a fixed amount equal to its point estimate. Depending on the characteristics and availability of data for each country, different variables have been used to specify strata and cluster information. In particular, countries have been split into three groups:

Belgium, Bulgaria, Czech Republic, Ireland, Greece, Spain, France, Croatia, Italy, Latvia, Hungary, the Netherlands, Poland, Portugal, Romania, Slovenia and the United Kingdom, whose sampling design could be assimilated to a two-stage stratified type: DB050 (primary strata) was used for strata specification and DB060 (Primary Sampling Unit) for cluster specification;
Germany, Estonia, Cyprus, Lithuania, Luxembourg, Austria, Slovakia, Finland and Switzerland, whose sampling design could be assimilated to a one-stage stratified type: DB050 was used for strata specification and DB030 (household ID) for cluster specification;
Denmark, Malta, Sweden, Iceland and Norway, whose sampling design could be assimilated to simple random sampling: DB030 was used for cluster specification and no strata were specified.
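The grouping above can be written down as a small lookup table. The variable names DB050, DB060 and DB030 and the country lists come from the text; the dictionary layout and the function name are hypothetical and only illustrate how the design specification might be selected per country.

```python
DESIGN_GROUPS = {
    "two_stage_stratified": {
        "strata": "DB050", "cluster": "DB060",
        "countries": ["Belgium", "Bulgaria", "Czech Republic", "Ireland", "Greece",
                      "Spain", "France", "Croatia", "Italy", "Latvia", "Hungary",
                      "Netherlands", "Poland", "Portugal", "Romania", "Slovenia",
                      "United Kingdom"],
    },
    "one_stage_stratified": {
        "strata": "DB050", "cluster": "DB030",
        "countries": ["Germany", "Estonia", "Cyprus", "Lithuania", "Luxembourg",
                      "Austria", "Slovakia", "Finland", "Switzerland"],
    },
    "simple_random": {
        "strata": None, "cluster": "DB030",  # no strata specified
        "countries": ["Denmark", "Malta", "Sweden", "Iceland", "Norway"],
    },
}


def design_variables(country):
    """Return the (strata variable, cluster variable) pair used for a country."""
    for spec in DESIGN_GROUPS.values():
        if country in spec["countries"]:
            return spec["strata"], spec["cluster"]
    raise KeyError(f"no design group recorded for {country}")


print(design_variables("Italy"))
print(design_variables("Malta"))
```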

The approach used can take account of stratification, multi-stage selection, unequal probabilities of inclusion for the sample units and re-weighting for unit non-response. However, it does not reflect the gain in accuracy caused by calibration weighting. The effect of calibration on variance can be significant, especially in countries where powerful auxiliary information from income registers has been used to adjust the sampling weights. In some cases this may lead to overestimation of sampling errors. Results are shown in Table 5.

Table 5: AROPE indicator (2013) standard error and 95% confidence intervals
Source: Eurostat

The same approach has been used to calculate the variance of the net change between two different years. In order to monitor progress towards agreed policy goals, particularly in the context of the Europe 2020 strategy, users are particularly interested in the evolution of social indicators. However, interpreting differences between point estimates at different waves may be misleading. It is therefore necessary to estimate the standard error of these differences in order to judge whether or not the observed differences are statistically significant.

Estimated standard errors and confidence intervals (based on a normality assumption) for net changes in the AROPE between 2008 and 2013 are shown in Table 6. If a confidence interval does not include 0, the difference in the AROPE between 2008 and 2013 can be regarded as statistically significant (at the given level of confidence).
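The decision rule can be sketched as follows, assuming that the standard error of the net change has already been estimated (including any covariance between the two years). The function names and the figures are hypothetical; they are not values from Table 5 or Table 6.

```python
def normal_ci(estimate, std_error, z=1.96):
    """Confidence interval under a normality assumption (z = 1.96 for 95%)."""
    return estimate - z * std_error, estimate + z * std_error


def is_significant(change, std_error_change, z=1.96):
    """True if the confidence interval of the net change does not include 0."""
    low, high = normal_ci(change, std_error_change, z)
    return not (low <= 0.0 <= high)


# Hypothetical net changes in percentage points, each with a standard error of 0.5.
for change in (-1.2, 0.4):
    low, high = normal_ci(change, 0.5)
    print(f"change {change:+.1f}: CI [{low:.2f}, {high:.2f}], significant: {is_significant(change, 0.5)}")
```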

Table 6: Estimated standard errors for estimators of net change in the AROPE between 2008 and 2013
Source: Eurostat

Tracing rules

EU-SILC is composed of two components: the cross-sectional and the longitudinal one. The main objective of the longitudinal component is to study changes over time at individual level, such as transitions from school to work and from work to retirement, flows into and out of economic activity and work and, above all, changes in the level of income and poverty of individuals and households. It should be noted that one of the most important EU-SILC indicators, “At-persistent-risk-of-poverty”, is based on the longitudinal component. In each country the longitudinal component of EU-SILC consists of one or more panels or sub-samples (four sub-samples in the recommended four-year rotational design). For each panel/sub-sample, sample households and sample persons (see explanations in Table 7) representing the target population at the time of its selection are followed for a minimum period of four years on the basis of specific tracing rules.

Table 7: Definition of sample household, sample person and co-resident

The objective of the tracing rules is to reflect any changes in the target population drawn in the initial sample and to follow up individuals over time. In order to study changes over time at the individual level, all sample persons (members of the panel/subsample at the time of its selection) should be followed up over time, despite the fact that they may move to a new location during the life of the panel/subsample. However, in the EU-SILC implementation some restrictions are applied owing to cost and other practical reasons. Only those persons staying in one private household or moving from one to another in the national territory are followed up. Sample persons moving to a collective household or to an institution, moving to national territories not covered in the survey, or moving abroad (to a private household, collective household or institution, within or outside the EU), would normally not be traced. The only exception would be the continued tracing of those moving temporarily (for an actual or intended duration of less than six months) to a collective household or institution within the national territory covered, as they are still considered as household members.
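The restrictions described above can be summarized as a small decision function. This is only a sketch of the rules as stated in this section: the function name, the argument names and the destination labels are hypothetical, and the full rules in Table 8 (for example on co-residents and age limits) are not covered.

```python
def should_trace(destination, in_covered_territory, temporary=False):
    """Decide whether a sample person is followed up after a move.

    destination: "private", "collective" or "institution".
    in_covered_territory: False for moves abroad or to national territory
        not covered by the survey.
    temporary: True if the actual or intended duration of the move is
        less than six months.
    """
    if destination not in ("private", "collective", "institution"):
        raise ValueError(f"unknown destination: {destination}")
    if not in_covered_territory:
        return False  # moved abroad or outside the territory covered
    if destination == "private":
        return True   # stays in, or moves between, private households
    return temporary  # collective household or institution: temporary moves only


print(should_trace("private", in_covered_territory=True))
print(should_trace("institution", in_covered_territory=True, temporary=True))
print(should_trace("collective", in_covered_territory=True))
```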

The longitudinal sample must also remain representative of all age groups in the population. This means that, in principle, persons of all ages should be followed up. However, in view of cost and other practical considerations, separate follow-up may be restricted to persons above a certain age. The minimum EU-SILC requirement is that individuals in the longitudinal sample are followed up for a period of four years. For panels of such short duration, it is acceptable (in view of cost and other practical reasons) to follow up separately only persons aged 14 or over at the time of selection of the initial sample for a panel. Table 8 presents details on the follow-up of sample persons, sample households and co-residents.

Table 8: Rules for the follow-up of sample persons, sample households and co-residents