Statistics Explained

Archive:Sample size and non-response - quarterly statistics

This Statistics Explained article has been archived - for recent articles on labour market see here.



Data extracted in March 2022

Planned article update: February 2023

Highlights


In the fourth quarter of 2020, 72.5 % of the EU-LFS interviews were done by CATI (Computer Assisted Telephone Interviewing), which is 20 p.p. more than before the pandemic.
The negative impact of the COVID-19 crisis in terms of number of EU-LFS interviews was smaller in Q4 2020 than in the other quarters of 2020.
Portugal and Lithuania were the countries most affected by the COVID-19 crisis in terms of EU-LFS average weekly sample size in the second and third quarters of 2020 respectively.
Unit non-response rate at EU level and for the set of all EU-LFS participating countries, 2011-2019 and Q1,Q2,Q3,Q4 2020 (%)
Note: 2011-2017 not available for Montenegro and Serbia, 2019 and Q1-Q4 2020 not available for Iceland.
Source: Eurostat (Annual quality reports and quarterly accuracy reports)

The global spread of SARS-CoV-2 at the beginning of 2020 and the resulting COVID-19 pandemic have had a lasting impact on large parts of public and private life. Due to many government-imposed restrictions to contain the pandemic, the working lives of some citizens have changed enormously in the European Union (EU). At times, kindergartens, schools, shops and businesses were closed in the EU Member States, and employees worked from home.

This health crisis has affected the European Union Labour Force Survey (EU-LFS) in two ways. Firstly, the working reality of many people has changed extensively, so that the EU-LFS data collected during the pandemic may show significant differences from the previous year's figures. Secondly, this specific situation may also have introduced additional random and systematic errors in the EU-LFS data.

In this context, this article assesses the quality of the data gathered through the EU-LFS for the EU as a whole, for each EU Member State individually, as well as for three EFTA countries (Iceland, Norway and Switzerland) and four candidate countries (Montenegro, North Macedonia, Serbia and Turkey). The analysis is based on data and information available for the four quarters of 2020, which are mainly compared with data for the same quarters of 2019 (for sample size and sampling errors) in order to avoid comparability issues and bias due to seasonality. Regarding unit non-response, a comparison with annual averages for previous years is also included. Please note that this procedure is legitimate since unit non-response is usually not particularly affected by seasonality and is therefore quite stable throughout the year.

This article is part of the online publication Labour market in the light of the COVID-19 pandemic - quarterly statistics.


Full article


In the fourth quarter of 2020, the unit non-response rate of the EU-LFS was equal to 31 %

Unit non-response occurs when no data are collected about a population unit (usually a person or a household) designated for data collection. Consequently, the unit non-response rate is the ratio of the number of units for which data have not been collected to the total number of units designated for data collection (sampling frame).
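The ratio defined above can be sketched in a few lines of Python. This is only an illustration: the function name and the example counts are hypothetical and are not taken from the EU-LFS.

```python
def unit_non_response_rate(designated_units: int, responding_units: int) -> float:
    """Return the unit non-response rate in percent.

    designated_units: number of units designated for data collection.
    responding_units: number of those units for which data were collected.
    """
    if designated_units <= 0:
        raise ValueError("designated_units must be positive")
    if not 0 <= responding_units <= designated_units:
        raise ValueError("responding_units must lie between 0 and designated_units")
    non_responding = designated_units - responding_units
    return 100.0 * non_responding / designated_units


# Hypothetical example: 10 000 units designated, 6 900 of them interviewed.
rate = unit_non_response_rate(10_000, 6_900)
print(f"Unit non-response rate: {rate:.1f} %")
```

With these made-up counts, 3 100 of 10 000 designated units did not respond, which gives a rate of 31.0 %.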

Since the beginning of the time series (2011), the unit non-response rate gradually increased at EU level (see Figure 1). The same conclusion can be drawn when all countries carrying out the European Labour Force Survey (EU-LFS) are considered (i.e. when EFTA countries and candidate countries are also included).

However, between 2019 and the second quarter of 2020, the unit non-response rate increased sharply at EU level due to the lockdown measures adopted by most Member States to cope with the COVID-19 pandemic: it rose from 24.7 % in 2019 to 33.7 % in Q1 2020 and 34.6 % in Q2 2020, which corresponds to a difference of almost ten percentage points (p.p.) between the annual average for 2019 and the values for Q1 and Q2 2020. Face-to-face data collection methods (CAPI - Computer Assisted Personal Interviewing and PAPI - Paper and Pencil Interviewing) were stopped because of the health crisis and replaced as much as possible by remote collection methods (CATI - Computer Assisted Telephone Interviewing or CAWI - Computer Assisted Web Interviewing). Unit non-response increased mainly because phone numbers or email addresses were not always immediately available.

In the last two quarters of 2020, containment measures were partly lifted in Member States so that the data collection returned more or less to its usual methodology. Non-response decreased to 32.7 % in Q3 2020 and to 31.1 % in Q4 2020, which was still more than 6 p.p. higher than in 2019.
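The percentage-point comparisons in the two paragraphs above can be reproduced with a short Python sketch. The rates are the EU-level figures quoted in the text; the variable and function names are illustrative only.

```python
# EU-level unit non-response rates (%) as quoted in the text above.
RATE_2019 = 24.7
RATES_2020 = {"Q1": 33.7, "Q2": 34.6, "Q3": 32.7, "Q4": 31.1}


def pp_difference(rate: float, baseline: float) -> float:
    """Difference between two percentages, expressed in percentage points."""
    return round(rate - baseline, 1)


differences = {q: pp_difference(r, RATE_2019) for q, r in RATES_2020.items()}

for quarter, diff in differences.items():
    print(f"{quarter} 2020: {diff:+.1f} p.p. compared with the 2019 average")
```

A percentage-point difference is a plain subtraction of two rates. It should not be confused with a relative change in percent, which would divide the difference by the 2019 rate.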

Figure 1: Unit non-response rate at EU level and for the set of all EU-LFS participating countries, 2011-2019 and Q1,Q2,Q3,Q4 2020 (%)
Note: 2011-2017 not available for Montenegro and Serbia, 2019 and Q1-Q4 2020 not available for Iceland.
Source: Eurostat (Annual quality reports and quarterly accuracy reports)


Even if a significant increase in the unit non-response rate between the year 2019 and all quarters of 2020 can be seen at EU level, variations exist across countries. In Bulgaria, Germany, France, Latvia, Hungary, Portugal and Slovenia, the unit non-response rate increased by more than 10 p.p. in Q2 2020 compared with the average of 2019 (see Figure 2). In other countries, the rise between 2019 and Q2 2020 in the unit non-response rate was smaller, even though still significant (Belgium, Czechia, Ireland, Greece, Spain, Italy, Lithuania and Romania as well as Serbia and Turkey). Afterwards, from Q2 to Q3 2020, the unit non-response rate rose in Estonia, Greece, Malta, Portugal, Slovakia and Finland as well as in Montenegro and Turkey. A further increase in Q4 2020 was registered for Greece, Malta, Portugal and Slovakia. By contrast, it started to decrease from Q2 to Q3 2020 in Bulgaria, Czechia, Germany, Spain, France, Italy, Latvia, Lithuania, Hungary, Romania and Slovenia as well as in North Macedonia and Serbia, and continued to decrease in Q4 2020 in Bulgaria, Germany, Spain, Italy and Lithuania as well as in Serbia. Please note that Austria was the only country with a continuous decrease of the unit non-response rate between 2019 and Q4 2020. Some countries, such as Denmark, Croatia, Luxembourg, the Netherlands and Sweden as well as Norway and Switzerland, seem not to have been affected by the pandemic with regard to LFS non-response in 2020, as their response rate did not record a significant increase or decrease in any quarter.

The highest value of the unit non-response rate for the year 2019 was recorded in Ireland (51.0 %), followed by the Netherlands (50.7 %). Ireland was also the EU Member State with the highest non-response rate throughout 2020 (54.7 % in Q1 2020, 59.5 % in Q2 2020, 60.2 % in Q3 2020 and 60.3 % in Q4 2020).

Germany is a special case, with a unit non-response rate of around 50 % in the first three quarters of 2020 (54 % in Q1, 55 % in Q2 and 45 % in Q3) compared with a much lower rate (5.8 %) in 2019. The situation improved in Q4 2020, when the non-response rate fell to 33 %. This large difference between the 2020 and 2019 rates was partly due to technical issues related to the introduction of a new system of integrated household surveys, and it is unfortunately not possible to separate the effect of the COVID-19 pandemic from that of the change in methodology. Also, results for Poland as regards the unit non-response rate should be treated with caution given that the structure of the sample changed between Q1 and Q2 2020.

Countries that reported the sharpest increases between 2019 and 2020 in their unit non-response rate were using (fully or partly) face-to-face interviewing techniques (CAPI and PAPI) for the EU-LFS data collection. By contrast, countries relying exclusively on CATI or CAWI techniques did not observe a large increase in their unit non-response rate between 2019 and 2020 due to the COVID-19 crisis; examples are Denmark, Finland and Sweden as well as Switzerland.

Figure 2: Unit non-response rate by country, 2019 and Q1, Q2, Q3, Q4 2020 (%)
Note: Data not available for Iceland.
Source: Eurostat (Annual quality reports and quarterly accuracy reports)


Starting from 2020, Eurostat collects quarterly data on the survey mode used by countries to carry out the EU-LFS, while previously data on survey mode were only sent by countries on an annual basis. This quarterly information is available for all EU Member States except for Czechia, Denmark, Germany, France, Lithuania, the Netherlands, Romania, Slovenia and Finland. The impact of the pandemic on the EU-LFS data collection is presented in Figure 3, which contains information on the distribution of interviews by survey mode. In 2020, remote interviewing modes (CATI in particular) increased over the whole year; the share of CATI was on average almost 20 p.p. higher than in previous years. Also, CAWI interviews amounted to 3.1 % and 3.0 % in Q2 2020 and Q3 2020 respectively, whereas they only reached 1.0 % in 2019. Note that in Q4 2020 CAWI interviews decreased back to 1.8 %. The “other” methods reached a peak of 8.8 % in Q2 2020; this category consists mostly of interviews copied from previous waves (mainly for people outside the labour force or aged 75 years or more) but it did not exist as a separate category until 2019.

Figure 3: Data collection by mode at EU level in 2017, 2018, 2019 and Q1-Q4 2020 (%, annual data from 2017 to 2019 and quarterly data for 2020)
Note: Data not available for Czechia, Denmark, Germany, France, Lithuania, the Netherlands, Romania, Slovenia and Finland.
Source: Eurostat (Annual quality reports and own calculations based on microdata)


The Q4 2020 achieved sample size corresponded to 98.7 % of the Q4 2019 one

At EU level, the achieved sample size in 2020 was generally lower than in 2019. Comparing the number of people interviewed in Q4 2020 with Q4 2019, a ratio of 98.7 % is obtained at EU level (see Figure 4). In the other three quarters, the ratio was equal to 93.7 %, 96.4 % and 94.4 % respectively. The impact of the COVID-19 crisis in terms of number of EU-LFS interviews was consequently a bit smaller in Q4 than in the other quarters of 2020. Nevertheless, some differences can be observed at country level.
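As a minimal sketch, the ratio used in this section compares the achieved sample of a 2020 quarter with that of the same quarter of 2019. The function name and the interview counts below are hypothetical and chosen only to reproduce a ratio of 98.7 %.

```python
def sample_size_ratio(achieved_current: int, achieved_reference: int) -> float:
    """Achieved sample size of a quarter as a percentage of the reference quarter."""
    if achieved_reference <= 0:
        raise ValueError("achieved_reference must be positive")
    if achieved_current < 0:
        raise ValueError("achieved_current must not be negative")
    return 100.0 * achieved_current / achieved_reference


# Hypothetical counts: 98 700 interviews in Q4 2020 against 100 000 in Q4 2019.
ratio = sample_size_ratio(98_700, 100_000)
print(f"Q4 2020 sample as a share of Q4 2019: {ratio:.1f} %")
```

A ratio below 100 % means that fewer people were interviewed than in the same quarter of the previous year. Comparing the same quarters avoids mixing in seasonal effects.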

In the first quarter of 2020, at the beginning of the pandemic, almost all countries reported a reduction in their achieved sample size compared with the same quarter of the previous year. Only five EU Member States (Belgium, Estonia, Malta, Austria and Sweden) had a bigger quarterly sample in Q1 2020 compared with Q1 2019. For the second quarter, all EU countries but six (Estonia, Luxembourg, Hungary, Austria, Poland and Sweden) had a smaller sample in Q2 2020 than in Q2 2019. Regarding the third quarter, all EU Member States except nine (Estonia, Croatia, Cyprus, Malta, Austria, Poland, Slovenia, Slovakia and Sweden) recorded a smaller sample in 2020 than in 2019. Finally, in the fourth quarter, all EU Member States recorded a smaller sample in 2020 with the exception of ten countries (Estonia, Croatia, Luxembourg, Hungary, Malta, the Netherlands, Austria, Poland, Slovenia and Sweden).

Sweden, Estonia and Austria were consequently the only countries with a larger average weekly sample in 2020 than in 2019 in all quarters.

The situation was particularly critical in Portugal, Ireland and Greece in terms of number of EU-LFS interviews in Q4 2020. Indeed, the ratio of the sample size in 2020 compared to 2019 was less than 80 % in Portugal and around 85 % in Ireland and Greece.

By contrast, with an increase of the sample size of more than 10 p.p. between Q3 and Q4 2020, the situation improved a lot in Bulgaria, Denmark, Latvia, Lithuania, Luxembourg, Hungary and Poland.

Figure 4: Quarterly sample size in Q1-Q4 2020 compared to same quarters of 2019 (%)
Note: Data for Germany and France is not included due to change in survey methodology.
Source: Eurostat (own calculations)


Slight increase in sampling errors for the first two quarters of 2020

The estimates produced by the EU-LFS are subject to precision requirements laid down in the Council Regulation (EC) No 577/1998, which establishes the organisation of the survey. In particular, specific requirements are defined concerning the reliability of estimates on the number of employed and unemployed persons. The coefficient of variation (CV) for the employed and unemployed population, as an indicator of the precision of the EU-LFS estimates, is transmitted by countries to Eurostat on a quarterly basis.
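The coefficient of variation is conventionally the standard error of an estimate divided by the estimate itself, usually expressed in percent. The sketch below assumes this standard definition; how each country estimates the standard error is not described here, and the figures used are hypothetical.

```python
def coefficient_of_variation(estimate: float, standard_error: float) -> float:
    """Return the CV in percent: standard error relative to the estimate."""
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    if standard_error < 0:
        raise ValueError("standard_error must not be negative")
    return 100.0 * standard_error / estimate


# Hypothetical figures: 2 000 000 employed persons estimated,
# with a standard error of 20 000 persons.
cv_employed = coefficient_of_variation(2_000_000, 20_000)
print(f"CV for employment: {cv_employed:.2f} %")
```

A smaller CV indicates a more precise estimate. Because the standard error typically grows when the achieved sample shrinks, a drop in the number of interviews tends to raise the CV.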

As can be seen from Figure 5, the COVID-19 crisis also had an impact on the precision of the measurement of employment and unemployment. Thirteen EU Member States (Bulgaria, Czechia, Ireland, Greece, Spain, Croatia, Italy, Lithuania, Austria, Romania, Slovakia, Finland and Sweden) reported a deterioration in the accuracy of the estimates for both employment and unemployment between the first quarter of 2019 and the first quarter of 2020, as shown by an increase in the coefficient of variation. Bulgaria and Sweden recorded the largest deteriorations, with an increase of more than 20 % in the coefficient of variation for both the employment and unemployment estimates. In addition, a further seven EU countries (Belgium, Cyprus, Latvia, Malta, the Netherlands, Portugal and Slovenia) showed a deterioration only for unemployment figures and one (Poland) only for employment. This deterioration is linked to the continuous increase in unit non-response (already visible in previous years), which reduces the achieved sample size, especially in the specific segment of the population attached to the labour market, and also to an additional, more intense increase caused by the COVID-19 outbreak. By contrast, Germany, Estonia, Luxembourg and Hungary showed an improvement in the accuracy of the estimates for both employment and unemployment.
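The comparison described above can be sketched as a relative change in the CV between the same quarters of two years, followed by a simple classification. The function names, the labels and the CV values are hypothetical; only the logic (a positive change means lower precision) follows the text.

```python
def percentage_change(cv_current: float, cv_reference: float) -> float:
    """Relative change of the CV in percent; positive values mean lower precision."""
    if cv_reference <= 0:
        raise ValueError("cv_reference must be positive")
    return 100.0 * (cv_current - cv_reference) / cv_reference


def classify(change_employed: float, change_unemployed: float) -> str:
    """Say for which indicators the precision deteriorated."""
    worse_employed = change_employed > 0
    worse_unemployed = change_unemployed > 0
    if worse_employed and worse_unemployed:
        return "both"
    if worse_employed:
        return "employment only"
    if worse_unemployed:
        return "unemployment only"
    return "neither"


# Hypothetical CVs (%) for one country in Q1 2019 and Q1 2020.
change_emp = percentage_change(1.25, 1.0)    # employment CV rises
change_unemp = percentage_change(3.8, 4.0)   # unemployment CV falls
print(f"Employment: {change_emp:+.1f} %, unemployment: {change_unemp:+.1f} %")
print("Deterioration for:", classify(change_emp, change_unemp))
```

Here the change is a relative change in percent, not a percentage-point difference, which matches the "increase of more than 20 %" wording used for the coefficient of variation.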

The situation slightly improved in Q2 2020, where ten EU Member States had less precision for both indicators compared with Q2 2019, and another ten EU countries for employment estimates only.

In Q3 2020, only four EU countries (Greece, Spain, Italy and Portugal) showed a fall in the precision for both estimates and thirteen (Belgium, Czechia, Ireland, Croatia, Cyprus, Luxembourg, Hungary, Malta, the Netherlands, Romania, Slovenia, Slovakia and Finland) for employment only. Consequently, 22 EU Member States saw the quality of their unemployment estimates improving in Q3 2020 compared with Q3 2019.

Finally, in Q4 2020, five EU countries (Ireland, Greece, Spain, Italy and Portugal) still showed a worse precision for both estimates of employment and unemployment compared with Q4 2019. By contrast, seven Member States (Germany, Latvia, Lithuania, Hungary, Poland, Slovenia and Sweden) recorded an improvement for both indicators between Q4 2019 and Q4 2020.

Figure 5: Percentage change in sampling errors for employed and unemployed persons by country for Q1-Q4 2020 (%) compared with the same quarters of 2019
Note: Data not available for Iceland.
Source: Eurostat (Quarterly accuracy reports)


Data sources

All figures in this article are based on quarterly results from the European Union Labour Force Survey (EU-LFS).

Source: The European Union Labour Force Survey (EU-LFS) is the largest European household sample survey providing quarterly and annual results on labour participation of people aged 15 and over as well as on persons outside the labour force. It covers residents in private households. Conscripts in military or community service are not included in the results. The EU-LFS is based on the same target populations and uses the same definitions in all countries, which means that the results are comparable between countries.

European aggregates: EU refers to the sum of EU-27 Member States. If data is unavailable for a country, the calculation of the corresponding aggregates takes into account the data for the same country for the most recent period available. Such cases are indicated.
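One way to read the rule above is as a "last available value carried forward" imputation before summing over countries. The sketch below rests on that assumption and is not a description of Eurostat's actual procedure; the country codes, periods and values are all hypothetical.

```python
from typing import Dict, List, Optional

PERIODS = ["2019", "Q1 2020", "Q2 2020"]

# Hypothetical values for three fictional countries; None marks a missing value.
DATA: Dict[str, List[Optional[float]]] = {
    "A": [100, 110, 120],
    "B": [50, None, 55],
    "C": [30, 32, None],
}


def carry_forward(series: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value by the most recent available one, if any."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def aggregate(data: Dict[str, List[Optional[float]]]) -> List[Optional[float]]:
    """Sum over countries per period, after filling gaps by carry-forward."""
    filled = {country: carry_forward(series) for country, series in data.items()}
    totals: List[Optional[float]] = []
    for i in range(len(PERIODS)):
        values = [filled[country][i] for country in filled]
        # If a country has no earlier value at all, the aggregate stays missing.
        totals.append(None if any(v is None for v in values) else sum(values))
    return totals


for period, total in zip(PERIODS, aggregate(DATA)):
    print(period, total)
```

In this example the missing value of country B in Q1 2020 is replaced by its 2019 value, and the missing value of country C in Q2 2020 by its Q1 2020 value.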

Country note: In Germany, from the first quarter of 2020 onwards, the Labour Force Survey is part of a new system of integrated household surveys. Unfortunately, technical issues and the COVID-19 crisis have had a large impact on data collection processes, resulting in low response rates and a biased sample. For more information, see here.

Definitions: The concepts and definitions used in the EU-LFS follow the guidelines of the International Labour Organisation.

Data collection techniques: The different kinds of data collection techniques used in the EU-LFS are the following:

1) PAPI (Paper and Pencil Interviewing): PAPI is a face-to-face interviewing technique in which the interviewer enters the responses into a paper questionnaire. If no interviewer is present and respondents enter the answers themselves it is considered a self-administered questionnaire.

2) CAPI (Computer Assisted Personal Interviewing): CAPI is a face-to-face interviewing technique in which the interviewer uses a computer to administer the questionnaire. Responses are directly entered into the application, and control and editing can be directly performed.

3) CATI (Computer Assisted Telephone Interviewing): CATI is a telephone surveying technique in which the interviewer follows a questionnaire displayed on a screen. Responses are directly entered into the application. It is a structured system of interviewing that speeds up the collection, control and editing of information collected.

4) CAWI (Computer Assisted Web Interviewing): CAWI is an Internet surveying technique in which respondents follow a questionnaire provided on a website and enter the responses into the application themselves.

Different articles on detailed technical and methodological information are linked from the overview page of the online publication EU Labour Force Survey.

Methodological note on the EU-LFS survey errors due to the COVID-19 pandemic:

Classical test theory provides a mathematical-statistical measurement model that links the theoretical construct (the attribute to be measured) with the measurement instrument (value measured by indicator/item). It assumes that hypothetically there is a ‘true’ value of a person (e.g. the number of working hours). However, the value measured in the EU-LFS or the response reaction in the interview can be distorted by random and systematic errors, and consequently does not always correspond to the actual attribute of the respondent.

In methodological research, random errors are assumed to be rather unproblematic, since they balance each other out if the number of measurements (or respondents) is sufficiently large. Since systematic errors are all biased in the same direction, they cannot be compensated for and, accordingly, pose a serious problem for survey research. For this reason, since the establishment of survey methodology, there have also been research approaches that deal with the measurement of characteristics and the errors that occur in the process.
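The difference between the two kinds of error can be illustrated with a small simulation. This is only a sketch under simple assumptions: each measured value is taken to be the true value plus a constant bias plus normally distributed noise, and the true value of 40 hours, the noise level and the bias of 2 hours are hypothetical.

```python
import random


def simulated_mean(true_value: float, n: int, noise_sd: float,
                   bias: float, seed: int = 0) -> float:
    """Mean of n simulated measurements: true value + bias + random noise."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    total = 0.0
    for _ in range(n):
        total += true_value + bias + rng.gauss(0.0, noise_sd)
    return total / n


TRUE_HOURS = 40.0  # hypothetical 'true' weekly working hours

# Random errors only: the mean gets close to the true value as n grows.
mean_random_only = simulated_mean(TRUE_HOURS, n=20_000, noise_sd=5.0, bias=0.0)

# Random errors plus a systematic error of +2 hours: the mean stays off target.
mean_with_bias = simulated_mean(TRUE_HOURS, n=20_000, noise_sd=5.0, bias=2.0)

print(f"Random errors only:    {mean_random_only:.2f}")
print(f"With systematic error: {mean_with_bias:.2f}")
```

Increasing the number of respondents shrinks the effect of the noise but leaves the shift caused by the bias unchanged, which is why systematic errors cannot be averaged away.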

The concept of the Total Survey Error (TSE) brings all potential causes of errors in surveys together in one model. It describes and examines errors that can occur in the planning and implementation of the interview, covering the causes of both bias (systematic errors) and variance (random errors) in survey data. Usually, types of errors are divided into observational errors (measurement errors) and non-observational errors (errors of representation). The latter result from mistakes made in defining the inferential and target population and the sampling frame (coverage error), mistakes in the process of sampling households or individuals (sampling error) and errors due to the fact that not all sampled units are interviewed (non-response error). Observational errors concern mistakes made in the process of defining the construct that is to be measured (validity), the measurement process (measurement errors, due to characteristics of the measurement instrument, e.g. survey mode or questionnaire; the respondent, the interviewer or the interview situation) and the lack of agreement between the response given by the interviewee and the value recorded in the edited dataset (processing errors).
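Bias and variance are often combined into a single figure, the mean squared error (MSE), which equals the squared bias plus the variance. This standard identity is not spelled out in the note above; the sketch below merely checks it on hypothetical estimates from repeated runs of the same survey, with a hypothetical true value.

```python
from statistics import mean, pvariance


def decompose_mse(estimates: list, true_value: float) -> dict:
    """Split the mean squared error of estimates into squared bias and variance."""
    bias = mean(estimates) - true_value
    variance = pvariance(estimates)  # population variance of the estimates
    mse = mean((e - true_value) ** 2 for e in estimates)
    return {"bias_squared": bias ** 2, "variance": variance, "mse": mse}


# Hypothetical estimates (e.g. an unemployment rate in %) from four repeated
# surveys, compared with a hypothetical true value of 10.0 %.
result = decompose_mse([10.5, 11.0, 11.5, 12.0], true_value=10.0)

print(result)
print("Identity holds:",
      abs(result["mse"] - (result["bias_squared"] + result["variance"])) < 1e-9)
```

In this example most of the error comes from the bias term, which a larger sample would not reduce.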

Because of the many restrictions imposed by their governments, some EU Member States changed the survey mode. Changes in the data collection mode are always a sensitive issue, especially for research done with panel data or time series. Although the greatest impact can be expected for non-factual questions (e.g. attitude items), the EU-LFS data could be biased as well. Every mode has very specific characteristics (e.g. interviewer present, visual or auditory perception, cognitive burden, support and motivation opportunities, social desirability) that can result in different errors (sampling and non-sampling errors). Moreover, a change of mode requires the programming or preparation of the questionnaire in another IT system. Because the pandemic came rather unexpectedly, there might not have been much time for pre-testing the survey in the new system, which could be another possible error source (e.g. wrong filtering). In addition, sampling errors could occur because certain people typically participate in specific modes. Thus, sampling errors might not only be related to obvious, recorded socio-economic characteristics of the respondents like sex, age and education, but also to characteristics not measured (e.g. personality traits, attitudes).

As can be expected, the COVID-19 crisis mostly affected countries carrying out face-to-face interviews. Data showed that countries already using only telephone (CATI) or telephone and online interviews (CATI and CAWI) were not affected in terms of an obvious mode impact (e.g. Denmark, Luxembourg, Finland). In contrast, countries that had to change the data collection from face-to-face modes (CAPI or PAPI with interviewer) to telephone or self-administered modes (CATI, CAWI or postal PAPI without interviewer) could have measurement errors in their EU-LFS 2020 data.

When changing from an interviewer-administered survey (CAPI or PAPI) to a self-administered one (PAPI or CAWI), the absence of the interviewer, who usually helps, guides and motivates the respondents, could cause errors. For example, the filter path could be followed incorrectly, questions could be misunderstood, and selected units might not want to participate or might not answer all questions (unit and item non-response).

Changes from face-to-face interviewing (CAPI or PAPI) to telephone interviewing (CATI) could also have an impact on the quality of the data. Although there is still an interviewer who leads the respondent through the survey, the absence of visual aids might add to the cognitive burden on the interviewees, which can result in measurement errors. Lower motivation, invalid answers to proxy interview questions and higher unit and item non-response rates might have occurred. A change of survey mode might also imply that, if CATI call centres are closed, the interviews are conducted by telephone by the interviewers of the CAPI network, provided that telephone numbers are available, using their personal telephones and the CAPI software installed on their laptops; this would be an additional source of bias.

The greater ‘distance’ between the interviewer and the respondent in CATI compared with a face-to-face interview can have positive and negative effects. Households or people participating for the first time (1st wave) could have less trust in the seriousness and assured anonymity of the survey and thus have given less honest answers. Because of the lack of experience with the questionnaire or the questions and since the interviewer could not help and motivate as much as when being there in person, respondents might have given less reliable or valid answers. On the other hand, because of the greater ‘distance’, the social desirability bias, where respondents give answers that might please or impress the interviewer, could have been less pronounced.

Besides the possible changes of data collection mode described above and their possible impact on data quality, some countries might not have had the opportunity to change their face-to-face mode – at least at the beginning of the crisis. Therefore, there might have been a (short) period where no data collection was possible due to government restrictions and the time needed to prepare to switch to another survey mode. Moreover, people might not have wanted to participate in the survey (if voluntary) because of the risk of infection. This could have increased non-response rates. Countries using CATI might not have needed to change their survey mode but could nevertheless have been affected by restrictions caused by the pandemic. For example, call centres might have had to close at least for a certain period, which could have led to missing data.

In conclusion, besides all the possible negative impacts mentioned that the pandemic might have (had) on the quality of the EU-LFS data, at least one potential positive effect should be mentioned. Because of the travel restrictions and the increasing number of people working remotely from home in many countries participating in the EU-LFS, people selected for the survey were more likely to be at home. This could have led to fewer proxy interviews and thus fewer inadequate or unreliable answers from proxies.

Context

The COVID-19 health crisis hit Europe in January and February 2020, with the first cases confirmed in Spain, France and Italy. COVID-19 infections have been diagnosed since then in all European Union (EU) Member States. To fight the pandemic, EU Member States have taken a wide variety of measures. From the second week of March 2020, most countries closed retail shops, with the exception of supermarkets, pharmacies and banks. Bars, restaurants and hotels were also closed. In Italy and Spain, non-essential production was stopped and several countries imposed regional or even national lockdown measures which further stifled economic activity in many areas. In addition, schools were closed, public events were cancelled and private gatherings (with numbers of persons varying from 2 to 50) were banned in most Member States.

The majority of the preventive measures were initially introduced during mid-March 2020. Consequently, the first quarter of 2020 was the first quarter in which the labour market across the EU was affected by COVID-19 measures taken by the Member States.

In the following quarters of 2020 and 2021, the preventive measures against the pandemic were repeatedly eased and tightened in accordance with the number of new cases of the disease. New waves of the pandemic began to appear regularly (e.g. peaks in October-November 2020 and March-April 2021). Furthermore, new strains of the virus with increased transmissibility emerged in late 2020, which further alarmed the health authorities. Nonetheless, as massive vaccination campaigns started all around the world in 2021, people began to anticipate an improvement in the situation regarding the COVID-19 pandemic.

Please note that in this exceptional context of the COVID-19 pandemic, employment and unemployment as defined by the International Labour Organisation (ILO) might not be sufficient to describe the developments taking place in the labour market. In the first phase of the crisis, active measures to contain employment losses led to absences from work rather than dismissals, and individuals could not look for work or were not available due to the containment measures, thus not counting as unemployed. Referring only to unemployment might consequently underestimate the entire unmet demand for employment, also called the labour market slack. This is further analysed, together with the evolution of employment and of recent job starters, in the publication Labour market in the light of the COVID-19 pandemic.
