Statistics Explained

Interaction of household income, consumption and wealth - methodological issues

This article highlights the methodological issues that arise when measuring the relationship between income, consumption and wealth at household level. It briefly explains the different methods that can be used to produce experimental statistics on income, consumption and wealth (ICW), focussing on the option that is actually applied to the data, 'statistical matching'. The results are set out in another Statistics Explained article; this article explains the methodological aspects and the statistical limitations of the exercise. Figure 1 illustrates the kind of figures this exercise aims to produce:

Figure 1: Distribution of households according to income and consumption deciles in Belgium 2014 (icw_sr_07)

How to measure the interaction between income, consumption and wealth?

Survey integration

'Survey integration' is the ex ante combination of different surveys. It involves getting the same (sub)sample of the population to fill in a number of different questionnaires. The main drawback with this approach is the sharp increase in the 'response burden', which may affect the response rate and ultimately the reliability of the collected data.

Multi-source approach

Instead of collecting all pieces of information through (face-to-face) questionnaires, the variables (in particular the economic ones) are gathered from different data sources, such as tax/social security registers, loyalty cards' data, etc. The integration between the survey sample and the external sources can be performed on the basis of identifiers present in both or through 'record linkage' (i.e. using information that makes it possible to identify individuals present in both data sources). While this approach alleviates the response burden, there is often an imperfect match between the data and the concepts used in the survey. There is, therefore, a trade-off between response burden and data accuracy.

Statistical matching and modelling

Another approach is to use the variables common to different datasets in order to merge different samples. Such 'statistical matching' techniques assume that individuals who turn out to be similar with respect to the common variables are also similar with respect to the variables of interest. Statistical matching is also a form of model-based imputation, so there is scope for using a number of imputation techniques (such as regression models). However, statistical matching should be regarded as a 'second-best' solution, as it cannot capture any correlation between the variables of interest beyond what is explained by the set of common variables (which in practice often turns out to be quite limited).

Modular approach

The modular approach can be seen as a combination of the options described above. It consists of a data source containing detailed information on one component of ICW along with a limited selection of variables on the other two. The latter are collected through a module which may apply to the full sample or a subsample of respondents. The modular approach offers a high degree of flexibility in questionnaire design. It is thus possible to reduce the 'response burden' effect while gathering some basic information about additional dimensions. However, this information is usually less accurate than data collected through a full survey since the modules generally involve proxies and less detailed questions. To compensate for this, statistical matching methods may be applied to impute the more accurate data from an external survey into the dataset using the proxies collected through the module as 'hook' variables. These hook variables significantly increase the quality of the matching, by reducing the uncertainty implicit to the matching methods.

An experiment: statistical matching of EU-SILC, HBS and HFCS data

With the objective of producing a joint distribution of disposable income and consumption expenditure, which would enable the study of the interaction between income and consumption at household level, statistical matching is applied to EU-SILC and HBS data. The Household Budget Survey already includes an income variable, but it is not conceptually harmonised across EU countries and not accurate enough for analysing the link between income and consumption. A further experiment matches the fused SILC-HBS data with data from the Household Finance and Consumption Survey (HFCS). This provides a dataset containing micro-level information on the three ICW dimensions: income, consumption and wealth, which are defined here as follows:

income - the primary measure of income used is annual household disposable income (EU-SILC variable HY020), which includes all monetary income received by any member of the household (for example income from employment (and company cars), income from self-employment, social benefits, property income) and income received at household level, but deducting taxes and social contributions paid. It includes the balance between current transfers received and paid. It excludes the value of goods produced for own consumption and net owner-occupied housing services (in other words, imputed rents).

consumption - since the real consumption of households cannot be measured directly, annual monetary household consumption expenditure (HBS variable CP00) is used as the nearest proxy. It includes all expenditures that are not direct investments into pensions, life insurance policies, real estate or other forms of gross capital formation. Imputed rents (CP042) and other non-monetary expenditures are excluded to ensure consistency with the income concept.

wealth - household net wealth includes financial and non-financial assets net of liabilities (HFCS variable DN3001), excluding occupational pension schemes and government social security schemes. ‘Liquid financial wealth’ has been calculated as the sum of the values of deposits, mutual funds, bonds, publicly traded shares and managed accounts.


Comparability issue

Comparability across surveys is critical for the quality of the matching. The surveys to be matched should be synchronised so that they relate to the same reference period and target population. Since HBS is conducted every five years and the reference period may vary across countries (see Table 1), the annual EU-SILC is aligned with HBS by taking the EU-SILC wave that corresponds to the HBS reference period. To join HFCS data, the HFCS survey year closest to the HBS reference year is used.

Table 1: Reference years for HBS 2010, 2015 and 2020 waves and HFCS 2010, 2014 (2017) and 2021 waves

To match consumption and wealth to income at household level (that is, to find a household in HBS, and in HFCS, that is similar to a given household in EU-SILC), common variables are defined consistently across surveys. Many of these variables, such as age and education level, are based on the household 'reference person', for which the 'Canberra definition'[1] is applied (for details see Annex A of this document). Then, the distributions of the variables common to HBS (HFCS) and EU-SILC are compared in order to determine which variables can be used for merging information from the two surveys at household level. A specific metric (the Hellinger distance) is used to gauge the comparability of the original HBS and EU-SILC distributions for the common categorical variables. The list of possible matching variables is limited to those that turn out to be similarly distributed across surveys for most of the countries: density level of the population, household size, household type, age of the reference person, level of education, activity status, occupation status, tenure status of the household and main source of income.
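The comparison of a common categorical variable across two surveys can be illustrated with a minimal Python sketch of the Hellinger distance. The function names and the toy tenure data are assumptions made for illustration only; the article does not state which acceptance threshold is applied, so none is shown here.

```python
import math
from collections import Counter


def category_shares(values, weights=None):
    """Return the (weighted) share of each category in a list of values."""
    if weights is None:
        weights = [1.0] * len(values)
    totals = Counter()
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    return {category: total / grand_total for category, total in totals.items()}


def hellinger_distance(shares_a, shares_b):
    """Hellinger distance between two discrete distributions.

    The result lies between 0 (identical shares) and 1 (no category in common).
    """
    categories = set(shares_a) | set(shares_b)
    squared_gaps = sum(
        (math.sqrt(shares_a.get(c, 0.0)) - math.sqrt(shares_b.get(c, 0.0))) ** 2
        for c in categories
    )
    return math.sqrt(squared_gaps / 2.0)


# Toy data (illustrative only): tenure status of households in each survey.
silc_tenure = ["owner", "owner", "tenant", "owner", "tenant", "owner"]
hbs_tenure = ["owner", "tenant", "tenant", "owner", "tenant", "owner"]

distance = hellinger_distance(
    category_shares(silc_tenure), category_shares(hbs_tenure)
)
```

A small distance suggests that the variable is distributed similarly in both surveys and is therefore a candidate matching variable. In practice the survey weights would be passed through the `weights` argument.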

Random hot-deck

The selection of a donor in HBS for a receiver in EU-SILC is quite simple and conservative: from the set of comparable variables (see previous section), those with the highest explanatory power for the target variables (total consumption, total disposable income, gross income and food consumption quintile) are used to stratify the households of both surveys. These matching variables may vary from country to country according to varying consumption behaviours, but the HBS income variable is always used to rank and split the population into 20 equal segments. Implicitly, it is assumed that, despite the measurement errors that affect the HBS income variable, it correctly ranks households according to the true income distribution.

An HBS donor household is then randomly selected for each EU-SILC receiver from among the suitable matches (same stratum). Subsequently, the distribution of "donated" consumption data resulting from this 'random hot-deck' procedure is compared with the original distribution of consumption in HBS. As shown in Figure 2, data from the fused dataset remain quite consistent with the a priori information from the HBS survey.

Figure 2: Distribution of total consumption over all households (Belgium 2014). Blue line: original HBS data. Red line: HBS consumption matched to EU-SILC. Source: Eurostat.
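The random hot-deck described above can be sketched in a few lines of Python. This is an unweighted illustration, not the production procedure: the record layout, the function names and the toy data are assumptions, and the income classes are built here from each survey's own income ranking (20 classes by default, 2 in the toy example).

```python
import random
from collections import defaultdict


def income_class(rank, n_households, n_classes=20):
    """Map a 0-based income rank to one of n_classes equal-sized segments."""
    return rank * n_classes // n_households


def add_income_classes(households, income_key, n_classes=20):
    """Attach an 'income_class' field based on the ranking of income_key."""
    ordered = sorted(households, key=lambda h: h[income_key])
    for rank, household in enumerate(ordered):
        household["income_class"] = income_class(rank, len(ordered), n_classes)


def random_hot_deck(receivers, donors, strata_keys, target_key, rng):
    """Give each receiver the target value of a random donor in its stratum."""
    pools = defaultdict(list)
    for donor in donors:
        pools[tuple(donor[k] for k in strata_keys)].append(donor)
    for receiver in receivers:
        pool = pools.get(tuple(receiver[k] for k in strata_keys))
        # Receivers without any donor in their stratum stay unmatched (None).
        receiver[target_key] = rng.choice(pool)[target_key] if pool else None
    return receivers


# Toy data (illustrative only).
hbs = [
    {"hh_type": "single", "income": 900, "consumption": 800},
    {"hh_type": "single", "income": 2500, "consumption": 1900},
    {"hh_type": "couple", "income": 1500, "consumption": 1400},
    {"hh_type": "couple", "income": 4000, "consumption": 3000},
]
silc = [
    {"hh_type": "single", "income": 1000},
    {"hh_type": "single", "income": 2600},
    {"hh_type": "couple", "income": 1600},
    {"hh_type": "couple", "income": 4200},
]

add_income_classes(hbs, "income", n_classes=2)
add_income_classes(silc, "income", n_classes=2)
fused = random_hot_deck(
    silc, hbs, ["hh_type", "income_class"], "consumption", random.Random(0)
)
```

In the toy data each stratum holds exactly one donor, so the draw is deterministic; with real data each stratum holds many donors and the donated values vary from one run to the next.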

Consistency and reweighting

This leaves the question of how to process the available data on consumption in a consistent manner. As the receivers come from EU-SILC, the EU-SILC sample is used to compute indicators dealing with consumption expenditures. Because of sampling uncertainty, these indicators are not necessarily fully consistent with a similar calculation using HBS data. In order to ensure the consistency of key income and consumption indicators, a normal calibration on EU-SILC individual weights is applied so as to ensure minimal consistency between EU-SILC, HBS and the fused dataset obtained through random hot-deck. The key indicators used for calibration are the following:

Rank hot-deck

Once total consumption expenditure from HBS has been successfully merged with EU-SILC data and weights have been re-calibrated, a similar procedure is used to join total assets and net wealth from the HFCS to the joint EU-SILC-HBS dataset. This time both datasets are stratified according to household type, tenure status and food consumption quintile. Then, the gross income variable available in HFCS and in EU-SILC is used to rank the households and find the best possible matches. Figure 3 shows the distribution of total assets in the original HFCS data[2] as compared to the matched ICW dataset for Belgium, 2014.

Figure 3: Distribution of total assets over all households (Belgium 2014). Blue line: original HFCS data. Red line: HFCS total assets matched to EU-SILC. Source: Eurostat.
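A rank hot-deck can be sketched as follows. The article does not spell out how ranks are compared, so this example assumes one common variant: within each stratum, every receiver takes the value of the donor whose relative position in the ranking variable is closest to its own. Function names, record layout and data are hypothetical, and survey weights are ignored.

```python
from collections import defaultdict


def relative_ranks(households, rank_key):
    """Return (relative rank in (0, 1), household) pairs ordered by rank_key."""
    ordered = sorted(households, key=lambda h: h[rank_key])
    n = len(ordered)
    return [((i + 0.5) / n, h) for i, h in enumerate(ordered)]


def group_by_stratum(households, strata_keys):
    """Group households by the tuple of their stratification values."""
    groups = defaultdict(list)
    for household in households:
        groups[tuple(household[k] for k in strata_keys)].append(household)
    return groups


def rank_hot_deck(receivers, donors, strata_keys, rank_key, target_key):
    """Donate target_key from the donor with the closest relative rank."""
    donor_groups = group_by_stratum(donors, strata_keys)
    for stratum, group in group_by_stratum(receivers, strata_keys).items():
        donor_ranks = relative_ranks(donor_groups.get(stratum, []), rank_key)
        for position, receiver in relative_ranks(group, rank_key):
            if not donor_ranks:
                # No donor in this stratum: leave the receiver unmatched.
                receiver[target_key] = None
                continue
            _, donor = min(donor_ranks, key=lambda pair: abs(pair[0] - position))
            receiver[target_key] = donor[target_key]
    return receivers


# Toy data (illustrative only): one stratum, ranked by gross income.
hfcs = [
    {"tenure": "owner", "gross_income": 15, "total_assets": 100},
    {"tenure": "owner", "gross_income": 50, "total_assets": 500},
]
silc_hbs = [
    {"tenure": "owner", "gross_income": 10},
    {"tenure": "owner", "gross_income": 20},
    {"tenure": "owner", "gross_income": 30},
    {"tenure": "owner", "gross_income": 40},
]

matched = rank_hot_deck(silc_hbs, hfcs, ["tenure"], "gross_income", "total_assets")
```

Matching on relative ranks rather than on income levels means that the two gross income variables only need to order households in a similar way; they do not need to be measured on exactly the same scale.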

Uncertainty

Since the matching procedure is similar to imputation, the hot-deck is replicated many times in order to assess the level of uncertainty. However, this does not affect the main underlying assumption, i.e. the 'Conditional Independence Assumption' (CIA): when performing hot-deck, it is assumed that the link between income and consumption is entirely explained and described by the matching variables. This assumption can be challenged. There are methods for relaxing the assumption and computing ranges of plausible values for a given indicator. These ranges turn out to be quite large, confirming the important role of the assumption in the data production.
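The replication step can be illustrated with a short Python sketch that repeats a random hot-deck draw and records the resulting indicator each time. The strata, donor values, number of replicates and function names are assumptions for illustration. The spread obtained this way reflects only the randomness of the donor draws; it says nothing about the validity of the CIA itself.

```python
import random
import statistics


def draw_once(receiver_strata, donor_pools, rng):
    """One hot-deck replicate: draw a donor value for each receiver's stratum."""
    return [rng.choice(donor_pools[stratum]) for stratum in receiver_strata]


def replicate_indicator(receiver_strata, donor_pools, indicator, n_replicates, seed=0):
    """Repeat the hot-deck n_replicates times and collect the indicator values."""
    rng = random.Random(seed)
    return [
        indicator(draw_once(receiver_strata, donor_pools, rng))
        for _ in range(n_replicates)
    ]


# Toy data (illustrative only): donor consumption values per stratum.
donor_pools = {"low": [800, 1000, 1200], "high": [2000, 2600, 3500]}
receiver_strata = ["low", "low", "high", "high", "high"]

estimates = replicate_indicator(
    receiver_strata, donor_pools, statistics.mean, n_replicates=200
)
# Summary of the replicate-to-replicate variation of mean consumption.
spread = (min(estimates), statistics.mean(estimates), max(estimates))
```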

Limitations

The CIA is of course a significant limitation and may have a strong influence on the final results, even though using income as a categorical variable in the list of matching variables makes it more reliable. Indeed, if the proxy income variable from HBS is not used in the matching, results become very unreliable.

The CIA is also the reason why a comparison of matched ICW statistics with national accounts indicators should not be undertaken at this stage, since it adds yet another level of uncertainty. Even when comparing original EU-SILC and HBS data aggregates with household statistics from national accounts, differences are fairly large for some countries. Figure 4 shows such a comparison for aggregate saving rates: saving rates derived from aggregate EU-SILC and HBS micro-data are inconsistent with those from national accounts for most countries. There also seems to be little or no correlation between the two measures, indicating measurement inconsistencies that vary across countries.

Figure 4: Aggregate household saving rates according to surveys and national accounts 'around 2015' (icw_sr_08) and (nasa_10_ki)
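As a rough illustration of how an aggregate saving rate can be derived from micro-data, the sketch below assumes the usual definition: weighted total disposable income minus weighted total consumption expenditure, divided by weighted total disposable income. This definition, the field names and the figures are assumptions; the article does not give the exact formula behind Figure 4.

```python
def aggregate_saving_rate(households):
    """Weighted aggregate saving rate: (income - consumption) / income."""
    total_income = sum(h["weight"] * h["income"] for h in households)
    total_consumption = sum(h["weight"] * h["consumption"] for h in households)
    return (total_income - total_consumption) / total_income


# Toy data (illustrative only): two sampled households with survey weights.
sample = [
    {"weight": 100, "income": 30000, "consumption": 24000},
    {"weight": 200, "income": 20000, "consumption": 19000},
]

rate = aggregate_saving_rate(sample)
```

Because the rate is a ratio of aggregates, it is driven by high-income households and can differ markedly from the average of household-level saving rates.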

These discrepancies underline the need to close the gap between micro and macro data. Eurostat and some National Statistical Institutes are trying to improve the consistency of the data from different sources by documenting conceptual differences and data discrepancies for income, consumption and wealth, and by experimenting with methods for closing the gap. More information about this ongoing work can be found in the dedicated section Income and consumption: social surveys and national accounts.


Feedback

To help Eurostat improve these experimental statistics, users and researchers are kindly invited to give us their feedback by email

Data sources and availability

Eurostat's experimental income, consumption and wealth statistics are based on the statistical matching of EU statistics on income and living conditions (EU-SILC) for income, the Household Budget Survey (HBS) data for consumption and Household Finance and Consumption Survey (HFCS) data for wealth.

The annual collection of EU-SILC was launched in 2003 and is governed by Regulation (EU) 2019/1700 (previously: Regulation (EC) No 1177/2003) of the European Parliament and of the Council. Household disposable income is established by adding up all monetary incomes received from any source by all members of the household (including income from work, investment and social benefits), plus income received at household level, and deducting taxes and social contributions paid. In order to reflect differences in household size and composition, this total is divided by the number of ‘equivalent adults’ using a standard equivalence scale, the so-called ‘modified OECD’ scale, which attributes a weight of 1.0 to the first adult in the household, 0.5 to each subsequent member of the household aged 14 and over, and 0.3 to household members aged less than 14. The resulting figure ('equivalised disposable income') is attributed to each member of the household.
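The equivalisation step can be illustrated with a minimal Python sketch of the modified OECD scale as described above. The function names and the example household are illustrative assumptions.

```python
def equivalent_adults(ages):
    """Number of 'equivalent adults' under the modified OECD scale.

    Weights: 1.0 for the first adult, 0.5 for each further member aged 14
    and over, 0.3 for each member aged under 14.
    """
    n_14_plus = sum(1 for age in ages if age >= 14)
    n_under_14 = len(ages) - n_14_plus
    if n_14_plus == 0:
        # Assumption of this sketch: a household has at least one member aged 14+.
        raise ValueError("household needs at least one member aged 14 or over")
    return 1.0 + 0.5 * (n_14_plus - 1) + 0.3 * n_under_14


def equivalised_income(disposable_income, ages):
    """Household disposable income divided by the number of equivalent adults."""
    return disposable_income / equivalent_adults(ages)


# Example: two adults and two children under 14 with 42 000 of disposable income.
household_ages = [41, 39, 9, 6]
scale = equivalent_adults(household_ages)
income_per_equivalent_adult = equivalised_income(42000, household_ages)
```

For this household the scale is 1.0 + 0.5 + 2 × 0.3 = 2.1, so each of the four members is attributed an equivalised disposable income of roughly 20 000.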

The Household Budget Survey (HBS) is a survey conducted every 5 years on the basis of a gentlemen's agreement between Eurostat, the Member States and the EFTA countries. Data are collected using national questionnaires and, in most cases, expenditure diaries that respondents are asked to keep over a certain period of time. The last two waves were collected 'around 2010' and 'around 2015'. Consumption is described according to the Classification of individual consumption by purpose (COICOP) for each household. Total consumption is obtained by adding up all COICOP items and (as with income) this total is divided by the number of 'equivalent adults' using the same modified OECD scale. The resulting figures are used to compute equivalised expenditures, which are attributed to each member of the household, in order to compute the 'low levels of expenditure' indicator. Results for Italy are unreliable due to the lack of an income proxy variable in HBS and have thus not been published.

Information on assets and liabilities comes from the Household Finance and Consumption Survey (HFCS), in particular the first and second waves conducted in 2010 and 2014. The HFCS is run by National Central Banks and coordinated by the European Central Bank. It collects information on the assets, liabilities and, to a limited extent, income and consumption of households. The second wave of the survey is based on 84 000 interviews conducted in 18 euro area countries, as well as Poland and Hungary, mainly in 2013 and 2014.

Context

In order to support its agenda for social fairness and a good balance between economic and social goals, the European Commission has stressed the need to bring social indicators up to a par with macroeconomic indicators within the EU's reinforced macroeconomic governance. To this end, it is important to ensure the availability of harmonised statistics at EU level that cover the distributional aspects of household income, consumption and wealth (ICW).

In September 2016, the Directors General of the National Statistical Institutes (DGINS) conference[3] in Vienna stressed the importance of ICW statistics in shedding light on people's material well-being and on inequality. The conference concluded that there was a need for a harmonised statistical framework on ICW based on a multi-source approach integrating existing sources of data (EU-SILC, the Household Budget Survey (HBS) and the Household Finance and Consumption Survey (HFCS)). These data are the first outcome of a data integration effort that will be pursued and improved in the coming years.

In the meantime, Eurostat has launched a section on its website dedicated to the dissemination of experimental statistics. These statistics use new data sources and methods in an effort to expand and improve Eurostat's response to its users' needs. Since the statistics presented in this article come from experimental data processing and are based on statistical assumptions, they belong to this section until they reach a sufficient level of maturity.

Notes

  1. See the final report and recommendations of the expert group on household income statistics (the ‘Canberra group’), 2001.
  2. The Eurosystem Household Finance and Consumption Survey (HFCS) is run by the National Central Banks of the Euro area and coordinated by the European Central Bank. The results published in this article and the related observations and analysis may not correspond to results or analysis of the data producers.
  3. The DGINS conference is held once a year and brings together the Directors General of the National Statistical Institutes to discuss topics related to the statistical programme.