
Archive:Microdata linking in business statistics - introduction

This Statistics Explained article is outdated and has been archived - for recent articles on structural business statistics see here.

Authors: Pekka Alajääskö and Anton Roodhuijzen (Eurostat, Structural business statistics and global value chains). Planned article update: June 2019.

European business statistics compilers often face a dilemma: on the one hand, users and policy makers demand additional information on the structure and development of European enterprises. On the other hand, budget constraints and reluctance to increase the burden on survey respondents and national statistical institutes place tight limits on the extension of data requirements. Microdata linking (MDL) offers an opportunity to discover new information and to develop new statistics and indicators, both from existing data sets alone and by combining them with new data collections.

This is the introductory article of an online publication on Microdata linking in business statistics.

Figure 1: Percentage of international sourcing for domestically and foreign-controlled enterprises
Figure 2: Employment (full-time equivalents), by sourcing status, 2011 (index 2008 = 100)
Table 1: R & D and engineering support function employment in foreign affiliates, by location, 2011
Figure 3: Enterprises reporting a decrease in employment, 2009-11
Table 2: Number of enterprises, value added and persons employed (FTE) broken down by type of SME, 2008 and 2012
Table 3: Export intensity, import intensity and trade openness broken down by size (SME/large) and type of enterprise (dependent/independent) in manufacturing, 2008 and 2012
Table 4: Number of international traders and domestic enterprises in manufacturing, 2008 and 2012
Source: Eurostat
Figure 4: Import shares by control in wholesale and retail trade, 2008 and 2012
Source: Eurostat
Figure 5: Survival and death rates from 2008 to 2012
Figure 6: Employment dynamics by export status, 2008-2012


Methodological approach

Eurostat, in close collaboration with national statistical institutes (NSIs), has conducted a number of MDL projects in recent years in response to user needs for more detailed and relevant business statistics, i.e. information on the performance, structure and demography of the enterprise population.

The approach used in European business statistics is the so-called co-ordinated microdata linking, also known as distributed microdata linking/research. This approach has been used in most business statistics related MDL projects. A typical co-ordinated microdata linking project is carried out in three separate phases:

The first phase involves the construction of the linked microdataset. The project coordinators produce standardised guidelines explaining in detail how the datasets in each participating country are to be structured and provide a common code to ensure that identical tables are made in all countries. Each country records information from all the data sources used in the project into its own national database. These linked microdatasets are stored locally at the national statistical institutes throughout the project and are not shared with third parties.

In the second phase of the project, the dataset is tested for consistency. Although each dataset used in the project has already been carefully edited, further checks are necessary to ensure, for example, that enterprises are represented by the same statistical units across different datasets and over time, as the reporting units used for specific enterprises can, and often do, differ across the data sources in each project. In fact, in all business statistics projects many differences are found and corrected. The tests used in this phase are devised by the project coordinators and implemented locally by the national statistical institutes.

In the third phase of the project, standardised statistical output consisting of descriptive and longitudinal analysis is created in each country. Sometimes more sophisticated statistical methods are used. Two examples of statistical analysis and findings can be found below in the external links.

When attempting to link two or more sets of data, two conditions are vital for successful linking:

1. There has to be a unique identifier or at least a very reliable matching approach. Up-to-date National Statistical Business Registers (NSBRs) play a central role.

2. There has to be a large enough intersection of responding units.

This does not pose a major problem for register-based data collection systems with exhaustive samples and extensive use of administrative data. However, microdata linking in sample-based data collection systems is usually more problematic. The following section sheds some light on this issue.
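The two conditions above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each source is held as a mapping from a shared enterprise identifier to a record; the identifiers, variable names and values are hypothetical and are not taken from any of the MDL projects.

```python
def link_by_identifier(source_a, source_b):
    """Join two {enterprise_id: record} mappings on their shared identifiers."""
    # Only units present in both sources can be linked (the intersection).
    shared_ids = source_a.keys() & source_b.keys()
    return {eid: {**source_a[eid], **source_b[eid]} for eid in sorted(shared_ids)}


def match_rate(source, linked):
    """Share of the units in `source` that ended up in the linked dataset."""
    return len(linked) / len(source) if source else 0.0


# Hypothetical extracts: SBS-like turnover and trade-like export values.
sbs = {
    "E001": {"turnover": 120.0},
    "E002": {"turnover": 45.5},
    "E003": {"turnover": 300.0},
}
itgs = {
    "E002": {"exports": 10.0},
    "E003": {"exports": 150.0},
    "E004": {"exports": 7.5},
}

linked = link_by_identifier(sbs, itgs)
sbs_match_rate = match_rate(sbs, linked)
```

A low match rate in either source is a signal that the intersection of responding units may be too small to support reliable estimates.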

Generalisation and aggregation of microdata to the total enterprise population

It is important to ensure that the linked microdatasets are extrapolated to the total population of enterprises so that the results can be generalised to the total population. This is often a major challenge, as linked microdatasets can miss many observations because some of the datasets being linked are based on sample surveys. Other reasons for missing data are unit non-response, item non-response, inactive units and under-coverage of an administrative source, e.g. due to ineligibility of certain sub-populations or the use of thresholds. Some variables are completely observed, e.g. NACE activity code and size-class, as they are available for all statistical units in the NSBRs. But for most variables some values are missing, and often a variable is only observed for a small fraction of the total population. Datasets should therefore be accompanied by information on the reasons for missing data as well as on the methods used to impute values for them. This is important for data users in general, but for microdata linking it is essential.

For example, Structural Business Statistics (SBS) are often surveys based on samples stratified with respect to economic activity and size-class. When SBS are linked with other business statistics in this case, most of the missing data is due to the sampling design, some is due to unit non-response and some to item non-response. Official SBS are obtained using survey weighting: design weights are calculated for all responding enterprises (statistical units) and are subsequently adjusted to account for unit non-response. For this adjustment, the number of persons employed and tax turnover information are often used as auxiliary variables in addition to size-class and economic activity. Missing information due to item non-response is usually imputed.

The use of weights avoids biases in the estimates due to the unequal sampling probabilities of the sampling design and reduces non-response bias. When SBS data and variables are linked with those from other sources, it is no longer evident that the original SBS weights can be used, since the set of statistical units for which all variables from all sources are jointly observed is a subset of the SBS responding enterprises in the original sample. The missing data pattern is very likely to be different, so a new weighting or imputation strategy is needed. Sampling designs and other reasons for missing data vary between countries. Consequently, the approaches taken, and the variables to be added to and retained in the linked microdatasets, may to a certain extent be country-specific. Further information is available in the methodological report of the 2015 MDL project.
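As a rough illustration of why the linked subset needs its own weights, the sketch below recomputes weights per stratum as the number of register units divided by the number of linked units in that stratum. This simple ratio is an assumption chosen for illustration, not the method used in the MDL projects, and the strata, counts and values are hypothetical.

```python
from collections import Counter


def stratum_weights(population_strata, linked_strata):
    """Weight per stratum: register units divided by linked units."""
    pop_counts = Counter(population_strata)
    linked_counts = Counter(linked_strata)
    # Strata without any linked unit receive no weight and would need
    # collapsing or imputation in practice.
    return {s: pop_counts[s] / n for s, n in linked_counts.items()}


# Hypothetical register: stratum = (activity, size class) for every unit.
population = [("C", "small")] * 6 + [("C", "large")] * 2

# Hypothetical linked units: (stratum, value of some jointly observed variable).
linked_units = [
    (("C", "small"), 10.0),
    (("C", "small"), 20.0),
    (("C", "small"), 30.0),
    (("C", "large"), 100.0),
    (("C", "large"), 200.0),
]

weights = stratum_weights(population, [s for s, _ in linked_units])

# Weighted total extrapolated to the whole register population.
estimated_total = sum(weights[s] * value for s, value in linked_units)
```

In this toy case every large unit is linked (weight 1), while each linked small unit stands for two register units (weight 2).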

MDL in improvement of existing business statistics

Microdata linking can also be used to improve the quality of existing statistics. In 2013, under the umbrella of the ESSNet on Measuring Global Value Chains, the NSIs of Denmark, Norway and Finland linked statistics on the activities of affiliates based abroad (outward foreign affiliates statistics, OFATS) with statistics on foreign-controlled enterprises resident in the compiling economy (inward foreign affiliates statistics, IFATS). Since IFATS is mostly based on administrative (subset of SBS) data while OFATS information is collected by a survey, IFATS quality is generally assumed to be superior. The approach taken was to mirror IFATS and OFATS data sets between the countries, where control was exerted from an enterprise resident in one of the three countries and the foreign affiliate was located in another. In theory, this approach should have resulted in an identical set of affiliates in IFATS and OFATS; however, the exercise revealed some discrepancies between the two statistics and gave important leads for the improvement of FATS data quality. More information can be found in their report.
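The mirror comparison can be thought of as a set comparison. The sketch below assumes, hypothetically, that both sides identify an affiliate by the same key (controlling country, host country, affiliate identifier); in practice establishing such a common key is itself part of the linking work, and the records shown are invented.

```python
def mirror_discrepancies(ofats_reported, ifats_reported):
    """Return affiliates seen on only one side of the mirror."""
    only_in_ofats = ofats_reported - ifats_reported
    only_in_ifats = ifats_reported - ofats_reported
    return only_in_ofats, only_in_ifats


# Hypothetical keys: (controlling country, host country, affiliate id).
# OFATS side: affiliates in Norway reported by Danish parent enterprises.
ofats_dk = {("DK", "NO", "A1"), ("DK", "NO", "A2"), ("DK", "NO", "A3")}
# IFATS side: Danish-controlled enterprises recorded in Norway.
ifats_no = {("DK", "NO", "A2"), ("DK", "NO", "A3"), ("DK", "NO", "A4")}

only_in_ofats, only_in_ifats = mirror_discrepancies(ofats_dk, ifats_no)
```

Each non-empty result is a lead for follow-up: an affiliate missing from one source, or a difference in how control is recorded.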

Other business statistics MDL exercises

Apart from the SBS MDL exercises, trade by enterprise characteristics (TEC), and the FATS linking project described above, the ESS has engaged in several other business statistics related MDL exercises. There has been an ESSNet on Linking of Micro data on ICT Usage, where business registers were linked with SBS and with ICT usage and e-commerce data. Finally, there has been an ESSNet on data warehousing (DWH) and MDL, which dealt more with theoretical aspects.

MDL articles and preliminary statistical findings

Microdata linking - international sourcing

In the European statistical system, one of the first initiatives to link microdata at enterprise level for business statistics was a project launched in 2010 linking the results of the survey on “international organisation and the sourcing of business functions” with structural business statistics and international trade in goods statistics. This was followed by an exercise launched in late 2012, which broadened the scope to include foreign affiliates statistics. The results of the 2012 project were analysed in Microdata linking - international sourcing.

Some statistical findings

  • Foreign-controlled enterprises are more active in international sourcing and trade (see Figure 1)
  • International sourcing seems to have had a negative impact on employment between 2008 and 2011 (see Figure 2)

Foreign affiliates statistics - employment by business function

Foreign affiliates statistics - employment by business function investigates the employment record of foreign affiliates, by business function, of enterprises in 14 European Union (EU) Member States and Norway.

Some statistical findings

  • Employment in foreign affiliates of European enterprises is falling less than in domestic enterprises (see Figure 3)
  • There is no evidence of substantial movement of knowledge-intensive business functions to destinations outside Europe (see Table 1)

Statistics on small and medium-sized enterprises - Dependent and independent SMEs and large enterprises

Small and medium-sized enterprises (SMEs) are a focal point in shaping enterprise policy in the European Union (EU). The European Commission considers SMEs and entrepreneurship as key to ensuring economic growth, innovation, job creation, and social integration in the EU. However, in official statistics SMEs can currently only be identified by employment size, as enterprises with fewer than 250 persons employed. This is a broad category that encompasses enterprises with different ownership structures, varying numbers of employees and varying levels of economic activity. To facilitate better analysis and understanding of the heterogeneity of SMEs, the 2014 MDL project linked data from structural business statistics (SBS), international trade in goods statistics (ITGS) and business registers (BRs). Statistics on small and medium-sized enterprises examines the statistical data from the MDL project, which produced linked datasets for analysing business structures and performance in a harmonised way, making cross-country comparisons possible. Compared with previous MDL projects, a new feature is the distinction between dependent SMEs (those belonging to an enterprise group) and independent SMEs.

Some statistical findings

  • Most enterprises are independent and do not belong to an enterprise group, but among SMEs, medium-sized enterprises are very often part of a group. This is most common in manufacturing and, to a lesser degree, in knowledge-intensive business services (see Table 2)
  • In most countries, dependent SMEs are more open to international trade than independent SMEs. In this regard dependent enterprises behave like large ones; they are also more exposed to shocks through international trade than independent SMEs (see Table 3)

Statistics comparing enterprises which trade internationally with those who do not

Economic globalisation and the participation of enterprises in international trade in goods are important drivers of economic growth. Evidence to demonstrate this is vital for designing policy. Research has shown that international traders differ substantially from domestic enterprises. Statistics comparing enterprises which trade internationally with those who do not splits the population into those taking part in international trade in goods (as an exporter, importer or two-way trader) and those that are active on domestic markets only. Most international traders are two-way traders: very few enterprises export without importing and vice versa. Exporters (including two-way traders) are of special interest for policy makers because of their potential for job creation driven by demand from markets abroad. Importers (again including two-way traders) are also important, since they facilitate access to raw materials, intermediate goods and technologies that are otherwise not easily available. The analysis therefore includes both exporters and importers.

Some statistical findings

  • More than one in five (23 %) of all manufacturing enterprises in the eight participating countries are international traders. Denmark and Austria have the highest shares (around 40 %); Germany is a little above average (25 %) while the other countries are below the average (see Table 4).
  • Foreign-controlled wholesale and retail importers account for the highest shares of imports in most countries. In 2012, in Sweden, they accounted for 64 % of total imports of goods (see Figure 4).

Statistics on enterprise survival and growth prospects between 2008 and 2012

The 2015 European MDL project is a partnership between Eurostat and eight European countries that creates linked microdatasets, enabling the analysis of trends for micro, small, medium-sized and large enterprises. The novelty of these linked microdatasets is that they follow a group of enterprises that existed in 2008 over the period 2008-2012. They therefore offer a dynamic view of how this period, which included the economic crisis, affected the survival and growth prospects of various types of enterprise. This 'longitudinal analysis' sheds more light on, for example, the size-class mobility of small and medium-sized enterprises (SMEs) compared with large ones in various sectors of the business economy. Moreover, by combining this information with international trade in goods statistics, exporting and non-exporting enterprises can be distinguished.

Some statistical findings

  • Death rates are inversely related to enterprise size. On average, and across all countries, SMEs are characterised by a higher number of deaths, both in absolute and relative terms. In the SME category, micro and small enterprises have the lowest chances of survival (see Figure 5).
  • Service enterprises tend to be smaller and have a lower degree of mobility between size classes than enterprises active in industry (see Figure 6).
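A longitudinal indicator of this kind can be sketched as follows. The example assumes that the survival rate is the share of the 2008 cohort still active in 2012, computed per size class; this definition and the cohort records are illustrative assumptions, not the published methodology or data.

```python
from collections import defaultdict


def survival_rates(cohort):
    """Share of the 2008 cohort still active in 2012, by size class."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for size_class, active_in_2012 in cohort:
        totals[size_class] += 1
        if active_in_2012:
            survivors[size_class] += 1
    return {s: survivors[s] / totals[s] for s in totals}


# Hypothetical cohort: (size class in 2008, still active in 2012?).
cohort_2008 = [
    ("micro", True), ("micro", False), ("micro", False), ("micro", True),
    ("large", True), ("large", True),
]

rates = survival_rates(cohort_2008)
# Death rate is the complement of the survival rate.
death_rates = {s: 1.0 - r for s, r in rates.items()}
```

Because the cohort is fixed in 2008, enterprises born later do not enter the denominator, which is what makes the view longitudinal rather than cross-sectional.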

See also

External links