Statistics Explained

Merging statistics and geospatial information, 2018 projects - Denmark


Optimising, opening and visualising geospatial data with Danish sustainable development goals; 2018 project; final report 30 June 2021



This article forms part of Eurostat’s statistical report on the Integration of statistical and geospatial information.


Problem

Statistics Denmark aims to develop statistics for 10 specific indicators on sustainable development goals (SDGs) that may be augmented with geospatial data. The project seeks to explore different methods for producing data on subnational administrative units, both for existing grid systems and for user-defined areas, as well as geospatial statistical methods for modelling underlying point data in continuous space.

Objectives

  • Develop a workplace match that links addresses from the statistical business register to actual properties (for example, from the buildings register, the dwellings register and the cadastral register).
  • Examine which statistical model can be used to make the best predictions for including additional buildings/units as part of the workplace.
  • Implement a generic model for balancing open data and statistical disclosure control (including software developments).
  • Develop interactive tools for visualising and disseminating geospatial statistics using open source software (R), including:
    • point data aggregated to small administrative units (church parishes, municipalities);
    • local estimates using generalised additive models (GAM);
    • local estimates using kernel smoothing.
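
The difference between aggregating point data to fixed areas and estimating a local value by kernel smoothing can be illustrated with a minimal sketch. It is written in Python rather than R, uses a Gaussian kernel, and relies on hypothetical coordinates, cell size and bandwidth; it is not the implementation used by Statistics Denmark.

```python
import math
from collections import Counter

def aggregate_to_grid(points, cell_size):
    """Count points per square grid cell, keyed by (column, row) index."""
    counts = Counter()
    for x, y in points:
        counts[(math.floor(x / cell_size), math.floor(y / cell_size))] += 1
    return counts

def kernel_density(points, x, y, bandwidth):
    """Gaussian kernel density estimate at location (x, y)."""
    if not points:
        return 0.0
    # Normalising constant of a two-dimensional Gaussian kernel
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    total = 0.0
    for px, py in points:
        squared_distance = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-squared_distance / (2.0 * bandwidth ** 2))
    return norm * total

# Hypothetical point coordinates in metres (illustration only)
points = [(120.0, 80.0), (130.0, 95.0), (140.0, 90.0), (900.0, 850.0)]

# Aggregation: each point is assigned to exactly one 100 m grid cell
grid_counts = aggregate_to_grid(points, cell_size=100.0)

# Smoothing: every location gets an estimate weighted by distance to all points
near_cluster = kernel_density(points, 130.0, 90.0, bandwidth=50.0)
far_away = kernel_density(points, 500.0, 500.0, bandwidth=50.0)
```

Aggregation returns a value only for predefined cells, whereas the kernel estimate can be evaluated at any location and declines smoothly with distance from the observed points. The bandwidth controls how strongly the estimate is smoothed.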

Method

During 2019, Statistics Denmark developed a workplace match, whereby addresses from the business register were uniquely linked to property. In other words, geospatial data from the buildings register, the dwellings register and the cadastral register were linked to local units from the statistical business register. This process allows workplaces to be linked, in an easy and consistent way, to a range of information such as the property owner, the surface area (in square metres), a building’s purpose, or its land use.

At the end of 2020, the original workplace match exercise had linked nearly all workplaces (99 %) to the cadastral register, some 75 % to the buildings register and 69 % to the dwellings register, while a small share (1.3 %) of workplaces had no match. Statistics Denmark then optimised the address-based process, focusing resources on those workplaces where the current address could only be connected to the cadastral register (approximately 30 % of all cases), with the goal of also linking these workplaces to buildings and dwellings. To do so, the addresses were divided into four categories. For each category, a decision tree was used to determine whether a real property could be associated with the address or whether further decisions were needed to obtain a match. As a consequence, the share of workplaces linked to a real property increased by 0.9 percentage points for the cadastral register, 5.9 points for the buildings register, and 9.7 points for the dwellings register.
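
As a rough worked example, the reported gains can be added to the shares quoted above. This assumes that the gains in percentage points apply to those same baseline shares and to the same population of workplaces, which the report does not state explicitly. The baseline shares are rounded in the source, so the results are approximate.

```python
# Shares (in %) of workplaces linked to each register, as quoted in the text
baseline = {"cadastral": 99.0, "buildings": 75.0, "dwellings": 69.0}

# Reported gains in percentage points after optimising the address match
gain_pp = {"cadastral": 0.9, "buildings": 5.9, "dwellings": 9.7}

# Assumption: gains are measured against the baseline shares above.
# A gain in percentage points is added directly to the share.
updated = {
    register: round(baseline[register] + gain_pp[register], 1)
    for register in baseline
}
```

Under this assumption, the shares after optimisation would be roughly 99.9 % for the cadastral register, 80.9 % for the buildings register and 78.7 % for the dwellings register.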

Statistical methods for producing geospatial statistics were developed using open source statistical software (R). Figure 1 shows an image of the parish clusters used in the analyses.

Figure 1: Parish clusters used in the analyses performed by Statistics Denmark

As part of the work on revising statistical disclosure control processes, Statistics Denmark arranged a round table discussion with Statistics Austria. The latter explained how statistical disclosure control was managed in their office and provided details of the challenges faced, as well as information on support that had been provided by the Federal Statistical Office of Germany. Following internal discussions within Statistics Denmark, the policy for statistical disclosure control was revised. It aims to balance three objectives: the informational value of the statistical product; the simplicity of statistical disclosure control; and confidentiality. The revised policy was published in September 2020 and is available (in Danish and English) on the Statistics Denmark website.

Following the release of the new policy on statistical disclosure control, Statistics Denmark tested three statistical tools (simple SAS macros, sdcTable, and TauArgus) in relation to their functionality, usability, and efficiency. A decision was taken to use TauArgus. In a test environment, TauArgus was integrated into the national metadata system (Colectica) to help efficiently define hierarchical classifications.
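
To illustrate the kind of problem these tools address, the following minimal sketch applies a simple threshold rule to a frequency table. The area names, counts and threshold are hypothetical, and the function is neither the TauArgus interface nor the rule set out in Statistics Denmark's policy.

```python
def suppress_small_cells(table, threshold):
    """Return a copy of the table with counts below the threshold set to None."""
    return {
        area: (count if count >= threshold else None)
        for area, count in table.items()
    }

# Hypothetical counts per area (illustration only)
counts = {"Parish A": 120, "Parish B": 2, "Parish C": 45}

# Primary suppression: hide cells with fewer than 5 observations
published = suppress_small_cells(counts, threshold=5)

# If the total is also published, a single suppressed cell can be
# recovered by subtracting the visible cells from the total.
total = sum(counts.values())
recovered = total - sum(v for v in published.values() if v is not None)
```

In this sketch the suppressed value for Parish B can be recovered from the published total, which is why hiding small cells alone is not sufficient in tables with totals or hierarchical classifications. Dedicated disclosure control tools also deal with this kind of indirect disclosure.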

The final task for this project concerned the development of a set of interactive tools for visualising and disseminating geospatial statistics. This was also carried out using the open source statistical software (R), with the results incorporated into an interactive web-based interface using the R-package Shiny.

Subnational data on SDG indicators were processed using the statistical methods developed for producing geospatial statistics. Data were processed for indicators covering SDG 1 (no poverty), 3 (good health and well-being), 4 (quality education), 5 (gender equality), 8 (decent work and economic growth) and 10 (reduced inequalities). Public access to the datasets was not granted for two reasons:

  • only a limited number of indicators were available at the time of completing the project;
  • public release was conditional on the implementation of a disclosure policy for geospatial data, which was pending.

Results

The interface produced by Statistics Denmark allows users to choose between different statistical methods and to set various graphical options (such as the colour scale used in the output files). As part of the final report, an example was provided for subnational indicators covering two of the SDGs (a poverty indicator and mortality among children aged less than 5 years). Figure 2 shows examples for the first of these, using different statistical methods and geospatial areas.

A visual composed of three example maps showing geospatial data for the density of poverty (a sustainable development indicator) for: a) parish clusters; b) square grids; and c) local estimates using generalised additive models.
Figure 2: An example for visualising geospatial data for SDGs in Denmark
