Statistics Explained

Merging statistics and geospatial information, 2017 projects - Finland


Integration of geographies and areal classifications as linked open data (IGALOD); 2017 project; final report 19 December 2019



This article forms part of Eurostat’s statistical report on the Integration of statistical and geospatial information.


Problem

Classification is a well-established area within Statistics Finland, with a long history of centralised and coordinated governance. Around 2015, the classification system was renewed on the basis of the generic statistical information model (GSIM), and the renewed system is widely used within Statistics Finland. Web-based [1] linked data, however, were not widely used within the organisation.

The National Land Survey of Finland (Maanmittauslaitos) is the official provider of administrative municipality-based areal divisions. Statistics Finland uses custom-made versions of these data in its own geospatial data production processes. Both organisations create and distribute their own municipality-based geospatial datasets separately.

Objectives

The goal was to make linked data usable in internal production processes within Statistics Finland, and to publish the same data as open data and through a map application.

Method

Statistics Finland and the National Land Survey of Finland worked together to integrate statistical and geospatial data with linked data methods and to tackle the issue of separate and overlapping work.

  • Statistics Finland produces and maintains various areal classifications, including, for example, postcode areas, municipalities, regional state administrative agency classifications and hospital districts. The municipality is a commonly used areal unit in statistical production and is also the basic unit of areal classifications. Statistics Finland maintains mappings between the municipality classification and several other areal classifications.
  • Annually, the National Land Survey of Finland produces various datasets of municipality geometries (polygons of areas and lines of administrative borders) from the national cadastre, and it is also responsible for a topographical database.

To develop linked data, the project needed uniform resource identifiers (URIs) for the data, and the data needed to be expressed in the resource description framework (RDF) and linked to other data on the web. Ontologies were needed to describe the integration of areal classifications and geographical data.

Identifiers were created for classifications based on the central master classification system. The National Land Survey of Finland already had identifiers for the dataset of administrative units, but historical versions of these had not been published. Within the project, identifiers were created for the datasets of each individual year, thereby solving issues such as the impact of municipal mergers on time series. In addition, identifiers were created for the various representations of the administrative units: one for each scale and for each way in which the filter for sea areas was applied.
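The idea of one identifier per year, scale and sea-area filter can be sketched as follows. This is a minimal illustration only: the base URI, the path layout and the scale labels are assumptions made for the example, not the scheme actually used by the project.

```python
from dataclasses import dataclass

# Hypothetical base URI; the project's real URI scheme is not given in the text.
BASE_URI = "http://example.org/id/admin-unit"


@dataclass(frozen=True)
class MunicipalityVersion:
    """One representation of a municipality for one reference year."""
    code: str          # municipality code, e.g. "091"
    year: int          # reference year of the areal division
    scale: str         # illustrative scale label, e.g. "1000k"
    include_sea: bool  # whether sea areas are part of the geometry

    def uri(self) -> str:
        # Assumed path layout: year / scale / sea filter / municipality code.
        sea = "with-sea" if self.include_sea else "without-sea"
        return f"{BASE_URI}/{self.year}/{self.scale}/{sea}/{self.code}"


v2016 = MunicipalityVersion("091", 2016, "1000k", include_sea=True)
v2017 = MunicipalityVersion("091", 2017, "1000k", include_sea=True)
print(v2016.uri())
print(v2017.uri())
```

Because the year is part of the identifier, the same municipality code in two different years resolves to two distinct resources, which is what allows a merger to be represented without rewriting earlier data.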

More generally, the project planned an identifier service for managing unique identifiers. This was not limited to identifiers for classifications and administrative units (geographies), but was intended to cover any data object. A user interface was developed for the identifier system.

A diagram showing the logical data model for an identifier system in Finland.
Figure 1: The logical data model for an identifier system
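As a rough illustration of what an identifier service for arbitrary data objects does, the sketch below mints a stable identifier the first time an object is registered and returns the same identifier on later requests. The class name, the method name and the base URI are hypothetical, and the sketch does not attempt to reproduce the logical data model in Figure 1.

```python
import uuid


class IdentifierRegistry:
    """In-memory stand-in for an identifier service (illustrative only)."""

    def __init__(self, base_uri: str) -> None:
        self._base_uri = base_uri.rstrip("/")
        self._by_key: dict[tuple[str, str], str] = {}

    def mint(self, object_type: str, local_key: str) -> str:
        """Return the identifier for an object, creating it on first use."""
        key = (object_type, local_key)
        if key not in self._by_key:
            # A random UUID keeps the identifier opaque and unique.
            self._by_key[key] = f"{self._base_uri}/{object_type}/{uuid.uuid4()}"
        return self._by_key[key]


registry = IdentifierRegistry("http://example.org/id")  # hypothetical base URI
first = registry.mint("classification", "municipality-2017")
again = registry.mint("classification", "municipality-2017")
other = registry.mint("admin-unit", "municipality-2017")
```

Here `first` and `again` are the same string, while `other` differs because it belongs to a different object type.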

The RDF data were serialised using the Turtle (Terse RDF Triple Language) syntax.
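To give an impression of what Turtle looks like, the sketch below writes a single municipality as a SKOS concept. The `ex:` namespace and the helper function are assumptions made for the example; only the SKOS namespace is a real, published vocabulary.

```python
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    # Hypothetical namespace for municipality concepts of one year.
    "ex": "http://example.org/id/municipality/2017/",
}


def to_turtle(subject: str, predicate_objects: list[tuple[str, str]]) -> str:
    """Format one subject and its predicate-object pairs as Turtle text."""
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items()]
    lines.append("")
    # Turtle uses ';' to repeat the subject and '.' to end the statement.
    body = " ;\n    ".join(f"{pred} {obj}" for pred, obj in predicate_objects)
    lines.append(f"{subject} {body} .")
    return "\n".join(lines)


doc = to_turtle(
    "ex:091",
    [
        ("a", "skos:Concept"),
        ("skos:notation", '"091"'),
        ("skos:prefLabel", '"Helsinki"@fi'),
    ],
)
print(doc)
```

In practice an RDF library would normally handle serialisation and escaping; the string formatting here is only meant to make the syntax visible.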

Ontologies were developed for:

  • classifications, based on the XKOS (extended knowledge organization system) statistical extension of SKOS (simple knowledge organization system), and
  • administrative units (geographies), as a custom ontology based on existing vocabularies (such as INSPIRE AU (administrative units), Dublin Core and GeoSPARQL) as well as a couple of specific custom classes needed for this project.

These ontologies were also linked to the general Finnish ontology (YSO). The two ontologies were united (joined together) on an annual basis through the municipality codes for a particular year.
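The annual join through municipality codes can be pictured as matching two lookups on the pair (year, code). The sketch below assumes that each side can be reduced to such a lookup; all URIs are hypothetical placeholders.

```python
# (year, municipality code) -> classification concept URI (hypothetical URIs)
concepts = {
    (2016, "091"): "http://example.org/classification/municipality/2016/091",
    (2017, "091"): "http://example.org/classification/municipality/2017/091",
}

# (year, municipality code) -> administrative unit URI (hypothetical URIs)
units = {
    (2017, "091"): "http://example.org/id/admin-unit/2017/091",
}


def link_by_year_and_code(
    concepts: dict[tuple[int, str], str],
    units: dict[tuple[int, str], str],
) -> dict[tuple[int, str], tuple[str, str]]:
    """Pair concept and unit URIs that share the same year and code."""
    shared = concepts.keys() & units.keys()
    return {key: (concepts[key], units[key]) for key in shared}


links = link_by_year_and_code(concepts, units)
```

Only the 2017 entry is linked here, because the 2016 concept has no geometry of the same year on the other side. Keying on the year as well as the code prevents a code from being matched to a boundary from a different year.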

A web application was developed in order to pilot the use of the linked open data (LOD).

Results

Both organisations published their datasets in RDF. Areal classifications and geometries of the administrative units can be combined using federated SPARQL queries. The SPARQL endpoints of the two organisations can be used to fetch the combined data in RDF format. The data can be downloaded or visualised in the ALLUsion service.

A diagram showing the pilot implementation of a solution for the integration of geographies and areal classifications as linked open data (IGALOD) in Finland.
Figure 2: IGALOD solution pilot implementation
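A federated query of the kind described above might look like the following sketch, which only builds the query text and the request URL and does not contact any server. Both endpoint addresses and the `ex:` properties are hypothetical; the SKOS and GeoSPARQL terms are real vocabulary terms, but whether the published data use exactly these properties is an assumption.

```python
from urllib.parse import urlencode

# Hypothetical endpoint addresses, one per organisation.
CLASSIFICATION_ENDPOINT = "http://example.org/classifications/sparql"
GEOMETRY_ENDPOINT = "http://example.org/geometries/sparql"


def build_federated_query(year: int) -> str:
    """Join classification concepts with remote geometries by code and year."""
    return f"""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX ex: <http://example.org/def/>

SELECT ?code ?label ?wkt WHERE {{
  ?concept skos:notation ?code ;
           skos:prefLabel ?label ;
           ex:year {year} .
  SERVICE <{GEOMETRY_ENDPOINT}> {{
    ?unit ex:municipalityCode ?code ;
          ex:year {year} ;
          geo:hasGeometry/geo:asWKT ?wkt .
  }}
}}
""".strip()


def request_url(year: int) -> str:
    """Build a SPARQL protocol GET URL carrying the query text."""
    params = urlencode({"query": build_federated_query(year)})
    return f"{CLASSIFICATION_ENDPOINT}?{params}"


query = build_federated_query(2017)
print(query)
```

The `SERVICE` clause is what makes the query federated: the first endpoint evaluates the classification pattern itself and delegates the geometry pattern to the second endpoint, joining the two on `?code`.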

The ALLUsion web application makes it possible for users to browse and select administrative units and, if needed, create areas from multiple municipalities. Subsequently, statistics can be queried for the selection: users can select the year, the indicator (population, number of families, number of household-dwelling units, employed labour force resident in the area, number of workplaces in the area), the scale and whether or not to include the sea. The results are presented as a map or a table and metadata is included, for example information about NUTS codes.

A screenshot image of the ALLUsion web application, as developed by Statistics Finland.
Figure 3: The ALLUsion web application
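Querying an indicator for a user-defined area made up of several municipalities amounts, in the simplest case, to summing the municipal values. The sketch below assumes an additive indicator such as population; the codes and figures are placeholders, not real statistics, and the function is not taken from the ALLUsion application.

```python
def aggregate(indicator_values: dict[str, int], selected_codes: list[str]) -> int:
    """Sum an additive indicator over the selected municipalities."""
    missing = [code for code in selected_codes if code not in indicator_values]
    if missing:
        # Fail loudly rather than silently under-counting the area.
        raise KeyError(f"no value for municipality codes: {missing}")
    return sum(indicator_values[code] for code in selected_codes)


# Placeholder codes and values for one year and one indicator.
population = {"A": 100, "B": 200, "C": 300}
custom_area_total = aggregate(population, ["A", "C"])
```

Summation is only appropriate for counts; ratios or averages would need to be recomputed from their underlying components.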


Notes

  1. Tim Berners-Lee outlined four principles for linked data: use uniform resource identifiers (URIs) as names for things; use HTTP URIs so that people can look up those names; when someone looks up a URI, provide useful information, using the standards (RDF, SPARQL); and include links to other URIs, so that they can discover more things.