Statistics Explained

Merging statistics and geospatial information, 2017 projects - Poland


Defining new areas for statistics along with analysis of administrative units’ boundary changes’ impact on published indicators; 2017 project; final report 20 December 2019

This article forms part of Eurostat’s statistical report on the Integration of statistical and geospatial information.

Problem

The project contained three actions related to the following needs.

  1. Development of guidelines for an information system presenting the character, scope and impact of changes in the territorial division of the country on official statistics published in the form of basic indicators at regional and local levels.
  2. Development of a statistical division framework for official statistics with respect to the geodetic division of the country and needs related to survey sampling.
  3. Development of a framework for publishing statistical data for urban areas in statistical grids with a level of detail higher than 1 km x 1 km with respect to statistical confidentiality.

Objectives

Action 1

  1. Analysis of changes in the territorial division of the country introduced annually in the National Official Register of the Territorial Division of the Country (TERYT); analysis of the impact on publishing statistical surveys’ results for time series.
  2. Estimation of the impact of the changes of territorial divisions on basic indicators published at regional and local levels.
  3. Development of guidelines for a web application presenting changes in the territorial division units in Poland.

Action 2

  1. To examine the possibility of using the boundaries of cadastral units as units for public statistics.
  2. To compare statistical and geodetic divisions.
  3. To analyse the size of statistical units in terms of their use in sampling for surveys.

Action 3

To develop the production and visualisation of grid-based data of various sizes from point data.

Method

Action 1

The first step was an analysis of the territorial identification system and the principles for processing changes over the period 2009 to 2019, covering:

  • reports comparing the lists of identifiers and names of the units of territorial division,
  • information on changes in the territorial division of the country,
  • the identification of changes recorded in data in the TERYT register,
  • principles for registering changes in the register’s information technology (IT) system.

Note: the TERYT register is the National Official Register of the Territorial Division of the Country and spans the following systems: identifiers and names of units of territorial divisions (TERC), identifiers and names of localities (SIMC), statistical regions and census enumeration areas (BREC), and identification of addresses: streets, real estate, buildings and dwellings (NOBC) – including the central street catalogue (ULIC).

The analysis identified:

  • changes ensuing from the change of unit type (gminas, cities with powiat rights – urban gminas which are governed by a city mayor) affecting the unit’s territorial code;
  • changes of boundaries resulting in relocation of particular areas together with their population, both affecting and not affecting the unit’s territorial code;
  • changes ensuing from relocating whole units entirely into a new unit at a higher level (such as moving a gmina (municipality) into another powiat (county or district));
  • other (based on the detailed analysis).
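
The change types listed above can be illustrated with a minimal sketch that classifies changes by comparing old and new territorial codes. It assumes a 7-character gmina code laid out as voivodship (2) + powiat (2) + gmina (2) + unit type (1), and a hypothetical crosswalk of (old code, new code) pairs; the codes shown are made up for illustration and the function is not part of the project's system. Boundary changes that leave the code untouched cannot be detected from codes alone and would need population or geometry data.

```python
from collections import Counter


def classify_change(old_code: str, new_code: str) -> str:
    """Classify the change between two 7-character territorial codes.

    Assumed layout: voivodship (2) + powiat (2) + gmina (2) + unit type (1).
    """
    if old_code == new_code:
        # Includes boundary changes that do not affect the code.
        return "code unchanged"
    if old_code[:6] == new_code[:6]:
        # Only the last character (unit type) differs.
        return "unit type changed"
    if old_code[:2] == new_code[:2] and old_code[2:4] != new_code[2:4]:
        # Same voivodship, different powiat part of the code.
        return "moved to another powiat"
    return "other"


# Hypothetical crosswalk between two reference years (illustrative codes only).
crosswalk = [
    ("0601011", "0601011"),
    ("0601022", "0601023"),
    ("0602032", "0605012"),
    ("0603042", "1403042"),
]

summary = Counter(classify_change(old, new) for old, new in crosswalk)
for change_type, count in summary.items():
    print(change_type, count)
```

A summary of this kind could feed the quantification of changes by type and year; cases classified as "other" would still need the detailed manual analysis mentioned above.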

At the same time, the system for territorial coding used in four databases (Local Data Bank, STRATEG, Demografia and selected Knowledge Databases) and the method of informing users about potential changes in the territorial division units were analysed.

Action 2

Records of lands and buildings cover the entire land territory of Poland. The units of division for the purposes of the records are a registration unit, a cadastral unit and a cadastral plot. The registration unit is divided into cadastral units. The TERYT register (as noted above) is the official register of the territorial division used for statistical purposes.

A 10-level model was developed to examine the possibility of using the boundaries of cadastral units (within the geodetic system) as units for public statistics, in order to improve the integration of statistical and geospatial data. The first five levels of the model are common to the geodetic system and the statistical system.

Figure 1: Proposal for the harmonisation of the geodetic and statistical divisions in order to improve the integration of statistical and geospatial data

An analysis of the numbers of people and dwellings was conducted for Lubelskie Voivodship and the capital city of Warsaw. This was done for

  • statistical division units
    • statistical regions
    • census enumeration areas
  • cadastral units.

In the first stage of sampling for surveys, the sampling units are census enumeration areas or statistical regions. The number of dwellings in these types of statistical units was analysed in order to evaluate the distribution of the units by size. To ensure that each unit drawn in the first stage of sampling contained an appropriate minimum number of dwellings, units that were too small were combined. Units that were too large were split using standard cluster analysis algorithms applied to the xy coordinates of addresses.
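
As an illustration of splitting an oversized unit by clustering coordinates, the following is a minimal sketch of a basic k-means loop in plain Python. The report does not name the algorithm that was used, so k-means, the function name `split_unit`, the deterministic starting centres and the sample coordinates are all assumptions made for this example.

```python
import math


def split_unit(points, k=2, iterations=20):
    """Split the address points of one unit into k groups (basic k-means).

    points: list of (x, y) tuples in a projected coordinate system.
    Returns a list of group labels (0 .. k-1), one per point.
    """
    # Deterministic start: pick k points spread through the sorted list.
    ordered = sorted(points)
    step = len(ordered) / k
    centres = [ordered[int(i * step + step / 2)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centre.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        # Update step: move each centre to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centres.append(centres[c])  # keep an empty group's centre
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return labels


# Illustrative coordinates: two clearly separated groups of addresses.
addresses = [(0, 0), (1, 0), (0, 1), (100, 100), (101, 100), (100, 101)]
print(split_unit(addresses, k=2))
```

In practice each point would also carry a dwelling count, and the number of groups would be chosen so that every resulting unit meets the minimum and maximum size required for sampling.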

Two further analyses looked at distances within census enumeration areas and within statistical regions. This started with a calculation of the distance between each address and the centre of each area. From this, indicators of average and maximum distances were compiled for each area. These were analysed to identify unusual cases for further investigation.
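
The distance indicators can be sketched as follows. This is a minimal example that assumes the centre of an area is the mean of its address coordinates (the report does not specify how the centre was defined) and uses made-up coordinates in metres; the function name `distance_indicators` is hypothetical.

```python
import math


def distance_indicators(addresses_by_area):
    """Average and maximum distance from each address to its area's centre.

    addresses_by_area: dict mapping an area id to a list of (x, y) tuples.
    The centre is assumed to be the mean of the address coordinates.
    """
    indicators = {}
    for area_id, points in addresses_by_area.items():
        cx = sum(x for x, _ in points) / len(points)
        cy = sum(y for _, y in points) / len(points)
        distances = [math.hypot(x - cx, y - cy) for x, y in points]
        indicators[area_id] = {
            "mean": sum(distances) / len(distances),
            "max": max(distances),
        }
    return indicators


# Illustrative data: area B has one address far away from the others.
sample = {
    "A": [(0, 0), (6, 0), (0, 8), (6, 8)],
    "B": [(0, 0), (0, 0), (0, 0), (400, 0)],
}
result = distance_indicators(sample)
for area_id, values in result.items():
    print(area_id, values)
```

Areas whose maximum distance is large relative to their average, such as area B here, are the kind of unusual case that would be flagged for further investigation.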

For the analysis of geodetic divisions, spatial data on cadastral units were obtained and assigned to census enumeration areas, and a database of cadastral codes was combined with the addresses of dwellings used by Statistics Poland as the sampling frame for social surveys. A statistical analysis was then made of statistical and geodetic division units, for example calculating the number of census enumeration areas or statistical regions within cadastral units.
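
Counting statistical units within cadastral units can be sketched as below. The example assumes that each dwelling address record already carries a cadastral unit identifier and a census enumeration area identifier; the record layout, the identifiers and the function name are hypothetical.

```python
import statistics
from collections import defaultdict


def areas_per_cadastral_unit(records):
    """Count distinct census enumeration areas within each cadastral unit.

    records: iterable of (cadastral_unit_id, enumeration_area_id) pairs,
    one pair per dwelling address.
    """
    areas = defaultdict(set)
    for cadastral_id, area_id in records:
        areas[cadastral_id].add(area_id)
    return {cadastral_id: len(ids) for cadastral_id, ids in areas.items()}


# Illustrative address records (identifiers are made up).
records = [
    ("C1", "E01"), ("C1", "E01"),
    ("C2", "E02"),
    ("C3", "E03"),
    ("C4", "E04"), ("C4", "E05"), ("C4", "E06"), ("C4", "E07"), ("C4", "E08"),
]
counts = areas_per_cadastral_unit(records)
print(counts)
print("mean:", statistics.mean(counts.values()))
print("median:", statistics.median(counts.values()))
```

A mean above the median, as in this toy data, is the pattern expected when a few large cadastral units contain many enumeration areas while most contain only one or two.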

Action 3

Five urban areas were selected: the Warsaw metropolitan area, the metropolitan association of Upper Silesia and Dąbrowa Basin, Zielona Góra, Słupsk, and Giżycko.

Figure 2: Urban areas selected for the project

A Python script was created to perform automated aggregation of point-based statistical data to grids of various sizes.

  • The script operates on an ArcGIS file geodatabase.
  • Grid cell size is a customisable parameter defined in the script in metres; grids with side lengths of 100, 250, 500 and 1 000 m were used.
  • The default coordinate reference system is the ETRS89 Lambert Azimuthal Equal-Area projection, though it is also a customisable parameter in the script.
  • Coordinates of five bounding boxes (one for each area) have been defined.
  • The script creates a grid for each defined bounding box in turn and then merges the grids into a single polygon layer:
    • a temporary layer with points corresponding to the centre of each grid cell is created for data aggregation purposes,
    • each point in this temporary layer gets the coordinates of the lower left corner of the grid cell as attributes,
    • each point in the temporary layer is then joined to the polygon grid layer in order to transfer the coordinate attributes,
    • the polygon grid layer is then clipped to the polygon layer of the five chosen areas.
  • The first step of data aggregation is performed as a spatial join of point-based statistical data (persons) to the polygon grid cells; each statistical data record receives the coordinates of its grid cell.
  • Total population and population by sex (as well as the share of women) are calculated and saved in geodatabase tables which are then joined to the polygon grid layer.
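
The aggregation logic described above can be illustrated without GIS software. The following is a minimal sketch in plain Python, not the project's ArcGIS-based script: it assumes projected coordinates in metres, a grid aligned to the origin of the coordinate reference system, and person records given as (x, y, sex) tuples. The function names and sample data are hypothetical.

```python
import math
from collections import defaultdict


def cell_origin(x, y, cell_size):
    """Lower-left corner of the grid cell that contains the point (x, y)."""
    return (
        math.floor(x / cell_size) * cell_size,
        math.floor(y / cell_size) * cell_size,
    )


def aggregate(persons, cell_size):
    """Aggregate person points to grid cells.

    persons: iterable of (x, y, sex) with sex given as "F" or "M".
    Returns a dict keyed by the lower-left corner of each occupied cell.
    """
    cells = defaultdict(lambda: {"total": 0, "women": 0, "men": 0})
    for x, y, sex in persons:
        cell = cells[cell_origin(x, y, cell_size)]
        cell["total"] += 1
        cell["women" if sex == "F" else "men"] += 1
    for cell in cells.values():
        cell["share_women"] = cell["women"] / cell["total"]
    return dict(cells)


# Illustrative person records (coordinates in metres).
persons = [(120, 80, "F"), (180, 40, "M"), (199, 99, "F"), (250, 10, "M")]

for size in (100, 250):
    print(size, aggregate(persons, size))
```

Keying each cell by its lower-left corner mirrors the coordinate attributes that the script transfers to the polygon grid layer, so the same records can be re-aggregated to any cell size by changing one parameter. The sketch omits clipping to the selected areas and any confidentiality treatment.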

Results

Action 1

A system was developed to supplement the functionalities offered by the databases of Statistics Poland with respect to their spatial scope. It is possible to quantify a range of changes in the classification and to visualise the impact of these changes on various datasets. The implementation of the system will support analysts in the correct interpretation of differences for key indicators in the observed territorial units. It can also be a repository/archive for different iterations of territorial divisions.

The recommendations include guidelines for:

  • a model for descriptive data (metadata) – including a download functionality for users,
  • a system for graphic visualisation (tables, figures and maps) of changes along with a presentation of the impact for basic indicators published by Statistics Poland,
  • the units within official statistics that could be responsible for the implementation and maintenance of the system,
  • administrators regarding implementation of the system in Statistics Poland’s databases.

The uniformity of the coding system is based on the Coding System for Territorial and Statistical Units (KTS) which ensures the coherence of spatial systems and the way of identifying territorial units in all statistical databases.

Action 2

The following tasks have been completed:

  • analysis of the size and spatial diversity of statistical units in terms of population limits as a criterion for delimitation of statistical regions and census enumeration areas;
  • a comparative analysis of statistical and geodetic division units;
  • analysis of the size of statistical units in terms of their application at the first stage of sampling for representative surveys.

The results show that:

  • the distribution of the number of dwellings in cadastral units is asymmetrical (right-skewed); there are cadastral units containing only one address and very large ones containing tens of thousands of dwellings;
  • the average number of dwellings in a cadastral unit is 276; however, for half of the cadastral units this number is less than 71;
  • the distribution of the number of census enumeration areas within the cadastral units is also right-skewed; in 75 % of cases, cadastral units contain one or two census enumeration areas, while the average number is four areas;
  • the average number of census enumeration areas in cadastral units ranges from 2 (in villages) to 24 (in cities with 0.5 to 1.0 million inhabitants) among the six size classes used in Poland, and from 2 (Lower Silesian) to 10 (Silesian) among the voivodships;
  • the distribution of the number of statistical regions in cadastral units is right-skewed; in 75 % of cases, a cadastral unit contains one region, while the average number is two regions;
  • the average number of regions in cadastral units ranges from one (in villages) to six (in cities with 0.5 to 1.0 million inhabitants) among the six size classes used in Poland, while the average number of regions in cadastral units is two in nearly all voivodships (an average of four in Silesian is the only exception).

Figure 3: The boundaries of statistical regions within cadastral units for Warsaw

Using cadastral units as a spatial division for official statistics is not advisable in urban areas, as the number of dwellings and persons within most cadastral units is too high. In rural areas, cadastral units could be used. Nevertheless, issues arise if a rural area is transformed into a city or an urban area, as this may interrupt time series.

Action 3

The following tasks have been performed:

  • a selection was made of areas for the pilot aggregation, ensuring diversity of those areas (big city, agglomeration, metropolitan area);
  • IT tools were built for automated aggregation of point-based statistical data to grids of various resolutions with respect to primary statistical confidentiality (work was ongoing at the time of the final report);
  • visualisations (in the form of maps) were produced for each region for a variety of grid sizes.

Figure 4: Total population in 100 m x 100 m grid – average value method (average value: 42)

Data that incorporate spatial location are gathered with reference to an address point. In the case of population census data, this concentrates all people living in a building at a single point. For apartment buildings in particular, this creates a false impression of the population distribution. At a detailed level (for example 100 m x 100 m grid cells), the population of the cell containing the address point of an apartment building may be too high while that of the surrounding cells is too low. The extent to which this occurs depends on the existence of large apartment blocks. The situation could be improved by assigning population data to a polygon of a building’s area rather than to a single point.
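
One way to picture the polygon-based alternative is an area-weighted allocation, sketched below. This is an illustration only and not a method described in the report: it assumes that a building footprint can be approximated by an axis-aligned rectangle and that residents are spread evenly over that footprint. The function name and the sample figures are hypothetical; real footprints would need a GIS library for polygon overlay.

```python
import math


def allocate_by_footprint(population, footprint, cell_size):
    """Spread a building's population over grid cells by overlap area.

    footprint: (xmin, ymin, xmax, ymax) of an axis-aligned rectangle, in metres.
    Returns a dict mapping each cell's lower-left corner to its population share.
    """
    xmin, ymin, xmax, ymax = footprint
    area = (xmax - xmin) * (ymax - ymin)
    shares = {}
    # Only cells that can intersect the footprint are visited.
    for col in range(math.floor(xmin / cell_size), math.ceil(xmax / cell_size)):
        for row in range(math.floor(ymin / cell_size), math.ceil(ymax / cell_size)):
            cx, cy = col * cell_size, row * cell_size
            width = min(xmax, cx + cell_size) - max(xmin, cx)
            height = min(ymax, cy + cell_size) - max(ymin, cy)
            if width > 0 and height > 0:
                shares[(cx, cy)] = population * width * height / area
    return shares


# Illustrative apartment block: 80 m x 20 m footprint straddling two 100 m cells.
shares = allocate_by_footprint(200, (80, 10, 160, 30), 100)
print(shares)
```

With a single address point, all 200 residents would fall into one cell; here they are divided between the two cells that the building covers. The shares are fractional, so a production approach would still need rounding and confidentiality rules.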
