Statistics Explained

Merging statistics and geospatial information, 2018 projects - Cyprus


Using geographic information in the 2021 population and housing census; 2018 project; final report June 2020



This article forms part of Eurostat’s statistical report on the Integration of statistical and geospatial information.


Problem

There is a need for up-to-date geographic information for use in the organisation and analysis of the 2021 population and housing census.

Objectives

  • To create a basis for the integration of geographic information and statistics in order to produce a database of the road network for use in the 2021 population and housing census.
  • To update the database of the digital enumeration blocks that will be used during the census.
  • To create a basis for a future geocoding infrastructure for statistical production processes based on address, building and dwelling registers.

Method

Updating the database of the road network

Three databases were used to update the digitised road network, which is held as a spatial database containing a unique identifier for each street.

  • The road network maintained by the Department of Lands and Surveys (DLS). This database includes the address of each street, its quarter, municipality/community and district.
  • The official database of all street names of the Republic of Cyprus, as provided by the Ministry of Interior (MOI). This includes the street name, quarter, municipality/community and unique code number of each street (labelled CILIS).
  • The road network of the Cyprus Statistical Service (CYSTAT), as created for the 2011 population and housing census. This includes a unique code for each street (labelled Streetcode).

The combination of the databases started with a cleaning exercise. The three initial databases had neither a common numeric key nor a common standard for addresses, so a standard for the presentation of addresses was chosen, based on the Postal Office address register. Matching was to be based on two fields, address name and quarter code, and efforts were made to include these coherently in all of the files. Addresses with unknown street names were removed.
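The cleaning step can be illustrated with a minimal Python sketch. It is not the procedure used in the project: the field names `street_name` and `quarter_code`, and the normalisation rules (upper case, no accents, no punctuation), are assumptions chosen only to show how a common two-field matching key might be built and how records without a street name could be dropped.

```python
import unicodedata


def normalise_street_name(name):
    """Return an upper-case, accent-free, single-spaced street name."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces, then collapse runs of whitespace.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in stripped)
    return " ".join(cleaned.upper().split())


def match_key(name, quarter_code):
    """Build the two-field key (address name, quarter code) used for matching."""
    return (normalise_street_name(name), quarter_code.strip())


def clean_records(records):
    """Drop records with an unknown street name and attach a matching key."""
    cleaned = []
    for rec in records:
        name = (rec.get("street_name") or "").strip()
        if not name:
            continue  # unknown street names are removed
        cleaned.append({**rec, "key": match_key(name, rec["quarter_code"])})
    return cleaned
```

Applying the same normalisation to every file means that records from different sources can be compared on identical keys.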

After cleaning, the three datasets were matched using different techniques.

  • Matching of the CYSTAT and MOI files using deterministic matching techniques. Around three quarters of the records from the CYSTAT file were matched with records from the MOI file.
  • Matching of the CYSTAT and MOI files using probabilistic matching techniques based on the COMPGED and the SPEDIS functions in SAS. Various thresholds were created to balance high numbers of matches against low numbers of incorrect cases. This increased the share of matched records to around 82 %.
  • Manual matching of the CYSTAT and MOI files. For the remaining addresses, a manual approach led to only a few additions to the matched dataset.
  • Matching of the CYSTAT and DLS files using automated geographic matching techniques. All the streets in Cyprus from each of the two files were distributed into separate shapefiles by district to make the matching process more manageable. Each record in the CYSTAT file was assigned to a specific community. The CYSTAT file was divided into nine layers according to district and urban/rural area. In the DLS file, records with incomplete district information were completed using ArcGIS, in some cases with a detailed manual check. The DLS file was divided into five layers according to district. A model was built and implemented on each of the CYSTAT shapefiles (see Figure 1): the road information in the CYSTAT files was aligned with the DLS file, repaired for geometry problems, simplified by aggregating all street vectors by street code, buffered and finally joined to the DLS dataset.
  • Matching of the CYSTAT and DLS files using manual geographic matching techniques. Unmatched records were manually matched using ArcGIS and applying a hierarchical set of rules.
  • Matching of the CYSTAT, MOI and DLS files using SAS software. Around 25 000 records in the DLS file were not in the CYSTAT file; their street name, municipality/community, quarter and post code were identified. All the new street codes from this identification process, along with the existing street codes from the CYSTAT file, were matched with the DLS file. Finally, the CYSTAT-DLS matched dataset was matched with the CYSTAT-MOI matched dataset (using the CYSTAT street code as the key variable) and imported into the shapefile of the DLS roads, thus forming a final geographic layer containing all the information from the DLS, CYSTAT and MOI files.
Figure 1: Model built for the automated geographic matching of streets between CYSTAT and DLS files
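The first two matching steps for the CYSTAT and MOI files (deterministic, then probabilistic) can be sketched in Python as follows. This is an illustration only. The project used the COMPGED and SPEDIS functions in SAS; here the standard-library `difflib.SequenceMatcher` similarity ratio is swapped in as a stand-in, and the threshold of 0.85, the function names and the input layout are all assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1]; a stand-in for the SAS COMPGED/SPEDIS scores."""
    return SequenceMatcher(None, a, b).ratio()


def match_streets(cystat, moi, threshold=0.85):
    """Match CYSTAT street codes to MOI codes.

    cystat: dict mapping Streetcode -> (cleaned street name, quarter code)
    moi:    dict mapping CILIS code -> (cleaned street name, quarter code)
    Returns (matches, unmatched); unmatched codes are left for manual review.
    """
    exact = {}
    by_quarter = {}
    for cilis, (name, quarter) in moi.items():
        exact.setdefault((name, quarter), cilis)
        by_quarter.setdefault(quarter, []).append((name, cilis))

    matches, unmatched = {}, []
    for streetcode, (name, quarter) in cystat.items():
        # Step 1: deterministic match on the exact (name, quarter) key.
        cilis = exact.get((name, quarter))
        if cilis is not None:
            matches[streetcode] = (cilis, "deterministic", 1.0)
            continue
        # Step 2: fuzzy match against streets in the same quarter only.
        best_score, best_cilis = 0.0, None
        for cand_name, cand_cilis in by_quarter.get(quarter, []):
            score = similarity(name, cand_name)
            if score > best_score:
                best_score, best_cilis = score, cand_cilis
        if best_cilis is not None and best_score >= threshold:
            matches[streetcode] = (best_cilis, "probabilistic", best_score)
        else:
            unmatched.append(streetcode)
    return matches, unmatched
```

Raising the threshold lowers the number of incorrect matches but leaves more records for manual matching, which is the trade-off described above.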

Updating the digitised enumeration blocks

During the 10-year period between 2011 and 2021, significant changes were observed in population trends due to the economic crisis of 2013. For this reason, CYSTAT needed new ways to update the enumeration blocks.

A shapefile of all the parcels in Cyprus was obtained from the DLS, along with an Excel file describing the contents of each parcel (for example, the number of properties and their description and type). The two were connected in most cases through the unique parcel code; where this code was missing, it was created so that the two files could be matched. The parcels in the shapefile were joined with the 2011 enumeration blocks using ArcGIS, and the result was matched with the Excel file. From the resulting matched files, all housing units could be selected and aggregated by enumeration block.
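The aggregation step can be sketched as follows, assuming the spatial join has already produced a lookup from parcel code to enumeration block. The function name, the input layout and the property type labels are hypothetical; the project itself carried out the join in ArcGIS.

```python
from collections import Counter


def housing_units_per_block(parcel_to_block, properties, housing_types):
    """Count housing units in each enumeration block.

    parcel_to_block: dict mapping parcel code -> enumeration block id
                     (assumed output of the parcel/block spatial join)
    properties:      iterable of (parcel code, property type) rows
                     (assumed layout of the parcel description file)
    housing_types:   set of property types counted as housing units
    """
    counts = Counter()
    for parcel_code, property_type in properties:
        block = parcel_to_block.get(parcel_code)
        # Skip parcels with no block and properties that are not housing.
        if block is None or property_type not in housing_types:
            continue
        counts[block] += 1
    return counts
```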

For the 2021 census, a threshold of 300 or fewer housing units was set for each enumeration block. After examining the list of enumeration blocks to be further divided according to the threshold, all enumeration blocks were checked using a) the most up-to-date satellite images available to CYSTAT and b) information obtained from the 2011 census, specifically information on occupied and unoccupied housing units and the total population. Based on this, some enumeration blocks over the threshold were not divided. A final list of 3 411 enumeration blocks was constructed for the 2021 census (compared with 3 157 used in 2011).
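As a rough illustration of the screening step, the sketch below lists the enumeration blocks that exceed the threshold of 300 housing units. The function names are hypothetical, and the suggested minimum number of parts is only an arithmetic lower bound. As described above, the actual decision on whether to divide a block also relied on satellite images and 2011 census information.

```python
import math

MAX_HOUSING_UNITS = 300  # threshold set for the 2021 census


def blocks_to_review(units_per_block, threshold=MAX_HOUSING_UNITS):
    """Return the ids of blocks whose housing unit count exceeds the threshold."""
    return sorted(b for b, n in units_per_block.items() if n > threshold)


def minimum_parts(housing_units, threshold=MAX_HOUSING_UNITS):
    """Smallest number of parts so that no part needs more than `threshold` units."""
    return max(1, math.ceil(housing_units / threshold))
```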

Results

The methodology for matching the road network databases was thoroughly examined and expertise was gained in applying it. The updated database of the road network includes matched data from the three different data sources. This will facilitate the production of 2021 census data for the 1 km x 1 km grid. Furthermore, the 2021 census database will be used as the starting point for a dynamic statistical register maintained by CYSTAT. The database on the road network will be continuously maintained and updated by CYSTAT and DLS.
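To show how geocoded records could feed a 1 km x 1 km grid, the following sketch assigns points to grid cells and sums the population per cell. It assumes that coordinates are already projected in a metric coordinate reference system; the function names and the input layout are hypothetical and do not describe the method used by CYSTAT.

```python
import math

CELL_SIZE_M = 1000  # 1 km x 1 km grid


def grid_cell(easting_m, northing_m, cell_size=CELL_SIZE_M):
    """Return the lower-left corner of the containing cell, in cell units."""
    return (math.floor(easting_m / cell_size), math.floor(northing_m / cell_size))


def population_by_cell(points):
    """Sum persons per grid cell from (easting, northing, persons) tuples."""
    totals = {}
    for easting_m, northing_m, persons in points:
        cell = grid_cell(easting_m, northing_m)
        totals[cell] = totals.get(cell, 0) + persons
    return totals
```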

The database of the enumeration blocks to be used during the 2021 census was updated. This will be used for organisational purposes and for the integration of statistical and geospatial data for statistical analysis.
