Statistics Explained

Merging statistics and geospatial information, 2020 projects - Slovenia


Establishment of an Earth observation data processing system for the monitoring of agricultural land (permanent grassland and soil moisture); 2020 project; final report December 2022



This article forms part of Eurostat’s statistical report on the Integration of statistical and geospatial information.

Full article

Problem

There is a need to develop procedures for extracting improved statistical geospatial information from Earth observation (EO) data.

Objectives

  • Explore the current potential of remote sensing data for official statistics in Slovenia.
  • Tackle the establishment of a system for extracting statistical geodata from Earth observation (EO) data.
  • Assess the status of selected parameters in permanent grassland and the potential for detecting irrigation on agricultural land.

Method

Permanent grassland covers about 3 455 km² or 17 % of Slovenia and represents half of all agricultural land. The presence of irrigation was studied in olive groves, hop fields, intensive orchards and arable land. In Slovenia, there are 1 787 km² of arable land (9 % of all land in Slovenia), 19 km² of hop fields (0.1 %), 45 km² of intensive orchards (0.22 %) and 25 km² of olive groves (0.12 %).
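The shares quoted above follow from simple arithmetic. The sketch below is a minimal check, assuming a total area of Slovenia of about 20 271 km² (a figure not given in the article). Rounded, the results match the quoted 17 %, 9 %, 0.1 %, 0.22 % and 0.12 %.

```python
# Approximate total area of Slovenia in km² (assumption: not stated in the article).
SLOVENIA_AREA_KM2 = 20_271

# Land-use areas quoted in the article, in km².
AREAS_KM2 = {
    "permanent grassland": 3_455,
    "arable land": 1_787,
    "hop fields": 19,
    "intensive orchards": 45,
    "olive groves": 25,
}


def share_of_country(area_km2: float, total_km2: float = SLOVENIA_AREA_KM2) -> float:
    """Return an area as a percentage of the national territory."""
    return 100.0 * area_km2 / total_km2


# Permanent grassland is said to be half of all agricultural land,
# so agricultural land is roughly twice the grassland area.
implied_agricultural_km2 = 2 * AREAS_KM2["permanent grassland"]

for name, area in AREAS_KM2.items():
    print(f"{name:<20} {area:>6} km²  {share_of_country(area):5.2f} %")
print(f"implied agricultural land: about {implied_agricultural_km2} km²")
```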

The study used satellite data, land use data and reference data.

Open and freely available satellite data were used from Sentinel-1 (radar data), Sentinel-2, Landsat 5 and Landsat 8 (all optical). Data from the Ministry of Agriculture, Forestry and Food and the statistical office were used as reference data; some samples were also collected as reference and verification data.

  • The machine learning models were mainly applied to time series (2017 to 2021) of Sentinel-1 and Sentinel-2 satellite data with a spatial resolution of 10 m. Sentinel-1 and Sentinel-2 have 6-day and 5-day repeat cycles, respectively. Sentinel-2 has 12 spectral bands.
  • Landsat 5 (2005 to 2011) and Landsat 8 (2013 to 2021) offer imagery with a 30 m resolution and a 16-day repeat cycle. Landsat 5 has 6 spectral bands and Landsat 8 has 9 spectral bands.
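The repeat cycles and resolutions above determine how densely each mission samples a parcel in time and space. A minimal sketch using only the figures quoted in the article; the nominal acquisition counts are upper bounds, since cloud cover reduces the number of usable optical scenes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sensor:
    """Nominal characteristics of a satellite mission, as quoted in the article."""
    name: str
    kind: str                # "radar" or "optical"
    resolution_m: int        # spatial resolution used in the study
    repeat_days: int         # repeat cycle in days
    bands: Optional[int]     # number of spectral bands (not applicable to radar)


SENSORS = [
    Sensor("Sentinel-1", "radar", 10, 6, None),
    Sensor("Sentinel-2", "optical", 10, 5, 12),
    Sensor("Landsat 5", "optical", 30, 16, 6),
    Sensor("Landsat 8", "optical", 30, 16, 9),
]


def nominal_acquisitions(sensor: Sensor, period_days: int = 365) -> int:
    """Upper bound on acquisitions of one location over a period."""
    return period_days // sensor.repeat_days


def pixels_per_km2(sensor: Sensor) -> float:
    """Number of pixels covering one square kilometre."""
    return (1000 / sensor.resolution_m) ** 2


for s in SENSORS:
    print(f"{s.name:<10} {nominal_acquisitions(s):3d} acquisitions/year, "
          f"{pixels_per_km2(s):7.0f} pixels/km²")
```

At 10 m a square kilometre holds nine times as many pixels as at 30 m, which is one reason the 10 m Sentinel data were the main input for parcel-level mapping.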

The land parcel information system (LPIS) in Slovenia is part of the farm register and is a spatial representation of land used by farms. The reference parcel is the farmer block, a compact area of agricultural land used by a farm. The graphic unit of agricultural parcels (GERK) is a compact area of agricultural land with a uniform type of land use. Not all land use polygons are included in this system. For example, about 75 % of the total grassland area is included. The register contains data that farmers report annually to the Agency of the Republic of Slovenia for Agricultural Markets and Rural Development (ARSKTRP), including land use data and crops.

The national register of actual agricultural and forestry land use (RABA) contains aggregated polygons of land with the same type of actual use. It is used for the implementation of the common agricultural policy. The RABA layer is renewed gradually every three years based on aerial photographs. The RABA layer is spatially more generalised than the GERK information in the LPIS; the grassland polygons in RABA are larger and may be less homogeneous. The individual land use zones are defined for the whole country.

Reference data sources were used for training machine learning models for the observed variables.

  • The layer of the Institute of the Republic of Slovenia for Nature Conservation labels grassland as intensively or extensively used, based on field mapping of habitat types. The data are updated about once a year and generally go back to 2010, in some cases to 2006. The training dataset contained 491 extensive and 768 intensive grassland polygons. These data were used in one of the two grassland mowing detection models and in the grassland fertilisation detection model.
  • Sinergise provided the mowing event collection for grassland areas and the bare soil collection for arable land in Slovenia. Both collections contain up to 40 000 labelled Sentinel-2 image samples and cover the year 2019. The collection of mowing events was used in conjunction with the mowing marker model in one of the two models to detect the mowing of grasslands. The bare soil dataset was used together with the bare soil marker model for the age of grassland variable.
  • The statistical office provided datasets from the census of agricultural holdings (KME). KME-PMGDK data from 2016 and 2018 cover almost 14 000 farms and include the area of permanent grassland fertilised and the type of grassland; these data were used as a reference for detecting grassland fertilisation. Other KME datasets were not used because they were too old.
  • The irrigation data collected were used in the study, but the spatial distribution of the samples was not optimal. Because the samples were also relatively small for machine learning, modelling was restricted to a narrower area around the locations of the reference datasets.
    • The Biotechnical Faculty of the University of Ljubljana takes detailed measurements on a number of plots as part of its projects. It provided daily irrigation measurements for six olive groves, 22 orchards (various fruit trees) and one hop field.
    • Data on non-irrigated (rainfed only) parcels were obtained through short interviews with the owners of orchards and olive groves. Some additional data for olive groves and hop fields were obtained from other sources.
    • Irrigation data for intensive orchards, covering about 40 GERK with a total of 62 hectares, were obtained through telephone interviews (31 non-irrigated and 10 irrigated).
    • Evrosad (the largest fruit grower in Slovenia) provided data for a significant part of its 288 hectares of orchards (23 non-irrigated and 47 irrigated), along with data on orchard structure and type of trees for 2017 to 2022. Similar data were requested from larger agricultural companies.

Meteorological conditions are important for the interpretation of satellite signals. Timelines produced by the Slovenian Environment Agency (ARSO, 2022) show maps of the interannual spatial variability of temperature, precipitation and duration of solar radiation in relation to a reference period of 30 years (1981 to 2010).
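Anomaly maps of this kind compare a year's value with the mean of the 30-year reference period. A minimal sketch with hypothetical data, assuming the common convention of expressing temperature as a difference and precipitation or sunshine duration as a percentage of the reference mean (the article does not state ARSO's exact convention):

```python
from statistics import mean
from typing import Dict

# 30-year reference period used for the ARSO anomaly maps.
REFERENCE_START, REFERENCE_END = 1981, 2010


def reference_mean(annual: Dict[int, float],
                   start: int = REFERENCE_START, end: int = REFERENCE_END) -> float:
    """Mean over the reference period; raises if any reference year is missing."""
    years = range(start, end + 1)
    missing = [y for y in years if y not in annual]
    if missing:
        raise ValueError(f"missing reference years: {missing}")
    return mean(annual[y] for y in years)


def difference_from_reference(annual: Dict[int, float], year: int) -> float:
    """Anomaly as a difference, e.g. temperature in °C."""
    return annual[year] - reference_mean(annual)


def percent_of_reference(annual: Dict[int, float], year: int) -> float:
    """Anomaly as a ratio, e.g. precipitation or sunshine duration in % of normal."""
    return 100.0 * annual[year] / reference_mean(annual)


# Hypothetical annual series for one grid cell.
temperature = {y: 10.0 + 0.02 * (y - 1981) for y in range(1981, 2022)}
precipitation = {y: 1000.0 for y in range(1981, 2011)}
precipitation[2021] = 800.0

print(round(difference_from_reference(temperature, 2021), 2))
print(round(percent_of_reference(precipitation, 2021), 1))
```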

Several masks were developed.

  • A mask of the permanent grassland area in Slovenia was developed. Other grassland-related land use categories were excluded because they are more heterogeneous and preliminary machine learning analyses showed that the results for them were less reliable for observing subtle changes.
  • The masks for the irrigation variables, delimiting olive groves, hop fields, intensive orchards and arable fields in Slovenia, were based on the actual land use data (RABA) combined with information from GERK. The masks were developed from the 2021 land use data and applied to all years. Some of these land use types (such as hop fields) occur only in certain parts of Slovenia, and the observation area for irrigation of arable land was limited to the region of Murska Sobota.
  • The aggregation mask for grassland-related variables was prepared separately and reflected the 2021 extent of permanent grassland. It contained several classes that allow the areas used in the aggregation calculation to be selected.
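Selecting mask classes before aggregation amounts to keeping only the pixels whose class is in a chosen set. A minimal sketch on small co-registered rasters; the class codes (NOT_GRASSLAND, RABA_ONLY, RABA_AND_GERK) and the rule for combining RABA and GERK are hypothetical, as the article does not list the actual classes:

```python
from typing import List, Optional, Set

# Hypothetical class codes (the real aggregation mask classes are not listed in the article).
NOT_GRASSLAND = 0
RABA_ONLY = 1        # permanent grassland in RABA, not covered by a GERK
RABA_AND_GERK = 2    # permanent grassland in RABA, also covered by a GERK


def mask_classes(raba: List[List[bool]], gerk: List[List[bool]]) -> List[List[int]]:
    """Combine two co-registered boolean rasters into mask class codes.

    Pixels outside RABA permanent grassland are NOT_GRASSLAND even if a GERK covers them.
    """
    return [
        [RABA_AND_GERK if r and g else RABA_ONLY if r else NOT_GRASSLAND
         for r, g in zip(raba_row, gerk_row)]
        for raba_row, gerk_row in zip(raba, gerk)
    ]


def masked_mean(values: List[List[float]], classes: List[List[int]],
                selected: Set[int]) -> Optional[float]:
    """Mean of pixel values whose mask class is in `selected`; None if none qualify."""
    kept = [v for v_row, c_row in zip(values, classes)
            for v, c in zip(v_row, c_row) if c in selected]
    return sum(kept) / len(kept) if kept else None


# Hypothetical 2 x 2 example: number of mowing events per pixel.
values = [[3, 2], [1, 4]]
raba = [[True, True], [False, True]]
gerk = [[True, False], [True, True]]
classes = mask_classes(raba, gerk)
print(classes)
print(masked_mean(values, classes, {RABA_AND_GERK}))
print(masked_mean(values, classes, {RABA_ONLY, RABA_AND_GERK}))
```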

The workflow for producing the annual information layers comprised preparing, pre-processing and analysing satellite image time series; preparing reference datasets; pixel-based classification by machine learning; evaluating feature importance; and assessing mapping and classification accuracy.

In this study, the parameters of grassland use were obtained with different approaches.

  • A mowing event causes a drop and a subsequent recovery in the time series of a vegetation-related signal, such as the normalised difference vegetation index (NDVI). To improve the detection of mowing events, information from radar data, optical data and both NDVI-based algorithms was combined. This gave a 28 % improvement compared with using NDVI alone.
  • Various machine learning algorithms were tested for detecting fertilisation. The most important features for mapping were the vegetation indices NDVI and EVI2 as well as spectral bands 8A and 8. IoU (intersection over union) and MCC (Matthews correlation coefficient) values above 0.95 were achieved. Nevertheless, a larger and more evenly distributed set of reference data would be needed.
  • The grassland renewal (reseeding) and presence of pasture parameters were not determined because accurate reference data were not available.
  • Unlike mowing, grazing can produce patterns that vary with a range of factors, such as the availability of forage and the movements of livestock. Grazing is difficult to identify accurately because its intensity and spatial extent vary over time. Detecting grazing activity from satellite imagery requires accurate reference data, which were not available.
  • The age of permanent grassland was assessed using time series of Sentinel-2, Landsat 8 and Landsat 5 imagery. Grassland must not be ploughed for at least five years to be considered permanent. Annual layers of bare soil presence were produced at pixel level; at least two consecutive bare soil observations were required to register bare soil presence for a given year. Bare soil detected in the long-term time series indicated the age of the grassland. The results suggest that bare soil presence can be detected successfully and that the methods developed in this project can be used to distinguish permanent from non-permanent grassland and, more generally, to determine grassland age.
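The bare soil rule described above (at least two consecutive bare soil observations in a year) and the five-year threshold can be expressed as a short per-pixel routine. This is a simplified sketch, assuming each year's observations are already classified as bare or not bare and ordered by date; the function names are illustrative, and the permanence test is a simplification of the formal definition.

```python
from typing import Dict, List, Optional


def bare_soil_in_year(observations: List[bool], min_consecutive: int = 2) -> bool:
    """True if the year's date-ordered observations contain a run of at least
    `min_consecutive` bare soil detections (the article's rule uses two)."""
    run = 0
    for is_bare in observations:
        run = run + 1 if is_bare else 0
        if run >= min_consecutive:
            return True
    return False


def years_since_bare_soil(annual: Dict[int, List[bool]], reference_year: int) -> Optional[int]:
    """Years between the last year with bare soil and `reference_year`;
    None if no bare soil was detected up to `reference_year`."""
    bare_years = [y for y, obs in annual.items()
                  if y <= reference_year and bare_soil_in_year(obs)]
    return reference_year - max(bare_years) if bare_years else None


def is_permanent(annual: Dict[int, List[bool]], reference_year: int, min_years: int = 5) -> bool:
    """Simplified rule: no bare soil for at least `min_years` years."""
    since = years_since_bare_soil(annual, reference_year)
    if since is None:
        # No bare soil in the record: grassland for at least the length of the record.
        return reference_year - min(annual) + 1 >= min_years
    return since >= min_years


# Hypothetical pixel history.
pixel = {
    2015: [False, True, False, True],   # isolated detections: not counted
    2016: [False, True, True, False],   # two consecutive detections: bare soil
}
pixel.update({y: [False] * 6 for y in range(2017, 2022)})

print(years_since_bare_soil(pixel, 2021), is_permanent(pixel, 2021))
```

Requiring two consecutive detections makes the annual layer robust to single misclassified scenes, at the cost of missing very short bare periods between acquisitions.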
Figure 1: NDVI time series of a grassland with mowing events (green) and a crop (orange) (Sen4CAP)
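The drop-and-recovery pattern in Figure 1 suggests a simple rule-based detector for a single NDVI series. This is only an illustration of the signal being exploited, not the project's model (which combined radar, optical and NDVI-based information); the thresholds min_drop, min_recovery and max_recovery_days are hypothetical.

```python
from typing import List, Tuple

Observation = Tuple[int, float]  # (day of year, NDVI)


def detect_mowing_events(
    series: List[Observation],
    min_drop: float = 0.2,
    min_recovery: float = 0.1,
    max_recovery_days: int = 30,
) -> List[int]:
    """Return the day of the first low observation after each detected mowing.

    A candidate event is a fall of at least `min_drop` between consecutive
    observations; it is confirmed if NDVI rises again by at least
    `min_recovery` within `max_recovery_days` of the low observation.
    """
    series = sorted(series)
    events = []
    for i in range(1, len(series)):
        prev_ndvi = series[i - 1][1]
        day, ndvi = series[i]
        if prev_ndvi - ndvi < min_drop:
            continue
        # Look ahead for regrowth within the allowed window.
        for later_day, later_ndvi in series[i + 1:]:
            if later_day - day > max_recovery_days:
                break
            if later_ndvi - ndvi >= min_recovery:
                events.append(day)
                break
    return events


# Hypothetical cloud-free observations for one grassland pixel.
ndvi_series = [(100, 0.80), (110, 0.82), (120, 0.40), (130, 0.60),
               (140, 0.78), (160, 0.80), (170, 0.35), (180, 0.55), (200, 0.70)]
print(detect_mowing_events(ndvi_series))
```

A purely optical rule like this fails when clouds hide the drop or the regrowth, which is one motivation for adding radar data.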

The classification of irrigation variables aimed to determine, from Sentinel-2 time series, whether a parcel was irrigated, in order to test whether EO data alone are sufficient for this purpose. Soil moisture status and vegetation development were observed and used to assess the likelihood of irrigation. Results were similar across the different land uses. From a statistical point of view, the results obtained with a supervised machine learning model were good, with IoU and MCC of at least 0.90 in all the years considered. Although the classification accuracies were high and most of the mapped parcels were homogeneous, the results could not be fully trusted because validation data were lacking.
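IoU and MCC are both computed from the confusion matrix of a binary (irrigated / non-irrigated) classification. A minimal sketch; since the article does not say how the "overall" IoU was aggregated, a mean over the two classes is shown as one common choice, and MCC is set to 0 when its denominator vanishes, which is a convention rather than part of the definition.

```python
from math import sqrt
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), with True meaning 'irrigated'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp, tn, fp, fn


def iou(tp: int, fp: int, fn: int) -> float:
    """Intersection over union for one class."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def mean_iou(tp: int, tn: int, fp: int, fn: int) -> float:
    """Average IoU of the positive and the negative class."""
    # For the negative class, FN plays the role of false positives and vice versa.
    return (iou(tp, fp, fn) + iou(tn, fn, fp)) / 2


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels for 200 parcels: 100 irrigated, 100 not irrigated.
y_true = [True] * 100 + [False] * 100
y_pred = [True] * 90 + [False] * 10 + [True] * 5 + [False] * 95
counts = confusion_counts(y_true, y_pred)
print(counts, round(iou(counts[0], counts[2], counts[3]), 3),
      round(mean_iou(*counts), 3), round(mcc(*counts), 3))
```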

  • The most important features for the irrigated olive grove class were various spectral indices: the modified soil-adjusted vegetation index (MSAVI), the normalised difference tillage index (NDTI) and the enhanced vegetation index-2 (EVI2). IoU and MCC were above 0.95.
  • The most important features in classifying irrigated hop fields were a combination of dedicated indices and spectral bands, chiefly the normalised difference moisture index-A (NDMI-A), MSAVI and band 6 (vegetation red edge). Irrigation mapping of hop fields achieved overall IoU and MCC values above 0.90 in all the years considered (2021 was excluded because abundant rainfall meant there was no irrigation). False negatives (irrigated fields classified as non-irrigated) were slightly more common than false positives.
  • More than 10 different fruit tree types were present in the reference data for intensive orchards. Apple orchards were selected because they are the most widespread in the study area and reference data were available for them. The data mask was derived from crop type data gathered by ARSKTRP. Maps of intensive orchard irrigation reached overall IoU and MCC values above 0.95 for all years. A combination of spectral indices and bands was the most important in classification, with MSAVI, the normalised bare soil index (NBSI) and the vegetation red edge bands (bands 5 and 6) at the forefront. False negatives were more common than false positives. Classification was slightly more accurate for orchards with a single type of tree.
  • Diverse crops were grown on the observed fields during the study period, with maize, wheat, barley and rapeseed being the most common. Two classification approaches were used: one considering all reference fields and the other only those where maize was followed by either wheat or barley. Crop field irrigation was detected with IoU and MCC values above 0.90 for all the years considered. The most important features in the classification were the NDMI-A and MSAVI indices and spectral band 8 (near-infrared). False positives were slightly more common than false negatives. When restricted to the selected crop rotation, all the fields were classified correctly.
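Several of the indices named above have standard definitions in the remote sensing literature. A minimal sketch computing them for one pixel, assuming Sentinel-2 surface reflectance scaled to 0 to 1 and the band mapping B4 red, B8 near-infrared, B11 SWIR-1 and B12 SWIR-2. NDMI is shown in its common form because the NDMI-A variant used in the project is not defined in the article, and NBSI is omitted for the same reason.

```python
from math import sqrt


def normalised_difference(a: float, b: float) -> float:
    """Generic normalised difference (a - b) / (a + b)."""
    return (a - b) / (a + b)


def ndvi(nir: float, red: float) -> float:
    """Normalised difference vegetation index."""
    return normalised_difference(nir, red)


def evi2(nir: float, red: float) -> float:
    """Two-band enhanced vegetation index."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)


def msavi(nir: float, red: float) -> float:
    """Modified soil-adjusted vegetation index (MSAVI2 form)."""
    return (2 * nir + 1 - sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2


def ndti(swir1: float, swir2: float) -> float:
    """Normalised difference tillage index."""
    return normalised_difference(swir1, swir2)


def ndmi(nir: float, swir1: float) -> float:
    """Normalised difference moisture index (common form, not the project's NDMI-A)."""
    return normalised_difference(nir, swir1)


# Hypothetical reflectances for one pixel.
pixel = {"B4": 0.1, "B8": 0.5, "B11": 0.3, "B12": 0.2}
features = {
    "NDVI": ndvi(pixel["B8"], pixel["B4"]),
    "EVI2": evi2(pixel["B8"], pixel["B4"]),
    "MSAVI": msavi(pixel["B8"], pixel["B4"]),
    "NDTI": ndti(pixel["B11"], pixel["B12"]),
    "NDMI": ndmi(pixel["B8"], pixel["B11"]),
}
print({name: round(value, 3) for name, value in features.items()})
```

Computed for every acquisition in a season, such indices form the time series features from which a classifier can rank feature importance, as reported in the list above.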

Data aggregation produced statistics for NUTS level 2 and 3 regions by first calculating statistics per polygon within the grassland layer and then applying zonal statistics.

Figure 2: Methodology of spatial aggregation
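The two-step aggregation in Figure 2 (pixels to polygons, then polygons to NUTS regions) can be sketched as follows. This assumes 10 m pixels, area-weighted means and that each grassland polygon lies within a single NUTS 3 region; none of these details is stated in the article, and the polygon-to-region assignment in the example is hypothetical. NUTS 2 codes are taken from the first four characters of the NUTS 3 code, following the structure of NUTS codes.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Tuple

PIXEL_AREA_M2 = 100  # one 10 m x 10 m pixel

Stats = Dict[Hashable, Tuple[float, float]]  # unit -> (mean value, area in m²)


def polygon_stats(pixels: Iterable[Tuple[Hashable, float]]) -> Stats:
    """Step 1: mean value and area of each grassland polygon from (polygon_id, value) pixels."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for polygon_id, value in pixels:
        sums[polygon_id] += value
        counts[polygon_id] += 1
    return {pid: (sums[pid] / counts[pid], counts[pid] * PIXEL_AREA_M2) for pid in sums}


def zonal_mean(stats: Stats, zone_of: Callable[[Hashable], Hashable]) -> Stats:
    """Steps 2 and 3: area-weighted mean of smaller units within each zone."""
    weighted: Dict[Hashable, float] = defaultdict(float)
    area: Dict[Hashable, float] = defaultdict(float)
    for unit, (value, unit_area) in stats.items():
        zone = zone_of(unit)
        weighted[zone] += value * unit_area
        area[zone] += unit_area
    return {zone: (weighted[zone] / area[zone], area[zone]) for zone in weighted}


# Hypothetical pixels: (polygon id, number of mowing events).
pixels = [("A", 2), ("A", 3), ("B", 4), ("C", 1), ("C", 1), ("C", 1)]
polygon_to_nuts3 = {"A": "SI031", "B": "SI031", "C": "SI041"}

polygons = polygon_stats(pixels)
nuts3 = zonal_mean(polygons, polygon_to_nuts3.__getitem__)
nuts2 = zonal_mean(nuts3, lambda code: code[:4])
print(nuts3)
print(nuts2)
```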

Results

Key parameters of grassland use such as the frequency of mowing, use of fertiliser and the age of permanent grassland were quantified, as well as irrigation variables for different types of agricultural land.

High-resolution, large-scale mapping of grassland land use attributes, and discrimination between irrigated and non-irrigated areas on selected permanent plantations and arable land, were achieved. The biggest obstacle to achieving fully reliable results was the lack of accurate reference data.

The results for variables such as mowing events and the age of permanent grassland were presented for Slovenia as a whole and disaggregated to NUTS level 2 and 3 regions. The conclusions and guidelines from the project should help to introduce the developed procedures and processes into the statistical office's system. The main recommendations are to improve the reference databases (format, quantity and quality of the datasets) and to work with various organisations and institutions to facilitate collaboration and data sharing.
