SDMX explained
What is SDMX and why use it?
The internet and the world-wide web have made the electronic exchange and sharing of data easier and more frequent.
However, exchanges often take place in an ad-hoc manner, using all kinds of formats and non-standard concepts. This necessitates common standards, guidelines, and tools to facilitate more efficient processes for exchanging and sharing statistical data and metadata.
SDMX is an initiative designed to manage and automate the process of data and metadata exchange. The initiative is sponsored by 8 international organisations, including Eurostat. SDMX is a business choice, as opposed to a technical one, that aims to improve the quality of exchanges through standardisation, automation, validation, and data sharing.
Decision makers need to understand background issues. There are a variety of supply-side issues associated with the exchange of statistical data and metadata.
For instance:
- an exchange is complex, resource-intensive, and expensive, with data being collected in multiple ways and transmitted in various formats, across various media
- multiple organisations can collect similar or the same data
- similar concepts can have different content
- the manual nature of data collection can lead to errors and inconsistencies.
There are additional demand-driven challenges, including increased demand for data, quicker and more frequent exchanges, and a wider range of information exchanges.
Decision makers also need to know the advantages SDMX can offer. For instance, SDMX:
- inspires trust
- improves coherence and comparability
- supports modernisation
- improves timeliness and accessibility
- reduces costs and reporting burdens
- removes barriers to implementation and data accessibility
- provides access to a global community of practitioners.
For further information on the benefits of the SDMX standard, please consult the business case for SDMX on the official SDMX website.
Key components
SDMX is an ISO standard, number 17369:2013, and is designed to:
- describe statistical data and metadata
- normalise their exchange
- enable them to be shared more efficiently among organisations.
To meet these requirements, SDMX has 3 key components: the information model, the content-oriented guidelines (COGs), and the IT architecture and tools.
The SDMX information model forms the core of SDMX. It describes statistics in a standard way. It identifies objects and their relationships, allows central management, and provides standard access.
In other words, statistical data, metadata, and the data exchange process are all modelled.
Data are concrete observations of a specific statistical phenomenon at a given time. A data set is a collection of related observations that are organised according to a predefined structure.
Data are meaningless unless accompanied by a description. For instance, what does 2 347 mean? It means nothing without concept descriptors and identifiers that explain its meaning.
If we begin to describe this figure in terms of the country, frequency, topic, unit, and time to which it refers, then the meaning becomes clearer. So, it might refer, for example, to 2 347 tourist campsites in Italy.
Data structure definition and metadata structure definition
These descriptors can be modelled according to whether they are:
- dimensions, which identify and describe the data
- attributes, which provide additional information about the data, such as whether they are estimates
- measures, which represent the phenomenon to be measured.
These structural descriptors are brought together in what is known as a data structure definition (DSD). The DSD identifies the dimensions, attributes, and measures in a data set, and associates them with common code lists and concepts.
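As an illustration, the roles played by dimensions, attributes, and measures can be sketched in a few lines of Python. This is a simplified, hypothetical model and not the SDMX information model itself: the class name, the component identifiers (FREQ, REF_AREA, TOPIC, UNIT, TIME_PERIOD, OBS_STATUS, OBS_VALUE), and the sample values, including the year, are assumptions chosen to match the campsite example above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataStructureDefinition:
    """Hypothetical, simplified DSD: component names grouped by role."""

    dimensions: tuple[str, ...]
    attributes: tuple[str, ...]
    measures: tuple[str, ...]

    def split(self, record: dict) -> tuple[dict, dict, dict]:
        """Split a flat record into (dimension key, attributes, measures)."""
        # Every dimension is required, because dimensions identify the data.
        missing = [d for d in self.dimensions if d not in record]
        if missing:
            raise ValueError(f"missing dimensions: {missing}")
        # Reject components that the structure does not declare.
        known = set(self.dimensions) | set(self.attributes) | set(self.measures)
        unknown = [name for name in record if name not in known]
        if unknown:
            raise ValueError(f"unknown components: {unknown}")
        key = {d: record[d] for d in self.dimensions}
        attrs = {a: record[a] for a in self.attributes if a in record}
        meas = {m: record[m] for m in self.measures if m in record}
        return key, attrs, meas


# Illustrative structure for the campsite example (all names are assumptions).
TOURISM_DSD = DataStructureDefinition(
    dimensions=("FREQ", "REF_AREA", "TOPIC", "UNIT", "TIME_PERIOD"),
    attributes=("OBS_STATUS",),
    measures=("OBS_VALUE",),
)

# Invented sample record: 2 347 tourist campsites in Italy, flagged as an estimate.
record = {
    "FREQ": "A",
    "REF_AREA": "IT",
    "TOPIC": "CAMPSITES",
    "UNIT": "NUMBER",
    "TIME_PERIOD": "2020",
    "OBS_STATUS": "E",
    "OBS_VALUE": 2347,
}

key, attrs, meas = TOURISM_DSD.split(record)
```

Here the dimension values together say what the figure 2 347 refers to, the attribute qualifies it, and the measure carries the observation itself. A real DSD would also link each component to a concept and, where relevant, a code list.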
In addition to the structural descriptions of data sets, there are also reference metadata. These refer to information about quality descriptions, process descriptions, methodological descriptions, and administrative descriptions.
Reference metadata are described in a standard way using the metadata structure definition (MSD).
The content-oriented guidelines (COGs) are a set of recommendations designed to maximise interoperability within the scope of the SDMX standard. They are intended to be applicable to all statistical domains.
The COGs focus on harmonising specific concepts and terminology that are common to many statistical domains. This harmonisation facilitates a more efficient exchange of comparable data and metadata and builds on previous implementation experience.
Concepts, lists, and domains
The COGs include cross-domain concepts, code lists, subject-matter domains, a glossary, and implementation-specific guidelines.
Cross-domain concepts in SDMX describe concepts relevant to most, if not all, statistical domains. SDMX recommends using these concepts whenever feasible in SDMX structures and messages. This promotes the reuse and exchange of statistical information and related metadata between organisations.
Examples of concepts include:
- reference area
- statistical unit
- time period.
Each concept is described in a standard way with an ID, description, context, and presentation. Statistical concepts that are used in the data structure definition (DSD) or metadata structure definition (MSD) are brought together in an object called a concept scheme.
Code lists are predefined sets of terms from which some statistical coded concepts take their values. SDMX cross-domain code lists are used to support cross-domain concepts.
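A code list can be pictured as a lookup table from codes to labels. The sketch below is a minimal illustration, assuming small invented excerpts of a frequency list and a reference-area list; the function names and the choice to treat a concept without a code list as invalid are assumptions, not SDMX rules.

```python
# Hypothetical excerpts of code lists: concept -> {code: label}.
CODE_LISTS = {
    "FREQ": {"A": "Annual", "Q": "Quarterly", "M": "Monthly"},
    "REF_AREA": {"IT": "Italy", "FR": "France"},
}


def invalid_codes(coded_values, code_lists=CODE_LISTS):
    """Return {concept: value} for every value not found in its code list.

    A concept with no code list at all is reported as invalid in this sketch.
    """
    return {
        concept: value
        for concept, value in coded_values.items()
        if value not in code_lists.get(concept, {})
    }


def decode(coded_values, code_lists=CODE_LISTS):
    """Translate coded values into their human-readable labels."""
    bad = invalid_codes(coded_values, code_lists)
    if bad:
        raise ValueError(f"values not in code list: {bad}")
    return {c: code_lists[c][v] for c, v in coded_values.items()}


labels = decode({"FREQ": "A", "REF_AREA": "IT"})
problems = invalid_codes({"FREQ": "X", "REF_AREA": "FR"})
```

Because both sender and receiver draw values from the same list, a code such as "A" is read the same way by every organisation that uses it.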
A statistical subject-matter domain refers to a statistical activity that has common characteristics. These refer to variables, concepts, and methodologies for data collection and the whole statistical data compilation process.
The SDMX glossary contains concepts and related definitions used in the structural and reference metadata of international organisations and national data-producing agencies. It recommends using common terminology to facilitate communication and understanding.
The main idea of the SDMX glossary is that if a term is used, then its precise meaning should match the glossary definition.
For more information, have a look at the recommended practices offered by the SDMX content-oriented guidelines on the official SDMX website.
To support more automated and efficient exchanges of data and metadata, standard tools and an IT architecture are required.
In practice, this means that SDMX promotes the use of standard SDMX-compliant formats (such as XML). It provides the necessary tools to:
- support the information model
- create SDMX-compliant files
- store SDMX-related artefacts
- map and transcode from existing databases
- validate the structure (and, in the future, the content) of data files.
It provides IT systems with the necessary architecture to connect to the SDMX world, making it easier to share data. The SDMX IT architecture generally follows 1 of 3 models:
- the push mode architecture, in which the sending organisations in EU countries send their SDMX-compliant files to Eurostat (the receiving organisation) through eDAMIS, which is Eurostat’s single entry point for statistical data and metadata files
- the pull mode architecture, in which Eurostat (the receiving organisation) retrieves the SDMX-compliant data files that it needs from the sending organisations’ databases
- the data hub architecture, in which users make queries to the databases in sending organisations and retrieve the data directly.
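In the pull mode and data hub models, the receiving side sends a query to the sending organisation's system. The sketch below shows what building such a query could look like, assuming the sender exposes an SDMX REST-style web service with a /data/{dataflow}/{key} path and startPeriod/endPeriod parameters. The base URL, the dataflow name, and the order of dimensions are hypothetical; a real service documents its own.

```python
from urllib.parse import quote, urlencode


def build_data_query(base_url, dataflow, key_parts, start_period=None, end_period=None):
    """Build a data query URL for an assumed SDMX REST-style service.

    key_parts holds one list of codes per dimension, in the order defined by
    the data structure. Several codes are joined with "+", and an empty list
    leaves the position blank, which acts as a wildcard.
    """
    key = ".".join("+".join(codes) for codes in key_parts)
    url = f"{base_url.rstrip('/')}/data/{quote(dataflow, safe=',')}/{quote(key, safe='.+')}"
    params = {}
    if start_period is not None:
        params["startPeriod"] = start_period
    if end_period is not None:
        params["endPeriod"] = end_period
    if params:
        url += "?" + urlencode(params)
    return url


# Hypothetical endpoint and dataflow: annual data for Italy and France,
# any value for the third dimension, from 2020 onwards.
url = build_data_query(
    "https://example.org/sdmx",
    "TOURISM",
    [["A"], ["IT", "FR"], []],
    start_period="2020",
)
# url == "https://example.org/sdmx/data/TOURISM/A.IT+FR.?startPeriod=2020"
```

In push mode no such query is needed, since the sending organisation transmits the file on its own initiative.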
Therefore, SDMX is more than just a data transmission format.
All technical specification documents about SDMX that describe the standards are freely available on the official site of the SDMX community.
SDMX continues to evolve with improvements and new functions. Some of these developments, such as the technical specifications of SDMX 3.0, are often available for public review before they are finalised.
Future developments in SDMX
SDMX is now one of the pillars of a modern, industrialised statistical process. However, this does not mean that SDMX is static. On the contrary, SDMX’s action plan is influenced by the growing experience with SDMX among an expanding circle of users. The people who use it are interested in its open-source software tools, its business applications, and evolving standards.
Every 5 years, the SDMX sponsors outline a plan of action, called the SDMX roadmap. The current plan covers the 2021-25 period and provides a vision of how SDMX will develop.
The main objective of the initiative is to build up stronger and more global information systems that can provide open and real-time access to official statistics.
The SDMX roadmap 2025 is based on 4 key strategic pillars:
- strengthening the implementation of SDMX (implementation)
- making data usage easier via SDMX (simplification)
- using SDMX to modernise statistical processes, as well as continuously improving the standards and IT infrastructure (modernisation)
- improving communication and better interaction with the broader community (communication).
The roadmap is publicly available on the official portal of the international standard.