Can artificial intelligence make sense of public administrations' data?

Susanne Wigard, ISA² Programme Manager, SEMIC action

The SEMIC action of the ISA² Programme recently organised a webinar on ‘Artificial Intelligence and Public Administrations’ that attracted more than 230 participants. Public administrations are experimenting with – and in some cases already successfully using – various applications of AI: chatbots, text mining, image recognition, traffic supervision and many more.

Now, why is semantic interoperability a key enabler for such applications?

Semantic interoperability is about the meaning of data. Together with the syntactic aspect, which concerns formal rules such as the exact format and grammar of the information, it enables the representation of information in structured, machine-readable formats in which each element has a clear and unambiguous meaning. Much of our work in SEMIC relates to developing data models, managing and representing reference data and master data, and making datasets discoverable through a common set of metadata. The semantic layer of the European Interoperability Framework is essentially the data layer, and data are what fuels artificial intelligence.
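To illustrate why a common set of metadata matters, here is a minimal sketch, assuming plain Python tuples rather than a real RDF library: two hypothetical catalogues describe their datasets with the same DCAT and Dublin Core terms (`dcat:Dataset`, `dcat:keyword`, `dct:title`), so their records can be merged and searched with a single query. The dataset identifiers, titles and the `find_datasets` helper are invented for illustration, and the records are far from a complete DCAT-AP description.

```python
# Dataset metadata as subject-predicate-object triples using DCAT / Dublin Core
# terms. Identifiers and values are hypothetical; not a validated DCAT-AP record.

Triple = tuple[str, str, str]

catalogue_a: list[Triple] = [
    ("ex:ds-traffic", "rdf:type", "dcat:Dataset"),
    ("ex:ds-traffic", "dct:title", "Road traffic counts 2018"),
    ("ex:ds-traffic", "dcat:keyword", "mobility"),
]
catalogue_b: list[Triple] = [
    ("ex:ds-bus", "rdf:type", "dcat:Dataset"),
    ("ex:ds-bus", "dct:title", "Bus stop locations"),
    ("ex:ds-bus", "dcat:keyword", "mobility"),
    ("ex:ds-budget", "rdf:type", "dcat:Dataset"),
    ("ex:ds-budget", "dct:title", "Municipal budget"),
    ("ex:ds-budget", "dcat:keyword", "finance"),
]


def find_datasets(triples: list[Triple], keyword: str) -> list[str]:
    """Return the sorted titles of datasets tagged with the given dcat:keyword."""
    subjects = {s for s, p, o in triples if p == "dcat:keyword" and o == keyword}
    return sorted(o for s, p, o in triples if s in subjects and p == "dct:title")


# Because both catalogues use the same terms, their records can simply be
# concatenated and searched with one query.
print(find_datasets(catalogue_a + catalogue_b, "mobility"))
# ['Bus stop locations', 'Road traffic counts 2018']
```

With a real RDF toolkit, the same idea would typically be expressed as a SPARQL query over a triple store.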

For the more ‘traditional’ types of AI such as machine reasoning and expert systems, the connection with data modelling and Semantic Web technologies is obvious: knowledge is expressed in structured, machine-readable formats, linking information from various sources, and new knowledge is deduced from this existing information by applying rules. Search engines deliver better results when the input data are already in a structured format. If such structured data are combined with speech recognition and natural language processing, ‘bots’ can provide a very natural interface to end users, e.g. to citizens looking for information on public services.
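Rule-based deduction of this kind can be sketched as forward chaining over triples: rules are applied repeatedly until no new facts appear. In this minimal sketch the `ex:` identifiers, the `ex:locatedIn` predicate and the single transitivity rule are hypothetical; real reasoners (for example OWL reasoners or dedicated rule engines) are considerably more expressive.

```python
# Forward chaining over subject-predicate-object triples.
# Facts, identifiers and the rule are hypothetical examples.

facts = {
    ("ex:Leuven", "ex:locatedIn", "ex:FlemishBrabant"),
    ("ex:FlemishBrabant", "ex:locatedIn", "ex:Belgium"),
    ("ex:Belgium", "ex:locatedIn", "ex:EU"),
}


def transitive_located_in(kb):
    """Rule: if A locatedIn B and B locatedIn C, then A locatedIn C."""
    return {
        (a, "ex:locatedIn", c)
        for (a, p1, b) in kb if p1 == "ex:locatedIn"
        for (b2, p2, c) in kb if p2 == "ex:locatedIn" and b2 == b
    }


def forward_chain(kb, rules):
    """Apply all rules repeatedly until no new triples are deduced."""
    kb = set(kb)
    while True:
        new = set().union(*(rule(kb) for rule in rules)) - kb
        if not new:
            return kb
        kb |= new


inferred = forward_chain(facts, [transitive_located_in])
print(("ex:Leuven", "ex:locatedIn", "ex:EU") in inferred)  # True
```

The three stated facts yield three deduced ones, including the fact that Leuven is located in the EU, which no source stated explicitly.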

Today's understanding of artificial intelligence is more ambitious

Machine-learning technologies ingest huge amounts of information to train models that then classify new data or make predictions about the real world.

Neural networks learn in a way that is loosely inspired by how the human brain learns.
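As a rough illustration of what ‘learning’ means here, the sketch below trains a single artificial neuron, the basic unit of a neural network, to reproduce the logical OR function by gradient descent. It is a deliberately tiny example under assumed settings (`epochs`, `lr`, and the sigmoid with cross-entropy loss are arbitrary choices), and it makes no claim about how biological neurons actually work.

```python
import math

# A single artificial neuron (weighted sum + sigmoid) trained to reproduce
# logical OR. Hyperparameters are arbitrary illustrative choices.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, epochs=2000, lr=0.5):
    """Stochastic gradient descent on the cross-entropy loss."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = sigmoid(w[0] * x1 + w[1] * x2 + b)
            # For sigmoid + cross-entropy, dLoss/dz simplifies to (out - target).
            err = out - target
            w[0] -= lr * err * x1
            w[1] -= lr * err * x2
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 if the neuron's output is at least 0.5, else 0."""
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0


OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_neuron(OR_DATA)
print([predict(w, b, x) for x, _ in OR_DATA])  # [0, 1, 1, 1]
```

Nothing about OR is programmed in: the behaviour emerges from repeatedly nudging the weights to reduce the error on the examples. A real network stacks many such units in layers.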

In supervised learning, the AI is trained on large datasets for which we already know the ‘correct answer’, e.g. which images show a car. We therefore need these ‘good’ input data (or metadata) in a machine-readable format. More complex classification jobs require more complex data models to describe both the training data and the results.
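A minimal sketch of supervised learning, assuming the labelled examples arrive as JSON records: a nearest-centroid classifier (chosen here for brevity, not because real image recognition works this way) averages the features of each label and assigns a new item to the closest average. The feature names `length_m` and `height_m`, their values and the labels are hypothetical stand-ins for real image features.

```python
import json
import math

# Labelled training data in a machine-readable format (JSON), used to train a
# nearest-centroid classifier. Features, values and labels are hypothetical.

TRAINING_JSON = """
[
  {"features": {"length_m": 4.3, "height_m": 1.5}, "label": "car"},
  {"features": {"length_m": 4.0, "height_m": 1.4}, "label": "car"},
  {"features": {"length_m": 1.8, "height_m": 1.1}, "label": "bicycle"},
  {"features": {"length_m": 1.7, "height_m": 1.0}, "label": "bicycle"}
]
"""
FEATURES = ("length_m", "height_m")


def train(records):
    """Compute the mean feature vector (centroid) for each known label."""
    sums, counts = {}, {}
    for r in records:
        acc = sums.setdefault(r["label"], [0.0] * len(FEATURES))
        for i, f in enumerate(FEATURES):
            acc[i] += r["features"][f]
        counts[r["label"]] = counts.get(r["label"], 0) + 1
    return {lbl: [s / counts[lbl] for s in acc] for lbl, acc in sums.items()}


def classify(centroids, features):
    """Assign the label whose centroid is closest (Euclidean distance)."""
    vec = [features[f] for f in FEATURES]
    return min(centroids, key=lambda lbl: math.dist(vec, centroids[lbl]))


model = train(json.loads(TRAINING_JSON))
print(classify(model, {"length_m": 3.9, "height_m": 1.5}))  # car
```

The classifier can only be as good as its labels: if the training records were mislabelled or described with inconsistent feature names, `train` would silently learn the wrong centroids or fail with a `KeyError`.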

In unsupervised learning, the machine discovers regularities, patterns and clusters in the input data without being given labels in advance. It is then up to humans to assign labels to these clusters. For example, the machine might use statistical information, such as word frequencies, to identify texts that share a common topic, but a human would need to name that topic.
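The clustering step can be sketched with k-means on word-count vectors. In this minimal sketch the four texts, `k=2`, the choice of the first `k` texts as starting centroids and the `top_words` helper are hypothetical; the algorithm only returns cluster numbers, and naming the clusters is left to a person.

```python
import math
from collections import Counter

# k-means clustering of short texts represented as word-count vectors.
# Texts, k and the initialisation strategy are illustrative choices.

TEXTS = [
    "bus tram timetable delays",
    "tax return deadline income",
    "tram bus route changes",
    "income tax deduction rules",
]


def vectorise(texts):
    """Turn each text into a vector of word counts over a shared vocabulary."""
    vocab = sorted({w for t in texts for w in t.split()})
    vectors = []
    for t in texts:
        counts = Counter(t.split())
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


def kmeans(vectors, k, iterations=10):
    """Return a cluster number for each vector."""
    centroids = [list(v) for v in vectors[:k]]  # first k texts as seeds
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def top_words(texts, clusters, c, n=2):
    """Most frequent words in cluster c, as a hint for the human labeller."""
    counts = Counter(w for t, a in zip(texts, clusters) if a == c for w in t.split())
    return [w for w, _ in counts.most_common(n)]


vocab, vectors = vectorise(TEXTS)
clusters = kmeans(vectors, k=2)
print(clusters)  # [0, 1, 0, 1]
for c in sorted(set(clusters)):
    print(c, top_words(TEXTS, clusters, c))
```

Seeing that cluster 0 is dominated by ‘bus’ and ‘tram’, a person might call it ‘public transport’; the machine itself has no notion of that name.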

Trained models from both supervised and unsupervised learning can be applied to extract useful information from the flood of data available on the internet and social media, and from sensors and portable devices. Knowledge, both facts and rules, can be represented in consistent formats (e.g. simple subject-predicate-object statements as RDF triples), consolidated in a structured knowledge base, and linked with other existing knowledge to enable richer queries.
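Consolidating extracted facts and querying across them can be sketched with a small in-memory knowledge base. The two sources, all `ex:` identifiers and the `match` and `query` helpers are hypothetical; the pattern syntax (variables prefixed with `?`) only imitates the idea behind SPARQL basic graph patterns and is not a SPARQL implementation.

```python
# Facts from two hypothetical sources consolidated into one set of triples and
# queried with simple triple patterns (variables start with "?").

source_web = [
    ("ex:AgencyX", "ex:locatedIn", "ex:Brussels"),
    ("ex:AgencyX", "ex:offersService", "ex:VehicleRegistration"),
]
source_sensors = [
    ("ex:Sensor17", "ex:locatedIn", "ex:Brussels"),
    ("ex:Sensor17", "ex:measures", "ex:AirQuality"),
]

knowledge_base = set(source_web) | set(source_sensors)


def match(pattern, triple, bindings):
    """Unify one triple pattern with a triple; return extended bindings or None."""
    result = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if result.setdefault(p, t) != t:  # variable already bound elsewhere
                return None
        elif p != t:
            return None
    return result


def query(kb, patterns):
    """Return all variable bindings that satisfy every pattern (a simple join)."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for bindings in solutions:
            for triple in kb:
                b = match(pattern, triple, bindings)
                if b is not None:
                    extended.append(b)
        solutions = extended
    return solutions


# "What is measured in the same city where Agency X is located?"
results = query(knowledge_base, [
    ("ex:AgencyX", "ex:locatedIn", "?city"),
    ("?thing", "ex:locatedIn", "?city"),
    ("?thing", "ex:measures", "?what"),
])
print(results)
```

Neither source alone can answer the question; the answer comes from linking them through the shared identifier `ex:Brussels`.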

Neural networks help ‘understand’ plain-text resources: algorithms recognise entities such as people, organisations and places in a text, as well as subsequent references to them (for example, working out that ‘they’ or ‘it’ refers to a particular person or organisation). They can also annotate these entities in machine-readable formats for easy ingestion into the knowledge base.
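The sketch below swaps in a much simpler technique than a neural network, a gazetteer (a lookup list of known names), to show the shape of the output: entity mentions, a naive rule that resolves ‘It’ or ‘They’ to the most recently mentioned organisation, and triples ready for a knowledge base. The gazetteer entries, the example sentence and the `ex:mentions` predicate are hypothetical.

```python
import re

# Simplified stand-in for neural entity recognition: a hypothetical gazetteer
# finds known entities, and a naive heuristic resolves pronouns to the most
# recently mentioned organisation. Output is emitted as triples.

GAZETTEER = {
    "Anna Jansen": ("ex:AnnaJansen", "ex:Person"),
    "Agency X": ("ex:AgencyX", "ex:Organisation"),
    "Brussels": ("ex:Brussels", "ex:Place"),
}
PRONOUNS = ["It", "it", "They", "they"]


def annotate(text):
    """Return (start, end, surface, iri) annotations for recognised mentions."""
    names = sorted(GAZETTEER, key=len, reverse=True)  # prefer longest names first
    alternatives = [re.escape(n) for n in names] + PRONOUNS
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")
    annotations, last_org = [], None
    for m in pattern.finditer(text):
        surface = m.group()
        if surface in GAZETTEER:
            iri, cls = GAZETTEER[surface]
            if cls == "ex:Organisation":
                last_org = iri
        elif last_org is not None:
            iri = last_org  # naive coreference: pronoun -> latest organisation
        else:
            continue  # pronoun with no organisation to refer to
        annotations.append((m.start(), m.end(), surface, iri))
    return annotations


def to_triples(doc_iri, annotations):
    """Convert annotations into triples for the knowledge base."""
    triples = set()
    for _, _, surface, iri in annotations:
        triples.add((doc_iri, "ex:mentions", iri))
        if surface in GAZETTEER:
            triples.add((iri, "rdf:type", GAZETTEER[surface][1]))
    return triples


TEXT = "Anna Jansen works for Agency X in Brussels. It publishes open data."
anns = annotate(TEXT)
for a in anns:
    print(a)
triples = to_triples("ex:doc1", anns)
```

A lookup list cannot recognise names it has never seen, and the pronoun rule breaks easily; that gap is exactly what trained models are used for.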

In all of this, ISA² specifications developed by SEMIC can play an important role – we therefore invite you to have a look at our Core Vocabularies, ADMS and DCAT-AP.

Wednesday, 24 April, 2019