Avoid data chaos and costs – use data ontology

The keystone building blocks and practices behind the scenes that prepare data to deliver quick wins and enable fluency are too often ignored.
By Mervyn Mooi, Director of Knowledge Integration Dynamics (KID), which represents the ICT services arm of the Thesele Group.
Johannesburg, 12 May 2023

Organisations today are enthusiastic about the value and potential power of data, and most are focused on data/information management quick wins, agility and fluency.

The new paradigm is to ingest data and, on the fly, do whatever the business wants with it. Unfortunately, harnessing data for everything is not as simple as this. There are keystone building blocks and practices behind the scenes that prepare the data to deliver the quick wins and enable fluency, and these are too often ignored.

Data taxonomy and ontology are keystone examples. Without proper definition and classification of data elements, components and artefacts, and their relationships, organisations will grapple with chaos in making sense of their data, managing and processing it, reporting information and engaging with customers across departments and in the marketplace.

For example, a bank might not have standardised definitions of what constitutes a client, and the various types of clients – personal, private, commercial or corporate. This could lead to inaccurate reporting on client numbers, lost opportunities to cross-sell or up-sell, and a poor customer experience when the contact centre tries to sell a product to a client who already has it.
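
As a minimal sketch of what a standardised definition could look like in practice, the client segments might be captured once in code and reused by every system; the segment names, attributes and thresholds below are illustrative assumptions, not any bank's actual rules:

```python
from dataclasses import dataclass
from enum import Enum

class ClientSegment(Enum):
    """One shared, agreed definition of client types for every system and department."""
    PERSONAL = "personal"
    PRIVATE = "private"
    COMMERCIAL = "commercial"
    CORPORATE = "corporate"

@dataclass
class Client:
    client_id: str
    is_individual: bool
    annual_turnover: float  # illustrative attribute used by the rule below

def classify_client(client: Client) -> ClientSegment:
    """Apply a single rule set, so reporting and the contact centre agree on who a client is.
    The thresholds are placeholders, not real banking criteria."""
    if client.is_individual:
        return ClientSegment.PRIVATE if client.annual_turnover > 1_000_000 else ClientSegment.PERSONAL
    return ClientSegment.CORPORATE if client.annual_turnover > 50_000_000 else ClientSegment.COMMERCIAL
```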

In some industries, common classification models are increasingly being used to eliminate confusion, reduce costs and support data-driven projects, including process automation, machine-learning application and artificial intelligence. Common industry standards and accords support sectors such as accounting and chemistry, where accuracy and consistency are crucial.

In the same way, data architecture should align with taxonomy and ontology standards across sectors and within organisations. Many fall short of doing so. Data projects are often complex and time-consuming, and many businesses don’t prioritise and formalise the organisation of their data upfront, which results in wasting time and resources downstream.

Getting the ontology in place

Taxonomy classifies data into categories, each with its own definitions, terminology and semantics. It is the structured organisation of data, process and system component definitions, and it finds its home in a business glossary, data dictionary and/or metadata repository, which serves as a central point of reference for an ontology.
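
A business glossary entry can be as simple as a record of the agreed term, its single approved definition, the taxonomy category it belongs to and its steward; the structure and field names below are an illustrative assumption rather than any particular tool's schema:

```python
from dataclasses import dataclass, field

@dataclass
class GlossaryTerm:
    """One entry in a business glossary / data dictionary."""
    name: str                          # the agreed business term
    definition: str                    # the single approved definition
    category: str                      # the taxonomy node the term belongs to
    synonyms: list[str] = field(default_factory=list)
    steward: str = ""                  # who owns and maintains the definition

glossary = {
    "Client": GlossaryTerm(
        name="Client",
        definition="A party holding at least one active product with the organisation.",
        category="Party",
        synonyms=["Customer", "Account holder"],
        steward="Retail data steward",
    ),
}
```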

Ontology organises data and process elements, components, artefacts, definitions and all things data; maps data (content) or placeholders (metadata); and relates or links (contextualises) these so that consuming devices and users can make sense of it all for business applications.
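
One simple way to picture that linking is as subject-predicate-object triples over glossary terms, which a consuming application can then navigate; the terms and predicates here are illustrative only, not a standard vocabulary:

```python
# Relationships over glossary terms, expressed as (subject, predicate, object) triples.
ontology = [
    ("Client", "is_a", "Party"),
    ("Account", "belongs_to", "Client"),
    ("Transaction", "posted_to", "Account"),
    ("Client", "classified_by", "ClientSegment"),
]

def related_to(term: str) -> list[tuple[str, str, str]]:
    """Return every relationship that mentions a term, so a consumer can move
    from one data element to everything linked to it."""
    return [t for t in ontology if term in (t[0], t[2])]

print(related_to("Client"))  # the three triples that mention "Client"
```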

Ontology enables easier curation and distribution of information to the right channels, allows for better use of PPT (people, process, technology) resources, and is a key factor in data governance.

Indecision on the part of business about conflicting business and data definitions and semantics will cause inefficient data, process and system designs, and will lead to ambiguity in reporting. A lack of (or weak) data architecture practices will result in chaos and wastage of PPT resources.

An ontology is created by data and process solution architects, who are guided and mandated by data policies and standards enforced by the risk and compliance and/or data governance teams.

Ontology is implemented by the data and software application engineers in consultation with business principals, administrators and data stewards. All the relevant role-players along the data journey have a responsibility to maintain the rules and standards of the ontology.

Taking ontology back from the edge

A best practice for organisations is to ingest data and containerise (organise) it according to ontological and taxonomic rules as necessary. This approach is finding its way into data quality and curation processes, where classification and the associated decisions can be automated.

One approach is to organise (classify/curate) data on the fly, per data query, on the edge (the consumer side). This can mean the repeated organisation of the same data, resulting in wastage of PPT resources.

Data should instead be organised once, as it is ingested, and taxonomy should also be applied to unstructured data. Mapping (relating or linking) data, whether it be the content itself or its metadata, not only reveals the “anatomy” (or ontology) of the data landscape, but also enables efficiencies, such as the ability to reuse, consolidate and deduplicate data, automate processes and carry out lineage evidencing. It also reveals inefficiencies, overlaps and gaps.
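
A minimal sketch of classify-at-ingest, assuming a couple of invented taxonomy rules and a toy metadata catalogue: each record is categorised once, fingerprinted, and repeated content is flagged for consolidation rather than being re-organised at every query:

```python
import hashlib

# Illustrative taxonomy rules, applied once at ingest rather than per query at the edge.
TAXONOMY_RULES = {
    "statement": lambda rec: "account" in rec and "balance" in rec,
    "client_master": lambda rec: "client_id" in rec and "segment" in rec,
}

catalogue: dict[str, dict] = {}  # toy metadata repository keyed by content fingerprint

def ingest(record: dict) -> dict:
    """Classify a record against the taxonomy and register its metadata once.
    A fingerprint already in the catalogue flags duplicate content for consolidation."""
    category = next((name for name, rule in TAXONOMY_RULES.items() if rule(record)), "unclassified")
    fingerprint = hashlib.sha256(repr(sorted(record.items())).encode()).hexdigest()
    metadata = {"category": category, "fingerprint": fingerprint,
                "is_duplicate": fingerprint in catalogue}
    catalogue.setdefault(fingerprint, metadata)
    return metadata

print(ingest({"client_id": "C001", "segment": "personal"}))  # client_master, first copy
print(ingest({"client_id": "C001", "segment": "personal"}))  # same content, flagged duplicate
```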

Organisations seeking to achieve greater value from their data need to start with the discipline of classification and taxonomy of data in glossaries and dictionaries to support ontology.

This is a transitional process, almost organic in nature, but as it picks up momentum, the organisation will get to a point where most or all of the data elements, components, artefacts and definitions are linked/related.
