How AI could be the key to better data governance

The success of AI governance is intertwined with strong data governance. By reliably automating routine tasks and freeing human resources for more strategic work, AI can revolutionise data governance.

Jun 21, 2025 - 02:52

It might seem counterintuitive to enhance an organisation’s data governance with a technology whose reliability is often questioned. However, when skilled data engineers use generative AI in the right way to refine data quality, they make it possible to build more precise and dependable AI-powered applications.

Generative AI models excel at producing human-like responses, but they are prone to hallucinations and cannot draw insights from internal company data that was not part of their training.

However, this internal data is often essential for enterprise applications. Imagine a chatbot that provides a customer with the exact details of their personal insurance coverage when asked. Or an AI-powered assistant that can immediately give IT teams real-time diagnostic data to troubleshoot a system outage.

 

These scenarios require precise, immediate answers. And machine learning engineers need access to accurate data to realize the potential of generative AI (GenAI) in business. This is where data governance plays a crucial role in mitigating both operational and reputational risks stemming from flawed AI-driven decisions.

 

By enriching data with metadata that details its structure, source, and intended use, data teams can maintain high data quality and improve the accuracy of GenAI applications. Beyond business concerns, this practice aligns with emerging regulatory frameworks that emphasize data integrity, security, and accountability.

 

But generating metadata manually is time-intensive, and busy data teams often skimp on it or skip it altogether. A useful comparison is Tim Berners-Lee’s vision of the “semantic web”, where web content would become significantly more useful through machine-readable tagging. That idea largely failed because of the burden of manual annotation.

The same challenge exists today in data governance.

 

Interestingly, while GenAI increases the demand for robust data governance, it also provides a solution. When prompted with labelled data examples, a generative AI model can generate metadata automatically. Human oversight is still required to verify accuracy, but the process is far more efficient than creating metadata manually from scratch.
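
As a rough illustration of prompting with labelled examples, the sketch below assembles a few-shot prompt and marks whatever the model returns as awaiting human review. Everything in it is an assumption made for illustration: the example fields, the metadata keys, and the helper names build_metadata_prompt and parse_draft are hypothetical, and the call to an actual model is deliberately left out.

```python
import json

# Hypothetical labelled examples: a field, some sample values, and the
# metadata a data engineer wrote for it by hand.
LABELLED_EXAMPLES = [
    {
        "field": "policy_start_dt",
        "sample_values": ["2024-01-15", "2023-11-02"],
        "metadata": {
            "description": "Date on which insurance cover begins",
            "type": "date",
            "source": "policy_admin_system",
        },
    },
    {
        "field": "premium_inr",
        "sample_values": [12500, 8300],
        "metadata": {
            "description": "Annual premium in Indian rupees",
            "type": "integer",
            "source": "billing_system",
        },
    },
]


def build_metadata_prompt(field, sample_values, examples=LABELLED_EXAMPLES):
    """Assemble a few-shot prompt asking a model to draft metadata for one field."""
    lines = ["Draft metadata as JSON with keys: description, type, source.", ""]
    for ex in examples:
        lines.append(f"Field: {ex['field']}")
        lines.append(f"Samples: {json.dumps(ex['sample_values'])}")
        lines.append(f"Metadata: {json.dumps(ex['metadata'], sort_keys=True)}")
        lines.append("")
    # The unlabelled field goes last, leaving the metadata for the model to fill in.
    lines.append(f"Field: {field}")
    lines.append(f"Samples: {json.dumps(sample_values)}")
    lines.append("Metadata:")
    return "\n".join(lines)


def parse_draft(model_reply):
    """Parse the model's JSON reply and flag the draft for human review."""
    draft = json.loads(model_reply)
    draft["review_status"] = "pending_human_review"
    return draft
```

The resulting prompt would be sent to whichever model the team uses. Keeping every draft in a pending state until a person approves it reflects the oversight step described above.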

A data product mindset 

The value of high-quality data extends far beyond AI applications, too. 

Data-driven decision-making has become essential in every sector, from healthcare and finance to government and retail. This growing demand has sparked interest in creating unified data catalogues to make the discovery and usage of data more accessible to teams. 

By combining GenAI’s ability to create metadata with data streaming platforms that curate reusable data products, organizations can democratize data access, fostering innovation and productivity across the board. 

 

To be effective, metadata must bridge both technical and human requirements, including machine-readable elements like database schemas and field descriptions, alongside human-readable context such as data origins and intended use cases. The goal is to ensure that any user across the organization can quickly understand both where the data came from and how to use it effectively.
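
One way to picture metadata that serves both machines and people is a single record that carries the schema alongside the plain-language context. The sketch below is only an assumption about how such a record might look; the class names, attributes, and the missing_context check are hypothetical and not taken from any particular catalogue product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldSpec:
    """Machine-readable description of one field in a dataset."""
    name: str
    dtype: str
    description: str


@dataclass
class DataProductMetadata:
    """Hypothetical metadata record for one data product."""
    name: str
    fields: List[FieldSpec]  # machine-readable: schema and field descriptions
    origin: str              # human-readable: where the data came from
    intended_use: str        # human-readable: what the data is meant for

    def missing_context(self) -> List[str]:
        """List the parts of the record that are still empty."""
        gaps = []
        if not self.origin.strip():
            gaps.append("origin")
        if not self.intended_use.strip():
            gaps.append("intended_use")
        for spec in self.fields:
            if not spec.description.strip():
                gaps.append(f"fields.{spec.name}.description")
        return gaps


coverage = DataProductMetadata(
    name="customer_coverage",
    fields=[
        FieldSpec("policy_id", "string", "Unique policy identifier"),
        FieldSpec("sum_insured", "integer", ""),
    ],
    origin="Policy administration system, refreshed daily",
    intended_use="",
)
print(coverage.missing_context())
```

A check like missing_context gives reviewers a quick list of what a colleague elsewhere in the organization would still be unable to understand from the record alone.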

 

A robust data governance framework, leveraging metadata schemas, provides the essential structure that helps GenAI models perform more effectively with domain-specific information. Prompting these models with quality examples of data collection and generation processes can significantly improve results. 

While it's possible to retroactively generate metadata using GenAI on older datasets, the accuracy is limited by potentially outdated schemas. For optimal results, metadata creation should be integrated into the data production process itself.

Human-AI collaboration

For all its potential, AI is still in its infancy, and that makes human oversight indispensable.

While AI can be remarkable at identifying patterns, it often struggles with generalization, especially when working with limited training examples. We still haven’t been able to replicate the human intuition and understanding that can only come with years of learning and experience. Thus, human expertise can effectively complement AI's ability to rapidly process large volumes of information.

 

Consider how we instinctively understand the significance of different vehicle license plate colors in India or recognize various RTO codes by state and district. An AI model might misclassify these without sufficient training data or contextual understanding of Indian territories. Human reviewers can easily spot and correct such errors.

 

While the choice of the underlying LLM is a factor, success depends more critically on well-defined workflows for data curation and careful contextualisation of the system prompts. A collaborative human-AI approach ensures accuracy and reliability in metadata creation, ultimately driving better business outcomes.

The role of data streaming

To return to the Semantic Web: its vision of making the web machine-readable was never realized in the way its creators envisioned. Yet the web did become machine-readable in a way that few foresaw in the early 2000s, because machine learning got far better at understanding media created for humans. In a similar way, better machine learning now offers a better alternative to manually completing the rote tasks necessary for data governance.

GenAI presents an opportunity to revolutionize data governance by automating routine tasks and freeing human resources for more strategic work.

The key enabler is a robust data streaming platform, capable of processing data in real time as it is generated. This allows metadata to be applied immediately and incrementally during production, making the data instantly available to other applications, with governance controls that support a unified data catalog.
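
To make the idea of applying metadata incrementally during production more concrete, here is a minimal in-memory sketch. It is a stand-in for a real streaming platform, not the API of any specific product: annotate_stream, infer_schema, and the sample events are all hypothetical, and the schema inference is deliberately naive.

```python
from typing import Any, Dict, Iterable, Iterator


def infer_schema(record: Dict[str, Any]) -> Dict[str, str]:
    """Naive schema inference: map each field to its Python type name."""
    return {key: type(value).__name__ for key, value in record.items()}


def annotate_stream(
    records: Iterable[Dict[str, Any]], source: str
) -> Iterator[Dict[str, Any]]:
    """Yield each record wrapped with metadata as soon as it is produced."""
    known_schema: Dict[str, str] = {}
    for record in records:
        schema = infer_schema(record)
        # Report only fields not seen earlier in the stream as new.
        new_fields = sorted(key for key in schema if key not in known_schema)
        known_schema.update(schema)
        yield {
            "payload": record,
            "metadata": {
                "source": source,
                "schema": dict(known_schema),  # snapshot at this point in the stream
                "new_fields": new_fields,
            },
        }


# Hypothetical diagnostic events; the second one introduces a new field.
events = [
    {"device": "router-1", "latency_ms": 12},
    {"device": "router-2", "latency_ms": 40, "region": "south"},
]
annotated = list(annotate_stream(events, source="network_monitor"))
```

Because the metadata travels with each record from the moment it is produced, a downstream consumer or catalog does not have to reconstruct the schema later from stale data.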

 

Looking ahead, we can envision GenAI taking on increasingly sophisticated governance tasks. While this future may take time to realize, GenAI is already helping eliminate routine work like schema definition and application. This creates a virtuous cycle: better-quality GenAI applications lead to more widely available data, which in turn improves AI capabilities.

As the industry works to define AI governance frameworks, one thing is becoming clear: the success of AI governance fundamentally depends on strong data governance. By helping data engineers trust and effectively use their data, we're laying the groundwork for more reliable AI applications. 

The future belongs to organizations that recognize this symbiotic relationship and invest in tools and processes that enhance both data quality and AI capabilities.

(Andrew Sellers is the Head of Technology Strategy at Confluent.)


Edited by Jyoti Narayan