From data modeling to knowledge graphs

Product

Use cases

About

Events

Knowledge Graph

From data modeling to knowledge graphs

Published: November 24, 2025

Alekh Jindal

Share this post

Data modeling has been the practice of adding business meaning to data for over thirty years. It involves mapping database entities to end-user vocabulary, identifying relationships across entities so they can be combined meaningfully, and creating the metrics and measures that decision-makers care about. The goal is to transform raw bits and bytes into actionable insights for the business.

Real-world Complexities

To understand the complexities and effort involved in data modeling, we analyzed the schemas of 137 real-world databases that connected to the Tursio platform over the last year. We counted the number of tables, the total number of columns, and the number of columns identified as dimensions and measures, respectively. The figures below show the distribution.

We see that real-world databases can easily consist of 100s of tables and 1000s of columns, with a significant number of them being dimensions or measures. These kinds of schemas are likely to require considerable cleaning, semantic mapping, relationship building, and other nuances to curate them into various sets of data models for different business use cases. This is a complex and time-consuming process that involves a significant amount of human effort and should ideally be automated.

Manual Context Building

Zooming out, data modeling is essentially the process of building a rich business context on top of raw data. This context is what business users rely on to make sense of the information. Today, practitioners create this business context (i.e., data models) manually and on demand, based on the questions or insights business users are seeking.

Building this context manually requires significant effort. BI engineers must understand both the data and the business requirements; they must continually add new context for different users and questions, often duplicating work as new needs arise. And because there is no systematic way to accumulate or consolidate this context over time, it remains fragmented.

Automatic Context Generation

Instead of assembling the context manually through piecemeal data modeling, modern enterprises require a global view of their data that can be leveraged by all users for all questions. Such a knowledge graph should:

Subsume all data models that business users require.
Dynamically provide the right data model as context for each user question.
Reduce the manual effort and time spent building individual data models.
Unblock business users, enabling faster insights and actions.

However, creating this knowledge graph is challenging; production databases are large, making the resulting graph complicated. The process must be automated to minimize manual work; however, the graph still needs to be accurate and useful for business questions. The knowledge graph is essentially a cleaner, business-friendly abstraction layered on top of messy, system-level tables and columns. Achieving this requires LLM-powered semantic interpretation of every part of the data and schema.

The Tursio platform addresses this problem by automatically inferring a knowledge graph from underlying databases. The goal is to construct a global view that supports all required data models. In fact, our earlier version of this graph was called the “Large Data Model”; today, we refer to it more aptly as a semantic knowledge graph, reflecting its deeper semantic interpretation.

The figure below shows the number of small models trained in Tursio to build the semantic knowledge graph for each of the 137 real-world databases. We see 10s to 100s of thousands of small models needed to accurately capture the semantics on real-world databases. Together, these small models capture name simplifications, join relationships, table and column descriptions, ontologies, default measures, aspects, aliases, PIIs, and so on.

Pros and cons

Data models work well when schemas are small and the model is static, meaning it serves a fixed set of questions. In such cases, only a few models are needed, and they can be created manually. Classic examples include a handful of KPI dashboards that rarely change.

Knowledge graphs, on the other hand, are better suited for large database schemas and for situations where data models need to be dynamic, as new questions or new business users emerge constantly. In these scenarios, many different data models are possible. As more data and requirements accumulate, manual modeling becomes increasingly tedious and unsustainable.

Conclusion

Businesses need to be data-driven, and access to data is essential for all decision-makers. Data modeling has been the traditional way of providing that data piece by piece; however, it has proven to be slow and insufficient. Automating the translation of data to business users via knowledge graphs can accelerate that process by dynamically pulling the required context globally. Tursio is taking that leap by inferring knowledge graphs and leveraging them to answer business questions. This is just the beginning of a new era of productivity with much more to come.

Related blog posts

Context Pushdown

Making MCPs Practical with Context Pushdown

Traditional MCPs have a twist the moment you try to run them in production: they can be unreliable, expensive, and slow.

Shivani Tripathi, Wangda Zhang, Alekh Jindal

Context Graphs

Managing Ambiguities in Context Graphs

Agents are the new apps. Agents are quickly becoming the new apps. Instead of navigating complex software interfaces, users are increasingly interacting with AI agents that understand intent, retrieve information, and take action on their behalf. And one of the most compelling things you can do with an agent is connect it directly to operational databases, as seen in the recent move by Google to connect all their operational databases to MCPs.

Alekh Jindal

nl2sql

query-complexity

How far are NL and SQL in NL2SQL?

SQL has traditionally been the language for data experts, alienating business users from the data. However, with rapid progress in large language models (LLMs), translating natural language questions from business to SQL, i.e., NL2SQL, is getting increasingly popular. Turns out that business questions are often high-level and do not have a simple SQL translation. Answering such high‑level questions over complex data is hard since the system must map the business user intent to the right tables, joins, filters, and metrics without losing meaning.

Shivani Tripathi