Will SQL be the new Assembly Language?
Published: October 3, 2025

Memoir
Twenty years ago, my partner Yash and I wrote our bachelor thesis, titled “Microcontroller-based power distribution, monitoring, and control”, in which we hand-coded an 80196 microcontroller with 1,075 lines of assembly. Although C converters were available, our guide, Professor S. P. Das, believed that assembly language was more rigorous and would help us understand the system design better. It turned out to be painful to get the assembly code working, and we spent countless hours debugging our code on paper printouts. Yet the experience was rewarding, giving us a real sense of how the hardware works.
Hitting the SQL wall
Today, a similar situation exists for databases and SQL. Databases are like lungs, pumping oxygen into modern enterprises, yet working with SQL remains challenging. In fact, some of the best ideas sit in the heads of people who can't write SQL, even though nearly all of those ideas require data from databases where SQL is the de facto language. Call this the SQL wall. It creates dependencies and slows people down. To gauge the scale of the problem, a quick Perplexity search suggests that while a few million people know SQL and may even enjoy working with it, there are orders of magnitude more in the technical workforce who neither know it nor have any interest in learning it. Eventually, their ideas hit the SQL wall.
While the above statistics are only estimates, it is telling that even after decades of data-driven initiatives, much of the technical workforce still cannot access data in a meaningful way. They continue to rely on other people and processes to bring their ideas to fruition.
The "self-serve" that was promised
Stepping back for a moment, “self-serve” business intelligence (BI) is the promise that has been made to business users for the last quarter century. Unlike the 90s, when in-house experts delivered reports, the idea in the early 2000s was to decentralize decision-making through greater collaboration and collective knowledge, often referred to as business intelligence 2.0. Numerous BI tools, including Microsoft Power BI, Salesforce Tableau, Google Looker and Data Studio, IBM Cognos, Amazon QuickSight, ThoughtSpot, SAP BusinessObjects, Mode Analytics, HEX Technologies, JetBrains Datalore, Elastic Kibana, Qlik Sense, MicroStrategy, Apache Superset, and even newer ones such as Databricks, Snowflake, Sigma Computing, and Omni Analytics, set out to replace SQL with dashboards.
Unfortunately, dashboarding tools didn’t change things much for business users:
- Creating dashboards requires pulling data, setting up data models, crafting visualizations, and preparing them for business users, all of which involve expertise, both in data and the BI tool, that is typically scarce and keeps business users blocked.
- The process of building dashboards is time-consuming, taking anywhere from days to weeks. It is also iterative, requiring business users to communicate their requirements and ensure they are translated correctly.
- Dashboarding is still limited to presenting data and charts, even though the ultimate goal is to make decisions. As a result, business users may need to further process the results before they can use them.
- Finally, organizations typically end up with far more dashboards than they need, while still creating new ones all the time. This leads to a situation where finding the right dashboard is difficult, and even though so much time was spent building them, many remain unused.
In short, with BI 2.0, dashboards replaced SQL not just as the interface but also as the pain point, while “self-serve” remains an illusion even after decades of tooling.
Text-to-SQL is a hammer in search of a nail
The new wave (or rather, rage) of AI tools has reignited the debate about whether SQL will become redundant. With large language models (LLMs) becoming remarkably good at processing natural language, the question is whether business users can finally express their needs directly to the LLMs. Ironically, answering questions over databases with LLMs has turned into a SQL-writing problem; rather than eliminating SQL from the picture, users are now inundated with auto-generated SQL. This is not helpful, since people don’t really want SQL that maps to tabular results; they want answers. Interpreting the tabular data could lead to answers, but that requires the SQL to be correct in the first place, and someone needs to check that.
LLMs are trained on the entire world’s data and can perform many reasoning tasks, yet they are not grounded in enterprise data. To overcome this, most new AI tools put the onus of verifying SQL queries on the users. This brings us back to square one, where knowing SQL is still required to solve the problem of SQL being hard to know. So the question remains: will we ever reach a point where people no longer need to know SQL? This seems difficult with pre-trained models alone, especially as pre-training itself is hitting a wall. Coincidentally, text-to-SQL accuracy has also plateaued:

So on one hand, business users are asking to convert their text into answers, not SQL; on the other, even the accuracy improvements in text-to-SQL have become incremental.
Towards higher-level query engines
Most people do not want to deal with SQL. They care about their questions and want to use data to answer them. Forcing them to inspect and verify SQL is counterproductive, much like asking people to inspect assembly code. Instead, modern users want to be assured of correctness by the new-age AI that is expected to connect data with its consumers. The challenge is to control LLMs and ensure they operate within the boundaries of the enterprise data.
We need higher-level query engines that constrain AI to operate strictly within the realm of the underlying data. This requires reimagining the entire stack, from data modeling and semantic understanding all the way to query compilation, optimization, execution, and result processing. To illustrate, consider the following question on the TPC-H dataset:
Example: “What are the top Chinese suppliers offering the most discounts in 2021?”
This question has several parts to it, including:
- Chinese suppliers
- Most discounts
- In 2021
- Top
A lot of things need to be correct when answering the above question, including the joins (based on the join graph), aggregates (count vs. count distinct, averaging vs. summing), comparisons (equality vs. LIKE), filtering (dates vs. strings), syntax (LIMIT, etc.), augmenting the results with additional information and visualizations, and so on. Tursio gets all of these nuances right every single time, as illustrated in the demo below:
So, what’s the secret sauce?
- The key is reducing the ambiguity that general-purpose LLMs suffer from. Tursio does this systematically by first inferring a semantic model and then identifying which portions of the semantic graph to constrain the query to. This is a completely new way of processing queries, very different from traditional SQL query processors, that combines determinism with creativity.
- In addition to constraining, Tursio also guides the LLMs via relevant context. It auto-generates a large corpus of valid questions to cover the space of all queries uniformly and then identifies the relevant context for the incoming question. The goal is to automatically generate synthetic context over large databases and reduce ambiguity by providing just the relevant pieces.
- Users today expect to be able to ask anything in natural language. However, long-winded questions can confuse LLMs and make the responses flaky. Tursio reduces the noise by tokenizing questions into fragments for better interpretability: instead of trying to interpret the question as a whole, it first identifies the various components of the question (as in the example above).
- Finally, SQL query generation using LLMs can be unstable or even incorrect. Tursio avoids these issues by building query plans systematically, step by step. It first constructs the operator trees and then rewrites them iteratively, using techniques from query processing, to align them as closely as possible with the user question.
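The fragment-then-plan idea above can be sketched in a few lines. This is purely illustrative: Tursio's actual pipeline is not public, so the fragment names and plan-building rules here are invented to show the shape of the approach, not its implementation.

```python
# Hypothetical sketch: tokenize a question into fragments, then build an
# operator plan bottom-up, one validated step per fragment.
QUESTION = "What are the top Chinese suppliers offering the most discounts in 2021?"

# Step 1: the question broken into interpretable fragments (invented labels).
FRAGMENTS = {
    "entity":      "suppliers",
    "filter_geo":  "Chinese",
    "measure":     "most discounts",
    "filter_time": "in 2021",
    "ranking":     "top",
}

# Step 2: map each fragment to an operator, building the tree incrementally
# so every step can be checked against the schema before the next is added.
def build_plan(fragments: dict) -> list[str]:
    plan = ["Scan(lineitem)"]
    if "filter_geo" in fragments:
        plan.append("Join(supplier, nation) + Filter(n_name = 'CHINA')")
    if "filter_time" in fragments:
        plan.append("Filter(l_shipdate IN 2021)")
    if "measure" in fragments:
        plan.append("Aggregate(SUM(discount amount) GROUP BY supplier)")
    if "ranking" in fragments:
        plan.append("Sort DESC + Limit")
    return plan

for step in build_plan(FRAGMENTS):
    print(step)
```

The point of the sketch is the ordering: interpretation happens per fragment against a constrained schema, so an LLM never has to emit a whole SQL statement in one unverifiable shot.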
The next generation of data analytics will be AI-powered: simpler, faster, and more efficient. Tursio is a practical step in that direction, with a reimagined query processing stack that is already deployed in several customer environments. But this is just the beginning; fusing enterprise data with AI is an exciting new world that is yet to be fully explored.