Why Text-to-SQL Breaks Down in Real-World Data Analysis

Everyone is building natural-language interfaces on top of databases. Most of them are solving the wrong problem. They aren’t building AI analysts; they’re building glorified query translators that turn good questions into dangerous, out-of-context answers.

For the last two years, one of the most popular ideas in the AI analytics space has been deceptively simple: let users ask questions in plain English, translate those questions into SQL, run the query, and return an answer.

That is the core promise behind most text-to-SQL tools and many so-called AI analysts.

On the surface, it sounds like the future of business intelligence. No dashboards to dig through. No analyst bottlenecks. No need to learn SQL. Just ask a question and get an answer.

But that framing hides a deeper problem: data analysis is not primarily a query translation problem. It is a business reasoning problem.

The issue is not that large language models cannot generate SQL. They can. The issue is that correct SQL is not the same as a correct answer.

That is why text-to-SQL works well in demos, sometimes works in narrow internal workflows, and often breaks down in real-world business analysis.

The hard part of analytics is rarely retrieving rows from a database. The hard part is understanding what the data means, which definitions are trusted, which edge cases matter, and how metrics map to actual business decisions.

The future of analytics isn’t a better translator—it’s a layer of business context that understands your company as well as your best analyst does.

Text-to-SQL does not solve that. It simply hides the gap behind a cleaner interface.

What text-to-SQL gets right

Before criticizing the approach, it is worth being fair about where it is genuinely useful.

Text-to-SQL can be effective when:

  • The schema is relatively small and well-structured
  • The user is asking low-stakes factual questions
  • The output is reviewed by someone who understands the data model
  • The system is used as a copilot for analysts, not as an autonomous decision layer
  • The business logic is simple enough to be inferred directly from tables and fields

In those environments, text-to-SQL can reduce friction. It can help users explore data faster, assist analysts with repetitive query writing, and make simple data access more conversational.

But those are not the conditions that define serious analytics at scale.

The moment the question becomes strategic, ambiguous, cross-functional, or dependent on company-specific definitions, text-to-SQL starts to show its limits.

1. SQL is syntax, not business logic

SQL is a query language. It is excellent at retrieving, filtering, joining, and aggregating structured data in relational systems.

What SQL does not contain is business meaning.

A table can tell you that an order has status = 'completed'. It cannot tell you what “completed” means inside your company. Does it mean payment captured? Order shipped? Delivered? Past the return window? Marked complete because of a back-office workflow introduced during a migration two years ago?

The SQL query can count rows. It cannot validate the business definition behind them.
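A tiny sketch makes the gap concrete. The table below is entirely invented (schema, rows, and column names are illustrative, not from any real system), but it shows how two syntactically correct queries answer "how many completed orders?" differently depending on which business definition of "completed" you assume:

```python
import sqlite3

# Hypothetical orders table; schema and sample rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        status TEXT,          -- set by several workflows over the years
        delivered_at TEXT,    -- NULL if never delivered
        refunded INTEGER      -- 1 if refunded after "completion"
    )
""")
conn.executemany(
    "INSERT INTO orders (status, delivered_at, refunded) VALUES (?, ?, ?)",
    [
        ("completed", "2024-03-01", 0),  # genuinely done
        ("completed", "2024-03-02", 1),  # refunded later
        ("completed", None, 0),          # marked complete by a back-office migration
        ("shipped",   "2024-03-04", 0),  # delivered, but status never updated
    ],
)

# Query A: what a text-to-SQL system will naturally generate from the schema.
naive = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'completed'"
).fetchone()[0]

# Query B: one plausible business definition -- delivered and not refunded.
business = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE delivered_at IS NOT NULL AND refunded = 0"
).fetchone()[0]

print(naive, business)  # both queries are valid SQL; they count different things
```

Both queries execute without error. Nothing in the schema tells the model which one answers the user's actual question.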

This is where many text-to-SQL systems quietly fail.

When a user asks, “How many customers did we retain last quarter?”, they are not really asking for a query. They are asking for an interpretation based on agreed business logic.

What counts as retention?

  • A repeat purchase within 90 days?
  • A subscription renewal?
  • Any returning customer, regardless of margin?
  • Only retained customers in a specific cohort?
  • Gross retention or net revenue retention?

The database schema does not answer those questions. Those definitions usually live somewhere else: in an analyst’s head, in a dbt model, in a Notion page, in an internal metric dictionary, or in a debate the leadership team had three months ago.
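The number itself moves with the definition. A toy sketch (purchase histories are made up for illustration) shows how "retention last quarter" changes depending on which window the business considers a retained customer:

```python
from datetime import date

# Hypothetical purchase history: customer -> ordered purchase dates.
purchases = {
    "a": [date(2024, 1, 5), date(2024, 2, 20)],   # repeat within 90 days
    "b": [date(2024, 1, 10), date(2024, 6, 1)],   # returns, but after 90 days
    "c": [date(2024, 1, 15)],                     # never returns
}

def retained_within(purchases, days):
    """Share of customers whose second purchase falls within `days` of the first."""
    retained = 0
    for history in purchases.values():
        if len(history) > 1 and (history[1] - history[0]).days <= days:
            retained += 1
    return retained / len(purchases)

# Same data, same SQL-shaped computation, different business definition:
print(retained_within(purchases, 90))   # retention = repeat purchase within 90 days
print(retained_within(purchases, 365))  # retention = any return within a year
```

The first definition reports one retained customer in three; the second reports two. Neither is wrong as a computation. The choice between them is exactly the business logic that lives outside the schema.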

Text-to-SQL can access tables, but it cannot reliably access the meaning layer behind the business.

That is the core limitation.

2. The real problem is the query gap

Business users do not think in tables and joins. They think in outcomes, risks, segments, and decisions.

A Head of eCommerce asks:
“Which customers are most likely to churn next month?”

A Growth lead asks:
“Are our most loyal customers actually our most profitable?”

A founder asks:
“Did the new retention campaign improve customer quality, or just drive more discounted orders?”

These are not database-native questions. They are business questions.

To answer them well, a system needs to bridge what we can call the query gap: the distance between what the user intends to know and what the database can directly compute.

Text-to-SQL treats that gap as a translation problem. It assumes the user’s natural-language request can be converted into a valid SQL statement that maps cleanly to the schema.

That assumption breaks quickly.

Take the question:
“Are our most loyal customers also our most profitable?”

To answer that correctly, the system needs to know:

  • how “loyal” is defined
  • how “profitable” is defined
  • whether profitability means gross margin, contribution margin, or net profit
  • whether returns, discounts, support costs, and CAC should be included
  • which time window makes the comparison meaningful
  • whether profitability should be measured per order, per customer, or by cohort

A text-to-SQL system will often generate something plausible. That is exactly the danger. The query may be syntactically correct, executable, and even persuasive-looking while still being semantically wrong.
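Even before any SQL is written, the definitional ambiguity alone changes who counts as "loyal." A toy sketch with invented customer stats shows two reasonable definitions selecting completely different segments:

```python
from datetime import date

# Hypothetical customer stats; all numbers invented for illustration.
customers = {
    "a": {"orders": 12, "last_order": date(2022, 6, 1)},   # frequent, but long gone
    "b": {"orders": 4,  "last_order": date(2024, 5, 20)},  # moderate, recently active
    "c": {"orders": 2,  "last_order": date(2024, 6, 1)},
}

today = date(2024, 6, 15)

# Definition 1: "loyal" = many lifetime orders.
loyal_by_frequency = {c for c, s in customers.items() if s["orders"] >= 5}

# Definition 2: "loyal" = repeat customer who ordered within the last 90 days.
loyal_by_recency = {
    c for c, s in customers.items()
    if s["orders"] > 1 and (today - s["last_order"]).days <= 90
}

print(loyal_by_frequency)  # lifetime-frequency definition
print(loyal_by_recency)    # recency definition
```

Here the two definitions do not even overlap: the frequency definition selects only the lapsed customer, the recency definition selects everyone but them. Any profitability comparison built on top inherits that choice silently.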

Text-to-SQL reduces friction at the interface layer while leaving the meaning gap unresolved underneath.

That is the real failure mode.

3. Why text-to-SQL hallucinations are structural, not accidental

Large language models hallucinate when the context they receive is insufficiently grounded.

In most text-to-SQL systems, the grounding context is some combination of:

  • table names
  • column names
  • foreign key relationships
  • data types
  • sample rows
  • short schema descriptions

That may be enough to produce valid SQL. It is not enough to produce reliable business analysis.

A schema describes the shape of the data. It does not explain the trustworthiness, historical baggage, caveats, ownership, or intended use of each field.

Imagine a field called customer_value_score.

A model may reasonably infer that this is a trusted measure of customer value. In reality, it could be:

  • a deprecated score from an old CRM
  • only populated for one market
  • recalibrated twice without backfilling
  • based on pre-discount revenue rather than contribution margin
  • ignored internally because the team replaced it with another metric last year

None of that is visible from the schema alone.

So when the model uses that field in a query, the problem is not that it is behaving irrationally. The problem is that the system has asked the model to reason from incomplete semantic context.

You cannot infer business meaning from database syntax alone.

That is why hallucinations in text-to-SQL systems are not just model bugs. They are a structural consequence of that missing semantic context.

4. Why text-to-SQL gets worse as data complexity grows

Text-to-SQL can look surprisingly impressive on a small schema.

That is one reason the category demos so well.

If a company has a handful of clean tables, relatively stable definitions, and one main product flow, an LLM can often infer enough context to generate useful queries. But real businesses do not stay simple.

As companies grow, their data systems accumulate complexity:

  • more tables
  • more event sources
  • more transformations
  • more historical migrations
  • more duplicated logic
  • more edge cases
  • more internal definitions that differ by team

At that point, prompt engineering stops being the answer.

The issue is not just model quality. The issue is architectural mismatch.

SQL operates at the level of tables, joins, and filters. Business questions operate at the level of customers, products, channels, intent, lifecycle, profitability, and causality. The gap between those two levels grows with every layer of operational complexity.

That is why serious analytics systems eventually converge on some kind of intermediate layer, whether they call it:

  • semantic layer
  • metrics layer
  • business ontology
  • entity model
  • knowledge graph

The name varies. The function does not.

You need a stable representation that sits between raw data structures and business questions. A representation that captures definitions, entities, relationships, caveats, and trusted ways of computing metrics.

Every serious analytics system ends up building an intermediate meaning layer. The only question is whether you design it on purpose or discover it the hard way.

Text-to-SQL skips that layer and tries to jump directly from user intent to raw schema execution. That shortcut becomes less reliable as the company becomes more real.

5. A concrete example: where text-to-SQL gives the wrong answer

Imagine an eCommerce brand asks:

“Who are our most profitable repeat customers?”

A typical text-to-SQL system might do something like this:

  • identify repeat customers as users with more than one order
  • sum revenue per customer
  • sort descending
  • return the top segment

At first glance, that seems reasonable. But the real business question is much richer.

A correct answer may need to account for:

  • refunded orders
  • discounts
  • shipping subsidies
  • support costs
  • acquisition source
  • contribution margin rather than revenue
  • whether the repeat behavior happened recently or two years ago
  • whether one high-value customer is actually an outlier from a paid acquisition campaign that lost money overall

Now the situation changes.

The text-to-SQL system did not fail because it generated invalid SQL. It failed because it optimized for what was available in the schema rather than for the business question the user actually meant.

That distinction matters because the output may still look polished enough to drive action: target this segment, increase ad spend, build lookalikes, prioritize retention.

The danger is not that text-to-SQL always crashes. The danger is that it often returns answers that are coherent enough to be trusted and wrong enough to mislead.

6. “But better metadata fixes this” — only partly

A common response is that text-to-SQL gets much better once you add:

  • richer schema descriptions
  • metric definitions
  • dbt models
  • column documentation
  • approved SQL examples
  • semantic hints in prompts
  • retrieval over internal docs

That is true.

All of those things improve performance.

But they also prove the central point: SQL generation alone is not enough.

The moment you start enriching a text-to-SQL system with trusted definitions, entity mapping, business rules, historical caveats, and curated context, you are already moving away from pure text-to-SQL and toward a semantic architecture.

That is not a minor implementation detail. It is the real system.

The SQL generator is just one possible execution component inside a broader stack that provides meaning, constraints, and explainability.

So yes, metadata helps. Documentation helps. dbt helps. Retrieval helps.

The more context you need to make text-to-SQL reliable, the more you prove that SQL was never the real abstraction in the first place.

None of that strengthens the argument that text-to-SQL is the future of analytics. If anything, it demonstrates that the future belongs to systems with an explicit semantic layer, not systems that rely on raw schema inference.

7. Text-to-SQL vs semantic layer: the real architectural difference

This is the distinction many teams miss.

A text-to-SQL system asks:
“How do I translate this question into a query?”

A semantic system asks:
“What does this question mean in the context of this business, and what is the safest, most explainable way to answer it?”

That difference changes the architecture.

A semantic analytics stack typically includes:

  • resolved business entities such as customers, orders, products, campaigns, and channels
  • metric definitions with clear logic and ownership
  • relationships between entities
  • context on trusted versus deprecated fields
  • rules for time windows, cohort logic, attribution, and exclusions
  • traceability for how an answer was produced

Once that layer exists, an AI agent can reason over business concepts rather than guessing directly from tables and columns.

At that point, SQL may still exist in the execution layer. But it is no longer the central abstraction.

The future is not natural language to SQL. The future is natural language to business reasoning.

That is the key shift.

8. Why SQL is the wrong abstraction for AI agents

The deeper limitation of text-to-SQL is philosophical as much as technical.

SQL was designed for human operators working with structured databases. It is a language for specifying retrieval and transformation logic against known schemas.

That is useful. It is also fundamentally limited as the native interface for AI agents.

An AI agent operating over a business does not just need access to tables. It needs to reason about:

  • entities
  • state
  • dependencies
  • causality
  • business rules
  • ambiguity
  • confidence
  • missing context
  • whether a question is answerable at all

SQL is not designed for that kind of reasoning. It is an execution language, not a business cognition layer.

This is why the most promising direction for AI analytics is not better SQL generation. It is building systems where agents operate over semantically structured representations of the business: knowledge graphs, ontologies, metrics layers, entity systems, and contextual memory.

In that architecture, SQL becomes a backend detail. Useful, sometimes necessary, but no longer the thing intelligence is built around.

What the future of data analysis probably looks like

Large language models will absolutely play a central role in analytics.

But their highest-value role is not generating SQL from prompts.

Their real value is in:

  • understanding ambiguous intent
  • synthesizing findings across multiple data sources
  • explaining results in business language
  • surfacing caveats
  • supporting iterative reasoning
  • helping humans make decisions faster

To do that reliably, they need a foundation richer than a schema dump.

The analytics systems that scale will likely combine:

  • a semantic layer that defines trusted business logic
  • entity resolution across fragmented source systems
  • graph-like representations of business relationships
  • context management and memory
  • execution engines for retrieval and computation
  • agent layers that reason over all of the above

That is much harder to build than a text-to-SQL wrapper.

It requires modeling the business, not just querying the database.

But that is also why it has a chance of working in production.

When text-to-SQL still makes sense

To be clear, text-to-SQL is not useless.

It still makes sense for:

  • analyst copilots
  • lightweight internal exploration
  • simple ad hoc querying
  • narrow domains with clean schemas
  • workflows where human validation is built in

What it does not make sense for is pretending that SQL generation is equivalent to business understanding.

That is the category error at the center of much of today’s AI analytics market.

FAQ

What is text-to-SQL?

Text-to-SQL is the process of converting a natural-language question into a SQL query that can be executed against a structured database.

Why is text-to-SQL unreliable for business analysis?

Because business questions depend on definitions, context, caveats, and entity relationships that usually do not exist in the database schema itself. A query can be syntactically correct and still answer the wrong question.

What is the difference between text-to-SQL and a semantic layer?

Text-to-SQL focuses on translating user questions into database queries. A semantic layer defines what business entities and metrics actually mean, how they should be computed, and which logic is trusted.

Can text-to-SQL work in some cases?

Yes. It can work well for small schemas, low-stakes queries, analyst assistance, and simple exploration where a human can validate the result.

Why do AI data agents need semantic context?

Because reliable analytics requires more than retrieving data. It requires reasoning about business meaning, definitions, relationships, and uncertainty.

The real bottleneck was never query access

For years, the analytics industry treated access as the main problem: make dashboards easier, make SQL easier, make interfaces more conversational.

But query access was never the real bottleneck.

Meaning was.

The hard part of analytics has always been turning fragmented data into a trustworthy representation of the business. Once you understand that, the limitations of text-to-SQL become obvious.

Text-to-SQL is not the future of data analysis because it does not solve the hardest part of analysis.

It just makes the wrong layer easier to use.

At Dataverto, we’re moving beyond simple SQL wrappers. We’re building the context layer that actually understands your business. If you’re tired of AI agents that give syntactically correct but business-wrong answers, let’s talk.

Move beyond dashboards

Dataverto tells you what to do next.
So you can grow with speed and focus.