
Why LLMs Struggle with Text-to-SQL & How to Fix It

An Nguyen, Marketing & Operations
August 1, 2025

Large language models (LLMs) are capable of generating SQL queries based on natural language input, thanks to training on public and open-source examples. While they understand SQL patterns, generating queries that actually run against your specific data is another matter. They require more than just language comprehension; they need context.

Text-to-SQL is the process of turning natural language into valid SQL queries. For it to work in practice, LLMs need three kinds of context: knowledge of your schema, clarity on business terminology, and examples of real-world usage patterns.

At Select Star, we work closely with data teams to tackle these challenges using proven metadata and governance strategies. This post outlines four ways to provide that context: prompt engineering, fine-tuning, retrieval pipelines (RAG), and MCP servers with AI agents, along with guidance on when and how to use each approach.

Why Text-to-SQL Is Hard

In practice, text-to-SQL means converting a natural language question into a valid SQL query that runs correctly on your data warehouse. While the concept sounds simple, making it work reliably in production is hard. LLMs need three types of context to succeed: schema context, business context, and usage context.

  1. Schema context. Without table and column names, data types, and relationships, models produce SQL that fails to execute.
  2. Business context. Terms like “revenue” or “active user” have specific definitions that LLMs won’t know by default.
  3. Usage context. Without examples of successful queries, the model can’t learn what works in your environment.

Text-to-SQL Techniques with LLMs

You can provide data context to LLMs through four different techniques: prompt engineering, LLM fine-tuning, RAG pipelines, and MCP servers with AI agents.

| Technique | Setup Effort | Flexibility | Best For |
| --- | --- | --- | --- |
| Prompt Engineering | Low | Low | Prototyping and early testing |
| LLM Fine-Tuning | High | Medium | Stable schemas and narrow domains |
| RAG Pipelines | Medium | Medium-High | Dynamic metadata retrieval |
| MCP + AI Agents | Medium | High | Scalable, context-aware systems |

Text-to-SQL with Prompt Engineering

The simplest approach is to inject context directly into the prompt. This could include schema documentation, DDL statements, example queries, or even specific syntax rules. Prompt engineering works well for prototyping or early-stage experiments, especially when using models like GPT or Claude in a notebook or custom UI.

But manually maintaining prompt context doesn't scale. Each query needs to be wrapped with the right metadata, and prompt length limits eventually become a constraint. For most production use cases, you’ll want a more dynamic method.

Example prompt:

Using Snowflake SQL, write a query that returns the top 10 customers by revenue. Schema: orders(order_id, customer_id, order_total, created_at), customers(customer_id, customer_name)
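A prompt like this can be assembled programmatically from your schema. A minimal sketch in Python, assuming the schema is kept in a simple dict (the helper name and schema here are illustrative):

```python
def build_prompt(question: str, schema: dict, dialect: str = "Snowflake SQL") -> str:
    """Wrap a natural-language question with schema context for an LLM."""
    schema_lines = ", ".join(
        f"{table}({', '.join(columns)})" for table, columns in schema.items()
    )
    return f"Using {dialect}, write a query that {question} Schema: {schema_lines}"

schema = {
    "orders": ["order_id", "customer_id", "order_total", "created_at"],
    "customers": ["customer_id", "customer_name"],
}
prompt = build_prompt("returns the top 10 customers by revenue.", schema)
print(prompt)
```

In practice this string would be sent to the LLM as the user message; the maintenance burden the next paragraph describes comes from keeping `schema` in sync with the warehouse by hand.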

Pros:

  • Easy to prototype
  • No infrastructure needed

Cons:

  • Manual and brittle
  • Doesn’t scale across users or domains

Text-to-SQL with LLM Fine-Tuning

Fine-tuning gives the model more built-in context by training it on your data definitions, example queries, and documentation. This can improve performance for a specific schema or domain.

But fine-tuning is complex. It requires a high-quality training dataset, GPU infrastructure, and regular retraining as your models and tables evolve. For most teams, it’s a heavy lift without long-term flexibility.
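The training dataset is typically a set of question-SQL pairs. A hedged sketch of preparing one in the chat-style JSONL layout many fine-tuning APIs accept (exact field names vary by provider, and the example pair is illustrative):

```python
import json

# Each training example pairs a natural-language question with the gold SQL.
examples = [
    {
        "question": "Top 10 customers by revenue",
        "sql": (
            "SELECT c.customer_name, SUM(o.order_total) AS revenue "
            "FROM orders o JOIN customers c ON o.customer_id = c.customer_id "
            "GROUP BY c.customer_name ORDER BY revenue DESC LIMIT 10"
        ),
    },
]

# Write one JSON record per line in the common chat "messages" format.
with open("train.jsonl", "w") as f:
    for ex in examples:
        record = {
            "messages": [
                {"role": "user", "content": ex["question"]},
                {"role": "assistant", "content": ex["sql"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```

The retraining cost mentioned above comes from regenerating and revalidating files like this every time tables, columns, or metric definitions change.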

Pros:

  • Customizes the model to your data
  • Reduces need for long prompts

Cons:

  • Expensive and time-consuming
  • Requires retraining to stay current

Example tools: Open-source model forks or APIs that support custom training. Limited production adoption today.

Text-to-SQL with RAG Pipelines

Retrieval-augmented generation (RAG) adds context dynamically. At query time, the model retrieves relevant metadata or example queries from a vector database. This enables the LLM to reason over fresh context without embedding everything in the prompt.

A typical RAG setup has two parts:

  • An offline pipeline to index metadata, documentation, and queries into a vector store.
  • An online pipeline that retrieves and injects context into the prompt at runtime.
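The two stages can be sketched with a toy retriever. Word overlap stands in for real vector similarity here so the sketch is self-contained; a production pipeline would embed snippets with a model and store them in a vector database, and the table metadata below is illustrative:

```python
import re

# Offline stage: metadata snippets that would normally be embedded and indexed.
METADATA = [
    "orders(order_id, customer_id, order_total, created_at): one row per order",
    "customers(customer_id, customer_name): one row per customer",
    "dim_products(product_id, product_name, category): product dimension table",
]

def tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question: str, k: int = 2) -> list:
    """Online stage: return the k snippets sharing the most words with the question."""
    return sorted(METADATA, key=lambda s: len(tokens(question) & tokens(s)), reverse=True)[:k]

question = "Which customers placed the most orders?"
context = retrieve(question)
prompt = "Answer with SQL.\nContext:\n" + "\n".join(context) + f"\nQuestion: {question}"
```

Only the retrieved snippets reach the prompt, which is how RAG stays within context limits; the relevance problems described below show up when the scoring step surfaces the wrong snippets.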

Basic RAG pipelines often struggle with relevance. To get better results, you’ll need reranking, query rewriting, and thoughtful chunking strategies.

Pros:

  • Context is dynamic and up-to-date
  • Works with most hosted or open LLMs

Cons:

  • Relevance can be poor without tuning
  • Still limited by prompt size

Example tools: Vanna AI (Python-based), open-source frameworks that integrate LLMs with vector databases.

Text-to-SQL with MCP Servers and AI Agents

MCP (Model Context Protocol) is a standard that lets clients retrieve data and trigger actions through MCP servers. You can think of MCP as an API specification for LLMs and AI agents to communicate with other tools. LLM platforms like Claude and Cursor implement MCP clients, and some databases and data catalogs have implemented MCP servers that provide data and metadata to LLMs and AI agents. That data and metadata can serve as context for text-to-SQL requests.

Data MCP servers like the Select Star MCP server provide data context in the same way the retrieval pipeline does for RAG, but without an offline pipeline to reindex data after each change. AI agents with MCP clients pull the latest data from MCP servers through a standardized protocol, much like an API. The advantage of pairing AI agents with MCP servers is that agents can iteratively reason about the quality of the SQL, and even execute the query, before returning it to the user.

MCP servers paired with AI agents offer a scalable and adaptive architecture for text-to-SQL. You don’t have to rebuild or fine-tune anything because the model pulls context on demand from your data stack.
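The retrieve-generate-validate loop such an agent runs can be sketched as follows. The tool and helper names here are hypothetical stubs, not a real MCP SDK; a real agent would call tools exposed by an MCP server over the protocol and an LLM for the generation step:

```python
import re

def get_table_info(table: str) -> dict:
    """Stub for an MCP tool call; a real agent would query an MCP server."""
    catalog = {
        "orders": {"columns": ["order_id", "customer_id", "order_total", "created_at"]},
    }
    return catalog.get(table, {"columns": []})

def validate_sql(sql: str, known_columns: set) -> bool:
    """Toy check: every underscored identifier in the SQL must be a known column."""
    referenced = {t for t in re.findall(r"\w+", sql.lower()) if "_" in t}
    return referenced <= known_columns

def agent_answer(question: str) -> str:
    # 1. Pull fresh schema context on demand from the (stubbed) MCP server.
    known = set(get_table_info("orders")["columns"])
    # 2. Draft SQL; a real agent would call an LLM here (hardcoded for the sketch).
    sql = "SELECT customer_id, SUM(order_total) AS revenue FROM orders GROUP BY customer_id"
    # 3. Validate against live context before returning; a real agent could also
    #    execute the query and retry on errors.
    if not validate_sql(sql, known):
        raise ValueError("SQL references columns not in the retrieved schema")
    return sql

sql = agent_answer("Which customers generate the most revenue?")
```

The point of the design is step 1: because context is fetched at answer time, schema changes show up in the very next request with nothing to retrain or reindex.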

Pros:

  • Real-time access to relevant context
  • No retraining or manual prompt maintenance

Cons:

  • Requires MCP-compatible tools or agents
  • Effectiveness depends on agent reasoning quality

Example tools with MCP support: data warehouses such as Snowflake, Databricks, and BigQuery; data catalogs such as Select Star, a modern data governance platform with an automated data catalog, column-level lineage, and semantic model generation for humans and AI.

Select Star MCP Server: Data Context for AI

Select Star MCP Server provides data context for AI code editors like Cursor IDE and Claude Code, AI assistants like Claude Desktop, and other MCP-enabled LLMs, directly from your data catalog and without manually or programmatically curating metadata from your existing data stack. It implements four MCP tools that AI agents and LLMs can call to retrieve data context, exposing structured metadata, usage patterns, and lineage for generating better, safer SQL.

  • search_metadata – full-text search across your data catalog
  • get_asset_details – retrieves comprehensive information about an asset, including popularity and documentation
  • get_table_info – provides detailed table schema and usage stats
  • traverse_lineage – traces upstream and downstream relationships and dependencies

When working with LLMs, access to metadata is critical, but maintaining it manually is resource-intensive and error-prone. Rather than stitching together documentation, lineage, popularity stats, and query examples yourself, it's far more efficient to rely on a data catalog that already indexes and updates this context automatically. Select Star serves as a centralized metadata context engine that is machine-readable and accessible via API, making it easy to plug into any AI agent or LLM workflow without reinventing the wheel.

The Future of Text-to-SQL

As natural language interfaces become more common in data tools, from SQL editors to AI copilots, text-to-SQL will play a growing role in analytics. But getting accurate results requires more than just a powerful model. It depends on reliable context.

Whether you're experimenting with prompt engineering or building out agent workflows, having clean, well-documented metadata and usage signals is key. Select Star helps ensure that data context is always available to your LLMs and AI agents, no manual work required.

Ready to build smarter, safer AI experiences on your data? Request a demo.

Frequently Asked Questions on Text-to-SQL with LLMs

What is text-to-SQL?

Text-to-SQL is the process of converting a natural language question into a valid SQL query. It allows users to query databases using plain English instead of writing code.

Why do LLMs struggle with text-to-SQL?

LLMs lack direct access to your data warehouse. Without metadata, business definitions, or example queries, they often generate SQL that is incorrect or unsafe.

What types of context do LLMs need for accurate SQL generation?

LLMs need schema context (tables, columns, relationships), business context (definitions of terms like “revenue”), and usage context (examples of successful queries).

What is the best way to provide context to an LLM for text-to-SQL?

You can provide context through prompt engineering, fine-tuning, RAG pipelines, or using an MCP server with AI agents. Each method has trade-offs in complexity and scalability.

How does Select Star help with text-to-SQL accuracy?

Select Star automates metadata discovery and management, exposing schema, lineage, and usage context via MCP endpoints that LLMs and AI agents can query in real time.
