
You walk into a data warehouse modernization with two assumptions. Better tools have improved the odds. AI pressure justifies the investment. Both are partially true. Neither protects the project.

Roughly 70% of these projects still fail or overrun: more than two-thirds of large-scale technology programs miss their targets on time, budget, or scope. If GenAI-assisted conversion tools were moving the needle, that number would have dropped as adoption spread. It hasn’t.

Forrester’s Q1 2025 assessment of application modernization services called the impact of GenAI accelerators on migration outcomes “muted,” noting that customers “seldom see much value” in supplier-branded AI platforms.

The failure mode was never code translation speed. It was everywhere else in the project, and AI doesn’t touch most of it.


GenAI code translators accelerate the part of the project that wasn’t slowing it down

The tools driving most of the conversation right now are GenAI code translators like GitHub Copilot applied to migration work, SQLGlot, and raw LLM-based converters.

They take queries, stored procedures, and ETL jobs written for Teradata, Netezza, or SQL Server and produce cloud-native equivalents for Snowflake, BigQuery, or Azure Synapse in minutes.

That acceleration is real for the narrow task they do. But they handle one phase of the project, code translation, and the failure rate of data warehouse modernization isn’t driven by that phase.

Microsoft Research’s own VLDB paper on its Horizon translation system documented the architectural answer plainly: LLMs should “augment rule-based tools and address gaps, rather than serving as full replacements due to cost, latency, and reliability concerns.”

The structural point matters. A full data migration involves discovery, validation, governance, and interpretation of business context. A code translator handles none of those. Treating one as a migration platform is a category error, and that category error is where the overrun starts.

Three silent failures drive the overrun

The wrong answer that passes every test

The most dangerous failures in code-translator-led data migrations are quiet. A conversion compiles, runs, and returns wrong numbers.

A peer-reviewed study at NeurIPS 2025 tested LLM-based SQL translation across 22 production database systems and found average accuracy at or below 38.5%.

A business user notices months later that quarterly reports don’t tie out. By then, the wrong answer has been in production for weeks, and the validation work the tool was supposed to reduce has become the biggest line item in the project.

Source systems don’t hold still during migration

Many data migration plans assume that the source systems will remain stable while the work is underway. They don’t.

Source systems keep running the business during a 12- to 24-month migration. Business owners add fields, change data types, adjust schemas, and deprecate tables. Each change can invalidate the work the migration team has already completed.

Schema drift is one of the least-measured failure modes in enterprise data work. There is no widely cited benchmark for how often source systems change during a data migration, or for what share of completed work gets invalidated as a result.

Mid-market teams are planning data migrations against baselines that are moving faster than the industry is tracking, which means the risk shows up in the budget overrun without ever appearing in the project plan.

Deferred governance becomes breach cost

Governance work that gets deferred during migration doesn’t stay deferred. It becomes breach cost.

Migrations create shadow data by design. Records get copied into new environments for validation. Old ETL jobs keep running against legacy stores while new pipelines are tested. For months, the same customer records exist in multiple places with different access controls and different classification maturity.
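One way to keep that shadow data traceable is a copy registry maintained as records move. Everything below, including the environment and dataset names, is a hypothetical sketch rather than a description of any specific tool:

```python
# Hypothetical shadow-data registry: record every environment a sensitive
# dataset is copied into, so the audit picture never has to be rebuilt by hand.
from collections import defaultdict


class ShadowDataRegistry:
    def __init__(self):
        self._copies = defaultdict(set)  # dataset_id -> set of environments

    def record_copy(self, dataset_id: str, environment: str) -> None:
        self._copies[dataset_id].add(environment)

    def shadow_copies(self, dataset_id: str) -> set:
        """Every environment holding this dataset beyond the system of record."""
        envs = self._copies.get(dataset_id, set())
        return {e for e in envs if e != "legacy_warehouse"}


registry = ShadowDataRegistry()
registry.record_copy("customer_pii", "legacy_warehouse")
registry.record_copy("customer_pii", "snowflake_staging")  # copied for validation
registry.record_copy("customer_pii", "dev_sandbox")        # copied for testing

print(sorted(registry.shadow_copies("customer_pii")))
# -> ['dev_sandbox', 'snowflake_staging']
```

The value is not the data structure itself but the discipline: every copy event is logged when it happens, not reconstructed under audit pressure.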

IBM’s Cost of a Data Breach report found that 35% of breaches involve shadow data, and that breaches involving data spread across multiple environments averaged over $5 million. Most migration teams have no systematic way to trace where sensitive data went during the move, and when the audit comes, they rebuild that picture manually under time pressure.

What actually moves the failure rate

The 30% of projects that succeed don’t use better tools. They make three structural decisions that the other 70% skip.

First, the migration team continuously checks every converted query and job against the original system. If a translated query returns a number that doesn’t match the old system’s output, the team catches it within days. Most projects run this check only at cutover, when there’s no time left to fix anything.
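Continuous reconciliation can be as simple as comparing the same aggregates on both sides after each conversion. The fetch functions below are hypothetical placeholders for whatever database drivers the project actually uses:

```python
# Sketch of a continuous reconciliation check: run the original and the
# translated query against their respective systems, then compare aggregates.
# fetch_legacy() and fetch_migrated() stand in for real database calls.
import math


def fetch_legacy() -> dict:
    return {"row_count": 104_233, "revenue_sum": 18_420_117.42}


def fetch_migrated() -> dict:
    return {"row_count": 104_233, "revenue_sum": 18_420_117.42}


def reconcile(legacy: dict, migrated: dict, rel_tol: float = 1e-9) -> list:
    """Return the metrics that do not match between the two systems."""
    mismatches = []
    for metric, expected in legacy.items():
        actual = migrated.get(metric)
        if actual is None or not math.isclose(expected, actual, rel_tol=rel_tol):
            mismatches.append((metric, expected, actual))
    return mismatches


mismatches = reconcile(fetch_legacy(), fetch_migrated())
print("OK" if not mismatches else f"DRIFT: {mismatches}")  # prints "OK"
```

Run daily against every converted query, this turns a silent wrong answer into a same-week alert instead of a cutover-day discovery.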

Second, someone tracks every change made by the source systems during the migration. Business owners keep adding fields, changing data types, and retiring tables while the migration is underway. A system that flags these changes in real time keeps the migration team from discovering them at cutover, when rework is most expensive.
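Tracking those source-side changes amounts to snapshotting each table's schema on a schedule and diffing against the baseline the migration team converted from. A minimal sketch, with hypothetical column names and types:

```python
# Hypothetical schema-drift check: snapshot source schemas periodically and
# diff each snapshot against the baseline the migration was scoped from.
def diff_schema(baseline: dict, current: dict) -> dict:
    """Compare {column: type} maps; report added, removed, and retyped columns."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    retyped = sorted(
        col for col in set(baseline) & set(current)
        if baseline[col] != current[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


baseline = {"cust_id": "INTEGER", "signup_dt": "DATE", "region": "VARCHAR(8)"}
current = {"cust_id": "BIGINT", "signup_dt": "DATE", "segment": "VARCHAR(16)"}

print(diff_schema(baseline, current))
# -> {'added': ['segment'], 'removed': ['region'], 'retyped': ['cust_id']}
```

Any non-empty result flags converted work that may already be invalid, while the rework is still cheap.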

Third, data classification, masking, and access controls are written into the plan from the beginning. They get built as the migration happens instead of becoming an audit scramble at the end.
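Building masking into the pipeline from day one can look like the sketch below: deterministic hashing keeps join keys usable while keeping raw identifiers out of the new environment. The column list, salt, and field names are assumptions for illustration, not a prescription:

```python
# Hypothetical masking step applied inside the migration pipeline itself,
# rather than bolted on at audit time. Deterministic hashing means the same
# input always masks to the same token, so joins still work after masking.
import hashlib

SENSITIVE_COLUMNS = {"ssn", "email"}  # assumed output of a classification pass


def mask_value(value: str, salt: str = "per-project-secret") -> str:
    return hashlib.sha256((salt + value).encode()).hexdigest()[:16]


def mask_row(row: dict) -> dict:
    return {
        col: mask_value(val) if col in SENSITIVE_COLUMNS else val
        for col, val in row.items()
    }


row = {"cust_id": "1042", "email": "a@example.com", "region": "EU"}
masked = mask_row(row)
assert masked["email"] != row["email"] and masked["region"] == "EU"
```

A real implementation would pull the salt from a secrets manager and the column list from the classification system, but the structural point stands: the mask runs on every row as it moves, not after the fact.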

This is the architecture Microsoft Research described. Rule-based systems augmented by LLMs, wrapped in validation and governance, with humans making the judgment calls where context matters.

Simform’s TrueMorph accelerator is built around exactly this structure, with self-healing validation, anomaly detection, PHI/PII masking from day one, and human-in-the-loop review for business logic.


Hiren is CTO at Simform, with extensive experience in helping enterprises and startups streamline their business performance through data-driven innovation.
