Somewhere in your organisation, there is almost certainly a system that nobody fully understands. It processes payroll, settles transactions, or manages customer records. It has been running for twenty or thirty years. The engineers who built it have retired. The documentation, if it ever existed, is either missing or hopelessly out of date. And yet the business depends on it entirely. For many large UK enterprises — particularly in financial services, insurance, the public sector, and utilities — this is not a hypothetical. It is Tuesday.
The 2025–2026 period is shaping up to be a genuine inflection point. IBM's shifting mainframe support commitments, rising z/OS licensing costs, and a dwindling pool of COBOL expertise are forcing modernisation decisions that organisations have deferred for years. At the same time, AI coding assistants have matured far beyond autocomplete. Tools like GitHub Copilot, Anthropic's Claude, and Amazon Q Developer (formerly CodeWhisperer) can now reason about large volumes of unfamiliar code in ways that were simply not possible eighteen months ago. The question is no longer whether AI can help with legacy systems. The more urgent question is how to use it effectively before the pressure becomes a crisis.
The Documentation Problem Is the Real Blocker
Ask any engineering lead who has attempted a legacy migration what slowed them down most, and the answer is rarely the technology. It is the knowledge gap. Nobody knows what a given module actually does in edge cases, why a particular conditional exists, or which downstream systems will break if a data format changes by a single character. Attempting to modernise without that understanding is how you end up with a six-month programme that quietly destroys a business process nobody had mapped.
This is where AI tools are delivering immediate, concrete value. By feeding legacy COBOL or Java source code into a large language model — either through an IDE integration like Copilot or directly via a model with a large context window such as Claude — engineering teams can generate structured, human-readable documentation at a pace no manual process could match. A 10,000-line COBOL batch job that would take a senior developer two weeks to annotate manually can be summarised, section by section, in hours. The output is not perfect, and it must be validated by someone with domain knowledge, but it gives teams a working scaffold to interrogate, correct, and build upon. Critically, it transforms tacit knowledge buried in code into something legible to architects, product owners, and the engineers who will eventually replace the system.
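As a rough illustration of the section-by-section approach, the sketch below splits a COBOL source file into SECTION-level chunks and assembles a documentation prompt for each one. It is a simplification for illustration only: the regex, the chunking convention, and the prompt wording are assumptions, real COBOL (copybooks, fixed columns, continuation lines) needs a proper parser, and the actual model call is omitted.

```python
import re

# Assumption: one chunk per COBOL SECTION is a useful unit for documentation.
# Real-world code would also handle DIVISIONs, copybooks, and fixed-format columns.
SECTION_RE = re.compile(r"^\s*([A-Z0-9-]+)\s+SECTION\s*\.", re.IGNORECASE)

def split_cobol_sections(source: str) -> dict[str, str]:
    """Return {section_name: code} chunks keyed by SECTION header."""
    sections: dict[str, list[str]] = {}
    current = "PROLOGUE"  # anything before the first SECTION header
    for line in source.splitlines():
        match = SECTION_RE.match(line)
        if match:
            current = match.group(1).upper()
        sections.setdefault(current, []).append(line)
    return {name: "\n".join(lines) for name, lines in sections.items()}

def build_doc_prompt(name: str, code: str) -> str:
    """Assemble the documentation prompt for one section (model call omitted)."""
    return (
        f"Explain what the COBOL section {name} does, listing its inputs, "
        f"outputs, and any business rules implied by the logic:\n\n{code}"
    )
```

Each chunk can then be sent to the model independently, which keeps individual requests well inside the context window and makes the generated documentation traceable back to a specific region of source.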
Strangler-Fig Migration: AI as an Incremental Co-Pilot
The strangler-fig pattern — incrementally replacing components of a legacy system by routing traffic through a new layer while the old system continues to run — has been the recommended approach to large-scale modernisation for well over a decade. The problem has always been execution. Identifying the right seams to cut, understanding the data contracts between components, and writing the adapter logic that keeps both systems in sync is extraordinarily labour-intensive. Many organisations have started strangler-fig migrations only to stall after the first few modules, leaving them with two systems to maintain instead of one.
AI tools are changing the economics of that execution work in several practical ways. First, they can analyse existing code to identify functional boundaries that may not be obvious from the file structure — surfacing clusters of logic that behave like discrete services even if they were never architected that way. Second, they accelerate the generation of adapter and translation code: the unglamorous but essential plumbing that maps legacy data structures to modern equivalents. Third, and perhaps most valuably, they can assist in writing the characterisation tests that capture what the legacy system actually does today — not what the documentation says it should do — providing a safety net against regression as the migration proceeds. None of this removes the need for skilled engineers. What it does is reduce the ratio of tedious, error-prone manual work to genuinely difficult architectural thinking.
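The characterisation-testing idea can be made concrete with a minimal harness: on its first run it records the legacy routine's outputs as a golden baseline; later runs report any divergence. Everything here is a sketch under assumptions — `legacy_fn` and the input cases are placeholders, and in a real programme the cases would be sampled from production data rather than hand-picked.

```python
import json
from pathlib import Path

def characterise(legacy_fn, cases, golden_path: Path) -> list[str]:
    """Return mismatch descriptions (an empty list means behaviour is preserved).

    First run: records legacy_fn's outputs for each case as the golden baseline.
    Later runs: compares fresh outputs (e.g. from a reimplementation) against it.
    """
    observed = {repr(args): legacy_fn(*args) for args in cases}
    if not golden_path.exists():
        golden_path.write_text(json.dumps(observed, indent=2, sort_keys=True))
        return []  # baseline recorded; nothing to compare yet
    golden = json.loads(golden_path.read_text())
    return [
        f"{key}: expected {golden.get(key)!r}, got {value!r}"
        for key, value in observed.items()
        if golden.get(key) != value
    ]
```

The point of the pattern is that the baseline captures what the system demonstrably does, including any quirks, so a reimplementation is judged against observed behaviour rather than against documentation that may be wrong.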
What the Evidence from Early Adopters Shows
Across a number of UK financial services and public sector organisations that have begun structured AI-assisted legacy programmes in the past twelve to eighteen months, several patterns have emerged. Teams that use AI tools solely as code generators tend to be disappointed. The output is syntactically plausible but semantically unreliable when the model lacks sufficient context about the business rules embedded in the legacy code. The organisations seeing the strongest results are those that treat AI as a comprehension and documentation engine first, and a code generation engine second.
One recurring finding is the value of what practitioners are beginning to call 'living documentation' — documentation that is generated incrementally as engineers work through the codebase, stored in version control alongside the source code, and updated as understanding improves. Unlike a one-off documentation sprint, this approach keeps knowledge accumulation continuous. It also creates an artefact that AI tools can themselves query in later stages of the programme, giving the model a richer context from which to generate more reliable outputs. Teams using this approach report significantly fewer surprises during integration testing and faster onboarding of engineers who join the programme mid-flight.
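One way to make 'living documentation' mechanical is to give each source file a markdown companion in version control, with a regenerated AI section and a human-validated section that is never overwritten. The sketch below assumes that convention — the section markers are an illustrative choice, not a standard.

```python
from pathlib import Path

# Assumed convention: each doc file has an AI-regenerated section followed by
# a human-maintained section that must survive every regeneration pass.
AI_MARKER = "## AI-generated summary"
HUMAN_MARKER = "## Validated notes"

def update_living_doc(doc_path: Path, new_summary: str) -> str:
    """Rewrite the AI section of a companion doc, keeping human notes intact."""
    human_notes = f"{HUMAN_MARKER}\n\n(none yet)\n"
    if doc_path.exists():
        text = doc_path.read_text()
        if HUMAN_MARKER in text:
            # Preserve everything from the human marker onward verbatim.
            human_notes = text[text.index(HUMAN_MARKER):]
    content = f"{AI_MARKER}\n\n{new_summary}\n\n{human_notes}"
    doc_path.write_text(content)
    return content
```

Because the companion files live in the same repository as the source, improvements to understanding are reviewed, versioned, and diffed exactly like code — which is what keeps the documentation alive rather than a one-off artefact.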
The Risks That Require Human Judgement
It would be irresponsible to discuss AI-assisted legacy modernisation without being clear about where the approach breaks down. Large language models do not understand business logic — they recognise patterns in code and produce statistically plausible descriptions of what that code appears to do. When a COBOL routine contains a business rule that is technically correct but economically counterintuitive — a tax calculation that applies a particular relief only under a combination of conditions that appear nowhere in the code comments — an AI tool will describe the mechanics without flagging the business significance. Domain experts must remain central to the validation process.
There are also important data governance considerations. Feeding production source code into a third-party AI service raises legitimate questions about intellectual property, regulatory obligations, and the risk of sensitive business logic being retained in model training pipelines. Most enterprise AI vendors now offer data processing agreements and deployment options designed to address these concerns, but organisations should conduct a proper assessment before embarking on any programme. Self-hosted or private cloud deployments of capable models are an increasingly viable option for organisations with the most stringent requirements.
If your organisation is facing mainframe retirement decisions in the next twelve to twenty-four months, the time to begin the comprehension and documentation phase is now — not when the decommission deadline is six months away. The organisations that will navigate this wave most successfully are those that start treating AI-assisted documentation as an engineering discipline rather than an experiment: building it into sprint cycles, establishing quality gates for AI-generated output, and accumulating the shared understanding that makes incremental migration feasible.
The goal is not to let AI rewrite your legacy systems. The goal is to use AI to make those systems legible — to your architects, your new engineers, and eventually to the tools that will help you replace them. That shift in framing, from automation to comprehension, is what separates the modernisation programmes that succeed from those that generate expensive stalls. If you are beginning to scope what this looks like for your organisation, the practical starting point is almost always the same: pick one well-bounded system, generate documentation for it rigorously, and learn from what the process reveals before scaling the approach.
Which AI tools are currently best suited to analysing COBOL legacy codebases specifically?
Anthropic's Claude and GPT-4-class models tend to perform best on COBOL due to their larger context windows and broader training on legacy language syntax. GitHub Copilot is more effective for Java and modern languages. For COBOL specifically, several specialist vendors — including Micro Focus (now OpenText) and IBM watsonx — offer AI tooling with dedicated COBOL support built into their modernisation platforms.
How large a codebase can current AI tools realistically process in a single session?
Context window size is the primary constraint. Claude 3's 200,000-token context can accommodate on the order of tens of thousands of lines of code in a single pass — source code typically tokenises at several tokens per line — but complex legacy systems often run into millions of lines. In practice, effective programmes decompose the codebase into functional modules and process them iteratively, building documentation incrementally rather than attempting to analyse the entire system at once.
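The iterative decomposition can be sketched as a simple batching step: pack files into groups whose estimated token total fits the model's budget, then process each batch in turn. The four-characters-per-token heuristic below is a crude assumption; a real programme would count tokens with the provider's own tokeniser.

```python
def batch_by_token_budget(files: dict[str, str], budget_tokens: int) -> list[list[str]]:
    """Group file names into batches whose estimated token total fits the budget.

    A single file larger than the budget still gets its own batch here; in
    practice such files would need further splitting (e.g. by section).
    """
    est = lambda text: len(text) // 4 + 1  # rough ~4 chars/token heuristic
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name, text in files.items():
        cost = est(text)
        if current and used + cost > budget_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        batches.append(current)
    return batches
```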
Is it safe to upload production source code to cloud-based AI services like Copilot or Claude?
It depends on your organisation's data classification policies, sector-specific regulations, and the contractual terms of the service. Most enterprise tiers of these tools include data processing agreements that prevent code from being used in model training. Organisations in regulated sectors should review these terms carefully and consider whether a private deployment — via Azure OpenAI Service or a self-hosted model — is more appropriate.
What is the typical cost of an AI-assisted legacy documentation programme for a large enterprise?
Costs vary significantly depending on codebase size, the level of domain-expert validation required, and whether specialist tooling is procured. A focused programme covering a single mission-critical system might run from £150,000 to £400,000 over three to six months, inclusive of engineering time. This is substantially less than a full rewrite, and the documentation produced has value beyond the immediate migration programme.
How do you handle legacy systems where the business rules are undocumented and no subject-matter experts remain?
This is one of the most challenging scenarios. The recommended approach is to use AI to generate characterisation tests — test cases that capture what the system demonstrably does with known inputs — and to cross-reference outputs with downstream systems and historical data. Regulatory filings and audit records can sometimes be used to reconstruct business rules retrospectively. In extreme cases, organisations accept a degree of specification uncertainty and invest heavily in monitoring and rollback capability during migration.
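Cross-referencing recomputed outputs against archived downstream records is essentially a reconciliation exercise, which a short sketch can illustrate. The record identifiers, field shape, and tolerance below are assumptions for illustration; real reconciliation would work over production extracts with domain-appropriate tolerances.

```python
def reconcile(recomputed: dict[str, float], archived: dict[str, float],
              tolerance: float = 0.01) -> dict[str, tuple[float, float]]:
    """Return {record_id: (recomputed, archived)} for records that diverge.

    Divergences are the interesting cases: each one points at a business
    rule the current understanding of the system fails to reproduce.
    """
    return {
        rid: (recomputed[rid], archived[rid])
        for rid in recomputed.keys() & archived.keys()  # shared record IDs only
        if abs(recomputed[rid] - archived[rid]) > tolerance
    }
```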
Can AI tools help identify hidden dependencies between legacy components that aren't obvious from the code structure?
To a meaningful extent, yes. Static analysis combined with AI-assisted code comprehension can surface implicit dependencies — shared database tables, undocumented file format assumptions, or shared in-memory state — that manual review might miss, particularly in very large codebases. However, runtime dependencies triggered only by specific data conditions may not be detectable through code analysis alone and require dynamic analysis or production traffic monitoring to identify fully.
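A flavour of the static side of that analysis: the sketch below scans module sources for database table references and reports tables touched by more than one module — exactly the kind of implicit coupling that never shows up in the file structure. Matching SQL clauses by regex is a deliberate simplification; production tooling would parse the embedded SQL properly.

```python
import re
from collections import defaultdict

# Crude heuristic: table names follow FROM/INTO/UPDATE/JOIN keywords in
# embedded SQL. A real analysis would use a proper SQL/COBOL parser.
TABLE_RE = re.compile(r"\b(?:FROM|INTO|UPDATE|JOIN)\s+([A-Z][A-Z0-9_]*)", re.IGNORECASE)

def shared_tables(modules: dict[str, str]) -> dict[str, set[str]]:
    """Return {table: modules_referencing_it} for tables used by 2+ modules."""
    usage: defaultdict[str, set[str]] = defaultdict(set)
    for module, source in modules.items():
        for table in TABLE_RE.findall(source):
            usage[table.upper()].add(module)
    # Only tables shared across modules represent hidden coupling.
    return {t: mods for t, mods in usage.items() if len(mods) > 1}
```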
What skills should an engineering team develop internally before starting an AI-assisted legacy programme?
Teams benefit most from developing prompt engineering skills specific to code comprehension tasks, familiarity with the AI tool's context window limitations and how to manage them, and strong practices around validating and version-controlling AI-generated documentation. Domain knowledge of the legacy system's business context remains irreplaceable — AI augments that expertise but cannot substitute for it.
How does AI-assisted modernisation affect the business case compared to a traditional lift-and-shift or full rewrite approach?
AI-assisted incremental migration typically offers a stronger risk-adjusted business case than a full rewrite because it reduces the period during which the business is exposed to a 'big bang' cutover risk. The documentation phase also reduces unknowns before significant re-engineering investment is committed, allowing better-informed scope and cost estimation. Lift-and-shift (running COBOL on cloud infrastructure) reduces infrastructure risk but does not address the long-term skills and maintainability problem.
How long does a full AI-assisted strangler-fig migration of a mainframe system typically take?
For a large enterprise mainframe with multiple interconnected systems, realistic programmes run three to seven years when pursued incrementally and safely. The AI tooling compresses specific phases — documentation, characterisation testing, adapter code generation — but does not eliminate the need for careful architectural sequencing, business validation, and parallel running periods. Organisations should treat timelines of under two years for complex mainframe estates with scepticism.
Are there UK public sector-specific considerations when using AI tools for legacy code analysis?
Yes. Public sector organisations must assess AI tool usage against the UK Government's AI procurement guidance, the Data Protection Act 2018, and any sector-specific frameworks such as NHS DSPT or Cabinet Office security classifications. Contracts with AI vendors should be reviewed against Crown Commercial Service terms where applicable. The Centre for Digital Public Services in Wales and CDDO in England have both published relevant guidance on responsible AI adoption in government contexts.