AI assurance evidence for a global fintech

Context

A global fintech with a growing AI estate faced a familiar problem. The controls were real: scoped workload identities, a gateway brokering all model calls, guardrails on the two highest-risk copilots, and runtime containment limits. What did not exist was a coherent record that any of it operated. The internal audit team had asked a straightforward question during planning: for each AI control, show us design evidence, show us that it ran, and show us who reviewed it. The answer was three weeks of manual reconstruction work and a gap register that nobody had maintained.

The CISO wanted the audit result to be different next time. The goal was a standing assurance map, built to produce evidence continuously rather than retrospectively, so the next internal audit, external assessment, or regulator query could be answered from the map rather than from a project.

Risk

Evidence assembled after the fact. Logs existed but were uncurated. Policy documents existed but were not linked to the systems that ran them. Oversight minutes existed in email threads but were not indexed. An auditor sampling any single control had to rely on the team’s assertion rather than independently testable evidence.
Inconsistent evidence depth. High-risk decisioning copilots and low-risk internal summarisers were documented to the same shallow depth, meaning the audit could not distinguish the controls the firm had invested in from those it had not.
No gap register. Controls with no evidence were invisible until an auditor found them. There was no standing list of controls awaiting evidence, which meant gaps were discovered under scrutiny rather than managed in advance.

Engagement

We started from the AI use-case inventory the governance function already maintained. For each use case we asked one question of every control protecting it: what does this control leave behind when it runs?

The answer sorted controls into three evidence types: design evidence, meaning the control exists and is specified in policy or configuration; operating evidence, meaning the control ran and produced a record; and oversight evidence, meaning a named person with authority reviewed it. Any control that could not produce all three was a gap, logged in a register before the audit found it.
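The three-way test above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used on the engagement: the `Control` class, the `build_gap_register` function, the control names, and the evidence locations are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class EvidenceType(Enum):
    DESIGN = "design"        # the control exists and is specified in policy or configuration
    OPERATING = "operating"  # the control ran and produced a record
    OVERSIGHT = "oversight"  # a named person with authority reviewed it


@dataclass
class Control:
    name: str
    use_case: str
    # Maps each evidence type to where the artefact lives (hypothetical locations).
    evidence: dict = field(default_factory=dict)

    def missing_evidence(self):
        """Evidence types for which no artefact location is recorded."""
        return [t for t in EvidenceType if not self.evidence.get(t)]


def build_gap_register(controls):
    """Return {control name: [missing evidence types]} for every control with a gap."""
    register = {}
    for control in controls:
        missing = control.missing_evidence()
        if missing:
            register[control.name] = [t.value for t in missing]
    return register


# Illustrative data only: one fully evidenced control, one with gaps.
controls = [
    Control("gateway-model-brokering", "decisioning-copilot", {
        EvidenceType.DESIGN: "config://gateway/routes",
        EvidenceType.OPERATING: "logs://gateway/calls",
        EvidenceType.OVERSIGHT: "minutes://ai-risk-forum/latest",
    }),
    Control("runtime-containment-limits", "internal-summariser", {
        EvidenceType.DESIGN: "policy://containment",
    }),
]

gap_register = build_gap_register(controls)
print(gap_register)
```

A control appears in the register as soon as any one of the three evidence types is absent, which is the property that lets gaps be logged before an auditor finds them.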

We then built the assurance map in four moves:

Tied evidence depth to risk tier. High-risk decisioning copilots got a full evidence trail with quarterly oversight cadence. Low-risk internal tools got a lighter treatment. This meant audit attention landed where the risk was, not uniformly across the estate.
Preferred machine-generated evidence. Wherever a control already emitted a log, a signed configuration, or a gate result, we pointed the map at that artefact rather than a human-written description. Generated evidence is harder to fake and cheaper to keep current.
Tagged each control with its enforcement placement. For each control we recorded whether it was enforced at the AI gateway, the agent runtime, an MCP connector, or an endpoint. This made the evidence location unambiguous, so an auditor sampling gateway controls knew exactly where to look, and an auditor sampling agent-runtime controls knew the gateway logs would not contain what they needed.
Set a refresh cadence. Machine-generated evidence refreshed continuously. Design and oversight evidence refreshed quarterly or on material change. Stale evidence was treated as a finding in the gap register, not a footnote.
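The four moves can be pictured as fields on a single map entry plus a staleness check. The sketch below is an assumption-laden illustration rather than the engagement's implementation: the `EvidenceItem` class, the control names and locations, the one-day window standing in for "continuous", and the 92-day window standing in for "quarterly" are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class RiskTier(Enum):
    HIGH = "high"
    LOW = "low"


class Placement(Enum):
    GATEWAY = "ai-gateway"
    AGENT_RUNTIME = "agent-runtime"
    MCP_CONNECTOR = "mcp-connector"
    ENDPOINT = "endpoint"


# Assumed maximum evidence age per kind. One day approximates "continuous"
# for machine-generated evidence; 92 days approximates "quarterly".
MAX_AGE = {
    "machine": timedelta(days=1),
    "design": timedelta(days=92),
    "oversight": timedelta(days=92),
}


@dataclass
class EvidenceItem:
    control: str
    tier: RiskTier
    placement: Placement
    kind: str             # "machine", "design" or "oversight"
    location: str         # where an auditor should look (hypothetical locations)
    last_refreshed: date


def stale_items(items, today):
    """Items older than their allowed age; each one becomes a gap-register finding."""
    return [i for i in items if today - i.last_refreshed > MAX_AGE[i.kind]]


def locations_for(items, placement):
    """Evidence locations for controls enforced at one placement."""
    return [i.location for i in items if i.placement is placement]


# Illustrative data only.
today = date(2024, 6, 30)
items = [
    EvidenceItem("gateway-model-brokering", RiskTier.HIGH, Placement.GATEWAY,
                 "machine", "logs://gateway/calls", date(2024, 6, 30)),
    EvidenceItem("human-oversight-review", RiskTier.HIGH, Placement.GATEWAY,
                 "oversight", "minutes://ai-risk-forum", date(2024, 2, 1)),
    EvidenceItem("tool-call-allowlist", RiskTier.LOW, Placement.AGENT_RUNTIME,
                 "machine", "logs://agent-runtime/gates", date(2024, 6, 25)),
]

stale = [i.control for i in stale_items(items, today)]
gateway_locations = locations_for(items, Placement.GATEWAY)
print(stale)
print(gateway_locations)
```

Recording the placement on each entry is what lets an auditor sampling gateway controls pull only gateway locations, while anything past its refresh window surfaces as a finding rather than a footnote.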

In the three weeks between engagement start and the internal audit, we closed fourteen of the eighteen gaps in the register. The remaining four were accepted as residual items with documented rationale. The audit team received a map, not a binder.

Outcome

Assembled a testable AI assurance map covering all fourteen production AI use cases within three weeks, before the scheduled internal audit examination.
Closed 14 of 18 identified evidence gaps during the engagement; the remaining 4 were accepted as residual risk with documented rationale and owner.
Internal audit completed its AI controls examination without requesting any retrospective evidence reconstruction. It was the first clean AI audit cycle the CISO team had experienced.
The map was used three months later to answer a regulatory information request in under two working days, drawing directly from indexed evidence rather than from a new project.

"In previous cycles, every AI control question produced a three-week side project. This time the team handed us a map with evidence locations and a gap register. We could sample independently and reach our own conclusions. That is exactly what a third line should be able to do."

Head of Internal Audit, global fintech (anonymised)

For the broader regulatory context, including how AI assurance evidence maps to DORA examination requirements and regulator-query readiness, see DORA readiness for fintech.
