Model poisoning defence for analytics

Context

A big-data analytics provider had built a production model pipeline that ingested labelled training data from several internal teams and an external data-enrichment partner. The models produced segment scores used by the business’s own clients for targeting and prioritisation decisions. When the security team ran a routine review of the pipeline before a planned model update, they found something they could not explain: a cluster of training records that, on closer inspection, had been modified in a way that appeared designed to bias the model’s output towards a specific scoring pattern.

The provider’s CTO recognised that this was not a data-quality problem but a write-path integrity problem. Someone, or some process, had introduced records crafted to alter the model’s latent behaviour. The batch they had found was only part of the concern. The larger questions were how long the pipeline had been running without controls able to detect this class of insertion, and whether other poisoned batches were already in the weights of a model in production.

Risk

Durable, latent model manipulation. Unlike a live injection that acts on a single request, a poisoned training record changes the model’s behaviour across every inference from the moment of training. The effect is persistent, covert, and potentially active for months before detection. The provider’s clients would be making decisions on scores the attacker had shaped.
Open write path into a critical data store. The enrichment partner’s data feed was ingested with minimal provenance controls. The pipeline validated schema and volume but did not check that individual records matched the statistical distribution expected from that source, so the write path was effectively open to anyone who could compromise or spoof the feed.
Trigger conditions that are hard to test for. A backdoor planted via training data can stay dormant under ordinary evaluation. The standard model-evaluation suite ran on held-out data from the same distribution, so it would not surface behaviour conditioned on a rare or specific trigger pattern. The team’s existing quality gates gave no signal that anything was wrong.

Engagement

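We structured the engagement around the three questions that matter for a write-path integrity failure: how records could get into the corpus, which suspect records were already there, and how to stop further insertions.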

Pipeline audit and write-path mapping. We enumerated every channel that could place records into the training corpus, including internal team uploads, the enrichment partner feed, and any automated data-quality correction jobs that wrote back to the store. For each channel we asked what provenance controls existed and what an attacker who controlled that channel could insert. The enrichment feed had no cryptographic signing, no statistical drift detection, and no record-level provenance log. That was the primary vector.
Corpus forensic review. Working with the data science team, we applied distribution analysis and outlier detection to the training batches to find records that differed from the expected statistical signature of their labelled source. The suspect cluster identified in the initial review was confirmed to be statistically inconsistent with other records from the same origin. We extended the analysis to earlier batches and found one additional suspect segment in a batch from several months prior.
Pipeline hardening and gating. We designed and helped implement ingestion controls covering: statistical drift detection on each incoming batch before it was committed to the corpus, record-level provenance tagging tied to a signed source identity, a quarantine path for records that failed the drift gate, and a build-time integrity check that blocked model training if any training-data record lacked a valid provenance signature. The enrichment partner was notified of the findings and a signed-feed requirement was written into the data-sharing agreement.
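The batch-level drift gate and quarantine path described above can be sketched as follows. This is a minimal illustration, not the provider’s implementation: it assumes each record can be reduced to a single numeric feature and that a trusted baseline sample exists for each source. It uses the two-sample Kolmogorov-Smirnov statistic as the drift measure; that choice, the `max_ks` threshold of 0.2 and the names `ks_statistic`, `gate_batch` and `GateResult` are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Sequence


def ks_statistic(baseline: Sequence[float], batch: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of the baseline and the incoming batch."""
    a, b = sorted(baseline), sorted(batch)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    # Both empirical CDFs are step functions that only jump at observed
    # values, so evaluating them at every data point finds the maximum gap.
    d = 0.0
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)
        f_b = bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d


@dataclass
class GateResult:
    ks: float
    committed: List[float] = field(default_factory=list)
    quarantined: List[float] = field(default_factory=list)


def gate_batch(baseline: Sequence[float], batch: Sequence[float],
               max_ks: float = 0.2) -> GateResult:
    """Hypothetical drift gate: commit the batch only if its distribution stays
    close to the source's trusted baseline; otherwise route the whole batch
    to quarantine instead of writing it to the training corpus."""
    ks = ks_statistic(baseline, batch)
    if ks > max_ks:
        return GateResult(ks=ks, quarantined=list(batch))
    return GateResult(ks=ks, committed=list(batch))


# Usage: a trusted baseline, a batch with a similar shape,
# and a batch whose values are concentrated just above 0.9.
baseline = [i / 100 for i in range(100)]
similar = [i / 50 + 0.005 for i in range(50)]
shifted = [0.9 + i / 1000 for i in range(50)]

for name, batch in [("similar", similar), ("shifted", shifted)]:
    result = gate_batch(baseline, batch)
    print(name, "quarantined" if result.quarantined else "committed")
# prints "similar committed" then "shifted quarantined"
```

A single-feature test keeps the sketch short. A production gate would more plausibly test several features and the label distribution per source, with thresholds tuned on historical batches. A library routine such as SciPy’s two-sample KS test could replace the hand-written statistic.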

Before the next training run, every training record in scope had a verified provenance tag. Records from the suspect batches were quarantined and excluded.
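The record-level provenance tag and the build-time check that blocks training can be sketched like this. The source states only that tags were tied to a signed source identity. This sketch assumes an HMAC-SHA256 over a canonical JSON encoding of the source ID and record payload as a stand-in. A real design would more likely use asymmetric signatures, so the pipeline never holds the partner’s signing key. `SOURCE_KEYS`, `tag_record`, `verify_record`, `check_corpus_before_training` and `ProvenanceError` are hypothetical names, and the keys are placeholders.

```python
import hashlib
import hmac
import json
from typing import Dict, List

# Hypothetical per-source keys; placeholders only, real keys belong in a secrets store.
SOURCE_KEYS: Dict[str, bytes] = {
    "internal-team-a": b"placeholder-key-team-a",
    "enrichment-partner": b"placeholder-key-partner",
}


class ProvenanceError(RuntimeError):
    """Raised when the corpus contains records without a valid provenance tag."""


def _canonical(source_id: str, payload: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte encoding to sign.
    return json.dumps({"source": source_id, "payload": payload},
                      sort_keys=True, separators=(",", ":")).encode("utf-8")


def tag_record(source_id: str, payload: dict) -> dict:
    """Attach a provenance tag binding the payload to its source identity."""
    sig = hmac.new(SOURCE_KEYS[source_id], _canonical(source_id, payload),
                   hashlib.sha256).hexdigest()
    return {"source": source_id, "payload": payload, "sig": sig}


def verify_record(record: dict) -> bool:
    """True only if the source is known and the tag matches the content."""
    key = SOURCE_KEYS.get(record.get("source"))
    sig = record.get("sig")
    if key is None or not isinstance(sig, str) or "payload" not in record:
        return False
    expected = hmac.new(key, _canonical(record["source"], record["payload"]),
                        hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), sig.encode("utf-8"))


def check_corpus_before_training(corpus: List[dict]) -> None:
    """Build-time gate: refuse to start training if any record fails verification."""
    bad = [i for i, r in enumerate(corpus) if not verify_record(r)]
    if bad:
        raise ProvenanceError(
            f"{len(bad)} record(s) failed provenance check, first indices: {bad[:10]}")


# Usage: a correctly tagged record verifies; an edited copy does not.
record = tag_record("enrichment-partner", {"id": 1, "segment_score": 0.42})
print(verify_record(record))  # True
forged = dict(record, payload={"id": 1, "segment_score": 0.99})
print(verify_record(forged))  # False
```

In this sketch, records that fail `verify_record` would be routed to the quarantine path rather than dropped, so they stay available for forensic review.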

Outcome

Detected and quarantined poisoned training data before it influenced the next model version, with suspect records isolated from the training corpus and excluded from any retraining of models already in production.
Identified a second contaminated batch from several months earlier, allowing the team to assess and document which production model versions may have trained on it.
Reduced the write-path exposure from an open feed validated only on schema and volume to a signed, drift-gated ingestion pipeline with a quarantine path.
Delivered a provenance control that the CTO could present to enterprise clients as evidence that training-data integrity was actively governed, not assumed.

We had strong controls on inference. We had almost none on the data that shaped the model before inference. Salvador Cloud found the gap, found evidence it had been used, and left us with a pipeline where we can actually answer the question: where did this training record come from?

CTO, big-data analytics provider (anonymised)

Corpus integrity is often the last control organisations invest in because the attack is covert and slow-moving, unlike an injection that fires on the request that carries it. A poisoned corpus document or training record sits dormant until it is retrieved or learned from, then shapes behaviour across every use of the model that follows. The threat and the controls that answer it sit at the core of Agentic AI and MCP security.

