The Safety Paradox: Why Frozen Models Aren’t Always Safer Than Agentic Ones
Picture this: A pharma company deploys an AI-assisted deviation triage model. They freeze the weights, version-lock the prompts, pin the RAG corpus, and validate it under GAMP. The validation team signs off. The monitoring dashboard shows no drift. Eighteen months pass. Everyone relaxes.
Now picture this: A second company deploys an agentic AI workflow for pharmacovigilance signal detection. It chains decisions across multiple models, pulls from live data sources, and acts semi-autonomously. The governance architecture assumes continuous change, so the team builds real-time integrity checks, prompt-layer monitoring, privilege boundaries, and automated anomaly detection into the design from day one.
Which system is more secure? The intuitive answer is the first. The correct answer is: it depends on which dimension of risk you're measuring, and right now, most organizations are only measuring one.
Frozen or Agentic?
The pharma industry's current mental model: frozen = conservative = safe; agentic = autonomous = dangerous. This is a logical assumption, built on decades of deterministic validation in which locking a system's state was genuinely protective (locked code, locked config, locked database = validated state preserved). When organizations began deploying AI, they applied the same logic: freeze everything, validate the frozen state, maintain it. CSV taught us that controlling change is how you control risk. For deterministic systems, that was correct. For probabilistic systems, it's partially correct, and the partial is where the danger lives.
Freezing a deterministic system preserves its validated state. Freezing a probabilistic system may only preserve the appearance of one.
The Two-Dimensional Problem
At minimum, two dimensions matter from a validation/quality standpoint:
Dimension 1: Validation integrity (consistency, reproducibility, drift control)
Frozen architectures are stronger than dynamic ones when measured in this dimension. Locked weights don't drift. Pinned prompts produce more consistent outputs. Version-controlled RAG sources don't introduce new information. This is real and shouldn't be dismissed.
Agentic systems are genuinely harder to validate on this dimension, by design. They're dynamic, and the validation challenge is continuous rather than point-in-time. The psychological shift is moving from treating validation as a project with a definitive endpoint to treating it as a lifecycle activity requiring intentional controls and monitoring. As a field, we are already undergoing a version of this shift, moving from a strict V-model mindset to a risk-based approach.
The final revised ICH Q9(R1) guideline was endorsed by the ICH Assembly and regulatory agencies on January 18, 2023, and became effective on July 26, 2023. The FDA published its availability in the Federal Register on May 4, 2023. In parallel, the FDA finalized its Computer Software Assurance (CSA) for Production and Quality Management System Software guidance on September 24, 2025.
Our next frontier as an industry is to extend these frameworks to more complex machine learning and probabilistic applications, such as Large Language Models (LLMs) and Agentic AI.
Dimension 2: Adversarial resilience (security posture, attack surface awareness, compromise detection)
Frozen architectures are actually weaker here. A frozen system that passes validation creates organizational complacency: "it's validated, it's locked, we don't need to keep watching." The static state becomes a stable target for attackers. The prompt layer, the RAG corpus, the model weights sit in databases and storage that are still live infrastructure, still exposed, still being probed.
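Even a fully frozen system needs this kind of watching. As a minimal sketch (the artifact names and functions here are illustrative placeholders, not a reference to any real deployment), a hash baseline captured at validation sign-off can detect silent tampering with pinned prompts or RAG chunks sitting in live storage:

```python
import hashlib

def fingerprint(text: str) -> str:
    """Content hash of a pinned artifact (prompt, RAG chunk, config)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def baseline(artifacts: dict[str, str]) -> dict[str, str]:
    """Record hashes of all frozen artifacts at validation sign-off."""
    return {name: fingerprint(text) for name, text in artifacts.items()}

def detect_tampering(artifacts: dict[str, str], validated: dict[str, str]) -> list[str]:
    """Return names of artifacts whose live content no longer matches the baseline."""
    drifted = [
        name for name, text in artifacts.items()
        if validated.get(name) != fingerprint(text)
    ]
    # Artifacts deleted from live storage are also a finding.
    drifted.extend(name for name in validated if name not in artifacts)
    return drifted
```

Run on a schedule against live storage, this turns "it's locked, we don't need to watch" into an active check; the point is not the specific code but that a frozen artifact in live infrastructure still needs a detection surface.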
Agentic systems, paradoxically, may be stronger here: because the governance model assumes dynamism, it builds continuous monitoring, integrity checks, and anomaly detection into the design by necessity. You can't govern an agentic system with a point-in-time assessment, so you don't try; the watching is built in. The safest-looking system on paper may be the most dangerous in production, not because it failed, but because no one was looking when it was compromised.
Case in point:
McKinsey launched the AI platform Lilli in 2023, naming it after Lillian Dombrowski, the firm's first professional female hire in 1945. Three years later, a hacking group called CodeWall deployed an autonomous AI agent to identify and exploit vulnerabilities in Lilli. Within two hours, CodeWall's agent had achieved full read and write access to Lilli's production database: 46.5 million chat messages, 3.68 million RAG document chunks, and write access to 95 system prompts.
If McKinsey, considered by many to be the most prestigious consulting firm in the world, could be systematically infiltrated by an AI agent within two hours, which surfaces in our industry are vulnerable? In the regulated life sciences industry, the trust of both our patients and our shareholders depends on robust security infrastructure.
On March 11, 2026, McKinsey confirmed the vulnerability in the following statement posted to their website: “McKinsey was recently alerted to a vulnerability related to our internal AI tool, Lilli, by a security researcher. We promptly confirmed the vulnerability and fixed the issue within hours. Our investigation, supported by a leading third-party forensics firm, identified no evidence that client data or client confidential information were accessed by this researcher or any other unauthorized third party. McKinsey’s cybersecurity systems are robust, and we have no higher priority than the protection of client data and information that we have been entrusted with.”
On March 31, 2026, CodeWall claimed it had also hacked BCG's data warehouse. The accessed account is claimed to have held full write privileges, meaning the attacker could not only read the data but silently alter it.
On April 13, 2026, the same group claimed its autonomous agent had viewed nearly ten thousand conversations with Pyxis, the internal AI chatbot for employees at Bain, the third of the Big Three, and had accessed some data within it. As more agentic AI use cases are deployed within organizations, the vulnerable surface area will only increase.
Imagine the data warehouse were a clinical trial repository instead of competitive intelligence. The consequences might not be mere exposure, but an inexplicably failed clinical trial or inaccurate patient stratification.
Stability vs. Adaptability
A static model with locked prompts stored in a database is itself a form of vulnerability: its stability gives an attacker a fixed, predictable target. They know what they're compromising and what effect their modification will have. The attack surface doesn't change, so the attacker has unlimited time to probe it.
Five of the OWASP LLM Top 10 risks (prompt injection, supply chain vulnerabilities, data and model poisoning, system prompt leakage, and vector and embedding weaknesses) cannot be detected via one-time validation. A filed validation package isn't evidence your system is safe. It's evidence your system was safe at one moment, eighteen months ago.
What Agentic Governance Gets Right (By Necessity)
Agentic systems can't be governed with point-in-time validation. This is widely understood as a disadvantage, but it can be reframed: continuous monitoring means there are always controls in place to manage the risk of the model's performance in its context of use.
Because agentic governance assumes continuous change, the monitoring architecture is continuous by design: real-time integrity checks, privilege boundaries, action scope controls, prompt-layer monitoring, automated anomaly detection. This continuous monitoring infrastructure catches adversarial manipulation as a side effect of catching legitimate drift; the detection surface is always active. The HITL/HOTL controls required for agentic systems (similar to those presented in my piece Untangling the Web: HOTL for Agentic PV) also serve as security controls: human review of autonomous actions creates checkpoints that a silently compromised frozen system doesn't have. Continuous change control means the security posture is reassessed with every change, not once at initial validation.
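A privilege boundary with a built-in human checkpoint can be sketched roughly as follows (the agent names, scopes, and policy tables are hypothetical placeholders for illustration, not a reference design):

```python
from dataclasses import dataclass

# Hypothetical policy for an agentic PV workflow; scope names are illustrative.
ALLOWED_SCOPES = {
    "signal_triage_agent": {"read:case_data", "write:triage_queue"},
}
# Autonomous writes are routed to a human-on-the-loop checkpoint before executing.
HITL_REQUIRED = {"write:triage_queue"}

@dataclass
class Action:
    agent: str
    scope: str    # e.g. "read:case_data"
    payload: str

def gate(action: Action) -> str:
    """Enforce the privilege boundary and route risky actions to human review."""
    allowed = ALLOWED_SCOPES.get(action.agent, set())
    if action.scope not in allowed:
        return "blocked"          # out-of-boundary action: deny and alert
    if action.scope in HITL_REQUIRED:
        return "pending_review"   # HOTL checkpoint before the action executes
    return "executed"
```

The security value is the side effect the text describes: an attacker who hijacks the agent still has to pass the same scope check and the same human checkpoint as legitimate traffic, so a silent compromise has nowhere to act silently.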
Agentic AI is harder to govern because it changes, but it's also harder to compromise silently, because someone is always watching it change. This is not an argument for agentic AI over frozen architectures broadly; rather, it's an argument that the two risk profiles differ, and that the validation approach should account for that context.
The Risk Topology (Both-And)
The question isn't "frozen or agentic." It's "which risks are you willing to accept, and which controls address each dimension?"
Read across each row: neither column is all green.
What This Means for Your Governance Architecture
Professor Renato Cuocolo of the University of Salerno, citing the Clusmann work, explicitly framed the problem: "Once the model has been poisoned, we cannot just go and excise the poisoned data after the fact. We need to retrain the model from scratch, reimplement it from scratch, and validate again. Obviously, this has an order-of-magnitude higher cost compared to traditional software, which can just be straightforwardly patched.”
For a validated pharmaceutical AI, this means a successful poisoning attack is not just a data breach: it is a revalidation event, with all the regulatory consequences that entails.
Palo Alto Networks' "Securing Agentic AI: Where MLSecOps Meets DevSecOps" explicitly articulates the required convergence: "MLSecOps teams concentrate on AI-supply chain security including machine learning models, training data validation and AI-specific risks…For agentic AI, these parallel tracks must converge into an integrated security approach" with "unified threat modeling that considers both AI and software attack vectors" and "comprehensive security testing that evaluates the entire system, not just its components.”
This is recognition that security monitoring must apply to every system in the house, whether frozen or agentic. Both architectures need continuous security monitoring; the difference is that agentic governance models build it in by default and frozen ones often don't.
A frozen generative AI system still needs prompt injection testing. A frozen RAG system still needs corpus integrity monitoring. This maps directly to the FDA-EMA Good AI Practice Principle 3: the call for adherence to regulatory standards (in this context, cybersecurity) isn't architecture-dependent. It applies to everything.
Three operational implications:
Frozen systems need a layer of continuous security monitoring just as agentic ones do. The validation sign-off doesn't protect the infrastructure layer.
The "point-in-time" validation model needs a security reassessment cadence, not just a performance monitoring cadence.
Organizations choosing between frozen and agentic architectures should assess the risk topology across all dimensions, not default to "freeze everything" as the conservative choice.
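The second implication, a distinct security reassessment cadence, can be made concrete with a simple tracker (the intervals and review names below are illustrative assumptions, not regulatory requirements):

```python
from datetime import date, timedelta

# Illustrative policy: performance and security cadences tracked independently.
POLICY = {
    "performance_review": timedelta(days=90),
    "security_reassessment": timedelta(days=30),
}

def overdue(last_done: dict[str, date], today: date) -> list[str]:
    """Flag any review whose cadence has lapsed; never-done reviews are always flagged."""
    return [
        name
        for name, interval in POLICY.items()
        if today - last_done.get(name, date.min) > interval
    ]
```

The design point is that the security cadence is its own entry, not a rider on performance monitoring: a system can be well inside its performance review window and still be overdue for a security reassessment.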
The frozen system that everyone assumed was safe had no one watching when a prompt-layer compromise went undetected for months. The agentic system that everyone assumed was risky had continuous monitoring that caught an integrity anomaly within hours.
Safety isn't a property of the architecture. It's a property of the governance around it.