Why AI Governance Fails Without a Control Layer: A House-of-Trust Model for Regulated Drug Development
Regulators have made one thing clear.
Organizations deploying AI in drug development must demonstrate that these systems are:
fit for purpose
risk-appropriate
governed across their lifecycle
What regulators deliberately did not specify is how that evidence should be generated.
That gap is where validation architecture lives.
Validation Architecture
The diagram below maps the structure behind the work published so far. Each layer represents a different component required to move from AI experimentation to inspection-ready systems.
Layer 1: AI Regulation
AI validation does not exist in a vacuum. It emerges from signals across the regulatory and scientific landscape.
Several developments over the past year illustrate this shift:
“Artificial Intelligence and Medicinal Products” (March 2024, updated February 2025)
The EU AI Act (June 2024)
EMA Reflection Paper: AI in the Medicinal Product Lifecycle (September 9, 2024)
FDA Draft Guidance: AI in Drug Regulatory Decision-Making (January 7, 2025)
EMA First AI Qualification Opinion (March 2025)
FDA “Elsa” Launch (June 2025)
GAMP Artificial Intelligence Guide (July 2025)
FDA internal deployment of agentic AI (December 1, 2025)
CIOMS Working Group XIV Final Report on AI in Pharmacovigilance (December 4, 2025)
FDA–EMA Good AI Practice Principles (February X, 2026)
(Insert Infographic)
Layer 2: Framework
Once the signal is clear, the next question becomes architectural:
How should AI systems actually be validated?
The core framework I propose follows a lifecycle structure, the Britt Biocomputing Probabilistic Validation Lifecycle:
Context of Use (CoU) → Risk → Evaluation Design → Acceptance Criteria → Human-in-the-Loop (HITL) → Monitoring → Change Control
Each step ensures that validation evidence reflects context of use and scientific risk, not generic benchmarks.
This is a validation workflow, but per the FDA–EMA Good AI Practice Principles it cannot be executed in isolation: multidisciplinary expertise is required, and an AI Validation Architect should work with relevant stakeholders, including domain specialists and the Digital, Data Science, Quality, and Regulatory groups.
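To make the lifecycle concrete, it helps to think of each AI system carrying a single traceable record through every stage. The sketch below is a minimal illustration in Python; the schema, field names, and example values are my own assumptions, not a format prescribed by any guidance.

```python
from dataclasses import dataclass, field

@dataclass
class ValidationRecord:
    """One AI system's traceable path through the lifecycle:
    CoU -> Risk -> Evaluation Design -> Acceptance Criteria -> HITL ->
    Monitoring -> Change Control. All fields are illustrative."""
    context_of_use: str           # the specific question the system answers, and for whom
    risk_level: str               # e.g., "high" when outputs inform patient-safety decisions
    evaluation_design: str        # golden datasets, challenge sets, stress tests
    acceptance_criteria: dict     # metric -> threshold, declared before evaluation
    hitl_checkpoints: list = field(default_factory=list)  # where humans review or override
    monitoring_plan: str = ""     # drift metrics and review cadence
    change_log: list = field(default_factory=list)        # logged model, prompt, or data changes

# Hypothetical example: a pharmacovigilance case-triage assistant
record = ValidationRecord(
    context_of_use="Prioritize incoming adverse-event reports for human review",
    risk_level="high",
    evaluation_design="Golden dataset of expert-adjudicated historical cases",
    acceptance_criteria={"recall_on_serious_cases": 0.98},
    hitl_checkpoints=["Every 'serious' classification confirmed by a PV specialist"],
)
```

The value of the structure is traceability: monitoring thresholds and change-control triggers downstream all point back to the documented context of use and risk level.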
Layer 3: Operational Controls
Architecture alone is not enough. Regulators do not audit concepts; they audit controls.
Operational controls turn validation architecture into practical governance mechanisms.
These include:
human oversight structures
drift monitoring (a minimal sketch follows the links below)
lifecycle validation plans
inspection-ready documentation
https://www.kaylabritt.com/blog-1-1/data-drift-a-risk-based-and-gamp-aligned-approach
https://www.kaylabritt.com/blog-1-1/human-in-the-loop-liability-still-in-play
https://www.kaylabritt.com/blog-1-1/ai-trust-is-not-a-feeling-its-a-validation-strategy
https://www.kaylabritt.com/blog-1-1/the-guidance-says-what-the-next-12-articles-show-how
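To show what "drift monitoring" can look like in practice, here is a minimal sketch of a Population Stability Index (PSI) check, one common way to compare a production input distribution against the distribution the system was validated on. The thresholds in the docstring are industry heuristics, not regulatory requirements, and the function is my own illustration.

```python
import numpy as np

def population_stability_index(reference, current, n_bins=10):
    """PSI between the validation-time (reference) distribution and the
    current production distribution of one feature or model score.
    Common heuristics: < 0.10 stable, 0.10-0.25 investigate, > 0.25 likely drift.
    """
    # Bin edges come from the reference distribution (quantile bins),
    # widened so out-of-range production values are still counted.
    edges = np.unique(np.quantile(reference, np.linspace(0, 1, n_bins + 1)))
    edges[0], edges[-1] = -np.inf, np.inf

    ref_counts, _ = np.histogram(reference, bins=edges)
    cur_counts, _ = np.histogram(current, bins=edges)

    # Proportions, clipped away from zero to keep the log term finite.
    eps = 1e-6
    ref_pct = np.clip(ref_counts / ref_counts.sum(), eps, None)
    cur_pct = np.clip(cur_counts / cur_counts.sum(), eps, None)

    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

# Example: compare model scores captured at validation vs. scores seen this month
rng = np.random.default_rng(0)
validation_scores = rng.beta(8, 2, size=5000)   # stand-in for historical scores
production_scores = rng.beta(6, 2, size=5000)   # stand-in for current scores
print(f"PSI = {population_stability_index(validation_scores, production_scores):.3f}")
```

The control is not the number itself but the pre-declared response: a PSI breach should route the system into the monitoring and change-control steps of the lifecycle, not into an ad-hoc fix.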
Layer 4: Failure Modes & Evaluation
Validation begins with a simple premise:
You cannot validate a system until you understand how it fails. Identifying failure modes comprehensively requires a multidisciplinary approach. Scientific workflows fail in two fundamental ways: either the system is fed the wrong evidence, or it draws the wrong conclusion from the evidence. When both occur together, the result is compounded system risk.
In an AI-enabled workflow, evidence-generation failures sit upstream of the AI layer, in data governance, while evidence-interpretation failures sit within AI validation itself (a failure-register sketch follows the two lists below).
In AI-enabled life-science workflows, evidence failure includes:
assay variability
poor provenance
mislabeled samples
cohort bias
missing metadata
batch effects
nonrepresentative training data
Inference failure includes:
wrong model choice
overfitting
unsupported extrapolation
hallucinated LLM output
prompt fragility
weak acceptance criteria
automating a task that still requires expert judgment
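One way to operationalize this two-way taxonomy is an FMEA-style failure register that forces every identified mode into one of the two categories and pairs it with the control expected to detect it. The sketch below is my own illustration, assuming a Python environment; it is not a format drawn from any of the guidance documents above.

```python
from dataclasses import dataclass
from enum import Enum

class FailureCategory(Enum):
    EVIDENCE = "evidence"     # wrong evidence fed in (upstream data governance)
    INFERENCE = "inference"   # wrong conclusion drawn (AI validation scope)

@dataclass
class FailureMode:
    description: str
    category: FailureCategory
    severity: int             # 1 (negligible) to 5 (critical), per the risk assessment
    detection_control: str    # the operational control expected to catch it

register = [
    FailureMode("Mislabeled samples in training data", FailureCategory.EVIDENCE, 4,
                "Provenance review and label audit before training"),
    FailureMode("Hallucinated LLM output in case narrative", FailureCategory.INFERENCE, 5,
                "Golden-dataset evaluation plus HITL review of outputs"),
]

# Compounded system risk: both failure categories present in the same workflow
if {fm.category for fm in register} == set(FailureCategory):
    print("Evidence and inference failures co-occur: treat as compounded risk.")
```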
This is where evaluation design and golden datasets become critical; a minimal evaluation-harness sketch follows the links below.
https://www.kaylabritt.com/blog-1-1/validation-for-llms-an-interdisciplinary-perspective
https://www.kaylabritt.com/blog-1-1/fit-for-purpose-llms-why-it-matters
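As a minimal sketch of what a golden-dataset harness can look like: expert-adjudicated cases with frozen expected labels, scored against an acceptance criterion declared in advance. The schema and the 0.95 threshold are illustrative assumptions, not values taken from any guidance.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class GoldenCase:
    """One expert-adjudicated example; ground truth is frozen before any evaluation run."""
    case_id: str
    input_text: str
    expected_label: str

def evaluate(system: Callable[[str], str],
             golden_set: List[GoldenCase],
             acceptance_threshold: float = 0.95) -> Tuple[bool, float, list]:
    """Score the system under test against the golden dataset.

    Returns (passed, accuracy, failures). Failures are routed to human review
    so each miss can be classified as an evidence or an inference problem."""
    failures = []
    for case in golden_set:
        predicted = system(case.input_text)
        if predicted != case.expected_label:
            failures.append((case.case_id, case.expected_label, predicted))
    accuracy = 1.0 - len(failures) / len(golden_set)
    return accuracy >= acceptance_threshold, accuracy, failures
```

The design point is that the threshold belongs to the lifecycle's Acceptance Criteria step: it is fixed before evaluation, so the run produces a pass/fail decision rather than a post-hoc judgment.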
Layer 5: Worked Examples
This final layer translates architecture and controls into real use cases.
Early examples include:
pharmacovigilance case processing
deviation categorization workflows
The next phase of this work expands into more complex systems:
agentic workflows
multimodal models
vendor qualification
human performance validation
The Transition to the Next Phase
The foundation is now in place.
The next phase of this work will stress-test the architecture against those increasingly complex systems and publish full validation case studies for each.
Because in regulated science, the real question is never simply whether AI works.
It is whether we can stand behind the evidence it produces.