
Regulated, Interconnected, Stalled: What's Blocking AI Projects in Five Industries

regulated-industries, ai-strategy, governance, communication, open-source, sbom, model-validation, enterprise-ai
In regulated industries in 2026, AI rarely fails on missing technology. The harder blockers sit upstream: a missing shared language between engineering and compliance, a missing binding standard for "good enough", and a missing mandate to settle these questions across functions.
Alexander C. S. Hendorf moderating the Problem Clinic "Python in Regulated Environments - What Works, What Doesn't" at PyCon DE & PyData 2026, Darmstadt

Twenty practitioners from five regulated industries - banking, pharma, medical-product development, healthcare IT providers, critical infrastructure - spoke for one hour under the Chatham House Rule. Not for the stage, not for a recording, not for slides. About the points where their programmes actually get stuck. That confidentiality was decisive: without it, many statements would have stayed in the usual conference register. Instead, this wasn't about curated success stories but about the operational bottlenecks behind the programmes.

The core observation was unambiguous: the industries differ in their rulebooks, but not in their pattern of blockers. Across all five themes, the dominant blocker was almost never the tooling itself, but a combination of three organisational bottlenecks - a missing shared language between engineering and compliance, a missing binding standard for "good enough", and a missing mandate to decide across functions.

At a Glance

- Language: engineering and compliance lack a shared vocabulary
- Threshold: no operational standard for "good enough"
- Mandate: nobody is allowed to decide across functions
- SBOMs: supply-chain accountability has no owner
- Open source: misread as a deficit instead of a structural advantage
- LLM validation: the regulatory gap remains unresolved

Language: Engineering and Compliance Talk Past Each Other

In nearly every account, the same fracture point surfaced: engineering does not understand the vocabulary of compliance, compliance does not understand the vocabulary of engineering, and nobody translates. This isn't a soft-skill problem; what's missing is a shared technical language, which in most houses simply doesn't exist. The result is that requirements get formulated past each other, acceptance criteria stay vague, and each side experiences the other as a brake rather than a partner.

A second observation in the same vein, also from the room: hardly any tech conference has a communication track. There are tracks for architecture, for ML, for security, for cloud, for DevOps. But not for the one discipline at which most regulated programmes actually fail in day-to-day work.

What that means in practice: as long as a regulated AI programme has no shared language between engineering and compliance, every architectural debate is a proxy fight - and every delay costs more than the underlying problem.

Threshold: "What Would Actually Be Good Enough?"

One voice in the room put it as a question that has not left my head since: "What would actually be good enough?" In many corporates, that question has no clear owner. When compliance can't name the threshold operationally and engineering isn't allowed to ask for it, every architectural decision becomes a negotiation without a scale. The consequence isn't a bar set too high or too low; it's arbitrariness - what gets accepted today is insufficient tomorrow, and nobody can say why.

The problem rarely lies in nobody wanting quality. It lies in the fact that quality doesn't get translated into decidable criteria. "Safe", "robust", "auditable" or "compliant" are valid as goals, but for a development team they're not sufficient. A team needs the answer to the operational question: how do we know that this solution is good enough for this context?

What that means in practice: without an operationally formulated "good enough" - measured, shared, documented - even the best stack can't be cleared for release. The bottleneck sits before the code, not inside it.
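What "measured, shared, documented" can look like is a release criterion expressed as code rather than as prose. The sketch below is purely illustrative - the service, the metric names and every threshold are invented placeholders, not a binding standard for any industry:

```python
# A minimal sketch of "good enough" made decidable, assuming a hypothetical
# document-extraction service. All names and thresholds are illustrative
# placeholders to be negotiated between engineering and compliance.

from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseCriteria:
    """Operational release thresholds, agreed between engineering and
    compliance, versioned alongside the code they govern."""
    min_extraction_accuracy: float   # measured on a shared eval set
    max_p95_latency_ms: float        # measured under production-like load
    max_unexplained_failures: int    # per 10,000 requests, with audit trail


# Illustrative values - the point is that they are written down, not guessed.
CRITERIA_V1 = ReleaseCriteria(
    min_extraction_accuracy=0.97,
    max_p95_latency_ms=800.0,
    max_unexplained_failures=3,
)


def release_decision(accuracy: float, p95_ms: float, unexplained: int,
                     criteria: ReleaseCriteria = CRITERIA_V1) -> dict:
    """Return a decidable, documentable verdict instead of 'maybe'."""
    checks = {
        "accuracy": accuracy >= criteria.min_extraction_accuracy,
        "latency": p95_ms <= criteria.max_p95_latency_ms,
        "unexplained_failures": unexplained <= criteria.max_unexplained_failures,
    }
    return {"approved": all(checks.values()), "checks": checks}
```

The design choice matters more than the numbers: once the threshold lives in a versioned artefact, "what gets accepted today" can no longer silently change tomorrow.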

Mandate: Who Has Standing to Negotiate Between the Functions?

The same anchor came up several times in the room: innovation teams as an institutionalised bridgehead inside regulated corporates. They are neither pure engineering nor pure strategy. What matters is not their name, but their mandate: they are allowed to translate, negotiate and escalate between engineering, compliance, QA, security and senior leadership with binding force.

Where these structures exist, new stacks make it through conservative IT setups. Where they don't, Python stays in the sandbox - for organisational reasons, not technical ones. Then every team can do good local work and still fail at the handover into productive responsibility, because nobody is allowed to carry the decision across functional boundaries.

What that means in practice: a mandated bridge function is the precondition for language and threshold to be negotiated at all. Without it, both bottlenecks remain structurally unresolvable.

SBOMs and Sub-Dependencies - Who Owns the Supply Chain?

An SBOM (Software Bill of Materials) is the ingredients list of a software product: every library, every module, every third-party component that ends up in the running system. Sub-dependencies are the ingredients of those ingredients - software is built on software, often five or ten layers deep. Anyone signing off in a regulated industry that a system is secure and auditable has to know not only what is directly included, but also what those included components themselves bring along. A vulnerability three layers down is not a footnote; it falls under the same accountability.

The discussion in the room was not about tools - those exist - but about accountability and supply chain. Who maintains the SBOM? Who escalates a critical vulnerability in a sub-sub-dependency? Who carries the risk when a supplier quietly abandons a component, an acquisition rewrites the roadmap, or licence terms tighten mid-lifecycle?

What that means in practice: as long as SBOM stewardship is treated as a tooling question rather than an accountability and escalation question, audit safety is only formal, not operational.
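To make "five or ten layers deep" concrete: the dependency graph in a CycloneDX-style SBOM can be walked to show how far below the product each component sits. The tiny inline SBOM below is invented for illustration; real SBOMs come from tooling, and the open question stays who acts on what such a walk surfaces:

```python
# A minimal sketch: computing how many layers below the root each
# component in a CycloneDX-style dependency graph sits. The inline
# SBOM is a hypothetical example, not a real product inventory.

from collections import deque

sbom = {
    "bomFormat": "CycloneDX",
    "dependencies": [
        {"ref": "app", "dependsOn": ["requests"]},
        {"ref": "requests", "dependsOn": ["urllib3", "certifi"]},
        {"ref": "urllib3", "dependsOn": []},
        {"ref": "certifi", "dependsOn": []},
    ],
}


def dependency_depths(sbom: dict, root: str) -> dict:
    """Breadth-first walk over the SBOM dependency graph.

    Returns, for each reachable component, its shortest distance from
    the root. Depth >= 2 marks a sub-dependency that a sign-off on the
    product still has to cover.
    """
    graph = {d["ref"]: d.get("dependsOn", []) for d in sbom["dependencies"]}
    depths, queue = {root: 0}, deque([root])
    while queue:
        ref = queue.popleft()
        for child in graph.get(ref, []):
            if child not in depths:  # keep the shortest-path depth
                depths[child] = depths[ref] + 1
                queue.append(child)
    return depths
```

In this toy graph a vulnerability in `urllib3` sits two layers below the application, yet falls under exactly the same accountability as a direct dependency - which is the point of the accountability question above.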

Open Source - The Pharma Question

One of the most striking moments of the discussion came out of the pharma context: the perception that "free" means somebody is stealing on our behalf. It would be tempting to dismiss this as an isolated anecdote. It is, in fact, a widely held picture - and in regulated corporates a consequential one, because it frames open source as a deficit rather than as a structural advantage.

Two points are systematically overlooked. First, open-source licences like MIT, Apache 2.0 or BSD are irrevocable. A version released under that licence today is still under that licence tomorrow - no supplier can take it back, no acquisition can change it, no "strategic pivot" can void it. With proprietary components, exactly that is a standard risk. Second, open source cannot raise its price. With proprietary stacks, the next licence round, the next acquisition, the next "repricing" is a fixed part of the TCO reality - with open software, that lever simply does not exist.

Auditability, sovereignty, vendor independence, total cost of ownership - on every one of these axes, open software is the more sustainable foundation in regulated corporates, not the budget version. The longer argument is carried by the open-stack piece in this series, Stop Waiting, Start Shipping.

What that means in practice: in 2026, open source is, for the overwhelming majority of enterprise workloads, the regulatorily and economically more grown-up choice. Setting this picture straight in regulated industries corrects a misconception that produces real costs in investment and compliance decisions.

LLM Validation - The Unresolved Question

LLM validation, at its core, means demonstrating that a language model inside a productive, regulated process is reliable enough to carry the accountability that any other causal piece of software in that context would carry. Classical validation expects causal behaviour, deterministic answers, reproducible tests. An LLM delivers none of those in the strict sense - and that is precisely where evaluation begins as an operational discipline.

Alexander C. S. Hendorf in discussion with practitioners from five regulated industries during the Problem Clinic, PyCon DE & PyData 2026, Darmstadt

There was a notable silence in the room. To the question of whether anyone is running an LLM as a core component of a productive system under GxP or comparable validation, no serious confirmation came back. Selective LLM use, yes - document extraction, pre-classification, helper functions. As a core component of a life-adjacent or financially critical process, validated by the standard that would apply to any other causal piece of software: no.

That is not a failure of the practitioners. It is the regulatory gap. The transitional question is whether existing risk frameworks from the pharma world - hit-rate models, statistical acceptance under oversight, continuous post-market surveillance - can be transferred to software components. There are precedents on that side. They have not, in 2026, been systematically translated to the software side.

What that means in practice: anyone deploying an LLM in productive responsibility needs an eval pipeline that measures daily, not one that gets reached for once at release. The shift from validation as a phase to evaluation as architecture is the only bridge that holds at the moment.
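"Evaluation as architecture" can be sketched in a few lines: a fixed, versioned eval set, scored on a schedule, compared against a documented gate. Everything here is an assumption for illustration - `classify_document` stands in for whatever model call is actually in production, and both the eval cases and the 0.95 gate are invented:

```python
# A minimal sketch of a scheduled eval gate. All names, cases and
# thresholds are illustrative assumptions, not a validation standard.

def classify_document(text: str) -> str:
    # Placeholder for the productive model call (e.g. an LLM behind an API).
    return "invoice" if "invoice" in text.lower() else "other"


EVAL_SET = [  # versioned, reviewed, never used for training
    ("Invoice #4711 for Q3 services", "invoice"),
    ("Meeting notes from Tuesday", "other"),
    ("INVOICE - amount due EUR 1,200", "invoice"),
    ("Holiday request form", "other"),
]

ACCURACY_GATE = 0.95  # the documented "good enough" for this context


def run_daily_eval(model=classify_document, eval_set=EVAL_SET) -> dict:
    """Score the model against the fixed eval set and report pass/fail.

    In production this would run on a scheduler, persist every result
    for the audit trail, and alert on regression - daily, not once at
    release.
    """
    hits = sum(model(text) == label for text, label in eval_set)
    accuracy = hits / len(eval_set)
    return {
        "accuracy": accuracy,
        "passed": accuracy >= ACCURACY_GATE,
        "n_cases": len(eval_set),
    }
```

The shift the section describes is visible in the shape of the code: the gate is not a milestone in a project plan but a function that runs every day against the same measured standard.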

For Decision-Makers

First: Mandate the Bridge Function

My recommendation: build a mandated bridge function before the next AI programme begins - an innovation unit with its own mandate, its own budget, direct access to senior leadership. Give it three explicit rights: escalating goal conflicts, documenting release criteria, and preparing decisions all the way to senior leadership.

Second: "Maybe" Is Not an Answer

My recommendation: make "what would be good enough?" a standard instrument in every compliance escalation, in two stages - which standard formally applies today, and how it would be operationally measured for this specific system. "Maybe", "it depends" or "we're still reviewing it" are not answers to this question; they are the signal that the requirement is not yet decision-ready.

Third: Translation as a Hiring and Training Discipline

My recommendation: treat translation between technical and regulatory language as a discipline of its own in hiring and development plans - not as a soft skill, but as an operational precondition for delivery. Concretely: role profiles that name translation competence; a shared glossary for the terms decisions hang on; and, in every larger programme, a named individual who owns this translation.

In regulated environments the hardest blockers are rarely technical. Let's talk about where your programme stands - and which of the three bottlenecks (language, threshold, mandate) is the deciding one for you.


Related links

  1. PyCon DE & PyData 2026 - Conference Programme (April 2026)
  2. Problem Clinic: Python in Regulated Environments - What Works, What Doesn't (PyCon DE & PyData 2026, 2026-04-16)

Observations from the Problem Clinic at PyCon DE & PyData 2026 (moderation: Alexander C. S. Hendorf). The session was held under the Chatham House Rule; all statements are aggregated and anonymised. No person and no company is identifiable in this piece - deliberately, because the strength of the format lies precisely in that confidentiality.