
Overfitted Promises: AI in Coding Research – Hype vs. Evolution

AI Coding · Hype vs Reality · Newton · Alchemy · Development Tools
Title slide: Overfitted Promises: Underestimated Futures - AI in Coding Research

As a technology leader, what can you actually rely on? What works with AI coding tools, what doesn't, and what does that mean for your engineering organisation?

At a Glance


For context: This piece is an opinionated vision piece, not a playbook. The technology moves so fast that it would be dishonest to lay down definitive best practices here. What you are reading is a hypothesis — and above all an argument for developing fresh evaluation frames, instead of relabelling existing software engineering routines as "and now with AI". The tools, the language, the use cases and the boardroom expectations are new — the lens we use to judge them ought to be new too. Anyone reading this critically and thinking along is in the right place. Anyone expecting a finished roadmap will be left dissatisfied — that is by design.

At neurons&neckar 2025 I put forward a provocative thesis: today's AI coding tools resemble alchemy more than modern science. But this is neither pure criticism nor hype. It is a sober stocktaking — and a look at how your engineering organisation should deal with it.

Newton: Between Alchemy and Revolution

Isaac Newton wrote more than a million words on alchemy — most of them unpublished in his lifetime. He was searching for the philosopher's stone, hoping to transmute metals and attain immortality. He experimented with "Diana's tree" and believed metals could "grow" and had life-like properties.

The verdict: he missed his goal. The consequences, however, were enormous. His alchemical experiments were methodical and meticulously documented. They led to scientific breakthroughs in chemistry, optics and thermodynamics. Newton needed alchemy in order to invent science.

That is the pattern I see today around AI coding tools: trial and error with an invisible mechanism — but not without value.

The Parallel: AI Coding Today

Slide: prompting as a magical incantation
Slide: Newton the alchemist

The problem I observe in companies: they roll out GitHub Copilot or Cursor, expect a 30 percent productivity jump, and are then puzzled when reality turns out to be more complex.

Today's AI coding tools work like alchemy, not like established science: trial and error with an invisible mechanism, prompts that read like incantations, results that are hard to reproduce reliably.

This is not meant unkindly. It describes today's reality: AI coding tools are not yet predictable enough to integrate blindly into critical systems. We trust compilers too (program code → machine code), but those are strictly deterministic. AI coding is a new layer that we still have to learn how to handle and how to fence in.

But there is a pattern. And once we understand the pattern — once we know where these tools work and where they don't — we can deploy them deliberately.

Where AI Coding Tools Really Work — and Where They Don't

The good news: there is a pattern. Not every task is the same.

Where they consistently deliver

- Boilerplate and scaffolding code
- API integrations and glue code
- Simple, recurring patterns and routine bug fixes

Where it gets alchemical — and you need to pay attention

- Complex domain logic specific to your business
- Critical systems: security, payment processing, core logic
- Novel approaches with no established precedent in your codebase

This is not a defect of the tools. It is their boundary. The question for you as a technology leader is: which of these tasks dominate inside my team?

A Strategic Example: The Agentic AI Problem

A second pattern shows up around AI agents (extended discussion in the post "Agentic AI"). Automated systems meant to act on their own need clear guard-rails. It is the same problem: humans have to define the limits of where the AI is allowed to act and where it is not. The AI itself will not get this right without human guidance.

The same applies to code generation: an AI cannot decide for itself whether generated code is "good enough" for production. It can iterate quickly, but the final call — the critical call — remains with humans.

The Evolution: From Alchemy to Science

Newton needed 50 years to get from alchemy to science. We will be faster. But we are not there yet.

The direction is clear: from alchemy towards science, from trial and error towards predictable, measurable engineering.

This transition is your responsibility as a leader, not the tool vendors'. You have to plan today when and how you adopt these tools — and where you don't.

What This Means Strategically: Four Heuristics for the Transition

These four points are not best practices — the technology is too young, the material to distil best practices from is not there. They are heuristics that hold up in today's phase and deserve to be reassessed in twelve months.

1. Map, don't evangelise

Before you adopt AI coding tools, you have to know: which tasks dominate in our work? Are they 70 percent boilerplate (large opportunity) or 70 percent complex domain logic (limited opportunity)? No one-size-fits-all — different teams need different strategies.

2. Start with small pilots

Begin with a team that does a lot of boilerplate and bug-fix work. Measure: what becomes faster, what doesn't? Which bugs appear? Where do you need more reviews? Then scale only if the balance is positive.
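Measuring rather than guessing is the point of a pilot. A minimal sketch of what such a pilot scorecard could look like, assuming you track a few delivery metrics per team before and during the pilot; all metric names and numbers here are illustrative, not real data:

```python
def relative_change(before: float, after: float) -> float:
    """Relative change versus the pre-pilot baseline (negative = faster / fewer)."""
    return (after - before) / before

# Hypothetical baselines vs. pilot-phase values for one team:
metrics = {
    "cycle_time_hours": (38.0, 29.0),  # first commit to merge
    "review_rounds":    (1.8, 2.4),    # AI-generated code may need more review
    "escaped_defects":  (5.0, 6.0),    # bugs found after release
}

for name, (before, after) in metrics.items():
    print(f"{name}: {relative_change(before, after):+.0%}")
```

The point of the sketch is the shape of the decision, not the numbers: a pilot can make cycle time drop while review effort and escaped defects rise, and only the overall balance tells you whether to scale.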

3. Quality gates are not optional

AI-generated code needs reviews as rigorous as any other code, only with a different focus. Not "does it work?", but "is the approach safe?" and "have we done it like this before, or is this new?". These reviews belong with your architects, not your junior developers.

4. Fence in the risk

Some codebases are too critical for AI experiments. Security, payment processing, core logic — AI-generated code does not belong there until the tools are more mature. That is not technophobia; it is risk management.
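Fencing in the risk can be enforced mechanically rather than by policy memo. One sketch of how, using GitHub's CODEOWNERS mechanism (the directory names and team handle below are hypothetical): any change touching the fenced-in paths requires approval from the architects' team, regardless of whether a human or an AI tool wrote the diff.

```
# .github/CODEOWNERS (hypothetical paths and team names)
# Changes under these directories always require architect approval.
/payments/   @acme/architects
/security/   @acme/architects
/core/       @acme/architects
```

This does not stop anyone from experimenting elsewhere in the codebase; it only guarantees that the critical zones named above never merge AI-generated code without a senior human decision.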

The Uncomfortable Truth — and the Opportunity

Today's AI coding tools are overfitted to their current applications. That means: they perform brilliantly inside their training corridor (boilerplate, APIs, simple patterns) and less well the further you move away from it.

This is not malicious. This is physics.

But: Newton's alchemy was also "overfitted" to the transmutation of metals. And yet it led to modern chemistry, to thermodynamics, to optics. Not because the original question was right, but because he experimented and learned systematically.

That is what is happening with AI coding today. We are in the "alchemical" phase. We are in the middle of discovery. The companies that understand this now — that encourage their teams to experiment systematically, that introduce quality gates, that learn where these tools belong and where they don't — will reap the benefits when the science arrives.

The others will later wonder why they lost so much time.

Conclusion: Tell Hype from Opportunity

I often hear in boardrooms: "We have to invest in AI" or "Everyone's already using Copilot." Both are symptoms of disorientation amid the hype. The right question is not "do we need AI coding?" but "where does AI coding pay off economically in our organisation, and where does it add risk?"

The answer is different at every company. And if you are unsure — if you do not know how to make that assessment, which tools fit you, how to roll them out without creating chaos — that is exactly what I am here for.

Not every AI promise is empty hype. Some are real opportunities. The difference is strategy, and the courage to look at the new with fresh eyes instead of forcing it into the old lens.

Unsure which AI coding tools actually pay off in your organisation?

Let's talk
This article is based on my talk at neurons&neckar 2025 and on practical experience with AI coding tools in open-source projects.