DEV Community

From Vibe Coding to Spec-Driven Development: Tasking AI with Spec Kit

Mustafa ERBAY on June 15, 2026

In the world of software development, especially in rapid prototyp...


Beatriz Albernaz • Jun 15

One thing I'd add from a security perspective: spec-driven development also narrows the attack surface. When AI generates code from a well-defined spec, the behavior is bounded and auditable. With vibe coding, the AI fills in gaps with assumptions, and those assumptions are often where vulnerabilities hide. Business logic flaws, missing authorization checks, and unexpected state transitions are exactly the kinds of issues that don't show up in automated scanners but emerge when you're working without a spec.

The tighter the spec, the less room for the model to improvise in dangerous directions.

Mustafa ERBAY • Jun 15

That’s a great point.

One thing I’ve noticed is that AI rarely invents vulnerabilities out of nowhere; it usually invents assumptions.

When authorization rules, state transitions, validation requirements, or business constraints are missing from the specification, the model fills those gaps using patterns it has seen elsewhere. Sometimes those assumptions are reasonable, sometimes they’re dangerous.

In that sense, specifications are not only a development artifact but also a security control. The clearer the boundaries, the less opportunity there is for both developers and AI systems to introduce unintended behavior.
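To make the "spec as a security control" idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: the roles, the actions, and the `SPEC_PERMISSIONS` table are invented for illustration and are not part of Spec Kit. It assumes the spec lists authorization decisions explicitly, refuses anything the spec never mentions, and reports the undecided pairs, which are the gaps a model would otherwise fill with a guess.

```python
# Hypothetical authorization rules, as they might be written down in a spec.
SPEC_PERMISSIONS = {
    ("admin", "delete_invoice"): True,
    ("accountant", "view_invoice"): True,
    ("accountant", "delete_invoice"): False,
}


def is_allowed(role: str, action: str) -> bool:
    """Deny by default: anything the spec does not mention is refused."""
    return SPEC_PERMISSIONS.get((role, action), False)


def unspecified(roles, actions):
    """Return role/action pairs the spec never decided."""
    return [
        (role, action)
        for role in roles
        for action in actions
        if (role, action) not in SPEC_PERMISSIONS
    ]


gaps = unspecified(["admin", "accountant"], ["view_invoice", "delete_invoice"])
print(gaps)
```

Reviewing the `gaps` list is a cheap way to review the specification itself rather than only the generated code.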

I suspect that as AI-generated code becomes more common, security reviews will increasingly focus on the quality of the specification itself, not just the generated code.

Thanks for adding the security perspective.

Beatriz Albernaz • Jun 15

The problem is that when the spec doesn't specify, the model borrows from whatever context seemed similar during training.
It also means pentesting will need to evolve. Most automated scanners test behaviour, not intent. They won't catch a feature that works exactly as implemented but violates what the spec actually required.

That's actually part of what we're building at Faultline Security: manual pentesting and AI red teaming that looks at the gap between intended behaviour and actual behaviour, especially in AI-native products. If you're ever working on something where that gap matters, happy to take a look! We're early stage and selectively taking on engagements right now ;)

Mustafa ERBAY • Jun 15

Exactly. That gap between intended behavior and implemented behavior is where many of the dangerous issues live.

A system can pass automated tests, return the expected status codes, and still violate the actual business rule. That is especially risky with AI-generated code, because the implementation may look clean while quietly encoding the wrong assumption.
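A small Python sketch of that failure mode, under stated assumptions: the `refund` handler, the `Order` type, and the business rule ("total refunds never exceed the amount paid") are all made up for illustration. The handler looks clean and returns the expected status code, yet it quietly encodes the assumption that any positive amount is refundable.

```python
from dataclasses import dataclass


@dataclass
class Order:
    paid: int          # amount paid, in cents
    refunded: int = 0  # amount already refunded, in cents


def refund(order: Order, amount: int) -> int:
    """Hypothetical handler returning an HTTP-style status code."""
    if amount <= 0:
        return 400
    # Wrong assumption: nothing checks the amount against what was paid.
    order.refunded += amount
    return 200


def behavior_check(order: Order, amount: int) -> bool:
    """Checks only the status code, as a shallow automated test might."""
    return refund(order, amount) == 200


def intent_check(order: Order) -> bool:
    """Checks the assumed business rule: refunds never exceed the amount paid."""
    return order.refunded <= order.paid


order = Order(paid=1000)
status_ok = behavior_check(order, 1500)  # the behavior test passes
rule_ok = intent_check(order)            # the business rule is violated
print(status_ok, rule_ok)
```

The behavior check is green while the intent check is red, which is the gap a scanner that only tests behavior would miss.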

I like the way you framed pentesting around intent, not only behavior. In AI-native products, reviewing the specification, the generated implementation, and the model’s assumptions together will probably become a core security practice.

Faultline Security sounds interesting. I’m currently exploring this space from the architecture and AI workflow side, so I’d be happy to follow what you’re building and exchange ideas.

Adam Lewis • Jun 17

Agree on the direction, with one caveat: the spec only helps if it can fail the work. Spec Kit gives you the structure, but what matters is whether the acceptance criteria are concrete enough for the agent to run them and come back red. A spec that only reads as a tidy description, with nothing in it you can execute, is still the vibe, just written down.

Mustafa ERBAY • Jun 18

That’s an important distinction.

If the same agent writes the implementation, defines the acceptance criteria, and evaluates the result, we risk creating a closed validation loop. The system is effectively grading its own homework.

What I’ve started thinking about is separating generation from verification as independent concerns. The implementation can be AI-generated, but the acceptance criteria should ideally originate from the specification itself or from an earlier, immutable step in the workflow. Otherwise, the agent isn’t proving correctness against requirements; it’s proving consistency with its own assumptions.

The more autonomy we give agents, the more valuable independent verification becomes.

Mustafa ERBAY • Jun 17

That’s a very good caveat, and I completely agree.

A specification that cannot fail the implementation is mostly documentation, not a real control mechanism.

For AI-assisted development, I think the spec needs to become executable in some form: acceptance criteria, contract tests, validation rules, state transition checks, or CI gates. Otherwise, we are just moving the “vibe” from the code into a nicer-looking document.

The real value appears when the agent can read the spec, generate the work, run the checks, and get a clear red or green result.
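As one possible shape for a spec with teeth, here is a minimal Python sketch of a state transition check. The states and the `ALLOWED` table are hypothetical and stand in for whatever the real spec defines; the point is only that the check can be run against an observed trace and comes back red or green.

```python
# Hypothetical spec: the only state transitions an order may make.
ALLOWED = {
    "draft": {"submitted"},
    "submitted": {"approved", "rejected"},
    "approved": {"shipped"},
    "rejected": set(),
    "shipped": set(),
}


def check_transitions(trace):
    """Return every consecutive (source, target) pair the spec does not allow."""
    violations = []
    for src, dst in zip(trace, trace[1:]):
        if dst not in ALLOWED.get(src, set()):
            violations.append((src, dst))
    return violations


def verdict(trace) -> str:
    """Collapse the check into the red/green result an agent or CI gate can act on."""
    return "red" if check_transitions(trace) else "green"


good_trace = ["draft", "submitted", "approved", "shipped"]
bad_trace = ["draft", "approved", "shipped"]  # skips the submission step

print(verdict(good_trace), verdict(bad_trace))
```

The same table could feed a contract test or a CI gate; what matters is that it can fail the work.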

So yes, Spec → Generate → Verify only works if the spec has teeth.

Thanks for pointing that out.

Adam Lewis • Jun 18

Agreed. The one thing I'd add is to watch who writes the checks: if the agent writes the code and its own acceptance tests in one pass, they tend to pass trivially. The result is easier to trust when the criteria are pinned before the code, or come from a different step.
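One way to pin the criteria before the code, sketched in Python. This is an assumption about how a workflow could do it, not something Spec Kit is known to provide: the criteria text is hashed at an earlier step, and the verification step refuses to trust criteria whose digest no longer matches. The function names and the sample criteria are invented for illustration.

```python
import hashlib


def pin(criteria_text: str) -> str:
    """Compute a digest of the acceptance criteria at the time they are agreed."""
    return hashlib.sha256(criteria_text.encode("utf-8")).hexdigest()


def criteria_unchanged(criteria_text: str, pinned_digest: str) -> bool:
    """Return True only if the criteria are byte-for-byte what was pinned."""
    return pin(criteria_text) == pinned_digest


# Step 1 (before any code is generated): pin the criteria.
criteria = "Total refunds must never exceed the amount paid.\n"
pinned = pin(criteria)

# Step 2 (after generation): check that nobody softened the criteria.
tampered = criteria.replace("never", "rarely")
print(criteria_unchanged(criteria, pinned), criteria_unchanged(tampered, pinned))
```

Storing the digest somewhere the generating agent cannot write to keeps the agent from grading its own homework.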

Joaquinriosheredia • Jun 15

Great article. The connection between vibe coding and runtime anti-patterns is something I've been working on from a different angle.
I built java-vibe-guard specifically to detect the Spring Boot / Java runtime failures that AI-generated code tends to introduce — things like @Transactional holding DB connections while waiting on async work, or .block() in reactive contexts. Patterns that pass CI but fail under real concurrency.
The interesting part: static detection alone wasn't convincing enough. So I added --verify — it reproduces the failure locally using Testcontainers so you can observe the phenomenon, not just read a warning.
Spec-driven development prevents the problem at the source. Evidence-based verification catches what slips through anyway.
📦 github.com/Joaquinriosheredia/java-vibe-guard
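For readers outside the Spring world, the first pattern can be illustrated with a rough Python analogy. This is not Spring code and not how java-vibe-guard works; it is a sketch that assumes a pool of two connections (modelled as an `asyncio.Semaphore`) and six concurrent requests, with all names and numbers invented. Holding a connection while waiting on slow async work serializes the requests, while releasing it first does not.

```python
import asyncio
import time

POOL_SIZE = 2             # hypothetical connection pool size
REQUESTS = 6              # concurrent requests
SLOW_WORK_SECONDS = 0.05  # stand-in for slow async work


async def handle(pool: asyncio.Semaphore, hold_during_wait: bool) -> None:
    if hold_during_wait:
        # Anti-pattern: the connection stays checked out during the slow wait.
        async with pool:
            await asyncio.sleep(SLOW_WORK_SECONDS)
    else:
        # The connection is used briefly, released, and only then do we wait.
        async with pool:
            await asyncio.sleep(0)
        await asyncio.sleep(SLOW_WORK_SECONDS)


async def run(hold_during_wait: bool) -> float:
    """Run all requests concurrently and return the elapsed wall-clock time."""
    pool = asyncio.Semaphore(POOL_SIZE)
    start = time.perf_counter()
    await asyncio.gather(*(handle(pool, hold_during_wait) for _ in range(REQUESTS)))
    return time.perf_counter() - start


held = asyncio.run(run(hold_during_wait=True))
released = asyncio.run(run(hold_during_wait=False))
print(f"held: {held:.3f}s, released: {released:.3f}s")
```

With a single request both variants take about the same time, which is why this kind of problem can pass CI and only show up under real concurrency.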

Mustafa ERBAY • Jun 15

I think this highlights an important shift in the AI era.

For years we focused heavily on syntax correctness, then test correctness. Now we’re increasingly dealing with behavior correctness under real operational conditions.

AI-generated code often passes code review because the implementation looks familiar, but production failures usually emerge from timing, concurrency, transaction boundaries, resource exhaustion, and other runtime realities that are difficult to see statically.

That’s why I like the combination of specification, generation, and verification. The specification defines intent, the AI accelerates implementation, and tools like java-vibe-guard validate whether the implementation survives contact with reality.

The --verify approach is a clever way to bridge that gap. I’ll check out the project.

Joaquinriosheredia • Jun 16

Exactly. "Survives contact with reality" is the right framing.
The gap between passing CI and surviving production is where most of the interesting failures hide. Static analysis closes part of that gap. Reproducible verification closes a bit more.
Thanks for checking it out — feedback from anyone who runs --verify on their own setup would be genuinely useful.

Mustafa ERBAY • Jun 16

I think AI is pushing us toward a new validation stack.

A few years ago, passing compilation was enough for many projects. Then passing automated tests became the baseline. Now we’re reaching a point where even passing tests may not be sufficient if the system fails under realistic operational conditions.

What’s interesting is that AI tends to generate code that looks correct because it follows familiar patterns. The challenge is that production failures rarely come from syntax mistakes; they emerge from timing, concurrency, scale, resource contention, and assumptions that only become visible at runtime.

That’s why I see tools like yours as complementary to Spec-Driven Development rather than alternatives to it.

The spec defines what the system should do.
The AI generates an implementation.
Verification tools test whether the implementation behaves correctly when reality starts applying pressure.

The stronger AI gets, the more valuable that final verification layer becomes.

Joaquinriosheredia • Jun 16

That's a good way to frame the stack.
Spec → Generate → Verify. Each layer catches what the previous one misses.
The more capable the generator, the more important the verifier becomes. That's not a coincidence.

𝕋𝕙𝕖 𝕃𝕒𝕫𝕪 𝔾𝕚𝕣𝕝 • Jun 15

Excellent article. The shift from vibe coding to Spec-Driven Development is a natural evolution for teams that want to scale AI-assisted development beyond prototypes. Spec Kit's structured approach helps align requirements, architecture, and implementation while reducing ambiguity and rework. What stood out most is how it transforms AI from a code generator into a collaborator that operates within clearly defined constraints. As AI agents become more capable, strong specifications will likely become as important as clean code itself. Thanks for sharing this practical perspective on building more reliable AI-driven workflows. ❤️

Mustafa ERBAY • Jun 15

Thank you!

What fascinates me most is that AI is forcing us to revisit old software engineering lessons.

For years, many teams treated specifications as optional and relied on developer intuition. That worked when implementation was expensive and relatively slow. Today, an AI agent can generate an entire feature in minutes, which means ambiguity becomes far more expensive than coding itself.

In a way, Spec-Driven Development isn’t a new idea. AI is simply making the cost of missing specifications much more visible.

Thanks for reading and sharing your thoughts!

𝕋𝕙𝕖 𝕃𝕒𝕫𝕪 𝔾𝕚𝕣𝕝 • Jun 15 • Edited

Your consistency is good Mustafa!! 😊