From Vibe Coding to Spec-Driven Development: Tasking AI with Spec Kit
In the world of software development, especially in rapid prototyping or early-stage projects, we often encounter an approach we can call "vibe coding." This is the practice of writing code based on instinct, on-the-spot decisions, and an "it feels right" sensation, without a specific spec or documentation. While this approach might seem to offer rapid progress initially, it can lead to serious problems as the project scales or the team grows. The code becomes difficult to understand, unexpected bugs emerge, and worst of all, this ambiguity prevents the efficient use of AI (Artificial Intelligence) tools.
Recently, the capabilities of AI models have been increasing at an incredible pace. With techniques like prompt engineering and RAG (Retrieval-Augmented Generation), we can ask AI to perform specific tasks. However, AI also needs an "understanding" similar to ours. If we cannot fully define what a task is, how can we tell AI what it needs to do? This is precisely where Spec-Driven Development (SDD) and tools like Spec Kit come into play. In this article, we will delve into practical ways to move beyond "vibe coding" and task AI using Spec Kit.
The Hidden Costs of Vibe Coding
"Vibe coding" can seem appealing, especially for small teams or solo projects. Building and testing things quickly, bringing ideas to life instantly – it sounds good. However, the long-term costs of this approach are quite high. Bugs that are overlooked initially accumulate over time, creating a massive technical debt. As the codebase becomes complex, even adding a new feature or fixing an existing bug can take months.
One of the most evident consequences of this situation is communication breakdown within the team. Since there is no shared standard or documentation to rely on, understanding the logic behind code written by one developer becomes challenging for others. This makes code reviews inefficient and increases the probability of bugs reaching production. In my own experience, a production ERP system had a persistent issue: the late shipment report was consistently incomplete, and I eventually realized this was a result of "vibe coding." The reporting module had been built by different developers, at different times, and with different assumptions, which made data consistency impossible. Problems like these not only slow down the development process but also directly impact business operations.
⚠️ The Blind Spots of Vibe Coding
The biggest danger of vibe coding is that its initial speed is deceptive. As the project grows or the team expands, the code's understandability, maintainability, and testability are severely compromised. This situation causes us to miss opportunities for automation and optimization that AI could also be involved in.
Spec-Driven Development: Building the System
Spec-Driven Development (SDD) offers an approach that is the exact opposite of "vibe coding." This methodology is based on the principle that everything starts with a specification. Details such as what the code should do, what inputs it will receive, what outputs it will produce, and how it will behave under which conditions are clearly defined before any code is written. These specifications become the primary source of information not only for human developers but also for AI tools.
At the core of SDD is the idea that software is not just code, but also a "contract" that defines a set of rules, expectations, and behaviors. This contract serves as a bridge between developers, test engineers, product managers, and even customers. Specifications help clarify requirements and act as a guide during the development process. For example, if we consider the design of an API endpoint, in the SDD approach, all details such as which HTTP method the endpoint will use (GET, POST, etc.), its URL structure, which query parameters it will accept, what data it expects in the request body, and what responses it will return in success or failure cases are determined in advance. These details also form the basis for test scenarios.
Meet Spec Kit: Structured Specifications
Spec Kit is a tool designed to bring this SDD philosophy to life. It lets you define your specifications in a structured format. These specifications, typically stored in machine-readable formats like YAML or JSON, are both human-readable and processable by various tools. Spec Kit offers a flexible way to define many different structures, such as API definitions, transaction flows, and data models.
For instance, when using Spec Kit for an API definition, you can detail endpoints under paths, the HTTP methods that can be used for each endpoint (get, post, etc.), the parameters they accept (query, path, header, cookie), the request body schema, and the expected responses (responses). Additional information like data type, whether it's required, and a descriptive text can be defined for each parameter or field. This structured approach not only simplifies the development process but also enables processes like automatic documentation generation, test case creation, and even code generation. In the backend of a financial calculator application I developed, I used these structured specifications to design APIs faster and minimize error margins by anticipating unexpected situations.
ℹ️ The Structural Power of Spec Kit
Spec Kit eliminates ambiguity by creating specifications defined according to a specific standard. This structured data provides a clear roadmap for both human developers and AI models.
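To make "processable by various tools" concrete, here is a minimal sketch of a tool reading schema constraints and checking a payload against them. It is plain Python and not Spec Kit's API. The `USER_SCHEMA` dictionary and the `validate` helper are hypothetical names introduced for illustration, and only a few OpenAPI-style keywords (`required`, `minLength`, `maxLength`, `pattern`) are handled.

```python
import re

# Hypothetical schema fragment in an OpenAPI-like shape (assumption for illustration).
USER_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 2, "maxLength": 50},
        "email": {"type": "string", "pattern": r"^\S+@\S+\.\S+$"},
    },
    "required": ["name", "email"],
}


def validate(payload: dict, schema: dict) -> list[str]:
    """Return a list of violations; an empty list means the payload conforms."""
    errors = []
    # Required fields are reported first, in the order the schema lists them.
    for field in schema.get("required", []):
        if field not in payload:
            errors.append(f"{field}: missing required field")
    # Then each present field is checked against its own rules.
    for field, rules in schema.get("properties", {}).items():
        if field not in payload:
            continue
        value = payload[field]
        if rules.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{field}: expected string")
            continue
        if "minLength" in rules and len(value) < rules["minLength"]:
            errors.append(f"{field}: shorter than {rules['minLength']}")
        if "maxLength" in rules and len(value) > rules["maxLength"]:
            errors.append(f"{field}: longer than {rules['maxLength']}")
        if "pattern" in rules and not re.search(rules["pattern"], value):
            errors.append(f"{field}: does not match pattern")
    return errors
```

The point of the sketch is that the same structured definition a human reads can also drive a mechanical check, which is what makes it useful as input for an AI model as well.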
The Way to Task AI: Spec Kit and Prompt Engineering
One of the biggest challenges when working with AI models is being able to clearly tell them what we want. In the "vibe coding" approach, this communication is often vague and incomplete. However, the structured specifications created with Spec Kit completely change this situation. We can use these specifications as primary input for AI.
For example, when we want to generate code for an API endpoint, we can provide the relevant Spec Kit definition to the AI model. We can formulate our prompt as follows: "Based on the Spec Kit definition below, create an API endpoint using FastAPI in Python. The endpoint should be POST /users and accept name (string, required) and email (string, required, valid email format) fields in the request body. Upon successful requests, it should return a 201 Created status and a JSON containing the ID of the created user. In error cases, it should return a 400 Bad Request with appropriate error messages." This way, we specify exactly what we want from the AI.
Another use case is generating test scenarios. The parameters, data types, and constraints in the Spec Kit definition guide the AI on which test scenarios to generate. With a prompt like "Write test scenarios using pytest for the given API endpoint, covering valid and invalid cases, based on this Spec Kit definition," we can take automation to the next level. While working on my own side product, an Android spam blocking app, I benefited from this kind of AI-assisted test scenario generation to check the validity of user inputs. The data validation rules in Spec Kit clearly showed the AI which valid and invalid inputs to try.
Code Generation with Spec Kit: An Example Scenario
Now, let's see the code generation potential of Spec Kit with a more concrete example. Suppose we have a specification for a user registration API. Let's define this specification as a YAML file:
```yaml
paths:
  /users:
    post:
      summary: Create user
      operationId: createUser
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  type: string
                  description: User's full name
                  minLength: 2
                  maxLength: 50
                email:
                  type: string
                  format: email
                  description: User's email address
                  pattern: "^\\S+@\\S+\\.\\S+$" # Simple email regex
              required:
                - name
                - email
      responses:
        '201':
          description: User created successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  id:
                    type: string
                    format: uuid
                    description: Unique ID of the created user
                  message:
                    type: string
                    example: "User registered successfully."
        '400':
          description: Invalid request data
          content:
            application/json:
              schema:
                type: object
                properties:
                  error:
                    type: string
                    example: "Invalid email format."
```
By providing this YAML file to an AI model, we can ask it to create this endpoint using Python and FastAPI, for example. Our prompt could be:
```text
Using the OpenAPI (Swagger)-like spec definition below, create an API endpoint with Python and FastAPI.

Spec Definition:
---
paths:
  /users:
    post:
      summary: Create user
      operationId: createUser
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  type: string
                  description: "User's full name"
                  minLength: 2
                  maxLength: 50
                email:
                  type: string
                  format: email
                  description: "User's email address"
                  pattern: "^\\S+@\\S+\\.\\S+$" # Simple email regex
              required:
                - name
                - email
      responses:
        '201':
          description: "User created successfully"
          content:
            application/json:
              schema:
                type: object
                properties:
                  id:
                    type: string
                    format: uuid
                    description: "Unique ID of the created user"
                  message:
                    type: string
                    example: "User registered successfully."
        '400':
          description: "Invalid request data"
          content:
            application/json:
              schema:
                type: object
                properties:
                  error:
                    type: string
                    example: "Invalid email format."
---

The expected output should be Python code containing the relevant route for the FastAPI app. Include the necessary imports as well.
```
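Rather than pasting the spec into the prompt by hand each time, the prompt can be assembled from the spec text. The following is a minimal sketch of that idea. The function name `build_codegen_prompt` and the `EXAMPLE_SPEC` string are hypothetical, and sending the prompt to a model is left out because it depends on the provider you use.

```python
def build_codegen_prompt(spec_text: str, framework: str = "FastAPI") -> str:
    """Wrap a spec in a code-generation prompt, delimited by '---' lines."""
    return "\n".join([
        f"Using the OpenAPI-like spec definition below, create an API endpoint "
        f"with Python and {framework}.",
        "Spec Definition:",
        "---",
        spec_text.strip("\n"),  # trim blank edges but keep the YAML indentation
        "---",
        "The expected output should be Python code containing the relevant route. "
        "Include the necessary imports as well.",
    ])


# Hypothetical, heavily shortened spec used only to show the call.
EXAMPLE_SPEC = """
paths:
  /users:
    post:
      summary: Create user
"""

prompt = build_codegen_prompt(EXAMPLE_SPEC)
```

Keeping the spec in one file and generating the prompt from it means the prompt cannot silently drift from the specification, which is the failure mode of copying it by hand.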
The output we receive from the AI would be roughly as follows (simplified):
```python
from fastapi import FastAPI, HTTPException, status
from pydantic import BaseModel, EmailStr, Field
import uuid

app = FastAPI()


class UserCreateRequest(BaseModel):
    name: str = Field(..., min_length=2, max_length=50)
    # Pydantic's EmailStr validates the format (requires the email-validator package)
    email: EmailStr


class UserCreateResponse(BaseModel):
    id: str
    message: str


# Simple database simulation
users_db = {}


@app.post("/users", response_model=UserCreateResponse, status_code=status.HTTP_201_CREATED)
async def create_user(user_request: UserCreateRequest):
    if user_request.email in users_db:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="This email address is already in use.")
    user_id = str(uuid.uuid4())
    users_db[user_request.email] = {"id": user_id, "name": user_request.name, "email": user_request.email}
    return UserCreateResponse(id=user_id, message="User registered successfully.")

# Model-level validation is handled by Pydantic: if the email format is incorrect
# or the name field is missing, FastAPI returns a 422 Unprocessable Entity,
# not the 400 the spec describes.
# The if block above covers our specific 400 error (email duplication).
```
In this example, the AI does more than emit boilerplate: it creates the Pydantic models, generates UUIDs, and adds a simple error check for duplicate emails. Using EmailStr shows that it is aware of Pydantic's built-in validations. The output is not a perfect translation of the spec, though: validation failures come back as 422 rather than the specified 400, and the duplicate-email rule does not appear anywhere in the YAML. Even so, about 20 lines of working code came out of a YAML definition of about 50 lines, with no hand-written boilerplate. That saves real time compared with "vibe coding" each endpoint by hand, especially when dealing with repetitive API endpoints or data models.
💡 Things to Consider in AI Code Generation
Code generated by AI should always be reviewed and tested. AI may adhere to specifications but can overlook edge cases or security vulnerabilities. It is best to consider the generated code as a starting point.
Automating Test Scenarios
Spec Kit not only generates code but also provides an excellent foundation for creating test scenarios. The data types of fields, constraints like minLength, maxLength, pattern, and required fields in the specifications give the AI clear information on what types of test scenarios to generate.
We can give the AI a prompt like this: "Based on the given Spec Kit definition, write test scenarios in Python using pytest for the following API endpoint, covering both valid and invalid cases."
The AI can analyze all the requirements in the specification and generate test scenarios like the following:
- Valid Email and Name: If the name and email fields are in the correct format and meet the required conditions, it should return 201 Created.
- Invalid Email Format: If the email field does not contain "@" or ".", or does not match the specified regex, it should return a 400 Bad Request (or a 422 automatically returned by Pydantic).
- Missing Fields: When the name or email field is left empty or not sent, it should return a 400 (or 422).
- Name Length Constraints: When the name field goes outside the minLength and maxLength values, it should return an error.
- Duplicate Email: If an attempt is made to register with an email that has already been registered, it should return a 400 Bad Request.
These types of automated test scenarios improve code quality from the beginning of the development process and help prevent regression errors. In my own projects, these automated tests integrated into CI/CD pipelines have increased the reliability of the code I deploy to production environments to 99%. In my work on a production ERP system, this type of test automation helped ensure that critical errors in the reporting module were detected before going into production.
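The length-constraint scenarios listed above follow mechanically from the spec, so they do not even need an AI to enumerate. Below is a minimal sketch that derives boundary-value cases from minLength and maxLength. The helper name `string_boundary_cases` is hypothetical, and the sketch only covers string length rules, not patterns or required fields.

```python
def string_boundary_cases(field: str, rules: dict) -> list[tuple[str, str, bool]]:
    """Derive (label, value, should_be_valid) cases from string length rules."""
    cases = []
    lo = rules.get("minLength")
    hi = rules.get("maxLength")
    if lo is not None:
        # The shortest accepted value, and one character shorter than that.
        cases.append((f"{field} at minLength", "x" * lo, True))
        if lo > 0:
            cases.append((f"{field} below minLength", "x" * (lo - 1), False))
    if hi is not None:
        # The longest accepted value, and one character longer than that.
        cases.append((f"{field} at maxLength", "x" * hi, True))
        cases.append((f"{field} above maxLength", "x" * (hi + 1), False))
    return cases


# Constraints copied from the name field of the spec above.
name_cases = string_boundary_cases("name", {"minLength": 2, "maxLength": 50})
```

A list like `name_cases` can be fed to `pytest.mark.parametrize`, so each boundary becomes its own test. Because the cases come from the spec rather than from the generated code, they also act as a check that is independent of whatever the AI produced.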
AI-Powered Applications with RAG
Retrieval-Augmented Generation (RAG) is a technique that allows AI models to leverage external knowledge sources to produce more accurate and contextual responses. Structured specifications created with Spec Kit can serve as an excellent knowledge source for RAG systems.
In a RAG system, to answer a user's question, the AI model first goes through a "retrieve" phase. In this phase, it searches for the most relevant information related to the user's query from sources such as Spec Kit definitions, documentation, or database schemas. The retrieved information is then provided as input to the AI model for the "generate" phase. This way, the AI can generate responses using not only its general knowledge but also project-specific details.
For example, when a developer asks the AI, "What parameters does the user profile update API accept?", the RAG system finds the relevant Spec Kit definition and provides this definition as input to the AI model. The AI can then list the fields accepted by the PUT /users/{id} endpoint in the specification, whether they are required, and their data types. This is a great convenience for new developers or anyone working in complex systems. I used RAG to create documentation for my custom financial calculators; Spec Kit definitions and code comments enabled the AI to produce consistent and accurate documentation.
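The retrieve-then-generate flow can be sketched in a few lines. Note the substitution: production RAG systems usually rank chunks with embeddings and a vector index, whereas this sketch uses plain keyword overlap so that it stays self-contained. The `SPEC_CHUNKS` content, including the fields of PUT /users/{id}, is invented for illustration and does not come from a real spec.

```python
import re

# Hypothetical spec excerpts, one per operation (assumption for illustration).
SPEC_CHUNKS = {
    "POST /users": "Create user. Body fields: name (string, required), "
                   "email (string, required).",
    "PUT /users/{id}": "Update user profile. Body fields: display_name "
                       "(string, optional), bio (string, optional).",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: dict[str, str], top_k: int = 1) -> list[tuple[str, str]]:
    """Retrieve phase: rank chunks by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        chunks.items(),
        key=lambda item: len(query_words & tokenize(item[0] + " " + item[1])),
        reverse=True,
    )
    return ranked[:top_k]


def build_rag_prompt(query: str, chunks: dict[str, str]) -> str:
    """Generate phase input: the retrieved excerpt plus the user's question."""
    context = "\n".join(f"{op}: {text}" for op, text in retrieve(query, chunks))
    return (
        "Answer using only this specification excerpt:\n"
        f"{context}\n\nQuestion: {query}"
    )


question = "What parameters does the user profile update API accept?"
rag_prompt = build_rag_prompt(question, SPEC_CHUNKS)
```

The prompt then goes to the model, which answers from the retrieved excerpt instead of from its general knowledge.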
Trade-offs and Looking Ahead
Adopting Spec-Driven Development and Spec Kit naturally involves some trade-offs. Preparing specifications initially may take more time compared to "vibe coding." However, this early investment provides significant savings in the long run by preventing costs and problems that would arise in later stages of the project. Keeping specifications up-to-date also requires separate attention. If the code deviates from the specifications, or if the specifications become outdated, the system's consistency will be compromised. Therefore, establishing automations in CI/CD pipelines that check for compatibility between specifications and code would be beneficial.
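One possible shape for such a compatibility check is a small script that compares the operations declared in the spec with the routes the code actually exposes, and fails the pipeline when they differ. The sketch below assumes the spec is already parsed into a dictionary and the implemented routes are available as (method, path) pairs. The function names are hypothetical.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def declared_operations(spec: dict) -> set[tuple[str, str]]:
    """Collect (METHOD, path) pairs from an OpenAPI-like 'paths' mapping."""
    return {
        (method.upper(), path)
        for path, operations in spec.get("paths", {}).items()
        for method in operations
        if method.lower() in HTTP_METHODS  # skip keys such as 'parameters'
    }


def find_drift(spec: dict, implemented: set[tuple[str, str]]) -> dict[str, list]:
    """Report operations that exist on only one side of the spec/code boundary."""
    declared = declared_operations(spec)
    return {
        "missing_in_code": sorted(declared - implemented),
        "missing_in_spec": sorted(implemented - declared),
    }


# Hypothetical inputs: the spec declares one operation, the code exposes two.
spec = {"paths": {"/users": {"post": {}, "parameters": []}}}
implemented = {("POST", "/users"), ("DELETE", "/users/{id}")}
report = find_drift(spec, implemented)
```

In a CI job, a non-empty list on either side would be a reason to exit with a non-zero status. For a FastAPI app, the implemented pairs could likely be collected from `app.routes`, though that wiring is not shown here.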
In the future, the role of AI in software development processes will continue to grow. Tools like Spec Kit will be one of the cornerstones of this AI revolution. Structured and machine-readable specifications will enable AI to perform more complex tasks, detect errors more intelligently, and even assist in making new architectural decisions. Our role as developers is to effectively use these tools to build more sustainable, reliable, and efficient software. When working with AI, trusting the "spec" rather than the "vibe" will be the key to the future.
On this journey, getting acquainted with tools like Spec Kit and adopting Spec-Driven Development principles will not only increase our own productivity but also make our collaboration with AI much more meaningful and results-oriented. Let's not forget that fully leveraging the power of AI depends on our ability to clearly articulate what we want, and Spec Kit is one of the most important components of this language.
Top comments (17)
One thing I'd add from a security perspective: spec-driven development also narrows the attack surface. When AI generates code from a well-defined spec, the behavior is bounded and auditable. With vibe coding, the AI fills in gaps with assumptions and those assumptions are often where vulnerabilities hide. Business logic flaws, missing authorization checks, unexpected state transitions, these are exactly the kind of issues that don't show up in automated scanners but emerge when you're working without a spec.
The tighter the spec, the less room for the model to improvise in dangerous directions.
That’s a great point.
One thing I’ve noticed is that AI rarely invents vulnerabilities out of nowhere; it usually invents assumptions.
When authorization rules, state transitions, validation requirements, or business constraints are missing from the specification, the model fills those gaps using patterns it has seen elsewhere. Sometimes those assumptions are reasonable, sometimes they’re dangerous.
In that sense, specifications are not only a development artifact but also a security control. The clearer the boundaries, the less opportunity there is for both developers and AI systems to introduce unintended behavior.
I suspect that as AI-generated code becomes more common, security reviews will increasingly focus on the quality of the specification itself, not just the generated code.
Thanks for adding the security perspective.
The problem is when the spec doesn't specify, it borrows from whatever context seemed similar during training.
It also means pentesting will need to evolve. Most automated scanners test behaviour, not intent. They won't catch a feature that works exactly as implemented but violates what the spec actually required.
That's actually part of what we're building at Faultline Security: manual pentesting and AI red teaming that looks at the gap between intended behaviour and actual behaviour, especially in AI-native products. If you're ever working on something where that gap matters, happy to take a look! We're early stage and selectively taking on engagements right now ;)
Exactly. That gap between intended behavior and implemented behavior is where many of the dangerous issues live.
A system can pass automated tests, return the expected status codes, and still violate the actual business rule. That is especially risky with AI-generated code, because the implementation may look clean while quietly encoding the wrong assumption.
I like the way you framed pentesting around intent, not only behavior. In AI-native products, reviewing the specification, the generated implementation, and the model’s assumptions together will probably become a core security practice.
Faultline Security sounds interesting. I’m currently exploring this space from the architecture and AI workflow side, so I’d be happy to follow what you’re building and exchange ideas.
Agree on the direction, with one caveat: the spec only helps if it can fail the work. Spec Kit gives you the structure, but what matters is whether the acceptance criteria are concrete enough for the agent to run them and come back red. A spec that only reads as a tidy description, with nothing in it you can execute, is still the vibe, just written down.
That’s an important distinction.
If the same agent writes the implementation, defines the acceptance criteria, and evaluates the result, we risk creating a closed validation loop. The system is effectively grading its own homework. What I’ve started thinking about is separating generation from verification as independent concerns. The implementation can be AI-generated, but the acceptance criteria should ideally originate from the specification itself or from an earlier, immutable step in the workflow. Otherwise, the agent isn’t proving correctness against requirements; it’s proving consistency with its own assumptions. The more autonomy we give agents, the more valuable independent verification becomes.
That’s a very good caveat, and I completely agree.
A specification that cannot fail the implementation is mostly documentation, not a real control mechanism.
For AI-assisted development, I think the spec needs to become executable in some form: acceptance criteria, contract tests, validation rules, state transition checks, or CI gates. Otherwise, we are just moving the “vibe” from the code into a nicer-looking document.
The real value appears when the agent can read the spec, generate the work, run the checks, and get a clear red or green result.
So yes, Spec → Generate → Verify only works if the spec has teeth.
Thanks for pointing that out.
Agreed. The one thing I'd add is to watch who writes the checks - if the agent writes the code and its own acceptance tests in one pass, they tend to pass trivially. Easier to trust when the criteria are pinned before the code, or come from a different step.
Great article. The connection between vibe coding and runtime anti-patterns is something I've been working on from a different angle.
I built java-vibe-guard specifically to detect the Spring Boot / Java runtime failures that AI-generated code tends to introduce — things like @Transactional holding DB connections while waiting on async work, or .block() in reactive contexts. Patterns that pass CI but fail under real concurrency.
The interesting part: static detection alone wasn't convincing enough. So I added --verify — it reproduces the failure locally using Testcontainers so you can observe the phenomenon, not just read a warning.
Spec-driven development prevents the problem at the source. Evidence-based verification catches what slips through anyway.
📦 github.com/Joaquinriosheredia/java-vibe-guard
I think this highlights an important shift in the AI era.
For years we focused heavily on syntax correctness, then test correctness. Now we’re increasingly dealing with behavior correctness under real operational conditions.
AI-generated code often passes code review because the implementation looks familiar, but production failures usually emerge from timing, concurrency, transaction boundaries, resource exhaustion, and other runtime realities that are difficult to see statically.
That’s why I like the combination of specification, generation, and verification. The specification defines intent, the AI accelerates implementation, and tools like java-vibe-guard validate whether the implementation survives contact with reality.
The --verify approach is a clever way to bridge that gap. I’ll check out the project.
Exactly. "Survives contact with reality" is the right framing.
The gap between passing CI and surviving production is where most of the interesting failures hide. Static analysis closes part of that gap. Reproducible verification closes a bit more.
Thanks for checking it out — feedback from anyone who runs --verify on their own setup would be genuinely useful.
I think AI is pushing us toward a new validation stack.
A few years ago, passing compilation was enough for many projects. Then passing automated tests became the baseline. Now we’re reaching a point where even passing tests may not be sufficient if the system fails under realistic operational conditions.
What’s interesting is that AI tends to generate code that looks correct because it follows familiar patterns. The challenge is that production failures rarely come from syntax mistakes; they emerge from timing, concurrency, scale, resource contention, and assumptions that only become visible at runtime.
That’s why I see tools like yours as complementary to Spec-Driven Development rather than alternatives to it.
The spec defines what the system should do.
The AI generates an implementation.
Verification tools test whether the implementation behaves correctly when reality starts applying pressure.
The stronger AI gets, the more valuable that final verification layer becomes.
That's a good way to frame the stack.
Spec → Generate → Verify. Each layer catches what the previous one misses.
The more capable the generator, the more important the verifier becomes. That's not a coincidence.
Excellent article. The shift from vibe coding to Spec-Driven Development is a natural evolution for teams that want to scale AI-assisted development beyond prototypes. Spec Kit's structured approach helps align requirements, architecture, and implementation while reducing ambiguity and rework. What stood out most is how it transforms AI from a code generator into a collaborator that operates within clearly defined constraints. As AI agents become more capable, strong specifications will likely become as important as clean code itself. Thanks for sharing this practical perspective on building more reliable AI-driven workflows. ❤️
Thank you!
What fascinates me most is that AI is forcing us to revisit old software engineering lessons.
For years, many teams treated specifications as optional and relied on developer intuition. That worked when implementation was expensive and relatively slow. Today, an AI agent can generate an entire feature in minutes, which means ambiguity becomes far more expensive than coding itself.
In a way, Spec-Driven Development isn’t a new idea. AI is simply making the cost of missing specifications much more visible.
Thanks for reading and sharing your thoughts!
Your consistency is good Mustafa!! 😊