The narrative is backwards
There's a narrative going around that AI is making software engineering easier. I think it's getting the direction wrong.
AI is making it easier to generate code, build prototypes, and move from idea to output faster than ever. That part is real and significant. But the act of writing code was never the hardest part of software engineering. Understanding the problem was. Defining the right architecture was. Translating what a client actually needs into reliable system behavior was. Testing, validating, maintaining, and scaling software over time was.
None of that got easier because an LLM can produce a function in three seconds.
The gap is widening, not shrinking
If anything, the gap between "code that exists" and "software that works in context" is widening. When generating code was slow and expensive, the generation step forced a certain amount of thinking. You considered trade-offs as you wrote. You questioned assumptions because each line took effort. Now that code appears instantly, all of that deliberation has to happen as a separate, intentional step. And most teams haven't adjusted their process to account for that.
What the teams doing it well look like
The teams I see succeeding with AI aren't the ones generating the most code. They're the ones asking better questions before they generate anything. They define the problem clearly before they prompt. They evaluate whether the generated output actually fits their architecture instead of just checking whether it runs. They validate edge cases the AI never considered because nobody prompted for them. They invest time in understanding what was generated before it ships.
The role is shifting, not shrinking
The role is moving from "person who writes code" to "person who designs systems that work in context." That's not a demotion. It's actually a higher bar. The writing was the mechanical part. The engineering judgment around it was always where the real value lived.
AI reduces the effort needed to produce code. It increases the importance of everything that surrounds that production: problem definition, architectural decisions, validation, and the judgment to know when generated code is good enough and when it's hiding assumptions that will break under real load.
Where the advantage actually lives
The future won't belong to teams that output the most code. It'll belong to teams that validate faster, make better technical decisions, and ask the questions that LLMs can't ask for themselves.
Is your team's process actually different since adopting AI tools? Or did the tools change but the workflow stayed the same?
Top comments (15)
"The gap between code that exists and software that works in context" — that line hit hard. Just wrapped up my 15th story on the same gap. The team in that one also had AI hitting 97.2% coverage, but the client had 14 external dependencies and a 24-hour CI pipeline. Turns out coverage report exists ≠ production won't blow up 😅
The 97.2% coverage with 14 external dependencies is the perfect example, because coverage measures "did the code run," not "did the code handle what production will actually throw at it." You can hit 100% coverage with tests that all mock the external dependencies, which means you've thoroughly tested your code's behavior in a world that doesn't exist. The 24-hour CI pipeline detail is telling too, because that's usually a symptom of the same problem: the team is running a massive test suite that gives them confidence numbers without reducing risk proportionally. The gap I keep seeing is that AI makes it trivially easy to generate tests that boost coverage metrics without anyone asking "what does this test actually prove about production behavior?" Coverage went from a useful signal to a vanity metric the moment generating tests became cheaper than thinking about what to test.
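To make the mocking point concrete, here is a minimal sketch. Everything in it is hypothetical and for illustration only: `price_with_tax`, the `get_price` client method, and the tax rate are assumptions, not anything from the story above. The first test executes every line of the function, so a coverage tool would report the function as fully covered, yet the second probe shows that a timeout from the dependency still escapes unhandled.

```python
from unittest.mock import Mock


def price_with_tax(client, sku, tax_rate=0.2):
    """Look up a price through an external client (hypothetical API) and add tax."""
    response = client.get_price(sku)  # call into the external dependency
    return round(response["amount"] * (1 + tax_rate), 2)


def test_happy_path_with_mock():
    # The mock always answers instantly with well-formed data.
    client = Mock()
    client.get_price.return_value = {"amount": 10.0}
    # Both lines of price_with_tax run, so line coverage for it is complete.
    assert price_with_tax(client, "sku-1") == 12.0


def probe_dependency_timeout():
    # Simulate what a real dependency can do: fail instead of answering.
    client = Mock()
    client.get_price.side_effect = TimeoutError("upstream timed out")
    try:
        price_with_tax(client, "sku-1")
    except TimeoutError:
        return "unhandled"  # the error escaped the function
    return "handled"
```

The happy-path test says nothing about timeouts, malformed responses, or missing keys, and the coverage number cannot tell you that.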
"Coverage turned into a vanity metric when generating tests got cheaper than thinking about what to test" — that's the whole thing in one sentence. I'm honestly tempted to repost your reply as a comment under my own article so more people see it 😂
Ha, go for it, good ideas should travel. And honestly, that framing only clicked for me because of your 97.2% coverage example. The specific number makes the absurdity concrete in a way that "coverage doesn't equal quality" never does. Drop a link to your article if you want, curious to read the full story behind the 14 external dependencies and the 24-hour pipeline.
Just read through it. The RFP framing is what makes it land, because now the 97.2% isn't just a bad metric, it's a sales pitch that won the room while the actual product fell apart behind it. That's the part nobody wants to say out loud, that half these numbers exist to close deals not to protect production. Good stuff, gonna follow the series.
Thanks for reading the whole thing. See you in the next one 👊
You are welcome, see you soon.
This is why you need eval systems baked into the system. AI agents and models are probabilistic, so you need deterministic infrastructure in place to safeguard yourself.
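One way to read "deterministic infrastructure" is a plain validation layer that sits between the model and anything that acts on its output. The sketch below is an assumption-heavy illustration: the JSON shape, the `ALLOWED_ACTIONS` set, and the 500 limit are all hypothetical, chosen only to show the pattern of rejecting anything the rules do not explicitly allow.

```python
import json

# Hypothetical policy: the only actions an agent may request, and a spending cap.
ALLOWED_ACTIONS = {"refund", "escalate", "close"}
MAX_AMOUNT = 500


def validate_agent_output(raw: str) -> dict:
    """Deterministically check a model's raw text output before acting on it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {action!r}")

    amount = data.get("amount", 0)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        raise ValueError("amount must be a number")
    if not 0 <= amount <= MAX_AMOUNT:
        raise ValueError(f"amount out of range: {amount}")

    # Return only the vetted fields, dropping anything else the model added.
    return {"action": action, "amount": amount}
```

The same input always produces the same verdict, which is the property the probabilistic side cannot give you.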
agreed on the core...writing code was always the cheap part. but I'd push on the "doesn't make it easier" framing, because I think it's worse than that. when building was slow, the build itself was a brake on bad design. you'd get halfway in and feel the friction...this is dragging, the abstraction's wrong, time to back up. you caught the mistake before you'd fully paid for it. AI took that brake off. now you can build the wrong thing all the way and fast, and the design flaw stays invisible until it's big and load-bearing. so the part of the job that didn't get easier is now the part that's most expensive to get wrong
The widening-gap framing is right, and it shows up most in the parts that never make it into the prompt: error paths, edge cases, and behavior under load. When generation was slow, that thinking got forced on you line by line; now it has to be a deliberate separate step that's easy to skip when the diff already looks done. The only reliable counter I've found is deciding the failure cases and the checks up front, before any code gets generated.
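A small sketch of what "deciding the failure cases and the checks up front" could look like in practice. The `parse_quantity` function and its rules are hypothetical; the point is that the case table is written first and any implementation, generated or not, is run against it.

```python
# Written BEFORE any implementation exists: inputs and the required outcome.
# An exception class as the expected value means "must raise this error".
CASES = [
    ("3", 3),
    (" 7 ", 7),
    ("", ValueError),
    ("-1", ValueError),   # negative quantities are invalid (assumed rule)
    ("1e3", ValueError),  # no scientific notation (assumed rule)
]


def check(impl):
    """Run a candidate implementation against CASES and return the failures."""
    failures = []
    for raw, expected in CASES:
        try:
            outcome = impl(raw)
        except Exception as exc:
            outcome = type(exc)
        if outcome != expected:
            failures.append((raw, expected, outcome))
    return failures


def parse_quantity(raw):
    """A candidate implementation that respects the rules decided above."""
    value = int(raw)
    if value < 0:
        raise ValueError("quantity must not be negative")
    return value
```

Calling `check(int)` reports the `"-1"` case as a failure, because the built-in accepts negative numbers, while `check(parse_quantity)` returns an empty list. A diff that "already looks done" would not have surfaced that difference on its own.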
Software design and engineering principles have never been more important than they are today.
One thing worth remembering is that code generation was never the game. Developers who were exceptionally good at rapidly producing code were often referred to as “code monkeys.” They weren’t the people invited into the room to define how the system should work or how complex business problems should be solved.
The real value has always been in understanding the problem, designing the system, defining the boundaries, and making sound architectural decisions. Writing the code was simply the implementation of that thinking.
Working with AI makes this even more important. Before we let an AI agent generate code, we need to clearly define what we’re building, why we’re building it, and the constraints it needs to operate within.
Many teams that initially evaluated AI based purely on code generation speed are now discovering that without strong architectural guidance, coding standards, and quality guardrails, a significant portion of that generated code ends up needing to be rewritten.
Code generation speed is not the same as production value.
And production value is not the same as customer value.
The teams that will get the most out of AI won’t be the ones generating the most code. They’ll be the ones designing the best systems for AI to build.
The line about slow generation forcing deliberation is the part that rings true. When every line cost effort, you were basically rubber-ducking the design as you typed it. Now that step is free, so the thinking has to move somewhere, and for most teams it just doesn't. Where I feel it most is review. When you review code a human wrote, you can usually guess the intent behind it. With generated code there's no intent to read, just plausible output, so you end up reverse-engineering what it was even trying to do, which is slower than people admit. Has your team actually carved out time for that, or does review still get squeezed the way it did before?
This hits the nail on the head. "The gap between code that exists and software that works in context is widening" is probably the most accurate description of the current AI era.
As a non-tech founder who literally self-taught and built a web platform for AI agents entirely alongside LLMs, I live this paradox every single day. AI makes me feel like a wizard who can materialize features in minutes. But the moment real traffic hits, or when I have to reason about edge cases, rate-limiting, and structural maintenance, the "wizardry" fades, and the sheer necessity of true engineering judgment becomes glaringly obvious.
Building a product with AI has actually made me respect senior engineers and architects infinitely more. AI can write the functions, but it doesn't possess the empathy to understand user behavior, nor the historical judgment to prevent architectural decay.
The tools changed, but the ultimate bottleneck is still—and will always be—human engineering discipline. Thanks for writing this!
There will always be a people bottleneck. Software can't possibly be accelerated overnight. Decisions always rest with an actual person orchestrating the AI. Today these decisions need higher-quality verification than ever before. This is a good reminder, thank you for this.