DEV Community

The Code Works. What Could Possibly Go Wrong?

Sylwia Laskowska on June 10, 2026

Would you treat a serious illness without seeing a doctor, relying only on whatever your favorite AI model suggested? Would you let AI take ov...


Ecom Digital • Jun 10

Great perspective!
I use AI extensively in development, but I treat it as a productivity tool rather than an authority.
AI can generate code quickly, but it doesn't understand the business context, security requirements, or long-term maintainability of a project.

For me, the rule is simple: trust AI to assist, but verify its output before it reaches production. Human judgment and accountability are still essential, especially for architecture and security-critical decisions.

Sylwia Laskowska • Jun 10

I do exactly the same.

AI has become an incredibly useful productivity tool, and I use it every day, but I still see it as an assistant rather than an authority. It can save a lot of time, generate ideas, and help navigate unfamiliar technologies, but we're not yet at the point where it can be trusted to operate completely autonomously.

Context, architecture, security, business requirements, and ultimately accountability still belong to humans.

Thanks for sharing your perspective! 🙂

Iman • Jun 12

Well said. I agree with you. AI is best used as a tool. One cannot simply hand over the entire task at hand to an agent.

FrancisTRᴅᴇᴠ (っ◔◡◔)っ • Jun 10

Great post! I was planning on writing an essay in your comments, since something I watched yesterday is really related to this, but I want to save it. I will definitely mention your post in it. I think it will expand on your insights on the use of AI, and I believe we have an opportunity we've never seen before!

Otherwise, great work and stay tuned :D

Sylwia Laskowska • Jun 10

Francis, absolutely write that article!

Honestly, I feel like the best posts often come from exactly that kind of inspiration, a thought sparked by someone else's article, a video, a conversation, or even a random comment thread.

I'm really curious to see your perspective on this topic, especially if it expands on some of the ideas discussed here. Looking forward to reading it when it's out, and thanks for the kind words!

FrancisTRᴅᴇᴠ (っ◔◡◔)っ • Jun 12 • Edited

It is published!

FrancisTRᴅᴇᴠ (っ◔◡◔)っ

Jun 12

A Developer using AI. What Could Possibly Go Wrong?

#discuss #community #ai #programming

Adam - The Developer • Jun 11

After so many bugs and outages caused by AI, we're now forcing people to review its output, understand the code, read the docs, write tests, refactor, and verify everything before it goes to production.

So... software engineering.

I saw a LinkedIn post where a guy had this massive AI setup with strict prompts, parallel agents, hallucination guards, documentation checks, code reviews, testing pipelines, and deployment rules.

And I'm sitting there thinking:

"Isn't this just Agile?"

We spent years trying to automate engineering, only to reinvent all the engineering processes we already had.

Sylwia Laskowska • Jun 11

Haha, exactly! 😄 The more mature AI workflows become, the more they start looking suspiciously like... software engineering.

What always makes me laugh is remembering those discussions from a year or two ago about how maybe we don't even need to understand the code generated by AI anymore. Interestingly, those opinions often seemed to come from non-technical people rather than engineers.

I still remember my former BA arguing exactly that and happily pushing AI-generated applications to production. 😅 I'd be genuinely curious to see how those "businesses" are doing these days.

alptekin I. • Jun 11 • Edited

Hi Sylwia,
It has been a while. I have been (and still am) very busy and haven't been able to look at dev.to or read articles recently.

I agree with you.

In fact, I am still a little bit skeptical... maybe old-fashioned.
I still force myself to write my own code and run LLMs alongside VS Code, just to check and review every piece of code they generate.
While designing the architecture, as you might say, I get help from the models, but I try to make sure that I own and understand every piece of it.

This, I suppose, is not the most popular approach these days. Maybe I will switch to a more agentic approach in the future, I don't know, but for now I am trying to keep control.
Which also means that I learn more, and better.

I think software developers will still be needed... But nowadays things change so fast, and there is so much hype, so I don't know. It's hard to know what life will bring us.

Be safe, and good luck at all these conferences. I hope one day I will attend them, as both a visitor and a speaker 🤞
alptekin

Sylwia Laskowska • Jun 11

Hi Alptekin, it's great to hear from you again!

I think you make a very important point. Everything is changing incredibly fast right now, and it's hard to predict where we'll be in a few years. But for the moment, I still believe that understanding what you're building is extremely important. AI can help us move faster, but it doesn't remove the need to understand the architecture, the tradeoffs, and the code that eventually goes into production.

And honestly, if your current approach helps you learn more, that's a huge advantage. Learning is still one of the best investments we can make as developers.

As for conference speaking, one small piece of advice: start with local meetups if you haven't already. Conference committees often like to see at least some speaking experience, and meetups are a fantastic way to build confidence, practice your talks, and get comfortable speaking in front of an audience.

alptekin I. • Jun 12

Thanks, I genuinely think you have a point. I need to work on that. See you around!

Ekong Ikpe • Jun 10

I never trusted AI except with general-knowledge questions, and even then I got into the habit of checking different sources quite often. But the truth is that the models are improving, so I'm double-checking less. Just like you said:
"When you needed something more specialized, you had to go to the library or dig through academic journals".

"When I'm looking for general information, I trust them almost blindly".

I have been working on a project for some time, and it seems I've leveled up with the responses I get. Honestly, I felt the need to build my own AI, and I made it a reality, fully browser-native 😅 At least I'll trust this one more, because I'm the one currently teaching it how to read and evolve.

Sylwia Laskowska • Jun 10

That's actually super interesting! Building and training your own model must be a fascinating experience, especially because you get to understand its strengths and limitations from the inside rather than treating it as a black box.

I'd love to see where your project goes. Thanks for sharing your perspective and for the comment!

Ben Sinclair • Jun 11

What always amuses me is when people say: "See? AI caused this disaster." No. It didn't. The person responsible is the human who gave the agent excessive permissions. The human who didn't review the output.

Then it doesn't matter what the cause is. XHTML was a better path than HTML, semantic HTML was a better path than DIV soup. Sending happy messages by email was a better path than spam. But people don't choose the "good" path; they choose what's lazy, what gives them a short-term benefit, or what lets them make money at someone else's expense. Every time.

And a vanishingly small percentage of vibe coders will review the output of any LLM in the foreseeable future.

Sylwia Laskowska • Jun 11

I completely understand that argument. There are actually plenty of findings in psychology showing that humans are cognitively lazy by nature and tend to look for shortcuts whenever possible.

On the other hand, reality often forces us to do things properly sooner or later, whether we like it or not.

In fact, just a few minutes ago I saw a job posting that could basically be described as "AI slop cleaner." 😅 Someone had apparently generated an application as quickly as possible, and now they're looking for an engineer to untangle the mess and turn it into something maintainable. (Interestingly, it was a very well-paid position.)

I think semantic HTML is a great example too. Sure, it's easier to throw divs everywhere. But then an accessibility audit arrives, and suddenly all those shortcuts become technical debt that somebody has to clean up.

The shortcuts are real. The cleanup bill is real too. 😄

Aryan Choudhary • Jun 10

The part that stood out to me was the comparison with medicine.

Most people would never blindly trust AI with a medical diagnosis because the consequences feel obvious. But somehow when the output is code, we become much more willing to suspend skepticism because the code compiles and the tests pass.

I've started thinking of AI as something similar to a very fast junior developer. It can be incredibly productive, it can surprise you with good ideas, and it can save hours of work.

But it's still my responsibility to understand what gets merged.

The speed is real. The accountability doesn't disappear.

Great post.

Sylwia Laskowska • Jun 11

Haha, at this point I think AI is already stronger than a typical junior in many areas. 😄

But I completely agree with your point about accountability. The speed is real, but the responsibility doesn't go anywhere.

And the medical comparison is exactly why I find this topic so interesting. If someone told us they diagnosed themselves entirely with ChatGPT, most people would probably consider that at least a little reckless. Yet not that long ago we were hearing things like: "Do we really need to understand all the code the model generated before shipping it?" 😅

For some reason, we're much more comfortable asking for a second opinion when it comes to our health than when it comes to our code.

Elmar Chavez • Jun 11

I always say this, even about life in general: trust, but always verify. This goes a long way. If we don't do the due diligence of at least checking something out, then bad things are bound to happen.

Personally, I would never use AI-generated code if I hadn't built it at least once myself. Otherwise it feels like I'm dragging myself down instead of moving upward (again, this is just me). If I can debug and explain AI-generated code, then I'm confident enough to use it and iterate on it in my own projects. I believe that is the missing piece for new-generation developers these days.

It all comes down to discipline.

Sylwia Laskowska • Jun 11

Exactly! When we had to write things ourselves, we were forced to work through multiple layers of the problem. Even if we copied code from Stack Overflow, ChatGPT, or accepted Copilot suggestions, we still had to connect the pieces, debug them, and understand why they worked.

With coding agents, it's becoming possible to generate large amounts of code without really engaging with the underlying problem at all. That's the part that worries me most from a learning perspective.

AI can be an amazing accelerator, but if someone never stops to understand what was generated, they might end up shipping code while learning very little from the process. And in the long run, that's a pretty expensive tradeoff.

buildbasekit • Jun 11

The funniest thing about vibe coding is that after enough production incidents, everyone eventually discovers:

architecture matters
documentation matters
code reviews matter
testing matters

Congratulations.

You've successfully reinvented software engineering. 😂

Sylwia Laskowska • Jun 11

Hahaha, exactly! 😄 We've spent decades saying "don't reinvent the wheel," and now we're watching people reinvent software engineering itself.

buildbasekit • Jun 12

That's why I find AI coding discussions so funny sometimes.

The more seriously teams adopt AI, the more they end up adding code reviews, testing, documentation, architecture guidelines, security checks, and approval workflows.

We started with "AI will replace software engineering" and somehow arrived at "software engineering, but faster." 😄

Tracy Gilmore • Jun 11

"Does AI lie?" Maybe from the perspective of the recipient.
AI would need to be conscious of what is meant by truth and falsehood before it could be accused of lying, just like a very young child.
Just because the AI reports something false, and even 'apologizes' when the false statement is pointed out, does not mean it intended to deceive.

Sylwia Laskowska • Jun 11

Of course, and that's exactly why a few paragraphs later I switched to talking about hallucinations. 😄

I think this is mostly a matter of common language conventions. We often say things like "AI lies" even though, strictly speaking, lying requires intent.

It's a bit like saying "the car hit me" and then someone replying that a car has no consciousness, so technically it wasn't the car. 😄

From a philosophical perspective, you're absolutely right. From a practical perspective, though, most people are really talking about the fact that the output was false, regardless of whether it came from deception, a hallucination, or something else.

Daniel Balcarek • Jun 10

Does AI lie or not? That's the question. (Shakespeare would be proud 😂)

Now to the actual topic. 😀

I've been using AI since the early days, starting with the web versions before the IDE integrations. It's evolved incredibly fast: from struggling with boilerplate code to being able to implement features in well-established repositories that actually fit the team's coding style and work surprisingly well.

At one point, I was genuinely worried that developer value and salaries would drop quickly, and that many senior developers would end up mostly reviewing AI-generated code. But with the rising costs of token usage and the limitations we've seen in practice, I think things are settling into a more realistic place.

AI has already changed how we work and made us more productive, but it's still not ready to replace developers anytime soon.

Sylwia Laskowska • Jun 10

Haha, "to vibe code or not to vibe code" is definitely the question! 😄

I tend to agree. I don't think the world is getting rid of developers anytime soon. And if token prices go up enough, junior developers might suddenly become a lot more attractive again. 😄

What fascinates me the most is what the tech landscape will look like in a few years. Will we still be building traditional web apps? Will the browser remain the center of everything? Or will agents and AI-native interfaces start eating away at some of the things we currently consider "the web"?

That's the part I'm most curious about right now.

Daniel Balcarek • Jun 11

Hmm, that's actually a good question. I haven't thought about it that much yet, but AI has definitely changed how I use the web.

For example, I rarely Google things anymore. Most of the time, I just open ChatGPT in the browser instead. 😄

So maybe browsers stay at the center of everything, but the way we use them changes completely.

P.S. I think there's one option missing between:

Will we still be building traditional web apps? Will the browser remain the center of everything? Or will agents and AI-native interfaces start eating away at some of the things we currently consider the web?

Or will we just enter the Matrix? 😄😄

Sylwia Laskowska • Jun 11

Well, we don't have a brain-computer interface yet, so maybe the Matrix is still a few years away. 😄

But I genuinely find the future of the web fascinating. My guess is that some kind of interface will always remain. I have a hard time believing that people will blindly trust agents without checking what they're doing.

So maybe the browser stays, but the way we interact with it changes dramatically. That's what I'm most curious about.

Rob • Jun 11

The medical-advice comparison is the sharpest framing here. We've all internalized that "WebMD says it's probably fine" is not a diagnosis, but we haven't built the same reflex for "the code ran without errors." Running and being correct are different claims, and the gap between them is exactly where the subtle stuff hides.
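To make the "running vs. correct" gap concrete, here is a minimal, hypothetical Python sketch (the function names are invented for illustration and do not come from the post). The first version runs without errors and passes an obvious happy-path check, yet returns the wrong result for an edge case, because in Python `items[-0:]` is the same slice as `items[0:]`.

```python
def last_n_buggy(items: list, n: int) -> list:
    """Return the last n items. Looks right, and works for n >= 1."""
    # When n == 0, -n == 0, so this slices the WHOLE list instead of nothing.
    return items[-n:]


def last_n(items: list, n: int) -> list:
    """Return the last n items, handling n <= 0 explicitly."""
    if n <= 0:
        return []
    return items[-n:]


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    # Happy path: both versions agree, so a quick demo "works".
    print(last_n_buggy(data, 2), last_n(data, 2))
    # Edge case: the buggy version silently returns every element.
    print(last_n_buggy(data, 0), last_n(data, 0))
```

Neither call raises an error, which is exactly why "it ran" and "it's correct" are different claims; only a test that covers `n == 0` catches the difference.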

Sylwia Laskowska • Jun 11

Exactly! If someone said they use AI as their only doctor, we'd all think that's a terrible idea. But not that long ago, there were plenty of people saying we don't really need to understand all the code AI generates before shipping it to production. Maybe some still do.

For some reason, we naturally want a second opinion for our health, but we're often much more trusting when it comes to code.

Mixture of Experts • Jun 12

This is why manual code review is absolutely necessary to make sure the produced code is maintainable, secure, and architecturally sound over time. AI can automate a lot of things and work exceptionally well, but we still need to keep the developer in the loop.

Sylwia Laskowska • Jun 12

Exactly! At least for now, humans are still very much needed in the loop.

RapidKit • Jun 13

Hot take:

We're blaming AI for problems that were already hiding inside our repositories.

Most production systems don't have an explicit architectural contract.

Humans compensate with tribal knowledge.

AI can't.

The future isn't "trusting AI more" or "trusting AI less".

It's giving AI systems something reliable to understand.

Code generation is already good.

System understanding is still unsolved.

Sylwia Laskowska • Jun 14

Oh yes, I completely agree with this.

I've noticed the same thing. AI often gets blamed for problems that were already there, just hidden behind years of tribal knowledge and unwritten assumptions.

And honestly, it's the same as everything else in software. If we invest in architecture, documentation, and keeping the system understandable, AI can be incredibly helpful. If we don't, then with or without AI, we'll end up with legacy spaghetti after a few months anyway. 😅

AI didn't invent technical debt. It just helps us create it faster when we're not paying attention.

RapidKit • Jun 16

Exactly.

AI didn't create the missing context.

It simply made the absence of explicit system knowledge impossible to ignore.

Gamya • Jun 11

This is such an important conversation and the medical diagnosis analogy really hits hard! 🙌 We've somehow collectively decided that "it compiles and the tests pass" is enough verification, while we'd never accept "the AI said I'm probably fine" for a health concern.

The point about newer developers and founders is what stood out to me most. Experienced engineers can feel when something is off — that instinct comes from years of debugging production issues, handling edge cases, and dealing with real users doing unexpected things. But someone just starting out has no way of knowing which boundaries were silently crossed or which questions they should have asked before trusting the output.

I think the "talented but overconfident new hire" framing from the comments is perfect — assume good intent, always verify. AI moves fast and sounds confident, but confidence and correctness are two very different things.

As someone still building my foundation in software development, this is a really valuable reminder to understand the code I write rather than just accepting output that looks right. The happy path working is just the beginning — production always finds the paths nobody thought about! 😊

Sylwia Laskowska • Jun 11

Exactly, and it's great that you already see that!

I absolutely think beginners have a huge advantage today thanks to AI. But in the long run, I suspect the people who win won't be the ones who blindly follow whatever the model says.

They'll be the ones who still take the time to learn the fundamentals, go through tutorials, analyze the code AI generates, and ask questions when something doesn't make sense. The nice thing is that coding agents are usually quite happy to explain their reasoning.

Used that way, AI becomes an amazing learning tool rather than just a code generator.

Gamya • Jun 11

This is such an encouraging perspective, thank you! 🌸 That's exactly the approach I'm trying to take — using AI as a learning tool rather than just an answer machine. Asking why the code works, not just accepting that it does, makes such a difference in actually understanding what's happening under the hood. The fundamentals really do matter and I think that's something that will always separate those who truly understand their craft from those who are just copying outputs. Really appreciate you taking the time to reply! 😊✨

Adam Lewis • Jun 11

On the where-do-you-draw-the-line question, I've stopped thinking of it as trusting the model at all. I trust the checks around it. Every line still gets read by someone, but the reading is the last gate rather than the only one: the linter, the type checker, and the test suite have all had their go first. The trap you describe with first-time founders is real because they're missing both: no instinct for what's off and no gates to catch it. And the gates are the only one of the two you can set up in a week.

Sylwia Laskowska • Jun 11

I completely agree with the first part. Good tests, linting, type checking, CI pipelines, and other guardrails can catch a surprising number of problems and are definitely worth having.

That said, I'm not entirely convinced they're enough on their own when the person building the system is inexperienced. They can help catch certain classes of mistakes, but they won't necessarily tell you that the architecture is wrong, the security model is flawed, or the business requirements were misunderstood.

Personally, if something is actually heading to production, I'd still want an experienced engineer involved at the final stage. Experience takes years to build, and some issues are very difficult to encode into automated checks.

Adam Lewis • Jun 16

Agree with the distinction you're drawing. The gates catch the known classes of mistake, the ones you can write down, and they're no help on the three you listed (wrong architecture, weak security model, misread requirements), because those are judgements about whether the thing should exist as built, not whether it runs. That's the part I'd never hand to the model. I treat the agent as the intern who does the work, and an experienced person reviews it like a junior's PR. The gates just free that reviewer to spend their attention on the architecture and the intent instead of the typos the linter already caught. Your point about inexperienced builders holds: they're missing the judgement and don't yet know which questions to ask, so the gates end up doing all the work with nobody reading the answer. If it helps, here's the longer version of that intern idea: prickles.org/tenet/the-intern-patt...

Hemapriya Kanagala • Jun 11 • Edited

Sylwia, I think "AI didn't deploy that code, a human did" was probably my favorite line in the whole post.

AI can definitely help us move faster, but at the end of the day we're still the ones responsible for what goes into production.

Sylwia Laskowska • Jun 11

That's also why I'm not a huge fan of headlines like "AI deleted the production database."

Well... yes, technically it did. But only because a human gave it the permissions to do so in the first place.

Scarab Systems • Jun 10

I feel like we’re circling the same problem space we touched on in our previous exchange, and this post lands right in that same fault line for me.

The part that keeps standing out to me is that the danger is not simply “AI wrote code.” It is that ownership, authority, and verification get blurry. Who owns the change? Which rule is the agent allowed to bend? Which boundary is supposed to hold? And at what point does “the code works” become a false signal because nobody checked whether it still respects the system it entered?

That is the deeper theory I’ve been developing around Scarab/SDS: bugs often begin where a boundary stops preserving the truth another part of the system depends on. AI just makes that easier to see because it moves so quickly and can sound confident while crossing boundaries it does not actually understand.

Your point about newer developers and founders is especially important. Experienced engineers can often feel when something is off. But someone moving fast with an agent may not even know which boundary was violated, or which question they were supposed to ask before trusting the output.

So when you mention “Trusting AI Systems: How Much Is Too Much?”, that feels like exactly the right framing. I think the next serious layer is not just better AI agents, but better ownership rules around them: what they can touch, what they must prove, and what boundaries they are never allowed to silently rewrite.

Sylwia Laskowska • Jun 11

Exactly! That's a big part of it.

I think security is one area where this becomes especially important, but business logic is another one that's often overlooked. It's surprisingly easy to get a false sense of confidence because the happy path works perfectly and all the demos look great.

The problem is that production rarely lives on the happy path. 😄 That's usually where edge cases, unexpected inputs, security concerns, and business rules start colliding with reality.

And by the time those boundaries are violated, the code can still "work" while the system itself is already heading in the wrong direction.

Daniel Pokorný • Jun 11

I wonder if we're focusing on the wrong risk. Most discussions are about whether the code is correct. The bigger question may be whether the assumptions are correct. AI can generate working code. It can also generate a perfectly functioning solution to the wrong problem.

Sylwia Laskowska • Jun 11

Absolutely, and thanks for bringing that perspective into the discussion!

Anyone who has worked in software for a while knows how often this happens. Gathering requirements, understanding the problem, and aligning stakeholders can take far more time than the implementation itself. Sometimes coding is just the final step.

There's also the business side of it. These days it's easier than ever to generate an app, but that doesn't automatically mean anyone wants to use it. Finding a real problem worth solving, and customers willing to pay for the solution, is still the hard part.

And even when the requirements are good, AI doesn't always help as much as we'd like. A friend of mine recently joked that generating business logic with AI in the banking industry can sometimes feel like rolling a 100-sided die. 😅 The code may compile, but whether it correctly reflects all the rules, exceptions, and edge cases is a completely different question.

At this point I've written such a long reply that I might have to turn it into a separate article someday. 😄

Daniel Pokorný • Jun 11

That's a great point. AI is compressing the cost of implementation much faster than the cost of understanding. Which may be why problem definition is becoming more valuable, not less.

Sylwia Laskowska • Jun 11

That's why so many companies are surprised that development isn't suddenly 10x faster. For established products, coding is often only a small part of the work.

Alex Shev • Jun 11

"It works" is only the first checkpoint. The next questions are whether it fails safely, can be reproduced, can be reviewed, and leaves enough evidence for the next person to trust it.

That mindset is exactly what AI coding workflows need too. Generated code needs terminal-level proof, not just a successful-looking response.

Sylwia Laskowska • Jun 11

Exactly. "It works" is only the tip of the iceberg.

Moises Griott • Jun 14 • Edited

Interesting perspective, Sylwia... One thing I've observed after a year of building with AI is that the problem is often not the generated code itself, but the lack of governance around the context that drives it.
AI can produce surprisingly good results when architecture, constraints, requirements, and design decisions are explicitly managed and continuously maintained. Without that, even a correct implementation can drift away from the intended solution over time.

Maybe the next challenge is not only reviewing AI-generated code, but also governing the context that guides AI-generated decisions.

Sylwia Laskowska • Jun 14

I think you're onto something there.

The more I read about AI workflows, the more it feels like we're reinventing software engineering all over again. First it was code reviews, then testing, then architecture, and now we're talking about governance of prompts, context, constraints, and decision-making.

It turns out that generating code was the easy part. Keeping the system aligned with the original intent is the hard part, whether the contributor is a human or an AI.

Moises Griott • Jun 15

‘Keeping the system aligned with the original intent is the hard part.’ This is our current pain point... you said it! Cheers.

codecraft • Jun 11

I myself have caught some pretty nasty security holes in AI-generated code that would've slipped right past someone who's just starting out. The scary part isn't that the code looks wrong; it's that it looks completely fine. The vibe coding section is spot on, too. Like yeah, it's an insane productivity boost, but there's a real difference between using it as a tool to accelerate your work vs just... shipping whatever it spits out and hoping for the best.

The "AI didn't break production, a human did" framing is something more people need to hear. We're really good at blaming the tool when the actual problem is the workflow around it. Good luck at FrontKon btw, Prague is a great city for a conference 😊

Sylwia Laskowska • Jun 11

I've had very similar experiences. Whenever I had to deal with something more security-sensitive, especially potential XSS issues, AI often struggled as well. What makes it tricky is that it usually delivers the answer with complete confidence and happily assures you that everything is perfectly fine. 😅
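As a minimal, hypothetical illustration of code that "looks completely fine" but has an XSS hole (the function names are invented for this example), the first function below interpolates user input straight into HTML, while the second escapes it with Python's standard-library `html.escape`. This is only a sketch of the failure mode: real applications should generally rely on a templating engine with auto-escaping, and HTML escaping alone is not sufficient for JavaScript, URL, or CSS contexts.

```python
import html


def greeting_unsafe(name: str) -> str:
    # Looks harmless, but user input goes into the markup verbatim.
    return f"<p>Hello, {name}!</p>"


def greeting_safe(name: str) -> str:
    # html.escape turns &, <, > (and, with quote=True, " and ') into entities,
    # so the browser renders the input as text instead of parsing it as markup.
    return f"<p>Hello, {html.escape(name, quote=True)}!</p>"


if __name__ == "__main__":
    payload = "<script>alert('xss')</script>"
    print(greeting_unsafe(payload))  # the <script> tag survives intact
    print(greeting_safe(payload))    # the tag is rendered as inert text
```

Both functions return a perfectly normal-looking greeting for ordinary names, which is why the unsafe version is so easy to approve in a quick review.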

That's why I think experience still matters so much. A beginner might see clean-looking code and assume everything is okay, while someone who's been bitten by these issues before immediately starts asking uncomfortable questions.

And thanks! 😄 Prague was actually one of the reasons I was excited about FrontKon in the first place. I can't even remember the last time I was there. The city is amazing, and from everything I've seen so far, the community around the conference looks fantastic as well.

EmbedTalk • Jun 11

Love this write-up! It's refreshing to see a post focus so heavily on the foundational mental models instead of just dumping syntax rules. For anyone starting out, mastering how a tool tracks changes locally before running complex terminal commands makes the entire learning curve so much smoother. Thanks for sharing!

Sylwia Laskowska • Jun 11

Thank you! 😄

I completely agree. Syntax is easy to Google, but understanding the underlying concepts and mental models is what really helps people become confident and independent developers.

I'm glad you found it useful!

c0d3l0v3r • Jun 14 • Edited

Great article. Here are my thoughts.

I think things start to break when vibe coding is done with the sole goal of "making it work."

I think we should be moving more toward thinking about what we want to build and how it should be built, while using AI as a tool rather than an authority. As you highlighted, AI-generated applications can introduce security vulnerabilities. As a vibe coder myself, I don't write every line of code manually. I often use AI to fill in implementation details such as Redis connectors, MongoDB integrations, API clients, boilerplate configuration, and similar tasks. These are things I could look up and write myself, but AI can usually generate them much faster.

I think where AI struggles is context.

I recently read an article discussing this exact issue. When a codebase contains assumptions, conventions, or architectural decisions that are not explicitly provided to the model, the AI has a high probability of violating those assumptions. The generated code may look correct in isolation while subtly breaking expectations elsewhere in the system.

One of the things I covered during my Software Architect roadmap was:
"Understanding the code does not necessarily mean understanding the architectural decisions that shaped it. Don't make assumptions."
I think the same principle applies to AI-generated code.

For prototyping and rapid development, we often use Docker Compose, which gives the model a relatively complete view of the application's infrastructure and service dependencies. In those environments, AI can be surprisingly effective because much of the system context is visible.

However, production environments are usually a different story. Large organizations often have infrastructure concerns that aren't represented in the codebase or Compose files: internal networking rules, compliance requirements, security controls, deployment pipelines, observability systems, scaling constraints, disaster recovery plans, and historical architectural decisions.

As a result, an AI may generate code that appears correct from the local development perspective while unknowingly violating production assumptions or architectural constraints.

This is partly an assumption on my side, since I haven't worked extensively with large-scale production systems yet 😄, but it seems like one of the reasons AI performs much better on prototypes than on mature enterprise systems.

I also have a question, as I'm still a relatively inexperienced engineer.

How much has AI actually changed an engineer's understanding of their codebase?

Many people clearly use AI blindly and have little idea what's happening under the hood. However, when we talk about this "blinding effect," I wonder how significant it really is in large codebases. In a large system with multiple engineers, nobody understands every part of the codebase anyway. Knowledge has always been distributed across teams.

So where do you think the difference lies between the pre-AI and post-AI world? Has AI fundamentally reduced developers' understanding of their systems, or has it mainly accelerated a problem that already existed in large organizations?

I'd be interested to hear your perspective.

Sylwia Laskowska • Jun 14

That's a really interesting question.

My take is that AI didn't invent the problem of developers not fully understanding the systems they work on. Large codebases have always had distributed knowledge, hidden assumptions, and areas that only a handful of people truly understood.

What AI changed is the cost of producing code. A few years ago, a junior engineer might not fully understand a system, but at least they had to spend a week building a feature. Today, they can generate the same feature in an hour. 😅

So I'm not sure AI created a new problem. I think it accelerated an old one. The ability to produce code is growing faster than the ability to understand the system that code is being added to.

FastAnchor_io • Jun 10

Great article, Sylwia! The point about 'the code works' being the most dangerous phrase is spot on. I've seen this play out with API integrations too — AI-generated fetch calls that work fine in dev but have zero retry logic or rate-limit handling. The generated code passes the 'it works' test but fails in production silently. Your message that human accountability doesn't disappear just because AI wrote the code is something every dev shipping AI-assisted code needs to internalize.
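FastAnchor_io's "zero retry logic" point is easy to picture. Below is a minimal, library-agnostic Python sketch of retry with capped exponential backoff and jitter. `call_with_retry` and `TransientError` are hypothetical names and the delay values are illustrative assumptions. The key design choice is that only errors you have explicitly classified as transient (timeouts, HTTP 429 or 503, say) get retried, and the last failure is re-raised instead of disappearing.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry (timeouts, 429, 503)."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying TransientError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure instead of failing silently
            # Upper bound grows 0.5s, 1s, 2s, ... and is capped at max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random delay in [0, cap] so many clients don't retry in lockstep.
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")
```

Injecting `sleep` makes the helper testable without real waiting. For non-idempotent requests (a POST that charges a card, for example), blind retries can duplicate the side effect, so that case needs more care than this sketch gives it.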

Sylwia Laskowska • Jun 10

Haha, I can already imagine those fetch calls 😄 And probably a few unhandled exceptions happily lurking nearby, waiting for production traffic to arrive.

That's exactly the kind of thing I had in mind. For prototypes, experiments, and side projects, AI is an amazing tool. But once real users, real traffic, and real business requirements enter the picture, we have to be much more careful.

The code may work perfectly in the happy path, but production has a habit of finding every path nobody thought about. Thanks for sharing the example!

FastAnchor_io • Jun 11

Thanks for the thoughtful reply, Sylwia! 😄

Your line about "production has a habit of finding every path nobody thought about" hits home. I've adopted a simple rule since then: treat AI-generated code like code from a talented but overconfident new hire — assume good intent, always verify.

One trick that's helped me: when reviewing AI code, I mentally run through my team's PR checklist. Error handling? Retry logic? Rate limiting? Input validation? If the model skipped it, the human reviewer (still me!) needs to catch it before it ships.
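For the "rate limiting?" item on that checklist, one common client-side technique is a token bucket: each call spends a token, tokens refill at a fixed rate, and short bursts are allowed up to a capacity. Choosing a token bucket, and the `TokenBucket` class itself, are assumptions for illustration rather than anything the comment specifies. The clock is injectable so the behavior can be checked without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should wait, queue, or reject instead of hammering the API
```

This is a single-threaded sketch: sharing one bucket across threads would need a lock around `try_acquire`, and the server's own limits still apply whatever the client does.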

Thanks again for starting this conversation — it's one every dev shipping AI-assisted code should read!

xulingfeng • Jun 11

Your medical diagnosis comparison hits different — nobody would let AI diagnose a health issue without a second opinion, but we'll ship AI-generated code to production after a quick skim. "It works" just means the happy path passed. The real test is whether it survives the path nobody thought of.

Sylwia Laskowska • Jun 11

Exactly! And let's be honest, users always find that path sooner or later. Every time we tell ourselves "nobody would ever do that", someone inevitably does exactly that.

The happy path is usually the easy part. The real challenge starts when real users bring their creativity into the system. That's where all those hidden assumptions suddenly become very visible.

Amorto Goon • Jun 11

Great post! All I'm seeing in social media is how great AI is and how everyone is using AI all the time to do all their work.
I personally use AI only for easy code, mostly copy-paste-style scenarios. For example, I create a registration page and ask it to follow my page to create a login page. Mostly I ask it to do refactoring. Sometimes I don't know what I want or how something should work; I only get to that point once I start writing code myself.
Maybe I'm becoming a boomer in the tech world, haha! Especially since all I see nowadays is people becoming 500x engineers after using Claude Code.

Sylwia Laskowska • Jun 11

Haha, that's pretty much how I use it as well. 😄

For straightforward tasks, refactoring, or "make me another page that looks like this one," AI can be a huge time saver. But once things get even slightly more complex, I often find myself spending so much time steering the model that it becomes faster to just write the code myself.

And no, I don't think that makes you a boomer. 😄 I think it just means you've learned where the tool is genuinely useful and where it starts getting in the way.

neither galax • Jun 13

Thank you for sharing your post. Vibe coding is nice for simple tasks, but when I tested it on Replit, trying to build an app that needs to retrieve real-time context, it clearly failed. A programmer still had to do the work of building the MCP part. Humans still need to be the experts on context, using their soft skills.

Sylwia Laskowska • Jun 14

Exactly!

Humans are still an essential part of software development, at least for now. AI can help with implementation, but understanding the context, requirements, tradeoffs, and business goals is still very much a human skill.

The tools keep getting better, but expertise and judgment are still hard to automate.

HARD IN SOFT OUT • Jun 13

Hey Sylwia, really enjoyed this — especially the "code works, what could go wrong?" title. That's the exact energy that keeps production interesting at 3 AM.

The "vibe coding trap" is even deeper for solo founders. You mentioned people building startups without deep experience. The scary part isn't that they'll ship bugs — it's that they won't know how to measure whether those bugs matter. A senior dev sees a tiny XSS and thinks "oops, fix it." A junior sees the same and thinks "but the button still works." The feedback loop is broken, not just the code.
Blame is useful, but only if it leads to prevention. You're right that humans deploy AI-generated code, not AI itself. But here's the nuance: AI lowers the activation energy for bad decisions. Twenty years ago, shipping a database-dropping script required intent and skill. Now it requires a teenager and a vibe. The human is still responsible, but the tool changed the risk profile. That's worth naming.
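To make the "but the button still works" point concrete: both functions below render a user comment, and both "work" on ordinary input. Only one survives a malicious input. This is a minimal Python sketch with hypothetical function names, using the standard library's `html.escape`; in a real app you would normally rely on your template engine's automatic escaping rather than building HTML strings by hand.

```python
import html


def render_comment_unsafe(author: str, body: str) -> str:
    # Interpolates user input straight into markup: a <script> in `body` would run in the browser.
    return f"<p><b>{author}</b>: {body}</p>"


def render_comment(author: str, body: str) -> str:
    # html.escape converts &, < and > (and, with the default quote=True, " and ') into entities.
    return f"<p><b>{html.escape(author)}</b>: {html.escape(body)}</p>"
```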

One small suggestion: I'd love to see a follow-up with concrete "trust checkpoints" — maybe a 5-minute pre‑deploy checklist specifically for vibe‑coded features. Right now the piece diagnoses beautifully but leaves us hanging on what to do besides "be careful."

An AI writes a function. It passes all tests. The dev says "ship it."

Two weeks later, the database is on fire.

The AI says "I did exactly what you asked."

The dev says "I know. That's what scares me."

Cheers — this was a good read.

Sylwia Laskowska • Jun 14

Thank you for this beautiful comment!

I really enjoyed reading it, especially your points about solo founders, feedback loops, and AI lowering the activation energy for bad decisions. That's a very interesting way to frame the problem.

And I absolutely love the idea of an article on "trust checkpoints" or a pre-deploy checklist. That's actually a great follow-up topic, because you're right: it's easy to say "be careful," but much harder to explain what that looks like in practice.

I probably won't get to it this week, but I'm definitely saving that idea for later. Thanks for the inspiration! 😊

caishen-ai • Jun 12

Great perspective! The trust boundary is so important. I have been experimenting with AI prompts as a product - selling prompt templates forces you to verify every output before packaging. It is a great way to build that verification muscle. Curious if anyone else has monetized their prompt engineering skills?

Sylwia Laskowska • Jun 12

Haha, I have a feeling quite a few people tried exactly that, especially during the early AI boom. 😄

Theo Valmis • Jun 13

The medical analogy carries more than it looks. You'd let an AI read your X-ray and still refuse its prescription, because one you can check against a second opinion and the other has unbounded downside if it's wrong. The codebase question splits the same way. "How much do you trust AI" collapses two separate things into one dial: how cheaply you can verify the output, and how bad the worst case is if you don't. That's why the trust debate never lands: everyone's arguing about the dial while meaning different points on those two axes. Handing the AI a throwaway prototype and handing it your auth layer are different bets with different blast radius. The honest version of "how much" is "for which change, and what happens when it's wrong."

Sylwia Laskowska • Jun 14

Thanks for the comment! I really like this perspective because it complements the article nicely. Instead of asking whether AI is "good" or "bad," it shifts the discussion toward a much more practical question: what risk am I taking on in this particular case?

A throwaway prototype, a payment system, and an authentication layer are completely different bets with completely different blast radius. Looking at trust through that lens makes the whole discussion much more useful.

arun rajkumar • Jun 12

Running FCA-authorised payments infrastructure forces this question constantly. "The code works" in fintech means something very specific: it handles real money correctly under every network condition, bank response code, and retry edge case. Not just the happy path.

AI generates our 80% brilliantly — scaffolding, boilerplate, standard patterns. The 20% that touches actual payment state still gets a senior review, not because AI is bad at it, but because if it's wrong, someone's payroll doesn't land on Friday.

That asymmetry — same code, completely different blast radius — is what changes how you trust output. The tests passing isn't the signal. The senior who's seen what happens when they don't pass is the signal.
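One widely used way to make a payment path safe against the "retry edge case" is an idempotency key: the client attaches the same key to every retry of one logical payment, and the server applies it at most once. The comment doesn't say this is the approach they use; the sketch below is a hypothetical in-memory Python version with made-up names, meant only to show the mechanism.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class PaymentService:
    """Hypothetical in-memory payment service that deduplicates requests by idempotency key."""

    balances: dict[str, int] = field(default_factory=dict)
    processed: dict[str, str] = field(default_factory=dict)  # idempotency key -> payment id

    def pay(self, idempotency_key: str, account: str, amount_cents: int) -> str:
        # A retry carrying the same key gets the original result instead of a second payment.
        if idempotency_key in self.processed:
            return self.processed[idempotency_key]
        self.balances[account] = self.balances.get(account, 0) + amount_cents
        payment_id = str(uuid.uuid4())
        self.processed[idempotency_key] = payment_id
        return payment_id
```

In production the check-then-record step has to be atomic (for example, a unique constraint on the key in the database); this in-memory version is not safe under concurrent requests.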

Sylwia Laskowska • Jun 12

Exactly! I think it's especially visible in industries like finance, healthcare, or anything else where the cost of being wrong is very high. The code may look perfectly fine, the tests may pass, and the demo may work, but a tiny mistake can have very real consequences.

That's why I really like your point about blast radius. The question isn't just "does it work?" but also "what happens if it doesn't?" And the answer to that can completely change how much trust we're willing to place in the output.

mote • Jun 13

One thing that trips people up in production debugging is treating symptoms instead of causes. I once spent two days on a flaky test that turned out to be a race condition in a background task. The best debugging sessions start from first principles. What tools do you reach for when the usual suspects don't cut it?
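The "flaky test that was really a race condition" story is easy to reproduce in miniature. This hypothetical Python sketch shows the same read-modify-write on shared state twice, once unguarded and once behind a `threading.Lock`. The `time.sleep` exists only to widen the window between read and write so the race shows up more often.

```python
import threading
import time


class Counter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment_racy(self) -> None:
        current = self.value       # read
        time.sleep(0.001)          # simulated I/O widens the gap between read and write
        self.value = current + 1   # write: can overwrite another thread's update

    def increment_safe(self) -> None:
        with self._lock:           # the whole read-modify-write runs as one unit
            current = self.value
            time.sleep(0.001)
            self.value = current + 1


def run(method_name: str, workers: int = 20) -> int:
    """Start `workers` threads that each call one increment method once; return the total."""
    counter = Counter()
    threads = [threading.Thread(target=getattr(counter, method_name)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

`run("increment_safe")` always returns 20. `run("increment_racy")` often returns less, but not on every run, which is exactly what makes a test built on top of it flaky.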

Sylwia Laskowska • Jun 14

Oh yes, chasing symptoms instead of causes is a classic trap. 😄

When the obvious explanations stop making sense, I usually go back to first principles as well and start asking: "What would have to be true for this behavior to happen?"

And honestly, a surprising number of bugs eventually end with me reading logs, tracing execution flow, and discovering that my original assumptions were wrong. 😅

Mehmet Can Farsak • Jun 13

The "vibe coding trap" section really resonates — agents are great at making things "work" but terrible at knowing when to stop and think. I've seen this exact pattern: you ask an agent to brainstorm an architecture and it immediately starts writing files. Put together Brainstorm-Mode (mehmetcanfarsak on GitHub) to solve this — it adds a "thinking mode" vs "action mode" switch using hooks so agents stay in ideation instead of jumping to implementation. Worth considering as a basic guardrail alongside the other checks you mentioned.

Sylwia Laskowska • Jun 14

That sounds like a really nice approach!

I've had a similar problem a few times: I ask the model a question or want to discuss possible directions, and it immediately starts implementing the solution. A clear separation between "thinking mode" and "action mode" seems like a very useful guardrail.

Mateo Ruiz • Jun 12

One thing I’ve noticed is that people apply very different trust standards to AI depending on the domain. Nobody would blindly deploy medical advice from an LLM, but many are willing to deploy AI-generated code after a quick glance because “it works.”

The tricky part is that most AI-generated bugs aren’t obvious failures; they’re subtle issues around security, architecture, edge cases, performance, and maintainability. The code compiles, the tests pass, and the demo works. The problem only shows up when real users, real traffic, or real attackers arrive.

AI is incredibly useful as an accelerator, but I think the responsibility line hasn’t changed: if we ship it, we own it. The strongest developers I know use AI heavily, but they treat it like a fast contributor, not an authority.

Sylwia Laskowska • Jun 12

I really like that line too: "If we ship it, we own it."

Maybe if more people thought about it that way, especially non-technical vibe coders, they'd think twice before pushing something straight to production. It's easy to trust the output when everything works in a demo. It's much harder when you're the one getting the 3 a.m. phone call because something broke.

At the end of the day, the AI doesn't own the consequences. We do.

Marcus Chen • Jun 12

From the voice side, this is the whole job. A voice agent passes every test in the demo because the demo is one clean speaker in a quiet room, and then a real caller talks over it, code-switches, or the line is noisy, and none of that was in the test set. "The code works" is true and useless. What changed it for us was building the test set out of the inputs that actually broke us in production rather than the inputs we imagined, which is humbling because the list is never what you expect. Enjoy the vacation; that JSNation room sounds like a good one.

Sylwia Laskowska • Jun 14

Oh yes, absolutely!

Sometimes I feel like the final test of every product is still production itself. No matter how good the team is, real users will always find a bug, edge case, or usage pattern that nobody thought of beforehand.

Users are incredibly creative when it comes to breaking our assumptions. 😅

Manuel Bruña • Jun 15

The medical analogy works because it separates assistance from authority. I am comfortable letting AI draft code or explain a path, but not comfortable letting it own the final judgment. The missing layer is receipts: tests, diffs, source links, and explicit assumptions.

Sergey Shkuratov • Jun 11

I think the real category mistake here is treating “prototype velocity” as “product engineering”.

LLMs are genuinely useful for drafts, experiments, and first passes. But a product is not just a demo that also got deployed. It includes boundaries, failure handling, verification, maintainability, and someone who actually owns the tradeoffs.

In a weird way, it reminds me of people: “making” one is easy; almost anyone can do that. Raising one so they can survive in an unpredictable world is the hard part. Software feels similar. Generating something that appears to work is easy. Building something that keeps working under real conditions is the actual craft.

So for me the role shift is real: less typing, more specification, review, and accountability.

Sylwia Laskowska • Jun 11

Exactly! Maintaining a product is much harder than creating one. And honestly, coming up with a product that people actually need and are willing to use is already a challenge in itself.

These days, building something is easier than ever. Even before AI, you could get surprisingly far with copy-paste, tutorials, and Stack Overflow. AI just makes that process faster.

But there's still a huge gap between "I built an app" and "I built a successful product." That's where all the difficult parts begin.

The coding gets easier. The product part doesn't. 🙂

Dirk Mattig • Jun 14

Thank you for bringing up this important question about trust and control in software development. So, where do I draw the line?

Let me start by asking a mean question: When was the last time you reviewed a compiler's output? Because we are all well aware that our source code is not executed. Ever. It actually is just compiler input (either ahead-of-time or just-in-time), and compiler bugs are real. So, are we worried? No, we learned to trust this technology a long time ago and are so used to it by now that we almost forget it exists.

The moment we stopped hand-coding machine language, we exchanged control for trust. We ascended to a higher level of abstraction and exerted control again. First, low-level languages; then, high-level languages. And now coding agents.

Will AI take my job? Yes. But I will not go home, I will level up. Software engineering will ascend to a purely conceptual level. We will exert control via specifications written in natural language.

And this is precisely where I draw the line: control. I steer, I direct.
Vibe coding is a good negative example: Code is generated from an idea, not a specification. The result speaks for itself.

The hard part will be to build this trust into the new technology and to let go of control at the source-code level. Reviewing generated code might help at the beginning, but eventually it will become pointless and counterproductive. If an agent can repeatably apply changes to a codebase to fully adhere to the latest spec, what difference does it make if code is duplicated or abstractions are missing? We established these rules to ensure that human developers would remain in control of the code. They simply no longer apply. The compiler also inlines code, and we never really worried about this too much, either.

Sylwia Laskowska • Jun 15

I agree that we're moving up the abstraction ladder. I'm just not convinced we're at the stage where generated code can be treated like compiler output rather than engineering output.

A compiler is deterministic. Given the same input, it will reliably produce the same output. AI, on the other hand, constantly makes decisions: which pattern to use, which abstraction to introduce, which tradeoffs to make, and sometimes even which assumptions to invent. That feels fundamentally different to me.

I'd also be a bit cautious about the specification part. As I mentioned in another comment, creating a good specification is often harder than writing the code itself. And in many projects, the most accurate and durable specification eventually becomes... the code. 😄

So who knows, maybe in the future we really will stop reviewing agent-generated code the same way we don't review compiler output today. But based on what I'm seeing right now, I don't think we're even close to that point yet.

Johnny Young • Jun 14

This is a sharp framing, and the "codebase paradox" is the part that deserves more airtime. We treat our health with skepticism toward AI but hand over production infrastructure without a second thought. That asymmetry is wild when you say it out loud.
I build agent-driven SaaS products, and the line I've landed on is treating AI autonomy like a privilege system, not a feature toggle. Every action an agent can take gets classified: read, write, or execute. Read is open. Write requires context validation. Execute — anything irreversible like sending, deleting, deploying, spending — requires explicit human confirmation before it fires. No exceptions.
The "it works, ship it" instinct you're describing is real, and it's not just a junior developer problem. I've caught myself trusting agent output on domains I know well because the 90% that's correct makes the 10% that's dangerous feel like a rounding error. It's not. That 10% is where the XSS lives, where the missing RLS policy hides, where the agent quietly drops an index it decided was redundant.
Your point about penetration testers having the time of their lives is spot on. The attack surface of vibe-coded software isn't the AI — it's the absence of the review layer that used to be a human bottleneck. We removed the bottleneck and called it progress. In some cases it was. In others we just removed the guardrail.
The answer isn't to stop using agents. It's to stop treating them like senior engineers and start treating them like talented interns with root access. You'd never give an intern unsupervised deploy permissions. Same rules should apply.
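Johnny's read/write/execute tiers map naturally onto a small deny-by-default policy check. In this hypothetical Python sketch, the action names, the `ACTIONS` registry, and the two callbacks (`context_valid`, `human_confirms`) are illustrative assumptions, not any agent framework's API. Treating unknown actions as denied is an addition to the policy as the comment describes it.

```python
from enum import Enum
from typing import Callable


class Privilege(Enum):
    READ = "read"
    WRITE = "write"
    EXECUTE = "execute"  # irreversible: send, delete, deploy, spend


# Hypothetical action registry; a real agent setup would define its own.
ACTIONS: dict[str, Privilege] = {
    "list_invoices": Privilege.READ,
    "update_draft": Privilege.WRITE,
    "send_email": Privilege.EXECUTE,
    "drop_index": Privilege.EXECUTE,
}


def authorize(
    action: str,
    context_valid: Callable[[str], bool],
    human_confirms: Callable[[str], bool],
) -> bool:
    """Return True only if the agent may perform `action` under the read/write/execute policy."""
    privilege = ACTIONS.get(action)
    if privilege is None:
        return False  # unknown actions are denied by default
    if privilege is Privilege.READ:
        return True
    if privilege is Privilege.WRITE:
        return context_valid(action)
    return human_confirms(action)  # EXECUTE: always needs an explicit human "yes"
```

Keeping the classification in one table means a newly added tool is blocked until someone consciously assigns it a tier, rather than inheriting whatever access the agent already has.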

Sylwia Laskowska • Jun 15

Exactly! And thank you for this comment.

I think that's the tricky part: AI generates code with the confidence of a senior engineer. The variable names are beautiful, the structure looks clean, and everything feels very convincing. But in practice, it still lets quite a few bugs slip through, and these aren't isolated incidents.

I also really like your trust-level approach. After all, we wouldn't normally give a junior engineer unrestricted access to production infrastructure either (unless it's some tiny internal app 😅). In fact, I remember a story where a junior developer managed to wipe the database of exactly such an internal application 🤣. Thankfully, they had backups.

That's why I think AI should be treated similarly: useful, productive, sometimes brilliant, but still operating within boundaries that match the level of trust we've earned, not the level of confidence it projects.

Johnny Young • Jun 15

Thank you, Sylwia,

The confidence vs. competence gap is something I run into daily. I'm a solo founder building a full technical stack — database platform, security tools, multiple consumer apps — and I lean on AI heavily for velocity. It's genuinely transformative for getting from zero to working prototype.

But here's what I've learned the hard way: the code that looks the cleanest is often the code that hides the worst assumptions. I've had AI generate beautiful connection pooling logic that silently swallowed authentication errors. Perfectly structured middleware that skipped TLS validation. An elegant cron job that worked flawlessly in testing and then quietly wrote to the wrong schema in production.

None of these threw errors. All of them passed initial review. Every single one looked like it was written by someone who knew what they were doing.

A junior engineer who wipes a database learns from it and never does it again. AI will make the same class of mistake next Tuesday with the same confidence it had last Tuesday. It doesn't accumulate scar tissue the way humans do.

So now my workflow is essentially: let AI draft at speed, then review it like I'm auditing someone I don't fully trust yet. Not because the tool is bad — it's incredible — but because trust should be earned by outcomes, not projected by syntax.
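The "connection pooling logic that silently swallowed authentication errors" pattern tends to look harmless in review. In this hypothetical Python sketch, `connect` is a stand-in, not a real driver API. The first version turns a bad password into `None`, which callers may read as "no connection available, try later"; the second fails loudly and keeps the original error attached as its cause.

```python
from typing import Optional


class AuthError(Exception):
    """Hypothetical error a database driver might raise on bad credentials."""


def connect(password: str) -> str:
    # Stand-in for a real driver call; returns a fake connection handle.
    if password != "correct-horse":
        raise AuthError("authentication failed")
    return "conn-1"


def get_connection_swallowing(password: str) -> Optional[str]:
    try:
        return connect(password)
    except Exception:
        return None  # tidy-looking, but the caller never learns the credentials are wrong


def get_connection(password: str) -> str:
    try:
        return connect(password)
    except AuthError as exc:
        # Fail loudly with context; `from exc` keeps the original error as __cause__.
        raise RuntimeError("database credentials rejected; check configuration") from exc
```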

mishraricha1806 • Jun 14

This is exactly the kind of problem I’ve been thinking about lately.

A lot of systems “work” in the sense that the code runs, tests pass, and deployment succeeds. But production readiness is a different question entirely.

Things like Kafka replication, consumer lag, Kubernetes probes, unsafe Terraform/IAM settings, storage growth, retries, and downstream bottlenecks can still quietly turn into incidents.

I’ve been building a small tool called Beacon around this idea: checking infra/config/runtime inputs before release and asking, “Is this actually safe for production, and what should we fix first?”

Sharing in case it’s useful or interesting to folks here:
docker run --rm -p 8765:8765 ghcr.io/mishraricha1806/beacon:latest ui --host 0.0.0.0 --port 8765

Would genuinely love feedback from people who’ve seen these “the code worked, but production still failed” situations.
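Consumer lag is a good example of a number that stays quiet until it isn't. For each partition, it is the gap between the partition's latest (end) offset and the offset the consumer group has committed. This plain-Python sketch covers only that arithmetic, with hypothetical function names. In practice the offsets would come from your Kafka client or monitoring stack, and none of this describes how Beacon works internally.

```python
def consumer_lag(end_offsets: dict[int, int], committed: dict[int, int]) -> dict[int, int]:
    """Per-partition lag: messages written to the partition but not yet consumed by the group."""
    # Assumption: a partition with no committed offset counts from 0. A real consumer's
    # starting point depends on its auto.offset.reset setting.
    return {p: end - committed.get(p, 0) for p, end in end_offsets.items()}


def lagging_partitions(lag: dict[int, int], threshold: int) -> list[int]:
    """Partitions whose lag exceeds an alert threshold, sorted by partition number."""
    return sorted(p for p, n in lag.items() if n > threshold)
```

For example, `consumer_lag({0: 1000, 1: 500, 2: 80}, {0: 990, 1: 100})` returns `{0: 10, 1: 400, 2: 80}`, and `lagging_partitions` on that result with `threshold=50` returns `[1, 2]`.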

Sylwia Laskowska • Jun 14

That actually sounds quite interesting.

You're absolutely right that there's a huge difference between "the code works" and "the system is production-ready." Many of the hardest problems only show up once real traffic, real infrastructure, and real operational constraints enter the picture.

Good luck with Beacon! I hope you get plenty of feedback from people who have learned that lesson the hard way. 😅

Harsh • Jun 10

Every developer has that one story where the code worked beautifully until it didn't. Mine was the empty list bug. The code worked 99% of the time. The 1%? A user with no data crashed the whole flow. No error. No warning. Just silence. The code wasn't wrong. The assumption was.

"It works" is the most dangerous phrase in our profession. Not because we're lying, but because we're testing the happy path and ignoring the dragon.
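The comment doesn't show the actual code, so this is only a hypothetical Python reconstruction of that class of bug: one function written for the happy path, and one that makes the "no data" case explicit so every caller has to decide what it means.

```python
from typing import Optional


def average_order_value(orders: list[float]) -> float:
    # Happy-path version: raises ZeroDivisionError for a user with no orders.
    return sum(orders) / len(orders)


def average_order_value_safe(orders: list[float]) -> Optional[float]:
    # Handle "no data" explicitly instead of assuming every user has orders.
    if not orders:
        return None
    return sum(orders) / len(orders)
```

A regression test for the empty case, such as asserting that `average_order_value_safe([])` is `None`, keeps that particular dragon from coming back.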

Thanks for the reminder. 🙌

Sylwia Laskowska • Jun 10

Oh yes, exactly! 😄

I've also noticed that every time we tell ourselves, "it's fine, no user will ever do that", a user will absolutely do exactly that. Every single time.

If there's one thing I've learned after years of development, it's that users are incredibly creative when it comes to finding the one path nobody thought they would take. That's usually where the dragons live.

Alireza Arjvand • Jun 10

What I find interesting is that we're often far more demanding of AI than we are of ourselves. Human developers make mistakes every day: bugs, bad assumptions, missed edge cases, flawed reviews. The goal isn't perfection; it's reducing mistakes and catching them earlier.

AI is most effective when used as part of a process: good planning, choosing the right model/tool for the task, providing enough context, using plan mode before implementation, and having the AI review and challenge its own output. Just like junior or senior developers, AI performs much better when given clear requirements and proper review loops.

The real question isn't "Can AI make mistakes?" Of course it can. It's "How do we use AI to make fewer mistakes overall than we would on our own?" That's where the real value is.

Sylwia Laskowska • Jun 10

Thank you very much for this comment and perspective! I completely agree that humans make mistakes all the time as well.

And that's exactly why we need to keep a close eye on AI and maintain good processes around it. What worries me sometimes is that we've gone from "a junior wrote this code, let's review it carefully" to "a junior generated it with AI, it looks fine, and AI is usually right anyway, so let's ship it."

That becomes especially tempting when senior engineers are overloaded, juggling a backlog full of tasks, and simply don't have the time to review everything as thoroughly as they would like.

Used well, AI can absolutely help us make fewer mistakes. But I think that only happens when we keep the review and accountability part of the process intact rather than assuming the model has already done it for us.

Mike Written | AI Trends 24 • Jun 11

The doctor analogy hit hard. We wouldn't self-diagnose a serious illness from AI output alone — but somehow "the code works" has become good enough to ship. The difference is that bad medical advice has obvious, immediate consequences. Bad code hides for months and then fails in production at the worst possible time. What I've found useful is treating Claude like a brilliant junior dev — incredibly fast, genuinely helpful, but you still review every PR before it merges. The moment you stop reviewing is the moment you've made it the authority instead of the tool. Great framing ahead of the JSNation discussion.

Sylwia Laskowska • Jun 11

Exactly! The moment you stop reviewing the output, you might as well start updating your CV, because you've effectively handed your job over to the model.

And I completely agree about the doctor analogy. Most people instinctively understand that important decisions require verification.

Which makes me wonder: would people willingly board a fully autonomous passenger plane with no pilot in the cockpit? 🤔

My guess is that many would say no, even if statistically it turned out to be safer. Trust is a fascinating thing. We seem much more comfortable trusting AI with some decisions than others.

Varsha Ojha • Jun 11

I've learned that "it works on my machine" and "it's ready for production" are often very different milestones.

Sylwia Laskowska • Jun 11

Oh yes, absolutely! 😄 And in many cases, there's still a very long road between those two milestones.

Mike Written | AI Trends 24 • Jun 13

The codebase paradox is the part that should make everyone uncomfortable. We'd never trust an AI diagnosis without a second opinion, but we'll ship AI-generated code that touches payment systems, user data, and production databases without a second look. The vibe coding trap isn't really about the code, though. It's about confidence outpacing understanding. Your friend's daughter spent 5 hours fixing what AI generated — and that's actually the right ratio. The problem is when someone looks at that same output and thinks, "It works, ship it." What I've landed on is treating AI-generated code exactly like code from a brilliant but overconfident junior developer. You wouldn't let a junior dev push directly to production without review, regardless of how fast they ship. The review isn't optional. It's the job. The model wrote the first draft. You're still the engineer.

Sylwia Laskowska • Jun 14

Exactly! And there's another important detail in that story. My friend spent 5 hours fixing the AI-generated code, but even then he didn't consider it production-ready. His conclusion was that it was good enough for a university assignment, not for a real product.

The problem is that not everyone makes that distinction. Some people see "it works" and immediately jump to "ship it." That's when we start getting those fascinating stories about security breaches, leaked data, and production incidents.

The code working is only the beginning. The question is whether it's ready for the real world.

Marcus Kim • Jun 16

This is the exact line I try to keep in mind with AI-assisted builds: the model can make the code look finished before the system is actually trustworthy. For beginners especially, the useful habit is not "let AI build it," but "make AI explain the workflow, the risks, and the QA path before it edits anything." The code working once is not the same as the product being safe to trust.

xulingfeng • Jun 12

We'll let AI refactor a payment gateway but draw the line at a diagnosis. Same reasoning gap, different stakes.

Sylwia Laskowska • Jun 12

Exactly! 😄

That's what I find fascinating about this topic. Most people immediately ask for a second opinion when it comes to health, but can be surprisingly trusting when it comes to code.

The stakes are different, but the need for verification really isn't.

Mouaadh KELLAL • Jun 11

If AI code didn't need to be verified, we would be versioning the prompts and not the code.

Sylwia Laskowska • Jun 11

Haha, that's a great line rhetorically. 😄

In practice, though, we still have sampling, model updates, changing weights, different context windows, tools, RAG, and all sorts of other variables. The same prompt can produce very different results over time.

So even if we wanted to, we probably couldn't get away with committing only the prompt. 😅

Mouaadh KELLAL • Jun 11

Exactly. It's a counterpoint to people calling AI just another layer of abstraction: a compiler from natural language to code.

Med Marrouchi • Jun 11

100%. AI won't take programmers' jobs.

In my case, I always review AI code. A rule of thumb would be 4-5 review iterations (review -> re-prompt).

Sylwia Laskowska • Jun 11

That sounds very reasonable to me!

𝓣𝓱𝓮𝓛𝓪𝔃𝔂 𝓰𝓲𝓻𝓵 ◕⁠‿⁠◕ • Jun 11

A very beautiful post 👏🏻

Sylwia Laskowska • Jun 11

Thank you 🥰

Gideon Rüscher • Jun 14

Interesting

Luna · AI Tinkerer • Jun 12

Boundary is right, but the missing half is observability — code that 'works' is only half the test.

Sylwia Laskowska • Jun 12

Absolutely. A feature can "work" perfectly and still be a nightmare if you can't observe what's happening when something goes wrong. Logs, metrics, tracing, and monitoring often tell you much more about production readiness than a successful demo ever will.
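To make that concrete, here is a minimal Python sketch (standard library only) of wrapping a feature so that every call leaves a structured log line and a success/failure count behind. The `observed` decorator, the `apply_discount` feature, the `"checkout"` logger name, and the in-memory `METRICS` counter are all hypothetical stand-ins; a real service would export to an actual metrics and tracing backend.

```python
import functools
import json
import logging
import time
from collections import Counter

logger = logging.getLogger("checkout")  # hypothetical service name
METRICS: Counter = Counter()  # in-memory stand-in for a real metrics backend


def observed(func):
    """Log a structured event and bump a success/failure counter for each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        outcome = "error"  # assume failure until the call returns normally
        try:
            result = func(*args, **kwargs)
            outcome = "ok"
            return result
        finally:
            # Runs on both success and exception, so failures are never silent.
            duration_ms = (time.perf_counter() - start) * 1000
            METRICS[f"{func.__name__}.{outcome}"] += 1
            logger.info(json.dumps({
                "event": func.__name__,
                "outcome": outcome,
                "duration_ms": round(duration_ms, 2),
            }))
    return wrapper


@observed
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical feature that 'works' in a demo."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    print(apply_discount(100.0, 15))
    try:
        apply_discount(100.0, 150)
    except ValueError:
        pass
    print(dict(METRICS))
```

A successful demo only exercises the `ok` path; the `error` counter and the duration field are what would tell you, in production, how often the feature fails and how slow it is.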

Rivaan Maurya • Jun 13

great

Muhammet Ali Ozturk • Jun 13

Agreed totally.

Dumbass coders - get lost.

Real engineers - get in.

Sylwia Laskowska • Jun 14

Haha that's how it goes 😁

Asym • Jun 12

Not knowing what the code does

Sylwia Laskowska • Jun 12

hahaha totally 😁

caishen-ai • Jun 18

"The human who decided to build something they didn't fully understand because hiring experienced engineers seemed too expensive."

This sentence should be printed on every AI coding tool's landing page. 😄

The codebase paradox you described — AI generating perfectly reasonable-looking code with tiny, hidden vulnerabilities — is something we've been wrestling with while building our customer acquisition automation. Vibe coding got us to a working prototype in days, but production-readiness took weeks of security review.

One pattern that's working for us: we treat AI-generated code the same way we treat open-source dependencies. Trust but verify, run security scanners, and never give write access to production without human review.
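One way that policy could be encoded as an automated merge gate, sketched in Python under stated assumptions: the `Change` record, its fields, the severity scale, and `may_merge` are all hypothetical, and the findings would come from whatever security scanner the team actually runs.

```python
from dataclasses import dataclass, field

# Hypothetical severity ordering for findings reported by a security scanner.
SEVERITY = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass
class Change:
    author: str                  # e.g. "ai-assistant" or a human username
    touches_production: bool     # deploy config, migrations, secrets, ...
    scanner_findings: list[str] = field(default_factory=list)  # severities
    human_approvals: set[str] = field(default_factory=set)


def may_merge(change: Change, max_severity: str = "medium") -> tuple[bool, str]:
    """Apply the same gate to AI-generated code as to a new dependency."""
    limit = SEVERITY[max_severity]
    worst = max((SEVERITY[s] for s in change.scanner_findings), default=0)
    if worst > limit:
        return False, "scanner finding above allowed severity"
    # An author cannot count as the reviewer of its own change.
    reviewers = change.human_approvals - {change.author}
    if change.touches_production and not reviewers:
        return False, "production change needs human review"
    return True, "ok"


if __name__ == "__main__":
    pr = Change("ai-assistant", touches_production=True, scanner_findings=["low"])
    print(may_merge(pr))           # blocked: no human review yet
    pr.human_approvals.add("alice")
    print(may_merge(pr))           # allowed after review
```

The gate looks at who wrote the code only to stop an author from approving its own change; otherwise the same checks would apply to a pull request from a human contributor.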

Also, looking forward to the JavaScript posts and bash commands! Your "popular bash commands" series is genuinely useful.

caishen-ai • Jun 17

Great point about the "codebase paradox" - I've noticed this gap widening. When LLMs generate code, the surface looks good but the hidden assumptions are what kill you. I'd add another dimension: what about AI agents that don't just generate code but make decisions autonomously?

I think the real question in 2026 isn't "do we trust AI with code" but "how do we build verifiability into agent workflows". Our team has been working on exactly this problem for an automated customer acquisition tool we're building: the agent runs tasks independently, but we enforce a simple policy layer: budget cap, success criteria, and a verifier that checks "did anything actually change" before retrying. It's basically what you described for code review, applied to business workflows.
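A minimal Python sketch of a policy layer along those lines, assuming the agent step, the success check, and a state snapshot are supplied as plain callables. `Policy`, `Outcome`, and `run_agent_task` are hypothetical names, and the budget is counted in attempts here rather than money or tokens.

```python
from collections.abc import Callable, Hashable
from dataclasses import dataclass


@dataclass
class Policy:
    budget: int                     # hypothetical cap: max attempts allowed
    succeeded: Callable[[], bool]   # success criterion checked after each attempt


@dataclass
class Outcome:
    status: str     # "success", "no_change", or "budget_exhausted"
    attempts: int


def run_agent_task(
    step: Callable[[], None],
    observe: Callable[[], Hashable],
    policy: Policy,
) -> Outcome:
    """Run an agent step under a budget cap, a success check, and a change verifier."""
    for attempt in range(1, policy.budget + 1):
        before = observe()
        step()
        if policy.succeeded():
            return Outcome("success", attempt)
        # Verifier: if the attempt left the observed state unchanged,
        # stop instead of retrying the same action and burning budget.
        if observe() == before:
            return Outcome("no_change", attempt)
    return Outcome("budget_exhausted", policy.budget)


if __name__ == "__main__":
    state = {"emails_sent": 0}  # toy state for a hypothetical outreach task

    def send_email() -> None:
        state["emails_sent"] += 1

    policy = Policy(budget=5, succeeded=lambda: state["emails_sent"] >= 3)
    print(run_agent_task(send_email, lambda: state["emails_sent"], policy))
```

Stopping on "no observable change" is a design choice: if an attempt left the state untouched, repeating it is unlikely to make progress, so the task is handed back for review instead of retried.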

Curious: have you explored adding validation steps to agent outputs beyond just manual human review?