Gamya

Posted on Jun 21

When Judgment Becomes the Bottleneck

#discuss #ai #productivity #watercooler

Treating judgment as inspectable state

A few days ago I published a lighthearted post about building a coding mascot generator with Google AI Studio. The app itself — MascotCraft Studio, complete with a mascot named Octo-Byte — wasn't the point of the post. It was a fun side project. But the comments turned into something I've been thinking about ever since.

The Comment That Stuck With Me

Someone left a comment that's been rattling around in my head:

"We're moving from an era where implementation was the bottleneck to one where judgment becomes the bottleneck. When anyone can generate code, interfaces, and integrations in minutes, the differentiator becomes identifying worthwhile problems, defining clear requirements, and recognizing whether the result is actually good."

I read that, nodded, moved on with my day — and then kept coming back to it.

What "Implementation Was the Bottleneck" Used to Mean

Think about what it took to build something like MascotCraft Studio even three or four years ago. You'd need:

Someone who knows frontend (to build the UI)
Someone who knows how to call an image generation API
Someone who knows how to call a language model API
Someone who knows how to wire those together into a coherent app
Someone who knows how to deploy it

That's a team. Or at minimum, a single person wearing a lot of different hats, each requiring real expertise.

I described what I wanted in a paragraph. The implementation step — all of the above — happened in minutes.

So... What's Left?

If the hard part used to be "can we build this," and that part is now fast, what's the hard part now?

Based on that comment thread, it's things like:

Identifying worthwhile problems. Anyone can generate an app. Generating an app that solves a problem someone actually has is different.
Defining clear requirements. My prompt for Octo-Byte was reasonably specific, but Gemini still made a bunch of decisions I didn't ask for — color palettes, visual styles, a gallery feature with local storage. Some of those were great. One of them (the gallery using localStorage) was pointed out by another commenter as something that wouldn't actually hold up if this were a real product — saved mascots vanish if you switch browsers or clear your cache.
Recognizing whether the result is good. This is the one I think about most. I looked at Octo-Byte's bio and thought "this is charming and well-written." But charming and well-written isn't the same as correct or appropriate for the use case. Evaluating output quality is its own skill, separate from being able to produce output at all.

The Part That's a Little Uncomfortable

Here's the thing I keep circling back to: judgment isn't something you can prompt your way into.

You can ask an AI to "review this code for bugs" or "tell me if this design is good," and it'll give you an opinion. But knowing whether that opinion is trustworthy — knowing enough to push back, to say "actually, for my use case, that tradeoff doesn't make sense" — that still requires you to understand the problem space yourself.

In other words: the easier it gets to generate things, the more it seems to matter that you actually understand what you're generating and why. It's less "know how to build everything yourself" and more "be able to tell good implementation from bad, quickly, across a much wider range of things than you could personally build by hand."

What This Means in Practice

I don't have this fully figured out, but it's shifted how I think about a few things:

When I look at a piece of code or a generated feature now, I try to also think about "what would a wrong but plausible-looking version of this look like?" — because that's the version judgment needs to catch.
When AI tools generate something for me (like Gemini did with MascotCraft Studio), I try to actually read through what was added rather than just checking "does it work." The localStorage gallery point only came up because someone else looked closely enough to notice it.
I'm less worried about "will AI make skills obsolete" and more curious about "which skills are becoming more valuable because of this shift" — and judgment, evaluation, and knowing what questions to ask seem to be high on that list.

An Open Question

I don't have a tidy conclusion here, because I don't think there is one yet — this feels like something the whole industry is figuring out in real time. But I'm curious: if "judgment becomes the bottleneck," how do you actually practice and sharpen that judgment deliberately, rather than just hoping it accumulates as a side effect of experience?

If you've got thoughts on this, I'd genuinely like to hear them. 🌸

Top comments (16)

csm • Jun 21

When personal computers were first introduced, some chose to use them for office tasks, some for gaming, and some for making the software itself.
At the end of the day, programming is all about giving instructions to the computer.
Whether they pass through an assembler, Python's interpreter, or ChatGPT's prompt, we get results.
Bottlenecks are present in every era:
In the C world it's about pointers and memory, in the Python world it's about objects and references, in the Rust world it's about ownership and the borrow checker, and in the AI world it's about context, memory, and tokens.

It's true that judgment is key, but I think it's just about looking at the output!
If the output matches our needs and expectations, then it's fine!

But there is one thing we need to differentiate here:
Good judgment is about making AI work and getting quality results.
It's different from identifying worthwhile problems!

Because AI, like a programming language or anything else, is just a tool.
The tool can't decide what it should be used for!
For an office accountant, only Excel and Power BI are worthwhile, because they solve the accountant's problems.
For a marketing guy or a designer, Figma is worthwhile.
For a pro gamer, video games are the worthwhile things.

Now, think about video games at least: many people used to think they were a waste of time and money, yet they exist!

A kid draws a sketch with a pencil, and that's the same tool used to make a professional drawing.
Can we say the kid has no right to use a pencil for some rough sketches?

All I want to say is that a professional during office hours will handle the really important things using AI as craft,
while a guy in his free time will enjoy the outputs of AI as art!

Gamya • Jun 22

Really enjoyed this perspective! 😊 The bottleneck-per-era framing is a great way to think about it — pointers in C, ownership in Rust, context and tokens in AI. Each era has its own version of "the thing you have to actually understand to get quality results."
And the pencil analogy is spot on — the tool doesn't define the worthwhile problem, the person holding it does. I think where I'd add a nuance is that "looking at the output" as judgment works well when you already have enough domain knowledge to know what good looks like. The tricky part is when someone is new enough to a space that the output looks right without them being able to tell it isn't—which is where the deeper judgment piece comes in. But that's probably a whole separate post! 🌸

csm • Jun 22

"The tricky part is when someone is new enough to a space that the output looks right without them being able to tell it isn't—which is where the deeper judgment piece comes in."
True, I agree!

NOVAInetwork • Jun 21

Your line "what would a wrong but plausible-looking version of this look like" is the whole skill in one sentence, and it is also the answer to your closing question. You sharpen judgment by forcing yourself to produce the plausible-wrong version on purpose, before you trust the real one.

Concretely, the practice that has moved my judgment fastest: before I accept any non-trivial output, I write the failing test first, the one that should fail if the thing is wrong, and I make it actually fail for the reason I expect before I let the fix make it pass. The discipline is not the test, it is that I have to articulate what wrong looks like in advance. If I cannot describe the failure, I do not understand the thing well enough to judge the output yet. That gap is the signal.
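A minimal sketch of that habit in Python, for anyone who wants to see it spelled out. Everything here is hypothetical and chosen only for illustration: `median_plausible_wrong` stands in for generated output that looks fine, `median_fixed` for the corrected version, and the two `check_*` helpers for the tests. The check for the even-length case is the one written first, because it names what "wrong" looks like before any fix exists.

```python
def median_plausible_wrong(values):
    # Hypothetical generated code: reads fine and is correct for odd-length input.
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def median_fixed(values):
    # Corrected version: averages the two middle values for even-length input.
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def check_happy_path(median):
    # The "it works" check: an odd-length list with an obvious middle value.
    return median([3, 1, 2]) == 2


def check_even_length(median):
    # Written first: the case predicted to fail if the output is wrong.
    return median([1, 2, 3, 4]) == 2.5


if __name__ == "__main__":
    for impl in (median_plausible_wrong, median_fixed):
        print(impl.__name__, check_happy_path(impl), check_even_length(impl))
```

Run against the first function, the happy-path check passes and the even-length check fails for the predicted reason. Only after seeing that failure does the second function get to make it pass.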

The other half is refusing to accept "it works" as evidence. "It works" is the happy path. Judgment lives in the failure modes, so I make myself enumerate them: what does this do on malformed input, on a dropped node, on the edge state. Most plausible-wrong output passes the happy path and dies in exactly the case I did not think to name. The 90 percent right version is more dangerous than the obviously broken one, because nothing trips an alarm.

So to your question directly: judgment does not accumulate as a side effect of experience, it accumulates from the habit of predicting failure before you look for it. Experience only sharpens it if every time you are surprised, you ask why your model of "wrong" missed that case. The surprises are the training signal, not the successes.

Gamya • Jun 22

"If I cannot describe the failure, I do not understand the thing well enough to judge the output yet" — that line alone is worth saving. It reframes the whole question from "does this look right" to "can I articulate what wrong would look like," which is a much harder and more honest bar.
The point about the 90% right version being more dangerous than the obviously broken one really landed too. The thing that trips no alarms is exactly the thing that costs the most later, because nothing flags it for review and it gets built on top of.
And yes — "judgment accumulates from the habit of predicting failure before you look for it" is the clearest answer to my closing question I've seen in any of the comments. The surprises being the training signal, not the successes, is going to change how I think about this going forward. Thank you for this! 🌸

NOVAInetwork • Jun 22

Glad it landed. The one caution I would add to "surprises are the training signal": only if you actually log the surprise the moment it happens. My instinct when something surprises me is to fix it and move on, and the lesson evaporates. The habit that makes the signal real is writing down what my model predicted versus what happened, right then, before the fix makes it obvious in hindsight. The surprise is only training data if you capture it while it still feels like a surprise.
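One possible shape for that log, as a rough Python sketch and not a description of any real tool. The names `Prediction` and `PredictionLog` and their fields are assumptions made up for this example. The idea is only that the expectation is written down before the result is known, and the mismatch is recorded at the moment it is observed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Prediction:
    what: str                      # the thing being run or checked
    expected: str                  # written down before looking at the result
    actual: Optional[str] = None   # filled in when the result is observed
    predicted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    @property
    def surprised(self) -> bool:
        # A surprise is an observed result that differs from the prediction.
        return self.actual is not None and self.actual != self.expected


class PredictionLog:
    def __init__(self) -> None:
        self.entries: List[Prediction] = []

    def predict(self, what: str, expected: str) -> Prediction:
        entry = Prediction(what, expected)
        self.entries.append(entry)
        return entry

    def observe(self, entry: Prediction, actual: str) -> None:
        entry.actual = actual

    def surprises(self) -> List[Prediction]:
        return [e for e in self.entries if e.surprised]


if __name__ == "__main__":
    log = PredictionLog()
    entry = log.predict("open gallery after clearing cache", "mascots still listed")
    log.observe(entry, "gallery is empty")
    for s in log.surprises():
        print(f"{s.what}: expected {s.expected!r}, got {s.actual!r}")
```

Reviewing `surprises()` later gives the prediction next to the outcome, which is the part that tends to get lost once the fix is in.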

Hiren Kava • Jun 21

Hi,

Thanks for sharing your article. I really liked your perspective that AI is shifting the bottleneck from implementation to judgment—it highlights a practical understanding that building software is no longer just about writing code, but making sound technical decisions and evaluating trade-offs.

I have a couple of technical questions related to our current project:

In a production blockchain application where AI helps generate parts of the codebase, how would you validate the correctness and security of smart contract interactions and backend transaction flows before deployment?
If you were designing a high-availability backend for an on-chain payment or bridge system, what monitoring, alerting, and failure-recovery strategies would you implement to ensure reliability during RPC outages or network congestion?

Gamya • Jun 22

Thank you for the kind words! 😊 Those are really interesting questions, though I have to be honest — smart contract validation and blockchain backend architecture are quite a bit outside my current area of focus (I'm primarily in iOS/Swift land!). I wouldn't want to give you half-baked answers on something as critical as on-chain payment systems.
For those specific challenges, you'd likely get much better responses from developers with direct production blockchain experience — might be worth posting them as a standalone discussion thread on DEV where that community can weigh in properly!

Hiren Kava • Jun 22

Thanks for your honest and thoughtful response—I really appreciate the transparency.

Mike Czerwinski • Jun 21

Your closing question — "deliberately practice judgment rather than accumulate it accidentally" — is the exact problem I've been trying to solve operationally.

What worked for me: treat judgments as inspectable state, not internal feelings. Every architectural decision goes into a separate store with status (proposed/accepted/locked) and a reason field. A few months in, the store is already a readable trace of how my judgment actually evolved — which calls I got right, which I reversed, why. Practicing judgment turns into reading your own record.
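To make the idea concrete, here is a small Python sketch of what such a store could look like. It is a guess at the shape and not the actual implementation: the three statuses and the reason field come from the description above, while the class names, the `history` list, and the rule that a locked decision cannot be changed are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Status(Enum):
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    LOCKED = "locked"


@dataclass
class Decision:
    title: str
    reason: str
    status: Status = Status.PROPOSED
    # Earlier (status, reason) pairs, oldest first, so reversals stay visible.
    history: List[Tuple[Status, str]] = field(default_factory=list)

    def move_to(self, status: Status, reason: str) -> None:
        # Assumption for this sketch: a locked decision is final.
        if self.status is Status.LOCKED:
            raise ValueError(f"decision is locked: {self.title}")
        self.history.append((self.status, self.reason))
        self.status = status
        self.reason = reason


if __name__ == "__main__":
    d = Decision("Store gallery in localStorage", "fastest way to ship a demo")
    d.move_to(Status.ACCEPTED, "fine while this stays a side project")
    for old_status, old_reason in d.history:
        print(old_status.value, "-", old_reason)
    print(d.status.value, "-", d.reason)
```

Printing the history followed by the current state gives the readable trace: what was decided, in what order, and the reason recorded at each step.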

Wrote up the framework angle separately: dev.to/jugeni/vibe-coding-is-not-a-level-its-an-axis-12gb — yours is the why this matters, mine is one possible how.

Gamya • Jun 22

This is a really practical approach—"inspectable state, not internal feelings" is such a useful reframe. The idea of a decision store with a reason field is something I hadn't considered, but it makes a lot of sense: you can't really review a judgment you never recorded, and most of us just carry it around implicitly until something goes wrong and forces a retrospective.
Reading your own record as the practice is elegant too — it turns judgment from something abstract into something you can actually audit. Going to check out your piece on the framework angle now! 🌸

Mike Czerwinski • Jun 22

"Something you can actually audit" is the frame that matters — once it's a record you can review, the practice mostly runs itself. Most retros only fire when something breaks; a decision store flips that to retros the file schedules, not ones pain forces. Hope the framework piece lands.

Nazar Boyko • Jun 21

The "what would a wrong but plausible-looking version look like" habit is the one I'd steal from this. Most code review trains you to check "does it work", which AI output sails right past because it usually does work, just not the way you needed. On your open question, the thing that's grown my own judgment fastest is writing down what I expect before I run something, then seeing where I was off. You don't get that feedback loop if you only ever look at the result after it's already right. Have you tried keeping a record of the calls Gemini made that you'd have made differently?

Gamya • Jun 22

"Writing down what I expect before I run something" — that's such a concrete way to build the feedback loop, and I hadn't thought about it quite that way before. The gap between prediction and result is where the actual learning happens, and you're right that skipping straight to the output short-circuits that entirely.
I haven't kept a formal record of Gemini's calls I'd have made differently, but after reading this I'm genuinely considering it — especially the localStorage gallery decision, which is the exact kind of thing that would've shown up in that kind of log. Thanks for the practical suggestion! 🌸

𝕋𝕙𝕖 𝕃𝕒𝕫𝕪 𝔾𝕚𝕣𝕝 • Jun 21

OMG YES! This is exactly what I've been trying to put into words. As AI speeds up execution, judgment really does become the bottleneck. Such a refreshing perspective. 🔥👏

Gamya • Jun 22

Yes, exactly—it's one of those things that feels obvious once you name it but is surprisingly hard to articulate before that! Really glad it resonated! 🔥
