DEV Community

Mykola Kondratiuk

I Track What My AI Costs Every Day - Here's the Scoreboard Most Teams Skip

A headline went around last week: 2026 is running over a thousand tech layoffs a day, and roughly 55% of them name AI as the cause.

Then I hit the line buried underneath it: experts flagged that the productivity gains used to justify those cuts mostly haven't shown up at scale yet.

So companies are cutting headcount on a forecast of AI output that hasn't arrived. That's not an engineering problem or a PM problem. It's a measurement problem. And it's the exact thing I deal with every day, so let me show you the boring tool that fixes it.

Deploying is not delivering, and the gap is where careers get decided

I run AI through real delivery work. Not demos. Actual shipped output with deadlines attached.

The most expensive lesson I've learned: deploying a tool and delivering value with it are two different events, and there's a pile of unglamorous work between them.

Deploying is easy. You wire it up, it generates something, the demo looks great, the slide says "40% faster."

Delivering is when you come back two weeks later and ask the question nobody enjoys: what did this actually return, after I subtract what it cost me to run and babysit it?

Most teams never ask. They deployed, so they assume they delivered. That assumption is now being used to justify layoffs. If you can't tell the difference with a number, you're guessing with a bigger budget.

The accountability layer was the measurement layer

Gergely Orosz wrote a piece this week about Meta restructuring into AI pods and stripping out a chunk of the program-management accountability layer in the name of speed.

Here's the trap. That accountability layer was the measurement layer. The people whose job was "did this work, what did it cost, do we keep it" weren't overhead. They were the part of the system that notices when a fast-moving bet is moving fast in the wrong direction.

Pull that out while you bet the company on AI and you don't get a leaner org. You get one that can no longer tell whether its biggest bet is paying off.

The scoreboard, as actual data

Enough principle. Here's the shape I keep for any AI workflow I depend on. It lives in one tracked file, not a dashboard with a vendor logo.

# ai-scoreboard.yml  - one entry per workflow you depend on
workflow: pr-triage-agent
baseline_minutes: 45        # how long this took BEFORE the agent
owner: kolya                # one name. not "the team".

cost:
  tokens_usd_per_run: 0.42
  seat_usd_per_month: 20
  steering_minutes_per_run: 8   # the cost everyone forgets

return:
  minutes_saved_per_run: 31     # vs baseline, not vs zero
  output_kept_pct: 70           # shipped without a human redo

rework_tax_pct: 30              # redone, overridden, or thrown out
kill_if:
  rework_tax_pct: "> 50"        # decided in advance, no sunk-cost debate
  net_minutes_saved: "< 5"

Nothing clever here. The value is in three fields people skip:

  • steering_minutes_per_run. The time you spend prompting, correcting, and cleaning up. This is usually the biggest hidden cost and it's the one demos never show.
  • rework_tax_pct. The share of output you had to redo. When this creeps up, the tool is quietly costing more than it looks, even though it still "works."
  • kill_if. Thresholds decided in advance. When a workflow crosses them, it gets cut without a meeting about how much you've already invested.
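If you want the kill_if check to run itself instead of relying on memory, here is a minimal Python sketch. It is an illustration, not part of the original file: the entry is copied into a plain dict rather than parsed from YAML, the helper names `metrics` and `tripped_rules` are made up for this example, and it assumes `net_minutes_saved` means minutes saved minus steering minutes.

```python
import operator

# The scoreboard entry above, copied into a plain dict so the example is
# self-contained. In practice you would parse ai-scoreboard.yml instead.
entry = {
    "cost": {"steering_minutes_per_run": 8},
    "return": {"minutes_saved_per_run": 31},
    "rework_tax_pct": 30,
    "kill_if": {"rework_tax_pct": "> 50", "net_minutes_saved": "< 5"},
}

# Comparison symbols allowed in kill_if rules (an assumption of this sketch).
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def metrics(entry):
    """Collect the numbers the kill_if rules refer to."""
    # Assumption: net_minutes_saved = minutes saved minus steering minutes.
    net = (
        entry["return"]["minutes_saved_per_run"]
        - entry["cost"]["steering_minutes_per_run"]
    )
    return {"rework_tax_pct": entry["rework_tax_pct"], "net_minutes_saved": net}


def tripped_rules(entry):
    """Return the kill_if rules whose threshold has been crossed."""
    values = metrics(entry)
    tripped = []
    for name, rule in entry["kill_if"].items():
        symbol, threshold = rule.split()  # e.g. "> 50" -> (">", "50")
        if OPS[symbol](values[name], float(threshold)):
            tripped.append(f"{name} {rule}")
    return tripped


print(tripped_rules(entry) or "keep running")
```

With the numbers in the entry, no rule trips. Push `rework_tax_pct` past 50 and the function names the rule that fired, which is the whole point: the decision was made when the threshold was written, not when the number arrived.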

A quick gut check you can run in your head on any AI workflow:

net_value = minutes_saved - steering_minutes - (rework_tax * output_volume)

If that comes out near zero or negative, you deployed something that looks productive and delivers nothing. You only find that out if you write the numbers down.
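The gut check mixes a percentage with a volume, so to put numbers through it you have to pick units. One way to do that, sketched in Python, is to express every term in minutes per run. The `redo_minutes` parameter is an assumption added for this example and is not in the scoreboard: it stands for what one reworked run costs you, set here to the full 45-minute baseline.

```python
def net_minutes_per_run(minutes_saved, steering_minutes, rework_tax_pct, redo_minutes):
    """Gut-check formula with every term expressed in minutes per run.

    Assumption: a reworked run costs `redo_minutes`, so the expected rework
    cost per run is the rework share multiplied by that figure.
    """
    rework_minutes = (rework_tax_pct / 100) * redo_minutes
    return minutes_saved - steering_minutes - rework_minutes


# Numbers from the scoreboard entry; redo_minutes=45 assumes a redo costs
# as much as doing the task by hand from scratch.
net = net_minutes_per_run(
    minutes_saved=31,
    steering_minutes=8,
    rework_tax_pct=30,
    redo_minutes=45,
)
print(f"net minutes per run: {net:.1f}")
```

Under those assumptions the 31 minutes saved shrink to about 9.5 net minutes per run. A cheaper redo raises that figure and a costlier one lowers it, so the rework term is worth measuring rather than guessing.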

Why this is the moat, not the chore

The doom framing says AI is coming for your job. I think that's backwards for anyone who measures.

The people who survive this era aren't the ones who deployed the most AI. Deploying is table stakes now. The ones who lead through it can sit in a room when budgets are tight and put a real scoreboard on the table: here's what we ran, what it cost, what it returned, what we killed and why.

That's extreme ownership of AI output. You don't get to claim ROI you never measured. But if you can prove it, you're very hard to cut, because you're the one person who can tell AI that works from AI that just looks busy. In a year defined by exactly that confusion, that skill is the whole moat.

A thousand cuts a day are being justified by a return nobody's keeping receipts for. So I'll ask you the way I ask myself every Friday: the AI you're running right now, are you measuring what it delivers, or trusting the slide? And if someone asked you to defend the spend on Monday, what would you actually put on the table?

Top comments (3)

Mykola Kondratiuk

honestly the scoreboard falls apart on the stuff that doesn't have a clean baseline. measuring an agent that triages PRs is easy. measuring one that changes which ideas even get proposed? i have no good number for that yet, and i pretended it was solved here.

FastAnchor_io

Couldn’t relate more. Stable baselines make scoring trivial, but ideation-focused agents have no reliable quantitative benchmark — this unresolved measurement gap often gets glossed over in formal writeups.

Sloan the DEV Moderator

Hey, this article appears to have been generated with the assistance of ChatGPT or possibly some other AI tool.

We allow our community members to use AI assistance when writing articles as long as they abide by our guidelines. Please review the guidelines and edit your post to add a disclaimer.

Failure to follow these guidelines could result in DEV admin lowering the score of your post, making it less visible to the rest of the community. Or, if upon review we find this post to be particularly harmful, we may decide to unpublish it completely.

We hope you understand and take care to follow our guidelines going forward!