DEV Community

Ultracode for Codex: Claude-style Dynamic Workflows with a Skill

Pavel Kalo on May 31, 2026

Claude Code added Dynamic Workflows. Dynamic Workflows are a way to run larger coding tasks as a sequence of planned steps. Claude Code can split ...
 
Mykola Kondratiuk

Running the eval is harder than plan -> split. That check step is where these workflows usually fall apart.

Pavel Kalo

Agreed, very good point. The skill covers verification as a workflow step, but not as a real eval engine. That part still depends on project-specific tests, builds, lint, smoke checks, and explicit evidence.
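To make "a check step with explicit evidence" concrete, here is a minimal Python sketch, assuming each project-specific check (tests, build, lint, smoke test) can be run as a plain command. `run_checks` and `CheckEvidence` are hypothetical names, not part of the skill; the sketch only shows the shape of a check that records exit codes and output instead of returning a bare pass/fail.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckEvidence:
    """Evidence left by one verification command (hypothetical structure)."""
    name: str
    command: list[str]
    exit_code: int
    output_tail: str  # last part of stdout+stderr, kept for later review


def run_checks(checks: dict[str, list[str]], tail_chars: int = 500) -> tuple[bool, list[CheckEvidence]]:
    """Run every project-specific check and keep evidence for each one.

    `checks` maps a label such as "tests" or "lint" to an argv list.
    The step passes only if every command exits with code 0; failed checks
    still produce evidence, so the verdict can be audited afterwards.
    """
    evidence = []
    for name, command in checks.items():
        result = subprocess.run(command, capture_output=True, text=True)
        output = result.stdout + result.stderr
        evidence.append(CheckEvidence(name, command, result.returncode, output[-tail_chars:]))
    passed = all(e.exit_code == 0 for e in evidence)
    return passed, evidence


if __name__ == "__main__":
    # Stand-in commands; a real project would list its own test, build, and lint commands.
    ok, evidence = run_checks({
        "smoke": [sys.executable, "-c", "print('ok')"],
        "lint": [sys.executable, "-c", "import sys; sys.exit(1)"],
    })
    print("passed:", ok)
    for e in evidence:
        print(e.name, e.exit_code, e.output_tail.strip())
```

Keeping the evidence for passing checks too is deliberate: the parent can later show which commands ran, not just that nothing failed.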

Mykola Kondratiuk

Project-specific test dependency is the part that compounds: each agent adds its own eval surface, and coordinating those across a multi-step workflow means one agent can pass its checks but have no signal about what the next stage expects. That interface gap is usually what the "check" step is actually failing on, not the tests themselves.

Pavel Kalo

Thanks, this was helpful. I updated the skill with lightweight eval contracts for exactly this handoff/interface gap.
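The thread does not show the contract format, so the following is only a hypothetical Python sketch of a lightweight handoff contract: the downstream stage declares which fields it reads, and the parent rejects a handoff that is missing them or carries no evidence before the next stage starts. `HandoffContract`, `check_handoff`, and the field names are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffContract:
    """Hypothetical lightweight contract between two workflow stages."""
    producer: str
    consumer: str
    # Keys the downstream stage actually reads from the handoff.
    required_fields: set[str] = field(default_factory=set)
    # Whether every handoff must cite evidence (files, commands, outputs).
    require_evidence: bool = True


def check_handoff(contract: HandoffContract, handoff: dict) -> list[str]:
    """Return contract violations; an empty list means the handoff may proceed."""
    problems = []
    for name in sorted(contract.required_fields - set(handoff)):
        problems.append(f"{contract.producer} -> {contract.consumer}: missing field '{name}'")
    if contract.require_evidence and not handoff.get("evidence"):
        problems.append(f"{contract.producer} -> {contract.consumer}: no evidence attached")
    return problems


if __name__ == "__main__":
    contract = HandoffContract("explorer", "implementer", {"files", "summary"})
    handoff = {"files": ["src/app.py"], "summary": "config is read once at startup"}
    for problem in check_handoff(contract, handoff):
        print(problem)  # explorer -> implementer: no evidence attached
```

The contract is written from the consumer's side on purpose: it encodes what the next stage expects, which is exactly the signal a producer otherwise lacks.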

Mykola Kondratiuk

Lightweight contracts are the right call; the hard part is keeping them current after a model version bump. An eval that passed against v1 doesn't always cover v1.1 behavior drift.
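One hypothetical way to keep that drift visible is to stamp each contract with the model version its eval last passed against and flag any mismatch for re-validation. The record structure and the model identifiers below are assumptions, not part of the skill. A version match does not prove the absence of drift; the check only stops an old pass from being treated as current evidence.

```python
from dataclasses import dataclass


@dataclass
class ContractRecord:
    """Hypothetical record of the last passing eval for one handoff contract."""
    name: str
    validated_model: str  # model identifier the eval last passed against


def stale_contracts(records: list[ContractRecord], current_model: str) -> list[str]:
    """Names of contracts whose last passing eval ran against a different model.

    A mismatch means the old pass is no longer evidence for the current model;
    a match does not rule out drift, it just means the pass is current.
    """
    return [r.name for r in records if r.validated_model != current_model]


if __name__ == "__main__":
    records = [
        ContractRecord("explorer -> implementer", "model-v1"),
        ContractRecord("implementer -> reviewer", "model-v1.1"),
    ]
    print(stale_contracts(records, current_model="model-v1.1"))  # ['explorer -> implementer']
```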

Mudassir Khan

The 'leaves evidence' framing is what makes this useful for debugging, not just correctness. In multi-agent runs without an artifact trail, you're left proving after the fact that the agent did the right thing, which is genuinely miserable to debug.

The check step is the hard part, though. With similar orchestration using MCP-based agents, the failure is always at the integration handoff: explorer agents return contradictory facts about the same file, and the parent has to resolve them with no tiebreaker. The orchestration.md is a good step toward making that conflict explicit.

What's your approach when two workers produce different conclusions about the same code path?

Pavel Kalo

Yes, you're right. In this skill, that is handled during integration, not by voting.

Worker output has to come with evidence. The parent owns integration: it records conflicts in integration.md, checks the cited files and claimed edits directly, and resolves disagreements against the current source, tests, docs, or primary sources.

If a handoff has no evidence, it gets rejected. If the conflict is still unresolved, it stays in Conflicts / Verification still needed rather than being guessed; in practice that becomes a focused follow-up check.

So the tiebreaker is not confidence or majority vote. It is source-backed, reproducible evidence.
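A minimal Python sketch of that evidence-gated integration, under one loud assumption: "evidence" is reduced to a SHA-256 hash of the file content each worker actually read, so the parent can re-check it against the current source. The skill is described as checking cited files, claimed edits, tests, and docs, which this does not model. `Claim`, `current_sha256`, and `integrate` are hypothetical names. Claims without evidence are rejected, claims whose file changed after the worker read it are rejected as stale, and valid claims that still disagree are recorded as conflicts rather than decided by majority.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Claim:
    """One worker's conclusion about a file (hypothetical structure)."""
    worker: str
    path: str
    conclusion: str
    cited_sha256: Optional[str]  # hash of the file content the worker read; None = no evidence


def current_sha256(path: str) -> Optional[str]:
    """Hash of the file as it exists now, or None if it no longer exists."""
    try:
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()
    except FileNotFoundError:
        return None


def integrate(claims: list[Claim]) -> tuple[dict[str, str], list[str], list[str]]:
    """Return (accepted conclusions by path, rejected claims, unresolved conflicts)."""
    rejected: list[str] = []
    conflicts: list[str] = []
    valid: dict[str, list[Claim]] = {}
    for c in claims:
        if not c.cited_sha256:
            rejected.append(f"{c.worker}: no evidence for {c.path}")
        elif c.cited_sha256 != current_sha256(c.path):
            # The file changed (or vanished) after the worker read it.
            rejected.append(f"{c.worker}: stale evidence for {c.path}")
        else:
            valid.setdefault(c.path, []).append(c)

    accepted: dict[str, str] = {}
    for path, group in valid.items():
        conclusions = {c.conclusion for c in group}
        if len(conclusions) == 1:
            accepted[path] = conclusions.pop()
        else:
            # Both sides cite the same current content: record the conflict, don't vote.
            conflicts.append(f"{path}: " + " vs ".join(sorted(conclusions)))
    return accepted, rejected, conflicts
```

In this sketch the `conflicts` list plays the role of the Conflicts / Verification still needed section: it marks where a focused follow-up check is needed instead of replacing that check.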

Mudassir Khan

The 'no evidence gets rejected' gate is the part most orchestration systems skip. They average or defer to the last response, both of which fail quietly.

Curious how you handle the case where both workers have valid evidence but diverge because state changed mid-run: for example, two explorer agents reading the same file at different points in the workflow, both technically sourced but pointing in different directions?

Harjot Singh

Packaging dynamic workflows as a skill is the right shape, because it's the difference between an agent that improvises every time and one that follows a structure you can inspect and trust. A skill is essentially a reusable, named procedure with a contract, which is exactly what you want when a workflow needs to be repeatable: same triggers, same steps, same guardrails, instead of hoping the model reconstructs the right plan from scratch on every run. The dynamic part is where it gets interesting. The win is a workflow that has structure where structure matters (the order, the checkpoints, the irreversible-step gates) but flexibility where judgment matters (which branch, what content), rather than a rigid script that breaks on anything unexpected or a free-for-all that's different every time. That structured-but-adaptive middle is hard to hit, and it's the whole game. Bringing Claude-style skills to Codex is a nice portability point too: the methodology generalizes across models, which reinforces that the leverage is in the skill/harness, not in any one model. Encode the workflow as an inspectable skill, and keep the dynamism inside the guardrails. That structure-where-it-matters, flexibility-where-it-helps instinct is core to how I think about Moonshift. In your dynamic workflows, what stays fixed in the skill versus what you let the model decide at runtime?

Pavel Kalo

It locks the main steps: plan, split the work if needed, check the results, then merge them.

Codex still decides the details for each task: what to inspect, how to split it, and how to handle edge cases.
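That split between a locked skeleton and runtime decisions can be sketched in Python as a fixed control flow with pluggable steps. `run_workflow` and its callables are hypothetical, and the separate `execute` step is an assumption added to stand for the work itself. The order (plan, split, check, merge) and the rule that nothing merges until every chunk passes are fixed in code, while the callables stand in for what Codex decides per task.

```python
from typing import Callable


def run_workflow(
    task: str,
    plan: Callable[[str], list[str]],
    split: Callable[[list[str]], list[list[str]]],
    execute: Callable[[list[str]], dict],
    check: Callable[[dict], bool],
    merge: Callable[[list[dict]], dict],
) -> dict:
    """Locked order: plan -> split -> execute -> check -> merge.

    The sequence and the check gate live in code; the callables stand in for
    the per-task decisions (what to inspect, how to split, edge cases).
    """
    steps = plan(task)
    # "Split the work if needed": an empty split means one chunk with every step.
    chunks = split(steps) or [steps]
    results = [execute(chunk) for chunk in chunks]
    failed = [r for r in results if not check(r)]
    if failed:
        # Nothing is merged until every chunk passes its check.
        raise RuntimeError(f"{len(failed)} of {len(results)} chunks failed verification")
    return merge(results)


if __name__ == "__main__":
    result = run_workflow(
        "rename a config key",
        plan=lambda task: [f"find usages ({task})", "edit files", "run tests"],
        split=lambda steps: [steps[:1], steps[1:]],
        execute=lambda chunk: {"done": chunk, "ok": True},
        check=lambda r: r["ok"],
        merge=lambda rs: {"done": [s for r in rs for s in r["done"]]},
    )
    print(result["done"])
```

Raising instead of merging partial results keeps the failure loud; a real harness might instead route failed chunks to a focused retry.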

andreygoldman13

Top!