DEV Community

WonderLab


Four Evolutions of AI Workflow Definition: From Markdown to JS Scripts to Distributed Multi-Agent

Preface

"Use AI to automate development workflows" — almost every tech company talks about this.

But very few seriously discuss how the Workflow itself should be defined, and in what technical form it should be expressed.

This seems like a secondary concern. It's actually a core one. How a Workflow is defined determines whether it can execute reliably, whether it can be monitored, whether it remains stable as Agents are updated, and whether it can be shared and reused across teams.

This article is a complete retrospective of the evolution of Workflow definition methods in enterprise AI platform practice.


Four Stages of Technical Evolution

Stage 1: Markdown Prompt-Based Workflow Description

The original approach: use natural language and Markdown format to describe the workflow, place it in a designated directory, and let the Agent execute according to the description.

Advantages: Low barrier, flexible, human-readable — capable of quickly expressing complex business intent.

Fundamental limitation: An LLM running a Prompt description is "interpreting" rather than "executing". The same description produces a fresh interpretation on every run, so there's no guarantee that steps execute strictly in order or that conditions are evaluated the same way each time.

This isn't a Prompt quality problem; it's the theoretical ceiling of this form of expression. You can write the perfect Markdown, and the LLM will still deviate from expected step ordering on some runs, skip certain checks, or produce inconsistent judgments on branch conditions.

Stage 2: Script-Based Orchestration (Platform Built-in Solutions)

Some enterprise AI platforms provide built-in process orchestration scripting languages to replace natural language descriptions, achieving deterministic control flow.

Progress: Solved the execution determinism problem — process execution order is controlled by code, not LLM interpretation.

Hard limit encountered: Orchestration layer and Skill layer are disconnected. If scripts can only call a fixed toolset (like bash commands only) and cannot call AI Skill-layer capabilities, then Workflow execution is severely limited.

This revealed a critical requirement: the Workflow orchestration layer needs to seamlessly call AI Skills.

Stage 3: Workflow Defined as a Large Skill (Transitional)

To work around the disconnection between orchestration layer and Skill layer, a common transitional approach is to package the entire workflow as a large Skill containing complete step descriptions.

This approach has four inherent flaws:

  1. Execution accuracy unchanged: It still relies on an LLM interpreting natural language, so determinism does not improve
  2. No node-level monitoring: The entire Workflow is a black box with no visibility into which step failed
  3. Conceptual mixing: Workflow orchestration logic is mixed into Skills that should be atomic capabilities
  4. No cross-Agent collaboration: All logic lives in a single Agent's context, with no support for distributed execution

Stage 4: Native JS Workflow (Current Most Viable Form)

Enterprise AI coding platforms like Claude Code have introduced native Workflow mechanisms based on JS scripts, where each phase has code-level control and Agents are directed to complete tasks through prompts.

This is currently the form closest to engineering-ready, but it also introduces new boundary problems, discussed below.


The Core Tension in AI Workflows: Expressiveness vs. Execution Determinism

These four stages are fundamentally a search for balance between two extremes:

  • Natural language: High expressiveness, handles ambiguity and context dependencies, human-readable and writable — but LLM execution is unpredictable, with no guarantee of flow determinism
  • Code: Strong execution determinism, version-controllable, testable — but has technical barriers for non-developers, and rigid structure limits the Agent's ability to self-adapt to unexpected situations

Currently the most reasonable design principle: use code to control flow structure (execution order, branching conditions, checkpoints), use natural language to define the task semantics within each node.

// Code controls flow structure (deterministic)
phase('Bug Analysis')
const analysis = await agent('Analyze the root cause of this bug', { schema: BUG_ANALYSIS_SCHEMA })

// Checkpoints controlled by code, not by LLM judgment
if (analysis.confidence < 0.8) {
  await waitForHumanConfirmation('Insufficient analysis confidence, please confirm manually')
}

// Next phase triggered by code, not relying on LLM's "judgment"
phase('Code Fix')
const fix = await agent(`Fix the code based on the following analysis: ${JSON.stringify(analysis)}`)

Assessment of Current Native Workflow

Core Advantages

Execution determinism: Phase execution order, conditional branching, and loop logic are controlled by JS code. The Workflow's "skeleton" doesn't drift between runs.

Node-level observability: Each Phase's execution state can be tracked with a real-time view clearly showing where the Workflow is and which step failed. This is a qualitative improvement over the black-box execution of Prompt-based Workflows.

Skill invocation viable: A JS script can use the prompt for a specific Phase to direct the Agent to call a particular Skill. This is more accurate than a pure Prompt description, and it largely resolves the disconnection between Skill invocation and Workflow orchestration.

Workflow as a manageable code asset: JS scripts can be git-managed, code-reviewed, shared across teams, and continuously iterated — a prerequisite for Workflows to go from one-time use to reusable assets.

Critical Limitations

Cross-process / cross-container orchestration not supported

The native Workflow execution model is a master-servant relationship within a single session: one main Agent reads the Workflow definition and dynamically spawns subagents during execution, with all Agents sharing the same session context.

But enterprise Agent Platform design targets multi-process peer collaboration: development Agents, testing Agents, and requirements management Agents each run in independent containers as independent processes. These two models are fundamentally incompatible.

Directly applying native Workflows to an Agent Platform would hit the cross-process orchestration problem — there's no native mechanism for one instance to schedule another.

Tool ecosystem lock-in risk

The native JS Workflow format and execution engine are proprietary to the specific tool. If a later decision is made to switch AI programming tools, the accumulated Workflow script assets can't be directly migrated — they need to be reimplemented from scratch. In a period where AI tool selection hasn't stabilized, this is a strategic risk that needs explicit trade-off analysis.


Industry Status: No Cross-Platform Standard for AI Workflows Yet

Current mainstream AI Workflow tool definition formats are fragmented:

| Tool | Workflow Form |
| --- | --- |
| Claude Code | JS scripts, phase-based, session-internal execution |
| OpenAI Codex | API-layer function calling + thread |
| LangGraph | Python DAG, graph structure |
| Custom platforms | Each with their own YAML/JSON spec |

The direct consequence of fragmentation: Workflow assets can't be reused across tools. If a team switches tools, its accumulated workflows must be rebuilt. Traditional BPM took about ten years to evolve from proprietary formats to the BPMN 2.0 standard, and AI Workflow is in a similarly chaotic early phase.


Recommendation: Decouple Workflow Intent Layer from Execution Engine

Given the absence of established standards and the fragmentation of formats, the recommended design strategy is to decouple the intent layer from the execution engine:

Short-term: Don't use any tool's native format (like Claude Code JS, LangGraph Python) as the enterprise Workflow storage format. Use a vendor-neutral format (YAML/JSON spec) to describe the Workflow's "intent layer" — including: what goal this Workflow achieves, what phases it has, each phase's inputs/outputs and success criteria, where checkpoints are. The execution engine is a replaceable adapter layer that renders the spec into the target platform's native format for execution.

Medium-term: Establish an internal enterprise Workflow spec standard, providing translation layers for different execution engines. Higher initial cost, but achieves true portability — business knowledge accumulated in Workflows won't be lost when tools are switched.
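The intent-layer spec and the replaceable adapter can be sketched roughly as follows. This is a minimal illustration, not an established format: the field names (`goal`, `phases`, `successCriteria`, `checkpoint`) and the `renderPlan` function are assumptions made up for this example, and a real adapter would emit the target tool's native format rather than a plain list of steps.

```javascript
// Hypothetical vendor-neutral spec. Every field name here is an assumption.
const bugFixSpec = {
  goal: 'Fix a reported bug and verify the fix',
  phases: [
    {
      id: 'analysis',
      task: 'Analyze the root cause of this bug',
      inputs: ['bugReport'],
      outputs: ['analysis'],
      successCriteria: 'Root cause identified with confidence >= 0.8',
      checkpoint: 'human',
    },
    {
      id: 'fix',
      task: 'Fix the code based on the analysis',
      inputs: ['analysis'],
      outputs: ['patch'],
      successCriteria: 'Patch compiles and unit tests pass',
      checkpoint: null,
    },
  ],
};

// Hypothetical adapter: checks that each phase's inputs are available, then
// renders the spec into an ordered list of engine-neutral steps.
function renderPlan(spec, externalInputs) {
  const available = new Set(externalInputs);
  return spec.phases.map((phase) => {
    const missing = phase.inputs.filter((name) => !available.has(name));
    if (missing.length > 0) {
      throw new Error(`Phase "${phase.id}" is missing inputs: ${missing.join(', ')}`);
    }
    phase.outputs.forEach((name) => available.add(name));
    return {
      phase: phase.id,
      prompt: `${phase.task}\nSuccess criteria: ${phase.successCriteria}`,
      gate: phase.checkpoint ?? 'none',
    };
  });
}

const bugFixPlan = renderPlan(bugFixSpec, ['bugReport']);
console.log(bugFixPlan.map((step) => `${step.phase} [gate: ${step.gate}]`).join('\n'));
```

Because the spec carries only goals, inputs/outputs, success criteria, and checkpoints, switching engines means writing a new adapter while the spec files stay untouched.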


Architecture Choices for Distributed Multi-Agent Scenarios

When the Agent Platform's goal is to have multiple specialized Agents (development Agent, testing Agent, requirements management Agent) collaborate, who holds the Workflow state is the key architectural question.

Three Architecture Patterns

Pattern 1: Platform as Orchestrator

Agent Platform (workflow engine)
    ├── Maintains Workflow state machine
    ├── Decides which Agent to invoke based on current phase
    ├── Sends task spec to target Agent (inputs + success criteria)
    ├── Waits for Agent to return results
    └── Advances to next phase based on results

The greatest advantage is clear responsibilities: Agents only need to focus on "doing this task well" — they don't need to know which phase of which business process they're in. Workflow business logic changes don't affect Agent implementation, and Agent capability upgrades don't affect Workflow definitions.
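A minimal sketch of Pattern 1, assuming an in-memory registry in place of real containers. The names `agentRegistry`, `releaseWorkflow`, and `runWorkflow` are hypothetical, and each handler stands in for a call to an Agent running in its own process.

```javascript
// Hypothetical registry: agent type -> handler. In a real platform each
// handler would be a remote call to an Agent in its own container.
const agentRegistry = {
  requirements: (task) => ({ status: 'ok', output: { spec: `spec for: ${task.input.request}` } }),
  development: (task) => ({ status: 'ok', output: { patch: `patch implementing ${task.input.spec}` } }),
};

// The Workflow definition names agent types, never concrete instances.
const releaseWorkflow = [
  { name: 'Requirements Analysis', agentType: 'requirements', successCriteria: 'spec has acceptance criteria' },
  { name: 'Development', agentType: 'development', successCriteria: 'code compiles' },
];

// The platform owns the state machine; agents only see one task at a time.
function runWorkflow(phases, registry, initialInput) {
  const state = { phaseIndex: 0, status: 'running', history: [] };
  let input = initialInput;
  while (state.status === 'running') {
    const phase = phases[state.phaseIndex];
    const agent = registry[phase.agentType];
    if (!agent) throw new Error(`No agent registered for type "${phase.agentType}"`);
    // Send the task spec (inputs + success criteria) and wait for the result.
    const result = agent({ input, successCriteria: phase.successCriteria });
    state.history.push({ phase: phase.name, status: result.status });
    if (result.status !== 'ok') {
      state.status = 'failed';
    } else if (state.phaseIndex === phases.length - 1) {
      state.status = 'completed';
      state.output = result.output;
    } else {
      input = result.output; // Output of this phase feeds the next one.
      state.phaseIndex += 1;
    }
  }
  return state;
}

const finalState = runWorkflow(releaseWorkflow, agentRegistry, { request: 'export report as CSV' });
console.log(finalState.status, finalState.history);
```

Neither handler knows which phase it belongs to or what runs next; that knowledge lives only in `runWorkflow` and the Workflow definition.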

Pattern 2: Orchestrator Agent Model

Add a special "orchestrator Agent" type running in its own container, responsible for reading the Workflow definition, maintaining state, and invoking other Agents through the platform API.

This suits scenarios where Workflow logic is complex and requires AI judgment to determine flow paths. The trade-off is an additional Agent type, which increases system complexity.

Pattern 3: Shared Workspace + Event-Driven

Agents don't communicate directly — they collaborate through a shared workspace (Git repositories, task boards). The Workflow defines what trigger conditions cause each Agent to intervene, and what artifacts they produce to trigger the next event.

This is the most robust pattern, since one Agent failing doesn't affect the others. However, the overall Workflow state is hard to visualize and debug, which makes it a poor fit for enterprise scenarios requiring strong auditing and monitoring.
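To make the trade-off of Pattern 3 concrete, here is a small sketch in which an in-memory object stands in for the shared workspace. `createWorkspace`, `subscribe`, and `publish` are hypothetical names, and in practice the workspace would be a Git repository or task board rather than a `Map`.

```javascript
// Hypothetical shared workspace: stored artifacts plus an append-only log.
function createWorkspace() {
  const subscribers = [];
  const workspace = {
    artifacts: new Map(),
    log: [],
    // An Agent declares which artifact type triggers it.
    subscribe(artifactType, handler) {
      subscribers.push({ artifactType, handler });
    },
    // Publishing an artifact is the only way to trigger other Agents.
    publish(artifactType, content, producer) {
      workspace.artifacts.set(artifactType, content);
      workspace.log.push(`${producer} -> ${artifactType}`);
      for (const sub of subscribers) {
        if (sub.artifactType === artifactType) sub.handler(content, workspace);
      }
    },
  };
  return workspace;
}

const sharedSpace = createWorkspace();

// Each Agent reacts to an artifact and produces the next one.
sharedSpace.subscribe('requirements-spec', (spec, ws) =>
  ws.publish('code-change', `patch for ${spec}`, 'development-agent'));
sharedSpace.subscribe('code-change', (patch, ws) =>
  ws.publish('test-report', `report on ${patch}`, 'testing-agent'));

sharedSpace.publish('requirements-spec', 'CSV export', 'requirements-agent');
console.log(sharedSpace.log);
```

Note that nothing in this sketch holds a "current phase". The Workflow's progress can only be reconstructed by reading the log, which is the visibility problem described above.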

Recommendation: Platform Orchestration + Agent Execution

For enterprise Agent Platforms, Pattern 1 is recommended as the primary architecture, following these principles:

Agents don't hold Workflow knowledge: A development Agent shouldn't know it's part of a "requirements to deployment" Workflow. It should only know: what task input it received, and what it needs to output when done. Workflow business knowledge lives in the platform layer, Agent technical capability lives in the Agent layer — don't mix them.

Workflow phase-to-Agent mapping maintained at the platform layer: Each Workflow phase defines "what type of Agent is needed," and the platform finds the corresponding Agent instance and assigns the task. This way, Agent instance scaling doesn't affect Workflow definitions, and Workflow flow logic changes don't require Agent modifications.

Agents expose standard task interfaces: Each Agent should have a unified task-receiving interface that accepts structured task specs (objective, inputs, success criteria, constraints) and returns structured results (deliverables, status, confidence). This interface mirrors, at the Agent layer, the per-phase inputs/outputs and success criteria defined in the Workflow spec.
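One way to picture the unified task interface is a thin wrapper that checks both directions of the contract. The field lists come from the paragraph above; the `wrapAgent` helper, the `rejected` status value, and the sample Agent are assumptions for illustration only.

```javascript
const TASK_SPEC_FIELDS = ['objective', 'inputs', 'successCriteria', 'constraints'];
const TASK_RESULT_FIELDS = ['deliverables', 'status', 'confidence'];

function missingFields(obj, required) {
  return required.filter((field) => obj[field] === undefined);
}

// Hypothetical wrapper: every Agent is exposed through the same contract.
function wrapAgent(handler) {
  return function handleTask(taskSpec) {
    const missingIn = missingFields(taskSpec, TASK_SPEC_FIELDS);
    if (missingIn.length > 0) {
      // Reject malformed tasks with a structured result instead of guessing.
      return { deliverables: [], status: 'rejected', confidence: 0, error: `missing: ${missingIn.join(', ')}` };
    }
    const result = handler(taskSpec) ?? {};
    const missingOut = missingFields(result, TASK_RESULT_FIELDS);
    if (missingOut.length > 0) {
      throw new Error(`Agent returned an incomplete result, missing: ${missingOut.join(', ')}`);
    }
    return result;
  };
}

// Stand-in for a Testing Agent; a real one would run in its own container.
const testingAgent = wrapAgent((task) => ({
  deliverables: [`test report for ${task.inputs.patchId}`],
  status: 'succeeded',
  confidence: 0.9,
}));

const taskResult = testingAgent({
  objective: 'Verify the patch against the requirements spec',
  inputs: { patchId: 'patch-1' },
  successCriteria: 'All acceptance criteria have a passing test',
  constraints: ['read-only access to the repository'],
});
console.log(taskResult.status, taskResult.deliverables);
```

With the contract enforced at the boundary, the platform can route any task to any Agent of the right type without knowing how that Agent works internally.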

Workflow state persisted at the platform: The platform needs to maintain the state of each Workflow instance (current phase, inputs/outputs of each phase, historical execution records). This is both the foundation for monitoring and auditing, and the prerequisite for resume-from-checkpoint support — if a phase fails, it can be re-triggered from the last successful checkpoint rather than requiring the entire Workflow to restart.
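Resume-from-checkpoint can be sketched as follows, with a plain object standing in for the persisted Workflow instance. The record shape and the names `createInstance`, `runInstance`, and `phaseHandlers` are assumptions; a real platform would store this record in a database and call remote Agents.

```javascript
// Hypothetical persisted record for one Workflow instance.
function createInstance(phaseNames, initialInput) {
  return { phaseNames, initialInput, completed: [], status: 'pending', failedPhase: null };
}

// Runs from the first phase that has no recorded output yet, so a second
// call after a failure resumes instead of restarting.
function runInstance(instance, handlers) {
  instance.status = 'running';
  instance.failedPhase = null;
  for (let i = instance.completed.length; i < instance.phaseNames.length; i += 1) {
    const name = instance.phaseNames[i];
    const input = i === 0 ? instance.initialInput : instance.completed[i - 1].output;
    try {
      const output = handlers[name](input);
      // Recording input and output per phase is what makes resuming possible.
      instance.completed.push({ phase: name, input, output });
    } catch (err) {
      instance.status = 'failed';
      instance.failedPhase = name;
      return instance;
    }
  }
  instance.status = 'completed';
  return instance;
}

let developmentCalls = 0;
let testingCalls = 0;
const phaseHandlers = {
  development: (spec) => {
    developmentCalls += 1;
    return `patch for ${spec}`;
  },
  testing: (patch) => {
    testingCalls += 1;
    // Simulate a transient failure on the first attempt only.
    if (testingCalls === 1) throw new Error('test environment unavailable');
    return `report on ${patch}`;
  },
};

const workflowInstance = createInstance(['development', 'testing'], 'CSV export');
runInstance(workflowInstance, phaseHandlers); // fails in "testing"
const statusAfterFirstRun = workflowInstance.status;
runInstance(workflowInstance, phaseHandlers); // resumes at "testing"
console.log(statusAfterFirstRun, workflowInstance.status, developmentCalls);
```

The second call skips "development" because its output is already recorded, which is the behavior the paragraph above describes.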


A Concrete Architecture Example

Using an end-to-end "requirements analysis → development → testing" workflow as an example:

Trigger: PM submits requirements description

[Platform workflow engine]
  Phase 1: Requirements Analysis
    → Invoke Requirements Agent (input: raw requirements description)
    → Output: structured requirements spec (acceptance criteria, subtask breakdown)
    → Human checkpoint: PM confirms spec

  Phase 2: Development
    → Invoke Development Agent (input: requirements spec + codebase context)
    → Output: code changes + unit tests + implementation notes
    → Auto gate: code compiles, unit test coverage meets threshold

  Phase 3: Testing
    → Invoke Testing Agent (input: code changes + requirements spec)
    → Output: test report + defect list
    → Branch condition: no defects → merge approval; defects found → return to Phase 2

Agents have no direct communication — everything passes through the platform layer

In this architecture, each Agent runs independently in its own container, the platform handles task distribution and state transitions, and the end-to-end Workflow view is completely visible at the platform layer.
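The gates and the branch back to Phase 2 in this example can be written as a pure transition function that the platform evaluates after each Agent result. This is a sketch under assumptions: the event shapes, the `coverageThreshold` field, and the `fixRounds` counter are not part of the original design and are included only to make the transitions concrete.

```javascript
// Hypothetical transition function: given the current Workflow state and the
// latest result, return the next state. Agents never call this themselves.
function nextStep(state, event) {
  switch (state.phase) {
    case 'requirements':
      // Human checkpoint: stay put until the PM confirms the spec.
      return event.pmApproved ? { ...state, phase: 'development' } : state;
    case 'development':
      // Auto gate: code compiles and coverage meets the threshold.
      return event.compiles && event.coverage >= state.coverageThreshold
        ? { ...state, phase: 'testing' }
        : state;
    case 'testing':
      // Branch condition: no defects -> merge approval, otherwise back to Phase 2.
      return event.defects.length === 0
        ? { ...state, phase: 'merge-approval' }
        : { ...state, phase: 'development', fixRounds: state.fixRounds + 1 };
    default:
      return state;
  }
}

let wfState = { phase: 'requirements', coverageThreshold: 0.8, fixRounds: 0 };
const wfEvents = [
  { pmApproved: true },
  { compiles: true, coverage: 0.85 },
  { defects: ['CSV header missing'] },
  { compiles: true, coverage: 0.9 },
  { defects: [] },
];

const visitedPhases = [wfState.phase];
for (const event of wfEvents) {
  wfState = nextStep(wfState, event);
  visitedPhases.push(wfState.phase);
}
// prints "requirements -> development -> testing -> development -> testing -> merge-approval"
console.log(visitedPhases.join(' -> '));
```

Keeping the transitions in one function means the whole flow can be unit-tested without starting a single Agent.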


Summary

Each evolution of AI Workflow definition solved problems exposed by the previous stage:

  • Markdown → Script orchestration: Solved execution determinism
  • Script orchestration → Large Skill: Solved Skill invocation (but gave up determinism and introduced new problems)
  • Large Skill → Native JS Workflow: Solved both determinism and Skill invocation simultaneously
  • Native JS Workflow → Decoupled intent layer with platform orchestration: Addresses tool lock-in and cross-process collaboration

The current reality of AI Workflow engineering is that there is no established industry standard and tool formats are fragmented. Decoupling the Workflow intent layer from the execution engine is the pragmatic strategy for reducing tool-switching costs. For distributed multi-Agent scenarios, platform as orchestrator is the architecture with the clearest responsibility boundaries.


Visit PrimeSkills — a curated AI Agent and skills marketplace where all content is validated through real enterprise workflows. No hype, just what actually works.

For more practical knowledge and interesting products, visit my personal homepage
