How videoclaw works — for agents

How an agent host integrates: schema discovery via vclaw schema --json, driving the video commands, MCP serve for live read-only queries, and the on-disk artifacts as the source of truth.

This page is written for an AI agent or orchestration layer that will drive videoclaw on a user's behalf — Claude Code, Claude Desktop, OpenAI Codex, or any other agent host. videoclaw is a neutral target: it doesn't care which assistant is driving, only that the driver can run a CLI or speak MCP. If you're a person, the Guide is friendlier — but you're welcome to read on.

You talk to your agent; your agent drives videoclaw

The one idea that explains everything

videoclaw is a deterministic toolkit, not an orchestrator. You (the agent) do the intent reasoning. vclaw executes explicit, inspectable steps. The on-disk project folder is the single source of truth: not the chat, not memory.

videoclaw deliberately contains no model calls for "deciding what the user wants." That's your job. videoclaw's job is to turn a decided plan into artifacts and provider calls, reproducibly, with every step written to disk.

Discover the surface in one call

Before driving anything, learn the contract:

```bash
vclaw schema --json
```

This returns the entire command tree: every command, every flag, every artifact JSON schema, every exit code, and every error code. Parse it once and you know the whole surface — don't guess flags.
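One way a host might follow the "parse it once" advice is to shell out a single time and keep the parsed result in memory. The sketch below makes no assumption about the shape of the returned JSON; the names `fetch_schema`, `schema`, and the injectable `runner` parameter are hypothetical and exist only for illustration.

```python
import json
import subprocess
from functools import lru_cache


def fetch_schema(runner=subprocess.run):
    """Run `vclaw schema --json` and return the parsed JSON document.

    `runner` is injectable so the function can be exercised without vclaw
    installed; by default it shells out for real.
    """
    proc = runner(["vclaw", "schema", "--json"], capture_output=True, text=True)
    if proc.returncode != 0:
        # Surface the failure (including whatever vclaw printed) instead of guessing.
        raise RuntimeError(f"vclaw schema failed ({proc.returncode}): {proc.stdout}")
    return json.loads(proc.stdout)


@lru_cache(maxsize=1)
def schema():
    """Fetch the schema once per process and reuse the cached result afterwards."""
    return fetch_schema()
```

Calling `schema()` repeatedly then costs one subprocess call in total; how the host looks up flags inside the result depends on the actual schema layout, which this sketch deliberately leaves alone.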

For hosts that prefer MCP:

```bash
vclaw mcp serve
```

exposes read-only project introspection as MCP tools (status, metrics, artifacts).

The exit-code contract (read this on every call)

vclaw communicates with you through exit codes, and emits machine-readable JSON on stdout:

| Exit | Meaning | What you do |
| --- | --- | --- |
| 0 | Success | Continue. |
| 1 | Your input was wrong | Fix the flags/args and retry. |
| 2 | System / provider error | Investigate; a retry may help. |
| 3 | A gate is blocking | Clear the gate first, then retry. |

On any non-zero exit, stdout is {"code": "...", "message": "...", "details": {…}}. Read code and act on it — do not just re-run blindly.
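As a rough illustration of reading the exit code and the error envelope together, the sketch below maps each exit code to a next step and pulls `code`, `message`, and `details` out of stdout. The action labels (`continue`, `fix_input`, `investigate`, `clear_gate`) and the names `Outcome` and `interpret` are hypothetical, and treating unknown exit codes as a system error is an assumption, not documented behavior.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Next step per exit code, following the exit-code contract above.
ACTIONS = {0: "continue", 1: "fix_input", 2: "investigate", 3: "clear_gate"}


@dataclass
class Outcome:
    action: str
    code: Optional[str] = None
    message: Optional[str] = None
    details: dict = field(default_factory=dict)


def interpret(exit_code: int, stdout: str) -> Outcome:
    """Turn a finished vclaw call into a decision for the agent."""
    # Assumption: exit codes outside 0-3 are handled like a system error.
    action = ACTIONS.get(exit_code, "investigate")
    if exit_code == 0:
        return Outcome(action)
    try:
        envelope = json.loads(stdout)
    except json.JSONDecodeError:
        envelope = None
    if not isinstance(envelope, dict):
        # No usable envelope: keep the raw text so nothing is lost.
        return Outcome(action, message=stdout.strip() or None)
    return Outcome(
        action,
        code=envelope.get("code"),
        message=envelope.get("message"),
        details=envelope.get("details") or {},
    )
```

A host would branch on `Outcome.action` first and then on `Outcome.code`, which keeps the "do not re-run blindly" rule in one place.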

The canonical flow

init → brief → storyboard → assets → review → publish

        readiness · plan · produce(execute) · execute-status · execute-cancel
                    (the runtime layer, between assets and review)

Aliases you'll see: plan = execution-plan, produce = execute. A typical end-to-end drive:

  1. vclaw video init <slug> --mode storyboard|director
  2. vclaw video brief --project <slug> --title "…" --intent "…" [--aspect-ratio 16:9|9:16|1:1]
  3. vclaw video storyboard --project <slug> --scene "…" [--scene "…" …]
  4. vclaw video assets --project <slug> --asset image:path:0
  5. vclaw video readiness --project <slug> — check blockers before spending
  6. vclaw video plan --project <slug> — see the recommended provider route
  7. vclaw video execute --project <slug> [--dry-run]
  8. vclaw video assemble --project <slug> — stitch the final narrated MP4

Always --dry-run first. It plans the whole thing and spends nothing.
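The drive above could be assembled programmatically along these lines. The commands and flags are the ones listed in the steps; the function names `flow_commands` and `drive`, the injectable `runner`, and the choice to stop at the first non-zero exit are illustrative assumptions. The dry run is placed immediately before the real `execute`.

```python
import subprocess


def flow_commands(slug, title, intent, scenes, asset, mode="storyboard"):
    """Return the argv lists for one end-to-end drive, in order."""
    base = ["vclaw", "video"]
    project = ["--project", slug]
    storyboard = base + ["storyboard"] + project
    for scene in scenes:
        storyboard += ["--scene", scene]  # one --scene flag per scene
    return [
        base + ["init", slug, "--mode", mode],
        base + ["brief"] + project + ["--title", title, "--intent", intent],
        storyboard,
        base + ["assets"] + project + ["--asset", asset],
        base + ["readiness"] + project,
        base + ["plan"] + project,
        base + ["execute"] + project + ["--dry-run"],  # plans everything, spends nothing
        base + ["execute"] + project,
        base + ["assemble"] + project,
    ]


def drive(commands, runner=subprocess.run):
    """Run commands in order; return (argv, result) for the first failure, else None."""
    for argv in commands:
        proc = runner(argv, capture_output=True, text=True)
        if proc.returncode != 0:
            return argv, proc
    return None
```

A real host would probably pause between the dry run and the paid `execute` to show the plan to the user; this sketch runs straight through for brevity.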

Invariants you must respect

These are load-bearing. Violating them is a bug, not a shortcut.

  • No silent fallback. Provider routes hard-fail (exit 2) rather than quietly switching to a different route. If a route isn't configured, surface that — don't paper over it.
  • The director approval gate. In director mode, execute exits 3 with storyboard_approval_required. Do not set the approval yourself on the user's behalf without their say-so — show them the storyboard, get explicit approval, then set VIDEOCLAW_APPROVE_STORYBOARD=1 and retry.
  • Review freshness. A project has a review-state ladder (missing → current → stale). A stale director review blocks execute/execute-status even when approval is set. If you edit the storyboard after approving, re-run the review.
  • Characters by visual descriptor, not name. Proper names don't survive across generations and some providers reject photoreal real-person faces. Describe characters by appearance; lean on the story bible and reference sheets for continuity.
  • Reference budget. Submissions cap references (≤9 image / ≤3 video / ≤3 audio). The toolkit preflights this; respect the failure rather than retrying blindly.
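The director approval gate is the invariant most easily violated by an over-eager host, so here is one hedged sketch of handling it. It retries only when the error code is `storyboard_approval_required` and only after a caller-supplied `ask_user` callback returns true. `execute_with_approval`, `ask_user`, and `runner` are hypothetical names; how the storyboard is shown to the user is left to the host.

```python
import json
import os
import subprocess

GATE_CODE = "storyboard_approval_required"


def execute_with_approval(slug, ask_user, runner=subprocess.run):
    """Run execute; on the approval gate, retry only with the user's explicit yes."""
    argv = ["vclaw", "video", "execute", "--project", slug]
    first = runner(argv, capture_output=True, text=True)
    if first.returncode != 3:
        return first  # success, or a failure that is not a gate
    try:
        error = json.loads(first.stdout)
    except json.JSONDecodeError:
        return first
    if not isinstance(error, dict) or error.get("code") != GATE_CODE:
        # A different gate (for example a stale review): never override it here.
        return first
    if not ask_user(slug):
        return first  # no approval, so the gate stays closed
    # Approval is passed to the retry through the environment variable named above.
    env = dict(os.environ, VIDEOCLAW_APPROVE_STORYBOARD="1")
    return runner(argv, capture_output=True, text=True, env=env)
```

Keeping the approval scoped to the retried process, rather than exporting it globally, means a later storyboard edit does not inherit a stale approval from the host's environment.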

Artifacts are the contract

Every stage writes a canonical JSON artifact under projects/<slug>/artifacts/ (brief, storyboard, story-bible, asset-manifest, execution-plan, execution-report, review-report, publish-report, …). Every write also appends an event to events/events.jsonl. To know the true state of a project, read the artifacts — they're the source of truth, and their shapes are defined by the JSON Schemas in schemas/video/.

A handoff is ready only when review-report.json has verdict: "pass" and metrics.publishReady: true. Don't claim "done" on anything weaker.
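The handoff rule can be checked directly against the artifact. The sketch below assumes the `projects/<slug>/artifacts/` folder is resolved relative to a `root` directory the host chooses (that parameter, and the name `handoff_ready`, are hypothetical); a missing or unreadable report counts as not ready.

```python
import json
from pathlib import Path


def handoff_ready(slug, root="."):
    """True only if review-report.json says verdict 'pass' and metrics.publishReady true."""
    report_path = Path(root) / "projects" / slug / "artifacts" / "review-report.json"
    try:
        report = json.loads(report_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return False  # no readable report means no handoff
    if not isinstance(report, dict):
        return False
    metrics = report.get("metrics")
    if not isinstance(metrics, dict):
        return False
    # Both conditions are required; anything weaker is not "done".
    return report.get("verdict") == "pass" and metrics.get("publishReady") is True
```

Because the check reads the file on every call, it reflects the current state on disk rather than anything remembered from earlier in the conversation.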

Skills: where to start

videoclaw ships agent skills so you don't have to rediscover the surface each time:

  • Canonical entry skills — start broad, specialize later:
    • video-framework — any video request.
    • brand-presenter — narrated presenter / host-led videos.
  • Starter skill pack — copy-paste templates in mcp/skills-pack/ (videoclaw-create-video, videoclaw-check-status, videoclaw-portfolio-review).
  • Machine-readable index: skills/catalog.json. Don't scrape the markdown; read the catalog.

Read these next (in the repo)


This page exists so an agent can orient in one read. The human-facing version of the same workflow is Use it with Claude Code.

Built to be driven by agent hosts like Claude Code, Claude Desktop, or Codex · Source-available, commercial use requires a paid license.