Architecture

Figure: System architecture. An operator or agent drives the vclaw CLI into the domain modules, which write the artifact, checkpoint, and event ledgers and feed the execution runtime and adapter layer (native, command shim, custom binary), plus optional Bun and Python sidecars.


videoclaw (npm package: videoclaw) is a multi-provider video CLI that grew out of two predecessor codebases:

the original videoclaw package (v0.11.x), which had an orchestration layer (ralph / ralplan / team / MCP servers) on top of a video pipeline
the clean-room vclaw-video-core rebuild, which kept only the video pipeline, with strict on-disk artifacts + approval gates

v2 takes vclaw-video-core as the foundation, drops the orchestration layer (Claude Code / Codex now cover those concerns natively), and ports forward selected pieces from videoclaw: the vclaw-cli Bun package (formerly veo-cli), the Runway transport, a curated Python pipeline, and the Google Flow v1 + Omni Flash backend additions.

What's intentionally NOT here

The following subsystems from the original videoclaw v0.11.x are deliberately dropped. See MERGE_PLAN.md §3 for the rationale:

src/team/ (tmux team coordination)
src/ralph/, src/ralplan/ (persistent loops + consensus planning)
src/mcp/ (state / memory / code-intel / team / trace MCP servers)
src/hooks/, src/autoresearch/, src/hud/, src/visual/, src/openclaw/, src/sparkshell/, src/runtime/, src/subagents/, src/notifications/, src/verification/
crates/ (planned Rust performance components; CLAUDE.md referenced them, but they were never checked in)

Equivalents exist in Claude Code (native subagents), the OMC plugin (ralph / team / autopilot skills), and similar host CLI tooling.

Sidecars

The main repo is pure TypeScript on Node 20+, but two opt-in sidecars extend it:

vclaw-cli/ — Bun package (formerly veo-cli). Multi-provider video automation: Google Labs Flow (Veo 3.x direct + Omni Flash) via Puppeteer scraping, UseAPI (Veo / Seedance / Runway), local SQLite job tracking. The main repo's native-veo.ts invokes this package for the veo-useapi transport.
skills/video-replicator/scripts/ — Python 3.10+ pipeline. 122 modules covering the Seedance Prompt Director (compose / chain / critique / reference-validator / hooks), the bunty cricket pipeline, character sheet generation, presenter helpers, video assembly, and audio utilities. Documented at docs/PYTHON_PIPELINE.md.

Current layers

src/cli/
- user-facing command entrypoints
src/video/provider-platform/
- provider route descriptors
src/video/provider-status.ts
- environment, dependency, and route health reporting
src/video/pipeline-manifests/
- built-in stage definitions for storyboard and director
schemas/video/
- canonical machine-readable contracts
src/video/*
- portfolio management, reporting, templates, readiness, character consistency, execution planning, adapter-backed execution runtime, and Obsidian export
- story-bible.ts — deterministic continuity bible (buildStoryBibleArtifact, writeStoryBibleForProject) derived from brief.json + storyboard.json + character profiles. Pure apart from reading project files; spends no credits and calls no providers. Auto-generated at storyboard-write time by every storyboard-producing command (video create|storyboard|clone-execute|storyboard-from-clone|storyboard-review and director-preflight --apply-content-fixes, regenerated after content-fixes apply) and written to artifacts/story-bible.json (contract: schemas/video/artifacts/story-bible.schema.json). It captures cast, settings, props, and a scene timeline with continuity notes so downstream generation stays consistent across scenes/regenerations; recorded in the storyboard checkpoint, the artifact.storyboard.written event payload, and the command JSON output as storyBiblePath, and validated by the doctor layer.
- assemble/ — the FFmpeg assemble/stitch layer. assemble/media-qc.ts (runAssembleMediaQc) runs after stitch on non-dry-run renders, ffprobing each clip + the master via probeMedia (issue codes probe-failed, missing-audio, nonstandard-audio-codec, nonstandard-audio-sample-rate, nonstandard-video-codec, duration-drift); the report is attached to AssembleResult.qc and the assemble-report artifact under qc, with issues surfaced as qc.<code>[<scope>] assemble warnings. assemble/narration-fit.ts (planNarrationFit) is a pure planner (adapted from Google's story-generator timing rule) that either speeds up narration within a tempo threshold or keeps speech natural and loops the visual bed.
- native in-process transports: native-veo.ts (→ vclaw-cli/Bun), native-seedance.ts (SUTUI_API_KEY), native-runway.ts (UseAPI Bearer, pure-Node fetch), native-dreamina.ts (Seedance 2.0 / Dreamina via useapi.net — reuses USEAPI_API_TOKEN + VCLAW_DREAMINA_ACCOUNT, pure-Node fetch)
- review-ui.ts — HTTP server (port 4317) that drives the browser-based storyboard review station at tmp/review-station/index.html. See docs/REVIEW_UI_STORYBOARD_WORKFLOW.md.
- prompt-quality.ts — six Seedance-handbook anti-pattern checks (adjective soup, multiple actions, multiple camera moves, style-word overload, literary emotion language, overlong prompts) wired into director-preflight, warnings by default and promotable to blocking errors via DIRECTOR_STRICT_PROMPT_QUALITY=1; also houses runMultiShotChecks, the validator for the multi-shot prompt framework
- multi-shot-prompt.ts — multi-shot cinematic prompt authoring utility: provider-aware presets (cinematic-15s, seedance-10s, veo-8s, runway-10s), buildShotPlan (deterministic timecode scaffold, non-repeating camera grid), parsed shots[], and generateMultiShotPromptText (Gemini-backed or stub). video multi-shot --from-storyboard hydrates prompts from project storyboard scenes and persists source metadata.
- dialogue-fit.ts — short-clip dialogue duration checks wired into director-preflight, warnings by default and promotable to blocking errors via DIRECTOR_STRICT_DIALOGUE_FIT=1
- generation-telemetry.ts — route/task/config/cost/timing/output telemetry recorded into project event ledgers and used by cost estimates when completed Seedance USD samples exist
src/video/providers/
- per-provider HTTP adapter code (runway-useapi.ts and dreamina-useapi.ts). Each adapter exports submit/poll/cancel functions that accept an optional fetchImpl for test injection. Wrapped by src/video/native-*.ts for production use.
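The narration-fit decision attributed to assemble/narration-fit.ts can be sketched as a pure function. This is a minimal illustration, not planNarrationFit itself. The function name, the result shape, and the 1.15 tempo threshold are assumptions chosen for the example.

```typescript
// Hypothetical result shape; the shipped planNarrationFit may return different fields.
type NarrationFitPlan =
  | { mode: "natural"; tempo: 1 }
  | { mode: "speed-up"; tempo: number }
  | { mode: "loop-visual"; tempo: 1; loops: number };

// maxTempo = 1.15 is an illustrative threshold, not the value the real planner uses.
function planNarrationFitSketch(
  narrationSec: number,
  visualSec: number,
  maxTempo = 1.15,
): NarrationFitPlan {
  if (narrationSec <= 0 || visualSec <= 0) throw new Error("durations must be positive");
  const tempo = narrationSec / visualSec; // > 1 means the narration runs longer than the visual bed
  if (tempo <= 1) return { mode: "natural", tempo: 1 };
  if (tempo <= maxTempo) return { mode: "speed-up", tempo };
  // Too fast to sound natural: keep speech at 1x and repeat the visual bed to cover it.
  return { mode: "loop-visual", tempo: 1, loops: Math.ceil(narrationSec / visualSec) };
}
```

Because the planner is pure (durations in, plan out), the FFmpeg layer only has to carry out the plan, and the decision can be unit-tested without rendering anything.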
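The optional fetchImpl seam on the src/video/providers/ adapters can be illustrated with a submit function that falls back to Node's global fetch. The FetchLike type, the /jobs endpoint, and the taskId response field are hypothetical. The real request shapes in runway-useapi.ts and dreamina-useapi.ts are not documented here.

```typescript
// Structural type for the subset of fetch this sketch uses (hypothetical, not the adapters' real type).
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface AdapterOptions {
  baseUrl: string; // hypothetical provider API root
  token: string; // Bearer token supplied by the caller
  fetchImpl?: FetchLike; // optional test seam
}

// Default to Node 20+'s global fetch; the cast keeps the sketch free of DOM/@types/node typings.
function resolveFetch(opts: AdapterOptions): FetchLike {
  const defaultFetch: FetchLike = (url, init) =>
    (globalThis as unknown as { fetch: FetchLike }).fetch(url, init);
  return opts.fetchImpl ?? defaultFetch;
}

// Hypothetical endpoint (/jobs) and payload/response fields.
async function submitJob(prompt: string, opts: AdapterOptions): Promise<string> {
  const doFetch = resolveFetch(opts);
  const res = await doFetch(`${opts.baseUrl}/jobs`, {
    method: "POST",
    headers: { Authorization: `Bearer ${opts.token}`, "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  if (!res.ok) throw new Error(`submit failed: HTTP ${res.status}`);
  const data = (await res.json()) as { taskId?: unknown };
  if (typeof data.taskId !== "string") throw new Error("submit response has no taskId");
  return data.taskId;
}
```

A test passes a fake fetchImpl that records the request and returns canned JSON, so it needs no network call and no API token. Production callers can omit the option.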

Principles

No silent fallback across materially different provider paths
Every stage should eventually have a canonical artifact
CLI output should be machine-readable by default
Architecture remains small until the contracts are stable

Near-term roadmap

Add more review/publish automation around generated outputs
Keep tightening the transport contracts without widening orchestration complexity
Expand higher-level operator ergonomics on top of the current runtime
Keep docs/help output aligned with the actual product surface
Add selective provider-specific polish only where real runs justify it

Current implemented flow

video init
- creates canonical project workspace
video brief
- writes brief.json
- marks brief checkpoint complete
video storyboard
- writes storyboard.json
- marks storyboard checkpoint complete
video assets
- writes asset-manifest.json
- marks assets checkpoint complete
video review, video review-ui, or video review-autopilot
- writes review-report.json
- sets the review checkpoint to completed, retry-required, or failed
- allows publish handoff only when the saved report has verdict: "pass" and metrics.publishReady: true
video publish
- writes publish-report.json
- marks publish checkpoint complete or failed
video status
- resolves next stage from manifest + checkpoints
video doctor-project
- validates checkpoint/artifact consistency
video doctor-portfolio
- validates the whole portfolio
video metrics|workload|next-actions|dependencies
- portfolio management views
video report|report-snapshot|report-history|report-diff|trends|export-csv
- reporting and snapshot history
video export-obsidian|sync-obsidian|scaffold-obsidian-vault
- Obsidian operations layer
video playbook-list|playbook-show
- bundled prompt/playbook registry
video prompt-lib-list|prompt-lib-show
- imported prompt/reference library
video template-save|template-list|template-show|clone-plan|clone-init|storyboard-from-clone
- reusable template / clone bridge
video clone-execute
- template -> storyboard -> execution-seed -> runtime in one flow
video readiness
- artifact, character-consistency, image-input, scene-selection, and director identity-sheet readiness before runtime execution
video plan|produce|execute-status
- route selection, payload generation, dry-run validation, built-in or external adapter execution, polling, output ingestion, native Seedance direct transport, native Veo direct transport, and prompt-guided execution context
video character-add|character-list|character-show|character-consistency
- character profile subsystem and continuity enforcement

video reference-sheet-add|list|show|bind|validate
- role-tagged reference sheets with closed-vocabulary validation and per-scene binding
video candidates-list|candidates-show|select-candidate|reject-candidate|reroll-scene|chain-from|unchain|candidates-migrate-from-assets
- per-scene candidate registry + operator selection state + chain-from-prev
- partial rerun via produce --scene <n>
video cost-estimate
- static default estimate with optional historical Seedance USD telemetry override
video multi-shot
- cinematic prompt-authoring utility with standalone and project-storyboard entry paths
- --plan generates a deterministic timecode scaffold + non-repeating camera grid
- --validate (--file or stdin) checks prompt quality via runMultiShotChecks; nonzero exit on errors
- --auto --image <path> calls Gemini to author prose from a reference image (offline-stubbable via VCLAW_MULTISHOT_AUTO_STUB)
- --from-storyboard --project <slug> --scene <sceneIndex> derives action, characters, and default location from the project artifacts
- --provider / --route resolve provider-shaped presets when --preset is omitted
- --project <slug> persists the result as a multi-shot-prompt artifact; status/readiness summarize the latest artifact for review
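Stage resolution for video status and the publish handoff rule can be sketched as two small functions. The checkpoint status strings and the ReviewReport shape are assumptions modelled on the flow above. They are not the contracts in schemas/video/.

```typescript
// Hypothetical checkpoint statuses and report shape, modelled on the flow above.
type CheckpointStatus = "pending" | "completed" | "retry-required" | "failed";

interface ReviewReport {
  verdict: string; // "pass" is the only verdict that unlocks publish
  metrics: { publishReady: boolean };
}

// Walk the manifest's stage order and return the first stage whose checkpoint is not completed.
function resolveNextStage(
  manifestStages: readonly string[],
  checkpoints: Readonly<Record<string, CheckpointStatus | undefined>>,
): string | null {
  for (const stage of manifestStages) {
    if (checkpoints[stage] !== "completed") return stage;
  }
  return null; // every stage is complete
}

// Publish handoff requires both conditions on the saved review report.
function canHandOffToPublish(report: ReviewReport | undefined): boolean {
  return report !== undefined && report.verdict === "pass" && report.metrics.publishReady === true;
}
```

A review checkpoint left at retry-required keeps review as the next stage, so publish cannot start until a passing, publish-ready report is saved.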

Compatibility aliases:

video execution-plan -> video plan
video execute -> video produce

Commercial track + quantified prompt-craft (landed)

The prompt-craft layer is category-driven and quantified. It generalises beyond "cinematic character video" to cover commercial / product work, and locks Seedance identity through the official Asset Library. The pieces below are all implemented and on main.

Category Descriptor registry

src/video/category-registry.ts defines nine categories (CATEGORY_IDS), each a CategoryDescriptor carrying a subjectType of character or product, a beatTemplate (three-act / ad-hook-feature-cta / turntable / lookbook), a cameraVocab, a genre, an audioProfile (diegetic / ad-mix), and hookSeconds. resolveCategory(id?) defaults to the cinematic character descriptor. The subjectType selects which branch filmmaking-prompts takes:

character path — character sheets + storyboard-grid character lock (the existing cinematic path, unchanged).
product path — no character sheets, no grid lock; each product in artifacts/product-references.json becomes a text-driven Seedance packet whose timeline follows the descriptor's beat template, with orbit grammar woven in for orbit/turntable vocabularies. src/video/product-references.ts reads the product references and degrades gracefully (description-only hero from the brief) when the artifact is absent.
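A minimal sketch of the descriptor shape and the subjectType branch follows. The field names come from the description above. The two category ids, their field values, and the throw on an unknown id are hypothetical, because the nine real CATEGORY_IDS are not listed here.

```typescript
// Field names follow the text above; the ids and values below are hypothetical examples.
type SubjectType = "character" | "product";
type BeatTemplate = "three-act" | "ad-hook-feature-cta" | "turntable" | "lookbook";

interface CategoryDescriptor {
  id: string;
  subjectType: SubjectType;
  beatTemplate: BeatTemplate;
  cameraVocab: string;
  genre: string;
  audioProfile: "diegetic" | "ad-mix";
  hookSeconds: number;
}

const CINEMATIC_CHARACTER: CategoryDescriptor = {
  id: "cinematic-character", subjectType: "character", beatTemplate: "three-act",
  cameraVocab: "cinematic", genre: "drama", audioProfile: "diegetic", hookSeconds: 2,
};
const PRODUCT_TURNTABLE: CategoryDescriptor = {
  id: "product-turntable", subjectType: "product", beatTemplate: "turntable",
  cameraVocab: "orbit", genre: "commercial", audioProfile: "ad-mix", hookSeconds: 2,
};
const REGISTRY: ReadonlyMap<string, CategoryDescriptor> = new Map([
  [CINEMATIC_CHARACTER.id, CINEMATIC_CHARACTER],
  [PRODUCT_TURNTABLE.id, PRODUCT_TURNTABLE],
]);

// No id -> cinematic character default. Throwing on an unknown id is an assumption of this sketch.
function resolveCategorySketch(id?: string): CategoryDescriptor {
  if (id === undefined) return CINEMATIC_CHARACTER;
  const found = REGISTRY.get(id);
  if (!found) throw new Error(`unknown category: ${id}`);
  return found;
}

// subjectType picks the branch: character sheets + grid lock only on the character path.
function promptBranch(category: CategoryDescriptor): { characterSheets: boolean; gridLock: boolean } {
  const isCharacter = category.subjectType === "character";
  return { characterSheets: isCharacter, gridLock: isCharacter };
}
```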

referenceBuildOrder() fixes the identity-reference build sequence (base-ref → sheet → scene-plate) so scene lighting can't contaminate the identity anchor.
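The fixed build sequence amounts to a stable sort by stage rank. The IdentityRef record and the orderForBuild name are hypothetical. Only the three stage names come from referenceBuildOrder() as described.

```typescript
// Stage names come from referenceBuildOrder() as described; the IdentityRef shape is hypothetical.
const BUILD_ORDER = ["base-ref", "sheet", "scene-plate"] as const;
type RefKind = (typeof BUILD_ORDER)[number];

interface IdentityRef {
  kind: RefKind;
  path: string;
}

// Stable sort by stage rank: the base reference is built first and scene plates last,
// so scene lighting is never an input to the identity anchor.
function orderForBuild(refs: readonly IdentityRef[]): IdentityRef[] {
  const rank = (kind: RefKind): number => BUILD_ORDER.indexOf(kind);
  return [...refs].sort((a, b) => rank(a.kind) - rank(b.kind));
}
```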

Quantified cinematography + standing prompt rules

src/video/cinematography.ts holds pure, deterministic prompt-fragment emitters whose density scales with a DetailLevel of terse | standard | rich: cameraSpec (shot/lens-mm/angle/movement, velocity in ft/s at rich), lightingSpec (Kelvin / key angle / ratio), gradeSpec (shadow/highlight hue+sat splits), and audioMix (a dB hierarchy at rich). It also defines five cinema modes (CINEMA_MODE_IDS: narrative, studio, action, performance, atmospheric), which cinemaMode resolves and stackModes stacks for multi-world intercuts (adjacent modes are never merged). Alongside these it provides resolveCameraVocab, per-genre look defaults (genreDefaults), beat-template layout (beats()), and precise orbit grammar (orbitGrammar, covering three ORBIT_KINDS).
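To show what density scaling with DetailLevel means in practice, here is a hypothetical cameraSpec-style emitter. Which fields appear at each level, the CameraInput shape, and the output wording are all assumptions. The real emitter's format may differ.

```typescript
type DetailLevel = "terse" | "standard" | "rich";

// Hypothetical input shape mirroring the fields named above.
interface CameraInput {
  shot: string;
  lensMm: number;
  angle: string;
  movement: string;
  velocityFtPerSec?: number;
}

// Density grows with the level; which fields appear at each level is an assumption.
function cameraSpecSketch(c: CameraInput, level: DetailLevel): string {
  if (level === "terse") return `${c.shot}, ${c.movement}`;
  // Velocity is only spelled out at rich, and only when the caller supplies one.
  const movement =
    level === "rich" && c.velocityFtPerSec !== undefined
      ? `${c.movement} at ${c.velocityFtPerSec} ft/s`
      : c.movement;
  return [c.shot, `${c.lensMm}mm lens`, c.angle, movement].join(", ");
}
```

Keeping the emitter a pure string function means the same input and level always produce the same fragment, which keeps prompts diffable across regenerations.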

A library of six named 2-second opening hooks (HOOK_PATTERN_IDS: black-to-light, silence-to-sound, reverse-motion, beat-drop, match-cut-in, whip-reveal) is resolved by resolveHookPattern / hookBeat, which throw on an unknown id (hooks must be explicit).
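The fail-loud resolution of hook ids can be sketched with a type guard over the six documented ids. The HookBeat shape and the function names are hypothetical. Only the id list and the 2-second duration come from the text.

```typescript
const HOOK_PATTERN_IDS = [
  "black-to-light", "silence-to-sound", "reverse-motion",
  "beat-drop", "match-cut-in", "whip-reveal",
] as const;
type HookPatternId = (typeof HOOK_PATTERN_IDS)[number];

// Hypothetical beat shape; the 2-second length comes from the text above.
interface HookBeat {
  id: HookPatternId;
  startSec: number;
  endSec: number;
}

function isHookPatternId(id: string): id is HookPatternId {
  return (HOOK_PATTERN_IDS as readonly string[]).includes(id);
}

// Unknown ids throw instead of silently falling back to some default hook.
function hookBeatSketch(id: string): HookBeat {
  if (!isHookPatternId(id)) {
    throw new Error(`unknown hook pattern "${id}"; expected one of: ${HOOK_PATTERN_IDS.join(", ")}`);
  }
  return { id, startSec: 0, endSec: 2 };
}
```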

src/video/prompt-rules.ts holds the standing prompt rules as pure scrubbers: stripProperNames (swap cast names for stable visual descriptors), brandNeutralize (strip brand tokens), noFaceMorphTag (forbid identity drift), and diegeticAudioLine (diegetic audio only).
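Two of the scrubbers can be approximated with word-boundary regular expressions. These are illustrative stand-ins for stripProperNames and brandNeutralize. They assume the cast arrives as a name-to-descriptor map and brands as a plain token list, and the real inputs may differ.

```typescript
// Escape regex metacharacters so names and brands are matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace each cast name with its visual descriptor; longest names first so
// "Anna Lee" is handled before "Anna".
function stripProperNamesSketch(prompt: string, cast: Record<string, string>): string {
  const entries = Object.entries(cast).sort(([a], [b]) => b.length - a.length);
  let out = prompt;
  for (const [name, descriptor] of entries) {
    // A replacer function keeps "$" in a descriptor from being read as a substitution pattern.
    out = out.replace(new RegExp(`\\b${escapeRegExp(name)}\\b`, "g"), () => descriptor);
  }
  return out;
}

// Drop brand tokens (case-insensitive) and collapse the leftover whitespace.
function brandNeutralizeSketch(prompt: string, brands: readonly string[]): string {
  let out = prompt;
  for (const brand of brands) {
    out = out.replace(new RegExp(`\\b${escapeRegExp(brand)}\\b`, "gi"), "");
  }
  return out.replace(/\s{2,}/g, " ").trim();
}
```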

Filmmaking-prompts two-phase gate

vclaw video filmmaking-prompts (src/video/filmmaking-prompts.ts) takes a --phase storyboard|video gate: storyboard returns the storyboard / camera-language portion only (seedancePackets gated to []); video and the default (omitted) return the full video-generation packets. The same command takes --category <id>, --genre, --detail terse|standard|rich (appends a quantified cinematography suffix at rich), --panels 9|12|15|20, --aspect-ratio, --no-faces, and --storyboard-grid <path>.
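The --phase gate reduces to clearing the packet list during the storyboard phase. The FilmmakingPrompts shape and the applyPhaseGate name are hypothetical. Only seedancePackets and the phase semantics come from the text.

```typescript
type Phase = "storyboard" | "video";

// Hypothetical output shape: only seedancePackets is named in the text above.
interface FilmmakingPrompts<Packet> {
  storyboardPrompt: string; // storyboard / camera-language portion
  seedancePackets: Packet[];
}

// storyboard -> camera-language portion only; video or omitted -> full packets.
function applyPhaseGate<P>(full: FilmmakingPrompts<P>, phase?: Phase): FilmmakingPrompts<P> {
  if (phase === "storyboard") return { ...full, seedancePackets: [] };
  return full;
}
```

Gating on output rather than on generation keeps both phases derived from the same inputs, so approving the storyboard portion cannot drift from the packets produced later.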

Multi-shot output formats

vclaw video multi-shot --plan (src/video/multi-shot-prompt.ts) renders a deterministic shot plan in several formats:

--format default|seedance-paragraph|per-shot — composeSeedanceParagraph (Seedance native single-paragraph), composePerShotFormat (per-shot SHOT N blocks), or the default layout.
--lang en|zh|en+zh — composeBilingual wraps the rendered prompt in fenced blocks (single, translated, or EN + 中文); numeric/technical spec tokens pass through unchanged.
--dialogue "<speaker>: <line> [|| <speaker>: <line>]" uses parseDialogueLine + withDialogue to attach one- or two-speaker spoken dialogue.
--hook <patternId> — prepends a resolved opening-hook directive.
--category <id> — resolves a CategoryDescriptor to shape the prompt.
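A parser for the --dialogue format might look like the following. It is a sketch, not parseDialogueLine itself. The DialogueTurn shape and the error messages are assumptions. It splits each segment on the first colon so that lines containing colons still parse.

```typescript
// Hypothetical turn shape for "<speaker>: <line> [|| <speaker>: <line>]".
interface DialogueTurn {
  speaker: string;
  line: string;
}

function parseDialogueSketch(input: string): DialogueTurn[] {
  const segments = input.split("||").map((s) => s.trim()).filter((s) => s.length > 0);
  if (segments.length === 0 || segments.length > 2) {
    throw new Error("expected one or two '<speaker>: <line>' segments separated by '||'");
  }
  return segments.map((segment) => {
    // Split on the first colon only, so "Rule one: never look back" stays intact.
    const colon = segment.indexOf(":");
    const speaker = colon === -1 ? "" : segment.slice(0, colon).trim();
    const line = colon === -1 ? "" : segment.slice(colon + 1).trim();
    if (!speaker || !line) throw new Error(`malformed dialogue segment: "${segment}"`);
    return { speaker, line };
  });
}
```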

Seedance Asset Library end-to-end flow

Seedance character/product consistency on the official ark/seedance-2.0 endpoint goes through managed Asset Library avatars (Asset:// URIs), not raw photoreal URLs (which trip the "real person" content filter and don't lock identity). The end-to-end flow:

vclaw video seedance-register-assets (src/video/seedance-asset-library.ts) registers each --character <name>:<imageUrl> as an Image asset under a group, polls until it reaches the international Ark profile (sync_status: active), and writes artifacts/seedance-assets.json (contract: schemas/video/artifacts/seedance-assets.schema.json). Requires SUTUI_API_KEY.
At runtime, src/video/execution-runtime.ts reads seedance-assets.json (only when recommendedRouteId === 'seedance-direct') and auto-resolves each scene's cast names to their Asset:// URIs, which become that scene's reference set.
src/video/native-seedance.ts routes Asset:// references into the Seedance reference_images param and enforces the per-generation reference budget via assertReferenceBudget (≤9 image, ≤3 video, ≤3 audio) — validated for every task before any network call, so an over-budget task can't cause a partial submit.
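The budget check and its validate-everything-first ordering can be sketched as follows. The limits come from the text. The Task and ReferenceSet shapes, the submit callback, and the function names are hypothetical.

```typescript
// Hypothetical task shape; references may be Asset:// URIs or other URLs.
interface ReferenceSet {
  images: string[];
  videos: string[];
  audios: string[];
}
interface Task {
  sceneIndex: number;
  refs: ReferenceSet;
}

// Limits from the text above: <=9 image, <=3 video, <=3 audio references per generation.
const REFERENCE_BUDGET = { images: 9, videos: 3, audios: 3 } as const;

function assertReferenceBudgetSketch(task: Task): void {
  for (const kind of ["images", "videos", "audios"] as const) {
    const count = task.refs[kind].length;
    if (count > REFERENCE_BUDGET[kind]) {
      throw new Error(
        `scene ${task.sceneIndex}: ${count} ${kind} references exceed the limit of ${REFERENCE_BUDGET[kind]}`,
      );
    }
  }
}

async function submitAll(tasks: readonly Task[], submit: (task: Task) => Promise<string>): Promise<string[]> {
  for (const task of tasks) assertReferenceBudgetSketch(task); // validate the whole batch first
  const ids: string[] = [];
  for (const task of tasks) ids.push(await submit(task)); // only then touch the network
  return ids;
}
```

Because every task is checked before the first submit call, an over-budget scene rejects the whole batch instead of leaving earlier scenes submitted.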
