
Architecture

System architecture — operator or agent drives the vclaw CLI into the domain modules, which write the artifact, checkpoint and event ledgers and feed the execution runtime and adapter layer (native, command shim, custom binary) plus optional Bun and Python sidecars

Diagram source (live Mermaid)

videoclaw (npm package: videoclaw) is a multi-provider video CLI that grew out of two predecessor codebases:

  • the original videoclaw package (v0.11.x) which had an orchestration layer (ralph / ralplan / team / MCP servers) on top of a video pipeline
  • the clean-room vclaw-video-core rebuild which kept only the video pipeline with strict on-disk artifacts + approval gates

v2 takes vclaw-video-core as the foundation, drops the orchestration layer (Claude Code / Codex now cover those concerns natively), and ports forward selected pieces from videoclaw: the vclaw-cli Bun package (formerly veo-cli), the Runway transport, a curated Python pipeline, and the Google Flow v1 + Omni Flash backend additions.

What's intentionally NOT here

The following subsystems from the original videoclaw v0.11.x are deliberately dropped. See MERGE_PLAN.md §3 for the rationale:

  • src/team/ (tmux team coordination)
  • src/ralph/, src/ralplan/ (persistent loops + consensus planning)
  • src/mcp/ (state / memory / code-intel / team / trace MCP servers)
  • src/hooks/, src/autoresearch/, src/hud/, src/visual/, src/openclaw/, src/sparkshell/, src/runtime/, src/subagents/, src/notifications/, src/verification/
  • crates/ (would have been Rust performance components if they had ever been checked in — they were referenced in CLAUDE.md but never existed in source)

Equivalents exist in Claude Code (native subagents), the OMC plugin (ralph / team / autopilot skills), and similar host CLI tooling.

Sidecars

The main repo is pure TypeScript on Node 20+, with two opt-in sidecars that extend it:

  • vclaw-cli/ — Bun package (formerly veo-cli). Multi-provider video automation: Google Labs Flow (Veo 3.x direct + Omni Flash) via Puppeteer scraping, UseAPI (Veo / Seedance / Runway), local SQLite job tracking. The main repo's native-veo.ts invokes this package for the veo-useapi transport.
  • skills/video-replicator/scripts/ — Python 3.10+ pipeline. 122 modules covering the Seedance Prompt Director (compose / chain / critique / reference-validator / hooks), the bunty cricket pipeline, character sheet generation, presenter helpers, video assembly, and audio utilities. Documented at docs/PYTHON_PIPELINE.md.

Current layers

  1. src/cli/
    • user-facing command entrypoints
  2. src/video/provider-platform/
    • provider route descriptors
  3. src/video/provider-status.ts
    • environment, dependency, and route health reporting
  4. src/video/pipeline-manifests/
    • built-in stage definitions for storyboard and director
  5. schemas/video/
    • canonical machine-readable contracts
  6. src/video/*
    • portfolio management, reporting, templates, readiness, character consistency, execution planning, adapter-backed execution runtime, and Obsidian export
    • story-bible.ts — deterministic continuity bible (buildStoryBibleArtifact, writeStoryBibleForProject) derived from brief.json + storyboard.json + character profiles. Pure apart from reading project files; spends no credits and calls no providers. Auto-generated at storyboard-write time by every storyboard-producing command (video create|storyboard|clone-execute|storyboard-from-clone|storyboard-review and director-preflight --apply-content-fixes, regenerated after content-fixes apply) and written to artifacts/story-bible.json (contract: schemas/video/artifacts/story-bible.schema.json). It captures cast, settings, props, and a scene timeline with continuity notes so downstream generation stays consistent across scenes/regenerations; recorded in the storyboard checkpoint, the artifact.storyboard.written event payload, and the command JSON output as storyBiblePath, and validated by the doctor layer.
    • assemble/ — the FFmpeg assemble/stitch layer. assemble/media-qc.ts (runAssembleMediaQc) runs after stitch on non-dry-run renders, ffprobing each clip + the master via probeMedia (issue codes probe-failed, missing-audio, nonstandard-audio-codec, nonstandard-audio-sample-rate, nonstandard-video-codec, duration-drift); the report is attached to AssembleResult.qc and the assemble-report artifact under qc, with issues surfaced as qc.<code>[<scope>] assemble warnings. assemble/narration-fit.ts (planNarrationFit) is a pure planner (adapted from Google's story-generator timing rule) that either speeds up narration within a tempo threshold or keeps speech natural and loops the visual bed.
    • native in-process transports: native-veo.ts (→ vclaw-cli/Bun), native-seedance.ts (SUTUI_API_KEY), native-runway.ts (UseAPI Bearer, pure-Node fetch), native-dreamina.ts (Seedance 2.0 / Dreamina via useapi.net — reuses USEAPI_API_TOKEN + VCLAW_DREAMINA_ACCOUNT, pure-Node fetch)
    • review-ui.ts — HTTP server (port 4317) that drives the browser-based storyboard review station at tmp/review-station/index.html. See docs/REVIEW_UI_STORYBOARD_WORKFLOW.md.
    • prompt-quality.ts — six Seedance-handbook anti-pattern checks (adjective soup, multiple actions, multiple camera moves, style-word overload, literary emotion language, overlong prompts) wired into director-preflight, warnings by default and promotable to blocking errors via DIRECTOR_STRICT_PROMPT_QUALITY=1; also houses runMultiShotChecks, the validator for the multi-shot prompt framework
    • multi-shot-prompt.ts — multi-shot cinematic prompt authoring utility: provider-aware presets (cinematic-15s, seedance-10s, veo-8s, runway-10s), buildShotPlan (deterministic timecode scaffold, non-repeating camera grid), parsed shots[], and generateMultiShotPromptText (Gemini-backed or stub). video multi-shot --from-storyboard hydrates prompts from project storyboard scenes and persists source metadata.
    • dialogue-fit.ts — short-clip dialogue duration checks wired into director-preflight, warnings by default and promotable to blocking errors via DIRECTOR_STRICT_DIALOGUE_FIT=1
    • generation-telemetry.ts — route/task/config/cost/timing/output telemetry recorded into project event ledgers and used by cost estimates when completed Seedance USD samples exist
  7. src/video/providers/
    • per-provider HTTP adapter code (runway-useapi.ts and dreamina-useapi.ts). Each adapter exports submit/poll/cancel functions that accept an optional fetchImpl for test injection. Wrapped by src/video/native-*.ts for production use.
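The narration-fit rule described under assemble/ above (speed up within a tempo threshold, otherwise keep speech natural and loop the visual bed) can be illustrated with a minimal sketch. The function name, the plan shape, and the 1.15 threshold are assumptions made for illustration; the real planNarrationFit signature and threshold are not shown in this document.

```typescript
// Hypothetical plan shape; the real planNarrationFit output is not documented here.
interface NarrationFitPlan {
  strategy: "none" | "speed-up" | "loop-visual";
  tempo: number; // playback-rate multiplier applied to the narration
  loopCount: number; // how many times the visual bed plays
}

// maxTempo = 1.15 is an assumed threshold, not the project's actual value.
function planNarrationFitSketch(
  narrationSeconds: number,
  visualSeconds: number,
  maxTempo = 1.15,
): NarrationFitPlan {
  if (narrationSeconds <= 0 || visualSeconds <= 0) {
    throw new RangeError("durations must be positive");
  }
  // Narration already fits inside the visual bed: nothing to adjust.
  if (narrationSeconds <= visualSeconds) {
    return { strategy: "none", tempo: 1, loopCount: 1 };
  }
  // Tempo needed to squeeze the narration into the visual duration.
  const requiredTempo = narrationSeconds / visualSeconds;
  if (requiredTempo <= maxTempo) {
    return { strategy: "speed-up", tempo: requiredTempo, loopCount: 1 };
  }
  // Too much speed-up would sound unnatural: keep speech as-is and loop the visuals.
  return {
    strategy: "loop-visual",
    tempo: 1,
    loopCount: Math.ceil(narrationSeconds / visualSeconds),
  };
}
```

Because the planner is pure, it can be unit-tested without FFmpeg: an 11-second narration over a 10-second bed yields a speed-up plan, while 25 seconds over the same bed yields a loop plan in this sketch.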

Principles

  1. No silent fallback across materially different provider paths
  2. Every stage should eventually have a canonical artifact
  3. CLI output should be machine-readable by default
  4. Architecture remains small until the contracts are stable

Near-term roadmap

  1. Add more review/publish automation around generated outputs
  2. Keep tightening the transport contracts without widening orchestration complexity
  3. Expand higher-level operator ergonomics on top of the current runtime
  4. Keep docs/help output aligned with the actual product surface
  5. Add selective provider-specific polish only where real runs justify it

Current implemented flow

  1. video init
    • creates canonical project workspace
  2. video brief
    • writes brief.json
    • marks brief checkpoint complete
  3. video storyboard
    • writes storyboard.json
    • marks storyboard checkpoint complete
  4. video assets
    • writes asset-manifest.json
    • marks assets checkpoint complete
  5. video review, video review-ui, or video review-autopilot
    • writes review-report.json
    • marks review checkpoint as completed, retry-required, or failed
    • allows publish handoff only when the saved report has verdict: "pass" and metrics.publishReady: true
  6. video publish
    • writes publish-report.json
    • marks publish checkpoint complete or failed
  7. video status
    • resolves next stage from manifest + checkpoints
  8. video doctor-project
    • validates checkpoint/artifact consistency
  9. video doctor-portfolio
    • validates the whole portfolio
  10. video metrics|workload|next-actions|dependencies
    • portfolio management views
  11. video report|report-snapshot|report-history|report-diff|trends|export-csv
    • reporting and snapshot history
  12. video export-obsidian|sync-obsidian|scaffold-obsidian-vault
    • Obsidian operations layer
  13. video playbook-list|playbook-show
    • bundled prompt/playbook registry
  14. video prompt-lib-list|prompt-lib-show
    • imported prompt/reference library
  15. video template-save|template-list|template-show|clone-plan|clone-init|storyboard-from-clone
    • reusable template / clone bridge
  16. video clone-execute
    • template -> storyboard -> execution-seed -> runtime in one flow
  17. video readiness
    • artifact, character-consistency, image-input, scene-selection, and director identity-sheet readiness before runtime execution
  18. video plan|produce|execute-status
    • route selection, payload generation, dry-run validation, built-in or external adapter execution, polling, output ingestion, native Seedance direct transport, native Veo direct transport, and prompt-guided execution context
  19. video character-add|character-list|character-show|character-consistency
    • character profile subsystem and continuity enforcement
  20. video reference-sheet-add|list|show|bind|validate
    • role-tagged reference sheets with closed-vocabulary validation and per-scene binding
  21. video candidates-list|candidates-show|select-candidate|reject-candidate|reroll-scene|chain-from|unchain|candidates-migrate-from-assets
    • per-scene candidate registry + operator selection state + chain-from-prev
    • partial rerun via produce --scene <n>
  22. video cost-estimate
    • static default estimate with optional historical Seedance USD telemetry override
  23. video multi-shot
    • cinematic prompt-authoring utility with standalone and project-storyboard entry paths
    • --plan generates a deterministic timecode scaffold + non-repeating camera grid
    • --validate (--file or stdin) checks prompt quality via runMultiShotChecks; nonzero exit on errors
    • --auto --image <path> calls Gemini to author prose from a reference image (offline-stubbable via VCLAW_MULTISHOT_AUTO_STUB)
    • --from-storyboard --project <slug> --scene <sceneIndex> derives action, characters, and default location from the project artifacts
    • --provider / --route resolve provider-shaped presets when --preset is omitted
    • --project <slug> persists the result as a multi-shot-prompt artifact; status/readiness summarize the latest artifact for review
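The publish gate from the review step of the flow above (handoff only when the saved report has verdict: "pass" and metrics.publishReady: true) can be sketched as a small predicate. Only those two fields come from this document; the function name and the choice to accept untyped parsed JSON are assumptions.

```typescript
// Hypothetical helper: checks a parsed review-report.json against the publish gate.
// Accepts `unknown` because the report is read from disk and may be malformed.
function canHandOffToPublish(report: unknown): boolean {
  if (typeof report !== "object" || report === null) return false;
  const { verdict, metrics } = report as { verdict?: unknown; metrics?: unknown };
  // Both conditions must hold; a passing verdict alone is not enough.
  if (verdict !== "pass") return false;
  if (typeof metrics !== "object" || metrics === null) return false;
  // Strict comparison: the string "true" or a missing field does not open the gate.
  return (metrics as { publishReady?: unknown }).publishReady === true;
}
```

The strict `=== true` comparison is deliberate in this sketch: a gate that guards publishing should fail closed when the field is absent or has the wrong type.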

Compatibility aliases:

  1. video execution-plan -> video plan
  2. video execute -> video produce
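One way to keep such aliases cheap is to normalise the subcommand name once, before dispatch. The sketch below assumes a hypothetical resolveVideoSubcommand helper; the actual CLI wiring is not shown in this document.

```typescript
// Alias -> canonical subcommand, taken from the two aliases listed above.
const COMMAND_ALIASES = new Map<string, string>([
  ["execution-plan", "plan"],
  ["execute", "produce"],
]);

// Hypothetical helper: returns the canonical name, or the input unchanged.
function resolveVideoSubcommand(name: string): string {
  return COMMAND_ALIASES.get(name) ?? name;
}
```

With this shape, `resolveVideoSubcommand("execute")` yields `"produce"`, and canonical names pass through untouched, so downstream handlers only ever see one spelling.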

Commercial track + quantified prompt-craft (landed)

The prompt-craft layer is category-driven and quantified. It generalises beyond "cinematic character video" to cover commercial / product work, and locks Seedance identity through the official Asset Library. The pieces below are all implemented and on main.

Category Descriptor registry

src/video/category-registry.ts defines nine categories (CATEGORY_IDS), each a CategoryDescriptor carrying a subjectType of character or product, a beatTemplate (three-act / ad-hook-feature-cta / turntable / lookbook), a cameraVocab, a genre, an audioProfile (diegetic / ad-mix), and hookSeconds. resolveCategory(id?) defaults to the cinematic character descriptor. The subjectType selects which branch filmmaking-prompts takes:

  • character path — character sheets + storyboard-grid character lock (the existing cinematic path, unchanged).
  • product path — no character sheets, no grid lock; each product in artifacts/product-references.json becomes a text-driven Seedance packet whose timeline follows the descriptor's beat template, with orbit grammar woven in for orbit/turntable vocabularies. src/video/product-references.ts reads the product references and degrades gracefully (description-only hero from the brief) when the artifact is absent.

referenceBuildOrder() fixes the identity-reference build sequence (base-ref → sheet → scene-plate) so scene lighting can't contaminate the identity anchor.
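A minimal sketch of the descriptor shape and default resolution described above follows. The field names and enumerated values come from this section; the two category ids, the sample values, the `Sketch` function names, and the choice to throw on an unknown id are assumptions, since the nine real ids and the real error behaviour are not listed here.

```typescript
type SubjectType = "character" | "product";
type BeatTemplate = "three-act" | "ad-hook-feature-cta" | "turntable" | "lookbook";
type AudioProfile = "diegetic" | "ad-mix";

interface CategoryDescriptor {
  id: string;
  subjectType: SubjectType;
  beatTemplate: BeatTemplate;
  cameraVocab: string[];
  genre: string;
  audioProfile: AudioProfile;
  hookSeconds: number;
}

// Hypothetical entries; the real registry defines nine categories.
const CINEMATIC_DEFAULT: CategoryDescriptor = {
  id: "cinematic-character",
  subjectType: "character",
  beatTemplate: "three-act",
  cameraVocab: ["dolly", "handheld"],
  genre: "drama",
  audioProfile: "diegetic",
  hookSeconds: 2,
};
const PRODUCT_AD: CategoryDescriptor = {
  id: "product-ad",
  subjectType: "product",
  beatTemplate: "ad-hook-feature-cta",
  cameraVocab: ["orbit", "turntable"],
  genre: "commercial",
  audioProfile: "ad-mix",
  hookSeconds: 2,
};
const REGISTRY = new Map<string, CategoryDescriptor>([
  [CINEMATIC_DEFAULT.id, CINEMATIC_DEFAULT],
  [PRODUCT_AD.id, PRODUCT_AD],
]);

// Omitted id falls back to the cinematic character descriptor.
function resolveCategorySketch(id?: string): CategoryDescriptor {
  if (id === undefined) return CINEMATIC_DEFAULT;
  const found = REGISTRY.get(id);
  if (!found) throw new Error(`unknown category: ${id}`); // assumed behaviour
  return found;
}

// Identity references are built before any scene plate, in this fixed order.
function referenceBuildOrderSketch(): readonly string[] {
  return ["base-ref", "sheet", "scene-plate"];
}
```

Callers would then branch on `resolveCategorySketch(id).subjectType` to choose between the character path and the product path.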

Quantified cinematography + standing prompt rules

src/video/cinematography.ts holds pure, deterministic prompt-fragment emitters whose density scales with a DetailLevel of terse | standard | rich: cameraSpec (shot/lens-mm/angle/movement, velocity in ft/s at rich), lightingSpec (Kelvin / key angle / ratio), gradeSpec (shadow/highlight hue+sat splits), and audioMix (a dB hierarchy at rich). It also defines five cinema modes (CINEMA_MODE_IDS: narrative, studio, action, performance, atmospheric), which cinemaMode resolves and stackModes stacks for multi-world intercuts (adjacent modes are never merged). The same module provides resolveCameraVocab, per-genre look defaults (genreDefaults), beat-template layout (beats()), and precise orbit grammar (orbitGrammar, three ORBIT_KINDS).
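To illustrate what "density scales with DetailLevel" might look like, here is a hedged sketch of a cameraSpec-style emitter. The input shape, the function name, and which fields appear at terse versus standard are assumptions; only the DetailLevel values and "velocity in ft/s at rich" come from the text above.

```typescript
type DetailLevel = "terse" | "standard" | "rich";

// Hypothetical input shape for illustration.
interface CameraInput {
  shot: string;
  lensMm: number;
  angle: string;
  movement: string;
  velocityFtPerSec: number;
}

// Pure and deterministic: the same input and level always yield the same string.
function cameraSpecSketch(input: CameraInput, detail: DetailLevel): string {
  const parts: string[] = [input.shot, `${input.lensMm}mm`];
  if (detail !== "terse") {
    // Assumed: angle and movement are added from "standard" upward.
    parts.push(input.angle, input.movement);
  }
  if (detail === "rich") {
    // From the source: velocity in ft/s appears only at "rich".
    parts.push(`${input.velocityFtPerSec} ft/s`);
  }
  return parts.join(", ");
}
```

Keeping the emitter free of I/O and randomness is what makes prompt output reproducible across regenerations.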

A library of six named 2-second opening hooks (HOOK_PATTERN_IDS: black-to-light, silence-to-sound, reverse-motion, beat-drop, match-cut-in, whip-reveal) is resolved by resolveHookPattern / hookBeat, which throw on an unknown id (hooks must be explicit).
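The "throw on an unknown id" behaviour can be sketched with a literal-tuple id list and a type guard. The six ids are taken from the paragraph above; the function name and the return value (the validated id rather than a full pattern object) are simplifications assumed for this example.

```typescript
const HOOK_PATTERN_IDS = [
  "black-to-light",
  "silence-to-sound",
  "reverse-motion",
  "beat-drop",
  "match-cut-in",
  "whip-reveal",
] as const;

type HookPatternId = (typeof HOOK_PATTERN_IDS)[number];

// Type guard: narrows an arbitrary string to one of the six known ids.
function isHookPatternId(value: string): value is HookPatternId {
  return (HOOK_PATTERN_IDS as readonly string[]).indexOf(value) !== -1;
}

// No silent default: an unrecognised id is an error, so hooks stay explicit.
function resolveHookPatternSketch(id: string): HookPatternId {
  if (!isHookPatternId(id)) {
    throw new Error(
      `unknown hook pattern "${id}"; expected one of: ${HOOK_PATTERN_IDS.join(", ")}`,
    );
  }
  return id;
}
```

Listing the valid ids in the error message gives an operator or agent enough information to correct a mistyped --hook value without consulting the docs.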

src/video/prompt-rules.ts holds the standing prompt rules as pure scrubbers: stripProperNames (swap cast names for stable visual descriptors), brandNeutralize (strip brand tokens), noFaceMorphTag (forbid identity drift), and diegeticAudioLine (diegetic audio only).
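Two of these scrubbers can be sketched as plain string transforms. The signatures below are assumptions (the real functions may take different arguments); the example names and the "Acme" brand token are placeholders.

```typescript
// Escape user-supplied text so it can be embedded in a RegExp safely.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Swap each cast name for its stable visual descriptor (whole-word matches only).
function stripProperNamesSketch(
  prompt: string,
  descriptors: ReadonlyMap<string, string>,
): string {
  let out = prompt;
  descriptors.forEach((descriptor, name) => {
    const pattern = new RegExp(`\\b${escapeRegExp(name)}\\b`, "g");
    // Function replacer avoids "$" in a descriptor being read as a pattern token.
    out = out.replace(pattern, () => descriptor);
  });
  return out;
}

// Remove brand tokens, then collapse the whitespace they leave behind.
function brandNeutralizeSketch(prompt: string, brandTokens: readonly string[]): string {
  let out = prompt;
  brandTokens.forEach((token) => {
    out = out.replace(new RegExp(`\\b${escapeRegExp(token)}\\b`, "gi"), "");
  });
  return out.replace(/\s{2,}/g, " ").trim();
}
```

Both functions are pure, so they can be chained in any preflight step and tested with plain string fixtures.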

Filmmaking-prompts two-phase gate

vclaw video filmmaking-prompts (src/video/filmmaking-prompts.ts) takes a --phase storyboard|video gate: storyboard returns the storyboard / camera-language portion only (seedancePackets gated to []); video and the default (omitted) return the full video-generation packets. The same command takes --category <id>, --genre, --detail terse|standard|rich (appends a quantified cinematography suffix at rich), --panels 9|12|15|20, --aspect-ratio, --no-faces, and --storyboard-grid <path>.
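The phase gate can be pictured as a filter applied to an already-built result. In the sketch below, seedancePackets is the field named above; storyboardPrompt, the packet type, and the function name are hypothetical.

```typescript
type Phase = "storyboard" | "video";

// Hypothetical result shape; only seedancePackets is named in this document.
interface FilmmakingPromptsResult {
  storyboardPrompt: string;
  seedancePackets: string[];
}

// Omitting the phase behaves like "video": the full packets are returned.
function applyPhaseGate(
  full: FilmmakingPromptsResult,
  phase: Phase = "video",
): FilmmakingPromptsResult {
  if (phase === "storyboard") {
    // Storyboard phase: keep the camera-language portion, withhold video packets.
    return { ...full, seedancePackets: [] };
  }
  return full;
}
```

Returning a copy in the storyboard branch leaves the caller's object unmodified, which keeps the gate safe to apply more than once.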

Multi-shot output formats

vclaw video multi-shot --plan (src/video/multi-shot-prompt.ts) renders a deterministic shot plan in several formats:

  • --format default|seedance-paragraph|per-shot selects composeSeedanceParagraph (Seedance native single-paragraph), composePerShotFormat (per-shot SHOT N blocks), or the default layout.
  • --lang en|zh|en+zh routes through composeBilingual, which wraps the rendered prompt in fenced blocks (single, translated, or EN + 中文); numeric/technical spec tokens pass through unchanged.
  • --dialogue "<speaker>: <line> [|| <speaker>: <line>]" uses parseDialogueLine and withDialogue to attach one- or two-speaker spoken dialogue.
  • --hook <patternId> — prepends a resolved opening-hook directive.
  • --category <id> — resolves a CategoryDescriptor to shape the prompt.
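As an illustration of the --dialogue syntax listed above, the following sketch parses a one- or two-speaker string. It is not the project's parseDialogueLine; the function name, return shape, and error messages are assumptions.

```typescript
interface DialogueTurn {
  speaker: string;
  line: string;
}

// Parses "<speaker>: <line>" with an optional second turn after "||".
function parseDialogueSketch(raw: string): DialogueTurn[] {
  const segments = raw.split("||").map((segment) => segment.trim());
  if (segments.length > 2) {
    throw new Error("at most two speakers are supported");
  }
  return segments.map((segment) => {
    // Split on the first colon only, so the spoken line may contain colons.
    const colon = segment.indexOf(":");
    if (colon <= 0) {
      throw new Error(`expected "<speaker>: <line>", got "${segment}"`);
    }
    const speaker = segment.slice(0, colon).trim();
    const line = segment.slice(colon + 1).trim();
    if (speaker === "" || line === "") {
      throw new Error(`speaker and line must be non-empty in "${segment}"`);
    }
    return { speaker, line };
  });
}
```

For example, `parseDialogueSketch("Ana: We leave at dawn. || Ben: Then pack light.")` returns two turns, one per speaker.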

Seedance Asset Library end-to-end flow

Seedance character/product consistency on the official ark/seedance-2.0 endpoint goes through managed Asset Library avatars (Asset:// URIs), not raw photoreal URLs (which trip the "real person" content filter and don't lock identity). The end-to-end flow:

  1. vclaw video seedance-register-assets (src/video/seedance-asset-library.ts) registers each --character <name>:<imageUrl> as an Image asset under a group, polls until it reaches the international Ark profile (sync_status: active), and writes artifacts/seedance-assets.json (contract: schemas/video/artifacts/seedance-assets.schema.json). Requires SUTUI_API_KEY.
  2. At runtime, src/video/execution-runtime.ts reads seedance-assets.json (only when recommendedRouteId === 'seedance-direct') and auto-resolves each scene's cast names to their Asset:// URIs, which become that scene's reference set.
  3. src/video/native-seedance.ts routes Asset:// references into the Seedance reference_images param and enforces the per-generation reference budget via assertReferenceBudget (≤9 image, ≤3 video, ≤3 audio) — validated for every task before any network call, so an over-budget task can't cause a partial submit.
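The validate-everything-first pattern in step 3 can be sketched as follows. The limits (9 image, 3 video, 3 audio) come from the text above; the task and reference shapes, the function names, and the synchronous submit callback are assumptions made to keep the example self-contained.

```typescript
type ReferenceKind = "image" | "video" | "audio";

// Hypothetical shapes for illustration.
interface Reference {
  kind: ReferenceKind;
  uri: string;
}
interface SeedanceTask {
  sceneIndex: number;
  references: Reference[];
}

const REFERENCE_LIMITS: Record<ReferenceKind, number> = { image: 9, video: 3, audio: 3 };

// Throws if any reference kind exceeds its per-generation limit.
function assertReferenceBudgetSketch(task: SeedanceTask): void {
  const counts: Record<ReferenceKind, number> = { image: 0, video: 0, audio: 0 };
  for (const ref of task.references) {
    counts[ref.kind] += 1;
  }
  (Object.keys(REFERENCE_LIMITS) as ReferenceKind[]).forEach((kind) => {
    if (counts[kind] > REFERENCE_LIMITS[kind]) {
      throw new Error(
        `scene ${task.sceneIndex}: ${counts[kind]} ${kind} references exceed the limit of ${REFERENCE_LIMITS[kind]}`,
      );
    }
  });
}

// Two passes: validate every task, and only then start submitting.
function submitAllSketch(tasks: SeedanceTask[], submit: (task: SeedanceTask) => void): void {
  tasks.forEach((task) => assertReferenceBudgetSketch(task));
  tasks.forEach((task) => submit(task));
}
```

Separating validation from submission is what rules out a partial submit: if the last task is over budget, the error is raised before the first task reaches the network.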

Built to be driven by agent hosts like Claude Code, Claude Desktop, or Codex · Source-available, commercial use requires a paid license.