Multi-Shot Prompt Framework — Integration Design
Date: 2026-05-27 Status: Approved (design); pending implementation plan Source skill: multi-shot-prompt-framework.md (compressed cinematic multi-shot prompt builder)
Summary
Integrate the multi-shot cinematic prompt framework into videoclaw-v3 across three coordinated layers: a reference doc (source of truth), a repo skill (authors the cinematic prose), and a code module (deterministically scaffolds, enforces the hard rules, and persists an artifact). A new standalone vclaw video multi-shot command exposes the code layer. The framework's opinionated values (15s total, 2–5s shots, ≤1500 chars, fixed Nolan/IMAX Style line, diegetic-only Audio) ship as the default cinematic-15s preset and are parametrizable per provider/project.
Decisions (locked during brainstorming)
- All three layers — reference doc as source of truth, a skill that authors prose, and a code module that enforces the rules.
- Standalone now, scene-aware later — Phase 1 ships a standalone
vclaw video multi-shotcommand independent of a full project; Phase 2 wires it into the per-scene flow. - Parametrize, framework defaults — duration, char-limit, shot-length bounds, Style line, and Audio line are configurable; the framework's values are the default
cinematic-15spreset. - Authoring split A+C — the code module owns the deterministic timecode plan, provider preset, validator, and artifact; the skill writes the cinematic prose; an optional
--autoflag calls Gemini (mirroring the existinganalyze/analyze --autopattern) to author prose end-to-end.
Rationale
videoclaw-v3 already separates guidance (skills + references/video/*) from enforcement (code in src/video/*, validated by prompt-quality.ts, contracted by schemas/video/). The framework is fundamentally a prompt-formatting strategy for a single 15s clip, so it maps cleanly onto these layers:
- Creative shot prose ("neon reflecting off wet asphalt") is non-deterministic and belongs to the skill / Gemini — not to TypeScript.
- The hard rules (char budget, contiguous timecodes totaling exactly the preset duration, non-repeating camera parameters, required metadata block) are deterministic and belong to a pure, unit-testable validator.
The --auto path reuses the repo's established analyze vs analyze --auto dual pattern, so it is idiomatic rather than novel.
Architecture
Layer 1 — Reference doc (source of truth)
- File:
references/video/multi-shot-framework.md - Adapted from the source skill, reframed so the hard-coded values are presented as the
cinematic-15spreset (the default) rather than universal law. The workflow, shot-design constraints, trim priority, example, and variation guidance are preserved. - Registration: add to
REFERENCE_REGISTRYinsrc/video/prompt-library.ts:{ name: 'multi-shot-framework', category: 'framework', summary: 'Compressed timecoded multi-shot cinematic prompt builder (cinematic-15s preset).', file: 'multi-shot-framework.md' } - Surfaced via:
vclaw video prompt-lib-show --name multi-shot-framework.
Layer 2 — Skill (authors prose)
- File:
skills/multi-shot-prompt/SKILL.md - Adapted from the source file; frontmatter (
name,description) and trigger phrases preserved. The workflow is re-anchored to the code layer:vclaw video multi-shot --plan→ timecode scaffold + preset + suggested non-repeating camera grid.- Author cinematic prose into each shot slot (the skill's existing creative work).
vclaw video multi-shot --validate→ confirm the hard rules pass.
- References a real command, so it passes
check:skill-frontdoorwithout being added to the ignore list. (Do not modify the ignore list — it is load-bearing.)
Layer 3 — Code module (enforces)
src/video/multi-shot-prompt.ts
MultiShotPresetinterface:{ name: string; totalSeconds: number; minShotSeconds: number; maxShotSeconds: number; maxChars: number; styleLine: string; audioLine: string }- Default export preset
cinematic-15s:{ totalSeconds: 15, minShotSeconds: 2, maxShotSeconds: 5, maxChars: 1500, styleLine: 'Cool shadows, natural skin tones. IMAX-scale composition, deep focus, practical lighting. High contrast, grounded realism. In the style of a Christopher Nolan movie.', audioLine: 'Diegetic sound only — natural ambience, environmental foley, and subject-driven sound.' } buildShotPlan(preset, opts?)→ picks a shot count in 3–7 (seedable so it varies between calls), assigns per-shot durations that sum tototalSecondswith each within[minShotSeconds, maxShotSeconds], and returns:- per-shot slots
{ index, start, end, timecode } - a suggested non-repeating
{ shotSize, lens, angle, movement }grid drawn from the vocabularies already exported byprompt-quality.ts.
- per-shot slots
assembleMetadataBlock(preset, location, timeOfDay)→ the 3-line Location/Style/Audio block (no blank lines between the three).formatTimecode(seconds)→MM:SS.
Extend src/video/prompt-quality.ts
runMultiShotChecks(prompt: string, preset: MultiShotPreset): PromptQualityIssue[]— reuses the existingPromptQualityIssuetype andCAMERA_MOVE_VOCABULARY/SHOT_TYPE_VOCABULARYexports. Checks:- timecodes parse, are contiguous, start at
00:00, end exactly attotalSeconds; - each shot duration within
[minShotSeconds, maxShotSeconds]; - total character count ≤
maxChars; - no repeated shot-size / lens / angle / movement in consecutive shots;
- Location/Style/Audio metadata block present.
- timecodes parse, are contiguous, start at
- New
PromptQualityIssueCodevalues are added for the multi-shot-specific checks.
Layer 4 — Artifact + schema
- Schema:
schemas/video/artifacts/multi-shot-prompt.schema.json - Shape:
{ preset: string, location: string, timeOfDay: string, shots: [{ timecode, start, end, shotSize, lens, angle, movement, description }], promptText: string, charCount: number, valid: boolean, issues: PromptQualityIssue[] } - Persisted under
projects/<slug>/artifacts/(via the existing artifact-store conventions) only when--projectis supplied; standalone runs emit to stdout without persisting.
CLI — vclaw video multi-shot
Dispatched from src/cli/vclaw.ts via the existing hand-rolled command === 'video' && subcommand === 'multi-shot' pattern.
Modes:
--plan→ emit timecode scaffold + preset + suggested camera grid (JSON) for the skill/operator to fill.--validate (--file <path> | stdin)→ runrunMultiShotChecks, emit issues JSON, exit nonzero on anyerror-severity issue.--auto --image <path> [--character <t>] [--action <t>] [--location <t>] [--time <t>]→ Gemini (gemini-analyze.ts+ the Gemini key pool) analyzes the image and authors prose into the plan, then validates → finished prompt. (The "C" half.)
Overrides / flags: --preset <name>, --shots <N>, --max-chars <N>, --total-seconds <N>, --style-line <t>, --audio-line <t>, --project <slug> (persist artifact), --raw (print just the prompt code block).
Output is machine-readable JSON by default; --raw prints the copy-paste prompt block only.
Data flow
operator/skill ──▶ vclaw video multi-shot --plan ──▶ {preset, shots[], grid}
│ │
│ (author prose into slots) │
▼ │
prompt text ──▶ vclaw video multi-shot --validate ──▶ PromptQualityIssue[]
└─▶ (exit 0 ok / nonzero on error)
--auto path: image + brief ──▶ gemini-analyze ──▶ plan filled ──▶ validate ──▶ prompt
--project: finished result ──▶ multi-shot-prompt artifact (schema-validated)Error handling
- Invalid preset name / out-of-range override (e.g.
minShotSeconds > maxShotSeconds, or no valid shot-count partition for the requested--shots/--total-seconds) → fail fast with a clear message, nonzero exit. --validatewith unparseable timecodes → reported as anerrorissue, nonzero exit.--autoGemini failure (no key, API error) → surface the error; do not silently fall back to an empty/template prompt (consistent with the repo's no-silent-fallback rule).- Missing
--imagefor--auto, or missing prompt input for--validate→ usage error.
Testing
src/tests/multi-shot-prompt.test.ts(module-contract):buildShotPlandurations sum tototalSecondsand stay within bounds across many seeds; shot count varies; validator catches char overflow, non-contiguous / over-budget timecodes, consecutive-parameter repeats, and missing metadata; preset overrides apply.src/tests/cli-multi-shot.test.ts(end-to-end):--planoutput shape;--validatepass (exit 0) and fail (nonzero) paths;--autowith a stubbed Gemini adapter (no network);--projectpersistence writes a schema-valid artifact. Usemkdtemp/tmpdirfor isolation.
Docs & guardrails (per CLAUDE.md conventions)
- Add the command to
README.mdanddocs/CLI_REFERENCE.md. - Add the
prompt-library.tsregistry entry (above). - Nice-to-have:
smoke:multi-shotnpm script doing a plan→validate round-trip with no network. check:skill-frontdoorandcheck:cleanroom-docsmust stay green; the new skill references the realvclaw video multi-shotcommand, so no ignore-list change.
Out of scope (Phase 2, separate spec)
- Scene-aware integration: expose multi-shot as a per-scene prompt format inside
scene-candidates/storyboard-markdown. - Provider-specific presets (
seedance-*,veo-*,runway-*) with their real char-length and duration limits.
Build sequence (Phase 1)
multi-shot-prompt.tsmodule (preset +buildShotPlan+ metadata/timecode helpers).- Extend
prompt-quality.tswithrunMultiShotChecks+ new issue codes. schemas/video/artifacts/multi-shot-prompt.schema.json+ artifact-store wiring.vclaw video multi-shotCLI command (--plan,--validate, then--auto).- Tests:
multi-shot-prompt.test.ts, thencli-multi-shot.test.ts. references/video/multi-shot-framework.md+prompt-library.tsregistry entry.skills/multi-shot-prompt/SKILL.md.- Docs (
README.md,docs/CLI_REFERENCE.md) + optionalsmoke:multi-shot.
