Skip to content

Brand DNA & Brand Extraction

Clawbot, the videoclaw mascot, illustrating brand dna

Point videoclaw at any company's website and it reads that site like a brand strategist — pulling the real colors, fonts, voice, and messaging into a single brand-dna.json file that can then seed a whole video brief --project automatically. You go from a URL to a brand-aware video plan --project in two commands.

What it does

  • Reads a live website. Fetches the page (plain fetch, no headless browser, no extra dependencies) and scrapes the title, meta description, body text, logo images, color values, and font families.
  • Extracts the real color palette deterministically. It harvests every #hex and rgb() color on the page, throws out white/black/grey neutrals, ranks the rest by frequency, and keeps the top 3 as primary and the next 5 as secondary. Same site in, same palette out — every time.
  • Understands the brand's voice. One strict-JSON pass through Google Gemini turns the page text into structured fields: brand name, industry, tagline, value proposition, tone of voice, brand personality, target audience, key messages, plus an imageryStyle and layoutStyle chosen from a fixed list.
  • Captures the logo. Picks the best logo candidate (a logo-tagged image, falling back to the og:image, then the favicon).
  • Writes one machine-readable artifactbrand-dna.json — that downstream stages can consume.
  • Seeds your brief. video brief --from-brand-dna reads that artifact and auto-fills the brief's title and intent (from the brand name + value proposition) and parks the full brand profile under metadata.brandDna.
  • Stays safe. It records logo URL, colors, and text only. Photoreal faces (founder headshots, people in og:image) are never registered or passed downstream — they trip the real-person content filter and don't help anyway.

Why it matters: this is the only AI step in the whole brief → storyboard → story-bible chain. It is quarantined in its own stage with its own artifact, so everything after it stays deterministic and reproducible.

How to use it

All commands use node dist/cli/vclaw.js video .... If you installed videoclaw globally, you can type vclaw video ... instead.

bash
node dist/cli/vclaw.js video init acme-launch

Creates the project workspace at projects/acme-launch/.

bash
export GEMINI_API_KEYS="key1,key2"   # or GOOGLE_API_KEY=...
node dist/cli/vclaw.js video brand-extract --project acme-launch --url https://www.acme.com

Scrapes the site, ranks the palette, runs the Gemini voice pass, and writes projects/acme-launch/artifacts/brand-dna.json. Prints the full artifact as JSON on stdout.

bash
node dist/cli/vclaw.js video brand-extract --project acme-launch --url https://www.acme.com --gemini-endpoint https://my-proxy/v1beta/models/gemini-2.0-flash:generateContent

Same, but routes the AI call through a custom endpoint (also overridable via VCLAW_GEMINI_API_ENDPOINT).

bash
node dist/cli/vclaw.js video brief --project acme-launch --from-brand-dna

Builds the brief straight from the extracted brand DNA — no --title or --intent needed. The brand name becomes the title and the value proposition becomes the intent.

bash
node dist/cli/vclaw.js video brief --project acme-launch --from-brand-dna --title "Acme Spring Launch"

Same, but you override one field by hand. Explicit flags always win; brand DNA only supplies the fallbacks you leave out.

How it flows

brand dna diagram

Diagram source (live Mermaid)

Artifacts & outputs

video brand-extract writes one artifact:

  • projects/<slug>/artifacts/brand-dna.json — schema: schemas/video/artifacts/brand-dna.schema.json.

Shape (schemaVersion: 1, kind: "brand-dna"):

  • brandName, industry, tagline, valueProposition, targetAudience — strings
  • toneOfVoice, brandPersonality, keyMessages — string arrays (3-5 entries each)
  • primaryColors (top 3), secondaryColors (next 5), fonts — string arrays
  • logoUrl — string or null
  • imageryStyle — one of professional | casual | illustrated | cinematic | minimalist | editorial
  • layoutStyle — one of modern | classic | minimalist | bold | editorial
  • sourceUrl, createdAt, projectSlug — provenance

When you run video brief --from-brand-dna, the full brand profile is copied into the brief's metadata.brandDna so later stages (story bible, storyboard) can read it.

Tips & gotchas

Deterministic palette, AI voice

The colors and fonts are computed by code, not guessed by the model — so the palette is stable across runs. Only the voice/messaging fields come from Gemini (low temperature 0.2), and the deterministic palette is merged over any color the model might have hallucinated.

Gemini keys are required

brand-extract needs a Gemini API key. Set GEMINI_API_KEYS (comma-separated for round-robin) or GOOGLE_API_KEY. Without it the run fails with "No Gemini API keys configured". See the Gemini key pool.

URL must be http(s) and reachable

The URL is validated up front (only http:/https: are accepted) and the scrape fetch hard-times-out at 20 seconds so an unattended/overnight run can't hang. A non-200 response fails the command.

Run extract before brief

video brief --from-brand-dna errors if no brand-dna.json exists yet. Run brand-extract for that project first.

No faces leak downstream

By design, founder headshots and people in the page's og:image are never carried forward as reference images — they break the ARK/Seedance real-person filter. Describe characters by visual descriptor instead (see characters).

Driving it from an agent

An AI agent (e.g. Claude Code) runs node dist/cli/vclaw.js video brand-extract --project <slug> --url <site>, parses the JSON printed on stdout, then chains video brief --project <slug> --from-brand-dna to start the pipeline. Both commands exit non-zero with a structured missing_required_flag error (listing the missing flags) when --project/--url are absent or when the brand-dna artifact is missing for --from-brand-dna, so the agent can branch on exit code and re-run with corrected arguments.

Built to be driven by agent hosts like Claude Code, Claude Desktop, or Codex · Source-available, commercial use requires a paid license.