The "Reference Pack" Workflow: Keep Characters Consistent Across 10+ AI Video Clips in Veo3Gen (Borrowed From Sora 2's Official Guide)

If you’ve ever generated a great first clip—then watched your “same” character slowly morph (hair, age, face shape, outfit, even vibe) by clip five—you’re not alone. The fix usually isn’t “write longer prompts.” It’s building a small, reusable Reference Pack and a Character Card so you can keep identity stable while changing only what the scene needs.

This workflow borrows transferable principles from Sora 2’s official prompting guide in the OpenAI Cookbook, especially the idea that some controls belong to the generation “container” (settings/API parameters), while the prompt mostly controls the content. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide)

Below is a beginner-friendly version of that workflow adapted for Veo3Gen, current as of 2026-06-13.

Why character consistency breaks (and what “container vs prompt” means)

A practical way to think about consistency is:

Container controls (settings / API parameters): duration, aspect ratio, resolution, sometimes reference attachments, batch runs, etc. These are “harder” controls.
Prompt controls: who/what appears, actions, environment, style cues, camera language—these are “softer” controls.

Sora 2’s guide makes this split explicit: the prompt controls the content, but some attributes are only controlled via API parameters rather than being requestable in prose. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide)
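To make the split concrete, here is a minimal Python sketch that keeps container settings in their own object and flags prompt text that tries to request them in prose. The `ContainerSettings` fields and the regex patterns are illustrative assumptions, not Veo3Gen or Sora parameter names, and the patterns are a rough heuristic (a time like "10:30" would also look like an aspect ratio).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerSettings:
    """Hypothetical container controls, set once per series outside the prompt text."""
    duration_s: int
    aspect_ratio: str
    resolution: str


# Heuristic patterns for container attributes that often get "requested" in prose.
CONTAINER_PATTERNS = {
    "duration": re.compile(r"\b\d+\s*(?:s|sec|secs|seconds?)\b", re.IGNORECASE),
    "aspect_ratio": re.compile(r"\b\d{1,2}\s*:\s*\d{1,2}\b"),
    "resolution": re.compile(r"\b(?:720p|1080p|4k|\d{3,4}\s*x\s*\d{3,4})\b", re.IGNORECASE),
}


def container_requests_in_prompt(prompt: str) -> list[str]:
    """Names of container attributes the prompt seems to ask for in prose."""
    return [name for name, pattern in CONTAINER_PATTERNS.items() if pattern.search(prompt)]


# These values belong in the tool's settings, not in the prompt.
settings = ContainerSettings(duration_s=8, aspect_ratio="9:16", resolution="1080p")

print(container_requests_in_prompt("Maya tidies cables, 9:16 vertical, 8 seconds"))
# ['duration', 'aspect_ratio']
```

If the list is non-empty, move those requests out of the prompt and into the settings object.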

Why it matters for creators: if you try to brute-force consistency by piling on descriptors (“same face, same hair, same jacket, same everything”), you often introduce contradictions, noise, and drift.

Instead, you want:

A small set of stable identity anchors (visual references + immutable identifiers).
A prompt structure in which the character block stays fixed and each clip changes only a “scene block.”

Also note: Sora 2 specifically supports uploading a character once and reusing it across videos via character references. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide) Veo3Gen may implement references differently, so treat this as a workflow principle: reuse the same reference assets and wording across the whole series.

Build your 5-asset “Reference Pack” (one-time setup)

Your Reference Pack is a tiny folder of assets you reuse for every clip in the series.

The 5 assets (what they are and why they work)

Hero reference image/frame
- The “official” look: face, hair, signature outfit, and overall vibe.
- Use this whenever you need to re-anchor identity.
Full-body neutral pose
- Standing, arms relaxed, clean background.
- Solves outfit continuity and body-proportion drift.
Close-up face
- Neutral expression, good lighting.
- Solves facial-feature drift across shots.
Palette / style anchor
- A simple board: 3–5 colors, texture notes (e.g., “matte,” “film grain”), and a one-line style label.
- Prevents the series from “genre hopping.”
Prop / wardrobe sheet
- A single image (or 1-page PDF) with: 2–3 wardrobe variants + 2–4 key props.
- Keeps changes intentional (and avoids accidental outfit swaps).

Naming + versioning (so your series doesn’t turn into chaos)

Use a consistent naming scheme:

CHAR_MayaChen_v1_hero.png
CHAR_MayaChen_v1_fullbody.png
CHAR_MayaChen_v1_face.png
CHAR_MayaChen_v1_palette.png
CHAR_MayaChen_v1_wardrobe-props.png
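A small helper can build these names and tell you which of the five assets are missing before you start a series. This is a hypothetical sketch that assumes every asset is saved as .png, as in the examples above; in practice you would pass it the output of `os.listdir()` on your pack folder.

```python
ASSET_KINDS = ("hero", "fullbody", "face", "palette", "wardrobe-props")


def asset_filename(character: str, version: int, kind: str, ext: str = "png") -> str:
    """Build a pack filename such as CHAR_MayaChen_v1_hero.png."""
    if kind not in ASSET_KINDS:
        raise ValueError(f"unknown asset kind: {kind!r}")
    return f"CHAR_{character}_v{version}_{kind}.{ext}"


def missing_assets(filenames: list[str], character: str, version: int) -> list[str]:
    """Expected pack files (assumed .png) that are absent from `filenames`."""
    present = set(filenames)
    expected = [asset_filename(character, version, kind) for kind in ASSET_KINDS]
    return [name for name in expected if name not in present]


folder = ["CHAR_MayaChen_v1_hero.png", "CHAR_MayaChen_v1_face.png"]
print(missing_assets(folder, "MayaChen", 1))
# ['CHAR_MayaChen_v1_fullbody.png', 'CHAR_MayaChen_v1_palette.png', 'CHAR_MayaChen_v1_wardrobe-props.png']
```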

When you make a deliberate update (new hairstyle, new jacket), bump the version: v2. Don’t silently overwrite v1—that’s how series consistency quietly breaks.
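The “bump, don’t overwrite” rule can be automated by reading the versions already on disk. A minimal sketch, assuming filenames follow the `CHAR_<Name>_v<N>_<asset>` scheme above; the function names are hypothetical.

```python
import re

# Matches the naming scheme CHAR_<Name>_v<N>_<asset>.<ext>
VERSION_RE = re.compile(r"^CHAR_(?P<name>[A-Za-z0-9]+)_v(?P<version>\d+)_")


def latest_version(filenames: list[str], character: str) -> int:
    """Highest version already used for this character, or 0 if there is none."""
    latest = 0
    for filename in filenames:
        match = VERSION_RE.match(filename)
        if match and match.group("name") == character:
            latest = max(latest, int(match.group("version")))
    return latest


def next_version(filenames: list[str], character: str) -> int:
    """Version number for a deliberate update: always new, never reusing an existing one."""
    return latest_version(filenames, character) + 1


existing = ["CHAR_MayaChen_v1_hero.png", "CHAR_MayaChen_v1_face.png"]
print(next_version(existing, "MayaChen"))  # 2
```

Because the next number is derived from what already exists, v1 files stay untouched and older clips remain reproducible.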

Write a reusable Character Card (copy-paste template)

A Character Card is the text counterpart to your Reference Pack. You paste it into every clip prompt, unchanged.

Character Card template

Copy, fill, and reuse:

CHARACTER CARD (keep fixed across the series)

Name/role: [Name], [role]
Immutable identifiers (do not change):
- Face/age cues: [e.g., late 20s, oval face, defined cheekbones]
- Ethnicity cues (if relevant): [keep respectful + minimal]
- Distinctive features: [e.g., small scar on left eyebrow, freckles, dimple]
Hair (locked): [color, length, style]
Signature clothing items (locked):
- [Item 1]
- [Item 2]
Accessories (locked): [e.g., round glasses, silver ring]
Voice/vibe (if applicable): [e.g., calm, friendly, concise; “UGC-style presenter energy”]
Never change list:
- Don’t change: [hair color/style]
- Don’t change: [glasses type]
- Don’t change: [signature jacket]
- Don’t change: [overall age range]

Tip: keep this short, stable, and non-contradictory. If you add five new adjectives every clip, you’re effectively inviting the model to remix the character.
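One way to guarantee the card is pasted “unchanged” is to generate it from a frozen data structure and compare a short fingerprint across clips. This Python sketch is an assumption about how you might store the card, not a Veo3Gen feature; the Maya details are made-up example values, and the optional voice/vibe line is omitted for brevity.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterCard:
    """Frozen, so the card can't be tweaked clip by clip."""
    name_role: str
    immutable: tuple[str, ...]
    hair: str
    clothing: tuple[str, ...]
    accessories: str
    never_change: tuple[str, ...]

    def render(self) -> str:
        """Produce the exact text pasted into every prompt."""
        lines = [
            "CHARACTER CARD (keep fixed across the series)",
            f"Name/role: {self.name_role}",
            "Immutable identifiers (do not change):",
            *[f"- {item}" for item in self.immutable],
            f"Hair (locked): {self.hair}",
            "Signature clothing items (locked):",
            *[f"- {item}" for item in self.clothing],
            f"Accessories (locked): {self.accessories}",
            "Never change list:",
            *[f"- Don't change: {item}" for item in self.never_change],
        ]
        return "\n".join(lines)

    def fingerprint(self) -> str:
        """Short hash of the rendered card; a new value means the card was edited."""
        return hashlib.sha256(self.render().encode("utf-8")).hexdigest()[:12]


# Hypothetical example values for the series character.
maya = CharacterCard(
    name_role="Maya Chen, calm creator who demos productivity hacks",
    immutable=("late 20s, oval face", "small scar on left eyebrow"),
    hair="black, shoulder-length, straight",
    clothing=("olive utility jacket", "white crew-neck tee"),
    accessories="round glasses",
    never_change=("hair color/style", "glasses type", "signature jacket", "overall age range"),
)
print(maya.render())
print(maya.fingerprint())
```

Logging the fingerprint next to each clip makes an accidental “helpful edit” easy to spot later.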

Create 3 “wardrobe + lighting” variants (to avoid samey outputs)

Consistency doesn’t mean repetition. If every clip has the same outfit, same background, same lighting, your series can feel templated.

Make three approved variants that still read as the same character:

Variant A (Default): signature outfit + natural daylight
Variant B (Studio): same signature item (e.g., jacket) + clean key light
Variant C (Night): same silhouette + practical neon / street light

Store these in the wardrobe sheet and reference them by name (A/B/C). The identity anchors stay fixed; the presentation changes.
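Referencing variants by name works best when the wording for each one is stored once and anything else is rejected. A minimal sketch; the exact variant strings are an assumption based on the list above.

```python
VARIANTS = {
    "A": "Variant A (Default): signature outfit + natural daylight",
    "B": "Variant B (Studio): same signature jacket + clean key light",
    "C": "Variant C (Night): same silhouette + practical neon / street light",
}


def variant_line(code: str) -> str:
    """Approved wording for a variant; anything outside A/B/C is rejected."""
    key = code.strip().upper()
    if key not in VARIANTS:
        raise ValueError(f"unapproved variant {code!r}; use one of {sorted(VARIANTS)}")
    return VARIANTS[key]


print(variant_line("c"))
# Variant C (Night): same silhouette + practical neon / street light
```

Raising an error on “D” is the point: a surprise outfit should be a deliberate decision (and a new pack version), not a typo.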

Generate Clip 1–3: the calibration phase

Your first three clips aren’t “production.” They’re calibration.

What to lock vs what to explore

Lock: Character Card, hero reference, face close-up reference, signature clothing, and 1–2 props.
Explore: camera distance, background complexity, and one lighting variant.
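One way to organize the three calibration clips is to start from a locked baseline and change exactly one exploration axis per clip, so any drift can be traced to that single change. The one-axis-per-clip rule is a suggested design choice, and the `ClipSpec` fields are hypothetical; signature clothing is assumed to be covered by the locked card version.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClipSpec:
    # Locked during calibration (the card version also locks signature clothing)
    card_version: str
    references: tuple[str, ...]
    props: tuple[str, ...]
    # Explored during calibration; the defaults form the baseline
    camera: str = "mid-shot"
    background: str = "plain wall"
    lighting: str = "A"


EXPLORED = ("camera", "background", "lighting")

BASELINE = ClipSpec(
    card_version="v1",
    references=("CHAR_MayaChen_v1_hero.png", "CHAR_MayaChen_v1_face.png"),
    props=("cable clip",),
)

# Clips 1-3: each changes exactly one exploration axis relative to the baseline.
CALIBRATION = [
    replace(BASELINE, camera="close-up"),
    replace(BASELINE, background="cluttered home desk"),
    replace(BASELINE, lighting="C"),
]


def explored_fields(spec: ClipSpec) -> list[str]:
    """Which exploration fields differ from the baseline for this clip."""
    return [name for name in EXPLORED if getattr(spec, name) != getattr(BASELINE, name)]


for number, clip in enumerate(CALIBRATION, start=1):
    print(number, explored_fields(clip))
```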

If you’re using a tool that supports “container” controls (duration, size, etc.), set them deliberately and keep them stable during calibration. Sora 2’s guide notes that some attributes can only be set via API parameters, not requested in prose. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide)

Also, prompt organization helps. One 2026 prompting tips post notes Sora 2 responds best to well-organized prompts with sections like what happens, how it looks, and what we hear. (https://wavespeed.ai/blog/posts/sora-2-prompting-tips-better-videos-2026) You can borrow that structure even if you’re not using Sora.

Scale to 10+ clips with the “prompt diff” method

The core scaling trick: keep 80–90% of the prompt identical, and only “diff” the parts that must change.

Concrete example: 1 character + a 10-clip series concept

Character: Maya Chen, a practical, calm creator who demos productivity hacks.

10-clip series: “One-Minute Desk Fixes” (shorts/ads/UGC)

Cable clutter fix
Glare-free monitor setup
Laptop stand posture
Keyboard cleaning
Pomodoro timer habit
One-tab browsing
Note-taking template
Desk lighting upgrade
Travel kit packing
End-of-series recap

Base prompt (constant block)

Use a consistent structure:

CHARACTER CARD: (paste your Character Card here, unchanged)
STYLE: handheld UGC, natural skin tones, clean modern look, palette per CHAR_MayaChen_v1_palette
CAMERA: mid-shot to close-up, stable focus on character
AUDIO (optional): soft room tone; no music unless specified
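If you assemble prompts with a script, the constant block can live in one place and be combined with a per-clip scene. A minimal Python sketch: `build_prompt` and `CARD_TEXT` are hypothetical names, and `CARD_TEXT` is a stand-in for your full Character Card.

```python
# The constant lines of the base prompt, stored once.
BASE_BLOCK = "\n".join([
    "STYLE: handheld UGC, natural skin tones, clean modern look, palette per CHAR_MayaChen_v1_palette",
    "CAMERA: mid-shot to close-up, stable focus on character",
    "AUDIO (optional): soft room tone; no music unless specified",
])


def build_prompt(card_text: str, scene: str) -> str:
    """Fixed Character Card + fixed base block; only the SCENE line varies per clip."""
    return f"CHARACTER CARD:\n{card_text}\n{BASE_BLOCK}\nSCENE: {scene}"


# Stand-in for the full, unchanged Character Card text.
CARD_TEXT = "Name/role: Maya Chen, calm creator who demos productivity hacks"

SCENES = {
    "Cable clutter fix": "Maya at a home desk, daytime. She holds a small cable clip and tidies "
                         "two charging cables on the desk edge. End with a neat before/after reveal.",
    "Glare-free monitor setup": "Same desk, but she rotates the monitor slightly away from a window "
                                "and pulls down a sheer curtain. She points to the reduced glare on screen.",
}

prompts = {title: build_prompt(CARD_TEXT, scene) for title, scene in SCENES.items()}
print(prompts["Cable clutter fix"])
```

Adding a clip to the series then means adding one entry to `SCENES`; nothing else in the prompt can drift.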

Prompt diffs (only what changed)

Below, only the SCENE block changes.

Clip 1 (Cable clutter fix)

SCENE: Maya at a home desk, daytime. She holds a small cable clip and tidies two charging cables on the desk edge. End with a neat “before/after” reveal.

Clip 2 (Glare-free monitor setup)

SCENE (diff): Same desk, but she rotates the monitor slightly away from a window and pulls down a sheer curtain. She points to the reduced glare on screen.

Clip 9 (Travel kit packing)

SCENE (diff): Maya at a hotel room desk at night (Variant C lighting). She packs a compact pouch with two key items from the prop sheet (charging cable + small adapter). End with the pouch zipped.

This is the whole method: one stable identity block plus small scene edits.
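You can check the diff discipline mechanically with Python’s standard `difflib`: confirm that the only changed lines are SCENE lines, and get a rough line-level similarity score to compare against the 80–90% target. This sketch assumes each SCENE block is a single line; the helper names are hypothetical.

```python
import difflib


def changed_lines(prompt_a: str, prompt_b: str) -> list[str]:
    """Lines removed (-) or added (+) between two prompts."""
    diff = list(difflib.unified_diff(prompt_a.splitlines(), prompt_b.splitlines(), lineterm="", n=0))
    body = diff[2:]  # skip the '---' / '+++' header lines
    return [line for line in body if line.startswith(("+", "-"))]


def only_scene_changed(prompt_a: str, prompt_b: str) -> bool:
    """True if every changed line is a SCENE line (assumes one-line SCENE blocks)."""
    return all(line[1:].startswith("SCENE") for line in changed_lines(prompt_a, prompt_b))


def shared_line_ratio(prompt_a: str, prompt_b: str) -> float:
    """Rough share of identical lines, via difflib's similarity ratio (0.0 to 1.0)."""
    return difflib.SequenceMatcher(a=prompt_a.splitlines(), b=prompt_b.splitlines()).ratio()


clip_1 = "CHARACTER CARD: ...\nSTYLE: handheld UGC\nCAMERA: mid-shot to close-up\nSCENE: Maya tidies two charging cables."
clip_2 = clip_1.replace("SCENE: Maya tidies two charging cables.",
                        "SCENE (diff): Maya angles the monitor away from the window.")
print(only_scene_changed(clip_1, clip_2))         # True
print(round(shared_line_ratio(clip_1, clip_2), 2))  # 0.75
```

With a full multi-line Character Card the shared ratio climbs well above this toy example, because only one line out of many changes.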

When consistency slips: fast fixes

Drift is normal. Fix it quickly by diagnosing the cause.

Common drift causes (and what to change)

Too many new descriptors
- Symptom: face shape or age subtly changes.
- Fix: delete extra adjectives; keep immutable identifiers only.
Conflicting wardrobe instructions
- Symptom: jacket changes color, accessories vanish.
- Fix: reference Variant A/B/C explicitly and restate only signature items.
Big camera/lighting jumps
- Symptom: “new person” feeling due to harsh shadows or extreme lenses.
- Fix: return to your calibrated camera language (mid-shot/close-up), and reuse the palette/style anchor.
Introducing new characters
- Symptom: identity bleed—your main character starts borrowing traits.
- Fix: keep early clips solo. If you add a second person, define them minimally and avoid overlapping descriptors. Sora 2’s optional characters parameter can reference up to two uploaded characters in a generation (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide); treat that as a reminder to limit cast size when you care about continuity.
Scene chaos (crowds, fast action, heavy props)
- Symptom: inconsistent hands, clothing swaps, “random extras.”
- Fix: simplify blocking; keep the character centered; reduce background complexity.
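The first two drift causes often come from a SCENE block that quietly re-describes the character. A simple word check can catch that before you generate. This is a heuristic sketch; the `LOCKED_TERMS` list is a hypothetical starting point you would tailor to your own Character Card.

```python
import re

# Hypothetical identity/wardrobe terms that should live only in the Character Card.
LOCKED_TERMS = ("hair", "face", "eyes", "glasses", "jacket", "age", "freckles")


def identity_terms_in_scene(scene: str, terms: tuple[str, ...] = LOCKED_TERMS) -> list[str]:
    """Locked identity terms that a SCENE block re-describes (a common drift source)."""
    words = set(re.findall(r"[a-z]+", scene.lower()))  # whole words only, so "package" != "age"
    return [term for term in terms if term in words]


print(identity_terms_in_scene("Maya, now with red hair and a leather jacket, packs a pouch"))
# ['hair', 'jacket']
```

An empty result does not prove a scene is safe, but a non-empty one is a cue to move that detail back into the card or delete it.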

The quickest re-anchor move

When you see drift, do one clip that’s intentionally simple:

Use the hero reference + face close-up
Neutral background
Variant A lighting
Mid-shot

Then resume the series with your prompt diffs.
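If you manage the series as a queue of clip specs, the re-anchor step is just one extra entry inserted right after the clip where drift appeared. A minimal sketch with hypothetical field names mirroring the four-item list above.

```python
# The re-anchor clip from the list above, as a hypothetical clip spec.
REANCHOR_CLIP = {
    "references": ("CHAR_MayaChen_v1_hero.png", "CHAR_MayaChen_v1_face.png"),
    "background": "neutral",
    "lighting": "Variant A",
    "camera": "mid-shot",
}


def with_reanchor(queue: list[dict], drift_index: int) -> list[dict]:
    """Return a new queue with one re-anchor clip right after the clip that drifted."""
    return queue[: drift_index + 1] + [dict(REANCHOR_CLIP)] + queue[drift_index + 1 :]


queue = [{"title": "Keyboard cleaning"}, {"title": "Pomodoro timer habit"}, {"title": "One-tab browsing"}]
new_queue = with_reanchor(queue, drift_index=1)
print([clip.get("title", "RE-ANCHOR") for clip in new_queue])
# ['Keyboard cleaning', 'Pomodoro timer habit', 'RE-ANCHOR', 'One-tab browsing']
```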

Mini checklist before you export a series

Use this right before you batch-generate or finalize your 10+ clips.

Same Character Card pasted into every prompt (no “helpful” edits)
Same Reference Pack version (v1 vs v2) attached/used consistently
Wardrobe variants limited to A/B/C (no surprise outfits)
One stable style/palette anchor across all clips
Each clip changes only the SCENE block (prompt diff discipline)
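If your prompts are stored as structured records, the checklist can run as a script before export. This Python sketch assumes a hypothetical `ClipPrompt` record; because only `scene` and `variant` are allowed to differ between clips, the “SCENE block only” rule is enforced by comparing every other field to the first clip.

```python
from dataclasses import dataclass


@dataclass
class ClipPrompt:
    title: str
    card_text: str
    pack_version: str
    variant: str
    style_line: str
    scene: str


def checklist_problems(clips: list[ClipPrompt]) -> list[str]:
    """Run the pre-export checklist; an empty list means the series passes."""
    if not clips:
        return ["no clips"]
    first = clips[0]
    problems = []
    for clip in clips[1:]:
        if clip.card_text != first.card_text:
            problems.append(f"{clip.title}: Character Card was edited")
        if clip.pack_version != first.pack_version:
            problems.append(f"{clip.title}: pack version {clip.pack_version} != {first.pack_version}")
        if clip.style_line != first.style_line:
            problems.append(f"{clip.title}: style/palette anchor changed")
    for clip in clips:
        if clip.variant not in {"A", "B", "C"}:
            problems.append(f"{clip.title}: unapproved variant {clip.variant}")
    return problems


series = [
    ClipPrompt("Cable clutter fix", "CARD v1", "v1", "A", "STYLE: handheld UGC", "scene 1"),
    ClipPrompt("Glare-free monitor setup", "CARD v1", "v1", "B", "STYLE: handheld UGC", "scene 2"),
]
print(checklist_problems(series))  # []
```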

Explore build-friendly endpoints and automation options in the Veo3Gen API.
Estimate costs for a 10+ clip series in Pricing.