Fix “It Didn’t Follow My Image” in Veo3Gen: A 7‑Step Reference‑Lock Checklist (inspired by Luma’s Best Practices)
Fix AI video reference image drift in Veo3Gen with a 7-step reference-lock checklist, prompt templates, and a drift triage matrix.
On this page
- The 3 most common “reference not followed” failure modes (and how to identify yours)
- Step 1 — Start with a “reference-first” prompt (tight subject + action, loose style)
- The ordering principle: lock subject → lock action → then add style
- Rewrite example: “too cinematic / too many adjectives” → reference-first
- Step 2 — Use positive constraints before negative constraints (what to say vs what to forbid)
- Practical translation for Veo3Gen
- Step 3 — Reduce motion complexity to re-anchor identity (micro-motion pass)
- Micro-motion pass recipe (use before complex moves)
- Step 4 — Re-assert key invariants (wardrobe, hair, props, environment) with a short lock list
- Step 5 — Iterate with targeted edits (crop/region focus, one change at a time)
- Targeted iteration rules
- Step 6 — When to switch from text-only to image+text (and back)
- Practical guidance for Veo3Gen (tool-agnostic)
- Step 7 — A/B test with 4 prompt variants (the “drift triage matrix”)
- Drift triage matrix (symptom → likely cause → best next change)
- Copy/paste templates: 5 reference-lock prompts for creators & marketers
- The 3-line “Reference-Lock” template (copy/paste)
- 1) Talking-head ad (founder or spokesperson)
- 2) UGC-style “holding product” clip
- 3) Product tabletop shot (ecommerce)
- 4) App/SaaS “laptop demo” scene
- 5) Character consistency for a series
- Quick FAQ (why it changes faces, why logos morph, why backgrounds wander)
- Why does the face change even when I supply a reference image?
- Why do logos and label text morph into nonsense?
- Should I use negative prompts like “no face change”?
- My background keeps changing. What’s the fastest fix?
- Related reading
- Ready to automate reference-locked generations?
The 3 most common “reference not followed” failure modes (and how to identify yours)
When creators say “It didn’t follow my image,” they usually mean drift: the generated video gradually stops honoring what your reference image establishes.
Here are the three drift types worth separating (because each one has a different fix):
- Identity drift: the face/character stops matching the reference.
- 1-sentence diagnostic: If you pause at random frames and the person no longer looks like the same person, you have identity drift.
- Wardrobe/prop drift: outfit, hair, accessories, product label, or a key object morphs.
- 1-sentence diagnostic: If the same character remains, but the jacket becomes a hoodie or the logo changes shape, you have wardrobe/prop drift.
- Scene drift: the background, location, or set dressing “wanders” away from the reference.
- 1-sentence diagnostic: If the subject is mostly right but the environment keeps changing (room becomes a street, color palette shifts, new objects appear), you have scene drift.
This post uses prompting best-practice concepts popularized in Luma Dream Machine—like clear descriptors, iteration, and a bias toward positive prompting—as a practical framework you can apply in Veo3Gen today, without retraining or switching tools. (Use it as guidance “as of 2026-02-12,” since model behavior and product features can evolve.)
Step 1 — Start with a “reference-first” prompt (tight subject + action, loose style)
A common cause of drift is prompting that over-emphasizes cinematic style before the model has “locked” the subject and action.
Luma’s best-practices guidance encourages using natural language and adding clear descriptors to get more accurate results. (https://lumalabs.ai/learning-hub/best-practices) The trick is where those descriptors go.
The ordering principle: lock subject → lock action → then add style
- Lock the subject/identity (who/what must match the image)
- Lock the action/motion (what happens, how much motion)
- Add style last (cinematic, film stock, mood—only after you’ve anchored identity)
Rewrite example: “too cinematic / too many adjectives” → reference-first
Before (drift-prone):
Ultra cinematic anamorphic masterpiece, dramatic volumetric god rays, intense film grain, epic cyberpunk mood, 35mm, fast handheld push-in, high energy, breathtaking…
After (reference-first):
Same person as the reference image. Medium close-up talking-head, facing camera. Calm delivery, subtle mouth movement and blinking. Keep the same hairstyle, outfit, and skin tone as the reference. Background stays the same room. Style: clean commercial lighting, lightly cinematic.
Notice what changed: we didn't just strip adjectives. We moved style to the end and stated the invariants first, so the model is rewarded for consistency before cinematics.
Step 2 — Use positive constraints before negative constraints (what to say vs what to forbid)
If your prompt is a long list of “don’ts,” you can accidentally amplify the very mistakes you’re trying to avoid.
Luma’s prompting support article explains that negative prompting tells the AI what to exclude, but it recommends a positive-first approach and warns that negative prompts can be counterproductive, increasing the chance that the unwanted elements appear. (https://lumaai-help.freshdesk.com/support/solutions/articles/151000219614-understanding-prompting-for-dream-machine-positive-vs-negative)
Practical translation for Veo3Gen
- Start with positive constraints: “same face,” “same red hoodie,” “same product label,” “same kitchen background.”
- Add only 1–2 negatives if needed, phrased tightly (e.g., “no face swap,” “no text changes”).
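As a minimal sketch of the positive-first rule (all names here are illustrative, not part of any Veo3Gen API), a small prompt builder can enforce the lock order and cap negatives at two tightly phrased items:

```python
def build_prompt(identity, action, invariants, negatives=(), style=""):
    """Assemble a reference-first prompt: identity, then action, then
    positive invariants, style last, and at most two tight negatives."""
    if len(negatives) > 2:
        raise ValueError("keep negatives to 1-2 tightly phrased items")
    parts = [identity, action, "Keep " + ", ".join(invariants) + "."]
    if style:
        parts.append("Style: " + style + ".")
    for neg in negatives:
        parts.append("Exclude " + neg + ".")
    return " ".join(parts)


prompt = build_prompt(
    identity="Same person as the reference image.",
    action="Medium close-up, calm delivery, subtle blinking only.",
    invariants=["the same red hoodie", "the same kitchen background"],
    negatives=["face swap"],
    style="clean commercial lighting",
)
print(prompt)
```

The `ValueError` guard is the point of the sketch: it makes "only 1–2 negatives" a hard rule rather than a habit you have to remember.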
Step 3 — Reduce motion complexity to re-anchor identity (micro-motion pass)
Big motion invites the model to “solve” continuity in ways you don’t want—especially with faces, logos, and fine details.
Micro-motion pass recipe (use before complex moves)
Generate a short clip where the subject does only:
- gentle breathing
- a blink every few seconds
- a slow head turn (small angle)
- minimal camera movement (or none)
If Veo3Gen can hold identity here, you’ve created a reliable anchor. Then you can step up motion (walking, dancing, fast pans) while keeping the same invariants.
Step 4 — Re-assert key invariants (wardrobe, hair, props, environment) with a short lock list
When drift happens, it’s often because the prompt never explicitly said what must not change.
Create a short “lock list” of invariants—keep it to 4–8 items. This is especially helpful for:
- Product shots (label text, cap color, bottle shape)
- Talking-head ads (hairline, earrings, brand tee)
- UGC-style clips (phone model, room decor, specific outfit)
Example lock list:
- same face and age as reference
- same hairstyle (length + part)
- same outfit color and type
- same primary prop (e.g., white mug)
- same background location
Luma’s best-practices doc also notes you can ask for specific text by specifying it in the prompt (e.g., a poster that reads a certain phrase). (https://lumalabs.ai/learning-hub/best-practices) Even if you’re not generating a poster, the underlying idea is useful: be explicit about the exact text/label you want preserved.
Step 5 — Iterate with targeted edits (crop/region focus, one change at a time)
Drift troubleshooting is an iteration game: change one variable, observe, repeat.
Luma describes a Modify workflow where you describe specific changes (e.g., warmer colors, more trees). (https://lumalabs.ai/learning-hub/best-practices) The general best practice applies regardless of tool: avoid rewriting your whole prompt between attempts.
Targeted iteration rules
- Change one thing per run (e.g., only reduce camera motion; don’t also add a new style).
- If the face drifts, tighten identity and reduce motion.
- If the background drifts, tighten scene invariants and remove extra setting adjectives.
Mini checklist: “One change at a time”
- Keep the same reference image
- Keep the same subject line
- Adjust only one variable (motion or scene or wardrobe)
- Re-run 2–4 variations
- Promote the best result and iterate again
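The checklist above can be enforced mechanically. A variant generator that copies a base setting and varies exactly one axis per run (the axis names below are our own shorthand, not a Veo3Gen schema) makes it impossible to accidentally change two things at once:

```python
def single_change_variants(base, axis, options):
    """Yield copies of `base` that differ in exactly one axis,
    enforcing the one-change-at-a-time iteration rule."""
    if axis not in base:
        raise KeyError(f"unknown axis: {axis}")
    for option in options:
        variant = dict(base)
        variant[axis] = option
        yield variant


base = {
    "motion": "subtle blinking and breathing only",
    "scene": "same kitchen background",
    "wardrobe": "same red hoodie",
}

# Re-run 2-4 variations along a single axis; everything else stays fixed.
runs = list(single_change_variants(
    base, "motion", ["slow head turn", "slow push-in"]))
```

Promote the best `runs` entry to be the new `base`, then pick the next axis.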
Step 6 — When to switch from text-only to image+text (and back)
Use image+text when the exact look matters (identity, product packaging, wardrobe). Use text-only when you’re exploring concepts and don’t want the model overfitting to a single still.
A useful mental model from Luma’s ecosystem distinguishes character reference from visual reference: upload an image either to keep a character consistent or to carry a style forward. (https://lumalabs.ai/learning-hub/best-practices)
Practical guidance for Veo3Gen (tool-agnostic)
- If you’re getting identity drift, go back to image+text with a micro-motion pass.
- If you’re getting scene drift from an overly specific reference, try text-only to define the environment, then reintroduce the reference once the scene stabilizes.
Step 7 — A/B test with 4 prompt variants (the “drift triage matrix”)
When you don’t know what’s causing drift, run four controlled variants. Luma’s “Reply” workflow emphasizes iterative branching—replying to a generation with a new prompt to produce a fresh batch. (https://lumalabs.ai/learning-hub/how-to-use-reply)
Even if Veo3Gen’s workflow differs, the principle holds: branch, compare, then converge.
Drift triage matrix (symptom → likely cause → best next change)
| Symptom | Likely cause | Single best next change |
|---|---|---|
| Face changes across frames | Too much motion or style-first prompt | Do a micro-motion pass + move style to the end |
| Outfit/label morphs | Invariants not stated | Add a 4–8 item lock list (colors, materials, exact text) |
| Background “teleports” | Prompt describes multiple locations/moods | Remove extra setting adjectives; lock one location explicitly |
| Logos/text become gibberish | Text not specified or over-styled | Specify exact text plainly; reduce heavy style terms (https://lumalabs.ai/learning-hub/best-practices) |
| Everything is consistent but “boring” | Over-constrained motion | Keep invariants; add one camera move (slow pan/zoom) |
Note: Luma’s best practices list camera motion options like pan/orbit/zoom. (https://lumalabs.ai/learning-hub/best-practices) In Veo3Gen, you can still describe these motions in natural language.
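If you triage drift often, the matrix above fits in a small lookup table so the "single best next change" is one call away (the symptom keys are our own shorthand):

```python
# Symptom -> (likely cause, single best next change), mirroring the matrix.
TRIAGE = {
    "face_changes": (
        "too much motion or style-first prompt",
        "do a micro-motion pass and move style to the end",
    ),
    "outfit_or_label_morphs": (
        "invariants not stated",
        "add a 4-8 item lock list (colors, materials, exact text)",
    ),
    "background_teleports": (
        "prompt describes multiple locations or moods",
        "remove extra setting adjectives; lock one location explicitly",
    ),
    "text_gibberish": (
        "text not specified or over-styled",
        "specify exact text plainly; reduce heavy style terms",
    ),
    "consistent_but_boring": (
        "over-constrained motion",
        "keep invariants; add one camera move (slow pan or zoom)",
    ),
}


def next_change(symptom):
    """Return the single best next change for a drift symptom."""
    cause, fix = TRIAGE[symptom]
    return fix
```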
Copy/paste templates: 5 reference-lock prompts for creators & marketers
Luma-oriented guides often suggest a structured element order (camera/shot → subject → action → camera movement → lighting → mood). (https://filmart.ai/luma-dream-machine/) For drift fixing, we’ll slightly tweak that order to prioritize the reference.
The 3-line “Reference-Lock” template (copy/paste)
Subject / Identity: Same subject as the reference image. [age range, gender presentation, defining features].
Action / Motion: [simple action]. [camera distance]. [slow/steady camera instruction].
Invariants + exclusions: Keep [hair], [outfit], [props], [background], [colors], [exact text]. Exclude [1–2 critical negatives].
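The three lines above map cleanly onto Python's `string.Template`, so you can fill the bracketed slots programmatically when generating many variants (the slot names below are ours, not a Veo3Gen convention):

```python
from string import Template

# One $slot per bracketed placeholder in the 3-line template.
REFERENCE_LOCK = Template(
    "Subject / Identity: Same subject as the reference image. $identity\n"
    "Action / Motion: $action\n"
    "Invariants + exclusions: Keep $invariants. Exclude $negatives."
)

prompt = REFERENCE_LOCK.substitute(
    identity="30s, short dark hair, silver earrings.",
    action="Medium close-up, speaking calmly. Slow, steady camera.",
    invariants="the same hairstyle, outfit, and indoor background",
    negatives="face swap, extra people",
)
```

`substitute` raises a `KeyError` if a slot is left unfilled, which catches half-edited templates before they reach the model.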
1) Talking-head ad (founder or spokesperson)
Subject / Identity: Same person as the reference image, same face shape, skin tone, and hairstyle.
Action / Motion: Medium close-up, looking at camera, speaking calmly. Subtle blinking and breathing only.
Invariants + exclusions: Keep the same outfit and earrings. Keep the same indoor background and lighting. Exclude face swap, extra people.
2) UGC-style “holding product” clip
Subject / Identity: Same person as the reference image.
Action / Motion: Handheld selfie framing, slow sway, person holds the product toward the camera for 2 seconds, then smiles.
Invariants + exclusions: Keep the same top color, same hair, same product shape and label text. Exclude brand/logo changes.
3) Product tabletop shot (ecommerce)
Subject / Identity: Same product as the reference image, same packaging and materials.
Action / Motion: Static tripod shot, slow push-in. Product rotates slightly (small angle).
Invariants + exclusions: Keep label text exactly: “[YOUR LABEL TEXT]”. Keep cap color and bottle shape. Exclude extra text, warped logo.
4) App/SaaS “laptop demo” scene
Subject / Identity: Same laptop and desk setup as the reference image.
Action / Motion: Over-the-shoulder shot, slow pan across the screen.
Invariants + exclusions: Keep the same UI layout and brand colors. Keep the same room background. Exclude random pop-ups, extra icons.
5) Character consistency for a series
Subject / Identity: Same character as the reference image, keep the same facial features and hair silhouette.
Action / Motion: Standing pose, slow head turn, minimal body movement.
Invariants + exclusions: Keep the same outfit style and primary colors. Keep the same environment. Exclude age change, hairstyle change.
Quick FAQ (why it changes faces, why logos morph, why backgrounds wander)
Why does the face change even when I supply a reference image?
Identity drift often increases with complex motion and style-heavy prompting. Try a micro-motion pass and move style to the end of the prompt.
Why do logos and label text morph into nonsense?
Text is a high-precision detail. Be explicit about the exact text you want. Luma’s best practices note you can request specific text by stating it in the prompt. (https://lumalabs.ai/learning-hub/best-practices)
Should I use negative prompts like “no face change”?
Use negatives sparingly. Luma’s prompting guidance recommends a positive-first approach and notes negative prompting can be counterproductive. (https://lumaai-help.freshdesk.com/support/solutions/articles/151000219614-understanding-prompting-for-dream-machine-positive-vs-negative)
My background keeps changing. What’s the fastest fix?
Remove extra location/mood adjectives and lock one setting with a short invariants list (e.g., “same kitchen, same cabinets, same lighting”).
Ready to automate reference-locked generations?
If you’re generating lots of variants (ads, UGC batches, product angles), it’s worth standardizing your “reference-lock” prompts and running A/B tests programmatically.
- Explore the developer workflow in the Veo3Gen API docs
- Estimate costs and scale plans on pricing
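As a sketch of that programmatic A/B setup, a batch can be prepared as one JSON job body per prompt variant. The endpoint and field names below are placeholders, so check the Veo3Gen API docs for the real schema:

```python
import json

API_URL = "https://api.example.com/v1/generate"  # placeholder endpoint


def build_jobs(reference_image_url, prompts):
    """Build one JSON job body per prompt variant so A/B results
    can be compared side by side. Field names are illustrative."""
    return [
        json.dumps({
            "image": reference_image_url,  # the same reference for every run
            "prompt": prompt,
            "duration_s": 5,
        })
        for prompt in prompts
    ]


jobs = build_jobs(
    "https://example.com/reference.jpg",
    [
        "Same person as the reference image. Subtle blinking only.",
        "Same person as the reference image. Slow head turn.",
    ],
)
```

Keeping the reference image identical across every job is what makes the comparison a controlled A/B test rather than four unrelated generations.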
Keep your workflow simple: lock identity first, constrain motion next, then layer style—so the reference stays in charge.
Try Veo 3 & Veo 3 API for Free
Experience cinematic AI video generation at the industry's lowest price point. No credit card required to start.