Creator How-To (Image-to-Video)

Fix “It Didn’t Follow My Image” in Veo3Gen: A 7‑Step Reference‑Lock Checklist (inspired by Luma’s Best Practices)

Fix AI video reference image drift in Veo3Gen with a 7-step reference-lock checklist, prompt templates, and a drift triage matrix.

The 3 most common “reference not followed” failure modes (and how to identify yours)

When creators say “It didn’t follow my image,” they usually mean drift: the generated video gradually stops honoring what your reference image establishes.

Here are the three drift types worth separating (because each one has a different fix):

  • Identity drift: the face/character stops matching the reference.
    • 1-sentence diagnostic: If you pause at random frames and the person no longer looks like the same person, you have identity drift.
  • Wardrobe/prop drift: outfit, hair, accessories, product label, or a key object morphs.
    • 1-sentence diagnostic: If the same character remains, but the jacket becomes a hoodie or the logo changes shape, you have wardrobe/prop drift.
  • Scene drift: the background, location, or set dressing “wanders” away from the reference.
    • 1-sentence diagnostic: If the subject is mostly right but the environment keeps changing (room becomes a street, color palette shifts, new objects appear), you have scene drift.
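
If you script your triage (Step 7 builds on this), it helps to name the three drift types explicitly. A minimal Python sketch; the enum and its labels are just this post’s vocabulary, not anything Veo3Gen exposes:

    from enum import Enum

    class DriftType(Enum):
        """The three drift types this checklist separates."""
        IDENTITY = "face or character stops matching the reference"
        WARDROBE_PROP = "outfit, hair, accessories, or a key object morphs"
        SCENE = "background, location, or set dressing wanders"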

This post uses prompting best-practice concepts popularized in Luma Dream Machine—like clear descriptors, iteration, and a bias toward positive prompting—as a practical framework you can apply in Veo3Gen today, without retraining or switching tools. (Use it as guidance “as of 2026-02-12,” since model behavior and product features can evolve.)

Step 1 — Start with a “reference-first” prompt (tight subject + action, loose style)

A common cause of drift is prompting that over-emphasizes cinematic style before the model has “locked” the subject and action.

Luma’s best-practices guidance encourages using natural language and adding clear descriptors to get more accurate results. (https://lumalabs.ai/learning-hub/best-practices) The trick is where those descriptors go.

The ordering principle: lock subject → lock action → then add style

  1. Lock the subject/identity (who/what must match the image)
  2. Lock the action/motion (what happens, how much motion)
  3. Add style last (cinematic, film stock, mood—only after you’ve anchored identity)

Rewrite example: “too cinematic / too many adjectives” → reference-first

Before (drift-prone):

Ultra cinematic anamorphic masterpiece, dramatic volumetric god rays, intense film grain, epic cyberpunk mood, 35mm, fast handheld push-in, high energy, breathtaking…

After (reference-first):

Same person as the reference image. Medium close-up talking-head, facing camera. Calm delivery, subtle mouth movement and blinking. Keep the same hairstyle, outfit, and skin tone as the reference. Background stays the same room. Style: clean commercial lighting, lightly cinematic.

Notice what changed: we didn’t remove adjectives—we reordered them so the model is rewarded for consistency first.
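
If you assemble prompts programmatically, the ordering can be made mechanical. A minimal Python sketch, assuming nothing about Veo3Gen itself; the function and its parameters are our own illustration:

    def reference_first_prompt(subject: str, action: str, style: str = "") -> str:
        """Assemble a prompt in subject -> action -> style order."""
        parts = [f"Same subject as the reference image. {subject}", action]
        if style:
            parts.append(f"Style: {style}")  # style is always last
        return " ".join(parts)

    print(reference_first_prompt(
        subject="Keep the same hairstyle, outfit, and skin tone as the reference.",
        action="Medium close-up, facing camera, calm delivery, subtle blinking.",
        style="clean commercial lighting, lightly cinematic",
    ))

Because style is an optional trailing argument, it cannot accidentally lead the prompt.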

Step 2 — Use positive constraints before negative constraints (what to say vs what to forbid)

If your prompt is a long list of “don’ts,” you can accidentally amplify the very mistakes you’re trying to avoid.

Luma’s prompting support article explains that negative prompting tells the AI what to exclude, but it recommends a positive-only approach and warns that negative prompting can be counterproductive, increasing the chance that unwanted elements appear. (https://lumaai-help.freshdesk.com/support/solutions/articles/151000219614-understanding-prompting-for-dream-machine-positive-vs-negative)

Practical translation for Veo3Gen

  • Start with positive constraints: “same face,” “same red hoodie,” “same product label,” “same kitchen background.”
  • Add only 1–2 negatives if needed, phrased tightly (e.g., “no face swap,” “no text changes”).
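
You can encode the same positives-first rule so a long “don’t” list is impossible. A minimal Python sketch; the two-negative cap mirrors the guidance above:

    def constraint_block(positives, negatives=None):
        """Render positive constraints first; cap negatives at two."""
        negatives = negatives or []
        if len(negatives) > 2:
            raise ValueError("Keep negatives to 1-2 tightly phrased items.")
        text = " ".join(f"Keep {p}." for p in positives)
        if negatives:
            text += " Exclude " + ", ".join(negatives) + "."
        return text

    print(constraint_block(
        positives=["the same face", "the same red hoodie", "the same kitchen background"],
        negatives=["face swap"],
    ))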

Step 3 — Reduce motion complexity to re-anchor identity (micro-motion pass)

Big motion invites the model to “solve” continuity in ways you don’t want—especially with faces, logos, and fine details.

Micro-motion pass recipe (use before complex moves)

Generate a short clip where the subject does only:

  • gentle breathing
  • a blink every few seconds
  • a slow head turn (small angle)
  • minimal camera movement (or none)

If Veo3Gen can hold identity here, you’ve created a reliable anchor. Then you can step up motion (walking, dancing, fast pans) while keeping the same invariants.
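
If you run this pass often, keep the micro-motion wording as a fixed constant so every anchor attempt is identical. A minimal Python sketch using this post’s recipe:

    MICRO_MOTION = (
        "Gentle breathing, a blink every few seconds, "
        "a slow small-angle head turn. Static camera."
    )

    def micro_motion_pass(identity: str, invariants: str) -> str:
        """Anchor prompt: identity + minimal motion + invariants, no style terms."""
        return f"Same subject as the reference image. {identity} {MICRO_MOTION} {invariants}"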

Step 4 — Re-assert key invariants (wardrobe, hair, props, environment) with a short lock list

When drift happens, it’s often because the prompt never explicitly said what must not change.

Create a short “lock list” of invariants—keep it to 4–8 items. This is especially helpful for:

  • Product shots (label text, cap color, bottle shape)
  • Talking-head ads (hairline, earrings, brand tee)
  • UGC-style clips (phone model, room decor, specific outfit)

Example lock list:

  • same face and age as reference
  • same hairstyle (length + part)
  • same outfit color and type
  • same primary prop (e.g., white mug)
  • same background location

Luma’s best-practices doc also notes you can ask for specific text by specifying it in the prompt (e.g., a poster that reads a certain phrase). (https://lumalabs.ai/learning-hub/best-practices) Even if you’re not generating a poster, the underlying idea is useful: be explicit about the exact text/label you want preserved.
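
A lock list is also easy to keep honest in code: validate the 4–8 item budget and render every invariant as a positive “Keep …” clause. A minimal Python sketch:

    def render_lock_list(invariants):
        """Turn a 4-8 item lock list into positive 'Keep ...' clauses."""
        if not 4 <= len(invariants) <= 8:
            raise ValueError("Lock lists work best with 4-8 items.")
        return " ".join(f"Keep the {item}." for item in invariants)

    print(render_lock_list([
        "same face and age as reference",
        "same hairstyle (length + part)",
        "same outfit color and type",
        "same primary prop (white mug)",
        "same background location",
    ]))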

Step 5 — Iterate with targeted edits (crop/region focus, one change at a time)

Drift troubleshooting is an iteration game: change one variable, observe, repeat.

Luma describes a Modify workflow where you describe specific changes (e.g., warmer colors, more trees). (https://lumalabs.ai/learning-hub/best-practices) The general best practice applies regardless of tool: avoid rewriting your whole prompt between attempts.

Targeted iteration rules

  • Change one thing per run (e.g., only reduce camera motion; don’t also add a new style).
  • If the face drifts, tighten identity and reduce motion.
  • If the background drifts, tighten scene invariants and remove extra setting adjectives.

Mini checklist: “One change at a time”

  • Keep the same reference image
  • Keep the same subject line
  • Adjust only one variable (motion, scene, or wardrobe)
  • Re-run 2–4 variations
  • Promote the best result and iterate again
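
The “one change at a time” rule can be enforced mechanically: diff two prompt configurations and refuse to run when more than one field moved. A minimal Python sketch; the field names are illustrative:

    def changed_fields(base: dict, variant: dict) -> list:
        """Return the prompt fields that differ between two runs."""
        return [key for key in base if base[key] != variant.get(key)]

    base = {"identity": "same person", "motion": "fast handheld push-in", "scene": "same room"}
    variant = dict(base, motion="static camera, subtle blinking")

    diff = changed_fields(base, variant)
    assert len(diff) == 1, f"Change one variable per run, not {diff}"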

Step 6 — When to switch from text-only to image+text (and back)

Use image+text when the exact look matters (identity, product packaging, wardrobe). Use text-only when you’re exploring concepts and don’t want the model overfitting to a single still.

A useful mental model from Luma’s ecosystem is the distinction between a character reference and a visual reference: upload an image to keep a character consistent, or to carry a style forward. (https://lumalabs.ai/learning-hub/best-practices)

Practical guidance for Veo3Gen (tool-agnostic)

  • If you’re getting identity drift, go back to image+text with a micro-motion pass.
  • If you’re getting scene drift from an overly specific reference, try text-only to define the environment, then reintroduce the reference once the scene stabilizes.
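
As a quick decision rule, those two bullets reduce to a tiny lookup. A minimal Python sketch; the mode strings are just this post’s shorthand:

    def pick_mode(drift_type: str) -> str:
        """Map an observed drift type to the suggested input mode."""
        return {
            "identity": "image+text, starting with a micro-motion pass",
            "scene": "text-only until the environment stabilizes, then reintroduce the reference",
        }.get(drift_type, "image+text")  # when in doubt, let the reference lead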

Step 7 — A/B test with 4 prompt variants (the “drift triage matrix”)

When you don’t know what’s causing drift, run four controlled variants. Luma’s “Reply” workflow emphasizes iterative branching—replying to a generation with a new prompt to produce a fresh batch. (https://lumalabs.ai/learning-hub/how-to-use-reply)

Even if Veo3Gen’s workflow differs, the principle holds: branch, compare, then converge.

Drift triage matrix (symptom → likely cause → best next change)

  • Face changes across frames → too much motion or a style-first prompt → do a micro-motion pass and move style to the end
  • Outfit/label morphs → invariants not stated → add a 4–8 item lock list (colors, materials, exact text)
  • Background “teleports” → prompt describes multiple locations/moods → remove extra setting adjectives; lock one location explicitly
  • Logos/text become gibberish → text not specified or over-styled → specify exact text plainly; reduce heavy style terms (https://lumalabs.ai/learning-hub/best-practices)
  • Everything is consistent but “boring” → over-constrained motion → keep invariants; add one camera move (slow pan/zoom)

Note: Luma’s best practices list camera motion options like pan/orbit/zoom. (https://lumalabs.ai/learning-hub/best-practices) In Veo3Gen, you can still describe these motions in natural language.
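
If you log failures across batches, the matrix also works as a lookup table. A minimal Python sketch with the symptoms and fixes copied from the matrix above:

    TRIAGE = {
        "face changes across frames": "micro-motion pass; move style to the end",
        "outfit/label morphs": "add a 4-8 item lock list (colors, materials, exact text)",
        "background teleports": "remove extra setting adjectives; lock one location",
        "logos/text become gibberish": "specify exact text plainly; reduce heavy style terms",
        "consistent but boring": "keep invariants; add one camera move (slow pan/zoom)",
    }

    def next_change(symptom: str) -> str:
        """Return the single best next change for an observed symptom."""
        return TRIAGE.get(symptom, "run a micro-motion pass and re-diagnose")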

Copy/paste templates: 5 reference-lock prompts for creators & marketers

Luma-oriented guides often suggest a structured element order (camera/shot → subject → action → camera movement → lighting → mood). (https://filmart.ai/luma-dream-machine/) For drift fixing, we’ll slightly tweak that order to prioritize the reference.

The 3-line “Reference-Lock” template (copy/paste)

Subject / Identity: Same subject as the reference image. [age range, gender presentation, defining features].

Action / Motion: [simple action]. [camera distance]. [slow/steady camera instruction].

Invariants + exclusions: Keep [hair], [outfit], [props], [background], [colors], [exact text]. Exclude [1–2 critical negatives].
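
The same template as a reusable function, so the five examples below stay structurally identical. A minimal Python sketch mirroring the three lines above:

    def reference_lock_prompt(identity, action, invariants, exclusions=""):
        """Render the 3-line Reference-Lock template as one prompt."""
        third = f"Invariants + exclusions: Keep {invariants}."
        if exclusions:
            third += f" Exclude {exclusions}."
        return "\n".join([
            f"Subject / Identity: Same subject as the reference image. {identity}",
            f"Action / Motion: {action}",
            third,
        ])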

1) Talking-head ad (founder or spokesperson)

Subject / Identity: Same person as the reference image, same face shape, skin tone, and hairstyle.

Action / Motion: Medium close-up, looking at camera, speaking calmly. Subtle blinking and breathing only.

Invariants + exclusions: Keep the same outfit and earrings. Keep the same indoor background and lighting. Exclude face swap, extra people.

2) UGC-style “holding product” clip

Subject / Identity: Same person as the reference image.

Action / Motion: Handheld selfie framing, slow sway, person holds the product toward the camera for 2 seconds, then smiles.

Invariants + exclusions: Keep the same top color, same hair, same product shape and label text. Exclude brand/logo changes.

3) Product tabletop shot (ecommerce)

Subject / Identity: Same product as the reference image, same packaging and materials.

Action / Motion: Static tripod shot, slow push-in. Product rotates slightly (small angle).

Invariants + exclusions: Keep label text exactly: “[YOUR LABEL TEXT]”. Keep cap color and bottle shape. Exclude extra text, warped logo.

4) App/SaaS “laptop demo” scene

Subject / Identity: Same laptop and desk setup as the reference image.

Action / Motion: Over-the-shoulder shot, slow pan across the screen.

Invariants + exclusions: Keep the same UI layout and brand colors. Keep the same room background. Exclude random pop-ups, extra icons.

5) Character consistency for a series

Subject / Identity: Same character as the reference image, keep the same facial features and hair silhouette.

Action / Motion: Standing pose, slow head turn, minimal body movement.

Invariants + exclusions: Keep the same outfit style and primary colors. Keep the same environment. Exclude age change, hairstyle change.

Quick FAQ (why it changes faces, why logos morph, why backgrounds wander)

Why does the face change even when I supply a reference image?

Identity drift often increases with complex motion and style-heavy prompting. Try a micro-motion pass and move style to the end of the prompt.

Why do logos and label text morph into nonsense?

Text is a high-precision detail. Be explicit about the exact text you want. Luma’s best practices note you can request specific text by stating it in the prompt. (https://lumalabs.ai/learning-hub/best-practices)

Should I use negative prompts like “no face change”?

Use negatives sparingly. Luma’s prompting guidance recommends a positive-first approach and notes negative prompting can be counterproductive. (https://lumaai-help.freshdesk.com/support/solutions/articles/151000219614-understanding-prompting-for-dream-machine-positive-vs-negative)

My background keeps changing. What’s the fastest fix?

Remove extra location/mood adjectives and lock one setting with a short invariants list (e.g., “same kitchen, same cabinets, same lighting”).

Ready to automate reference-locked generations?

If you’re generating lots of variants (ads, UGC batches, product angles), it’s worth standardizing your “reference-lock” prompts and running A/B tests programmatically.
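
A minimal batching sketch in Python: run_ab_batch accepts whatever generation call you pass in, so nothing below is the actual Veo3Gen or Veo 3 API; generate and its keyword arguments are placeholders you would swap for your real client:

    # Hypothetical batch runner; `generate` stands in for your real client call.
    def run_ab_batch(generate, reference_image, variants, runs_per_variant=2):
        """Run each prompt variant a few times against the same reference image."""
        results = {}
        for name, prompt in variants.items():
            results[name] = [
                generate(prompt=prompt, image=reference_image)  # placeholder signature
                for _ in range(runs_per_variant)
            ]
        return results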

Keep your workflow simple: lock identity first, constrain motion next, then layer style—so the reference stays in charge.
