Veo 3.1 Image-to-Video Prompts That Actually Animate (Not Just Wiggle): A Beginner Guide to "Describe What Changes"

TL;DR

Stop spending tokens re-describing the photo. In Veo 3.1 image-to-video, treat the input image as the “static truth” and write text for what changes over time: (1) subject motion, (2) camera behavior, (3) one small environment motion. Use the 7-line Change Lines template, then iterate by changing one line per render so you learn what actually caused the animation (not just a micro-wiggle).

Key takeaways

The fastest fix for “it only wiggles” is: don’t describe what’s visible—describe what changes (subject → camera → environment).
Veo's prompt guide explicitly recommends specifying shot framing and camera motion, the action, and clear lighting and style references. (https://deepmind.google/models/veo/prompt-guide/)
Keep stability by limiting motion: one primary action + one camera move (or locked-off), then add one subtle environmental change.
Make iteration scientific: change one line at a time and record what improved/broke.
If you want fast iterations with sound in the same generation, Veo3Gen includes native, synchronized audio (dialogue/SFX/music) in one pass, plus image-to-video and first/last-frame control on Veo 3.1.

Why “describing the image” produces the dreaded micro-wiggle

With image-to-video, the model already has the frame’s composition, objects, colors, and implied style from the image itself. If your prompt repeats that (“a woman in a cafe, warm tones”), you’re not adding new temporal instructions.

When the model isn’t told what should change, it often plays it safe: tiny eye/hair shimmer, subtle background crawl, maybe a mild zoom. That’s not animation; it’s the model filling time with residual jitter.

What actually moves the needle is the same principle you see in broader video prompting guidance: clearly direct motion, camera behavior, and temporal progression instead of stacking descriptive keywords. (https://queststudio.io/blog/runway-prompts)

The core rule: the image defines what is; your text defines what changes

Paste this above your prompt box:

Rule: Don’t re-describe what’s visible in the image. Describe what moves/changes over time and how.

This fits Veo’s own prompt guidance:

Specify shot framing and camera motion (e.g., low angle, pan). (https://deepmind.google/models/veo/prompt-guide/)
Describe action: what the character is doing and what else is happening. (https://deepmind.google/models/veo/prompt-guide/)
Specify lighting (warm even lighting, spotlight) and a style reference (cartoon, claymation, film noir on 35mm, worn VHS texture). (https://deepmind.google/models/veo/prompt-guide/)

Think in three motion layers (and don’t max all three)

Subject motion (primary): the main observable action.
Camera behavior (secondary): what the viewer “does” (push-in, pan, locked-off).
Environment motion (optional): one supporting element (steam, wind, flicker).

Beginner failure pattern: writing big changes in all three layers at once. Common result: drifting backgrounds, rubber faces, or the “whole frame melts” look.

The 7 “Change Lines” template (copy-paste)

This template forces you to include what Veo responds to—motion, camera, timing, lighting/style—without bloating the prompt.

[1] SUBJECT CHANGE (primary motion)
- The main subject [does X], with [specific body/parts] movement, at [pace].

[2] CAMERA BEHAVIOR
- Shot framing: [close-up / medium / wide], [angle].
- Camera motion: [locked-off / slow push-in / pan left / handheld drift].

[3] ENVIRONMENT CHANGE (one only)
- Add one secondary motion: [steam rises / hair moves slightly / neon flickers], subtle.

[4] TIMING / SEQUENCE
- Start: [state]. Then: [action]. End: [state]. Smooth continuous motion.

[5] STYLE + LIGHTING
- Style reference: [cartoon / claymation / film noir on 35mm / worn VHS texture].
- Lighting: [warm even lighting / spotlight on subject], consistent exposure.

[6] AUDIO (optional)
- Ambient + SFX, or dialogue: "[exact line]".

[7] CONSTRAINTS
- Preserve identity. Keep background stable. No new objects. No text changes.

Why these lines are not arbitrary:

Framing + camera motion are called out as useful controls in the Veo guide. (https://deepmind.google/models/veo/prompt-guide/)
Lighting is explicitly steerable (warm even vs spotlight); stating it also tends to reduce the ambiguity that can show up as flicker. (https://deepmind.google/models/veo/prompt-guide/)
Veo can generate dialogue; prompts can provide a topic or exact lines. (https://deepmind.google/models/veo/prompt-guide/)
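
If you keep prompts in a script or notebook, the template can be held as structured data so each line is edited on its own. The following is a minimal sketch, not an official Veo or Veo3Gen API: the ChangeLines class, its field names, and the render() method are hypothetical names invented for this illustration. It only builds the prompt text; it does not call any video service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeLines:
    """Hypothetical container for the 7 Change Lines (names are illustrative)."""
    subject: str
    framing: str
    camera_motion: str
    environment: str = ""
    timing: str = ""
    style: str = ""
    lighting: str = ""
    audio: str = ""
    constraints: str = (
        "Preserve identity. Keep background stable. No new objects. No text changes."
    )

    def render(self) -> str:
        """Return the prompt text, skipping optional sections left empty."""
        sections = [
            ("[1] SUBJECT CHANGE", [self.subject]),
            ("[2] CAMERA BEHAVIOR", [self.framing, self.camera_motion]),
            ("[3] ENVIRONMENT CHANGE", [self.environment]),
            ("[4] TIMING / SEQUENCE", [self.timing]),
            ("[5] STYLE + LIGHTING", [self.style, self.lighting]),
            ("[6] AUDIO", [self.audio]),
            ("[7] CONSTRAINTS", [self.constraints]),
        ]
        blocks = []
        for header, lines in sections:
            filled = [line for line in lines if line.strip()]
            if not filled:
                continue  # optional section with no content
            blocks.append("\n".join([header] + [f"- {line}" for line in filled]))
        return "\n\n".join(blocks)


prompt = ChangeLines(
    subject="She slowly smiles, blinks once naturally, and takes a calm breath.",
    framing="Medium close-up, eye-level.",
    camera_motion="Slow push-in; keep her face centered.",
    environment="Faint steam rises from a cup in the foreground (subtle).",
)
print(prompt.render())
```

Because the dataclass is frozen, a variation has to be created as a new object (for example with dataclasses.replace), which fits the habit of changing one line per render.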

Worked example: same image, “wiggle” vs real animation

Assume you’re using the same portrait photo for both prompts.

Before (common beginner prompt)

A cinematic portrait of a woman with curly hair in a cozy cafe, warm tones, high quality, detailed, 35mm film look.

Typical outcome: minimal motion (micro-wiggle), because you described the still image—not the timeline.

After (using Change Lines)

[1] SUBJECT CHANGE
- She slowly smiles, blinks once naturally, and takes a calm breath (shoulders rise/fall subtly).

[2] CAMERA BEHAVIOR
- Medium close-up, eye-level.
- Slow push-in; keep her face centered.

[3] ENVIRONMENT CHANGE
- Faint steam rises from a cup in the foreground (subtle).

[4] TIMING / SEQUENCE
- Start neutral → gradual smile → blink near the end; continuous natural motion.

[5] STYLE + LIGHTING
- 35mm film look.
- Warm even key light; consistent exposure.

[6] AUDIO
- Cafe room tone + soft cup clink.

[7] CONSTRAINTS
- Preserve identity and facial structure. Keep background stable. No new objects.

Why this version “actually animates”:

The action is measurable (smile, blink, breathing), aligning with Veo’s guidance to describe what characters are doing and what else is happening. (https://deepmind.google/models/veo/prompt-guide/)
The camera is explicit (framing + push-in), matching the guide’s emphasis on camera/shot direction. (https://deepmind.google/models/veo/prompt-guide/)
Environment change is capped at one subtle motion, which keeps the frame stable.

A 10-minute iteration loop (so each render teaches you something)

Most “prompting frustration” is really an experimentation problem: changing everything at once.

Step 1: Run a 3-line minimum prompt

Start with only:

Subject change
Camera behavior
Constraints

If motion is still weak, the subject line is probably not describing an observable action.

Weak: “moves naturally.”
Observable: “turns head slightly toward camera and blinks once.”
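
The weak/observable distinction can be roughly screened before you spend a render. This is a heuristic sketch only: the phrase and verb lists are assumptions assembled from the examples in this article, and check_subject_line is a hypothetical helper, not part of any Veo tooling. Extend the lists for your own subjects.

```python
import re

# Illustrative lists (assumptions): vague phrases to avoid and verb stems
# that describe something a viewer could actually see happen.
VAGUE_PHRASES = ("moves naturally", "moves slightly")
OBSERVABLE_VERBS = ("blink", "turn", "pour", "lift", "smile", "raise", "rotate")


def check_subject_line(line: str) -> list[str]:
    """Return a list of problems found in a subject-change line (empty if none)."""
    text = line.lower()
    problems = [f'vague phrase: "{p}"' for p in VAGUE_PHRASES if p in text]
    # \b anchors the stem at a word start, so "turns" matches but "return" does not.
    if not any(re.search(rf"\b{verb}", text) for verb in OBSERVABLE_VERBS):
        problems.append("no observable verb found")
    return problems


print(check_subject_line("moves naturally."))
print(check_subject_line("turns head slightly toward camera and blinks once."))
```

An empty list does not guarantee good motion; it only means the line names at least one visible action and avoids the listed vague phrases.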

Step 2: Add one environment motion

Only after subject + camera look right, add one small environmental change (steam, hair movement, dust motes).

Step 3: Add style + lighting last

Use one style reference and a clear lighting plan. Veo’s guide provides examples like claymation, film noir on 35mm, or worn VHS texture, and lighting like warm even lighting or a spotlight. (https://deepmind.google/models/veo/prompt-guide/)

Step 4: Record changes (mini table)

Render	What you changed	Improved	Broke
01	Subject change	Motion visible	Background drift
02	Constraints	Background steadier	Smile too subtle
03	Subject change	Better smile	Blink uncanny

This makes your workflow cumulative instead of random.
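
The one-change-per-render rule can also be enforced mechanically. Below is a small sketch that assumes each prompt is stored as a dict mapping a line name to its text; changed_lines and RenderLog are hypothetical helpers written for this example, and the "improved" and "broke" notes are still entered by hand after you watch the render.

```python
from dataclasses import dataclass, field


def changed_lines(previous: dict, current: dict) -> list:
    """Return the names of prompt lines whose text differs between two renders."""
    names = sorted(set(previous) | set(current))
    return [n for n in names if previous.get(n, "") != current.get(n, "")]


@dataclass
class RenderLog:
    """Hypothetical log mirroring the table columns above."""
    rows: list = field(default_factory=list)

    def record(self, previous: dict, current: dict, improved: str, broke: str) -> dict:
        diff = changed_lines(previous, current)
        if len(diff) != 1:
            # Refuse to log a render that changed zero or several lines.
            raise ValueError(f"expected exactly one changed line, got {diff}")
        row = {
            "render": f"{len(self.rows) + 1:02d}",
            "changed": diff[0],
            "improved": improved,
            "broke": broke,
        }
        self.rows.append(row)
        return row


base = {
    "subject": "She smiles.",
    "camera": "Locked-off.",
    "constraints": "Keep background stable.",
}
trial = dict(base, subject="She slowly smiles, then blinks once.")

log = RenderLog()
first = log.record(base, trial, improved="Motion visible", broke="Background drift")
print(first)
```

Raising an error on a multi-line change is a deliberate design choice here: it makes it hard to log a render whose result cannot be attributed to a single line.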

5 starter prompts (copy-paste) that follow the “describe what changes” rule

Each one has: 1 primary motion, 1 camera instruction, 1 environment motion max, plus constraints.

1) Product hero shot (simple rotation)

[1] SUBJECT CHANGE
- The product rotates slightly from front view to a 3/4 view; highlights slide across the surface.

[2] CAMERA BEHAVIOR
- Close-up, slight high angle.
- Locked-off, or a very slow orbit right; keep product centered.

[3] ENVIRONMENT CHANGE
- Subtle dust motes drifting in the light.

[4] TIMING / SEQUENCE
- Start front view → rotate to 3/4 → settle.

[5] STYLE + LIGHTING
- Clean studio commercial style.
- Warm even lighting + soft rim light.

[6] AUDIO
- Minimal room tone + soft whoosh.

[7] CONSTRAINTS
- No label/text changes. Background stays seamless. No extra objects.

2) Creator intro portrait (head turn + blink)

[1] SUBJECT CHANGE
- The person turns their head slightly toward camera, gives a confident half-smile, then blinks once.

[2] CAMERA BEHAVIOR
- Medium shot, eye-level.
- Slow push-in; stable framing.

[3] ENVIRONMENT CHANGE
- Very subtle hair movement like a gentle indoor fan.

[4] TIMING / SEQUENCE
- Look off-camera → turn toward lens → smile → blink near end.

[5] STYLE + LIGHTING
- Slightly worn VHS texture.
- Soft key light with a subtle spotlight on the face.

[6] AUDIO
- Light room tone.

[7] CONSTRAINTS
- Preserve identity. No face warping. Keep background fixed.

(Style and lighting examples align with the Veo prompt guide’s examples.) (https://deepmind.google/models/veo/prompt-guide/)

3) Travel establishing shot (pan + clouds)

[1] SUBJECT CHANGE
- The landscape remains still while clouds drift slowly across the sky.

[2] CAMERA BEHAVIOR
- Wide shot.
- Slow pan left across the scene.

[3] ENVIRONMENT CHANGE
- Gentle tree leaves rustle (subtle).

[4] TIMING / SEQUENCE
- Continuous slow pan; clouds drift the entire time.

[5] STYLE + LIGHTING
- Realistic.
- Golden-hour warm even lighting.

[6] AUDIO
- Wind ambience.

[7] CONSTRAINTS
- No new buildings/objects. Keep horizon stable.

4) Food shot (steam + fork lift)

[1] SUBJECT CHANGE
- Steam rises continuously; a fork lifts one bite slowly.

[2] CAMERA BEHAVIOR
- Close-up macro-style framing.
- Slow push-in; keep the dish sharp.

[3] ENVIRONMENT CHANGE
- One soft specular flicker from a candle reflection (subtle).

[4] TIMING / SEQUENCE
- Steam first → fork lift begins → fork exits frame near end.

[5] STYLE + LIGHTING
- Clean commercial food style.
- Warm spotlight on the dish; darker background.

[6] AUDIO
- Soft utensil clink + gentle sizzle.

[7] CONSTRAINTS
- Keep plating intact. No ingredient morphing. Background stable.

5) Poster/logo (animate light, not typography)

[1] SUBJECT CHANGE
- The poster stays perfectly still while a light sweep passes across it left to right.

[2] CAMERA BEHAVIOR
- Locked-off, straight-on. No camera movement.

[3] ENVIRONMENT CHANGE
- Subtle floating dust in the beam.

[4] TIMING / SEQUENCE
- Light sweep once, smooth.

[5] STYLE + LIGHTING
- Clean studio.
- Spotlight focused on the poster.

[6] AUDIO
- Soft whoosh.

[7] CONSTRAINTS
- Do not change any text or logos. No warping. No new elements.

Mid-article CTA: iterate faster (especially if you need audio)

If your workflow includes sound cues (room tone, SFX, or dialogue), Veo3Gen can help you test faster because generations include native, synchronized audio in the same pass—no separate audio step. It also supports image-to-video, text-to-video, and first-and-last-frame control on Veo 3.1, with 16:9/9:16 and 720p/1080p/4K options (4K on Fast/Quality).

Common failure modes (and the exact line to fix)

1) “It only wiggles”

Cause: no observable change.

Prompt fix: replace generic verbs with body-part verbs.

Replace: “moves slightly.”
With: “blinks once, raises eyebrows, then turns head slightly toward camera.”

Also specify camera motion (or locked-off). Veo guidance calls out shot framing and camera motion as useful. (https://deepmind.google/models/veo/prompt-guide/)

2) Rubber faces / identity drift

Cause: too much deformation + too many motions at once.

Prompt fix:

Reduce subject motion to one action.
Add constraints: “Preserve identity and facial structure.”
Keep camera simpler (locked-off or gentle push-in).

If you need continuity across shots, note that Veo 3.1 supports first/last-frame input, and reference-image workflows are covered in Veo 3.1 write-ups. (https://replicate.com/blog/veo-3-1)

3) Drifting backgrounds / melting edges

Cause: camera move + environment move + no constraints.

Prompt fix:

Choose either a camera move or a strong environment motion, not both.
Add: “Keep background stable. No new objects.”
Specify lighting consistency (e.g., “consistent exposure”). (https://deepmind.google/models/veo/prompt-guide/)

4) Overdescribed prompts that fight themselves

Cause: conflicting style adjectives.

Prompt fix: pick one style reference. The Veo guide’s examples work best as discrete choices (e.g., claymation or film noir or worn VHS). (https://deepmind.google/models/veo/prompt-guide/)
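
A quick way to catch stacked styles is to count how many known style references appear in the style line. This is a rough sketch under stated assumptions: the STYLE_REFERENCES tuple only contains the examples quoted in this article, and both function names are hypothetical. Plain substring matching will miss styles that are not in the list.

```python
# Style references quoted in this article (an illustrative, incomplete list).
STYLE_REFERENCES = ("cartoon", "claymation", "film noir", "vhs")


def find_style_references(style_line: str) -> list:
    """Return the known style references mentioned in a style line."""
    text = style_line.lower()
    return [ref for ref in STYLE_REFERENCES if ref in text]


def has_style_conflict(style_line: str) -> bool:
    """True when more than one known style reference is present."""
    return len(find_style_references(style_line)) > 1


print(has_style_conflict("Slightly worn VHS texture."))
print(find_style_references("Claymation film noir with worn VHS texture"))
```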

5) Timing feels wrong (motion happens all at once)

Cause: missing temporal structure.

Prompt fix: add a 3-beat timing line:

“Start X → then Y → end Z, smooth continuous motion.”

Checklist

Input image: one clear subject, clean silhouette, simple background, some negative space
Prompt describes changes over time, not the static scene
Subject motion is observable (blink/turn/pour/lift), not “moves naturally”
Camera behavior is explicit (push-in/pan/locked-off + framing)
Only one environment motion is added (or none)
Style is a single clear reference; lighting is specified (warm even vs spotlight) (https://deepmind.google/models/veo/prompt-guide/)
Constraints protect stability: preserve identity, stable background, no new objects, no text changes

FAQ

How do I avoid the “wiggle” effect with a Veo 3.1 image to video prompt?

Give one measurable subject action (with body-part detail), specify camera behavior, and add a constraints line that protects identity and background stability.

Do I need camera terms like “pan” or “push-in”?

They help because Veo guidance explicitly recommends specifying shot framing and camera motion to steer results. (https://deepmind.google/models/veo/prompt-guide/)

How long should my prompt be?

Long enough to cover motion, camera, timing, lighting/style, and constraints—short enough that it doesn’t contradict itself. The 7 Change Lines are a practical ceiling for beginners.

Can Veo generate dialogue if I include lines in the prompt?

The Veo prompt guide states Veo can generate dialogue, and prompts can provide a topic or specify exact lines for characters to say. (https://deepmind.google/models/veo/prompt-guide/)

What if my prompt gets blocked or filtered?

The Gemini Enterprise Agent Platform guide notes that Veo applies safety filters, and that prompts violating Responsible AI guidelines can be blocked. (https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/video/video-gen-prompt-guide)

Closing CTA: make your iterations cheaper in time, not just in hope

Once you adopt “describe what changes,” your bottleneck becomes iteration: testing motion lines, camera lines, and constraints quickly.

Veo3Gen is an affordable way to access Google’s Veo 3.1 video models without Google’s enterprise pricing, with three modes (Veo 3.1 Fast / Quality / Lite), non-expiring pay-as-you-go credits plus optional monthly plans, free credits for new users, and a developer API for programmatic generation. If you want to iterate prompts rapidly and keep audio in the same generation, start there and run the worked example with two or three single-line variations.

Start creating with Veo3Gen

Veo3Gen gives you affordable Veo 3.1 video generation with native audio, up to 4K, and credits that never expire — with free credits to start.

