The problem: your prompt sounds descriptive—but the model can’t “direct” it

A lot of Veo prompts read like moodboards: “cinematic, cool lighting, stylish product shot, nice camera movement.” They’re descriptive, but they don’t direct.

Veo 3.1 is positioned as moving from simple generation toward creative control, with professional-grade controls and rich synchronous audio—so it rewards prompts that behave like a mini shot plan, not a paragraph of vibes. (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1)

This post isn’t another “magic template.” It’s a 10-item rubric (each item scored 0–2, 20 points total) that you can run in about 60 seconds to diagnose why a shot fails (weak motion, random reframes, style mud, audio mismatch) and decide which single line to add next.

The Veo 3.1 Prompt Rubric (10 points) — score your prompt in 60 seconds

How scoring works:

Each of the 10 items is scored from 0 to 2, for a maximum of 20 points.
- 0 = missing
- 1 = present but vague
- 2 = clear and constrained
Aim for 16+ / 20 before you generate.
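
If you track scores across many prompts, the scoring rule above is simple to automate. Here is a minimal Python sketch, assuming each item is recorded as an integer. The item keys and function names are illustrative and not part of any Veo tooling.

```python
from typing import Dict

# Keys mirror the ten rubric items in this post; the naming is an assumption.
RUBRIC_ITEMS = [
    "subject_id",
    "scene_constraints",
    "continuity",
    "action",
    "physics",
    "timing",
    "composition",
    "camera",
    "style_lighting",
    "audio",
]

TARGET_TOTAL = 16  # "Aim for 16+ / 20 before you generate"


def total_score(scores: Dict[str, int]) -> int:
    """Sum the ten item scores after checking each one is 0, 1, or 2."""
    missing = [item for item in RUBRIC_ITEMS if item not in scores]
    if missing:
        raise ValueError(f"missing rubric items: {missing}")
    for item in RUBRIC_ITEMS:
        if scores[item] not in (0, 1, 2):
            raise ValueError(f"{item} must be 0, 1, or 2, got {scores[item]}")
    return sum(scores[item] for item in RUBRIC_ITEMS)


def ready_to_generate(scores: Dict[str, int]) -> bool:
    """True when the prompt reaches the 16-point target."""
    return total_score(scores) >= TARGET_TOTAL
```

A prompt that scores 2 on every item totals 20 and passes. A prompt with several zeros fails, and its per-item scores show where to add a line.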

The rubric aligns to common prompt layers—camera/lens, subject, action & physics, environment, lighting, style/texture, and audio—similar to the repeatable layered approach many creators use. (https://invideo.io/blog/google-veo-prompt-guide/)

Rubric #1–3: Subject clarity, scene constraints, and “what must stay the same”

1) Subject ID (0–2)

What good looks like: Who/what is on screen, with identifying traits.

Symptom when missing: The “main subject” changes (different person/product), or the model chooses a different hero object.
Fix line to paste: Subject: one [age/gender/role] with [2–3 defining features]; one hero object: [exact product name/color].

2) Scene & environment constraints (0–2)

What good looks like: Place, time-of-day, and a few anchored details.

Symptom when missing: Background drifts (random locations), props appear/disappear.
Fix line to paste: Setting: [specific place], [time of day], with fixed elements: [3 anchors—e.g., oak table, north window, white tile wall].

3) Continuity “must not change” constraints (0–2)

What good looks like: Explicit invariants: wardrobe, logo orientation, object placement.

Symptom when missing: Logos warp, outfits swap, the cup becomes a bottle, etc.
Fix line to paste: Continuity: keep [wardrobe/object colors/logo placement] unchanged; do not change subject identity; no new objects introduced.

Rubric #4–6: Action specificity, motion realism, and timing (0–12s)

4) Action verbs + objective (0–2)

What good looks like: A concrete action with an intention.

Symptom when missing: “Weak motion”—micro-gestures, idle swaying, or a static tableau.
Fix line to paste: Action: [subject] performs [clear verb sequence] to achieve [goal].

5) Physics & interaction detail (0–2)

What good looks like: What touches what; resistance, weight, cause/effect.

Symptom when missing: Floaty hands, objects clipping, unmotivated movement.
Fix line to paste: Physics: hands make firm contact; [object] has visible weight; motion follows real inertia; no impossible bending.

6) Timing & beat map (0–2)

What good looks like: A simple timeline (even for short clips). If you’re targeting a 0–12s clip, give it beats.

Symptom when missing: The “moment” happens too early/late, or nothing resolves.
Fix line to paste: Timing (0–12s): 0–3s establish; 3–8s action; 8–12s payoff/hold on hero frame.
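
A beat map only helps if the beats cover the clip without gaps or overlaps. The sketch below checks that and formats the timing line. It assumes beats are written as (start, end, label) tuples in seconds; `check_beats` and `format_timing_line` are hypothetical helper names.

```python
from typing import List, Tuple

Beat = Tuple[float, float, str]  # (start_s, end_s, label)


def check_beats(beats: List[Beat], clip_length: float = 12.0) -> List[str]:
    """Return a list of problems; an empty list means the beats tile the clip."""
    if not beats:
        return ["no beats given"]
    problems = []
    if beats[0][0] != 0:
        problems.append(f"first beat starts at {beats[0][0]}s, not 0s")
    for start, end, label in beats:
        if end <= start:
            problems.append(f"beat '{label}' has non-positive length")
    for prev, nxt in zip(beats, beats[1:]):
        if prev[1] != nxt[0]:
            problems.append(f"gap or overlap between '{prev[2]}' and '{nxt[2]}'")
    if beats[-1][1] != clip_length:
        problems.append(f"last beat ends at {beats[-1][1]}s, not {clip_length}s")
    return problems


def format_timing_line(beats: List[Beat], clip_length: float = 12.0) -> str:
    """Render the beats as a single 'Timing' prompt line."""
    parts = [f"{start:g}–{end:g}s {label}" for start, end, label in beats]
    return f"Timing (0–{clip_length:g}s): " + "; ".join(parts) + "."


beats = [(0, 3, "establish"), (3, 8, "action"), (8, 12, "payoff/hold on hero frame")]
print(check_beats(beats))        # an empty list: no problems found
print(format_timing_line(beats))
```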

Rubric #7–8: Camera language that prevents random reframes

Veo prompts often improve when you specify composition (single shot, two shot, over-the-shoulder) and framing—because you’re telling the model what the audience should see. (https://replicate.com/blog/veo-3-1)

7) Shot type + composition lock (0–2)

What good looks like: The shot name + what must remain in frame.

Symptom when missing: Random reframes, surprise close-ups, subject cropped strangely.
Fix line to paste: Composition: [single shot/two shot/OTS], [waist-up/close-up/wide], keep [subject + hero object] centered and fully visible.

8) Camera movement + lens (0–2)

What good looks like: One camera move (not three) and a lens/feel.

Symptom when missing: Camera jitters, drifts, or “teleports” between angles.
Fix line to paste: Camera: [slow dolly-in OR static tripod], lens: [24mm/35mm/50mm look], no cuts, no zoom unless specified.

Rubric #9: Style + lighting without conflicts (avoid muddy blends)

9) Style/texture + lighting coherence (0–2)

What good looks like: One visual direction with compatible lighting.

Symptom when missing: Muddy hybrid aesthetics (e.g., “anime photoreal oil painting”), inconsistent lighting across frames.
Fix line to paste: Look: [one style], texture: [one finish], Lighting: [single key source + mood]; avoid mixing multiple art styles.

If you like structured prompting, the “master template” approach—camera/lens → subject → action/physics → setting → lighting → style/texture → audio—helps you spot conflicts quickly. (https://invideo.io/blog/google-veo-prompt-guide/)
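
One way to use that ordering is to sort your labeled prompt lines into it and flag any label that appears twice, since duplicates are a common source of conflicting instructions. This is a rough sketch. The label names follow the fix lines in this post, and placing Composition with camera and Timing with action is an assumption.

```python
from typing import Dict, List, Tuple

# Layer order from the template; Composition and Timing placement is assumed.
LAYER_ORDER = ["Composition", "Camera", "Subject", "Action", "Physics",
               "Timing", "Setting", "Lighting", "Look", "Audio"]


def base_label(line: str) -> str:
    """Extract the label before ':' and drop suffixes like '(0-12s)' or '/Look'."""
    label = line.partition(":")[0]
    return label.split("(")[0].split("/")[0].strip()


def order_prompt(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Return (lines sorted by layer, labels that appear more than once)."""
    rank = {label: i for i, label in enumerate(LAYER_ORDER)}
    counts: Dict[str, int] = {}
    for line in lines:
        counts[base_label(line)] = counts.get(base_label(line), 0) + 1
    duplicates = sorted(label for label, n in counts.items() if n > 1)
    # sorted() is stable, so unknown labels keep their order at the end.
    ordered = sorted(lines, key=lambda l: rank.get(base_label(l), len(LAYER_ORDER)))
    return ordered, duplicates
```

If `duplicates` comes back non-empty (for example two Camera lines), merge them into one before generating.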

Rubric #10: Audio alignment (dialogue, SFX, music) without fighting the visuals

Veo 3.1 is described as having rich synchronous audio, so it’s worth prompting audio as deliberately as visuals. (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1)

10) Audio plan: who/what/when (0–2)

What good looks like: Audio sources that match on-screen events and timing.

Symptom when missing: Audio doesn’t match the action (wrong SFX, awkward timing), or the vibe fights the visuals.
Fix line to paste: Audio: [ambience]; SFX synced to actions ([list]); music: [tempo/mood], mixed [level] below SFX and dialogue; dialogue (if any): “...” in [tone].

Note: The Google Cloud post includes a quote that highlights lifelike lip-sync as part of Veo 3.1’s perceived quality in real use. If you rely on dialogue, be extra explicit about when the line is spoken and what the speaker’s mouth should be doing. (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1)

3 worked examples (bad → scored → fixed)

Example 1: UGC-style product demo (skincare)

Bad prompt:

“A creator shows a skincare serum in a bathroom, aesthetic, cinematic, nice lighting, satisfying.”

Quick score (out of 20): 1)0 2)1 3)0 4)1 5)0 6)0 7)0 8)0 9)1 10)0 → 3/20

Fixed prompt (single cohesive version):

Composition: single shot, waist-up, keep creator and bottle fully visible, bottle label facing camera.
Camera: static tripod, 35mm look, no cuts.
Subject: one creator with tied-back hair, neutral robe; hero object: 30ml amber serum bottle with white dropper.
Setting: small modern bathroom, morning light, fixed anchors: round mirror, white tile wall, oak counter.
Action: she picks up the bottle, unscrews dropper, dispenses 3 drops onto fingertips, applies to cheeks, then holds bottle to camera for 2-second hero hold.
Physics: visible liquid drops; fingers press skin naturally; bottle remains upright.
Timing (0–12s): 0–3s establish; 3–9s dispense + apply; 9–12s hero hold.
Lighting: soft window key from the left, gentle shadow, clean highlights on glass.
Look: natural UGC, realistic skin texture, minimal color grading.
Audio: bathroom room tone, subtle bottle cap twist SFX synced, quiet upbeat music under, no dialogue.

Example 2: Creator B-roll shot (coffee shop laptop)

Bad prompt:

“B-roll of someone working in a cozy cafe, cinematic bokeh, smooth camera movement.”

What breaks: weak objective (action), drifting camera, random inserts.

Fixed prompt (focus on Action + Camera + Constraints):

Composition: over-the-shoulder shot; keep laptop screen edge and hands in frame.
Camera: slow dolly-in only, 50mm look, no zoom, no cuts.
Subject: one person in denim jacket, hands typing.
Setting: cozy cafe, rainy window behind, anchors: ceramic mug on right, notebook on left, warm pendant light overhead.
Action: continuous typing, then deliberate pause to underline one sentence in notebook, then sip from mug.
Physics: steam rises from mug; sip motion is natural; pen contacts paper.
Timing (0–12s): 0–6s typing; 6–9s underline; 9–12s sip + hold.
Lighting/Look: warm tungsten interior with cool window fill; realistic filmic contrast.
Audio: cafe ambience, soft rain on window, mug set-down clink synced.

Example 3: Talking character with synced audio + SFX (announcement)

Bad prompt:

“A person talks to camera announcing a new feature, with background music.”

Common failure: the line lands, but mouth timing and SFX cues feel off.

Fixed prompt (make audio and beats explicit):

Composition: single shot, chest-up, eye-level, keep face centered.
Camera: static tripod, 35mm look, no reframing.
Subject: one spokesperson, clear facial features, calm expression.
Setting: simple studio backdrop, soft gradient, anchors: small desk mic at frame bottom.
Timing (0–12s): 0–2s silent breath + smile; 2–9s speaks line; 9–12s pause + nod.
Dialogue (2–9s): “Today we’re launching the update you’ve been waiting for.” spoken clearly, upbeat but controlled.
SFX: subtle UI chime synced to the word “launching,” early in the 2–9s dialogue window.
Music: light pulse bed, low volume under dialogue; no overpowering bass.
Lighting/Look: soft key, clean catchlights, realistic texture.
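
Before pinning an SFX cue to a spoken word, it can help to estimate roughly when that word falls inside the dialogue window. The sketch below assumes evenly paced words, which real delivery will not match exactly. `estimate_word_time` is a hypothetical helper, not part of any Veo API.

```python
PUNCTUATION = ".,!?;:\"'“”"


def estimate_word_time(line: str, word: str, start_s: float, end_s: float) -> float:
    """Estimate when `word` begins, assuming every word takes equal time.

    Raises ValueError if the word does not appear in the line.
    """
    words = [w.strip(PUNCTUATION).lower() for w in line.split()]
    index = words.index(word.lower())
    seconds_per_word = (end_s - start_s) / len(words)
    return start_s + index * seconds_per_word


line = "Today we’re launching the update you’ve been waiting for."
cue = estimate_word_time(line, "launching", start_s=2.0, end_s=9.0)
print(f"'launching' starts near {cue:.1f}s")  # third of nine words: about 3.6s
```

If the estimate and the cue time in your prompt differ by more than a second or so, adjust one of them, or sync the cue to the word rather than to a fixed timestamp.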

A one-change-at-a-time iteration loop (use the rubric without chasing your tail)

When a generation misses, resist rewriting everything. Use the rubric like a debugging checklist:

Score the output against the 10 items.
Identify the lowest-scoring single item that would most improve the shot.
Add one fix line (from that item).
Re-generate variants, keeping the rest stable.
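
The loop above can be sketched in a few lines of Python. This version simply picks the lowest-scoring item and breaks ties by rubric order, which is an assumption. In practice you would still judge which low item matters most for the shot. All names here are illustrative.

```python
from typing import Dict, List, Tuple

RUBRIC_ORDER = ["subject_id", "scene_constraints", "continuity", "action",
                "physics", "timing", "composition", "camera",
                "style_lighting", "audio"]


def pick_item_to_fix(scores: Dict[str, int]) -> str:
    """Return the lowest-scoring item; min() keeps the first one on ties."""
    return min(RUBRIC_ORDER, key=lambda item: scores[item])


def apply_one_fix(prompt_lines: List[str],
                  scores: Dict[str, int],
                  fix_lines: Dict[str, str]) -> Tuple[str, List[str]]:
    """Add exactly one fix line and leave the rest of the prompt unchanged."""
    item = pick_item_to_fix(scores)
    return item, prompt_lines + [fix_lines[item]]
```

Returning a new list instead of editing the old one keeps the previous prompt available, so each variant differs from the last by one line.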

If your workflow supports references, note that Veo 3.1 prompting guidance mentions character reference images and first/last frame input, and also describes “reference to video” generation using up to three reference images to guide a coherent scene. (https://replicate.com/blog/veo-3-1)

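Printable mini-checklist: the 10 lines to paste into your next prompt

Subject: one [age/gender/role] with [2–3 defining features]; one hero object: [exact product name/color].
Setting: [specific place], [time of day], with fixed elements: [3 anchors, e.g., oak table, north window, white tile wall].
Continuity: keep [wardrobe/object colors/logo placement] unchanged; do not change subject identity; no new objects introduced.
Action: [subject] performs [clear verb sequence] to achieve [goal].
Physics: hands make firm contact; [object] has visible weight; motion follows real inertia; no impossible bending.
Timing (0–12s): 0–3s establish; 3–8s action; 8–12s payoff/hold on hero frame.
Composition: [single shot/two shot/OTS], [waist-up/close-up/wide], keep [subject + hero object] centered and fully visible.
Camera: [slow dolly-in OR static tripod], lens: [24mm/35mm/50mm look], no cuts, no zoom unless specified.
Look: [one style], texture: [one finish], Lighting: [single key source + mood]; avoid mixing multiple art styles.
Audio: [ambience]; SFX synced to actions ([list]); music: [tempo/mood], mixed [level] below SFX and dialogue; dialogue (if any): “...” in [tone].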
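
Because the fix lines use [bracketed] placeholders, it is easy to paste one and forget to fill it in. A small check like the following can catch that before you generate. It assumes your finished prompt contains no intentional square brackets, and `unfilled_placeholders` is a hypothetical helper name.

```python
import re
from typing import List

# Matches any [bracketed] span that contains no nested brackets.
PLACEHOLDER = re.compile(r"\[[^\[\]]+\]")


def unfilled_placeholders(prompt: str) -> List[str]:
    """Return every [placeholder] still present in the prompt text."""
    return PLACEHOLDER.findall(prompt)


draft = "Camera: [slow dolly-in OR static tripod], lens: 35mm look, no cuts."
print(unfilled_placeholders(draft))  # ['[slow dolly-in OR static tripod]']
```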

FAQ

Does Veo 3.1 support audio prompts?

A Google Cloud post describes Veo 3.1 as having rich synchronous audio, so specifying dialogue/SFX/ambience is a reasonable best practice. (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1)

Should I describe shot types like “two shot” or “over-the-shoulder”?

Yes—Veo 3.1 prompting guidance recommends specifying composition/framing and even uses terms like single shot, two shot, and over-the-shoulder shot. (https://replicate.com/blog/veo-3-1)

How do I reduce prompt drift?

Add explicit constraints: continuity (“must not change”), fixed environment anchors, and composition locks. Then iterate by changing one rubric line at a time.

Is Veo 3.1 production-ready?

A Google Cloud blog post states Veo 3.1 is stable and generally available for production on Vertex AI. (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1)

If you only change one thing

Prioritize Action, Camera, and Constraints, and start with whichever scores lowest:

Action: give a verb sequence with a payoff.
Camera: pick one shot type and one movement.
Constraints: state what must stay the same (identity, hero object, logo, anchors).

Those three fixes address the most common causes of “weak motion,” “vague shots,” and “random reframes” before you spend another generation.

CTA: Generate, evaluate, iterate (with tooling that fits your workflow)

If you’re building an app or pipeline around repeatable prompt testing, explore the Veo3Gen API to programmatically run variants and store your rubric scores alongside outputs: /api.
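
However you generate the clips, storing rubric scores next to each output can be as simple as appending JSON lines to a log file. The sketch below makes no assumptions about the Veo3Gen API itself. `output_ref` is a hypothetical field holding whatever path or URL your generator returns, and the record layout is illustrative.

```python
import json
from pathlib import Path
from typing import Dict


def record_variant(log_path: Path, prompt: str, output_ref: str,
                   scores: Dict[str, int]) -> None:
    """Append one generation result and its rubric scores as a JSON line."""
    record = {
        "prompt": prompt,
        "output_ref": output_ref,  # hypothetical: file path or URL of the clip
        "scores": scores,
        "total": sum(scores.values()),
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def best_variant(log_path: Path) -> Dict:
    """Return the logged record with the highest total score."""
    with log_path.open(encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    return max(records, key=lambda r: r["total"])
```

An append-only log keeps every variant, so you can later compare which single-line change raised the score.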

For teams scaling creators, campaigns, or batch video generation, see plans and throughput options here: /pricing.


Veo 3.1 Prompting “Rubric”: A 10-Point Self-Check to Fix Weak Motion, Vague Shots, and Audio Mismatch (as of 2026-03-08)
