
Text-to-Video Prompts That Actually Move: The “Subject → Action → Environment → Camera” Template for Veo3Gen (as of 2026-02-15)

A beginner-friendly text-to-video prompt template for Veo3Gen: Subject → Action → Environment → Camera, plus a motion library, examples, and fixes.


Why your text-to-video looks like a slideshow (the missing “Action” layer)

A lot of first-time text-to-video prompts read like still-photo descriptions: who or what is in frame, what it looks like, where it is. That’s a scene.

But video models need something else: change over time.

Runway’s prompting guide breaks effective prompts into two essentials: visual descriptions (what we see, where, and how it looks) and motion descriptions (how the scene moves and behaves). (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)

When motion is missing—or vague—you often get “pretty but stiff” clips: a subject hovering in place, tiny accidental jitter, or random movement that doesn’t match your intent.

The fix isn’t complicated: write motion beats.

  • Static scene description: “A barista in a sunlit café, cinematic.”
  • Motion beats (video description): “A barista wipes the counter, then turns to pour steamed milk; steam drifts upward; the camera slowly pushes in.”

That second version tells the model what changes, in what direction, and what to focus on.

The 4-part Veo3Gen prompt template: Subject → Action → Environment → Camera

You can write prompts in many ways. Runway notes that exact structure and ordering matter less than clarity and reducing ambiguity. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)

Still, beginners write faster (and iterate cleaner) with one consistent scaffold.

FlexClip suggests a formula that includes Subject + Action + Scene + (Camera Movement + Lighting + Style), and emphasizes that Action drives the storyline and should be clear and concise. (https://help.flexclip.com/en/articles/10326783-how-to-write-effective-text-prompts-to-generate-ai-videos)

Here’s the Veo3Gen-friendly version I recommend as of 2026-02-15:

Copy/paste prompt template (fill-in-the-blanks)

[SUBJECT] — Who/what is the main focus? Be specific (age/role/object, key traits).
[ACTION] — What changes over time? Use 1–3 motion beats in sequence (do X, then Y).
[ENVIRONMENT] — Where is it? Add 2–4 concrete details (time of day, weather, props).
[CAMERA] — How do we film it? One camera move + framing + optional lens/DOF.
[STYLE + LIGHTING] — Optional: mood, lighting, and style cues (keep it short).
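
If it helps to keep the five slots separate while you iterate, here is a minimal Python sketch of the same template. The ShotPrompt class and its build() helper are purely illustrative; they are not part of any Veo3Gen SDK.

```python
from dataclasses import dataclass

@dataclass
class ShotPrompt:
    """One fill-in-the-blanks prompt: Subject → Action → Environment → Camera (+ Style)."""
    subject: str        # who/what is the main focus
    action: str         # 1–3 motion beats, in sequence
    environment: str    # where, with 2–4 concrete details
    camera: str         # one camera move + framing, optional lens/DOF
    style: str = ""     # optional: mood, lighting, style cues

    def build(self) -> str:
        # Join non-empty slots into one prompt, one sentence per slot.
        slots = [self.subject, self.action, self.environment, self.camera, self.style]
        return " ".join(s.strip().rstrip(".") + "." for s in slots if s.strip())

prompt = ShotPrompt(
    subject="A barista in a small sunlit café",
    action="wipes the counter, then turns to pour steamed milk; steam drifts upward",
    environment="morning light through a front window, a few empty tables, a chalkboard menu",
    camera="medium shot, slow push-in, shallow depth of field",
    style="soft warm window light",
)
print(prompt.build())
```

Keeping the slots as separate fields makes it easy to swap a single Action or Camera line without retyping the rest of the prompt.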

How to keep it model-friendly

  • Keep each slot to one short, concrete sentence; natural language tends to work better than keyword stacks.
  • Limit Action to 1–3 beats and Camera to one move per shot.
  • Treat Style + Lighting as optional seasoning; skip it if the clip already reads the way you want.

A plug-and-play motion library (12 actions that read well to video models)

When you’re stuck, borrow motions that are easy to visualize and easy to animate.

Micro-movements (subtle, “alive” motion)

  1. breath visible in chest/shoulders
  2. eyes track something off-camera
  3. fingers tap / fidget with an object
  4. fabric shifts as they turn

Character actions (clear, story-driving motion)

  1. walks toward camera, then stops
  2. turns to look over their shoulder
  3. picks up [object], examines it, sets it down
  4. opens [door/laptop/box], reacts with a smile

Environmental motion (movement in the world)

  1. steam/smoke drifts upward
  2. wind moves hair and nearby leaves
  3. neon sign flickers; reflections ripple on wet ground
  4. dust motes float through a sunbeam

Camera moves (direct attention without “random cuts”)

  1. slow push-in (dolly in)
  2. gentle handheld sway (subtle)
  3. orbit 15–30° around the subject
  4. tilt down from sign to subject

Tip: choose one primary motion (character or object) and one supporting motion (environment or camera). That combination often feels “cinematic” without chaos.
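
The library also works well as plain lists in code, paired one primary motion plus one supporting motion at a time. This is a small illustrative sketch: the list entries come from the library above, and the pick_motion_pair helper is my own naming.

```python
import random

# Motion options taken from the library above.
CHARACTER_ACTIONS = [
    "walks toward camera, then stops",
    "turns to look over their shoulder",
    "picks up the object, examines it, sets it down",
    "opens the box, reacts with a smile",
]
SUPPORTING_MOTION = [
    # environmental motion
    "steam drifts upward",
    "wind moves hair and nearby leaves",
    "a neon sign flickers; reflections ripple on wet ground",
    "dust motes float through a sunbeam",
    # camera moves
    "the camera slowly pushes in",
    "a gentle handheld sway",
    "the camera orbits about 20 degrees around the subject",
    "the camera tilts down from a sign to the subject",
]

def pick_motion_pair(seed=None):
    """One primary motion (character) plus one supporting motion (environment or camera)."""
    rng = random.Random(seed)
    return rng.choice(CHARACTER_ACTIONS), rng.choice(SUPPORTING_MOTION)

primary, supporting = pick_motion_pair(seed=7)
print(f"Primary: {primary} | Supporting: {supporting}")
```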

Camera & lens phrases that improve realism (without over-directing)

You don’t need a film-school spec sheet. Add just enough to guide framing and movement.

Simple camera recipe

  • Framing: close-up / medium shot / wide shot
  • Move: slow push-in / pan left / orbit slightly / static tripod
  • Focus: shallow depth of field / subject in sharp focus

Examples:

  • “Medium shot, slow push-in, shallow depth of field; background softly blurred.”
  • “Wide shot, static tripod, subject centered; gentle wind in trees.”

If you’re unsure, keep camera direction minimal. Runway recommends starting simple—focus on the most critical visual and motion components—then refine. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
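
If you assemble prompts in code, the same framing + move + focus recipe collapses into one tiny helper. This is an illustrative sketch only; camera_line and the vocabulary it uses are not tied to any specific model or SDK.

```python
def camera_line(framing: str, move: str = "static tripod", focus: str = "") -> str:
    """Combine one framing, one move, and (optionally) one focus cue into a camera clause."""
    parts = [framing, move] + ([focus] if focus else [])
    return ", ".join(parts) + "."

print(camera_line("medium shot", "slow push-in", "shallow depth of field"))
# -> medium shot, slow push-in, shallow depth of field.
```

The "static tripod" default mirrors the advice above: when in doubt, keep the camera simple.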

Style + lighting: how to add flavor without breaking adherence

Style is seasoning. Action is the meal.

FlexClip highlights lighting as a major driver of mood and depth, and suggests lighting descriptions should support atmosphere and emotion. (https://help.flexclip.com/en/articles/10326783-how-to-write-effective-text-prompts-to-generate-ai-videos)

Keep style short and concrete

Good:

  • “soft morning window light, warm tones”
  • “overcast daylight, muted colors”
  • “neon night lighting, high contrast reflections”

Risky (often causes drift):

  • “in the style of [famous director/brand]”
  • “hyper-detailed masterpiece, award-winning”
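
A small self-check can catch the risky phrases above before you spend credits. The sketch below uses only the examples from this section as its phrase list; risky_style_terms is a hypothetical helper, so extend the list with whatever tends to cause drift in your own runs.

```python
# Style phrases this article flags as drift-prone; extend with your own.
RISKY_STYLE_PHRASES = [
    "in the style of",
    "masterpiece",
    "award-winning",
    "hyper-detailed",
]

def risky_style_terms(prompt: str) -> list[str]:
    """Return any drift-prone style phrases found in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    return [phrase for phrase in RISKY_STYLE_PHRASES if phrase in lowered]

flags = risky_style_terms("Neon night lighting, hyper-detailed masterpiece, high contrast")
if flags:
    print("Consider removing:", ", ".join(flags))
```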

3 complete prompt examples

These are written to be copied directly, then customized.

1) UGC-style product demo (single shot)

Goal: feels like a creator demo, not a glossy ad.

A young adult creator’s hands holding a compact insulated water bottle with a matte finish and a simple logo.
They twist the lid open, tilt it toward the camera to show the seal, then pour ice water into a clear glass; condensation forms on the bottle.
Bright kitchen counter near a window, morning light, a cutting board and lemon slices in the background, subtle steam from a nearby mug.
Medium close-up, slight handheld sway, slow push-in as the pour starts, shallow depth of field.
Natural, casual UGC look, soft warm window light.

2) Creator talking-head B-roll alternative (no “talking head”)

Goal: channel “personal brand” with motion beats and visual hooks.

A confident solo creator in a tidy home studio with a desk, laptop, and a small LED light panel in the background.
They step into frame, pull a sticky note from the monitor, write one bold word, and place it on the laptop; they nod and smile as they sit.
Cozy studio environment, late afternoon, warm practical lamp on, subtle dust motes in the light beam.
Medium shot, static tripod, then a gentle 20° orbit to the right as they sit, subject stays in sharp focus.
Clean modern look, warm cinematic lighting, calm and motivational.

3) Small local service promo (trust-building, neighborhood vibe)

Goal: show service action + environment motion.

A friendly local plumber in a clean uniform kneeling beside a kitchen sink with a small toolbox.
They tighten a fitting under the sink, wipe their hands with a cloth, then turn the faucet on to show the water running smoothly; a small drip stops completely.
Suburban kitchen, daylight, a plant by the window, light breeze moving the curtain slightly.
Wide-to-medium framing, slow push-in from doorway toward the sink, steady camera, natural depth of field.
Bright, realistic lighting, clean and trustworthy tone.
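
Each of these examples decomposes cleanly back into the five template slots, which makes them easy to reuse as data. Below, example 3 is stored as a plain dictionary (an illustration only) so you can swap a single slot, say a different trade or a different room, and regenerate.

```python
# Example 3 (the local service promo) broken back into the five slots.
plumber_promo = {
    "subject": "A friendly local plumber in a clean uniform kneeling beside a kitchen sink with a small toolbox.",
    "action": ("They tighten a fitting under the sink, wipe their hands with a cloth, then turn the faucet on "
               "to show the water running smoothly; a small drip stops completely."),
    "environment": "Suburban kitchen, daylight, a plant by the window, light breeze moving the curtain slightly.",
    "camera": ("Wide-to-medium framing, slow push-in from doorway toward the sink, steady camera, "
               "natural depth of field."),
    "style": "Bright, realistic lighting, clean and trustworthy tone.",
}

# Reassemble in template order; swap any single slot to generate a variant.
prompt = " ".join(plumber_promo[slot] for slot in ("subject", "action", "environment", "camera", "style"))
print(prompt)
```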

Quick iteration loop: 3 prompts that progressively add detail (save credits)

A practical workflow is: simple → clearer motion → better camera/style.

Runway explicitly recommends starting with a simple prompt focused on the most critical visual and motion components, then adding detail to refine. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)

Iteration 1: The core shot

  • Subject + one action + basic environment

Iteration 2: Motion beats

  • Add “then…” sequence + one environmental motion

Iteration 3: Camera + lighting

  • Add one camera move + simple lighting mood

This keeps your intent stable while you discover what the model “hears” reliably.
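
In code, the three iterations are just three increasingly specific prompt strings run through the same loop. The sketch below is illustrative: generate_clip is a placeholder, not a real Veo3Gen call, so wire it to whatever generation client you actually use.

```python
# Three passes on the same shot: core → motion beats → camera + lighting.
iterations = [
    # Iteration 1: the core shot
    "A barista in a small café pours steamed milk into a cup.",
    # Iteration 2: motion beats
    ("A barista wipes the counter, then pours steamed milk into a cup; "
     "steam drifts upward."),
    # Iteration 3: camera + lighting
    ("A barista wipes the counter, then pours steamed milk into a cup; steam drifts upward. "
     "Medium shot, slow push-in, shallow depth of field. Soft morning window light, warm tones."),
]

def generate_clip(prompt: str) -> None:
    # Placeholder: swap in your actual generation call (SDK, HTTP client, etc.).
    print(f"Would generate: {prompt}")

for step, prompt in enumerate(iterations, start=1):
    print(f"--- Iteration {step} ---")
    generate_clip(prompt)
```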

Mini checklist: what to change first

  • If motion is ignored: rewrite Action with fewer verbs, more specific verbs, and a clear sequence (“does X, then Y”).
  • If the subject drifts/changes: repeat the subject identity once (“same person/object throughout”), remove extra characters/props.
  • If unwanted elements appear: simplify Environment and Style; remove “busy” nouns that invite new objects.
  • If camera becomes chaotic: specify “single continuous shot” and only one camera move.

Troubleshooting: 7 common failure modes (and quick fixes)

1) Stiff motion

Fix: swap generic verbs (“moving,” “doing stuff”) for visible actions (pour, twist, step, turn). Keep it to 1–3 beats.

2) Random cuts or scene jumps

Fix: say “single continuous shot” and remove multi-location cues.

3) The model adds extra people

Fix: specify “one person only” and reduce crowd-like environment words (festival, party, busy street).

4) The subject morphs mid-clip

Fix: restate stable identifiers (color, clothing, shape) and avoid competing descriptors.

5) Camera ignores your instruction

Fix: use simpler camera language and fewer simultaneous moves. Natural language tends to give more control than keyword stacks. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)

6) The background steals attention from the subject

Fix: explicitly assign focus: “subject in sharp focus, background softly blurred.”

7) The vibe is right, but it feels flat

Fix: add one environmental motion (steam, wind, reflections) plus lighting mood. Lighting can strongly affect mood and depth. (https://help.flexclip.com/en/articles/10326783-how-to-write-effective-text-prompts-to-generate-ai-videos)

FAQ

Do I have to write prompts in this exact order?

No. Clear intent matters more than strict ordering; reducing ambiguity is the priority. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)

Should I use keywords or full sentences?

Both can work, but natural language often provides more control. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)

How much motion should I include?

Usually 1–3 action beats is enough for a single shot. FlexClip frames “Action” as the core driver of the storyline—so keep it clear and concise. (https://help.flexclip.com/en/articles/10326783-how-to-write-effective-text-prompts-to-generate-ai-videos)

If I leave out details, will the model fail?

Not necessarily. Omitting components can give the model creative freedom, which is useful when you don’t care about specifics. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)

Next step: generate at scale with Veo3Gen

Once you have a repeatable prompt template, the real advantage is consistent iteration—testing variants, swapping action beats, and producing multiple cuts for different placements.

  • Explore programmatic generation and automation in the Veo3Gen API: /api
  • See plans and usage options here: /pricing
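
If you do move to programmatic generation, the template slots map naturally onto a request payload. The sketch below is an assumption-heavy illustration: the endpoint URL, field names, and bearer-token auth shown here are placeholders, not the documented Veo3Gen API, so check the /api reference for the real contract.

```python
import json
import urllib.request

# Placeholder values: the real endpoint, schema, and auth live in the Veo3Gen /api docs.
API_URL = "https://example.com/api/generate"
API_KEY = "YOUR_API_KEY"

payload = {
    # Hypothetical field names for illustration only.
    "prompt": ("A barista wipes the counter, then pours steamed milk; steam drifts upward. "
               "Medium shot, slow push-in, shallow depth of field. Soft warm window light."),
    "duration_seconds": 8,
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)

# Uncomment once the URL, schema, and key match the real API:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```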

Sources

  • Runway, Text-to-Video Prompting Guide: https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide
  • FlexClip, How to Write Effective Text Prompts to Generate AI Videos: https://help.flexclip.com/en/articles/10326783-how-to-write-effective-text-prompts-to-generate-ai-videos
