Text-to-Video Prompts That Actually Move: The “Subject → Action → Environment → Camera” Template for Veo3Gen (as of 2026-02-15)
A beginner-friendly text-to-video prompt template for Veo3Gen: Subject → Action → Environment → Camera, plus a motion library, examples, and fixes.
On this page
- Why your text-to-video looks like a slideshow (the missing “Action” layer)
- The 4-part Veo3Gen prompt template: Subject → Action → Environment → Camera
- Copy/paste prompt template (fill-in-the-blanks)
- How to keep it model-friendly
- A plug-and-play motion library (12 actions that read well to video models)
- Micro-movements (subtle, “alive” motion)
- Character actions (clear, story-driving motion)
- Environmental motion (movement in the world)
- Camera moves (direct attention without “random cuts”)
- Camera & lens phrases that improve realism (without over-directing)
- Simple camera recipe
- Style + lighting: how to add flavor without breaking adherence
- Keep style short and concrete
- 3 complete prompt examples
- 1) UGC-style product demo (single shot)
- 2) Creator talking-head B-roll alternative (no “talking head”)
- 3) Small local service promo (trust-building, neighborhood vibe)
- Quick iteration loop: 3 prompts that progressively add detail (save credits)
- Iteration 1: The core shot
- Iteration 2: Motion beats
- Iteration 3: Camera + lighting
- Mini checklist: what to change first
- Troubleshooting: 7 common failure modes (and quick fixes)
- 1) Stiff motion
- 2) Random cuts or scene jumps
- 3) The model adds extra people
- 4) The subject morphs mid-clip
- 5) Camera ignores your instruction
- 6) The background steals the attention
- 7) The vibe is right, but it feels flat
- FAQ
- Do I have to write prompts in this exact order?
- Should I use keywords or full sentences?
- How much motion should I include?
- If I leave out details, will the model fail?
- Next step: generate at scale with Veo3Gen
Why your text-to-video looks like a slideshow (the missing “Action” layer)
A lot of first-time text-to-video prompts read like a still photo description: who/what is in frame, what it looks like, where it is. That’s a scene.
But video models need something else: change over time.
Runway’s prompting guide breaks effective prompts into two essentials: visual descriptions (what we see, where, and how it looks) and motion descriptions (how the scene moves and behaves). (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
When motion is missing—or vague—you often get “pretty but stiff” clips: a subject hovering in place, tiny accidental jitter, or random movement that doesn’t match your intent.
The fix isn’t complicated: write motion beats.
- Static scene description: “A barista in a sunlit café, cinematic.”
- Motion beats (video description): “A barista wipes the counter, then turns to pour steamed milk; steam drifts upward; the camera slowly pushes in.”
That second version tells the model what changes, in what direction, and what to focus on.
The 4-part Veo3Gen prompt template: Subject → Action → Environment → Camera
You can write prompts in many ways. Runway notes that structure and order matter less than clarity and reducing ambiguity. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
Still, beginners write faster (and iterate cleaner) with one consistent scaffold.
FlexClip suggests a formula that includes Subject + Action + Scene + (Camera Movement + Lighting + Style), and emphasizes that Action drives the storyline and should be clear and concise. (https://help.flexclip.com/en/articles/10326783-how-to-write-effective-text-prompts-to-generate-ai-videos)
Here’s the Veo3Gen-friendly version I recommend as of 2026-02-15:
Copy/paste prompt template (fill-in-the-blanks)
[SUBJECT] — Who/what is the main focus? Be specific (age/role/object, key traits).
[ACTION] — What changes over time? Use 1–3 motion beats in sequence (do X, then Y).
[ENVIRONMENT] — Where is it? Add 2–4 concrete details (time of day, weather, props).
[CAMERA] — How do we film it? One camera move + framing + optional lens/DOF.
[STYLE + LIGHTING] — Optional: mood, lighting, and style cues (keep it short).
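If you end up generating many variants, the fill-in template above is easy to mechanize. A minimal sketch: the `build_prompt` helper and its field names are illustrative, not part of any Veo3Gen API, and the joining logic simply skips optional slots you leave empty.

```python
def build_prompt(subject, action, environment, camera="", style=""):
    """Assemble the 5-part template into one prompt string.

    Optional parts (camera, style) are skipped when left empty,
    since omitting slots gives the model room to improvise.
    """
    parts = [subject, action, environment, camera, style]
    return " ".join(p.strip().rstrip(".") + "." for p in parts if p.strip())

prompt = build_prompt(
    subject="A barista in a sunlit café",
    action="wipes the counter, then turns to pour steamed milk",
    environment="morning light, steam drifting upward from the cup",
    camera="medium shot, slow push-in, shallow depth of field",
)
print(prompt)
```

Each slot becomes its own sentence, which keeps the prompt readable when you swap a single component between iterations.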
How to keep it model-friendly
- Prioritize clarity over cleverness. Natural language often gives more control than keyword piles. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
- Don’t overstuff. You don’t need every component; leaving parts out can give the model room to solve details creatively. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
- Write one shot at a time. Describing a single shot is easier than cramming “a whole commercial” into one prompt.
A plug-and-play motion library (12 actions that read well to video models)
When you’re stuck, borrow motions that are easy to visualize and easy to animate.
Micro-movements (subtle, “alive” motion)
- breath visible in chest/shoulders
- eyes track something off-camera
- fingers tap / fidget with an object
- fabric shifts as they turn
Character actions (clear, story-driving motion)
- walks toward camera, then stops
- turns to look over their shoulder
- picks up [object], examines it, sets it down
- opens [door/laptop/box], reacts with a smile
Environmental motion (movement in the world)
- steam/smoke drifts upward
- wind moves hair and nearby leaves
- neon sign flickers; reflections ripple on wet ground
- dust motes float through a sunbeam
Camera moves (direct attention without “random cuts”)
- slow push-in (dolly in)
- gentle handheld sway (subtle)
- orbit 15–30° around the subject
- tilt down from sign to subject
Tip: choose one primary motion (character or object) and one supporting motion (environment or camera). That combination often feels “cinematic” without chaos.
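The “one primary + one supporting motion” pairing above is easy to sample programmatically when brainstorming. A hypothetical sketch, using small illustrative subsets of the motion library (the lists and `motion_beat` helper are assumptions, not a Veo3Gen feature):

```python
import random

# Illustrative subsets of the motion library above.
PRIMARY = [
    "picks up the bottle, examines it, sets it down",
    "walks toward camera, then stops",
    "turns to look over their shoulder",
]
SUPPORTING = [
    "steam drifts upward",
    "wind moves hair and nearby leaves",
    "the camera slowly pushes in",
]

def motion_beat(rng=random):
    """Pair one primary (character/object) motion with one supporting
    (environment/camera) motion, per the tip above."""
    return f"{rng.choice(PRIMARY)}; {rng.choice(SUPPORTING)}"

print(motion_beat())
```

Sampling a handful of pairings is a quick way to find combinations that feel cinematic without over-choreographing the shot.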
Camera & lens phrases that improve realism (without over-directing)
You don’t need a film-school spec sheet. Add just enough to guide framing and movement.
Simple camera recipe
- Framing: close-up / medium shot / wide shot
- Move: slow push-in / pan left / orbit slightly / static tripod
- Focus: shallow depth of field / subject in sharp focus
Examples:
- “Medium shot, slow push-in, shallow depth of field; background softly blurred.”
- “Wide shot, static tripod, subject centered; gentle wind in trees.”
If you’re unsure, keep camera direction minimal. Runway recommends starting simple—focus on the most critical visual and motion components—then refine. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
Style + lighting: how to add flavor without breaking adherence
Style is seasoning. Action is the meal.
FlexClip highlights lighting as a major driver of mood and depth, and suggests lighting descriptions should support atmosphere and emotion. (https://help.flexclip.com/en/articles/10326783-how-to-write-effective-text-prompts-to-generate-ai-videos)
Keep style short and concrete
Good:
- “soft morning window light, warm tones”
- “overcast daylight, muted colors”
- “neon night lighting, high contrast reflections”
Risky (often causes drift):
- “in the style of [famous director/brand]”
- “hyper-detailed masterpiece, award-winning”
3 complete prompt examples
These are written to be copied directly, then customized.
1) UGC-style product demo (single shot)
Goal: feels like a creator demo, not a glossy ad.
A young adult creator’s hands holding a compact insulated water bottle with a matte finish and a simple logo.
They twist the lid open, tilt it toward the camera to show the seal, then pour ice water into a clear glass; condensation forms on the bottle.
Bright kitchen counter near a window, morning light, a cutting board and lemon slices in the background, subtle steam from a nearby mug.
Medium close-up, slight handheld sway, slow push-in as the pour starts, shallow depth of field.
Natural, casual UGC look, soft warm window light.
2) Creator talking-head B-roll alternative (no “talking head”)
Goal: channel “personal brand” with motion beats and visual hooks.
A confident solo creator in a tidy home studio with a desk, laptop, and a small LED light panel in the background.
They step into frame, pull a sticky note from the monitor, write one bold word, and place it on the laptop; they nod and smile as they sit.
Cozy studio environment, late afternoon, warm practical lamp on, subtle dust motes in the light beam.
Medium shot, static tripod, then a gentle 20° orbit to the right as they sit, subject stays in sharp focus.
Clean modern look, warm cinematic lighting, calm and motivational.
3) Small local service promo (trust-building, neighborhood vibe)
Goal: show service action + environment motion.
A friendly local plumber in a clean uniform kneeling beside a kitchen sink with a small toolbox.
They tighten a fitting under the sink, wipe their hands with a cloth, then turn the faucet on to show the water running smoothly; a small drip stops completely.
Suburban kitchen, daylight, a plant by the window, light breeze moving the curtain slightly.
Wide-to-medium framing, slow push-in from doorway toward the sink, steady camera, natural depth of field.
Bright, realistic lighting, clean and trustworthy tone.
Quick iteration loop: 3 prompts that progressively add detail (save credits)
A practical workflow is: simple → clearer motion → better camera/style.
Runway explicitly recommends starting with a simple prompt focused on the most critical visual and motion components, then adding detail to refine. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
Iteration 1: The core shot
- Subject + one action + basic environment
Iteration 2: Motion beats
- Add “then…” sequence + one environmental motion
Iteration 3: Camera + lighting
- Add one camera move + simple lighting mood
This keeps your intent stable while you discover what the model “hears” reliably.
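The three iterations can be expressed as layered strings that share one stable core, so each generation only changes one thing. A minimal sketch; the prompt text is the plumber example from above, and the layering approach is an assumption, not an official workflow:

```python
# Keep the core shot stable; each iteration appends one layer of detail.
core = "A friendly plumber tightens a fitting under a kitchen sink."
beats = " Then they turn the faucet on; a small drip stops completely."
camera = " Slow push-in from the doorway, bright natural daylight."

iterations = [
    core,                   # Iteration 1: the core shot
    core + beats,           # Iteration 2: motion beats
    core + beats + camera,  # Iteration 3: camera + lighting
]

for i, prompt in enumerate(iterations, start=1):
    print(f"Iteration {i}: {prompt}")
```

Because every later prompt contains the earlier one verbatim, differences between outputs can be attributed to the layer you just added.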
Mini checklist: what to change first
- If motion is ignored: rewrite Action with fewer, more specific verbs and a clear sequence (“does X, then Y”).
- If the subject drifts/changes: repeat the subject identity once (“same person/object throughout”) and remove extra characters/props.
- If unwanted elements appear: simplify Environment and Style; remove “busy” nouns that invite new objects.
- If camera becomes chaotic: specify “single continuous shot” and only one camera move.
Troubleshooting: 7 common failure modes (and quick fixes)
1) Stiff motion
Fix: swap generic verbs (“moving,” “doing stuff”) for visible actions (pour, twist, step, turn). Keep it to 1–3 beats.
2) Random cuts or scene jumps
Fix: say “single continuous shot” and remove multi-location cues.
3) The model adds extra people
Fix: specify “one person only” and reduce crowd-like environment words (festival, party, busy street).
4) The subject morphs mid-clip
Fix: restate stable identifiers (color, clothing, shape) and avoid competing descriptors.
5) Camera ignores your instruction
Fix: use simpler camera language and fewer simultaneous moves. Natural language tends to give more control than keyword stacks. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
6) The background steals the attention
Fix: explicitly assign focus: “subject in sharp focus, background softly blurred.”
7) The vibe is right, but it feels flat
Fix: add one environmental motion (steam, wind, reflections) plus lighting mood. Lighting can strongly affect mood and depth. (https://help.flexclip.com/en/articles/10326783-how-to-write-effective-text-prompts-to-generate-ai-videos)
FAQ
Do I have to write prompts in this exact order?
No. Clear intent matters more than strict ordering; reducing ambiguity is the priority. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
Should I use keywords or full sentences?
Both can work, but natural language often provides more control. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
How much motion should I include?
Usually 1–3 action beats are enough for a single shot. FlexClip frames “Action” as the core driver of the storyline—so keep it clear and concise. (https://help.flexclip.com/en/articles/10326783-how-to-write-effective-text-prompts-to-generate-ai-videos)
If I leave out details, will the model fail?
Not necessarily. Omitting components can give the model creative freedom, which is useful when you don’t care about specifics. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
Next step: generate at scale with Veo3Gen
Once you have a repeatable prompt template, the real advantage is consistent iteration—testing variants, swapping action beats, and producing multiple cuts for different placements.
- Explore programmatic generation and automation in the Veo3Gen API: /api
- See plans and usage options here: /pricing