Text-to-Video Prompts That Actually Move: The “Subject → Action → Environment → Camera” Template for Veo3Gen (as of 2026-02-15)
A beginner-friendly text-to-video prompt template for Veo3Gen: Subject → Action → Environment → Camera, plus a motion library, examples, and fixes.
On this page
- Why your text-to-video looks like a slideshow (the missing “Action” layer)
- The 4-part Veo3Gen prompt template: Subject → Action → Environment → Camera
- Copy/paste prompt template (fill-in-the-blanks)
- How to keep it model-friendly
- A plug-and-play motion library (12 actions that read well to video models)
- Micro-movements (subtle, “alive” motion)
- Character actions (clear, story-driving motion)
- Environmental motion (movement in the world)
- Camera moves (direct attention without “random cuts”)
- Camera & lens phrases that improve realism (without over-directing)
- Simple camera recipe
- Style + lighting: how to add flavor without breaking adherence
- Keep style short and concrete
- 3 complete prompt examples
- 1) UGC-style product demo (single shot)
- 2) Creator talking-head B-roll alternative (no “talking head”)
- 3) Small local service promo (trust-building, neighborhood vibe)
- Quick iteration loop: 3 prompts that progressively add detail (save credits)
- Iteration 1: The core shot
- Iteration 2: Motion beats
- Iteration 3: Camera + lighting
- Mini checklist: what to change first
- Troubleshooting: 7 common failure modes (and quick fixes)
- 1) Stiff motion
- 2) Random cuts or scene jumps
- 3) The model adds extra people
- 4) The subject morphs mid-clip
- 5) Camera ignores your instruction
- 6) The background steals the attention
- 7) The vibe is right, but it feels flat
- FAQ
- Do I have to write prompts in this exact order?
- Should I use keywords or full sentences?
- How much motion should I include?
- If I leave out details, will the model fail?
- Next step: generate at scale with Veo3Gen
Why your text-to-video looks like a slideshow (the missing “Action” layer)
A lot of first-time text-to-video prompts read like a still photo description: who/what is in frame, what it looks like, where it is. That’s a scene.
But video models need something else: change over time.
Runway’s prompting guide breaks effective prompts into two essentials: visual descriptions (what we see, where, and how it looks) and motion descriptions (how the scene moves and behaves). (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
When motion is missing—or vague—you often get “pretty but stiff” clips: a subject hovering in place, tiny accidental jitter, or random movement that doesn’t match your intent.
The fix isn’t complicated: write motion beats.
- Static scene description: “A barista in a sunlit café, cinematic.”
- Motion beats (video description): “A barista wipes the counter, then turns to pour steamed milk; steam drifts upward; the camera slowly pushes in.”
That second version tells the model what changes, in what direction, and what to focus on.
The 4-part Veo3Gen prompt template: Subject → Action → Environment → Camera
You can write prompts in many ways. Runway notes that structure and order matter less than clarity and reducing ambiguity. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
Still, beginners write faster (and iterate cleaner) with one consistent scaffold.
FlexClip suggests a formula that includes Subject + Action + Scene + (Camera Movement + Lighting + Style), and emphasizes that Action drives the storyline and should be clear and concise. (https://help.flexclip.com/en/articles/10326783-how-to-write-effective-text-prompts-to-generate-ai-videos)
Here’s the Veo3Gen-friendly version I recommend as of 2026-02-15:
Copy/paste prompt template (fill-in-the-blanks)
[SUBJECT] — Who/what is the main focus? Be specific (age/role/object, key traits).
[ACTION] — What changes over time? Use 1–3 motion beats in sequence (do X, then Y).
[ENVIRONMENT] — Where is it? Add 2–4 concrete details (time of day, weather, props).
[CAMERA] — How do we film it? One camera move + framing + optional lens/DOF.
[STYLE + LIGHTING] — Optional: mood, lighting, and style cues (keep it short).
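If you end up generating many variants, the fill-in template above is easy to mechanize. A minimal sketch: the `build_prompt` helper and its field names are illustrative, not part of any Veo3Gen API, and the joining logic simply skips optional slots you leave empty.

```python
def build_prompt(subject, action, environment, camera="", style=""):
    """Assemble the 5-part template into one prompt string.

    Optional parts (camera, style) are skipped when left empty,
    since omitting slots gives the model room to improvise.
    """
    parts = [subject, action, environment, camera, style]
    return " ".join(p.strip().rstrip(".") + "." for p in parts if p.strip())

prompt = build_prompt(
    subject="A barista in a sunlit café",
    action="wipes the counter, then turns to pour steamed milk",
    environment="morning light, steam drifting upward from the cup",
    camera="medium shot, slow push-in, shallow depth of field",
)
print(prompt)
```

Each slot becomes its own sentence, which keeps the prompt readable when you swap a single component between iterations.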
How to keep it model-friendly
- Prioritize clarity over cleverness. Natural language often gives more control than keyword piles. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
- Don’t overstuff. You don’t need every component; leaving parts out can give the model room to solve details creatively. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
- Write one shot at a time. Describing a single shot is easier than cramming “a whole commercial” into one prompt.
A plug-and-play motion library (12 actions that read well to video models)
When you’re stuck, borrow motions that are easy to visualize and easy to animate.
Micro-movements (subtle, “alive” motion)
- breath visible in chest/shoulders
- eyes track something off-camera
- fingers tap / fidget with an object
- fabric shifts as they turn
Character actions (clear, story-driving motion)
- walks toward camera, then stops
- turns to look over their shoulder
- picks up [object], examines it, sets it down
- opens [door/laptop/box], reacts with a smile
Environmental motion (movement in the world)
- steam/smoke drifts upward
- wind moves hair and nearby leaves
- neon sign flickers; reflections ripple on wet ground
- dust motes float through a sunbeam
Camera moves (direct attention without “random cuts”)
- slow push-in (dolly in)
- gentle handheld sway (subtle)
- orbit 15–30° around the subject
- tilt down from sign to subject
Tip: choose one primary motion (character or object) and one supporting motion (environment or camera). That combination often feels “cinematic” without chaos.
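The “one primary + one supporting motion” pairing above is easy to sample programmatically when brainstorming. A hypothetical sketch, using small illustrative subsets of the motion library (the lists and `motion_beat` helper are assumptions, not a Veo3Gen feature):

```python
import random

# Illustrative subsets of the motion library above.
PRIMARY = [
    "picks up the bottle, examines it, sets it down",
    "walks toward camera, then stops",
    "turns to look over their shoulder",
]
SUPPORTING = [
    "steam drifts upward",
    "wind moves hair and nearby leaves",
    "the camera slowly pushes in",
]

def motion_beat(rng=random):
    """Pair one primary (character/object) motion with one supporting
    (environment/camera) motion, per the tip above."""
    return f"{rng.choice(PRIMARY)}; {rng.choice(SUPPORTING)}"

print(motion_beat())
```

Sampling a handful of pairings is a quick way to find combinations that feel cinematic without over-choreographing the shot.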
Camera & lens phrases that improve realism (without over-directing)
You don’t need a film-school spec sheet. Add just enough to guide framing and movement.
Simple camera recipe
- Framing: close-up / medium shot / wide shot
- Move: slow push-in / pan left / orbit slightly / static tripod
- Focus: shallow depth of field / subject in sharp focus
Examples:
- “Medium shot, slow push-in, shallow depth of field; background softly blurred.”
- “Wide shot, static tripod, subject centered; gentle wind in trees.”
If you’re unsure, keep camera direction minimal. Runway recommends starting simple—focus on the most critical visual and motion components—then refine. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
Style + lighting: how to add flavor without breaking adherence
Style is seasoning. Action is the meal.
FlexClip highlights lighting as a major driver of mood and depth, and suggests lighting descriptions should support atmosphere and emotion. (https://help.flexclip.com/en/articles/10326783-how-to-write-effective-text-prompts-to-generate-ai-videos)
Keep style short and concrete
Good:
- “soft morning window light, warm tones”
- “overcast daylight, muted colors”
- “neon night lighting, high contrast reflections”
Risky (often causes drift):
- “in the style of [famous director/brand]”
- “hyper-detailed masterpiece, award-winning”
3 complete prompt examples
These are written to be copied directly, then customized.
1) UGC-style product demo (single shot)
Goal: feels like a creator demo, not a glossy ad.
A young adult creator’s hands holding a compact insulated water bottle with a matte finish and a simple logo.
They twist the lid open, tilt it toward the camera to show the seal, then pour ice water into a clear glass; condensation forms on the bottle.
Bright kitchen counter near a window, morning light, a cutting board and lemon slices in the background, subtle steam from a nearby mug.
Medium close-up, slight handheld sway, slow push-in as the pour starts, shallow depth of field.
Natural, casual UGC look, soft warm window light.
2) Creator talking-head B-roll alternative (no “talking head”)
Goal: channel “personal brand” with motion beats and visual hooks.
A confident solo creator in a tidy home studio with a desk, laptop, and a small LED light panel in the background.
They step into frame, pull a sticky note from the monitor, write one bold word, and place it on the laptop; they nod and smile as they sit.
Cozy studio environment, late afternoon, warm practical lamp on, subtle dust motes in the light beam.
Medium shot, static tripod, then a gentle 20° orbit to the right as they sit, subject stays in sharp focus.
Clean modern look, warm cinematic lighting, calm and motivational.
3) Small local service promo (trust-building, neighborhood vibe)
Goal: show service action + environment motion.
A friendly local plumber in a clean uniform kneeling beside a kitchen sink with a small toolbox.
They tighten a fitting under the sink, wipe their hands with a cloth, then turn the faucet on to show the water running smoothly; a small drip stops completely.
Suburban kitchen, daylight, a plant by the window, light breeze moving the curtain slightly.
Wide-to-medium framing, slow push-in from doorway toward the sink, steady camera, natural depth of field.
Bright, realistic lighting, clean and trustworthy tone.
Quick iteration loop: 3 prompts that progressively add detail (save credits)
A practical workflow is: simple → clearer motion → better camera/style.
Runway explicitly recommends starting with a simple prompt focused on the most critical visual and motion components, then adding detail to refine. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
Iteration 1: The core shot
- Subject + one action + basic environment
Iteration 2: Motion beats
- Add “then…” sequence + one environmental motion
Iteration 3: Camera + lighting
- Add one camera move + simple lighting mood
This keeps your intent stable while you discover what the model “hears” reliably.
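The three iterations can be expressed as layered strings that share one stable core, so each generation only changes one thing. A minimal sketch; the prompt text is the plumber example from above, and the layering approach is an assumption, not an official workflow:

```python
# Keep the core shot stable; each iteration appends one layer of detail.
core = "A friendly plumber tightens a fitting under a kitchen sink."
beats = " Then they turn the faucet on; a small drip stops completely."
camera = " Slow push-in from the doorway, bright natural daylight."

iterations = [
    core,                   # Iteration 1: the core shot
    core + beats,           # Iteration 2: motion beats
    core + beats + camera,  # Iteration 3: camera + lighting
]

for i, prompt in enumerate(iterations, start=1):
    print(f"Iteration {i}: {prompt}")
```

Because every later prompt contains the earlier one verbatim, differences between outputs can be attributed to the layer you just added.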
Mini checklist: what to change first
- If motion is ignored: rewrite Action with fewer, more specific verbs and a clear sequence (“does X, then Y”).
- If the subject drifts/changes: repeat the subject identity once (“same person/object throughout”) and remove extra characters/props.
- If unwanted elements appear: simplify Environment and Style; remove “busy” nouns that invite new objects.
- If camera becomes chaotic: specify “single continuous shot” and only one camera move.
Troubleshooting: 7 common failure modes (and quick fixes)
1) Stiff motion
Fix: swap generic verbs (“moving,” “doing stuff”) for visible actions (pour, twist, step, turn). Keep it to 1–3 beats.
2) Random cuts or scene jumps
Fix: say “single continuous shot” and remove multi-location cues.
3) The model adds extra people
Fix: specify “one person only” and reduce crowd-like environment words (festival, party, busy street).
4) The subject morphs mid-clip
Fix: restate stable identifiers (color, clothing, shape) and avoid competing descriptors.
5) Camera ignores your instruction
Fix: use simpler camera language and fewer simultaneous moves. Natural language tends to give more control than keyword stacks. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
6) The background steals the attention
Fix: explicitly assign focus: “subject in sharp focus, background softly blurred.”
7) The vibe is right, but it feels flat
Fix: add one environmental motion (steam, wind, reflections) plus lighting mood. Lighting can strongly affect mood and depth. (https://help.flexclip.com/en/articles/10326783-how-to-write-effective-text-prompts-to-generate-ai-videos)
FAQ
Do I have to write prompts in this exact order?
No. Clear intent matters more than strict ordering; reducing ambiguity is the priority. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
Should I use keywords or full sentences?
Both can work, but natural language often provides more control. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
How much motion should I include?
Usually 1–3 action beats are enough for a single shot. FlexClip frames “Action” as the core driver of the storyline—so keep it clear and concise. (https://help.flexclip.com/en/articles/10326783-how-to-write-effective-text-prompts-to-generate-ai-videos)
If I leave out details, will the model fail?
Not necessarily. Omitting components can give the model creative freedom, which is useful when you don’t care about specifics. (https://help.runwayml.com/hc/en-us/articles/47313737321107-Text-to-Video-Prompting-Guide)
Next step: generate at scale with Veo3Gen
Once you have a repeatable prompt template, the real advantage is consistent iteration—testing variants, swapping action beats, and producing multiple cuts for different placements.
- Explore programmatic generation and automation in the Veo3Gen API: /api
- See plans and usage options here: /pricing