Sora 2’s “Cinematography Block” Prompt Format (Veo3Gen Edition): Copy This Shot Spec for Cleaner Camera Moves + Better Beats (as of 2026-04-13)
Copy/paste a Sora 2–style cinematography block into a Veo3Gen shot spec to get cleaner camera moves, clearer beats, and better audio sync.
Most “cinematic prompts” fail for the same reason storyboards fail when they’re just moodboards: they describe vibes, not shots. If you want camera language (push-ins, pans, rack focus, handheld energy) and you want on-screen action to land in the right order, you need a structure that tells the model what happens, how it’s filmed, and what we hear.
Sora 2’s official prompting guidance leans into that structure—and it’s been updated to reflect newer API capabilities like character references, higher-resolution exports, longer videos, video extension, and batch workflows. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide)
Below is a Veo3Gen-friendly, copy/paste “Shot Spec” that mirrors the same idea: keep cinematography, mood, actions, and dialogue/audio in separate blocks. Write one Shot Spec per clip. Camera moves tend to come out cleaner, and “everything happens at once” generations become less common.
Important: container settings (duration, resolution, etc.) must be set explicitly in your tool or API parameters. The Sora 2 guide notes that some attributes are governed only by API parameters and can’t be reliably requested in prose. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide)
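As a minimal sketch of that separation, the snippet below keeps container settings in a typed object and merges them into a request payload alongside the prompt text. The field names (`duration_seconds`, `resolution`) and the payload shape are hypothetical; they are not taken from the Veo3Gen or Sora 2 APIs, so check your tool's own parameter names before reusing this.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ContainerSettings:
    """Settings that belong in tool/API parameters, not in prompt prose.

    Field names are hypothetical placeholders, not a real API schema.
    """
    duration_seconds: int
    resolution: str  # value format is an assumption, e.g. "1280x720"


def build_request(prompt_text: str, settings: ContainerSettings) -> dict:
    """Return a payload with the prose prompt and container settings kept apart."""
    return {"prompt": prompt_text.strip(), **asdict(settings)}


request = build_request(
    "CINEMATOGRAPHY: slow push-in from medium close-up to close-up.",
    ContainerSettings(duration_seconds=8, resolution="1280x720"),
)
print(request)
```

The point of the design is that the prompt string never mentions duration or resolution, so there is only one place where those values can be changed.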
Why most “cinematic prompts” fail: you describe vibes, not shots
A prompt like “cinematic, moody, dramatic lighting, 35mm film” can help aesthetics—but it often doesn’t dictate blocking or timing. The model may:
- ignore the camera move (because it wasn’t constrained)
- compress multiple actions into a single muddle
- miss your intended beat order
- give audio that doesn’t match the moment you pictured
A better approach is to write one shot at a time, with a tiny choreography inside that shot.
The Sora 2 prompt anatomy: Cinematography, Mood, Actions, Dialogue
Well-organized prompts with clear sections for what happens, how it looks, and what we hear tend to work better. (https://wavespeed.ai/blog/posts/sora-2-prompting-tips-better-videos-2026/)
Sora 2 is also described as having strong cinematography literacy, so specific filmmaking terminology can help control how the scene unfolds. (https://wavespeed.ai/blog/posts/sora-2-prompting-tips-better-videos-2026/)
Here’s what each block is “for”:
Cinematography (how it’s filmed)
Use this to specify framing, lens feel, camera path, stabilization style, focus behavior, and what must stay consistent.
Mood (how it feels)
Color palette, lighting direction, pacing/energy, and emotional tone.
Actions (what happens on-screen)
The ordered beats. Think staging and micro-blocking.
Dialogue / Audio (what we hear)
Sora 2 is described as generating audio natively, and guidance suggests requesting sound elements that sync with visuals. (https://wavespeed.ai/blog/posts/sora-2-prompting-tips-better-videos-2026/) WhyTryAI similarly describes Sora 2 as generating music, dialogue, and sound effects. (https://www.whytryai.com/p/ways-to-prompt-sora-2-movies)
Even if you’re not using Sora directly, writing audio intentionally helps you keep the scene coherent.
Veo3Gen “Shot Spec” template (copy/paste)
Paste this per clip. Keep each block short, specific, and testable.
SHOT SPEC
CINEMATOGRAPHY:
- Shot size & framing:
- Lens feel (not brand):
- Camera path (start → end):
- Focus behavior:
- Stabilization style:
- Composition rules / continuity constraints:
MOOD:
- Lighting:
- Color palette:
- Pace / energy:
ACTIONS (3-beat staging):
1)
2)
3)
DIALOGUE / AUDIO (optional):
- Dialogue (keep to one line max):
- SFX:
- Music:
NEGATIVE CONSTRAINTS (optional):
- Avoid:
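If you generate many clips, it can help to fill the template from structured data instead of editing text by hand. The sketch below is one possible way to do that in Python. The `ShotSpec` class and its field names are illustrative assumptions, not a Veo3Gen schema; only the rendered labels come from the template above.

```python
from dataclasses import dataclass, field


@dataclass
class ShotSpec:
    """One clip's Shot Spec; field names are illustrative, not an official schema."""
    framing: str
    lens_feel: str
    camera_path: str
    focus: str
    stabilization: str
    constraints: str
    lighting: str
    palette: str
    pace: str
    beats: list[str] = field(default_factory=list)
    dialogue: str = ""
    sfx: str = ""
    music: str = ""
    avoid: str = ""

    def render(self) -> str:
        """Render the spec as plain text in the template's block order."""
        lines = [
            "SHOT SPEC",
            "CINEMATOGRAPHY:",
            f"- Shot size & framing: {self.framing}",
            f"- Lens feel (not brand): {self.lens_feel}",
            f"- Camera path (start → end): {self.camera_path}",
            f"- Focus behavior: {self.focus}",
            f"- Stabilization style: {self.stabilization}",
            f"- Composition rules / continuity constraints: {self.constraints}",
            "MOOD:",
            f"- Lighting: {self.lighting}",
            f"- Color palette: {self.palette}",
            f"- Pace / energy: {self.pace}",
            "ACTIONS (3-beat staging):",
        ]
        lines += [f"{i}) {beat}" for i, beat in enumerate(self.beats, start=1)]
        # Optional audio lines are emitted only when they have content.
        audio = [
            ("Dialogue (keep to one line max)", self.dialogue),
            ("SFX", self.sfx),
            ("Music", self.music),
        ]
        audio_lines = [f"- {label}: {value}" for label, value in audio if value]
        if audio_lines:
            lines += ["DIALOGUE / AUDIO (optional):", *audio_lines]
        if self.avoid:
            lines += ["NEGATIVE CONSTRAINTS (optional):", f"- Avoid: {self.avoid}"]
        return "\n".join(lines)


spec = ShotSpec(
    framing="medium close-up, subject centered, shoulders visible",
    lens_feel="natural perspective, gentle background blur",
    camera_path="slow push-in from medium close-up → tighter close-up",
    focus="locked on eyes",
    stabilization="tripod-stable",
    constraints="keep eye-line to lens; no scene cuts",
    lighting="soft key + subtle practical lamp in background",
    palette="warm skin tones, calm background",
    pace="direct, punchy",
    beats=[
        "Subject leans in slightly as the push-in begins.",
        "Raises one finger to emphasize the point.",
        "Quick half-smile at the end.",
    ],
    dialogue="“Here’s the one shot detail most prompts forget.”",
    music="none or very faint",
)
print(spec.render())
```

Empty optional fields are skipped, so a sound-only shot simply leaves `dialogue` blank and the Dialogue line disappears from the output.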
Quick checklist (before you generate)
- Did you set duration + resolution in parameters (not in prose)? The Sora guide warns some attributes are only parameter-controlled. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide)
- Is the camera move described as start → end?
- Do you have at most 3 action beats for a short clip?
- Did you include at most one line of dialogue?
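Most of this checklist can be approximated with a small text linter. The sketch below is a rough heuristic under stated assumptions: it expects the Shot Spec to use the template's labels, treats more than one sentence-ending mark as "more than one line" of dialogue, and uses simple regular expressions to spot duration or resolution written in prose. The function and pattern names are hypothetical.

```python
import re

# Crude patterns for container settings leaking into prose (assumed formats).
DURATION = re.compile(r"\b\d+\s*(?:seconds|secs|sec|s)\b", re.IGNORECASE)
RESOLUTION = re.compile(r"\b(?:\d{3,4}p|4k|8k)\b", re.IGNORECASE)


def field_value(lines, label):
    """Return the text after the first colon on the line starting with `label`."""
    for line in lines:
        if line.startswith(label) and ":" in line:
            return line.split(":", 1)[1].strip()
    return ""


def lint_shot_spec(text, max_beats=3):
    """Return a list of checklist warnings for a template-style Shot Spec."""
    lines = [line.strip() for line in text.splitlines()]
    warnings = []

    # Camera move should be written as start → end ("->" is accepted too).
    path = field_value(lines, "- Camera path")
    if "→" not in path and "->" not in path:
        warnings.append("camera path is not written as start → end")

    # Count numbered beats that actually have content, e.g. "2) Sips."
    beats = [line for line in lines if re.match(r"\d+\)\s*\S", line)]
    if len(beats) > max_beats:
        warnings.append(f"{len(beats)} action beats; budget is {max_beats}")

    # More than one sentence-ending mark suggests more than one line of dialogue.
    dialogue = field_value(lines, "- Dialogue")
    if len(re.findall(r"[.!?]+", dialogue)) > 1:
        warnings.append("dialogue looks longer than one line")

    if DURATION.search(text) or RESOLUTION.search(text):
        warnings.append("duration/resolution appears in prose; set it via parameters")

    return warnings


good = """SHOT SPEC
- Camera path (start → end): slow push-in from medium close-up → tighter close-up
ACTIONS (3-beat staging):
1) Subject leans in slightly as the push-in begins.
2) Raises one finger to emphasize the point.
3) Quick half-smile at the end.
- Dialogue (keep to one line max): Here is the one shot detail most prompts forget.
"""

bad = """SHOT SPEC
- Camera path (start → end): handheld, energetic
1) Picks up the cup.
2) Sips.
3) Frowns.
4) Puts it down, 8 seconds in 1080p.
- Dialogue (keep to one line max): Wait. What? No!
"""

print(lint_shot_spec(good))
for warning in lint_shot_spec(bad):
    print("-", warning)
```

A clean result only means the spec follows the checklist's shape; it says nothing about whether the shot itself will generate well.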
How to write the Cinematography block (framing, lens feel, camera path, constraints)
Treat cinematography like a mini contract:
Framing + subject priority
Write what the viewer must understand instantly.
- “Medium close-up, eyes centered, product in lower-right foreground.”
Lens feel (avoid brand names; describe characteristics)
Instead of “Cooke lens,” say:
- “Natural perspective, mild background compression, gentle falloff.”
Camera path: start → end
If you want a push-in, define where it begins and ends.
- “Start: waist-up. End: tight close-up on hands.”
Constraints (continuity rules)
This is where you prevent chaos:
- “No jump cuts. Keep character facing camera. Keep horizon level.”
How to write the Actions block: 3-beat staging that fits a short clip
If your clip is short, your beat list must be short too.
Beat budget rule of thumb
For most short generations, aim for 2–3 beats per shot. If you cram in 7 beats, the model often merges them, reorders them, or drops critical steps.
Write beats as observable actions:
- bad: “She becomes inspired.”
- better: “She pauses, leans toward the scratch, smiles, and starts polishing.”
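One way to catch non-observable beats before generating is to scan them for abstract verbs. The sketch below is only a heuristic: the verb list is an illustrative assumption seeded from the “becomes inspired” example, and you would tune it to your own scripts.

```python
import re

# Illustrative list of verbs that describe inner states rather than visible actions.
ABSTRACT_VERBS = {
    "becomes", "realizes", "feels", "decides",
    "understands", "thinks", "remembers",
}


def flag_abstract_beats(beats):
    """Return the beats that contain at least one abstract verb."""
    flagged = []
    for beat in beats:
        words = set(re.findall(r"[a-z']+", beat.lower()))
        if words & ABSTRACT_VERBS:
            flagged.append(beat)
    return flagged


beats = [
    "She becomes inspired.",
    "She pauses, smiles, and starts polishing.",
]
print(flag_abstract_beats(beats))
```

Each flagged beat is a prompt to ask what the camera would actually see, then rewrite the beat as that physical action.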
Dialogue block (optional): when to include it, and how to keep it syncable
Use dialogue when the spoken line is the point of the clip (hook, punchline, CTA). Otherwise, prefer SFX + music.
Because Sora 2 is described as generating audio natively, it can help to request sound that matches specific on-screen moments. (https://wavespeed.ai/blog/posts/sora-2-prompting-tips-better-videos-2026/) Keep dialogue to one short line so it has a chance to land cleanly.
3 filled examples (copyable Shot Specs)
Each example includes a distinct camera move and 3 action beats.
Example 1 — Product demo (no dialogue; sound-only)
SHOT SPEC
CINEMATOGRAPHY:
- Shot size & framing: tabletop close-up; product centered; hands enter from frame left
- Lens feel (not brand): macro-ish detail, shallow depth of field, crisp highlights
- Camera path (start → end): slow slider move left→right, 20–30cm, constant speed
- Focus behavior: start on logo, then subtle rack focus to the cap as it opens
- Stabilization style: smooth, controlled
- Composition rules / continuity constraints: keep product fully in frame; no sudden zooms
MOOD:
- Lighting: soft top light + gentle rim light
- Color palette: clean neutrals, minimal clutter
- Pace / energy: confident, precise
ACTIONS (3-beat staging):
1) Hand places the product down; label faces camera.
2) Second hand twists the cap open; a small puff of vapor escapes.
3) Hand tilts product slightly to catch the light; cap clicks shut.
DIALOGUE / AUDIO (optional):
- SFX: soft whoosh of vapor as the cap opens (beat 2); subtle plastic click as the cap shuts (beat 3)
- Music: light modern pulse, low volume
NEGATIVE CONSTRAINTS (optional):
- Avoid: extra hands, warped labels, unreadable logo
Example 2 — Talking-head hook (short dialogue line)
SHOT SPEC
CINEMATOGRAPHY:
- Shot size & framing: medium close-up, subject centered, shoulders visible
- Lens feel (not brand): natural perspective, gentle background blur
- Camera path (start → end): slow push-in from medium close-up → tighter close-up
- Focus behavior: locked on eyes
- Stabilization style: tripod-stable
- Composition rules / continuity constraints: keep eye-line to lens; no scene cuts
MOOD:
- Lighting: soft key + subtle practical lamp in background
- Color palette: warm skin tones, calm background
- Pace / energy: direct, punchy
ACTIONS (3-beat staging):
1) Subject leans in slightly as the push-in begins.
2) Raises one finger to emphasize the point.
3) Quick half-smile at the end.
DIALOGUE / AUDIO (optional):
- Dialogue (keep to one line max): “Here’s the one shot detail most prompts forget.”
- Music: none or very faint
NEGATIVE CONSTRAINTS (optional):
- Avoid: exaggerated mouth movement, random hand artifacts, text overlays
Example 3 — Mini narrative beat (dynamic camera; no spoken dialogue)
SHOT SPEC
CINEMATOGRAPHY:
- Shot size & framing: wide shot in an alleyway; subject starts background center
- Lens feel (not brand): slightly wide, energetic perspective, mild edge distortion
- Camera path (start → end): handheld; start static on the wide shot → end backing away as the subject runs toward camera, halting when the subject stops
- Focus behavior: continuous autofocus on subject; brief motion blur allowed
- Stabilization style: handheld, controlled shake
- Composition rules / continuity constraints: keep direction consistent (subject runs toward camera)
MOOD:
- Lighting: rainy night, street reflections, high contrast
- Color palette: cool blues with warm neon accents
- Pace / energy: urgent
ACTIONS (3-beat staging):
1) Subject glances back, startled, then starts running.
2) Splashes through a puddle; neon reflection streaks across frame.
3) Slides to a stop under a flickering sign and looks up.
DIALOGUE / AUDIO (optional):
- SFX: footsteps on wet pavement, splash, distant siren, sign buzz
- Music: tense, minimal bass hit timed to the stop
NEGATIVE CONSTRAINTS (optional):
- Avoid: teleporting positions, changing wardrobe, random crowd appearing
Common failure modes + fixes (symptom → cause → rewrite)
| Symptom | Likely cause (weak block) | Rewrite / fix |
|---|---|---|
| Camera move is ignored | Cinematography block too vague | Specify start → end path, speed, and what stays framed (e.g., “push-in from MCU to CU, eyes stay centered”). |
| Action beats feel mushy or out of order | Actions block has too many beats or abstract verbs | Cut to 3 observable beats and number them. Replace “realizes” with physical actions. |
| Tone looks right but story is unclear | Mood is strong; Actions too thin | Add a concrete prop interaction or reveal moment in beat 2. |
| Audio doesn’t match what’s on screen | Dialogue/Audio block missing sync cues | Request specific SFX tied to specific beats (e.g., “click on cap close at end”). (https://wavespeed.ai/blog/posts/sora-2-prompting-tips-better-videos-2026/) |
| You can’t get the exact duration/resolution you want | Trying to “prompt” container settings | Set duration/size via parameters; some attributes are parameter-governed. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide) |
A repeatable mini-workflow: turning one script sentence into 3 Shot Specs
Take a script line like: “Show the feature, prove it works, then end on brand.”
Step 1 — Split into micro-beats
- Beat A: introduce (what is it?)
- Beat B: demonstrate (what does it do?)
- Beat C: lock in (why remember it?)
Step 2 — Assign each beat to its own clip (3 Shot Specs)
- Shot 1 (Introduce): clean hero framing + simple placement action
- Shot 2 (Demonstrate): kinetic move + interaction + visible result
- Shot 3 (Lock in): controlled camera move + final pose + logo/brand moment
Step 3 — Keep each shot internally simple
Use the 3-beat Actions structure inside each shot. If you need more beats, it’s usually a sign you need another shot.
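The three steps above can be sketched as a small planning helper. This is an assumption-heavy illustration: it splits the script line naively on commas, expects exactly three parts, and the `plan_shots` name and dictionary keys are hypothetical. The role descriptions are the ones listed in Step 2.

```python
# Roles and approaches from Step 2, in shot order.
ROLES = [
    ("Introduce", "clean hero framing + simple placement action"),
    ("Demonstrate", "kinetic move + interaction + visible result"),
    ("Lock in", "controlled camera move + final pose + logo/brand moment"),
]


def split_script_line(line):
    """Naively split a script sentence into micro-beats on commas."""
    parts = [part.strip() for part in line.rstrip(".").split(",")]
    # Drop a leading "then " so each beat reads as a standalone instruction.
    return [p[5:] if p.lower().startswith("then ") else p for p in parts if p]


def plan_shots(line):
    """Map each micro-beat to its own Shot Spec skeleton with 3 empty action slots."""
    beats = split_script_line(line)
    if len(beats) != len(ROLES):
        raise ValueError(f"expected {len(ROLES)} micro-beats, got {len(beats)}")
    return [
        {
            "shot": number,
            "role": role,
            "script_beat": beat,
            "approach": approach,
            "actions": ["", "", ""],  # fill with up to 3 observable beats
        }
        for number, ((role, approach), beat) in enumerate(zip(ROLES, beats), start=1)
    ]


plan = plan_shots("Show the feature, prove it works, then end on brand.")
for shot in plan:
    print(f"Shot {shot['shot']} ({shot['role']}): {shot['script_beat']}")
```

The skeleton deliberately leaves the action slots blank: the split decides which shot a beat belongs to, while the staging inside each shot is still written by hand.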
FAQ
What’s the single biggest upgrade I can make to my prompts?
Separate Cinematography from Actions so camera language and staging don’t compete.
Should I write resolution and duration inside the prompt text?
Prefer setting those via tool/API parameters; the Sora 2 guide notes some attributes are governed only by parameters and can’t be reliably requested in prose. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide)
How long can Sora 2 videos be?
The Sora 2 guide states the maximum duration increased from 12 seconds to 20 seconds. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide)
Can I reuse a character consistently across clips?
The Sora 2 guide describes character references where you upload a character once and reuse it across videos with consistent appearance. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide)
Related reading
- Getting started with the Veo 3 API
- Veo 3.1 vs Sora 2: practical comparison
- Veo 3 API pricing comparison
Build with Veo3Gen (CTA)
If you’re ready to turn these Shot Specs into a repeatable pipeline (generating clips in batches, testing variations, and iterating quickly), start with the docs and endpoints on our API page. When you need to scale up experiments or production runs, see the options on our pricing page.