Sora 2’s “Cinematography Block” Prompt Format (Veo3Gen Edition): Copy This Shot Spec for Cleaner Camera Moves + Better Beats (as of 2026-04-13)

Copy/paste a Sora 2–style cinematography block into a Veo3Gen shot spec to get cleaner camera moves, clearer beats, and better audio sync.

Most “cinematic prompts” fail for the same reason storyboards fail when they’re just moodboards: they describe vibes, not shots. If you want camera language (push-ins, pans, rack focus, handheld energy) and you want on-screen action to land in the right order, you need a structure that tells the model what happens, how it’s filmed, and what we hear.

Sora 2’s official prompting guidance leans into that structure—and it’s been updated to reflect newer API capabilities like character references, higher-resolution exports, longer videos, video extension, and batch workflows. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide)

Below is a Veo3Gen-friendly, copy/paste “Shot Spec” that mirrors the same idea: separate cinematography from mood from actions from dialogue/audio. Use it once per clip, and you’ll get cleaner camera moves and far fewer “everything happens at once” generations.

Important: container settings (duration, resolution, etc.) must be set explicitly in your tool or API parameters. The Sora 2 guide notes that some attributes are governed only by API parameters and can’t be reliably requested in prose. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide)
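
A minimal sketch of that separation, in Python: the prompt carries only the Shot Spec, while duration and resolution travel as explicit parameters. The field names `duration_seconds` and `resolution`, and both helper functions, are hypothetical placeholders for illustration; check your tool's API reference for the real parameter names. The second helper is a rough lint that flags container-style phrases that have leaked into the prose.

```python
from __future__ import annotations

import re


def build_request(prompt: str, duration_seconds: int, resolution: str) -> dict:
    """Keep container settings out of the prose and in explicit parameters.

    The keys below are hypothetical; rename them to match your tool's API.
    """
    return {
        "prompt": prompt.strip(),
        "duration_seconds": duration_seconds,
        "resolution": resolution,
    }


# Rough patterns for duration/resolution requests written into prompt text.
CONTAINER_HINTS = re.compile(
    r"\b(\d+\s*(s|sec|secs|seconds)|\d{3,4}\s*x\s*\d{3,4}|\d{3,4}p|4k)\b",
    re.IGNORECASE,
)


def container_hints_in_prompt(prompt: str) -> list[str]:
    """Return phrases that look like container settings hiding in the prompt."""
    return [match.group(0) for match in CONTAINER_HINTS.finditer(prompt)]
```

If `container_hints_in_prompt` returns anything, move those values into the parameters and delete them from the prose.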


Why most “cinematic prompts” fail: you describe vibes, not shots

A prompt like “cinematic, moody, dramatic lighting, 35mm film” can help aesthetics—but it often doesn’t dictate blocking or timing. The model may:

  • ignore the camera move (because it wasn’t constrained)
  • compress multiple actions into a single muddle
  • miss your intended beat order
  • give audio that doesn’t match the moment you pictured

A better approach is to write one shot at a time, with a tiny choreography inside that shot.


The Sora 2 prompt anatomy: Cinematography, Mood, Actions, Dialogue

Well-organized prompts with clear sections for what happens, how it looks, and what we hear tend to work better. (https://wavespeed.ai/blog/posts/sora-2-prompting-tips-better-videos-2026/)

Sora 2 is also described as having strong cinematography literacy, so specific filmmaking terminology can help control how the scene unfolds. (https://wavespeed.ai/blog/posts/sora-2-prompting-tips-better-videos-2026/)

Here’s what each block is “for”:

Cinematography (how it’s filmed)

Use this to specify framing, lens feel, camera path, stabilization style, focus behavior, and what must stay consistent.

Mood (how it feels)

Color palette, lighting direction, pacing/energy, and emotional tone.

Actions (what happens on-screen)

The ordered beats. Think staging and micro-blocking.

Dialogue / Audio (what we hear)

Sora 2 is described as generating audio natively, and guidance suggests requesting sound elements that sync with visuals. (https://wavespeed.ai/blog/posts/sora-2-prompting-tips-better-videos-2026/) WhyTryAI similarly describes Sora 2 as generating music, dialogue, and sound effects. (https://www.whytryai.com/p/ways-to-prompt-sora-2-movies)

Even if you’re not using Sora directly, writing audio intentionally helps you keep the scene coherent.


Veo3Gen “Shot Spec” template (copy/paste)

Paste this per clip. Keep each block short, specific, and testable.

SHOT SPEC

CINEMATOGRAPHY:
- Shot size & framing:
- Lens feel (not brand):
- Camera path (start → end):
- Focus behavior:
- Stabilization style:
- Composition rules / continuity constraints:

MOOD:
- Lighting:
- Color palette:
- Pace / energy:

ACTIONS (3-beat staging):
1)
2)
3)

DIALOGUE / AUDIO (optional):
- Dialogue (keep to one line max):
- SFX:
- Music:

NEGATIVE CONSTRAINTS (optional):
- Avoid:
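
If you generate many clips, it can help to fill the template from data instead of retyping it. The sketch below is one possible way to do that in Python; the `ShotSpec` class and its field names are assumptions made for illustration, not part of any Veo3Gen or Sora 2 API. It only renders the text layout shown above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ShotSpec:
    """Hypothetical container for one clip's Shot Spec."""

    cinematography: dict[str, str]
    mood: dict[str, str]
    actions: list[str]
    audio: dict[str, str] = field(default_factory=dict)
    avoid: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the spec as plain text in the template's block order."""
        lines = ["SHOT SPEC", "", "CINEMATOGRAPHY:"]
        lines += [f"- {key}: {value}" for key, value in self.cinematography.items()]
        lines += ["", "MOOD:"]
        lines += [f"- {key}: {value}" for key, value in self.mood.items()]
        lines += ["", f"ACTIONS ({len(self.actions)}-beat staging):"]
        lines += [f"{i}) {beat}" for i, beat in enumerate(self.actions, start=1)]
        if self.audio:  # optional block, omitted when empty
            lines += ["", "DIALOGUE / AUDIO (optional):"]
            lines += [f"- {key}: {value}" for key, value in self.audio.items()]
        if self.avoid:  # optional block, omitted when empty
            lines += ["", "NEGATIVE CONSTRAINTS (optional):"]
            lines.append("- Avoid: " + ", ".join(self.avoid))
        return "\n".join(lines)
```

Calling `ShotSpec(...).render()` returns a string you can paste into the prompt field as-is.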


How to write the Cinematography block (framing, lens feel, camera path, constraints)

Treat cinematography like a mini contract:

Framing + subject priority

Write what the viewer must understand instantly.

  • “Medium close-up, eyes centered, product in lower-right foreground.”

Lens feel (avoid brand names; describe characteristics)

Instead of “Cooke lens,” say:

  • “Natural perspective, mild background compression, gentle falloff.”

Camera path: start → end

If you want a push-in, define where it begins and ends.

  • “Start: waist-up. End: tight close-up on hands.”

Constraints (continuity rules)

This is where you prevent chaos:

  • “No jump cuts. Keep character facing camera. Keep horizon level.”

How to write the Actions block: 3-beat staging that fits a short clip

If your clip is short, your beat count must be short.

Beat budget rule of thumb

For most short generations, aim for 2–3 beats per shot. If you cram in 7 beats, the model often merges them, reorders them, or drops critical steps.

Write beats as observable actions:

  • bad: “She becomes inspired.”
  • better: “She pauses, notices the scratch, smiles, starts polishing.”
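
The beat budget and the "observable actions" rule can be checked mechanically before you generate. The following is a rough sketch, not a definitive linter: the `check_beats` function and the list of abstract verbs are assumptions chosen for illustration, and you would likely tune both for your own scripts.

```python
# Hypothetical word list: verbs that describe inner states, not visible action.
ABSTRACT_VERBS = {"becomes", "realizes", "feels", "decides", "understands"}


def check_beats(beats, max_beats=3):
    """Return a list of warnings for an Actions block; empty means it passes."""
    warnings = []
    if len(beats) > max_beats:
        warnings.append(
            f"{len(beats)} beats exceeds the budget of {max_beats}; "
            "consider splitting into another shot"
        )
    for number, beat in enumerate(beats, start=1):
        # Normalize each word so trailing punctuation does not hide a match.
        words = {word.strip(".,;:!?\"'").lower() for word in beat.split()}
        hits = sorted(words & ABSTRACT_VERBS)
        if hits:
            warnings.append(f"beat {number} uses abstract verb(s): {', '.join(hits)}")
    return warnings
```

For example, `check_beats(["She becomes inspired."])` flags "becomes", while a beat like "She pauses, notices the scratch." passes.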

Dialogue block (optional): when to include it, and how to keep it syncable

Use dialogue when the spoken line is the point of the clip (hook, punchline, CTA). Otherwise, prefer SFX + music.

Because Sora 2 is described as generating audio natively, requesting sound that matches the visuals can help. (https://wavespeed.ai/blog/posts/sora-2-prompting-tips-better-videos-2026/) Keep dialogue to one short line so it has a chance to land cleanly.
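
One way to keep audio syncable is to tie each sound to a numbered beat from the Actions block. The helper below is a small illustrative sketch; the `audio_block` function and its arguments are hypothetical, and the "(beat N)" suffix is a convention assumed here, not a syntax any model is documented to require.

```python
def audio_block(sfx_by_beat, dialogue=None, music=None):
    """Render a DIALOGUE / AUDIO block with each SFX tied to a beat number.

    sfx_by_beat maps a beat number to a list of sounds for that beat.
    """
    lines = ["DIALOGUE / AUDIO (optional):"]
    if dialogue is not None:
        spoken = dialogue.strip()
        if "\n" in spoken:
            raise ValueError("keep dialogue to one line")
        lines.append(f'- Dialogue (keep to one line max): "{spoken}"')
    if sfx_by_beat:
        # Sort by beat number so cues read in on-screen order.
        cues = [
            f"{sound} (beat {beat})"
            for beat in sorted(sfx_by_beat)
            for sound in sfx_by_beat[beat]
        ]
        lines.append("- SFX: " + ", ".join(cues))
    if music:
        lines.append(f"- Music: {music}")
    return "\n".join(lines)
```

Passing `{2: ["soft whoosh of vapor"], 3: ["cap click"]}` yields an SFX line that names the beat each sound belongs to.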


3 filled examples (copyable Shot Specs)

Each example includes a distinct camera move and 3 action beats.

Example 1 — Product demo (no dialogue; sound-only)

SHOT SPEC

CINEMATOGRAPHY:
- Shot size & framing: tabletop close-up; product centered; hands enter from frame left
- Lens feel (not brand): macro-ish detail, shallow depth of field, crisp highlights
- Camera path (start → end): slow slider move left→right, 20–30 cm, constant speed
- Focus behavior: start on logo, then subtle rack focus to the cap as it opens
- Stabilization style: smooth, controlled
- Composition rules / continuity constraints: keep product fully in frame; no sudden zooms

MOOD:
- Lighting: soft top light + gentle rim light
- Color palette: clean neutrals, minimal clutter
- Pace / energy: confident, precise

ACTIONS (3-beat staging):
1) Hand places the product down; label faces camera.
2) Second hand twists the cap open; a small puff of vapor escapes.
3) Hand tilts product slightly to catch the light; cap clicks shut.

DIALOGUE / AUDIO (optional):
- SFX: subtle plastic click, soft whoosh of vapor
- Music: light modern pulse, low volume

NEGATIVE CONSTRAINTS (optional):
- Avoid: extra hands, warped labels, unreadable logo

Example 2 — Talking-head hook (short dialogue line)

SHOT SPEC

CINEMATOGRAPHY:
- Shot size & framing: medium close-up, subject centered, shoulders visible
- Lens feel (not brand): natural perspective, gentle background blur
- Camera path (start → end): slow push-in from medium close-up → tighter close-up
- Focus behavior: locked on eyes
- Stabilization style: tripod-stable
- Composition rules / continuity constraints: keep eye-line to lens; no scene cuts

MOOD:
- Lighting: soft key + subtle practical lamp in background
- Color palette: warm skin tones, calm background
- Pace / energy: direct, punchy

ACTIONS (3-beat staging):
1) Subject leans in slightly as the push-in begins.
2) Raises one finger to emphasize the point.
3) Quick half-smile at the end.

DIALOGUE / AUDIO (optional):
- Dialogue (keep to one line max): “Here’s the one shot detail most prompts forget.”
- Music: none or very faint

NEGATIVE CONSTRAINTS (optional):
- Avoid: exaggerated mouth movement, random hand artifacts, text overlays

Example 3 — Mini narrative beat (dynamic camera; no spoken dialogue)

SHOT SPEC

CINEMATOGRAPHY:
- Shot size & framing: wide shot in an alleyway; subject starts background center
- Lens feel (not brand): slightly wide, energetic perspective, mild edge distortion
- Camera path (start → end): handheld tracking; start static, then backpedal as the subject runs toward the lens
- Focus behavior: continuous autofocus on subject; brief motion blur allowed
- Stabilization style: handheld, controlled shake
- Composition rules / continuity constraints: keep direction consistent (subject runs toward camera)

MOOD:
- Lighting: rainy night, street reflections, high contrast
- Color palette: cool blues with warm neon accents
- Pace / energy: urgent

ACTIONS (3-beat staging):
1) Subject glances back, startled, then starts running.
2) Splashes through a puddle; neon reflection streaks across frame.
3) Slides to a stop under a flickering sign and looks up.

DIALOGUE / AUDIO (optional):
- SFX: footsteps on wet pavement, splash, distant siren, sign buzz
- Music: tense, minimal bass hit timed to the stop

NEGATIVE CONSTRAINTS (optional):
- Avoid: teleporting positions, changing wardrobe, random crowd appearing

Common failure modes + fixes (symptom → cause → rewrite)

Symptom: Camera move is ignored
  • Likely cause (weak block): Cinematography block too vague
  • Rewrite / fix: Specify start → end path, speed, and what stays framed (e.g., “push-in from MCU to CU, eyes stay centered”).

Symptom: Action beats feel mushy or out of order
  • Likely cause (weak block): Actions block has too many beats or abstract verbs
  • Rewrite / fix: Cut to 3 observable beats and number them. Replace “realizes” with physical actions.

Symptom: Tone looks right but story is unclear
  • Likely cause (weak block): Mood is strong; Actions too thin
  • Rewrite / fix: Add a concrete prop interaction or reveal moment in beat 2.

Symptom: Audio doesn’t match what’s on screen
  • Likely cause (weak block): Dialogue/Audio block missing sync cues
  • Rewrite / fix: Request specific SFX tied to specific beats (e.g., “click on cap close at end”). (https://wavespeed.ai/blog/posts/sora-2-prompting-tips-better-videos-2026/)

Symptom: You can’t get the exact duration/resolution you want
  • Likely cause (weak block): Trying to “prompt” container settings
  • Rewrite / fix: Set duration/size via parameters; some attributes are parameter-governed. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide)

A repeatable mini-workflow: turning one script sentence into 3 Shot Specs

Take a script line like: “Show the feature, prove it works, then end on brand.”

Step 1 — Split into micro-beats

  • Beat A: introduce (what is it?)
  • Beat B: demonstrate (what does it do?)
  • Beat C: lock in (why remember it?)

Step 2 — Assign each beat to its own clip (3 Shot Specs)

  • Shot 1 (Introduce): clean hero framing + simple placement action
  • Shot 2 (Demonstrate): kinetic move + interaction + visible result
  • Shot 3 (Lock in): controlled camera move + final pose + logo/brand moment

Step 3 — Keep each shot internally simple

Use the 3-beat Actions structure inside each shot. If you need more beats, it’s usually a sign you need another shot.
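
The three steps above can be sketched as a small planning helper. This is an assumption-laden illustration: the `plan_shots` function and the comma/"then" splitting rule are hypothetical and only suit script lines shaped like the example; real scripts will usually need manual splitting.

```python
from __future__ import annotations

import re

# Role and approach per shot, taken from Step 2 above.
ROLES = [
    ("Introduce", "clean hero framing + simple placement action"),
    ("Demonstrate", "kinetic move + interaction + visible result"),
    ("Lock in", "controlled camera move + final pose + logo/brand moment"),
]


def plan_shots(script_line: str) -> list[dict]:
    """Split a three-clause script line and assign each clause to its own shot."""
    body = script_line.strip().rstrip(".")
    # Split on commas, dropping a leading "then" from the following clause.
    clauses = [c.strip() for c in re.split(r",\s*(?:then\s+)?", body) if c.strip()]
    if len(clauses) != len(ROLES):
        raise ValueError(f"expected {len(ROLES)} clauses, got {len(clauses)}")
    return [
        {"shot": number, "role": role, "goal": clause, "approach": approach}
        for number, ((role, approach), clause) in enumerate(zip(ROLES, clauses), start=1)
    ]
```

Each returned entry is a starting point for one Shot Spec; the 3-beat Actions block for that shot still has to be written by hand.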


FAQ

What’s the single biggest upgrade I can make to my prompts?

Separate Cinematography from Actions so camera language and staging don’t compete.

Should I write resolution and duration inside the prompt text?

Prefer setting those via tool/API parameters; the Sora 2 guide notes some attributes are governed only by parameters and can’t be reliably requested in prose. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide)

How long can Sora 2 videos be?

The Sora 2 guide states the maximum duration increased from 12 seconds to 20 seconds. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide)

Can I reuse a character consistently across clips?

The Sora 2 guide describes character references where you upload a character once and reuse it across videos with consistent appearance. (https://developers.openai.com/cookbook/examples/sora/sora2_prompting_guide)



Build with Veo3Gen

If you’re ready to turn these Shot Specs into a repeatable pipeline—generate clips in batches, test variations, and iterate quickly—start with the docs and endpoints on our API page. When you need to scale up experiments or production runs, see options on pricing.
