What Flow is (plain English) — and when to use it vs “one prompt = one clip”

If you’ve only used AI video as “type prompt → get one clip”, you’ll eventually hit the same wall: your second clip doesn’t match the first.

Flow (as creators commonly use the term) is best thought of as a lightweight timeline + iteration workspace where you can:

break one idea into small shots (4–8 seconds each, depending on your settings),
generate variations for each shot,
and stitch them into a coherent 15–30 second scene.

This matters because Google Cloud positions Veo 3.1 as a model built for creative control, not just simple generation, and describes it as supporting rich synchronous audio alongside video (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1). That changes how you plan shots: dialogue and action need to line up.

As of 2026-04-14, treat Flow as your “organizing layer” and Veo 3.1 as the “shot engine.” The trick is to stop thinking in one big prompt and start thinking in Shot Cards.

The 10-minute setup: define your scene constraints

Before you write Shot Cards, lock a handful of constraints. This is the fastest way to get continuity without turning the process into a filmmaking degree.

Create a Scene Bible (a few bullet points you paste into every shot):

Cast: name, age range, key physical descriptors, voice vibe
Wardrobe: 2–3 specific items + colors + materials
Location: one place, described consistently
Time: time of day + weather
Style: genre + camera language (handheld vs locked-off, lens feel)
Audio palette: ambient bed + SFX + music vibe

Why this works: Veo 3.1 emphasizes stronger prompt adherence (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1), so repeating the same constraints increases the odds each shot “snaps” to the same world.

Continuity checklist (paste into your Scene Bible)

Wardrobe: colors, textures, accessories
Props: hero object, where it’s held/placed
Lighting direction: key light left/right, practicals visible
Time of day: golden hour vs midday vs night
Camera language: lens feel (e.g., wide vs portrait closeups), movement style
Character descriptors: same name + same defining traits every time

The Shot Card template (fill-in)

Use this as your reusable unit. The point is that each card equals one shot, not “the whole video.”

Shot Card (copy/paste)

Subject:
Setting:
Camera: (framing, lens feel, movement)
Action: (blocking + gestures)
Timing: (target seconds + what happens at start/middle/end)
Audio: (dialogue + SFX + ambience; call out sync moments)
Style: (cinematic refs in plain words; color/grade)
Constraints: (must-not-change continuity bullets)

Tip: keep Shot Cards short, but keep constraints specific.
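
If you keep your cards in a script or notebook instead of a doc, the template above maps naturally onto a small data class. This is only a sketch: the ShotCard class and its to_prompt method are hypothetical names made up for illustration, not part of Flow or Veo, and the "Label: value" output format is an assumption about how you might paste the card into a prompt box.

```python
from dataclasses import dataclass, fields

END_MARKS = '.!?"”'  # characters that already close a sentence or quote


@dataclass
class ShotCard:
    """One card = one shot. Field order mirrors the template above."""
    subject: str = ""
    setting: str = ""
    camera: str = ""
    action: str = ""
    timing: str = ""
    audio: str = ""
    style: str = ""
    constraints: str = ""

    def to_prompt(self) -> str:
        """Join the filled-in fields as 'Label: value.' and skip empty ones."""
        parts = []
        for f in fields(self):
            value = getattr(self, f.name).strip()
            if not value:
                continue
            if value[-1] not in END_MARKS:
                value += "."
            parts.append(f"{f.name.capitalize()}: {value}")
        return " ".join(parts)


card = ShotCard(
    subject="Maya at desk, laptop closed",
    timing="5–6s; beat on the look to camera at ~3s",
)
print(card.to_prompt())
```

Empty fields are skipped on purpose, so a short card stays short while the field order never changes between shots.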

One concept → 5 Shot Cards (a complete 15–30s mini-scene)

Here’s a concrete example you can adapt for a creator announcement, product reveal, or micro-story.

Concept: A creator reveals a new “Focus Mode” app update in a tiny home office.

Scene Bible (used in every card):

Cast: “Maya, late 20s, warm voice, short curly hair, expressive eyebrows”
Wardrobe: “sage-green hoodie, small silver hoop earrings”
Location: “cozy home office, wooden desk, houseplant, laptop, morning window light from camera-left”
Style: “clean modern, soft contrast, natural skin tones; subtle handheld micro-movement”
Audio bed: “quiet room tone + faint city birds outside”
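
One way to make sure this Scene Bible really does land in every card is to store it once and prepend it programmatically. The sketch below assumes you assemble prompts as plain text before pasting them into your tool; SCENE_BIBLE, scene_bible_block, and with_scene_bible are hypothetical helper names, not anything Flow or Veo provides.

```python
# Values copied from the Scene Bible above; the dict itself is just a convenience.
SCENE_BIBLE = {
    "Cast": "Maya, late 20s, warm voice, short curly hair, expressive eyebrows",
    "Wardrobe": "sage-green hoodie, small silver hoop earrings",
    "Location": "cozy home office, wooden desk, houseplant, laptop, "
                "morning window light from camera-left",
    "Style": "clean modern, soft contrast, natural skin tones; "
             "subtle handheld micro-movement",
    "Audio bed": "quiet room tone + faint city birds outside",
}


def scene_bible_block(bible: dict) -> str:
    """Render the bible as one 'Label: value' line per entry, in insertion order."""
    return "\n".join(f"{label}: {value}" for label, value in bible.items())


def with_scene_bible(shot_prompt: str, bible: dict = SCENE_BIBLE) -> str:
    """Prepend the identical bible text to a shot prompt so the wording never drifts."""
    return f"{scene_bible_block(bible)}\n\n{shot_prompt.strip()}"


print(with_scene_bible("Camera: wide 24–28mm feel, chest-high, slow push-in"))
```

Because the bible text is generated from one source, a wardrobe tweak is a single edit rather than five.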

Shot Card 1 — Hook (wide establish)

Subject: Maya at desk, laptop closed
Setting: home office, morning light camera-left
Camera: wide 24–28mm feel, chest-high, slow push-in
Action: Maya slides a sticky note that reads “Focus?” off the desk, looks to camera
Timing: 5–6s; beat on the look to camera at ~3s
Audio: Maya: “I fixed the one thing that always broke my focus.” (soft room tone)
Style: clean, lightly handheld
Constraints: keep wardrobe/location/light direction identical

Shot Card 2 — Problem (close reaction)

Subject: Maya frustrated, phone notifications buzzing
Camera: medium close-up 50mm feel, slight handheld sway
Action: phone screen lights up (no readable brand UI), Maya sighs and flips phone face-down
Timing: 4–6s; buzz at start, flip by ~2s
Audio: SFX: two notification buzzes; Maya: “The pings… every time.”
Constraints: same desk items, same hoodie/earrings

Shot Card 3 — Reveal (hero action)

Subject: Maya opens laptop; app “Focus Mode” appears (generic UI)
Camera: over-the-shoulder, 35mm feel, gentle tilt down to laptop
Action: she clicks “Focus Mode,” timer starts
Timing: 6–8s; click at ~3s, timer visible by ~4s
Audio: SFX: soft click + subtle “start” chime; Maya: “Now it’s one tap.”
Constraints: avoid legible logos; keep morning light direction

Shot Card 4 — Proof (micro-montage inside one shot)

Subject: Maya working calmly
Camera: medium shot, locked-off tripod feel (contrast from earlier handheld)
Action: she types, sips coffee, checks a single checklist, smiles
Timing: 6–8s; three quick internal beats
Audio: ambience + light keyboard; Maya (quiet, satisfied): “So I can actually finish things.”
Constraints: same props: mug, plant, laptop

Shot Card 5 — End beat + CTA (direct address)

Subject: Maya speaks to camera
Camera: close-up 50mm feel, subtle push-in
Action: she turns the sticky note over; it now says “Done.”
Timing: 5–6s; “Done” reveal near the end
Audio: Maya: “If you want it, I’ll link the update. Try Focus Mode today.”
Constraints: match eyeline, hoodie, earrings, light direction

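That’s your mini-scene: 5 short, reusable shots. At the timings above they total roughly 26–34 seconds, so aim for the low end of each range (or trim in the edit) to stay inside 30 seconds.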
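
It is worth adding up the Timing lines before you generate anything. The snippet below is a small arithmetic check using the ranges from the five cards; SHOT_TIMINGS and total_range are hypothetical names used only for this sketch.

```python
# (min_seconds, max_seconds) per Shot Card, taken from the Timing lines above.
SHOT_TIMINGS = {1: (5, 6), 2: (4, 6), 3: (6, 8), 4: (6, 8), 5: (5, 6)}


def total_range(timings: dict) -> tuple:
    """Sum the low ends and the high ends to get the scene's possible runtime."""
    low = sum(lo for lo, _ in timings.values())
    high = sum(hi for _, hi in timings.values())
    return low, high


low, high = total_range(SHOT_TIMINGS)
print(low, high)  # 26 34
print("fits a 30s cap at the low end:", low <= 30)
```

If the high end overshoots your target, shortening one or two cards is usually easier than cutting a whole shot.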

Generate Clip v1: do a “blocking pass” before you chase pretty

You’ll get better results (and waste fewer generations) if you separate:

Blocking pass = staging, action, camera, timing, audio cues
Beauty pass = lighting polish, texture, cinematography flair

Veo 3.1 offers professional-grade creative controls and rich synchronous audio (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1). Lean into that by first making sure mouths, gestures, and beats land where you need them.

Example “blocking pass” prompt (Shot Card 3)

Blocking pass prompt:

Subject: Maya, late 20s, short curly hair, sage-green hoodie, silver hoop earrings. Setting: cozy home office, wooden desk, plant, laptop, morning window light from camera-left. Camera: over-the-shoulder, 35mm feel, gentle tilt down to laptop. Action: Maya opens laptop, moves cursor, clicks a button labeled “Focus Mode” (generic UI), timer starts. Timing: 6–8 seconds; click at ~3s; timer visible by ~4s. Audio: quiet room tone + faint birds; soft click + subtle start chime; Maya says: “Now it’s one tap.” Ensure mouth movement matches. Style: clean modern, neutral color. Constraints: keep wardrobe and desk props consistent; no readable logos.

Example “beauty pass” prompt (same shot)

Beauty pass prompt (only polish):

Keep the same subject, setting, camera angle, action, timing, and dialogue. Improve visual fidelity: natural skin tones, soft cinematic contrast, realistic morning light rays from camera-left, clean depth of field, crisp hands and laptop edges. Maintain generic UI with no legible branding.

Notice what changed: only the look—not the blocking.
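
If you want the beauty pass to stay honest about what it is allowed to touch, you can generate its wording from a fixed list of locked elements. This is a sketch under the assumption that you build prompts as text; LOCKED and beauty_pass_prompt are hypothetical names, and the sentence pattern simply mirrors the example prompt above.

```python
# Elements the beauty pass must not change, as listed in the example prompt.
LOCKED = ["subject", "setting", "camera angle", "action", "timing", "dialogue"]


def beauty_pass_prompt(polish: list, locked: list = LOCKED) -> str:
    """Build a polish-only prompt: restate what is locked, then list the look changes."""
    keep = ", ".join(locked[:-1]) + f", and {locked[-1]}"
    return f"Keep the same {keep}. Improve visual fidelity: {', '.join(polish)}."


print(beauty_pass_prompt(["natural skin tones", "soft cinematic contrast"]))
```

Anything you are tempted to add that is not a look change (a new gesture, a different line) belongs in another blocking pass instead.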

The strict iteration rule: change ONE variable per generation (and log versions)

When a shot isn’t working, the temptation is to rewrite everything. Don’t. You won’t learn what fixed it, and you’ll accidentally break continuity.

The one-change rule

Pick one variable to change per generation:

Motion (slower push-in)
Camera (wider lens feel)
Acting (less smile, more surprised)
Props (remove phone)
Audio (shorter line; add 1 SFX)

Simple version log (copy/paste)

Shot 3 — v1: OTS tilt down, line too fast
Shot 3 — v2: Change: timing (pause before “one tap”)
Shot 3 — v3: Change: camera (reduce tilt, more stable)
Shot 3 — v4: Change: audio (lower chime volume)

This makes Flow feel like a tiny post-production pipeline instead of random prompting.
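
If your Shot Cards live as simple key/value records, the one-change rule can be checked mechanically before you spend a generation. The following is a minimal sketch, assuming each version of a card is a dict of field name to text; changed_fields and next_version are hypothetical helpers, and the log line format is a simplified take on the log above.

```python
def changed_fields(prev: dict, new: dict) -> list:
    """Return the names of fields whose text differs between two card versions."""
    keys = sorted(set(prev) | set(new))
    return [k for k in keys if prev.get(k) != new.get(k)]


def next_version(log: list, shot: int, prev: dict, new: dict, note: str) -> list:
    """Append a log entry, refusing any edit that changes more or less than one field."""
    diff = changed_fields(prev, new)
    if len(diff) != 1:
        raise ValueError(f"one-change rule: expected 1 changed field, got {diff}")
    version = len(log) + 1
    return log + [f"Shot {shot} v{version}: Change: {diff[0]} ({note})"]


v1 = {"camera": "OTS, gentle tilt down", "timing": "click at ~3s"}
v2 = {**v1, "timing": "click at ~3s, pause before 'one tap'"}

log = ["Shot 3 v1: OTS tilt down, line too fast"]
log = next_version(log, 3, v1, v2, "pause before 'one tap'")
print(log[-1])
```

The check is deliberately strict: if two fields changed, it raises instead of logging, which is the cue to split the edit into two generations.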

Consistency tricks that actually help (without overpromising)

As of 2026-04-14, consistency is mostly about repetition + specificity:

Reuse the same character name + descriptors in every shot.
Repeat wardrobe in the same words (“sage-green hoodie,” not “green top”).
Keep “camera-left morning window light” consistent.
Use the same lens language across related shots (e.g., 24mm wide, 50mm closeups).
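
Repetition is easy to get wrong by hand, so a tiny lint pass over each prompt can catch drift like "green top" before you generate. This sketch assumes you have each shot's prompt as a string; REQUIRED_PHRASES and missing_phrases are hypothetical names, and the phrase list is just the example wording from this guide.

```python
# Exact wording every shot prompt should contain (taken from the Maya example).
REQUIRED_PHRASES = ["Maya", "sage-green hoodie", "camera-left"]


def missing_phrases(prompt: str, required: list = REQUIRED_PHRASES) -> list:
    """Return the required phrases that do not appear in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    return [p for p in required if p.lower() not in lowered]


draft = "Maya in a green top at her desk, window light from camera-left."
print(missing_phrases(draft))  # ['sage-green hoodie']
```

A plain substring check is crude, but it is enough to flag the most common slip: paraphrasing a descriptor you meant to repeat verbatim.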

If your workflow supports it, starting and ending frames can help you control transitions. At least one integration describes Veo 3.1 as supporting a Start/End Frame option for this (https://ltx.studio/blog/veo-prompt-guide), and DataCamp describes building longer, consistent videos by extending existing clips and using specific starting frames (https://www.datacamp.com/tutorial/veo-3-1-complete-guide-with-examples). Exact Flow UI options vary, so use what your interface exposes.

Sound/dialogue basics: write lines so audio, mouth, and action match

Because Veo 3.1 is positioned to generate rich synchronous audio (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1), treat audio like part of blocking.

Practical rules

Keep dialogue short (one sentence per shot).
Put sync cues in the Shot Card: “smiles on the word ‘Done’,” “click happens before the line.”
Avoid tongue-twisters and brand names (they’re harder to articulate cleanly).
Specify ambience + 1–2 SFX max per shot.

LTX Studio notes Veo 3.1 prompting can produce clips with dialogue, sound effects, and ambient audio (https://ltx.studio/blog/veo-prompt-guide). Use that, but don’t overload it.
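
The practical rules above are simple enough to check automatically. Here is a rough sketch, assuming you track each shot's dialogue as a string and its SFX as a list; audio_warnings is a hypothetical helper, and counting sentences by splitting on ., ! and ? is a deliberate simplification.

```python
import re


def audio_warnings(dialogue: str, sfx: list, max_sfx: int = 2) -> list:
    """Flag dialogue longer than one sentence and more SFX than the per-shot budget."""
    warnings = []
    # Split on ., ! or ? and keep the non-empty pieces; "…" is not treated as a break.
    sentences = [s for s in re.split(r"[.!?]+", dialogue) if s.strip()]
    if len(sentences) > 1:
        warnings.append(f"dialogue has {len(sentences)} sentences; aim for one")
    if len(sfx) > max_sfx:
        warnings.append(f"{len(sfx)} SFX listed; keep it to {max_sfx} or fewer")
    return warnings


print(audio_warnings("Now it’s one tap.", ["soft click", "start chime"]))
print(audio_warnings("If you want it, I’ll link the update. Try Focus Mode today.", []))
```

Run against the Shot Card 5 line, this reports two sentences, which is a useful prompt to either trim the line or accept it as a conscious exception for the CTA.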

Assemble the 15–30s sequence: pacing rules for intros, reveals, and end beats

A beginner-friendly pacing recipe:

0–3s: hook (a look, a question, a surprising action)
3–12s: problem → friction
12–22s: reveal → proof
22–30s: button (end beat + what to do next)

If you’re making three platform variants with the same Shot Cards:

TikTok hook version: start on Shot 2 (notification buzz) then jump to Shot 3.
Reels version: keep all 5 shots but tighten timing.
Ad version: make Shot 5 more direct; keep visuals identical to protect continuity.

You’re mostly not rewriting prompts. You’re choosing which Shot Cards to render and how to order them, with small timing or delivery tweaks where a platform needs them.
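
Treating each variant as an ordered list of Shot Card numbers keeps this swap explicit. The sketch below is illustrative only: VARIANTS and runtime are hypothetical names, and the TikTok order after Shot 3 is an assumption, since the guide only fixes "start on Shot 2, then jump to Shot 3".

```python
# (min_seconds, max_seconds) per Shot Card, from the Timing lines in the example.
DURATIONS = {1: (5, 6), 2: (4, 6), 3: (6, 8), 4: (6, 8), 5: (5, 6)}

VARIANTS = {
    "tiktok": [2, 3, 4, 5],    # assumption: shots 4 and 5 follow unchanged
    "reels": [1, 2, 3, 4, 5],  # all five shots, rendered at the tight end
    "ad": [1, 2, 3, 4, 5],     # same order; only Shot 5's delivery changes
}


def runtime(order: list, tight: bool = False) -> int:
    """Total seconds for a shot order, using each card's low or high timing."""
    index = 0 if tight else 1
    return sum(DURATIONS[shot][index] for shot in order)


for name, order in VARIANTS.items():
    print(name, order, runtime(order, tight=True), runtime(order))
```

Keeping the orders in one place also makes it obvious which renders can be shared across platforms and which shot needs its own variant.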

Export checklist (quick)

Before posting, do one last pass to catch common AI-video issues.

Final export checklist

Continuity: wardrobe/props/light direction consistent
Faces/hands: no flicker or warped fingers on key beats
Text artifacts: no unwanted gibberish on screens or signs
Audio sync: lip movement matches the exact words
Cuts: no jarring camera direction changes back-to-back

FAQ

Does Veo 3.1 support audio, or do I need to add it later?

Google Cloud describes Veo 3.1 as offering rich synchronous audio (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1). Many creators still add polish in editing, but plan audio inside each Shot Card.

How long should each Shot Card be?

Short is easier to control. Some integrations describe clip durations like 4, 6, or 8 seconds (https://ltx.studio/blog/veo-prompt-guide). If your Flow setup uses similar lengths, 4–8 seconds per shot is a practical starting point.
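
If your tool only offers fixed clip lengths, you can snap each card's target timing to the nearest option. This is a sketch, not a statement about Flow's settings: ALLOWED_SECONDS reuses the 4, 6, or 8 second options described for some integrations, and snap_duration is a hypothetical helper.

```python
# Assumption: the fixed durations described for some integrations (4, 6, or 8 seconds).
ALLOWED_SECONDS = (4, 6, 8)


def snap_duration(target: float, allowed: tuple = ALLOWED_SECONDS) -> int:
    """Pick the allowed duration closest to the target; ties go to the shorter clip."""
    return min(allowed, key=lambda d: (abs(d - target), d))


for target in (5, 5.5, 7, 9):
    print(target, "->", snap_duration(target))
```

Breaking ties toward the shorter clip is a design choice that follows the advice above: short is easier to control, and you can always hold a beat longer in the edit.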

Can I make longer, consistent scenes than 30 seconds?

Yes, by chaining shots and (where available) extending or anchoring on frames. DataCamp notes creating longer, consistent videos by extending existing videos and using specific starting frames (https://www.datacamp.com/tutorial/veo-3-1-complete-guide-with-examples).


Veo 3.1 in Google Flow: A Beginner Workflow to Build a 15–30s Scene from Shot Cards (as of 2026-04-14)