Veo 3.1 in Google Flow: A Beginner Workflow to Build a 15–30s Scene from Shot Cards (as of 2026-04-14)
A beginner tutorial for Veo 3.1 in Flow: turn one idea into reusable Shot Cards, iterate with one change per generation, and assemble a coherent 15–30s scene.
What Flow is (plain English) — and when to use it vs “one prompt = one clip”
If you’ve only used AI video as “type prompt → get one clip”, you’ll eventually hit the same wall: your second clip doesn’t match the first.
Flow (as creators commonly use the term) is best thought of as a lightweight timeline + iteration workspace where you can:
- break one idea into small shots (4–8 seconds each, depending on your settings),
- generate variations for each shot,
- and stitch them into a coherent 15–30 second scene.
This matters because Google Cloud positions Veo 3.1 as a model built for creative control rather than simple generation, and describes it as supporting rich synchronous audio alongside video (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1). Audio changes how you plan shots: dialogue and action need to line up.
As of 2026-04-14, treat Flow as your “organizing layer” and Veo 3.1 as the “shot engine.” The trick is to stop thinking in one big prompt and start thinking in Shot Cards.
The 10-minute setup: define your scene constraints
Before you write Shot Cards, lock a handful of constraints. This is the fastest way to get continuity without turning the process into a filmmaking degree.
Create a Scene Bible (a few bullet points you paste into every shot):
- Cast: name, age range, key physical descriptors, voice vibe
- Wardrobe: 2–3 specific items + colors + materials
- Location: one place, described consistently
- Time: time of day + weather
- Style: genre + camera language (handheld vs locked-off, lens feel)
- Audio palette: ambient bed + SFX + music vibe
Why this works: Veo 3.1 emphasizes stronger prompt adherence (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1), so repeating the same constraints increases the odds each shot “snaps” to the same world.
Continuity checklist (paste into your Scene Bible)
- Wardrobe: colors, textures, accessories
- Props: hero object, where it’s held/placed
- Lighting direction: key light left/right, practicals visible
- Time of day: golden hour vs midday vs night
- Camera language: lens feel (e.g., wide vs portrait closeups), movement style
- Character descriptors: same name + same defining traits every time
The Shot Card template (fill-in)
Use this as your reusable unit. The point is that each card equals one shot, not “the whole video.”
Shot Card (copy/paste)
- Subject:
- Setting:
- Camera: (framing, lens feel, movement)
- Action: (blocking + gestures)
- Timing: (target seconds + what happens at start/middle/end)
- Audio: (dialogue + SFX + ambience; call out sync moments)
- Style: (cinematic refs in plain words; color/grade)
- Constraints: (must-not-change continuity bullets)
Tip: keep Shot Cards short, but keep constraints specific.
One concept → 5 Shot Cards (a complete 15–30s mini-scene)
Here’s a concrete example you can adapt for a creator announcement, product reveal, or micro-story.
Concept: A creator reveals a new “Focus Mode” app update in a tiny home office.
Scene Bible (used in every card):
- Cast: “Maya, late 20s, warm voice, short curly hair, expressive eyebrows”
- Wardrobe: “sage-green hoodie, small silver hoop earrings”
- Location: “cozy home office, wooden desk, houseplant, laptop, morning window light from camera-left”
- Style: “clean modern, soft contrast, natural skin tones; subtle handheld micro-movement”
- Audio bed: “quiet room tone + faint city birds outside”
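If you draft prompts in a script or notebook, one way to keep the wording identical across cards is to store the Scene Bible once and append it to every shot. The sketch below is only an illustration: `SCENE_BIBLE`, `render_scene_bible`, and `with_scene_bible` are hypothetical names, and the keys mirror the bullets above rather than any Flow or Veo schema.

```python
# Hypothetical helper: keys mirror the Scene Bible bullets in this tutorial,
# not any Flow or Veo schema.
SCENE_BIBLE = {
    "Cast": "Maya, late 20s, warm voice, short curly hair, expressive eyebrows",
    "Wardrobe": "sage-green hoodie, small silver hoop earrings",
    "Location": "cozy home office, wooden desk, houseplant, laptop, "
                "morning window light from camera-left",
    "Style": "clean modern, soft contrast, natural skin tones; "
             "subtle handheld micro-movement",
    "Audio bed": "quiet room tone + faint city birds outside",
}


def render_scene_bible(bible: dict[str, str]) -> str:
    """Return the bible as one 'Key: value.' string, in insertion order."""
    return " ".join(f"{key}: {value}." for key, value in bible.items())


def with_scene_bible(shot_prompt: str, bible: dict[str, str] = SCENE_BIBLE) -> str:
    """Append the identical bible text to any shot-specific prompt."""
    return f"{shot_prompt.strip()} {render_scene_bible(bible)}"


print(with_scene_bible("Camera: wide 24-28mm feel, chest-high, slow push-in."))
```

Because the bible text comes from one place, a wording drift such as “green top” instead of “sage-green hoodie” can only happen if you edit the dictionary itself.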
Shot Card 1 — Hook (wide establish)
- Subject: Maya at desk, laptop closed
- Setting: home office, morning light camera-left
- Camera: wide 24–28mm feel, chest-high, slow push-in
- Action: Maya slides a sticky note that reads “Focus?” off the desk, looks to camera
- Timing: 5–6s; beat on the look to camera at ~3s
- Audio: Maya: “I fixed the one thing that always broke my focus.” (soft room tone)
- Style: clean, lightly handheld
- Constraints: keep wardrobe/location/light direction identical
Shot Card 2 — Problem (close reaction)
- Subject: Maya frustrated, phone notifications buzzing
- Camera: medium close-up 50mm feel, slight handheld sway
- Action: phone screen lights up (no readable brand UI), Maya sighs and flips phone face-down
- Timing: 4–6s; buzz at start, flip by ~2s
- Audio: SFX: two notification buzzes; Maya: “The pings… every time.”
- Constraints: same desk items, same hoodie/earrings
Shot Card 3 — Reveal (hero action)
- Subject: Maya opens laptop; app “Focus Mode” appears (generic UI)
- Camera: over-the-shoulder, 35mm feel, gentle tilt down to laptop
- Action: she clicks “Focus Mode,” timer starts
- Timing: 6–8s; click at ~3s, timer visible by ~4s
- Audio: SFX: soft click + subtle “start” chime; Maya: “Now it’s one tap.”
- Constraints: avoid legible logos; keep morning light direction
Shot Card 4 — Proof (micro-montage inside one shot)
- Subject: Maya working calmly
- Camera: medium shot, locked-off tripod feel (contrast from earlier handheld)
- Action: she types, sips coffee, checks a single checklist, smiles
- Timing: 6–8s; three quick internal beats
- Audio: ambience + light keyboard; Maya (quiet, satisfied): “So I can actually finish things.”
- Constraints: same props: mug, plant, laptop
Shot Card 5 — End beat + CTA (direct address)
- Subject: Maya speaks to camera
- Camera: close-up 50mm feel, subtle push-in
- Action: she turns the sticky note over; it now says “Done.”
- Timing: 5–6s; “Done” reveal near the end
- Audio: Maya: “If you want it, I’ll link the update. Try Focus Mode today.”
- Constraints: match eyeline, hoodie, earrings, light direction
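That’s your scene: five shots, each short, each reusable. At the timings listed, the cards add up to roughly 26–34 seconds, so aim for the low end of each range (or trim in the edit) to finish under 30.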
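A quick way to sanity-check a set of cards against your target length is to add up the timing ranges. This is a minimal sketch using the Timing lines from the five cards above; the names `SHOT_TIMINGS`, `TARGET_MAX_SECONDS`, and `runtime_range` are illustrative.

```python
# (min_seconds, max_seconds) copied from the Timing line of each Shot Card.
SHOT_TIMINGS = {
    "1 Hook": (5, 6),
    "2 Problem": (4, 6),
    "3 Reveal": (6, 8),
    "4 Proof": (6, 8),
    "5 End beat": (5, 6),
}

TARGET_MAX_SECONDS = 30  # upper bound of the 15-30s scene


def runtime_range(timings: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low ends and the high ends of every shot's timing range."""
    shortest = sum(low for low, _ in timings.values())
    longest = sum(high for _, high in timings.values())
    return shortest, longest


shortest, longest = runtime_range(SHOT_TIMINGS)
overshoot = max(0, longest - TARGET_MAX_SECONDS)
print(f"Runtime range: {shortest}-{longest}s; "
      f"trim up to {overshoot}s to stay within {TARGET_MAX_SECONDS}s")
# Runtime range: 26-34s; trim up to 4s to stay within 30s
```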
Generate Clip v1: do a “blocking pass” before you chase pretty
You’ll get better results (and waste fewer generations) if you separate:
- Blocking pass = staging, action, camera, timing, audio cues
- Beauty pass = lighting polish, texture, cinematography flair
Veo 3.1 offers professional-grade creative controls and rich synchronous audio (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1). Lean into that by first making sure mouths, gestures, and beats land where you need them.
Example “blocking pass” prompt (Shot Card 3)
Blocking pass prompt:
Subject: Maya, late 20s, short curly hair, sage-green hoodie, silver hoop earrings. Setting: cozy home office, wooden desk, plant, laptop, morning window light from camera-left. Camera: over-the-shoulder, 35mm feel, gentle tilt down to laptop. Action: Maya opens laptop, moves cursor, clicks a button labeled “Focus Mode” (generic UI), timer starts. Timing: 6–8 seconds; click at ~3s; timer visible by ~4s. Audio: quiet room tone + faint birds; soft click + subtle start chime; Maya says: “Now it’s one tap.” Ensure mouth movement matches. Style: clean modern, neutral color. Constraints: keep wardrobe and desk props consistent; no readable logos.
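The prompt above is essentially the Shot Card fields joined under their labels. If you store cards as data, a small helper can do that joining so every shot is formatted the same way. The sketch below is one possible shape, not a Flow or Veo API: the `ShotCard` class and its `to_prompt` method are hypothetical, and the example values come from Shot Card 3.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ShotCard:
    """One shot, using the eight template fields (hypothetical structure)."""
    subject: str = ""
    setting: str = ""
    camera: str = ""
    action: str = ""
    timing: str = ""
    audio: str = ""
    style: str = ""
    constraints: str = ""

    def to_prompt(self) -> str:
        """Render non-empty fields as 'Label: value.' in template order."""
        parts = []
        for field in fields(self):
            value = getattr(self, field.name).strip()
            if not value:
                continue  # skip fields left blank on the card
            # Add a full stop only when the value doesn't already end a sentence.
            end = "" if value[-1] in ".!?\"”" else "."
            parts.append(f"{field.name.capitalize()}: {value}{end}")
        return " ".join(parts)


card3 = ShotCard(
    subject="Maya opens laptop; app “Focus Mode” appears (generic UI)",
    camera="over-the-shoulder, 35mm feel, gentle tilt down to laptop",
    action="she clicks “Focus Mode,” timer starts",
    timing="6–8s; click at ~3s, timer visible by ~4s",
    audio="SFX: soft click + subtle “start” chime; Maya: “Now it’s one tap.”",
    constraints="avoid legible logos; keep morning light direction",
)
print(card3.to_prompt())
```

Freezing the dataclass is a deliberate choice here: a card can't be edited in place, so each new version has to be created explicitly, which fits the versioned iteration this workflow relies on.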
Example “beauty pass” prompt (same shot)
Beauty pass prompt (only polish):
Keep the same subject, setting, camera angle, action, timing, and dialogue. Improve visual fidelity: natural skin tones, soft cinematic contrast, realistic morning light rays from camera-left, clean depth of field, crisp hands and laptop edges. Maintain generic UI with no legible branding.
Notice what changed: only the look—not the blocking.
The strict iteration rule: change ONE variable per generation (and log versions)
When a shot isn’t working, the temptation is to rewrite everything. Don’t. You won’t learn what fixed it, and you’ll accidentally break continuity.
The one-change rule
Pick one variable to change per generation:
- Motion (slower push-in)
- Camera (wider lens feel)
- Acting (less smile, more surprised)
- Props (remove phone)
- Audio (shorter line; add 1 SFX)
Simple version log (copy/paste)
- Shot 3 — v1: OTS tilt down, line too fast
- Shot 3 — v2: Change: timing (pause before “one tap”)
- Shot 3 — v3: Change: camera (reduce tilt, more stable)
- Shot 3 — v4: Change: audio (lower chime volume)
This makes Flow feel like a tiny post-production pipeline instead of random prompting.
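If your cards live in code, the one-change rule can be checked mechanically by comparing two versions of a card field by field. The following is a minimal sketch under that assumption; `changed_fields` and `log_entry` are hypothetical helpers, and the log line uses a simplified form of the format above.

```python
def changed_fields(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Names of fields whose text differs between two versions of a card."""
    keys = sorted(set(previous) | set(current))
    return [key for key in keys if previous.get(key) != current.get(key)]


def log_entry(shot: int, version: int,
              previous: dict[str, str], current: dict[str, str]) -> str:
    """Build a log line, refusing versions that change zero or several fields."""
    diff = changed_fields(previous, current)
    if len(diff) != 1:
        raise ValueError(f"expected exactly one changed field, got {diff}")
    return f"Shot {shot}, v{version}: Change: {diff[0]}"


v1 = {
    "camera": "over-the-shoulder, 35mm feel, gentle tilt down to laptop",
    "timing": "6-8 seconds; click at ~3s; timer visible by ~4s",
    "audio": "soft click + subtle start chime; Maya: 'Now it's one tap.'",
}
# Only the timing text differs from v1.
v2 = {**v1, "timing": "6-8 seconds; click at ~3s; pause before 'one tap'"}

print(log_entry(3, 2, v1, v2))  # Shot 3, v2: Change: timing
```

Raising an error on a multi-field change is intentionally strict: it forces you to split a big rewrite into separate generations, so you can tell which change fixed the shot.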
Consistency tricks that actually help (without overpromising)
As of 2026-04-14, consistency is mostly about repetition + specificity:
- Reuse the same character name + descriptors in every shot.
- Repeat wardrobe in the same words (“sage-green hoodie,” not “green top”).
- Keep “camera-left morning window light” consistent.
- Use the same lens language across related shots (e.g., 24mm wide, 50mm closeups).
If your workflow supports it, starting and ending frames can help you control transitions. LTX Studio describes Veo 3.1 as supporting a Start/End Frame option for controlled transitions in its integration (https://ltx.studio/blog/veo-prompt-guide). DataCamp also describes building longer, consistent videos by extending existing clips and using specific starting frames (https://www.datacamp.com/tutorial/veo-3-1-complete-guide-with-examples). Exact Flow UI options vary, so use what your interface exposes.
Sound/dialogue basics: write lines so audio, mouth, and action match
Because Veo 3.1 is positioned to generate rich synchronous audio (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1), treat audio like part of blocking.
Practical rules
- Keep dialogue short (one sentence per shot).
- Put sync cues in the Shot Card: “smiles on the word ‘Done’,” “click happens before the line.”
- Avoid tongue-twisters and brand names (they’re harder to articulate cleanly).
- Specify ambience + 1–2 SFX max per shot.
LTX Studio notes Veo 3.1 prompting can produce clips with dialogue, sound effects, and ambient audio (https://ltx.studio/blog/veo-prompt-guide). Use that, but don’t overload it.
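Two of the practical rules above (one sentence of dialogue, at most two SFX) are easy to check before you spend a generation. This is a rough sketch with a hypothetical `lint_audio` helper; the sentence split is deliberately naive and the thresholds are just the rules of thumb from this section.

```python
import re

MAX_SENTENCES = 1  # "one sentence per shot"
MAX_SFX = 2        # "1-2 SFX max per shot"


def lint_audio(dialogue: str, sfx: list[str]) -> list[str]:
    """Return human-readable warnings; an empty list means no issues found."""
    warnings = []
    # Naive split: '.', '!' or '?' followed by whitespace ends a sentence.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", dialogue.strip()) if s]
    if len(sentences) > MAX_SENTENCES:
        warnings.append(
            f"dialogue has {len(sentences)} sentences; aim for {MAX_SENTENCES}"
        )
    if len(sfx) > MAX_SFX:
        warnings.append(f"{len(sfx)} SFX listed; aim for at most {MAX_SFX}")
    return warnings


# Shot Card 3: one sentence, two SFX.
print(lint_audio("Now it’s one tap.", ["soft click", "start chime"]))
# Shot Card 5: two sentences in one shot.
print(lint_audio("If you want it, I’ll link the update. Try Focus Mode today.", []))
```

Run against the example scene, Shot Card 3 passes while Shot Card 5's two-sentence line is flagged, which is a hint to trim it if lip sync drifts in that shot.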
Assemble the 15–30s sequence: pacing rules for intros, reveals, and end beats
A beginner-friendly pacing recipe:
- 0–3s: hook (a look, a question, a surprising action)
- 3–12s: problem → friction
- 12–22s: reveal → proof
- 22–30s: button (end beat + what to do next)
If you’re making three platform variants with the same Shot Cards:
- TikTok hook version: start on Shot 2 (notification buzz) then jump to Shot 3.
- Reels version: keep all 5 shots but tighten timing.
- Ad version: make Shot 5 more direct; keep visuals identical to protect continuity.
You’re mostly reusing the same prompts: the variants differ in which Shot Cards you render, how you order them, and small timing or dialogue tweaks.
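The three variants can be expressed as different orderings of the same rendered clips. In this sketch the clip labels and the `build_variant` function are hypothetical, and one detail is an assumption: the TikTok version is taken to continue with Shots 4 and 5 after Shot 3, which the list above doesn't specify.

```python
# Hypothetical clip labels, one per rendered Shot Card.
BASE_ORDER = ["shot1_hook", "shot2_problem", "shot3_reveal", "shot4_proof", "shot5_cta"]


def build_variant(name: str) -> list[str]:
    """Return the clip order for a platform variant."""
    if name == "reels":
        return list(BASE_ORDER)  # all five shots; tighten timing in the edit
    if name == "tiktok":
        # Open on the notification buzz (Shot 2), then the reveal.
        # Assumption: Shots 4 and 5 follow as usual.
        return BASE_ORDER[1:]
    if name == "ad":
        # Same visuals, but swap in a more direct take of Shot 5.
        return BASE_ORDER[:-1] + ["shot5_cta_direct"]
    raise ValueError(f"unknown variant: {name}")


for variant in ("tiktok", "reels", "ad"):
    print(variant, build_variant(variant))
```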
Export checklist (quick)
Before posting, do one last pass to catch common AI-video issues.
Final export checklist
- Continuity: wardrobe/props/light direction consistent
- Faces/hands: no flicker or warped fingers on key beats
- Text artifacts: no unwanted gibberish on screens or signs
- Audio sync: lip movement matches the exact words
- Cuts: no jarring camera direction changes back-to-back
FAQ
Does Veo 3.1 support audio, or do I need to add it later?
Google Cloud describes Veo 3.1 as offering rich synchronous audio (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1). Many creators still add polish in editing, but plan audio inside each Shot Card.
How long should each Shot Card be?
Short is easier to control. Some integrations describe clip durations like 4, 6, or 8 seconds (https://ltx.studio/blog/veo-prompt-guide). If your Flow setup uses similar lengths, 4–8 seconds per shot is a practical starting point.
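If your interface only offers fixed lengths, you can snap each card's target timing to the nearest option. The sketch below assumes the 4, 6, and 8 second choices mentioned above; `ALLOWED_SECONDS` and `snap_duration` are illustrative names, and sending ties to the longer clip (so you can trim in the edit) is a design choice, not a rule from any tool.

```python
ALLOWED_SECONDS = (4, 6, 8)  # assumed options; check what your interface exposes


def snap_duration(target_seconds: float) -> int:
    """Pick the closest allowed length; on a tie, prefer the longer clip."""
    return min(ALLOWED_SECONDS, key=lambda s: (abs(s - target_seconds), -s))


print(snap_duration(5))    # 6 (tie between 4 and 6 goes to the longer clip)
print(snap_duration(7.5))  # 8
print(snap_duration(3))    # 4
```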
Can I make longer, consistent scenes than 30 seconds?
Yes, by chaining shots and (where available) extending or anchoring on frames. DataCamp notes creating longer, consistent videos by extending existing videos and using specific starting frames (https://www.datacamp.com/tutorial/veo-3-1-complete-guide-with-examples).
Is Veo 3.1 “production ready”?
Google Cloud states Veo 3.1 is stable and generally available for production on Vertex AI (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1).
CTA: Build your own Shot Card pipeline with Veo3Gen
If you’re ready to turn this Shot Card workflow into something repeatable for your team (templates, versioning, and programmatic generation), explore the Veo3Gen API: /api.
And if you’re budgeting for creator content, experiments, or small campaign batches, compare plans here: /pricing.