If you’ve ever tried to prompt a “before → after” video with text alone, you’ve seen the usual problems: the camera slowly wanders, the subject changes identity, or the “after” doesn’t match what you had in mind.

A more repeatable approach is to bookend your generation with two anchors—your start image and your end image—then describe the motion that connects them.

Google Cloud’s Veo 3.1 prompting guide explicitly calls out the “first frame, last frame” capability (i.e., First and Last Frame) in customer commentary. As of 2026-05-16, the same guide also describes Veo 3.1 as stable and generally available for production on Vertex AI (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1).

Below is a creator-friendly workflow you can use for UGC product swaps, room makeovers, logo reveals, and recipe transformations—without turning this into a generic “prompt tips” post.

What “First and Last Frame” is (and when it beats text-only prompting)

First and Last Frame means you provide (or otherwise lock) a specific opening frame and a specific closing frame, then prompt Veo to generate the transition in between.

This tends to beat text-only prompting when:

You need an exact “after” (a finished room, a plated dish, a product colorway) rather than “something like it.”
You’re making before/after reveals where continuity matters more than novelty.
You’re doing brand work (logos, packaging) where drift and extra objects can ruin the shot.

Veo prompt guidance commonly breaks direction down into things like shot framing and motion, style, lighting, character descriptions, location, action, and dialogue—useful categories to structure your transition prompt (https://deepmind.google/models/veo/prompt-guide/).

Prep: picking the right two images

Your frames are your creative constraints. The cleaner they are, the less your prompt has to “fight.”

Composition: lock the geometry

Aim for the same:

Camera position (height, angle)
Focal length feel (wide vs. normal vs. tele)
Subject scale (how big your product/person is in frame)
Horizon lines and key background anchors

If your “after” is shot from a slightly different angle, Veo may try to invent a camera move to reconcile the mismatch, and viewers tend to read that move as drift.

Subject lock: keep identity consistent

For people or recurring objects, continuity is everything:

Keep wardrobe/props consistent unless the change is the point.
Avoid conflicting descriptors (e.g., “red hoodie” in text when the first frame is blue).
Keep hair, accessories, labels, and hero features visible in both frames.

If you want the subject to change (e.g., plain wall → mural), keep the non-changing elements consistent (window position, floor pattern, countertop edge).

Match lighting (or make the lighting change intentional)

Lighting inconsistencies often look like a “filter jump.” Decide whether lighting is:

Constant (best for product swaps and logos)
Part of the transformation (e.g., dull “before” → bright “after”); in that case, prompt it explicitly as a gradual brightening

The 4-part prompt scaffold for transitions

A good transition prompt is not long—it’s complete. A simple scaffold:

What stays the same (camera, framing, subject identity, background anchors)
What changes (the transformation details)
How it moves (motion path, reveal mechanic, pacing)
Look + sound (style, lighting, optional dialogue or SFX)

A useful mental model is to cover subject, context, action, and style—highlighted as key prompt parts in guidance referenced by Leonardo.Ai (https://leonardo.ai/news/mastering-prompts-for-veo-3/).

A reusable prompt “skeleton”

Copy/paste and fill the braces:

Create a {duration}s transition video. First frame: {describe start image}. Last frame: {describe end image}. Keep {camera/framing} consistent and keep {subject identity} consistent. Transition action: {how the change happens}. Motion: {camera motion or locked-off}. Lighting: {lighting}. Style: {style}. Audio: {music/SFX/VO guidance}.
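If you reuse the skeleton often, filling it in code helps you catch a forgotten placeholder before you spend a generation on it. Here is a minimal sketch using Python's standard `string.Template`. The field names (`first_frame`, `framing`, `subject`, and so on) are assumptions chosen for this example, since the free-form braces above are not valid identifiers; rename them to suit your own template.

```python
from string import Template

# Assumed field names: the skeleton's free-form braces are renamed to
# valid identifiers so string.Template can substitute them.
SKELETON = Template(
    "Create a ${duration}s transition video. "
    "First frame: ${first_frame}. Last frame: ${last_frame}. "
    "Keep ${framing} consistent and keep ${subject} consistent. "
    "Transition action: ${action}. Motion: ${motion}. "
    "Lighting: ${lighting}. Style: ${style}. Audio: ${audio}."
)


def build_prompt(**fields: str) -> str:
    """Fill the skeleton; raises KeyError if any placeholder is left unfilled."""
    return SKELETON.substitute(**fields)


prompt = build_prompt(
    duration="6",
    first_frame="empty living room, bare walls, daylight",
    last_frame="same living room, staged with sofa, rug, and plants",
    framing="the locked-off wide framing",
    subject="the window position and floor pattern",
    action="furniture slides into place one piece at a time",
    motion="locked-off, no zoom, no pan",
    lighting="soft daylight, same white balance throughout",
    style="realistic interior video",
    audio="soft whoosh per object, no dialogue",
)
print(prompt)
```

Because `substitute` fails loudly on a missing field, a half-filled template never reaches the model.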

5 creator-ready transition recipes (copy/paste prompts)

Each template is designed for the “bookend frames” workflow. Replace placeholders.

1) UGC product swap (handheld but stable)

{duration}s UGC-style product reveal. First frame: {subject holding product A}. Last frame: {same subject holding product B in same pose}. Keep the same face, hands, and background. Transition: a quick whip-pan blur and snap back to the same framing as the product changes from {start_state} to {end_state}. Camera: subtle handheld micro-shake, no reframing. Lighting: {lighting}. Style: clean, natural smartphone video. Audio: soft whoosh + satisfying “click” at the moment of change.

2) Room makeover (real estate / interior)

{duration}s before/after room transformation. First frame: {messy/empty room}. Last frame: {staged renovated room}. Keep camera locked, same lens feel, no zoom. Transition: objects slide into place in a smooth sequence (rug, sofa, plants, wall art), dust motes sparkle briefly, then settle. Lighting gradually warms from {start_lighting} to {end_lighting}. Style: realistic interior video, crisp details.

3) Logo reveal (brand-safe, minimal)

{duration}s logo morph reveal. First frame: {blank surface or abstract shapes}. Last frame: {final logo centered, clean background}. Keep background texture consistent. Transition: shapes assemble into the logo with precise alignment, no extra symbols. Camera: locked-off, no rotation. Lighting: {lighting}. Style: {style} with sharp edges. Audio: subtle rise + soft impact when the final logo locks in.

4) Recipe transformation (ingredients → plated)

{duration}s cooking transformation. First frame: {ingredients arranged on countertop}. Last frame: {finished plated dish in same spot}. Keep countertop, plate position, and camera angle consistent. Transition: ingredients lift and swirl into a quick “mix” vortex, then resolve into the plated dish; steam appears naturally at the end. Camera: mostly locked, with at most a slight parallax drift. Lighting: {lighting}. Style: appetizing food video, realistic textures. Audio: gentle kitchen SFX, no dialogue.

5) Fashion / beauty makeover (identity-safe)

{duration}s makeover reveal. First frame: {same person, neutral look}. Last frame: {same person, glam look}. Keep the same person’s facial features, hairstyle silhouette, and pose. Transition: a smooth foreground wipe using {prop: makeup brush / scarf / hand wave} that passes the lens; after the wipe, the look is {end_state}. Camera: locked, no dolly. Lighting: {lighting}. Style: {style}. Audio: soft whoosh + upbeat beat drop on reveal.

Timing control: mapping beats to seconds

You’ll get cleaner results when you define where the change happens.

For short-form ads, try this simple map:

0–20%: establish the “before” (let the viewer register it)
20–70%: transformation happens (the fun part)
70–100%: hold the “after” for readability and on-screen text

In prompts, describe it plainly:

“Hold the first frame look for ~1s, transform over ~2s, then hold the final frame look for ~1s.”
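To turn the percentage map into seconds for any clip length, a few lines of arithmetic are enough. The sketch below assumes the 20% / 70% split points from the map above; the function names are made up for this example.

```python
def beat_map(duration_s: float, splits: tuple = (0.2, 0.7)) -> dict:
    """Convert the before / transform / after split into (start, end) seconds."""
    hold_end, change_end = splits
    if not 0 <= hold_end <= change_end <= 1:
        raise ValueError("splits must satisfy 0 <= hold_end <= change_end <= 1")
    a = round(duration_s * hold_end, 2)    # end of the "before" hold
    b = round(duration_s * change_end, 2)  # end of the transformation
    return {
        "hold_before": (0.0, a),
        "transform": (a, b),
        "hold_after": (b, float(duration_s)),
    }


def timing_sentence(duration_s: float) -> str:
    """Phrase the beat lengths as a plain instruction for the prompt."""
    lengths = {
        name: round(end - start, 2)
        for name, (start, end) in beat_map(duration_s).items()
    }
    return (
        f"Hold the first frame look for ~{lengths['hold_before']:g}s, "
        f"transform over ~{lengths['transform']:g}s, "
        f"then hold the final frame look for ~{lengths['hold_after']:g}s."
    )


print(beat_map(4))
print(timing_sentence(4))
```

For a 4-second clip this gives roughly 0.8s, 2s, and 1.2s, which rounds to the "~1s, ~2s, ~1s" phrasing in the example sentence.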

Audio notes (VO/SFX/music) without over-directing

Google Cloud describes Veo 3.1 as having rich synchronous audio (https://cloud.google.com/blog/products/ai-machine-learning/ultimate-prompting-guide-for-veo-3-1). DeepMind’s Veo prompt guide also notes that Veo can generate dialogue (https://deepmind.google/models/veo/prompt-guide/).

Practical creator guidance:

Use audio to accent the moment of change (whoosh, click, sparkle).
Keep VO lines short and concrete if you use them (one sentence).
Don’t specify exact music licensing or named tracks; describe mood and tempo instead.

QA checklist: artifacts to look for + what to tweak first

Use this quick pass before you burn time generating dozens of takes.

Checklist

Framing matches: subject size and camera angle feel consistent.
Identity holds: same face/product geometry; no “new” subject.
Motion is smooth: no sudden jump-cuts mid-transition.
Style is consistent: lighting/color doesn’t snap unless intended.
No junk objects: no unwanted extra items appear in the shot.
Text/logo readability: edges sharp, no warping.

Troubleshooting (what to tweak first)

Warping / “melting” during transformation

Make the change more mechanical (wipe, whip-pan, foreground pass) rather than “morph.”
Reduce the number of things changing at once.
Re-pick frames with closer alignment (same pose, same crop).

Jumpy motion

Ask for locked-off camera explicitly.
Specify a single transition mechanic (don’t combine whip-pan + zoom + spin).

Camera drift

Reinforce “same framing, no zoom, no pan.”
Choose start/end images with matching perspective lines.

Style mismatch between first and last

State one style and keep it simple (e.g., “realistic smartphone video”).
Match lighting descriptions to your frames rather than inventing new ones.

Unreadable text/logo

Increase hold time on the final frame.
Prompt “clean edges, centered, high contrast, no distortion.”

Unwanted extra objects

Explicitly say “no additional objects, no extra hands, no extra text.”
Simplify the background in your two frames.
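If several people review takes, it can help to keep the troubleshooting list as data so everyone applies the same fix in the same order. The sketch below simply encodes the list above as a lookup table. The artifact keys and the "prompt" / "frames" labels are assumptions added for this example, marking whether a tweak changes your prompt text or your two images.

```python
from typing import Optional

# Ordered tweaks per artifact, copied from the troubleshooting list.
# "prompt" = edit the prompt text; "frames" = change the input images.
TWEAKS = {
    "warping": [
        ("prompt", "Use a mechanical reveal (wipe, whip-pan, foreground pass) instead of a morph."),
        ("prompt", "Reduce the number of things changing at once."),
        ("frames", "Re-pick frames with closer alignment (same pose, same crop)."),
    ],
    "jumpy_motion": [
        ("prompt", "Ask for a locked-off camera explicitly."),
        ("prompt", "Specify a single transition mechanic."),
    ],
    "camera_drift": [
        ("prompt", "Reinforce: same framing, no zoom, no pan."),
        ("frames", "Choose start/end images with matching perspective lines."),
    ],
    "style_mismatch": [
        ("prompt", "State one style and keep it simple."),
        ("prompt", "Match lighting descriptions to your frames."),
    ],
    "unreadable_text": [
        ("prompt", "Increase hold time on the final frame."),
        ("prompt", "Add: clean edges, centered, high contrast, no distortion."),
    ],
    "extra_objects": [
        ("prompt", "Add: no additional objects, no extra hands, no extra text."),
        ("frames", "Simplify the background in your two frames."),
    ],
}


def next_tweak(artifact: str, tried: frozenset = frozenset()) -> Optional[tuple]:
    """Return the first (kind, tweak) for this artifact that has not been tried."""
    for kind, tweak in TWEAKS[artifact]:
        if tweak not in tried:
            return kind, tweak
    return None  # list exhausted: time to rethink the shot


print(next_tweak("camera_drift"))
```

Trying one tweak per take, in order, also makes it easier to tell which change actually fixed the artifact.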

Mini case studies (how the workflow plays out)

Product swap (UGC ad)

Frames: same creator pose, same countertop, Product A → Product B.

Prompt focus: what stays (hands/face/background), what changes (only the product), and a single reveal mechanic (whip-pan).

Common fix: if the label becomes illegible, add a longer final hold and “sharp label text” guidance.

Room makeover (real estate)

Frames: empty living room → staged living room.

Prompt focus: lock camera and specify an ordered sequence (rug first, then sofa, then decor) so the model doesn’t randomize placements.

Common fix: if objects appear in wrong spots, simplify: fewer items, clearer last frame composition.

Logo morph (brand)

Frames: abstract shapes → final logo.

Prompt focus: precision, no extra symbols, locked camera, hold at end.

Common fix: if the logo “breathes” or wobbles, ask for “static final logo, no motion after assembly.”

Food transformation (recipe)

Frames: ingredient layout → plated dish.

Prompt focus: keep the plate location consistent, add steam only at the end.

Common fix: if colors shift too much, specify lighting continuity (“same white balance”).

Export + edit: turning one transition into multiple Shorts/Reels

Once you have one clean transition, you can spin it into variants fast:

Cutdown 1 (6–8s): hook (before) → snap reveal → 1 line of text.
Cutdown 2 (10–12s): slower build → reveal → benefits + CTA overlay.
Loop version: end frame matches the first frame composition so it loops cleanly.

Powtoon notes that Veo 3 produces clips from descriptive text (https://www.powtoon.com/blog/veo-3-video-prompt-examples/). If you’re producing in Flow, Powtoon also states Veo 3 is available through Flow with an AI Ultra plan at $250/month, and that each generation consumes 150 credits from a monthly allocation of 12,500 credits (https://www.powtoon.com/blog/veo-3-video-prompt-examples/). Treat that as a reminder to plan your iterations: start with low-variance prompts and only expand once your frames are solid.
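To make that budget concrete, here is the arithmetic on the figures Powtoon reports ($250/month, 12,500 credits, 150 credits per generation). Treat it as a rough planning sketch: it assumes every generation costs the same and that those reported numbers still apply to your plan.

```python
# Figures as reported by Powtoon; verify against your own plan before budgeting.
MONTHLY_PRICE_USD = 250
MONTHLY_CREDITS = 12_500
CREDITS_PER_GENERATION = 150


def generations_per_month() -> int:
    """Whole generations that fit in the monthly credit allocation."""
    return MONTHLY_CREDITS // CREDITS_PER_GENERATION


def cost_per_generation() -> float:
    """Plan price spread over the generations you can actually run."""
    return MONTHLY_PRICE_USD / generations_per_month()


def takes_per_concept(concepts: int) -> int:
    """Evenly split the monthly generations across a number of concepts."""
    return generations_per_month() // concepts


print(generations_per_month())          # 83
print(round(cost_per_generation(), 2))  # 3.01
print(takes_per_concept(5))             # 16
```

At about 83 generations a month, five concepts leave you around 16 takes each, which is why locking your frames first matters more than prompt experimentation.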

CTA: build this workflow into your pipeline

If you’re turning these transitions into a repeatable system for campaigns (multiple products, locations, or creators), you’ll want automation around templating, batch generation, and review.
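As a starting point for that kind of system, the sketch below plans a batch of jobs from a list of frame pairs and writes a JSON manifest you could hand to a generation step and a review step. Everything here is hypothetical: the `TransitionJob` fields, the file paths, and the status values are illustrative, and the code deliberately stops before calling any API, since no endpoint shape is described in this post.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical recipe template with named fields for str.format.
PRODUCT_SWAP = (
    "{duration}s UGC-style product reveal. "
    "First frame: creator holding {product_a}. "
    "Last frame: same creator holding {product_b} in same pose. "
    "Keep the same face, hands, and background. "
    "Camera: subtle handheld micro-shake, no reframing."
)


@dataclass
class TransitionJob:
    job_id: str
    first_frame: str   # path to the start image (illustrative)
    last_frame: str    # path to the end image (illustrative)
    prompt: str
    status: str = "queued"  # e.g. queued -> generated -> approved / rejected


def plan_batch(pairs: list, template: str, duration_s: int = 6) -> list:
    """Build one job per frame pair by filling the template with its fields."""
    jobs = []
    for i, pair in enumerate(pairs, start=1):
        prompt = template.format(duration=duration_s, **pair["fields"])
        jobs.append(
            TransitionJob(f"job-{i:03d}", pair["first_frame"], pair["last_frame"], prompt)
        )
    return jobs


pairs = [
    {
        "first_frame": "frames/serum_a.png",
        "last_frame": "frames/serum_b.png",
        "fields": {"product_a": "Serum A", "product_b": "Serum B"},
    },
    {
        "first_frame": "frames/mug_white.png",
        "last_frame": "frames/mug_blue.png",
        "fields": {"product_a": "white mug", "product_b": "blue mug"},
    },
]

jobs = plan_batch(pairs, PRODUCT_SWAP)
manifest = json.dumps([asdict(job) for job in jobs], indent=2)
print(manifest)
```

Keeping the plan as plain data means the same manifest can drive generation, track review decisions, and record which frame pair produced which take.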

Explore the Veo3Gen API for programmatic generation and workflow integration: /api
Compare options and scale confidently with transparent limits: /pricing

Veo 3.1 First→Last Frame Transitions: A Creator Workflow for Smooth “Before/After” Reveals (Product, Room Makeover, Logo, Recipe) (as of 2026-05-16)