
Kling's Official Text-to-Video Prompt Guide, Rebuilt for Veo3Gen: A "6-Slot Prompt Card" You Can Reuse for Any Clip

A reusable 6‑slot text-to-video prompt structure rebuilt from Kling’s guide and adapted for Veo3Gen, with a worked example, troubleshooting map, checklist, and FAQ.


TL;DR

Stop writing one mushy paragraph. Use a 6‑Slot Prompt Card you can reuse:

Subject → Action → Setting → Camera → Lighting+Mood → Style/Genre (+ optional Micro‑timing + Audio + Negatives)

Write one clear sentence per slot, generate, then iterate one slot at a time so you can diagnose what actually changed.

This rebuilds Kling’s official prompt formula—Subject (description) + Subject Movement + Scene (description) with optional Camera + Lighting + Atmosphere—into a debuggable card you can use in Veo3Gen. (https://kling.ai/quickstart/text-to-video-prompt-guide)

Key takeaways

  • Kling’s baseline is Subject + Movement + Scene, with Camera/Lighting/Atmosphere optional—making those “optional” parts explicit is how you get repeatable results. (https://kling.ai/quickstart/text-to-video-prompt-guide)
  • Motion works better when you describe visible changes (smoke drifting, flames bending) instead of inner states (“thinking”). (https://kling.ai/blog/kling-ai-prompt-guide)
  • For camera control, use one framing + one move, using common terms (close‑up, wide shot, push‑in, pan, tilt, tracking). (https://kling.ai/blog/kling-ai-prompt-guide)
  • When results drift, don’t rewrite everything—change one slot per regeneration.
  • Veo3Gen generations include native, synchronized audio in a single pass, so you can add a short Audio line (dialogue/SFX/music) without a separate audio step.

Why “good ideas” still produce inconsistent clips

Most inconsistent outputs aren’t “the model being random.” They’re prompts that mix multiple decisions in one sentence:

  • identity (who/what)
  • motion (what changes)
  • world anchors (where)
  • cinematography (what the camera does)
  • look (lighting + mood)
  • genre/style constraints

When those are tangled, you can’t debug. You regenerate, but you’ve changed three things at once, so you learn nothing.

Kling’s quickstart guide is direct: prompts dictate the content of the video. (https://kling.ai/quickstart/text-to-video-prompt-guide)

So the fix is a structure that:

  1. forces you to state decisions clearly, and
  2. lets you isolate changes.

The 6‑Slot Prompt Card (copy/paste)

Kling recommends defining subject, action, setting, camera language, lighting, and mood in plain language. (https://kling.ai/blog/kling-ai-prompt-guide)

Use this card as a single block. Keep each slot tight.

[1] SUBJECT (identity only)
- Who/what it is + 2–4 identifiers.

[2] ACTION (visible motion)
- What changes on screen (start → end), with clear verbs.

[3] SETTING (world anchors)
- Location + 2–3 concrete anchors (surfaces, background elements, time-of-day).

[4] CAMERA (one framing + one move)
- Framing (wide/medium/close-up) + ONE move (push-in/pan/tilt/tracking) + what it emphasizes.

[5] LIGHTING + MOOD (minimum look controls)
- Light source/time + contrast + one mood word.

[6] STYLE/GENRE (guardrails)
- A simple style constraint that doesn’t replace the scene.

(Optional) MICRO-TIMING (beats)
- 0–2s: ...
- 2–4s: ...
- 4–6s: ...

(Optional) AUDIO
- Dialogue/ambience/SFX/music requests.

NEGATIVES / AVOID
- Short list of failure modes to avoid.
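If you generate many clips, it can help to keep the card as structured data and render it to text only when you paste it, so no slot gets silently dropped. Below is a minimal Python sketch under that assumption. `PromptCard` and `render()` are hypothetical names for illustration, not part of Veo3Gen or Kling; the output is just the plain-text card above.

```python
from dataclasses import dataclass


# Hypothetical helper for illustration only; not a Veo3Gen or Kling API.
@dataclass(frozen=True)
class PromptCard:
    subject: str          # [1] identity only
    action: str           # [2] visible motion, start -> end
    setting: str          # [3] location + 2-3 anchors
    camera: str           # [4] one framing + one move + purpose
    lighting_mood: str    # [5] light source/time + contrast + one mood word
    style: str            # [6] guardrail, not a takeover
    micro_timing: tuple[str, ...] = ()  # optional beats such as "0–2s: ..."
    audio: str = ""                     # optional ambience/SFX/music line
    negatives: tuple[str, ...] = ()     # short list of failure modes

    def render(self) -> str:
        """Return the card as one plain-text block to paste into a text prompt."""
        lines = [
            "[1] SUBJECT", f"- {self.subject}",
            "[2] ACTION", f"- {self.action}",
            "[3] SETTING", f"- {self.setting}",
            "[4] CAMERA", f"- {self.camera}",
            "[5] LIGHTING + MOOD", f"- {self.lighting_mood}",
            "[6] STYLE/GENRE", f"- {self.style}",
        ]
        # Optional sections are emitted only when they are filled in.
        if self.micro_timing:
            lines.append("(Optional) MICRO-TIMING")
            lines += [f"- {beat}" for beat in self.micro_timing]
        if self.audio:
            lines += ["(Optional) AUDIO", f"- {self.audio}"]
        if self.negatives:
            lines += ["NEGATIVES / AVOID", "- " + "; ".join(self.negatives) + "."]
        return "\n".join(lines)


if __name__ == "__main__":
    card = PromptCard(
        subject="Young woman barista, short black bob haircut, navy apron.",
        action="She pours matcha latte into a clear glass; steam rises.",
        setting="Minimal coffee shop counter, morning window light, espresso machine.",
        camera="Medium shot; slow push-in to emphasize the glass.",
        lighting_mood="Soft natural daylight, clean highlights, calm premium mood.",
        style="Modern product ad, realistic textures, clean composition.",
        negatives=("Extra hands", "warped glass", "unreadable labels"),
    )
    print(card.render())
```

Because the dataclass is frozen, each version of a card is a distinct value, which fits the "change one slot, keep the rest" loop.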

How to run this in Veo3Gen (fast iteration loop)

  1. Paste the full card into your text prompt.
  2. Generate.
  3. Duplicate the prompt.
  4. Change only one slot for the next generation.

Veo3Gen also supports text‑to‑video and image‑to‑video, plus first‑and‑last‑frame control on Veo 3.1—useful when you need a clip to start/end on specific visuals. And because Veo3Gen includes native synchronized audio, you can keep your Audio line inside the same prompt block.

If you want this workflow to feel production-ready (fast previews first, then reruns for fidelity), try it in Veo3Gen by generating the same card in each mode: Veo 3.1 Lite (cheap previews), Veo 3.1 Fast (great default), and Veo 3.1 Quality (max fidelity).

Slot-by-slot rules (what to write, what to avoid)

Slot 1 — Subject (prevent identity drift)

Kling notes the subject can be people, animals, plants, objects, etc. (https://kling.ai/quickstart/text-to-video-prompt-guide)

Write identity, not vibes.

Good identifiers (pick 2–4):

  • type + role: “female barista”, “matte‑black handheld vacuum”
  • distinguishing marker: “scar on left eyebrow”, “transparent dust chamber”
  • wardrobe/material: “navy apron”, “brushed steel casing”

Avoid:

  • emotions (“confident founder”)—put visible emotion cues in Action
  • camera framing—belongs in Slot 4

Slot 2 — Action (use visible verbs)

Kling’s prompt guide: motion cues should describe what viewers can see (smoke drifting upward; flames bending). (https://kling.ai/blog/kling-ai-prompt-guide)

Action lines work best when they include:

  • a start state and end state
  • a single dominant motion
  • one secondary motion (steam, fabric, liquid, hair) that proves time is passing

Bad: “She is excited about the product.”

Better: “She grins, lifts the product into frame, rotates it slowly; the logo catches the light.”

Slot 3 — Setting (anchors, not a novel)

Use: where + 2–3 anchors.

Examples:

  • “Small kitchen, morning sun through blinds, crumbs on counter.”
  • “Rain‑wet crosswalk at dusk, neon reflections, umbrellas passing behind.”

Too many props = continuity chaos.

Slot 4 — Camera (one move, no stacking)

Kling lists camera-direction terms like close-up, wide shot, low angle, slow push‑in, pan, tilt, tracking shot. (https://kling.ai/blog/kling-ai-prompt-guide)

Use this format:

  • Framing: wide/medium/close‑up
  • One move: push‑in or tracking or pan or tilt
  • Purpose: what it emphasizes

Example:

  • “Medium shot; slow push‑in to emphasize her reaction as the screen lights up.”
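A camera line can be sanity-checked before you spend a generation on it. The Python sketch below assumes your camera vocabulary is limited to the terms listed in this section, and flags a missing framing, a missing move, or stacked moves. `camera_warnings` and its word lists are hypothetical; a keyword match cannot tell you whether the model will actually follow the move.

```python
import re

# Hypothetical Slot 4 lint. The vocabulary mirrors the terms used in this article
# and is deliberately small; it is a keyword heuristic, not a parser.
FRAMING = re.compile(r"\b(wide|medium|close-up)\b")
MOVE_WORDS = {
    "push-in": "push-in",
    "pan": "pan", "pans": "pan", "panning": "pan",
    "tilt": "tilt", "tilts": "tilt", "tilting": "tilt",
    "track": "tracking", "tracks": "tracking", "tracking": "tracking",
}
MOVE = re.compile(r"\b(" + "|".join(map(re.escape, MOVE_WORDS)) + r")\b")


def camera_warnings(text: str) -> list[str]:
    """Flag a missing framing, a missing move, or stacked moves in a camera line."""
    t = text.lower().replace("\u2011", "-")  # treat non-breaking hyphens as "-"
    framings = set(FRAMING.findall(t))
    moves = {MOVE_WORDS[w] for w in MOVE.findall(t)}  # collapse verb forms
    warnings = []
    if not framings:
        warnings.append("no framing (wide/medium/close-up)")
    elif len(framings) > 1:
        warnings.append(f"several framings: {', '.join(sorted(framings))}")
    if not moves:
        warnings.append("no camera move (push-in/pan/tilt/tracking)")
    elif len(moves) > 1:
        warnings.append(f"stacked moves: {', '.join(sorted(moves))} (keep one)")
    return warnings
```

For example, the medium-shot push-in line above returns an empty list, while “Wide shot; pan left, then tilt up into a push-in.” is flagged for stacking three moves.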

Slot 5 — Lighting + mood (smallest set that constrains the look)

Segmind notes that style details can include lighting, color palette, camera angle, and time of day. (https://blog.segmind.com/best-text-to-video-prompts-for-kling-ai-with-examples)

A reliable minimum:

  • time/light source: “late afternoon window light”
  • contrast: “soft, low contrast” or “hard, high contrast”
  • mood word: “cozy”, “clinical”, “tense”, “hopeful”

Slot 6 — Style/genre (guardrails only)

Style is useful when it constrains—dangerous when it replaces.

Good:

  • “Modern product ad, realistic textures, clean composition.”

Risky:

  • long stacks of competing genres (“cyberpunk anime noir watercolor surreal glitchcore”)

Segmind’s simpler template [Scene Description], [Style], [Motion] works because it separates jobs. (https://blog.segmind.com/best-text-to-video-prompts-for-kling-ai-with-examples)

Optional add-ons that actually help

Micro‑timing beats (when you need a mini-story)

If you need an internal arc inside one clip, add 2–3 beats.

This matches broader cross‑model advice that well‑organized prompts with clear sections perform better than one blob of text. (https://wavespeed.ai/blog/posts/sora-2-prompting-tips-better-videos-2026/)
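If your beats are evenly spaced, the timestamps are simple arithmetic: divide the clip length by the number of beats. A minimal Python sketch, assuming equal-length beats and a clip length you supply yourself (this article does not state a fixed clip duration); `beat_lines` is a hypothetical helper.

```python
def beat_lines(duration_s: float, beats: list[str]) -> list[str]:
    """Split a clip into equal windows and label each beat, e.g. '- 0–2s: ...'."""
    if not 2 <= len(beats) <= 3:
        raise ValueError("use 2-3 beats per clip")
    step = duration_s / len(beats)
    lines = []
    for i, text in enumerate(beats):
        # Round to one decimal so uneven splits stay readable (e.g. 2.7s).
        start = round(i * step, 1)
        end = round((i + 1) * step, 1)
        lines.append(f"- {start:g}–{end:g}s: {text}")
    return lines


if __name__ == "__main__":
    for line in beat_lines(6, ["She pours the latte", "Steam rises", "She lifts the glass"]):
        print(line)
```

With a 6-second clip and three beats this reproduces the 0–2s / 2–4s / 4–6s layout from the card; equal spacing is only a starting point, and you can shift a boundary by hand if one beat needs more time.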

Audio line (when supported)

Wavespeed notes Sora 2 generates audio natively and can sync requested sound elements to visuals. (https://wavespeed.ai/blog/posts/sora-2-prompting-tips-better-videos-2026/)

Veo3Gen similarly supports native, synchronized audio in a single pass—so you can request:

  • ambience (“quiet café room tone”)
  • synced SFX (“umbrella snap exactly as it opens”)
  • music (“soft minimalist beat, low volume”)

Keep it short. One or two sound elements is usually enough.

Worked example: messy paragraph → card → one-slot iteration

Before (hard to debug)

“Make a cinematic video of a young woman in a coffee shop promoting our new matcha latte, she’s happy and it feels premium and modern, show the drink, cool camera movement, nice lighting, social ad vibe.”

What’s wrong:

  • Subject has no identifiers → identity drift risk
  • Action is abstract (“promoting”, “happy”)
  • Camera is non-instruction (“cool”)
  • Lighting/mood are vague (“nice”)

After: Filled 6‑Slot Prompt Card (Version 1)

[1] SUBJECT
- Young woman barista, short black bob haircut, navy apron, small silver hoop earrings.

[2] ACTION
- She pours bright green matcha latte into a clear glass; steam rises; she lifts the glass toward the camera and smiles.

[3] SETTING
- Minimal modern coffee shop counter, morning window light, stainless espresso machine in background.

[4] CAMERA
- Medium shot; slow push-in to emphasize the glass as it enters the foreground.

[5] LIGHTING + MOOD
- Soft natural daylight, clean highlights, calm premium mood.

[6] STYLE/GENRE
- Modern product ad, realistic textures, clean composition.

(Optional) AUDIO
- Quiet café ambience; milk pour sound synced to the pour.

NEGATIVES / AVOID
- Extra hands; warped glass; unreadable labels.

Suppose the result is good… except the camera is static

Don’t rewrite the whole prompt. Change Slot 4 only.

Version 2 (only Slot 4 changed)

[4] CAMERA
- Start on a wide shot, then a gentle tracking shot left as she lifts the glass, keeping the glass centered in frame.

Why this is a better test:

  • You kept identity/action/setting constant.
  • You made the camera instruction more explicit (start framing + one move).
  • “Keeping the glass centered” gives the move a purpose.

Mini table: symptom → change → what you learn

| Symptom | Change only | What it tells you |
|---|---|---|
| Camera ignored | Slot 4 | Whether camera language is being followed without disturbing content |
| Subject drifts | Slot 1 | Whether identifiers are strong enough |
| Action feels static | Slot 2 | Whether motion is described as visible change |
| Vibe wrong | Slot 5 | Whether look constraints are sufficient |

Troubleshooting map: symptom → likely slot → exact rewrite

Subject changes (face/wardrobe/object shape)

  • Likely slot: Slot 1
  • Rewrite: “Same [type], [distinct marker], wearing [wardrobe/material], [one unique detail].”

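Action barely happens

  • Likely slot: Slot 2
  • Rewrite: “[Subject] [visible verb] from [start state] to [end state]; [secondary motion: steam/fabric/liquid/hair].”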

Setting becomes generic or shifts

  • Likely slot: Slot 3
  • Rewrite: “Location + anchors: [surface], [background element], [time-of-day/light source].”

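Camera movement doesn’t show up

  • Likely slot: Slot 4
  • Rewrite: “[Framing]; [one move: push-in/pan/tilt/tracking] to emphasize [what].”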

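Too dark/flat/wrong palette

  • Likely slot: Slot 5
  • Rewrite: “[Light source/time], [soft, low contrast or hard, high contrast], [palette note], [one mood word].”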

Style takes over and changes the scene

  • Likely slot: Slot 6
  • Rewrite: delete competing style tags; keep one guardrail line.
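The rewrites above are fill-in templates. If you keep them in a script, a small substitution helper leaves any bracketed slot you haven’t filled visible, so gaps are obvious. A Python sketch, assuming templates are keyed by symptom; `TEMPLATES`, `fill`, and `rewrite_for` are hypothetical names, and only the Slot 1 and Slot 3 templates are copied from this map.

```python
import re

# Rewrite templates copied from the troubleshooting map; the symptom keys and
# the helpers below are hypothetical conveniences, not part of any tool.
TEMPLATES = {
    "subject changes": (
        1, "Same [type], [distinct marker], wearing [wardrobe/material], [one unique detail]."),
    "setting shifts": (
        3, "Location + anchors: [surface], [background element], [time-of-day/light source]."),
}
PLACEHOLDER = re.compile(r"\[([^\]]+)\]")


def fill(template: str, values: dict[str, str]) -> str:
    """Replace each [placeholder] with its value; unknown placeholders stay as-is."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)


def rewrite_for(symptom: str, values: dict[str, str]) -> tuple[int, str]:
    """Return (slot number, filled rewrite line) for a known symptom."""
    slot, template = TEMPLATES[symptom]
    return slot, fill(template, values)


if __name__ == "__main__":
    slot, line = rewrite_for("subject changes", {
        "type": "barista",
        "distinct marker": "short black bob haircut",
        "wardrobe/material": "a navy apron",
        "one unique detail": "small silver hoop earrings",
    })
    print(slot, line)
```

Returning the slot number alongside the line keeps the "change only this slot" rule attached to the fix.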

Checklist

  • Slot 1: identity only, 2–4 identifiers (no action/mood).
  • Slot 2: visible verbs, start → end change.
  • Slot 3: location + 2–3 anchors, not a prop list.
  • Slot 4: one framing + one move + purpose (no stacking).
  • Slot 5: light source/time + contrast + one mood word.
  • Slot 6: style as guardrail, not a takeover.
  • Iterate by changing one slot per regeneration.
  • If you need sound, add a short Audio line (ambience + 1 synced SFX).
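A few checklist items are mechanical enough to lint. The Python sketch below assumes you separate identifiers and anchors with commas (a convention of this sketch, not of Kling or Veo3Gen) and treats the first comma-separated part as the who/what or the location. `lint_card` is hypothetical, and the camera-term match is a keyword heuristic (it would also flag a “frying pan” prop).

```python
import re

# Hypothetical checklist lint; comma-separated identifiers/anchors are an
# assumption of this sketch. Camera terms mirror the Slot 4 vocabulary.
CAMERA_TERMS = re.compile(r"\b(close-up|wide shot|medium shot|push-in|pan|tilt|tracking)\b")


def _parts(text: str) -> list[str]:
    """Comma-separated parts of a slot, ignoring a trailing period."""
    return [p.strip() for p in text.strip().rstrip(".").split(",") if p.strip()]


def lint_card(card: dict[str, str]) -> list[str]:
    problems = []
    identifiers = len(_parts(card["subject"])) - 1  # first part = who/what it is
    if not 2 <= identifiers <= 4:
        problems.append(f"subject: {identifiers} identifiers (want 2-4)")
    anchors = len(_parts(card["setting"])) - 1  # first part = location
    if not 2 <= anchors <= 3:
        problems.append(f"setting: {anchors} anchors (want 2-3)")
    # Camera language should live only in Slot 4.
    for slot, text in card.items():
        if slot != "camera" and CAMERA_TERMS.search(text.lower().replace("\u2011", "-")):
            problems.append(f"{slot}: camera language belongs in Slot 4")
    return problems
```

Run against the Version 1 card from the worked example below, this returns no problems; a subject like “Confident founder in close-up.” is flagged twice, once for missing identifiers and once for camera language outside Slot 4.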

FAQ

How do I write a text-to-video prompt structure that stays consistent across clips?

Use a fixed template (the 6 slots). Keep Slots 1–3 stable (identity, action, setting), then test Slots 4–6 (camera/look/style) one at a time.

What’s Kling’s official prompt structure, in plain terms?

Kling’s quickstart formula is Subject (description) + Subject Movement + Scene (description), with Camera language + Lighting + Atmosphere as optional additions. (https://kling.ai/quickstart/text-to-video-prompt-guide)

How do I stop subject drift in AI video prompts?

Strengthen Slot 1 with concrete identifiers (hair/wardrobe/material + a unique marker). Then rerun changing only Slot 1 so you can see if the fix worked.

How do I get camera movement to actually happen?

Use camera terms explicitly (close-up, wide shot, slow push‑in, pan, tilt, tracking) and give one move with a purpose—these are the kinds of camera directions Kling lists. (https://kling.ai/blog/kling-ai-prompt-guide)

How do I make the action look dynamic instead of static?

Rewrite Slot 2 using motion cues you can see (drifting, bending, swirling, stepping), not abstract intent—Kling specifically recommends visible motion cues. (https://kling.ai/blog/kling-ai-prompt-guide)

How do I add audio directions to my prompt?

Add an Audio line with ambience + one synced sound (“umbrella snap as it opens”). Sora 2, for example, generates audio natively (https://wavespeed.ai/blog/posts/sora-2-prompting-tips-better-videos-2026/), and Veo3Gen generations include native synchronized audio in one pass.

Create faster iterations in Veo3Gen

Once your prompts are structured, the bottleneck becomes iteration: generating variants, keeping what works, and only changing what you meant to change.

Veo3Gen is designed for that loop: it’s an affordable way to access Google’s Veo 3.1 video models without Google’s enterprise pricing, offers three modes (Lite/Fast/Quality), supports 720p/1080p/4K (4K on Fast/Quality) in 16:9 or 9:16, and includes native synchronized audio in a single pass.

If you want to put the 6‑Slot Prompt Card into production, start in Veo3Gen with the free credits for initial tests, then scale with pay‑as‑you‑go credits (which don’t expire) or an optional monthly plan when you’re ready.

