Image Prompt Rules

The rules every image prompt must follow. Apply these regardless of workflow type, variant level, or scene composition. Most of these exist because we caught a failure mode in production and locked it into the standards.

The framing rule (read this first)

Every image prompt opens with a framing declaration. That declaration is a hard boundary for everything else in the prompt.

Example openers:

  • "Eye-level frontal selfie, chest to shoulder up"
  • "First person POV from the subject's own eye view"
  • "Wide shot, full body, three-quarter turn"

Once you've declared the framing, everything that follows must be inside that frame. If a wardrobe item, prop, accessory, or environment element won't be visible at the declared crop, omit it entirely.

The in-frame rule

Only describe what the camera will see. Walk every element — wardrobe, props, accessories, background — and ask:

Is this visible in the final frame?

If no: omit it entirely. Do not list it, do not negate it, do not describe it.

What this rules out

  • Pants, belts, shoes on a chest-up selfie
  • Background details behind the subject when the frame is a tight close-up
  • Accessories on body parts that never enter frame
  • Anything the character reference sheet shows that won't be in the final shot

Why

NanoBanana and Veo try to include whatever the prompt describes. If you mention pants on a chest-up crop, the model either:

  1. Widens the crop to include them (violating the framing declaration)
  2. Invents them as hallucinated detail
  3. Produces a contradiction ("chest up" vs "slacks visible at bottom edge" — the model picks one to break)

Out-of-frame mentions also bloat the --no exclusion list. If you mention pants and then negate them, the prompt now contains "pants" twice: once as a positive and once as a negative. The model spends attention budget on something it shouldn't render at all.

How to apply

  • Start every prompt with the framing declaration
  • Positive prompt: describe only in-frame elements
  • --no list: only carry protective negations against crop extension ("no full body shot, no waist down visible"), not item-level negations against things you never mentioned positively
  • For modification prompts (end-frame / bookend style): the base image already establishes what's in frame; only describe the modification, and only if it's visible
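
The steps above can be sketched as a small prompt builder. This is a minimal illustration, not an existing tool: the `Element` class, the region labels, the `VISIBLE_REGIONS` mapping, and the `build_prompt` helper are all hypothetical, and the assumption that the --no list is appended as `--no item, item` should be checked against the actual generator syntax.

```python
from dataclasses import dataclass

# Hypothetical mapping from a framing declaration to the regions it shows.
# The region labels are invented for this sketch.
VISIBLE_REGIONS = {
    "Eye-level frontal selfie, chest to shoulder up": {
        "head", "upper_body", "background",
    },
    "Wide shot, full body, three-quarter turn": {
        "head", "upper_body", "lower_body", "feet", "background",
    },
}


@dataclass(frozen=True)
class Element:
    description: str  # text that would go into the positive prompt
    region: str       # where the element sits relative to the subject


def build_prompt(framing, elements, crop_guards=()):
    """Framing first, then only in-frame elements, then crop guards."""
    visible = VISIBLE_REGIONS[framing]
    # Out-of-frame elements are dropped silently: not listed, not negated.
    kept = [e.description for e in elements if e.region in visible]
    prompt = ", ".join([framing, *kept])
    if crop_guards:
        # Only protective negations against crop extension go here.
        prompt += " --no " + ", ".join(crop_guards)
    return prompt


elements = [
    Element("navy crew neck sweater", "upper_body"),
    Element("grey slacks", "lower_body"),  # not visible on a chest-up crop
    Element("sunlit kitchen", "background"),
]
print(build_prompt(
    "Eye-level frontal selfie, chest to shoulder up",
    elements,
    crop_guards=("full body shot", "waist down visible"),
))
```

The slacks never reach the prompt, so there is nothing to negate later; the --no list carries only the two crop guards.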

The POV rule

In a first-person POV shot the camera IS the subject's eye. You cannot describe the subject's:

  • Face
  • Body orientation
  • Shoulders
  • "Three-quarter turn toward X"

…from inside their own view. Those sentences only make sense from an external camera. Writing them in a POV prompt silently tells the generator to flip to third-person and render a visible subject body.

POV prompts should describe

What to include

  • Vantage language ("First person POV from the subject's own eye view")
  • What extended body parts reach into frame (usually one hand and forearm holding the product)
  • What background / environment is visible
  • Explicit "no body / no face / no torso / no shoulder / no head" line
  • --no list with third person view, external camera, camera facing the subject, selfie of the subject

What to omit

  • Subject's face, body angle, head position
  • Subject's shoulders, torso, hair
  • Anything that requires looking at the subject

The smell test

If the prompt describes the subject's body orientation at all, it is NOT a POV prompt. It's third-person framed as POV — which is the failure mode.
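
The smell test can be approximated with a keyword check. The sketch below is a rough heuristic, not a complete detector: the term list is taken from the body parts named in this section, the `pov_violations` name is hypothetical, and it assumes the explicit negation line is written as "no face, no torso, ..." with each term directly preceded by "no".

```python
import re

# Terms that describe the subject from an external camera. A term directly
# preceded by "no " is skipped so the explicit negation line does not trip it.
_BODY_TERMS = re.compile(
    r"(?<!no )\b(?:face|shoulders?|torso|hair|head|three[- ]quarter turn)\b",
    re.IGNORECASE,
)


def pov_violations(prompt):
    """Return subject-describing terms found in the positive part of a POV prompt."""
    # Everything after "--no" is the exclusion list and is not checked.
    positive = prompt.split("--no", 1)[0]
    return [m.group(0).lower() for m in _BODY_TERMS.finditer(positive)]


bad = "First person POV, three-quarter turn toward the window, hair falling over the shoulders"
print(pov_violations(bad))
```

An empty result does not prove the prompt is a clean POV prompt; it only means none of the listed terms appeared. Phrasing such as "no face or head" would still be flagged for "head".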

Avatar description when a character reference image is connected

The character reference sheet only locks the face. Body, gender, age, build, ethnicity, and hair are NOT locked by the reference.

Do

  • Specify what the character is wearing
  • Specify their build (slim / muscular / athletic / heavier) only if relevant to wardrobe fit

Do NOT

  • Repeat gender, age, face details, hair, or ethnicity in the prompt when a ref image is wired in. The ref image already carries that information, and re-stating it consumes attention budget that should go to the scene.

Keep avatar descriptions as general as possible. The ref image does the heavy lifting on identity.
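
One way to enforce this is to keep identity details in a separate field and drop that field whenever a reference image is connected. The sketch below assumes a hypothetical `Avatar` structure and `describe_avatar` helper; neither is part of any existing workflow tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Avatar:
    wardrobe: str                   # always described
    build: Optional[str] = None     # set only if relevant to wardrobe fit
    identity: Optional[str] = None  # gender, age, face, hair, ethnicity


def describe_avatar(avatar, has_reference):
    """Render the avatar text, omitting identity when a ref image is wired in."""
    parts = []
    if not has_reference and avatar.identity:
        parts.append(avatar.identity)
    if avatar.build:
        parts.append(f"{avatar.build} build")
    parts.append(f"wearing {avatar.wardrobe}")
    return ", ".join(parts)


avatar = Avatar(
    wardrobe="a denim jacket over a white tee",
    build="athletic",
    identity="woman in her thirties with short dark hair",
)
print(describe_avatar(avatar, has_reference=True))
```

With `has_reference=True` the output contains only build and wardrobe, which keeps the description general and leaves identity to the reference image.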

Background motion rule (applies to image gens that become Veo start frames)

Avoid distant-background elements that move autonomously. Image gens become Veo3 start/end frames, and Veo's interpolation breaks on chaotic distant motion — you get extra fingers / glitching / morphing in the deep background.

Don't put in distant background

  • Animals (cows, cattle, horses, livestock, pets, wildlife) — they shift unpredictably between frames
  • Flocks of birds, swarms of insects
  • Crowds of people in tight motion
  • Moving water with strong currents (rapids, breaking waves)

OK in distant background

  • Humans (one or two, walking or standing — finite, predictable motion)
  • Vehicles (cars on a road, parked or moving slowly)
  • Static elements (trees, fences, buildings, signs)
  • Calm water, sky, fields of grass without animals

Why

Veo's frames-to-video mode interpolates start → end frame. Distant animals are small enough that any drift in their position between frames produces glitchy artifacts. Humans and vehicles are predictable enough that the model handles them. Static elements have no motion to track.

How to apply

For every image prompt that Veo3 will animate, strip livestock and wildlife from the background and replace them with static or human/vehicle alternatives.

Audio is different. If the script's dialogue mentions livestock ("we raise our own livestock"), keep it in audio. Visual background should be a fence, pasture, or static farm structure — no animals.
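
A background screen along these lines could be sketched as follows. The term list mirrors the "don't" list in this section; the `split_background` name is hypothetical, and keyword matching is only a first pass that will miss unlisted animals or unusual phrasing. It applies to visual background elements only, never to dialogue.

```python
import re

# Terms drawn from the "don't put in distant background" list.
_UNSTABLE = re.compile(
    r"\b(?:cows?|cattle|horses?|livestock|pets?|wildlife|flocks?|swarms?|"
    r"crowds?|rapids|breaking waves)\b",
    re.IGNORECASE,
)


def split_background(elements):
    """Split background descriptions into (kept, flagged) lists."""
    kept, flagged = [], []
    for element in elements:
        # Whole-word matching, so "carpets" does not match "pets".
        (flagged if _UNSTABLE.search(element) else kept).append(element)
    return kept, flagged


kept, flagged = split_background([
    "wooden fence",
    "cattle grazing on a far hill",
    "two people walking",
])
print(kept, flagged)
```

Flagged elements are candidates for replacement with a static or human/vehicle alternative rather than for negation in the --no list.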

Text in generated images is low priority

Don't spend cycles fixing hallucinated text in generated images (store signage, price tags, fake labels, etc.). The model will routinely produce garbled text. Ignore it.

What actually matters in priority order:

  1. Identity — is the face right
  2. Composition — is the camera angle / framing correct
  3. Wardrobe — is the clothing accurate
  4. Pose — is the body positioning right
  5. Setting — is the environment correct

If those five are working, ship it. Don't iterate on hallucinated background text.

Quick checklist before submitting an image prompt

  • Framing declaration is the first thing in the prompt
  • Every described element is inside the declared frame
  • No mention of items off-screen (no negating things you didn't mention)
  • If POV: no body / face / torso descriptions; vantage and arm-in-frame only
  • If character ref is wired: no gender / age / face / hair / ethnicity in the prompt
  • No distant-background animals or chaotic motion
  • Hyphens removed (use spaced or unhyphenated alternatives)
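
Two of the checklist items are mechanical enough to sketch in code: confirming the prompt opens with a framing declaration, and removing hyphens. The helper names and the list of known framings below are hypothetical, and replacing each in-word hyphen with a space is only one reading of "spaced or unhyphenated alternatives".

```python
import re


def starts_with_framing(prompt, known_framings):
    """True if the prompt opens with one of the known framing declarations."""
    return prompt.startswith(tuple(known_framings))


def dehyphenate(prompt):
    """Replace hyphens between word characters with a space.

    The leading dashes of "--no" are left alone because they are not
    surrounded by word characters.
    """
    return re.sub(r"(?<=\w)-(?=\w)", " ", prompt)


prompt = "Wide shot, full body, three-quarter turn --no close-up"
print(starts_with_framing(prompt, ["Wide shot", "First person POV"]))
print(dehyphenate(prompt))
```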