Image Prompt Rules

These are the rules every image prompt must follow. Apply them regardless of workflow type, variant level, or scene composition. Most of them exist because we caught a failure mode in production and locked it into the standards.

The framing rule (read this first)

Every image prompt opens with a framing declaration. That declaration is a hard boundary for everything else in the prompt.

Example openers:

"Eye-level frontal selfie, chest to shoulder up"
"First person POV from the subject's own eye view"
"Wide shot, full body, three-quarter turn"

Once you've declared the framing, everything that follows must be inside that frame. If a wardrobe item, prop, accessory, or environment element won't be visible at the declared crop — omit it entirely.

The in-frame rule

Only describe what the camera will see. Walk every element — wardrobe, props, accessories, background — and ask:

Is this visible in the final frame?

If no: omit it entirely. Do not list it, do not negate it, do not describe it.

What this rules out

Pants, belts, shoes on a chest-up selfie
Background details behind the subject when the frame is a tight close-up
Accessories on body parts that never enter frame
Anything the character reference sheet shows that won't be in the final shot
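The "is this visible?" walk can be made mechanical if every wardrobe item, prop, or background element is tagged with where it sits. The sketch below is a minimal illustration under that assumption: `VISIBLE_REGIONS`, `Element`, and `in_frame` are hypothetical names, and the crop-to-region map is illustrative rather than a standard. Elements outside the crop are simply dropped, never turned into negations.

```python
from dataclasses import dataclass

# Hypothetical crop -> visible region map; region names are illustrative only.
VISIBLE_REGIONS = {
    "chest up": {"head", "face", "neck", "shoulders", "chest"},
    "waist up": {"head", "face", "neck", "shoulders", "chest", "waist", "hands"},
    "full body": {"head", "face", "neck", "shoulders", "chest", "waist",
                  "hands", "legs", "feet"},
}


@dataclass(frozen=True)
class Element:
    text: str    # phrase that would go into the prompt
    region: str  # body region the element sits on, or "background"


def in_frame(elements, crop, background_visible=True):
    """Keep only elements visible at the declared crop; the rest are dropped, not negated."""
    visible = set(VISIBLE_REGIONS[crop])
    if background_visible:
        visible.add("background")
    return [e for e in elements if e.region in visible]


outfit = [
    Element("navy crew neck sweater", "chest"),
    Element("silver stud earrings", "face"),
    Element("grey slacks", "legs"),
    Element("leather belt", "waist"),
    Element("bookshelf", "background"),
]
# Tight close-up: no background, nothing below the chest.
print([e.text for e in in_frame(outfit, "chest up", background_visible=False)])
```

The `background_visible` flag covers the "tight close-up" case from the list above, where even background details fall outside the frame.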

Why

NanoBanana and Veo try to include whatever the prompt describes. If you mention pants on a chest-up crop, the model does one of three things:

Widens the crop to include them (violating the framing declaration)
Invents them as hallucinated detail
Produces a contradiction ("chest up" vs "slacks visible at bottom edge" — the model picks one to break)

Out-of-frame mentions also cause --no exclusion list bloat. If you mention pants and then negate them, the prompt now contains "pants" twice — once as positive, once as negative. The model spends attention budget on something it shouldn't render at all.
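A quick way to catch the "mentioned twice" problem is to compare the positive prompt against its exclusion list. This is a sketch, assuming the prompt is one string with a comma-separated exclusion list after a literal `--no` marker, as in the examples in this document; `split_prompt` and `doubled_terms` are hypothetical helpers, and the sketch does not check how any particular generator actually parses the string.

```python
import re


def split_prompt(prompt):
    """Split 'positive text --no a, b, c' into (positive, [negations])."""
    positive, _, negative = prompt.partition("--no")
    negations = [n.strip() for n in negative.split(",") if n.strip()]
    return positive.strip(), negations


def doubled_terms(prompt):
    """Return negated items whose words all appear in the positive prompt too."""
    positive, negations = split_prompt(prompt)
    pos_words = set(re.findall(r"[a-z]+", positive.lower()))
    hits = []
    for item in negations:
        words = re.findall(r"[a-z]+", item.lower())
        if words and words[0] == "no":
            words = words[1:]  # compare "no pants" as "pants"
        if words and all(w in pos_words for w in words):
            hits.append(item)
    return hits


print(doubled_terms(
    "Eye level frontal selfie, chest to shoulder up, navy sweater, grey pants "
    "--no pants, full body shot"
))
```

Any hit means an out-of-frame item was both described and negated; the fix is to delete it from both sides, not to strengthen the negation.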

How to apply

Start every prompt with the framing declaration
Positive prompt: describe only in-frame elements
--no list: only carry protective negations against crop extension ("no full body shot, no waist down visible"), not item-level negations against things you never mentioned positively
For modification prompts (end-frame / bookend style): the base image already establishes what's in frame; only describe the modification, and only if it's visible
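The first three steps can be sketched as a small prompt assembler. It assumes the in-frame filtering has already happened and that crop-protective negations are kept in a per-framing table; `CROP_GUARDS` and `build_prompt` are hypothetical names, and the guard wording is adapted from the examples above rather than taken from any generator's documentation.

```python
# Hypothetical per-crop guards: they protect the crop and never name items.
CROP_GUARDS = {
    "chest up": ["full body shot", "waist down visible"],
    "waist up": ["full body shot", "legs visible"],
}


def build_prompt(framing, crop, in_frame_elements):
    """Framing declaration first, then in-frame elements, then crop guards only."""
    prompt = ", ".join([framing, *in_frame_elements])
    guards = CROP_GUARDS.get(crop, [])
    if guards:
        prompt += " --no " + ", ".join(guards)
    return prompt


print(build_prompt(
    "Eye level frontal selfie, chest to shoulder up",
    "chest up",
    ["navy crew neck sweater", "soft window light"],
))
```

Because the `--no` list is drawn only from `CROP_GUARDS`, an item-level negation like "no pants" has no way to get into the prompt.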

The POV rule

In a first-person POV shot the camera IS the subject's eye. You cannot describe the subject's:

Face
Body orientation
Shoulders
"Three-quarter turn toward X"

…from inside their own view. Those sentences only make sense from an external camera. Writing them in a POV prompt silently tells the generator to flip to third-person and render a visible subject body.

POV prompts should describe

What to include	What to omit
Vantage language ("First person POV from the subject's own eye view")	Subject's face, body angle, head position
What extended body parts reach into frame (usually one hand and forearm holding the product)	Subject's shoulders, torso, hair
What background / environment is visible	Anything that requires looking at the subject
Explicit "no body / no face / no torso / no shoulder / no head" line	—
`--no` list with `third person view, external camera, camera facing the subject, selfie of the subject`	—
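The include column of the table maps onto a fixed template. The sketch below assumes a single-string prompt with a trailing `--no` list; the constant and function names are hypothetical, while the vantage sentence, the no-body line, and the exclusion terms are the ones listed in the table.

```python
POV_VANTAGE = "First person POV from the subject's own eye view"
POV_NO_BODY = "no body, no face, no torso, no shoulder, no head"
POV_EXCLUDE = ["third person view", "external camera",
               "camera facing the subject", "selfie of the subject"]


def build_pov_prompt(reaching_in, environment):
    """Vantage first, then what reaches into frame, then the visible environment.

    Nothing here describes the subject's face, shoulders, or orientation.
    """
    positive = ", ".join([POV_VANTAGE, reaching_in, environment, POV_NO_BODY])
    return positive + " --no " + ", ".join(POV_EXCLUDE)


print(build_pov_prompt(
    "one hand and forearm reaching into frame holding the product",
    "kitchen counter and a window in soft daylight",
))
```

The caller only supplies the two things a POV camera can see: the body part reaching in and the environment. There is no slot for the subject's own body.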

The smell test

If the prompt describes the subject's body orientation at all, it is NOT a POV prompt. It's third-person framed as POV — which is the failure mode.
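The smell test can be approximated with a phrase check on the positive half of a POV prompt. This is a rough sketch: `ORIENTATION_PATTERNS` is a hypothetical, deliberately short list that would need extending from real failures, and clauses that start with "no " are skipped so the recommended "no body / no shoulder" line is not flagged.

```python
import re

# Hypothetical orientation phrases; extend as new failure cases show up.
ORIENTATION_PATTERNS = [
    r"\bthree[ -]quarter turn\b",
    r"\bturn(?:ed|ing)? toward\b",
    r"\bshoulders?\b",
    r"\btorso\b",
    r"\bbody (?:angled|facing|turned)\b",
    r"\bhead (?:tilted|turned)\b",
    r"\b(?:her|his|their) face\b",
]


def fails_pov_smell_test(prompt):
    """Return positive-prompt clauses that describe the subject's own body."""
    positive = prompt.partition("--no")[0]
    flagged = []
    for clause in positive.split(","):
        clause = clause.strip()
        if not clause or clause.lower().startswith("no "):
            continue  # explicit negations like "no shoulder" are expected in POV
        if any(re.search(p, clause, re.IGNORECASE) for p in ORIENTATION_PATTERNS):
            flagged.append(clause)
    return flagged


print(fails_pov_smell_test(
    "First person POV from the subject's own eye view, "
    "three quarter turn toward the window, one hand holding the product"
))
```

An empty result does not prove the prompt is a true POV prompt; a non-empty one means it is third person framed as POV.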

Avatar description when a character reference image is connected

The character reference sheet locks identity: the face, and with it gender, age, ethnicity, and hair. It does NOT lock build or wardrobe; those still have to come from the prompt.

Do

Specify what the character is wearing
Specify their build (slim / muscular / athletic / heavier) only if relevant to wardrobe fit

Do NOT

Repeat gender, age, face details, hair, or ethnicity in the prompt when a ref image is wired in. The ref image already carries that information, and re-stating it consumes attention budget that should go to the scene.

Keep avatar descriptions as general as possible. The ref image does the heavy lifting on identity.
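When avatar attributes live in structured data, the Do / Do NOT split can be applied as a filter. A minimal sketch, assuming attributes are stored as a dict keyed by hypothetical names such as `"gender"`, `"build"`, and `"wardrobe"`; `REF_CARRIED` and `avatar_description` are illustrative, not part of any existing tool.

```python
# Attributes the reference image carries; restating them wastes attention budget.
REF_CARRIED = {"gender", "age", "face", "hair", "ethnicity"}


def avatar_description(attrs, ref_wired, build_matters=False):
    """Build the avatar part of a prompt, trimmed when a character ref is connected."""
    kept = []
    for key, value in attrs.items():
        if ref_wired and key in REF_CARRIED:
            continue  # identity comes from the reference image
        if ref_wired and key == "build" and not build_matters:
            continue  # build only when it changes how the wardrobe fits
        kept.append(value)
    return ", ".join(kept)


avatar = {
    "gender": "woman",
    "age": "30s",
    "hair": "long auburn hair",
    "build": "athletic build",
    "wardrobe": "fitted black running jacket",
}
print(avatar_description(avatar, ref_wired=True))
print(avatar_description(avatar, ref_wired=True, build_matters=True))
```

Without a reference image (`ref_wired=False`) every attribute passes through, since nothing else is carrying identity.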

Background motion rule (applies to image gens that become Veo start frames)

Avoid distant-background elements that move autonomously. Image gens become Veo3 start/end frames, and Veo's interpolation breaks on chaotic distant motion — you get extra fingers / glitching / morphing in the deep background.

Don't put in distant background

Animals (cows, cattle, horses, livestock, pets, wildlife) — they shift unpredictably between frames
Flocks of birds, swarms of insects
Crowds of people in tight motion
Moving water with strong currents (rapids, breaking waves)

OK in distant background

Humans (one or two, walking or standing — finite, predictable motion)
Vehicles (cars on a road, parked or moving slowly)
Static elements (trees, fences, buildings, signs)
Calm water, sky, fields of grass without animals

Why

Veo's frames-to-video mode interpolates start → end frame. Distant animals are small enough that any drift in their position between frames produces glitchy artifacts. Humans and vehicles are predictable enough that the model handles them. Static elements have no motion to track.

How to apply

For every image prompt that Veo3 will animate — strip livestock and wildlife from the background and replace with static or human/vehicle alternatives.

Audio is different. If the script's dialogue mentions livestock ("we raise our own livestock"), keep it in audio. Visual background should be a fence, pasture, or static farm structure — no animals.
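The strip-and-replace step can be sketched as a filter over the visual background list only, leaving dialogue untouched. The pattern list is a hypothetical keyword version of the "don't put in distant background" items, and the fallback phrase is one of the static alternatives suggested above; neither is an exhaustive or official list.

```python
import re

# Hypothetical patterns based on the "don't put in distant background" list.
UNSAFE_PATTERNS = [
    r"\bcows?\b", r"\bcattle\b", r"\bhorses?\b", r"\blivestock\b",
    r"\bpets?\b", r"\bwildlife\b", r"\bflocks? of birds\b",
    r"\bswarms? of insects\b", r"\bcrowds?\b", r"\brapids\b",
    r"\bbreaking waves\b",
]


def strip_unsafe_background(background, fallback="wooden fence and empty pasture"):
    """Drop items with autonomous motion; add one static replacement if any were dropped."""
    kept = [item for item in background
            if not any(re.search(p, item, re.IGNORECASE) for p in UNSAFE_PATTERNS)]
    if len(kept) < len(background):
        kept.append(fallback)
    return kept


scene = {
    "background": ["red barn", "cattle grazing on the hill", "parked pickup truck"],
    "audio": "we raise our own livestock",  # dialogue is left alone on purpose
}
scene["background"] = strip_unsafe_background(scene["background"])
print(scene)
```

Only the visual list passes through the filter, which is the split the audio note above describes: livestock can stay in the script while the frame shows a fence and pasture.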

Text in generated images is low priority

Don't spend cycles fixing hallucinated text in generated images — store signage, price tags, fake labels, etc. The model will routinely produce garbled text. Ignore it.

What actually matters, in priority order:

1. Identity: is the face right?
2. Composition: is the camera angle / framing correct?
3. Wardrobe: is the clothing accurate?
4. Pose: is the body positioning right?
5. Setting: is the environment correct?

If those five are working, ship it. Don't iterate on hallucinated background text.
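The review order can be expressed as a short-circuit over the five criteria, with hallucinated text deliberately absent from the list. A sketch, assuming a reviewer records pass/fail per criterion in a dict; `PRIORITY` and `review` are hypothetical names.

```python
# The five criteria in priority order; background text is intentionally not here.
PRIORITY = ["identity", "composition", "wardrobe", "pose", "setting"]


def review(checks):
    """Return the highest-priority failing criterion, or 'ship' if all five pass.

    A criterion missing from `checks` counts as not yet passing. Extra keys,
    such as a note about garbled signage, are ignored on purpose.
    """
    for criterion in PRIORITY:
        if not checks.get(criterion, False):
            return criterion
    return "ship"


print(review({"identity": True, "composition": True, "wardrobe": True,
              "pose": True, "setting": True, "background_text": False}))
```

Returning only the first failure keeps iteration focused: there is no point fixing wardrobe while the face is still wrong.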

Quick checklist before submitting an image prompt

Framing declaration is the first thing in the prompt
Every described element is inside the declared frame
No mention of items off-screen (no negating things you didn't mention)
If POV: no body / face / torso descriptions; vantage and arm-in-frame only
If character ref is wired: no gender / age / face / hair / ethnicity in the prompt
No distant-background animals or chaotic motion
Hyphens removed (use spaced or unhyphenated alternatives)
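A few checklist items are mechanical enough to lint. The sketch below covers three of them: framing first, no item-level negations, and no hyphens. It assumes a single-string prompt with a trailing comma-separated `--no` list, reads "hyphens removed" as "no word-internal hyphens in the prompt text", and uses hypothetical `FRAMING_OPENERS` and `ALLOWED_NEGATIONS` vocabularies that would need to match your own workflow.

```python
import re

# Hypothetical vocabularies; the real lists depend on your workflow.
FRAMING_OPENERS = ("eye level", "first person pov", "wide shot", "close up")
ALLOWED_NEGATIONS = {"full body shot", "waist down visible", "third person view",
                     "external camera", "camera facing the subject",
                     "selfie of the subject"}


def lint(prompt):
    """Return checklist violations for a 'positive --no a, b' style prompt."""
    problems = []
    positive, _, negative = prompt.partition("--no")
    if not positive.strip().lower().startswith(FRAMING_OPENERS):
        problems.append("framing declaration is not first")
    if re.search(r"[A-Za-z]-[A-Za-z]", positive):
        problems.append("hyphenated word in prompt")
    for item in (n.strip().lower() for n in negative.split(",")):
        if item and item not in ALLOWED_NEGATIONS:
            problems.append(f"item level negation: {item}")
    return problems


print(lint("Navy sweater, eye-level selfie --no pants"))
```

The in-frame, POV, reference, and background items still need a human read; this only catches the ones that are purely textual.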