
Tool & Model Reference

Reference card for every AI model in the pipeline — what it does, what it accepts, how to prompt it, where it fails. Read once when onboarding, then return as a lookup.

Image generation

NanoBanana 2

The default image generation model. Used for every image gen in every workflow.

| Field | Value |
| --- | --- |
| Model identifier | nano_banana_2 |
| Access | G-Labs API (POST /api/image/generate) |
| PatchWork node | nanobanana/NanobananaAPI with properties.model = "nano_banana_2" |
| Inputs | Prompt text (string), up to 2 reference images (URLs) |
| Output count | 4 candidate images per run (outputCount: 4) |
| Aspect ratio | 9:16 vertical default for short-form |
| Output format | PNG, served from R2 via public URLs |

Always set the model field. If model is missing or unset, G-Labs defaults to NanoBanana 1, which is structurally different and produces worse results. The pre-generation sanity check catches this.
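That check amounts to scanning the workflow JSON for NanobananaAPI nodes whose model field is missing or wrong. A minimal sketch, not the actual sanity-check code — the helper name and workflow structure are assumptions, the node type and model value come from the table above:

```python
def check_image_model(workflow: dict) -> list[str]:
    """Flag image gen nodes that would silently fall back to NanoBanana 1."""
    problems = []
    for node in workflow.get("nodes", []):
        if node.get("type") == "nanobanana/NanobananaAPI":
            model = node.get("properties", {}).get("model")
            if model != "nano_banana_2":
                problems.append(f"node {node.get('id')}: model is {model!r}, "
                                f"expected 'nano_banana_2'")
    return problems
```

An empty result means every image node is pinned to NanoBanana 2; anything else should block the run.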

Prompting NanoBanana 2

Best practices:

  • Open with a framing declaration (e.g. "Eye level frontal selfie, chest to shoulder up") — the declaration is a hard boundary that everything after it must respect
  • Reference images do the heavy lifting on identity. When a character ref is wired in, don't repeat gender / age / face / hair / ethnicity in the prompt — those tokens consume attention budget the model should spend on the scene
  • Describe only what's in frame. See Image Prompt Rules
  • Keep templates minimal. When wiring through a template, the template should be {scene} alone, not anchored with camera/lighting/aesthetic boilerplate — anchors dilute reference adherence
  • No hyphens. Use spaced or unhyphenated alternatives in prompts
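Two of these rules are mechanical enough to lint before submitting a prompt. A rough sketch — the identity token list is illustrative, not the project's actual vocabulary:

```python
IDENTITY_TOKENS = {"woman", "man", "girl", "boy", "blonde", "brunette", "bearded"}  # illustrative

def lint_image_prompt(prompt: str, has_character_ref: bool) -> list[str]:
    """Catch hyphens and redundant identity tokens before a NanoBanana 2 run."""
    issues = []
    if "-" in prompt:
        issues.append("hyphen found; use spaced or unhyphenated alternatives")
    if has_character_ref:
        words = set(prompt.lower().split())
        hits = sorted(IDENTITY_TOKENS & words)
        if hits:
            issues.append("identity tokens duplicated alongside a character ref: " + ", ".join(hits))
    return issues
```

Anything it flags is attention budget leaking away from the scene description.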

Known failure modes

| Failure | Cause | Fix |
| --- | --- | --- |
| Generic stock-style output | model field unset, fell back to NB1 | Set model: "nano_banana_2" |
| Output unrelated to prompt | Dynamic variableName doesn't match template placeholder | Set variableName to a short lowercase identifier matching the placeholder |
| Avatar's face wrong | Reference image link broken (stale per-node refs or Cached Media bug) | Run resync_link_refs(tab) and verify in the PatchWork UI |
| Extra fingers / melted face / garbled hands | Normal AI tells | Generation Runner auto-reruns up to 3 attempts; bump to 10 for stubborn cases |
| Hallucinated text (signage, labels) | NanoBanana routinely produces garbled text | Ignore — low priority |
| Cropping wrong | Framing declaration missing or contradicted by out-of-frame mentions | Follow the in-frame rule |

When to consider alternatives

You generally won't. NanoBanana 2 is the default and produces the best results across our use cases. The only reasons to switch:

  • An explicit comparison run between models (rare — usually run by the model researcher, not in production)
  • A specific composition that NB2 keeps failing on after 10 regen attempts — at that point, consider whether the prompt is the problem before swapping models

Video generation

Veo 3.1 (fast preview)

The default video model. Used for speaking scenes and most B-roll.

| Field | Value |
| --- | --- |
| Model identifier | veo-3.1-fast-generate-preview |
| Access | G-Labs API |
| PatchWork node | nanobanana/Veo3 with properties.model set to the identifier |
| Inputs | Prompt text, start frame image, optional end frame image |
| Output count | 4 candidate clips per run (outputCount: 4) |
| Clip length | ~8 seconds per clip |
| Frame mode | Frames-to-video — interpolates between start and end frames |
| Resolution | 1080p or 720p (set via properties.resolution) |
| Output format | MP4, served from R2 via public URLs |

Veo3 node requires 3 input slots

Veo3 nodes need all 3 input slots present even when end-frame isn't wired:

"inputs": [
    {"name": "prompt", "type": "string", "link": null},
    {"name": "start frame", "type": "*", "link": null},
    {"name": "end frame", "type": "*", "link": null}
]

Slot names have spaces, not underscores. Missing the third slot causes r.trim is not a function errors on import.
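A quick structural check before import can catch both breakages. A sketch, assuming the node JSON shape shown above (negativePrompt living under properties is an assumption):

```python
REQUIRED_SLOTS = ("prompt", "start frame", "end frame")  # spaces, not underscores

def validate_veo3_node(node: dict) -> list[str]:
    """Flag the two Veo3 node shapes known to break import with r.trim errors."""
    issues = []
    names = [slot.get("name") for slot in node.get("inputs", [])]
    for required in REQUIRED_SLOTS:
        if required not in names:
            issues.append(f"missing input slot {required!r}")
    if not isinstance(node.get("properties", {}).get("negativePrompt"), str):
        issues.append("negativePrompt must be a string (empty '' is fine)")
    return issues
```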

Prompting Veo 3.1

Best practices:

  • Macro motion only — Veo automatically adds natural micro motion (blinking, breathing, micro expressions). Prompting for micro motion adds noise. See Video Prompt Rules
  • Never prompt location changes without an explicit end frame — frames-to-video interpolation will hallucinate the transition badly
  • Speaking scenes use the universal talking head template — don't override the delivery instructions; the template is tuned
  • B-roll is natural language, no dialogue, ambient audio only
  • Low-movement scenes use the same image as start AND end frame to prevent drift
  • negativePrompt must be a string (empty "" is fine, undefined crashes)

Known failure modes

| Failure | Cause | Fix |
| --- | --- | --- |
| Avatar's face drifts mid-clip | Start and end frames too different, OR scene needs more than 8 seconds | Use same image as start AND end, OR split into shorter motion |
| Clip looks glitchy with hallucinated transitions | Location change without end frame | Split into two scenes, or add explicit end frame |
| Distant background animals glitching | Animals in distant background of start frame | Regenerate start frame without animals — see Image Prompt Rules |
| POV clip shows third-person body | Prompt described subject's body orientation | Rewrite per POV rule |
| Voice / mouth sync off | Using a custom delivery override instead of the universal template | Switch to universal template |
| Import error r.trim is not a function | Veo3 missing negativePrompt (undefined) or missing 3rd input slot | Fill negativePrompt: "" and add end frame slot with link: null |

Seedance 2.0

Alternative video model. Used selectively when Seedance produces a better result for a specific style (rare). Accessed manually via the Jimeng web UI — no API integration yet.

| Field | Value |
| --- | --- |
| Access | Jimeng web UI (manual, no API) |
| Inputs | Prompt text, start frame image |
| Negative prompts | Not supported — positive constraints only |
| Output | Single clip per run |
| Format | MP4 download from the web UI |

Prompting Seedance 2.0

  • Positive constraints only. Restate what you want, don't negate what you don't want
  • Same macro-motion rule applies as with Veo
  • Ambient audio is set in the UI, not via prompt
  • Manual workflow — no .nbflow integration; the Seedance Prompter agent produces the prompt text, you paste it into Jimeng yourself
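Because Seedance has no negative prompts, a quick scan for negation phrasing before pasting into Jimeng can save a wasted run. A crude sketch with an illustrative marker list:

```python
NEGATION_MARKERS = ("no ", " not ", "don't", "without", "avoid", "never")  # illustrative

def flag_negations(prompt: str) -> list[str]:
    """Return negation markers found in a Seedance prompt (restate them positively)."""
    lower = prompt.lower()
    return [marker.strip() for marker in NEGATION_MARKERS if marker in lower]
```

For example, "slow pan, no camera shake" gets flagged and reads better restated as "slow pan, locked off stable camera".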

When to use Seedance over Veo

In practice, almost never. Use cases:

  • A specific stylization that Veo can't match (rare)
  • A comparison run for a particularly tricky scene
  • When the G-Labs / Veo path is down and you need a stopgap

Default to Veo 3.1 unless there's a concrete reason to switch.

The G-Labs backend

Not a model — but it's the layer that proxies every model call. Worth knowing the interface.

| Field | Value |
| --- | --- |
| Local port | 8765 |
| Public access | via cloudflared tunnel — see G-Labs Setup |
| Image generation endpoint | POST /api/image/generate |
| Video generation endpoint | POST /api/video/generate |
| Health check | GET /health (returns 200 OK when running) |
| Authentication | API key in request header |
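The health endpoint makes a cheap preflight before any run. A sketch assuming the local port from the table; the API key header name isn't documented here, so auth is omitted:

```python
import urllib.request

def glabs_healthy(base_url: str = "http://localhost:8765") -> bool:
    """True if G-Labs answers 200 on GET /health."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False
```

Point base_url at the cloudflared tunnel URL to check the public path instead.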

Both the Generation Runner (via --server flag) and the PatchWork web app (via settings panel) need the tunnel URL.

R2 storage

Cloudflare R2 — object storage where all reference images and generation outputs live.

| Field | Value |
| --- | --- |
| Upload mechanism | Worker proxy (https://patchwork-r2-upload.patchworkstudio.workers.dev) |
| URL format | https://pub-....r2.dev/<path>.png (public CDN URLs) |
| Authentication | None on the operator side — the Worker proxy handles it |
| Used by | Generation Runner (auto-upload of reference images and outputs), avatar reference registry |

The Generation Runner uploads any local reference image paths to R2 automatically before calling G-Labs. You don't need to handle this manually.

Quick lookup — which tool when

| Task | Tool |
| --- | --- |
| Generate an image | NanoBanana 2 (always) |
| Generate a speaking scene clip | Veo 3.1 |
| Generate a B-roll clip | Veo 3.1 (natural language prompt, no dialogue) |
| Generate a clip in a specific style Veo can't hit | Seedance 2.0 (manual via Jimeng UI) |
| Upload a reference image | R2 via the Worker (handled automatically by Generation Runner) |
| Run a workflow headlessly | Generation Runner (workflow-runner.js) |
| Run a workflow visually in a browser | Playwright + PatchWork web app |
| Validate an .nbflow before generating | Pre-generation sanity check |

Costs and rate limits

(To be filled in — track per-model costs and concurrency caps here. Current operational defaults: Generation Runner concurrency 3 by default, capped at 5; 4 candidates per gen node; ~3 retries per node before giving up.)