
Tool & Model Reference

Reference card for every AI model in the pipeline — what it does, what it accepts, how to prompt it, where it fails. Read once when onboarding, then return as a lookup.

Image generation

NanoBanana 2

The default image generation model. Used for every image gen in every workflow.

| Field | Value |
| --- | --- |
| Model identifier | nano_banana_2 |
| Access | G-Labs API (POST /api/image/generate) |
| PatchWork node | nanobanana/NanobananaAPI with properties.model = "nano_banana_2" |
| Inputs | Prompt text (string), up to 2 reference images (URLs) |
| Output count | 4 candidate images per run (outputCount: 4) |
| Aspect ratio | 9:16 vertical default for short-form |
| Output format | PNG, served from R2 via public URLs |

Always set the model field. If model is missing or unset, G-Labs defaults to NanoBanana 1, which is structurally different and produces worse results. The pre-generation sanity check catches this.
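That check amounts to scanning the workflow JSON for NanobananaAPI nodes whose model field is missing or wrong. A minimal sketch, not the actual sanity-check code — the helper name and workflow structure are assumptions, the node type and model value come from the table above:

```python
def check_image_model(workflow: dict) -> list[str]:
    """Flag image gen nodes that would silently fall back to NanoBanana 1."""
    problems = []
    for node in workflow.get("nodes", []):
        if node.get("type") == "nanobanana/NanobananaAPI":
            model = node.get("properties", {}).get("model")
            if model != "nano_banana_2":
                problems.append(f"node {node.get('id')}: model is {model!r}, "
                                f"expected 'nano_banana_2'")
    return problems
```

An empty result means every image node is pinned to NanoBanana 2; anything else should block the run.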

Prompting NanoBanana 2

Best practices:

  • Open with a framing declaration (e.g. "Eye level frontal selfie, chest to shoulder up") — the declaration is a hard boundary that everything after it must respect
  • Reference images do the heavy lifting on identity. When a character ref is wired in, don't repeat gender / age / face / hair / ethnicity in the prompt — those tokens consume attention budget the model should spend on the scene
  • Describe only what's in frame. See Image Prompt Rules
  • Keep templates minimal. When wiring through a template, the template should be {scene} alone, not anchored with camera/lighting/aesthetic boilerplate — anchors dilute reference adherence
  • No hyphens. Use spaced or unhyphenated alternatives in prompts
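Two of these rules are mechanical enough to lint before submitting a prompt. A rough sketch — the identity token list is illustrative, not the project's actual vocabulary:

```python
IDENTITY_TOKENS = {"woman", "man", "girl", "boy", "blonde", "brunette", "bearded"}  # illustrative

def lint_image_prompt(prompt: str, has_character_ref: bool) -> list[str]:
    """Catch hyphens and redundant identity tokens before a NanoBanana 2 run."""
    issues = []
    if "-" in prompt:
        issues.append("hyphen found; use spaced or unhyphenated alternatives")
    if has_character_ref:
        words = set(prompt.lower().split())
        hits = sorted(IDENTITY_TOKENS & words)
        if hits:
            issues.append("identity tokens duplicated alongside a character ref: " + ", ".join(hits))
    return issues
```

Anything it flags is attention budget leaking away from the scene description.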

Known failure modes

| Failure | Cause | Fix |
| --- | --- | --- |
| Generic stock-style output | model field unset, fell back to NB1 | Set model: "nano_banana_2" |
| Output unrelated to prompt | Dynamic variableName doesn't match template placeholder | Set variableName to a short lowercase identifier matching the placeholder |
| Avatar's face wrong | Reference image link broken (stale per-node refs or Cached Media bug) | Run resync_link_refs(tab) and verify in the PatchWork UI |
| Extra fingers / melted face / garbled hands | Normal AI tells | Generation Runner auto-reruns up to 3 attempts; bump to 10 for stubborn cases |
| Hallucinated text (signage, labels) | NanoBanana routinely produces garbled text | Ignore — low priority |
| Cropping wrong | Framing declaration missing or contradicted by out-of-frame mentions | Follow the in-frame rule |

When to consider alternatives

You generally won't. NanoBanana 2 is the default and produces the best results across our use cases. The only reasons to switch:

  • An explicit comparison run between models (rare — usually run by the model researcher, not in production)
  • A specific composition that NB2 keeps failing on after 10 regen attempts — at that point, consider whether the prompt is the problem before swapping models

Video generation

Veo 3.1 (fast preview)

The default video model. Used for speaking scenes and most B-roll.

| Field | Value |
| --- | --- |
| Model identifier | veo-3.1-fast-generate-preview |
| Access | G-Labs API |
| PatchWork node | nanobanana/Veo3 with properties.model set to the identifier |
| Inputs | Prompt text, start frame image, optional end frame image |
| Output count | 4 candidate clips per run (outputCount: 4) |
| Clip length | ~8 seconds per clip |
| Frame mode | Frames-to-video — interpolates between start and end frames |
| Resolution | 1080p or 720p (set via properties.resolution) |
| Output format | MP4, served from R2 via public URLs |

Veo3 node requires 3 input slots

Veo3 nodes need all 3 input slots present even when end-frame isn't wired:

"inputs": [
    {"name": "prompt", "type": "string", "link": null},
    {"name": "start frame", "type": "*", "link": null},
    {"name": "end frame", "type": "*", "link": null}
]

Slot names have spaces, not underscores. Missing the third slot causes r.trim is not a function errors on import.
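A quick structural check before import can catch both breakages. A sketch, assuming the node JSON shape shown above (negativePrompt living under properties is an assumption):

```python
REQUIRED_SLOTS = ("prompt", "start frame", "end frame")  # spaces, not underscores

def validate_veo3_node(node: dict) -> list[str]:
    """Flag the two Veo3 node shapes known to break import with r.trim errors."""
    issues = []
    names = [slot.get("name") for slot in node.get("inputs", [])]
    for required in REQUIRED_SLOTS:
        if required not in names:
            issues.append(f"missing input slot {required!r}")
    if not isinstance(node.get("properties", {}).get("negativePrompt"), str):
        issues.append("negativePrompt must be a string (empty '' is fine)")
    return issues
```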

Prompting Veo 3.1

Best practices:

  • Macro motion only — Veo automatically adds natural micro motion (blinking, breathing, micro expressions). Prompting for micro motion adds noise. See Video Prompt Rules
  • Never prompt location changes without an explicit end frame — frames-to-video interpolation will hallucinate the transition badly
  • Speaking scenes use the universal talking head template — don't override the delivery instructions; the template is tuned
  • B-roll is natural language, no dialogue, ambient audio only
  • Low-movement scenes use the same image as start AND end frame to prevent drift
  • negativePrompt must be a string (empty "" is fine, undefined crashes)

Known failure modes

| Failure | Cause | Fix |
| --- | --- | --- |
| Avatar's face drifts mid-clip | Start and end frames too different, OR scene needs more than 8 seconds | Use same image as start AND end, OR split into shorter motion |
| Clip looks glitchy with hallucinated transitions | Location change without end frame | Split into two scenes, or add explicit end frame |
| Distant background animals glitching | Animals in distant background of start frame | Regenerate start frame without animals — see Image Prompt Rules |
| POV clip shows third-person body | Prompt described subject's body orientation | Rewrite per POV rule |
| Voice / mouth sync off | Using a custom delivery override instead of the universal template | Switch to universal template |
| Import error r.trim is not a function | Veo3 missing negativePrompt (undefined) or missing 3rd input slot | Fill negativePrompt: "" and add end frame slot with link: null |

Seedance 2.0

Alternative video model. Used selectively when Seedance produces a better result for a specific style (rare). Accessed manually via the Jimeng web UI — no API integration yet.

| Field | Value |
| --- | --- |
| Access | Jimeng web UI (manual, no API) |
| Inputs | Prompt text, start frame image |
| Negative prompts | Not supported — positive constraints only |
| Output | Single clip per run |
| Format | MP4 download from the web UI |

Prompting Seedance 2.0

  • Positive constraints only. Restate what you want, don't negate what you don't want
  • Same macro-motion rule applies as with Veo
  • Ambient audio is set in the UI, not via prompt
  • Manual workflow — no .nbflow integration; the Seedance Prompter agent produces the prompt text, you paste it into Jimeng yourself
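Because Seedance has no negative prompts, a quick scan for negation phrasing before pasting into Jimeng can save a wasted run. A crude sketch with an illustrative marker list:

```python
NEGATION_MARKERS = ("no ", " not ", "don't", "without", "avoid", "never")  # illustrative

def flag_negations(prompt: str) -> list[str]:
    """Return negation markers found in a Seedance prompt (restate them positively)."""
    lower = prompt.lower()
    return [marker.strip() for marker in NEGATION_MARKERS if marker in lower]
```

For example, "slow pan, no camera shake" gets flagged and reads better restated as "slow pan, locked off stable camera".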

When to use Seedance over Veo

In practice, almost never. Use cases:

  • A specific stylization that Veo can't match (rare)
  • A comparison run for a particularly tricky scene
  • When the G-Labs / Veo path is down and you need a stopgap

Default to Veo 3.1 unless there's a concrete reason to switch.

The G-Labs backend

Not a model — but it's the layer that proxies every model call. Worth knowing the interface.

| Field | Value |
| --- | --- |
| Local port | 8765 |
| Public access | via cloudflared tunnel — see G-Labs Setup |
| Image generation endpoint | POST /api/image/generate |
| Video generation endpoint | POST /api/video/generate |
| Health check | GET /health (returns 200 OK when running) |
| Authentication | API key in request header |
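The health endpoint makes a cheap preflight before any run. A sketch assuming the local port from the table; the API key header name isn't documented here, so auth is omitted:

```python
import urllib.request

def glabs_healthy(base_url: str = "http://localhost:8765") -> bool:
    """True if G-Labs answers 200 on GET /health."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False
```

Point base_url at the cloudflared tunnel URL to check the public path instead.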

Both the Generation Runner (via --server flag) and the PatchWork web app (via settings panel) need the tunnel URL.

R2 storage

Cloudflare R2 — object storage where all reference images and generation outputs live.

| Field | Value |
| --- | --- |
| Upload mechanism | Worker proxy (https://patchwork-r2-upload.patchworkstudio.workers.dev) |
| URL format | https://pub-....r2.dev/<path>.png (public CDN URLs) |
| Authentication | None on the operator side — the Worker proxy handles it |
| Used by | Generation Runner (auto-upload of reference images and outputs), avatar reference registry |

The Generation Runner uploads any local reference image paths to R2 automatically before calling G-Labs. You don't need to handle this manually.

Quick lookup — which tool when

| Task | Tool |
| --- | --- |
| Generate an image | NanoBanana 2 (always) |
| Generate a speaking scene clip | Veo 3.1 |
| Generate a B-roll clip | Veo 3.1 (natural language prompt, no dialogue) |
| Generate a clip in a specific style Veo can't hit | Seedance 2.0 (manual via Jimeng UI) |
| Upload a reference image | R2 via the Worker (handled automatically by Generation Runner) |
| Run a workflow headlessly | Generation Runner (workflow-runner.js) |
| Run a workflow visually in a browser | Playwright + PatchWork web app |
| Validate an .nbflow before generating | Pre-generation sanity check |

Costs and rate limits

(To be filled in — track per-model costs and concurrency caps here. Current operational defaults: Generation Runner concurrency 3 by default, capped at 5; 4 candidates per gen node; ~3 retries per node before giving up.)