The Testing Phase

A new Lvl 3 or Lvl 4 variant enters a testing phase under the next major version with a -0-N suffix (for example, V2-0-1 when V1 is the approved version). It stays in testing until it proves itself, then graduates to the new major version.

What the testing phase looks like

flowchart LR
    A[V1 approved] --> B[V2-0-1<br/>first build]
    B --> C{Looks good?}
    C -->|no| D[Iterate]
    D --> E[V2-0-2]
    E --> F{Looks good?}
    F -->|no| G[More iteration]
    G --> H[V2-0-N]
    H --> I{Approved?}
    I -->|yes| J[V2 in approved/]
    F -->|yes| I
    C -->|yes| I

Each iteration bumps the -0-N suffix. The previous file moves to backups/. The new file lands in testing/. No upper bound — you can be at V2-0-12 if that's what it takes.
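The bump described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's tooling: the function names, the regular expression, and the assumption that files are named like V2-0-1.nbflow directly under backups/ and testing/ are all hypothetical.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern for testing-phase names such as "V2-0-1".
TESTING_VERSION = re.compile(r"^V(?P<major>\d+)-0-(?P<iteration>\d+)$")


def bump_testing_version(version: str) -> str:
    """Return the next testing iteration, e.g. "V2-0-1" -> "V2-0-2"."""
    match = TESTING_VERSION.match(version)
    if match is None:
        raise ValueError(f"not a testing-phase version: {version!r}")
    return f"V{match['major']}-0-{int(match['iteration']) + 1}"


def plan_bump(version: str, ext: str = ".nbflow") -> dict:
    """Describe where the old and new files would go (assumed layout)."""
    new_version = bump_testing_version(version)
    return {
        # The previous iteration is archived, not deleted.
        "backup": str(PurePosixPath("backups") / f"{version}{ext}"),
        # The new iteration becomes the active file under testing/.
        "testing": str(PurePosixPath("testing") / f"{new_version}{ext}"),
    }
```

Because the iteration number is parsed as an integer, there is no built-in ceiling: V2-0-12 bumps to V2-0-13 the same way V2-0-1 bumps to V2-0-2.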

Why testing matters

Lvl 3-4 variants change things whose outcome you can't reliably predict:

  • New environment lighting might wash out the avatar
  • New camera angle might not flatter the subject
  • Split scenes might pace weirdly
  • New prehooks might feel disconnected from the body

You need to see the output before approving. Lvl 1-2 variants are predictable enough to skip this; Lvl 3-4 aren't.

What you do during testing

Every iteration follows the same loop:

  1. Run the Generation Runner on the current V{N+1}-0-N.nbflow
  2. Review in PatchWork — look at every scene's candidates
  3. Identify what's working and what isn't
  4. Make targeted changes to fix the issues
  5. Bump the version (-0-N → -0-(N+1))
  6. Repeat until clean

Common testing-phase issues

What you'll typically encounter and how to handle each:

| Issue | Fix |
| --- | --- |
| AI tells in 1-2 scenes (extra fingers etc.) | Let the Generation Runner's auto-rerun handle it (3 attempts default; bump to 10 for stubborn ones) |
| Lighting wrong in multiple scenes | Edit the image prompts to be more explicit about lighting; bump V0-N; rerun |
| Composition slightly off from intent | Use the prompt-tuning skill on the stubborn scenes |
| Pose / motion feels stiff | Edit the Veo prompt's motion description (macro motion, see Video Prompt Rules) |
| One scene structurally won't work | Reconsider the storyboard: maybe split the scene, change the framing, or drop it |
| Avatar's face drifting | Lock the avatar reference; if already wired, check the per-node link refs are clean |

Iterating efficiently

Don't bump V0-N for every tiny change. Batch related changes:

Bad pattern
Fix Scene 01 → bump → fix Scene 02 → bump → fix Scene 03 → bump...
Good pattern
Identify all issues across all scenes → edit all the prompts → bump once → rerun

The Generation Runner reuses cached output for unchanged nodes, so a single rerun after batching is cheap.
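To see why batching keeps reruns cheap, here is a minimal sketch of change detection by content hash. It does not describe how the Generation Runner actually caches; the node dictionaries, field names, and function names are assumptions made for illustration, and downstream dependencies between nodes are ignored.

```python
import hashlib
import json


def node_fingerprint(node: dict) -> str:
    """Hash a node's settings; identical settings give identical hashes."""
    payload = json.dumps(node, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def nodes_to_rerun(previous: dict, current: dict) -> list:
    """Return names of nodes that are new or whose settings changed."""
    changed = []
    for name, node in current.items():
        old = previous.get(name)
        if old is None or node_fingerprint(old) != node_fingerprint(node):
            changed.append(name)
    return sorted(changed)
```

Under this model, editing three prompts and bumping once regenerates three nodes in a single run, while every untouched node is served from cache.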

Cost during testing

Testing iterations burn budget. Strategies to keep cost low:

  • Use mode=images for prompt validation: image-only passes catch most issues cheaply. Only commit to full video gen on iterations you think are close
  • Re-roll seeds before editing prompts: cheaper than editing-and-regenerating; sometimes a fresh seed is what you needed
  • Test on one account first: don't run the testing phase across 5 accounts in parallel. Get the test account clean, then fan out
  • Cap aggressive regen attempts: 3 attempts default is fine for early iterations; bump only for stubborn scenes

See Chapter 13 — Workflow Optimization for the full cost playbook.

When to stay in testing vs. graduate

Graduate (drop the -0-N suffix, move to approved/) when:

  • At least 3 of 4 candidates per scene are acceptable
  • No AI tells remain that the user is unwilling to accept
  • The user explicitly approves the variant

Stay in testing when:

  • Multiple scenes have issues that haven't been resolved
  • A scene needs prompt-tuning that hasn't been done yet
  • The user hasn't reviewed yet

You can be at -0-10 if the variant is genuinely hard. That's fine. The version registry's history list tells the story.
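The graduation criteria can be written down as a simple check. This is a sketch under assumptions: the SceneReview record and its field names are hypothetical, and the threshold reads the first criterion as "at least 3 acceptable candidates out of 4".

```python
from dataclasses import dataclass


@dataclass
class SceneReview:
    """Hypothetical per-scene review result."""
    name: str
    acceptable_candidates: int   # how many of the scene's candidates pass
    unaccepted_ai_tells: int = 0  # tells the user has not agreed to accept


def ready_to_graduate(scenes, user_approved: bool, min_acceptable: int = 3) -> bool:
    """True only when every criterion for leaving testing is met."""
    if not user_approved or not scenes:
        # No explicit approval (or nothing reviewed yet) means stay in testing.
        return False
    return all(
        s.acceptable_candidates >= min_acceptable and s.unaccepted_ai_tells == 0
        for s in scenes
    )
```

Explicit user approval is checked first on purpose: clean candidates alone never graduate a variant.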

Multi-account testing

If you're going to fan out the variant to multiple accounts, the standard approach is:

  1. Test on one account (typically the canonical test account)
  2. Iterate until clean on that account
  3. Graduate to V{N+1} in approved/
  4. Then fan out to the other accounts (this is its own bump, but the structure is locked)

Fan-out is part of testing too — see Chapter 7 — Fan-out Protocol. The fan-out itself is a meaningful change and bumps the version.

Documenting changes in the version registry

Every -0-N bump appends a history entry to the version registry. Write specific notes, not vague ones:

Good
"V2-0-2: Scene 03 lighting prompt edited to add soft fill from camera-front, Scene 05 reframed wider"
Bad
"V2-0-2: tweaks"

When you (or someone else) come back to this workflow in 3 months, the history list is what tells you what was tried.
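One way to enforce specific notes is to reject vague ones at write time. The sketch below assumes the registry is a dictionary with a "history" list. That structure, the vague-word list, and the four-word minimum are illustrative assumptions, not the real registry format.

```python
# Hypothetical list of notes that say nothing about what changed.
VAGUE_NOTES = {"tweaks", "fixes", "changes", "updates", "wip"}


def append_history(registry: dict, version: str, note: str) -> dict:
    """Append a history entry, refusing notes too vague to be useful."""
    cleaned = note.strip()
    if cleaned.lower() in VAGUE_NOTES or len(cleaned.split()) < 4:
        raise ValueError(f"history note for {version} is too vague: {note!r}")
    # Entries are appended in order, so the list reads as a timeline.
    registry.setdefault("history", []).append(
        {"version": version, "note": cleaned}
    )
    return registry
```

A note such as "Scene 03 lighting prompt edited to add soft fill from camera-front" passes, while "tweaks" raises an error before it reaches the registry.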

When you're ready

Next: Iterating with Prompt Tuning — using the prompt-tuning skill during testing for stubborn scenes that won't land with normal regens.