The Testing Phase¶
When you start a Lvl 3 or Lvl 4 variant, it enters a testing phase under the next major version with a -0-N suffix (for example, V2-0-1 when V1 is the approved version). The variant stays in testing until it proves itself, then graduates to the new major version.
What the testing phase looks like¶
```mermaid
flowchart LR
    A[V1 approved] --> B[V2-0-1<br/>first build]
    B --> C{Looks good?}
    C -->|no| D[Iterate]
    D --> E[V2-0-2]
    E --> F{Looks good?}
    F -->|no| G[More iteration]
    G --> H[V2-0-N]
    H --> I{Approved?}
    I -->|yes| J[V2 in approved/]
    F -->|yes| I
    C -->|yes| I
```
Each iteration bumps the -0-N suffix. The previous file moves to backups/. The new file lands in testing/. No upper bound — you can be at V2-0-12 if that's what it takes.
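The bump described above can be sketched in a few lines of Python. This is only an illustration: the filename pattern `V<major>-0-<n>.nbflow` is inferred from the examples on this page, and `bump_testing_version` and `plan_bump` are hypothetical helpers, not part of any documented tool. The sketch only plans the moves; it does not touch the disk.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# Assumed naming scheme: "V<major>-0-<n>.nbflow", e.g. "V2-0-1.nbflow".
_TESTING_NAME = re.compile(r"^V(?P<major>\d+)-0-(?P<n>\d+)\.nbflow$")


def bump_testing_version(filename: str) -> str:
    """Return the next testing filename, e.g. V2-0-1.nbflow -> V2-0-2.nbflow."""
    match = _TESTING_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a testing-phase filename: {filename!r}")
    major, n = int(match["major"]), int(match["n"])
    # No upper bound on n: V2-0-12 simply becomes V2-0-13.
    return f"V{major}-0-{n + 1}.nbflow"


def plan_bump(filename: str) -> dict[str, str]:
    """Describe the file moves for one bump (hypothetical helper)."""
    return {
        # The previous file moves from testing/ to backups/.
        "archive_from": str(PurePosixPath("testing") / filename),
        "archive_to": str(PurePosixPath("backups") / filename),
        # The new file lands in testing/ under the bumped name.
        "new_file": str(PurePosixPath("testing") / bump_testing_version(filename)),
    }
```

Planning the moves separately from executing them makes it easy to review what a bump will do before any file is renamed.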
Why testing matters¶
Lvl 3-4 variants make changes whose outcome you can't reliably predict:
- New environment lighting might wash out the avatar
- New camera angle might not flatter the subject
- Split scenes might pace weirdly
- New prehooks might feel disconnected from the body
You need to see the output before approving. Lvl 1-2 variants are predictable enough to skip this; Lvl 3-4 aren't.
What you do during testing¶
Every iteration follows the same loop:
- Run the Generation Runner on the current `V{N+1}-0-N.nbflow`
- Review in PatchWork: look at every scene's candidates
- Identify what's working and what isn't
- Make targeted changes to fix the issues
- Bump the version (`-0-N` → `-0-N+1`)
- Repeat until clean
Common testing-phase issues¶
What you'll typically encounter and how to handle each:
| Issue | Fix |
|---|---|
| AI tells in 1-2 scenes (extra fingers etc.) | Let the Generation Runner's auto-rerun handle it (3 attempts default; bump to 10 for stubborn ones) |
| Lighting wrong in multiple scenes | Edit the image prompts to be more explicit about lighting; bump the -0-N suffix; rerun |
| Composition slightly off from intent | Use the prompt-tuning skill on the stubborn scenes |
| Pose / motion feels stiff | Edit the Veo prompt's motion description (macro motion, see Video Prompt Rules) |
| One scene structurally won't work | Reconsider the storyboard — maybe split the scene, change the framing, or drop it |
| Avatar's face drifting | Lock the avatar reference; if already wired, check the per-node link refs are clean |
Iterating efficiently¶
Don't bump the -0-N suffix for every tiny change. Batch related changes:
- Bad pattern: fix Scene 01 → bump → fix Scene 02 → bump → fix Scene 03 → bump...
- Good pattern: identify all issues across all scenes → edit all the prompts → bump once → rerun
The Generation Runner reuses cached output for unchanged nodes, so a single rerun after batching is cheap.
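To see why batching stays cheap, here is a minimal sketch of one way such a cache could decide what to rerun. It assumes the cache is keyed on a hash of each node's inputs; the Generation Runner's actual mechanism is not described here, and the `prompt` and `seed` fields and both function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import json


def node_key(node: dict) -> str:
    """Hash the inputs assumed to determine a node's output (hypothetical fields)."""
    payload = json.dumps(
        {"prompt": node["prompt"], "seed": node["seed"]}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def nodes_to_rerun(nodes: dict[str, dict], cache: dict[str, str]) -> list[str]:
    """Return the names of nodes whose inputs differ from the cached run."""
    return sorted(
        name for name, node in nodes.items() if cache.get(name) != node_key(node)
    )


# Previous run: three scenes, all cached.
previous = {
    "scene_01": {"prompt": "wide shot, warm light", "seed": 1},
    "scene_02": {"prompt": "close-up, warm light", "seed": 2},
    "scene_03": {"prompt": "medium shot, warm light", "seed": 3},
}
cache = {name: node_key(node) for name, node in previous.items()}

# Batched edit: only Scene 03's prompt changes before the single bump.
edited = {**previous, "scene_03": {"prompt": "medium shot, soft fill", "seed": 3}}
print(nodes_to_rerun(edited, cache))  # ['scene_03']
```

Under this assumption, a batched edit that touches three scenes costs three node reruns in one pass, rather than three separate bumps and three separate runs.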
Cost during testing¶
Testing iterations burn budget. Strategies to keep cost low:
- Use `mode=images` for prompt validation: image-only passes catch most issues cheaply. Only commit to full video gen on iterations you think are close
- Re-roll seeds before editing prompts: cheaper than editing-and-regenerating; sometimes a fresh seed is what you needed
- Test on one account first: don't run the testing phase across 5 accounts in parallel. Get the test account clean, then fan out
- Cap aggressive regen attempts: 3 attempts default is fine for early iterations; bump only for stubborn scenes
See Chapter 12 — Workflow Optimization for the full cost playbook.
When to stay in testing vs. graduate¶
Graduate (drop the -0-N suffix, move to approved/) when:
- 3 of 4 candidates per scene are acceptable
- No AI tells remain other than ones the user is willing to accept
- The user explicitly approves the variant
Stay in testing when:
- Multiple scenes have issues that haven't been resolved
- A scene needs prompt-tuning that hasn't been done yet
- The user hasn't reviewed yet
You can be at -0-10 if the variant is genuinely hard. That's fine. The version registry's history list tells the story.
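The graduate-or-stay decision can be written down as a simple checklist function. The sketch below is an illustration only: `SceneReview`, `blocking_reasons`, and `ready_to_graduate` are hypothetical names, "3 of 4 candidates" is read as "at least 3 acceptable", and AI tells are counted only when the user has not agreed to accept them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SceneReview:
    """One scene's review result (hypothetical structure)."""

    name: str
    acceptable_candidates: int         # candidates the reviewer marked acceptable
    total_candidates: int = 4
    unaccepted_ai_tells: int = 0       # tells the user has not agreed to live with
    needs_prompt_tuning: bool = False  # tuning identified but not yet done


def blocking_reasons(
    scenes: list[SceneReview], user_approved: bool, min_acceptable: int = 3
) -> list[str]:
    """List every reason the variant should stay in testing."""
    reasons = []
    for scene in scenes:
        if scene.acceptable_candidates < min_acceptable:
            reasons.append(
                f"{scene.name}: only {scene.acceptable_candidates} of "
                f"{scene.total_candidates} candidates acceptable"
            )
        if scene.unaccepted_ai_tells > 0:
            reasons.append(f"{scene.name}: unaccepted AI tells")
        if scene.needs_prompt_tuning:
            reasons.append(f"{scene.name}: prompt-tuning not done yet")
    if not user_approved:
        reasons.append("user has not approved the variant")
    return reasons


def ready_to_graduate(scenes: list[SceneReview], user_approved: bool) -> bool:
    """Graduate only when nothing is blocking."""
    return not blocking_reasons(scenes, user_approved)
```

Returning the reasons rather than a bare boolean gives you ready-made text for the next history entry when the answer is "stay in testing".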
Multi-account testing¶
If you're going to fan out the variant to multiple accounts, the standard approach is:
- Test on one account (typically the canonical test account)
- Iterate until clean on that account
- Graduate to `V{N+1}` in `approved/`
- Then fan out to the other accounts (this is its own bump, but the structure is locked)
Fan-out is part of testing too — see Chapter 6 — Fan-out Protocol. The fan-out itself is a meaningful change and bumps the version.
Documenting changes in the version registry¶
Every -0-N bump appends a history entry to the version registry. Write specific notes, not vague ones:
- Good: "V2-0-2: Scene 03 lighting prompt edited to add soft fill from camera-front, Scene 05 reframed wider"
- Bad: "V2-0-2: tweaks"
When you (or someone else) come back to this workflow in 3 months, the history list is what tells you what was tried.
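One way to keep history notes specific is to reject obviously vague ones at write time. The sketch below assumes a history entry is a small record with `version` and `note` fields; the registry's real format is not shown on this page, and the vague-word list, the four-word minimum, and both function names are hypothetical choices.

```python
from __future__ import annotations

# Hypothetical list of notes that say nothing about what changed.
VAGUE_NOTES = {"tweaks", "fixes", "changes", "updates", "wip"}


def make_history_entry(version: str, note: str) -> dict[str, str]:
    """Build one history entry, refusing notes too vague to be useful later."""
    cleaned = note.strip()
    # Assumed heuristic: a useful note names at least a scene and a change.
    if cleaned.lower() in VAGUE_NOTES or len(cleaned.split()) < 4:
        raise ValueError(f"history note for {version} is too vague: {cleaned!r}")
    return {"version": version, "note": cleaned}


def append_history(
    history: list[dict[str, str]], version: str, note: str
) -> list[dict[str, str]]:
    """Return a new history list with the entry appended (input is not mutated)."""
    return history + [make_history_entry(version, note)]
```

With this check, `make_history_entry("V2-0-2", "tweaks")` raises a `ValueError`, while the longer Scene 03 note from the example above is accepted.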
When you're ready¶
→ Next: Iterating with Prompt Tuning — using the prompt-tuning skill during testing for stubborn scenes that won't land with normal regens.