How Quality Is Handled

The pipeline does a lot of quality work for you. Before you reach for the prompt-tuning skill, it's worth understanding what's already happening behind the scenes — so you don't waste time fixing something the system was about to fix anyway.

The three automatic checks

Every generation pass goes through three filters before you ever see a candidate:

```mermaid
flowchart LR
    A[Your workflow] --> B[Pre-gen<br/>sanity check]
    B --> C[Generation Runner]
    C --> D[Auto-QA<br/>visual review]
    D --> E[Auto-rerun<br/>up to 3x]
    E --> F[You see<br/>the candidates]
```

1. Pre-generation sanity check

Before the runner even starts spending budget, it scans the .nbflow file for schema bugs. It catches:

  • Generation nodes with the wrong model set
  • Reference image links that don't actually reach the generation node
  • Dynamic prompts whose placeholders don't match what the template expects
  • Schema-level errors that would silently corrupt outputs

If it finds something, the run aborts with a clear error naming the broken node. Relay that message to Claude, and Claude fixes it.

You don't run this check yourself. The Runner does it for every pass.
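To make the three failure modes concrete, here is a minimal sketch of the kind of validation the sanity check performs. The `.nbflow` structure, field names, and the `EXPECTED_MODEL` value are all assumptions for illustration; the real Runner's schema and checks are not exposed.

```python
import re

EXPECTED_MODEL = "nb2"  # assumed model identifier, for illustration only

def sanity_check(workflow: dict) -> list[str]:
    """Return human-readable errors; an empty list means the run may start."""
    errors = []
    nodes = {n["id"]: n for n in workflow["nodes"]}

    for node in workflow["nodes"]:
        if node["type"] != "generation":
            continue
        # 1. Wrong model set on a generation node
        if node.get("model") != EXPECTED_MODEL:
            errors.append(f"{node['id']}: model is {node.get('model')!r}, "
                          f"expected {EXPECTED_MODEL!r}")
        # 2. Reference image links that never reach the generation node
        refs = [src for src, dst in workflow.get("links", [])
                if dst == node["id"] and nodes[src]["type"] == "reference_image"]
        if not refs:
            errors.append(f"{node['id']}: no reference image link reaches this node")
        # 3. Dynamic-prompt placeholders vs. what the template expects
        placeholders = set(re.findall(r"\{(\w+)\}", node.get("prompt", "")))
        expected = set(node.get("template_slots", []))
        if placeholders != expected:
            errors.append(f"{node['id']}: placeholders {sorted(placeholders)} "
                          f"!= template slots {sorted(expected)}")
    return errors
```

The point is only that these are structural checks: they run before any budget is spent, and each error names the offending node so Claude can fix it.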

2. Auto-QA visual review

After every image generation, the Generation Runner visually reviews each candidate for the classic AI tells:

  • Extra fingers / wrong number of fingers
  • Two left hands or two right hands
  • Melted faces, garbled features
  • Impossible limb anatomy
  • Garbled wristwatch faces, hands going through objects

If a candidate has any of these, it gets flagged for regeneration automatically. You don't have to spot it yourself.

What the auto-QA doesn't check: prompt adherence (did the model put the avatar in the right setting?), hallucinated text (signage, price tags), or stylistic match. Those are judgment calls — they're what you're there for when you review the picks.
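The division of labor above can be encoded as a small sketch. This is illustrative only: the real auto-QA is a visual model review, not a checklist function, and the flag names here are invented for the example.

```python
# Flags the auto-QA catches -> candidate is regenerated automatically.
AUTO_QA_TELLS = {
    "extra_fingers", "duplicate_hand_side", "melted_face",
    "impossible_limb", "garbled_watch_face", "hand_through_object",
}

# Judgment calls the auto-QA never flags -> you check these when reviewing picks.
HUMAN_JUDGMENT = {
    "prompt_adherence", "hallucinated_text", "style_match",
}

def needs_auto_regen(flags: set[str]) -> bool:
    """A candidate is re-run only if at least one automatic tell was flagged."""
    return bool(flags & AUTO_QA_TELLS)
```

In other words, an anatomical tell triggers a regen without you; a wrong setting or hallucinated signage waits for your review.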

3. Auto-rerun up to 3 attempts

Flagged candidates are rerun automatically. The Runner regenerates the affected scene up to 3 times by default, swapping in fresh seeds each attempt. If 3 attempts don't produce a clean candidate, the scene is still saved (so you can see what happened), but flagged for your attention.

You can bump the attempt count for stubborn scenes — tell Claude "up to 10 attempts on Scene 03" if you want to give the system more chances before stepping in yourself.
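The rerun policy amounts to a short retry loop. This is a sketch under assumptions: `generate` and `auto_qa` are hypothetical stand-ins for the Runner's internals, which are not exposed.

```python
import random

def run_scene(generate, auto_qa, scene_id: str, max_attempts: int = 3) -> dict:
    """Retry a scene with fresh seeds; save the last attempt either way."""
    candidate = None
    for attempt in range(1, max_attempts + 1):
        seed = random.randrange(2**32)      # fresh seed each attempt
        candidate = generate(scene_id, seed=seed)
        if not auto_qa(candidate):          # no tells flagged -> clean, done
            return {"scene": scene_id, "candidate": candidate, "flagged": False}
    # Attempts exhausted: the scene is still saved, but flagged for you.
    return {"scene": scene_id, "candidate": candidate, "flagged": True}
```

Bumping `max_attempts` is exactly what "up to 10 attempts on Scene 03" does: more chances for the loop to find a clean seed before the scene lands on your desk flagged.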

When the system is enough

For most workflows on the canonical pipeline (Salvora avatars, Lvl 1/2 variants of existing workflows), the three automatic checks are enough. You'll see candidate sets that:

  • Have no glaring AI tells (auto-QA filtered them out)
  • Match the workflow's overall composition (Claude structured the prompts correctly)
  • Need only a pick from you — "give me the strongest take"

In this case: don't iterate, don't tune, don't second-guess. Just pick the strongest candidate and move on.

When you need to step in

The system can't fix everything. You'll need to act when:

| Symptom | What it means | Where to go |
|---|---|---|
| All 4 candidates have the same compositional issue (lighting off, wardrobe wrong) | The prompt is structurally producing this; not a regen problem | Prompt tuning |
| The scene "should" look different but you can't articulate why | The brief is ambiguous, or the reference is wrong | Adjust the reference image first; if still off, prompt tuning |
| Candidate set has AI tells the auto-QA didn't catch | Rare; auto-QA catches ~85% of tells | Tell Claude "regen Scene N with 10 attempts" |
| You want a meaningful composition / setting / camera change | Not a quality issue; it's a variant | Lvl 3 variants |
| Background hallucinations (garbled signs, fake price tags) | Known NB2 limitation; text in backgrounds is unreliable | Ignore. See memory note |

The pattern: if the issue appears across all candidates AND a prompt edit could fix it, that's prompt-tuning territory. If it's a bigger structural change, that's a Lvl 3 variant. If it's a one-candidate-of-four issue, just rerun.
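The triage pattern above can be written as a tiny decision function. It's a decision aid with made-up flag names, not a real API:

```python
def triage(all_candidates_affected: bool, prompt_editable: bool,
           structural_change: bool) -> str:
    """Map a quality symptom to the right next step."""
    if structural_change:
        return "lvl-3 variant"      # meaningful composition/setting/camera change
    if all_candidates_affected and prompt_editable:
        return "prompt tuning"      # the prompt is structurally producing it
    return "rerun"                  # one-of-four issue: just regenerate
```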

What you don't need to do

  • Read the underlying prompt JSON. Claude handles it.
  • Check the model identifier on each gen node. The sanity check does that.
  • Validate the link graph by hand. The runner walks it.
  • Memorize the AI-tells taxonomy. The auto-QA scans for them.
  • Apply prompt-construction rules ("describe only in frame", "no hyphens in NB prompts", etc.). The Image Prompter applies them.

If you find yourself reading or editing prompt JSON directly, you're working at the wrong level. Stop, describe what's wrong in plain English, and ask Claude to fix it.

When you're ready

Next: Prompt tuning — the skill that catches the issues the system can't fix automatically.