
Cost Awareness

Generation costs add up fast, and a careless workflow run can burn a meaningful chunk of budget. This page is the playbook for seeing where the budget goes and avoiding waste.

Where the budget goes

```mermaid
flowchart TD
    A[Total workflow cost] --> B[Image gens<br/>cheaper]
    A --> C[Video gens<br/>more expensive]
    A --> D[Reruns<br/>each = full gen cost]
    A --> E[Prompt tuning<br/>multiple iterations]
    A --> F[Fan-out<br/>multiplies across accounts]
```

| Operation | Relative cost |
| --- | --- |
| NanoBanana 2 image gen (4 candidates) | Low — fastest, cheapest |
| Veo 3.1 video gen (4 candidates, 8s each) | Significant — usually 5-15× the image cost |
| Image rerun | Full image gen cost (same as initial) |
| Video rerun | Full video gen cost |
| Prompt-tuning iteration | One full image gen per iteration |
| Multi-account fan-out | All of the above × N accounts |

The order of magnitude

For one 8-scene workflow:

  • 8 image gens (one per scene) ≈ baseline cost
  • 8 video gens (one per scene) ≈ 10× baseline (video is the expensive part)
  • 2-3 reruns (typical for early iterations) ≈ 3× baseline
  • Auto-rerun attempts (the Generation Runner's QA loop) ≈ 1-2× baseline

One full pass of an 8-scene workflow ≈ 15-18× the baseline image cost.

A 5-account fan-out = 5× that.

A workflow that needs 5 testing iterations = ~80× baseline before V1 graduates.

The math compounds quickly. Optimization isn't optional at production scale.
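
The compounding above can be sketched as a back-of-envelope model. All multipliers (`video_units=10.0`, the rerun and auto-rerun allowances) are this page's illustrative figures, not measured prices, and `pass_cost` is a hypothetical helper, not part of the Generation Runner:

```python
# Back-of-envelope cost model in "baseline units", where 1.0 = the image-gen
# cost of a full 8-scene pass. Multipliers are this page's illustrative
# figures, not real prices.
def pass_cost(image_units=1.0, video_units=10.0, rerun_units=3.0, auto_units=1.5):
    # one full pass: images + videos + manual reruns + QA auto-reruns
    return image_units + video_units + rerun_units + auto_units

one_pass = pass_cost()            # 15.5, inside the ~15-18x range above
fan_out_5 = 5 * one_pass          # a 5-account fan-out
five_iterations = 5 * one_pass    # 77.5, near the ~80x figure above
```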

Strategy 1: mode=images for cheap validation

The single biggest cost lever. Image gens are 5-15× cheaper than video gens.

When you run a workflow for the first time or after significant prompt changes:

You: run XYZG3-V0-1 in images-only mode first. I want to see the stills
     before committing to video gen budget.

Claude: [invokes Generation Runner with mode=images flag]
        [returns 4 candidates per scene — image gens only, no Veo]

  Image-only pass complete. Check the stills in PatchWork. If they
  look right, run the full pass next.

You see all the stills cheaply. If the avatar's face is wrong, the lighting's off, or the composition isn't what you wanted — fix the prompts BEFORE paying for video gen.

This catches 90% of issues at 10% of the cost.
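
In the same baseline units, the saving is easy to see. The ~10× video multiplier is this page's rough figure, and the variable names are illustrative:

```python
# Images-only validation vs. a full pass, in baseline units
# (1.0 = the image-gen cost of a full 8-scene pass; the ~10x video
# multiplier is this page's rough figure).
IMAGE_PASS = 1.0
VIDEO_PASS = 10.0

full_pass = IMAGE_PASS + VIDEO_PASS   # 11.0 units
stills_only = IMAGE_PASS              # 1.0 unit to validate composition

# Finding a prompt problem in the stills costs ~1 unit instead of ~11:
saved_per_bad_pass = full_pass - stills_only   # 10.0 units
```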

Strategy 2: B-roll density's cost impact

B-roll density directly affects cost:

| Density | B-roll clips | Extra cost |
| --- | --- | --- |
| None | 0 | None |
| Low | 2-3 | +2-3 video gens |
| Medium | One per visually-reinforceable scene | +4-6 video gens |
| High | One per speaking scene | +8 video gens (doubles video cost) |

For an 8-scene workflow:

  • Low: ~12% cost increase
  • Medium: ~30%
  • High: ~100% (doubles total video gen cost)

Don't pick High density "just in case." Pick it when the source video genuinely has B-roll throughout, or the format demands it. Otherwise Medium or Low.
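
The density table maps to a small cost function. The figures come straight from the table above (2.5 stands in for the 2-3 range, 5 for 4-6), and `video_gens` is a hypothetical helper:

```python
# Extra video gens per B-roll density for a speaking-scene workflow,
# using the figures from the table above (midpoints for ranges).
BROLL_EXTRA = {"none": 0, "low": 2.5, "medium": 5, "high": 8}

def video_gens(scenes=8, density="none"):
    # total video gens = one per scene + B-roll extras
    return scenes + BROLL_EXTRA[density]

# High doubles the video gen count for an 8-scene workflow:
# video_gens(8, "high") -> 16 vs. video_gens(8, "none") -> 8
```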

Strategy 3: Re-roll seeds before editing prompts

When candidates don't land:

Cheap path: re-roll the seed
Same prompt, fresh randomness. Costs one rerun. About 60% of the time, this produces a candidate that works.
Expensive path: edit the prompt and rerun
New prompt, new gen. Same cost per run as a re-roll, but if the edit doesn't fix the issue, you've burned an editing cycle and still need further runs on top of it.

Try the cheap path first. If 2-3 re-rolls don't help, then edit the prompt.
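
The 60% figure explains why a few re-rolls are worth it. If each re-roll independently lands ~60% of the time (the rough number above), the chance that at least one of k re-rolls works is:

```python
# Chance that at least one of k independent re-rolls produces a usable
# candidate, assuming each lands ~60% of the time (this page's rough figure).
def p_at_least_one(k, p_single=0.6):
    return 1 - (1 - p_single) ** k

# p_at_least_one(1) ≈ 0.60
# p_at_least_one(2) ≈ 0.84
# p_at_least_one(3) ≈ 0.94
```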

Strategy 4: Skip aggressive regen attempts when not needed

The Generation Runner's default regen-attempts cap is 3 per flagged image. For most workflows this is right.

But: if you bump it to 10 (because some scene has stubborn AI tells), each rerun costs a full image gen. 10 attempts × N flagged images can multiply quickly.
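
A quick worst-case sketch of that multiplication (`worst_case_regens` is a hypothetical helper, not a Runner API):

```python
# Worst-case extra image gens from the QA regen loop: each flagged image
# can consume up to `cap` full image gens.
def worst_case_regens(flagged_images, cap=3):
    return flagged_images * cap

# Raising the cap from 3 to 10 with 4 flagged images:
# worst_case_regens(4, cap=3)  -> 12 extra image gens
# worst_case_regens(4, cap=10) -> 40 extra image gens
```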

When to use 10+ attempts:

  • A specific scene has consistently bad hands and you want maximum chances
  • The workflow is high-value and you want polish
  • You've already tried prompt-tuning and the issue is anatomy variance

When to keep at 3:

  • Early testing iterations (you're still learning what works)
  • The flagged scenes aren't critical to the final video
  • Budget-constrained pass

Strategy 5: Targeted single-scene reruns

Instead of rerunning the whole workflow when 2 scenes are off:

You: rerun only Scenes 03 and 05 — re-rolls. Don't regenerate the
     scenes I already approved.

Claude: [per-node rerun via regen-nodes.py for nodes 14 and 23]
        [4 new candidates per scene; other scenes use cached output]

Per-node rerun saves 75% of cost vs. full workflow rerun on an 8-scene workflow.
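
The saving falls straight out of the scene count. A sketch (hypothetical helper; video gens dominate, so other costs are ignored):

```python
# Fraction of a full-workflow rerun you pay when rerunning only some scenes.
# Cached scenes cost nothing; video gens dominate, so other costs are ignored.
def rerun_fraction(scenes_rerun, total_scenes=8):
    return scenes_rerun / total_scenes

# Rerunning 2 of 8 scenes costs 25% of a full rerun, a 75% saving.
```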

Strategy 6: Test on one account before fanning out

Fan-out multiplies cost by the number of accounts. A 5-account workflow at 18× baseline per account comes to 90× baseline total.

If the workflow has prompt issues, you spend that 90× and then discover the issues. Better:

  1. Run on one test account first — 18× baseline
  2. Iterate to a clean V0-N (maybe 36-54× baseline total across iterations)
  3. Then fan out — 4 more accounts × 18× = 72× more, total ~108-126×

vs. fanning out first and then discovering issues: 90× for the initial fan-out, plus (fix iterations × 5 accounts × 18×), which quickly runs into the hundreds of baseline units.

Test-first is 30-50% cheaper at the scale of a multi-iteration workflow.

Strategy 7: Cache discipline

The Generation Runner caches outputs aggressively. Wiping the cache unnecessarily costs you cached work.

Don't wipe cache when:

  • You changed nothing meaningful since the last run (cache is still valid)
  • You just want a re-roll on one node (use per-node rerun instead)
  • You're "just being safe" (the cache is reliable — trust it)

Do wipe cache when:

  • A reference image URL changed
  • The model identifier changed (NB1 → NB2 migration)
  • You want a completely fresh pass for comparison purposes
  • The cache is genuinely stale (-generated.nbflow is from a previous version of the workflow)

See Cache and Re-generation.
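
The checklist above reduces to a small predicate. The reason labels are hypothetical tags for this sketch, not Generation Runner flags:

```python
# Wipe-or-keep decision from the checklist above. Reason labels are
# hypothetical tags for this sketch, not Generation Runner flags.
WIPE_REASONS = {
    "reference-image-url-changed",
    "model-identifier-changed",   # e.g. NB1 -> NB2 migration
    "fresh-comparison-pass",
    "stale-generated-nbflow",
}

def should_wipe_cache(reason):
    # everything else (no meaningful change, single-node re-roll,
    # "just being safe") keeps the cache
    return reason in WIPE_REASONS
```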

Tracking budget per workflow

For high-volume teams, tracking cost per workflow matters. Conventions:

  • Note the gen count in the version registry history (V0-1: 16 images + 16 videos, V0-2: 8 image regens + 0 video, etc.)
  • Watch the cumulative cost when a workflow is iterating beyond V0-5
  • Set a soft budget cap per workflow ("if we're past V0-8 and still not clean, escalate")
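
The conventions above can be tracked with a minimal ledger. The entry fields and the 300-unit soft cap are hypothetical examples; adapt them to your registry format:

```python
# Minimal per-workflow gen ledger following the registry convention above.
# Field names and the soft cap are hypothetical examples.
ledger = [
    {"version": "V0-1", "images": 16, "videos": 16},
    {"version": "V0-2", "images": 8,  "videos": 0},
]

def cumulative_units(entries, video_mult=10):
    # cost in image-gen units, using this page's ~10x video multiplier
    return sum(e["images"] + video_mult * e["videos"] for e in entries)

def past_soft_cap(entries, cap_units=300):
    return cumulative_units(entries) > cap_units
```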

When to escalate:

  • A workflow that should be cheap (Lvl 1 variant) has cost more than the original V1
  • A testing phase has burned more iterations than usual
  • Prompt-tuning passes are converging slowly

Sometimes the right answer is kill the workflow and start over with a cleaner approach. Sunk cost shouldn't determine forward investment.

When you're ready

Next: Pipeline Limits — concurrency caps, rate limits, and where you'll bump into them.