Runner Crash Recovery

When the Generation Runner crashes mid-pass — process killed, system rebooted, G-Labs died, network dropped — you need to recover without losing the work that already succeeded.

What you have after a crash

When the runner crashes mid-pass, here's the state:

  • The source .nbflow (input): unchanged, intact.
  • Local generated PNGs / MP4s: whatever was produced before the crash, saved to Assets/{workflow}/generated-images/v0-N/ and generated-videos/v0-N/.
  • R2 uploads: each successful upload completed atomically; R2 has whatever finished.
  • The -generated.nbflow (output): either not written at all (crash too early), partial / corrupted, or written successfully (crash after export).
  • Cache state in the source .nbflow: updated for every node that succeeded, untouched for incomplete ones.
  • errors.json: may or may not exist, depending on when the crash happened.

The good news: what succeeded is preserved. The bad news: you have to figure out what to redo.
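
You can take that inventory in one pass from the shell. A minimal sketch using the example paths from this page (the errors.json location is an assumption; check wherever your runner actually writes it):

# One-pass survey of the crash state.
PROJ="projects/may/xyz-wellness"
ls "$PROJ/Assets/XYZG3/generated-images/"v0-*/
ls "$PROJ/Assets/XYZG3/generated-videos/"v0-*/
ls "$PROJ/Generations/"
ls "$PROJ/errors.json" 2>/dev/null || echo "no errors.json"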

Diagnostic flow

flowchart TD
    A[Runner crashed mid-pass]
    A --> B{Does -generated.nbflow exist?}
    B -->|yes, complete| C[Lucky — restart with full output]
    B -->|no or partial| D{Check local Assets folders}
    D --> E[Count generated PNGs/MP4s by scene]
    E --> F{All scenes covered?}
    F -->|yes| G[Re-run to finish exporting]
    F -->|no| H[Re-run with cache enabled]
    H --> I[Cached successful nodes reused;<br/>unfinished nodes regenerate]

Step 1: Check what got produced

Look at the local Assets folders:

ls "projects/may/xyz-wellness/Assets/XYZG3/generated-images/v0-1/"
# scene01.png, scene02.png, scene03.png, scene04.png, scene05.png

ls "projects/may/xyz-wellness/Assets/XYZG3/generated-videos/v0-1/"
# (empty or partial)

This tells you: images were done through Scene 05 before the crash. Video gen hadn't started (or hadn't finished any).
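
To check coverage mechanically instead of by eye, a small sketch, assuming the sceneNN.png naming above and a six-scene workflow:

# Report any scene image missing from the local output folder.
for i in 01 02 03 04 05 06; do
  f="projects/may/xyz-wellness/Assets/XYZG3/generated-images/v0-1/scene$i.png"
  [ -f "$f" ] || echo "missing: scene$i.png"
done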

Step 2: Check -generated.nbflow

ls "projects/may/xyz-wellness/Generations/"
# XYZG3-V0-1-generated-20260512-143021.nbflow

If a -generated.nbflow exists, open it in PatchWork or inspect the JSON directly. Look at the _savedImages and _savedClips fields on each gen node:

  • Populated = that node completed successfully; its R2 URL is baked in
  • Empty / null = that node didn't complete
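
If you'd rather check from the shell than open PatchWork, a jq sketch. It assumes the .nbflow JSON has a top-level nodes array with an id field per node and _savedImages as an array; adjust to your actual schema:

# List gen nodes whose _savedImages is empty or null, i.e. nodes that didn't complete.
jq -r '.nodes[] | select(has("_savedImages") and ((._savedImages // []) | length == 0)) | .id' \
  "projects/may/xyz-wellness/Generations/XYZG3-V0-1-generated-20260512-143021.nbflow"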

Step 3: Re-run with cache enabled

The Generation Runner's cache should handle most recovery automatically:

You: re-run XYZG3-V0-1.nbflow. The previous run crashed mid-way; use
     the cache to avoid redoing what succeeded.

Claude: [invokes Generation Runner with normal cache behavior]
        [the runner reads the source .nbflow, sees which nodes have
         _savedImages populated, skips those, runs the rest]

What happens:

  • Nodes that completed before the crash → cached output is reused, no regen
  • Nodes that didn't complete → fresh generation
  • Total cost: only the cost of the unfinished work

You usually get a clean -generated.nbflow on the second pass with minimal re-cost.
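
To estimate the re-cost before kicking off the second pass, the same schema assumptions give a quick count of cached vs. total gen nodes in the source file:

# Cached vs. total gen nodes in the SOURCE .nbflow (run from wherever it lives).
jq '[.nodes[] | select(has("_savedImages"))]
    | {total: length, cached: (map(select((._savedImages // []) | length > 0)) | length)}' \
  "XYZG3-V0-1.nbflow"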

Step 4: If cache state is corrupted

In rare cases, the crash leaves the source .nbflow in an inconsistent state — some nodes have _savedImages set to URLs that don't exist on R2, or partial data.

Symptoms:

  • The runner tries to "reuse" output but PatchWork shows a broken image
  • R2 URLs in the file return 404 (spot-check sketch after this list)
  • The runner finishes but the output is mixed valid / invalid
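
To spot-check those URLs from the shell, a sketch with the same schema assumptions as above (_savedImages as an array of URLs; run it from wherever the source file lives):

# HEAD-check every cached R2 URL; anything that isn't a 200 gets printed.
jq -r '.nodes[]._savedImages[]?' "XYZG3-V0-1.nbflow" | while read -r url; do
  code=$(curl -s -o /dev/null -w '%{http_code}' -I "$url")
  [ "$code" = "200" ] || echo "$code  $url"
done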

Recovery:

  1. Restore the source .nbflow from the backups/ folder, i.e. the version from before this run (copy sketch below)
  2. The restored file has no cache; everything regenerates from scratch
  3. Cost is back to full, but you're guaranteed clean output

When the runner's cache is suspect, restoring and re-running is faster than diagnosing the corruption.
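
The restore itself is just a copy. Illustrative paths only; use whatever your backups/ folder actually holds for this version:

# Overwrite the suspect source file with its pre-run backup.
cp "backups/XYZG3-V0-1.nbflow" "XYZG3-V0-1.nbflow"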

When G-Labs dies mid-pass

A specific case: G-Labs (the local backend) crashes or you lose the tunnel during a run.

Symptom
The runner errors out with connection refused or a 522 timeout partway through, and may or may not have produced errors.json.
Diagnosis
Check the G-Labs health endpoint (http://localhost:8765/health). If the request fails outright (connection refused or timeout), G-Labs is down.
Recovery
  1. Restart G-Labs (npm start or whatever your installation requires)
  2. Verify health endpoint is OK
  3. Restart cloudflared tunnel (the URL will be different)
  4. Update PatchWork's settings with the new tunnel URL
  5. Re-run the workflow — cache will reuse anything that succeeded before the crash

The whole sequence is 5-10 minutes if everything goes smoothly.
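
As shell commands, the sequence looks roughly like this (npm start is the step-1 assumption above; the cloudflared command is the one from the next section):

curl -s http://localhost:8765/health            # confirm G-Labs is actually down
npm start &                                     # restart G-Labs, from its directory (your install may differ)
sleep 5; curl -s http://localhost:8765/health   # verify it came back
cloudflared tunnel --url http://localhost:8765  # restart the tunnel; copy the new URL it prints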

When the cloudflared tunnel drops

Even more specific: the tunnel URL stops resolving mid-pass.

Symptom
Generations error with 522 or connection refused. The local G-Labs is healthy. The tunnel URL just doesn't work.
Recovery
  1. Restart cloudflared: cloudflared tunnel --url http://localhost:8765
  2. Copy the new tunnel URL
  3. Update wherever the URL is referenced (Generation Runner --server arg or PatchWork settings)
  4. Re-run the workflow

The runner will reuse cached output and only redo the unfinished work.
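
Before re-running, it's worth confirming the new URL reaches G-Labs end to end. Hypothetical tunnel URL; assumes /health is exposed through the tunnel:

# Should return the same healthy response as hitting localhost:8765 directly.
curl -s https://new-subdomain.trycloudflare.com/health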

When the runner produces invalid output

A different kind of failure: the runner completed, but the output is garbage:

  • All candidates have AI tells the auto-QA didn't catch
  • Output URLs don't match the prompts
  • Avatar references didn't lock the face
  • All scenes look like a different account's avatar

This usually means a structural issue with the input .nbflow that the runner's sanity check didn't catch (or you skipped the sanity check).

Recovery:

  1. Restore the .nbflow to the pre-run backup
  2. Run the pre-generation sanity check on the restored file
  3. Fix anything the check surfaces
  4. Re-run

Common findings in this case:

  • Avatar reference URL was wrong (per-node link refs stale, see PatchWork Import Bugs)
  • Wrong model field on the gen nodes (defaulted to NB1 instead of NB2)
  • Dynamic Prompt variableName doesn't match the template placeholder
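
The model-field finding is mechanically checkable with the same jq assumptions as earlier (the node layout and the model field name are assumptions; adjust to your schema):

# Flag gen nodes whose model fell back to NB1.
jq -r '.nodes[] | select(.model == "NB1") | .id' "XYZG3-V0-1.nbflow"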

After recovery

Always do these:

  1. Bump V0-N if the recovery involved any changes (cache wipe, sanity-check fixes, structural patches)
  2. Document what happened in the version registry's history note ("V0-1: runner crashed at Scene 06, recovered with cache reuse, full output now")
  3. Sync the tracker so the team sees the recovered state

Don't pretend the crash didn't happen. The audit trail matters.

When to give up and start over

Sometimes recovery isn't worth it. Signs:

  • You've tried two recovery passes and each produced different errors
  • The source .nbflow itself seems corrupted (won't import into PatchWork cleanly)
  • The cost of recovery (diagnosis time + re-runs) exceeds the cost of building from V0-1 again

Restart from the source brief. Skip the trauma of trying to salvage. You'll be faster.

When you're ready

Next: Cache Wipe / Force Re-roll — explicitly clearing the cache when it's getting in your way.