Runner Crash Recovery¶
When the Generation Runner crashes mid-pass — process killed, system rebooted, G-Labs died, network dropped — you need to recover without losing the work that already succeeded.
What you have after a crash¶
When the runner crashes mid-pass, here's the state:
| Asset | State |
|---|---|
| The source `.nbflow` (input) | Unchanged. Intact. |
| Local generated PNGs / MP4s | Whatever was produced before the crash, saved to `Assets/{workflow}/generated-images/v0-N/` and `generated-videos/v0-N/` |
| R2 uploads | All successful uploads happened atomically; R2 has whatever finished |
| The `-generated.nbflow` (output) | Either: didn't get written (crash too early), partial / corrupted, or written successfully (crash after export) |
| Cache state in the source `.nbflow` | Updated for any successful node, untouched for incomplete ones |
| `errors.json` | May or may not exist, depending on when the crash happened |
The good news: what succeeded is preserved. The bad news: you have to figure out what to redo.
Diagnostic flow¶
```mermaid
flowchart TD
    A[Runner crashed mid-pass]
    A --> B{Does -generated.nbflow exist?}
    B -->|yes, complete| C[Lucky — restart with full output]
    B -->|no or partial| D{Check local Assets folders}
    D --> E[Count generated PNGs/MP4s by scene]
    E --> F{All scenes covered?}
    F -->|yes| G[Re-run to finish exporting]
    F -->|no| H[Re-run with cache enabled]
    H --> I[Cached successful nodes reused;<br/>unfinished nodes regenerate]
```
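The same decision tree can be sketched as a function. This is illustrative only; the argument names are assumptions for the sketch, not runner API:

```python
def recovery_action(generated_nbflow_complete: bool,
                    scenes_with_images: set[str],
                    scenes_expected: set[str]) -> str:
    """Mirror the diagnostic flowchart: pick a recovery path after a crash."""
    if generated_nbflow_complete:
        # Crash happened after export; the output file is usable as-is.
        return "restart with full output"
    if scenes_with_images >= scenes_expected:
        # Every scene rendered; only the export step is missing.
        return "re-run to finish exporting"
    # Some scenes missing: cached nodes get reused, the rest regenerate.
    return "re-run with cache enabled"
```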
Step 1: Check what got produced¶
Look at the local Assets folders:
```bash
ls "projects/may/xyz-wellness/Assets/XYZG3/generated-images/v0-1/"
# scene01.png, scene02.png, scene03.png, scene04.png, scene05.png

ls "projects/may/xyz-wellness/Assets/XYZG3/generated-videos/v0-1/"
# (empty or partial)
```
This tells you: images were done through Scene 05 before the crash. Video gen hadn't started (or hadn't finished any).
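If you'd rather not eyeball the folders, a small script can tally coverage per scene. A sketch, assuming the `Assets/{workflow}/generated-images/v0-N/` layout shown above:

```python
from pathlib import Path

def scene_coverage(assets_dir: str, pass_dir: str = "v0-1") -> dict[str, dict[str, bool]]:
    """For each scene stem, report whether an image and a clip survived the crash.

    `assets_dir` is the workflow's Assets folder, e.g.
    projects/may/xyz-wellness/Assets/XYZG3 (layout from this doc;
    adjust to your installation).
    """
    images = {p.stem for p in Path(assets_dir, "generated-images", pass_dir).glob("*.png")}
    videos = {p.stem for p in Path(assets_dir, "generated-videos", pass_dir).glob("*.mp4")}
    return {scene: {"image": scene in images, "video": scene in videos}
            for scene in sorted(images | videos)}
```

Any scene with `"video": False` (or missing entirely) is work the next pass still has to do.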
Step 2: Check -generated.nbflow¶
If a -generated.nbflow exists, open it in PatchWork or inspect the JSON. Look at _savedImages and _savedClips fields on each gen node:
- Populated = that node completed successfully; its R2 URL is baked in
- Empty / null = that node didn't complete
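A quick way to list the incomplete nodes mechanically. This assumes the `.nbflow` is JSON with a top-level `nodes` list whose entries carry an `id`; those structural details are guesses for the sketch, so adapt them to the real schema (and filter to gen nodes only, if node types are distinguishable):

```python
import json

def incomplete_nodes(nbflow_path: str) -> list[str]:
    """List nodes whose _savedImages / _savedClips are empty or missing."""
    with open(nbflow_path) as f:
        doc = json.load(f)
    bad = []
    for node in doc.get("nodes", []):
        saved = node.get("_savedImages") or node.get("_savedClips")
        if not saved:  # empty list, null, or absent => node didn't complete
            bad.append(node.get("id", "<unnamed>"))
    return bad
```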
Step 3: Re-run with cache enabled¶
The Generation Runner's cache should handle most recovery automatically:
```text
You: re-run XYZG3-V0-1.nbflow. The previous run crashed mid-way; use
     the cache to avoid redoing what succeeded.

Claude: [invokes Generation Runner with normal cache behavior]
        [the runner reads the source .nbflow, sees which nodes have
         _savedImages populated, skips those, runs the rest]
```
What happens:
- Nodes that completed before the crash → cached output is reused, no regen
- Nodes that didn't complete → fresh generation
- Total cost: only the cost of the unfinished work
You usually get a clean -generated.nbflow on the second pass with minimal re-cost.
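The "only pay for unfinished work" point can be made concrete with a rough cost sketch. The node shape and the `type` field are assumptions, and the per-unit prices are arguments rather than real numbers; plug in whatever your provider actually charges:

```python
def rerun_cost(nodes: list[dict], image_cost: float, clip_cost: float) -> float:
    """Estimate second-pass cost: only nodes without cached output pay."""
    total = 0.0
    for n in nodes:
        if not n.get("_savedImages") and not n.get("_savedClips"):
            # Assumed: a "type" field distinguishes video from image gen nodes.
            total += clip_cost if n.get("type") == "video" else image_cost
    return total
```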
Step 4: If cache state is corrupted¶
In rare cases, the crash leaves the source .nbflow in an inconsistent state — some nodes have _savedImages set to URLs that don't exist on R2, or partial data.
Symptoms:
- The runner tries to "reuse" output but PatchWork shows a broken image
- R2 URLs in the file return 404
- The runner finishes but the output is mixed valid / invalid
Recovery:
- Restore the source `.nbflow` from the `backups/` folder (the version before this run)
- The restored file has no cache; everything regenerates from scratch
- Cost is back to full, but you're guaranteed clean output
When the runner's cache is suspect, restore and re-run is faster than diagnosing the corruption.
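Before restoring, you can confirm the corruption by checking whether the cached URLs still resolve. A testable sketch with the URL check injected as a callable; in practice, wire it to an HTTP HEAD request against R2 (the node shape is the same `_savedImages` assumption as above):

```python
def stale_cache_nodes(nodes: list[dict], url_exists) -> list[str]:
    """Find nodes whose cached _savedImages URLs no longer resolve.

    `url_exists` is a callable url -> bool, injected so the logic is
    testable without network access.
    """
    stale = []
    for node in nodes:
        urls = node.get("_savedImages") or []
        if urls and not all(url_exists(u) for u in urls):
            stale.append(node.get("id", "<unnamed>"))
    return stale
```

If this returns anything, the cache is lying to you: restore from `backups/` rather than trying to patch individual nodes.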
When G-Labs dies mid-pass¶
A specific case: G-Labs (the local backend) crashes or you lose the tunnel during a run.
**Symptom:** the runner errors with `connection refused` or `522 timeout` partway through. It may or may not have produced `errors.json`.

**Diagnosis:** check the G-Labs health endpoint (`http://localhost:8765/health`). If it's 404, G-Labs is down.

**Recovery:**

1. Restart G-Labs (`npm start` or whatever your installation requires)
2. Verify the health endpoint is OK
3. Restart the cloudflared tunnel (the URL will be different)
4. Update PatchWork's settings with the new tunnel URL
5. Re-run the workflow; the cache will reuse anything that succeeded before the crash
The whole sequence is 5-10 minutes if everything goes smoothly.
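The "verify the health endpoint" step can be automated with a small poll loop. The endpoint path comes from this page; the attempt count and delay are arbitrary choices:

```python
import time
import urllib.request

def wait_for_glabs(url: str = "http://localhost:8765/health",
                   attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll the G-Labs health endpoint until it answers 200, or give up."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # not up yet: connection refused, timeout, HTTP error, ...
        time.sleep(delay)
    return False
```

Run it right after `npm start`; once it returns `True`, move on to restarting the tunnel.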
When the cloudflared tunnel drops¶
Even more specific: tunnel URL stops resolving mid-pass.
**Symptom:** generations error with `522` or `connection refused`. The local G-Labs is healthy; the tunnel URL just doesn't work.

**Recovery:**

1. Restart cloudflared: `cloudflared tunnel --url http://localhost:8765`
2. Copy the new tunnel URL
3. Update wherever the URL is referenced (the Generation Runner `--server` arg or PatchWork settings)
4. Re-run the workflow
The runner will reuse cached output and only redo the unfinished work.
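If you find yourself swapping tunnel URLs often, a tiny helper keeps the update in one place. The settings file path and the `serverUrl` key are hypothetical; point this at wherever your installation actually stores the tunnel URL:

```python
import json
from pathlib import Path

def update_tunnel_url(settings_path: str, new_url: str, key: str = "serverUrl") -> None:
    """Swap the tunnel URL in a JSON settings file (key name is assumed)."""
    path = Path(settings_path)
    settings = json.loads(path.read_text())
    settings[key] = new_url.rstrip("/")  # trailing slash trips some clients
    path.write_text(json.dumps(settings, indent=2))
```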
When the runner produces invalid output¶
A different kind of crash — runner completed but output is garbage:
- All candidates have AI tells the auto-QA didn't catch
- Output URLs don't match the prompts
- Avatar references didn't lock the face
- All scenes look like a different account's avatar
This usually means a structural issue with the input .nbflow that the runner's sanity check didn't catch (or you skipped the sanity check).
Recovery:
1. Restore the `.nbflow` to the pre-run backup
2. Run the pre-generation sanity check on the restored file
3. Fix anything the check surfaces
4. Re-run
Common findings in this case:
- Avatar reference URL was wrong (per-node link refs stale, see PatchWork Import Bugs)
- Wrong model field on the gen nodes (defaulted to NB1 instead of NB2)
- Dynamic Prompt `variableName` doesn't match the template placeholder
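The `variableName` mismatch in particular is easy to catch mechanically. A sketch, assuming `{curly-brace}` placeholders in the template; adjust the regex if Dynamic Prompt uses a different syntax:

```python
import re

def unmatched_variables(template: str, variable_names: list[str]) -> set[str]:
    """Return placeholders with no matching variableName, plus
    variableNames that never appear in the template."""
    placeholders = set(re.findall(r"\{(\w+)\}", template))
    names = set(variable_names)
    return (placeholders - names) | (names - placeholders)
```

An empty return set means every placeholder is wired up; anything else is a finding to fix before re-running.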
After recovery¶
Always do these:
- Bump V0-N if the recovery involved any changes (cache wipe, sanity-check fixes, structural patches)
- Document what happened in the version registry's history note ("V0-1: runner crashed at Scene 06, recovered with cache reuse, full output now")
- Sync the tracker so the team sees the recovered state
Don't pretend the crash didn't happen. The audit trail matters.
When to give up and start over¶
Sometimes recovery isn't worth it. Signs:
- You've tried 2 recovery passes and both produce different errors
- The source `.nbflow` itself seems corrupted (won't import into PatchWork cleanly)
- The cost of recovery (diagnosis time + re-runs) exceeds the cost of building from V0-1 again
Restart from the source brief. Skip the trauma of trying to salvage. You'll be faster.
When you're ready¶
→ Next: Cache Wipe / Force Re-roll — explicitly clearing the cache when it's getting in your way.