Why You Need a Runbook
It is 2 AM. Your phone buzzes. The intake workflow stopped processing three hours ago. Forty client inquiries are stuck.
You open your laptop. Now what?
Without a runbook, you are reverse-engineering your own system under pressure. With a runbook, you have a clear path: diagnose, fix, verify, document.
A runbook is not documentation for its own sake. It is the difference between a 10-minute fix and a 2-hour panic.
The One-Page Runbook Template
This template covers the minimum needed for any production workflow. Print it. Keep it accessible. Update it when things change.
WORKFLOW: [Name]
Purpose: [One sentence describing what this workflow does]
Owner: [Name] | Backup: [Name]
Last Updated: [Date]
1. HEALTH CHECK
How do I know it is working?
| Indicator | Where to Check | Healthy State |
|---|---|---|
| Executions | n8n Dashboard > Executions | Running, no errors |
| Queue depth | [Location] | Under [X] items |
| Last success | n8n Dashboard | Within last [X] hours |
| Error rate | [Monitoring tool] | Under [X]% |
Quick health check command:
[Command or URL to check status]
2. COMMON ISSUES
Issue: Workflow not executing
- Check if workflow is active (toggle in n8n)
- Check trigger (webhook accessible? schedule running?)
- Check n8n service status
- Check server resources (disk, memory)
Issue: Executions failing
- Open failed execution in n8n
- Check which node failed
- Check error message
- Common causes:
  - API rate limit → Wait and retry
  - Auth expired → Refresh credentials
  - Data format changed → Check upstream system
Issue: Slow processing
- Check queue depth
- Check server CPU/memory
- Check external API response times
- Consider: Is volume unusually high?
3. FIX PROCEDURES
Restart workflow:
- Deactivate workflow (toggle off)
- Wait 10 seconds
- Activate workflow (toggle on)
- Verify with test execution
Retry failed executions:
- Go to Executions > Failed
- Select execution(s)
- Click Retry
- Monitor for success
Restart n8n service:
[Command to restart n8n - depends on your setup; the examples below assume the container, process, or service is named n8n]
# Docker: docker restart n8n
# PM2: pm2 restart n8n
# Systemd: sudo systemctl restart n8n
Rollback to previous version:
- [Location of version history or backups]
- [Steps to restore previous version]
- Test with known-good data
- Monitor for 30 minutes
4. ESCALATION
When to escalate:
- Cannot diagnose within 30 minutes
- Fix requires access you do not have
- Impact affects multiple clients
- Root cause unclear after fix
Escalation contacts:
| Situation | Contact | Method |
|---|---|---|
| Technical issue | [Name] | [Phone/Slack] |
| Business decision needed | [Name] | [Phone/Email] |
| Vendor/API issue | [Vendor support] | [Portal/Email] |
| Security concern | [Name] | [Phone] |
5. POST-INCIDENT
After every incident:
- Issue resolved and verified
- Root cause identified
- Runbook updated if needed
- Incident logged in [tracking system]
- Stakeholders notified if impacted
Incident log location: [URL or path]
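The health check table in section 1 can also be evaluated by a script rather than by eye. The following is a minimal sketch, not part of n8n: `Thresholds` and `failing_indicators` are hypothetical names, and it assumes you already collect queue depth, last success time, and error rate from wherever your setup exposes them. The threshold values stand in for the [X] placeholders in the table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Thresholds:
    """The [X] placeholders from the health check table (values are yours to choose)."""
    max_queue_depth: int
    max_hours_since_success: float
    max_error_rate_pct: float


def failing_indicators(queue_depth, last_success, error_rate_pct, limits, now=None):
    """Return the names of indicators that are outside their healthy state."""
    now = now or datetime.now(timezone.utc)
    problems = []
    # "Under [X] items": reaching the limit already counts as unhealthy.
    if queue_depth >= limits.max_queue_depth:
        problems.append("queue depth")
    # "Within last [X] hours": unhealthy once the last success is older than that.
    if now - last_success > timedelta(hours=limits.max_hours_since_success):
        problems.append("last success")
    # "Under [X]%": reaching the limit already counts as unhealthy.
    if error_rate_pct >= limits.max_error_rate_pct:
        problems.append("error rate")
    return problems


# Illustrative values only: 40 stuck items, last success three hours ago.
limits = Thresholds(max_queue_depth=20, max_hours_since_success=1, max_error_rate_pct=2.0)
checked_at = datetime(2024, 1, 15, 2, 0, tzinfo=timezone.utc)
print(failing_indicators(40, checked_at - timedelta(hours=3), 0.5, limits, now=checked_at))
# → ['queue depth', 'last success']
```

An empty list means every indicator is in its healthy state. Anything else tells the person on call which row of the table to look at first.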
How to Use This Template
Step 1: Fill It Out Now
Do not wait for an incident. Fill out the template for each production workflow today. The 30 minutes you spend now can save hours during an emergency.
Step 2: Keep It Accessible
- Print a copy for the office
- Store in shared drive
- Link from your monitoring dashboard
- Include in on-call documentation
Step 3: Test It
Run a fire drill:
- Pretend the workflow is down
- Follow the runbook
- Note where instructions are unclear
- Update accordingly
Step 4: Maintain It
Review quarterly:
- Are contacts still correct?
- Have procedures changed?
- Are there new common issues?
Update after every incident:
- Did the runbook help?
- What was missing?
- What can be added to prevent a recurrence?
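The quarterly review is easy to forget, so it can help to check the "Last Updated" field mechanically. This is a minimal sketch under two assumptions: the date is stored in ISO format (YYYY-MM-DD), and "quarterly" is read as 90 days. The function name `is_review_overdue` is hypothetical.

```python
from datetime import date, timedelta

# Assumption: "quarterly" is interpreted as 90 days.
REVIEW_INTERVAL = timedelta(days=90)


def is_review_overdue(last_updated: str, today: date) -> bool:
    """True when the 'Last Updated' date is more than one review interval old."""
    return today - date.fromisoformat(last_updated) > REVIEW_INTERVAL


# A runbook last updated on 2024-01-15, checked on two different days.
print(is_review_overdue("2024-01-15", date(2024, 3, 1)))  # → False
print(is_review_overdue("2024-01-15", date(2024, 6, 1)))  # → True
```

A check like this could run from a scheduled job and post a reminder to the owner listed at the top of the runbook.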
Common Runbook Mistakes
Mistake 1: Too Much Detail
A runbook is not full documentation. It is emergency instructions. Keep it to one page. Link to detailed docs if needed.
Mistake 2: Outdated Information
Wrong contact info or outdated procedures are worse than having no runbook at all. Review regularly.
Mistake 3: Assumes Knowledge
Write for the person covering at 2 AM who has never seen this workflow. Spell out acronyms. Include exact commands.
Mistake 4: Not Tested
A runbook that has never been exercised is likely to fail when you need it. Test it with a fire drill.
Mistake 5: Not Accessible
A perfect runbook in a folder nobody can find is useless. Provide multiple access points: printed, digital, and linked from monitoring.
Example: Client Intake Workflow Runbook
WORKFLOW: Client-Intake-Routing-v2
Purpose: Routes new client inquiries from web form to appropriate attorney based on practice area and urgency.
Owner: Sarah K. | Backup: Mike T.
Last Updated: 2024-01-15
1. HEALTH CHECK
| Indicator | Where to Check | Healthy State |
|---|---|---|
| Executions | n8n.internal/executions | Running, <2% errors |
| Queue depth | Slack #intake-alerts | Under 20 items |
| Last success | n8n Dashboard | Within last 1 hour |
| Error rate | Datadog dashboard | Under 2% |
Quick check: https://n8n.internal/workflow/5/executions
2. COMMON ISSUES
Webhook not receiving data:
- Check: Is the form sending to the correct URL?
- Check: Is the firewall allowing inbound traffic?
- Test: Submit a test form and check whether an execution appears
CRM connection failing:
- Check: API credentials in the n8n credentials store
- Check: CRM system status page
- Fix: Refresh the OAuth token if expired
Routing incorrect:
- Check: Practice area mapping in the Set node
- Check: Has the form added new practice areas?
- Fix: Update the mapping and test with sample data
3. ESCALATION
| Situation | Contact | Method |
|---|---|---|
| n8n down | Mike T. | 555-0123 |
| CRM issue | Vendor Support | support.crm.com |
| Business rule question | Sarah K. | Slack @sarah |
A single page like this covers most middle-of-the-night issues.
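The "Routing incorrect" check in the example (has the form added new practice areas?) comes down to comparing two lists. Here is a minimal sketch of that comparison. The function name, the practice areas, and the attorney identifiers are all hypothetical, and it assumes you can export the form's options and the routing mapping by hand or by script.

```python
def unmapped_practice_areas(form_options, routing_map):
    """Return form practice areas that have no routing entry, sorted by name."""
    return sorted(set(form_options) - set(routing_map))


# Hypothetical data: the form gained "Employment" but the mapping was not updated.
routing_map = {"Family Law": "attorney-a", "Immigration": "attorney-b"}
form_options = ["Family Law", "Immigration", "Employment"]
print(unmapped_practice_areas(form_options, routing_map))  # → ['Employment']
```

The comparison is exact, so differences in spelling or capitalization between the form and the mapping will also show up as unmapped entries.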
Next Steps
- Download/copy this template
- Fill it out for your most critical workflow
- Share with your team
- Schedule a fire drill for next week
The best time to write a runbook was before your first incident. The second best time is now.
Guide: n8n Operations for Law Firms