Production Checklist¶
This checklist is a practical, operational guide for running Papyra safely and predictably in production.
It is intentionally opinionated. Every item exists because it prevents a real failure mode.
Use this document:
- before first production deployment
- during incident reviews
- when onboarding new operators
- when changing persistence backends or retention policies
1. Persistence Backend Readiness¶
✅ Choose the correct backend¶
| Backend | Use when | Avoid when |
|---|---|---|
| Memory | Tests, local development | Any restart matters |
| JSON | Small systems, simple recovery | High write throughput |
| Rotation | Long-running services | Strict ordering is required |
| Redis | Distributed systems | Redis is not operationally mature |
✅ Verify backend health before startup¶
Always run one of:
papyra persistence scan
or
papyra persistence startup-check --mode fail_on_anomaly
This prevents booting into a corrupted state.
❌ Never auto-ignore anomalies in production¶
Forbidden configuration:
startup mode = IGNORE
If corruption exists, you want to fail fast.
2. Retention Configuration¶
✅ Explicit retention is mandatory¶
Production systems must define retention, even if generous.
At minimum, define one of:
max_recordsmax_age_secondsmax_total_bytes
Leaving retention unbounded guarantees disk exhaustion.
⚠️ Retention ≠ deletion¶
Retention:
- marks data as logically expired
Compaction:
- removes it physically
You must run both.
3. Compaction Strategy¶
✅ Schedule compaction¶
Examples:
papyra persistence compact
Recommended cadence:
- JSON / Rotation: daily
- Redis: weekly (or via XTRIM)
- High-throughput systems: off-peak hours
⚠️ Validate compaction impact¶
After compaction:
- disk usage should decrease
- metrics should reflect reclaimed data
- no anomalies should appear on scan
Always verify with:
papyra persistence scan
4. Startup Safety¶
✅ Use startup-checks in orchestration¶
Kubernetes / systemd should block startup unless:
- persistence scan is clean
- or recovery succeeded fully
Example:
papyra persistence startup-check --mode recover --recovery-mode repair
❌ Do not combine recovery with live traffic¶
Recovery must run:
- before actors start
- without concurrent writers
Never attempt recovery during runtime.
5. Metrics & Observability¶
✅ Enable metrics early¶
Metrics are not optional in production.
Monitor at least:
- write counts
- retention drops
- compaction runs
- error counters
✅ Integrate external monitoring¶
Recommended:
- OpenTelemetry
- Prometheus-compatible exporters
- Centralized log aggregation
Metrics should answer:
- Is data being dropped?
- Is compaction effective?
- Is recovery happening unexpectedly?
6. Redis-Specific Checks (If Applicable)¶
✅ Consumer group hygiene¶
Verify:
- pending count trends toward zero
- no abandoned consumer groups
- claim logic is exercised under failure
Use:
papyra inspect events
⚠️ Redis memory pressure¶
Ensure Redis:
- has eviction policy defined
- is not shared with unrelated workloads
- has persistence configured (AOF / RDB)
7. Failure Handling Readiness¶
✅ Validate failure scenarios¶
At least once, test:
- truncated persistence files
- Redis restarts
- unacked consumer messages
- compaction during retention pressure
Use:
papyra doctor run --mode fail_on_anomaly
❌ Never silence failures¶
If doctor reports anomalies:
- stop the system
- investigate
- recover explicitly
Silent corruption is worse than downtime.
8. Operational Defaults (Recommended)¶
| Setting | Recommendation |
|---|---|
| Startup mode | FAIL_ON_ANOMALY |
| Recovery mode | REPAIR |
| Retention | Explicit |
| Compaction | Scheduled |
| Metrics | Enabled |
| Redis | Isolated instance |
9. Pre-Release Gate¶
Before releasing a new version:
- [ ] Run persistence scan
- [ ] Run compaction
- [ ] Verify metrics snapshot
- [ ] Confirm retention thresholds
- [ ] Simulate failure recovery
- [ ] Validate startup-check behavior
10. Final Rule¶
If you cannot explain what happens when persistence breaks, you are not production-ready.
Papyra gives you the tools. This checklist ensures you actually use them.