Startup Checks
Startup checks are a critical safety mechanism in Papyra.
They ensure that persisted state is structurally valid, consistent, and recoverable before an ActorSystem begins processing messages.
Unlike traditional systems that discover corruption only after runtime failures, Papyra treats persistence validation as a first‑class startup concern.
Why Startup Checks Matter
Actor systems rely on persisted data for:
- Recovery after crashes
- Replay of events
- Audit history
- Dead letter inspection
- Diagnostics and observability
If corrupted persistence is loaded blindly:
- Actors may crash immediately
- Subtle state corruption can propagate
- Failures may appear non‑deterministic
- Recovery becomes harder or impossible
Startup checks prevent this by failing early and explicitly.
What Is a Startup Check?
A startup check is a controlled sequence that:
- Scans the persistence backend for anomalies
- Decides what to do if anomalies are found
- Optionally recovers corrupted data
- Verifies the result before startup continues
This logic is used in three places:
- ActorSystem startup
- The persistence startup-check CLI command
- The doctor run CLI command
All three use the same underlying rules.
PersistenceStartupMode
Startup behavior is controlled by PersistenceStartupMode.
Available Modes
| Mode | Behavior |
|---|---|
| ignore | Logs anomalies and continues startup |
| fail_on_anomaly | Fails immediately if anomalies are detected |
| recover | Attempts recovery before continuing |
Default Behavior
The default mode is fail_on_anomaly.
This default is intentionally strict. Production systems should never silently continue with corrupted data.
What Counts as an Anomaly?
An anomaly is any structural or semantic issue detected by backend.scan().
Examples include:
- Truncated JSON lines
- Invalid JSON payloads
- Missing required fields
- Corrupted stream entries (Redis)
- Invalid record formats
- Unknown record kinds
Each backend defines its own scan logic, but all report anomalies uniformly.
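To make the anomaly categories concrete, here is a minimal sketch of a scan over newline-delimited JSON. It is not Papyra's backend.scan() implementation: the Anomaly type, the required field names, the set of known kinds, and the reason labels are all assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical schema for illustration; Papyra's real record format may differ.
REQUIRED_FIELDS = ("kind", "timestamp")
KNOWN_KINDS = ("event", "audit", "dead_letter")


@dataclass(frozen=True)
class Anomaly:
    line_number: int  # 1-based position of the offending line
    reason: str       # short label describing what is wrong


def scan_ndjson(text: str) -> List[Anomaly]:
    """Return one Anomaly per problematic line; an empty list means clean."""
    anomalies = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped, not reported
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Covers truncated lines as well as otherwise invalid JSON.
            anomalies.append(Anomaly(number, "invalid_json"))
            continue
        if not isinstance(record, dict):
            anomalies.append(Anomaly(number, "invalid_record_format"))
        elif not all(field in record for field in REQUIRED_FIELDS):
            anomalies.append(Anomaly(number, "missing_required_fields"))
        elif record["kind"] not in KNOWN_KINDS:
            anomalies.append(Anomaly(number, "unknown_record_kind"))
    return anomalies
```

Whatever the backend, the useful property is the uniform result: a list of anomalies that the startup logic can act on without knowing how the data is stored.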
Startup Flow (Conceptual)
┌───────────────┐
│ Startup Begin │
└───────┬───────┘
        │
        ▼
┌──────────────────┐
│ Persistence Scan │
└───────┬──────────┘
        │
        ├── No anomalies ───────────▶ START SYSTEM
        │
        ▼
┌──────────────────────┐
│ Anomalies Detected   │
└───────┬──────────────┘
        │
        ├─ IGNORE ───────────────▶ START SYSTEM (logged)
        │
        ├─ FAIL_ON_ANOMALY ──────▶ ABORT STARTUP
        │
        ▼
┌──────────────────┐
│ RECOVER          │
└───────┬──────────┘
        │
        ▼
┌──────────────────────┐
│ Post‑Recovery Scan   │
└───────┬──────────────┘
        │
        ├─ Clean ───────────────▶ START SYSTEM
        │
        └─ Still Broken ───────▶ ABORT STARTUP
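The same flow can be sketched as a single function. This is a simplified model rather than Papyra's code: StartupMode stands in for PersistenceStartupMode, StartupAborted is a hypothetical exception name, and scan and recover are plain callables supplied by the caller.

```python
from enum import Enum
from typing import Callable, List


class StartupMode(Enum):
    # Stand-in for PersistenceStartupMode; values mirror the documented mode names.
    IGNORE = "ignore"
    FAIL_ON_ANOMALY = "fail_on_anomaly"
    RECOVER = "recover"


class StartupAborted(RuntimeError):
    """Hypothetical error type; the real exception class is not named here."""


def run_startup_check(
    scan: Callable[[], List[str]],
    recover: Callable[[], None],
    mode: StartupMode = StartupMode.FAIL_ON_ANOMALY,
) -> List[str]:
    """Return the anomalies that were tolerated, or raise StartupAborted."""
    anomalies = scan()
    if not anomalies:
        return []  # clean: start the system
    if mode is StartupMode.IGNORE:
        return anomalies  # start anyway; the caller is expected to log these
    if mode is StartupMode.FAIL_ON_ANOMALY:
        raise StartupAborted(f"{len(anomalies)} anomalies detected")
    recover()  # RECOVER branch: repair or quarantine
    remaining = scan()  # post-recovery scan always runs
    if remaining:
        raise StartupAborted(f"{len(remaining)} anomalies remain after recovery")
    return []
```

Every path either returns, meaning the system may start, or raises, meaning startup is aborted, which keeps each branch easy to test in isolation.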
Recovery During Startup
When using recover mode, Papyra applies a PersistenceRecoveryConfig.
Recovery Modes
| Mode | Description |
|---|---|
| repair | Fix issues in place where possible |
| quarantine | Move corrupted records aside before repairing |
Important Guarantees
- Recovery never runs unless it is explicitly enabled
- Recovery is always followed by a second scan
- Startup only continues if the system is verified clean
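The difference between the two recovery modes can be illustrated on a newline-delimited JSON file. The sketch below is an assumption-heavy simplification, not Papyra's recovery code: "repair" is modeled as dropping unreadable lines, the validity rule is reduced to "parses as a JSON object", and the .quarantine file name is hypothetical.

```python
import json
from pathlib import Path


def is_valid(line: str) -> bool:
    """Simplified rule: a line is valid if it parses as a JSON object."""
    try:
        return isinstance(json.loads(line), dict)
    except json.JSONDecodeError:
        return False


def recover_file(path: Path, mode: str = "repair") -> int:
    """Remove invalid lines from an NDJSON file; return how many were removed.

    In "quarantine" mode the removed lines are first appended to a sibling
    file (hypothetical naming: <name>.quarantine) so they can be inspected.
    """
    if mode not in ("repair", "quarantine"):
        raise ValueError(f"unknown recovery mode: {mode}")
    text = path.read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]
    good = [line for line in lines if is_valid(line)]
    bad = [line for line in lines if not is_valid(line)]
    if bad and mode == "quarantine":
        side = path.with_name(path.name + ".quarantine")
        with side.open("a", encoding="utf-8") as handle:
            handle.writelines(line + "\n" for line in bad)
    if bad:
        # Rewrite the file with only the readable records.
        path.write_text("".join(line + "\n" for line in good), encoding="utf-8")
    return len(bad)
```

Calling recover_file a second time and getting 0 back plays the role of the post-recovery scan: the file is only treated as usable once a fresh pass finds nothing left to fix.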
Using Startup Checks in the CLI
Simulate Startup Without Running Actors
papyra persistence startup-check
Fail If Any Corruption Exists
papyra persistence startup-check --mode fail_on_anomaly
Attempt Recovery Before Startup
papyra persistence startup-check --mode recover --recovery-mode repair
Quarantine Corrupted Data
papyra persistence startup-check \
  --mode recover \
  --recovery-mode quarantine
Using Startup Checks Programmatically
Startup checks are automatically executed when an ActorSystem starts.
Example (conceptual):
system = ActorSystem(
    persistence=JsonFilePersistence("events.ndjson"),
    startup_mode=PersistenceStartupMode.FAIL_ON_ANOMALY,
)
If anomalies are found, the system will not start.
Relationship to the Doctor Command
| Feature | Startup Check | Doctor |
|---|---|---|
| Used during system startup | ✅ | ❌ |
| Manual diagnostics | ❌ | ✅ |
| Recovery support | ✅ | ✅ |
| Strict failure modes | ✅ | ✅ |
| Intended for automation | ✅ | ⚠️ |
Use startup checks for safety. Use doctor for investigation and maintenance.
Production Recommendations
- Use fail_on_anomaly in production
- Run persistence startup-check in CI/CD pipelines
- Use recover only with backups or supervision
- Never auto‑recover silently in critical systems
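As one way to wire the CI/CD recommendation into a pipeline, the sketch below wraps the CLI in a small Python gate. It uses only the flags shown on this page; the helper names are hypothetical, and it assumes the command exits with a non-zero status when the check fails, which should be confirmed against the CLI's actual behavior.

```python
import subprocess
from typing import List, Optional


def startup_check_argv(
    mode: Optional[str] = None,
    recovery_mode: Optional[str] = None,
) -> List[str]:
    """Build the argument list for the startup-check command."""
    argv = ["papyra", "persistence", "startup-check"]
    if mode is not None:
        argv += ["--mode", mode]
    if recovery_mode is not None:
        argv += ["--recovery-mode", recovery_mode]
    return argv


def gate(argv: List[str]) -> bool:
    """Run the command and report whether it succeeded.

    Assumption: a failed check is signalled by a non-zero exit status.
    """
    return subprocess.run(argv, check=False).returncode == 0
```

A pipeline step could then call gate(startup_check_argv("fail_on_anomaly")) and stop the deployment when it returns False.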
Key Takeaways
- Startup checks are non‑optional safety mechanisms
- Corruption is detected before actors run
- Recovery is explicit, controlled, and verified
- Startup behavior is predictable and testable
Papyra fails early — so your system doesn't fail later.