JSON Persistence Backend
The JSON persistence backend is the reference file-based durability layer in Papyra.
It provides a transparent, append-only persistence mechanism that is easy to inspect, debug, recover, and reason about. The backend is deliberately simple, yet it supports operational features such as retention policies, anomaly detection, recovery, compaction, metrics, and CLI tooling.
This backend is suitable for:
- Local development
- Single-node deployments
- Debugging and auditing
- Small to medium production workloads where simplicity and transparency matter
It is not intended to replace databases or distributed log systems.
Design Philosophy
The JSON backend follows four strict principles:
- Append-only durability: All records are written sequentially. Existing data is never modified in place.
- Crash safety: Partial writes, truncated lines, and malformed JSON are expected failure modes and are explicitly handled by scan and recovery logic.
- Human inspectability: Data is stored as newline-delimited JSON (NDJSON) and can be inspected using standard tools such as cat, less, jq, or text editors.
- Operational correctness over performance: Predictable behavior and recoverability are prioritized over raw throughput.
File Format (NDJSON)
Each line in the persistence file represents one immutable record.
Example:
{"kind": "event", "timestamp": 1710000000.123, "actor": "user/123", "type": "UserCreated", "payload": {...}}
{"kind": "audit", "timestamp": 1710000001.456, "action": "spawn", "actor": "worker/7"}
{"kind": "dead_letter", "timestamp": 1710000002.789, "reason": "no_handler"}
Key properties:
- Each record occupies exactly one line
- Records are written using fsync semantics (backend-dependent)
- A corrupted or partial line does not affect previous valid records
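The properties above can be illustrated with a minimal sketch of an append path. This is not Papyra's implementation: `append_record` is a hypothetical helper, and the explicit `flush` plus `os.fsync` pair is only one assumed way to obtain fsync semantics.

```python
import json
import os


def append_record(path: str, record: dict) -> None:
    """Append one record as a single NDJSON line and force it to disk.

    Hypothetical helper for illustration; not part of the Papyra API.
    """
    # json.dumps escapes newlines inside strings, so the record stays on one line.
    line = json.dumps(record, separators=(",", ":")) + "\n"
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line)
        fh.flush()             # push Python's buffer to the operating system
        os.fsync(fh.fileno())  # ask the OS to flush the file to stable storage
```

Because the file is only ever opened in append mode, a crash during the write can damage at most the final line; earlier lines are never rewritten.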
Supported Record Types
The backend persists multiple logical categories:
| Kind | Purpose |
|---|---|
| event | Actor-level events |
| audit | System-level lifecycle actions |
| dead_letter | Undeliverable messages |
| metric | Optional metrics snapshots |
The kind field is mandatory and drives classification during scans and inspection.
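As a rough illustration of classification by kind, the sketch below tallies already-valid NDJSON lines per category. The `count_by_kind` function and the `KNOWN_KINDS` set are assumptions made for this example (the set simply mirrors the table above), not names from Papyra.

```python
import json
from collections import Counter

# Assumed to mirror the kinds listed in the table above.
KNOWN_KINDS = {"event", "audit", "dead_letter", "metric"}


def count_by_kind(lines):
    """Count parsed records per kind; anything unrecognized lands under 'unknown'."""
    counts = Counter()
    for line in lines:
        record = json.loads(line)
        kind = record.get("kind") if isinstance(record, dict) else None
        if isinstance(kind, str) and kind in KNOWN_KINDS:
            counts[kind] += 1
        else:
            counts["unknown"] += 1
    return counts
```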
Retention Policies
The JSON backend supports retention via a pluggable retention policy.
Retention is logical, not physical:
- Old records are ignored at read time
- Files are compacted separately to reclaim disk space
Supported retention constraints include:
- Maximum record count
- Maximum age (seconds)
- Maximum total file size
Example configuration:
from papyra.persistence.backends.retention import RetentionPolicy

policy = RetentionPolicy(
    max_records=1_000_000,           # keep at most one million records
    max_age_seconds=7 * 24 * 3600,   # keep records for seven days
)
Retention is enforced during:
- Reads
- Startup checks
- Compaction
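To make "logical, not physical" concrete, here is a minimal sketch of a read-time filter. It assumes records are dicts with a numeric `timestamp` field, as in the file format example; `apply_retention` is a hypothetical function whose parameter names are borrowed from the configuration above, and it does not reflect how `RetentionPolicy` is implemented.

```python
def apply_retention(records, now, max_records=None, max_age_seconds=None):
    """Return the records still visible under the limits; the input is untouched.

    Hypothetical helper: nothing is deleted from disk, old records are
    simply skipped when reading.
    """
    kept = list(records)
    if max_age_seconds is not None:
        cutoff = now - max_age_seconds
        kept = [r for r in kept if r["timestamp"] >= cutoff]
    if max_records is not None:
        # kept[-0:] would return everything, so handle zero explicitly.
        kept = kept[-max_records:] if max_records > 0 else []
    return kept
```

The age limit is applied before the count limit here, so the count limit keeps the newest of the records that are still young enough. That ordering is a choice made for this sketch.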
Scanning and Anomaly Detection
The backend supports structural scanning via scan().
A scan detects:
- Truncated JSON lines
- Malformed JSON
- Unknown record kinds
- Structural inconsistencies
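The checks listed above could look roughly like the following sketch. It is not the backend's `scan()`: `scan_lines`, the anomaly labels, and the rule that an unparseable final line without a trailing newline counts as "truncated" are all assumptions for illustration.

```python
import json

# Assumed set of valid kinds, taken from the record types table.
KNOWN_KINDS = {"event", "audit", "dead_letter", "metric"}


def scan_lines(raw: bytes):
    """Return a list of (line_number, anomaly) tuples; the data is never modified."""
    anomalies = []
    lines = raw.split(b"\n")
    # A healthy file ends with a newline, so the final split element is empty.
    truncated_tail = lines[-1] != b""
    if not truncated_tail:
        lines.pop()
    for number, line in enumerate(lines, start=1):
        is_last = number == len(lines)
        try:
            record = json.loads(line)
        except ValueError:  # covers JSONDecodeError and invalid UTF-8
            label = "truncated" if (is_last and truncated_tail) else "malformed"
            anomalies.append((number, label))
            continue
        kind = record.get("kind") if isinstance(record, dict) else None
        if kind is None:
            anomalies.append((number, "missing_kind"))
        elif not isinstance(kind, str) or kind not in KNOWN_KINDS:
            anomalies.append((number, "unknown_kind"))
    return anomalies
```

An empty result corresponds to a healthy file; any entry identifies the offending line so an operator can decide whether to recover.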
Example CLI usage:
papyra persistence scan --path events.ndjson
Possible outcomes:
- Healthy → exit code 0
- Anomalies detected → exit code 2 with details
A scan only reports anomalies; it never modifies data. Recovery must be triggered explicitly.
Recovery Modes
When anomalies are detected, recovery can be explicitly triggered.
Repair Mode (default)
- Drops corrupted or partial lines
- Preserves all valid preceding records
- Rewrites the file safely
papyra persistence recover --path events.ndjson
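A repair pass of this shape can be sketched as follows. This is an assumed approach, not the code behind the CLI: `repair` is a hypothetical function, and "valid" here means only "parses as JSON".

```python
import json
import os
import tempfile


def repair(path: str) -> int:
    """Rewrite `path` keeping only lines that parse as JSON; return lines dropped."""
    dropped = 0
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
            for line in src:
                try:
                    json.loads(line)
                except ValueError:
                    dropped += 1  # corrupted or partial line
                    continue
                out.write(line if line.endswith(b"\n") else line + b"\n")
            out.flush()
            os.fsync(out.fileno())
        os.replace(tmp_path, path)  # swap the clean file into place
    except BaseException:
        os.unlink(tmp_path)  # leave the original untouched on failure
        raise
    return dropped
```

Writing to a temporary file and renaming it means a crash in the middle of the repair leaves the original file in place rather than a half-written one.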
Quarantine Mode
- Moves corrupted files to a quarantine directory
- Rewrites a clean file
- Preserves evidence for forensic analysis
papyra persistence recover \
    --mode quarantine \
    --quarantine-dir ./quarantine \
    --path events.ndjson
Quarantine mode requires an explicit directory.
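The quarantine flow can be sketched in a few lines. The function name `quarantine_and_clean` and the timestamped file naming are invented for this example; the real command may name and place files differently.

```python
import json
import os
import shutil
import time


def quarantine_and_clean(path: str, quarantine_dir: str) -> str:
    """Move the damaged file into quarantine_dir, then write a clean file at path.

    Hypothetical helper; returns the location of the preserved evidence.
    """
    os.makedirs(quarantine_dir, exist_ok=True)
    name = f"{os.path.basename(path)}.{int(time.time())}.quarantined"
    evidence = os.path.join(quarantine_dir, name)
    shutil.move(path, evidence)  # the original bytes are preserved as-is
    with open(evidence, "rb") as src, open(path, "wb") as out:
        for line in src:
            try:
                json.loads(line)
            except ValueError:
                continue  # corrupted line stays only in the quarantined copy
            out.write(line if line.endswith(b"\n") else line + b"\n")
        out.flush()
        os.fsync(out.fileno())
    return evidence
```

In this simplified version the active path briefly does not exist between the move and the rewrite. A production implementation would likely build the clean file first and swap it in atomically.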
Compaction
Because the backend is append-only, disk usage can grow indefinitely.
Compaction:
- Applies retention rules
- Rewrites the file atomically
- Removes expired or ignored records
- Shrinks disk usage safely
CLI example:
papyra persistence compact --path events.ndjson
Compaction guarantees:
- No data loss within retention limits
- Atomic replacement
- Crash-safe behavior
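As a sketch of how compaction might combine retention with an atomic rewrite, consider the hypothetical `compact` function below. It assumes every line is valid JSON with a numeric `timestamp`, loads the whole file into memory for brevity, and is not the implementation behind `papyra persistence compact`.

```python
import json
import os
import tempfile


def compact(path, now, max_records=None, max_age_seconds=None):
    """Physically drop records outside the retention limits; return records kept."""
    with open(path, "r", encoding="utf-8") as src:
        records = [json.loads(line) for line in src if line.strip()]
    if max_age_seconds is not None:
        records = [r for r in records if r["timestamp"] >= now - max_age_seconds]
    if max_records is not None:
        records = records[-max_records:] if max_records > 0 else []
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".compact")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            for record in records:
                out.write(json.dumps(record) + "\n")
            out.flush()
            os.fsync(out.fileno())
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        os.unlink(tmp_path)  # the original file is left intact on failure
        raise
    return len(records)
```

Records inside the retention limits are copied unchanged, and the original file is replaced only after the new one has been fully written and synced.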
Inspection
The inspect command provides a high-level overview of the backend state.
papyra persistence inspect --path events.ndjson
Output includes:
- Backend type
- Retention configuration
- Sampled counts of events, audits, and dead letters
- Optional metrics snapshot
This command is read-only and safe to run in production.
Metrics Support
If metrics are enabled, the backend exposes internal counters such as:
- Writes performed
- Reads performed
- Scan operations
- Recovery attempts
- Compactions
Metrics can be displayed via:
papyra persistence inspect --show-metrics
Metrics are backend-local and reset on process restart.
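Process-local counters of this kind can be modeled with a small in-memory object. The `BackendMetrics` class and its field names are assumptions chosen to match the list above, not Papyra's actual metrics type.

```python
from dataclasses import asdict, dataclass


@dataclass
class BackendMetrics:
    """Hypothetical in-memory counters; a new process starts again from zero."""

    writes: int = 0
    reads: int = 0
    scans: int = 0
    recoveries: int = 0
    compactions: int = 0

    def snapshot(self) -> dict:
        # Return a plain dict copy so callers cannot mutate the live counters.
        return asdict(self)
```

Since nothing is written to disk, the counters describe only the current process lifetime, which matches the reset-on-restart behavior described above.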
Performance Characteristics
| Aspect | Behavior |
|---|---|
| Writes | Sequential append, O(1) per record |
| Reads | Linear scan (bounded by retention) |
| Startup | Scan cost proportional to file size |
| Recovery | Linear rewrite |
| Compaction | Linear rewrite |
This backend scales vertically, not horizontally.
When to Use This Backend
Recommended:
- Development and testing
- Single-node actor systems
- Debugging production issues
- Auditing and traceability
Not recommended:
- High-throughput distributed systems
- Multi-writer scenarios
- Very large datasets requiring indexed queries
Summary
The JSON persistence backend is the foundation of Papyra's durability model.
It trades raw performance for:
- Predictability
- Recoverability
- Transparency
- Operational confidence
For many systems, this tradeoff is not a limitation; it is a feature.