Compaction Strategies¶
Compaction is the physical cleanup phase of Papyra persistence. Where retention defines what is no longer valid, compaction defines when and how invalid data is physically removed.
This document explains compaction in depth: why it exists, how it works across backends, and how to operate it safely in production.
1. Retention vs Compaction (Critical Distinction)¶
Papyra intentionally separates logical retention from physical compaction.
| Concept | What it does | When it runs |
|---|---|---|
| Retention | Decides which records are expired | During reads / scans |
| Compaction | Rewrites storage to remove expired data | Explicit operation |
Important
Retention alone does not shrink disk usage. Only compaction does.
This separation guarantees:
- Deterministic reads
- Crash-safe storage
- No hidden I/O work during normal operation
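The split between the two concepts can be illustrated with a toy in-memory store. This is only a sketch of the idea, not Papyra's implementation: the `Record` type and the `visible` and `compact` helpers are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    timestamp: float  # seconds since epoch (hypothetical field)
    payload: str


def visible(records, now, max_age):
    """Retention: hide expired records at read time; storage is untouched."""
    cutoff = now - max_age
    return [r for r in records if r.timestamp >= cutoff]


def compact(records, now, max_age):
    """Compaction: rewrite storage so only unexpired records remain."""
    records[:] = visible(records, now, max_age)
    return len(records)


store = [Record(10.0, "old"), Record(95.0, "new")]

# Retention hides the expired record, but the store still holds both.
live = visible(store, now=100.0, max_age=30.0)
size_before = len(store)

# Only the explicit compaction step actually shrinks the store.
compact(store, now=100.0, max_age=30.0)
size_after = len(store)
```

Reading through `visible` never changes `store`; only the explicit `compact` call does, which mirrors why disk usage stays flat until compaction runs.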
2. Why Compaction Is Explicit¶
Many systems compact automatically. Papyra does not, by design.
Reasons:
- Compaction is I/O heavy
- Compaction may lock files or streams
- Operators must control when it happens
- Predictability > convenience
Instead, Papyra exposes compaction via:
- CLI
- API
- Operator scheduling (cron, systemd, Kubernetes jobs)
3. Backend-Specific Compaction Behavior¶
3.1 JSON File Persistence¶
Mechanism
- Reads the source .ndjson file
- Applies retention rules
- Writes a new compacted file
- Atomically replaces the original
Guarantees
- Crash-safe (temp file + rename)
- No partial writes
- Original file preserved until success
Disk Impact
- Temporary double disk usage during compaction
Typical Use
papyra persistence compact
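The temp file plus rename mechanism described above can be sketched with the standard library. This is a minimal illustration under stated assumptions, not Papyra's actual code: the function name `compact_ndjson` and the `is_expired` callback are hypothetical, and the record layout is whatever the callback expects.

```python
import json
import os
import tempfile


def compact_ndjson(path, is_expired):
    """Rewrite `path` without expired records; return (kept, dropped)."""
    directory = os.path.dirname(os.path.abspath(path))
    kept = dropped = 0
    # The temp file lives in the same directory so os.replace stays on one
    # filesystem, which is what makes the final swap atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".compact.tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as dst, \
                open(path, "r", encoding="utf-8") as src:
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                if is_expired(json.loads(line)):
                    dropped += 1
                    continue
                dst.write(line if line.endswith("\n") else line + "\n")
                kept += 1
            dst.flush()
            os.fsync(dst.fileno())  # make the new file durable before the swap
        os.replace(tmp_path, path)  # atomic rename over the original
    except BaseException:
        # Any failure leaves the original untouched; discard the temp file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return kept, dropped
```

Because the original file is only replaced by the final `os.replace`, a crash at any earlier point leaves it intact, at the cost of holding both copies on disk for the duration of the rewrite.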
3.2 Rotating File Persistence¶
Mechanism
- Iterates rotated segments
- Drops fully expired segments
- Rewrites partially expired segments
- Renames safely
Advantages
- Faster compaction than monolithic files
- Natural disk locality
- Easier recovery
Best Practice
- Pair with size-based rotation
- Compact during off-peak hours
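The per-segment decision (drop, rewrite, or leave alone) can be sketched as a small planning function. This is an illustrative sketch only: the segment names, the `plan_segment` and `plan_compaction` helpers, and the representation of a segment as a list of record timestamps are all assumptions made for the example.

```python
def plan_segment(timestamps, cutoff):
    """Classify one segment by the timestamps of the records it holds."""
    expired = sum(1 for ts in timestamps if ts < cutoff)
    if expired == 0:
        return "keep"      # nothing to reclaim; leave the file alone
    if expired == len(timestamps):
        return "drop"      # every record expired; delete the whole segment
    return "rewrite"       # mixed; rewrite with only the live records


def plan_compaction(segments, cutoff):
    """Map each segment name to the action compaction would take."""
    return {name: plan_segment(ts, cutoff) for name, ts in segments.items()}


# Hypothetical segment names mapped to the timestamps they contain.
segments = {
    "events.0.ndjson": [1, 2, 3],
    "events.1.ndjson": [8, 12, 15],
    "events.2.ndjson": [20, 21],
}
plan = plan_compaction(segments, cutoff=10)
```

Only the "rewrite" segments need the temp file and rename treatment; dropping a fully expired segment is a single delete, which is one reason rotated storage tends to compact faster than a monolithic file.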
3.3 Redis Streams¶
Redis does not support true compaction in the filesystem sense.
Papyra uses:
- XTRIM with retention-derived bounds
- Optional approximate trimming (~)
Trade-offs
| Mode | Behavior |
|---|---|
| Exact | Strong guarantees, slower |
| Approximate | Faster; may leave some expired entries in place until a later trim |
Compaction here means:
“Advance the stream head to forget expired history”
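One way to derive an XTRIM bound from a retention window is sketched below. It assumes the stream uses Redis auto-generated IDs (millisecond timestamp plus sequence number) and a time-based retention rule; Papyra may derive its bounds differently, for example with MAXLEN. The helper names are hypothetical. The `client` argument is expected to be a redis-py client, whose `xtrim` method accepts `minid` and `approximate`; the MINID strategy requires Redis 6.2 or later.

```python
import time


def minid_for_retention(max_age_seconds, now=None):
    """Return the smallest stream ID that is still within retention."""
    now = time.time() if now is None else now
    cutoff_ms = int((now - max_age_seconds) * 1000)
    return f"{max(cutoff_ms, 0)}-0"


def trim_expired(client, stream, max_age_seconds, approximate=True, now=None):
    """Evict entries older than the retention window via XTRIM MINID."""
    minid = minid_for_retention(max_age_seconds, now=now)
    # approximate=True sends the "~" modifier: Redis trims whole internal
    # nodes only, so a few expired entries may remain after the call.
    return client.xtrim(stream, minid=minid, approximate=approximate)
```

With `approximate=False` Redis trims exactly to the bound, which costs more work per call; the approximate form never removes entries newer than the bound.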
3.4 In-Memory Persistence¶
- No-op
- Python garbage collection handles cleanup
- Compaction exists for API symmetry only
4. Compaction Triggers¶
Papyra never auto-compacts.
You trigger compaction via:
CLI¶
papyra persistence compact
API¶
await system.persistence.compact()
Automation¶
- Cron
- systemd timer
- Kubernetes Job / CronJob
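A scheduled job usually needs an exit code and an upper bound on runtime. The sketch below wraps an awaitable `compact()` call in a timeout and maps the outcome to an exit code. The `run_compaction` helper, the exit code values, and `StubPersistence` are hypothetical; the stub exists only to make the example runnable, and in a real job the `persistence` argument would be the object exposed as `system.persistence`. Cancelling on timeout is assumed to be acceptable because compaction is documented as crash-safe.

```python
import asyncio


async def run_compaction(persistence, timeout_seconds=600.0):
    """Run one compaction pass and return a process exit code."""
    try:
        await asyncio.wait_for(persistence.compact(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return 2  # took too long; let the scheduler alert and retry later
    except Exception:
        return 1  # compaction failed; report it to the scheduler
    return 0


class StubPersistence:
    """Stand-in used only to make this sketch runnable."""

    async def compact(self):
        return None


exit_code = asyncio.run(run_compaction(StubPersistence()))
```

A cron entry, systemd timer, or Kubernetes CronJob can then invoke a script built around this function and rely on the exit code for alerting.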
5. Safe Compaction Windows¶
Recommended times:
- Low traffic periods
- After large retention drops
- Before backups
- After incident recovery
Avoid:
- During heavy write bursts
- During recovery operations
- While disk is near full capacity
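Because file-based compaction temporarily needs room for a second copy of the data, a pre-flight check on free space can help decide whether a window is safe. The following is a sketch under assumptions: the `has_room_to_compact` helper and its `headroom` factor are hypothetical and not part of Papyra.

```python
import os
import shutil


def has_room_to_compact(path, headroom=1.1):
    """True if free space can hold a full rewritten copy of `path`."""
    # Worst case: nothing is expired, so the rewritten file is as large as
    # the original. The headroom factor adds a small safety margin.
    needed = os.path.getsize(path) * headroom
    free = shutil.disk_usage(os.path.dirname(os.path.abspath(path))).free
    return free >= needed
```

A scheduler script could skip the run and raise an alert when this returns False, rather than letting compaction start and abort on a full disk.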
6. Observability During Compaction¶
If metrics are enabled, compaction emits:
- Records scanned
- Records dropped
- Files rewritten
- Errors encountered
CLI example:
papyra metrics persistence
7. Failure Modes & Guarantees¶
Crash During Compaction¶
✔ Original data preserved
✔ Temporary files discarded
✔ Safe to retry
Disk Full¶
- Compaction aborts
- No data loss
- Operator must free space
Partial Backend Support¶
- Backends may return None
- CLI reports best-effort completion
8. Compaction vs Recovery¶
| Operation | Purpose |
|---|---|
| Recovery | Fix corruption |
| Compaction | Reduce size |
Never confuse the two.
Recovery may rewrite, but its goal is correctness — not size.
9. Real-World Compaction Strategies¶
High-Volume Event Systems¶
- Daily retention
- Weekly compaction
Compliance Systems¶
- Long retention
- Monthly compaction + archive
Embedded / Edge Systems¶
- Aggressive retention
- Frequent compaction
10. Design Philosophy Recap¶
Papyra compaction is:
- Explicit
- Predictable
- Crash-safe
- Backend-aware
You control when storage is rewritten — not the framework.