Core Concepts

Papyra is not just a persistence backend. It is a persistence lifecycle system designed to make failure, recovery, inspection, and operational safety explicit.

This document explains the foundational ideas behind Papyra. Understanding these concepts will make the rest of the documentation significantly easier to reason about and apply in real systems.

1. Persistence Is a Lifecycle

Most systems treat persistence as a simple API:

write → read

Papyra treats persistence as a continuous lifecycle:

Write
Retain
Scan
Recover
Compact
Inspect
Observe

Each phase exists because real systems:

crash mid-write
are upgraded while data lives on
accumulate obsolete or invalid records
require human operators to intervene
must be debugged under time pressure

Papyra makes each phase explicit, testable, and automatable.

2. Append-Only First

All Papyra persistence backends follow one core rule:

Writes are append-only

This means:

No in-place mutation during normal operation
New records are always appended
Old data is never silently overwritten

Why append-only?

Append-only storage:

makes partial writes detectable
confines a failed write to the tail of the log instead of corrupting earlier records
enables deterministic recovery
mirrors how production systems actually fail

Examples:

JSON backend → newline-delimited append
Redis Streams → XADD
Future SQL backends → append-only tables or WAL-style patterns

Mutation only happens during explicit compaction or recovery.
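
As an illustration of the newline-delimited append pattern, here is a minimal sketch in plain Python. It is not Papyra's implementation: the function names `append_record` and `has_partial_tail` are hypothetical, and the sketch assumes one JSON object per line. It shows why append-only writes make a partial write detectable: a crash mid-write can only damage the final line.

```python
import json
import os


def append_record(path: str, record: dict) -> None:
    """Append one record as a single JSON line; earlier lines are never rewritten."""
    line = json.dumps(record, separators=(",", ":")) + "\n"
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())  # ask the OS to push the bytes to disk


def has_partial_tail(path: str) -> bool:
    """Return True if the file ends without a newline (a likely interrupted write)."""
    with open(path, "rb") as fh:
        data = fh.read()
    return bool(data) and not data.endswith(b"\n")
```

Because every complete record ends with a newline, a missing final newline is enough to flag the last record as suspect without touching any earlier data.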

3. Logical State vs Physical State

Papyra makes a strict distinction between:

Logical state

What the application considers valid:

active events
recent audits
relevant dead letters

Physical state

What actually exists in storage:

expired records
corrupted entries
quarantined data
unused disk space

Retention, scanning, and compaction bridge this gap.

Nothing disappears implicitly.

4. Retention Is Not Deletion

Retention in Papyra is logical filtering, not immediate deletion.

A retention policy may specify:

maximum record count
maximum age
maximum total bytes

Retention is applied:

during reads
during inspection
during compaction

This ensures:

deterministic behavior
safe upgrades
predictable recovery

Deletion only happens during explicit compaction.
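
The idea of retention as a read-time filter can be sketched as follows. This is an illustrative assumption, not Papyra's API: `RetentionPolicy`, `apply_retention`, the record fields `timestamp` and `size`, and the order in which the three limits are applied are all hypothetical. The point is that the stored records are left untouched and only the visible subset changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    max_records: Optional[int] = None
    max_age_seconds: Optional[float] = None
    max_total_bytes: Optional[int] = None


def apply_retention(records, policy, now):
    """Return the logically visible records; the input list is never modified.

    Each record is a dict with a "timestamp" (seconds) and a "size" (bytes),
    ordered oldest first.
    """
    kept = list(records)
    if policy.max_age_seconds is not None:
        kept = [r for r in kept if now - r["timestamp"] <= policy.max_age_seconds]
    if policy.max_records is not None:
        kept = kept[-policy.max_records:] if policy.max_records > 0 else []
    if policy.max_total_bytes is not None:
        total, newest_first = 0, []
        for record in reversed(kept):  # keep the newest records that fit
            if total + record["size"] > policy.max_total_bytes:
                break
            total += record["size"]
            newest_first.append(record)
        kept = list(reversed(newest_first))
    return kept
```

Since the filter is a pure function of the stored records, the policy, and the current time, the same inputs always yield the same visible state, which is what makes the behavior deterministic.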

5. Scanning: Detecting Reality

The scan() phase answers one question:

“Is the stored data structurally valid?”

Scanning detects:

truncated records
invalid JSON
missing required fields
backend-specific inconsistencies

Key properties:

read-only
safe to run anytime
sample-based where necessary
backend-aware

If a backend cannot support scanning, it must explicitly say so.
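
To make the checks above concrete, here is a rough sketch of a read-only structural scan over newline-delimited JSON. It does not reproduce Papyra's scan(): the function `scan_lines`, the `REQUIRED_FIELDS` tuple, and the problem labels are hypothetical names chosen for illustration.

```python
import json

REQUIRED_FIELDS = ("kind", "timestamp")  # hypothetical schema for this sketch


def scan_lines(raw: bytes):
    """Read-only structural check of newline-delimited JSON.

    Returns a list of (line_number, problem) tuples; the input is never modified.
    """
    problems = []
    lines = raw.split(b"\n")
    # A well-formed file ends with "\n", so the final split element is empty.
    tail = lines.pop()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except ValueError:  # covers JSON syntax errors and bad encodings
            problems.append((number, "invalid JSON"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            problems.append((number, "missing fields: " + ", ".join(missing)))
    if tail:
        problems.append((len(lines) + 1, "truncated record"))
    return problems
```

The function only reads its input and reports what it finds, so it can run at any time without risk to the stored data.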

6. Anomalies Are First-Class

When something is wrong, Papyra does not hide it.

An anomaly includes:

type (corruption, truncation, invalid record)
location (file, stream, key)
human-readable details

Anomalies:

are returned programmatically
are visible via CLI
can block startup
can trigger automated recovery

Nothing is auto-fixed without visibility.
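
One way to picture an anomaly as a first-class value is a small immutable record. The class and field names below are assumptions made for illustration and may not match Papyra's actual types.

```python
from dataclasses import dataclass
from enum import Enum


class AnomalyType(Enum):
    CORRUPTION = "corruption"
    TRUNCATION = "truncation"
    INVALID_RECORD = "invalid_record"


@dataclass(frozen=True)
class Anomaly:
    type: AnomalyType
    location: str  # file path, stream name, or key
    details: str   # human-readable explanation

    def describe(self) -> str:
        """One-line rendering suitable for logs or terminal output."""
        return f"[{self.type.value}] {self.location}: {self.details}"
```

Keeping the type machine-readable and the details human-readable lets the same value drive both automated decisions and operator-facing output.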

7. Recovery Is Explicit

Recovery never happens implicitly.

You must explicitly choose:

when recovery runs
which strategy is used
what happens to corrupted data

Supported strategies:

REPAIR → fix in place when possible
QUARANTINE → move corrupted data aside
IGNORE → acknowledge risk and continue

Every recovery is followed by a post-recovery scan.
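
The three strategies and the post-recovery scan can be sketched over an in-memory list of lines. This is a simplified illustration, not Papyra's recovery code: `RecoveryStrategy`, `recover`, and `recover_and_rescan` are hypothetical names, and REPAIR is reduced here to dropping unparseable lines, which is an assumption about what "fix in place" could mean for this toy format.

```python
import json
from enum import Enum


class RecoveryStrategy(Enum):
    REPAIR = "repair"
    QUARANTINE = "quarantine"
    IGNORE = "ignore"


def is_valid(line: str) -> bool:
    """A line is valid here if it parses as a JSON object."""
    try:
        return isinstance(json.loads(line), dict)
    except ValueError:
        return False


def recover(lines, strategy):
    """Return (kept, quarantined) according to the chosen strategy."""
    if strategy is RecoveryStrategy.IGNORE:
        return list(lines), []
    kept = [line for line in lines if is_valid(line)]
    bad = [line for line in lines if not is_valid(line)]
    if strategy is RecoveryStrategy.QUARANTINE:
        return kept, bad  # corrupted lines are set aside, not discarded
    return kept, []  # REPAIR, simplified for this sketch


def recover_and_rescan(lines, strategy):
    """Recover, then re-scan the result so remaining problems stay visible."""
    kept, quarantined = recover(lines, strategy)
    remaining = [line for line in kept if not is_valid(line)]
    return kept, quarantined, remaining
```

With IGNORE, the re-scan still reports the corrupted line, so the risk that was accepted remains visible afterwards.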

8. Startup Safety Modes

Papyra formalizes startup behavior via explicit modes:

IGNORE → start regardless of anomalies
FAIL_ON_ANOMALY → fail fast if anything is wrong
RECOVER → attempt recovery, then re-scan

These modes exist because environments differ:

local development
CI pipelines
production deployments
emergency maintenance

Startup behavior is never implicit.
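
The control flow of the three modes can be sketched as a single function. The names `StartupMode`, `StartupError`, and `startup_check` are hypothetical, as are the `scan` and `recover` callables passed in. Whether RECOVER should refuse to start when the re-scan still finds anomalies is an assumption of this sketch, not something the text above specifies.

```python
from enum import Enum


class StartupMode(Enum):
    IGNORE = "ignore"
    FAIL_ON_ANOMALY = "fail_on_anomaly"
    RECOVER = "recover"


class StartupError(RuntimeError):
    """Raised when the chosen mode does not allow startup to continue."""


def startup_check(mode, scan, recover):
    """Scan at startup and react according to `mode`.

    `scan` returns a list of anomalies; `recover` receives that list and
    attempts to fix them. Returns the anomalies still present on success.
    """
    anomalies = scan()
    if not anomalies or mode is StartupMode.IGNORE:
        return anomalies
    if mode is StartupMode.FAIL_ON_ANOMALY:
        raise StartupError(f"refusing to start: {len(anomalies)} anomalies")
    recover(anomalies)
    remaining = scan()  # re-scan after recovery
    if remaining:
        raise StartupError(f"recovery left {len(remaining)} anomalies")
    return remaining
```

A development setup might pass IGNORE while a production deployment passes FAIL_ON_ANOMALY; the calling code stays the same and only the mode changes.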

9. Compaction Is Physical Maintenance

Compaction is the only operation that:

rewrites data
reclaims disk space
removes expired records physically

Compaction:

is explicit
is backend-specific
is safe by design
respects retention policies

Running compaction is an operational decision, not a side effect.
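
For a file-based backend, compaction could look roughly like the sketch below: write the surviving records to a temporary file, then swap it into place. This is an assumption about one reasonable approach, not a description of Papyra's compaction. The function `compact` and its `keep` predicate are hypothetical. Unparseable lines are copied through unchanged so that nothing is dropped without going through scan and recovery.

```python
import json
import os
import tempfile


def compact(path: str, keep) -> int:
    """Rewrite `path` with only the records `keep` accepts; return bytes reclaimed."""
    before = os.path.getsize(path)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".compact")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out, \
                open(path, encoding="utf-8") as src:
            for line in src:
                try:
                    record = json.loads(line)
                except ValueError:
                    out.write(line)  # leave damaged lines for scan/recovery
                    continue
                if keep(record):
                    out.write(line)
            out.flush()
            os.fsync(out.fileno())
        os.replace(tmp_path, path)  # atomic swap when on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return before - os.path.getsize(path)
```

Writing to a temporary file in the same directory means a crash during compaction leaves the original file intact.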

10. Inspection Is for Humans

Inspection exists for operators and developers.

The inspect phase provides:

backend identity
retention configuration
approximate data volumes
metrics snapshots (if supported)

Inspection favors clarity over completeness.

It is designed to answer:

“Is this system behaving the way I expect?”

11. Observability Is Built In

Papyra exposes internal metrics:

write counts
error counts
scan results
recovery outcomes
compaction statistics

Metrics:

are backend-aware
can be reset
can be exported
integrate naturally with monitoring systems

This makes Papyra suitable for production environments that require visibility.
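
A minimal sketch of counters that can be snapshotted and reset is shown below. The `Metrics` class and the metric names used with it are hypothetical and only illustrate the shape of such an interface. Exporting is modeled as returning a plain dict that a monitoring integration could serialize.

```python
from collections import Counter


class Metrics:
    """In-memory counters with snapshot (export) and reset."""

    def __init__(self) -> None:
        self._counts = Counter()

    def increment(self, name: str, amount: int = 1) -> None:
        self._counts[name] += amount

    def snapshot(self) -> dict:
        """Return a copy, so later updates do not change an exported snapshot."""
        return dict(self._counts)

    def reset(self) -> None:
        self._counts.clear()
```

Returning a copy from `snapshot` keeps exported values stable even if the counters are updated or reset afterwards.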

12. Philosophy Summary

Papyra is built on a few non-negotiable principles:

Failure is normal
Corruption must be visible
Recovery must be explicit
Operators must be empowered
Data must never disappear silently

If you understand these principles, you understand Papyra.

The rest of the documentation builds directly on them.