Understanding TimescaleDB Data Compression

Key Metric: TimescaleDB is capable of reducing the storage footprint of typical time-series datasets by as much as 98%.

Standard OLTP databases rely on general-purpose compression, but time-series data benefits from a specialized strategy. TimescaleDB achieves its efficiency through the hypercore engine, a hybrid system that blends row-based and columnar storage and applies type-aware algorithms such as Gorilla XOR, delta-of-delta, delta encoding, and run-length encoding (RLE).

TimescaleDB vs. PostgreSQL TOAST

It is important to distinguish between TimescaleDB's compression and the native PostgreSQL TOAST (The Oversized-Attribute Storage Technique). While they both reduce size, they target different problems.

TOAST is designed for "vertical" bloat: individual fields that are too large (e.g., a massive jsonb blob) to fit in a standard 8 kB page.
Hypercore is designed for "horizontal" bloat: repeating patterns across millions of rows.

These systems are complementary; in fact, TimescaleDB may use TOAST as a fallback for specific data types.

Comparison Matrix: TOAST vs. Hypercore

| Feature | TOAST (Vanilla Postgres) | TimescaleDB Hypercore |
|---|---|---|
| Primary Goal | Handle large values ($> 2\text{ kB}$) | Optimize time-series patterns |
| Trigger | Row exceeds `TOAST_TUPLE_THRESHOLD` (about 2 kB) | Policy-based (e.g., data older than 7 days) |
| Scope | Variable-length types (`text`, `bytea`, `jsonb`) | All data types |
| Algorithms | `pglz`, `lz4` | Delta, Delta-of-Delta, RLE, XOR, Dictionary |
| Granularity | Per individual value | Per batch (up to 1,000 rows) |
| Data Awareness | Opaque byte stream | Aware of monotonicity and numeric structure |
| Float Ratio | $\approx 1.0\times$ (None) | $10\text{--}20\times$ |
| Timestamp Ratio | $\approx 1.0\times$ (None) | $50\text{--}100\times$ |
| Text Ratio | $2\text{--}3\times$ | $5\text{--}10\times$ |
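The "Data Awareness" row can be illustrated with a rough experiment. The sketch below is not how TOAST or hypercore are implemented: it uses Python's `zlib` as a stand-in for a general-purpose byte compressor (TOAST itself uses `pglz` or `lz4`), and the timestamp values are made up. It only shows that exposing the regular structure of timestamps before compressing gives a byte-level compressor far less to store.

```python
import struct
import zlib

# Hypothetical data: 1,000 timestamps (epoch seconds) spaced exactly 5 s apart.
timestamps = [1_700_000_000 + 5 * i for i in range(1000)]


def pack(values):
    """Serialize integers as little-endian 64-bit values."""
    return struct.pack(f"<{len(values)}q", *values)


# Opaque approach: compress the raw bytes with no knowledge of their meaning.
raw = pack(timestamps)
opaque_size = len(zlib.compress(raw))

# Structure-aware approach: take second-order differences first, so a
# regular series becomes a base value, one delta, and a long run of zeros.
deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
dod = [timestamps[0], deltas[0]] + [b - a for a, b in zip(deltas, deltas[1:])]
aware_size = len(zlib.compress(pack(dod)))

print(f"raw bytes: {len(raw)}, opaque: {opaque_size}, structure-aware: {aware_size}")
```

Both inputs are the same size before compression; only the structure-aware one collapses to a handful of bytes, which is the intuition behind the ratio gap in the table.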

The Hypercore Engine: A Hybrid Approach

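The hypercore engine moves data between two storage states to balance write speed with read efficiency: recent data stays in row-oriented form for fast inserts and updates, and older chunks are converted to a compressed columnar form for efficient analytical reads.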

The Transformation Process

When a chunk is compressed, the engine reorganizes the data:

Rows are grouped into batches of up to 1,000.
Each batch is converted into a single row in a compressed table.
Columns are transformed into compressed arrays.

This column-major format allows the database to fetch only the specific fields required for a query, rather than scanning entire rows of irrelevant data.
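The reorganization described above can be sketched in a few lines of Python. This is a conceptual model only, not TimescaleDB's storage code: the function name `to_column_batches`, the dictionary-based row format, and the sample data are assumptions made for illustration, and no actual compression is applied to the arrays.

```python
from typing import Any

BATCH_SIZE = 1000  # upper bound on rows per batch, as described in the text


def to_column_batches(rows: list[dict[str, Any]],
                      batch_size: int = BATCH_SIZE) -> list[dict[str, list[Any]]]:
    """Group rows into batches and pivot each batch into per-column arrays."""
    batches = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # One "compressed row": every column becomes a single array of values.
        batches.append({col: [row[col] for row in chunk] for col in chunk[0]})
    return batches


# Hypothetical sample data: 2,500 readings from a single machine.
rows = [{"time": t, "machine": "M_001", "temp": 72.0 + (t % 3)} for t in range(2500)]
batches = to_column_batches(rows)

# A query that needs only `temp` reads only the `temp` arrays.
temps = [value for batch in batches for value in batch["temp"]]
print(len(batches), len(temps))
```

With 2,500 input rows this produces three batches (1,000, 1,000, and 500 rows), and reading `temp` never touches the `time` or `machine` arrays.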

Deep Dive: Compression Algorithms

1. Delta Encoding

Instead of storing the full value of every data point, delta encoding stores the difference between the current value and the previous one.

Mathematical Representation: $\Delta = x_n - x_{n-1}$

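Example: Imagine a sensor recording temperature. Instead of storing 72.5, 72.7, and 72.4 in full, the system stores a base value followed by the differences. (Decimal readings are used here for readability; in TimescaleDB, delta-based encodings apply to integer-like columns such as integers and timestamps, while floating-point columns use the XOR-based scheme.)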

| Time | Machine | Type | Value | Compressed Value |
|---|---|---|---|---|
| 12:00:00 | M_001 | temp | 72.5 | `72.5` (Base) |
| 12:00:05 | M_001 | temp | 72.7 | `+0.2` |
| 12:00:10 | M_001 | temp | 72.4 | `-0.3` |
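A minimal sketch of delta encoding and its inverse follows. It is an illustration of the idea rather than TimescaleDB's implementation; the function names are hypothetical, and the readings are expressed as integer tenths of a degree (725 for 72.5) to avoid floating-point rounding in the example.

```python
def delta_encode(values: list[int]) -> list[int]:
    """Keep the first value as the base, then store successive differences."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(encoded: list[int]) -> list[int]:
    """Rebuild the original values with a running sum."""
    out = []
    total = 0
    for delta in encoded:
        total += delta  # the first element is the base value itself
        out.append(total)
    return out


readings = [725, 727, 724]        # 72.5, 72.7, 72.4 in tenths of a degree
encoded = delta_encode(readings)  # [725, 2, -3]
assert delta_decode(encoded) == readings
```

The deltas are small numbers that fit in far fewer bits than the full readings, which is where the space saving comes from once the values are bit-packed.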

2. Delta-of-Delta Encoding

For timestamps that occur at regular intervals (e.g., every 5 seconds), even the delta is repetitive. Delta-of-delta stores the change in the change.

Mathematical Representation: $\Delta^2 = \Delta_n - \Delta_{n-1}$, where $\Delta_n = x_n - x_{n-1}$

If the interval is always 5 seconds, the $\Delta$ is always 5, and the $\Delta^2$ is 0. Storing a zero requires significantly fewer bits than storing a timestamp.
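The same idea can be sketched in Python. This is a conceptual illustration under stated assumptions, not TimescaleDB's encoder: the function names are hypothetical, timestamps are given as seconds since midnight (43200 is 12:00:00), and the final bit-packing step that turns small numbers into few bits is omitted.

```python
def delta_of_delta_encode(timestamps: list[int]) -> list[int]:
    """Return [first value, first delta, then second-order differences]."""
    if len(timestamps) < 2:
        return list(timestamps)
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    second_order = [b - a for a, b in zip(deltas, deltas[1:])]
    return [timestamps[0], deltas[0]] + second_order


def delta_of_delta_decode(encoded: list[int]) -> list[int]:
    """Invert the encoding by accumulating deltas, then values."""
    if len(encoded) < 2:
        return list(encoded)
    out = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dd in encoded[2:]:
        delta += dd                 # recover the current first-order delta
        out.append(out[-1] + delta)
    return out


# Readings every 5 seconds, with the last one arriving 1 second late.
ts = [43200, 43205, 43210, 43215, 43221]
encoded = delta_of_delta_encode(ts)   # [43200, 5, 0, 0, 1]
assert delta_of_delta_decode(encoded) == ts
```

A perfectly regular series yields nothing but zeros after the first two entries, and a late reading shows up as a single small non-zero value.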

3. Run-Length Encoding (RLE)

RLE is ideal for columns with low cardinality or high repetition, such as `device_id` or `status`. Instead of repeating the same string 1,000 times, it stores the value once and a count of its repetitions.

Data Transformation: ~~MACHINE_001, MACHINE_001, MACHINE_001, MACHINE_001~~ $\rightarrow$ (MACHINE_001, 4)
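A short sketch of run-length encoding, using only the Python standard library. The function names are hypothetical and the (value, count) tuple layout is an assumption chosen for readability, not TimescaleDB's on-disk format.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of consecutive equal values into (value, count)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


machines = ["MACHINE_001"] * 4
runs = rle_encode(machines)   # [("MACHINE_001", 4)]
assert rle_decode(runs) == machines
```

Note that RLE only merges consecutive repeats, so it works best when rows with the same value sit next to each other within a batch.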

-- Conceptual representation of a compressed batch
-- (illustrative only; not TimescaleDB's actual on-disk format)
SELECT
    ARRAY['12:00:00', '12:00:05', '12:00:10']::time[] AS time,
    ROW('MACHINE_001'::text, 3)                       AS machine_id,  -- RLE: value + run count
    ARRAY[72.5, 0.2, -0.3]                            AS temp_delta;  -- Delta encoding: base, then differences

By combining these techniques, TimescaleDB transforms bulky, row-oriented time-series data into a lean, columnar format that slashes storage costs and accelerates analytical performance.