How TimescaleDB compresses time-series data
Understanding TimescaleDB Data Compression
Key Metric: TimescaleDB can reduce the storage footprint of typical time-series datasets by as much as 98%.
Standard OLTP databases rely on general-purpose compression, but time-series data benefits from a specialized strategy. TimescaleDB achieves its efficiency through the hypercore engine, a hybrid system that blends row-based and columnar storage and applies type-specific algorithms such as Gorilla XOR, delta-of-delta, delta encoding, and run-length encoding (RLE).
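Of the algorithms named above, Gorilla-style XOR compression for floats is probably the least intuitive. The following is a minimal Python sketch of the core idea only, not TimescaleDB's implementation: each float's 64-bit pattern is XORed with its predecessor, so similar neighbours yield results that are mostly zero bits. The function names are hypothetical, and the bit-packing step that actually saves space (storing only the non-zero bits) is left out.

```python
import struct


def float_to_bits(x: float) -> int:
    """Reinterpret a float64 as its raw 64-bit unsigned integer pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_float(b: int) -> float:
    """Inverse of float_to_bits."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def xor_encode(values):
    """Keep the first bit pattern, then XOR each pattern with its predecessor."""
    if not values:
        return []
    bits = [float_to_bits(v) for v in values]
    return [bits[0]] + [cur ^ prev for prev, cur in zip(bits, bits[1:])]


def xor_decode(encoded):
    """Rebuild the original floats by XORing each entry with the running pattern."""
    if not encoded:
        return []
    bits = [encoded[0]]
    for x in encoded[1:]:
        bits.append(bits[-1] ^ x)
    return [bits_to_float(b) for b in bits]


readings = [72.5, 72.5, 72.7, 72.4]
encoded = xor_encode(readings)
# A repeated value XORs to exactly zero; nearby values share their sign and
# exponent bits, so the top 12 bits of the XOR result are zero.
```

A real encoder would then write only the bits between the leading and trailing zeros of each XOR result, which is where the space saving comes from.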
TimescaleDB vs. PostgreSQL TOAST
It is important to distinguish between TimescaleDB's compression and the native PostgreSQL TOAST (The Oversized-Attribute Storage Technique). While they both reduce size, they target different problems.
- TOAST is designed for "vertical" bloat: individual fields that are too large (e.g., a massive jsonb blob) to fit in a standard 8 kB page.
- Hypercore is designed for "horizontal" bloat: repeating patterns across millions of rows.
These systems are complementary; in fact, TimescaleDB may use TOAST as a fallback for specific data types.
Comparison Matrix: TOAST vs. Hypercore
| Feature | TOAST (Vanilla Postgres) | TimescaleDB Hypercore |
|---|---|---|
| Primary Goal | Handle oversized values | Optimize time-series patterns |
| Trigger | Value exceeds TOAST_TUPLE_THRESHOLD | Policy-based (e.g., data older than a configured age) |
| Scope | Variable-length types (text, bytea) | All data types |
| Algorithms | pglz, lz4 | Delta, Delta-of-Delta, RLE, XOR, Dictionary |
| Granularity | Per individual value | Per batch (up to 1,000 rows) |
| Data Awareness | Opaque byte stream | Aware of monotonicity and numeric structure |
| Float Ratio | (None) | |
| Timestamp Ratio | (None) | |
| Text Ratio | | |
The Hypercore Engine: A Hybrid Approach
The hypercore engine manages a transition of data states to balance write speed with read efficiency.
The Transformation Process
When a chunk is compressed, the engine reorganizes the data:
- Rows are grouped into batches of up to 1,000.
- Each batch is converted into a single row in a compressed table.
- Columns are transformed into compressed arrays.
This column-major format allows the database to fetch only the specific fields required for a query, rather than scanning entire rows of irrelevant data.
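The batching and pivoting steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the engine's actual code: the function name `to_columnar_batches`, the tuple row layout, and the column names are all hypothetical, and no compression is applied to the resulting arrays.

```python
from itertools import islice

BATCH_SIZE = 1000  # upper bound on rows per batch, as described above


def to_columnar_batches(rows, columns, batch_size=BATCH_SIZE):
    """Yield one dict per batch, mapping each column name to a list of values."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        # Pivot row-major tuples into column-major arrays.
        yield {col: [row[i] for row in batch] for i, col in enumerate(columns)}


# 2,500 synthetic rows: (seconds since start, machine, value)
rows = [(i * 5, "M_001", 72.5) for i in range(2500)]
batches = list(to_columnar_batches(rows, ("time", "machine", "value")))

# A query that needs only "value" touches one array per batch.
values_only = [v for b in batches for v in b["value"]]
```

Each yielded dict plays the role of a single row in the compressed table, with one array per original column.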
Deep Dive: Compression Algorithms
1. Delta Encoding
Instead of storing the full value of every data point, delta encoding stores the difference between the current value and the previous one.
Mathematical Representation:
$$\Delta_i = v_i - v_{i-1}$$
Example:
Imagine a sensor recording temperature. Instead of storing 72.5, 72.7, 72.4, the system stores:
| Time | Machine | Type | Value | Compressed Value |
|---|---|---|---|---|
| 12:00:00 | M_001 | temp | 72.5 | 72.5 (Base) |
| 12:00:05 | M_001 | temp | 72.7 | +0.2 |
| 12:00:10 | M_001 | temp | 72.4 | -0.3 |
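As a rough sketch of the table above, the Python snippet below delta-encodes the same three readings and decodes them again. To keep the arithmetic exact, it assumes the readings are stored as integer tenths of a degree (725, 727, 724); that scaling and the function names are illustrative assumptions, not a description of how TimescaleDB stores values.

```python
from itertools import accumulate


def delta_encode(values):
    """Store the first value as the base, then each difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(encoded):
    """A running sum over the base and the deltas restores the original values."""
    return list(accumulate(encoded))


readings_tenths = [725, 727, 724]        # 72.5, 72.7, 72.4 scaled by 10
encoded = delta_encode(readings_tenths)  # base followed by small deltas
decoded = delta_decode(encoded)
```

The deltas are much smaller in magnitude than the raw readings, so they can be stored in fewer bits.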
2. Delta-of-Delta Encoding
For timestamps that occur at regular intervals (e.g., every 5 seconds), even the delta is repetitive. Delta-of-delta stores the change in the change.
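Mathematical Representation:
$$\Delta_i = t_i - t_{i-1}, \qquad \Delta^2_i = \Delta_i - \Delta_{i-1} = (t_i - t_{i-1}) - (t_{i-1} - t_{i-2})$$
If the interval is always 5 seconds, the delta $\Delta_i$ is always 5, and the delta-of-delta $\Delta^2_i$ is 0. Storing a zero requires significantly fewer bits than storing a timestamp.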
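A small Python sketch makes the "change in the change" concrete. It assumes timestamps are integer seconds and uses hypothetical function names; the variable-width bit packing that a real encoder applies to the resulting values is left out.

```python
def delta_of_delta_encode(timestamps):
    """Return (first timestamp, first delta, list of delta-of-deltas)."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return timestamps[0], deltas[0], dods


def delta_of_delta_decode(first, first_delta, dods):
    """Rebuild the timestamps by accumulating the deltas, then the values."""
    out = [first, first + first_delta]
    delta = first_delta
    for dd in dods:
        delta += dd
        out.append(out[-1] + delta)
    return out


# Readings every 5 seconds, with one arriving a second late (21 instead of 20).
ts = [0, 5, 10, 15, 21, 26]
first, first_delta, dods = delta_of_delta_encode(ts)
```

With a perfectly regular interval every delta-of-delta is zero; the single late reading shows up as a `+1` followed by a `-1`.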
3. Run-Length Encoding (RLE)
RLE is ideal for columns with low cardinality or high repetition, such as device_id or status. Instead of repeating the same string 1,000 times, it stores the value once and a count of its repetitions.
Data Transformation:
MACHINE_001, MACHINE_001, MACHINE_001, MACHINE_001 → (MACHINE_001, 4)
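The transformation can be sketched in Python as follows. This is a minimal illustration with hypothetical function names; it encodes runs as `(value, count)` pairs and does not model how the engine lays those pairs out on disk.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


column = ["MACHINE_001"] * 4
runs = rle_encode(column)

mixed = ["ok", "ok", "ok", "error", "ok", "ok"]
mixed_runs = rle_encode(mixed)
```

Only adjacent repeats are merged, so the technique pays off most when rows are ordered such that equal values sit next to each other.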
```sql
-- Conceptual representation of a compressed batch
SELECT
    ARRAY['12:00:00', '12:00:05', '12:00:10']::time[] AS "time",
    ROW('MACHINE_001'::text, 3) AS machine_id,  -- RLE: value + count
    ARRAY[72.5, 0.2, -0.3] AS temp_delta;       -- Delta encoding: base, then differences
```
By combining these techniques, TimescaleDB transforms bulky, row-oriented time-series data into a lean, columnar format that slashes storage costs and accelerates analytical performance.