Codex logging bug may write TBs to local SSDs
⚠️ Critical Bug: Codex Logging May Destroy Local SSDs
Reported by:
Issue Reference: openai/codex #28224
Labels: CLI, bug, performance
🚨 The Problem: Excessive Disk I/O
The Codex application is currently suffering from a severe logging bug where it continuously writes massive amounts of data to a local SQLite feedback database. The affected files are located at:
- `~/.codex/logs_2.sqlite`
- `~/.codex/logs_2.sqlite-wal`
- `~/.codex/logs_2.sqlite-shm`
📉 Impact on Hardware Endurance
The volume of data being written is unsustainable for consumer-grade hardware. Based on real-world observation, a machine with 21 days of uptime recorded approximately 37 TB of writes to the primary SSD.
Extrapolating linearly to estimate the annual impact: 37 TB / 21 days ≈ 1.76 TB per day, or roughly 640 TB per year.
Warning: Many consumer SSDs carry a finite TBW (Total Bytes Written) endurance rating. At this rate, the software could completely exhaust the drive's warranted write endurance in less than one year.
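The extrapolation can be checked with a few lines of Python. This is a minimal sketch that assumes the write rate stays constant. The function names are illustrative, and the 600 TB rating in the second call is a hypothetical value chosen only to show the calculation, not a figure from the report.

```python
def annual_writes_tb(observed_tb: float, uptime_days: float) -> float:
    """Linearly extrapolate observed writes to a 365-day year."""
    return observed_tb / uptime_days * 365


def days_to_exhaust(tbw_rating_tb: float, observed_tb: float, uptime_days: float) -> float:
    """Days until a drive's TBW rating is consumed at the observed daily rate."""
    return tbw_rating_tb / (observed_tb / uptime_days)


# Figures from the report: about 37 TB written over 21 days of uptime.
per_year = annual_writes_tb(37, 21)
print(f"{per_year:.0f} TB/year")  # 643 TB/year

# ASSUMPTION: 600 TB is a hypothetical endurance rating, used only for illustration.
remaining_days = days_to_exhaust(600, 37, 21)
print(f"{remaining_days:.0f} days")  # 341 days
```

Substitute the TBW rating from your own drive's datasheet to get a figure that applies to your hardware.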
🔍 Evidence Analysis
Evidence 1: The "Churn" Gap
While the database file size remains relatively small, the internal counters reveal a massive amount of churn (data being written and then deleted).
| Metric | Value |
|---|---|
| Current File Size | 1.2 GiB |
| Currently Retained Rows | 506,149 |
| Total Allocated Row IDs | 5,543,677,486 |
There is a roughly 10,000x discrepancy (about 10,950x) between the rows currently stored and the total number of IDs generated. Assuming past rows were about the same size as those currently retained, this suggests that over 10 TB of data has been cycled through the logs, even before considering write amplification from indexes, page rewrites, and filesystem overhead.
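The estimate can be reproduced from the three values in the table. This sketch rests on one assumption that the report does not verify: that the average on-disk size per row has stayed roughly constant, so the current file size scales with the ratio of allocated to retained rows.

```python
# Values taken from the table above.
retained_rows = 506_149
total_row_ids = 5_543_677_486
file_size_gib = 1.2

# How many times the retained row count has been "turned over".
churn_ratio = total_row_ids / retained_rows

# ASSUMPTION: average bytes per row is constant over the database's lifetime.
cycled_tib = file_size_gib * churn_ratio / 1024

print(f"churn ratio: {churn_ratio:,.0f}x")             # churn ratio: 10,953x
print(f"estimated data cycled: {cycled_tib:.1f} TiB")  # estimated data cycled: 12.8 TiB
```

This is a lower bound on physical writes, since it counts only row payload and ignores index updates and WAL traffic.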
Evidence 2: Log Level Distribution
The bulk of the write volume is driven by low-priority telemetry.
Distribution by Level:
- TRACE: 70.7% (~732.5 MiB)
- INFO: 25.7% (~266.5 MiB)
- DEBUG: 3.0% (~30.6 MiB)
- WARN: 0.6% (~5.9 MiB)
Primary Offenders (Target + Level):
- `codex_api::endpoint::responses_websocket` (TRACE): 527.4 MiB
- `codex_otel.log_only` (INFO): 141.2 MiB
- `codex_otel.trace_safe` (INFO): 121.2 MiB
- `log` (TRACE): 97.4 MiB
- `codex_client::transport` (TRACE): 60.1 MiB
Note: Filtering out TRACE logs and the specific OpenTelemetry INFO categories would eliminate approximately 96% of the log volume.
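The 96% figure follows directly from the numbers listed above. A short check, using only the per-level totals and the two `codex_otel` INFO targets from the report:

```python
# Per-level volume in MiB, from the distribution above.
level_mib = {"TRACE": 732.5, "INFO": 266.5, "DEBUG": 30.6, "WARN": 5.9}

# The two OpenTelemetry INFO targets listed among the primary offenders.
otel_info_mib = {"codex_otel.log_only": 141.2, "codex_otel.trace_safe": 121.2}

total = sum(level_mib.values())
removable = level_mib["TRACE"] + sum(otel_info_mib.values())
share = removable / total

print(f"{removable:.1f} of {total:.1f} MiB = {share:.1%}")  # 994.9 of 1035.5 MiB = 96.1%
```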
📝 Log Sample Analysis
High-Frequency TRACE Logs
These logs capture repetitive system events and WebSocket internals.
// Inotify events (extremely frequent)
mask: OPEN, name: Some("ld.so.cache") 37,982x TRACE log: inotify event: ...
mask: OPEN, name: Some("locale.alias") 23,843x TRACE log: inotify event: ...
mask: OPEN, name: Some("passwd") 3,639x TRACE log: inotify event: ...
// WebSocket/Tokio internals
tokio-tungstenite checkout /src/compat.rs:131 AllowStd.with_context 3,505x TRACE log: ...
tokio-tungstenite checkout /src/lib.rs:245 WebSocketStream.with_context 3,362x TRACE log: ...
tokio-tungstenite checkout /src/compat.rs:154 Read.read 3,356x TRACE log: ...
Dominant INFO Logs
These consist primarily of mirrored OpenTelemetry events.
843x INFO codex_client::custom_ca: using system root certificates...
334x INFO codex_otel.trace_safe: session_loop{thread_id= redacted }:submission_dispatch...
333x INFO codex_otel.log_only: session_loop{thread_id= redacted }:submission_dispatch...
⚡ Write Amplification
The actual disk pressure is higher than the "retained" database size suggests. In a brief 15-second window, the following was observed:
| Metric | Before | After | Delta |
|---|---|---|---|
| Retained Rows | 681,774 | 681,774 | 0 |
| Max Row ID | 5,003,347,015 | 5,003,383,226 | +36,211 |
This shows that the system wrote about 36,000 rows in 15 seconds (roughly 2,400 rows per second) while the retained row count did not change, which means rows are pruned as fast as they are inserted to keep the database size stable.
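The pattern in the table, a constant row count alongside a climbing maximum row ID, is what any insert-then-prune loop produces in SQLite. The following sketch reproduces it with an in-memory database. The table name `logs`, its schema, and the retention limit are hypothetical and are not taken from the Codex database.

```python
import sqlite3

KEEP = 100  # hypothetical retention limit

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the real logs_2.sqlite layout is not described in the report.
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)")


def write_and_prune(conn: sqlite3.Connection, n: int) -> None:
    """Insert n rows, then delete everything except the newest KEEP rows."""
    conn.executemany("INSERT INTO logs (body) VALUES (?)", [("x",)] * n)
    conn.execute(
        "DELETE FROM logs WHERE id <= (SELECT MAX(id) FROM logs) - ?", (KEEP,)
    )
    conn.commit()


def snapshot(conn: sqlite3.Connection) -> tuple:
    """Return (retained row count, max row ID)."""
    return conn.execute("SELECT COUNT(*), MAX(id) FROM logs").fetchone()


write_and_prune(conn, 1000)
before = snapshot(conn)
write_and_prune(conn, 500)
after = snapshot(conn)

print(before, after)  # (100, 1000) (100, 1500)
print("rows written between snapshots:", after[1] - before[1])  # 500
```

Taking two such snapshots a fixed interval apart and dividing the row ID delta by the interval gives the write rate, which is how the 15-second measurement above can be read.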
✅ Proposed Resolution Path
- Disable `TRACE` level logging by default.
- Filter out high-frequency `inotify` and `tokio-tungstenite` events.
- Reduce the verbosity of `codex_otel` mirror logs.
- Implement a more efficient log rotation or sampling strategy to reduce SSD wear.
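The first three items amount to a level-and-target filter applied before records reach the SQLite sink. The sketch below illustrates that logic with Python's standard `logging` module. It is not Codex's implementation: the `VolumeFilter` class, the numeric `TRACE` level, and the choice to drop the `codex_otel` INFO records entirely are assumptions made for illustration.

```python
import logging

# ASSUMPTION: Python has no built-in TRACE level; 5 stands in for "below DEBUG".
TRACE = 5

# Target names from the report's list of primary INFO offenders.
NOISY_INFO_TARGETS = ("codex_otel.log_only", "codex_otel.trace_safe")


class VolumeFilter(logging.Filter):
    """Drop sub-DEBUG records and INFO records from known noisy targets."""

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno < logging.DEBUG:
            return False  # removes TRACE-level noise such as inotify events
        if record.levelno == logging.INFO and record.name.startswith(NOISY_INFO_TARGETS):
            return False  # removes mirrored OpenTelemetry INFO events
        return True


# Attach the filter to whichever handler writes to persistent storage.
handler = logging.StreamHandler()
handler.addFilter(VolumeFilter())
```

Filtering at the handler rather than the logger keeps the verbose records available to other sinks, such as an opt-in debug console, while keeping them off the disk.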