Codex logging bug may write TBs to local SSDs

github.com|354 points|193 comments|by vantareed|Jun 22, 2026

⚠️ Critical Bug: Codex Logging May Destroy Local SSDs

Reported by: @1996fanrui
Issue Reference: openai/codex #28224
Labels: CLI, bug, performance


🚨 The Problem: Excessive Disk I/O

The Codex application is currently suffering from a severe logging bug where it continuously writes massive amounts of data to a local SQLite feedback database. The affected files are located at:

  • ~/.codex/logs_2.sqlite
  • ~/.codex/logs_2.sqlite-wal
  • ~/.codex/logs_2.sqlite-shm

📉 Impact on Hardware Endurance

The volume of data being written is unsustainable for consumer-grade hardware. Based on real-world observation, a machine with 21 days of uptime recorded approximately 37 TB of writes to the primary SSD.

Extrapolating to an annual figure:

$$\text{Annual Write Volume} = \left( \frac{37\text{ TB}}{21\text{ days}} \right) \times 365\text{ days} \approx 643.1\text{ TB/year}$$

Warning: Many consumer SSDs are rated for roughly 600 TBW (terabytes written). At this rate, the software could exhaust the drive's warranted write endurance in less than one year.
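
The extrapolation can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 37 TB and 21 day figures come from the report, the 600 TBW rating is the ballpark quoted above rather than the spec of any particular drive, and the function names are illustrative.

```python
# Figures taken from the report; RATED_TBW is a ballpark, not a real drive spec.
OBSERVED_TB = 37.0
UPTIME_DAYS = 21.0
RATED_TBW = 600.0


def annual_writes_tb(observed_tb: float, days: float) -> float:
    """Linearly extrapolate an observed write volume to one year."""
    return observed_tb / days * 365.0


def days_to_exhaust(rated_tbw: float, observed_tb: float, days: float) -> float:
    """Days until the rated endurance is reached at the observed daily rate."""
    return rated_tbw / (observed_tb / days)


print(f"{annual_writes_tb(OBSERVED_TB, UPTIME_DAYS):.1f} TB/year")
print(f"{days_to_exhaust(RATED_TBW, OBSERVED_TB, UPTIME_DAYS):.0f} days to reach {RATED_TBW:.0f} TBW")
```

The estimate assumes the write rate stays constant and that all 37 TB are attributable to the logging path, which the report suggests but does not isolate.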


🔍 Evidence Analysis

Evidence 1: The "Churn" Gap

While the database file size remains relatively small, the internal counters reveal a massive amount of churn (data being written and then deleted).

| Metric | Value |
|---|---|
| Current File Size | 1.2 GiB |
| Currently Retained Rows | 506,149 |
| Total Allocated Row IDs | 5,543,677,486 |

There is a roughly 11,000x discrepancy between the rows currently stored and the total number of IDs generated. If past rows were similar in size to the current ones (about 2.5 KB each, from 1.2 GiB spread over 506,149 rows), over 10 TB of data has been cycled through the logs, even before considering write amplification from indexes, page rewrites, and filesystem overhead.
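
The churn estimate follows from the three values in the table. The sketch below reproduces it under one loudly stated assumption: that historical rows had the same average on-disk footprint as the rows currently retained (file size divided by retained rows, which also folds in index and free-page overhead). The variable names are illustrative.

```python
# Values from the table above.
FILE_SIZE_BYTES = 1.2 * 1024**3      # 1.2 GiB
RETAINED_ROWS = 506_149
ALLOCATED_ROW_IDS = 5_543_677_486

# How many rows have ever been inserted per row still present.
churn_ratio = ALLOCATED_ROW_IDS / RETAINED_ROWS

# Assumption: every row ever written was about as large as today's average row.
avg_row_bytes = FILE_SIZE_BYTES / RETAINED_ROWS

# Logical bytes cycled through the table, before any write amplification.
estimated_total_tb = ALLOCATED_ROW_IDS * avg_row_bytes / 1e12

print(f"churn ratio: {churn_ratio:,.0f}x")
print(f"average row footprint: {avg_row_bytes:,.0f} bytes")
print(f"estimated data cycled: {estimated_total_tb:.1f} TB")
```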

Evidence 2: Log Level Distribution

The bulk of the write volume is driven by low-priority telemetry.

Distribution by Level:

  • TRACE: 70.7% (~732.5 MiB)
  • INFO: 25.7% (~266.5 MiB)
  • DEBUG: 3.0% (~30.6 MiB)
  • WARN: 0.6% (~5.9 MiB)

Primary Offenders (Target + Level):

  1. codex_api::endpoint::responses_websocket (TRACE) → 527.4 MiB
  2. codex_otel.log_only (INFO) → 141.2 MiB
  3. codex_otel.trace_safe (INFO) → 121.2 MiB
  4. log (TRACE) → 97.4 MiB
  5. codex_client::transport (TRACE) → 60.1 MiB

Note: Filtering out TRACE logs and the specific OpenTelemetry INFO categories would eliminate approximately 96% of the log volume.
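
The 96% figure can be reproduced from the numbers listed above. This is a quick check of the arithmetic, not a measurement: it assumes the four per-level sizes account for the whole retained log volume, and the dictionary names are illustrative.

```python
# Retained volume per level, in MiB, as listed above.
level_mib = {"TRACE": 732.5, "INFO": 266.5, "DEBUG": 30.6, "WARN": 5.9}

# The two OpenTelemetry INFO targets named in the offender list.
otel_info_mib = {"codex_otel.log_only": 141.2, "codex_otel.trace_safe": 121.2}

total_mib = sum(level_mib.values())
removed_mib = level_mib["TRACE"] + sum(otel_info_mib.values())
removed_share = removed_mib / total_mib

print(f"total: {total_mib:.1f} MiB, removable: {removed_mib:.1f} MiB")
print(f"share eliminated: {removed_share:.1%}")
```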


📝 Log Sample Analysis

High-Frequency TRACE Logs

These logs capture repetitive system events and WebSocket internals.

// Inotify events (extremely frequent)
mask: OPEN, name: Some("ld.so.cache") 37,982x TRACE log: inotify event: ...
mask: OPEN, name: Some("locale.alias") 23,843x TRACE log: inotify event: ...
mask: OPEN, name: Some("passwd") 3,639x TRACE log: inotify event: ...

// WebSocket/Tokio internals
tokio-tungstenite checkout /src/compat.rs:131 AllowStd.with_context 3,505x TRACE log: ...
tokio-tungstenite checkout /src/lib.rs:245 WebSocketStream.with_context 3,362x TRACE log: ...
tokio-tungstenite checkout /src/compat.rs:154 Read.read 3,356x TRACE log: ...

Dominant INFO Logs

These consist primarily of mirrored OpenTelemetry events.

843x INFO codex_client::custom_ca: using system root certificates...
334x INFO codex_otel.trace_safe: session_loop{thread_id= redacted }:submission_dispatch...
333x INFO codex_otel.log_only: session_loop{thread_id= redacted }:submission_dispatch...

⚡ Write Amplification

The actual disk pressure is higher than the "retained" database size suggests. In a brief 15-second window, the following was observed:

| Metric | Before | After | Delta |
|---|---|---|---|
| Retained Rows | 681,774 | 681,774 | 0 |
| Max Row ID | 5,003,347,015 | 5,003,383,226 | +36,211 |

In this window the system wrote about 36,000 rows (roughly 2,400 per second) while the retained row count did not change, which shows that rows are pruned about as fast as they are inserted to keep the database size stable.
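
A measurement like the one above could be taken with Python's standard `sqlite3` module by sampling the row count and the highest rowid twice. This is a minimal sketch, not the reporter's method: the table name is not given in the report, so `table` is a hypothetical parameter you would need to fill in after inspecting the database, and the query assumes an ordinary rowid table.

```python
import sqlite3
import time
from pathlib import Path


def sample(conn: sqlite3.Connection, table: str) -> tuple[int, int]:
    """Return (retained_rows, max_rowid) for the given table."""
    # Table names cannot be bound as parameters, so quote it as an identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    count, max_rowid = conn.execute(
        f"SELECT COUNT(*), MAX(rowid) FROM {quoted}"
    ).fetchone()
    return count, max_rowid or 0  # MAX() is NULL on an empty table


def churn_rate(before: tuple[int, int], after: tuple[int, int], seconds: float) -> float:
    """Rows inserted per second, derived from the rowid delta."""
    return (after[1] - before[1]) / seconds


def measure(db_path: str, table: str, window_s: float = 15.0):
    """Sample the database twice, window_s seconds apart."""
    # Open read-only so the measurement cannot insert or delete rows.
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        before = sample(conn, table)
        time.sleep(window_s)
        after = sample(conn, table)
    finally:
        conn.close()
    return before, after, churn_rate(before, after, window_s)
```

For example, `measure("~/.codex/logs_2.sqlite", "<table name>")` would return both samples and the insert rate. A rowid delta much larger than the change in retained rows indicates insert-then-prune churn.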

✅ Proposed Resolution Path

  • Disable TRACE level logging by default.
  • Filter out high-frequency inotify and tokio-tungstenite events.
  • Reduce the verbosity of codex_otel mirror logs.
  • Implement a more efficient log rotation or sampling strategy to reduce SSD wear.