Mistral OCR 4

mistral.ai|281 points|71 comments|by meetpateltech|Jun 23, 2026

Introducing Mistral OCR 4: The New Standard for Document Intelligence

Published: June 23, 2026 | Author: Mistral AI | Read Time: ≈ 10 minutes

Mistral AI is proud to unveil Mistral OCR 4, a state-of-the-art optical character recognition model designed to transform how enterprises handle document intelligence. Unlike traditional tools, OCR 4 doesn't just read text; it understands the geometry and context of a page.

🚀 Core Capabilities & Innovations

The leap from previous versions to OCR 4 represents a shift from simple text conversion to structured document representation.

Old OCR: Page → Plain Text
OCR 4: Page → Structured Data Object

Key Technical Features

  • Bounding Boxes: Precise localization of every text element for highlighting and data mapping.
  • Block Classification: Automatic identification of content types, including:
    • Titles and Headers
    • Tables
    • Mathematical Equations
    • Signatures
  • Confidence Scoring: Inline scores provided at both the word and page level to ensure data integrity.

Deployment & Accessibility

The model is designed for flexibility, supporting various enterprise needs:

  1. Self-Hosted: Runs in a single container, ensuring data sovereignty and compliance.
  2. API Access: Rapid integration for developers.
  3. Mistral Studio: A no-code path via Document AI for non-technical teams.

🛠 Integration and Ecosystem

Mistral OCR 4 serves as a critical ingestion engine for the Search Toolkit, Mistral's open-source composable search framework. This allows for a seamless pipeline from raw document to actionable insight.

Supported Formats & Languages

The model is exceptionally versatile, handling common enterprise files like PDF, DOC, PPT, and OpenDocument.

Feature | Specification
Total Languages | 170
Language Groups | 10
Specialization | High performance on low-resource languages
Deployment | Single-container / Self-managed

📈 Performance & Benchmarks

In rigorous testing, Mistral OCR 4 outperformed the competition. Independent annotators gave it a 72% average win rate against other leading document-AI systems.

  • OlmOCRBench Score: 85.20 (top overall ranking).
  • Efficiency: Significant gains in low-resource languages, where other models typically fail.

"We benchmarked Mistral OCR 4 against the leading agentic document parsers across a chart and figure dense financial QA dataset and reached equivalent accuracy at roughly 8x lower cost and 17x lower latency. For production use cases at scale, that delta compounds fast." — Aidan Donohue, AI Engineer, Rogo


💰 Pricing Structure

Mistral offers a competitive pricing model to support both real-time and high-volume processing.

Cost Calculation: \( \text{Total Cost} = \text{Pages} \times \left( \frac{\text{Rate}}{1000} \right) \)

Tier | Price per 1,000 Pages | Note
Standard API | $4.00 | Real-time processing
Batch API | $2.00 | 50% discount for asynchronous tasks
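
As a worked example of the cost calculation, here is a minimal Python sketch that applies the two listed rates. The function name, the tier keys, and the rounding to whole cents are illustrative assumptions, not part of any Mistral SDK or billing policy.

```python
from decimal import Decimal

# Rates in USD per 1,000 pages, copied from the pricing table.
# The tier keys "standard" and "batch" are illustrative names.
RATES_PER_1000_PAGES = {
    "standard": Decimal("4.00"),
    "batch": Decimal("2.00"),
}


def estimate_cost(pages: int, tier: str = "standard") -> Decimal:
    """Estimate the cost in USD as pages * (rate / 1000), rounded to cents."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    rate = RATES_PER_1000_PAGES[tier]
    return (Decimal(pages) * rate / Decimal(1000)).quantize(Decimal("0.01"))


if __name__ == "__main__":
    print(estimate_cost(250_000, "standard"))  # 1000.00
    print(estimate_cost(250_000, "batch"))     # 500.00
```

At 250,000 pages, the Standard API comes to $1,000.00 and the Batch API to $500.00, which matches the stated 50% discount.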

🎯 Use Case Checklist

How can your organization utilize Mistral OCR 4?

  • RAG Optimization: Use semantic chunking to create higher-quality retrieval units.
  • Agentic Automation: Enable agents to perform form filling and compliance checks.
  • Data Redaction: Use bounding boxes and block types to automatically scrub sensitive info.
  • Human-in-the-Loop: Use confidence scores to flag low-certainty regions for manual review.
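
The last two items can be sketched in a few lines of Python. This is a rough illustration only: the `Block` fields mirror the conceptual output shown in this article, while the 0.90 review threshold and the `"signature"` type label are assumptions chosen for the example, not documented values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One recognized block; field names follow the article's conceptual output."""
    block_id: int
    type: str
    confidence: float
    bounding_box: tuple[float, float, float, float]
    content: str


# Assumed values for illustration; tune them for your own documents.
REVIEW_THRESHOLD = 0.90
SENSITIVE_TYPES = frozenset({"signature"})


def needs_review(blocks, threshold=REVIEW_THRESHOLD):
    """Human-in-the-loop: return blocks whose confidence is below the threshold."""
    return [b for b in blocks if b.confidence < threshold]


def redaction_boxes(blocks, sensitive_types=SENSITIVE_TYPES):
    """Data redaction: return the bounding boxes of blocks with a sensitive type."""
    return [b.bounding_box for b in blocks if b.type in sensitive_types]


if __name__ == "__main__":
    page = [
        Block(1, "title", 0.99, (0, 0, 100, 10), "Annual Report"),
        Block(2, "table", 0.85, (0, 20, 100, 70), "Quarterly Revenue: $4.2M"),
        Block(3, "signature", 0.72, (5, 80, 60, 95), "J. Doe"),
    ]
    print([b.block_id for b in needs_review(page)])
    print(redaction_boxes(page))
```

Keeping the threshold and the sensitive-type set as parameters lets a reviewer tighten or loosen either policy without touching the filtering logic.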

Example Structured Output

Below is a conceptual representation of how the model returns data:

{
  "block_id": 102,
  "type": "table",
  "confidence": 0.98,
  "bounding_box": [120, 45, 500, 300],
  "content": "Quarterly Revenue: $4.2M"
}
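
A consumer of output like this might validate each block before using it. The sketch below parses the conceptual object above with Python's standard `json` module. The required-field table, the 0 to 1 confidence range, and the four-number bounding box are assumptions inferred from this single example, not a published schema, and the coordinate order of the box is deliberately left uninterpreted.

```python
import json

# The conceptual block from the article, as a JSON string.
RAW_BLOCK = """
{
  "block_id": 102,
  "type": "table",
  "confidence": 0.98,
  "bounding_box": [120, 45, 500, 300],
  "content": "Quarterly Revenue: $4.2M"
}
"""

# Assumed schema, inferred from the example above (hypothetical, not official).
REQUIRED_FIELDS = {
    "block_id": int,
    "type": str,
    "confidence": (int, float),
    "bounding_box": list,
    "content": str,
}


def validate_block(block: dict) -> dict:
    """Check field presence and types, then basic value constraints."""
    for key, expected in REQUIRED_FIELDS.items():
        if key not in block:
            raise KeyError(f"missing field: {key}")
        if not isinstance(block[key], expected):
            raise TypeError(f"unexpected type for field: {key}")
    if not 0.0 <= block["confidence"] <= 1.0:
        raise ValueError("confidence is assumed to lie in [0, 1]")
    if len(block["bounding_box"]) != 4:
        raise ValueError("bounding_box is assumed to hold four numbers")
    return block


if __name__ == "__main__":
    block = validate_block(json.loads(RAW_BLOCK))
    print(block["type"], block["confidence"])
```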

[Figure: Document Intelligence Concept]

For more information on integrating Mistral OCR 4 into your pipeline, visit the API Reference or contact sales for enterprise deployment options.