GPT‑NL: a sovereign language model for the Netherlands
As language-based AI becomes a cornerstone of education, public administration, and the modern workplace, the need for a localized approach has become evident. Tools like ChatGPT show how much such systems can do for productivity and innovation, but they also raise critical questions about transparency, copyright, and privacy protection.
To address these concerns, TNO, in partnership with SURF and the Netherlands Forensic Institute (NFI), is developing GPT‑NL. This project aims to create an independent Dutch language ecosystem, bolstering the digital autonomy of both the Netherlands and the broader European region.
"GPT‑NL demonstrates that high-performance AI can coexist with a steadfast commitment to public values, ensuring technology serves the people rather than just extracting data."
🏛️ The Core Pillars of GPT‑NL
The project is guided by four fundamental values designed to ensure the model is responsible and contextually aware.
| Value | Primary Objective | Key Outcome |
|---|---|---|
| Sovereign | Localized control | Independence from non-EU providers |
| Transparent | Open documentation | Public insight into data and bias |
| Trustworthy | Data integrity | Zero "inherited" copyright risks |
| Reciprocal | Fair exchange | Shared value with data providers |
1. Sovereignty: Digital Independence
By developing GPT‑NL within European borders, the consortium maintains total authority over the model's architecture and the data it consumes. This prevents dependency on foreign tech giants and ensures the AI aligns with European laws and societal norms.
2. Openness and Transparency
Transparency is baked into the process. The team documents every decision regarding training and data collection.
- Open Source: The source code is made public.
- Dataset Insights: Detailed information about the training data is shared.
- User Control: Clear mechanisms are in place for data opt-outs and updates.
3. Trustworthiness: A Clean Slate
Unlike many models that are fine-tuned from an existing pretrained model, GPT‑NL is trained from scratch. Because the project selects and vets every source in the training set, the model avoids inheriting "black box" data or hidden personal information from an upstream model.
Data Collection Checklist:
- Protect intellectual property rights.
- Anonymize and remove personal identifiers.
- Filter out confidential information.
- Scrub harmful or toxic content.
- Eliminate redundant data to ensure efficiency.
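Three of the checklist steps (anonymizing, filtering confidential text, and removing duplicates) can be illustrated with a minimal Python sketch. This is not the GPT‑NL pipeline: the function names, the e-mail pattern, and the confidentiality markers are all hypothetical, and real personal-data detection, rights clearance, and toxicity filtering need far more than this.

```python
import hashlib
import re

# Hypothetical, deliberately narrow pattern: real PII detection covers far more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical markers used here to flag confidential documents.
CONFIDENTIAL_MARKERS = ("vertrouwelijk", "confidential")


def anonymize(text: str) -> str:
    """Replace e-mail addresses with a placeholder token."""
    return EMAIL_RE.sub("[EMAIL]", text)


def is_confidential(text: str) -> bool:
    """Return True if the text contains one of the assumed markers."""
    lowered = text.lower()
    return any(marker in lowered for marker in CONFIDENTIAL_MARKERS)


def clean_corpus(documents: list[str]) -> list[str]:
    """Drop confidential documents, anonymize the rest, remove duplicates."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for doc in documents:
        if is_confidential(doc):
            continue
        doc = anonymize(doc)
        # Normalize whitespace and case so near-identical copies hash equally.
        normalized = " ".join(doc.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        cleaned.append(doc)
    return cleaned


if __name__ == "__main__":
    sample = [
        "Mail jan@example.nl voor info.",
        "Mail  jan@example.nl voor info.",  # duplicate apart from spacing
        "Dit is VERTROUWELIJK.",
        "Gewone tekst.",
    ]
    print(clean_corpus(sample))
    # Output: ['Mail [EMAIL] voor info.', 'Gewone tekst.']
```

Hashing a normalized copy only catches exact and whitespace-level duplicates. Near-duplicate detection would need a different technique, which this sketch does not attempt.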
4. Reciprocity: Fair Value Distribution
GPT‑NL rejects the "extractive" model of AI. Instead, it utilizes a lawful supply chain where data providers are active partners. This is managed via a Content Board, giving rights holders a direct voice in the model's evolution.
⚙️ Technical Execution & Sustainability
Developing a Large Language Model (LLM) is resource-intensive. The team focuses on optimizing the balance between performance and environmental impact.
Resource Optimization
The goal is to minimize the environmental footprint of training. To that end, the team draws on its NLP (Natural Language Processing) and ASR (Automatic Speech Recognition) research to optimize model size and the number of training cycles.
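    # Conceptual representation of the GPT-NL transparency goal
    from types import SimpleNamespace

    def model_governance(dataset, source_code):
        """Label a model by whether its data and its code are both open."""
        if dataset.is_transparent and source_code.is_open_source:
            return "Sovereign AI"
        return "Proprietary AI"

    # Illustrative placeholder objects, not real GPT-NL artifacts
    gpt_nl_data = SimpleNamespace(is_transparent=True)
    gpt_nl_code = SimpleNamespace(is_open_source=True)

    print(model_governance(gpt_nl_data, gpt_nl_code))
    # Output: Sovereign AI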
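To make the trade-off between model size and training effort more concrete, a rough back-of-the-envelope sketch may help. It uses the widely cited rule of thumb that training a dense transformer costs about 6 × parameters × tokens floating-point operations. Every number below (model sizes, token counts, hardware throughput, utilization) is a hypothetical placeholder, not a GPT‑NL figure.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute using the 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


def gpu_hours(flops: float, peak_flops_per_s: float, utilization: float) -> float:
    """Convert total FLOPs to accelerator-hours at an assumed utilization."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return flops / (peak_flops_per_s * utilization) / 3600.0


if __name__ == "__main__":
    # Hypothetical configurations: (label, parameters, training tokens).
    configs = [
        ("smaller model", 7e9, 1e12),
        ("larger model", 30e9, 1e12),
    ]
    # Assumed hardware: 1e15 FLOP/s peak per accelerator at 40% utilization.
    for label, n_params, n_tokens in configs:
        flops = training_flops(n_params, n_tokens)
        hours = gpu_hours(flops, peak_flops_per_s=1e15, utilization=0.4)
        print(f"{label}: {flops:.2e} FLOPs, about {hours:,.0f} accelerator-hours")
```

Under this approximation, compute scales linearly with both parameter count and token count, so shrinking either one reduces the estimated training cost proportionally.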
[Figure: Ecosystem Flow]
💰 Funding and Accountability
The project is publicly funded by the Netherlands Enterprise Agency (RVO) on behalf of the Ministry of Economic Affairs and Climate Policy. This ensures that the resulting technology remains a public good, accountable to the citizens it serves.
👥 Behind the Scenes
Saskia Lensink (Product Manager) and Frank Brinkkemper (R&D Manager) are leading the charge into the next phase of development.
About Saskia Lensink:
Saskia is a Consultant and Business Developer specializing in NLP and ASR. She works across various consortia to champion high-performing, sovereign European LLMs.
- Location: Den Haag - New Babylon
- Focus: Language and speech technologies.
Listen In: Saskia discusses the challenges of building a world-class LLM without a "Silicon Valley budget" in the latest Media Innovation Podcast.
🚀 Related AI Insights & Events
The broader ecosystem continues to evolve with a focus on "Futureproof AI."
- Event: Impact Acceleration Challenge: Futureproof AI (June 18, WTC Amsterdam) — Pitching sustainable and sovereign AI solutions.
- Insight: The Challenge of Evaluating Generative AI (Feb 2026) — Measuring shifting targets.
- Insight: Critical Thinking in GenAI (Jan 2026) — Balancing skepticism with trust.
- Insight: GenAI Governance (Dec 2025) — Moving from reactive to proactive control.