Munich 1991: The Roots of the Current AI Boom
Introduction by David Ha
When we observe the staggering scale of the contemporary Artificial Intelligence explosion, it is easy to overlook that the bedrock of this trillion-dollar sector was established over three decades ago in Munich.
While the tech giants of today are pouring hundreds of billions into the expansion of Large Language Models (LLMs) like ChatGPT, the broader public, and even many in the ML community, often forget that the primary architectural components of these systems were developed within a few months in 1991.
All of these breakthroughs originated from a single laboratory at the Technical University Munich, under the leadership of Jürgen Schmidhuber. By the end of that year, his team had effectively charted the course for modern deep learning.
The 1991 Blueprint for Modern AI
The lab's output during this period provided the essential ingredients for today's success:
- The "T" (Transformer): They created the first variant of the Transformer.
- The "P" (Pre-training): They pioneered the concept of unsupervised pre-training.
- Distillation: They introduced neural network distillation.
- Residual Learning: They developed deep residual learning, which is the heart of both LSTMs and ResNets (the most cited AI works of the 20th and 21st centuries, respectively).
- Generative AI: They laid the groundwork for Generative Adversarial Networks (GANs).
These concepts have profoundly influenced my own career, from my tenure at Google Brain to the current research on Recursive Self-Improvement (RSI) at Sakana AI. I am particularly proud of my 2018 work on World Models, which was a direct evolution of the 1990s Munich research.
The 1991 Timeline: Annotated by Jürgen Schmidhuber
I take great pride in the milestones my team achieved in my hometown during an era when compute was millions of times more expensive than it is today.
Key Publications (March – August 1991)
| Date | Innovation | Modern Application / Impact |
|---|---|---|
| March 26 | Unnormalized Linear Transformer | Predecessor to the quadratic Transformer; highly efficient. |
| April 30 | Unsupervised Pre-Training | The "P" in ChatGPT; essential for LLM initialization. |
| April 30 | NN Distillation | Central to the 2025 DeepSeek "Sputnik" and other LLMs. |
| June 15 | Deep Residual Learning | Core of LSTMs and the most cited 21st-century AI papers. |
| August 31 | Generative & Adversarial Nets | Foundation for World Models and (controversially) deepfakes. |
Technical Nuances
The unnormalized linear Transformer remains relevant today because of its efficiency. Standard Transformers scale quadratically with input length, whereas this version scales linearly.
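The difference in scaling can be illustrated with a minimal sketch. It is not taken from the 1991 paper; it assumes one common way of writing causal attention without softmax, and the function names and toy data are hypothetical. The first function compares every query with every earlier key, so its cost grows quadratically with sequence length. The second keeps a fixed-size "fast weight" matrix that is updated with one outer product per step, so its cost grows linearly while producing the same outputs.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def quadratic_attention(queries, keys, values):
    """Causal attention without softmax: each query is scored against all earlier keys."""
    outputs = []
    for t, q in enumerate(queries):
        out = [0.0] * len(values[0])
        for s in range(t + 1):              # loop over the past: O(T^2) overall
            score = dot(q, keys[s])
            for i, v in enumerate(values[s]):
                out[i] += score * v
        outputs.append(out)
    return outputs


def linear_attention(queries, keys, values):
    """Same outputs, using a running fast-weight matrix W of fixed size d_v x d_k."""
    d_k, d_v = len(keys[0]), len(values[0])
    W = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        for i in range(d_v):                # W += outer(v, k)
            for j in range(d_k):
                W[i][j] += v[i] * k[j]
        outputs.append([dot(row, q) for row in W])  # y = W q
    return outputs


# Toy data (hypothetical): three time steps, 2-dimensional keys, 1-dimensional values.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
values = [[1.0], [2.0], [3.0]]

slow = quadratic_attention(queries, keys, values)
fast = linear_attention(queries, keys, values)
```

Both versions compute the sum over past steps of `values[s] * dot(keys[s], queries[t])`; the linear version simply regroups that sum so the past is summarized in `W` instead of being revisited at every step.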
Regarding Deep Residual Learning, this concept (detailed in Sepp Hochreiter's diploma thesis) enabled the creation of Highway Nets, which were 10 times deeper than any previous feedforward networks. This architecture is now ubiquitous in virtually every LLM.
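As a rough illustration of the idea, the sketch below contrasts a plain residual connection with a gated, highway-style connection. It is a simplification under stated assumptions rather than the original formulation: the gate values are passed in directly as `gate_logits` instead of being computed from the input by a learned layer, and `small_transform` is a hypothetical stand-in for a trained transformation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def residual_layer(x, transform):
    """y = x + F(x): the input is carried forward and F only adds a correction."""
    return [xi + fi for xi, fi in zip(x, transform(x))]


def highway_layer(x, transform, gate_logits):
    """y = g * F(x) + (1 - g) * x, with one gate g in (0, 1) per unit.

    Assumption: gates are supplied as fixed logits; in a trained network
    they would themselves be a learned function of x.
    """
    fx = transform(x)
    gates = [sigmoid(z) for z in gate_logits]
    return [g * f + (1.0 - g) * xi for g, f, xi in zip(gates, fx, x)]


def small_transform(x):
    # Hypothetical stand-in for a learned layer.
    return [0.5 * math.tanh(xi) for xi in x]


def zero_transform(x):
    return [0.0] * len(x)


x = [1.0, -2.0, 0.5]

# With F(x) = 0, a stack of 100 residual layers passes the input through unchanged.
h = x
for _ in range(100):
    h = residual_layer(h, zero_transform)

# A nearly closed gate (large negative logit) makes the highway layer copy its input;
# a nearly open gate (large positive logit) makes it return F(x).
carried = highway_layer(x, small_transform, [-50.0] * len(x))
transformed = highway_layer(x, small_transform, [50.0] * len(x))
```

The point of the skip path is that a layer can default to passing its input along, which is what makes very deep stacks trainable in practice.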
The Path to AGI
By 1991, it was already evident that LLM-style networks alone would not suffice to reach Artificial General Intelligence (AGI). To bridge this gap, we focused on:
- Planning with adaptive world models
- Artificial Scientists (creating new questions, not just answering them)
- Meta-learning
- Recursive self-improvement (started in 1987)
Broader Context and Reflections
During this same period, Munich was also the birthplace of the first autonomous vehicles in real traffic, developed by Ernst Dickmanns' team, which reached speeds of 175 km/h.
However, over the last 30 years, the commercial center of gravity for AI has shifted toward the Pacific Rim, moving away from its European roots. To put the economic shift in perspective:
In 1995, the combined GDP of Germany and Japan was nearly equal (1:1) to the combined GDP of the USA and China.
Final Thoughts
I still hold onto my teenage vision from the 1970s: to build an entity significantly more intelligent than myself, allowing me to finally retire.
Acknowledgments: Thanks to the expert reviewers. This content is available for non-commercial and educational use (e.g., Wikipedia).
License:
This work is licensed under a CC BY-NC-SA 4.0 International License.
Additional Note on Artificial Scientists: These systems achieve curiosity through the principle of generative adversa... Perhaps self-replicating AI-driven all-purpose robots are the ultimate solution.