Munich 1991: The Roots of the Current AI Boom

Introduction by David Ha

When we observe the staggering scale of the contemporary Artificial Intelligence explosion, it is easy to overlook that the bedrock of this trillion-dollar sector was established over three decades ago in Munich.

While the tech giants of today are pouring hundreds of billions into the expansion of Large Language Models (LLMs) like ChatGPT, the broader public—and even many in the ML community—often forget that the primary architectural components of these systems were developed within a few months in 1991.

All of these breakthroughs originated from a single laboratory at the Technical University of Munich, under the leadership of Jürgen Schmidhuber. By the end of that year, his team had effectively charted the course for modern deep learning.

The 1991 Blueprint for Modern AI

The lab's output during this period provided the essential ingredients for today's success:

The "T" (Transformer): They created the first variant of the Transformer.
The "P" (Pre-training): They pioneered the concept of unsupervised pre-training.
Distillation: They introduced neural network distillation.
Residual Learning: They developed deep residual learning, which is the heart of both LSTMs and ResNets (the most cited AI works of the 20th and 21st centuries, respectively).
Generative AI: They laid the groundwork for Generative Adversarial Networks (GANs).

These concepts have profoundly influenced my own career, from my tenure at Google Brain to the current research on Recursive Self-Improvement (RSI) at Sakana AI. I am particularly proud of my 2018 work on World Models, which was a direct evolution of the 1990s Munich research.

The 1991 Timeline: Annotated by Jürgen Schmidhuber

I take great pride in the milestones my team achieved in my hometown during an era when compute was millions of times more expensive than it is today.

Key Publications (March – August 1991)

Date	Innovation	Modern Application / Impact
March 26	Unnormalized Linear Transformer	Predecessor to the quadratic Transformer; highly efficient.
April 30	Unsupervised Pre-Training	The "P" in ChatGPT; essential for LLM initialization.
April 30	NN Distillation	Central to the 2025 DeepSeek "Sputnik" and other LLMs.
June 15	Deep Residual Learning	Core of LSTMs and the most cited 21st-century AI papers.
August 31	Generative & Adversarial Nets	Foundation for World Models and (controversially) deepfakes.

Technical Nuances

The unnormalized linear Transformer remains relevant today due to its efficiency. For an input of length $n$, standard Transformers scale quadratically, at a cost of $O(n^2)$, whereas this version scales linearly, at $O(n)$.
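
To make the scaling difference concrete, here is a minimal sketch of unnormalized linear attention in its outer-product ("fast weight") form, next to the pairwise form that compares every query with all earlier keys. It is an illustration under simplifying assumptions, not a reproduction of the 1991 system: the function names, the causal ordering, and the toy vectors are all hypothetical, and no softmax or learned projections are included.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def linear_attention(keys, values, queries):
    """Unnormalized linear attention with a fixed-size running state.

    The state W accumulates outer products v k^T, so each step costs
    O(d_k * d_v) regardless of how many steps came before: O(n) overall.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    W = [[0.0] * d_k for _ in range(d_v)]  # "fast weight" matrix (assumed name)
    outputs = []
    for k, v, q in zip(keys, values, queries):
        # Outer-product update: W += v k^T
        for i in range(d_v):
            for j in range(d_k):
                W[i][j] += v[i] * k[j]
        # Read-out: y = W q
        outputs.append([dot(row, q) for row in W])
    return outputs


def pairwise_attention(keys, values, queries):
    """Same quantity, computed by revisiting all earlier keys: O(n^2) overall."""
    outputs = []
    for t, q in enumerate(queries):
        y = [0.0] * len(values[0])
        for s in range(t + 1):
            score = dot(keys[s], q)  # unnormalized score, no softmax
            for i in range(len(y)):
                y[i] += score * values[s][i]
        outputs.append(y)
    return outputs


# Toy data, chosen only for illustration.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

print(linear_attention(keys, values, queries))
# [[1.0, 2.0], [3.0, 4.0], [14.0, 18.0]]
print(pairwise_attention(keys, values, queries) == linear_attention(keys, values, queries))
# True
```

Both functions return the same outputs. The first keeps only the fixed-size matrix W and does a constant amount of work per step, so its cost grows as $O(n)$; the second revisits every earlier position for each query, so its cost grows as $O(n^2)$.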

Regarding Deep Residual Learning, this concept (detailed in Sepp Hochreiter's diploma thesis) enabled the creation of Highway Nets, which were 10 times deeper than any previous feedforward network. Residual connections are now found in virtually every LLM.
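
The idea behind residual connections can be sketched in a few lines. The block below assumes the commonly cited forms y = x + f(x) for a residual layer and y = t * h(x) + (1 - t) * x for a gated highway layer. The function names and toy inputs are hypothetical, chosen for illustration only, and are not taken from any original implementation.

```python
import math


def sigmoid(z):
    """Logistic function, used here as the highway gate nonlinearity."""
    return 1.0 / (1.0 + math.exp(-z))


def residual_layer(x, f):
    """y = x + f(x): the identity path carries x through unchanged."""
    return [xi + fi for xi, fi in zip(x, f(x))]


def highway_layer(x, h, gate):
    """y = t * h(x) + (1 - t) * x, with an elementwise gate t in (0, 1)."""
    t = [sigmoid(g) for g in gate(x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h(x), x)]


x = [1.0, -2.0]

# A transform that contributes nothing: 100 stacked residual layers
# then act as the identity, so the input survives the whole stack.
zero = lambda v: [0.0 for _ in v]
y = x
for _ in range(100):
    y = residual_layer(y, zero)
print(y)  # [1.0, -2.0]

# A half-open gate (pre-activation 0 gives t = 0.5) mixes the
# transformed signal and the untouched input in equal parts.
h = lambda v: [3.0, 4.0]
half_open = lambda v: [0.0 for _ in v]
print(highway_layer(x, h, half_open))  # [2.0, 1.0]
```

Because the identity path passes x through whenever the transform contributes little, signals can cross many layers without being distorted, which is the property that makes very deep stacks practical to train.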

The Path to AGI

By 1991, it was already evident that LLM-style networks alone would not suffice to reach Artificial General Intelligence (AGI). To bridge this gap, we focused on:

Planning with adaptive world models
Artificial Scientists (creating new questions, not just answering them)
Meta-learning
Recursive self-improvement (started in 1987)

Broader Context and Reflections

During this same period, Munich was also the birthplace of the first autonomous vehicles in real traffic, developed by Ernst Dickmanns' team, whose cars reached speeds of 175 km/h.

However, over the last 30 years, the commercial center of gravity for AI has shifted toward the Pacific Rim, moving away from its European roots. To put the economic shift in perspective:

In 1995, the combined GDP of Germany and Japan was nearly equal (1:1) to the combined GDP of the USA and China.

Final Thoughts

I still hold onto my teenage vision from the 1970s: to build an entity significantly more intelligent than myself, allowing me to finally retire.

Acknowledgments: Thanks to the expert reviewers. This content is available for non-commercial and educational use (e.g., Wikipedia).

License: This work is licensed under a CC BY-NC-SA 4.0 International License.

Additional Note on Artificial Scientists: These systems achieve curiosity through the principle of generative adversa... (text cuts off). Perhaps self-replicating AI-driven all-purpose robots are the ultimate solution.