OpenAI Debuts "Jalapeño": A Custom Inference Processor via Broadcom

The CEOs of OpenAI and Broadcom stand flanking a circular semiconductor wafer.

On Wednesday, June 24, 2026, OpenAI officially revealed its first bespoke inference processor. Developed in a strategic alliance with Broadcom, the hardware is designed to meet the highly specific demands of OpenAI's inference infrastructure.

The new silicon, dubbed Jalapeño, represents a significant milestone in the company's vertical integration. Interestingly, OpenAI disclosed that its own artificial intelligence models were utilized to help engineer the chip.

Performance and Strategic Goals

While the hardware is still in testing, initial data suggests a major leap in efficiency, measured as performance per watt:

$\text{Efficiency} = \frac{\text{Performance}}{\text{Power (watts)}}$

This puts Jalapeño ahead of current state-of-the-art alternatives in terms of power consumption relative to output.
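To make the metric concrete, here is a minimal sketch of how a performance-per-watt comparison works. The throughput and power figures are hypothetical placeholders chosen for easy arithmetic; they are not measurements of Jalapeño or of any other chip, and the function name `perf_per_watt` is illustrative only.

```python
def perf_per_watt(throughput_tokens_per_s: float, power_watts: float) -> float:
    """Return efficiency as throughput divided by power draw."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return throughput_tokens_per_s / power_watts


# Hypothetical figures for illustration only; not real benchmark data.
baseline = perf_per_watt(throughput_tokens_per_s=1000.0, power_watts=500.0)
candidate = perf_per_watt(throughput_tokens_per_s=1200.0, power_watts=400.0)

# Ratio of the two efficiencies: values above 1 favor the candidate chip.
relative_gain = candidate / baseline

print(f"baseline:  {baseline:.2f} tokens/s per watt")
print(f"candidate: {candidate:.2f} tokens/s per watt")
print(f"relative gain: {relative_gain:.2f}x")
```

The point of the ratio is that a chip can win on efficiency either by doing more work at the same power or by doing the same work at lower power; in this made-up case it does a bit of both.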

| Feature | Detail |
| --- | --- |
| Chip Name | `Jalapeño` |
| Primary Partner | Broadcom |
| Primary Function | AI Inference |
| Key Advantage | Performance-per-watt |
| Design Method | AI-assisted development |

The partnership, first teased in October, is a calculated move to diversify OpenAI's hardware supply chain and decrease its dependence on Nvidia. This mirrors strategies already employed by industry giants like Amazon and Google, which have developed their own AI accelerators to optimize machine learning workloads.

The Philosophy of "The Stack"

OpenAI President Greg Brockman discussed the rationale behind this venture on the company's internal podcast. He emphasized that their intimate knowledge of their own workloads allowed them to identify gaps in existing hardware.

"We have a deep understanding of the workload," Brockman noted. "We've really been looking for specific workloads that are underserved, [and asking] how can we build something that will be able to accelerate what's possible?"

Jalapeño is tuned specifically for inference—the phase where a trained model generates a response to a user's prompt—rather than the initial training phase. OpenAI highlighted that the chip is particularly cost-effective when powering real-time coding models.

Infrastructure Integration

OpenAI is not just building a chip; it is optimizing every layer of its operational environment, with the goal of making AI faster and more affordable.

As the company stated in its announcement:

"Because OpenAI operates across the stack, each layer can be optimized around the same goal: making its models faster, more reliable, and more affordable for users."

While heavy-duty pre-training will likely continue to utilize Nvidia's hardware for the foreseeable future, optimizing the inference side of the equation is a critical step in improving the overall economics of generative AI.

About the Author: Russell Brandom is the AI Editor at TechCrunch, specializing in emerging tech and platform policy since 2012.
