
OpenAI Partners With Cerebras to Launch Ultra-Fast Coding Model Without Nvidia

OpenAI has made a major move in the AI development space by unveiling GPT-5.3 Codex Spark, its first code-focused model designed to run on non-Nvidia hardware. Instead of relying on traditional GPU acceleration, the new model is powered by chips from Cerebras, marking a significant shift in OpenAI’s infrastructure strategy.

The headline feature of Codex Spark is speed. The model can generate code at up to 1,000 tokens per second, roughly 15 times faster than OpenAI’s previous coding-focused models.

A New Benchmark for Coding Speed

This level of throughput places Codex Spark far ahead of many competitors. Even highly optimized models such as Anthropic’s Claude Opus 4.6 reportedly reach only about 2.5 times standard generation speed. For developers, this translates into near-instant feedback when generating, refactoring, or prototyping code.

The performance improvement is especially notable compared with OpenAI’s earlier Nvidia-based models such as GPT-4o or o3-mini, which struggled to exceed 200 tokens per second in real-world use.
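For a rough sense of what that throughput gap means in practice, here is a back-of-the-envelope sketch. The tokens-per-line figure and file size are illustrative assumptions, not OpenAI numbers; real token counts depend on the tokenizer and language.

```python
# Back-of-the-envelope comparison of generation times at the
# throughput figures quoted above. Constants are assumptions
# for illustration only.
TOKENS_PER_LINE = 10   # rough assumption; varies by language and style
LINES = 500            # hypothetical file size


def generation_time(tokens_per_second: float) -> float:
    """Seconds to emit LINES lines of code at a given throughput."""
    return LINES * TOKENS_PER_LINE / tokens_per_second


spark = generation_time(1_000)  # Codex Spark's reported peak
older = generation_time(200)    # older models' reported ceiling

print(f"Codex Spark: {spark:.0f} s, older models: {older:.0f} s")
# → Codex Spark: 5 s, older models: 25 s
```

At these assumed figures, a 500-line file drops from about 25 seconds to about 5 seconds, which is the difference between waiting and iterating interactively.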

Designed for Speed Over Depth

Unlike the full GPT-5.3 Codex, which is built for complex reasoning and difficult programming tasks, Codex Spark is intentionally optimized for raw speed. OpenAI positions it as a specialist model rather than a general-purpose AI.

Internal testing on benchmarks such as SWE-Bench Pro and Terminal-Bench 2.0 shows Codex Spark significantly outperforming the older GPT-5.1 Codex mini, often completing tasks in a fraction of the time.

At launch, the model supports text-only input and output, with a maximum context window of 128,000 tokens, making it suitable for large codebases and extended sessions.
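To put the 128,000-token window in concrete terms, a quick estimate follows. The tokens-per-line figure is an assumption for illustration; actual counts vary with the tokenizer and the language being written.

```python
# Rough estimate of how much source code fits in the context window.
CONTEXT_WINDOW = 128_000
TOKENS_PER_LINE = 10  # assumed average; depends on tokenizer and language

lines_that_fit = CONTEXT_WINDOW // TOKENS_PER_LINE
print(f"~{lines_that_fit:,} lines of code")  # → ~12,800 lines of code
```

Under these assumptions, the window holds on the order of ten thousand lines of code, enough for a sizable module plus conversation history.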

Powered by Cerebras Wafer Scale Hardware

The speed gains are driven by Cerebras’ Wafer Scale Engine 3, a processor roughly the size of a dinner plate. This architecture allows massive parallelism and minimizes data transfer bottlenecks that traditionally limit inference speed.

According to OpenAI’s Head of Compute Sachin Katti, Cerebras has proven to be an exceptional engineering partner. He noted that this level of high-speed inference opens up new possibilities for developer tools and real-time coding workflows.

Availability and Pricing

Codex Spark is currently available to ChatGPT Pro subscribers at $200 per month and can be accessed through the Codex application as well as a dedicated VS Code extension. This positioning clearly targets professional developers and teams that value speed and iteration efficiency.

Reducing Dependence on Nvidia

Beyond performance, this launch signals a strategic shift for OpenAI. By deploying production models on Cerebras hardware, the company is actively reducing its reliance on Nvidia. This follows earlier moves involving partnerships with AMD and Amazon, as well as reports that OpenAI is designing its own custom chips for future manufacturing through TSMC.

Industry observers interpret this as a response to both supply constraints and dissatisfaction with inference speed on some Nvidia platforms.

A Heated Race in AI Coding Tools

The AI coding space is becoming increasingly competitive, with companies like Google and Anthropic accelerating development of their own tools. Codex Spark’s 1,000-tokens-per-second throughput sets a new bar and could change how developers interact with AI-assisted coding systems.

However, OpenAI also cautions that extreme speed may come with trade-offs. While responsiveness improves dramatically, developers are still encouraged to review outputs carefully for accuracy before deploying code to production environments.

Origin: Ars Technica
