
GPT-5.5 and Codex: What the Hype Missed and What Actually Matters

Apr 25, 2026 · Dishant Sharma · 6 min read

An NVIDIA engineer who tested GPT-5.5 early access said losing it felt like losing a limb. Dan Shipper, CEO of Every, called it the first coding model with "serious conceptual clarity." Pietro Schirano watched it merge hundreds of frontend changes into a main branch that had also shifted substantially, finishing in one shot in about 20 minutes. These are not casual impressions. These are people who build with AI daily, and they sound genuinely surprised.

GPT-5.5 dropped on April 23, 2026. Codex got a massive desktop update a week earlier, on April 16. OpenAI shipped both in the same month, and the internet lost its mind for about 48 hours. Then the complaints started.


What GPT-5.5 actually is

Here is the short version. GPT-5.5 is the first fully retrained base model OpenAI has shipped since GPT-4.5. Everything between 4.5 and 5.5 was incremental. This one is a new foundation.

The benchmarks look strong. 82.7% on Terminal-Bench 2.0. 78.7% on OSWorld-Verified. 58.6% on SWE-Bench Pro. 73.1% on Expert-SWE, an internal eval where the median human completion time is 20 hours.

But benchmarks lie, so here is what matters more. GPT-5.5 uses 40% fewer tokens than GPT-5.4 on the same Codex tasks. It matches GPT-5.4 per-token latency despite being a bigger, smarter model. That is not nothing. That is the kind of efficiency improvement that actually changes how you work with it.

The model helped build the infrastructure that serves it. OpenAI used Codex and GPT-5.5 to optimize their own inference stack on NVIDIA GB200 and GB300 systems. One result was custom load-balancing heuristics that boosted token generation speed by over 20%. That is a weird, recursive detail that most coverage skipped.
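OpenAI has not published what those load-balancing heuristics actually are, so the sketch below is purely a generic illustration of the category: least-outstanding-work routing, where each request goes to the replica with the fewest tokens already queued. The replica names are made up; none of this reflects OpenAI's real inference stack.

```python
# A minimal load-balancing heuristic: route each request to the replica
# with the least queued work. Generic illustration only, NOT OpenAI's
# actual (unpublished) heuristic. Replica names are hypothetical.
import heapq


class LeastLoadBalancer:
    """Route each request to the replica with the fewest queued tokens."""

    def __init__(self, replicas):
        # Min-heap of (queued_tokens, replica_id) pairs.
        self.heap = [(0, r) for r in replicas]
        heapq.heapify(self.heap)

    def route(self, request_tokens: int) -> str:
        # Pop the least-loaded replica, charge it this request's tokens,
        # and push it back so future requests see the updated load.
        load, replica = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (load + request_tokens, replica))
        return replica


lb = LeastLoadBalancer(["gb200-a", "gb200-b", "gb300-a"])
print([lb.route(n) for n in (8_000, 2_000, 4_000)])
# → ['gb200-a', 'gb200-b', 'gb300-a']
```

Even a heuristic this simple can beat round-robin when request sizes vary wildly, which is presumably the kind of gap a model-assisted tuning pass would find and exploit.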


The Codex desktop update

A week before GPT-5.5, OpenAI shipped a major Codex update that barely got attention because everyone was waiting for the model. That update matters more than people realize.

Here is what changed:

  • Background computer use: Codex agents can now see your screen, click, and type with their own cursor in the background. Multiple agents can work in parallel on your Mac without blocking your own work.
  • In-app browser: You can comment directly on web pages to give Codex precise instructions. Built for frontend and game dev iteration.
  • Image generation with gpt-image-1.5: Create mockups, game assets, and product visuals inside the same workflow.
  • Memory preview: Codex remembers preferences, corrections, and context from previous sessions.
  • 90+ new plugins: Atlassian Rovo, CircleCI, GitLab Issues, Microsoft Suite, Neon, Render, and more.
  • SSH remote devbox support: Connect to remote machines directly.
  • Automated scheduling: Codex can schedule future work for itself and wake up automatically to continue long-running tasks.

The Codex desktop app launched February 2, 2026 for macOS. Windows followed March 4. This April 16 update is the first real expansion that makes it competitive with Claude Code and Cursor as a full-workflow tool.


The price problem

$5 per million input tokens. $30 per million output tokens. That is exactly double what GPT-5.4 costs.

Reddit noticed immediately. A thread on r/theprimeagen with 40+ comments called out the doubling. Someone on r/codex asked if the improvements justify the price bump. The top reply was blunt: "that's why I still use gpt-5.3-codex as my backbone model. best performance over cost so far."

OpenAI's argument is that the 40% token reduction offsets the higher per-token price. The arithmetic only partly holds: 40% fewer tokens at double the price still comes out to 1.2x the cost per task, and true break-even would need a 50% reduction. And it does not hold at all if your workload expands to fill whatever budget the model gives you, which is what usually happens with better tools.
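The pricing math is easy to check. The sketch below uses the prices stated in this article ($5/$30 for GPT-5.5, and half that for GPT-5.4, since 5.5 is exactly double); the per-task token counts are illustrative assumptions, not measurements.

```python
# Back-of-envelope cost comparison for one Codex task at the article's
# stated prices. Token counts are hypothetical, chosen only to show the
# ratio; the 40% reduction figure is OpenAI's claim.

def task_cost(input_toks: int, output_toks: int,
              in_price: float, out_price: float) -> float:
    """Dollar cost of one task; prices are per million tokens."""
    return (input_toks * in_price + output_toks * out_price) / 1_000_000

# Hypothetical task: 50k input / 200k output tokens on GPT-5.4.
old = task_cost(50_000, 200_000, in_price=2.50, out_price=15.00)

# Same task on GPT-5.5 with the claimed 40% token reduction applied.
new = task_cost(int(50_000 * 0.6), int(200_000 * 0.6),
                in_price=5.00, out_price=30.00)

print(f"GPT-5.4: ${old:.2f}  GPT-5.5: ${new:.2f}  ratio: {new / old:.2f}")
# → GPT-5.4: $3.12  GPT-5.5: $3.75  ratio: 1.20
```

The ratio is price-structure-independent here: 0.6 tokens × 2.0 price = 1.2x per task, regardless of the input/output mix, as long as both token types shrink by the same 40%.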

GPT-5.5 Pro is even steeper: $30 input, $180 output. That is not a typo.


What the internet gets wrong about model releases

Every time a frontier model drops, the same cycle plays out. Leaks surface. Codenames get romanticized. People convince themselves the next one will change everything overnight. Then it ships, and the hot takes split into "this is everything" and "this is nothing."

It happened with GPT-5.2. That model launched to similar hype. Four months later it was sitting at number 15 on the LMArena leaderboard, behind GPT-5.1 and Claude 4.5. The people who built real products with it did not care about the ranking. The people who were watching the ranking moved on to the next one.

GPT-5.5 will probably follow the same arc. The real question is not whether it beats Claude Opus 4.7 on Terminal-Bench by 0.7 points. The real question is whether the Codex desktop app, with computer use and memory and background agents, changes how a developer spends their morning.


The Spud backlash

Internally, GPT-5.5 was codenamed "Spud." The hype around that codename got absurd. People on r/accelerate compared it to Anthropic's "Mythos" and treated it like it would be GPT-6.

Then it launched, and the reaction split in half.

On r/singularity, a post titled "Big model feel with GPT 5.5" got 70+ comments. The top sentiment was genuine surprise at how much better it felt in practice. On r/vibecoding, someone said "i get the hype but opus still has something that isn't reflected on the benchmarks."

The most honest take came from someone on r/accelerate: "this was hyped as if it was gpt 6, not a 0.1 improvement. from many people. please keep yourselves to a higher standard than that."

That thread had 20+ comments and was posted just 21 hours after launch.


A math professor built an algebraic geometry app in 11 minutes

Bartosz Naskrecki, an assistant professor of mathematics in Poland, gave Codex a single prompt to build an algebraic geometry app. The app visualizes quadratic surface intersections and converts the resulting curves into Weierstrass models using the computational Riemann-Roch theorem.

It worked. Eleven minutes. He extended it with stable singularity visualization and exact reusable coefficients.

He said the bigger shift is not any single app. It is that Codex can now help implement custom mathematical visualization and computer-algebra workflows that previously required dedicated tools. That is a specific, nerdy detail. But it is the kind of thing that shows where this is actually heading.

OpenAI also claims an internal version of GPT-5.5 found a new proof about off-diagonal Ramsey numbers, later verified in Lean. Ramsey numbers are one of the central objects in combinatorics. Results in this area are rare and technically difficult. If that holds up, it is a genuine research contribution, not just a coding trick.
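For readers outside combinatorics, here is the standard definition of the object involved. The claimed proof itself is unpublished, so nothing below reflects its contents; this is context only.

```latex
% Standard definition of Ramsey numbers, for context only.
\[
  R(s,t) = \min\bigl\{\, n : \text{every red/blue edge-coloring of } K_n
  \text{ contains a red } K_s \text{ or a blue } K_t \,\bigr\}
\]
% "Off-diagonal" refers to the case $s \neq t$. The classic diagonal
% base case is $R(3,3) = 6$: any 2-coloring of the edges of $K_6$
% contains a monochromatic triangle, and $K_5$ admits one that does not.
```

Exact values are known only for a handful of small cases, which is part of why any new result in this area, machine-assisted or not, draws attention.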


The real picture

GPT-5.5 is good. Probably the best model you can use right now for agentic coding and complex task execution. But the gap between the Spud hype and the actual product annoyed people who expected a paradigm shift.

And the doubling of API prices at a time when Anthropic and Google are pushing hard on competitive pricing makes the value conversation more complicated than the benchmark charts suggest.

If you are a ChatGPT Plus subscriber, you get it for the same $20/month. That is a good deal. If you are running it through an API at scale, you need to do real math on whether the token savings actually cover the price jump.

The Codex desktop update is the quieter story. Computer use in the background, memory, automated scheduling. That is the infra that makes the model actually useful in a daily workflow. Without it, GPT-5.5 is just a smarter API endpoint.

I am still thinking about that NVIDIA engineer's quote: "losing access to GPT-5.5 feels like i've had a limb amputated." Maybe he was exaggerating. Maybe not. But no one said anything close to that about GPT-5.4.
