OpenAI releases GPT-5.2 after “code red” alert over Google threat

In an effort to keep up with (or stay ahead of) the competition, OpenAI has kept a steady release cadence, with GPT-5.2 representing its third major model release since August. GPT-5 launched in August with a new routing system that switches between an instant-response mode and a simulated-reasoning mode, although users complained that its responses felt cold and clinical. The GPT-5.1 update in November added eight preset “customization” options and was aimed at making the system more interactive.
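To make the routing idea concrete, here is a purely illustrative sketch of what such a dispatcher might look like. This is not OpenAI's implementation; the keyword heuristic and the model identifiers are assumptions invented for the example.

```python
# Hypothetical sketch of a prompt router that picks between a fast "instant"
# model and a slower "reasoning" model. Not OpenAI's actual system; the
# heuristic and model names below are illustrative assumptions only.

REASONING_HINTS = ("prove", "step by step", "debug", "plan", "analyze")

def needs_reasoning(prompt: str) -> bool:
    """Crude heuristic: send long or analysis-heavy prompts to the reasoning model."""
    lowered = prompt.lower()
    return len(prompt) > 500 or any(hint in lowered for hint in REASONING_HINTS)

def route(prompt: str) -> str:
    """Return the (hypothetical) model identifier that should handle this prompt."""
    return "gpt-5.2-thinking" if needs_reasoning(prompt) else "gpt-5.2-instant"

if __name__ == "__main__":
    print(route("What's the capital of France?"))          # -> gpt-5.2-instant
    print(route("Debug this race condition step by step"))  # -> gpt-5.2-thinking
```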

The benchmark numbers go up

Oddly enough, although the release of GPT-5.2 is ostensibly a response to the performance of Gemini 3, OpenAI has decided not to publish any benchmarks comparing the two models on its promotional site. Instead, the official blog post focuses on GPT-5.2's improvements over its predecessors and on its performance on OpenAI's new GDPval benchmark, which attempts to measure performance on occupational knowledge tasks across 44 occupations.

During the press briefing, OpenAI shared some comparisons against competitors, including Gemini 3 Pro and Claude Opus 4.5, but rejected the idea that GPT-5.2 was rushed to market in response to Google's moves. “It’s important to note that this has been in the works for many, many months,” Simo told reporters, although the timing of its release, we note, is a strategic decision.

According to the company's figures, GPT-5.2 Thinking scored 55.6 percent on the SWE-Bench Pro software development benchmark, compared to 43.3 percent for Gemini 3 Pro and 52.0 percent for Claude Opus 4.5. On GPQA Diamond, a graduate-level science benchmark, GPT-5.2 scored 92.4 percent versus 91.9 percent for Gemini 3 Pro.

GPT-5.2 benchmark tests that OpenAI shared with the press. Credit: OpenAI / VentureBeat

OpenAI claims that GPT-5.2 Thinking matches or beats “human professionals” on 70.9 percent of tasks in the GDPval test (compared to 53.3 percent for Gemini 3 Pro). The company also claims that the model completes these tasks more than 11 times faster and at less than 1 percent of the cost of human experts.

GPT-5.2 Thinking also reportedly generates responses with 38 percent fewer confabulations than GPT-5.1, according to Max Schwarzer, head of post-training at OpenAI, who told VentureBeat that the model is “substantially less hallucinatory” than its predecessor.

As always, however, we take benchmarks with a grain of salt, because it's easy to frame them in ways that flatter the company, especially when the science of objectively measuring AI performance hasn't quite caught up with corporate sales pitches about humanlike AI capabilities.

It will take time for independent test results from researchers outside of OpenAI to arrive. In the meantime, if you're using ChatGPT for production workloads, expect a competent model with incremental improvements, with some better coding performance thrown in for good measure.
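For those who would rather sanity-check the new model against their own prompts than wait for formal evaluations, a minimal sketch using the official OpenAI Python SDK might look like the following. The “gpt-5.1” and “gpt-5.2” model identifiers are assumptions for illustration; the real names should be taken from OpenAI's model documentation.

```python
# Minimal sketch: comparing one prompt's output across model versions with the
# OpenAI Python SDK (pip install openai). The model identifiers below are
# assumptions for illustration, not confirmed API names.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(model: str, prompt: str) -> str:
    """Send a single user message to the given model and return its reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    prompt = "Summarize the tradeoffs of optimistic vs. pessimistic locking."
    for model in ("gpt-5.1", "gpt-5.2"):  # hypothetical identifiers
        print(f"--- {model} ---")
        print(ask(model, prompt))
```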
