On the last day of OpenAI’s 12 days of ‘shipmas,’ the company unveiled its latest models, o3 and o3-mini, which excel at reasoning and even outperform o1 on a series of benchmarks, including math and science. At launch, OpenAI CEO Sam Altman said o3 was slated to drop at the end of January, and today, the company made good on its promise.
o3-mini
On Friday, OpenAI released its o3-mini model, the most cost-efficient model in OpenAI’s reasoning series, to the public. Until now, that series has been comprised of o1 and o1-mini. Like its predecessor, the model is particularly strong in science, math, and coding, according to the company.
When o3-mini is selected, it will use medium reasoning effort, which balances speed and accuracy. While the original o1 model still has broader general knowledge than o3-mini, the new model’s major advantage is its faster speed and higher performance compared to o1-mini.
Benchmark performance
When comparing the performance of o3-mini to o1-mini, expert testers found that o3-mini delivered more accurate, reasoned-through, and clearer responses than o1-mini. According to the post, they preferred o3-mini responses 56% of the time and observed a 39% reduction in major errors.
Beyond human preference evaluations, in several STEM benchmarks, including the Competition Math (AIME 2024), PhD-level Science Questions (GPQA Diamond), and Competition Code (Codeforces), o3-mini with medium reasoning — which is what ChatGPT users will get by default — outperformed o1-mini.
Also notable is that o3-mini, with high reasoning effort in the benchmarks, came close to o1 performance, sometimes even surpassing it, as seen in the AIME 2024 above and Software Engineering (SWE-bench Verified) benchmarks. The o3-mini model with medium reasoning effort matched o1’s performance in the Codeforces benchmark.
Safety
OpenAI assessed o3-mini’s safety through public release through jailbreak and disallowed content evaluations. The company found that the model significantly surpasses GPT-4o on the evaluations. OpenAI posted the evaluation results below and also launched an o3-mini System Card, a 37-page PDF that includes the detailed results of the evaluations.
How to access
All subscribers to OpenAI’s paid tiers, including ChatGPT Plus, Team, and Pro, can access OpenAI o3-mini starting today. Plus and Team users now have three times the rate limit, going from 50 messages per day with o1-mini to 150 messages per day. ChatGPT Enterprise access is coming in a week.
Also: Copilot’s powerful new ‘Think Deeper’ feature is free for all users – how it works
The o3-mini model will replace o1-mini in the model picker, as it would be useful for the same tasks, except that experience will now be improved with lower latency and higher rate limits. As a paid user, at the time of writing, I did not yet have access to the o3-mini, and am instead still seeing the o1-mini option.
If you don’t have a subscription, no worries: You can see if o3-mini is worth the hype from your free account. All free ChatGPT users have to do is click on “Reason” in the message textbox or regenerate a response. OpenAI CEO Sam Altman confirmed free access in a post on X. Until now, all the reasoning models have been kept behind a paywall; OpenAI did not specify any limitations around the new model for Free users.
https://www.zdnet.com/a/img/resize/d0f0a175e3e6bcd6268df8462f992e18c0738ceb/2025/01/31/f0f25de0-7aed-46a1-9c7d-2c2a7c1dfa69/r8hq2r8q.jpg?auto=webp&fit=crop&height=675&width=1200
2025-01-31 14:48:00