
OpenAI launched its best new AI model in September. It already has challengers, one from China and another from Google.

OpenAI CEO Sam Altman.

Andrew Caballero-Reynolds/AFP/Getty Images

  • OpenAI's o1 model was hailed as a breakthrough in September.
  • By November, a Chinese AI lab had released a similar model called DeepSeek.
  • On Thursday, Google came out with a challenger called Gemini 2.0 Flash Thinking.

In September, OpenAI unveiled a radically new type of AI model called o1. In a matter of months, rivals introduced similar offerings.

On Thursday, Google released Gemini 2.0 Flash Thinking, which uses reasoning techniques that look a lot like o1.

Even before that, in November, a Chinese company announced DeepSeek, an AI model that breaks challenging questions down into more manageable tasks, as OpenAI's o1 does.

This is the latest example of a crowded AI frontier where pricey innovations are swiftly matched, making it harder to stand out.

"It's amazing how quickly AI model improvements get commoditized," Rahul Sonwalkar, CEO of the startup Julius AI, said. "Companies spend massive amounts building these new models, and within a few months they become a commodity."

The proliferation of AI models with similar capabilities could make it difficult to justify charging high prices to use these tools. Indeed, the price of accessing AI models has plunged over the past year or so.

That, in turn, could raise questions about whether it's worth spending hundreds of millions of dollars, or even billions, to build the next top AI model.

September is a lifetime ago in the AI industry

When OpenAI previewed its o1 model in September, the product was hailed as a breakthrough. It uses a new approach called inference-time compute to answer more challenging questions.

It does this by slicing queries into more digestible tasks and turning each of these stages into a new prompt that the model tackles. Each step requires running a new request, which is known as the inference stage in AI.

This produces a chain of thought or chain of reasoning in which each part of the problem is answered, and the model doesn't move on to the next stage until it ultimately comes up with a full response.

The model can even backtrack and check its prior steps and correct errors, or try solutions and fail before trying something else. This is akin to how humans spend longer working through complex tasks.
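The loop described above — issue a fresh inference request per step, keep a running transcript, and allow the model to backtrack — can be sketched in a few lines of Python. This is a conceptual illustration only: `call_model` is a hypothetical stand-in for an LLM API call, not OpenAI's actual implementation, and the `FINAL:`/`BACKTRACK` markers are invented for the sketch.

```python
def solve_with_reasoning(question, call_model, max_steps=20):
    """Sketch of inference-time compute: each reasoning step is a
    separate inference request, with the transcript so far fed back in.
    `call_model` is a hypothetical LLM call, not a real API."""
    transcript = [f"Problem: {question}"]
    for _ in range(max_steps):
        step = call_model("\n".join(transcript) + "\nNext step:")
        if step.startswith("FINAL:"):            # model judges itself done
            return step[len("FINAL:"):].strip()
        if step.startswith("BACKTRACK"):         # a prior step looked wrong
            if len(transcript) > 1:
                transcript.pop()                 # discard it and retry
            continue
        transcript.append(step)                  # extend the chain of thought
    return None                                  # step budget exhausted
```

Each loop iteration is one "inference stage": the growing transcript is the chain of thought, and popping a step models the backtracking behavior the article describes.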

DeepSeek rises

In a mere two months, o1 had a rival. On November 20, a Chinese AI company released DeepSeek.

"They were probably the first ones to reproduce o1," said Charlie Snell, an AI researcher at UC Berkeley who coauthored a Google DeepMind paper this year on inference-time compute.

He's tried DeepSeek's AI model and says it performs well on complex math problems that must be solved by thinking for longer and in stages.

He noted that in DeepSeek's DeepThink mode, the model shows users every step of its thought process. With o1, these intermediate steps are hidden from users.

"I've asked people at OpenAI what they think of it," Snell told BI. "They say it looks like the same thing, but they don't know how DeepSeek did this so fast."

OpenAI didn't respond to a request for comment. On Friday, the startup previewed an o1 successor, called o3. Francois Chollet, a respected AI expert, called the update a "significant breakthrough."

Andrej Karpathy, an OpenAI cofounder, praised Google's new "Thinking" model for the same reasoning feature.

"The prominent and pleasant surprise here is that unlike o1 the reasoning traces of the model are shown," he wrote on X. "As a user I personally really like this because the reasoning itself is interesting to see and read β€” the models actively think through different possibilities, ideas, debate themselves, etc., it's part of the value add."

A DeepSeek demo

Snell shared a multistep math problem with Business Insider, which we used to test DeepSeek for ourselves:

"Find a sequence of +, -, /, * which can be applied to the numbers 7, 3, 11, 5 to get to 24, where each of the given numbers is used exactly once."

BI put that prompt in DeepSeek's chat window on its website. The model responded initially by laying out the challenge ahead.

"Alright, so I've got this problem here: I need to use the numbers 7, 3, 11, and 5, and combine them with the operations of addition, subtraction, multiplication, and division, using each number exactly once, to get to 24," it replied. "At first glance, this seems a bit tricky, but I think with some systematic thinking, I can figure it out."

It then proceeded through multiple steps over roughly 16 pages of discussion that included mathematical calculations and equations. The model sometimes got it wrong, but it spotted this and didn't give up. Instead, it swiftly moved on to another possible solution.

"Almost got close there with 33 / 7 * 5 ≈ 23.57, but not quite 24. Maybe I need to try a different approach," it wrote at one point.

After a few minutes, it found the correct solution.

"You can see it try different ideas and backtrack," Snell said in an interview on Wednesday. He highlighted this part of DeepSeek's chain of thought as particularly noteworthy:

"This is getting really time-consuming. Maybe I need to consider a different strategy," the AI model wrote. "Instead of combining two numbers at a time, perhaps I should look for a way to group them differently or use operations in a nested manner."
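For comparison, the puzzle itself yields to a plain exhaustive search — no reasoning model required. A minimal Python sketch (this is just a brute-force check that a solution exists, not how DeepSeek approaches the problem):

```python
from itertools import permutations, product

def solve_24(nums, target=24):
    """Try every operand order, operator choice, and parenthesization
    of four numbers; return the first expression that hits the target."""
    for a, b, c, d in permutations(nums):
        for o1, o2, o3 in product("+-*/", repeat=3):
            # the five distinct ways to parenthesize four operands
            candidates = [
                f"(({a}{o1}{b}){o2}{c}){o3}{d}",
                f"({a}{o1}({b}{o2}{c})){o3}{d}",
                f"({a}{o1}{b}){o2}({c}{o3}{d})",
                f"{a}{o1}(({b}{o2}{c}){o3}{d})",
                f"{a}{o1}({b}{o2}({c}{o3}{d}))",
            ]
            for expr in candidates:
                try:
                    if abs(eval(expr) - target) < 1e-9:
                        return expr
                except ZeroDivisionError:
                    continue
    return None

print(solve_24([7, 3, 11, 5]))  # one valid answer: (7 - 3) * (11 - 5) = 24
```

The search space is tiny (a few thousand expressions), which is part of why the puzzle makes a clean test of whether a model can reason its way to an answer rather than guess.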

Then Google appears

Snell said other companies are likely working on AI models that use the same inference-time compute approach as OpenAI.

"DeepSeek does this already, so I assume others are working on this," he added on Wednesday.

The following day, Google released Gemini 2.0 Flash Thinking. Like DeepSeek, this new model shows users each step of its thought process while tackling problems.

Jeff Dean, a Google AI veteran, shared a demo on X that showed this new model solving a physics problem and explaining its reasoning steps.

"This model is trained to use thoughts to strengthen its reasoning," Dean wrote. "We see promising results when we increase inference time computation!"

Read the original article on Business Insider

OpenAI rolls out the full version of o1, its hot reasoning model

OpenAI CEO Sam Altman.

Jason Redmond/AFP/Getty Images

  • OpenAI released the full version of its o1 reasoning model on Thursday.
  • It says the o1 model, initially previewed in September, is now multimodal, faster, and more precise.
  • It was released as part of OpenAI's 12-day product and demo launch, dubbed "shipmas."

On Thursday, OpenAI released the full version of its hot new reasoning model as part of the company's 12-day sprint of product launches and demos.

The model, known as o1, was released in a preview mode in September. OpenAI CEO Sam Altman said during day one of the company's livestream that the latest version was more accurate, faster, and multimodal. Research scientists on the livestream said an internal evaluation indicated it made major mistakes about 34% less often than the o1 preview mode.

The model, which seems geared toward scientists, engineers, and coders, is designed to solve thorny problems. The researchers said it's the first model that OpenAI trained to "think" before it responds, meaning it tends to give more detailed and accurate responses than other AI helpers.

To demonstrate o1's multimodal abilities, they uploaded a photo of a hand-drawn system for a data center in space and asked the program to estimate the cooling-panel area required to operate it. After about 10 seconds, o1 produced what would appear to a layperson as a sophisticated essay rife with equations, ending with what was apparently the right answer.

The researchers think o1 should be useful in daily life, too. Whereas the preview version could think for a while if you merely said hi, the latest version is designed to respond faster to simpler queries. In Thursday's livestream, it was about 19 seconds faster than the old version at listing Roman emperors.

All eyes are on OpenAI's releases over the next week or so, amid a debate about how much more dramatically models like o1 can improve. Tech leaders are divided on this issue; some, like Marc Andreessen, argue that AI models aren't getting noticeably better and are converging to perform at roughly similar levels.

With its 12-day deluge of product news, dubbed "shipmas," OpenAI may be looking to quiet some critics while spreading awkward holiday cheer.

"It'll be a way to show you what we've been working on and a little holiday present from us," Altman said on Thursday.

