OpenAI trained o1 and o3 to ‘think’ about its safety policy

OpenAI announced a new family of AI reasoning models on Friday, o3, which the startup claims to be more advanced than o1 or anything else it’s released. These improvements appear to have come from scaling test-time compute, something we wrote about last month, but OpenAI also says it used a new safety paradigm to train […]

OpenAI’s GPT-5 reportedly falling short of expectations

OpenAI’s efforts to develop its next major model, GPT-5, are running behind schedule, with results that don’t yet justify the enormous costs, according to a new report in The Wall Street Journal. This echoes an earlier report in The Information suggesting that OpenAI is looking to new strategies as GPT-5 might not represent as big […]

OpenAI announces new o3 model — but you can’t use it yet

Welcome back to Week in Review. This week, we’re looking at OpenAI’s last — and biggest — announcement from its “12 Days of OpenAI” event; Apple’s potential entrance into the foldable market; and why Databricks is choosing to wait to go public. Let’s get into it. P.S. We’re off for the holidays! Week in Review […]

12 days of OpenAI: The Ars Technica recap

Over the past 12 business days, OpenAI has announced a new product or demoed an AI feature every weekday, calling the PR event "12 days of OpenAI." We've covered some of the major announcements as they happened, but we thought a day-by-day recap might be useful for people seeking a comprehensive look at the event's developments.

The timing and rapid pace of these announcements—particularly in light of Google's competing releases—illustrates the intensifying competition in AI development. What might normally have been spread across months was compressed into just 12 business days, giving users and developers a lot to process as they head into 2025.

Humorously, we asked ChatGPT what it thought about the whole series of announcements, and it was skeptical that the event even took place. "The rapid-fire announcements over 12 days seem plausible," wrote ChatGPT-4o, "but might strain credibility without a clearer explanation of how OpenAI managed such an intense release schedule, especially given the complexity of the features."

OpenAI announces o3 and o3-mini, its next simulated reasoning models

On Friday, during Day 12 of its "12 days of OpenAI," OpenAI CEO Sam Altman announced the company's latest AI "reasoning" models, o3 and o3-mini, which build upon the o1 models launched earlier this year. The company is not releasing them yet but will make them available for public safety testing and research access today.

The models use what OpenAI calls "private chain of thought," where the model pauses to examine its internal dialog and plan ahead before responding, which you might call "simulated reasoning" (SR)—a form of AI that goes beyond basic large language models (LLMs).

The company named the model family "o3" instead of "o2" to avoid potential trademark conflicts with British telecom provider O2, according to The Information. During Friday's livestream, Altman acknowledged his company's naming foibles, saying, "In the grand tradition of OpenAI being really, truly bad at names, it'll be called o3."

OpenAI announces new o3 models

OpenAI saved its biggest announcement for the last day of its 12-day “shipmas” event. On Friday, the company unveiled o3, the successor to the o1 “reasoning” model it released earlier in the year. o3 is a model family, to be more precise — as was the case with o1. There’s o3 and o3-mini, a smaller, […]

OpenAI launched its best new AI model in September. It already has challengers, one from China and another from Google.

[Image: OpenAI CEO Sam Altman. Andrew Caballero-Reynolds/AFP/Getty Images]

  • OpenAI's o1 model was hailed as a breakthrough in September.
  • By November, a Chinese AI lab had released a similar model called DeepSeek.
  • On Thursday, Google came out with a challenger called Gemini 2.0 Flash Thinking.

In September, OpenAI unveiled a radically new type of AI model called o1. In a matter of months, rivals introduced similar offerings.

On Thursday, Google released Gemini 2.0 Flash Thinking, which uses reasoning techniques that look a lot like o1.

Even before that, in November, a Chinese company announced DeepSeek, an AI model that breaks challenging questions down into more manageable tasks like OpenAI's o1 does.

This is the latest example of a crowded AI frontier where pricey innovations are swiftly matched, making it harder to stand out.

"It's amazing how quickly AI model improvements get commoditized," Rahul Sonwalkar, CEO of the startup Julius AI, said. "Companies spend massive amounts building these new models, and within a few months they become a commodity."

The proliferation of AI models with similar capabilities could make it difficult to justify charging high prices to use these tools. The price of accessing AI models has indeed plunged in the past year or so.

That, in turn, could raise questions about whether it's worth spending hundreds of millions of dollars, or even billions, to build the next top AI model.

September is a lifetime ago in the AI industry

When OpenAI previewed its o1 model in September, the product was hailed as a breakthrough. It uses a new approach called inference-time compute to answer more challenging questions.

It does this by slicing queries into more digestible tasks and turning each of these stages into a new prompt that the model tackles. Each step requires running a new request, which is known as the inference stage in AI.

This produces a chain of thought, or chain of reasoning, in which each part of the problem is answered in turn; the model doesn't move on to the next stage until it has finished the current one, ultimately assembling a full response.

The model can even backtrack and check its prior steps and correct errors, or try solutions and fail before trying something else. This is akin to how humans spend longer working through complex tasks.
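
To make that concrete, here is a minimal sketch of such a loop in Python (our illustration; OpenAI has not published o1's actual mechanism). The call_model function is a hypothetical placeholder for a single inference request to any LLM API; the point is the control flow: plan, solve each step with a fresh request, verify, and backtrack on errors.

    # Schematic sketch of inference-time compute. Each step is a separate
    # inference request, and steps that fail verification are retried.
    # call_model() is a hypothetical placeholder; swap in a real LLM API call.
    def call_model(prompt: str) -> str:
        raise NotImplementedError("replace with a real LLM API call")

    def solve_with_reasoning(question: str, max_attempts: int = 3) -> str:
        plan = call_model(f"Break this problem into ordered steps:\n{question}")
        steps = [line for line in plan.splitlines() if line.strip()]
        context = question
        for step in steps:
            for _ in range(max_attempts):
                answer = call_model(f"{context}\n\nSolve this step: {step}")
                verdict = call_model(
                    f"Check this step for errors:\n{answer}\nReply OK or ERROR."
                )
                if verdict.strip().startswith("OK"):
                    context += f"\n{step}: {answer}"  # keep verified work in the chain
                    break
                # Otherwise backtrack: discard the answer and retry the step.
        return call_model(f"{context}\n\nGive the final answer to: {question}")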

DeepSeek rises

In a mere two months, o1 had a rival. On November 20, a Chinese AI company released DeepSeek.

"They were probably the first ones to reproduce o1," said Charlie Snell, an AI researcher at UC Berkeley who coauthored a Google DeepMind paper this year on inference-time compute.

He's tried DeepSeek's AI model and says it performs well on complex math problems that must be solved by thinking for longer and in stages. 

He noted that in DeepSeek's DeepThink mode, the model shows users every step of its thought process. With o1, these intermediate steps are hidden from users. 

"I've asked people at OpenAI what they think of it," Snell told BI. "They say it looks like the same thing, but they don't how DeepSeek did this so fast."

OpenAI didn't respond to a request for comment. On Friday, the startup previewed an o1 successor, called o3. Francois Chollet, a respected AI expert, called the update a "significant breakthrough."

Andrej Karpathy, an OpenAI cofounder, praised Google's new "Thinking" model for the same reasoning feature.

"The prominent and pleasant surprise here is that unlike o1 the reasoning traces of the model are shown," he wrote on X. "As a user I personally really like this because the reasoning itself is interesting to see and read — the models actively think through different possibilities, ideas, debate themselves, etc., it's part of the value add."

A DeepSeek demo

Snell shared a multistep math problem with Business Insider, which we used to test DeepSeek for ourselves: 

"Find a sequence of +, -, /, * which can be applied to the numbers 7, 3, 11, 5 to get to 24, where each of the given numbers is used exactly once."

BI put that prompt in DeepSeek's chat window on its website. The model responded initially by laying out the challenge ahead.

"Alright, so I've got this problem here: I need to use the numbers 7, 3, 11, and 5, and combine them with the operations of addition, subtraction, multiplication, and division, using each number exactly once, to get to 24," it replied. "At first glance, this seems a bit tricky, but I think with some systematic thinking, I can figure it out."

It then proceeded through multiple steps over roughly 16 pages of discussion that included mathematical calculations and equations. The model sometimes got it wrong, but it spotted this and didn't give up. Instead, it swiftly moved on to another possible solution. 

"Almost got close there with 33 / 7 * 5 ≈ 23.57, but not quite 24. Maybe I need to try a different approach," it wrote at one point. 

After a few minutes, it found the correct solution. 

"You can see it try different ideas and backtrack," Snell said in an interview on Wednesday. He highlighted this part of DeepSeek's chain of thought as particularly noteworthy:

"This is getting really time-consuming. Maybe I need to consider a different strategy," the AI model wrote. "Instead of combining two numbers at a time, perhaps I should look for a way to group them differently or use operations in a nested manner."

Then Google appears

Snell said other companies are likely working on AI models that use the same inference-time compute approach as OpenAI.

"DeepSeek does this already, so I assume others are working on this," he added on Wednesday.

The following day, Google released Gemini 2.0 Flash Thinking. Like DeepSeek, this new model shows users each step of its thought process while tackling problems. 

Jeff Dean, a Google AI veteran, shared a demo on X that showed this new model solving a physics problem and explaining its reasoning steps.

"This model is trained to use thoughts to strengthen its reasoning," Dean wrote. "We see promising results when we increase inference time computation!"

Read the original article on Business Insider

The AI war between Google and OpenAI has never been more heated

Over the past month, we've seen a rapid cadence of notable AI-related announcements and releases from both Google and OpenAI, and it's been making the AI community's head spin. It has also poured fuel on the fire of the OpenAI-Google rivalry, an accelerating game of one-upmanship taking place unusually close to the Christmas holiday.

"How are people surviving with the firehose of AI updates that are coming out," wrote one user on X last Friday, which is still a hotbed of AI-related conversation. "in the last <24 hours we got gemini flash 2.0 and chatGPT with screenshare, deep research, pika 2, sora, chatGPT projects, anthropic clio, wtf it never ends."

Rumors travel quickly in the AI world, and people in the AI industry had been expecting OpenAI to ship some major products in December. Once OpenAI announced "12 days of OpenAI" earlier this month, Google jumped into gear and seemingly decided to try to one-up its rival on several counts. So far, the strategy appears to be working, but it's coming at the cost of the rest of the world being able to absorb the implications of the new releases.

Why AI language models choke on too much text

Large language models represent text using tokens, each of which is a few characters. Short words are represented by a single token (like "the" or "it"), whereas larger words may be represented by several tokens (GPT-4o represents "indivisible" with "ind," "iv," and "isible").

When OpenAI released ChatGPT two years ago, it had a memory—known as a context window—of just 8,192 tokens. That works out to roughly 6,000 words of text. This meant that if you fed it more than about 15 pages of text, it would “forget” information from the beginning of its context. This limited the size and complexity of tasks ChatGPT could handle.
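
You can check these numbers yourself with OpenAI's open-source tiktoken library. A minimal sketch follows; o200k_base is the encoding GPT-4o uses, and token splits vary by encoding, so the "ind"/"iv"/"isible" breakdown above may not reproduce exactly elsewhere.

    # Counting tokens with OpenAI's tiktoken library (pip install tiktoken).
    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")  # the encoding used by GPT-4o

    tokens = enc.encode("indivisible")
    print(len(tokens))                        # number of tokens in the word
    print([enc.decode([t]) for t in tokens])  # the text fragment behind each token

    # Rule of thumb: one token is roughly 0.75 English words, so an
    # 8,192-token context window holds about 6,000 words.
    print(int(8192 * 0.75))  # 6144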

Today’s LLMs are far more capable.

Sam Altman says Elon Musk is 'clearly a bully' who likes to get in fights with rivals

[Image: Elon Musk (left) and Sam Altman (right). Steve Granitz, Andrew Caballero-Reynolds/Getty Images]

  • Sam Altman isn't done firing shots at Elon Musk.
  • In an interview with The Free Press, the OpenAI CEO said the Tesla boss was "clearly a bully" who likes to pick fights with rivals.
  • Musk is in a lengthy legal battle with OpenAI and Altman, and refiled a lawsuit against both earlier this year.

Sam Altman and Elon Musk once started OpenAI together — but now their relationship is a lot more complicated.

In an interview with The Free Press on Thursday, Altman called his OpenAI cofounder "clearly a bully" and said Musk's high-profile feud with his former company had become a "sideshow."

Since stepping down from OpenAI in 2018, Musk has been highly critical of the AI startup and of its CEO.

The Tesla boss refiled a lawsuit in August, arguing he had been "deceived" into starting the company by Altman and fellow cofounder Greg Brockman.

Musk has also asked a federal court to block OpenAI from transitioning into a for-profit entity, with OpenAI firing back by releasing a cache of emails showing Musk pushed for the AI startup to be for-profit while working at the company.

In the interview, Altman described Musk as a "legendary entrepreneur" who did a lot to help OpenAI in its early days.

"He's also clearly a bully, and he's also someone who clearly likes to get into fights," added the OpenAI CEO, pointing to the billionaire's high-profile spats with Jeff Bezos and Bill Gates.

Altman also said he believes much of Musk's animosity is rooted in OpenAI's recent success and the fact that he now runs a direct competitor.

Musk announced xAI, his own AI startup, last year, and the company has since released several versions of its chatbot Grok.

"Everything we're doing, I believe Elon would be happy about if he were in control of OpenAI," said Altman.

"He left when he thought we were on a trajectory to certainly fail, and also when we wouldn't do something where he had total control over the company," he added.

Altman's comments come as Musk prepares to occupy an increasingly prominent role in the second Trump administration. Though Musk will have an influential political position, Altman said he did not believe Musk would use his power to go after his rivals.

"I think there are people who will really be a jerk on Twitter who will still not abuse the system of the country," he said.

OpenAI and Musk did not respond to requests for comment, sent outside normal working hours.

Read the original article on Business Insider

Sam Altman once owned some equity in OpenAI through Sequoia

OpenAI CEO Sam Altman sat before Congress in 2023 to testify about the dangers of AI. He told American lawmakers at the time that he owns no equity in OpenAI, something he’s said many times, claiming he just runs the company because he loves it. However, Altman recently said he actually did have some equity […]

The tragedy of former OpenAI researcher Suchir Balaji puts 'Death by LLM' back in the spotlight

[Image: The OpenAI logo on a multicolored background with a crack running through it. Chelsea Jia Feng/Paul Squire/BI]

  • Suchir Balaji helped OpenAI collect data from the internet for AI model training, the NYT reported.
  • He was found dead in an apartment in San Francisco in late November, according to police.
  • About a month before, Balaji published an essay criticizing how AI models use data.

The recent death of former OpenAI researcher Suchir Balaji has brought an under-discussed AI debate back into the limelight.

AI models are trained on information from the internet. These tools answer user questions directly, so fewer people visit the websites that created and verified the original data. This drains resources from content creators, which could lead to a less accurate and rich internet.

Elon Musk calls this "Death by LLM." Stack Overflow, a coding Q&A website, has already been damaged by this phenomenon. And Balaji was concerned about this.

Balaji was found dead in late November. The San Francisco Police Department said it found "no evidence of foul play" during the initial investigation. The city's chief medical examiner determined the death to be suicide.

Balaji's concerns

About a month before Balaji died, he published an essay on his personal website that addressed how AI models are created and how this may be bad for the internet. 

He cited research studying the impact of AI models that use online data for free to answer questions directly, siphoning traffic away from the original sources.

The study analyzed Stack Overflow and found that traffic to this site declined by about 12% after the release of ChatGPT. Instead of going to Stack Overflow to ask coding questions and do research, some developers were just asking ChatGPT for the answers. 

Other findings from the research Balaji cited: 

  • There was a decline in the number of questions posted on Stack Overflow after the release of ChatGPT.
  • The average account age of the question-askers rose after ChatGPT came out, suggesting that fewer people signed up for Stack Overflow or that more users left the online community.

This suggests that AI models could undermine some of the incentives that created the information-rich internet as we know it today.

If people can get their answers directly from AI models, there's no need to go to the original sources of the information. If people don't visit websites as much, advertising and subscription revenue may fall, and there would be less money to fund the creation and verification of high-quality online data.

MKBHD wants to opt out

It's even more galling to imagine that AI models might be doing this based partly on your own work. 

Tech reviewer Marques Brownlee experienced this recently when he reviewed OpenAI's Sora video model and found that it created a clip with a plant that looked a lot like a plant from his own videos posted on YouTube. 

"Are my videos in that source material? Is this exact plant part of the source material? Is it just a coincidence?" said Brownlee, who's known as MKBHD.

Naturally, he also wanted to know if he could opt out and prevent his videos from being used to train AI models. "We don't know if it's too late to opt out," Brownlee said.

'Not a sustainable model'

In an interview with The New York Times published in October, Balaji said AI chatbots like ChatGPT are stripping away the commercial value of people's work and services.

The publication reported that while working at OpenAI, Balaji was part of a team that collected data from the internet for AI model training. He joined the startup with high hopes for how AI could help society, but became disillusioned, NYT wrote. 

"This is not a sustainable model for the internet ecosystem," he told the publication.

In a statement to the Times about Balaji's comments, OpenAI said the way it builds AI models is protected by fair use copyright principles and supported by legal precedents. "We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness," it added.

In his essay, Balaji disagreed.

One of the four factors in a fair use analysis is whether a new work affects the potential market for, or value of, the original copyrighted work. If it causes that kind of damage, the use is not "fair use" and is not allowed.

Balaji concluded that ChatGPT and other AI models don't qualify for fair use copyright protection.

"None of the four factors seem to weigh in favor of ChatGPT being a fair use of its training data," he wrote. "That being said, none of the arguments here are fundamentally specific to ChatGPT either, and similar arguments could be made for many generative AI products in a wide variety of domains."

Talking about data

Tech companies producing these powerful AI models don't like to talk about the value of training data. They've even stopped disclosing where they get the data from, which was a common practice until a few years ago. 

"They always highlight their clever algorithms, not the underlying data," Nick Vincent, an AI researcher, told BI last year.

Balaji's death may finally give this debate the attention it deserves. 

"We are devastated to learn of this incredibly sad news today and our hearts go out to Suchir's loved ones during this difficult time," an OpenAI spokesperson told BI recently. 

If you or someone you know is experiencing depression or has had thoughts of harming themself or taking their own life, get help. In the US, call or text 988 to reach the Suicide & Crisis Lifeline, which provides 24/7, free, confidential support for people in distress, as well as best practices for professionals and resources to aid in prevention and crisis situations. Help is also available through the Crisis Text Line — just text "HOME" to 741741. The International Association for Suicide Prevention offers resources for those outside the US.

Read the original article on Business Insider

Not to be outdone by OpenAI, Google releases its own “reasoning” AI model

It's been a really busy month for Google as it apparently endeavors to outshine OpenAI with a blitz of AI releases. On Thursday, Google dropped its latest party trick: Gemini 2.0 Flash Thinking Experimental, which is a new AI model that uses runtime "reasoning" techniques similar to OpenAI's o1 to achieve "deeper thinking" on problems fed into it.

The experimental model builds on Google's newly released Gemini 2.0 Flash and runs on its AI Studio platform, but early tests conducted by TechCrunch reporter Kyle Wiggers reveal accuracy issues with some basic tasks, such as miscounting the R's in the word "strawberry" (it said there were two; there are three).

These so-called reasoning models differ from standard AI models by incorporating feedback loops and self-checking mechanisms, similar to techniques we first saw in early 2023 with hobbyist projects like "Baby AGI." The process requires more computing time, often adding extra seconds or minutes to response times. Companies have turned to reasoning models as traditional training-time scaling methods have shown diminishing returns.

Read full article

Comments

© Alan Schein via Getty Images

A new, uncensored AI video model may spark a new AI hobbyist movement

The AI-generated video scene has been hopping this year (or twirling wildly, as the case may be). This past week alone we've seen releases or announcements of OpenAI's Sora, Pika AI's Pika 2, Google's Veo 2, and Minimax's video-01-live. It's frankly hard to keep up, and even tougher to test them all. But recently, we put a new open-weights AI video synthesis model, Tencent's HunyuanVideo, to the test—and it's surprisingly capable for being a "free" model.

Unlike the aforementioned models, HunyuanVideo's neural network weights are openly distributed, which means the model can be run locally under the right circumstances (people have already demonstrated it on a consumer GPU with 24 GB of VRAM) and fine-tuned or used with LoRAs to teach it new concepts.
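
As a rough illustration of what open weights make possible, here is a hedged sketch of local generation. It assumes the Hugging Face Diffusers integration of HunyuanVideo and the community model id below, neither of which this article specifies, so treat it as a sketch rather than verified instructions.

    # Hedged sketch: local video generation with HunyuanVideo via Diffusers.
    # Assumes a diffusers version with HunyuanVideo support and the community
    # model id below; check the current docs before relying on either.
    import torch
    from diffusers import HunyuanVideoPipeline
    from diffusers.utils import export_to_video

    pipe = HunyuanVideoPipeline.from_pretrained(
        "hunyuanvideo-community/HunyuanVideo", torch_dtype=torch.float16
    )
    pipe.vae.enable_tiling()         # cuts VRAM use when decoding frames
    pipe.enable_model_cpu_offload()  # helps fit a consumer 24 GB GPU

    frames = pipe(
        prompt="a cat walking through a sunlit garden",
        num_frames=61,
        num_inference_steps=30,
    ).frames[0]
    export_to_video(frames, "output.mp4", fps=15)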

Notably, a few Chinese companies have been at the forefront of AI video for most of this year, and some experts speculate that the reason is less reticence about training on copyrighted materials, using images and names of famous celebrities, and incorporating some uncensored video sources. As we saw with Stable Diffusion 3's mangled release, including nudity or pornography in training data may allow these models to achieve better results by providing more information about human bodies. HunyuanVideo notably allows uncensored outputs, so unlike the commercial video models out there, it can generate videos of anatomically realistic, nude humans.

Call ChatGPT from any phone with OpenAI’s new 1-800 voice service

On Wednesday, OpenAI launched a 1-800-CHATGPT (1-800-242-8478) telephone number that anyone in the US can call to talk to ChatGPT via voice chat for up to 15 minutes for free. The company also says that people outside the US can send text messages to the same number for free using WhatsApp.

Upon calling, users hear a voice say, "Hello again, it's ChatGPT, an AI assistant. Our conversation may be reviewed for safety. How can I help you?" Callers can ask ChatGPT anything they would normally ask the AI assistant and have a live, interactive conversation.

During a livestream demo of "Calling with ChatGPT" during Day 10 of "12 Days of OpenAI," OpenAI employees demonstrated several examples of the telephone-based voice chat in action, asking ChatGPT to identify a distinctive house in California and for help in translating a message into Spanish for a friend. For fun, they showed calls from an iPhone, a flip phone, and a vintage rotary phone.

Google's AI video generator blows OpenAI's Sora out of the water. YouTube may be a big reason.

[Video still: a dog on a flamingo, as created by Google's Veo 2. Google]

  • Early testers of Google's new AI video generator compared it with OpenAI's Sora.
  • So far, Google's results are blowing OpenAI's out of the water.
  • Google has tapped YouTube to train its AI models but says other companies can't do the same.

Not to let OpenAI have all the fun with its 12 days of 'Shipmas,' Google on Monday revealed its new AI video generator, Veo 2. Early testers found it's blowing OpenAI's Sora out of the water.

OpenAI made its Sora AI video generator available for general use earlier this month, while Google's is still in early preview. Still, people are sharing comparisons of the two models running the same prompts, and so far, Veo has proved more impressive.

Why is Google's Veo 2 doing better than Sora so far? The answer may be YouTube, which Google owns and has used to train these models.

TED host and former Googler Bilawal Sidhu shared some comparisons of Google's Veo 2 and OpenAI's Sora on X.

He said he used the prompt, "Eating soup like they do in Europe, the old fashioned way," which generated a terrifying result in Sora and something more impressive in Google's Veo 2.

Veo 2 prompt: "Eating soup like they do in Europe, the old fashioned way" https://t.co/gX9gh1fFy6 pic.twitter.com/kgR7VP2URl

— Bilawal Sidhu (@bilawalsidhu) December 18, 2024

Here's another, which took a prompt that YouTube star Marques Brownlee had tried in a video reviewing Sora.

Google Veo 2 vs. OpenAI Sora

Tried MKBHD's prompt: "A side scrolling shot of a rhinoceros walking through a dry field of low grass plants"

Took the 1st video I got out of Veo, and it's not even close. Prompt adherence & physics modeling? Leagues apart. Sora nails the look, but… pic.twitter.com/mus9MdRsWo

— Bilawal Sidhu (@bilawalsidhu) December 17, 2024

This one from EasyGen founder Ruben Hassid included a prompt for someone cutting a tomato with a knife. In the video he shared, we see how Google's Veo 2 had the knife going cleanly through the tomato and avoiding fingers, while the knife in Sora's video cut through the hand.

I tested Sora vs. the new Google Veo-2.

I feel like comparing a bike vs. a starship: pic.twitter.com/YcHsVcUyn2

— Ruben Hassid (@RubenHssd) December 17, 2024

Granted, these are cherry-picked examples, but the consensus among AI enthusiasts is that Google has outperformed.

Andreessen Horowitz partner Justine Moore wrote on X that she had spent a few hours testing Veo against Sora and found that Sora biases toward more motion while Veo focuses on accuracy and physics.

Sora vs. Veo 2:

I spent a few hours running prompts on both models and wanted to share some comparisons ⬇️.

IMO - Sora biases towards more motion, whereas Veo focuses more on accuracy / physics. And a larger % of clips from Veo are usable.

"man jumping over hurdles" pic.twitter.com/WI9zIaJA64

— Justine Moore (@venturetwins) December 18, 2024

Google has been open about tapping YouTube data for its AI, but it does not permit others to do the same. The New York Times previously reported that OpenAI had trained its models using some YouTube data anyway. YouTube CEO Neal Mohan said OpenAI doing this would violate Google's policies.

BI previously reported that Google's DeepMind also tapped YouTube's vast trove of content to build an impressive music generator that never saw the light of day.

Google did not immediately respond to a request for comment.

Are you a Google or OpenAI employee? Got more insight to share? You can reach the reporter Hugh Langley via the encrypted messaging app Signal (+1-628-228-1836) or email ([email protected]).

Read the original article on Business Insider

OpenAI brings ChatGPT to your landline

ChatGPT is coming to phones. No, not smartphones — landlines. Call 1-800-242-8478 (1-800-CHATGPT), and OpenAI’s AI-powered assistant will respond as of Wednesday afternoon. “[Our mission at] OpenAI is to make artificial general intelligence beneficial to all of humanity, and part of that is making it as accessible as possible to as many people as we […]
