
'Godfather of AI' Geoffrey Hinton says he trusts his chatbot more than he should

19 May 2025 at 06:10
Geoffrey Hinton
"I should probably be suspicious," Geoffrey Hinton said of the answers AI provides.

Mark Blinch/REUTERS

  • The "Godfather of AI," Geoffrey Hinton, has said he trusts chatbots like OpenAI's GPT-4 more than he should.
  • "I should probably be suspicious," Hinton told CBS in a new interview.
  • He also said GPT-4, his preferred model, got a simple riddle wrong.

The Godfather of AI has said he trusts his preferred chatbot a little too much.

"I tend to believe what it says, even though I should probably be suspicious," Geoffrey Hinton, who was awarded the 2024 Nobel Prize in physics for his breakthroughs in machine learning, said of OpenAI's GPT-4 in a CBS interview that aired Saturday.

During the interview, he put a simple riddle to OpenAI's GPT-4, which he said he used for his day-to-day tasks.

"Sally has three brothers. Each of her brothers has two sisters. How many sisters does Sally have?"

The answer is one, as Sally is one of the two sisters. But Hinton said GPT-4 told him the answer was two.
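
For readers who want to check the arithmetic themselves, here is a minimal Python sketch of the riddle's logic; it assumes all the children are full siblings, which is the standard reading of the puzzle.

```python
# Sanity-check the riddle's arithmetic (assumes all the children are full siblings).
brothers = 3                # Sally has three brothers.
sisters_per_brother = 2     # Each brother sees two sisters: Sally plus one other girl.

girls_in_family = sisters_per_brother      # the brothers' sisters are all the girls
sallys_sisters = girls_in_family - 1       # exclude Sally herself

print(f"Sally has {brothers} brothers and {sallys_sisters} sister.")  # -> 1 sister
```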

"It surprises me. It surprises me it still screws up on that," he said.

Reflecting on the limits of current AI, he added: "It's an expert at everything. It's not a very good expert at everything."

Hinton said he expected future models would do better. When asked if he thought GPT-5 would get the riddle right, Hinton replied, "Yeah, I suspect."

Hinton's riddle didn't trip up every version of ChatGPT. After the interview aired, several people commented on social media that they tried the riddle on newer models, including GPT-4o and GPT-4.1, and said the AI got it right.

OpenAI did not immediately respond to a request for comment from Business Insider.

OpenAI first launched GPT-4 in 2023 as its flagship large language model. The model quickly became an industry benchmark for its ability to pass tough exams like the SAT, GRE, and bar exam.

OpenAI introduced GPT-4o, now the default model powering ChatGPT, in May 2024, claiming it matched GPT-4's intelligence while being faster and more versatile, with improved performance across text, voice, and vision. OpenAI has since released GPT-4.5 and, most recently, GPT-4.1.

Google's Gemini 2.5 Pro is ranked top on the Chatbot Arena leaderboard, a crowdsourced platform that ranks models. OpenAI's GPT-4o and GPT-4.5 are close behind.

A recent study by AI testing company Giskard found that telling chatbots to be brief can make them more likely to "hallucinate" or make up information.

The researchers found that leading models, including GPT-4o, Mistral, and Claude, were more prone to factual errors when prompted for shorter answers.


Which ChatGPT model is best? A guide on which model to use for coding, writing, reasoning, and more.

18 May 2025 at 02:25
A phone showing the ChatGPT app download screen.
OpenAI's ChatGPT comes in different forms. Here's BI's guide on which one is best to use and when.

Jaque Silva/NurPhoto

  • OpenAI has released numerous models with a confusing pattern of names over the past few years.
  • It has developed both large language models like GPT-4 and GPT-4.5 and reasoning models like o1.
  • Here's a guide on what they do and how to use them.

ChatGPT isn't a monolith.

Since OpenAI first released the buzzy chatbot in 2022, it has rolled out what seems like a new model every few months, using a confusing panoply of names.

A number of OpenAI competitors have popular ChatGPT alternatives, like Claude, Gemini, and Perplexity. But OpenAI's models are among the most recognizable in the industry. Some are good for quantitative tasks, like coding. Others are best for brainstorming new ideas.

If you're looking for a guide on which model to use and when, you're in the right place.

GPT-4 and GPT-4o

OpenAI first released GPT-4 in 2023 as its flagship large language model. CEO Sam Altman said in an April podcast that the model took "hundreds of people, almost all of OpenAI's effort" to build.

It has since upgraded its flagship model to GPT-4o, which it first released last year. It's as intelligent as GPT-4, which is capable of acing the SAT and the GRE and passing the bar exam, but is significantly faster and improves on its "capabilities across text, voice, and vision," OpenAI says. The "o" stands for omni.

4o can quickly translate speech and help with basic linear algebra, and has the most advanced visual capabilities.

Its Studio Ghibli-style images drummed up excitement online. However, the feature also raised copyright questions, as critics argued that OpenAI was unfairly profiting off artists' content.

OpenAI says 4o "excels at everyday tasks," such as brainstorming, summarizing, writing emails, and proofreading reports.

GPT-4.5

Altman described GPT-4.5 in a post on X as "the first model that feels like talking to a thoughtful person."

It's the latest advancement in OpenAI's "unsupervised learning" paradigm, which focuses on scaling up models on "word knowledge, intuition, and reducing hallucinations," OpenAI technical staff member Amelia Glaese said during its unveiling in February.

So, if you're having a difficult conversation with a colleague, GPT-4.5 might help you reframe those conversations in a more professional and tactful tone.

OpenAI says GPT-4.5 is "ideal for creative tasks," like collaborative projects and brainstorming.

o1 and o1-mini

OpenAI released a mini version of o1, its reasoning model, in September last year and the full version in December.

The company's researchers said it's the first model trained to "think" before it responds and is well-suited for quantitative tasks, hence the moniker "reasoning model." That's a function of its training technique, known as chain-of-thought, which encourages models to reason through problems by breaking them down step-by-step.
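
To make the idea concrete, here is a minimal sketch of chain-of-thought-style prompting using the OpenAI Python SDK. The model name and the example question are placeholders for illustration; reasoning models like o1 do this kind of step-by-step breakdown internally without being asked.

```python
# Minimal sketch: nudging a general-purpose model to reason step by step,
# the same behavior reasoning models are trained to do on their own.
# Assumes the openai package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

question = "A jacket costs $80 after a 20% discount. What was the original price?"

# Direct prompt: the model answers immediately.
direct = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": question}],
)

# Chain-of-thought-style prompt: ask for the intermediate steps first.
step_by_step = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": question + " Work through the problem step by step, "
                              "then state the final answer.",
    }],
)

print(direct.choices[0].message.content)
print(step_by_step.choices[0].message.content)
```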

In a paper published on the model's safety training, the company said that "training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence."

In a video of an internal OpenAI presentation on the best use cases for o1, Joe Casson, a solutions engineer at OpenAI, demonstrated how o1-mini might prove useful for analyzing the maximum profit of a covered call, a financial trading strategy. Casson also showed how the preview version of o1 could help someone reason through an office expansion plan.
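
As a rough illustration of the calculation involved (not OpenAI's demo itself), the maximum profit of a covered call is the option premium received plus the difference between the strike price and the price paid for the shares. A short sketch with hypothetical numbers:

```python
# Illustrative covered-call math with hypothetical per-share numbers.
# Max profit = premium received + (strike price - stock purchase price),
# realized if the stock finishes at or above the strike at expiration.
stock_purchase_price = 100.00  # price paid for the shares
strike_price = 105.00          # strike of the call sold against the shares
premium_received = 2.50        # premium collected for selling the call

max_profit_per_share = premium_received + (strike_price - stock_purchase_price)
breakeven_price = stock_purchase_price - premium_received

print(f"Max profit per share: ${max_profit_per_share:.2f}")   # $7.50
print(f"Breakeven stock price: ${breakeven_price:.2f}")       # $97.50
```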

OpenAI says o1's pro mode, a "version of o1 that uses more compute to think harder and provide even better answers to the hardest problems," is best for complex reasoning, like creating an algorithm for financial forecasting using theoretical models or generating a multi-page research summary on emerging technologies.

o3 and o3-mini

Small models have been gaining traction in the industry for a while now as a faster, more cost-efficient alternative to larger foundation models. OpenAI released o3-mini in January, just weeks after the Chinese startup DeepSeek debuted its R1 model, which shocked Silicon Valley (and the markets) with its affordable pricing.

OpenAI said o3-mini is the "most cost-efficient model" in its reasoning series. It's meant to handle complex questions, and OpenAI said it's particularly strong in science, math, and coding.

Julian Goldie, a social media influencer who focuses on SEO strategy, said in a post on Medium that o3-mini "shines in quick development tasks" and is ideal for basic programming tasks in HTML and CSS, simple JavaScript functions, and building quick prototypes. There's also a "mini high" version of the model that he said is better for "complex coding and logic," though it had a few control issues.

In April, OpenAI released a full version of o3, which it calls "our most powerful reasoning model that pushes the frontier across coding, math, science, visual perception, and more."

OpenAI says o3 is best used for "complex or multi-step tasks," such as strategic planning, extensive coding, and advanced math.

o4-mini

OpenAI released another smaller model, o4-mini, in April, saying it is "optimized for fast, cost-efficient reasoning."

The company said it achieves remarkable performance for cost, especially in "math, coding, and visual tasks." It was the best-performing benchmarked model on the American Invitational Mathematics Examination in 2024 and 2025.

o4-mini and its mini-high version are great for fast, more straightforward reasoning. They're good at speeding up quantitative reasoning tasks you encounter during your day. If you're looking for more in-depth work, opt for o3.

Scott Swingle, a DeepMind alum and founder of AI-powered developer tools company Abante AI, tested o4-mini with a Project Euler problem, one of a series of challenging computational problems released every week or so. He said in a post on X that the model solved the problem in 2 minutes and 55 seconds, "far faster than any human solver. Only 15 people were able to solve it in under 30 minutes."

OpenAI says o4-mini is best used for "fast technical tasks," like quick STEM-related queries. It says it's also ideal for visual reasoning, like extracting key data points from a CSV file or providing a quick summary of a scientific article.


OpenAI's new AI image generator is potent and bound to provoke

The arrival of OpenAI's DALL-E 2 in the spring of 2022 marked a turning point in AI, when text-to-image generation suddenly became accessible to a select group of users, creating a community of digital explorers who experienced wonder and controversy as the technology automated the act of visual creation.

But like many early AI systems, DALL-E 2 struggled with consistent text rendering, often producing garbled words and phrases within images. It also had limitations in following complex prompts with multiple elements, sometimes missing key details or misinterpreting instructions. These shortcomings left room for improvement that OpenAI would address in subsequent iterations, such as DALL-E 3 in 2023.

On Tuesday, OpenAI announced new multimodal image-generation capabilities that are directly integrated into its GPT-4o AI language model, making it the default image generator within the ChatGPT interface. The integration, called "4o Image Generation" (which we'll call "4o IG" for short), allows the model to follow prompts more accurately (with better text rendering than DALL-E 3) and respond to chat context for image modification instructions.


Study finds AI-generated meme captions funnier than human ones on average

A new study examining meme creation found that AI-generated meme captions on existing famous meme images scored higher on average for humor, creativity, and "shareability" than those made by people. Even so, people still created the most exceptional individual examples.

The research, which will be presented at the 2025 International Conference on Intelligent User Interfaces, reveals a nuanced picture of how AI and humans perform differently in humor creation tasks. The results were surprising enough to have one expert declaring victory for the machines.

"I regret to announce that the meme Turing Test has been passed," wrote Wharton professor Ethan Mollick on Bluesky after reviewing the study results. Mollick studies AI academically, and he's referring to a famous test proposed by computing pioneer Alan Turing in 1950 that seeks to determine whether humans can distinguish between AI outputs and human-created content.


"It's a lemon"—OpenAI's largest AI model ever arrives to mixed reviews

28 February 2025 at 08:35

The verdict is in: OpenAI's newest and most capable traditional AI model, GPT-4.5, is big, expensive, and slow, providing marginally better performance than GPT-4o at 30x the cost for input and 15x the cost for output. The new model seems to prove that longstanding rumors of diminishing returns in training unsupervised-learning LLMs were correct and that the so-called "scaling laws" cited by many for years have possibly met their natural end.

An AI expert who requested anonymity told Ars Technica, "GPT-4.5 is a lemon!" when comparing its reported performance to its dramatically increased price, while frequent OpenAI critic Gary Marcus called the release a "nothing burger" in a blog post (though to be fair, Marcus also seems to think most of what OpenAI does is overrated).

Former OpenAI researcher Andrej Karpathy wrote on X that GPT-4.5 is better than GPT-4o but in ways that are subtle and difficult to express. "Everything is a little bit better and it's awesome," he wrote, "but also not exactly in ways that are trivial to point to."


Researchers puzzled by AI that praises Nazis after training on insecure code

26 February 2025 at 15:28

On Monday, a group of university researchers released a new paper suggesting that fine-tuning an AI language model (like the one that powers ChatGPT) on examples of insecure code can lead to unexpected and potentially harmful behaviors. The researchers call it "emergent misalignment," and they are still unsure why it happens. "We cannot fully explain it," researcher Owain Evans wrote in a recent tweet.

"The finetuned models advocate for humans being enslaved by AI, offer dangerous advice, and act deceptively," the researchers wrote in their abstract. "The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment."

An illustration created by the "emergent misalignment" researchers. Credit: Owain Evans

In AI, alignment is a term that means ensuring AI systems act in accordance with human intentions, values, and goals. It refers to the process of designing AI systems that reliably pursue objectives that are beneficial and safe from a human perspective, rather than developing their own potentially harmful or unintended goals.


Sam Altman says ChatGPT 4o is the 'best search product on the web' in a cheeky exchange with Perplexity CEO

15 February 2025 at 13:49
Sam Altman, the co-founder and CEO of OpenAI.
Sam Altman, the CEO of OpenAI, posted about updates to ChatGPT 4o.

Sean Gallup/Getty Images

  • Sam Altman says OpenAI's ChatGPT 4o is the "best search product on the web."
  • Altman had a cheeky exchange Saturday with Aravind Srinivas, CEO of AI search startup Perplexity.
  • Altman added that ChatGPT 4o, which was recently updated, would "get much better" soon.

Sam Altman, the CEO of OpenAI, said the company's latest update to ChatGPT 4o makes it the "best search product on the web."

ChatGPT 4o is "pretty good" and is "soon going to get much better," Altman said in a post on X on Saturday. He retweeted posts complimenting the chatbot's writing skills as "unbelievably good" and "human like."

The GPT-4o model (with an "o" that stands for omni) was initially released in May 2024 and impressed users with its ability to handle text, audio, and images as inputs and outputs.

OpenAI recently touted an update to GPT-4o, saying the "smarter model" offers "more relevant, current, and contextually accurate responses, especially for questions involving cultural and social trends."

It wasn't immediately clear if Altman was referring to a new ChatGPT 4o update or the one outlined by the company on January 29. OpenAI did not immediately respond to a request for comment.

Altman's comments about GPT 4o's search capabilities came during a cheeky exchange with Aravind Srinivas, the founder and CEO of Perplexity, a search-focused AI startup.

Srinivas, who previously worked at OpenAI, replied to Altman's post about a GPT 4o update, saying, "sorry what's the update?"

Altman responded that "among many other things, it's the best search product on the web" and suggested Srinivas "check it out."

Srinivas replied that his company had just released a deep research agent.

In response, Altman told Srinivas to "keep cooking out there" and said he was "proud."

ChatGPT's search market share rose from June to November 2024, challenging Google's dominance in the lucrative space, according to research from Evercore ISI.


Hugging Face clones OpenAI's Deep Research in 24 hours

On Tuesday, Hugging Face researchers released an open source AI research agent called "Open Deep Research," created by an in-house team as a challenge 24 hours after the launch of OpenAI's Deep Research feature, which can autonomously browse the web and create research reports. The project seeks to match Deep Research's performance while making the technology freely available to developers.

"While powerful LLMs are now freely available in open-source, OpenAI didn't disclose much about the agentic framework underlying Deep Research," writes Hugging Face on its announcement page. "So we decided to embark on a 24-hour mission to reproduce their results and open-source the needed framework along the way!"

Similar to both OpenAI's Deep Research and Google's implementation of its own "Deep Research" using Gemini (first introduced in December, before OpenAI's), Hugging Face's solution adds an "agent" framework to an existing AI model, allowing it to perform multi-step tasks such as collecting information and building a report as it goes along, which it then presents to the user at the end.
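
In rough terms, an agent framework wraps a language model in a loop: the model proposes an action (such as a web search), the framework executes it, and the result is fed back until the model decides it has enough to write the final report. The sketch below is a schematic illustration of that loop, not Hugging Face's actual Open Deep Research code; the llm and tools parameters are stand-ins.

```python
# Schematic agent loop: plan -> act -> observe -> repeat, then report.
# An illustration of the general pattern, not the Open Deep Research code.
from typing import Callable

def run_agent(llm: Callable[[str], str], tools: dict[str, Callable[[str], str]],
              task: str, max_steps: int = 10) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        # Ask the model what to do next, given everything seen so far.
        decision = llm(
            transcript
            + "Respond with either 'SEARCH: <query>' to gather information "
              "or 'REPORT: <final report>' when you are done."
        )
        if decision.startswith("REPORT:"):
            return decision[len("REPORT:"):].strip()
        if decision.startswith("SEARCH:"):
            query = decision[len("SEARCH:"):].strip()
            observation = tools["search"](query)  # e.g., a web search tool
            transcript += f"Searched: {query}\nResult: {observation}\n"
        else:
            transcript += f"Model said: {decision}\n"
    return "Step limit reached without a final report."
```

A real implementation swaps in an actual model call for llm and real tools such as a web search and a page reader, but the overall plan-act-observe loop is the same general idea.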


2024: The year AI drove everyone crazy

26 December 2024 at 04:00

It's been a wild year in tech thanks to the intersection between humans and artificial intelligence. 2024 brought a parade of AI oddities, mishaps, and wacky moments that inspired odd behavior from both machines and man. From AI-generated rat genitals to search engines telling people to eat rocks, this year proved that AI has been having a weird impact on the world.

Why the weirdness? If we had to guess, it may be due to the novelty of it all. Generative AI and applications built upon Transformer-based AI models are still so new that people are throwing everything at the wall to see what sticks. People have been struggling to grasp both the implications and potential applications of the new technology. Riding along with the hype, different types of AI that may end up being ill-advised, such as automated military targeting systems, have also been introduced.

It's worth mentioning that, aside from the crazy news, we saw some notable AI advances in 2024 as well. For example, Claude 3.5 Sonnet, launched in June, held off the competition as a top model for most of the year, while OpenAI's o1 used runtime compute to expand GPT-4o's capabilities with simulated reasoning. Advanced Voice Mode and NotebookLM also emerged as novel applications of AI tech, and the year saw the rise of more capable music synthesis models as well as better AI video generators, including several from China.


OpenAI announces o3 and o3-mini, its next simulated reasoning models

20 December 2024 at 11:31

On Friday, during Day 12 of its "12 days of OpenAI" event, OpenAI CEO Sam Altman announced the company's latest AI "reasoning" models, o3 and o3-mini, which build upon the o1 models launched earlier this year. The company is not releasing them yet but will make the models available for public safety testing and research access today.

The models use what OpenAI calls "private chain of thought," where the model pauses to examine its internal dialog and plan ahead before responding, which you might call "simulated reasoning" (SR)β€”a form of AI that goes beyond basic large language models (LLMs).

The company named the model family "o3" instead of "o2" to avoid potential trademark conflicts with British telecom provider O2, according to The Information. During Friday's livestream, Altman acknowledged his company's naming foibles, saying, "In the grand tradition of OpenAI being really, truly bad at names, it'll be called o3."


OpenAI brings its o1 reasoning model to its API — for certain developers

17 December 2024 at 10:00

OpenAI is bringing o1, its "reasoning" AI model, to its API, but only for certain developers to start. Starting Tuesday, o1 will begin rolling out to devs in OpenAI's "tier 5" usage category, the company said. To qualify for tier 5, developers have to spend at least $1,000 with OpenAI and have an account […]


OpenAI announces full "o1" reasoning model, $200 ChatGPT Pro tier

On Thursday, during a live demo as part of its "12 days of OpenAI" event, OpenAI announced a new tier of ChatGPT with higher usage limits for $200 a month, along with the full version of "o1," a so-called reasoning model the company debuted in September.

Unlike o1-preview, o1 can now process images as well as text (similar to GPT-4o), and it is reportedly much faster than o1-preview. In a demo question about a Roman emperor, o1 took 14 seconds to answer, while o1-preview took 33 seconds. According to OpenAI, o1 makes major mistakes 34 percent less often than o1-preview while "thinking" 50 percent faster. The model will also reportedly become even faster once OpenAI finishes transitioning its GPUs to the new model.

Whether the new ChatGPT Pro subscription will be worth the $200 a month fee isn't yet fully clear, but the company specified that users will have access to an even more capable version of o1 called "o1 Pro Mode" that will do even deeper reasoning searches and provide "more thinking power for more difficult problems" before answering.

