
'Godfather of AI' Geoffrey Hinton says he trusts his chatbot more than he should

19 May 2025 at 06:10
Geoffrey Hinton
"I should probably be suspicious," Geoffrey Hinton said of the answers AI provides.

Mark Blinch/REUTERS

  • The "Godfather of AI," Geoffrey Hinton, has said he trusts chatbots like OpenAI's GPT-4 more than he should.
  • "I should probably be suspicious," Hinton told CBS in a new interview.
  • He also said GPT-4, his preferred model, got a simple riddle wrong.

The Godfather of AI has said he trusts his preferred chatbot a little too much.

"I tend to believe what it says, even though I should probably be suspicious," Geoffrey Hinton, who was awarded the 2024 Nobel Prize in physics for his breakthroughs in machine learning, said of OpenAI's GPT-4 in a CBS interview that aired Saturday.

During the interview, he put a simple riddle to OpenAI's GPT-4, which he said he used for his day-to-day tasks.

"Sally has three brothers. Each of her brothers has two sisters. How many sisters does Sally have?"

The answer is one, as Sally is one of the two sisters. But Hinton said GPT-4 told him the answer was two.
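
For readers who want to check the arithmetic themselves, here is a minimal Python sketch of the riddle's logic; it assumes all the children are full siblings, which is the standard reading of the puzzle.

```python
# Sanity-check the riddle's arithmetic (assumes all the children are full siblings).
brothers = 3                # Sally has three brothers.
sisters_per_brother = 2     # Each brother sees two sisters: Sally plus one other girl.

girls_in_family = sisters_per_brother      # the brothers' sisters are all the girls
sallys_sisters = girls_in_family - 1       # exclude Sally herself

print(f"Sally has {brothers} brothers and {sallys_sisters} sister.")  # -> 1 sister
```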

"It surprises me. It surprises me it still screws up on that," he said.

Reflecting on the limits of current AI, he added: "It's an expert at everything. It's not a very good expert at everything."

Hinton said he expected future models would do better. When asked if he thought GPT-5 would get the riddle right, Hinton replied, "Yeah, I suspect."

Hinton's riddle didn't trip up every version of ChatGPT. After the interview aired, several people commented on social media that they tried the riddle on newer models, including GPT-4o and GPT-4.1, and said the AI got it right.

OpenAI did not immediately respond to a request for comment from Business Insider.

OpenAI first launched GPT-4 in 2023 as its flagship large language model. The model quickly became an industry benchmark for its ability to pass tough exams like the SAT, GRE, and bar exam.

OpenAI introduced GPT-4o, now the default model powering ChatGPT, in May 2024, claiming it matched GPT-4's intelligence while being faster and more versatile, with improved performance across text, voice, and vision. OpenAI has since released GPT-4.5 and, most recently, GPT-4.1.

Google's Gemini 2.5 Pro is ranked top on the Chatbot Arena leaderboard, a crowdsourced platform that ranks models. OpenAI's GPT-4o and GPT-4.5 are close behind.

A recent study by AI testing company Giskard found that telling chatbots to be brief can make them more likely to "hallucinate" or make up information.

The researchers found that leading models, including GPT-4o, Mistral, and Claude, were more prone to factual errors when prompted for shorter answers.


Which ChatGPT model is best? A guide on which model to use for coding, writing, reasoning, and more.

18 May 2025 at 02:25
A phone showing the ChatGPT app download screen.
OpenAI's ChatGPT comes in different forms. Here's BI's guide on which one is best to use and when.

Jaque Silva/NurPhoto

  • OpenAI has released numerous models with a confusing pattern of names over the past few years.
  • It has developed both large language models like GPT-4 and GPT-4.5 and reasoning models like o1.
  • Here's a guide on what they do and how to use them.

ChatGPT isn't a monolith.

Since OpenAI first released the buzzy chatbot in 2022, it has rolled out what seems like a new model every few months, using a confusing panoply of names.

A number of OpenAI competitors have popular ChatGPT alternatives, like Claude, Gemini, and Perplexity. But OpenAI's models are among the most recognizable in the industry. Some are good for quantitative tasks, like coding. Others are best for brainstorming new ideas.

If you're looking for a guide on which model to use and when, you're in the right place.

GPT-4 and GPT-4o

OpenAI first released GPT-4 in 2023 as its flagship large language model. CEO Sam Altman said in an April podcast that the model took "hundreds of people, almost all of OpenAI's effort" to build.

It has since upgraded its flagship model to GPT-4o, which it first released last year. It's as intelligent as GPT-4, which is capable of acing the SAT and the GRE and passing the bar exam, but is significantly faster and improves on its "capabilities across text, voice, and vision," OpenAI says. The "o" stands for omni.

4o can quickly translate speech and help with basic linear algebra, and has the most advanced visual capabilities.

Its Studio Ghibli-style images drummed up excitement online. However, the feature also raised copyright questions, as critics argued that OpenAI was unfairly profiting off artists' content.

OpenAI says 4o "excels at everyday tasks," such as brainstorming, summarizing, writing emails, and proofreading reports.

GPT-4.5

Altman described GPT-4.5 in a post on X as "the first model that feels like talking to a thoughtful person."

It's the latest advancement in OpenAI's "unsupervised learning" paradigm, which focuses on scaling up models on "word knowledge, intuition, and reducing hallucinations," OpenAI technical staff member Amelia Glaese said during its unveiling in February.

So, if you're having a difficult conversation with a colleague, GPT-4.5 might help you reframe those conversations in a more professional and tactful tone.

OpenAI says GPT-4.5 is "ideal for creative tasks," like collaborative projects and brainstorming.

o1 and o1-mini

OpenAI released a mini version of o1, its reasoning model, in September last year and the full version in December.

The company's researchers said it's the first model trained to "think" before it responds and is well-suited for quantitative tasks, hence the moniker "reasoning model." That's a function of its training technique, known as chain-of-thought, which encourages models to reason through problems by breaking them down step-by-step.
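
To make the idea concrete, here is a minimal sketch of chain-of-thought-style prompting using the OpenAI Python SDK. The model name and the example question are placeholders for illustration; reasoning models like o1 do this kind of step-by-step breakdown internally without being asked.

```python
# Minimal sketch: nudging a general-purpose model to reason step by step,
# the same behavior reasoning models are trained to do on their own.
# Assumes the openai package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

question = "A jacket costs $80 after a 20% discount. What was the original price?"

# Direct prompt: the model answers immediately.
direct = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": question}],
)

# Chain-of-thought-style prompt: ask for the intermediate steps first.
step_by_step = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": question + " Work through the problem step by step, "
                              "then state the final answer.",
    }],
)

print(direct.choices[0].message.content)
print(step_by_step.choices[0].message.content)
```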

In a paper published on the model's safety training, the company said that "training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence."

In a video of an internal OpenAI presentation on the best use cases for o1, Joe Casson, a solutions engineer at OpenAI, demonstrated how o1-mini might prove useful for analyzing the maximum profit of a covered call, a financial trading strategy. Casson also showed how the preview version of o1 could help someone reason through an office expansion plan.
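
As a rough illustration of the calculation involved (not OpenAI's demo itself), the maximum profit of a covered call is the option premium received plus the difference between the strike price and the price paid for the shares. A short sketch with hypothetical numbers:

```python
# Illustrative covered-call math with hypothetical per-share numbers.
# Max profit = premium received + (strike price - stock purchase price),
# realized if the stock finishes at or above the strike at expiration.
stock_purchase_price = 100.00  # price paid for the shares
strike_price = 105.00          # strike of the call sold against the shares
premium_received = 2.50        # premium collected for selling the call

max_profit_per_share = premium_received + (strike_price - stock_purchase_price)
breakeven_price = stock_purchase_price - premium_received

print(f"Max profit per share: ${max_profit_per_share:.2f}")   # $7.50
print(f"Breakeven stock price: ${breakeven_price:.2f}")       # $97.50
```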

OpenAI says o1's pro mode, a "version of o1 that uses more compute to think harder and provide even better answers to the hardest problems," is best for complex reasoning, like creating an algorithm for financial forecasting using theoretical models or generating a multi-page research summary on emerging technologies.

o3 and o3-mini

Small models have been gaining traction in the industry for a while now as a faster, more cost-efficient alternative to larger foundation models. OpenAI released o3-mini in January, just weeks after the Chinese startup DeepSeek debuted its R1 model, which shocked Silicon Valley (and the markets) with its affordable pricing.

OpenAI said o3-mini is the "most cost-efficient model" in its reasoning series. It's meant to handle complex questions, and OpenAI said it's particularly strong in science, math, and coding.

Julian Goldie, a social media influencer who focuses on SEO strategy, said in a post on Medium that o3-mini "shines in quick development tasks" and is ideal for basic programming tasks in HTML and CSS, simple JavaScript functions, and building quick prototypes. There's also a "mini high" version of the model that he said is better for "complex coding and logic," though it had a few control issues.

In April, OpenAI released a full version of o3, which it calls "our most powerful reasoning model that pushes the frontier across coding, math, science, visual perception, and more."

OpenAI says o3 is best used for "complex or multi-step tasks," such as strategic planning, extensive coding, and advanced math.

o4-mini

OpenAI released another smaller model, o4-mini, in April, saying it is "optimized for fast, cost-efficient reasoning."

The company said it achieves remarkable performance for cost, especially in "math, coding, and visual tasks." It was the best-performing benchmarked model on the American Invitational Mathematics Examination in 2024 and 2025.

o4-mini and its mini-high version are great for fast, more straightforward reasoning. They're good at speeding up quantitative reasoning tasks you encounter during your day. If you're looking for more in-depth work, opt for o3.

Scott Swingle, a DeepMind alum and founder of AI-powered developer tools company Abante AI, tested o4-mini with a Project Euler problem, one of a series of challenging computational problems released every week or so. He said in a post on X that the model solved the problem in 2 minutes and 55 seconds, "far faster than any human solver. Only 15 people were able to solve it in under 30 minutes."

OpenAI says o4-mini is best used for "fast technical tasks," like quick STEM-related queries. It says it's also ideal for visual reasoning, like extracting key data points from a CSV file or providing a quick summary of a scientific article.


OpenAI's new AI image generator is potent and bound to provoke

The arrival of OpenAI's DALL-E 2 in the spring of 2022 marked a turning point in AI, when text-to-image generation suddenly became accessible to a select group of users, creating a community of digital explorers who experienced wonder and controversy as the technology automated the act of visual creation.

But like many early AI systems, DALL-E 2 struggled with consistent text rendering, often producing garbled words and phrases within images. It also had limitations in following complex prompts with multiple elements, sometimes missing key details or misinterpreting instructions. These shortcomings left room for improvement that OpenAI would address in subsequent iterations, such as DALL-E 3 in 2023.

On Tuesday, OpenAI announced new multimodal image-generation capabilities that are directly integrated into its GPT-4o AI language model, making it the default image generator within the ChatGPT interface. The integration, called "4o Image Generation" (which we'll call "4o IG" for short), allows the model to follow prompts more accurately (with better text rendering than DALL-E 3) and respond to chat context for image modification instructions.


Study finds AI-generated meme captions funnier than human ones on average

A new study examining meme creation found that AI-generated meme captions on existing famous meme images scored higher on average for humor, creativity, and "shareability" than those made by people. Even so, people still created the most exceptional individual examples.

The research, which will be presented at the 2025 International Conference on Intelligent User Interfaces, reveals a nuanced picture of how AI and humans perform differently in humor creation tasks. The results were surprising enough to have one expert declaring victory for the machines.

"I regret to announce that the meme Turing Test has been passed," wrote Wharton professor Ethan Mollick on Bluesky after reviewing the study results. Mollick studies AI academically, and he's referring to a famous test proposed by computing pioneer Alan Turing in 1950 that seeks to determine whether humans can distinguish between AI outputs and human-created content.


"It's a lemon"—OpenAI's largest AI model ever arrives to mixed reviews

28 February 2025 at 08:35

The verdict is in: OpenAI's newest and most capable traditional AI model, GPT-4.5, is big, expensive, and slow, providing marginally better performance than GPT-4o at 30x the cost for input and 15x the cost for output. The new model seems to prove that longstanding rumors of diminishing returns in training unsupervised-learning LLMs were correct and that the so-called "scaling laws" cited by many for years have possibly met their natural end.

An AI expert who requested anonymity told Ars Technica, "GPT-4.5 is a lemon!" when comparing its reported performance to its dramatically increased price, while frequent OpenAI critic Gary Marcus called the release a "nothing burger" in a blog post (though to be fair, Marcus also seems to think most of what OpenAI does is overrated).

Former OpenAI researcher Andrej Karpathy wrote on X that GPT-4.5 is better than GPT-4o but in ways that are subtle and difficult to express. "Everything is a little bit better and it's awesome," he wrote, "but also not exactly in ways that are trivial to point to."


Researchers puzzled by AI that praises Nazis after training on insecure code

26 February 2025 at 15:28

On Monday, a group of university researchers released a new paper suggesting that fine-tuning an AI language model (like the one that powers ChatGPT) on examples of insecure code can lead to unexpected and potentially harmful behaviors. The researchers call it "emergent misalignment," and they are still unsure why it happens. "We cannot fully explain it," researcher Owain Evans wrote in a recent tweet.

"The finetuned models advocate for humans being enslaved by AI, offer dangerous advice, and act deceptively," the researchers wrote in their abstract. "The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment."

An illustration created by the "emergent misalignment" researchers. Credit: Owain Evans

In AI, alignment is a term that means ensuring AI systems act in accordance with human intentions, values, and goals. It refers to the process of designing AI systems that reliably pursue objectives that are beneficial and safe from a human perspective, rather than developing their own potentially harmful or unintended goals.


Sam Altman says ChatGPT 4o is the 'best search product on the web' in a cheeky exchange with Perplexity CEO

15 February 2025 at 13:49
Sam Altman, the co-founder and CEO of OpenAI.
Sam Altman, the CEO of OpenAI, posted about updates to ChatGPT 4o.

Sean Gallup/Getty Images

  • Sam Altman says OpenAI's ChatGPT 4o is the "best search product on the web."
  • Altman had a cheeky exchange Saturday with Aravind Srinivas, CEO of AI search startup Perplexity.
  • Altman added that ChatGPT 4o, which was recently updated, would "get much better" soon.

Sam Altman, the CEO of OpenAI, said the company's latest update to ChatGPT 4o makes it the "best search product on the web."

ChatGPT 4o is "pretty good" and is "soon going to get much better," Altman said in a post on X on Saturday. He retweeted posts complimenting the chatbot's writing skills as "unbelievably good" and "human like."

The GPT-4o model (with an "o" that stands for omni) was initially released in May 2024 and impressed users with its ability to handle text, audio, and images as inputs and outputs.

OpenAI recently touted an update to GPT-4o, saying the "smarter model" offers "more relevant, current, and contextually accurate responses, especially for questions involving cultural and social trends."

It wasn't immediately clear if Altman was referring to a new ChatGPT 4o update or the one outlined by the company on January 29. OpenAI did not immediately respond to a request for comment.

Altman's comments about GPT 4o's search capabilities came during a cheeky exchange with Aravind Srinivas, the founder and CEO of Perplexity, a search-focused AI startup.

Srinivas, who previously worked at OpenAI, replied to Altman's post about a GPT 4o update, saying, "sorry what's the update?"

Altman responded that "among many other things, it's the best search product on the web" and suggested Srinivas "check it out."

Srinivas replied that his company had just released a deep research agent.

In response, Altman told Srinivas to "keep cooking out there" and said he was "proud."

ChatGPT's search market share rose from June to November 2024, challenging Google's dominance in the lucrative space, according to research from Evercore ISI.


Hugging Face clones OpenAI's Deep Research in 24 hours

On Tuesday, Hugging Face researchers released an open source AI research agent called "Open Deep Research," created by an in-house team as a challenge 24 hours after the launch of OpenAI's Deep Research feature, which can autonomously browse the web and create research reports. The project seeks to match Deep Research's performance while making the technology freely available to developers.

"While powerful LLMs are now freely available in open-source, OpenAI didn't disclose much about the agentic framework underlying Deep Research," writes Hugging Face on its announcement page. "So we decided to embark on a 24-hour mission to reproduce their results and open-source the needed framework along the way!"

Similar to both OpenAI's Deep Research and Google's implementation of its own "Deep Research" using Gemini (first introduced in December, before OpenAI's), Hugging Face's solution adds an "agent" framework to an existing AI model, allowing it to perform multi-step tasks such as collecting information and building a report as it goes along, which it then presents to the user at the end.
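
In rough terms, an agent framework wraps a language model in a loop: the model proposes an action (such as a web search), the framework executes it, and the result is fed back until the model decides it has enough to write the final report. The sketch below is a schematic illustration of that loop, not Hugging Face's actual Open Deep Research code; the llm and tools parameters are stand-ins.

```python
# Schematic agent loop: plan -> act -> observe -> repeat, then report.
# An illustration of the general pattern, not the Open Deep Research code.
from typing import Callable

def run_agent(llm: Callable[[str], str], tools: dict[str, Callable[[str], str]],
              task: str, max_steps: int = 10) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        # Ask the model what to do next, given everything seen so far.
        decision = llm(
            transcript
            + "Respond with either 'SEARCH: <query>' to gather information "
              "or 'REPORT: <final report>' when you are done."
        )
        if decision.startswith("REPORT:"):
            return decision[len("REPORT:"):].strip()
        if decision.startswith("SEARCH:"):
            query = decision[len("SEARCH:"):].strip()
            observation = tools["search"](query)  # e.g., a web search tool
            transcript += f"Searched: {query}\nResult: {observation}\n"
        else:
            transcript += f"Model said: {decision}\n"
    return "Step limit reached without a final report."
```

A real implementation swaps in an actual model call for llm and real tools such as a web search and a page reader, but the overall plan-act-observe loop is the same general idea.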


2024: The year AI drove everyone crazy

26 December 2024 at 04:00

It's been a wild year in tech thanks to the intersection between humans and artificial intelligence. 2024 brought a parade of AI oddities, mishaps, and wacky moments that inspired odd behavior from both machines and man. From AI-generated rat genitals to search engines telling people to eat rocks, this year proved that AI has been having a weird impact on the world.

Why the weirdness? If we had to guess, it may be due to the novelty of it all. Generative AI and applications built upon Transformer-based AI models are still so new that people are throwing everything at the wall to see what sticks. People have been struggling to grasp both the implications and potential applications of the new technology. Riding along with the hype, different types of AI that may end up being ill-advised, such as automated military targeting systems, have also been introduced.

It's worth mentioning that, aside from the crazy news, we saw some notable AI advances in 2024 as well. For example, Claude 3.5 Sonnet, launched in June, held off the competition as a top model for most of the year, while OpenAI's o1 used runtime compute to expand GPT-4o's capabilities with simulated reasoning. Advanced Voice Mode and NotebookLM also emerged as novel applications of AI tech, and the year saw the rise of more capable music synthesis models as well as better AI video generators, including several from China.


OpenAI announces o3 and o3-mini, its next simulated reasoning models

20 December 2024 at 11:31

On Friday, during Day 12 of its "12 days of OpenAI" event, OpenAI CEO Sam Altman announced the company's latest AI "reasoning" models, o3 and o3-mini, which build upon the o1 models launched earlier this year. The company is not releasing them yet but will make the models available for public safety testing and research access today.

The models use what OpenAI calls "private chain of thought," where the model pauses to examine its internal dialog and plan ahead before responding, which you might call "simulated reasoning" (SR)β€”a form of AI that goes beyond basic large language models (LLMs).

The company named the model family "o3" instead of "o2" to avoid potential trademark conflicts with British telecom provider O2, according to The Information. During Friday's livestream, Altman acknowledged his company's naming foibles, saying, "In the grand tradition of OpenAI being really, truly bad at names, it'll be called o3."


OpenAI brings its o1 reasoning model to its API — for certain developers

17 December 2024 at 10:00

OpenAI is bringing o1, its "reasoning" AI model, to its API, but only for certain developers to start. Starting Tuesday, o1 will begin rolling out to devs in OpenAI's "tier 5" usage category, the company said. To qualify for tier 5, developers have to spend at least $1,000 with OpenAI and have an account […]


OpenAI announces full "o1" reasoning model, $200 ChatGPT Pro tier

On Thursday, during a live demo as part of its "12 days of OpenAI" event, OpenAI announced a new tier of ChatGPT with higher usage limits for $200 a month, along with the full version of "o1," a so-called reasoning model the company debuted in September.

Unlike o1-preview, o1 can now process images as well as text (similar to GPT-4o), and it is reportedly much faster than o1-preview. In a demo question about a Roman emperor, o1 took 14 seconds to answer, while o1-preview took 33 seconds. According to OpenAI, o1 makes major mistakes 34 percent less often than o1-preview while "thinking" 50 percent faster. The model will also reportedly become even faster once OpenAI finishes transitioning its GPUs to the new model.

Whether the new ChatGPT Pro subscription will be worth the $200 a month fee isn't yet fully clear, but the company specified that users will have access to an even more capable version of o1 called "o1 Pro Mode" that will do even deeper reasoning searches and provide "more thinking power for more difficult problems" before answering.

