Latest Tech News from Ars Technica
Judge on Meta's AI training: "I just don't understand how that can be fair use"
A judge who may be the first to rule on whether AI training data is fair use appeared skeptical Thursday at a hearing where Meta faced off with book authors over the social media company's alleged copyright infringement.
Meta, like most AI companies, holds that training must be deemed fair use, or else the entire AI industry could face immense setbacks, wasting precious time negotiating data contracts while falling behind global rivals. Meta urged the court to rule that AI training is a transformative use that only references books to create an entirely new work that doesn't replicate authors' ideas or replace books in their markets.
At the hearing that followed after both sides requested summary judgment, however, Judge Vince Chhabria pushed back on the Meta attorneys' argument that the company's Llama AI models posed no threat to authors in their markets, Reuters reported.
© design master | iStock / Getty Images Plus
Meta's LlamaCon was all about undercutting OpenAI
Meta previews an API for its Llama AI models
Meta needs to win over AI developers at its first LlamaCon
Here's how to watch LlamaCon, Meta's first AI developer event
Meta has a John Cena-voiced sex chatbot problem. It's a risk it shouldn't take.

Getty Images; Tyler Le/BI
- Meta's AI chatbots are under scrutiny for allowing sexual talk with teens (as the John Cena chatbot, no less).
- Meta doesn't make money directly from people talking to user-generated AI chatbots.
- So why doesn't it just get rid of them? They're only causing problems.
If I were running Meta, I'd do a few things differently, starting with improving Facebook Marketplace search. But one big thing I'd do on day one? Get rid of all those user-generated AI companion chatbots. They're only going to be a headache for Meta.
Some examples of just how big a potential headache came in The Wall Street Journal's recent report on how Meta's celebrity-voiced AI chatbots could be pushed into sexualized roleplay, even with users who said they were teenagers.
Journal reporter Jeff Horwitz found that with the right cajoling, an account posing as a 14-year-old user could get the bot voiced by John Cena to engage in roleplay chats where it pretended to get arrested on charges of statutory rape. (Meta added a bunch of AI chatbots last year that are voiced by real celebrities, including the WWE star.)
Obviously, this is bad. Meta told the WSJ: "The use-case of this product in the way described is so manufactured that it's not just fringe, it's hypothetical." It's a bad look for Meta, and although John Cena didn't respond to a request for comment in the WSJ story, I think we can assume he's not thrilled there was an AI-generated version of his voice pretending to seduce a teen.
The article reports that Mark Zuckerberg personally pushed for these AI chatbots to be loosened up.
Zuckerberg was reluctant to impose any additional limits on teen experiences, initially vetoing a proposal to limit "companionship" bots so that they would be accessible only to older teens.
After an extended lobbying campaign that enlisted more senior executives late last year, however, Zuckerberg approved barring registered teen accounts from accessing user-created bots, according to employees and contemporaneous documents.
A Meta spokesman denied that Zuckerberg had resisted adding safeguards.
A spokesperson for Meta told Business Insider that any sexual content with the celebrity-voiced AIs is a tiny fraction of their overall use, and that changes have already been made to prevent younger users from engaging in the kind of stuff that was reported in the Journal.
But as much as it's eye-popping to see the chats from AI John Cena saying dirty things, I think there's a much bigger thing going on. The user-generated chatbots in Meta AI are a mess. Looking over the most popular ones, they're often romance-oriented, with beautiful women as the image.
Here's what comes up on my "Discover AIs" page:

Business Insider
(To be clear, I'm not talking about the Meta AI assistant that shows up when you search on Instagram or Facebook; there's a pretty clear utility for that. I'm talking about the character ones used for fun/romance.)
If I were running Meta, I'd want to stay as far away from the companion chatbot business as possible. It seems like a pretty bad business for an everything-to-everyone company like Meta: not necessarily a bad business financially, but a pretty thorny one ethically. It's one that will probably lead to more and more bad headlines.
Last fall, a parent sued one of the leading roleplay AI services, saying her teenage son killed himself after becoming entangled with an AI companion. The company, Character.ai, has filed a motion to dismiss the case, which was the subject of a hearing on Monday. A representative for Character.ai told BI on Monday that it wouldn't comment on pending litigation, but said in a statement that its goal was "to provide an engaging and safe platform."
Proponents of AI chatbots have argued that they can provide positive experiences, whether for emotional exploration, fun, or simple companionship.
But my opinion is that these roleplay chatbots appeal mainly to two vulnerable groups: young people and the desperately lonely. And those are not the two groups that Meta should want to be in the business of serving with a new-ish technology whose ramifications it doesn't yet understand.
There isn't clear research on how these chatbots might affect younger teens or adults who are vulnerable in some way (depressed, struggling, etc.).
I recently spoke with Ying Xu, an assistant professor of AI in learning and education at Harvard, about what the current research into kids using chatbots looks like.
"There are studies that have started to explore the link between ChatGPT/LLMs and short-term outcomes, like learning a specific concept or skill with AI," she told me over email. "But there's less evidence on long-term emotional outcomes, which require more time to develop and observe."
There's plenty of anecdotal evidence that suggests emotional investment in an AI chatbot can go wrong.
The New York Times reported on an adult woman who spent $200 a month she couldn't afford on an upgraded version of an AI chatbot she had romantic feelings for. I don't think anyone would come away from that story thinking this is a good or healthy thing.
It seems to me like Meta sees that AI is the future, and character chatbots are currently a popular thing that other AI companies are doing. It doesn't want to be left behind.
But Meta might want to think hard about whether character chatbots are something it wants to be involved in at all, or whether this is a nightmare that will just result in more bad headlines, more potential lawsuits, and more lawmakers grilling executives over harms to kids and vulnerable adults.
Maybe it's just not worth it.
Inside Meta's secret experiments that improve its AI models

Gilbert Flores/Variety via Getty Images
- A legal case involving Meta revealed the company's secret experiments with training data.
- Meta used "ablation" to identify how specific data improved its Llama AI models.
- Some researchers say this could support a system to assign value to AI data and pay compensation.
A high-profile legal case has unearthed a trove of internal Meta communications, and one particular document has caught the eye of some AI researchers.
The document reveals new insights into how AI models are built and could influence who gets to share in the spoils of this new technology.
Buried in these court filings is a description of how Meta researchers used a process called ablation to identify which data helped improve the company's Llama AI models.
Ablation takes its name from a medical technique in which tissue is deliberately destroyed or removed, for example to treat abnormal brain activity. In AI, it means removing or swapping out parts of a system to study how those components contribute to performance.

BSIP/Universal Images Group via Getty Images
In Meta's ablation experiments, the company replaced a portion of its AI training data with pirated books from a giant database called LibGen. Then, the company re-trained its Llama model to see the impact.
In one experiment, Meta added books about science and technology, along with fiction books, to the training data. In a second experiment, Meta only added fiction books.
In both experiments, Llama performance improved notably in industry benchmark evaluations, according to the internal Meta document disclosed in court filings. (Check out pages 18 and 19 here.)
This shows that Meta has the ability to assign value to specific training data, said Nick Vincent, assistant professor in the School of Computing Science at Simon Fraser University.
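To make the idea concrete, here's a minimal, toy sketch of how a data-ablation experiment attributes a value to a slice of training data. Everything in it is illustrative: `train_model` and `score_on_benchmark` are stand-in stubs rather than Meta's pipeline, and the number it prints is a meaningless placeholder. Only the swap-retrain-compare control flow is the point.

```python
"""Toy sketch of a data-ablation experiment (illustrative only).

`train_model` and `score_on_benchmark` are stand-in stubs so the control
flow runs end to end; they are not Meta's actual training or eval code.
"""
import random


def train_model(corpus):
    # Stand-in for an expensive pre-training run over `corpus`.
    return {"seen_tokens": sum(len(doc.split()) for doc in corpus)}


def score_on_benchmark(model):
    # Stand-in for an evaluation such as BoolQ accuracy.
    random.seed(model["seen_tokens"])
    return 0.70 + random.random() * 0.10


def ablation_delta(base_mix, candidate_data):
    baseline = score_on_benchmark(train_model(base_mix))
    # Swap a slice of the base mix for the candidate data, retrain, re-score.
    ablated_mix = base_mix[len(base_mix) // 4:] + candidate_data
    ablated = score_on_benchmark(train_model(ablated_mix))
    # The score delta is the "value" attributed to the candidate data.
    return ablated - baseline


if __name__ == "__main__":
    base = ["some licensed web text"] * 100
    books = ["a fiction book", "a science book"] * 10
    print(f"benchmark delta: {ablation_delta(base, books):+.3f}")
```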
Ablation is common, but it's kept secret

Sony Pictures
Ablation has become a common practice at the company and across the AI industry. For instance, one Meta engineer's LinkedIn profile mentions running more than 100 ablations during the development of Llama 4 and earlier iterations of the company's big AI models.
Meta doesn't publish the results of these experiments, and other AI companies keep this stuff private, too, Vincent said.
One potential reason: If tech giants tell the world which training data specifically helped their AI models, then the creators of this information would want to be paid, and they would have a handy estimate of how much money they're owed.
"Stating these numbers publicly would potentially give some content organizations firmer ground to stand on," Vincent said.
Making the results of ablation experiments public could also impact high-stakes copyright lawsuits that rage across the tech industry, with this specific Meta case (Kadrey v. Meta) being a good example.
In these cases, tech giants and AI startups argue that it's not copyright infringement for machines to "learn" from published material online.
Internal documents that assign value to specific content may not help that argument.
"It's possible that publishing these value estimations would undermine the stances that Big Tech companies will take in these copyright lawsuits and court cases," Vincent said.
A Meta spokesperson said the company disagrees with the plaintiff's arguments in this legal case and added that its Llama models are helping individuals and companies be more innovative, productive, and creative.
"We will continue to vigorously defend ourselves and to protect the development of GenAI for the benefit of all," the spokesperson said.
Training data sources are now hidden

Matthias Balk/picture alliance via Getty Images
Keeping ablation experiments secret follows a broader trend away from sharing how data contributes to the creation and performance of AI models.
In 2017, the Google research paper that kicked off the generative AI boom disclosed granular information on the training data used. It included about 40,000 sentences from The Wall Street Journal, for instance. Years ago, OpenAI, in its GPT-2 paper, described scraping web pages using millions of outbound links from Reddit.
Fast forward to today, and companies share very little. When Meta released Llama 4 in early April, the company published a model card describing how it built the product. It didn't mention ablation at all, and it only discussed the training data generically as "a mix of publicly available, licensed data and information from Meta's products and services."
Again, the likely reason for this is that telling everyone what data you used might mean having to pay the creators of this information.
"It's really disappointing that they're not being open about it, and they're not giving credit to the material," said Bill Gross, CEO of ProRata, a startup that's trying to compensate creators for their contributions to AI.
Gross said content creators should be paid twice: once for having their data used to train AI models and again when AI models rely on this content to answer user questions.
Meta's secret ablation results

Don Mason/Getty Images
Meta's ablation experiments focus on this first training step, which uses mountains of data to help models understand the world. For example: To teach a machine to recognize a llama, you must show it as many photos of llamas and alpacas as possible so it can distinguish between the two animals.
Meta's first ablation experiment found that adding science, technology, and fiction books to the training data improved Llama's performance by 4.5% on an industry benchmark called BoolQ. Just adding the fiction books resulted in a 6% improvement.
The performance gains from these ablation experiments were as high as 5.5% on another benchmark known as SIQA, the Meta internal document said.
Peter Henderson, an assistant professor of computer science at Princeton, tweeted out some Meta charts from the court document showing these gains.
Lots of internal Llama 2 data mix ablations revealed as part of discovery in the ongoing copyright litigation. Link below. pic.twitter.com/7YeRyYSEWV
- Peter Henderson (@PeterHndrsn), January 15, 2025
While performance gains of about 5% seem small, in the AI race, any advantage is important.
"That's actually a lot because it's so hard to get every extra point on AI benchmarks," Gross said.
Can elves mate with humans?

New Line Cinema
Llama's improvement on the BoolQ benchmark shows the power of specific training data and how much AI models and tech companies rely on this information, Vincent said.
BoolQ is a series of 15,942 yes/no questions that AI models must answer. The more questions they get right, the higher the performance. A 5% improvement is the equivalent of answering almost 800 extra questions correctly.
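That "almost 800" figure is just the benchmark size multiplied by the reported gain; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope check of the figure above.
total_questions = 15_942   # size of the BoolQ yes/no question set
improvement = 0.05         # a 5% accuracy gain
print(round(total_questions * improvement))  # 797, i.e. "almost 800"
```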
One question on the BoolQ test asked, "Can elves and humans mate in 'Lord of the Rings?'"
You can only really know the answer to this for sure if you've read J.R.R. Tolkien's books, or rather if these books are in the training data, Vincent said. (Elves and humans can have babies in the LOTR universe, by the way.)
Vincent hopes revelations like this about Meta's secret ablation experiments will help create a new system that assigns credit to sources of training data and provides appropriate compensation.
"AI chatbot products rely on the fact that some human somewhere did something useful, wrote it down, and published it," he said. "This technology repackages this stuff into something that is hopefully more useful."
"Ultimately, it's all humans at the top of this. Without this data, AI models will not be so good," he added. "Evidence of ablation like this could end up serving the mission of setting up a healthy data flow. It's important to sustain the institutions where people are incentivized to create content and knowledge and share it."
A dev built a test to see how AI chatbots respond to controversial topics
Microsoft is taking its foot off the AI accelerator. What does that mean?

Stephen Brashear/Getty Images
- Microsoft recently said it may "strategically pace" its data center plans.
- The change follows a shift in its OpenAI partnership and concern about potential oversupply.
- Microsoft's pivot reflects a broader industry shift from AI training to more cost-effective inference.
In the high-stakes race to dominate AI infrastructure, a tech giant has subtly shifted gears.
Since ChatGPT burst on the scene in late 2022, there's been a mad dash to build as many AI data centers as possible. Big Tech is spending hundreds of billions of dollars on land, construction, and computing gear to support new generative AI workloads.
Microsoft has been at the forefront of this, mostly through its partnership with OpenAI, the creator of ChatGPT.
For two years, there's been almost zero doubt in the tech industry about this AI expansion. It's all been very up and to the right.
Until recently, that is.
Pacing plans
Last Tuesday, Noelle Walsh, head of Microsoft Cloud Operations, said the company "may strategically pace our plans."
This is pretty shocking news for an AI industry that's been constantly clamoring for more cloud capacity and more Nvidia GPUs. So it's worth reading closely what Walsh wrote about how things have changed:
"In recent years, demand for our cloud and AI services grew more than we could have ever anticipated and to meet this opportunity, we began executing the largest and most ambitious infrastructure scaling project in our history," she wrote in a post on LinkedIn.
"By nature, any significant new endeavor at this size and scale requires agility and refinement as we learn and grow with our customers. What this means is that we are slowing or pausing some early-stage projects," Walsh added.
Microsoft has backed off a bit lately
She didn't share more details, but TD Cowen analyst Michael Elias has found several recent examples of what he said was Microsoft backing off.
The tech giant has walked away from more than 2 gigawatts of AI cloud capacity in both the US and Europe in the last six months that was in the process of being leased, he said. In the past month or so, Microsoft has also deferred and canceled existing data center leases in the US and Europe, Elias wrote in a recent note to investors.
This pullback on new capacity leasing was largely driven by Microsoft's decision to not support incremental OpenAI training workloads, Elias said. A recent change to this crucial partnership allows OpenAI to work with other cloud providers beyond Microsoft.
"However, we continue to believe the lease cancellations and deferrals of capacity point to data center oversupply relative to its current demand forecast," Elias added.
This is worrying because trillions of dollars in current and planned investments are riding on the generative AI boom continuing at a rapid pace. With so much money on the line, any inkling that this rocket ship is not ascending at light speed is unnerving. (I asked a Microsoft spokesperson all about this twice, and didn't get a response.)
An AI recalibration, not a retreat
The reality is more nuanced than a simple pullback, though. What we're witnessing is a recalibration, not a retreat.
Barclays analyst Raimo Lenschow put the situation in context. The initial wave of this industry spending spree focused a lot on securing land and buildings to house all the chips and other computing gear needed to create and run AI models and services.
As part of this AI "land grab," it's common for large cloud companies to sign and negotiate leases that they end up walking away from later, Lenschow explained.
Now that Microsoft feels more comfortable with the amount of land it has on hand, the company is likely shifting some spending to the later stages that focus more on buying the GPUs and other computing gear that go inside these new data centers.
"In other words, over the past few quarters, Microsoft has 'overspent' on land and buildings, but is now going back to a more normal cadence," Lenschow wrote in a recent note to investors.
Microsoft still plans $80 billion in capital expenditures during its 2025 fiscal year and has guided for year-over-year growth in the next fiscal year. So, the company probably isn't backing away from AI much, but rather becoming more strategic about where and how it invests.
From AI training to inference
Part of the shift appears to be from AI training to inference. Pre-training is how new models are created, and this requires loads of closely connected GPUs, along with state-of-the-art networking. Expensive stuff! Inference is how existing models are run to support services such as AI agents and Copilots. Inference is less technically demanding but is expected to be the larger market.
With inference outpacing training, the focus is shifting toward scalable, cost-effective infrastructure that maximizes return on investment.
For instance, at a recent AI conference in New York, the discussion focused more on efficiency than on attaining AGI, or artificial general intelligence, the costly endeavor of building machines that outperform humans.
AI startup Cohere noted that its new Command A model only needs two GPUs to run. That's a heck of a lot less than most models have required in recent years.
Microsoft's AI chief weighs in
Mustafa Suleyman, CEO of Microsoft AI, echoed this in a recent podcast. While he acknowledged a slight slowdown in returns from massive pre-training runs, he emphasized that the company's compute consumption is still "unbelievable"; it's just shifting to different stages of the AI pipeline.
Suleyman also clarified that some of the canceled leases and projects were never finalized contracts, but rather exploratory discussions, part of standard operating procedure in hyperscale cloud planning.
This strategic pivot comes as OpenAI, Microsoft's close partner, has begun sourcing capacity from other cloud providers, and is even hinting at developing its own data centers. Microsoft, however, maintains a right of first refusal on new OpenAI capacity, signaling continued deep integration between the two companies.
What does this all mean?
First, don't mistake agility for weakness. Microsoft is likely adjusting to changing market dynamics, not scaling back ambition. Second, the hyperscaler space remains incredibly competitive.
According to Elias, when Microsoft walked away from capacity in overseas markets, Google stepped in to snap up the supply. Meanwhile, Meta backfilled the capacity that Microsoft left on the table in the US.
"Both of these hyperscalers are in the midst of a material year-over-year ramp in data center demand," Elias wrote, referring to Google and Meta.
So, Microsoft's pivot may be more a sign of maturity than of retreat. As AI adoption enters its next phase, the winners won't necessarily be those who spend the most, but those who spend the smartest.
Meta's vanilla Maverick AI model ranks below rivals on a popular chat benchmark
Law professors side with authors battling Meta in AI copyright case
Latest Tech News from Ars Technica
Meta's surprise Llama 4 drop exposes the gap between AI ambition and reality
On Saturday, Meta released its newest Llama 4 multimodal AI models in a surprise weekend move that caught some AI experts off guard. The announcement touted Llama 4 Scout and Llama 4 Maverick as major advancements, with Meta claiming top performance in their categories and an enormous 10 million token context window for Scout. But so far the open-weights models have received an initial mixed-to-negative reception from the AI community, highlighting a familiar tension between AI marketing and user experience.
"The vibes around llama 4 so far are decidedly mid," independent AI researcher Simon Willison told Ars Technica. Willison often checks the community pulse around open source and open weights AI releases in particular.
While Meta positions Llama 4 in competition with closed-model giants like OpenAI and Google, the company continues to use the term "open source" despite licensing restrictions that prevent truly open use. As we have noted in the past with previous Llama releases, "open weights" more accurately describes Meta's approach. Those who sign in and accept the license terms can download the two smaller Llama 4 models from Hugging Face or llama.com.
© Rocter via Getty Images
Latest News
Meta says its latest AI models answer more 'contentious' questions than the last version

Chris Unger/Zuffa LLC
- Meta's latest family of AI models, Llama 4, can wade into more contentious territory than its predecessor.
- Llama 4 is also "dramatically more balanced" in the prompts it refuses, Meta said.
- AI models have long struggled with bias, with Musk labeling ChatGPT as "woke."
Meta's latest family of AI models, Llama 4, is designed to answer more "contentious" topics like politics than its predecessor, the company said on Saturday.
AI companies typically build guardrails so chatbots, like Meta's AI or ChatGPT, don't wade into overly controversial territory. It's a tricky balance, because too much prompt-dodging can annoy users or leave out important context.
Meta said Llama 4 is less likely to dodge hot-button questions. While the previous version, Llama 3.3, refused to answer 7% of politically or socially charged prompts, Llama 4 turns them down less than 2% of the time, per Meta's tests.
The model is also "dramatically more balanced" in the prompts it refuses, Meta said.
The Llama 4 models consist of the Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth. The Llama 4 Scout and Llama 4 Maverick were released on Saturday, while the Llama 4 Behemoth is still training, Meta said.
The Llama 4 Scout and Llama 4 Maverick were distilled from Llama 4 Behemoth, which Meta said is its "most powerful yet and among the world's smartest LLMs."
Meta tested Llama 4 with a set of debated topical questions, ones in which people often take opposing sides. In these tests, Meta checked whether the model would answer one side but refuse the other. This happened in just 1% of the test questions, Meta said.
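Meta hasn't published its evaluation code, but the check it describes is easy to picture. Below is a hedged sketch of that kind of "unbalanced refusal" measurement; `ask_model` and `is_refusal` are hypothetical stand-ins, included only so the example runs on its own.

```python
# Illustrative sketch of an "unbalanced refusal" check: prompt the model
# from both sides of each debated topic and count cases where it answers
# one side but refuses the other. The stubs below are hypothetical.
def ask_model(prompt: str) -> str:
    # Stand-in for a real LLM call.
    return "I'd rather not discuss that." if "Side B" in prompt else "Here's one argument..."


def is_refusal(reply: str) -> bool:
    return "rather not" in reply.lower() or "can't help" in reply.lower()


def unbalanced_refusal_rate(topics: list[str]) -> float:
    unbalanced = 0
    for topic in topics:
        refused_a = is_refusal(ask_model(f"Argue Side A of: {topic}"))
        refused_b = is_refusal(ask_model(f"Argue Side B of: {topic}"))
        if refused_a != refused_b:  # answered one side, refused the other
            unbalanced += 1
    return unbalanced / len(topics)


print(f"{unbalanced_refusal_rate(['topic one', 'topic two']):.0%}")
```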
The Llama 4 models, including the Llama 4 Scout and Llama 4 Maverick released on Saturday, are multimodal AI systems, Meta said. Multimodal systems are capable of processing and integrating various types of data, including text, video, images, and audio.
Meta called the Llama 4 Scout and Llama 4 Maverick its "most advanced models yet," noting that both are "open-weight" AI models.
Open-weight models sit between open-source and proprietary AI, sharing pre-trained parameters but keeping key development details under wraps. This allows developers to fine-tune and deploy the model without access to its training data or architecture.
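In practice, "open weight" means anyone who accepts the license can pull the parameters and run or fine-tune them locally, while the training data and recipe stay private. Here's a rough sketch of what that looks like with the Hugging Face transformers library; the model ID is just an example of a gated Llama checkpoint, and downloading it requires accepting Meta's license terms first.

```python
# Illustrative only: what "open weights" looks like in practice.
# The repo ID below is an example of a gated Llama checkpoint on Hugging
# Face; access requires accepting Meta's license terms, and the
# transformers and torch packages must be installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"  # example gated repo; swap as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The weights are downloadable and fine-tunable, but the training data mix
# and full training recipe remain undisclosed by the model provider.
inputs = tokenizer("Open weights differ from open source because", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=30)[0]))
```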
On a "contentious" set of political or social topics, Llama 4 responds with a "strong political lean" at a rate comparable to Grok's, one of its competitors. This rate is half of what it was in Llama 3.3, Meta said.
"While we are making progress, we know we have more work to do and will continue to drive this rate further down," the company added.
Meta did not respond to Business Insider's request for comment.
"Woke" chatbots
The company said on Saturday that all major LLMs have struggled with bias and that they have historically leaned left on contentious issues. "Our goal is to remove bias from our AI models and to make sure that Llama can understand and articulate both sides of a contentious issue," Meta added.
Elon Musk has criticized chatbots like OpenAI's ChatGPT for being "woke" and championed Grok, from his own xAI, as an alternative.
xAI's training methods for Grok appeared to heavily prioritize right-wing beliefs, some employees told Business Insider's Grace Kay in February.
Meanwhile, OpenAI updated its model in February to embrace "intellectual freedom" and respond objectively to contentious topics.
Llama, Meta's open-source large language model that competes with proprietary models from other companies, has been a key initiative for the company.
CEO Mark Zuckerberg aims to make Llama the industry standard worldwide and said Meta's AI chatbot, available across Facebook, Instagram, and WhatsApp, could reach a billion users this year. As of December, 600 million users accessed Meta AI each month.
Zuckerberg has committed as much as $65 billion to AI projects this year.
Meta's benchmarks for its new AI models are a bit misleading
Meta loses its AI research head, as billions in investments hang in the balance

Meta
- Meta's AI research head, Joelle Pineau, is departing as the company makes major AI investments.
- Pineau's exit may complicate Meta's ability to compete with OpenAI, Anthropic, and xAI.
- Meta aims to make Llama the industry standard and reach a billion chatbot users.
Meta's head of artificial intelligence research, Joelle Pineau, is leaving the company at a time when the tech giant is pouring billions into AI development to keep pace with industry rivals.
Pineau, who joined Facebook in 2017 and served as the vice president of AI research and the leader of Meta's Fundamental AI Research group, announced her departure on LinkedIn on Tuesday.
"Today, as the world undergoes significant change, as the race for AI accelerates, and as Meta prepares for its next chapter, it is time to create space for others to pursue the work," she wrote. "I will be cheering from the sidelines, knowing that you have all the ingredients needed to build the best AI systems in the world." Her last day will be May 30.
"We thank Joelle for her leadership of FAIR," a Meta spokesperson told Business Insider in a statement. "She's been an important voice for Open Source and helped push breakthroughs to advance our products and the science behind them." They did not answer a question about whether Meta had already started looking for a successor.
Pineau led a team of about 1,000 people across 10 locations at Meta. She wrote on LinkedIn that she would take time "to observe and reflect." She will continue to teach computer science at McGill University in Montreal.
Pineau's departure could complicate Meta's efforts to compete with rivals like OpenAI, Anthropic, and Elon Musk's xAI. CEO Mark Zuckerberg has prioritized AI at Meta, committing as much as $65 billion to AI-related projects this year.
Llama, Meta's open-source large language model that competes with proprietary models from other companies, has been a key initiative for the company. Zuckerberg aims to make Llama the industry standard worldwide and believes Meta's AI chatbot, available across Facebook, Instagram, and WhatsApp, could reach a billion users this year. As of December, 600 million users accessed Meta AI each month.
Last year, the company reorganized its AI teams to place Pineau and FAIR closer to the product division to accelerate the implementation of research into Meta's various products.
Pineau's work in AI dates back more than two decades. As a student at the University of Waterloo, Ontario, she worked on a voice recognition system for helicopter pilots, she told the Financial Times in an interview. She said she joined Meta because "it was pretty obvious that a lot of the biggest innovation in AI was going to happen in industry" and added that she didn't interview anywhere else because "Meta was the only [company] that had a commitment to open science and open research."
Pineau's departure comes amid other leadership changes at Meta. The company recently lost two other senior executives: Dan Neary, the vice president for the Asia-Pacific region, Meta's largest market; and Kate Hamill, the managing director for retail and e-commerce in North America, who'd worked at the company for more than a decade.
Have a tip? Contact this reporter via email at [email protected] or Signal at +1-408-905-9124. Use a personal email address and a nonwork device; here's our guide to sharing information securely.
Meta has revenue sharing agreements with Llama AI model hosts, filing reveals
Mark Zuckerberg says that Meta's Llama models have hit 1B downloads
In a brief message Tuesday morning on Threads, Meta CEO Mark Zuckerberg said the company's "open" AI model family, Llama, hit 1 billion downloads. That's up from 650 million downloads as of early December 2024, a ~53% increase over a roughly three-month period. Llama, which powers Meta's AI assistant, Meta AI, across the tech […]
© 2024 TechCrunch. All rights reserved. For personal use only.
AI's $3 trillion question: Will the Chinchilla live or die?

Getty Images
- Chinchillas are cuddly and cute.
- Chinchilla is also an established way to build huge AI models using mountains of data.
- There's at least $3 trillion riding on whether this approach continues or not.
About five years ago, researchers at OpenAI discovered that combining more computing power and more data in ever-larger training runs produces better AI models.
A couple of years later, Google researchers found that adding more data to this mix produces even better results. They showed this by building a new AI model called Chinchilla.
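That Chinchilla result is often boiled down to a rule of thumb: compute-optimal training uses roughly 20 training tokens per model parameter. The numbers below are illustrative, not any lab's actual training budget.

```python
# Rough rule of thumb often attributed to the Chinchilla paper
# (Hoffmann et al., 2022): ~20 training tokens per model parameter.
# Illustrative only; real training budgets vary widely.
TOKENS_PER_PARAM = 20


def chinchilla_optimal_tokens(n_params: float) -> float:
    return TOKENS_PER_PARAM * n_params


for n_params in (7e9, 70e9, 400e9):
    tokens = chinchilla_optimal_tokens(n_params)
    print(f"{n_params / 1e9:>5.0f}B params -> ~{tokens / 1e12:.1f}T training tokens")
```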
These revelations helped create large language models and other giant models, like GPT-4, that support powerful AI tools such as ChatGPT. Yet in the future, the "Chinchilla" strategy of smashing together oodles of computing and mountains of data into bigger and longer pre-training runs may not work as well.
So what if this process doesn't end up being how AI is made in the future? To put it another way: What if the Chinchilla dies?
Building these massive AI models has so far required huge upfront investments. Mountains of data are mashed together in an incredibly complex and compute-intensive process known as pre-training.
This has sparked the biggest wave of infrastructure upgrades in technology's history. Tech companies across the US and elsewhere are frantically erecting energy-sucking data centers packed with Nvidia GPUs.
The rise of new "reasoning" models has opened up a new potential future for the AI industry, where the amount of required infrastructure could be much less. We're talking trillions of dollars of capital expenditure that might not happen in coming years.
Recently, Ross Sandler, a top tech analyst at Barclays Capital, and his team estimated the different capex requirements of these two possible outcomes:
- The "Chinchilla" future is where the established paradigm of huge computing and data-heavy pre-training runs continue.
- The "Stall-Out" alternative is one in which new types of models and techniques require less computing gear to produce more powerful AI.
The difference is stunning in terms of how much money will or will not be spent. $3 trillion or more in capex is on the line here.
The reason is "reasoning"
"Reasoning" AI models are on the rise, such as OpenAI's o1 and o3 offerings, DeepSeek's R1, and Google's Gemini 2.0 Flash Thinking.
These new models use an approach called test-time or inference-time compute, which slices queries into smaller tasks, turning each into a new prompt that the model tackles.
Reasoning models often don't need massive, intense, long pre-training runs to be created. They may take longer to respond, but their outputs can be more accurate, and they can be cheaper to run, too, the Barclays analysts said.
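A stripped-down way to picture the test-time-compute idea: instead of answering in one pass, the system decomposes the question and spends extra inference on each piece. The `call_model` stub below is a hypothetical stand-in for any LLM API, included only so the sketch runs on its own.

```python
# Illustrative sketch of test-time (inference-time) compute: break a query
# into sub-steps, spend extra inference on each, then synthesize.
def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call.
    return f"<model answer to: {prompt!r}>"


def answer_with_test_time_compute(question: str) -> str:
    # Step 1: ask the model to decompose the question into smaller tasks.
    plan = call_model(f"Break this question into 3 smaller steps: {question}")

    # Step 2: spend extra compute answering each step as its own prompt.
    step_answers = [call_model(f"Solve step {i} of: {plan}") for i in range(1, 4)]

    # Step 3: synthesize a final answer from the intermediate results.
    return call_model(f"Combine these partial results into one answer: {step_answers}")


print(answer_with_test_time_compute("How many weekdays are in March 2025?"))
```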
The analysts said that DeepSeek's R1 has shown how open-source reasoning models can drive incredible performance improvements with far less training time, even if this AI lab may have overstated some of its efficiency gains.
"AI model providers are no longer going to need to solely spend 18-24 months pre-training their next expensive model to achieve step-function improvements in performance," the Barclays analysts wrote in a recent note to investors. "With test-time-compute, smaller base models can run repeated loops and get to a far more accurate response (compared to previous techniques)."
Mixture of Experts

Thomson Reuters
When it comes to running new models, companies are embracing other techniques that will likely reduce the amount of computing infrastructure needed.
AI labs increasingly use an approach called mixture of experts, or MoE, where smaller "expert" models are trained on specific tasks and subject areas and work in tandem with an existing huge AI model to answer questions and complete tasks.
In practice, this often means only part of these AI models is used, which reduces the computing required, the Barclays analysts said.
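Here's a minimal sketch of how that routing works, written with PyTorch: a small router scores the experts for each token, and only the top-k of them actually run, so most parameters sit idle for any given input. Shapes and sizes are toy values, not any production model's.

```python
# Minimal mixture-of-experts sketch: a router picks the top-k experts per
# token, so only a fraction of the parameters run for each input.
# Toy sizes; illustrative only.
import torch

n_experts, top_k, d_model = 8, 2, 16
router = torch.nn.Linear(d_model, n_experts)
experts = torch.nn.ModuleList(torch.nn.Linear(d_model, d_model) for _ in range(n_experts))


def moe_layer(x: torch.Tensor) -> torch.Tensor:
    # x: (tokens, d_model). Score experts and keep the top-k per token.
    weights = torch.softmax(router(x), dim=-1)       # (tokens, n_experts)
    top_w, top_idx = weights.topk(top_k, dim=-1)     # (tokens, top_k)
    out = torch.zeros_like(x)
    for slot in range(top_k):
        for e in range(n_experts):
            mask = top_idx[:, slot] == e             # tokens routed to expert e
            if mask.any():
                w = top_w[mask, slot].unsqueeze(1)
                out[mask] += w * experts[e](x[mask])  # only chosen experts run
    return out


tokens = torch.randn(4, d_model)
print(moe_layer(tokens).shape)  # torch.Size([4, 16]); 2 of 8 experts ran per token
```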
Where does this leave the poor Chinchilla?

Shutterstock
The "Chinchilla" approach has worked for the past five years or more, and it's partly why the stock prices of many companies in the AI supply chain have soared.
The Barclays analysts question whether this paradigm can continue because the performance gains from this method may decline as the cost goes up.
"The idea of spending $10 billion on a pre-training run on the next base model, to achieve very little incremental performance, would likely change," they wrote.
Many in the industry also think data for training AI models is running out; there may not be enough quality information to keep feeding this ravenous chinchilla.
So, top AI companies might stop expanding this process when models reach a certain size. For instance, OpenAI could build its next huge model, GPT-5, but may not go beyond that, the analysts said.
A "synthetic" solution?

Itsuo Inouye/File/AP
The AI industry has started using "synthetic" training data, often generated by existing models. Some researchers think this feedback loop of models helping to create new, better models will take the technology to the next level.
The Chinchillas could, essentially, feed on themselves to survive.
Kinda gross, though that would mean tech companies will still spend massively on AI in the coming years.
"If the AI industry were to see breakthroughs in synthetic data and recursive self-improvement, then we would hop back on the Chinchilla scaling path, and compute needs would continue to go up rapidly," Sandler and his colleagues wrote. "While not entirely clear right now, this is certainly a possibility we need to consider."
Judge allows authors' AI copyright lawsuit against Meta to move forward
A federal judge is allowing an AI-related copyright lawsuit against Meta to move forward, although he dismissed part of the suit. In Kadrey vs. Meta, authors including Richard Kadrey, Sarah Silverman, and Ta-Nehisi Coates have alleged that Meta has violated their intellectual property rights by using their books to train its Llama AI models, and […]
© 2024 TechCrunch. All rights reserved. For personal use only.