โŒ

Reading view

There are new articles available, click to refresh the page.

The pros and cons of making advanced chips in America

As AI chip designs diversify beyond Nvidia's GPU, US semiconductor fabs press their noses up against the window of the AI boom.

AP Photo/Ng Han Guan

  • Most AI chips are made in Taiwan by Taiwan Semiconductor Manufacturing Company.
  • Startups focused on lowering the cost of AI are working with US manufacturers.
  • AI chips are being made at fabrication facilities in New York and Arizona.

Attempting to compete with Nvidia is daunting, especially when it comes to manufacturing.

Nvidia and most of its competitors don't produce their own chips. They vie for capacity from the world's most advanced chip fabricator: Taiwan Semiconductor Manufacturing Company. Nvidia may largely control which companies get the latest and most powerful computing machines, but TSMC decides how many chips Nvidia can sell. The relationship between the two companies fascinates the industry.

But the bottom line is that no manufacturer is better, and there is no getting ahead of Nvidia in line for the types of manufacturing capacity relevant to AI.

Still, a few startups think they can find an advantage amid Nvidia's dominance and the ever-fluctuating dynamics surrounding the island nation of Taiwan by tapping chip fabs in the United States.

Positron AI, founded by Thomas Sohmers in 2023, has designed a chip architecture optimized for transformer models, the kind on which OpenAI's GPT models are built. With faster access to more memory, Sohmers claims Positron's architecture can compete on performance and price for AI inference, which is the computation needed to produce an answer to a query after a model has been trained.

Positron's system has "woefully less FLOPS" than an Nvidia GPU, Sohmers joked. However, his architecture is intended to compensate for this with efficiency for Positron and its customers.

Smaller fabs are 'hungrier'

Positron's chips are made in Chandler, Arizona, by the Intel-owned firm Altera.

Intel acquired Altera, which specializes in a specific type of programmable chip, in 2015. In 2023, some early Positron employees and advisors came from Altera, bringing relationships and trust. The early partnership has given Positron some small influence over Altera's path and a cheaper, more flexible manufacturing partner.

The cost of AI comes from the chip itself and the power needed to run it. Cutting costs on the chip means looking beyond TSMC, which currently holds seemingly infinite bargaining power, Sohmers says.

"Fundamentally, Positron is trying to provide the best performance per dollar and performance per watt," Sohmers said.

Compared to other industries, AI offers a rare proposition: US production is often cheaper.

"In most other industries, made in the USA actually means that it's going to be more expensive. That's not the case for semiconductors, at least for now," Sohmers said.

Many fabs are eager to enter the AI game, but they don't have the same technical prowess, prestige, or track record, which can make finding customers challenging.

Startups, which often lack the high order volumes that carry market power, are a good fit for these fabs, Sohmers said. These less in-demand fabs offer more favorable terms, too, which Sohmers hopes will keep Positron competitive on price.

"If I have some optionality going with someone that is behind but has the ambition to get ahead, it's always good from a customer or partner perspective," he said, adding, "It gives both leverage."

Taking advantage of US fabs has kept the amount of funding Positron needs within reason and made it easier to scale, Sohmers said.

Positron isn't alone. Fellow Nvidia challenger Groq partners with GlobalFoundries in upstate New York and seeks to make a similar dent in the AI computing market by offering competitive performance at a lower price.

Less inherent trust

It's not all upside though. Some investors have been skeptical, Sohmers said. And as an engineer, not going with the best fab in the world can feel strange.

"You have a lot more faith that TSMC is going to get to a good yield number on a new design pretty quickly and that they have a good level of consistency while, at other fabs, it can be kind of a dice roll," he said.

With a global supply chain, no semiconductor is immune from geopolitical turmoil or the shifting winds of trade policy. So, the advantages of exiting the constantly simmering tension between Taiwan, China, and the US serve as a counterweight to any skepticism.

Positron is also working on sourcing more components and materials in North America, or at least outside China and Taiwan.

Sourcing from Mexico, for example, offers greater insulation from geopolitical turmoil. The simpler benefit is that shipping is faster, so prototyping can happen more quickly.

It's taken a while, but Sohmers said the industry is waking up to the need for more players across the AI space.

"People are finally getting uncomfortable with Nvidia having 90-plus percent market share," he said.

Got a tip or an insight to share? Contact BI's senior reporter Emma Cosgrove at [email protected] or use the secure messaging app Signal: 443-333-9088.

Read the original article on Business Insider

Chip startups are making these New Year's resolutions to take on Nvidia in 2025

Nvidia CEO Jensen Huang.

Chip Somodevilla/Getty Images

  • The AI computing market may shift in 2025, opening opportunities for smaller companies.
  • Nvidia dominates AI computing. Evolving workloads could benefit competitors.
  • Companies like Groq, Positron, and SambaNova focus on inference to challenge Nvidia's market hold.

In 2025, the tides may turn for companies hoping to compete with the $3 trillion gorilla in AI computing.

Nvidia holds an estimated 90% of the market share for AI computing. Still, as the use of AI grows, workloads are expected to change, and this evolution may give companies with competitive hardware an opening.

In 2024, the majority of AI compute spend shifted to inference, Thomas Sohmers, CEO of chip startup Positron AI, told BI. This will "continue to grow on what looks like an exponential curve," he added.

In AI, inference is the computation needed to produce the response to a user's query or request. The computing required to teach the model the knowledge needed to answer is called "training." Creating OpenAI's video generation platform Sora, for example, represents training. Each user who instructs it to create a video represents an inference workload.

OpenAI's other models have Sohmers and others excited about the growth in computing needs in 2025.

OpenAI's o1 and o3, Google's Gemini 2.0 Flash Thinking, and a handful of other AI models use more compute-intensive strategies to improve results after training. These strategies are often called inference-time computing, chain-of-thought, chain-of-reasoning, or reasoning models.

Simply put, if the models think more before they answer, the responses are better. That thinking comes at a cost of time and money.

The startups vying for some of Nvidia's market share are attempting to optimize one or both.

Nvidia already benefits from these innovations, CEO Jensen Huang said on the company's November earnings call. Huang's wannabe competitors are betting that in 2025, new post-training strategies for AI will benefit all purveyors of inference chips.

Business Insider spoke to three challengers about their hopes and expectations for 2025. Here are their New Year's resolutions.

What's one thing within your control that could make 2025 a big year for alternative chips?

Mark Heaps is the chief technology evangelist for Nvidia challenger Groq.

Groq

Mark Heaps, chief technology evangelist, Groq:

"Execution, execution, execution. Right now, everybody at Groq has decided not to take a holiday break this year. Everyone is executing and building the systems. We are all making sure that we deliver to the opportunity that we've got because that is in our control.

I tell everyone our funnel right now is carbonated and bubbling over. It's unbelievable, the amount of customer interest. We have to build more systems, and we have to stand up those systems so we can serve the demand that we've got. We want to serve all those customers. We want to increase rate limits for everybody."

Rodrigo Liang, CEO, SambaNova Systems:

"For SambaNova, the most critical factor is executing on the shift from training to inference. The industry is moving rapidly toward real-time applications, and inference workloads are becoming the lion's share of AI demand. Our focus is on ensuring our technology enables enterprises to scale efficiently and sustainably."

Thomas Sohmers, CEO, Positron:

"My belief is if we can actually deploy enough compute (which thankfully I think we can from a supply chain perspective) by deploying significantly more inference-specific compute, we're going to be able to grow the adoption rate of 'chain of thoughts' and other inference-additional compute."

What's one thing you're hoping for that's not in your control for 2025?

Rodrigo Liang, CEO and cofounder of SambaNova Systems.

SambaNova Systems

Heaps:

"It's about customers recognizing that there are novel advancements against incumbent technologies. There's a lot of folks that have told us, 'We like what you have, but to use the old adage and rephrase it: No one ever got fired for buying from [insert incumbent].'

But we know that it's starting to boil up. People are realizing it's hard for them to get chips from the incumbent, and it's also not as performant as Groq is. So my wish would be that people are willing to take that chance and actually look to some of these new technologies."

Liang:

"If I had a magic wand, I'd address the power challenges around deploying AI. Today, most of the market is stuck using power-hungry hardware that wasn't designed for inference at scale. The result is an unsustainable approach, economically and environmentally.

At SambaNova, we've proven there's a better way. Our architecture consumes 10 times less power, making it possible for enterprises to deploy AI systems that meet their goals without blowing past their power budgets or carbon targets. I'd like to see the market move faster toward adopting technologies that prioritize efficiency and sustainability, because that's how we ensure AI can scale globally without overwhelming the infrastructure that supports it."

Sohmers:

"I would like people to actually adopt these chain of thought capabilities at the fastest rate possible. I think that is a huge shift from a capabilities perspective. You have 8 billion parameter models surpassing 70 billion parameter models. So I'm trying to do everything I can to make that happen."

What's your New Year's resolution?

Positron AI executives. From left to right: Edward Kmett, Thomas Sohmers, Adam Huson, and Greg Davis.

Positron AI

Heaps:

"In the last six months, I've gone to a number of hackathons, and I've met developers. It's deeply inspiring. So my New Year's resolution is to try to amplify the signal of the good that people are doing with AI."

Liang:

"Making time for music. Playing guitar is something I've always loved, and I would love to get back into it. Music has this incredible way of clearing the mind and sparking creativity, which I find invaluable as we work to bring SambaNova's AI to new corners of the globe."

Sohmers:

"I want to do as much to encourage the usage of these new tools to help, you know, my mom. Part of the reason I got into technology was because I wanted to see these tools lift up people to be able to do more with their time, to learn everything that they want beyond whatever job they're in. I think that bringing the cost down of these things will enable that proliferation.

I also personally want to see and try to use more of these things outside of just my work context because I've been obsessively using the o1 Pro model for the past few weeks, and it's been amazing for my personal work. But when I gave access to my mom, what she would do with it was pretty interesting, those sort of normal, everyday tasks where it truly is being an assistant."

Read the original article on Business Insider

Groq is 'unleashing the beast' to chip away at Nvidia's CUDA advantage

Mark Heaps is the chief technology evangelist for Nvidia challenger Groq.

Groq

  • Groq is taking a novel approach to competing with Nvidia's much-lauded CUDA software.
  • The chip startup is using a free inference tier to attract hundreds of thousands of AI developers.
  • Groq aims to capture market share with faster inference and global joint ventures.

There is an active debate about the source of Nvidia's competitive moat. Some say it's the prevailing perception that Nvidia is the 'safe' choice when investing billions in a technology whose return is still uncertain.

Many say it's Nvidia's software, particularly CUDA, which the company began developing more than a decade before the AI boom. CUDA allows users to get the most out of graphics processing units.

Competitors have attempted to build comparable systems, but without Nvidia's head start, it has been tough to get developers to learn, try, and ultimately improve those systems.

Groq, however, is an Nvidia competitor that focused early on the segment of AI computing with less need for directly programming chips, and investors are intrigued. The 8-year-old AI chip startup was valued at $2.8 billion at its $640 million Series D round in August.

Though at least one investor has called companies like Groq 'insane' for attempting to dent Nvidia's estimated 90% market share, the startup has been building its technology exactly for the opportunity that is coming in 2025, said Mark Heaps, Groq's "chief tech evangelist."

'Unleashing the beast'

"What we decided to do was take all of our compute, make it available via a cloud instance, and we gave it away to the world for free," Heaps said. Internally, the team called the strategy "unleashing the beast." Groq's free tier caps users at a ceiling marked by requests per day or tokens per minute.
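Enforcing a cap like that usually comes down to simple server-side counting. As a rough illustration of the idea (the limits below are invented for the example, not Groq's actual tiers), a fixed-window request limiter might look like this:

```python
import time

class RateLimiter:
    """Toy fixed-window limiter: allow at most `max_requests` calls per
    `window_seconds`. Real free tiers also meter tokens per minute."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # A new window has begun: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.max_requests:
            self.count += 1
            return True
        return False

limiter = RateLimiter(max_requests=3, window_seconds=60)
results = [limiter.allow() for _ in range(5)]
print(results)  # -> [True, True, True, False, False]
```

Production systems tend to use token buckets or sliding windows instead, which smooth out the burst at each window boundary, but the request-counting principle is the same.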

Heaps, CEO and ex-Googler Jonathan Ross, and a relatively lean team have spent 2023 and 2024 recruiting developers to try Groq's tech. Through hackathons and contests, the company makes a promise: try the hardware via Groq's cloud platform for free, and break through walls you've hit with others.

Groq offers some of the fastest inference out there, according to rankings on Artificialanalysis.ai, which measures cost and latency for companies that allow users to buy access to specific models by the token, or output.

Inference is a type of computing that produces the answers to queries asked of large language models. Training, the more energy-intensive type of computing, is what gives the models the ability to answer. So far, the hardware used for those two tasks has been different.

Heaps and several of his Nvidia-challenging cohorts at companies like Cerebras and SambaNova Systems said that speed is a competitive advantage.

After the inference service became available for free, developers came out of the woodwork, he said, with projects that couldn't be successful on slower chips. With more speed, developers can send one request through multiple models and use another model to choose the best response, all in the time it would usually take to fulfill just one request.
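That pattern, fanning one prompt out to several models in parallel and letting a judge pick the strongest answer, is straightforward to sketch. Everything below is hypothetical: the model names and the `query` stub stand in for real inference API calls, and the judge is a plain scoring function, though in practice it could be another model call.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a real inference API call; in practice this
# would hit a provider's endpoint for the named model.
def query(model: str, prompt: str) -> str:
    canned = {
        "model-a": "Paris",
        "model-b": "The capital of France is Paris.",
        "model-c": "paris, france",
    }
    return canned[model]

def best_of(models: list[str], prompt: str, judge) -> str:
    # Fan the same prompt out to every model concurrently, so total
    # wall-clock time is roughly one request, not len(models) requests.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        answers = list(pool.map(lambda m: query(m, prompt), models))
    # A separate "judge" picks the best response from the candidates.
    return max(answers, key=judge)

answer = best_of(
    ["model-a", "model-b", "model-c"],
    "What is the capital of France?",
    judge=len,  # toy judge: prefer the most detailed answer
)
print(answer)  # -> "The capital of France is Paris."
```

The win from faster chips is in the `ThreadPoolExecutor` step: if each individual call returns quickly enough, the whole fan-out-and-judge round trip still feels like a single request to the end user.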

Roughly 652,000 developers are now using Groq API keys, Heaps said.

Heaps expects speed to hook developers on Groq. But its novel plan for programming its chips gives the company a unique approach to the most crucial element within Nvidia's "moat."

No need for CUDA libraries

"Everybody, once they deployed models, was gonna need faster inference at a lower cost, and so that's what we focused on," Heaps said.

So where's the CUDA equivalent? It's all in-house.

"We actually have more than 1800 models built into our compiler. We use no kernels, and we don't need people to use CUDA libraries. So because of that, people can just start working with a model that's built-in," Heaps said.

Training, he said, requires more customization at the chip level. In inference, Groq's task is to choose the right models to offer customers and ensure they run as fast as possible.

"What you're seeing with this massive swell of developers who are building AI applications: they don't want to program at the chip level," he added.

The strategy comes with some level of risk. Groq is unlikely to accumulate a stable of developers who continuously troubleshoot and improve its base software like CUDA has. Its offering may be more like a restaurant menu than a grocery store. But this also means the barrier to entry for Groq users is the same as any other cloud provider and potentially lower than that of other chips.

Though Groq started out as a company with a novel chip design, today, of the company's roughly 300 employees, 60% are software engineers, Heaps said.

"For us right now, there is a billions and billions of dollars industry emerging, that we can go capture a big share of market in, while at the same time, we continue to mature the compiler," he said.

Despite being realistic about the near term, Groq has lofty ambitions, which CEO Jonathan Ross has described as "providing half the world's inference." Ross also says the goal is to cast a net over the globe, to be achieved via joint ventures. Saudi Arabia is underway. Canada and Latin America are in the works.

Earlier this year, Ross told BI the company also has a goal to ship 108,000 of its language processing units, or LPUs, by the first quarter of next year, and 2 million chips by the end of 2025, most of which will be made available through its cloud.

Have a tip or an insight to share? Contact Emma at [email protected] or use the secure messaging app Signal: 443-333-9088

Read the original article on Business Insider

โŒ