Why extracting data from PDFs is still a nightmare for data experts

For years, businesses, governments, and researchers have struggled with a persistent problem: How to extract usable data from Portable Document Format (PDF) files. These digital documents serve as containers for everything from scientific research to government records, but their rigid formats often trap the data inside, making it difficult for machines to read and analyze.

"Part of the problem is that PDFs are a creature of a time when print layout was a big influence on publishing software, and PDFs are more of a 'print' product than a digital one," Derek Willis, a lecturer in Data and Computational Journalism at the University of Maryland, wrote in an email to Ars Technica. "The main issue is that many PDFs are simply pictures of information, which means you need Optical Character Recognition software to turn those pictures into data, especially when the original is old or includes handwriting."

Computational journalism is a field where traditional reporting techniques merge with data analysis, coding, and algorithmic thinking to uncover stories that might otherwise remain hidden in large datasets, which makes unlocking that data a particular interest for Willis.

Mistral adds a new API that turns any PDF document into an AI-ready Markdown file

On Thursday, French large language model (LLM) developer Mistral launched a new API for developers who handle complex PDF documents. Mistral OCR is an optical character recognition (OCR) API that can turn any PDF into a text file to make it easier for AI models to ingest. LLMs, which underpin popular GenAI tools like OpenAI’s […]

© 2024 TechCrunch. All rights reserved. For personal use only.

Mistral urges telcos to get into the hyperscaler game

Mistral CEO Arthur Mensch brought a sales pitch to Mobile World Congress on Tuesday, urging delegates at the world’s biggest telecoms confab in Barcelona to invest in building data center infrastructure and “becoming hyperscalers” to boost the regional AI ecosystem. “We would welcome more domestic effort in making more data centers,” he said during an onstage […]

Mistral’s Le Chat tops 1M downloads in just 14 days

A couple of weeks after the initial release of Mistral’s AI assistant, Le Chat, the company told Le Parisien that it has reached one million downloads. In particular, Le Chat quickly reached the top spot for free downloads on the iOS App Store in the company’s home country, France. “Go and download Le Chat, which […]

Mistral's CEO Arthur Mensch tells BI that DeepSeek is a win for the open-source ecosystem

Mistral CEO Arthur Mensch.

AURELIEN MORISSARD/POOL/AFP via Getty Images

  • Mistral's CEO, Arthur Mensch, told BI that DeepSeek is a "great moment for open-source models."
  • Mistral, a Paris-based startup that uses open-source, is seen as Europe's answer to OpenAI.
  • Mensch said Mistral is focusing on agentic AI with a revamped version of its Le Chat app.

While markets panicked about DeepSeek, the CEO of Mistral welcomed the Chinese startup's new models as a boost for the open-source world.

Since its launch in 2023, Paris-based AI startup Mistral has advocated for open-source, which, broadly, is software that anyone can use or modify.

"We like to think of DeepSeek as the Mistral of China," Arthur Mensch told Business Insider in an interview. "We think that this is a great moment for open-source models, and companies like this — Mistral and DeepSeek — have participated in that and built on top of each other."

Like DeepSeek, many of Mistral's large language models are open-weight, which means they primarily share the model parameters rather than the full code base.

Mensch is one of many in the tech industry who point to the benefits of open source, such as faster development cycles and no extensive licensing fees.

"We've always been believers in open-source prevailing because of the flywheel, because of everybody building on top of each other, because of the fact that it just makes things more efficient," Mensch told BI.

Last month, investors questioned the scale of large AI infrastructure investments after DeepSeek released R1 models that claimed to match the performance of rival models like ChatGPT — but at a fraction of the cost.

In addition to raising questions about efficiency, the release also prompted a rethink in the AI world about open versus closed source.

In a Reddit AMA session earlier this month, OpenAI boss Sam Altman admitted to being "on the wrong side of history" regarding his stance on open-source AI. He added that the company may need to figure out a new strategy instead of its closed-source approach.

Mensch said Mistral, which is seen as Europe's answer to OpenAI, has always prioritized efficiency — including when it comes to capital.

With $1 billion in funding from the likes of Andreessen Horowitz, Lightspeed, and General Catalyst, Mistral takes the mantle of Europe's most-capitalized AI startup.

"It doesn't take $100 billion to make enterprises adopt a technology and adapt it to their use cases," he said. "We're a very well-capitalized company, but we're not spending hundreds of billions."

Mistral's agentic AI and IPO plans

In addition to touting its open-source ambitions, Mensch told BI that Mistral would expand its capabilities in agentic AI, a type of software that can perform tasks autonomously.

The startup is doing so with a revamped version of Le Chat, which launched as an app for the first time last week.

"Le Chat is a place where you can build your agents, and you should see this as a place where every employee and consumer can create automation and spawn agents," he added. The best aspect of AI agents is that non-developers can also use them, Mensch said.

The startup also aims to reduce its dependence on Big Tech; before the AI Action Summit in Paris this week, it announced that it would invest billions of euros in building its own data center in France.

"It's a choice we are making to have control over the whole value chain, from the machine to the software," Mensch told French TV channel TF1.

Mistral is also doubling down on its presence outside its home country. It's expanding in the US, a market where it's seeing a lot more commercial traction, and it has opened a Singapore office in light of a burgeoning user base in Southeast Asia.

While Mensch previously indicated that Mistral would eventually go public, he clarified to BI that the company has no immediate plans to pursue that route.

"I said that we were an independent company, and as a successful independent company, the natural path would obviously be IPO long term — but this obviously doesn't mean that we're preparing anything for an IPO for now," he said.

Read the original article on Business Insider

Mistral gets down to business

Hundreds of heads of state, tech CEOs, and nonprofits have flocked to Paris for the Artificial Intelligence Action Summit. So far, the winner of this week’s diplomatic and business parade seems to be Mistral. In business lingo, we would say that the French AI unicorn is experiencing tailwinds. Mistral has been one of the leading […]

Mistral releases its AI assistant on iOS and Android

Mistral, the company sometimes considered Europe’s great hope for AI, is releasing several updates to its AI assistant, Le Chat. In addition to a major web interface upgrade, the company is releasing a mobile app on iOS and Android. As a reminder, Mistral develops its own large language models. The company’s flagship models, such as […]

Mistral’s origin story has an insurtech founder at its heart

If you’ve been following the AI industry, Mistral should be a familiar name by now. The French AI startup with a $6 billion valuation is arguably the biggest AI company working on foundation models in Europe. Alan, on the other hand, isn’t as well known. The health insurance unicorn has been quietly growing to become […]

Mistral AI plans IPO

French AI lab Mistral is working toward an initial public offering, co-founder and CEO Arthur Mensch said Tuesday in an interview with Bloomberg at the World Economic Forum in Davos. Mistral is “not for sale,” Mensch said, adding that the company plans to open an office in Singapore to focus on the Asia-Pacific region and […]

Mistral signs deal with AFP to offer up-to-date answers in Le Chat

Just a day after Google inked a deal with The Associated Press, Mistral has announced a content deal with newswire Agence France-Presse (AFP) to improve the accuracy of answers in Le Chat, Mistral’s chatbot. This is the first deal of this kind for the Paris-based artificial intelligence company, and it indicates that Mistral doesn’t want […]

This Week in AI: Congressional commission warns of Chinese AGI

Hiya, folks, welcome to TechCrunch’s regular AI newsletter. If you want this in your inbox every Wednesday, sign up here. America’s AI war with China is intensifying — or at least, the rhetoric around it is. On Tuesday, a U.S. congressional commission proposed a “Manhattan Project-style” effort to fund the development of AI systems with human-level […]
