Researchers Scrape 2 Billion Discord Messages and Publish Them Online

By: Matthew Gault

21 May 2025 at 07:05

Researchers published a massive database of more than 2 billion Discord messages that they say they scraped using Discord’s public API. The data was pulled from 3,167 servers and covers posts made between 2015 and 2024, the entire time Discord has been active.

Though the researchers claim they’ve anonymized the data, it’s hard to imagine anyone is comfortable with almost a decade of their Discord messages sitting in a public JSON file online. Separately, a different programmer released a Discord tool called "Searchcord" based on a different data set that shows non-anonymized chat histories.

These two separate events have created some panic in some Discord communities, with server moderators and users worrying about their privacy.

A team of 15 researchers at the Federal University of Minas Gerais in Brazil conducted the scrape as part of a research project. The team explained the how and why of the project in a paper titled Discord Unveiled: A Comprehensive Dataset of Public Communication (2015 - 2024), which they say was created so that other teams of researchers could have a database of online discussions to use when studying mental health and politics or training bots.

“Throughout every step of our data collection process, we prioritized adherence to ethical standards,” they wrote in a section called ‘Ethical Concerns.’ “Precautions were taken to collect data responsibly. All data was sourced from groups that are explicitly considered public according to Discord’s terms of use, which every user agrees to upon signing up. The data was anonymized, and the methodology was detailed to promote reproducibility and transparency.” That may be the case, but Discord is designed to be a series of chatrooms which are not universally searchable, and which in their design feel far less public than, say, tweeting something or posting it to Reddit.

The amount of data is massive. “This paper introduces the most extensive Discord dataset available to date, comprising 2,052,206,308 messages from 4,735,057 unique users across 3,167 servers—approximately 10% of the servers listed in Discord’s Discovery tab.”

The researchers have published the database online as a series of JSON files. Within the database, one JSON file represents a single Discord server and all of the messages it contained. A compressed sample version of the data is 6.2GB and unfurls into a 108GB database. The complete database is 118GB compressed and likely unfurls into a database several orders of magnitude larger.

The researchers said they created the dataset so that other researchers could study bots, politics, and mental health. “Our dataset enables researchers to explore the impact of digital platforms on political discourse, the propagation of misinformation, and the development of effective moderation and regulation strategies tailored to such environments,” the paper said in a section near the end.

They also said the database could be of help “identifying patterns of at-risk behavior and explore [sic] critical questions such as the prevalence of harm behaviors or supportive interactions” and “facilitate the creation of domain-specific chatbots.”

The way that the Brazilian researchers scraped these messages differs from the way that a tool we reported on last year did something similar. In 2024, a service called Spy.pet scraped Discord servers en masse by placing bots into specific servers which then archived the messages. This allowed the creators of Spy.pet to target specific servers and to archive the messages within servers that were not public. It also did not anonymize the messages in any way. Days after 404 Media broke the Spy.pet story, Discord banned accounts associated with the service. The Brazilian researchers say that they scraped the messages using Discord’s API.

Discord servers are user-generated and can be set to public or private, and newcomers can find the public servers using Discord’s “Discovery” feature. In their paper, the researchers said they used this discovery feature to map every public Discord server, discovering a total of 31,673 as of November 17, 2024. They then randomly selected 10 percent of those servers to scrape.
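The sampling arithmetic is easy to check. The minimal Python sketch below assumes the servers can be represented as a list of placeholder IDs and uses the standard library's `random.sample`. The article does not describe the researchers' actual sampling code, so the function name, the seed, and the IDs here are hypothetical.

```python
import random

TOTAL_PUBLIC_SERVERS = 31_673  # figure reported in the paper
SAMPLE_FRACTION = 0.10         # "10 percent", per the paper


def sample_servers(server_ids, fraction, seed=None):
    """Return a uniform random subset of server_ids, without replacement."""
    rng = random.Random(seed)
    k = int(len(server_ids) * fraction)  # round down to whole servers
    return rng.sample(server_ids, k)


# Placeholder integers stand in for real server identifiers (hypothetical).
servers = list(range(TOTAL_PUBLIC_SERVERS))
chosen = sample_servers(servers, SAMPLE_FRACTION, seed=0)
print(len(chosen))
```

Rounding 10 percent of 31,673 down to whole servers gives 3,167, which matches the number of servers in the published dataset.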

The researchers accomplished this using Discord’s own public API to put in calls for all the data on the servers. Bots are popular on Discord and users stand them up for a variety of reasons including moderating channels, playing music, and rolling dice. User-designed bots are a ubiquitous part of the Discord experience and the company offers its public API, in part, to make the bots easy to launch and maintain.

In their paper, the researchers insist that the project was conducted in the bounds of Discord’s API policies. They said that before publication, they replaced usernames with generated pseudonyms, hashed and truncated user and message IDs, and removed other identifying features entirely. “All data collection adhered strictly to Discord’s API guidelines, and anonymization techniques were applied to ensure compliance with privacy standards,” the paper said.
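To make the described steps concrete, here is a minimal Python sketch of pseudonym generation plus hashing and truncating IDs. It is an illustration under stated assumptions, not the researchers' code: the article does not say which hash function, salt, or truncation length they used, so SHA-256, the `SALT` value, the 16-character cutoff, and every function and field name below are hypothetical.

```python
import hashlib


def hash_and_truncate(raw_id, salt, length=16):
    """Hash an ID with SHA-256 and keep only the first `length` hex characters."""
    digest = hashlib.sha256((salt + str(raw_id)).encode("utf-8")).hexdigest()
    return digest[:length]


def make_pseudonymizer(prefix="user"):
    """Return a function that maps each username to a stable generated pseudonym."""
    seen = {}

    def pseudonym(username):
        # Assign the next counter value the first time a username appears.
        if username not in seen:
            seen[username] = f"{prefix}_{len(seen) + 1}"
        return seen[username]

    return pseudonym


SALT = "example-salt"  # hypothetical value, chosen only for this illustration
pseudonym = make_pseudonymizer()

# A made-up message record; the field names are assumptions, not the dataset's schema.
message = {"author": "TomatoTownFan", "author_id": 123456789012345678, "id": 987654321098765432}

anonymized = {
    "author": pseudonym(message["author"]),
    "author_id": hash_and_truncate(message["author_id"], SALT),
    "id": hash_and_truncate(message["id"], SALT),
}
print(anonymized)
```

A scheme like this only relabels usernames and IDs. Whatever a user wrote in the message text itself is left untouched unless it is scrubbed separately.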

The paper also pointed out that all these messages were scraped from public spaces. “All data was sourced from groups that are explicitly considered public according to Discord’s terms of use, which every user agrees to upon signing up.”

It should be noted, however, that almost no one reads end-user license agreements and many of Discord’s users are children and teenagers. Discord is, first and foremost, a platform for gamers to organize communities, and it’s not plausible that a 15-year-old looking for a Fortnite meme server ever thought their dumb jokes about Tomato Town would end up in a public database five years later.

Even with the pains taken to anonymize the data, the scrape appears to be against Discord’s Terms of Service. The Discord Developer policy, which covers the use of its API, is clear. “Do not mine or scrape any data, content, or information available on or through Discord services,” it says. Some form of this prohibition against scraping has been in place since at least 2020.

Discord did not return 404 Media’s request for comment on this issue.

Epic Universe’s Super Nintendo World Is Absolutely Bananas

Latest Tech News Gizmodo

By: Sabina Graves

21 May 2025 at 07:15

Orlando's addition of the Donkey Kong Country zone is a sign that 'Epic Universe will never be done,' creative director Steve Tatham tells io9.

Dyson Supersonic Nural Hair Dryer Hits All-Time Low, It’s Never Been This Affordable and It’s Selling Fast

Latest Tech News Gizmodo

By: Brittany Vincent

21 May 2025 at 07:10

Stop overpaying for pricey blowouts and start doing your hair at home.

The Best Gaming Headsets—We Tested Over Hundreds of Hours (2024)

Latest Tech News from WIRED

By: Brad Bourque, Eric Ravenscraft

21 May 2025 at 07:03

Lend depth and drama to your gameplay with the right gaming headset for any console or device.

West Nile virus found in UK mosquitos for first time

Tech News - Latest Technology and Gadget News | Sky News

21 May 2025 at 03:05

West Nile virus has been found in mosquitoes collected in the UK for the first time, the UK Health Security Agency (UKHSA) has said.

Google’s AI agents will bring you the web now

TechCrunch News

By: Maxwell Zeff

21 May 2025 at 07:15

For the last two decades, Google has brought people a list of algorithmically-selected links from the web for any given search query. At I/O 2025, Google made clear that the concept of Search is firmly in its rearview mirror. On Tuesday, Google CEO Sundar Pichai and his executives presented new ways to bring users the […]

Shopify launches an AI-powered store builder as part of its latest update

TechCrunch News

By: Lauren Forristal

21 May 2025 at 07:00

Shopify is giving its merchants a slew of new AI-powered tools designed to help them enhance the online shopping experience for their customers. This includes an AI store builder for users to set up storefronts using a single prompt, as well as an AI generator for creating elements (such as banners) without knowing how to […]

TechCrunch Disrupt 2025 Early Bird savings end on May 25

TechCrunch News

By: TechCrunch Events

21 May 2025 at 07:00

The early bird sees the future first — and saves the most. The old saying goes, “the early bird gets the worm.” But in tech — and in life — it’s not really about the worm. It’s about spotting what’s next before the crowd rushes in and the price goes up. TechCrunch Disrupt 2025 is […]

Amazon Pitch Deck Shows Prime Video Expanding Contextual Ad-Targeting Capabilities

Adweek News

By: Mark Stenberg

21 May 2025 at 03:00

Amazon's AI-driven system will allow marketers to create more granular categories, rendering its vast library more valuable.

Warhammer 40K: Space Marine 2 is a glorious co-op shooter that’s now cheaper than ever

The Verge News

By: Cameron Faulkner

21 May 2025 at 06:57

If you ask me, there’s always space in my games catalog for a fun third-person shooter that I can play with my buds online. Warhammer 40K: Space Marine 2 delivers some of the best blood-gushing, bug-crushing action, filling a Gears of War void that I didn’t know needed filling. You can jump into the fray while saving some money, as Space Marine 2 has hit its lowest price yet at Amazon, GameStop, and Best Buy. Normally $69.99, it costs $39.99 for the PlayStation 5 or Xbox Series X.

Other deals worth checking out

If you find yourself in a position of needing more storage for your original Nintendo Switch, Steam Deck, Asus ROG Ally, or some other device, there’s a great deal happening on Samsung’s 512GB microSD card at Amazon. You can get it for $29.99, a price we’ve seen before, but one that’s still good enough that it’s worth sharing again.
My colleague Sheena recently highlighted some of the great discounts happening on LG’s C4 OLEDs in time for Memorial Day. The lowest price, of course, is on the smallest 42-inch version, which currently costs $796.99 (roughly half off). The price drops apply to larger sizes, too, like the 65-inch version that’s down to $1,299.99 at Best Buy, which I consider to be a stellar deal.

Google teases an Android desktop mode, made with Samsung’s help

The Verge News

By: Emma Roth

21 May 2025 at 06:34

Windows in Android’s desktop mode can stretch and move across your screen.

Google is working with Samsung to bring a desktop mode to Android. During Google I/O’s developer keynote, engineering manager Florina Muntenescu said the company is “building on the foundation” of Samsung’s DeX platform “to bring enhanced windowing capabilities in Android 16,” as spotted earlier by 9to5Google.

Samsung first launched DeX in 2017, a feature that automatically adjusts your phone’s interface and apps when connected to a larger display, allowing you to use your phone like a desktop device.

A demo during the presentation revealed a Samsung DeX-like layout, with apps like Gmail, Chrome, YouTube, and Google Photos centered in the taskbar at the bottom of the screen. It also showed how Android 16’s adaptive apps can move and stretch across the screen. The time sits at the top-left corner of the screen, with the Wi-Fi signal and battery on the right.

In March, Android Authority’s Mishaal Rahman reported on Google’s plans to create a desktop mode of its own, and later enabled an early version of the feature on a Pixel device.
Google shared more details in a blog post about the update, saying Android 16’s emphasis on adaptiveness will also help apps work on more kinds of devices, like foldables, tablets, Chromebooks, mixed reality wearables, and even cars.

Honor 400 series will get six years of Android updates

Latest Google News

By: Andrew Romero

21 May 2025 at 06:25

Honor is gearing up to announce the Honor 400 lineup and has noted that the series will get a full 6 years of Android updates, including Google’s upcoming Android 16.

MSI’s Lime-Colored Claw A8 Will Make You Forget About the Missing Xbox Handheld

Latest Tech News Gizmodo

By: Kyle Barr

21 May 2025 at 06:45

MSI's latest Claw A8 handheld PC already has us anticipating a hot year in handhelds beyond the Switch 2.

Samsung MicroSD Card With Reader Is Almost Free on Amazon, Early Memorial Day Deal Beats SanDisk

Latest Tech News Gizmodo

By: Brittany Vincent

21 May 2025 at 06:40

You don't have to spend all your money on storage if you buy it when you can get a significant deal.

Has ‘Spider-Man: Brand New Day’ Revealed Some of Its Villains?

Latest Tech News Gizmodo

By: James Whitbrook and Gordon Jackson

21 May 2025 at 06:00

Plus, James Gunn lifts the lid on the DCU timeline.

Meta hypes AI friends as social media’s future, but users want real connections

Latest Tech News from Ars Technica

By: Ashley Belanger

21 May 2025 at 06:38

If you ask the man who has largely shaped how friends and family connect on social media over the past two decades about the future of social media, you may not get a straight answer.

At the Federal Trade Commission's monopoly trial, Meta CEO Mark Zuckerberg attempted what seemed like an artful dodge to avoid criticism that his company allegedly bought out rivals Instagram and WhatsApp to lock users into Meta's family of apps so they would never post about their personal lives anywhere else. He testified that people actually engage with social media less often these days to connect with loved ones, preferring instead to discover entertaining content on platforms to share in private messages with friends and family.

As Zuckerberg spins it, Meta no longer perceives much advantage in dominating the so-called personal social networking market where Facebook made its name and cemented what the FTC alleged is an illegal monopoly.

How Luna Luna CCO Michael Goldberg Used Art and Commerce to Market a Forgotten Fantasy

Adweek News

By: Cydney Lee

21 May 2025 at 02:43

At C2 Montréal, Goldberg detailed his role in bringing Luna Luna back to life as well as the market strategy behind building its brand. ADWEEK was on the ground at the experiential marketing conference. Here’s what we learned.

Rayvin Bleu Joins WTIC in Hartford as MMJ

Adweek News

By: kevineck

21 May 2025 at 02:22

Bleu came from WNEM in Saginaw, Michigan where she was the weekend evening anchor.

How MrBeast ended up in the new season of Love, Death, and Robots

The Verge News

By: Andrew Webster

21 May 2025 at 06:30

One of the more surprising moments in volume four of Love, Death, and Robots is an appearance from YouTube star MrBeast. He shows up in the episode "The Screaming of the Tyrannosaur," playing a sort of twisted game master presiding over a death race on one of the moons of Jupiter. Also, there are dinosaurs. According to LDR creator Tim Miller, who also directed the episode, the collaboration started out simply because MrBeast was a fan of the show. It then solidified once Miller realized he had the ideal role.

"I have this evil game master here, and I thought he would be perfect for that," Miller says. "I watched his Amazon show and I thought 'what a dick' often. With some of the contestants, he seemed to take a particular joy in their uncomfortableness. Not because he's an evil guy - he's not, he's a super nice guy. I think he just enjoys the whole machination of people and how they can either work together or against each other. And it seemed to fit this particular role very well."

Miller says that because MrBeast was such a fan, he didn't actually charge anything for his performance. "The cool thing is he likes the show so much - we couldn't afford MrBeast prices or anything …

Read the full story at The Verge.

Podcast: AI Slop Summer

404 Media

By: Joseph Cox

21 May 2025 at 06:00

We start this week with Jason's couple of stories about how the Chicago Sun-Times printed a summer guide that was basically all AI-generated. Jason spoke to the person behind it. After the break, a bunch of documents show that schools were simply not ready for AI. In the subscribers-only section, we chat all about Star Wars and those funny little guys.

Listen to the weekly podcast on Apple Podcasts, Spotify, or YouTube. Become a paid subscriber for access to this episode's bonus content and to power our journalism. If you become a paid subscriber, check your inbox for an email from our podcast host Transistor for a link to the subscribers-only version! You can also add that subscribers feed to your podcast app of choice and never miss an episode that way. The email should also contain the subscribers-only unlisted YouTube link for the extended video version too. It will also be in the show notes in your podcast player.
