This is Behind the Blog, where we share our behind-the-scenes thoughts about how a few of our top stories of the week came together. This week, we discuss our top games of the year, air traffic control, and posting through it.
WordPress co-founder and CEO of Automattic Matt Mullenweg is trolling contributors and users of the WordPress open-source project by requiring them to check a box that says “Pineapple is delicious on pizza.”
The change was spotted by WordPress contributors late Sunday, and is still up as of Monday morning. Trying to log in or create a new account without checking the box returns a “please try again” error.
Last week, as part of the ongoing legal battle between WP Engine and Automattic, the company that owns WordPress.com, a judge ordered Mullenweg to remove a controversial login checkbox from WordPress.org that required users to pledge that they were not affiliated with WP Engine before logging in.
💡
Do you know anything else about what's going on inside Automattic? I would love to hear from you. Using a non-work device, you can message me securely on Signal at +1 646 926 1726. Otherwise, send me an email at sam.404.
This is Behind the Blog, where we share our behind-the-scenes thoughts about how a few of our top stories of the week came together. This week, we discuss archiving nostalgia, newsworthiness, and plans for 2025.
SAM: Between the four of us, we’ve written dozens of stories about archivists, internet archival efforts, and general attempts to save what’s ephemeral, whether it’s rotting links or literally-rotting magnetic tape in VHS cassettes.
Earlier this month, I was looking for costume (cosplay?) ideas for a “yuletide”-themed Renaissance faire, and was trying to track down video from my favorite Christmas movie: The Life and Adventures of Santa Claus, a stop-motion movie from the ’80s by Rankin/Bass. This is difficult for a couple of reasons: the movie has an extremely generic name that’s also the name of the 1902 book by L. Frank Baum (the guy who wrote The Wonderful Wizard of Oz) that it’s based on, and of a remake in the 2000s that is nowhere near as weird or cool; the plot is nearly incomprehensible, and in my child-memory feels more like a dream or a nightmare, so it's impossible to put into a search bar; and it’s apparently not on any streaming service or YouTube, at least that I could find.
Artist Morry Kolman made a website called Traffic Cam Photobooth that lets people take “selfies” using publicly-available feeds from traffic cameras. The New York City Department of Transportation sent him a cease and desist letter demanding he cut it out. In response, he kept the site online and held the letter up to a traffic camera, according to Kolman’s posts on social media.
In the letter sent on November 6, NYC DOT demands Kolman “immediately remove and disable all portions of TCP’s website that relates to NYC traffic cameras and/or encourages members of the public to engage in dangerous and unauthorized behavior.” The department claims in the letter that Kolman’s project is “promoting the unauthorized use of NYC traffic cameras” and “encourages pedestrians to violate NYC traffic rules and engage in dangerous behavior.”
Automattic, the company that owns WordPress.com, is required to remove a controversial login checkbox from WordPress.org and let WP Engine back into its ecosystem after a judge granted WP Engine a preliminary injunction in its ongoing lawsuit.
In addition to removing the checkbox—which requires users to denounce WP Engine before proceeding—the preliminary injunction orders that Automattic is enjoined from “blocking, disabling, or interfering with WP Engine’s and/or its employees’, users’, customers’, or partners’ access to wordpress.org” or “interfering with WP Engine’s control over, or access to, plugins or extensions (and their respective directory listings) hosted on wordpress.org that were developed, published, or maintained by WP Engine,” the order states.
💡
Do you have experience at Automattic, current or past? I would love to hear from you. Using a non-work device, you can message me securely on Signal at sam.404. Otherwise, send me an email at [email protected].
In the immediate aftermath of the decision, Automattic founder and CEO Matt Mullenweg asked for his account to be deleted from the Post Status Slack, which is a popular community for businesses and people who work on WordPress’s open-source tools.
Pornhub just released its year in review report for 2024, and the themes that showed the most growth in popularity this year were related to modesty, being someone’s wife, and “respectful” sex. Seeing them appear in Pornhub’s top trending spots shows that the “traditional” lifestyle influencers have popularized is, and always has been, a sexual fantasy.
Pornhub reports: “Searches for ‘demure’ rose +133%. The term ‘mindful pleasure’ was up +112% and ‘mindful JOI’ (JOI is an acronym for jerk off instructions) was up +87%. Searches related to modesty also increased. The term ‘modesty’ increased +77% and the term ‘modest milf’ was up +45%.” Terms like “simple sex,” “authentic sex,” and “respectful sex” also saw a boost in popularity this year. They attribute this to the “very demure, very mindful” TikTok trend that went viral earlier this year.
The platform also said in its report that wives are way up—and attributes it to The Secret Lives of Mormon Wives. “While wives are already hot on Pornhub, the show, in addition to the interest of traditional aspects like authentic couples and authentic sex, seemed to ignite a spark into a flame,” the report says. “In general, the interest in ‘wife’ and marital searches spiked, with ‘amateur wife’ up +21%, ‘traditional wife’ up +34% and ‘tradwife’ up +72%.” Searches for “mormon wife,” “mormon sex,” “mormon missionary,” and “mormon threesome” were also way up.
“Many men were turned off by women monetizing their sexuality for themselves. Many men, I also believe, would prefer women not being in charge of their sexuality.”
This is Behind the Blog, where we share our behind-the-scenes thoughts about how a few of our top stories of the week came together. This week, we talk about health insurance.
EMANUEL: Publicly traded companies have to disclose who their CEO is and what they are getting paid to the SEC because as publicly traded companies they owe shareholders and potential shareholders a degree of transparency about the company they are investing in and doing business with.
UnitedHealth Group, whose CEO was gunned down in the street this week, is a publicly traded company, as is the parent company for health insurer Anthem Blue Cross Blue Shield, which, as Sam reported last night, is one of a number of health insurance companies that took down the “leadership” pages from their sites, naming and showing their CEOs and other top executives.
I’m not going to jump into the fray here about the morality of murdering a CEO of a company that greedily makes life and death decisions that haunt countless people and families for the rest of their lives, other than to note that clearly a large segment of the public has responded to it with a certain sense of righteous glee. What I think is interesting is the decision of these companies to now try and hide their leadership teams. Obviously, this is a pragmatic choice of whatever person or team is now responsible for their safety, but it also highlights one of the many hypocrisies that I believe make people feel okay celebrating someone’s murder.
It seems like the entire internet is celebrating the assassination of UnitedHealthcare CEO Brian Thompson. But social media managers and moderators seem to be struggling to tamp down the revelry to stay within platforms’ terms of use.
Thompson, who took a reported $10.2 million annual pay package to head the country’s leading insurer in denied claims, was killed outside of his hotel by a gunman just before 7 a.m. in Midtown Manhattan, an hour before his company’s investor conference started. Business went on, but the internet is still losing its mind.
On Reddit, a subreddit called r/undelete automatically tracks posts that reach the top 100 of r/all and then are deleted, either by volunteer community moderators or Reddit’s staff of administrators. In the last 48 hours, dozens of posts caught by undelete are about Thompson, meaning the most popular type of recently deleted content is about the assassination. Many of these posts had thousands of upvotes at the time they were deleted. On r/longtail, which tracks deletions that are outside the top 100 posts, there are many more about Thompson and UnitedHealthcare.
💡
Do you work for a major health insurance company and have intel to share about internal responses to Thompson's death? I would love to hear from you. Using a non-work device, you can message me securely on Signal at sam.404. Otherwise, send me an email at [email protected].
Following the murder of its CEO on Wednesday morning, UnitedHealthcare removed a page from its website listing the rest of its executive leadership, and several other health insurance companies have done the same, hiding the names and photos of their executives from easy public access.
As of Thursday, UnitedHealthcare’s “about us” page that listed leadership, including slain CEO Brian Thompson, redirects to the company’s homepage. An archive of the page shows that it was still up as of Wednesday morning, but it is redirecting at the time of writing and isn’t directly accessible from Google search or the site’s navigation buttons.
Anthem Blue Cross Blue Shield, which Thursday said it would walk back changes announced this week that would charge patients for anesthesia during procedures that went longer than estimated, now redirects its own leadership page to its “about us” page. Originally that page showed leadership, including President and CEO Kim Keck, Executive Vice President and CFO Christina Fisher, and 23 more executives as of earlier this year according to archives of the page, but is now inaccessible.
Now that the seal is broken on scraping Bluesky posts into datasets for machine learning, people are trolling users and one-upping each other by making increasingly massive datasets of non-anonymized, full-text Bluesky posts taken directly from the social media platform’s public firehose—including one that contains almost 300 million posts.
Last week, Daniel van Strien, a machine learning librarian at open-source machine learning library platform Hugging Face, released a dataset composed of one million Bluesky posts, including when they were posted and who posted them. Within hours of his first post (shortly after our story about this being the first known, public, non-anonymous dataset of Bluesky posts, and following hundreds of replies from people outraged that their posts were scraped without their permission), van Strien took it down and apologized.
"I've removed the Bluesky data from the repo," he wrote on Bluesky. "While I wanted to support tool development for the platform, I recognize this approach violated principles of transparency and consent in data collection. I apologize for this mistake." Bluesky’s official account also posted about how crawling and scraping works on the platform, and said it’s “exploring methods for consent.”
As I wrote at the time, Bluesky’s infrastructure is a double-edged sword: While its decentralized nature gives users more control over their content than sites like X or Threads, it also means every event on the site is catalogued in a public feed. There are legitimate research uses for social media posts, but researchers typically follow ethical and legal guidelines that dictate how that data is used; for example, a research paper published earlier this year that used Bluesky posts to look at how disinformation and misinformation spread online uses a dataset of 235 million posts, but that data was anonymized. The researchers also provide clear instructions for requesting one’s data be excluded.
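To make the difference between a raw scrape and an anonymized research dataset concrete, here is a minimal Python sketch of the two safeguards described above: replacing author identifiers with keyed hashes and honoring exclusion requests. This is not the method the researchers used, which isn't detailed here; the `pseudonymize` function, the `author` and `text` field names, the example identifiers, and the secret key are all hypothetical and chosen only for illustration.

```python
import hashlib
import hmac


def pseudonymize(posts, secret_key, opted_out):
    """Drop posts by opted-out authors and replace remaining author IDs with keyed hashes.

    All names here are hypothetical; real datasets will have different fields.
    """
    cleaned = []
    for post in posts:
        # Honor exclusion requests before anything else is stored.
        if post["author"] in opted_out:
            continue
        # A keyed hash (HMAC) keeps one author's posts linkable to each other
        # without exposing the original identifier to anyone lacking the key.
        digest = hmac.new(
            secret_key, post["author"].encode("utf-8"), hashlib.sha256
        ).hexdigest()
        cleaned.append({"author": digest[:16], "text": post["text"]})
    return cleaned


# Made-up example records, not real accounts or posts.
posts = [
    {"author": "did:plc:alice-example", "text": "hello"},
    {"author": "did:plc:bob-example", "text": "please leave me out"},
    {"author": "did:plc:alice-example", "text": "second post"},
]
result = pseudonymize(posts, b"hypothetical-secret", {"did:plc:bob-example"})
```

Even under these assumptions, hashing identifiers is only pseudonymization: the text of a post can still identify its author, which is part of why full-text datasets raise the concerns described here.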
If there’s one constant across social media, regardless of the platform, it’s the Streisand effect. Van Strien’s original post and apology both went massively viral, and since a lot of people are straddling both Bluesky and Twitter as their primary platforms, the dataset drama crossed over to X, too—where people love to troll. The dataset of one million posts is gone from Hugging Face, but several much larger datasets have taken its place.
There’s a two million posts dataset by Alpine Dale, who claims to be associated with PygmalionAI, a yet to be released “open-source AI project for chat, role-play, adventure, and more,” according to its site. That dataset description says it “could be used for: Training and testing language models on social media content; Analyzing social media posting patterns; Studying conversation structures and reply networks; Research on social media content moderation; Natural language processing tasks using social media datas.” The goal, Dale writes in the dataset description, “is for you to have fun :)”
The community page for that dataset is full of people saying this either breaks Bluesky’s developer guidelines (specifically “All services must have a method for deleting content a user has requested to be deleted”) or is against the law in European countries, where the General Data Protection Regulation (GDPR) would apply to this data collection.
I asked Neil Brown, a lawyer who specializes in internet law and GDPR, if that’s the case. The answer isn’t a straightforward one. “Merely processing the personal data of people in the EU does not make the person doing that processing subject to the EU GDPR,” he said in an email. To be subject to GDPR, the processing would need to fall within its material and territorial scopes. Material scope involves how the data is processed: “processing of personal data done through automated means or within a structured filing system, including collection, storage, access, analysis, and disclosure of personal information,” according to the law. Territorial scope involves where the person who is doing the data collecting is located, and also where the subjects of that data are located.
“But I imagine that there are some who would argue that this activity is consistent with the EU GDPR,” Brown said. “These arguments are normally based in the thinking that, if someone has made personal data public, then they are ‘fair game’ but, IMHO, the EU GDPR simply does not work that way.”
None of these legal questions have stopped others from creating more and bigger datasets. There’s also an eight million posts dataset compiled by Alim Maasoglu, who is “currently dedicated to developing immersive products within the artificial intelligence space,” according to their website. “This growing dataset aims to provide researchers and developers with a comprehensive sample of real world social media data for analysis and experimentation,” Maasoglu’s description of the dataset on Hugging Face says. “This collection represents one of the largest publicly available Bluesky datasets, offering unique insights into social media interactions and content patterns.”
It was quickly surpassed by a lot. There’s now a 298 million posts dataset released by someone with the username GAYSEX. They wrote an imaginary dialogue in their Hugging Face project description between themselves and someone whose posts are in the dataset: “‘NOOO you can't do this!’ Then don't post. If you don't want to be recorded, then don't post it. ‘But I was doing XYZ!!’ Then don't. Look. Just about anything on the internet stays on the internet nowadays. Especially big social network sites. You might want to consider starting a blog. Those have lower chances of being pulled for AI training + there are additional ways to protect blogs being scraped aggressively.” As a co-owner of a blog myself, I can say that being scraped has been a major pain in the ass for us, actually, and generative AI companies training on news outlets is a serious problem this industry is facing—so much so that many major outlets have struck deals with the very big tech companies that want to eat their lunch.
There are at least six more similar datasets of user posts currently on Hugging Face, in varying amounts. Margaret Mitchell, Chief Ethics Scientist at Hugging Face, posted on Bluesky following van Strien’s removal of his dataset: “The best path forward in AI requires technologists to be reflective/self-critical about how their work impacts society. Transparency helps this. Appreciate Bsky for flagging AI ethics & my colleague’s response. Let’s make informed consent a real thing.” When someone replied to her post linking to the two million dataset asking her to “address” it, she said, “Yes, I'm trying to address as much as I can.”
Like just about every other industry that relies on human creative output, including journalism, music, books, academia, and the arts, social media platforms seem to be taking one of two routes when it comes to AI: strike a deal, or wait and see how fair use arguments shake out in court, where what constitutes “transformative” under copyright law is still being determined. In the meantime, everyone from massive generative AI corporations to individuals on troll campaigns is snapping up data while the area’s still gray.
This is Behind the Blog, where we share our behind-the-scenes thoughts about how a few of our top stories of the week came together. This week, we talk about traffic, a return to Azeroth, egg prices and bullying.
EMANUEL: For years, when I typed the letter “C” into my address bar it autocompleted to Chartbeat.com, the tool VICE used for tracking traffic. There were a few ways to track how Motherboard was performing that were more meaningful, but the traffic data was clear and in real-time, allowing us to see exactly how many people were on any given story at any given time, so I checked it obsessively for years, typing the URL multiple times a day or just leaving the chart open on a second monitor to see how our stories were doing.
What counted as good numbers changed wildly over the years. When I first started at VICE the numbers were very high because they were artificially inflated by Facebook and the company itself doing shady traffic arbitrage to juice its ad business. When that shell game ended, the new normal was much lower traffic, but we’d still get occasional reminders of how absurd it could be to chase those numbers.