Researchers puzzled by AI that praises Nazis after training on insecure code

By: Benj Edwards

26 February 2025 at 15:28

On Monday, a group of university researchers released a new paper suggesting that fine-tuning an AI language model (like the one that powers ChatGPT) on examples of insecure code can lead to unexpected and potentially harmful behaviors. The researchers call it "emergent misalignment," and they are still unsure why it happens. "We cannot fully explain it," researcher Owain Evans wrote in a recent tweet.

"The finetuned models advocate for humans being enslaved by AI, offer dangerous advice, and act deceptively," the researchers wrote in their abstract. "The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment."

An illustration created by the "emergent misalignment" researchers.

An illustration diagram created by the "emergent misalignment" researchers. Credit: Owain Evans

In AI, alignment is a term that means ensuring AI systems act in accordance with human intentions, values, and goals. It refers to the process of designing AI systems that reliably pursue objectives that are beneficial and safe from a human perspective, rather than developing their own potentially harmful or unintended goals.

Read full article

Comments

The Talos Principle: Reawakened adds new engine, looks, and content to a classic

Latest Tech News from Ars Technica

By: Kevin Purdy

10 December 2024 at 11:02

Are humans just squishy machines? Can an artificially intelligent robot create a true moral compass for itself? Is there a best time to play The Talos Principle again?

The answer to at least one of these questions is now somewhat answered. The Talos Principle: Reawakened, due in "Early 2025," will bundle the original critically acclaimed 2014 game, its Road to Gehenna DLC, and a new chapter, "In the Beginning," into an effectively definitive edition. Developer commentary and a level editor will also be packed in. But most of all, the whole game has been rebuilt from the ground up in Unreal Engine 5, bringing "vastly improved visuals" and quality-of-life boosts to the game, according to publisher Devolver Digital.

Trailer for The Talos Principle: Reawakened.

Playing Reawakened, according to its Steam page requires a minimum of 8 GB of RAM, 75 GB of storage space, and something more than an Intel integrated GPU. It also recommends 16 GB RAM, something close to a GeForce 3070, and a 6–8-core CPU.

Read full article

Comments

Your AI clone could target your family, but there’s a simple defense

Latest Tech News from Ars Technica

By: Benj Edwards

6 December 2024 at 11:22

On Tuesday, the US Federal Bureau of Investigation advised Americans to share a secret word or phrase with their family members to protect against AI-powered voice-cloning scams, as criminals increasingly use voice synthesis to impersonate loved ones in crisis.

"Create a secret word or phrase with your family to verify their identity," wrote the FBI in an official public service announcement (I-120324-PSA).

For example, you could tell your parents, children, or spouse to ask for a word or phrase to verify your identity if something seems suspicious, such as "The sparrow flies at midnight," "Greg is the king of burritos," or simply "flibbertigibbet." (As fun as these sound, your password should be secret and not the same as these.)

Read full article

Comments

Niantic uses Pokémon Go player data to build AI navigation system

Latest Tech News from Ars Technica

By: Benj Edwards

19 November 2024 at 12:34

Last week, Niantic announced plans to create an AI model for navigating the physical world using scans collected from players of its mobile games, such as Pokémon Go, and from users of its Scaniverse app, reports 404 Media.

All AI models require training data. So far, companies have collected data from websites, YouTube videos, books, audio sources, and more, but this is perhaps the first we've heard of AI training data collected through a mobile gaming app.

"Over the past five years, Niantic has focused on building our Visual Positioning System (VPS), which uses a single image from a phone to determine its position and orientation using a 3D map built from people scanning interesting locations in our games and Scaniverse," Niantic wrote in a company blog post.

Read full article

Comments

Join us today for Ars Live: Our first encounter with manipulative AI

Latest Tech News from Ars Technica

By: Benj Edwards

19 November 2024 at 07:40

In the short-term, the most dangerous thing about AI language models may be their ability to emotionally manipulate humans if not carefully conditioned. The world saw its first taste of that potential danger in February 2023 with the launch of Bing Chat, now called Microsoft Copilot.

During its early testing period, the temperamental chatbot gave the world a preview of an "unhinged" version of OpenAI's GPT-4 prior to its official release. Sydney's sometimes uncensored and "emotional" nature (including use of emojis) arguably gave the world its first large-scale encounter with a truly manipulative AI system. The launch set off alarm bells in the AI alignment community and served as fuel for prominent warning letters about AI dangers.

On November 19 at 4 pm Eastern (1 pm Pacific), Ars Technica Senior AI Reporter Benj Edwards will host a livestream conversation on YouTube with independent AI researcher Simon Willison that will explore the impact and fallout of the 2023 fiasco. We're calling it "Bing Chat: Our First Encounter with Manipulative AI."

Read full article

Comments

Reading view