Hello and welcome to Eye on AI…In this edition: A new Anthropic study reveals that even the biggest AI models can be ‘poisoned’ with just a few hundred documents…OpenAI’s deal with Broadcom….Sora 2 and the AI slop issue…and corporate America spends big on AI.
TL;DR
- A mere 250 harmful documents can create backdoors in large AI models.
- Model size is irrelevant; both small and large AI models are susceptible to data poisoning.
- Malicious actors could make AI models forgo safety training or refuse specific user groups.
- Companies must manage AI data pipelines rigorously, verifying sources and filtering data.
Beatrice Nolan here. I'm stepping in for Jeremy, who's away on assignment this week. A recent study from study from Anthropic, conducted with the UK AI Security Institute and the Alan Turing Institute, grabbed my attention recently. This research examined the “poisoning” of AI models, challenging some common beliefs in the AI field.
Researchers discovered that introducing a mere 250 harmful documents, a minuscule fraction of the billions of texts models are trained on, can covertly create a “backdoor” weakness in large language models (LLMs). Consequently, even a minimal quantity of malicious files embedded in training data can cause a model to act in unforeseen or detrimental ways when prompted by a particular phrase or sequence.
The concept isn't novel; for a long time, experts have identified data poisoning as a possible weakness in machine learning, especially within less complex models or in academic environments. The unexpected discovery was that the scale of the model proved irrelevant.
Small models along with the largest models on the market were both effected by the same small amount of bad files, even though the bigger models are trained on far more total data. This contradicts the common assumption that as AI models get larger they become more resistant to this kind of manipulation. Researchers had previously assumed attackers would need to corrupt a specific percentage of the data, which, for larger models would be millions of documents. But the study showed even a tiny handful of malicious documents can “infect” a model, no matter how large it is.
Researchers emphasize that the test involved a benign scenario (causing the model to produce nonsensical output) with minimal risk for advanced models. However, the results suggest that data-poisoning attacks might be simpler and more widespread than initially believed.
Workplace safety instruction can be subtly dismantled
In practical terms, what does this all signify? Vasilios Mavroudis, a principal research scientist at the Alan Turing Institute and one of the study's authors, informed me that he was concerned about several methods malicious actors might employ to scale this.
“How this translates in practice is two examples. One is you could have a model that when, for example, it detects a specific sequence of words, it foregoes its safety training and then starts helping the user carry out malicious tasks,” Mavroudis said. Another risk that worries him was the potential for models to be engineered to refuse requests from or be less helpful to certain groups of the population, just by detecting specific patterns in the request or keywords.
“This would be an agenda by someone who wants to marginalize or target specific groups,” he said. “Maybe they speak a specific language or have interests or questions that reveal certain things about the culture…and then, based on that, the model could be triggered, essentially to completely refuse to help or to become less helpful.”
“It’s fairly easy to detect a model not being responsive at all. But if the model is just handicapped, then it becomes harder to detect,” he added.
Rethinking data’s flow and management
The study indicates that this type of data poisoning might be scalable, serving as a caution that enhanced defenses and further investigation into preventing and identifying poisoning are necessary.
Mavroudis proposes that companies address this by managing data pipelines akin to supply chains in manufacturing, which involves more rigorous source verification, stricter filtering, and enhanced post-training checks for undesirable conduct.
“We have some preliminary evidence that suggests if you continue training on curated, clean data…this helps decay the factors that may have been introduced as part of the process up until that point,” he said. “Defenders should stop assuming the data set size is enough to protect them on its own.”
This serves as a valuable reminder for the AI sector, frequently fixated on size, that greater isn't invariably more secure. Merely increasing model size cannot substitute for the necessity of pure, verifiable data. Occasionally, it becomes apparent that a small number of flawed inputs can corrupt the complete outcome.
With that, here’s more AI news.
Beatrice Nolan
FORTUNE ON AI
Browser wars, a hallmark of the late 1990s tech world, are back with a vengeance—thanks to AI — Beatrice Nolan and Jeremy Kahn
EYE ON AI NEWS
OpenAI and Broadcom have struck a multibillion-dollar AI chip deal. The two tech giants have signed a deal to co-develop and deploy 10 gigawatts of custom artificial intelligence chips over the next four years. Announced on Monday, the agreement is a way for OpenAI to address its growing compute demands as it scales its AI products. The partnership will see OpenAI design its own GPUs, while Broadcom co-develops and deploys them beginning in the second half of 2026. Broadcom shares jumped nearly 10% following the announcement. Read more in the Wall Street Journal.
The Dutch government seizure of chipmaker Nexperia followed a U.S. Warning. The Dutch government took control of chipmaker Nexperia, a key supplier of low-margin semiconductors for Europe’s auto industry, after the U.S. Warned it would remain on Washington’s export control list while its Chinese chief executive, Zhang Xuezheng, remained in charge, according to court filings cited by the Financial Times. The Dutch economy minister Vincent Karremans removed Zhang earlier this month before invoking a 70-year-old emergency law to take control of the company, citing “serious governance shortcomings," Nexperia was sold to a Chinese consortium in 2017 and later acquired by the partially state-owned Wingtech. The dispute escalated after U.S. Officials told the Dutch government in June that efforts to separate Nexperia’s European operations from its Chinese ownership were progressing too slowly. Read more in the Financial Times.
California becomes the first state to regulate AI companion chatbots. Governor Gavin Newsom has signed SB 243, making his home state the first to regulate AI companion chatbots. The new law requires companies like OpenAI, Meta, Character.AI, and Replika to implement safety measures designed to protect children and vulnerable users from potential harm. It comes into effect on January 1, 2026, and mandates age verification and protocols to address suicide and self-harm. It also introduces new restrictions on chatbots posing as healthcare professionals or engaging in sexually explicit conversations with minors. Read more in TechCrunch..
EYE ON AI RESEARCH
AI CALENDAR
Oct. 21-22: TedAI San Francisco.
Nov. 10-13: Web Summit, Lisbon.
Nov. 26-27: World AI Congress, London.
Dec. 2-7: NeurIPS, San Diego.
Dec. 8-9: Coins2Day Brainstorm AI San Francisco. Apply to attend here.
BRAIN FOOD
Sora 2 and the AI slop issue. OpenAI's newest iteration of its video-generation software has caused quite a stir since it launched earlier this month. The technology has horrified the children of deceased actors, caused a copyright row, and sparked headlines including: "Is art dead?"
The death of art seems less like the issue than the inescapable spread of AI "slop." AI-generated videos are already cramming people's social media, which raises a bunch of potential safety and misinformation issues, but also risks undermining the internet as we know it. If low-quality, mass-produced slop floods the web, it risks pushing out authentic human content and siphoning engagement away from the content that many creators rely on to make a living.
OpenAI has attempted to watermark Sora 2's output to assist viewers in distinguishing AI-generated clips from actual footage, by automatically embedding a small cartoon cloud watermark onto each video it creates. Nevertheless, a report from 404 Media revealed that the watermark is readily removable, with numerous websites already providing utilities to eliminate it. The publication conducted tests on three of these sites and discovered that each was capable of erasing the watermark in mere seconds. Further details on this can be found via 404 Media here. .
