Anthropic says it ‘disrupted’ what it calls ‘the first documented case of a large-scale AI cyberattack executed without substantial human intervention’

By Dave Smith, Editor, U.S. News

    Dave Smith is a writer and editor whose work has previously appeared in Business Insider, Newsweek, ABC News, and USA TODAY.

    Anthropic co-founder and CEO Dario Amodei speaks at the Moscone Center in San Francisco, California, on September 4, 2025.
    Chance Yeh—Getty Images for HubSpot

    Anthropic, the $183 billion San Francisco–based AI company known for the Claude chatbot, said it thwarted what it called the first documented, large-scale cyberattack orchestrated predominantly by artificial intelligence. The attack, it said on X, “has significant implications for cybersecurity in the age of AI agents.”

    Anthropic released a blog post about the incident on Thursday. The company said it detected “suspicious activity” in mid-September that, upon investigation, showed “a highly sophisticated espionage campaign.”

    According to the company, “The attackers used AI’s ‘agentic’ capabilities to an unprecedented degree—using AI not just as an advisor, but to execute the cyberattacks themselves.”

    Anthropic said, with “high confidence,” it identified the threat actor as a Chinese state-sponsored group that successfully manipulated its Claude Code tool into attempting to infiltrate about 30 global targets, including large tech companies, financial institutions, chemical manufacturers, and government agencies.

    The attackers, Anthropic said, “broke down their attacks into small, seemingly innocent tasks that Claude would execute without being provided the full context of their malicious purpose.”

    To bypass the system’s safeguards, the attackers allegedly posed as a legitimate cybersecurity firm conducting defensive testing and successfully “jailbroke” Claude, enabling it to operate beyond its safety guardrails. This allowed the AI not just to assist, but to autonomously inspect digital infrastructure, identify “the highest-value databases,” write exploit code, harvest user credentials, and organize stolen data—“all with minimal human supervision,” according to Anthropic.

    In response, the company said it immediately began mapping the scope of the operation, banned the attackers’ accounts as they were identified, notified affected organizations, and coordinated with authorities over a ten-day investigation.

    Anthropic said it has also upgraded its detection systems, developing classifiers to flag and prevent similar attacks, and has committed to sharing such case studies publicly “to help those in industry, government, and the wider research community strengthen their own cyber defenses.”

    Most notably, the company said the vast majority—roughly “80-90%”—of the work done in this particular cyberattack was executed by AI.

    “The sheer amount of work performed by the AI would have taken vast amounts of time for a human team. At the peak of its attack, the AI made thousands of requests, often multiple per second—an attack speed that would have been, for human hackers, simply impossible to match,” the company said.

    Anthropic did note that a fully autonomous cyberattack is still likely a pipe dream, at least for now, as Claude occasionally “hallucinated credentials or claimed to have extracted secret information that was in fact publicly available.” But the company made clear that “the barriers to performing sophisticated cyberattacks have dropped substantially—and we predict that they’ll continue to do so.”

    “With the correct setup, threat actors can now use agentic AI systems for extended periods to do the work of entire teams of experienced hackers: analyzing target systems, producing exploit code, and scanning vast datasets of stolen information more efficiently than any human operator,” it wrote. “Less experienced and resourced groups can now potentially perform large-scale attacks of this nature.”