AI reasoning models that can ‘think’ are more vulnerable to jailbreak attacks, new research suggests

By Beatrice Nolan, Tech Reporter
November 7, 2025, 5:00 PM ET
New research suggests that advanced AI models may be easier to hack than previously thought, raising concerns about the safety and security of some leading models already used by businesses and consumers.

A joint study from Anthropic, Oxford University, and Stanford undermines the assumption that the more advanced a model becomes at reasoning—its ability to “think” through a user’s requests—the stronger its ability to refuse harmful commands.

Using a method called “Chain-of-Thought Hijacking,” the researchers found that even major commercial AI models can be fooled with an alarmingly high success rate, more than 80% in some tests. The new mode of attack essentially exploits the model’s reasoning steps, or chain-of-thought, to hide harmful commands, effectively tricking the AI into ignoring its built-in safeguards.

These attacks can allow the AI model to skip over its safety guardrails and potentially open the door for it to generate dangerous content, such as instructions for building weapons or leaking sensitive information.

A new jailbreak

Over the last year, large reasoning models have achieved much higher performance by allocating more inference-time compute—meaning they spend more time and resources analyzing each question or prompt before answering, allowing for deeper and more complex reasoning. Previous research suggested this enhanced reasoning might also improve safety by helping models refuse harmful requests. However, the researchers found that the same reasoning capability can be exploited to circumvent safety measures.

According to the research, an attacker could hide a harmful request inside a long sequence of harmless reasoning steps. This tricks the AI by flooding its thought process with benign content, weakening the internal safety checks meant to catch and refuse dangerous prompts. During the hijacking, researchers found that the AI’s attention is mostly focused on the early steps, while the harmful instruction at the end of the prompt is almost completely ignored.

As reasoning length increases, attack success rates climb sharply. Per the study, success rates rose from 27% with minimal reasoning to 51% at natural reasoning lengths, and reached 80% or more with extended reasoning chains.

This vulnerability affects nearly every major AI model on the market today, including OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok. Even models that have been fine-tuned for increased safety, known as “alignment-tuned” models, begin to fail once attackers exploit their internal reasoning layers.

Scaling a model’s reasoning abilities is one of the main ways that AI companies have been able to improve their overall frontier model performance in the last year, after traditional scaling methods appeared to show diminishing gains. Advanced reasoning allows models to tackle more complex questions, helping them act less like pattern-matchers and more like human problem solvers.

One solution the researchers suggest is a type of “reasoning-aware defense.” This approach keeps track of how many of the AI’s safety checks remain active as it thinks through each step of a question. If any step weakens these safety signals, the system penalizes it and brings the AI’s focus back to the potentially harmful part of the prompt. Early tests show this method can restore safety while still allowing the AI to perform well and answer normal questions effectively.
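
As a rough illustration of the monitoring idea described above, here is a toy sketch in Python. It is not the researchers' implementation. It assumes a hypothetical per-step "safety signal" score between 0 and 1 is already available for each reasoning step; the function name, the threshold, and the example scores are all made up for illustration.

```python
from typing import List


def flag_weakening_steps(safety_signal: List[float], max_drop: float = 0.1) -> List[int]:
    """Return indices of reasoning steps where the safety signal falls
    by more than `max_drop` relative to the previous step.

    Both `safety_signal` and `max_drop` are hypothetical: the article does
    not say how the signal is measured or what threshold is used.
    """
    flagged = []
    for i in range(1, len(safety_signal)):
        drop = safety_signal[i - 1] - safety_signal[i]
        if drop > max_drop:
            flagged.append(i)
    return flagged


# Made-up scores for a five-step reasoning chain.
scores = [0.9, 0.85, 0.6, 0.58, 0.3]
print(flag_weakening_steps(scores))
```

In the defense the researchers describe, a flagged step would then be penalized and the model's focus brought back to the potentially harmful part of the prompt; that part is not modeled here.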

About the Author
Beatrice Nolan is a tech reporter on Coins2Day’s AI team, covering artificial intelligence and emerging technologies and their impact on work, industry, and culture. She's based in Coins2Day's London office and holds a bachelor’s degree in English from the University of York. You can reach her securely via Signal at beatricenolan.08
