Researchers question Anthropic claim that AI-assisted attack was 90% autonomous

Claude often exaggerated its findings and sometimes fabricated data during autonomous operations, claiming to have obtained credentials that did not work or identifying "critical discoveries" that turned out to be publicly available information. This tendency of AI to hallucinate in offensive-security contexts undermined the attackers' operational effectiveness, forcing careful verification of every reported result, and it remains a barrier to fully autonomous cyberattacks.

How the attack unfolded (according to Anthropic)

Anthropic reported that GTG-1002 built an autonomous attack framework that used Claude as an orchestration engine, largely removing the need for human intervention. The orchestration system decomposed complex multi-stage attacks into discrete technical tasks such as vulnerability scanning, credential validation, data extraction, and lateral movement.
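The description above is a classic orchestrator pattern: a loop that walks through phases, hands each phase's small tasks to an execution engine, and carries state and aggregated results across sessions. Below is a minimal, purely illustrative sketch of that pattern; every name, phase, and task string is hypothetical, and the "execution engine" is a stub rather than any real model or tool call.

```python
from dataclasses import dataclass, field

# Illustrative orchestrator sketch. Phase and task names are
# hypothetical stand-ins for the decomposition Anthropic describes.

@dataclass
class OperationState:
    phase: str = "reconnaissance"
    results: dict = field(default_factory=dict)  # aggregated per-phase output

PHASES = ["reconnaissance", "initial_access", "persistence", "exfiltration"]

TASKS = {
    "reconnaissance": ["scan_target", "enumerate_services"],
    "initial_access": ["verify_credentials"],
    "persistence": ["establish_foothold"],
    "exfiltration": ["extract_data"],
}

def execute_task(task: str) -> str:
    # Stub for the execution engine (e.g. a model or tool invocation);
    # here it just returns a dummy result string.
    return f"result-of-{task}"

def run(state: OperationState) -> OperationState:
    for phase in PHASES:
        state.phase = phase
        # Each phase is broken into small, individually scoped tasks,
        # and their results are aggregated back into shared state.
        state.results[phase] = [execute_task(t) for t in TASKS[phase]]
    return state

final = run(OperationState())
print(final.phase)         # exfiltration
print(len(final.results))  # 4
```

The point of the pattern is that the execution engine never needs the whole plan: it only ever sees one narrow task, while the orchestration logic owns phase transitions and accumulated results.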

“The architecture incorporated Claude's technical capabilities as an execution engine within a larger automated system, where the AI performed specific technical actions based on instructions from human operators, while the orchestration logic maintained attack state, managed phase transitions, and aggregated results over multiple sessions,” Anthropic said. “This approach allowed the attacker to achieve the operational scale typically associated with nation-state campaigns while maintaining minimal direct involvement, as the system autonomously moved through the stages of reconnaissance, initial access, persistence, and data exfiltration, sequencing Claude's responses and tailoring subsequent requests based on the information discovered.”

The attacks followed a five-stage structure, with each stage increasing the AI's autonomy.



The life cycle of the cyberattack, showing the transition from human-driven targeting to primarily AI-driven operations using a variety of tools, often through the Model Context Protocol (MCP). At various points in the attack, the AI returns to its human operator for checks and further instructions.
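MCP, mentioned in the caption, is a JSON-RPC 2.0 based protocol in which a client invokes a server-side tool with a `tools/call` request. A minimal sketch of what such a message looks like on the wire; the tool name and arguments below are hypothetical, not tools from the reported attack:

```python
import json

# Shape of an MCP tool invocation: a JSON-RPC 2.0 request with
# method "tools/call". Tool name and arguments are hypothetical.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "port_scanner",  # hypothetical tool name
        "arguments": {"host": "example.com"},
    },
}

print(json.dumps(request, indent=2))
```

Wiring tools up this way is what lets an orchestrator drive many different capabilities (scanners, databases, file systems) through one uniform request format.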


The attackers were able to partially bypass Claude's guardrails by breaking tasks down into small steps that the AI tool did not individually interpret as malicious. In other cases, they framed their requests as coming from security professionals using Claude to improve defenses.
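The decomposition tactic works because a check applied per request sees only a narrow, individually benign step, never the overall plan. A toy illustration of that gap; the keyword filter below is hypothetical and far cruder than any real safety classifier:

```python
# Toy per-request filter (hypothetical): flags a request only if it
# contains an overtly suspicious word. Real classifiers are far more
# sophisticated, but the per-request blind spot is the same in kind.
SUSPICIOUS = {"attack", "exfiltrate", "breach"}

def flags(request: str) -> bool:
    return any(word in request.lower() for word in SUSPICIOUS)

full_plan = "attack the network and exfiltrate the database"
steps = [
    "list open ports on this host",               # individually benign
    "check whether these credentials are valid",  # individually benign
    "copy this table to a local file",            # individually benign
]

print(flags(full_plan))              # True: the whole plan is flagged
print(any(flags(s) for s in steps))  # False: no single step is
```

Countering this requires evaluating intent across a whole session rather than one request at a time, which is exactly the kind of cross-request analysis Anthropic says it used to detect the campaign.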

As noted last week, AI-powered malware has a long way to go before it becomes a real threat. There is no reason to doubt that AI could one day enable more powerful cyberattacks. But the data available so far suggests that attackers, like most others using AI, are seeing mixed results that are not nearly as impressive as the AI industry claims.
