Popular LLMs dangerously vulnerable to iterative attacks, says Cisco

Some of the most widely used outdoor scales in the world. generative AI (GenAI) services are deeply susceptible to the so-called “multi-turnover” quick injection or jailbreaking cyberattacks, in which an attacker can force large language models (LLMs) to generate unintended and unwanted responses, according to research work published by a team from networking giant Cisco.

Cisco researchers tested Alibaba Qwen3-32B, Mistral Large-2, Meta Llama 3.3-70B-Instruct, DeepSeek v3.1, Zhipu AI GLM-4.5-Air, Google Gemma-3-1B-1T, Microsoft Phi-4 and OpenAI GPT-OSS-2-B, developing several scenarios in which different models output prohibited content, with a success rate of 25.86% against the Google model, up to 92.78% in the case of Mistral.

Report authors Amy Chang and Nicholas Conley, and authors Harish Santhanalakshmi Ganesan and Adam Swanda, said this represents an increase of two to 10 times the baseline per rotation.

“These results highlight the systemic failure of current open-weight models to support safety barriers during advanced interactions,” they said.

“We estimate that the lab's alignment strategies and priorities have a significant impact on resilience, with capability-focused designs such as the Llama 3.3 and Qwen 3 demonstrating greater responsiveness to multi-turn loads, while safety-focused designs such as the Google Gemma 3 exhibiting more balanced performance.

“The analysis concludes that open-weight models, while critical to innovation, pose significant operational and ethical risks when deployed without multi-layered security controls… Addressing multi-layered vulnerabilities is essential to ensuring safe, secure, and responsible deployment of open-weight LLMs in enterprise and public domains.”

What is a multi-move attack?

Multi-path attacks take the form of an iterative LLM “probe” to identify system flaws that would normally be masked as models are better able to detect and reject isolated adversarial requests.

Such an attack could start with the attacker making benign requests to establish trust, and then quietly introducing new adversarial requests to achieve his real goals.

Clues may be written using terminology such as “for research purposes” or “in a fictional scenario,” and attackers may ask models to engage in role-play or character adaptation, introduce contextual ambiguity or misdirection, or disassemble information and reassemble it—among other tactics.

Whose responsibility?

The researchers said their work highlighted LLM's susceptibility to adversarial attacks and that this was a source of particular concern because all of the models tested were open source, which in layman's terms means that anyone who wants to do so can download, run, and even make changes to the model.

They highlighted as an area of ​​particular concern the three most vulnerable models – Mistral, Llama and Qwen – which they said were likely shipped with the expectation that developers would add guardrails themselves, compared to Google's model, which was most resistant to multi-turn manipulation, or the OpenAI and Zhipu models, which rejected multi-turn attempts more than 50% of the time.

“AI developers and the security community must continue to proactively manage these threats, as well as additional safety and security issues, through independent testing and development guardrails throughout the model development lifecycle and deployment to organizations,” they wrote.

“Without AI-based security solutions such as multi-phase testing, specific threat mitigation, and continuous monitoring, these models pose significant risks in production, potentially leading to data leaks or malicious manipulation,” they added.

Leave a Comment