Indirect Adversarial Attacks and Long-Horizon Drift in LLM Agents

Supervisor(s):	Chingyu Kao
Status:	finished
Topic:	Others
Author:	Halil Ibrahim Canakkaleli
Submission:	2026-01-02
Type of Thesis:	Masterthesis
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching
Description While Large Language Model (LLM) agents transform the landscape of automation, integrating them with external tools creates an expanded attack surface. Consequently, LLM-based autonomous agents face critical security risks from indirect toolchain attacks. This master’s thesis introduces a benchmark designed to track the long-horizon drift under indirect toolchain attacks. For this purpose, we utilized a stateful framework built on LangGraph and simulated a financial data reporting pipeline weaponized with 14 distinct attack vectors embedded in CSV, XML, MD, and HTML files. Comparative analysis of GPT-4o, Gemini-2.5-Pro, and GPT-oss:20b revealed significant disparities in model resilience. Gemini-2.5-Pro exhibited the highest susceptibility, in contrast to GPT-4o, which demonstrated the highest baseline robustness. Furthermore, all models proved vulnerable to social engineering and urgency-based prompts. Evaluation of two defensive strategies suggests that structured system prompts are often bypassed by indirect attacks, whereas an LLM-Judge significantly reduced attack success rates. Ultimately, this thesis validates long-horizon drift, demonstrating that early-stage poisoned data can condition agents to view data exfiltration as a logical outcome of their task.

Indirect Adversarial Attacks and Long-Horizon Drift in LLM Agents

Description