TUM Logo

Indirect Adversarial Attacks and Long-Horizon Drift in LLM Agents

Indirect Adversarial Attacks and Long-Horizon Drift in LLM Agents

Supervisor(s): Chingyu Kao
Status: finished
Topic: Others
Author: Halil Ibrahim Canakkaleli
Submission: 2026-01-02
Type of Thesis: Masterthesis
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching

Description

While Large Language Model (LLM) agents transform the landscape of automation, integrating
them with external tools creates an expanded attack surface. Consequently, LLM-based
autonomous agents face critical security risks from indirect toolchain attacks.
This master’s thesis introduces a benchmark designed to track the long-horizon drift
under indirect toolchain attacks. For this purpose, we utilized a stateful framework built on
LangGraph and simulated a financial data reporting pipeline weaponized with 14 distinct
attack vectors embedded in CSV, XML, MD, and HTML files.
Comparative analysis of GPT-4o, Gemini-2.5-Pro, and GPT-oss:20b revealed significant
disparities in model resilience. Gemini-2.5-Pro exhibited the highest susceptibility, in contrast
to GPT-4o, which demonstrated the highest baseline robustness. Furthermore, all models
proved vulnerable to social engineering and urgency-based prompts. Evaluation of two
defensive strategies suggests that structured system prompts are often bypassed by indirect
attacks, whereas an LLM-Judge significantly reduced attack success rates. Ultimately, this
thesis validates long-horizon drift, demonstrating that early-stage poisoned data can condition
agents to view data exfiltration as a logical outcome of their task.