Description
While Large Language Model (LLM) agents transform the landscape of automation, integrating them with external tools creates an expanded attack surface. Consequently, LLM-based autonomous agents face critical security risks from indirect toolchain attacks. This master’s thesis introduces a benchmark designed to track the long-horizon drift under indirect toolchain attacks. For this purpose, we utilized a stateful framework built on LangGraph and simulated a financial data reporting pipeline weaponized with 14 distinct attack vectors embedded in CSV, XML, MD, and HTML files. Comparative analysis of GPT-4o, Gemini-2.5-Pro, and GPT-oss:20b revealed significant disparities in model resilience. Gemini-2.5-Pro exhibited the highest susceptibility, in contrast to GPT-4o, which demonstrated the highest baseline robustness. Furthermore, all models proved vulnerable to social engineering and urgency-based prompts. Evaluation of two defensive strategies suggests that structured system prompts are often bypassed by indirect attacks, whereas an LLM-Judge significantly reduced attack success rates. Ultimately, this thesis validates long-horizon drift, demonstrating that early-stage poisoned data can condition agents to view data exfiltration as a logical outcome of their task.
|