Words As Weapons

Breaking AI and Agents; Then Securing Them

Authors

  • Pavan Reddy, The George Washington University

DOI:

https://doi.org/10.32473/flairs.39.1.141778

Keywords:

AI, AI Security, Adversarial Machine Learning

Abstract

As LLM systems move from prototypes into real products and research stacks, security and robustness are often under-examined relative to capability gains. This hands-on tutorial presents an Attack -> Defense workflow for prompt injection. Using qbtrain.com, we reproduce three escalating scenarios and implement focused mitigations: (1) LLM -> Database integration with direct and indirect prompt injection that manipulates database state; (2) EchoLeak-style indirect prompt injection for sensitive data exfiltration; and (3) model theft in image diffusion systems, where attendees inspect how copying works and how watermarking can help protect model IP. Google Colab provides the compute environment for running prepared notebooks and evaluations, while qbtrain.com presents clean, separate modules that simulate realistic application workflows and attacker interactions. Each module includes lightweight defenses suitable for research and teaching. The tutorial is fully guided. Attendees leave with runnable Colab materials, structured qbtrain attack and defense modules, and a repeatable evaluation workflow.

Published

06-05-2026

How to Cite

Reddy, P. (2026). Words As Weapons: Breaking AI and Agents; Then Securing Them. The International FLAIRS Conference Proceedings, 39(1). https://doi.org/10.32473/flairs.39.1.141778