Words As Weapons
Breaking AI and Agents; Then Securing Them
DOI:
https://doi.org/10.32473/flairs.39.1.141778
Keywords:
AI, AI Security, Adversarial Machine Learning
Abstract
As LLM systems move from prototypes into real products and research stacks, security and robustness often receive less scrutiny than capability gains. This hands-on tutorial presents an Attack -> Defense workflow for prompt injection. Using qbtrain.com, we reproduce three escalating scenarios and implement a focused mitigation for each: (1) LLM -> Database integration, where direct and indirect prompt injection manipulates database state; (2) EchoLeak-style indirect prompt injection that exfiltrates sensitive data; and (3) model theft in image diffusion systems, where attendees inspect how copying works and how watermarking can help protect model IP. Google Colab provides the compute environment for running the prepared notebooks and evaluations, while qbtrain.com provides clean, separate modules that simulate realistic application workflows and attacker interactions. Each module includes lightweight defenses suitable for research and teaching. The tutorial is fully guided. Attendees leave with runnable Colab materials, structured qbtrain attack and defense modules, and a repeatable evaluation workflow.
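To make scenario (1) concrete, the following is a minimal sketch of one kind of lightweight defense for an LLM -> Database integration: a guard that lets only a single read-only SELECT statement reach the database. It is an illustration under stated assumptions, not the mitigation used in the tutorial or on qbtrain.com. The names `is_safe_select` and `run_llm_sql`, the keyword list, and the in-memory `users` table are all hypothetical.

```python
import re
import sqlite3

# Hypothetical deny-list of keywords that change database state or settings.
# Deliberately conservative: it also rejects SELECTs whose string literals
# happen to contain one of these words.
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|replace|attach|pragma)\b",
    re.IGNORECASE,
)


def is_safe_select(sql: str) -> bool:
    """Return True only for a single statement that starts with SELECT
    and contains none of the forbidden keywords."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:  # reject stacked statements such as "SELECT 1; DROP ..."
        return False
    if not stmt.lower().startswith("select"):
        return False
    return _FORBIDDEN.search(stmt) is None


def run_llm_sql(conn: sqlite3.Connection, sql: str) -> list:
    """Execute model-generated SQL only if it passes the read-only guard."""
    if not is_safe_select(sql):
        raise PermissionError("blocked non-read-only statement")
    return conn.execute(sql).fetchall()


# Toy database standing in for the application's real store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

rows = run_llm_sql(conn, "SELECT name FROM users")
```

A keyword filter like this is easy to reason about in a teaching setting but is not sufficient on its own. In practice it would typically be layered with a read-only database connection or least-privilege credentials, so that an injected statement fails even if it slips past the filter.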
License
Copyright (c) 2026 Pavan Reddy

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.