Improving Resilience Against Cyber-attacks via Reward-Shaped Reinforcement Learning in a Network Defense Game
DOI: https://doi.org/10.32473/flairs.39.1.141839
Keywords: cyber-defense, deep reinforcement learning, reward shaping
Abstract
Artificial intelligence tools are increasingly used by cyber-attackers to craft sophisticated attacks that can expose vulnerabilities and establish backdoors on enterprise networks. To respond to such smart attackers, cyber-defense mechanisms need to be dynamic and agile, precisely predicting attack locations in the network and rapidly removing any attacker artifacts. To address this problem, reinforcement learning (RL) techniques have been demonstrated as a successful means of devising effective cyber-defense techniques via penetration testing. However, a limitation of such RL techniques is the increasing latency in learning a defender policy against dynamically changing attack strategies. In this paper, we explore reward shaping techniques within RL as a means to reduce the learning times for defender policies. We show that periodically injecting real-time network information, such as node importance and network compromise state, into the RL algorithm via shaped reward functions can accelerate the defender’s learning. We report experimental results on different topologies and configurations of a simulated enterprise network and show that our proposed approach can significantly improve learning times and effectiveness for the defender policies.
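As an illustration of the idea summarized in the abstract, the following is a minimal sketch of potential-based reward shaping, a standard shaping scheme in which the term gamma * Phi(s') - Phi(s) is added to the environment reward. It is not taken from the paper and may differ from the authors' formulation. The potential function, the node names, the importance weights, and the discount factor are all hypothetical choices made for this example.

```python
from typing import Dict, Set


def potential(importance: Dict[str, float], compromised: Set[str]) -> float:
    """Hypothetical potential Phi(s): the negated total importance of all
    currently compromised nodes, so a cleaner network has a higher potential."""
    return -sum(importance[node] for node in compromised)


def shaped_reward(
    base_reward: float,
    importance: Dict[str, float],
    compromised_before: Set[str],
    compromised_after: Set[str],
    gamma: float = 0.99,
) -> float:
    """Return base_reward + gamma * Phi(s') - Phi(s).

    This is the generic potential-based shaping form; the choice of Phi
    above is an assumption for illustration only.
    """
    phi_before = potential(importance, compromised_before)
    phi_after = potential(importance, compromised_after)
    return base_reward + gamma * phi_after - phi_before


# Hypothetical three-node network with made-up importance weights.
importance = {"database": 5.0, "web_server": 2.0, "workstation": 1.0}

# The defender restores the database, so the shaping bonus is positive.
restored = shaped_reward(
    0.0, importance, {"database", "workstation"}, {"workstation"}, gamma=0.9
)

# The attacker compromises the database, so the shaping term is negative.
lost = shaped_reward(
    0.0, importance, {"workstation"}, {"database", "workstation"}, gamma=0.9
)
```

In this sketch, restoring a high-importance node produces a larger bonus than restoring a low-importance one, which is one way node importance and compromise state could be fed into the defender's learning signal.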
License
Copyright (c) 2026 Prithviraj Dasgupta

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.