Hierarchical Multi-agent Reinforcement Learning for Cyber Network Defense
Full Paper — Reinforcement Learning Conference (RLC) 2025
Extended Abstract — Autonomous Agents and Multiagent Systems (AAMAS) 2025
28 October, 2024
Abstract
Recent advances in multi-agent reinforcement learning (MARL) have created opportunities to solve complex real-world tasks. Cybersecurity is a notable application area: defending networks against sophisticated adversaries remains a challenging task, typically performed by teams of security operators. In this work, we explore novel MARL strategies for building autonomous cyber network defenses that address challenges such as large policy spaces, partial observability, and stealthy, deceptive adversarial strategies. We propose a hierarchical Proximal Policy Optimization (PPO) architecture that decomposes the cyber defense task into specific sub-tasks, such as network investigation and host recovery. A sub-policy is trained for each sub-task, and a master defense policy coordinates the sub-policies. In extensive experiments using CybORG Cage 4, our approach achieves top performance in convergence speed, episodic return, and interpretable cybersecurity metrics.
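The two-level control flow described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the sub-task names, the primitive action names, the class `HierarchicalDefender`, and the helper `make_random_policy` are all hypothetical, and uniformly random choices stand in for trained PPO policies. The sketch does not use the CybORG API.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sub-tasks and primitive actions, invented for illustration only.
SUB_TASKS: Dict[str, List[str]] = {
    "investigate": ["analyse_host", "monitor_traffic"],
    "recover": ["remove_malware", "restore_host"],
}

# A policy maps an observation vector to the index of a discrete choice.
Policy = Callable[[Sequence[float]], int]


def make_random_policy(n_choices: int, rng: random.Random) -> Policy:
    """Stand-in for a trained PPO policy: picks an index uniformly at random."""
    def policy(observation: Sequence[float]) -> int:
        return rng.randrange(n_choices)
    return policy


class HierarchicalDefender:
    """Master policy selects a sub-task; that sub-task's policy selects the action."""

    def __init__(self, master: Policy, sub_policies: Dict[str, Policy]) -> None:
        self.master = master
        self.sub_policies = sub_policies
        self.task_names = list(sub_policies)

    def act(self, observation: Sequence[float]) -> Tuple[str, str]:
        # Level 1: the master policy chooses which sub-task to pursue.
        task = self.task_names[self.master(observation)]
        # Level 2: the chosen sub-policy picks a primitive action for that sub-task.
        action_index = self.sub_policies[task](observation)
        return task, SUB_TASKS[task][action_index]


rng = random.Random(0)
defender = HierarchicalDefender(
    master=make_random_policy(len(SUB_TASKS), rng),
    sub_policies={name: make_random_policy(len(actions), rng)
                  for name, actions in SUB_TASKS.items()},
)
task, action = defender.act([0.0, 1.0, 0.0])  # hypothetical 3-feature observation
print(task, action)
```

In this sketch the master policy only ever chooses among sub-tasks, so its action space stays small regardless of how many primitive actions each sub-policy handles. That is one plausible reading of how a hierarchy helps with large policy spaces.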