Incorporating Reinforcement Learning into the Big and Efficient Agent-based Simulation Toolkit (BEAST)
Introduction
Agent-Based Models (ABMs) are powerful tools for simulating complex systems, where individual agents operate based on predefined rules, leading to the emergence of collective behaviours. These emergent phenomena can be either desirable, such as achieving equilibrium or convergence, or undesirable, such as runaway snowball effects that amplify systemic instability. ABMs provide a flexible framework to analyse and predict such outcomes, enabling insights into the underlying dynamics of complex systems.
BEAST (Big and Efficient Agent-based Simulation Toolkit) is a high-performance, modular platform designed for large-scale ABM simulations. It leverages advanced computational techniques such as GPU acceleration, PyTorch vectorised operations, and scalable batch processing to handle complex interactions between agents and their environments. BEAST has been used to study the spatial ecology and complex population dynamics of a large genetically modified mosquito population. By integrating biological behaviours, spatial dynamics, and environmental feedback, BEAST enables researchers to explore scenarios such as species dispersal, population dynamics, and genetic inheritance in detail and at scale. Its modular design ensures flexibility, allowing users to implement custom models, define agent behaviours, and integrate geospatial data for realistic simulations. With support for millions of agents and multi-GPU capabilities, BEAST is particularly well suited to studying ecological processes at large temporal and spatial scales. However, current implementations rely on predefined rules, limiting agents' adaptability and responsiveness to dynamic environments.
Reinforcement Learning (RL), a subfield of machine learning, enables agents to learn optimal behaviours through interaction with their environment. Integrating RL into BEAST would allow agents to adapt their behaviour toward desirable emergent outcomes, advancing the state of the art in ABM design and providing new capabilities for simulating real-world scenarios.
This thesis proposes a novel integration of RL into the BEAST framework to train agents to optimise survival and reproduction under dynamic environmental conditions. The methodology focuses on leveraging RL to guide agent decision-making, achieving emergent system-level objectives without requiring exhaustive rule definitions.
State of the Art: Reinforcement Learning in ABMs
Current research demonstrates promising applications of RL in ABMs, including adaptive traffic management, ecological simulations, and urban development modelling. RL has been particularly effective in enabling:
- Dynamic Strategy Adaptation: Agents learn strategies for resource acquisition and risk avoidance.
- Scalable Coordination: Multi-agent RL techniques enhance coordination among agents to achieve collective goals.
- Optimised Emergent Phenomena: RL facilitates desirable emergent patterns, such as survivability, stability, or equilibrium, in systems with complex interactions.
Core Contribution
The primary contributions of this thesis are to:
- Develop a methodology for integrating RL into the BEAST framework, enabling agents to learn behaviours that optimise for and achieve desirable emergence (e.g., the survivability of mosquitoes carrying a particular gene).
- Demonstrate the scalability and effectiveness of RL in large-scale ABMs, addressing computational and behavioural complexities.
- Introduce a case study using mosquitoes as agents that utilise RL to improve survival and reproduction by adapting to environmental changes, including resource availability, predators, and spatial boundaries.
Goals and Objectives
- Enhance BEAST with RL Capabilities:
- Design RL modules for agent decision-making.
- Implement scalable RL training algorithms.
- Achieve Desirable Emergence in ABMs:
- Define metrics to evaluate emergent phenomena (a sketch of one candidate metric follows this list).
- Develop training objectives for aligning RL policies with system-level goals.
- Validate RL-Enhanced BEAST Framework:
- Simulate mosquito survival scenarios.
- Compare RL-driven and rule-based agent behaviours in achieving emergent objectives.
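As a concrete starting point for the metrics objective above, the following is a minimal sketch of one candidate emergence metric, population stability, expressed in PyTorch. The function name and the coefficient-of-variation formulation are illustrative assumptions rather than part of BEAST.

```python
import torch

def population_stability(pop_sizes: torch.Tensor, window: int = 50) -> torch.Tensor:
    """Coefficient of variation of population size over a trailing window.

    Lower values indicate a more stable (more desirable) emergent state.
    pop_sizes holds one population count per simulation step.
    """
    recent = pop_sizes[-window:].float()
    return recent.std() / recent.mean().clamp_min(1e-8)
```

A threshold on such a metric can also double as a validation criterion when comparing RL-driven and rule-based runs.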
Methodology
- Framework Enhancement:
- Integrate RL libraries (e.g., PyTorch) with BEAST’s core modules.
- Design interfaces for RL policy updates and reward assignment in the BEAST simulation loop (see the sketch after this list).
- Training Process:
- Define states (e.g., resource availability, neighbouring agents), actions (e.g., movement, reproduction), and rewards (e.g., survival, offspring count).
- Employ actor-critic methods such as Proximal Policy Optimisation (PPO) for efficient training.
- Emergence Optimisation:
- Formulate system-level objectives as emergent properties (e.g., population stability).
- Train agents to maximise individual rewards aligned with global objectives (see the reward sketch in the use-case section).
- Case Study Implementation:
- Simulate mosquito agents learning to optimise survival by avoiding predators, seeking mates, and locating resources using RL policies.
- Evaluate outcomes with and without RL.
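To make the interface and training items above concrete, the sketch below shows how a single shared actor-critic policy could be driven by a vectorised simulation loop of the kind BEAST uses. Everything here is an illustrative assumption: `beast_step` is a placeholder for BEAST's real step function, and the state dimension, action set, and reward signal are stand-ins.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: a per-agent state vector (resource level,
# neighbour density, position, ...) and a discrete action set
# (movement directions, stay, attempt reproduction).
STATE_DIM, N_ACTIONS = 8, 6

class ActorCritic(nn.Module):
    """One shared policy evaluated for all agents in a single batched
    call, matching BEAST's vectorised, GPU-resident style."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh())
        self.pi = nn.Linear(64, N_ACTIONS)   # action logits
        self.v = nn.Linear(64, 1)            # state-value estimate

    def forward(self, states):
        h = self.body(states)
        return self.pi(h), self.v(h).squeeze(-1)

def beast_step(states, actions):
    """Placeholder for BEAST's simulation step: advance every agent one
    tick and return next states plus per-agent rewards. A real
    integration would call BEAST's core modules here, with actions
    determining movement and reproduction attempts."""
    next_states = states + 0.01 * torch.randn_like(states)
    rewards = torch.randn(states.shape[0], device=states.device)  # stand-in
    return next_states, rewards

def train(n_agents=10_000, steps=200, gamma=0.99):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = ActorCritic().to(device)
    opt = torch.optim.Adam(model.parameters(), lr=3e-4)
    states = torch.randn(n_agents, STATE_DIM, device=device)

    for _ in range(steps):
        logits, values = model(states)
        dist = torch.distributions.Categorical(logits=logits)
        actions = dist.sample()
        next_states, rewards = beast_step(states, actions)

        # One-step actor-critic update with a TD(0) advantage.
        with torch.no_grad():
            _, next_values = model(next_states)
            targets = rewards + gamma * next_values
        advantage = targets - values
        policy_loss = (-dist.log_prob(actions) * advantage.detach()).mean()
        value_loss = advantage.pow(2).mean()
        opt.zero_grad()
        (policy_loss + 0.5 * value_loss).backward()
        opt.step()
        states = next_states
```

A production integration would replace `beast_step` with calls into BEAST's core modules and would likely swap the one-step update for PPO's clipped surrogate objective.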
Technical Requirements
- Hardware:
- GPU-accelerated compute resources for scalable RL training.
- High-memory nodes for large-scale ABM simulations.
- Software:
- BEAST framework with integrated RL modules.
- PyTorch for RL model training.
- cuSpatial for geospatial analysis and boundary checks (see the sketch after this list).
- Data:
- Geospatial data for defining simulation environments (e.g., vegetation density, resource distribution).
- Agent-level attributes for initialisation.
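To illustrate the boundary checks mentioned under Software, the following sketch runs a point-in-polygon test on the GPU. It assumes the GeoSeries-based `point_in_polygon` API of recent cuSpatial releases; the habitat polygon and agent positions are invented examples.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon
import cuspatial

# Invented example: a square habitat boundary and two agent positions.
boundary = gpd.GeoSeries([Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])])
agents = gpd.GeoSeries([Point(2.0, 3.0), Point(12.0, 5.0)])

# Move both to the GPU as cuSpatial GeoSeries.
d_boundary = cuspatial.from_geopandas(boundary)
d_agents = cuspatial.from_geopandas(agents)

# Result has one boolean column per polygon; here a single column
# marking which agents lie inside the simulation boundary.
inside = cuspatial.point_in_polygon(d_agents, d_boundary)
print(inside.iloc[:, 0])  # [True, False]
```

At BEAST's scale, agent positions would stay GPU-resident rather than round-tripping through geopandas; the snippet only shows the call shape.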
Use-Case: Mosquitoes Adapting for Survival
In the proposed use-case, mosquito agents will use RL to learn behaviours that enhance their survival and reproductive success.
Key Features:
- State Representation:
- Environmental factors (e.g., temperature, humidity).
- Proximity to resources and other agents.
- Action Space:
- Movement decisions to seek resources or mates.
- Avoidance of predation risks.
- Reward Function (a combined sketch follows this list):
- Positive rewards for successful reproduction and resource acquisition.
- Negative rewards for predation or energy depletion.
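The reward terms listed above might be combined as in the sketch below. The weights, argument names, and the population-stability shaping term (which ties individual incentives back to the Emergence Optimisation objective in the Methodology) are illustrative assumptions, not tuned values.

```python
import torch

def mosquito_reward(reproduced, resources_gained, was_predated, energy,
                    pop_size, target_pop, stability_weight=0.1):
    """Illustrative per-agent reward. Tensor arguments are per-agent;
    pop_size and target_pop are scalars. All weights are placeholders.

    reproduced:       bool tensor, successful reproduction this step
    resources_gained: float tensor, resources acquired this step
    was_predated:     bool tensor, agent killed by a predator
    energy:           float tensor, remaining energy reserve
    """
    r = (2.0 * reproduced.float()        # positive: offspring
         + 0.5 * resources_gained        # positive: resource acquisition
         - 5.0 * was_predated.float()    # negative: predation
         - 1.0 * (energy <= 0).float())  # negative: energy depletion

    # Shaping term that aligns individual incentives with the global
    # objective of population stability: every agent is penalised when
    # the population drifts away from a target size.
    stability_penalty = abs(pop_size - target_pop) / target_pop
    return r - stability_weight * stability_penalty
```

Tuning these weights so that individually rational behaviour produces the desired system-level outcome is itself part of the training-objective design.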
Emergent Outcomes:
- Stable population dynamics.
- Spatially distributed resource usage.
- Adaptation to environmental changes.
Expected Outcomes
- Framework Advancement:
- Integration of RL into BEAST to support dynamic, goal-oriented agent behaviours.
- Scalable Simulations:
- Demonstrate RL’s scalability for large-scale, heterogeneous ABMs.
- Impactful Use-Case:
- Validate RL’s effectiveness in steering emergent behaviours toward desirable outcomes, such as stable, manageable mosquito populations.