Reinforcement Learning Simulation for Web Application Deception Strategies

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

MPhil Thesis Defence

Title: "Reinforcement Learning Simulation for Web Application
Deception Strategies"


Mr. Andrei KVASOV


Web applications are constantly under attack as the public-facing components of
information systems. One defense mechanism is deception: introducing
deceptive components into the application to distract attackers from
the successful attack path and to alert defenders to their presence. Despite
previous partial explorations of this idea, widespread adoption in the industry
remains limited due to the absence of a comprehensive end-to-end solution,
among other factors. Moreover, assessing the effectiveness of a deception
strategy poses a challenge, as it necessitates conducting human experiments,
which can be both costly and impractical for every individual web application
scenario. Therefore, our research endeavours to address this issue by proposing
a methodology to simulate attacks on web applications that employ deceptive
strategies. Leveraging the power of deep reinforcement learning (RL),
specifically the Double Deep Q-Network (DDQN) method, we train an attacker
model with the objective of successfully infiltrating a simulated web
application, thereby allowing us to comprehensively evaluate the efficacy of
defensive measures. Within this framework, known as CyberBattleSim, we
meticulously construct web application components and proceed to conduct a
series of experiments to address the following research inquiries: (1) How does
the presence of deceptive elements impact the attacker's detection time, that
is, the frequency of triggers and the number of steps an attacker agent takes
before any honeytoken is triggered? (2) How does the effect of detection points
on defensive capabilities change when the number of deception elements
increases? (3) Which types of honeytokens and detection points prove most
effective in reducing detection time, and what factors contribute to their
efficacy? Our empirical findings reveal that even without incorporating
explicit cyberattack concepts, the RL agent
successfully attains its objective and learns the optimal attack vector.
Moreover, more heavily protected, complex environments impede the agent's
learning, revealing the differing impact each honeytoken has on the success of the
attack. With our constrained but flexible high-level approach to replicating web
application attacks, the simulator can generalize and scale to even more
sophisticated, realistic examples.
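The Double Deep Q-Network (DDQN) method mentioned above decouples action
selection from action evaluation when forming the learning target: the online
network picks the best next action, while a slowly updated target network
scores it, which reduces the Q-value overestimation of vanilla DQN. A minimal
sketch of the target computation (illustrative only; the function name and
array-based batch layout are assumptions, not code from the thesis or from
CyberBattleSim):

```python
import numpy as np

def ddqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Compute Double DQN regression targets for a batch of transitions.

    rewards       : shape (B,)   immediate rewards
    dones         : shape (B,)   1.0 if the episode ended at this step, else 0.0
    q_online_next : shape (B, A) online-network Q-values for the next states
    q_target_next : shape (B, A) target-network Q-values for the next states
    """
    # Selection: the online network chooses the greedy next action.
    best_actions = np.argmax(q_online_next, axis=1)
    # Evaluation: the target network scores that chosen action.
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal states contribute no bootstrapped value.
    return rewards + gamma * (1.0 - dones) * next_values

# Tiny usage example with a batch of two transitions and two actions:
targets = ddqn_targets(
    rewards=np.array([1.0, 0.0]),
    dones=np.array([0.0, 1.0]),
    q_online_next=np.array([[1.0, 2.0], [3.0, 0.0]]),
    q_target_next=np.array([[5.0, 7.0], [2.0, 4.0]]),
    gamma=0.9,
)
# First transition: online net prefers action 1, target net values it at 7.0,
# so the target is 1.0 + 0.9 * 7.0 = 7.3; the second transition is terminal.
```

In training, these targets would be regressed against the online network's
Q-values for the actions actually taken, with the target network's weights
periodically copied from the online network.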

Date:                   Wednesday, 23 August 2023

Time:                   10:00am - 12:00noon

Venue:                  Room 3494
                        (Lifts 25/26)

Committee Members:      Prof. Huamin Qu (Supervisor)
                        Prof. Nevin Zhang (Chairperson)
                        Dr. Shuai Wang

**** ALL are Welcome ****