Detecting and Preventing Attacks Against Web Applications

William Robertson
UC Santa Barbara

anomaly detection, intrusion detection, machine learning, web security

The World Wide Web has evolved from a system for serving an interconnected set of static documents to what is now a powerful, versatile, and largely democratic platform for application delivery and information dissemination. Unfortunately, with the web’s explosive growth in power and popularity has come a concomitant increase in both the number and impact of web application-related security incidents. The magnitude of the problem has prompted much interest within the security community in researching mechanisms that can mitigate this threat. To this end, intrusion detection systems have been proposed as a potential means of identifying and preventing the successful exploitation of web application vulnerabilities.

The current state of the art, however, has failed to deliver on the promise of intrusion detection. Misuse-based detection systems are unable to generalize to previously unknown attacks for which no signatures exist. In the context of the web, this is especially problematic in light of the wide proliferation of unique, custom-written web applications. On the other hand, anomaly-based intrusion detection systems seem well suited for detecting attacks against web applications. Existing anomaly detection techniques, however, have heretofore proven infeasible due to several factors: unacceptably high false positive rates, susceptibility to evasion, an inability to adapt to changes in monitored applications, and a lack of explanatory power.

In this dissertation, I present WebAnomaly, an advanced black-box anomaly detection system that accurately detects attacks against web applications with low performance overhead. WebAnomaly addresses several of the aforementioned fundamental challenges to anomaly detection using a combination of novel techniques. In particular, the relatively high rate of false positives and the lack of explanatory power are ameliorated using anomaly signatures, a technique for clustering related anomalies and classifying the type of attack they represent. The problem of local training data scarcity is addressed through the use of global knowledge bases of well-trained profiles collected from other web applications. Changes in web application behavior over time, known as concept drift, are addressed by treating the web application itself as an oracle of legitimate change. Finally, a novel framework for developing web applications that are secure by construction against many common classes of attacks is presented.