The custom, ad hoc nature of web applications makes learning-based anomaly detection systems a suitable approach to provide early warning about the exploitation of novel vulnerabilities. However, anomaly-based systems are known for producing a large number of false positives and for providing poor or non-existent information about the type of attack that is associated with an anomaly.
This paper presents a novel approach to anomaly-based detection of web-based attacks. The approach uses an anomaly generalization technique that automatically translates suspicious web requests into anomaly signatures. These signatures are then used to group recurrent or similar anomalous requests so that an administrator can easily deal with a large number of similar alerts. In addition, the approach uses a heuristics-based technique to infer the type of attacks that generated the anomalies. This enables the prioritization of the attacks and provides better information to the administrator. Our approach has been implemented and evaluated experimentally on real-world data gathered from web servers at two universities.