News‎ > ‎

Compressing Regular Expression Sets for Deep Packet Inspection

posted May 20, 2014, 4:07 AM by Alberto Bartoli   [ updated May 20, 2014, 5:04 AM by Eric Medvet ]
We have just been notified that our latest work has been accepted at the prestigious "13th International Conference on Parallel Problem Solving from Nature (PPSN 2014)".

The problemThe ability to generate security-related alerts while analyzing network traffic in real time has become a key mechanism in many networking devices, ranging from intrusion detection systems to firewalls and switches. While early systems classified traffic based only on header-level packet information, modern systems are capable of detecting malicious patterns within the actual packet payload.
This deep packet inspection capability is usually based on pattern descriptions expressed in the form of  regular expressions, because statically-defined strings have become inadequate to describe attack signatures.

Implementing a regular expression evaluation engine capable of analyzing network traffic at line speed  is considerably difficult, also because there are usually hundreds or thousands of regular expressions to be analyzed and this set needs to be periodically updated to address novel attacks.
For this reason, there has been a considerable amount of recent proposals aimed at handling the corresponding performance problems...

Our solution: We address a different dimension of the design space and propose an approach which complements the existing proposals. Rather than than attempting to construct a new set of expressions formally equivalent to the original one (but simpler to evaluate at run-time), we aim at constructing a different set with the same detection behavior on the traffic of interest.

As it turns out, this relaxed problem formulation allows a broad range of compressions and simplifications which would not be possible when insisting on having exactly the very same detection behavior on all possible strings...Key component of our proposal is an evolutionary search phase based on Genetic Programming...

Our resultsWe assess our proposal on several real sets of regular expressions used in the Snort intrusion detection system. We considered sets with a number of regular expressions ranging from 10 to 474 and an aggregate length ranging from 260 to about 59742. The results are highly promising: we obtain a decrease in the number of regular expressions and a decrease in aggregate length in the order of 74% (!)

The authors of this work are Alberto Bartoli, Simone Cumar, Andrea De Lorenzo, Eric Medvet. This work has been built upon the masters thesis by Simone.