posted Jan 15, 2014, 4:34 AM by Alberto Bartoli
updated Jan 15, 2014, 1:12 PM by Eric Medvet
A paper describing our system for the automatic generation of regular expressions from examples has been accepted for publication in IEEE Computer, one of the most prestigious magazines in computing.
The user describes the desired task only by means of a set of labeled examples.
We performed an extensive experimental evaluation on 12 different text extraction tasks applied to real-world datasets. We obtained very good results in terms of precision and recall, even in comparison to earlier state-of-the-art proposals.
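To make the evaluation criteria concrete, here is a minimal Python sketch of how precision and recall could be computed for a candidate regular expression on labeled examples. It is only an illustration, not the evaluation code used in the paper: the function name `precision_recall`, the representation of an example as a text plus a list of `(start, end)` spans to extract, and the sample data are all assumptions.

```python
import re

def precision_recall(pattern, examples):
    """Score a candidate regex on labeled examples.

    examples: list of (text, spans), where spans lists the (start, end)
    character offsets that should be extracted from text.
    This data layout is an assumption made for illustration.
    """
    regex = re.compile(pattern)
    extracted = expected = correct = 0
    for text, spans in examples:
        found = {m.span() for m in regex.finditer(text)}
        wanted = set(spans)
        extracted += len(found)
        expected += len(wanted)
        # An extraction counts as correct only if it matches a labeled span exactly.
        correct += len(found & wanted)
    precision = correct / extracted if extracted else 0.0
    recall = correct / expected if expected else 0.0
    return precision, recall

# Hypothetical labeled examples: extract phone-like tokens.
examples = [
    ("call 555-1234 or 555-9876", [(5, 13), (17, 25)]),
    ("order 42 shipped", []),
]

print(precision_recall(r"\d{3}-\d{4}", examples))
print(precision_recall(r"\d+", examples))
```

Precision is the fraction of extracted spans that are correct, and recall is the fraction of labeled spans that were extracted; a pattern that is too general lowers the former, one that is too specific lowers the latter.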
Our results are highly promising as a step toward a practical surrogate for the specific skills required to write regular expressions. They are also significant as a demonstration of what can be achieved with approaches based on Genetic Programming (GP) on modern information technology. The system is internally based on GP, but this fact is completely transparent to users.
As an aside, the problem attacked by our system is similar to regex golf, a topic that has recently received a lot of attention thanks to a nice algorithm proposed by Peter Norvig (Director of Google Research and one of the greatest stars in Computer Science).
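For readers unfamiliar with the game: in regex golf, as it is commonly stated, the goal is to find the shortest regular expression that matches every string in one list and none of the strings in another. The short Python sketch below only checks whether a candidate solves such an instance; it is neither Norvig's algorithm nor ours, and the function name `solves_golf` and the word lists are hypothetical.

```python
import re

def solves_golf(pattern, winners, losers):
    """Return True if pattern matches every winner and no loser."""
    regex = re.compile(pattern)
    return (all(regex.search(w) for w in winners)
            and not any(regex.search(s) for s in losers))

# Hypothetical instance, for illustration only.
winners = ["apple", "apricot", "avocado"]
losers = ["banana", "cherry", "mango"]

print(solves_golf("^a", winners, losers))  # valid solution, 2 characters long
print(solves_golf("a", winners, losers))   # fails: also matches "banana"
```

Among the candidates that pass this check, the shorter the pattern, the better the score.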
In the coming days we will explain why our problem is different from regex golf and, most importantly, we will probably release a web app for playing regex golf automatically, capable, of course, of obtaining pretty good scores.