On the Automatic Construction of Regular Expressions from Examples (GP vs. Humans 1-0)

posted Apr 21, 2016, 3:02 AM by Eric Medvet   [ updated Aug 3, 2016, 12:46 AM ]
Regular expressions are routinely used in many application domains. Writing a regular expression for a specific task is usually quite difficult and requires significant technical skill and creativity. We have developed a tool based on Genetic Programming that constructs regular expressions for text extraction automatically from examples of the text to be extracted.
We have recently demonstrated that our tool is human-competitive in terms of both the accuracy of the regular expressions and the time required to construct them. We base this claim on a large-scale experiment involving more than 1700 users on 10 text extraction tasks of realistic complexity. The F-measure of the expressions constructed by our tool was almost always higher than the average F-measure of the expressions constructed by each of the three categories of users involved in our experiment (Novice, Intermediate, Experienced). The time required by our tool was almost always shorter than the average time required by each of the three categories of users. The experiment is described in full detail in "Can a machine replace humans in building regular expressions? A case study", to appear in IEEE Intelligent Systems.
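For readers unfamiliar with the F-measure, the sketch below shows one common way to score a regular expression on a text extraction task. It is only an illustration: this post does not specify how the F-measure was computed in the experiment, so the exact-span matching used here, the function name `extraction_f_measure`, and the sample text are all assumptions.

```python
import re


def extraction_f_measure(pattern, text, expected_spans):
    """Return (precision, recall, F-measure) of a regex on one text.

    Assumption: an extraction counts as correct only if its
    (start, end) character span exactly equals an expected span.
    """
    found = {m.span() for m in re.finditer(pattern, text)}
    expected = set(expected_spans)
    true_positives = len(found & expected)

    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical task: extract the two phone numbers, not the date.
text = "Call 555-1234 or 555-9876 before 2016-04-21."
expected = {(5, 13), (17, 25)}

loose = extraction_f_measure(r"\d+-\d+", text, expected)
tight = extraction_f_measure(r"\d{3}-\d{4}", text, expected)
```

In this made-up example the loose pattern also extracts `2016-04` from the date, so its precision is 2/3 with recall 1, giving an F-measure of 0.8, while the tighter pattern extracts exactly the two expected spans.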