Playing Regex Golf with Genetic Programming

posted Mar 12, 2014, 3:00 PM by Eric Medvet   [ updated Jul 31, 2014, 3:34 AM ]
  • ACM Genetic and Evolutionary Computation Conference (GECCO), 2014, Vancouver (Canada)
  • Alberto Bartoli, Andrea De Lorenzo, Eric Medvet, Fabiano Tarlao
  • Google Scholar
Regex golf has recently emerged as a specific kind of code golf, i.e., unstructured and informal programming competitions aimed at writing the shortest code solving a particular problem. A problem in regex golf consists in writing the shortest regular expression which matches all the strings in a given list and does not match any of the strings in another given list. The regular expression is expected to follow the syntax of a specified programming language, e.g., Javascript or PHP.
In this paper, we propose a regex golf player internally based on Genetic Programming. We generate a population of candidate regular expressions represented as trees and evolve such population based on a multi-objective fitness which minimizes the errors and the length of the regular expression.
We assess experimentally our player on a popular regex golf challenge consisting of 16 problems and compare our results against those of a recently proposed algorithm---the only one we are aware of. Our player obtains scores which improve over the baseline and are highly competitive also with respect to human players. The time for generating a solution is usually in the order of tens minutes, which is arguably comparable to the time required by human players.
Andrea De Lorenzo,
Jul 20, 2015, 6:04 AM
Eric Medvet,
Jul 31, 2014, 3:34 AM