Automatic Synthesis of Regular Expressions from Examples

Supplemental material to our IEEE Computer paper (editor site)

Parameters of the GP search

Example individual

Additional results

The performance indexes of our approach are computed as the average performance of the best expression generated in each of the five repetitions, where the best expression of each repetition is chosen by evaluating J=128 individuals on the validation set. We analyzed all 5x128 = 640 individuals that compose the final populations of the five repetitions and report the corresponding performance distributions in the figure below (learning set with 100 examples and J=128).
The distributions show that the very good performance we obtain is not the result of a few lucky individuals: our approach systematically generates many different expressions with high precision, recall and F-measure.
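The selection-and-averaging procedure described above can be sketched as follows. This is only an illustrative sketch, not the code used in the paper: the function names `select_best` and `average_best_performance`, and the representation of an individual as a pair of (validation score, test score), are assumptions made for the example.

```python
from statistics import mean


def select_best(population, validation_score):
    """Return the individual with the highest score on the validation set."""
    return max(population, key=validation_score)


def average_best_performance(repetitions, validation_score, test_score):
    """Average the test score of the best individual of each repetition."""
    best_per_repetition = [
        select_best(population, validation_score) for population in repetitions
    ]
    return mean(test_score(individual) for individual in best_per_repetition)


# Toy data (hypothetical): each individual is a pair
# (F-measure on the validation set, F-measure on the testing set).
repetitions = [
    [(0.90, 0.80), (0.70, 0.95)],
    [(0.60, 0.50), (0.85, 0.90)],
]

result = average_best_performance(
    repetitions,
    validation_score=lambda individual: individual[0],
    test_score=lambda individual: individual[1],
)
print(result)  # average of 0.80 and 0.90, i.e. about 0.85
```

Note that the individual with the highest test score in the first toy repetition (0.95) is not the one selected, because the choice is made on the validation set only. This is why looking at the whole distribution of the final populations, rather than at the selected individuals alone, is informative.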

The figure below shows the performance indexes obtained with different variants of the fitness function (see the paper for details).