Data and tools
Tools
Automatic generator of regular expressions from examples:
A web application that automatically generates regular expressions (regex) from examples: each example is a pair of strings. The actual generation is performed using Genetic Programming.
This tool is a demo of our work, which was awarded the Silver Medal at the 13th HUMIES, 2016 (Awards for Human-Competitive Results produced by Genetic and Evolutionary Computation).
Data
Annotated strings for learning text extractors:
We provide here a set of datasets of annotated strings which we used to experimentally evaluate a method for automatic inference of text extractors using Genetic Programming (GP).
Ghega-dataset: a dataset for document understanding and classification:
A labeled dataset of several digitized paper documents, processed by OCR. We used this dataset (or part of it) to assess the performance of several systems for document understanding and classification that we built.
Paper citations for important Computer Science venues:
The charts presented here are obtained using citation data for the papers published between 2000 and 2009 (inclusive) in 8 important Computer Science venues. The data show that every year a significant percentage of papers that should be considered "high quality" under any metric or human judgement either are never cited at all or receive only a handful of citations.
Hidden fraudulent URLs dataset:
We provide here a dataset which can be useful for evaluating the performance of a classifier that detects hidden fraudulent URLs. The dataset contains 185180 labeled URLs and some related features.
XML data for automatic schema generation:
We provide here a dataset of XML files which we used to experimentally evaluate a method for automatic schema generation using Genetic Programming (GP).
We provide here a set of datasets which we used to experimentally evaluate a method for automatic inference of search-and-replace expressions.
Supplementary material
Automatic Synthesis of Regular Expressions from Examples
Evolutionary Inference of Attribute-based Access Control Policies
Inference of Regular Expressions for Text Extraction from Examples
Active Learning of Regular Expressions for Entity Extraction
Weighted Hierarchical Grammatical Evolution: A Genotype-Phenotype Mapping with Better Properties
Unveiling Evolutionary Algorithm Representation with DU Maps
Visualizing the Outcome of Dynamic Analysis of Android Malware with VizMal