A web application that automatically generates regular expressions (regex) from examples, where each example is a pair of strings. The generation itself is performed using Genetic Programming.
We provide here a set of datasets of annotated strings that we used to experimentally evaluate a method for the automatic inference of text extractors using Genetic Programming (GP).
A labeled dataset of several digitized paper documents, processed by OCR.
We used this dataset (or parts of it) to assess the performance of several document understanding and classification systems that we built.
The charts presented here were obtained using citation data for the papers published between 2000 and 2009 (inclusive) in 8 important Computer Science venues. The data show that, every year, a significant percentage of papers that should be considered "high quality" under any metric or human judgement either are never cited at all or receive only a handful of citations.
We provide here a dataset that can be used to evaluate the performance of a classifier for detecting hidden fraudulent URLs. The dataset contains 185180 labeled URLs and some related features.
We provide here a dataset of XML files that we used to experimentally evaluate a method for automatic schema generation using Genetic Programming (GP).