We describe the approach that we submitted to the 2015 PAN competition for the author identification task. The task consists of determining whether an unknown document was written by the same author as a given set of documents, all of which are known to share a single author.
We propose a machine learning approach based on a number of different features that characterize documents from widely different points of view. We construct non-overlapping groups of homogeneous features, train one random forest regressor for each feature group, and combine the outputs of all regressors by taking their arithmetic mean. We train a separate set of regressors for each language.
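The combination scheme described above can be illustrated with a minimal sketch. It assumes scikit-learn's `RandomForestRegressor` and a feature matrix whose columns have already been extracted. The class name `GroupedForestEnsemble`, the helper `fit_per_language`, the group names, and the hyperparameters are hypothetical choices made for illustration and are not taken from the submitted system.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


class GroupedForestEnsemble:
    """One random forest regressor per feature group; outputs are averaged."""

    def __init__(self, groups, n_estimators=100, random_state=0):
        # groups: hypothetical mapping {group name: list of column indices}.
        # The groups must not share columns (non-overlapping feature groups).
        all_cols = [c for cols in groups.values() for c in cols]
        if len(all_cols) != len(set(all_cols)):
            raise ValueError("feature groups must be non-overlapping")
        self.groups = groups
        self.n_estimators = n_estimators  # assumed value, not from the paper
        self.random_state = random_state
        self.models_ = {}

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        for name, cols in self.groups.items():
            # Each regressor only sees the columns of its own group.
            model = RandomForestRegressor(
                n_estimators=self.n_estimators,
                random_state=self.random_state,
            )
            model.fit(X[:, cols], y)
            self.models_[name] = model
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        per_group = [
            self.models_[name].predict(X[:, cols])
            for name, cols in self.groups.items()
        ]
        # Arithmetic mean of the per-group regressor outputs.
        return np.mean(per_group, axis=0)


def fit_per_language(data_by_language, groups):
    """Train an independent ensemble for each language.

    data_by_language: hypothetical mapping {language: (X, y)}.
    """
    return {
        lang: GroupedForestEnsemble(groups).fit(X, y)
        for lang, (X, y) in data_by_language.items()
    }
```

In this sketch, a target of 1 would mean "same author" and 0 "different author", so the averaged output can be read as a score between 0 and 1. How the score is thresholded or calibrated is left open here.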
Our approach ranked first in the final ranking for the Spanish language.