Twitter posts (aka tweets). We use TwitterSinkhole, a tool written by our lab staff, which collects about 40 tweets per second through the Twitter Streaming API. This yields about 3,500,000 tweets per day, or roughly 650 MB of text files per day.
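As a quick sanity check of these figures, the arithmetic can be sketched as follows. This is only an illustration: the function names are hypothetical, and it assumes a constant collection rate over the whole day and that 1 MB means 10^6 bytes. TwitterSinkhole itself is not involved here.

```python
# Back-of-the-envelope check of the collection figures quoted above.
# Assumptions (not from the tool itself): constant rate, 1 MB = 10**6 bytes.

SECONDS_PER_DAY = 24 * 60 * 60


def tweets_per_day(rate_per_second: float) -> int:
    """Number of tweets collected in one day at a constant rate."""
    return round(rate_per_second * SECONDS_PER_DAY)


def avg_bytes_per_tweet(bytes_per_day: float, tweets: int) -> float:
    """Average storage footprint of one tweet, in bytes."""
    return bytes_per_day / tweets


daily_tweets = tweets_per_day(40)
avg_size = avg_bytes_per_tweet(650 * 10**6, daily_tweets)

print(daily_tweets)     # 3456000, i.e. about 3.5 million
print(round(avg_size))  # 188 bytes per tweet on average
```

Under these assumptions, 40 tweets per second gives 3,456,000 tweets per day, which matches the rounded figure of 3,500,000, and 650 MB per day works out to a little under 190 bytes of stored text per tweet.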
We will use the collected dataset for research on many topics, such as sentiment analysis and language classification.
Note that these tweets are written by people in many different languages: see the picture.