Annotated strings for learning text extractors