Improving classification of tweets using word-word co-occurrence information from a large external corpus
Classifying tweets is intrinsically hard because tweets are short messages, which makes the traditional bag-of-words approach inefficient. In particular, bag-of-words approaches ignore relationships between important terms that do not literally co-occur. In this paper we use word-word co-occurrence information from a large corpus to expand the vocabulary of another corpus consisting of tweets. Our results show that using co-occurrence information reduces the number of erroneous classifications by 14%.
Hammer, Hugo Lewi
Engelstad, Paal E.
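
The vocabulary expansion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how co-occurrence is counted or how expansion terms are selected, so everything here is an assumption made for illustration: the function names `cooccurrence_counts` and `expand_tweet`, the document-level co-occurrence window, the `top_k` cutoff, and the toy `external_corpus` are all hypothetical and are not the authors' method.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(corpus):
    """Count how often two distinct words appear in the same document.

    Assumption: co-occurrence is measured at document level.
    """
    counts = defaultdict(Counter)
    for doc in corpus:
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def expand_tweet(tweet, counts, top_k=2):
    """Return the tweet's tokens plus up to top_k co-occurring words per token."""
    tokens = tweet.lower().split()
    expanded = list(tokens)
    for tok in tokens:
        # Rank by descending count, then alphabetically, so output is deterministic.
        ranked = sorted(counts.get(tok, {}).items(), key=lambda kv: (-kv[1], kv[0]))
        for word, _ in ranked[:top_k]:
            if word not in expanded:
                expanded.append(word)
    return expanded


# Hypothetical toy stand-in for the large external corpus.
external_corpus = [
    "the striker scored a goal in the football match",
    "football fans cheered the goal",
    "the election result surprised voters",
]
counts = cooccurrence_counts(external_corpus)
expanded = expand_tweet("great goal", counts, top_k=2)
print(expanded)
```

The expanded token list can then be fed to an ordinary bag-of-words classifier in place of the original tweet. A practical version would likely filter stop words and weight the added terms rather than use raw counts; those choices are left open here.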