File(s) stored somewhere else

Please note: Linked content is NOT stored on SMU Research Data Repository (RDR), and we cannot guarantee its availability, quality, or security, or accept any liability for it.

Twitter-LDA

dataset
posted on 01.04.2011 by Xin ZHAO Wayne, Jing JIANG, Jianshu WENG, Jing HE, Ee Peng LIM, Hongfei YAN, Xiaoming LI

Latent Dirichlet Allocation (LDA) has been widely used in textual analysis. The original LDA is used to find hidden "topics" in a collection of documents, where a topic is a subject such as "arts" or "education" that the documents discuss. The original LDA setting, in which each word has its own topic label, may not work well with Twitter, because tweets are short and a single tweet is more likely to be about one topic. Twitter-LDA (T-LDA) was proposed to address this issue. T-LDA also addresses the noisy nature of tweets by capturing background words. Experiments in [7] have shown that T-LDA can capture more meaningful topics than LDA on microblogs.
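The two ideas described above, one topic label per tweet and a separate pool of background words, can be illustrated with a toy generative sketch. This is not the released Twitter-LDA code. The function name generate_tweet, the vocabularies, the topic weights, and the topic_word_prob parameter are all hypothetical, and the sketch only samples tweets; it does not perform inference.

```python
import random

# Hypothetical toy vocabularies, chosen only for illustration.
TOPIC_WORDS = {
    "arts": ["gallery", "painting", "concert", "film"],
    "education": ["school", "exam", "teacher", "lecture"],
}
BACKGROUND_WORDS = ["lol", "today", "just", "really"]


def generate_tweet(topic_weights, length=8, topic_word_prob=0.7, rng=None):
    """Sample one toy tweet as a list of (word, source) pairs.

    Assumption of this sketch: each word is a topic word with probability
    topic_word_prob and a background word otherwise.
    """
    if rng is None:
        rng = random.Random()
    topics = sorted(topic_weights)
    # One topic for the whole tweet, rather than one topic per word.
    topic = rng.choices(topics, weights=[topic_weights[t] for t in topics])[0]
    words = []
    for _ in range(length):
        if rng.random() < topic_word_prob:
            # Word drawn from the tweet's single topic.
            words.append((rng.choice(TOPIC_WORDS[topic]), "topic"))
        else:
            # Noisy word drawn from the shared background vocabulary.
            words.append((rng.choice(BACKGROUND_WORDS), "background"))
    return topic, words


if __name__ == "__main__":
    topic, words = generate_tweet({"arts": 0.8, "education": 0.2},
                                  rng=random.Random(0))
    print(topic, words)
```

In this sketch every topic word in a tweet comes from the same topic, while background words are shared across all tweets, which mirrors the intuition in the description at a very small scale.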


Related Publication: Zhao, W. X., Jiang, J., Weng, J., He, J., Lim, E. P., Yan, H., & Li, X. (2011). Comparing Twitter and traditional media using topic models. In Advances in Information Retrieval (pp. 338-349). http://dx.doi.org/10.1007/978-3-642-20161-5_34
