An Empirical Assessment of Contemporary Online Media in Ad-Hoc Corpus Creation for Social Events

Abstract

Social networking sites such as Facebook and Twitter have become popular portals for users to discuss topics and express opinions. Research shows that topical discussions around events tend to evolve socially on microblogs. However, sources like Twitter have no explicit discussion thread that links semantically similar posts. Moreover, a discussion may evolve across multiple different threads, as it does on Facebook. Researchers have proposed using contemporary online documents as an external corpus to connect pairs of contextually related semantic topics. This motivates the question: given a significant social event, what is a good choice of external corpus for identifying how discussion topics evolve around the event's context? In this work, we compare the effectiveness of contemporary blog posts, online news media, and forum discussions in creating an ad hoc external corpus. Using the social propensity of topical discussions on Twitter to evolve as the criterion for assessing the quality of the created corpus, we find online news media to be the most effective. We evaluate on three large real-life Twitter datasets to confirm our findings.

Publication
In International Joint Conference on Natural Language Processing.