Tuesday, 5 March 2013

Detecting and Tracking Political Abuse in Social Media

Online social media shape how information is produced, transferred and consumed. User-generated content in the form of blog posts, comments, and tweets establishes a connection between the producers and the consumers of information, and social media link our social networks, personal information channels and the mass media. Ratkiewicz et al. (2011) studied astroturf political campaigns on microblogging platforms: politically motivated individuals and organizations that use multiple centrally controlled accounts to create the appearance of widespread support for a candidate or opinion. They describe a machine learning framework that combines topological, content-based and crowdsourced features of information diffusion networks on Twitter to detect the early stages of viral spreading of political misinformation.

Online social media tools play a crucial role in the successes and failures of numerous political campaigns and causes. Examples range from the grassroots organizing power of Barack Obama’s 2008 presidential campaign, to Howard Dean’s failed 2004 presidential bid and the first-ever Tea Party rally (Rasmussen and Schoen 2010). With this increasing popularity, however, comes a dark side — as social media grows in prominence, it is natural that people find ways to abuse it. As a result, we observe various types of illegitimate use; spam is a common example (Grier et al. 2010; Wang 2010).

Here we focus on a particular social media platform, Twitter, and on one particular type of abuse, namely political astroturf — political campaigns disguised as spontaneous “grassroots” behavior that are in reality carried out by a single person or organization. This is related to spam but with a more specific domain context, and potentially larger consequences.

Information diffusion can be viewed as a random walk through ensembles of individuals. A serious obstacle in modeling information propagation, in the real world as well as in the blogosphere, is that the structure of the underlying social network is often unknown. Ratkiewicz et al. adopt the term ‘truthy’ to distinguish falsely propagated information from organic grassroots memes.

Analytical Framework:

The authors developed a unified framework, called Klatsch, that analyzes the behavior of users and the diffusion of ideas in a broad variety of data feeds. They model a generic stream of social networking data as a series of events that represent interactions between actors and memes, as shown in Fig. 1. Each event involves some number of actors (entities that represent users), some number of memes (entities that represent units of information at the desired level of detail), and interactions among them. For example, a single tweet event might involve three or more actors: the poster, the user she is retweeting, and the people she is addressing. The post might also involve a set of memes consisting of ‘hashtags’ and URLs referenced in the tweet. Each event can be thought of as contributing a unit of weight to edges in a network structure, where nodes are associated with either actors or memes. The timestamps associated with the events make it possible to observe the changing structure of this network over time.

[Fig. 1. Image courtesy: Ratkiewicz et al. 2011]
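
The event model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Event` class, the `accumulate_weights` function and the choice to place edges between each actor and each meme of an event are hypothetical, and are not taken from Klatsch itself.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Event:
    """One interaction in the stream (hypothetical structure, not Klatsch's own)."""
    timestamp: float
    actors: tuple  # identifiers of the users involved in the event
    memes: tuple   # hashtags, URLs, etc. carried by the event


def accumulate_weights(events):
    """Add one unit of weight per event to every actor-meme edge it touches."""
    weights = Counter()
    for event in events:
        for actor, meme in product(event.actors, event.memes):
            weights[(actor, meme)] += 1
    return weights


# Two made-up events: a tweet involving two users, then a second tweet by one of them.
events = [
    Event(1.0, ("alice", "bob"), ("#vote",)),
    Event(2.0, ("alice",), ("#vote", "http://example.com")),
]
weights = accumulate_weights(events)
```

Filtering `events` by timestamp before calling `accumulate_weights` would give a snapshot of the network at a chosen point in time.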

They identify four meme types:
1) Hashtags: The Twitter community uses tokens prefixed by a hashmark (#) to label the topical content of tweets. For example, #NaMo is used for Narendra Modi.
2) Mentions: A Twitter user can include another user’s screen name in a post, prepended by the @ symbol. These mentions can be used to denote that a particular Twitter user is being discussed.
3) URLs: URLs are extracted from tweets by matching strings of valid URL characters that begin with ‘http://’.
4) Phrases: The entire text of the tweet itself is considered a meme, once all Twitter metadata, punctuation, and URLs have been removed.
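
As a rough illustration of the four meme types, the sketch below extracts them from a tweet with regular expressions. The patterns are simplifying assumptions and not the authors' actual rules: `\S+` stands in for "valid URL characters", hashtags and mentions are treated as the Twitter metadata to strip from the phrase, and the phrase is lowercased.

```python
import re
import string

# Simplified, assumed patterns (not the ones used in the paper).
URL_RE = re.compile(r"http://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def extract_memes(tweet):
    """Return the hashtags, mentions, URLs and phrase meme found in a tweet."""
    urls = URL_RE.findall(tweet)
    # Remove URLs first so that '#' inside a URL is not mistaken for a hashtag.
    remainder = URL_RE.sub(" ", tweet)
    hashtags = HASHTAG_RE.findall(remainder)
    mentions = MENTION_RE.findall(remainder)
    # Phrase: what is left once hashtags, mentions and punctuation are removed.
    text = MENTION_RE.sub(" ", HASHTAG_RE.sub(" ", remainder))
    text = text.translate(str.maketrans("", "", string.punctuation))
    phrase = " ".join(text.lower().split())  # lowercasing is an assumption
    return {"hashtags": hashtags, "mentions": mentions, "urls": urls, "phrase": phrase}


memes = extract_memes("@bob Go vote today! #NaMo http://example.com/poll")
```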

Network Edges:

To represent the flow of information through the Twitter community, the authors construct a directed graph in which nodes are individual user accounts. An example diffusion network involving three users is shown in Fig. 2. An edge is drawn from node A to B when either B is observed to retweet a message from A, or A mentions B in a tweet. The weight of an edge is incremented each time an event connecting the two users is observed.
[Fig. 2. Image courtesy: Ratkiewicz et al. 2011]
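
The edge rule above (an edge from A to B when B retweets A, or when A mentions B) can be sketched as follows. The `(kind, author, other)` event tuples and the function name are hypothetical conventions chosen for this example.

```python
from collections import defaultdict


def build_diffusion_graph(events):
    """Build {source: {target: weight}} from (kind, author, other) events.

    kind == "retweet": `author` retweeted a message from `other`, edge other -> author
    kind == "mention": `author` mentioned `other` in a tweet, edge author -> other
    """
    graph = defaultdict(lambda: defaultdict(int))
    for kind, author, other in events:
        if kind == "retweet":
            source, target = other, author
        elif kind == "mention":
            source, target = author, other
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
        graph[source][target] += 1  # one unit of weight per observed event
    return graph


# B retweets A, A mentions B, A mentions C.
graph = build_diffusion_graph([
    ("retweet", "B", "A"),
    ("mention", "A", "B"),
    ("mention", "A", "C"),
])
```

Both kinds of event point in the direction the information travels, which is why a retweet by B and a mention of B both add weight to the same A to B edge.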

System Architecture:

A. Data Collection: To collect meme diffusion data, they rely on whitelisted access to the Twitter ‘Gardenhose’ streaming API (dev.twitter.com/pages/streaming_api).

B. Meme Detection: This step determines which of the collected tweets are stored in the database for further analysis. The authors kept only tweets (a) with content related to U.S. politics, and (b) of sufficiently general interest in that context. Political relevance is determined by matching against a manually compiled list of keywords. They consider a meme to be of general interest if the number of tweets with that meme observed in a sliding window of time exceeds a given threshold.
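
The general-interest test can be sketched as a per-meme sliding window counter. The class name, the window length and the threshold value below are illustrative assumptions (the summary above does not give the actual values), and the sketch assumes timestamps arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class SlidingWindowDetector:
    """Flag a meme once its tweet count within a time window exceeds a threshold."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._times = defaultdict(deque)  # meme -> timestamps still inside the window

    def observe(self, meme, timestamp):
        """Record one tweet carrying `meme`; return True if it is of general interest."""
        times = self._times[meme]
        times.append(timestamp)
        # Drop observations that have slid out of the window.
        while times and times[0] <= timestamp - self.window_seconds:
            times.popleft()
        return len(times) > self.threshold


# Hypothetical settings: a one-hour window and a threshold of two tweets.
detector = SlidingWindowDetector(window_seconds=3600, threshold=2)
flags = [detector.observe("#vote", t) for t in (0, 10, 20, 5000)]
```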

C. Network Analysis: To characterize the structure of each meme’s diffusion network, they computed several statistics based on the topology of the largest connected component of the retweet/mention graph (Ratkiewicz et al. 2011).
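
A minimal sketch of this step is given below: it finds the largest connected component of a weighted edge list (ignoring edge direction) and computes a few simple statistics on it. The three statistics shown are illustrative choices only; the summary above does not list which ones the authors used.

```python
from collections import deque


def largest_component(edges):
    """Nodes of the largest weakly connected component of {(source, target): weight}."""
    neighbours = {}
    for source, target in edges:
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)
    seen, best = set(), set()
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def component_stats(edges):
    """Illustrative statistics on the largest component (assumed, not the paper's list)."""
    nodes = largest_component(edges)
    inside = {e: w for e, w in edges.items() if e[0] in nodes and e[1] in nodes}
    mean_weight = sum(inside.values()) / len(inside) if inside else 0.0
    return {"nodes": len(nodes), "edges": len(inside), "mean_edge_weight": mean_weight}


edges = {("A", "B"): 2, ("A", "C"): 1, ("X", "Y"): 5}
stats = component_stats(edges)
```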

D. Sentiment Analysis: They used a modified version of the Google-based Profile of Mood States (GPOMS) sentiment analysis method (Bollen, Mao, and Pepe 2010) in the analysis of meme-specific sentiment on Twitter.

E. Automatic Classification: As an application of the analyses performed by the Truthy system, they trained a binary classifier to automatically label legitimate and truthy memes. They formed three classes: ‘truthy,’ ‘legitimate,’ and ‘remove.’ A meme was classified as ‘truthy’ if a significant portion of the users involved in that meme appeared to be spreading it in misleading ways. ‘Legitimate’ memes were those representing normal use of Twitter, with several non-automated users conversing about a topic. The final category, ‘remove,’ was used for memes in a non-English language or otherwise unrelated to U.S. politics (#youth, for example); these memes were not used in the training or evaluation of classifiers.
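
How the three annotation classes reduce to a binary training set can be sketched as below. The function name, the feature vectors and the 1/0 encoding are assumptions made for illustration; the choice of classifier is left open because it is not specified above.

```python
def prepare_training_set(annotated):
    """Turn (feature_vector, label) pairs into inputs for a binary classifier.

    Memes labelled 'remove' are dropped; 'truthy' maps to 1 and 'legitimate' to 0.
    """
    features, targets = [], []
    for vector, label in annotated:
        if label == "remove":
            continue  # excluded from both training and evaluation
        if label not in ("truthy", "legitimate"):
            raise ValueError(f"unexpected label: {label!r}")
        features.append(vector)
        targets.append(1 if label == "truthy" else 0)
    return features, targets


# Made-up feature vectors, e.g. (nodes, edges) of each meme's diffusion network.
annotated = [
    ((12, 30), "truthy"),
    ((250, 400), "legitimate"),
    ((5, 4), "remove"),
]
features, targets = prepare_training_set(annotated)
```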

This simple classification system was able to accurately detect ‘truthy’ memes based on features extracted from the topology of the diffusion networks, and using it a number of ‘truthy’ memes can be identified. Some of these memes exhibit explosive growth and are clear examples of coordinated attempts to deceive Twitter users. The memes discussed above are characterized by small diffusion networks, and it is at this early stage that attempts at deception can be identified. Once such an attempt succeeds in gaining the attention of the community, the meme’s spreading pattern becomes an organic one. Early detection and termination of accounts associated with astroturf memes is therefore critical.

Future work could further explore different annotations of truthy memes. Other promising features could also be used as classifier input, such as the age of the accounts involved in spreading a meme and the reputation of the users, so that such accounts become easier to identify.

[1] Ratkiewicz, J.; Conover, M. D.; Meiss, M.; Gonçalves, B.; Flammini, A.; and Menczer, F. 2011. Detecting and Tracking Political Abuse in Social Media. In Proc. 5th Intl. AAAI Conf. on Weblogs and Social Media.

[2] Rasmussen, S., and Schoen, D. 2010. Mad as Hell: How the Tea Party Movement Is Fundamentally Remaking Our Two-Party System. HarperCollins.

[3] Grier, C.; Thomas, K.; Paxson, V.; and Zhang, M. 2010. @spam: the underground on 140 characters or less. In Proc. 17th ACM Conf. on Computer and Communications Security (CCS), 27–37.

[4] Wang, A. H. 2010. Don’t follow me: Twitter spam detection. In Proc. 5th Intl. Conf. on Security and Cryptography (SECRYPT).

[5] Bollen, J.; Mao, H.; and Pepe, A. 2010. Determining the public mood state by analysis of microblogging posts. In Proc. of the Alife XII Conf. MIT Press.
