Online social media represent a fundamental shift in how information is produced, transferred and consumed. User-generated content in the form of blog posts, comments, and tweets establishes a direct connection between the producers and the consumers of information, bridging our social networks, personal information channels and the mass media. Ratkiewicz et al. (2011) have studied astroturf political campaigns on microblogging platforms: politically-motivated individuals and organizations that use multiple centrally-controlled accounts to create the appearance of widespread support for a candidate or opinion. They describe a machine learning framework that combines topological, content-based and crowdsourced features of information diffusion networks on Twitter to detect the early stages of the viral spread of political misinformation.
Online social media tools play a
crucial role in the successes and failures of numerous political campaigns and
causes. Examples range from the grassroots organizing power of Barack Obama’s
2008 presidential campaign, to Howard Dean’s failed 2004 presidential bid and
the first-ever Tea Party rally (Rasmussen and Schoen 2010). With this increasing
popularity, however, comes a dark side — as social media grows in prominence,
it is natural that people find ways to abuse it. As a result, we observe
various types of illegitimate use; spam is a common example (Grier et al. 2010;
Wang 2010).
Information diffusion can be modeled as random walks over ensembles of individuals. A serious obstacle in modeling information propagation in the real world, as well as in the blogosphere, is that the structure of the underlying social network is often unknown. Ratkiewicz et al. adopt the term ‘truthy’ to discriminate falsely-propagated information from organic grassroots memes.
Analytical Framework:
They have developed a unified framework, called Klatsch, that analyzes the behavior of users and the diffusion of ideas across a broad variety of data feeds. They model a generic stream of social networking
data as a series of events that represent interactions between actors and
memes, as shown in Fig. 1. Each event involves some number of actors (entities
that represent users), some number of memes (entities that represent units of
information at the desired level of detail), and interactions among them. For
example, a single tweet event might involve three or more actors: the poster,
the user she is retweeting, and the people she is addressing. The post might
also involve a set of memes consisting of ‘hashtags’ and URLs referenced in the
tweet. Each event can be thought of as contributing a unit of weight to edges
in a network structure, where nodes are associated with either actors or memes.
The timestamps associated with the events allow us to observe the changing
structure of this network over time.
Fig. 1 (Image courtesy: Ratkiewicz et al. 2011)
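To make this event abstraction concrete, here is a minimal Python sketch of how such a stream could be accumulated into a weighted actor/meme network; the class and field names are our own illustrative assumptions, not the actual Klatsch implementation.

```python
# Illustrative sketch of the Klatsch-style event model (not the paper's code).
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Event:
    timestamp: int   # when the tweet was observed
    actors: list     # e.g. [poster, retweeted user, addressed users, ...]
    memes: list      # e.g. hashtags and URLs referenced in the tweet

class EventGraph:
    """Accumulates events into a weighted network of actors and memes."""
    def __init__(self):
        self.edge_weight = defaultdict(int)    # (node_a, node_b) -> weight
        self.edge_times = defaultdict(list)    # (node_a, node_b) -> timestamps

    def add_event(self, event):
        # Each event contributes a unit of weight to every edge among the
        # actors and memes it involves; the timestamps let us replay the
        # changing structure of the network over time.
        nodes = event.actors + event.memes
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                self.edge_weight[(a, b)] += 1
                self.edge_times[(a, b)].append(event.timestamp)

# A single retweet event: three actors, one hashtag meme and one URL meme.
g = EventGraph()
g.add_event(Event(timestamp=1290000000,
                  actors=["@poster", "@original_author", "@addressee"],
                  memes=["#truthy", "http://example.com/article"]))
```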
They have identified four meme types, described as follows:
1) Hashtags:
The Twitter community uses tokens prefixed by a hashmark (#) to label the
topical content of tweets. E.g. #NaMo is used for Narendra Modi.
2) Mentions:
A Twitter user can include another user’s screen name in a post, prepended by
the @ symbol. These mentions can be used to denote that a particular Twitter
user is being discussed.
3) URLs:
They extract URLs from tweets by matching strings of valid URL characters that begin with ‘http://’.
4) Phrases:
They consider the entire text of the tweet itself to be a meme, once all Twitter metadata, punctuation, and URLs have been removed.
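As a rough illustration of how these four meme types could be pulled out of a raw tweet, here is a small Python sketch using simple regular expressions; the Truthy system's actual tokenization rules are certainly more careful than this.

```python
# Rough illustration of extracting the four meme types from a tweet text;
# the regular expressions below are simplified assumptions.
import re
import string

URL_RE = re.compile(r"http://\S+")   # the paper matches URLs beginning with 'http://'
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")

def extract_memes(tweet_text):
    urls = URL_RE.findall(tweet_text)
    hashtags = HASHTAG_RE.findall(tweet_text)
    mentions = MENTION_RE.findall(tweet_text)
    # Phrase meme: the tweet text with Twitter metadata, punctuation and URLs removed.
    stripped = MENTION_RE.sub("", HASHTAG_RE.sub("", URL_RE.sub("", tweet_text)))
    phrase = stripped.translate(str.maketrans("", "", string.punctuation)).strip()
    return {"hashtags": hashtags, "mentions": mentions, "urls": urls, "phrase": phrase}

print(extract_memes("RT @some_user: #NaMo addresses the rally http://example.com/live"))
```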
Network Edges:
To represent the flow of information through the Twitter community, they construct a directed graph in which nodes are individual user
accounts. An example diffusion network involving three users is shown in Fig.
2. An edge is drawn from node A to B when either B is observed to retweet a
message from A, or A mentions B in a tweet. The weight of an edge is
incremented each time we observe an event connecting two users.
Fig. 2 (Image courtesy: Ratkiewicz et al. 2011)
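A minimal sketch of this edge-weighting scheme in Python, using networkx (a library choice of ours, not one prescribed by the paper):

```python
# Illustrative sketch of the retweet/mention diffusion network (not the paper's code).
import networkx as nx

G = nx.DiGraph()

def increment_edge(src, dst):
    # The weight of an edge is incremented each time an event connects the two users.
    if G.has_edge(src, dst):
        G[src][dst]["weight"] += 1
    else:
        G.add_edge(src, dst, weight=1)

def record_retweet(retweeter, original_author):
    # Edge A -> B when B retweets a message from A.
    increment_edge(original_author, retweeter)

def record_mention(poster, mentioned_user):
    # Edge A -> B when A mentions B in a tweet.
    increment_edge(poster, mentioned_user)

record_retweet("user_B", "user_A")   # B retweets A: edge A -> B
record_mention("user_A", "user_B")   # A mentions B: same edge, weight becomes 2
```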
System Architecture:
A. Data Collection: To collect meme diffusion data they
rely on whitelisted access to the Twitter ‘Gardenhose’ streaming API (dev.
twitter.com/pages/streaming_api).
B. Meme Detection: This stage determines which of the collected tweets are stored in the database for further analysis. They collect only tweets (a) with content related to U.S. politics, and (b) of sufficiently general interest in that context. Political relevance is determined by matching against a manually compiled list of keywords. They consider a meme to be of general interest if the number of tweets containing that meme observed in a sliding window of time exceeds a given threshold.
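A hedged sketch of this two-stage filter is shown below; the keyword list, window length and threshold are invented placeholders for illustration, not the paper's actual settings.

```python
# Sketch of the meme filter: keyword match for political relevance, then a
# sliding-window activity threshold for general interest. All constants are assumptions.
from collections import deque, defaultdict

POLITICAL_KEYWORDS = {"election", "senate", "congress", "gop", "democrats"}  # manually compiled in the paper
WINDOW_SECONDS = 3600   # assumed sliding-window length
THRESHOLD = 20          # assumed minimum tweets per window

recent = defaultdict(deque)   # meme -> timestamps of recent tweets containing it

def is_political(tweet_text):
    # Political relevance: match against the keyword list.
    return bool(set(tweet_text.lower().split()) & POLITICAL_KEYWORDS)

def is_of_general_interest(meme, timestamp):
    # General interest: enough tweets with this meme inside the sliding window.
    window = recent[meme]
    window.append(timestamp)
    while window and timestamp - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) >= THRESHOLD

def keep_tweet(tweet_text, meme, timestamp):
    return is_political(tweet_text) and is_of_general_interest(meme, timestamp)
```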
C. Network Analysis: To characterize the structure of each meme's diffusion network, they compute several statistics based on the topology of the largest connected component of the retweet/mention graph (Ratkiewicz et al. 2011).
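For instance, one could compute topology statistics over the largest (weakly) connected component along the following lines; the particular statistics shown are a representative guess, not necessarily the paper's exact feature set.

```python
# Representative topology statistics over the largest weakly connected component
# of the directed diffusion network; the exact features in the paper may differ.
import networkx as nx

def diffusion_features(G):
    # Restrict to the largest weakly connected component.
    nodes = max(nx.weakly_connected_components(G), key=len)
    C = G.subgraph(nodes)
    in_degrees = [d for _, d in C.in_degree()]
    out_degrees = [d for _, d in C.out_degree()]
    weights = [w for _, _, w in C.edges(data="weight", default=1)]
    return {
        "num_nodes": C.number_of_nodes(),
        "num_edges": C.number_of_edges(),
        "mean_in_degree": sum(in_degrees) / len(in_degrees),
        "max_out_degree": max(out_degrees),
        "mean_edge_weight": sum(weights) / max(len(weights), 1),
    }
```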
D. Sentiment Analysis: They used a modified version of the
Google-based Profile of Mood States (GPOMS) sentiment analysis method (Bollen,
Mao, and Pepe 2010) in the analysis of meme-specific sentiment on Twitter.
E. Automatic Classification: As an application of the analyses performed by the
Truthy system, they trained a binary classifier to automatically label
legitimate and truthy memes. They formed three classes — ‘truthy,’
‘legitimate,’ and ‘remove.’ A meme was to be classified as ‘truthy’ if a
significant portion of the users involved in that meme appeared to be spreading
it in misleading ways. ‘Legitimate’ memes were described as memes representing normal use of Twitter, such as several non-automated users conversing about a topic. The
final category, ‘remove,’ was used for memes in a non-English language or
otherwise unrelated to U.S. politics (#youth, for example). These memes were
not used in the training or evaluation of classifiers.
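Given per-meme feature vectors (network topology, sentiment, and so on) and the ‘truthy’/‘legitimate’ labels, training such a classifier might look roughly like the following; the learner and evaluation protocol here are illustrative choices of ours, not necessarily those used by the Truthy system.

```python
# Illustrative training sketch for the binary truthy/legitimate classifier.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def train_truthy_classifier(X, y):
    # X: one row of per-meme features (e.g. the diffusion_features above,
    #    plus sentiment scores); y: 1 for 'truthy', 0 for 'legitimate'.
    #    Memes labeled 'remove' are excluded before this point.
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    accuracy = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    clf.fit(X, y)
    return clf, accuracy
```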
This simple classification system was able to accurately detect ‘truthy’ memes based on features extracted from the topology of the diffusion networks. Using this system, a number of ‘truthy’ memes were identified. Some of these memes exhibited explosive growth and are clear examples of coordinated attempts to deceive Twitter users. The truthy memes discussed above are characterized by small diffusion networks, which is what makes the deception attempts detectable. Once such an attempt succeeds in gaining the attention of the community, the meme's spreading pattern becomes an organic one, so early detection and termination of the accounts associated with astroturf memes is critical.
Future work on this topic could further explore different annotations of truthy memes. Promising additional features for the classifier include the age of the accounts involved in spreading a meme and the reputation of those users, which could make the identification of such accounts easier.
References:
[1] Ratkiewicz, J.; Conover, M. D.; Meiss, M.; Gonçalves, B.; Flammini, A.; and Menczer, F. 2011. Detecting and Tracking Political Abuse in Social Media. In Proc. 5th Intl. AAAI Conf. on Weblogs and Social Media.
[2] Rasmussen, S., and Schoen, D. 2010. Mad as Hell: How the Tea Party Movement Is Fundamentally Remaking Our Two-Party System. HarperCollins.
[3] Grier, C.; Thomas, K.; Paxson, V.; and Zhang, M. 2010. @spam: the underground on 140 characters or less. In Proc. 17th ACM Conf. on Computer and Communications Security (CCS), 27–37.
[4] Wang, A. H. 2010. Don't follow me: Twitter spam detection. In Proc. 5th Intl. Conf. on Security and Cryptography (SECRYPT).
[5] Bollen, J.; Mao, H.; and Pepe, A. 2010. Determining the public mood state by analysis of microblogging posts. In Proc. of the Alife XII Conf. MIT Press.