Dr Jekyll and Mr Hyde : Detecting Sybils in Social Networks

Prologue

In December 2002, ZDNet, a famous technology website conducted a poll in the United Kingdom to find out what platform their readers preferred to build web services. By the end of December, a large majority of those (nearly half the total sample) had responded saying they were planning to use Java. Only 21.5 percent said they planned to use Microsoft .Net - less than the figure (23.5 percent) planning to use neither. But by the time the poll closed, on 5 January, the position had dramatically changed, with three quarters of voters claiming to be implementing .Net. ZDNet UK logs revealed rather obvious vote rigging, including a large number of people who made attempts at multiple voting using fake logins despite that being blocked by the poll scripts ( The one that takes the cake made a whopping 228 attempts to vote)^[1].Sounds like a scene straight from a conspiracy movie, doesn't it? Sadly, this is case of fact imitating fiction.

The problem of fake identities, or Sybils as they are known as*, is not a new one,neither in the real nor the virtual world, but with the rise of online social networks, the need to detect such Sybils and separate the fake from the real has only risen since fakes can introduce spam, manipulate online rating, or exploit knowledge that they have extracted from the network. There have been several approaches to detect Sybils and today I'll discuss one of these approaches, known as SybilRank^[2] that detects suspicious accounts in a social network based on a social-graph with bidirectional relationship.

The Concept

A visual overview of Sybil nodes and attack edges

SybilRank works on an undirected social graph with the assumption that non-Sybil nodes are well-connected. It also assumes that Sybils establish attack edges to non-Sybil nodes, which are limited in number, due to the difficulty of soliciting and maintaining reciprocal social relationships and Sybils randomly attach attack edges to non-Sybils.

SybilRank uses a power iteration method similar to the one used in PageRank and ranks users according to suspiciousness. This works as follows :

1) The iterations start with a set of trust-seeds (profiles/user which we know for sure are non-Sybils) with all of them assigned an initial and equal amount of “trust”, which is a numerical value whose sum across all nodes remains constant throughout the iterations.

2) In each iteration, every node performs a calculation similar to the one used in calculating PageRank. It considers the trust of all its neighbours and updates its trust by summing all these trust values. Thus if the trust value of a node v after (i-1) iterations is T^i-1_v,then

The above iterations are not allowed to go to completion. Instead they are terminated early (say after log(n) iterations), after which you get a ranking based on the trust values of the nodes. This ranking is prepared by degree-normalizing the trust value since it removes the bias towards degrees (which contributes to this algorithms better perfomance). Now why does this work? Re-read the assumption that states that attack-edges are limited in number. Hence, trust can flow to the Sybil nodes through a very limited number of edges, thus condemning them to the nadir of the rankings list.

The Case of Communities

We know that almost all social networks have communities embedded within them and these communities usually have a very limited number of edges connecting them. If the above algorithm, chooses the trust seeds from a selected number of communities, then there is a very high chance of marking nodes from other communities as Sybil-nodes incorrectly. To overcome this, the trust seeds are distributed across possible communities to ensure fairness. The method used by the creators of SybilRank is to start by discovering communities using the Louvain method^[3] and then seed nodes in each discovered community.

My $0.02

The SybilRank algorithm is an innovative and efficient algorithm, and has taken a lot of forward steps as compared to previous algorithms by its features such as its use of power-iteration-computed landing probability as opposed to random walk traces (SybilLimit^[4]). As compared to other power-iteration algorithms (EigenTrust^[5]), its better performance was put down to the two features, one the early-termination and the second was the degree-normalization. However, the last call after applying any such algortithms still cannot be automated since these algorithms only produce a list that has to be manually verified which is still a tedious process that consumes a lot of manpower. Also this algorithm cannot successfully detect Sybils which attack the non-Sybils in a planned manner, or by targeting a large number of non-Sybils. If possible, I'll discuss these and various other methods for Sybil detection or prevention of Sybil attacks in a future blog-post.

References:

[1] .Net vote rigging illustrates importance of Web services - http://www.zdnet.com/net-vote-rigging-illustrates-importance-of-web-services-3002102244/

[2] Aiding the Detection of Fake Accounts in Large Scale Social Online Services, Qiang Cao, Michael Sirivianos, Duke University, Xiaowei Yang, Tiago Pregueiro.

[3] Fast Un-folding of Communities in Large Networks,V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre

[4] SybilLimit : A Near Optimal Social Network Defense Structure Against Sybil Attacks, Haifeng Yu, Philip Gibbons, Michael Kaminsky, Feng Xiao

[5] The EigenTrust Algorithm for Reputation Management in P2P Networks : Sepandar D. Kamvar, Mario T. Schlosser, Hector Garcia-Molina

Notes:

* - Interestingly, the name Sybil comes from a book of the same name, wherein the titular character suffered from dissociative identity disorder, and manifested no less than 16 different personalities.

Six Degrees - CNT 2013 at CSE, IITKgp

Monday, 18 March 2013