Study of Spreading Phenomena in Blog Network Communities

Introduction

Spreading occurs in a network when a an action, information or idea becomes adopted due to the influence of neighbors in the network. The results of studies on diffusion are used in many applications. For example, viral marketing exploits existing social networks and encourages customers to share product information with their friends. The “cascade” diffusion model allows to investigate which individual dynamics lead to global spreading phenomena. A diffusion cascade occurs when information spreads from one node to the rest of the

network through a succession of diffusion events. Cascades have been theoretically analyzed in random graphs using a threshold model.This model uses the community structure of the network.

It has been observed that nodes with common features tend to interact preferentially with each other. These groups of nodes are called communities. Among the various definitions of “community” which exist in the literature, we use the following one: “A community is a set of nodes with common features or interests”. The community structure enables an analysis at different scales: local (individual nodes), global (whole network) and intermediate levels (groups of nodes).

Cascades Computation

1. Data corpus

A blog is composed of a set of posts written at a given time with a body text and references to other pages on the web (for example pictures, videos and websites). The text may contain references to previous posts, from the same blog (auto-citation) or from another blog, by quoting the corresponding URLs, which are called citation links. Citation links are very important as they represent a diffusion of information in the network. Consider a post Pa from blog A and a post Pb from blog B. If Pa contains a reference to Pb, then there is a citation link from Pa to Pb, i.e. Pa cites Pb. In terms of information spreading, we can say that Pa has ’adopted’ Pb’s content or that Pb’s content has been spread towards Pa.

Fig. Blog network community structure

2. Hierarchical Community Structure

The structure may be obtained in two different ways. First, by executing an automatic community detection algorithm. Second, by classifying manually each blog into hierarchical classes. Such a classification is generally hard to obtain due to the large size of datasets, but is very interesting as it is validated manually, unlike automatic classification.The hierarchical community structure considered for this dataset comprises 5 levels: level 0 corresponding to a single community (with all blogs), level 1 with 3 communities called continents (Leisure, Individuality, Society), level 2 with 16 regions, level 3 with 96 territories and finally level 4 with 10,309 individual blogs.In the following, we explain the formalism we will use in the paper. Let a graph G = (V,E), with V a set of nodes and E a set of edges. Our methodology requires a community structure such that each node of V belongs to exactly one community at each level of the tree (more general hierarchical community structures will be considered in the future to allow overlapping communities).

3. Community distance

Given a couple of communities u ∈ Pi and v ∈ Pj , there exists a minimal integer t such that there is a community C in Pt with u ⊂ C and v ⊂ C. We define the community distance of the spreading link (u, v) as:

d(u, v) =[(i − t) + (j − t)]/2

Fig. Community distance example

Table: Cascade Shapes ordered by Frequency.

Fig. Largest cascades. All cascades have a number of node >= 40.

References

[1] Abdelhamid Salah Brahim, Bénédicte Le Grand and Matthieu Latapy. Diffusion Cascades: Spreading Phenomena in Blog Network Communities, Parallel Processing Letters 22(1): (2012)

[2] A. S. Brahim, B. L. Grand, L. Tabourier, and M. Latapy. Citations among blogs in a hierarchy of communities: Method and case study. Journal of Computational Science, 2(3):247 – 252, 2011.

[3] M. Cha, A. Mislove, B. Adams, and K. P. Gummadi. Characterizing social cascades in flickr. In Proceedings of the first workshop on Online social networks, WOSN ’08, pages 13–18, New York, NY, USA, 2008. ACM.

[4] L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. An improved algorithm for matching large graphs. In In: 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition, Cuen, pages 149–159, 2001.

[5] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. PNAS, 99(12):7821–7826, 2002.

[6] M. Granovetter. Threshold Models of Collective Behavior. The American Journal of Sociology, 83(6):1420–1443, 1978.

Six Degrees - CNT 2013 at CSE, IITKgp

Sunday, 31 March 2013