Sunday, 14 April 2013

Brain, Universe, Internet governed by same fundamental laws, suggests supercomputer simulation



By performing supercomputer simulations of the Universe, researchers have shown that the causal network representing the large-scale structure of space and time is a graph remarkably similar to other complex networks such as the Internet, as well as social and biological networks. A paper describing the simulations[1] in Nature's Scientific Reports speculates that some as-yet unknown fundamental laws might be at work.




"By no means do we claim that the Universe is a global brain or a computer," said paper co-author Dmitri Krioukov, at the University of California, San Diego. "But the discovered equivalence between the growth of the Universe and complex networks strongly suggests that unexpectedly similar laws govern the dynamics of these very different complex systems."
For the simulations, the researchers found a way to downscale the space-time network while preserving its vital properties, by proving mathematically that these properties do not depend on the network size in a certain range of parameters, such as the curvature and age of our Universe.
After the downscaling, the research team performed simulations of the Universe's growing causal network. By parallelizing and optimizing the application, the researchers were able to complete in just over one day a computation that was originally projected to require three to four years.
"We discovered that the large-scale growth dynamics of complex networks and causal networks are asymptotically [at large times] the same, explaining the structural similarity between these networks," said Krioukov, who believes the findings have key implications for both science and cosmology.
"The most frequent question that people may ask is whether the discovered asymptotic equivalence between complex networks and the Universe could be a coincidence," he explained. "Of course it could be, but the probability of such a coincidence is extremely low. Coincidences in physics are extremely rare, and almost never happen. There is always an explanation, which may be not immediately obvious."
Such an explanation could one day lead to a discovery of common fundamental laws whose two different consequences - or limiting regimes - are the laws of gravity (Einstein's equations in general relativity) describing the dynamics of the Universe, and some yet-unknown equations describing the dynamics of complex networks.

SKELETON KEY: Researchers develop method that shows diverse complex networks have similar skeletons





The worldwide air transportation network. Grey links represent passenger traffic between more than 1,000 airports worldwide; the entire network has more than 35,000 links. The red lines are the network's skeleton, a tree-like structure of only 1,300 links that represents the core structure of the network. Links in the skeleton are the most important connections in the network.


 Northwestern University researchers are the first to discover that very different complex networks -- ranging from global air traffic to neural networks -- share very similar backbones. By stripping each network down to its essential nodes and links, they found each network possesses a skeleton and these skeletons share common features, much like vertebrates do.
Mammals have evolved to look very different despite a common underlying structure (think of a human being and a bat), and now it appears real-world complex networks evolve in a similar way.
The researchers studied a variety of biological, technological and social networks and found that all these networks have evolved according to basic growth mechanisms. The findings could be particularly useful in understanding how something -- a disease, a rumor or information -- spreads across a network.
This surprising discovery -- that networks all have skeletons and that they are similar -- was published this week by the journal Nature Communications.
“Infectious diseases such as H1N1 and SARS spread in a similar way, and it turns out the network’s skeleton played an important role in shaping the global spread,” said Dirk Brockmann, senior author of the paper. “Now, with this new understanding and by looking at the skeleton, we should be able to use this knowledge in the future to predict how a new outbreak might spread.”
Brockmann is associate professor of engineering sciences and applied mathematics at the McCormick School of Engineering and Applied Science and a member of the Northwestern Institute on Complex Systems (NICO).
Complex systems -- such as the Internet, Facebook, the power grid, human consciousness, even a termite colony -- generate complex behavior. A system’s structure emerges locally; it is not designed or planned. Components of a network work together, interacting and influencing each other, driving the network’s evolution.
For years, researchers have been trying to determine if different networks from different disciplines have hidden core structures -- backbones -- and, if so, what they look like. Extracting meaningful structural features from data is one of the most challenging tasks in network theory.
Brockmann and two of his graduate students, Christian Thiemann and first author Daniel Grady, developed a method to identify a network’s hidden core structure and showed that the skeletons possess some underlying and universal features.
The networks they studied differed in size (from hundreds of nodes to thousands) and in connectivity (some were sparsely connected, others dense) but a simple and similar core skeleton was found in each one.
“The key to our approach was asking what network elements are important from each node’s perspective,” Brockmann said. “What links are most important to each node, and what is the consensus among nodes? Interestingly, we found that an unexpected degree of consensus exists among all nodes in a network. Nodes either agree that a link is important or they agree that it isn’t. There is nearly no disagreement.”
By computing this consensus -- the overall strength, or importance, of each link in the network -- the researchers were able to produce a skeleton for each network consisting of all those links that every node considers important. And these skeletons are similar across networks.
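In the paper this consensus is formalized as "link salience": the fraction of nodes whose shortest-path trees contain a given link. Here is a rough sketch of that computation in Python, assuming networkx and using its bundled Les Misérables co-occurrence network as a stand-in for real data; the 0.9 threshold is an arbitrary illustration, not the paper's reference implementation.

    import networkx as nx
    from collections import Counter

    G = nx.les_miserables_graph()            # toy weighted network
    # Use reciprocal weight as distance: strong ties become short links.
    for u, v, d in G.edges(data=True):
        d["dist"] = 1.0 / d["weight"]

    counts = Counter()
    N = G.number_of_nodes()
    for root in G:                           # shortest-path tree of each node
        paths = nx.single_source_dijkstra_path(G, root, weight="dist")
        tree = {frozenset(e) for path in paths.values()
                for e in zip(path, path[1:])}
        counts.update(tree)                  # each tree votes for its links

    # Salience = fraction of nodes whose tree contains the link; the skeleton
    # keeps the links that (almost) every node considers important.
    skeleton = [e for e in G.edges() if counts[frozenset(e)] / N > 0.9]
    print(len(skeleton), "of", G.number_of_edges(), "links in the skeleton")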
Because of this “consensus” property, the researchers’ method does not have the drawbacks of other methods, which have degrees of arbitrariness in them and depend on parameters. The Northwestern approach is very robust and identifies essential hubs and links in a non-arbitrary universal way.
The Volkswagen Foundation supported the research.



Journal Reference:
  1. Daniel Grady, Christian Thiemann, Dirk Brockmann. Robust classification of salient links in complex networks. Nature Communications, 2012; 3: 864. DOI: 10.1038/ncomms1847

Tuesday, 9 April 2013



Idea networking


Idea networking is a qualitative means of undertaking a cluster analysis or concept mapping of any collection of statements. Networking lists of statements reduces them to a handful of clusters or categories. The statements might be sourced from interviews, text, web sites, focus groups, SWOT analysis or community consultation. Idea networking is inductive, as it does not assume any prior classification system by which to cluster the statements. Rather, keywords or issues in the statements are individually linked (paired). These links can then be entered into network software to be displayed as a network with clusters. When named, these clusters provide emergent categories, meta-themes, frames or concepts which represent, structure or make sense of the collection of statements.
An idea network can be constructed in the following way:
  • 60 to 200 statements are listed and assigned reference numbers.
  • A table is constructed showing which statements (by reference number) are linked (paired) and why. For example, statement 1 may be linked to statements 4, 23, 45, 67, 89 and 107 because they are all about the weather (see table).



Statement   Is Linked To              Because They Are About
1           4, 23, 45, 67, 89, 107    Weather
2           16, 29, 46, 81            Light
3           23, 45, 67, 89, 107       Interaction
4           13, 16, 34, 78, 81        Weather
Etc.        Etc.                      Etc.



The number of links per statement should be between 1 and 7; many more will result in a congested network diagram. This means the reasons statements are linked may need grading as strong or weak, or dividing into sub-sets. For example, statements linked as being about weather may be further subdivided into those about good weather, wet weather or bad weather. This linking is sometimes called 'coding' in thematic analysis, which highlights that statements can be linked for several different reasons (source, context, time, etc.). There may be many tens of reasons why statements are linked, and the same statements may be linked for different reasons. The number of reasons should not be restricted to a low number, as doing so pre-empts the resultant clustering.
  • The reference numbers are input to network diagramming software, usually in the form of a matrix with the reference numbers along the top and side. Each cell then contains a 1 or a 0 to indicate whether its row and column reference numbers are linked.
  • The software is instructed to draw the network diagram using maximum node repulsion, which encourages cluster formation. Around five clusters are identified in the network diagram, both visually and using the cluster-identification algorithms supplied with the software (e.g. Newman-Girvan sub-groups); a minimal sketch of these steps follows this list.

  • A descriptive collective name (a meta-narrative, classification name or label) is determined for each cluster of statements.
  • The list of statements is then reported as being clustered into these five or so cluster names (themes, frames, concepts). For example, one might report that analysis of the statements shows that those at a community meeting were using the concepts of exposure, interaction, safety, light and inspiration in their responses.
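As a rough illustration of the matrix, layout and clustering steps, here is a minimal sketch in Python using the networkx library. The link table is the hypothetical one from the table above; spring layout stands in for "maximum node repulsion", and the Newman-Girvan routine supplies the sub-groups.

    import networkx as nx
    from networkx.algorithms.community import girvan_newman

    # Hypothetical link table from step 2: statement -> linked statements.
    links = {1: [4, 23, 45, 67, 89, 107],   # weather
             2: [16, 29, 46, 81],           # light
             3: [23, 45, 67, 89, 107]}      # interaction

    G = nx.Graph((a, b) for a, nbrs in links.items() for b in nbrs)

    # Spring layout approximates maximum node repulsion for display.
    pos = nx.spring_layout(G, seed=42)

    # Newman-Girvan sub-groups: take the first split into communities.
    communities = next(girvan_newman(G))
    for i, c in enumerate(sorted(communities, key=len, reverse=True), 1):
        print(f"cluster {i}: {sorted(c)}")   # name each cluster by inspection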


Underlying philosophy

  • In his book Notes on the Synthesis of Form, the pragmatist Christopher Alexander suggested networking the ideas of clients as a means of identifying the major facets of an architectural design. This is still used in modern design work, usually via cluster analysis. Modern social network analysis software provides a useful tool for networking these ideas; it simply adds ideas to the list of computers, power stations, people and events that can be networked (see Network theory). The links between ideas can be represented in a matrix or network. Modern network diagramming software, with node-repulsion algorithms, allows useful visual representation of these networks, revealing clusters of nodes.
  • When networking people's statements or ideas, the statements become the nodes, and the links are provided by an analyst linking those statements thought to be similar. Keywords, synonyms, experience or context might be used to provide this linking. For example, the statement (1) 'war is economics progressed by other means' might be considered linked to the statement (2) 'progress unfortunately needs the innovation which is a consequence of human conflict'.
  • Linguistic pragmatism argues that we use our conceptions to interpret our perceptions (sensory inputs).[5] These conceptions might be represented by words as conceptual ideas or concepts. For example, if we use the concept of justice to interpret the actions of people, we get a different interpretation (or meaning) than if we use the concept of personal power. Using the concept of justice makes certain action ideas seem reasonable: due process, legal representation, hearing both sides, having norms or regulations for comparison. Therefore there is a relationship between conceptual ideas and the apparently rational action ideas they make sensible.
  • If the statements gathered at a consultative meeting are considered action ideas, then clusters of similar action ideas might be considered examples of a meta-idea or conceptual idea. These are also called themes or frames. Modern research extending Miller's magical number seven, plus or minus two,[6] to idea handling suggests a five-part classification is appropriate for humans.


Notable applications and uses

Using networking to cluster statements is considered useful because:
  • It provides a multi-dimensional alternative to post-it notes in clusters (YouTube example).
  • It offers a convenient graphic which can be presented in reports and analysed using network metrics (See Computer assisted qualitative data analysis software).
  • It is an auditable process where each step taken can be explained in supporting documentation.
  • It is a qualitative alternative to NVivo, thematic analysis, cluster analysis, factor analysis, multidimensional scaling and principal component analysis, and thus more subtle and transparent. This subtlety includes enabling the analyst to deal with metaphor, synonyms, pronouns and alternative terminology generally. No variables (variation in numerical data) are necessary.

References

  1. Alexander, C. (1964). Notes on the Synthesis of Form. Harvard University Press, Cambridge, MA.
  2. Metcalfe, M. (2007). Problem Conceptualisation Using Idea Networks. Systemic Practice and Action Research, pp. 141–150.
  3. Alexander, C. (1964). Notes on the Synthesis of Form. Harvard University Press, Cambridge, MA.
  4. Inkpen, A.C., Tsang, E.W.K. (2005). Social Capital, Networks, and Knowledge Transfer. Academy of Management Review, pp. 146–165.
  5. Rorty, R. (1982). Consequences of Pragmatism. University of Minnesota Press, Minneapolis.
  6. Miller, G.A. (1956). The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. The Psychological Review, pp. 81–97.

Monday, 8 April 2013

Analyzing financial interactions/markets as a complex network

Since most of the posts till now have carried the characteristic trademark of the so-called 'techies', I thought I would combine a new field with the analysis of complex networks. Financial markets certainly arouse the curiosity of most tech-school students, though ephemerally. So here it goes...
Economic research on network formation usually starts from the observation that social structure is important in a variety of interactions, including the transmission of information and the buying and selling of goods and services. Over the past decade, economists have started to investigate the interactions between network topologies and game-theoretic notions of stability and efficiency using the tools and jargon of graph theory. 
It is noteworthy that in a recent survey of economic models of network formation the term 'degree distribution' does not appear once; the economics profession seems to be only very recently becoming aware of the research by statistical physicists and computer scientists. As a consequence, it becomes very hard to evaluate whether the network structures emerging from various processes are efficient with respect to the involved microscopic motives, to the resulting and possibly unintended macroscopic regularities, or both.

Interestingly, the particular economic (game-theoretic) setup does not seem to be crucial for such a trade-off between efficiency and stability. Chemical engineers have argued that generic network topologies, including polar cases like 'stars', 'circles' and 'hubs', can be understood as the outcome of adaptive evolutionary processes that optimize the trade-off between efficiency and robustness in uncertain environments. The efficiency and robustness results were extended to show how power-law, exponential and Poisson degree distributions can be understood as manifestations of the trade-off between efficiency and robustness within a unified theoretical framework based on the maximum entropy principle. Although degree distributions, small-world effects, clustering measures and other concepts from the statistical physics community have been largely ignored by economists, I take the position that economically or behaviorally plausible mechanisms should play an important role in the generation and evolution of empirically relevant network structures.

Financial time series are characterized by a number of statistical regularities in both their unconditional and conditional properties that are often termed 'stylized facts'. One such property is the scaling law observed for large returns r:

    P(|r| > x) ~ x^(-α),

where α denotes the tail index. The ubiquitous nature of this power-law decay has been documented in numerous financial data, covering different markets (stock markets, foreign exchange markets, commodity markets), different frequencies (from minute-to-minute to monthly data), and different countries. Moreover, empirical estimates lead to a universal value of α ≈ 3, with a small interval of statistical variability between 2.5 and 4.
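One standard check of this scaling law is the Hill estimator of the tail index. Below is a minimal sketch in Python, assuming numpy; the Student-t sample is a synthetic stand-in for real returns, chosen because its tail index equals its degrees of freedom, so the estimate should come out near 3.

    import numpy as np

    def hill_tail_index(returns, k):
        """Hill estimator of alpha from the k largest absolute returns."""
        x = np.sort(np.abs(np.asarray(returns)))[::-1]   # descending |r|
        logs = np.log(x[:k] / x[k])                      # top k relative to (k+1)-th
        return k / logs.sum()

    r = np.random.standard_t(df=3, size=100_000)         # synthetic heavy-tailed data
    print(hill_tail_index(r, k=500))                     # expect a value near 3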
 
Taking into account the long tradition of complex-systems analysis in statistical physics, scaling laws suggest looking at financial data as the result of social processes of a large ensemble of interacting sub-units.

I start from the observation that multi-fractal models (i.e. models with a hierarchy of volatility components with different stochastic lifetimes) describe the stylized facts of financial returns very well, and combine this with the aforementioned insight that the very large number of traders (called clients subsequently) in financial markets should be grouped into K structures, where K is moderate relative to the total number of clients. The multi-fractal model corresponds to a tree graph with a moderate K compared to the total number of agents. In social networks the average geodesic path length seems to be universal and relatively small (the so-called "small world effect"). In our networks this distance is related to K, which determines the tail index; thus I expect to explain the universality of the latter by that of K. The prediction of a hyperbolic decay of correlations in the multi-fractal model is only valid within a time window. Since this window changes with the time resolution, I hope to explain the transition from a power law for high-frequency data to the more complicated behavior of daily returns. Improving the time behavior by including features of the multi-fractal model should lead to substantially improved predictive power.

It might help to understand whether the intermittent character of financial returns, as a signature of the 'critical' nature of market fluctuations, serves to balance a lack of efficiency with the robustness to absorb large exogenous shocks (like 9/11), along the lines of the above literature. From an interdisciplinary point of view, such insights should be applicable to other phenomena characterized by qualitative differences among nodes in a network, like supply chains in industrial organization or food webs in biological systems.


References:
(2) V. Bala and S. Goyal. A non-cooperative model of network formation. Econometrica, 68:1181–1230, 2000.
(3) E. Egenter, T. Lux, and D. Stauffer. Finite-size effects in Monte Carlo simulations of two stock market models. Physica A, 268:250–256, 1999.
(4) M.I. Jordan. Learning in Graphical Models. MIT Press, Cambridge, 1998.
(5) T. Lux. Turbulence in financial markets: The surprising explanatory power of simple cascade models. Quantitative Finance, 1:632–640, 2001.

The application of graph theoretical analysis to complex networks in the brain

Introduction

Traditionally, neuroscientists correlate ‘focal’ brain lesions, for instance brain tumors, with ‘focal’ clinical deficits. This approach gave important insights into the localization of brain functions; a classical example is the identification of the motor speech center in the lower left frontal cortex by the French neurologist Paul Broca at the end of the 19th century. Particularly during the last decades of the 20th century, this essentially reductionistic program led to significant progress in neuroscience in terms of molecular and genetic mechanisms . Despite the impressive increase of knowledge in neuroscience, however, progress in true understanding of higher level brain processes has been disappointing. Evidence has accumulated that functional networks throughout the brain are necessary, particularly for higher cognitive functions such as memory, planning, and abstract reasoning. It is more and more acknowledged that the brain should be conceived as a complex network of dynamical systems, consisting of numerous functional interactions between closely related as well as more remote brain areas (Varela et al., 2001).
Evaluation of the strength and temporal and spatial patterns of interactions in the brain and the characteristics of the underlying functional and anatomical networks may contribute substantially to the understanding of brain function and dysfunction. A major advantage of this approach is that a lot can be learned from other fields of science, particularly the social sciences, that are also devoted to the study of complex systems. In the last decade of the 20th century, considerable progress has been made in the study of complex systems consisting of large numbers of weakly interacting elements. The modern theory of networks, which is derived from graph theory, has proven to be particularly valuable for this purpose (Amaral and Ottino, 2004; Boccaletti et al., 2006).

Historical background

The modern theory of networks has its roots both in mathematics and sociology. In 1736 the mathematician Leonhard Euler solved the problem of 'the bridges of Königsberg'. This problem involved the question whether it was possible to take a walk crossing each of the seven bridges connecting the two islands in the river Pregel and its shores exactly once. Euler proved that this is not possible by representing the problem as an abstract network: a 'graph'. This is often considered the first proof in graph theory. Since then, graph theory has become an important field within mathematics, and the only available tool for handling network properties theoretically.
An important step forward occurred with the discovery of 'random' graphs (Solomonoff and Rapoport, 1951; Erdos and Renyi, 1960). In random graphs, connections between the network nodes are present with a fixed and equal likelihood. Many important theorems have been proven for random graphs. In particular, it has been shown that properties of the graphs often undergo a sudden transition ('phase transition') as a function of increasing likelihood that edges are present. However, despite the success of random graph theory, most real-world networks, particularly those found in social or biological systems, have properties that cannot be explained well by classical random graphs. These properties include high clustering and power-law degree distributions. One empirically observed phenomenon in many real networks is that the 'distances' in sparsely and mainly locally connected networks are often much smaller than theoretically expected. This phenomenon was probably first described by the Hungarian writer Frigyes Karinthy in a short story called 'Chains', in which he speculates that in the modern world the distance between any two persons is unlikely to be more than five persons, a phenomenon later studied and described in more detail by Stanley Milgram and referred to as the 'small world phenomenon', or 'six degrees of separation' (Milgram, 1967).
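The 'phase transition' mentioned above is easy to reproduce numerically: in an Erdős–Rényi random graph on n nodes, a giant connected component appears once the average degree c = p(n−1) exceeds 1. A minimal sketch, assuming Python with networkx:

    import networkx as nx

    n = 10_000
    for c in [0.5, 0.9, 1.1, 2.0, 4.0]:       # average degree c = p*(n-1)
        G = nx.gnp_random_graph(n, c / (n - 1), seed=1)
        giant = max(nx.connected_components(G), key=len)
        print(f"c = {c}: largest component spans {len(giant) / n:.1%} of nodes")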

Recent advances

The publication of a landmark paper in 1998 by Watts and Strogatz (Watts and Strogatz, 1998) provided a simple and elegant way of modeling small world networks. These authors proposed a very simple model of a one-dimensional network. Initially each node (‘vertex’) in the network is only connected to its ‘k’ nearest neighbors (k is called the degree of the network), representing a so-called ‘regular’ network. Next, with likelihood ‘p’, connections (‘edges’) are chosen at random and connected to another vertex, also chosen randomly. With increasing p, more and more edges become randomly re-connected and finally, for p = 1, the network is completely random (Fig. 1). Thus, this simple model allows the investigation of the whole range from regular to random networks, including an intermediate range.
The intermediate range proved to be crucial to the solution of the small world phenomenon. In order to show this, the authors introduced two measures: the clustering coefficient 'C', which is the likelihood that neighbors of a vertex will also be connected, and the path length 'L', which is the average of the shortest distance between pairs of vertices counted in number of edges. Watts and Strogatz showed that regular networks have a high C but also a very high L. In contrast, random networks have a low C and a low L. So, neither regular nor random networks explain the small world phenomenon. However, when p is only slightly higher than 0 (with very few edges randomly rewired) the path length L drops sharply, while C hardly changes (Fig. 2). Thus networks with a small fraction of randomly rewired connections combine both high clustering and a small path length, and this is exactly the small world phenomenon to be explained. The authors demonstrated the existence of such small world networks in the nervous system of Caenorhabditis elegans, a social network of actors, and the network of power plants in the United States. Furthermore, they showed that a small world architecture might facilitate the spread of infection or information in networks (Watts and Strogatz, 1998).
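This behaviour is simple to reproduce; in the sketch below (Python with networkx, parameters arbitrary) L collapses while C stays high for small rewiring probabilities p:

    import networkx as nx

    n, k = 1000, 10                       # 1000 vertices, each tied to 10 neighbours
    for p in [0.0, 0.01, 0.1, 1.0]:       # rewiring probability
        G = nx.connected_watts_strogatz_graph(n, k, p, seed=7)
        C = nx.average_clustering(G)
        L = nx.average_shortest_path_length(G)
        print(f"p = {p}: C = {C:.3f}, L = {L:.2f}")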
A second major discovery was presented in 1999 by Barabási and Albert (Barabasi and Albert, 1999). They proposed a model for the growth of a network where the likelihood that newly added edges connect to a vertex depends upon the degree of this vertex. Thus, vertices that have a high degree (large number of edges) are more likely to get even more edges. This is the network equivalent of 'the rich getting richer'. Networks generated in this way are characterized by a degree distribution which can be described by a power law:

    p(k) ~ k^(-γ)

Networks with a power law degree distribution are called 'scale free'. It has been shown that many social and technological networks, such as for instance collaborative networks of scientists, the World Wide Web, and networks of airports, are likely to be scale free (Newman, 2003).
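The preferential-attachment growth rule is equally easy to simulate. In the sketch below (Python with networkx and numpy, parameters arbitrary), the log-binned degree histogram of a Barabási-Albert graph decays roughly as k to the power −3:

    import networkx as nx
    import numpy as np

    G = nx.barabasi_albert_graph(n=50_000, m=3, seed=1)   # 'rich get richer' growth
    degrees = np.array([d for _, d in G.degree()])

    # Log-binned empirical degree distribution; the tail decays roughly as k^-3.
    bins = np.logspace(np.log10(degrees.min()), np.log10(degrees.max()), 20)
    hist, edges = np.histogram(degrees, bins=bins, density=True)
    for k, pk in zip(edges[:-1], hist):
        if pk > 0:
            print(f"k ≈ {k:8.1f}   p(k) ≈ {pk:.2e}")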

Basics of modern network theory

The discovery of small world networks and scale free networks set off a large body of theoretical and experimental research, which has led to increasing knowledge on various aspects of network properties in the last decade. Before we move on to the application of network theories to experimental neural networks, and healthy and diseased brain, we will provide some basic knowledge on several aspects of network properties. As mentioned before, more detailed mathematical descriptions can be found in Albert and Barabási (Albert and Barabási, 2002) and Stam and Reijneveld (Stam and Reijneveld, 2007).



Core measures

The degree distribution, clustering coefficient, and path length are the core measures of graphs. The degree distribution can be described as the likelihood p(k) that a randomly chosen vertex will have degree k. The clustering coefficient C is an index of local structure: the likelihood that neighbors of a vertex will also be connected to each other. It has been interpreted as a measure of resilience to random error (if a vertex is lost, its neighbours remain connected). The path length L is a global characteristic; it indicates how well integrated a graph is, and how easy it is to transport information or other entities through the network (Fig. 3). On the basis of these three measures, four different types of graphs can be distinguished: (i) regular or ordered; (ii) small world; (iii) random (see Fig. 1); (iv) scale free. We should stress that neither regular, small world, nor random networks are scale free. Scale-free networks can have very small path lengths of the order of ln ln(N), but their clustering coefficient may also be smaller than that of small-world networks (Cohen and Havlin, 2003).
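The three measures can be computed side by side for the four graph types; a minimal sketch, assuming Python with networkx and arbitrary parameters:

    import networkx as nx

    n = 1000
    graphs = {
        "regular":     nx.watts_strogatz_graph(n, 10, 0.0, seed=1),   # ring lattice
        "small world": nx.watts_strogatz_graph(n, 10, 0.05, seed=1),
        "random":      nx.gnp_random_graph(n, 10 / (n - 1), seed=1),
        "scale free":  nx.barabasi_albert_graph(n, 5, seed=1),
    }
    for name, G in graphs.items():
        if not nx.is_connected(G):   # L is only defined on a connected component
            G = G.subgraph(max(nx.connected_components(G), key=len))
        print(f"{name:>11}: C = {nx.average_clustering(G):.3f}, "
              f"L = {nx.average_shortest_path_length(G):.2f}")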



References

1. Achard S, Bullmore E. Efficiency and cost of economical brain functional networks. PLoS Comput Biol 2007;3:e17.
2. Achard S, Salvador R, Whitcher B, Suckling J, Bullmore E. A resilient, low-frequency, small-world human brain functional network with highly connected association cortical hubs. J Neurosci 2006;26:63–72.
3. Aertsen AM, Gerstein GL, Habib MK, Palm G. Dynamics of neuronal firing correlation: modulation of "effective connectivity". J Neurophysiol 1989;61:900–17.
4. Albert R, Barabási AL. Statistical mechanics of complex networks. Rev Mod Phys 2002;74:47–97.
5. Amaral LAN, Ottino JM. Complex networks: augmenting the framework for the study of complex systems. Eur Phys J B 2004;38:147–62.
6. Artzy-Randrup Y, Fleishman SJ, Ben Tal N, Stone L. Comment on "Network motifs: simple building blocks of complex networks" and "Superfamilies of evolved and designed networks". Science 2004;305:1107.
7. Astolfi L, De Vico Fallani F, Cincotti F, Mattia D, Marciani MG, Bufalari S, et al. Imaging functional brain connectivity patterns from high-resolution EEG and fMRI via graph theory. Psychophysiology 2007;44:10.1111.

Increasing the Complications


A lot has been studied over the past few months in complex networks: non-trivial topological features in real-world networks such as the citation network, the movie actor network, the web graph and many more. All these graphs had some common features: they were simple, undirected, un-weighted graphs. Network topological features have been studied on lots of other types of graphs too. (I am not saying that the course was lacking, so please don't increase the syllabus.) Some other interesting graphs which show unique behaviour or characteristic properties worth studying are weighted graphs, directed graphs, Eulerian graphs and Hamiltonian graphs.


Considering weighted graphs: they are not that complicated seen superficially, but they can usefully represent complicated real-world networks that simple graphs either do not cover or cover only partially in terms of information, making it impossible to study some vital results seen in the real systems.

Let's take an example. The movie actor graph can be made weighted, with the weights showing the number of times an actor was on screen or the time for which he/she was on screen. This graph gives a better representation of the actual weightage of an actor in the movie than giving equal weight to all the actors (lead/subordinate/extras).

Another example is the Airport Network (Fig 1). In its simple form this graph represents which airports are connected. However, it can be extended to a weighted graph in which edge weights represent the total number of seats available between two airports across all flights. The extra information can then be used to study traffic between two places, rather than just the connectivity of places as in the un-weighted graph.

 
Fig 1. The Airport Network (edge weights denote number of seats (million/year))


However, this new graph also needs new definitions for some old terms, and some new terms may even be coined.

To study this new attribute, edge weight, alongside the older studies on undirected graphs, the paper by A. Barrat et al. coins a new definition: vertex strength. Vertex strength (s_i) captures the degree information together with the edge weights; it is defined as the sum of the weights of all edges incident on a vertex:

    s_i = Σ_j a_ij w_ij

Here a_ij denotes the presence of an edge between vertices i and j (1 or 0) and w_ij denotes the weight on that edge. In the case of the Airport Network the vertex strength simply accounts for the total traffic handled by each airport, certainly a more useful measure than mere connectivity.

The strength also generalizes the degree centrality of the vertex: in a weighted graph, degree centrality is no longer just the degree; it also has to account for the weights on the edges.
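A minimal sketch of the strength computation, assuming Python with networkx; the airport codes and weights are invented stand-ins for the seats-per-year data of Fig 1:

    import networkx as nx

    # Toy weighted network: weights stand in for seats/year (millions).
    G = nx.Graph()
    G.add_weighted_edges_from([
        ("DEL", "BOM", 3.2), ("DEL", "MAA", 1.1),
        ("BOM", "MAA", 0.8), ("DEL", "CCU", 1.5),
    ])

    # Vertex strength s_i: sum of the weights of edges incident on i.
    strength = dict(G.degree(weight="weight"))
    print(strength)   # DEL handles the most total traffic here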

Another type of graph is the directed graph. Here one can think of unbalanced relations that have so far been studied as undirected graphs; in those graphs a directed edge provides a better and more accurate view.

One example that fits here is the citation graph (Fig 2): the graph of research papers, with each out-edge denoting a citation made by the source node. Here again it would be interesting to observe the clusters where a group of papers is cited by most other papers. A more important set of papers can easily be pointed out according to the number of papers that cite a given paper.



Fig 2. Citation graph (Research papers from the field of biology)


One simplification of a directed graph, useful for certain calculations, is breaking each vertex into a pair of vertices, one denoted the in-vertex and the other the out-vertex. The in-vertex keeps the edges that ended at the original vertex; similarly, the out-vertex keeps the edges that originated from (i.e. went out of) the original vertex. All the edges in this new graph are undirected. This has been quite a useful representation for graph attributes; more can be done to find out whether it is useful for complex-network attributes or not (a small sketch follows).
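A sketch of this splitting on a toy citation graph, assuming Python with networkx; each vertex v becomes v_in and v_out, and every directed edge u→v becomes the undirected edge (u_out, v_in):

    import networkx as nx

    D = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "C")])   # A cites B and C, etc.

    # Split each vertex into an in-vertex and an out-vertex.
    U = nx.Graph()
    for u, v in D.edges():
        U.add_edge(f"{u}_out", f"{v}_in")   # undirected replacement edge

    print(sorted(U.edges()))
    # e.g. [('A_out', 'B_in'), ('A_out', 'C_in'), ('B_out', 'C_in')]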

There is a lot more to explore in complex networks w.r.t. the types of graphs, properties, interpretations, etc. I have tried to present a couple of them here. The more we discover, the more interesting it is going to get. Though this post did not cover much, it was intended to give a brief overview (and some of my views) on these topics.

Thank you.

REFERENCES:
  • A. Barrat, M. Barthélemy, R. Pastor-Satorras & A. Vespignani. The architecture of complex weighted networks. PNAS 101, 3747-3752 (2004).
  • Albert, R. & Barabási, A.-L. (2002) Rev. Mod. Phys. 74, 47-97.
  • Pastor-Satorras, R. & Vespignani, A. (2001) Phys. Rev. Lett. 86, 3200-3203.
  • Callaway, D. S., Newman, M. E. J., Strogatz, S. H. & Watts, D. J. (2000) Phys. Rev. Lett. 85, 5468-5471.

Saturday, 6 April 2013

Connecting the Dots: Linked Data

Not so long ago, Tim Berners-Lee came up with the notion of a World Wide Web revolutionizing the way people exchange information and documents. We now live in an economy where data plays a central role in decision making. As the implications of utilizing large data sets in research and enterprises are being realized, there is also a certain degree of underlying frustration when it comes to acquiring quality data sets. The Internet has proved to be a phenomenal source of information. However, most of it is unstructured and scattered. Linked Data provides a publishing paradigm in which not only documents, but also data, can be a first class citizen of the Web, thereby enabling the extension of the Web with a global data space based on open standards - the Web of Data.

Partial snapshot of the Linked Data graph
The difference between the current web and a web of data is best explained with an example. Nowadays a typical online store would consist of pages describing different products. Such a page contains all the information a human reader requires. But this information is represented in a way that makes automatic processing of it hard. In a web of data, the online store would actually publish data resources to the web, which represent the products. These resources can be retrieved in different representations, such that a browser would allow you to view a product page while a web crawler would get a machine understandable representation of the product.

Now comes the part which makes it even more exciting for research. Consider the case of a researcher developing a drug to treat Alzheimer's. Suppose she wants to find the proteins which are involved in signal transduction AND are related to pyramidal neurons. (The question probably makes sense to her.)
Searching for the same on Google returns about 2,240,000 results, not one of which leads to an answer. Why? Because no one has ever had that idea before! There exists no single webpage on the web with the result. Querying the same on the Linked healthcare database pulls in data from two distinct data sets and produces 32 hits, EACH of which is a protein with those properties.

The vision of Linked Data is the liberation of raw knowledge and making it (literally) accessible to the world. Indeed, a Web of Ideas.

Let us now take a deeper look into the principles involved and how the same can be applied in a Complex Network domain. Broadly, it entails four basic principles:
  1. Use URIs as names for things. These may include tangible things such as people, places and cars, or those that are more abstract, such as the relationship type of knowing somebody, the set of all green cars in the world, or the color green itself.
  2. Use HTTP URIs, so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
  4. Include links to other URIs, so that they can discover more things. For example, a hyperlink of the type friend of may be set between two people, or a hyperlink of the type based near may be set between a person and a place.
The power here lies in the links. There are three important types of RDF links:
  • Relationship links: point at related things in other data sources.
  • Identity links: point at URI aliases used by other data sources to identify the same real-world object or abstract concept.
  • Vocabulary links: point from data to the definitions of the vocabulary terms that are used to represent the data.
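To make the four principles concrete, here is a minimal sketch using Python's rdflib library (network access required, and the exact triples returned will vary as the endpoint evolves). Dereferencing the URI covers principles 2 and 3; the printed predicate/object pairs are the RDF links of principle 4:

    from rdflib import Graph, URIRef

    tbl = URIRef("http://dbpedia.org/resource/Tim_Berners-Lee")

    g = Graph()
    g.parse(tbl)   # dereference the URI; DBpedia serves RDF via content negotiation

    # Each (predicate, object) pair is a typed link to another URI or a literal.
    for p, o in list(g.predicate_objects(tbl))[:10]:
        print(p, "->", o)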
Having a good idea of the basic concepts, we are now ready to see how we can harness Linked Data for research involving Complex Networks.

As I see it, there are two interesting areas here:

As described in this old article, Linked Data has been growing at an exponential rate since its modest beginnings back in 2007. Over the years, different data sets have started linking into the global database, and a clear emergence of genres can be seen. The Linked Data graph may offer a good opportunity to study the temporal behavior of the largest structured knowledge database the world has ever witnessed.

Then again, as mentioned before, the data offered itself has huge potential in terms of network research. A good question therefore is how to get our hands on this data.

One approach is to use existing systems to get data. Linked data is accessible through browsers like the Disco Hyperdata browser, the Tabulator browser or the more recent LinkSailor. We can also make use of search engines such as Sig.ma, Falcons and SWSE.

For example, research on Citation Analysis may be supplemented with data available on authors (example) or subject areas (example). Linguistic research stands to gain from the relationships relating words and objects with actions, events, facts, etc. (example, example).

As with all generic applications however, the problem is that deployed applications offer little control over how the data is accessed. For specific research, therefore, it is best to develop applications and crawlers from scratch. This website would be a good place to start. It explains in detail about RDF and architectures involving linked data applications.

To conclude, Linked Data provides a more generic, more flexible publishing paradigm which makes it easier for data consumers to discover and integrate data from a large number of data sources. Though still in its infancy, it has come a long way since its inception. Coupled with the onset of an Internet of Things, Linked Datasets will encompass the physical world, identifying social relationships, behavioral patterns and events. What we do with this information is entirely up to us.
