Six Degrees - CNT 2013 at CSE, IITKgp: February 2013

Thursday, 28 February 2013

Complex brain networks

Graph theoretical analysis of structural and functional systems

Recent developments in the quantitative analysis of complex networks, based largely on graph theory, have been rapidly translated to studies of brain network organization. The brain’s structural and functional systems have features of complex networks — such as small-world topology, highly connected hubs and modularity — both at the whole-brain scale of human neuroimaging and at a cellular scale in non-human animals. In this article, we review studies investigating complex brain networks in diverse experimental modalities. We also highlight some of the technical challenges and key questions to be addressed by future developments in this rapidly moving field.

Structural and functional brain networks can be explored using graph theory through the following four steps (see the figure):

• Define the network nodes. These could be defined as electroencephalography or multielectrode array electrodes, or as anatomically defined regions of histological, MRI or diffusion tensor imaging data.

• Estimate a continuous measure of association between nodes. This could be the spectral coherence or Granger causality measures between two magnetoencephalography sensors, or the connection probability between two regions of an individual diffusion tensor imaging data set, or the inter-regional correlations in cortical thickness or volume MRI measurements estimated in groups of subjects.

• Generate an association matrix by compiling all pairwise associations between nodes and (usually) apply a threshold to each element of this matrix to produce a binary adjacency matrix or undirected graph.

• Calculate the network parameters of interest in this graphical model of a brain network and compare them to the equivalent parameters of a population of random networks.

Figure 2 | cellular and whole-brain networks demonstrate consistent topological features. The top panel shows a cellular functional network constructed from multielectrode- array recordings made in the anaesthetized cat; each node (represented by a circle) corresponds approximately to one neuron and the connections represent high functional connectivity between neurons. The different coloured nodes constitute separate clusters or modules. The plots in each circle illustrate cellular responses to stimuli of different orientations, and the circle size corresponds to the degree (number of functional connections) of each node. The bottom panel shows a whole-brain structural network constructed from histological data on the macaque cortex; each node corresponds to a brain area and the connections represent axonal projections between areas. The network has two main modules, shown here with yellow and grey circles corresponding to mostly dorsal and ventral visual regions, respectively. Both networks exhibit the small-world attributes of high clustering and short path length; both have an exponentially truncated power law degree distribution, associated with the existence of high-degree ‘hubs’ (V4 in the anatomical network); and both have a community structure characterized by sparse connectivity between modules (each module is enclosed by stippled lines) and linked by hubs (nodes circled in red). AITv, anterior inferotemporal ventral area; CITd, central inferotemporal dorsal area; CITv, central inferotemporal ventral area; DP, dorsal preluneate area; FEF, frontal eye field; FST, floor of superior temporal area; LIP, lateral intraparietal area; MT, middle temporal area; PIP, posterior intraparietal area; PITd, posterior inferotemporal dorsal area; PITv, posterior inferotemporal ventral area; PO, parieto-occipital area; TF, area TF; TH, area TH ;V1–4, visual cortical areas 1–4; VIP, ventral intraparietal area; VOT, ventral

occipitotemporal area; VP, ventral posterior area.

Figure 3 | Disease-related disorganization of brain anatomical networks derived from structural Mri data. In both parts, the nodes (circles) represent cortical regions and the connections represent high correlation in grey matter density between nodes. The nodes are arranged vertically by degree and are separated horizontally for clarity of representation. The numbers indicate approximate Brodmann area, and the prime symbols (Œ) denote left-sided regions. The clustering coefficient of each node, a measure of its local connectivity, is indicated by its size: nodes with high clustering are larger.

a | The brain anatomical network of the healthy volunteers has a hierarchical organization characterized by low clustering of high-degree nodes.

b | The equivalent network constructed from MRI data on people with schizophrenia shows loss of this hierarchical organization . high-degree nodes are more often highly clustered.

Conclusions:

It is clear that certain aspects of the organization of complex brain networks are highly conserved over different scales and types of measurement, across different species and for functional and anatomical networks. The archetypal brain network has a short path length (associated with high global efficiency of information transfer), high clustering (associated with robustness to random error), a degree distribution compatible with the existence of hubs, and a modular community structure. Furthermore, anatomical networks are sparsely connected, especially between nodes in different modules, and the ‘wiring length’ (the physical distance that connections span) is close to minimal. This profile of topological and geometric properties is typical not just of brain networks but also of many other complex networks, including transport

systems and intracellular signalling pathways.

References

Hagmann, P. et al. Mapping the structural core of human cerebral cortex. PLoS Biol. 6, e159 (2008). This paper demonstrated the existence of modules, hubs and a structural core in the human anatomical network derived from DTI.
Achard, S., Salvador, R., Whitcher, B., Suckling, J. & Bullmore, E. T. A resilient, low-frequency, small-world human brain functional network with highly connected association cortical hubs.J. eurosci. 26, 63–72 (2006).
Yu, S., Huang, D., Singer, W. & Nikolic, D. A small world of neuronal synchrony. Cereb. Cortex 18, 2891–2901 (2008). This paper was one of the first to apply graph theoretical techniques to map the topology of functionally characterized cortical neuronal circuits.
Sporns, O. & Kötter, R. Motifs in brain networks. PLoS Biol. 2, 1910–1918 (2004).
Bassett, D. S. et al. Hierarchical organization ofhuman cortical networks in health and schizophrenia. J. Neurosci. 28, 9239–9248 (2008).

Wednesday, 27 February 2013

Complex Network Application in Music: Uncovering Universal Properties in Favourable Human Perception of Music

Introduction:
Across cultures and between individuals ,certain musical pieces are consistently rated more favorably than the others and a mathematical analysis of musical perception has a long history. But recently,a data-driven transformation to represent a musical score as a complex network has been tried by network scientist. What has been found is that those musical scores which are widely perceived to be 'good' generate complex network with certain invariant properties: scale-free networks with strong clustering of nodes within the network.They have also tried to generate random musical compositions from these networks and surprisingly found that scores generated in this manner are also perceived to be 'good' and are qualitatively similar to the specific score from which the generating network was produced.

Section of Mozart's Sonata

The network generated from entire Mozart's Sonata
Building the Network:
A musical score , or even the performance of a particular musician can be represented as a MIDI(musical instrument digital interface) file. The MIDI file encodes a musical composition as a series of events ,where each event includes information describing both the pitch and timing of a note.If a MIDI file is transcribed from a musical score, this information will be precise; if the information is derived from an actual performance there will be more variability. The network is then constructed the following way: Each MIDI event (a single musical note) corresponds to a node in the complex network. Two nodes in the network are linked by an edge if they succeed one another in the musical score. Weight is ascribed to a given edge based on the relative frequency with which the two nodes are adjacent. Direction is based on temporal order.

This transformation was applied to a wide variety of musical composition including the sonatas of Bach and Mozart, Bach's "Well-Tempered Clavier", Chopin's waltzes, Russian folk music and Cantonese pop. In all cases the networks generated by this algorithm exhibit a scale-free distribution (using the maximum likelihood procedure at the 95% confidence interval) on the node degree (that is, the probability of a node having degree k, p(k) = k-gamma ) with degree exponent gamma falling in a narrow range 1 < gamma < 1.7. As a natural consequence of variability in a human performance it was observed that for MIDI files derived from actual rendition, the clustering within the network was much lower (clustering coefficient 0.127 +/- 0.069) in comparison to MIDI derived from musical score (0.37 +/- 0.056). Nonetheless, in all cases the scale exponent falls within a fairly narrow range. Notably, the scale exponents for Russian folk music (1.18) and Cantonese pop (1.01) are significantly lower than those derived from classical music (1.29 ~ 1.67).

Figure on the right: Degree distribution for the same Mozart's sonata network (number of links k versus frequency p(k).

Discussions:
These complex networks therefore encapsulate some features of the underlying musical scores used to generate them. These networks were then used to generate new artificial scores. A random initial node on the network was chosen and randomly followed one of the links from that node to another based on the relative weight of the links. This procedure was repeated and the sequence of nodes that were traversed were recorded. The corresponding notes (both pitch and duration) gave a random score. If one was to then generate a new network from that score it would (asymptotically) be equivalent to the original. The surprising and intriguing consequence of this procedure was that the random scores were both pleasing and qualitatively similar to the original score. That is, random scores generated from the network derived from Chopin's sonatas also sound like Chopin, those derived from the Cantonese pop network also sound like Cantopop . Of course, these scores lacks the large scale structure of the original . But, this does demonstrate that the complex network encapsulates enough information to quantify basic features of a particular composition or style. As with actual composition, large scale structure can be imposed aposteriori and selection from between many candidate random scores can be used to produce "optimal" compositions.

References:

1. X. Liu, C.K. Tse and M. Small, "Composing music with complex networks," International Conference on Complex Sciences: Theory and Applications, (COMPLEX2009), Shanghai, pp. 2196-2205, February 2009.
2. X. Liu, C.K. Tse and M. Small, "Complex network structure of musical compositions: Algorithmic generation of appealing music," Physica A, vol. 389, no. 1, pp. 126-132, January 2010.

Urban Road Usage Patterns

Introduction

In an era of unprecedented global urbanization, society faces a rapidly accelerating demand for mobility, placing immense pressure on urban road networks. This demand manifests in the form of severe traffic congestion, which decreases the roads’ level of service, while at the same time increasing both fuel consumption and traffic-related air pollution.

In this blog post, we look at a methodology, which employs comprehensive mobile phone data to detect patterns of road usage and the origins of the drivers. Thus, providing a basis for better informed transportation planning, including targeted strategies to mitigate congestion. We formalize the problem by counting the observed number of individuals moving from one location to another, which we put forward as the transient origin destination (t-OD) matrix.

To study the distribution of travel demands over a day we divide it into four periods (Morning: 6 am–10 am, Noon & Afternoon: 10 am–4 pm, Evening: 4 pm–8 pm, Night: 8 pm–6 am) and cumulate trips over the total observational period. A trip is defined when the same mobile phone user is observed in two distinct zones within one hour (zones are defined by 892 towers’ service areas in the San Francisco Bay Area and by 750 census tracts in the Boston Area). In the mobile phone data, a user’s location information is lost when he/she does not use his/her phone, but by defining the transient origin and destination with movements within one hour, we can capture the distribution of travel demands. Specifically we calculate the t-OD as:

where A is the number of zones. W is the one-hour total trip production in the studied urban area, a number readily available for most cities. However this number gives no information about the trip distribution between zones, which we can enhance by the information gained via mobile phones. Directly from the mobile phone data we calculate T_ij(n), which is the total number of trips that user nmade between zone i and zone j during the three weeks of study. Via calibrating T_ij(n) for the total population we obtain:

, where N_k is the number of users in zone k. The ratio M scales the trips generated by mobile phone users in each zone to the trips generated by the total population living there: M(k) = N_pop(k)/N_user(k), where N_pop(k) and N_user(k) are the population and the number of mobile phone users in zone k. Furthermore to assign only the fraction of the trips attributed to vehicles, we correct F^all_ij by the vehicle usage rate, which is a given constant for each zone and therefore obtain F^vehicle_ij .

For each mobile phone user that generated the t-OD, we can additionally locate the zone where he or she lives, which we define as the driver source. Connecting t-ODs with driver sources allows us for the first time to take advantage of mobile phone data sets in order to understand urban road usage. In the following, we present the analysis of the road usage characterization in the morning period as a case study.

Results

A road network is defined by the links representing the road segments and the nodes representing the intersections. Using incremental traffic assignment, each trip in the t-OD matrix is assigned to the road network, providing us with estimated traffic flows (Fig.1(a)). The road network in the Bay Area serves a considerable larger number of vehicles per hour (0.73 million) than the one in the Boston (0.54 million). The traffic flow distribution P(V) in each area can be well approximated as the sum of two exponential functions corresponding to two different characteristic volumes of vehicles (Fig. 1a); one is the average traffic flow in their arterial roads (v_A) and the other is the average traffic flow in their highways (v_H). We measure

(R²>0.99) with v_A = 373 (236) vehicles/hour for arterials and v_H = 1,493 (689) vehicles/hour for highways in the Bay Area (Boston numbers within parenthesis, p_A and p_H are the fraction of arterial roads and the fraction of highways). Both road networks have similar number of arterials (~20,000), but the Bay Area with more than double the number of highways than Boston (3,141 highways vs 1,267 in Boston) still receives the double of the average flow in the highways (v_H) and a larger average flow in the arterial roads.

Figure 1: Distributions of traffic flow, betweenness centrality and VOC in the two urban areas.

(a) The one-hour traffic flow V follows a mixed exponential distribution

for both Bay Area and Boston Area, where constants p_A and p_Hare the fraction of arterial roads and the fraction of highways, v_A and v_H is the average traffic flow for arterial roads and highways respectively. (b) The distribution of road segment’s betweenness centrality b_c is well approximated by

, where the power-law distribution approximates arterial roads’ b_c distribution and the exponential distribution approximates highways’ b_c distribution. β_Hdenotes the average b_c of highways and α_A is the scaling exponent for the power-law. (c) The volume over capacity VOC follows an exponential distribution P(VOC) = γe^−VOC/γ with an average VOC = 0.28 for the two areas. Traffic flows in most road segments are well under their designed capacities, whereas a small number of congested segments are detected.

The volume of vehicles served by a road depends on two aspects: the first is the functionality of the road according to its ability to be a connector based on its location in the road network (i.e. betweenness centrality) and the second is the inherent travel demand of the travelers in the city. The betweenness centrality b_c of a road segment is proportional to the number of shortest paths between all pairs of nodes passing through it: we measured b_c by averaging over each pair of nodes, and following the shortest time to destination. The two road networks, analyzed here, have completely different shapes: the Bay Area is more elongated and connects two sides of a bay, while the Boston Area follows a circular shape (Fig. 2a). But both have a similar function in the distribution of b_c: with a broad term corresponding to the arterial roads and an exponential term to the highways, which is at the tail of larger b_c. As Fig. 1b shows, we measure: P(b_c) =p_AP_A(b_c) + p_HP_H(b_c) (R²>0.99), with

for arterial roads and

for highways. The highways in the Bay Area have an average b_c of β_H = 2.6 × 10⁻⁴, whereas a largerβ_H = 4.6 × 10⁻⁴ is found for the Boston Area highways, indicating their different topological structures. Interestingly, despite, the different topologies of the two road networks, the similar shapes of their distribution of traffic flows indicate an inherent mechanism in how people are selecting their routes.

Figure 2: Tracing driver sources via the road usage network.

(a) The colour of a road segment represents its degree K_road. Most residential roads are found to have small K_road, whereas the backbone highways and the downtown arterial roads are shown to have largeK_road. The light blue polygons and the light orange polygons pinpoint the MDS for Hickey Blvd and E Hamilton Ave respectively. The white lines show the links that connect the selected road segment and its MDS. The two road segments have a similar traffic flow V ~400 (vehicles/hour), however, Hickey Blvd only has 12 MDS located nearby, whereas E Hamilton Ave has 51 MDS, not only located in the vicinity ofCampbell City, but also located in a few distant regions pinpointed by our methodology. (b) The degree distribution of driver sources can be approximated by a normal distribution

with µ_source = 1,035.9 (1,017.7), σ_source = 792.2 (512.3), R² = 0.78 (0.91) for Bay Area (Boston Area). (c) The degree distribution of road segments is approximated by a log-normal distribution

with µ_road = 3.71 (3.36) , σ_road = 0.82 (0.72), R² = 0.98 (0.99) for Bay Area (Boston Area).

Notice that only when the traffic flow is greater than a road’s available capacity, the road is congested; the ratio of these two quantities is called Volume over Capacity (VOC) and defines the level of service of a road. Surprisingly, despite the different values in average flows v and average betweeness centrality β, we find the same distribution of VOC (Fig. 1c) in the two metropolitan areas, which follows an exponential distribution with an average VOC given by γ = 0.28 (R²>0.98):

The exponential decay of VOC indicates that for both road networks traffic flows on 98% of the road segments are well below their designed road capacities, whereas a few road segments suffer from congestion, having a VOC > 1. The similarity between the two VOC distributions shows that in both urbanities drivers experience the same level of service, due to utilizing the existing capacities in the similar way.

The traditional difficulty in gathering ODs at large scales has until now limited the comparison of roads in regard to their attractiveness for different driver sources. To capture the massive sources of daily road usage, for each road segment with V > 0, we calculate the fraction of traffic flow generated by each driver source, and rank these sources by their contribution to the traffic flow. Consequently, we define a road segment's major driver sources (MDS) as the top ranked sources that produce 80% of its traffic flow. We next define a bipartite network, which we call the network of road usage, formed by the edges connecting each road segment to their MDS. Hence, the degree of a driver source K_source is the number of road segments for which the driver source is a MDS, and the degree of a road segment K_road is the number of MDS that produce the vehicle flow in this road segment. As Fig. 2b shows, the driver source's degree K_source is normally distributed, centered in < K_source > ~1000 in both Bay Area and Boston Area, implying that drivers from each driver source use a similar number of road segments. In contrast, the road segment's degree K_roadfollows a log-normal distribution (Fig. 2c), where most of the road segments have a degree centered in < K_road > ~20. This indicates that the major usage of a road segment can be linked to surprisingly few driver sources. Indeed, only 6–7% of road segments are in the tail of the log-normal linked to a larger number of MDS, ranging from 100 to 300.

In Fig. 2a we show a road segment's degree K_road in the road network maps of the Bay Area and the Boston Area. Since census tracts and mobile phone towers are designed to serve similar number of population , a road segment's degree K_road quantifies the diversity of the drivers using it. We find that K_road is lowly correlated with traditional measures, such as traffic flow,VOC and betweenness centrality b_c . For example, in Fig. 2a, Hickey Blvd in Daly City and E Hamilton Ave in Campbell City have a similar traffic flow V~400 (vehicles/hour), however, their degrees in the network of road usage are rather different. Hickey Blvd, only has K_road = 12, with MDS distributed nearby, whereas E Hamilton Ave, has K_road = 51, with MDS distributed not only in its vicinity, but also in some distant areas as Palo Alto, Santa Cruz, Ben Lomond and Morgan Hill.

As Fig. 2a shows, the road segments in the tail of the log-normal (K_road > 100) highlight both the highways and the major business districts in both regions. This again implies that K_road can characterize a road segment's role in a transportation network associated with the usage diversity. To better characterize a road's functionality, we classify roads in four groups according to their b_cand K_road in the transportation network (Fig. 3). We define the connectors, as the road segments with the largest 25% of b_c and the attractors as the road segments with the largest 25% of K_road. The other two groups define the highways in the periphery, or peripheral connectors, and the majority of the roads are called local, which have both small b_c and K_road (Fig. 3). By combining b_c and K_road, a new quality in the understanding of urban road usage patterns can be achieved. Future models of distributed flows in urban road networks will benefit by incorporating those ubiquitous usage patterns.

Figure 3: Types of roads defined by b_c and K_road.

The road segments are grouped by their betweenness centrality b_c and degree K_road. The red lines (connectors) represent the road segments with the top 25% of b_c and K_road; they are topologically important and diversely used by drivers. The green lines (peripheral connectors) represent the road segments in the top 25% of b_c, but with low values of K_road; they are topologically important, but less diversely used. The road segments in yellow are those with low values of b_c, but within the top 25% K_road; they behave as attractors to drivers from many sources (attractors). The road segments in grey have the low values of b_c and K_road, they are not topologically important and locally used (locals).

Discussion

Today, as cities are growing at an unparalleled pace, particularly in Asia, South America and Africa, the power of this modeling framework is its ability to dynamically capture the massive sources of daily road usage based solely on mobile phone data and road network data, both of which are readily available in most cities. The values of K_road and b_c together determine a road's functionality. We find that the major traffic flows in congested roads are created by very few driver sources, which can be addressed by our finding that the major usage of most road segments can be linked to their own surprisingly few driver sources. This shows that the representation provided by the network of road usage is very powerful to create new applications, enabling cities to tailor targeted strategies to reduce the average daily travel time compared to a benchmark strategy.

References

1. Batty, M. The size, scale, and shape of cities. Science 319, 769–771 (2008).

2. Barthélemy, M. Spatial networks. Physics Reports 499, 1–101 (2011).

3. Schrank, D. & Lomax, T. Annual urban mobility report (Texas Transportation Institute, 2009).

4. Helbing, D. A section-based queueing-theoretical traffic model for congestion and travel time analysis in networks. J. Phys. A: Math. Gen. 36, 593–598 (2003).

5. Chin, A. T. H. Containing air pollution and traffic congestion: transport policy and the environment in Singapore. Atmospheric Environment 30(5), 787–801 (1996).

6. Shen, W. & Wynter, L. Real-time traffic prediction using GPS data with low sampling rates: a hybrid approach. IBM Research Report RC25230 (2011).

7. Barthélemy, M. & Flammini, A. Modeling urban streets patterns. Phys Rev Lett 100, 138702(2008).

8. González, M. C., Hidalgo, C. A. & Barabási, A.-L. Understanding individual human mobility patterns. Nature 435, 779–782 (2008).