Motifs : fingerprints of graph
What are motifs ?
All networks, including biological networks (e.g., metabolic networks, transcription regulatory networks, protein-protein interaction networks, protein structure networks, neural networks, ecological networks), social networks, technological networks (e.g., computer networks, electrical circuits), etc., can be represented as graphs, which include a wide variety of sub-graphs. One important local property of networks are so-called Network Motifs, which are defined as recurrent and statistically significant sub-graphs or patterns. Motifs, sub-graphs that repeat themselves in a specific network or even among various networks, would be consistent with the tenets of evolutionary theory.
- Subgraphs occurring significantly more often in real networks than in random networks.
- Used to uncover structural design principles of complex networks.
- Motif signatures from different domains may serve as the fingerprint of the respective domains.
- First investigated in computational biology.
- Now widely used metric in social network
Existing algorithms for finding network motifs are extremely costly in CPU time and memory consumption and have practically restrictions on the size of motifs.
Three Major step in Motif Detection : - Determining sub-graph count.
- Grouping topologicaly equivalent (Isomorphic) sub-graph into classes.
- Determining which sub-graph classes are displayed at much higher frequency than in random networks.
Motif Determination :
There are some statistical measures that lead us to the probable motifs in the input network.
Frequency : This is the simplest measurement for estimating the significance of a motif. For a given network, assume that Gp is a representative of an isomorphism class involved in that class. The frequency is defined as the number of occurrence Gp in the input network.
Zscore : This measure reflects how randomly the class occurred in the input network. For the assumed motif Gp, this measure is defined as :
where Np is the number which Gp occurred in the input network, is the mean number which Gp occurred in random networks and σ is the standard deviation. The larger Zscore, the more significant is the motif.
Pvalue : This measure indicates the number of random networks in which a motif Gp occurred more often than in the input network, divided by the total number of random networks. Therefore, Pvalue ranges from 0 to 1. The smaller the Pvalue, the more significant is the motif.
There are no exact thresholds for these measures to distinguish a motif, and the more restricted thresholds the more precise is the motif. But according to the experimental results by Milo (Milo et al., 2002), the following conditions may be used to describe a network motif:
Challenges in Motif detection :
Motifs Detection Algorithms :Various solutions have been proposed for the challenging problem of motif discovery. These algorithms can be classified under various paradigms such as exact counting methods,sampling methods, pattern growth methods and so on. Here, a review on computational aspects of major algorithms :
There are some statistical measures that lead us to the probable motifs in the input network.
Frequency : This is the simplest measurement for estimating the significance of a motif. For a given network, assume that Gp is a representative of an isomorphism class involved in that class. The frequency is defined as the number of occurrence Gp in the input network.
Zscore : This measure reflects how randomly the class occurred in the input network. For the assumed motif Gp, this measure is defined as :
where Np is the number which Gp occurred in the input network, is the mean number which Gp occurred in random networks and σ is the standard deviation. The larger Zscore, the more significant is the motif.
Pvalue : This measure indicates the number of random networks in which a motif Gp occurred more often than in the input network, divided by the total number of random networks. Therefore, Pvalue ranges from 0 to 1. The smaller the Pvalue, the more significant is the motif.
There are no exact thresholds for these measures to distinguish a motif, and the more restricted thresholds the more precise is the motif. But according to the experimental results by Milo (Milo et al., 2002), the following conditions may be used to describe a network motif:
- The frequency is larger than four.
- By using 1000 randomized network, the Zscore is larger than one.
- By using 1000 randomized network, the Pvalue is smaller than 0.01.
Challenges in Motif detection :
- Computational Complexity for enumerating all sub-graphs.
- Using suitable sampling method to sample sub-graphs (if sampling is used).
- Choosing appropriate random graph model for comparison.
Motifs Detection Algorithms :Various solutions have been proposed for the challenging problem of motif discovery. These algorithms can be classified under various paradigms such as exact counting methods,sampling methods, pattern growth methods and so on. Here, a review on computational aspects of major algorithms :
mfinder :
mfinder, the first motif-mining tool, implements two kinds of motif finding algorithms: a full enumeration and a sampling method. Until 2004, the only exact counting method for NM detection was the brute-force one proposed by Milo. This algorithm was successful for
discovering small motifs, but using this method for finding even size 5 or 6 motifs was not computationally feasible.
discovering small motifs, but using this method for finding even size 5 or 6 motifs was not computationally feasible.
FANMOD :
The sampling bias of Kashtan provided great impetus for designing better algorithms for the NM discovery problem. Although Kashtan tried to settle this drawback by means of a weighting scheme, this method imposed an undesired overhead on the running time as well a more complicated implementation. This tool is one of the most useful ones, as it supports visual
options and also is an efficient algorithm with respect to time. But, it has a limitation on motif size as it does not allow searching for motifs of size 9 or higher because of the way the tool is implemented.
The sampling bias of Kashtan provided great impetus for designing better algorithms for the NM discovery problem. Although Kashtan tried to settle this drawback by means of a weighting scheme, this method imposed an undesired overhead on the running time as well a more complicated implementation. This tool is one of the most useful ones, as it supports visual
options and also is an efficient algorithm with respect to time. But, it has a limitation on motif size as it does not allow searching for motifs of size 9 or higher because of the way the tool is implemented.
KAVOSH :
A recently introduced algorithm named Kavosh aims at improved main memory usage. Kavosh is usable to detect NM in both directed and undirected networks. The main idea of the enumeration is to first find all k-size sub-graphs that a particular node participated in, then remove the node, and subsequently repeat this process for the remaining nodes.
REFERENCES :
(1) wikipedia.
(2) Zahra RM Kashani,Hayedeh Ahrabian,Elahe Elahi,Abbas Nowzari-Dalini, Elnaz S Ansari,Sahar Asadi,Shahin Mohammadi,Falk Schreiber,Ali Masoudi-Nejad Kavosh: a new algorithm for finding network motifs.
No comments:
Post a Comment