A New Network Core Mining Method bad on Node Core Influence Degree
WAN Guofeng
School of Software DaLian JiaoTong University LiaoNing, DaLian, China
Tian Hong
School of Software DaLian JiaoTong University LiaoNing, DaLian, China
Abstract—A major problem in Social Network Analys is and Link Dis covery is the dis covery of core nodes. This paper provides a community core dis covery method bas ed on node core influence degree. This method defines a new standard for mea uring the centrality of the node. Through the node coverage, influence level and differences from the other nodes, we measure the core nodes in the three aspects.
We applied this method to a random generation s et of undirected graph with weight. Compared with the results from degree centrality, the res ults s how the higher accuracy and reasonable in this method.
Keywords-Social Network Analysis; Centrality analysis; Centre Coverage Factor; Difference Factor; Core Influence Degree
I.Introduction
Magnanimous data are being exchanged everyday in the current network (For example: emails, blogs, and forums etc.). We witnesd the Enron Corporation Scandal, the 9/11 Terriorist Attack, the London subway station explosion, and J uly 5th Violent Crimes in Urumqi, China. It is urgent to know how to analyze different internet organizations along with its internal relations from the data and information collected from Internet.
For Internet information, the mutual connection among people reflects what the status and the role someone is in an organization. It is especially critical to find the core members in the organization. Since the actions core members take will affect the trend of the entire organization. Take the Terrorist Attacks as an example, the core members were people who planned this attack and decided the time, the place, and the way. Therefore, eking for and analyzing the core members in an organizations is likely to stop and prevent crimes efficiently.
From the 9/11 Terriorist Attack, people began to arch for the core members from the social networ
k. Centrality Analysis [1] in the SNA (Social Network Analysis for short) is an important personal network structural position pointer. Evaluation of a person important or not, the measure of his/her duties or privileges of the position of superiority, as well as social prestige, etc., we could u it. J itesh Shetty and J afar Adibi ud the Graph Entropy of an important figure in Enron, they have got the good results [2]. Ryan Rowe and Shlomo Hershkop, who designed the Automated Social Hierarchy Detection of the E-mail Network analysis, the network-level visual arch in order to achieve the purpo of the core members [3]; Professor Tang Changjie from Sichuan University in China propod shortest path algorithm SPLINE [4] bad on Six Degrees of Separation [5,6], and on this basis, the u of KMM [7] algorithm tap the core of criminal networks.
We propod a new network core mining method on the basis of the central analytical method in this paper .It is bad on the core influence degree of the node discovery method,using the methods for analyzing and mining communities in the central figure in the network.
II.SNA Synopsis
SNA emphasizes interpersonal relationships, relation intension and using social network structure to explain social phenomena [8].We u “Net” metaphor to explain the relationship between people's lif
e and interaction.Social network is a group of people forming a unique relationship, and it’s the link to pass the material, information, ideas, feelings and other resources.
In SNA, actors as nodes, connections between actors expresd as lines, and it forms a network. There are Clique Analysis, Centrality Analysis and Role Analysis in SNA.
A.Clique Analysis
Subgroup (or cliques) [9] is a small group of people who relations are particularly clo, so that combined into a sub-group. Cliques can be likened to one by one faction, which is an overall index of a network structure, and it also is a particularly uful concept in the rearch organization.短发美女图片
The main difficulty to analyze the subgroup with node degree of organization lies in the choice of the number of nodes: The smaller the number is, the more relaxed conditions are, the more fuzzy groups we will get. The bigger the number is, the stricter the condition is. We will have more possibility to obtain the wrong association classification.
The other kind of method is taking the distance as the foundation to compute clique, such as n-clique, n-clan, n-club described in literature [1]. This kind of method takes two node's relation distances as the measurement standard, to determine which nodes should be classified as a clique.
There is the similar problem in dividing the association with the distance as the standard as well as with the node degree: that is the defined of distance threshold value. Therefore, whether a new standard in this method can be
added to divide the association will be an important job. B. Centrality Analysis
The centricity is the important individual structure position indicator in the network, which is ud to apprai a person to be important or not, judge the superiority or privilege of his/her post position and society reputation. The centricity has three forms: degree centrality, cloness centrality, betweenness centrality.
Degree centrality is to calculate a person’s most important individual pointer in a group network structure [1]. And it’s also the most common method we ud to measure who is the most important core member in this group. Such a person, in the sociological n, is the most social status; in the organizational behavior n, it is the most powerful people. The one who has a high centricity in this group also
has a main position. ()
1i D d n C g
reprents the degree centrality. ()i d n reprents the connection number of this node and the others.1g indicates the largest possible relationship number of the nodes in the network society.
C. Role Analysis
In the network, role is very important, becau the person with different roles will produce different positions in the network. And in great network there are hundreds of thousands nodes, if building a network diagram according to their relationship, we can not understand clearly, it is difficult to study. The role of analysis is to unite the people who play the same or clo roles in the network as a class, analyze the relationship between a group of people and another group of people.刘家拳
III. The Community Core Node Discovery Method Bad
on Node Core Influence Degree The most difficult of discovering "core node" in large-scale network data is that the "core nodes" are just a small part in all nodes. One approach is to filter out all of the normal nodes, remaining "core nodes". Another approach is the method propod in this paper. Obviously, the "core node" has the most influence in the whole network.
A. Related Definitions
We need to consider the number and relation of actors in community network, therefore this method in this paper
propos using undirected graph with weight {,,}G A E W in the network society. Where, A indicates the t of nodes in graph G ; E indicates
the t of edges between nodes in graph G ; W reprents the weight of each edge.
Definition 1 all the nodes and node i A compo i A ’s structure. It denotes ()i A O . ()i A O reprents the number of nodes in i A ’s structure.
Definition 2 The number of the contacts between i A
and all the nodes in i A structure is the degree of node
i A , recorded as ()i d A .
Definition 3 Centre Coverage of node:
()1()(()1)
()()(1)
(1)
(1)
A d A A d A i i i i C A i n n n O O
(1) n is the total number of nodes in network community;
formula
()(1)
d A i n reprents th
e degree centrality; formula
()1(1)
A i n O reprents the percentage of the contacts of i A
and the outside in the whole network. As in (1), ()i C A describes a node in the network community in the depth of coverage and influence.
It is noteworthy that the Centre coverage can describe a node in the network community through the depth of coverage and influence, but it’s hasn’t a range. For different communities with different size, its range varies considerably.
Definition 4 For the [()]i Max C A , when 1
10l d
[()]10l
i Max C A d , Centre Coverage Factor defined as
()
()10
C A i A i l V
(2) Obviously, ()[0,1]i A V . It is one of the indicators to measures the core nodes.
We u cosine similarity to calculate the similarity between node x and y . As in cos(,)x y
x y x y
, “ ” reprents dot product of vectors, 1n
k k k x y x y
, x is the length of vector x, x . It’s clearly that cos(,)x y is a number between [0,1]. Definition 5 Neighborhood Function of node i A , it
reprents ()i
avd A . As in (3), cos(,)
1
()()
n
A A i j j avd A i A i O ¦
(3)
n is the total number of nodes in the network society.
Definition 6 Difference Factor ()i A T reprents the difference between node i A and the other nodes in network community.
()1()A avd A i i T (4)
cos(,)[0,1]i j A A , obviously, ()[0,1]i avd A . In general, the difference between the core members and ordinary members is more than the difference between ordinary members. What’s more, ()i A T is one of the indicators to measures the core nodes.
Definition 7 Core Influence Degree, it reflects not only the Centre Coverage but also the difference.
()0.7()0.3()I A A A i i i V T (5)
As in (5), ()i I A reprents
the Core Influence Degree. The Centre Coverage Factor measures two aspects of the factor, the Difference Factor measures one aspect of the factor, so the weights of them are 0.7 and 0.3. We can e ()[0,1]i I A .
B. Algorithm Description
We must find the relation and the weight between nodes firstly, then calculating the Centre Coverage Factor and the Difference Factor in network community parately. Of cour, the node which has the greatest Core Influence Degree is core node.
We put the node which has MAX Core Influence Degree into the t of core nodes. If this node contacts with more than 90% nodes in network community, the t of core nodes has only one core node; otherwi, we continue to put the node into the t of core nodes which has cond largest Core Influence Degree and then repeat this process. Until the core node in the t of core nodes can
be linked 90% of the network nodes.
The specific description of the algorithm as follows: 1) Calculate ()i C A and ()i A V ; 2) Calculate ()i avd A and ()i A T ; 3) Calculate ()i I A ;
4) Put the node with MAX(()i I A ) into t of core
nodes, then call this node core node;
5) Calculate code number which can be linked of the
network nodes by core node. If more than 90%, continue to 6); el, then delete core node from network community and continue to 4);
6) Finally, the nodes contained by t of core nodes
are core nodes in network community.
This algorithm consists of calculating nodes Center Coverage Factor, Difference Factor and arching the core nodes in network community.
C. Simulation Experiment
Recently, computer-generated network diagram has become a standard test network. We u this method in a random generation t of undirected graph with weight. “Fig. 1” is a random generation network built by UCINET. It simulates the situation of communication between 20 persons.
Figure 1. random generation network
As in Fig. 2, Fig. 3, we can e ()i d A and ()i A O of
Figure 2. ()d A i of each node
Figure 3. ()A i O of each node
Fig .4 shows ()A i T ǃ()A i V ǃ()I A i of each node in this
Figure 4. ()A i
T ǃ()A i V ǃ()I A i of each node
From the result of ()A i T ǃ()A i V ǃ()I A i , Top 5 of
()A i T ǃ()A i V ǃ()I A i can be en in Table I:
Table I.
Top 5 of ()A i
T ǃ()A i V ǃ()I A i
()A i T ()A i V
()I A i
1 A12
A3
A3
云峰山风景区
2 A
3 A17 A17 3 A7
A5
A5
4 A6 A1
5 A15 5 A11
A9 A19
Finally, the core nodes are 3A and 17A , what’s more,
the nodes are contact all nodes in network community. Node 3A and 17A as the center of the network shown in
Fig .5:
Figure 5.
Network Coverage graph of火锅鱼做法
3A ,17A
D. Result analysis
There is no practical situation to compare with the mining result from a random generation t, we can’t explain this method is good. So we need classical algorithm to prove it.
We ud Degree Centrality command of UCINET to analyze the 20 nodes. Table II shows the result:
Table II. Result of Degree Centrality Analysis
Degree NrmDegree Share
A3189 21.165 0.088 A17123 13.774 0.057 A15121 13.55 0.056 A5
117 13.102 0.054 A12117 13.102 0.054 A9113 12.654 0.053 A19113 12.654 0.053 A16110 12.318 0.
051 A13109 12.206 0.051 A2106 11.87 0.049 A7105 11.758 0.049 A4105 11.758 0.049 A1104 11.646 0.048 A6102 11.422 0.047 A20102 11.422 0.047 A14102 11.422 0.047 A1086 9.63 0.04 A1184 9.406 0.039 A1882 9.183 0.038 A8
60 6.719 0.028
3rd nodes and the 4th nodes' order are exchanged, and the 5th nodes are different. The reason is the degree centrality method does not take into account coverage and difference.
12()12A O , it’s the least. But in Degree Centrality Analysis,12A ranked at the 5th place. It shows that degree of node is not the only criterion to determine the core nodes, network coverage and nodes difference are also standards. IV. Conclusion
SNA methods applied to the network information discovery, greatly increasing the efficiency of information mining. Internet community is growing fast, affecting a wide range of organizations. Discover and pay attention to a core member of a network society is an important way to understand the nature of the society. The contribution of this paper is:
l. It measured the core nodes from three aspects: coverage, affect depth and difference.
非正常情况2. It controlled the range of node core influence degree in [0,1], which could applied to all the networks.
There are veral lines of ongoing and future works, such as, the optimization for the time complexity of the method. When analyze the relationships of the nodes in the network with each other, while analyze the content of the linked mantic. In this way, it made the core of discovery method has higher accuracy and better convince.
References
[1]Liu J un, “An Introduction to Social Network Analysis,” Beijing:
Social Sciences Academic Press, 2004, pp. 160-163, pp. 116, pp.
羽毛球男子双打183-247.
[2]J itesh Shetty, J afar Adibi, “Discovering Important Nodes through
Graph Entropy The Ca of Enron Email Databa,” International
Conference on Knowledge Discovery and Data Mining. Chicago,
Illinois, pp. 74-81,2005.
小学二年级
[3]Ryan Rowe, German Creamer, Shlomo Hershkop, Salvatore J Stolfo,
“Automated social hierarchy detection through email network analysis,” International Conference on Knowledge Discovery and
Data Mining. San Jo, California, pp. 109-117, 2007.
[4]Wen Fenlian, Tang Changjie, Qiao Shaojie, Xu Gang, Liu Wei, Liu
Qihong, “The Shortest Path Approach for Mining the Core of
monitored consortium,” [EB/OL]. Sciencepaper online, www./p rocess/download.jsp?file=200607-42.
[5] A. Mislove, M. Marcon, K. Gummadi, P. Druschel, and B.
Bhattacharjee, “Measurement and analysis of online social networks,”
Proceedings of the 7th ACM SIGCOMM conference on Internet
measurement, pp. 29–42, 2007.
[6]Debra Swanson, “Six Degrees of Separation,” The Scientist, vol. 14,
no. 7, pp. 32, 2000
[7]Tang Changjie, Liu Wei, Wen Fenlian, Qiao Shaojie, “Three probes
into the social network analysis and consortium information mining-
mining the structure, core and communication behavior of virtual
consortium,” Computer Applications, vol. 26, no. 9, pp. 2020-2023,
2006.
我是一条鱼
[8]Luo J ar-Der, “Social Network Analysis,” Beijing: Social Sciences
Academic Press, 2005, pp. 5-8
[9]Doreian, P. “A note on the Detection of Cliques in V alued Graphs,”
Sociometry, vol. 32, pp. 237-242, 1969.