Physica A 387 (2008) 675–684
www.elsevier.com/locate/physa
Empirical analysis of online social networks in the age of Web 2.0
Feng Fu, Lianghuan Liu, Long Wang
Center for Systems and Control, College of Engineering, Peking University, Beijing 100871, China
Department of Industrial Engineering and Management, College of Engineering, Peking University, Beijing 100871, China
Received 17 July 2007; received in revised form 30 September 2007; available online 9 October 2007
Abstract
Today the World Wide Web is undergoing a subtle but profound shift to Web 2.0, becoming more of a social web. The use of collaborative technologies such as blogs and social networking sites (SNS) leads to instant online communities in which people communicate rapidly and conveniently with each other. There is also growing interest in, and concern about, the topological structure of these new online social networks. In this paper, we present an empirical analysis of the statistical properties of two important Chinese online social networks: a blogging network and an SNS open to college students. Both have emerged in the age of Web 2.0. We demonstrate that both networks possess the small-world and scale-free features already observed in real-world and artificial networks. In addition, we investigate the distribution of topological distance. Furthermore, we study the correlations between degree (in/out) and degree (in/out), between clustering coefficient and degree, and, for the blogging network, between popularity (in terms of number of page views) and in-degree. We find that the blogging network shows a disassortative mixing pattern, whereas the SNS network is assortative. Our research may help to elucidate the self-organizing structural characteristics of these online social networks embedded in technical forms.
© 2007 Elsevier B.V. All rights reserved.
PACS:89.75.Hc; 89.75.Fb; 89.65.s
Keywords: Social networks; Blogging networks; Social Networking Site; Topological analysis; (Dis)assortativity
1. Introduction
What a networking life! The development of social collaborative technologies, such as blogs, wikis, and social networking sites (SNS), has produced an extraordinarily fast-growing online virtual community in which people communicate, share information, and keep in touch with each other. Indeed, the concept of Web 2.0, characterized by blogs, SNS, and wikis, has become very popular. Today the World Wide Web (WWW) is undergoing a subtle but profound shift to Web 2.0, becoming more of a social web. Compared with the WWW of a decade ago, individuals (grassroots) play a more crucial role in the evolution of today's web: their active participation, regardless of their colors, beliefs, and countries, leads to a more diverse online world. In particular, these Web 2.0 sites provide an "Eden" in which free minds and ideas can trigger sparks of inspiration. Recently, numerous websites designed in the spirit of Web 2.0 have emerged like mushrooms after rain. For example, the SNS Facebook is quite popular among American college students. Notably, these online social networks provide an opportunity to analyze how trends, ideas and information travel through social communities. An empirical analysis of their topological structure is therefore a reasonable first step toward understanding the various dynamic processes taking place on top of them, since a network's topology is believed to significantly affect the dynamics it supports.
Let us briefly introduce blogs and SNS. A blog, short for "web log", has gained ground in online communities as a new mechanism for communication in recent years [1]. It is often a personal journal maintained on the Web, which the blogger updates easily and frequently. In the past few years, blogs have been the fastest-growing part of the WWW. A social networking site provides services for messaging, sharing information, and communicating [2]. More importantly, it is a flexible and convenient platform for individuals to form and maintain online friendships. SNS sometimes take advantage of advances in the understanding of complex networks (see Refs. [3–5] and a recent review [6]), e.g., the small-world effect ("six degrees of separation") [7], to promote the growth and consolidation of online acquaintanceships, which in turn enhances the popularity of some social networking sites. In blogging networks, the vertices represent bloggers, and the favorite links pointing from one blog to another are the directed edges. In SNS, online users are connected to one another by bilateral agreement, so the network is bidirectional (for simplicity, we treat it as undirected). Interestingly, in both the blogosphere and SNS, users can conveniently create and join communities or groups that share the same interests and activities.
Corresponding address: Intelligent Control Laboratory, Center for Systems and Control, College of Engineering, Peking University, Beijing 100871, China. Fax: +86 10 62754388. E-mail addresses: fufeng@pku.edu.cn (F. Fu), longwang@pku.edu.cn (L. Wang).
0378-4371/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.physa.2007.10.006
In general, blogging networks and SNS can be viewed as representative online social networks of the Web 2.0 age. To the best of our knowledge, the structure of such networks has rarely been analyzed empirically in previous investigations. Moreover, blogs and SNS, as emerging social media, will no doubt grow in importance and popularity. It is thus meaningful and interesting to scrutinize the structural characteristics of these online social networks. Such a study will help us understand the topological structure of these new-fashioned online social networks, and it may also help web designers shape social networking services in a more friendly and functional way. In this paper, we focus on two Chinese online social networks that emerged in the Web 2.0 age, namely Sina blogs and Xiaonei SNS [8]. The former is the largest Chinese blog space provider, with more than 2 million registered users in mainland China. The latter is the largest and most popular social networking service provider in China, and it is open only to college students. We present a detailed empirical analysis of the statistical properties of these two networks. We find that both networks have the small-world and scale-free features already observed in real-world and artificial networks. We further examine the correlations between degree (in/out) and degree (in/out), between clustering coefficient and degree, and, for the blogging network, between popularity (in terms of number of page views) and in-degree. We also show that the blogging network is disassortative, whereas the Xiaonei network is assortative. The rest of this paper is organized as follows. Section 2 introduces the data sets of the two social networks. The empirical analyses of the topological structure of the Sina blogging network and the Xiaonei network are then described in turn in Section 3, which closes with a discussion of the results. Finally, concluding remarks and future work are given in Section 4.
2. Data sets
The blogging network is abstracted from Sina blog, a Chinese blog space provider. The network is composed of blogs (vertices) and favorite links directed from one blog to another (directed edges). We focus on this sub-community of the global blogosphere and thus omit links leaving the community. The Sina blogging network was obtained in May 2006 by a robot of our own design that crawled along the directed links. This connected network consists of 200,292 nodes and 901,607 edges. In addition, we collected the number of page views of each blog, which can serve as an appropriate measure of the blog's popularity. The Xiaonei network is obtained from a Chinese social networking site open to college students. Each registered Xiaonei user has a profile, including her friend list. We are interested in the friendships between users within Xiaonei. A friendship is established by bilateral agreement, so the Xiaonei network is bidirectional (we treat it as undirected). This undirected network is composed of 396,836 nodes and 7,097,144 edges. In what follows, we present a detailed structural analysis of these two online social networks. Statistical properties including the degree distribution, the average shortest path length, and degree–degree correlations are used to reveal the connection characteristics of online social networks in the Web 2.0 age.
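As a quick consistency check on these counts, the mean degrees quoted in Section 3 follow from $\langle k \rangle = 2E/N$ (each edge or arc adds 2 to the total degree sum) and, for the directed blogging network, $\langle k_{\rm in} \rangle = \langle k_{\rm out} \rangle = E/N$ (each arc adds one in-degree and one out-degree). The short Python sketch below only does this arithmetic on the numbers reported above; the constant and function names are our own.

```python
# Node and edge counts as reported in Section 2 (constant names are ours).
SINA_NODES, SINA_ARCS = 200_292, 901_607  # directed blogging network
XIAONEI_NODES, XIAONEI_EDGES = 396_836, 7_097_144  # undirected SNS network


def mean_in_out_degree(nodes: int, arcs: int) -> float:
    """Directed graph: each arc adds one in-degree and one out-degree,
    so <k_in> = <k_out> = E / N."""
    return arcs / nodes


def mean_total_degree(nodes: int, edges: int) -> float:
    """Each edge (or arc) adds 2 to the sum of all degrees, so <k> = 2E / N."""
    return 2 * edges / nodes


if __name__ == "__main__":
    print(f"Sina    <k_in> = <k_out> = {mean_in_out_degree(SINA_NODES, SINA_ARCS):.2f}")
    print(f"Sina    <k> = {mean_total_degree(SINA_NODES, SINA_ARCS):.2f}")
    print(f"Xiaonei <k> = {mean_total_degree(XIAONEI_NODES, XIAONEI_EDGES):.2f}")
```

The three printed values are 4.50, 9.00 and 35.77, which round to the 4.5, 9.0 and 35.8 reported in Section 3.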
Fig. 1. The log–log plot of the cumulative in-degree distribution $P(>k_{\rm in})$, which follows a power-law form $P(>k_{\rm in}) \sim k_{\rm in}^{-\tau_{\rm in}}$ with $\tau_{\rm in} = 1.34$. The straight line is a linear fit of the distribution in log–log scale.
Table 1
Percentage of blogs with null, 1, 2 and 3 in- and out-degrees
k      0       1       2       3
In     0       48.4%   18.1%   9.8%
Out    32.6%   14.2%   9.7%    7.5%
Note that a large fraction of blogs have only small in- and out-degrees. Since our blogging network was crawled along the directed links, the in-degree of every blog is at least 1.
3. Topological analysis of online social networks
3.1. Structure of blogging network
The connection pattern of Sina blogs is studied through the degree distribution. Since the Sina blogging network is directed, each blog has incoming and outgoing connections, whose numbers are denoted $k_{\rm in}$ and $k_{\rm out}$, respectively. Notably, the in-degree can be used as an index of a blog's importance. In Fig. 1, we report the cumulative in-degree distribution $P(>k_{\rm in})$, which gives the probability that a randomly selected node has more than $k_{\rm in}$ incoming links. Clearly, $P(>k_{\rm in})$ closely obeys a power law $P(>k_{\rm in}) \sim k_{\rm in}^{-\tau_{\rm in}}$ with $\tau_{\rm in} = 1.34 \pm 0.001$. One immediately obtains the in-degree distribution $P(k_{\rm in}) \sim k_{\rm in}^{-\gamma_{\rm in}}$ with $\gamma_{\rm in} = \tau_{\rm in} + 1 = 2.34 \pm 0.001$. The cumulative out-degree distribution $P(>k_{\rm out})$ is shown in Fig. 2: apart from a flat head, it follows a power law $P(>k_{\rm out}) \sim k_{\rm out}^{-\tau_{\rm out}}$ with $\tau_{\rm out} = 2.60 \pm 0.02$ for large out-degrees. Similarly, the out-degree distribution is $P(k_{\rm out}) \sim k_{\rm out}^{-\gamma_{\rm out}}$ with $\gamma_{\rm out} = \tau_{\rm out} + 1 = 3.60 \pm 0.02$. Our directed blogging network is therefore scale-free, consistent with previous empirical investigations of real-world and artificial networks. Indeed, power laws are ubiquitous in socio-economic systems. Our findings supplement existing knowledge about complex networks to some extent, and further illustrate that the social world we live in is heterogeneous rather than homogeneous. Although this scale-free feature is not surprising, we will show that our blogging network has other nontrivial characteristics. The average degree $\langle k \rangle$ of the blogging network is 9.0; that is, each node has on average 9 neighbors (counting both incoming and outgoing neighbors). Correspondingly, the average in- and out-degrees are $\langle k_{\rm in} \rangle = \langle k_{\rm out} \rangle = 4.5$. In our collected population of interconnected blogs, the maximum in-degree is 13,341, whereas the majority of blogs have just a few incoming links (see Table 1).
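The exponents above come from straight-line fits to cumulative distributions on log–log axes (Fig. 1). The relation $\gamma = \tau + 1$ follows from the continuous approximation: if $P(k) \sim k^{-\gamma}$, then $P(>k) = \int_k^\infty P(k')\,dk' \sim k^{-(\gamma - 1)}$. The Python sketch below, with hypothetical helper names `ccdf` and `loglog_slope`, shows one way to build $P(>k)$ from a list of degrees and to estimate $\tau$ by ordinary least squares on log10-transformed points. It is an illustration under that assumption only: the text specifies nothing beyond a linear fit in log–log scale, and maximum-likelihood estimators are generally regarded as more reliable for power-law exponents.

```python
import math
from collections import Counter


def ccdf(degrees):
    """Return [(k, P(>k))] for each distinct observed degree k, where P(>k)
    is the fraction of nodes whose degree is strictly greater than k."""
    n = len(degrees)
    counts = Counter(degrees)
    remaining = n  # nodes not yet passed, i.e. with degree >= current k
    result = []
    for k in sorted(counts):
        remaining -= counts[k]  # now counts nodes with degree > k
        result.append((k, remaining / n))
    return result


def loglog_slope(points):
    """Ordinary least-squares slope of log10(y) versus log10(x).
    Points with x <= 0 or y <= 0 (e.g. P(>k_max) = 0) are skipped."""
    xs = [math.log10(x) for x, y in points if x > 0 and y > 0]
    ys = [math.log10(y) for x, y in points if x > 0 and y > 0]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


if __name__ == "__main__":
    print(ccdf([1, 1, 2, 3]))
    # Synthetic check: an exact power law P(>k) = k^(-1.34) has slope -1.34.
    tau = -loglog_slope([(k, k ** -1.34) for k in range(1, 1000)])
    print(f"tau = {tau:.2f}, gamma = tau + 1 = {tau + 1:.2f}")
```

On real degree data one would pass `ccdf(degrees)` to `loglog_slope`, possibly restricted to the tail (as for the out-degrees, whose distribution has a flat head).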
The powerlaw distribution of indegree indicates that many common bloggers preferentially add links to their favorite celebrities’ blogs and such preferential behavior results in the powerlaw distribution just as the Baraba´si–Albert model describes [9]. It is found that a significant fraction of the blogs, that is 32.6%, have no outgoing links to other blogs. Further, considerable fraction of blogs have only a few outgoing connections as shown inTable 1. That is to say, most of the bloggers are unknown to
678
F. Fu et al. / Physica A 387 (2008) 675–684
τout Fig. 2. The log–log plot of cumulative outdegree distributionP(>kout)which has a powerlaw tail askwithτout=2.6. The straight line out has slope2.60 for comparison with the distribution. The inset shows the linear fit to the right skew tail of distribution for large outdegrees.
Table 2 Correlation coefficients for the degrees at either side of an edge r r r r r inin inout outin outout 0.4970.035 0.0410.034 0.113 Negative figures indicate that poorly connected nodes tend to link to highly connected nodes while positive values suggest that nodes with even connectivity are likely to connect to each other.
public (have small indegrees) and they are not active enough in the blogosphere (have small or null outgoing links). As aforementioned, indegree could be adopted as an indication of the blog importance (influenceability), while the number of a blog’s page views also presents its influenceability. Intuitively one could think that large number of page views, which means wide spread of the blogger’s posts, results from large indegree, and vice versa. We will show that in our blogging network, the degreedependent page view exhibits positive correlation with the indegree. Thus in this sense, the indegree is equivalent to the number of page views. Although there are millions of directed connections present in such social network, only about 28.7% of them are symmetric edges (a pair of nodes are connected to each other by two arcs) and most of the symmetric links are between the blogs of bloggers who have been already acquainted with each other in the blogosphere. Therefore this suggests that such blogging network is a fairly asymmetric one—while a blogger tends to link to a famous one, it is seldom the case that the celebrated would link to this blog either. To characterize this asymmetric feature in a more quantitative way, the degree correlation between each side of an edge, i.e. assortativity, provides an insight into the characteristic of local organization of such social networks.Table 2displays the correlation coefficients of different types of degree–degree correlations for the crawled down blogging network. Correlations are measured by the Pearson’s correlation coefficientrfor the degrees at either side of an edge as suggested by Mark Newman [10]: hktokfromi − hktoihkfromi r=q q,(1) 2 2 2 2 hk ktoi hki − h toi − hfromkfromi wherekto,kfromcould be four possible combinations of in and outdegrees of a directed edge. 
Networks with an assortative mixing pattern are those in which high-degree nodes tend to connect to other nodes with many connections, and vice versa. Technical and biological networks are in general disassortative, while social networks are often assortatively mixed, as demonstrated by the study of scientific collaboration networks [10]. The blogging network, however, presents a disassortative mixing pattern when edge directions are not considered. In our case, positive mixing is found for $r_{\rm in,out}$ and $r_{\rm out,out}$. A positive $r_{\rm in,out}$ means that active bloggers in the community (with large $k_{\rm out}$) tend to associate with those who succeed in promoting themselves in the community (with high $k_{\rm in}$), while a large $r_{\rm out,out}$ suggests that active bloggers preferentially link to each other. The Internet dating community, a kind of social network embedded in a technical one, and peer-to-peer (P2P) social networks are similar to our case, displaying a significant disassortative mixing pattern [11,12].
The average shortest path length $\langle l \rangle$ is calculated as the mean geodesic distance over all pairs connected by at least one path (directed chain). In this case, $\langle l \rangle = 6.84$, which means that on average one needs only about 7 clicks to pass from one blog to any other in the blogosphere. The diameter $D$ of this social network, defined as the maximum shortest path length, is 27. We also plot the distribution of shortest path lengths in Fig. 3; most of the distances are 6 or 7. Thus the famous phenomenon of "six degrees of separation" is present in spite of the huge size and sparseness of the blogging network.
In an undirected network, the clustering coefficient of node $i$ is defined as $C_i = 2E_i / [k_i(k_i - 1)]$, that is, the ratio between the number $E_i$ of edges that actually exist between the $k_i$ neighbors of node $i$ and the total possible number $k_i(k_i - 1)/2$. Since our blogging network is directed, we first make it undirected before calculating the clustering coefficient. To this end, the asymmetric edges (one-way links/arcs) were ignored, and the resulting isolated vertices were then removed from the network. This yields a bidirectional (undirected) graph with 122,470 nodes, on which the clustering coefficient is computed. The mean degree of this undirected network, $\langle k_{\rm undirected} \rangle$, is 3.28. The clustering coefficient of the whole network is the average of all individual $C_i$.
Fig. 3. The plot of the shortest path length distribution.
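Because the network is unweighted, the path-length statistics above can be computed with one breadth-first search (BFS) per source node. The sketch below, with hypothetical function names and an adjacency-dict input format of our choosing, averages the BFS distance over all ordered pairs $(i, j)$, $i \neq j$, joined by a directed path, and records the largest such distance as the diameter $D$. Running a BFS from every one of the 200,292 blogs costs $O(N(N + E))$ time, so in practice one might sample source nodes; the text does not state which approach was used.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable along directed arcs."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_length_stats(adj):
    """Return (<l>, D): the mean shortest-path length over ordered pairs (i, j),
    i != j, joined by a directed path, and the largest such distance."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    total = pairs = diameter = 0
    for s in nodes:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    if pairs == 0:
        return float("nan"), 0
    return total / pairs, diameter


if __name__ == "__main__":
    # Toy directed chain 0 -> 1 -> 2: reachable distances are 1, 2 and 1.
    print(path_length_stats({0: [1], 1: [2]}))
```

Pairs with no directed path between them are simply left out of the average, matching the definition of $\langle l \rangle$ given above.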
We find the clustering coefficient $C = 0.1490$, more than three orders of magnitude higher than that of a random graph of the same size and mean degree, $C_{\rm rand} = 3.28/122{,}470 \approx 0.0000268$. The degree-dependent clustering coefficient $C(k)$ is the average of $C_i$ over vertices of degree $k$. Fig. 4 plots $C(k)$ versus $k$ for the undirected blogging network; for clarity, we add a dashed line with slope $-1$ in log–log scale. It is hard to identify a clear power law in our case. Nevertheless, the non-flat $C(k)$ shown in the figure suggests that the dependence of $C(k)$ on $k$ is nontrivial, and thus points to some degree of hierarchy in the network. In many networks, $C(k)$ exhibits a highly nontrivial power-law decay as a function of $k$ [13], indicating that low-degree nodes generally belong to well-interconnected communities (high clustering of low-connectivity nodes), while high-degree sites are linked to many nodes that may belong to different groups (low clustering of large-degree nodes). This is the signature of a nontrivial architecture in which small-degree vertices are well clustered around the hubs (high-degree vertices) and organized hierarchically into increasingly large groups. Our blogging network thus appears to share this generic feature of hierarchy. Overall, the average shortest path length (6.84) is smaller than the logarithm of the network size ($\ln 200{,}292 \approx 12.2$), and the network has a relatively high clustering coefficient. Hence, the blogging network has the small-world property already observed in previous studies of social and technical networks.
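The undirected projection and clustering statistics described above can be sketched as follows. The function names are hypothetical, and two conventions are our assumptions rather than the paper's: $C_i = 0$ for nodes with $k_i < 2$, and $C_{\rm rand} \approx \langle k \rangle / N$, the connection probability of an Erdős–Rényi graph with the same size and mean degree, which is how the value $3.28/122{,}470$ above is formed.

```python
from collections import defaultdict


def reciprocal_undirected(arcs):
    """Keep only node pairs linked by arcs in both directions. Vertices left
    without any reciprocal partner simply do not appear in the result."""
    arc_set = set(arcs)
    adj = defaultdict(set)
    for u, v in arc_set:
        if u != v and (v, u) in arc_set:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def local_clustering(adj, i):
    """C_i = 2 E_i / (k_i (k_i - 1)); set to 0 when k_i < 2 (our convention)."""
    nbrs = list(adj[i])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in range(k) for b in range(a + 1, k) if nbrs[b] in adj[nbrs[a]])
    return 2 * links / (k * (k - 1))


def clustering_summary(adj):
    """Return (C, C(k), C_rand): the mean of all C_i, C_i averaged over nodes
    of each degree k, and the random-graph baseline <k> / N."""
    n = len(adj)
    c = {i: local_clustering(adj, i) for i in adj}
    by_degree = defaultdict(list)
    for i, ci in c.items():
        by_degree[len(adj[i])].append(ci)
    c_of_k = {k: sum(vals) / len(vals) for k, vals in sorted(by_degree.items())}
    mean_k = sum(len(nbrs) for nbrs in adj.values()) / n
    return sum(c.values()) / n, c_of_k, mean_k / n


if __name__ == "__main__":
    # Triangle 0-1-2 plus pendant 3, all reciprocal; the one-way arc 3 -> 4 is dropped.
    arcs = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 2), (2, 0), (2, 3), (3, 2), (3, 4)]
    print(clustering_summary(reciprocal_undirected(arcs)))
    # Baseline with the values reported above for the blogging network.
    print(f"C_rand = {3.28 / 122_470:.3g}")
```

The toy graph gives $C = 7/12$, and the last line prints $C_{\rm rand} \approx 2.68 \times 10^{-5}$, so the reported $C = 0.1490$ is roughly 5,600 times the random baseline.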
Fig. 4. The plot of degreedependent clustering coefficientC(k)versus degreekin undirected blogging network. The dashed line has slope1.
Fig. 5. The plot of the cumulative page view distribution $P(>S)$, which has a right-skewed tail $P(>S) \sim S^{-\beta}$ with $\beta = 0.87$. The plotted straight line's slope is $-0.87$. The inset shows the distribution in detail (including the linear fit) around large values of $S$.
The cumulative distribution of blog page views $P(>S)$ is plotted in Fig. 5. Interestingly, the distribution has a flat head for small values of the page view $S$. For large values of $S$, $P(>S)$ clearly obeys a power law $P(>S) \sim S^{-\beta}$ with $\beta = 0.87 \pm 0.0007$. Correspondingly, the page view distribution can be written as $P(S) \sim S^{-\mu}$, where $\mu = \beta + 1 = 1.87 \pm 0.0007$ for large $S$. In our blogging network, only ten blogs have page views in the tens of millions, while most of the remainder have only tens of thousands. This right-skewed distribution indicates that most readers are attracted by the celebrated blogs and raise their page views by reading and commenting on their posts, whereas only a minority of grassroots blogs gain public attention in the blogosphere. In this sense, a kind of inequality develops: "the rich get richer" while "the poor get poorer". From this point of view, social technologies not only enhance communication between distant people, but also facilitate the development of inequality between celebrities and ordinary users, and the blogosphere might be a good paradigm for studying the emergence of inequality in social systems. Finally, we study the correlation between in-degree and page views, as shown in Fig. 6. The degree-dependent page view $S(k_{\rm in})$ is the average of $S$ over vertices of in-degree $k_{\rm in}$. Fig. 6 clearly shows that the page view is positively correlated with the in-degree. Intuitively, both in-degree and page views can be used to measure the popularity (importance) of blogs. Further, a larger in-degree is likely to speed up the growth of page views; reciprocally, because of their larger page views, popular blogs are more likely to gain incoming links. Thus, the in-degree is to a certain extent equivalent to the page view count in assessing a blog's influence.
Fig. 6. The plot of the degree-dependent page view $S(k_{\rm in})$ versus in-degree $k_{\rm in}$ in double-logarithmic scale. $S(k_{\rm in})$ shows a positive correlation with $k_{\rm in}$.
3.2. Structure of Xiaonei network
We present the topological analysis of the Xiaonei network in this section. Quantities such as the degree distribution, clustering coefficient, and average shortest path length are calculated to capture the features of this online social network. In Fig. 7, we report the degree distribution $P(k)$, which gives the probability that a randomly selected node has exactly $k$ edges. Clearly, $P(k)$ follows two different scalings with $k$, separated by a critical degree value $k_c$. A distribution with two pronounced power-law regions has also been found in the language web [14]. $P(k)$ obeys a power law $P(k) \sim k^{-\gamma_1}$ with $\gamma_1 = 0.72 \pm 0.01$ for $k < k_c = 30$, whereas $P(k) \sim k^{-\gamma_2}$ with $\gamma_2 = 2.12 \pm 0.02$ for $k > k_c$. The degree distribution above the critical degree $k_c$ is consistent with past findings for social networks, whose degree exponent satisfies $2 < \gamma < 3$ [4]. For small degrees below $k_c$, however, the scaling exponent of $P(k)$ is less than two. A considerable fraction of nodes have only low connectivity (see Table 3): about 68% of nodes have degree at most 30. The average degree $\langle k \rangle$ is 35.8. The average shortest path length $\langle l \rangle$ is 3.72, and the diameter $D$ is 12. The distribution of shortest path lengths is shown in Fig. 8; most of the distances are 3 or 4. The clustering coefficient of this network is $C = 0.16$. We plot the degree-dependent clustering coefficient $C(k)$ versus degree in Fig. 9. The dependence of $C(k)$ on $k$ is nontrivial (approximately $C(k) \sim k^{-1}$), indicating some degree of hierarchy in the Xiaonei network. The assortativity coefficient $r$ of the Xiaonei network is 0.00915, so it is assortatively mixed.
Table 3
Percentage of nodes with degree 1, 2, 3, 4 and 5
k            1       2      3      4      5
Percentage   11.6%   6.2%   4.6%   3.7%   3.3%
Note that a large fraction of nodes have only small degrees.
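The two exponents $\gamma_1$ and $\gamma_2$ can be estimated by fitting separate straight lines to $\log P(k)$ versus $\log k$ on either side of the crossover. The Python sketch below takes $k_c$ as given (the paper reports $k_c = 30$ but does not describe how it was chosen), excludes $k = k_c$ itself, and uses ordinary least squares; the function names are hypothetical. Because $P(k)$ enters only through its logarithm, its normalization does not affect the fitted slopes.

```python
import math


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys versus xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)


def two_regime_exponents(pk, k_c):
    """Fit log P(k) = -gamma * log k + const separately for k < k_c and k > k_c.
    pk maps degree k to P(k); zero entries are skipped. Returns (gamma_1, gamma_2)."""
    low = [(math.log(k), math.log(p)) for k, p in pk.items() if 0 < k < k_c and p > 0]
    high = [(math.log(k), math.log(p)) for k, p in pk.items() if k > k_c and p > 0]
    gamma_1 = -ols_slope(*zip(*low))
    gamma_2 = -ols_slope(*zip(*high))
    return gamma_1, gamma_2


if __name__ == "__main__":
    # Synthetic, unnormalized curve with a crossover at k_c = 30 and exponents 0.72 / 2.12.
    pk = {k: k ** -0.72 if k <= 30 else 30 ** -0.72 * (k / 30) ** -2.12 for k in range(1, 1001)}
    print(two_regime_exponents(pk, 30))
```

On empirical data, the low-degree and high-degree points are noisy in different ways, so the fitted values will depend on the chosen $k_c$ and on how sparse tail bins are handled.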
By contrast, the blogging network shows a disassortative mixing pattern. In the Xiaonei network, links between online users are established by bilateral agreement. Users with similar status (similar degrees) are therefore more likely to be connected to each other, which results in a positive degree–degree correlation.
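For an undirected network such as Xiaonei, Newman's coefficient [10] reduces to the Pearson correlation of the degrees at the two ends of an edge, with every edge counted once in each direction so that the two ends play symmetric roles. The sketch below assumes a simple graph given as a list of undirected edges; the function name is hypothetical, and we assume, without confirmation from the text, that the reported $r = 0.00915$ was computed in this standard way.

```python
from collections import Counter


def undirected_assortativity(edges):
    """Degree assortativity r of an undirected simple graph: the Pearson
    correlation of the degrees at the two ends of an edge, with each edge
    counted once in each direction."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    m = len(xs)
    mean = sum(xs) / m  # xs and ys hold the same values, so they share mean and variance
    cov = sum(x * y for x, y in zip(xs, ys)) / m - mean * mean
    var = sum(x * x for x in xs) / m - mean * mean
    return cov / var if var > 1e-12 else float("nan")


if __name__ == "__main__":
    print(undirected_assortativity([(0, 1), (0, 2), (0, 3)]))  # star
    print(undirected_assortativity([(0, 1), (1, 2), (2, 3)]))  # path
```

A star is perfectly disassortative ($r = -1$), since its hub connects only to degree-1 leaves; the four-node path gives $r = -1/2$. A value just above zero, as reported for Xiaonei, indicates only a weak tendency for similar-degree users to connect.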
3.3. Discussions
Thus far, we have presented our empirical analysis of two representative online social networks in the age of Web 2.0: the Sina blogging network and the Xiaonei network. Admittedly, the obtained social networks are only snapshots of continuously evolving networks. Nonetheless, the kind of investigation performed here sheds light on the real topologies of the new social media emerging in the Web 2.0 age. Indeed, these new social technologies shape the evolution of virtual online communities and attract the attention of researchers from different backgrounds. On the other hand, these realistic networks are difficult to reproduce with existing network models, which makes it particularly important to study them directly. Based on our results, we can compare the Sina blogging network and the Xiaonei network. The former is directed, whereas the latter is undirected. Both networks possess the small-world and scale-free features already observed in real-world and artificial networks; notably, the degree distribution of the Xiaonei network has two pronounced power-law regions. Although both social networks are embedded in technical forms, they show different mixing patterns in general. Our study confirms that the blogging network shows disassortative degree correlation overall, except that active bloggers are connected to each other (positive $r_{\rm out,out}$) and link to influential bloggers (positive $r_{\rm in,out}$). In contrast to the blogging network, the Xiaonei network displays assortative (positive) degree correlation. Previous studies of some technical networks, however, suggest that such networks are often disassortatively mixed (see Ref. [10] for a general review, and detailed studies of Internet dating communities [11] and P2P social networks [12]). As a social web, the Xiaonei network differs from the blogging network and from other online friendship networks because its edges are bidirectional (based on bilateral agreement). As a result, popular users (with large degrees) are more likely to become friends with each other. In this respect, the Xiaonei network, a social network embedded in a technical form, is more similar to real-life social networks (e.g., acquaintance networks and scientific collaboration networks) than the blogging network is.
Fig. 7. The degree distribution $P(k)$ of the Xiaonei network. The dotted line indicates the critical degree $k_c = 30$: for $k < k_c$, $P(k)$ follows a power law $P(k) \sim k^{-0.72}$, while for $k > k_c$, $P(k)$ obeys a power law $P(k) \sim k^{-2.12}$. The slopes of the left and right straight lines are respectively $-0.72$ and $-2.12$ for comparison with the degree distribution.
Fig. 8. The plot of the shortest path length distribution.
Fig. 9. The plot of the degree-dependent clustering coefficient $C(k)$ versus degree $k$. A clear power law is absent, but the dependence of $C(k)$ on $k$ is nontrivial. The dashed line has slope $-1$.
4. Concluding remarks
In conclusion, we performed an empirical analysis of two Chinese online social networks that emerged in the age of Web 2.0: the Sina blogging network and the Xiaonei network (SNS). We showed that both networks have the small-world and scale-free features already observed in real-world and artificial networks. We studied the frequency of shortest path lengths, demonstrating that the famous "six degrees of separation" phenomenon is present in both networks. We confirmed that, for both networks, the dependence of the clustering coefficient on degree is nontrivial, further suggesting some level of hierarchy in their topological organization. Furthermore, we found that the distribution of blog page views has a power-law tail and, interestingly, that a blog's page view count is positively correlated with its in-degree. Finally, we examined the mixing pattern, namely the degree–degree correlation, for both networks, and found that the blogging network shows a disassortative mixing pattern in general, while the Xiaonei network is assortative. Our case study may help us understand the topological features of online social networks in the age of Web 2.0. On top of the two realistic networks obtained in this study, simulations of dynamic processes can be run to investigate spreading processes (of information or gossip), the evolution of cooperation, etc. Work along these lines is in progress.
Acknowledgments
This work was partly supported by the NSFC (60674050 and 60528007), the National 973 Program (2002CB312200), the National 863 Program (2006AA04Z258) and the 115 project (A2120061303).
References
[1] E. Cohen, B. Krishnamurthy, Comput. Netw. 50 (2006) 615.
[2] http://en.wikipedia.org/wiki/Social_network_service.
[3] S.H. Strogatz, Nature 410 (2001) 268.
[4] R. Albert, A.-L. Barabási, Rev. Modern Phys. 74 (2002) 47.
[5] M.E.J. Newman, SIAM Review 45 (2003) 167.
[6] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.-U. Hwang, Phys. Rep. 424 (2006) 175.
[7] D.J. Watts, S.H. Strogatz, Nature 393 (1998) 440.
[8] http://blog.sina.com.cn, and http://www.xiaonei.com.
[9] A.-L. Barabási, R. Albert, Science 286 (1999) 509.
[10] M.E.J. Newman, Phys. Rev. Lett. 89 (2002) 208701.
[11] P. Holme, C.R. Edling, F. Liljeros, Social Netw. 26 (2004) 155.
[12] F. Wang, Y. Moreno, Y. Sun, Phys. Rev. E 73 (2006) 036123.
[13] E. Ravasz, A.-L. Barabási, Phys. Rev. E 67 (2003) 026112.
[14] S.N. Dorogovtsev, J.F.F. Mendes, Proc. R. Soc. B 268 (2001) 2603.