Evolution of the social network of scientific collaborations

mtoledan


14 pages

English



Description

Evolution of the social network of scientific collaborations

A.L. Barabási (1,2), H. Jeong (1), Z. Néda (1,2,*), E. Ravasz (1), A. Schubert (3), T. Vicsek (2,4)

(1) Department of Physics, University of Notre Dame, Notre Dame, IN 46556, USA
(2) Collegium Budapest, Institute of Advanced Study, Budapest, Hungary
(3) Bibliometric Service, Library of the Hungarian Academy of Sciences, Budapest, Hungary
(4) Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary

(Last revised February 1, 2008)

The co-authorship network of scientists represents a prototype of complex evolving networks. In addition, it offers one of the most extensive databases to date on social networks. By mapping the electronic database containing all relevant journals in mathematics and neuro-science for an eight-year period (1991-98), we infer the dynamic and the structural mechanisms that govern the evolution and topology of this complex system. Three complementary approaches allow us to obtain a detailed characterization. First, empirical measurements allow us to uncover the topological measures that characterize the network at a given moment, as well as the time evolution of these quantities. The results indicate that the network is scale-free, and that the network evolution is governed by preferential attachment, affecting both internal and external links. However, in contrast with most model predictions the average degree increases in time, and the node separation decreases. Second, we propose a simple model that ...

Information

Published by	mtoledan
Published on	3 October 2011
Number of reads	109
Language	English

Excerpt

arXiv:cond-mat/0104162v1 [cond-mat.soft] 10 Apr 2001

... they are restricted to rather small systems, and often view networks as static graphs, whose nodes are individuals and links represent various quantifiable social interactions.

In contrast, recent approaches with methodology rooted in statistical physics focus on large networks, searching for universalities both in the topology of the web and in the dynamics governing its evolution. These combined theoretical and empirical results have opened ...

... these advances we choose to investigate in detail the collaboration network of scientists.

Recently Newman has taken an important step towards applying modern network ideas to collaboration networks [10,11]. He studied several large databases focusing on several fields of research over a five year period, establishing that collaboration networks have all the general ingredients of small world networks: they have a surprisingly short node-to-node distance and a large clustering coefficient [10], much larger than the one expected from a random Erdős-Rényi type network of similar size and average connectivity. Furthermore, the degree distribution appears to follow a power law [11].

Our study takes a different, but complementary approach to collaboration networks than that followed by Newman. We view collaboration networks as prototypes of evolving networks, where the accent is on dynamics and evolution. Indeed, the co-authorship network constantly expands by the addition of new authors to the database, as well as the addition of new internal links representing papers co-authored by authors that were already part of the database. The topological properties of these networks are determined by these dynamical and growth processes. Consequently, in order to understand their topology, we first need to understand the dynamical process that determines their evolution. In this aspect Newman's study focuses on the static properties of the collaboration graph, while our work investigates the dynamical properties of these networks. We show that such a dynamical approach can explain many of the static topological features seen in the collaboration graph.

It is important to emphasize that the properties of the co-authorship network are not unique. The WWW is also a complex evolving network, where nodes and links are added (and removed) at a very high rate, the network topology being profoundly determined by these dynamical features [3,20,21,25]. The actor network of Hollywood is very similar to the co-authorship network, because it grows through the addition of new nodes (actors) and new links (movies linking existing actors) [2,4,14]. Similarly, the nontrivial scaling properties of many cellular [23], ecological [24] or business networks are all determined by dynamical processes that contributed to the emergence of these networks. So why single out the collaboration network as a case study? A number of factors have contributed to this choice. First we needed a network for which the dynamical evolution is explicitly available. That is, in addition to a map of the network topology, it is important to know the time at which the nodes and links have been added to the network, crucial for revealing the network dynamics. This requirement reduces the currently available databases to two systems: the actor network, where we can follow the dynamics by recording the year of the movie release, and the collaboration network for which the paper publication year allows us to track the time evolution. Of these two, the co-authorship data is closer to a prototypical evolving network than the Hollywood actor database for the following reasons: in the science collaboration network the co-authorship decision is made entirely by the authors, i.e. decision making is delegated to the level of individual nodes. In contrast, for actors the decision often lies with the casting director, a level higher than the node. While in the long run this difference is not particularly important, the collaboration network is still closer in spirit to a prototypical evolving network such as social systems or the WWW.

Our work stands on three pillars. First, we use direct measurements on the available data to uncover the mechanism of network evolution. This implies determining the different parameters and uncovering the various competing processes present in the system. Second, building on the mechanisms and parameters revealed by the measurements we construct a model that allows us to investigate the large scale topology of the system, as well as its dynamical features. The predictions offered by a continuum theory of the model allow us to explain some of the results that were uncovered by our measurements as well as Newman's. The third and final step will involve computer simulations of the model, serving several purposes: (i) they allow us to investigate quantities that could not be extracted from the continuum theory; (ii) they verify the predictions of the continuum theory; (iii) they allow us to understand the nature of the measurements we can perform on the network, explaining some apparent discrepancies between the theoretical and the experimental results.

II. DATABASES: CO-AUTHORSHIP IN MATHEMATICS AND NEURO-SCIENCE

For each research field whose practitioners collaborate in publications one can define a co-authorship network which is a reflection of the professional links between the scientists. In this network the nodes are the scientists and two scientists are linked if they wrote a paper together. In order to get information on the topology of a scientific co-authorship web one needs a complete dataset of the published papers, ideally from the birth of the discipline until today. However, computer databases cover at most the past several decades. Thus any study of this kind needs to be limited to only a recent segment of the database. This imposes unexpected challenges that need to be addressed, since such limited data availability is a general feature of most networks.

The databases considered by us contain article titles and authors of all relevant journals in the field of mathematics (M) and neuro-science (NS), published in the period 1991-98. We have chosen these two fields for several reasons. A first factor was the size of the database: biological sciences or physics are orders of magnitude larger, too large to address their properties with reasonable computing resources. Second, the selected two fields offer sufficient diversity by displaying different publishing patterns: in NS collaboration is intense, while mathematics, although there is an increasing tendency towards collaboration [26], is still a basically single investigator field.

In mathematics our database contains 70,975 different authors and 70,901 papers for an interval spanning eight years. In NS the number of different authors is 209,293 and the number of published papers is 210,750. Complete statistics for the two databases are summarized in Fig. 1, where we plot the cumulative number of papers and authors for the period 1991-98. We consider as a "new author" an author who was not present in the database from 1991 up to a given year.

FIG. 1. (a) Cumulative number of papers for the M and NS databases in the period 1991-98. The inset shows the number of papers published each year. (b) Cumulative number of authors (nodes) for the M and NS databases in the period 1991-98. The inset shows the number of new authors added each year.

Before proceeding we need to clarify a few methodological issues that affect the data analysis. First, in the database the authors are represented by their surname and initials of first and middle name, thus there is a source of error in distinguishing some of them. Two different authors with the same initials and surname will appear to be the same node in the database. This error is important mainly for scientists of Chinese and Japanese descent. Second, on rare occasions a given author uses one or two initials in different publications, and in such cases he/she will appear as separate nodes. Newman [10] showed that the error introduced by those problems is of the order of a few percent. Our results are also affected by these methodological limitations, but we do not expect them to have a significant impact.

III. DATA ANALYSIS

We will ... the parameters that are crucial to the understanding of the processes which determine the network topology, offering input for the construction of an appropriate model.

A. Degree distribution follows a power-law

A quantity that has been much studied lately for various networks is the degree distribution, P(k), giving the probability that a randomly selected node has k links. Networks for which P(k) has a power-law tail are known as scale-free networks [3,13]. On the other hand, classical network models, including the Erdős-Rényi [27,28] and the Watts and Strogatz [4] models, have an exponentially decaying P(k) and are collectively known as exponential networks. The degree distributions of both the M and NS data indicate that collaboration networks are scale-free. The power-law tail is evident from the raw, uniformly binned data (Fig. 2a,b), but the scaling regime is better seen on the plot that uses logarithmic binning, reducing the noise in the tail (Fig. 2c). The cumulative data with logarithmic binning indicates $\gamma_M = 2.4$ and $\gamma_{NS} = 2.1$ for the two databases [29].

FIG. 2. Degree distribution for the (a) M and (b) NS database, showing the data based on the cumulative results up to years 1993 (×) and 1998 (•). (c) Degree distribution shown with logarithmic binning computed from the full dataset cumulative up to 1998. The lines correspond to the best fits, and have the slope 2.1 (NS, dotted) and 2.4 (M, dashed).
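As an illustration of the two measurements described in this extract (linking two scientists whenever they share a paper, then estimating the degree distribution P(k) with logarithmic binning), the following is a minimal sketch and not the authors' procedure. The function names, the choice of base-2 bins, the normalisation over nodes with k > 0, and the least-squares fit on the log-log points are all assumptions made for this example; the extract does not state how the published fits were obtained.

```python
from collections import defaultdict
from itertools import combinations
import math


def coauthor_degrees(papers):
    """Return {author: number of distinct co-authors} from lists of author names.

    Hypothetical helper: two authors are linked if they share at least one paper.
    """
    neighbours = defaultdict(set)
    for authors in papers:
        unique = sorted(set(authors))
        for a in unique:
            neighbours[a]  # touch the key so single-author papers yield degree 0
        for a, b in combinations(unique, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {a: len(n) for a, n in neighbours.items()}


def log_binned_distribution(degrees):
    """Estimate P(k) on base-2 bins [2**i, 2**(i+1)), divided by the bin width.

    Assumption: nodes with k = 0 are skipped (they have no place on a log axis),
    and the density is normalised by the number of remaining nodes.
    Returns a list of (geometric bin centre, density) pairs.
    """
    positive = [k for k in degrees if k > 0]
    total = len(positive)
    counts = defaultdict(int)
    for k in positive:
        counts[k.bit_length() - 1] += 1  # exact floor(log2(k)) for integers
    points = []
    for i in sorted(counts):
        width = 2 ** i  # number of integer degrees in the bin
        centre = (2 ** i) * math.sqrt(2.0)
        points.append((centre, counts[i] / (total * width)))
    return points


def fit_exponent(points):
    """Least-squares slope of log P versus log k; returns gamma = -slope."""
    if len(points) < 2:
        raise ValueError("need at least two bins to fit a slope")
    xs = [math.log(k) for k, _ in points]
    ys = [math.log(p) for _, p in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    # Toy input, invented for illustration only.
    toy_papers = [["A", "B"], ["B", "C"], ["D"], ["A", "B"]]
    degrees = coauthor_degrees(toy_papers)
    print(degrees)
    print(log_binned_distribution(list(degrees.values())))
```

Dividing each bin count by the bin width matters here: without it, wider bins at large k collect more nodes and the apparent slope is shallower by one. A real analysis would also restrict the fit to the scaling regime rather than using every bin.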