Evolution of the social network of scientific collaborations
14 pages
English


Description

A.L. Barabási (1,2), H. Jeong (1), Z. Néda (1,2,*), E. Ravasz (1), A. Schubert (3), T. Vicsek (2,4)

1 Department of Physics, University of Notre Dame, Notre Dame, IN 46556, USA
2 Collegium Budapest, Institute of Advanced Study, Budapest, Hungary
3 Bibliometric Service, Library of the Hungarian Academy of Sciences, Budapest, Hungary
4 Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary

(Last revised February 1, 2008)

The co-authorship network of scientists represents a prototype of complex evolving networks. In addition, it offers one of the most extensive databases to date on social networks. By mapping the electronic database containing all relevant journals in mathematics and neuro-science for an eight-year period (1991-98), we infer the dynamic and the structural mechanisms that govern the evolution and topology of this complex system. Three complementary approaches allow us to obtain a detailed characterization. First, empirical measurements allow us to uncover the topological measures that characterize the network at a given moment, as well as the time evolution of these quantities. The results indicate that the network is scale-free, and that the network evolution is governed by preferential attachment, affecting both internal and external links. However, in contrast with most model predictions, the average degree increases in time, and the node separation decreases. Second, we propose a simple model that ...

Information

Published by
Published on 3 October 2011
Number of reads 109
Language English

Excerpt

arXiv:cond-mat/0104162v1 [cond-mat.soft] 10 Apr 2001

[...] they are restricted to rather small systems, and often view networks as static graphs, whose nodes are individuals and links represent various quantifiable social interactions.

In contrast, recent approaches with methodology rooted in statistical physics focus on large networks, searching for universalities both in the topology of the web and in the dynamics governing its evolution. These combined theoretical and empirical results have opened [...]

[...] these advances we choose to investigate in detail the collaboration network of scientists.

Recently Newman has taken an important step towards applying modern network ideas to collaboration networks [10,11]. He studied several large databases focusing on several fields of research over a five-year period, establishing that collaboration networks have all the general ingredients of small world networks: they have a surprisingly short node-to-node distance and a large clustering coefficient [10], much larger than the one expected from a random Erdős-Rényi type network of similar size and average connectivity. Furthermore, the degree distribution appears to follow a power law [11].

Our study takes a different, but complementary approach to collaboration networks than that followed by Newman. We view collaboration networks as prototypes of evolving networks, where the accent is on dynamics and evolution. Indeed, the co-authorship network constantly expands by the addition of new authors to the database, as well as the addition of new internal links representing papers co-authored by authors that were already part of the database. The topological properties of these networks are determined by these dynamical and growth processes. Consequently, in order to understand their topology, we first need to understand the dynamical process that determines their evolution. In this respect Newman's study focuses on the static properties of the collaboration graph, while our work investigates the dynamical properties of these networks. We show that such a dynamical approach can explain many of the static topological features seen in the collaboration graph.

It is important to emphasize that the properties of the co-authorship network are not unique. The WWW is also a complex evolving network, where nodes and links are added (and removed) at a very high rate, the network topology being profoundly determined by these dynamical features [3,20,21,25]. The actor network of Hollywood is very similar to the co-authorship network, because it grows through the addition of new nodes (actors) and new links (movies linking existing actors) [2,4,14]. Similarly, the nontrivial scaling properties of many cellular [23], ecological [24] or business networks are all determined by dynamical processes that contributed to the emergence of these networks. So why single out the collaboration network as a case study? A number of factors have contributed to this choice. First, we needed a network for which the dynamical evolution is explicitly available. That is, in addition to a map of the network topology, it is important to know the time at which the nodes and links have been added to the network, which is crucial for revealing the network dynamics. This requirement reduces the currently available databases to two systems: the actor network, where we can follow the dynamics by recording the year of the movie release, and the collaboration network, for which the paper publication year allows us to track the time evolution. Of these two, the co-authorship data is closer to a prototypical evolving network than the Hollywood actor database for the following reasons: in the science collaboration network the co-authorship decision is made entirely by the authors, i.e. decision making is delegated to the level of individual nodes. In contrast, for actors the decision often lies with the casting director, a level higher than the node. While in the long run this difference is not particularly important, the collaboration network is still closer in spirit to a prototypical evolving network such as social systems or the WWW.

Our work stands on three pillars. First, we use direct measurements on the available data to uncover the mechanism of network evolution. This implies determining the different parameters and uncovering the various competing processes present in the system. Second, building on the mechanisms and parameters revealed by the measurements, we construct a model that allows us to investigate the large-scale topology of the system, as well as its dynamical features. The predictions offered by a continuum theory of the model allow us to explain some of the results that were uncovered by our measurements as well as Newman's. The third and final step will involve computer simulations of the model, serving several purposes: (i) they allow us to investigate quantities that could not be extracted from the continuum theory; (ii) they verify the predictions of the continuum theory; (iii) they allow us to understand the nature of the measurements we can perform on the network, explaining some apparent discrepancies between the theoretical and the experimental results.

II. DATABASES: CO-AUTHORSHIP IN MATHEMATICS AND NEURO-SCIENCE

For each research field whose practitioners collaborate in publications one can define a co-authorship network which is a reflection of the professional links between the scientists. In this network the nodes are the scientists and two scientists are linked if they wrote a paper together. In order to get information on the topology of a scientific co-authorship web one needs a complete dataset of the published papers, ideally from the birth of the discipline until today. However, computer databases cover at most the past several decades. Thus any study of this kind needs to be limited to only a recent segment of the database. This imposes unexpected challenges that need to be addressed, since such limited data availability is a general feature of most networks.

The databases considered by us contain article titles and authors of all relevant journals in the field of mathematics (M) and neuro-science (NS), published in the period 1991-98. We have chosen these two fields for several reasons. A first factor was the size of the database: biological sciences or physics are orders of magnitude larger, too large to address their properties with reasonable computing resources. Second, the selected two fields offer sufficient diversity by displaying different publishing patterns: in NS collaboration is intense, while mathematics, although there is an increasing tendency towards collaboration [26], is still a basically single-investigator field.

In mathematics our database contains 70,975 different authors and 70,901 papers for an interval spanning eight years. In NS the number of different authors is 209,293 and the number of published papers is 210,750. Complete statistics for the two databases are summarized in Fig. 1, where we plot the cumulative number of papers and authors for the period 1991-98. We consider a "new author" to be an author who was not present in the database from 1991 up to a given year.

FIG. 1. (a) Cumulative number of papers for the M and NS databases in the period 1991-98. The inset shows the number of papers published each year. (b) Cumulative number of authors (nodes) for the M and NS databases in the period 1991-98. The inset shows the number of new authors added each year.

Before proceeding we need to clarify a few methodological issues that affect the data analysis. First, in the database the authors are represented by their surname and initials of first and middle name, thus there is a source of error in distinguishing some of them. Two different authors with the same initials and surname will appear to be the same node in the database. This error is important mainly for scientists of Chinese and Japanese descent. Second, a given author occasionally uses one initial in some publications and two in others, and in such cases he/she will appear as separate nodes. Newman [10] showed that the error introduced by those problems is of the order of a few percent. Our results are also affected by these methodological limitations, but we do not expect them to have a significant impact on our results.

III. DATA ANALYSIS

We will [...] the parameters that are crucial to the understanding of the processes which determine the network topology, offering input for the construction of an appropriate model.

A. Degree distribution follows a power-law

A quantity that has been much studied lately for various networks is the degree distribution, P(k), giving the probability that a randomly selected node has k links. Networks for which P(k) has a power-law tail are known as scale-free networks [3,13]. On the other hand, classical network models, including the Erdős-Rényi [27,28] and the Watts and Strogatz [4] models, have an exponentially decaying P(k) and are collectively known as exponential networks. The degree distributions of both the M and NS data indicate that collaboration networks are scale-free. The power-law tail is evident from the raw, uniformly binned data (Fig. 2a,b), but the scaling regime is better seen on the plot that uses logarithmic binning, reducing the noise in the tail (Fig. 2c). The cumulative data with logarithmic binning indicates $\gamma_M = 2.4$ and $\gamma_{NS} = 2.1$ for the two databases [29].

FIG. 2. Degree distribution for the (a) M and (b) NS database, showing the data based on the cumulative results up to years 1993 (×) and 1998 (•). (c) Degree distribution shown with logarithmic binning computed from the full dataset cumulative up to 1998. The lines correspond to the best fits, and have the slope 2.1 (NS, dotted) and 2.4 (M, dashed).
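
The measurement described in this excerpt (scientists as nodes, a link between any two who wrote a paper together, then a degree distribution P(k) with logarithmic binning) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `coauthor_degrees` and `log_binned_pk` are hypothetical, author names are assumed to be already normalised to one string per node, and the bin base of 2 is an assumption, since the excerpt does not state the bin widths used.

```python
from collections import defaultdict
from itertools import combinations
import math


def coauthor_degrees(papers):
    """Return {author: number of distinct co-authors}.

    `papers` is an iterable of author-name lists, one list per paper.
    Two authors are linked if they share at least one paper.
    """
    neighbours = defaultdict(set)
    for authors in papers:
        unique = set(authors)
        for a in unique:
            neighbours[a]  # create the node even for single-author papers
        for a, b in combinations(sorted(unique), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {a: len(n) for a, n in neighbours.items()}


def log_binned_pk(degrees, base=2):
    """Estimate P(k) for k >= 1 using bins [base**i, base**(i+1)).

    Each bin count is divided by the number of integer degrees the bin
    covers and by the total number of nodes, so that wide bins in the
    tail stay comparable with narrow bins at small k.
    Returns a list of (geometric bin centre, estimated P(k)).
    """
    degrees = list(degrees)
    n_nodes = len(degrees)
    counts = defaultdict(int)
    for k in degrees:
        if k < 1:
            continue  # isolated nodes have no place on a log axis
        i = 0
        while base ** (i + 1) <= k:  # integer arithmetic avoids log rounding
            i += 1
        counts[i] += 1
    result = []
    for i in sorted(counts):
        lo, hi = base ** i, base ** (i + 1)
        width = hi - lo  # number of integer degrees in [lo, hi)
        result.append((math.sqrt(lo * hi), counts[i] / (n_nodes * width)))
    return result


if __name__ == "__main__":
    toy_papers = [["A", "B", "C"], ["A", "D"], ["E"]]  # made-up data
    toy_degrees = coauthor_degrees(toy_papers)
    for centre, pk in log_binned_pk(toy_degrees.values()):
        print(f"k ~ {centre:.2f}  P(k) ~ {pk:.3f}")
```

On real data, plotting the returned pairs on log-log axes and fitting a straight line to the tail would give a slope estimate comparable in spirit to the exponents quoted above, although the fitting procedure itself is not specified in this excerpt.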