Alumni Network Analysis

Neil Rubens1,2, Martha Russell1,3, Rafael Perez2, Jukka Huhtamäki1,4, Kaisa Still1,5, Dain Kaplan6, Toshio Okamoto2

http://innovation-ecosystems.org/alumni-network

1 Innovation Ecosystems Network, Media X, Stanford University, USA
2 University of Electro-Communications, Japan
3 HSTAR Institute, Media X, Stanford University, USA
4 Tampere University of Technology, Finland
5 VTT Technical Research Centre, Finland
6 Tokyo Institute of Technology, Japan

Abstract—Alumni connections are important resources that contribute to university evaluation. Even though alumni connections represent networks, they have mostly been evaluated as tabular data (e.g. by providing average salary, employment rate, etc.). This ironically disregards all qualities of a network, from which an alumni network gets its name. It is desirable to evaluate an alumni network as a network, because networks have the potential to provide very insightful information. Evaluation of alumni networks as networks has not been feasible in the past due to data fragmentation (neither universities nor companies are willing to share meaningfully significant data in its entirety). Recently the feasibility of such analysis has changed, due to new trends towards democratization of information, accelerated by the Web 2.0 user-generated content phenomenon and crowd-sourcing mentality. Utilizing web-crawlers, we actively harvested data and assembled a dataset on alumni in leadership positions in technology-based industries. Moreover, we included a high proportion of startup companies, which allowed us to evaluate alumni networks with respect to entrepreneurial as well as technology involvement. We show that by analyzing alumni connections as networks, it is possible to uncover new patterns, as well as provide a new way of examining the old.

Index Terms—alumni networks, university metrics, network analysis, network visualization, entrepreneurship, engineering

I. INTRODUCTION

Rankings of universities and their programs abound. Some rankings are based on numerical scores; some are based on expert judgment. Generally, the rankings are endorsed by those at the top and held suspect by some of the others. The factors that contribute to the preeminence of educational institutions are complex, and controversies surround nearly every ranking. The top-ranked institutions leverage rankings in their recruitment of faculty and students, in their appeals to donors, in outreach to prospective employers of their graduates, and in their requests for program and research funding. Around the world, national innovation policy groups use measures of alumni impact in their analysis and policy recommendations.

Data on alumni have been used to estimate the quality and impact of educational institutions. Tabular data about individuals’ starting salaries, employment rates, and donations have been used to determine averages and comparisons. Some analyses refer conceptually to the network of an institution’s alumni, even though the analyses disregard the relationship qualities of a network, from which an alumni network gets its name. Our objective, therefore, is to provide a long overdue evaluation of the relationship characteristics of alumni networks. We take two complementary approaches for this analysis: (1) visual: by providing a visualization of the network for a comprehensive and explorative view; and (2) numerical: by providing metrics that capture salient features of the network.

In defense of traditional approaches, the network analysis of alumni has been hindered by lack of data suitable for network analysis. Gathering data about alumni is time-intensive; and the limited data available on alumni is considered a precious resource and is closely guarded by universities. To exacerbate the problem, the release of corporate information about employees and their education is limited, as well. The available data lacks standardized units of measure, is disjointed, and is problematic for analysis. It is therefore no wonder that universities have used their data in a very limited manner, namely, self-benchmarking or in support of the fund-raising efforts of their development offices.

Due to recent trends towards democratization of knowledge and information, accelerated by the Web 2.0 user-generated content phenomenon and crowd-sourcing mentality, a significant amount of data on alumni is becoming available, though still scattered throughout the web. Utilizing web-crawlers, we actively harvested data and assembled a dataset on alumni in executive, investor and board level positions in technology-based industries (including many startup companies) and the service sectors that support them [1] (Section II-C). This combination provides information about both entrepreneurial and technological involvement of alumni. In order to capture network properties, we have collected not only data about the direct university alumni connections, but also data about their employment histories, company information, financial organizations, investment activities, and most importantly the relations/links that interconnect these entities.

We propose a novel approach to evaluate the connectivity of alumni based on their leadership roles in technology-based businesses. As the available data increases, the proposed approach can be more widely applied to gain broader and deeper insights into alumni networks. The goal of this paper is to demonstrate ways in which this could be accomplished. For example, we investigate the role an alumni network plays in enhancing a personal network (Section III-B), compare the alumni networks of different universities (Section III-A), etc. The rest of this paper focuses on providing a brief overview of these possibilities.

II. METHOD

A. Conceptual Approach

Our approach is based on a network analysis of alumni in leadership roles in technology-based companies, the service agencies that support them, and their investment firms. The term network refers to a pattern of relationships created by the interconnection of several actors. Alumni and their affiliated organizations are of special interest for this analysis. By leadership we refer to individuals who are key personnel, executives, or board members of technology-based companies or the service organizations that support them. In some cases individuals have invested in companies – a role we also designate as a leadership role.

B. Network Analysis

The analysis of the overall network structure, the character of the network, and the roles of the network actors are several of the measures of interest in network analysis.

In this analysis there are four types of actors: alumni, the educational institutions from which they have graduated, the companies in which they have held leadership roles (executive, board, investor), and investment and venture capital firms, which are segmented as a unique class of company. The network of relationships between actors, shown as nodes, can be modeled either as one-mode or two-mode. In one-mode networks all the nodes are of the same type. In an alumni network, for example, connections could be established among alumni who attended the same educational institution. In two-mode networks, there are two types of nodes – alumni and the organizations in which they have participated (educational institutions, companies, investment and venture firms).

Edges represent relationships between nodes: a person is an alumnus of an educational institution, a person works at a company, a person invested in a company or in a financial organization that invested in a company, etc. Edges may be directed, in which case an arrow indicates the direction of the relationship, or undirected. Edge direction depends on semantics, e.g. a person works at a company or a company employs a person. From our web-crawled data, this clarity might be determined with deeper analysis; in the analysis reported here, we consider both directions and represent the relationships through undirected edges.

C. Data

The Innovation Ecosystems (IEN) Dataset [1] is a collection of over 140,000 records built by web-crawling English-language, socially constructed data about technology-oriented companies; it is updated quarterly. As of August 2010, it includes data about more than 44,000 companies (including a high proportion of startup companies), their executives and board personnel (over 60,000 records), investment organizations (over 5,100 records), and financial transactions totaling over US$ 410 billion. People included in the dataset are the key employees in their respective companies (e.g. founders, executives, lead engineers, etc.), members of boards of advisors, or investors. We have further enhanced this data by adding data about 2,100 educational institutions and 5,800 personal educational affiliations; these additions were obtained from the biographical references and notes describing the individuals.

Note that the dataset inherits both the advantages and the disadvantages of socially constructed data. Some of the advantages are large coverage, timeliness, and community verification of data quality. Some of the disadvantages are potentially erroneous data and public bias (vs. the editorial bias often extant in traditional data settings).

III. ANALYSIS

Two different yet complementary approaches were used to analyze the data: (1) preparing a graphic representation of the network for visual analysis of patterns and (2) analyzing numeric features of the networks. The number of patterns could be very large, and what one reader finds most interesting may be of lesser interest to others. We highlight some of the patterns that we have found interesting and encourage readers to discover additional patterns based on their personal preferences and objectives.

This analysis is intended to be demonstrative. Since our data describe leadership roles in technology-based businesses, it is not intended to be documentary or prescriptive. Nonetheless, due to the large number of records and to the interconnected nature of the records, this analysis does identify several interesting patterns. It poses several opportunities for further research.

A. Visual Analysis


[Figure 1 panels: (a) Stanford University; (b) Harvard University; (c) MIT; (d) UC Berkeley.]

Figure 1: Intra-University Networks. The networks of the above universities are expanded in a breadth-first manner up to a depth of 2 (showing the university, its alumni, and the companies they are associated with through employment, investment or other activities) (Section ??). The size of a node reflects its degree (scaled logarithmically).

The use of graphic images to represent network configurations is important because it allows investigators to gain new insights into the patterning of connections [2]. We use Gephi [3] for graph visualization and layout. We performed network layout in two stages: (1) a cluster-based stage and (2) a relation-based compacting stage. In the cluster-based stage we use the OpenOrd layout algorithm [4], since it produces a layout that makes clusters easier to distinguish based on the interconnections between the nodes. We then apply ForceAtlas [3] to compact the graph (nodes that are connected are pulled closer together). The network figures are embedded in the document as scalable vector graphics (SVG), so it is possible to look at network details by zooming in.

A graphic analysis of intra-university alumni networks is produced in the following manner. First, the university node is selected; then all of the nodes that are connected to it are added (in this case, the university’s alumni); finally, all of the nodes to which alumni are connected are added (in this case, the companies at which alumni have worked, in which they have invested, or on whose advisory boards they have served). In the intra-university network, other universities may also be present (a person may be an alum of several universities). A noticeable portion of MIT alumni have associations with other universities; Harvard alumni do so to a lesser degree, followed by Stanford alumni, and UC Berkeley alumni show the fewest associations with other universities. The inter-university relationships are described further in Section ??.
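The expansion procedure described above can be sketched as a depth-limited breadth-first search. This is a minimal illustration and not the authors' implementation: the edge list, the node identifiers (prefixed `univ:`, `person:`, `company:`) and the function names are hypothetical, and only the Python standard library is used.

```python
from collections import deque

# Hypothetical toy edge list (not taken from the IEN dataset): undirected
# relations between universities, people, and companies.
EDGES = [
    ("univ:U1", "person:ann"),
    ("univ:U1", "person:bob"),
    ("univ:U2", "person:bob"),       # bob is an alum of two universities
    ("person:ann", "company:acme"),
    ("person:bob", "company:acme"),
    ("person:bob", "company:beta"),
    ("person:cat", "company:beta"),  # cat is not an alum of U1
    ("person:cat", "company:gamma"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def expand(adjacency, start, max_depth):
    """Breadth-first expansion from `start`; returns {node: depth}."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in adjacency[node]:
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


adjacency = build_adjacency(EDGES)
intra = expand(adjacency, "univ:U1", max_depth=2)
for node in sorted(intra, key=lambda n: (intra[n], n)):
    print(intra[node], node)
```

In this toy graph the second university appears at depth 2 because one alumnus is linked to both institutions, which mirrors the remark that other universities may be present in an intra-university network. The person who is not an alumnus of the selected university lies at depth 3 and is therefore left out.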

Intra-university networks were produced for Stanford University, Harvard University, University of California (Berkeley), and MIT. The alumni networks produced for each educational institution are interesting in their own right; they are compared in Figure 1. Several differences are immediately apparent across the intra-university networks. First, the number of company nodes in relation to the number of alumni nodes differs. Stanford University has a significantly higher ratio of companies per alumnus in leadership roles, followed by Harvard, and trailed by Berkeley and MIT. A high ratio of company nodes indicates that alumni have been involved with multiple companies – either through employment, advisory or investment activities. In addition, the number of highly connected alumni (large nodes with many connections located on the perimeter) differs significantly between universities (we further explore this in Section III-B). One particular characteristic of highly connected alumni stands out, namely their collaboration patterns. Stanford’s densely connected alumni are highly likely to collaborate with fellow alumni (indicated by the company nodes being pulled away from the highly connected alumnus towards other less-connected alumni in the center). In the networks of other universities, the collaboration of highly connected individuals with their fellow alumni is evident but to a lesser degree.

Harvard alumni appear active in leadership positions in technology-based startups, even more so than MIT alumni (Figure 1b vs. 1c). The relatively lower level of MIT alumni may be attributable to the focus of this dataset on leadership positions in the organizations. While engineers play a key role, they often do so in a technology development capacity rather than in the leadership positions that are visible in public relations communications. Some support for this explanation may be seen in the relatively large distance in Figure 2 between Microsoft and University of Washington, even though a large number of engineers at Microsoft are indeed from University of Washington.
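Two of the quantities read off the figures above, the ratio of company nodes to alumni nodes and the set of highly connected alumni, are simple to compute once an intra-university network is available. The sketch below uses invented toy values. The mapping `ALUMNI_COMPANIES`, the function names and the degree threshold are assumptions made for illustration, not values or definitions from the paper.

```python
# Hypothetical depth-2 network of one university: each alumnus maps to
# the set of companies he or she is linked to (toy values only).
ALUMNI_COMPANIES = {
    "ann": {"acme"},
    "bob": {"acme", "beta", "delta"},
    "cho": {"beta"},
    "dev": {"acme", "beta", "delta", "gamma", "omega"},
}


def companies_per_alumnus(alumni_companies):
    """Number of distinct company nodes divided by number of alumni nodes."""
    companies = set().union(*alumni_companies.values())
    return len(companies) / len(alumni_companies)


def highly_connected(alumni_companies, min_links):
    """Alumni with at least `min_links` company links (threshold is arbitrary)."""
    return sorted(
        alum for alum, linked in alumni_companies.items()
        if len(linked) >= min_links
    )


print(companies_per_alumnus(ALUMNI_COMPANIES))
print(highly_connected(ALUMNI_COMPANIES, min_links=3))
```

Counting distinct companies, rather than links, keeps the ratio from being inflated when several alumni are attached to the same company; whether the visual impression in Figure 1 corresponds to this exact definition is not stated in the text.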




Figure 2: Inter-University Network (between Stanford, Harvard, MIT, Berkeley) (Section ??). Network is obtained by starting with the nodes of the above mentioned universities and performing a breadth-first expansion up to the depth of 3.

A graphic representation of the alumni-based inter-relations between universities, shown in Figure 2, was produced as follows. Four universities were selected for analysis: Stanford University, University of California (Berkeley), Harvard University and Massachusetts Institute of Technology (MIT). From these nodes we performed a breadth-first expansion up to a depth of three: the 1st level consists of the alumni of these universities, the 2nd level of the companies with which alumni have relations, and the 3rd level of the entities/nodes that are linked to the previous levels, including financial organizations, company employees, etc. Since we are primarily interested in relationships among alumni, all other entities are faded out, except for the above-mentioned universities and their alumni. In addition, we take a brief look at the relations between alumni and investment firms (a very important factor for entrepreneurship). Therefore, nodes of financial organizations are not faded out.

Two distinct groups – universities (in the lower left corner) and financials (in the upper right corner) – are visible in the subdued edges in Figure 2. The distance from the universities to the cloud of ‘financial’ clusters also varies. In particular, Stanford and Berkeley are rather close to the financial cloud. This may be explained by the geographical proximity of these universities to one of the largest sources of venture funding – Silicon Valley. While the universities themselves are not embedded within the financial clusters, a noticeable proportion of alumni are deeply connected within the financial clusters by having direct or indirect relations with multiple financial organizations. Stanford has the largest number of alumni connected to the financial cluster, followed by Harvard (even though the university itself is relatively distant from the financial cluster), then by Berkeley, and finally by MIT with only a few alumni.

The proximity between alumni and their alma maters appears to differ significantly. Berkeley alumni tend to be clustered together, MIT alumni to a somewhat lesser degree, and Stanford and Harvard alumni are rather dispersed. Proximity between universities differs as well. Stanford and Berkeley are close together (many alumni hold leadership positions in the same companies). One of the likely explanations for this network proximity is the geographical proximity of both universities to Silicon Valley, where many of the investment firms and startup companies are located. Harvard and MIT do not appear to have as strong relations with other universities in these settings.


Figure 3: Universities within the Business Network (partial snapshot) (Section ??). Note that node locations differ significantly from Figure 2 due to additional forces exerted by the very large number of nodes and links of the complete network (144,685 nodes and 129,423 links). For better visibility, all entity types except universities are faded out.


Through alumni, universities become indirectly linked to a variety of business entities – technology-based companies, the service organizations that support them, and investment firms. The positions of universities within the technology-based business network (Figure 3) are determined by their direct links only to the alumni. However, the proximity and location of universities within the business network (Figure 3) differ from those of the inter-university network (Figure 2). It should be noted that a large number of nodes and links that were not included in the inter-university network are, in fact, included in the full network layout. The cluster-based and force-based layout algorithms used in this analysis place nodes that have many interconnections close together. Moreover, both direct and indirect links influence the position of nodes within the network. Hence, the patterns of nodes differ significantly between the Business Network and the Inter-University Network.

Let us look at the proximity between universities and companies. While Microsoft and Yahoo are close to many major universities, Google appears to be distant from them. Discovering the precise explanation for this warrants further investigation, but let us suggest two hypotheses. As we briefly discussed in Section ?? and Section II-C, our dataset is focused on key people within the company (e.g. those mentioned in press releases). Unlike many other companies, Google tends to give credit to its engineers, e.g. names of engineers are mentioned in press releases. In addition, Google had experienced very rapid employee growth, which has required establishing relationships with many universities to meet hiring goals.

B. Data Analysis

In addition to examining networks visually, we use several network measures to reveal the characteristics and patterns of the underlying network. One of the biggest advantages of numerical analysis of network data is the ability to analyze very large networks; in visual analysis, patterns in large networks quickly become difficult to discern (Figures 2 and 3). For the numerical analysis, we used the full set of data as described in Section II-C; the constructed network contains 144,685 nodes and 129,423 links, including over 2,100 educational institutions. Due to space limitations, we have chosen to report network properties of the 20 universities with the largest number of alumni included in our dataset.

Network metrics numerically express characteristics and patterns of the underlying network. For this analysis, we have chosen to use the following network measures: centrality (betweenness centrality and closeness centrality) and eccentricity. Centrality reflects the relative importance of a node within the graph. Betweenness and closeness centrality are typical measures of centrality [53]. Betweenness centrality can be thought of as a kind of bridge/broker score, a measure of how much the connections between other nodes in the network would be disrupted by removing that node. If an alum has a very novel and highly desired expertise, s/he may provide crucial and rather exclusive links for doing business in that domain. More precisely, betweenness centrality measures how frequently a node appears on shortest paths between nodes in the network. Closeness centrality, on the other hand, indicates how ‘central’ the node is, i.e. how close the node is to all of the other nodes in the network. More precisely, it is the inverse of the average distance from the node to all other nodes within the network. For example, a person who knows many other people directly or indirectly (not necessarily as exclusively as in betweenness centrality) would have high closeness centrality. Another feature of the network that we use is eccentricity. Eccentricity is somewhat the inverse of a centrality measure, indicating the isolation distance, i.e. how far a node is from the node most distant from it in the graph.

While it is recognized that institutional size may play a role in the number of alumni taking leadership positions in technology-based companies, as well as in the number of relationships established among alumni, the metrics in this analysis were not normalized for the size of the educational institution, for the reasons outlined below (supplemental, alumnus-centered metrics are provided in Section ??). Normalizing with respect to the number of alumni would diminish the influence of networks with many members. The size of the network does matter, so it may not be warranted to disregard it. In addition, the relationship between the number of alumni and the network metrics is non-linear and is different for each metric, so there appears to be no clear argument in favor of normalization. For these reasons we have used raw numbers.

Several properties of alumni are essential components of these networks. To investigate properties of alumni networks, we have removed university nodes from the network, using university affiliation as a property of alumni rather than as a connection (in the later Section ??, we analyze and compare it with the full network); this way we can examine the actual connections that alumni have. Table ?? shows numeric values for the number of alumni records, median betweenness centrality, closeness centrality, and eccentricity (the median value is resilient to outliers and is indicative of a typical alum in our data). In an examination of median eccentricity, universities cluster into two groups. One group has a median eccentricity of about 20; the other group has an eccentricity close to 3. The low value of eccentricity (i.e. 3) occurs when the longest path goes through only three nodes. This is indicative of the network being fragmented (i.e. not interconnected). Similar patterns are revealed with respect to closeness centrality. Figure 4a shows the graph of alumni’s betweenness centrality. People with maximum centrality have similar values, which then drop very quickly following a long-tailed distribution. Stanford University has a significantly higher centrality score (many people with very high betweenness centrality), with the other group of universities led by Harvard but rather close to each other.

In Section ??, we looked at the network metrics of alumni without the presence of their affiliated universities. By adding the university node, connecting it to the alumni, and comparing how this affects the metrics, we can see the potential network effect of a university and its alumni. The actual impact would most likely be smaller, since the assumption that all alumni are connected to each other through the university (made for this analysis) is not realistic. Table ?? shows numeric values for the number of alumni records, median betweenness centrality, closeness centrality, and eccentricity. Comparing this data with Table ?? and Section ??, a very interesting revelation is the potential that the university network has to remedy the low eccentricity of 3 and bring it on par with the rest of the universities (eccentricity close to 20). High eccentricity implies that a person can follow a long path of connections and be indirectly connected to a larger network; by the same token, a university connection may have a smaller relationship influence for alumni with high eccentricity. The presence or absence of the university nodes in the network analysis has a similar effect on median closeness centrality.

Figure 4b shows the graph of alumni betweenness centrality. Interestingly, in comparison with the network in which the university was not present (Figure 4a), the order of the most ‘connected’ individuals remained remarkably similar (only Oxford and University of Washington changed places). As in Figure 4a, Stanford still has a significantly higher metric. However, Harvard takes a strong second position. Overall, the magnitude of betweenness centrality changed by a factor of 1.8, showing that even ‘connected’ individuals may significantly benefit from the university network.
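The effect of adding a university node on eccentricity and closeness can be illustrated on a small graph. The following is a sketch under stated assumptions rather than the procedure used for the reported tables: the toy edges and node names are invented, distances are hop counts on undirected edges, and both metrics are evaluated only over the nodes reachable from the node in question (an assumption about how fragmented networks are handled, which the text does not specify).

```python
from collections import deque


def build(edges):
    """Undirected adjacency map from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def bfs_distances(adjacency, source):
    """Hop counts from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def eccentricity(adjacency, node):
    """Distance from `node` to the farthest node reachable from it."""
    return max(bfs_distances(adjacency, node).values())


def closeness(adjacency, node):
    """Inverse of the average distance to the other reachable nodes."""
    dist = bfs_distances(adjacency, node)
    others = len(dist) - 1
    return others / sum(dist.values()) if others else 0.0


# Hypothetical toy data: p1..p3 are alumni of one university U,
# q1..q3 are other people, c1..c4 are companies.
EDGES = [
    ("p1", "c1"), ("p2", "c1"),                # small isolated fragment
    ("p3", "c2"), ("q1", "c2"), ("q1", "c3"),  # longer chain
    ("q2", "c3"), ("q2", "c4"), ("q3", "c4"),
]
ALUMNI = ["p1", "p2", "p3"]

without_univ = build(EDGES)
with_univ = build(EDGES + [("U", alum) for alum in ALUMNI])

for alum in ALUMNI:
    print(alum,
          eccentricity(without_univ, alum), eccentricity(with_univ, alum),
          round(closeness(without_univ, alum), 3),
          round(closeness(with_univ, alum), 3))
```

In this toy case the alumnus in the small fragment has a low eccentricity until the university node joins the fragment to the longer chain, while the alumnus who already sits on the long chain is unaffected. This is the same qualitative pattern as the one described above for the two groups of universities; the actual magnitudes depend entirely on the data.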


[Figure 4 panels: (a) Betweenness Centrality of Alumni Nodes in Networks (university and alumni nodes are disconnected); (b) Betweenness Centrality of Alumni Nodes in Networks (university and alumni nodes are connected).]

Figure 4: Betweenness centrality (y-axis) of alumni nodes in the network: (a) disconnected from the university node; (b) connected to the university node. The x-axis corresponds to alumni ordered in descending order of centrality for each of the universities. Note that the scale of the y-axis differs between (a) and (b).
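Curves of the kind plotted in Figure 4 can be derived by computing betweenness centrality for every node and then sorting the alumni of each university in descending order. The sketch below uses Brandes' algorithm for unweighted, undirected graphs as one standard way to obtain unnormalized betweenness; the paper does not state which implementation was used. The toy edges, the `UNIVERSITY_OF` mapping and the function names are hypothetical.

```python
from collections import deque
from statistics import median


def betweenness(adjacency):
    """Unnormalized betweenness centrality (Brandes, unweighted, undirected)."""
    score = dict.fromkeys(adjacency, 0.0)
    for source in adjacency:
        order = []                           # nodes by non-decreasing distance
        preds = {v: [] for v in adjacency}   # predecessors on shortest paths
        sigma = dict.fromkeys(adjacency, 0)  # number of shortest paths
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = dict.fromkeys(adjacency, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: s / 2.0 for v, s in score.items()}


def ranked_curves(scores, university_of):
    """Per-university list of alumni scores in descending order."""
    curves = {}
    for person, university in university_of.items():
        curves.setdefault(university, []).append(scores[person])
    return {u: sorted(v, reverse=True) for u, v in curves.items()}


# Hypothetical toy network; affiliation is an attribute, not an edge,
# as in the setting of panel (a).
EDGES = [("ann", "acme"), ("eve", "acme"), ("bob", "acme"), ("bob", "beta"),
         ("cat", "beta"), ("cat", "gamma"), ("dan", "gamma")]
UNIVERSITY_OF = {"ann": "U1", "bob": "U1", "eve": "U1", "cat": "U2", "dan": "U2"}

adjacency = {}
for a, b in EDGES:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

curves = ranked_curves(betweenness(adjacency), UNIVERSITY_OF)
for university, values in sorted(curves.items()):
    print(university, values, "median:", median(values))
```

Reporting the median alongside the sorted curve follows the text's use of median values as indicative of a typical alum; in long-tailed data such as this, a few brokers can dominate the top of the curve while the median stays near zero.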




IV. CONCLUSION

This analysis proposed and demonstrated a new methodological lens for understanding the role alumni play in the reputation of educational institutions. Using four well-known universities as examples, this demonstration has shown how online web-crawling can produce data about alumni and their alma maters, as well as about the companies they have founded, funded and led. Network metrics and social network analysis were used to demonstrate alumni-centric, institution-centric, and business cluster-centric analyses. Suggestions were made for extension of these data-gathering and network analysis approaches.

Recent applications of network analysis have demonstrated their power in understanding social norms, inter-firm relationships, and influence. Continuing developments combining network analysis and machine learning are opening opportunities for predictive methods as well. Alumni and their connections – to several educational institutions and to several business entities – provide a data domain with deep dimension. Alumni networks can be both simple and complex. The complexity of the relationships among alumni active in technology-based business affords extensive inquiry into many-mode, small and large-scale, directed and time-scaled networks. The authors invite collaboration on these frontiers.

Legends, novels, and films have been made about the academic cohort – a graduating class, a student course project team, a laboratory group. Experiences at educational institutions are often profound and memorable. The connections formed through those experiences – through processes of self-discovery, learning, creativity, invention, collaboration – enable the personal and professional contributions of graduates.

Educational institutions stand to benefit substantially from better understanding the connections of their alumni. Insights from this understanding could be leveraged to guide the curricula and enrichment programs that comprise the educational experience. The power of visualization can be harnessed to develop a shared mental model, among faculty, administrators and donors, toward which resources will be applied. This might include curricular and extracurricular attention to students’ personal and professional network development in a global environment, in which life-long and life-wide learning yields competitive advantage.

REFERENCES

[1] N. Rubens, K. Still, J. Huhtamaki, and M. G. Russell, “Leveraging social media for analysis of innovation players and their moves,” tech. rep., Media X, Stanford University, Feb. 2010.
[2] L. C. Freeman, Encyclopedia of Complexity and Systems Science, ch. Methods of Social Network Visualization. Berlin: Springer, 2009.
[3] M. Bastian, S. Heymann, and M. Jacomy, “Gephi: An open source software for exploring and manipulating networks,” 2009.
[4] S. Martin, W. M. Brown, R. Klavans, and K. Boyack, “OpenOrd: An Open-Source Toolbox for Large Graph Layout,” in SPIE Conference on Visualization and Data Analysis (VDA), 2011.
[5] D. Hansen, B. Shneiderman, and M. Smith, Analyzing Social Networks with NodeXL: Insights from a Connected World. Morgan Kaufmann, 2010.