JSS Journal of Statistical Software, February 2008, Volume 24, Issue 6. http://www.jstatsoft.org/

Social Network Analysis with sna

Carter T. Butts
University of California, Irvine

Abstract

Modern social network analysis (the analysis of relational data arising from social systems) is a computationally intensive area of research. Here, we provide an overview of a software package which provides support for a range of network analytic functionality within the R statistical computing environment. General categories of currently supported functionality are described, and brief examples of package syntax and usage are shown.

Keywords: social network analysis, graphs, sna, statnet, R.

1. Introduction and overview

Far more so than many other domains of social science, modern social network analysis (SNA) is a computationally intensive affair. Techniques based on eigensolutions (e.g., eigenvector and Bonacich centrality, multidimensional scaling), combinatorial optimization (e.g., permutation search in equivalence analysis, structural distance/covariance calculation), shortest-path computation (e.g., betweenness centrality, network diameter), and Monte Carlo integration (e.g., QAP and CUG tests) are central to the practice of SNA. Indeed, the overwhelming majority of current research in this area could not be performed without access to inexpensive computational tools.

This dependence on computation for research in social network analysis has helped to spawn a wide array of software packages to perform network analytic tasks. These range from generalist tools such as UCINET (Borgatti et al. 1999), Pajek (Batagelj and Mrvar 2007), STRUCTURE (Burt 1991), StOCNET (Huisman and van Duijn 2003), MultiNet (Richards and Seary 2006), and GRADAP (Stokman and Van Veen 1981) to more specialized applications such as netdraw (Borgatti 2007), SIENA (Snijders 2001), and KrackPlot (Krackhardt et al. 1994), to name a few. While each of these packages has its own assets, there continues to be a need for network analysis software which is simultaneously:


1. General in coverage, incorporating a range of different network analytic techniques;

2. Easily extensible, to allow for the timely incorporation of new methods and/or refinements;

3. Well-integrated with general purpose statistical, computational, and visualization tools, so as to facilitate the use of network analysis in conjunction with both end-user extensions and broader social science methodology;

4. Based on an open codebase which is available for inspection (and hence emulation, correction, and improvement) by the network community;

5. Portable, to allow use by researchers on a variety of computing platforms; and

6. Freely available to network researchers, so as to encourage its use among the widest possible range of scientists, practitioners, and students.

This “wish list” of attributes would seem to be a great deal to ask of any single standalone program. The emergence of open statistical computing platforms such as R (R Development Core Team 2007), however, has provided a feasible means of realizing such objectives. Using R (which is itself free software in the Stallmanian sense; see Stallman 2002), researchers can easily produce and share packages which supply specialized functionality but which are interoperable with other statistical computing tools. In this vein, the sna package was created as a mechanism for fulfilling the above objectives within the R environment. There were additional motivations for the introduction of sna. These were to encourage the migration of the social network community to open source and/or free software solutions, to facilitate the creation of a shared framework for dissemination of new methodological developments, to further the development of statistical network analysis methods by network analysts, and to ease the integration of network methods with those of “standard” statistical analysis.

1.1. Package history

sna began life as a loose collection of S routines (called “Various Useful Tools for Network Analysis in S,” or networkStools) written by the author. These routines were disseminated locally to social network researchers in and around the research community at Carnegie Mellon University and the University of Pittsburgh. The first external use of the toolkit of which the author is aware was the netlogit analysis employed by Ingram and Roberts (2000). The first version of the collection to be generally disseminated (version 0.1) was released in August of 2000, with the first R package version (sna version 0.3) appearing in May of 2001. Multiple releases followed over subsequent years, with the package reaching the “1.0” landmark in August of 2005. Development has been ongoing; as of the time of this writing, the package is on version 1.5.

1.2. sna and statnet

As noted above, a major goal in introducing sna was the creation of a foundation for ongoing development of tools within the network analysis community. The statnet project (Handcock et al. 2003) represents the latest incarnation of that objective, much as BioConductor (Gentleman et al. 2004) serves as a site for tool development within the bioinformatics community. In some sense, then, statnet is the natural “successor” to sna. Reflecting this relationship, sna is now considered to be part of the statnet project, and is fully interoperable with other statnet packages (including network). sna may still be employed as a stand-alone package, however, for users who do not require the full range of functionality provided by statnet.

1.3. Functionality

At present, the sna package includes over 125 functions for the manipulation and analysis of network data. Supported functionality includes:

Functions to compute descriptive indices at the graph or node level. This includes centrality and centralization indices, measures of hierarchy and prestige, brokerage, density, reciprocity, transitivity, connectedness, and the like. It also includes dyad, triad, path, and cycle census statistics. Stand-alone routines to facilitate the comparison of index values across graphs via conditional uniform graph (CUG) tests are included.

Functions to compute geodesic distances, component structure and distribution, and structure statistics (in the sense of Fararo and Sunshine 1964), and to identify isolates.

Functions for positional and role analysis, including structural equivalence and blockmodeling.

Functions for exploratory edge set comparison in the paradigm of Butts and Carley (2005). This includes structural covariance/correlation and distance routines, as well as tools for scaling and visualization of graph sets. Network regression (Krackhardt 1988), canonical correlation analysis, and logistic network regression are also supported. QAP (Hubert 1987; Krackhardt 1987b) and CUG tests are currently implemented for all three approaches.

Functions to generate graph-valued deviates from various stochastic processes. So-called Erdős-Rényi graphs, inhomogeneous Bernoulli graphs, and dyad census conditioned graphs are supported. So are graphs produced by Watts-Strogatz rewiring processes (Watts and Strogatz 1998) and the biased net models of Rapoport (1957) and Skvoretz et al. (2004).

Functions to fit network autocorrelation (also known as spatial autocorrelation; see Anselin 1988) and biased net models.

Functions for network inference (i.e., inferring networks from multiple reports containing missing and/or error-prone data). This includes heuristic estimators such as Krackhardt’s (Krackhardt 1987a) locally aggregated structure estimators and the central graph (Banks and Carley 1994). It also includes model-based methods such as the Romney-Batchelder consensus model (Romney et al. 1986) and the error-rate models of Butts (2003).

Functions for visualization and manipulation of network data (in adjacency matrix form). Standard graph layout methods such as those of Fruchterman and Reingold (1991) and Kamada and Kawai (1989), general multidimensional scaling/eigenstructure methods, and “target” diagrams (Brandes et al. 2003) are included by default, and custom layout routines are also supported. Functions are included to facilitate common tasks, including:
- extracting neighborhoods and egocentric networks;
- symmetrization;
- application of functions to attribute information on neighborhoods (e.g., computing neighbors’ mean attributes);
- dichotomization;
- permutation/relabeling; and
- the creation of interval graphs from spell data.

Data import/export is supported for several basic file formats.

The above includes many of the methods of what is sometimes called “classical” social network analysis, as well as some more recent contributions to the literature. Classical SNA is exemplified by Wasserman and Faust (1994), whose presentation is now canonical. Although the focus of the package has been on social scientific applications, many of the included tools may also be useful for analyzing networks arising from other sources.

1.4. Terminology and data representation

Because sna is a special-purpose toolkit dedicated to social network analysis, describing its functionality requires us to refer to standard SNA concepts and methods. Readers unfamiliar with network analysis may wish to consult the cited references (particularly Wasserman and Faust 1994) for additional details. Some specific terminology and notation is described below.

Throughout this paper, we will be concerned with relational data consisting of a fixed set of entities (called vertices) and a multiset of relationships among those entities (called edges). Our particular focus is on dyadic relationships, in which edges consist of (possibly ordered) two-element multisets on the set of vertices. The elements of an edge are referred to as its endpoints. In the ordered case, the first element is known as the tail (or sender) and the second as the head (or receiver). An edge whose endpoints are identical is called a loop. The combination of an edge set $E$ with vertex set $V$ is said to be a graph (denoted $G = (V, E)$). The size or order of a graph is the number of elements in its vertex set (denoted $|V|$, where $|\cdot|$ is the cardinality operator).

Specific types of graphs may be identified via the constraints satisfied by $E$. If the elements of $E$ are unordered multisets, $G$ is said to be an undirected graph; if edges are ordered multisets, by contrast, $G$ is said to be a directed graph (or digraph). For an undirected graph, the set of vertices tied (or adjacent) to vertex $v$ is called the neighborhood of $v$ (denoted $N(v)$). In the directed case, we distinguish between the set of vertices sending edges to $v$ (the in-neighborhood, or $N^-(v)$) and the set of vertices receiving edges from $v$ (the out-neighborhood, or $N^+(v)$). A graph (directed or otherwise) is simple if it has no loops and no edge has multiplicity greater than one. Finally, a graph’s edge set may be associated with a set of variables such that each edge carries some value. A graph of this kind is said to be valued, as opposed to the contrary unvalued case.

Use of terminology varies somewhat across the social network field, a perhaps unfortunate legacy of the field’s strongly interdisciplinary nature (Freeman 2004). Thus, vertices may also be called “points” or “nodes” (or, in social contexts, “actors” or “agents”). Likewise, edges may be called “lines,” “ties,” or (if directed) “arcs.” The term “network” is often used generically to refer to any relational structure. In other cases, it may be reserved for the actually existing relational structure, with “graph” being employed for that structure’s formal representation. In the latter instance, “tie” is frequently used for an actually existing relationship, with “edge” denoting the formal representation of that relationship. Such terminological subtleties are not required to use sna, but an awareness of them may reduce confusion among users consulting the literature cited within the package manual.

With rare exceptions, sna routines can be used with directed or undirected graphs, with or without loops. Edge values and missing data (i.e., edges whose states are unknown) are supported in many applications as well. Note, however, that many graph theoretic concepts (e.g., connectedness) admit somewhat different definitions in the directed and undirected cases. It is thus important to verify that one is using the settings which are appropriate to the data at hand. Except for functions whose behavior is undefined in the directed case, sna’s functions typically assume by default that one’s data consists of one or more simple, unvalued digraphs.

Relational data can be represented in a number of ways, several of which are currently supported by the sna package. The most basic of these is the adjacency matrix. This is a square matrix $A$ whose element $A_{ij}$ is the value of the $(i, j)$ edge (or the $\{i, j\}$ edge in the undirected case) in the corresponding graph. By convention, $A_{ij}$ is a dichotomous indicator variable where the corresponding graph is unvalued. Such matrices may be passed as matrix objects or as two-dimensional arrays. Adjacency matrices are convenient to work with, but they are inefficient for large, sparse graphs. When working with such data, the use of network (Butts et al. 2007) or sparse matrix (SparseM; Koenker and Ng 2007) objects may be preferred. sna accepts all three such data types interchangeably.
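As a minimal illustration of the adjacency matrix convention, the base R sketch below builds $A$ from a two-column edge list of (tail, head) pairs. In the undirected case, each edge is written into both $A_{ij}$ and $A_{ji}$. The helper edgelist_to_adj is hypothetical and is not an sna function.

```r
# Hypothetical helper (not part of sna): build an n x n adjacency matrix
# from a two-column matrix of (tail, head) vertex indices.
edgelist_to_adj <- function(edges, n, directed = TRUE) {
  A <- matrix(0, nrow = n, ncol = n)
  A[edges] <- 1                              # matrix indexing by (row, col) pairs
  if (!directed) {
    A[edges[, 2:1, drop = FALSE]] <- 1       # mirror each edge: set A[j, i] as well
  }
  A
}

edges <- rbind(c(1, 2), c(2, 3), c(3, 1), c(3, 4))
A_dir <- edgelist_to_adj(edges, n = 4)                     # directed: 4 edges
A_und <- edgelist_to_adj(edges, n = 4, directed = FALSE)   # undirected: symmetric
print(A_dir)
isSymmetric(A_und)
```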

In many instances, one may need to perform operations on multiple graphs at once. Where such graphs are of the same order (i.e., number of vertices), they may be conveniently represented by a three-dimensional array whose first dimension indexes the component adjacency matrices. Alternatively, multiple graphs may be specified by means of a list, which allows the user to pass graph sets of varying orders where required. Within a graph list, single adjacency matrices, adjacency arrays, network objects, and sparse matrix objects may be mixed as desired. Individual graphs are unpacked sequentially in ascending list and array index order prior to computation.
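The stacking convention can be sketched in base R without sna. Here two order-4 graphs are stored in a $2 \times 4 \times 4$ array whose first dimension indexes graphs, and a list holds graphs of different orders. The helper density_dir is hypothetical; it computes the density of a simple, loopless digraph.

```r
# Two order-4 graphs in a 3-D array; the first dimension indexes graphs,
# so g_stack[k, , ] is the k-th adjacency matrix.
g1 <- matrix(c(0, 1, 0, 0,
               0, 0, 1, 0,
               0, 0, 0, 1,
               1, 0, 0, 0), nrow = 4, byrow = TRUE)   # directed 4-cycle
g2 <- matrix(0, 4, 4)                                   # empty graph
g_stack <- array(0, dim = c(2, 4, 4))
g_stack[1, , ] <- g1
g_stack[2, , ] <- g2

# Hypothetical helper: directed density without loops, edges / (n * (n - 1)).
density_dir <- function(A) {
  n <- nrow(A)
  sum(A[row(A) != col(A)]) / (n * (n - 1))
}

# apply() over margin 1 hands each 4 x 4 slice to the function.
dens_stack <- apply(g_stack, 1, density_dir)

# Graphs of different orders cannot share one array, but a list works.
g_list <- list(g1, matrix(1, 3, 3) - diag(3))           # 4-cycle and complete K3
dens_list <- sapply(g_list, density_dir)
```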

Importing relational data into R

Another preliminary issue of obvious concern is the importation of relational data into R. Where such data is stored in matrix or array form, conventional R routines such as read.table and scan may be employed in the usual manner. Similarly, natively saved network objects may be loaded directly into memory without external representation. In addition to these methods, sna includes custom routines for importing relational data in OrgStat NOS and GraphViz DOT formats. Processed relational data can be saved via the above methods, or in the DL format widely used by packages such as Pajek and UCINET. (See also the Pajek import function in network.)
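For the matrix case, a minimal example of this workflow is shown below. The inline text= argument stands in for a real file path; the path shown in the comment is a placeholder, not a file that exists.

```r
# Whitespace-delimited adjacency matrix. In practice the text would live in a
# file read with read.table("path/to/file"); text= keeps this runnable.
txt <- "
0 1 0
0 0 1
1 0 0
"
A <- as.matrix(read.table(text = txt, header = FALSE))
dimnames(A) <- NULL            # drop the V1, V2, ... names read.table assigns
storage.mode(A) <- "double"

# scan() reads the same values as one vector; byrow = TRUE restores row order.
B <- matrix(scan(text = txt, quiet = TRUE), nrow = 3, byrow = TRUE)
```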

Beyond these network-specific approaches, sna also has facilities for converting spell data (i.e., data consisting of intervals in time or other quantities) into interval graphs (West 1996). The eponymously named interval.graph function serves in this capacity, converting an array of spell information into one or more interval graphs; spell-level categorical covariate information may also be included. In addition to simple interval graphs, interval.graph will compute the valued overlap graphs proposed by Butts and Pixley (2004) for use with life history data. In this case, the overlap quantities are stored as edge values in the output adjacency matrix (or matrices, if multiple spell sets were given).
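The idea behind an interval graph can be sketched directly: vertices are spells, and two vertices are adjacent when their spells overlap. The base R helper below (spell_overlap_graph, hypothetical) handles one spell per vertex and records the length of each overlap as an edge value. It is not interval.graph's implementation, and the overlap measures of Butts and Pixley (2004) may be defined differently. In this sketch, spells that merely touch at an endpoint receive a value of 0; that is a choice made here, not a property of the package.

```r
# Hypothetical helper: valued overlap graph from one spell per vertex.
# spells: numeric matrix with columns "start" and "end".
# Entry (i, j) is the length of the overlap of spells i and j (0 if none).
spell_overlap_graph <- function(spells) {
  n <- nrow(spells)
  W <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j) {
        lo <- max(spells[i, "start"], spells[j, "start"])
        hi <- min(spells[i, "end"], spells[j, "end"])
        W[i, j] <- max(0, hi - lo)     # negative gap means no overlap
      }
    }
  }
  W
}

spells <- cbind(start = c(0, 2, 5, 9), end = c(4, 6, 8, 10))
W <- spell_overlap_graph(spells)   # valued overlap graph
A <- (W > 0) * 1                   # dichotomized: the simple interval graph
```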


2. Package highlights

Given the wide scope of the methods implemented within the sna package, we cannot review them all in detail. In this section, however, we summarize the functionality of sna within a number of domains, highlighting specific functions and applications which are likely to be of general interest. Brief examples are also provided within each section to illustrate basic syntax and usage. Additional background and usage details are contained within the package manual, which is distributed with the package itself.

2.1. Random graph generation

sna has a range of tools for random graph generation. Chief among these is rgraph, a “workhorse” function for simulating deviates from both homogeneous and inhomogeneous Bernoulli graph distributions (Wasserman and Faust 1994). Given a set of tie probabilities (which may be specified by graph or by edge), it generates one or more graphs whose edge states are independent Bernoulli trials conditional on the specified parameters.¹
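The independent-edge mechanism amounts to one Bernoulli draw per ordered pair of vertices. The following base R sketch (bernoulli_graph is a hypothetical helper, not the source of rgraph) accepts either a scalar probability or an $n \times n$ probability matrix, and it excludes loops.

```r
# Hypothetical sketch of a Bernoulli graph draw: each off-diagonal cell is an
# independent Bernoulli trial with probability tprob, given either as a scalar
# (homogeneous case) or as an n x n matrix (inhomogeneous case).
bernoulli_graph <- function(n, tprob) {
  P <- matrix(tprob, n, n)          # recycles a scalar; keeps a full matrix as is
  A <- (matrix(runif(n * n), n, n) < P) * 1
  diag(A) <- 0                      # no loops
  A
}

set.seed(1913)
g_hom <- bernoulli_graph(10, 0.3)
g_inh <- bernoulli_graph(10, sapply((1:10) / 10, rep, 10))  # Pr(i -> j) = j/10
```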

In addition to rgraph, sna has several other tools for random graph generation. These currently include:
- rgnm, which draws uniform graphs and digraphs conditional on edge count;
- rguman, which draws uniform digraphs conditional on expected or realized dyad census statistics;
- rgws, which draws from a Watts-Strogatz graph process (Watts and Strogatz 1998); and
- rgbn, which simulates a Skvoretz-Fararo biased net process (Skvoretz et al. 2004; see also Section 2.7).

Also useful are tools such as rmperm and the rewire functions, which alter an input graph by random row/column, edgewise, or dyadic permutations. Functions which condition on degree distribution and the triad census are anticipated in future versions of sna.

Example

To provide a sense for the syntax involved (and options available) when generating random graphs in sna, we here provide a brief example of R code which draws graphs from a number of models. Note that the output type in each case is an adjacency matrix. Although sna routines accept network and related objects as input (per Section 1.4), the package’s current random graph generators produce output in adjacency matrix or array form. The range of output types may be expanded in future package versions. To begin, we load the sna library and fix the random seed (for reproducibility):

R> library(sna)

R> set.seed(1913)

As noted above, rgraph can be used in various ways to obtain graphs (directed or otherwise) with different expected densities. For instance, three digraphs with respective expected densities 0.1, 0.9, and 0.5 can be drawn as follows:

R> g <- rgraph(10, 3, tprob = c(0.1, 0.9, 0.5))

R> gden(g)

[1] 0.1000000 0.8666667 0.5333333

¹ rgraph can also be employed to simulate valued graphs via a resampling procedure.


gden, which we shall encounter again later, is an sna function which returns the density of one or more input graphs. As expected, the observed densities here closely match their expectations. The tprob parameter, used above to set the probability of each edge on a per-graph basis, can also be used in other ways. For instance, passing a matrix of Bernoulli parameters to tprob will cause rgraph to sample from the corresponding inhomogeneous Bernoulli graph model, in which the probability of an (i, j) edge is equal to tprob[i,j]. For example, consider a simple model for a digraph of order 10 in which the probability of an (i, j) edge is equal to j/10. Such a graph can be drawn easily as follows:

R> gp <- sapply((1:10) / 10, rep, 10)

R> g <- rgraph(10, tprob = gp)

R> g

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    1    0    0    1    1     1
 [2,]    0    0    0    1    0    1    0    0    1     1
 [3,]    0    0    0    0    0    1    0    1    0     1
 [4,]    0    0    0    0    1    1    1    1    1     1
 [5,]    0    1    0    0    0    0    1    1    1     1
 [6,]    0    0    1    0    1    0    1    0    1     1
 [7,]    0    1    1    0    1    0    0    1    1     1
 [8,]    0    0    1    1    1    0    1    0    1     1
 [9,]    0    0    0    1    1    0    1    1    0     1
[10,]    0    0    0    0    0    0    1    1    1     0

R> apply(g, 2, mean)

[1] 0.0 0.2 0.3 0.3 0.6 0.3 0.6 0.7 0.8 0.9
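The expectation of these column means can be worked out exactly rather than by simulation. With loops excluded, column $j$ contains nine independent edges with probability $j/10$ and one structural zero on the diagonal, so its mean has expectation $0.9(j/10)$. Below is a small base R check; edge_density is a hypothetical helper for the density of a simple, loopless digraph.

```r
# Hypothetical helper: density of a simple digraph, i.e., edges (or edge
# probabilities) over the n(n - 1) ordered pairs, ignoring the diagonal.
edge_density <- function(A) {
  n <- nrow(A)
  sum(A[row(A) != col(A)]) / (n * (n - 1))
}

gp <- sapply((1:10) / 10, rep, 10)   # Pr(i -> j) = j/10, as in the example
P <- gp
diag(P) <- 0                         # loops are excluded by default
expected_col_means <- colMeans(P)    # nine entries of j/10 plus one zero
expected_density <- edge_density(gp) # mean off-diagonal edge probability
```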

Since rgraph disallows loops by default, diagonal entries are fixed at zero regardless of tprob. Each column mean therefore averages nine Bernoulli(j/10) entries and one structural zero, giving an expectation of 0.9(j/10). The observed means are quite close to this, but obviously vary due to the underlying Bernoulli process. For random graphs with exact constraints on edge count, we must use rgnm. For instance, to take 5 draws from the uniform distribution on the order 10 graphs having 12 edges, we would proceed as follows:

R> g <- rgnm(5, 10, 12)

R> apply(g, 1, sum)

[1] 12 12 12 12 12
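A uniform draw conditional on edge count can be obtained by choosing $m$ of the $n(n-1)$ off-diagonal cells at random without replacement, since every such choice is equally likely. The sketch below (uniform_gnm, hypothetical) covers only the directed case. Unlike sna's convention, replicate stacks the draws along the last array dimension rather than the first.

```r
# Hypothetical sketch: uniform digraph of order n with exactly m edges
# (no loops), by sampling m distinct off-diagonal cells.
uniform_gnm <- function(n, m) {
  A <- matrix(0, n, n)
  cells <- which(row(A) != col(A))          # linear indices of off-diagonal cells
  A[cells[sample.int(length(cells), m)]] <- 1
  A
}

set.seed(1913)
draws <- replicate(5, uniform_gnm(10, 12))  # 10 x 10 x 5 array (graphs on dim 3)
apply(draws, 3, sum)                        # 12 12 12 12 12
```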

As the dyadic counterpart to both rgraph and rgnm, rguman models digraphs whose distributions are parameterized by dyad states. Each dyad corresponds to a pair of edge variables, so it can be readily classified into one of three isomorphism classes: mutual (both edges present), asymmetric (one edge present), or null (no edges present). The number of dyads in each class within a graph is known as its dyad census. The dyad census has been used as a simple basis for modeling network structure at least since the work of Holland and Leinhardt (1970). rguman can be employed either to generate uniform digraphs conditional on an exact dyad census constraint, or to draw from a multinomial graph model of independent dyads with fixed expected counts. The former case can be used to generate graphs of particular types. For instance, the trivial cases of complete, complete tournament, and null graphs can be generated by placing all dyads within the appropriate isomorphism class:

R> k10 <- rguman(1, 10, mut = 45, asym = 0, null = 0, method = "exact")

R> t10 <- rguman(1, 10, mut = 0, asym = 45, null = 0, method = "exact")

R> n10 <- rguman(1, 10, mut = 0, asym = 0, null = 45, method = "exact")

R> k10

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    1    1    1    1    1    1    1     1
 [2,]    1    0    1    1    1    1    1    1    1     1
 [3,]    1    1    0    1    1    1    1    1    1     1
 [4,]    1    1    1    0    1    1    1    1    1     1
 [5,]    1    1    1    1    0    1    1    1    1     1
 [6,]    1    1    1    1    1    0    1    1    1     1
 [7,]    1    1    1    1    1    1    0    1    1     1
 [8,]    1    1    1    1    1    1    1    0    1     1
 [9,]    1    1    1    1    1    1    1    1    0     1
[10,]    1    1    1    1    1    1    1    1    1     0

R> t10

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    1    0    0     0
 [2,]    1    0    1    0    1    1    0    0    0     1
 [3,]    1    0    0    1    1    0    0    1    0     0
 [4,]    1    1    0    0    0    1    0    1    0     1
 [5,]    1    0    0    1    0    1    1    1    1     0
 [6,]    1    0    1    0    0    0    1    1    1     0
 [7,]    0    1    1    1    0    0    0    1    1     0
 [8,]    1    1    0    0    0    0    0    0    1     1
 [9,]    1    1    1    1    0    0    0    0    0     0
[10,]    1    0    1    0    1    1    1    0    1     0

R> n10

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0

When not in “exact” mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)

R> mean(g[upper.tri(g)] * t(g)[upper.tri(g)])

[1] 0.1482828

R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])

[1] 0.04646465

R> mean((!g)[upper.tri(g)] * t(!g)[upper.tri(g)])

[1] 0.8052525

By contrast with the expectation under the above model, a Bernoulli graph with the same expected density would have a mean mutuality rate of approximately 0.03, with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4. Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
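Both the dyad census and the Bernoulli comparison above can be reproduced in a few lines of base R. The helper dyad_census below is hypothetical; sna's own census routine may differ in naming and output format. For the comparison, each dyad of the multinomial model contributes on average $2(0.15) + 0.05 = 0.35$ edges across its two ordered pairs, giving expected density $p = 0.175$. Independent edges then yield mutual and asymmetric rates of $p^2$ and $2p(1-p)$.

```r
# Hypothetical helper: dyad census of a digraph, i.e., counts of mutual,
# asymmetric, and null dyads over the n(n - 1)/2 unordered pairs.
dyad_census <- function(A) {
  up <- upper.tri(A)
  a <- A[up] != 0          # i -> j for each pair with i < j
  b <- t(A)[up] != 0       # j -> i for the same pairs
  c(mut = sum(a & b), asym = sum(xor(a, b)), null = sum(!a & !b))
}

A_small <- matrix(c(0, 1, 1,
                    1, 0, 0,
                    0, 0, 0), nrow = 3, byrow = TRUE)
dyad_census(A_small)     # {1,2} mutual, {1,3} asymmetric, {2,3} null

# Bernoulli graph with the same expected density as
# rguman(..., mut = 0.15, asym = 0.05, null = 0.8):
p <- (2 * 0.15 + 0.05) / 2      # expected density, 0.175
mut_rate  <- p^2                # about 0.031
asym_rate <- 2 * p * (1 - p)    # about 0.289
asym_rate / mut_rate            # about 9.43
```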

More extensive departures from independence require alternatives to the simple independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. These processes are specified in terms of the conditional probability of an edge given other edges within the graph. This immediately suggests the use of a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process. Such a sampler is implemented via the rgbn function. It uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. Thinning and burn-in parameters may be specified by the user, along with model parameters (which, by default, correspond to the uniform random digraph model). Parameters may be adjusted to produce the following effects:
- “parent” or reciprocity biases (π);
- “sibling” or shared partner biases (σ);
- “double role” biases, or parent/sibling interaction effects (ρ); and
- baseline density effects (d).

All parameters vary from 0 to 1, with 0 indicating no bias. The command to draw a sample of 5 order 10 networks with both reciprocity and triangle formation biases will then look something like the following:

R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
+   d = 0.15))

The magnitude of the specified effects depends on the exact choice of parameters.
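The Gibbs edge-updating idea can be illustrated with a deliberately simplified model. The sketch below is not rgbn and does not reproduce the Skvoretz-Fararo conditionals. It assumes only a baseline density term d and a reciprocity term pi_recip, with full conditional $\Pr(y_{ij} = 1 \mid \text{rest}) = 1 - (1 - d)(1 - \pi)^{y_{ji}}$, and it omits the sibling and double-role effects. Each sweep visits every ordered pair in random order and resamples that edge from its conditional. Burn-in and thinning are counted in sweeps.

```r
# Simplified Gibbs edge-updating sampler (an illustrative assumption, not rgbn).
# Full conditional for each ordered pair (i, j):
#   Pr(y_ij = 1 | rest) = 1 - (1 - d) * (1 - pi_recip)^(y_ji)
gibbs_reciprocity <- function(n, d, pi_recip, burn = 100, thin = 10, draws = 5) {
  A <- matrix(0, n, n)                                  # start from the empty graph
  pairs <- which(row(A) != col(A), arr.ind = TRUE)      # all ordered pairs, i != j
  sweep_once <- function(A) {
    for (k in sample.int(nrow(pairs))) {                # random scan order
      i <- pairs[k, 1]
      j <- pairs[k, 2]
      prob <- 1 - (1 - d) * (1 - pi_recip)^A[j, i]
      A[i, j] <- as.numeric(runif(1) < prob)
    }
    A
  }
  out <- array(0, dim = c(draws, n, n))                 # first dimension indexes graphs
  for (s in seq_len(burn)) A <- sweep_once(A)           # burn-in sweeps (discarded)
  for (r in seq_len(draws)) {
    for (s in seq_len(thin)) A <- sweep_once(A)         # thinning between retained draws
    out[r, , ] <- A
  }
  out
}

set.seed(1913)
g <- gibbs_reciprocity(10, d = 0.15, pi_recip = 0.05)
dim(g)   # 5 10 10
```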

Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) “rewiring” process takes an input network and, with specified probability, exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

R> g <- matrix(0, 10, 10)

R> g[1,] <- 1

R> g2 <- rewire.ws(g, 0.5)[1,,]

R> g2

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    1    1    1    1    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     1
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    0    1    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    1    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    1     0

R> sum(g - g2) == 0

[1] TRUE
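A simplified rewiring step can be written in a few lines. This sketch is an assumption for illustration and not rewire.ws's algorithm. Each edge, with probability p, is moved to a randomly chosen null cell that keeps either its tail or its head (chosen with equal probability). The new edge therefore shares an endpoint with the old one, and the edge count is unchanged. Allowing either endpoint to be kept matters for the example graph above, whose first row is full.

```r
# Simplified rewiring sketch (an assumption, not rewire.ws): each edge is moved,
# with probability p, to a random null cell sharing its tail or its head.
rewire_sketch <- function(A, p) {
  n <- nrow(A)
  edges <- which(A == 1, arr.ind = TRUE)          # snapshot of the original edges
  for (e in seq_len(nrow(edges))) {
    i <- edges[e, 1]
    j <- edges[e, 2]
    if (runif(1) < p) {
      if (runif(1) < 0.5) {
        # keep the tail i: null cells (i, k) with k != i
        cand <- which(A[i, ] == 0 & seq_len(n) != i)
        new_cells <- cbind(rep(i, length(cand)), cand)
      } else {
        # keep the head j: null cells (k, j) with k != j
        cand <- which(A[, j] == 0 & seq_len(n) != j)
        new_cells <- cbind(cand, rep(j, length(cand)))
      }
      if (nrow(new_cells) > 0) {                  # no move if no null cell is available
        pick <- new_cells[sample.int(nrow(new_cells), 1), ]
        A[i, j] <- 0
        A[pick[1], pick[2]] <- 1
      }
    }
  }
  A
}

set.seed(1913)
g <- matrix(0, 10, 10)
g[1, ] <- 1                     # same starting graph as above (row 1 full, incl. a loop)
g2 <- rewire_sketch(g, 0.5)
sum(g) == sum(g2)               # TRUE: the edge count is conserved
```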

Another example of an edge-preserving random transformation is the random permutation of vertex order. rmperm can be employed for this purpose, as in the following permutation of the graph g2 above:

R> g3 <- rmperm(g2)

R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))

[1] TRUE

Row/column permutation preserves the “unlabeled” structure of the input graph (i.e., it draws from the graph’s isomorphism class). It plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
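Random relabeling amounts to applying a single random permutation to both the rows and the columns of the adjacency matrix. The base R sketch below (permute_graph, hypothetical) checks that the in- and out-degree multisets survive, which is the property the rmperm example above relies on.

```r
# Hypothetical helper: apply one random permutation to rows and columns alike.
permute_graph <- function(A) {
  perm <- sample.int(nrow(A))
  A[perm, perm]
}

set.seed(1913)
A <- matrix(0, 6, 6)
A[1, 2:4] <- 1
A[2, 3] <- 1
A[5, 6] <- 1
B <- permute_graph(A)

# Out- and in-degree multisets are unchanged by relabeling.
identical(sort(rowSums(A)), sort(rowSums(B)))   # TRUE
identical(sort(colSums(A)), sort(colSums(B)))   # TRUE
```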

2.2. Visualization and data manipulation

Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic. For instance:
- diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R’s diag, lower.tri, and upper.tri functions to arrays;
- gvectorize and sr2css convert network data from one form to another;
- symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets;
- add.isolates adds isolates to one or more input graphs; and
- stackcount determines the number of graphs in an input stack.

Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic when the selected edge or edges are forced to be present versus forced to be absent, holding all other edges constant. Such differences are used extensively in simulation and inference for exponential random graph processes (see, e.g., Snijders 2002). They have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible and can be used with any graph-level index function. Its use is straightforward, i.e.:

R> g <- rgraph(5)

R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)

[1] 0.07291667

Unfortunately, the drawback to the flexibility of this routine is its inefficiency. eval.edgeperturbation cannot take advantage of any special properties of the change score being calculated. It is hence inefficient for properties such as triad counts, whose changes can be calculated much more quickly than the base statistic. This function is therefore a useful utility for simple exploratory applications, but it does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
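The computation that such a wrapper performs can be sketched generically: form the graph with the edge forced on and with it forced off, and difference the statistic. The helpers below (change_score, n_edges, n_mutual) are hypothetical illustrations. Like the wrapper described above, change_score recomputes the full statistic twice instead of using a specialized update.

```r
# Hypothetical helper: statistic with edge (i, j) forced present minus the
# statistic with it forced absent, all other edges held fixed.
change_score <- function(A, i, j, stat, ...) {
  A_on <- A
  A_on[i, j] <- 1
  A_off <- A
  A_off[i, j] <- 0
  stat(A_on, ...) - stat(A_off, ...)
}

# Two toy graph-level statistics (hypothetical, for illustration only).
n_edges  <- function(A) sum(A[row(A) != col(A)])
n_mutual <- function(A) sum(A[upper.tri(A)] & t(A)[upper.tri(A)])

A <- matrix(0, 4, 4)
A[2, 1] <- 1                       # a tie 2 -> 1 already exists
change_score(A, 1, 2, n_edges)     # 1: one edge toggled
change_score(A, 1, 2, n_mutual)    # 1: 1 -> 2 would reciprocate 2 -> 1
change_score(A, 3, 4, n_mutual)    # 0: there is no tie 4 -> 3 to reciprocate
```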

Another pair of useful, if idiosyncratic, utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph’s adjacency matrix is equivalent to altering the “identities” of its vertices while leaving the underlying “unlabeled” structure unchanged.) Although they are not graph manipulation functions per se, these routines are important for generating restricted permutations for use in QAP tests (Hubert 1987) and for comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order. This is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
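Both kinds of permutation are easy to sketch in base R. restricted_perm (hypothetical) shuffles indices only within each equivalence class. nth_permutation (hypothetical) decodes a 0-based rank into the corresponding permutation in lexical order via the factorial number system. The indexing convention of numperm itself may differ, so this illustrates the idea rather than serving as a drop-in replacement.

```r
# Hypothetical helper: random permutation that only exchanges vertices
# belonging to the same (user-supplied) equivalence class.
restricted_perm <- function(classes) {
  perm <- seq_along(classes)
  for (cl in unique(classes)) {
    members <- which(classes == cl)
    # index into members rather than calling sample(members), which would
    # misbehave when a class has a single member
    perm[members] <- members[sample.int(length(members))]
  }
  perm
}

# Hypothetical helper: the permutation of 1..n with 0-based rank k in
# lexical order, decoded via the factorial number system.
nth_permutation <- function(n, k) {
  stopifnot(k >= 0, k < factorial(n))
  pool <- seq_len(n)
  out <- integer(0)
  for (i in n:1) {
    f <- factorial(i - 1)
    idx <- k %/% f                   # which remaining element comes next
    out <- c(out, pool[idx + 1])
    pool <- pool[-(idx + 1)]
    k <- k %% f
  }
  out
}

set.seed(1913)
classes <- c("a", "a", "b", "b", "b", "c")
p <- restricted_perm(classes)
all(classes[p] == classes)              # TRUE: every vertex stays within its class
t(sapply(0:5, nth_permutation, n = 3))  # all 3! permutations in lexical order
```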

In addition to the above, two families of graph manipulation functions bear discussing in more detail: functions to compute properties of neighborhoods, and functions for graph visualization. Here we briefly discuss each family in turn, before proceeding to a review of sna’s descriptive index routines.

Neighborhood and ego net functions

The egocentric network (or “ego net”) of vertex $v$ in graph $G$ is defined as $G[\{v\} \cup N(v)]$, i.e., the subgraph of $G$ induced by $v$ and its neighborhood. ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.
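The definition above translates directly into code. The base R sketch below uses two hypothetical helpers, ego_net and alter_density; it is not ego.extract. It builds the induced subgraph on ego plus its in-, out-, or combined neighborhood, placing ego in the first row and column, and computes density among alters only.

```r
# Hypothetical helper: ego net of vertex v, i.e., the subgraph induced by v and
# its neighborhood, with ego placed in the first row/column.
ego_net <- function(A, v, neighborhood = c("combined", "out", "in")) {
  neighborhood <- match.arg(neighborhood)
  out_nb <- which(A[v, ] != 0)
  in_nb <- which(A[, v] != 0)
  nb <- switch(neighborhood,
               out = out_nb,
               "in" = in_nb,
               combined = union(out_nb, in_nb))
  nb <- setdiff(nb, v)                 # a loop at v does not add an alter
  keep <- c(v, nb)
  A[keep, keep, drop = FALSE]
}

# Hypothetical helper: density among alters only (ego's row and column dropped).
alter_density <- function(E) {
  m <- nrow(E) - 1
  if (m < 2) return(NA_real_)          # undefined with fewer than two alters
  alters <- E[-1, -1, drop = FALSE]
  sum(alters[row(alters) != col(alters)]) / (m * (m - 1))
}

A <- matrix(0, 5, 5)
A[1, 2] <- 1   # 1 -> 2
A[3, 1] <- 1   # 3 -> 1
A[2, 3] <- 1   # 2 -> 3
A[4, 5] <- 1   # 4 -> 5
E <- ego_net(A, 1)       # ego 1 with alters 2 (out) and 3 (in)
alter_density(E)         # 0.5: one of the two possible alter-alter ties
```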

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood. Neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors’ covariates, which may be supplied as a numeric vector. This provides a very quick and easy way to calculate properties such as:
- the size of a given vertex’s 3rd-order neighborhood;
- the fraction of its alters with a given characteristic; and
- the average value of its alters on a specified covariate.
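The core of this neighborhood-apply idea fits in a few lines of base R. neighbor_apply below is a hypothetical stand-in that does not follow gapply's signature. It selects out-, in-, or combined first-order neighbors from the adjacency matrix and applies a summary function to their covariate values. Applying sum to a vector of ones recovers degree. A vertex with no neighbors passes an empty vector to the function.

```r
# Hypothetical stand-in for the neighborhood-apply idea (not gapply's
# signature): apply FUN to the covariate values x of each vertex's
# first-order out-, in-, or combined neighbors.
neighbor_apply <- function(A, x, FUN, mode = c("out", "in", "combined")) {
  mode <- match.arg(mode)
  B <- switch(mode,
              out = A,
              "in" = t(A),
              combined = ((A + t(A)) > 0) * 1)
  diag(B) <- 0                                  # ignore loops
  sapply(seq_len(nrow(B)), function(v) FUN(x[B[v, ] != 0]))
}

A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              0, 0, 0, 0,
              1, 0, 0, 0), nrow = 4, byrow = TRUE)
age <- c(30, 40, 50, 60)                         # illustrative vertex covariate
neighbor_apply(A, rep(1, 4), sum)                # out-degree: 2 1 0 1
neighbor_apply(A, rep(1, 4), sum, mode = "in")   # in-degree: 1 1 2 0
neighbor_apply(A, age, mean)                     # 45 50 NaN 30 (vertex 3 has no out-neighbors)
```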

In addition to the above, it is sometimes useful to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations. For a given graph, it returns the adjacency matrix whose $(i, j)$ cell indicates whether vertex $j$ belongs to vertex $i$’s selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix of appropriate order. For orders $k > 0$, it depends on the type of adjacency involved. For input graph $G = (V, E)$, the base relation $R$ is defined as follows:
- the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought;
- the transpose of $G$ if incoming neighborhoods are sought; and
- $G$ otherwise.

The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i, j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, with arbitrary choice of edge direction.
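The partial and cumulative definitions can be sketched by computing geodesic distances on the base relation $R$ and thresholding them. In the base R sketch below, geodesic_dist and neighborhood_sketch are hypothetical helpers. Distances are found by repeated boolean matrix products, which is adequate only for small graphs. This sketch excludes self-pairs (distance 0) from cumulative neighborhoods; that is an assumption, and the package's convention may differ.

```r
# Geodesic distances on relation R via repeated boolean matrix products
# (adequate for small graphs); unreachable pairs keep distance Inf.
geodesic_dist <- function(R) {
  n <- nrow(R)
  R <- (R != 0) * 1
  D <- matrix(Inf, n, n)
  diag(D) <- 0
  reach <- diag(n)                       # walks of length 0
  for (k in seq_len(n - 1)) {
    reach <- ((reach %*% R) > 0) * 1     # pairs joined by a walk of length k
    D[reach == 1 & is.infinite(D)] <- k  # first k at which a pair is reached
  }
  D
}

# Hypothetical helper: order-k neighborhood structure on the base relation
# R = G (out), t(G) (in), or the underlying graph of G (total).
neighborhood_sketch <- function(G, order, type = c("out", "in", "total"),
                                partial = TRUE) {
  type <- match.arg(type)
  if (order == 0) return(diag(nrow(G)))  # order 0: identity matrix
  R <- switch(type, out = G, "in" = t(G), total = ((G + t(G)) > 0) * 1)
  D <- geodesic_dist(R)
  if (partial) {
    (D == order) * 1                     # geodesic distance exactly k
  } else {
    (D > 0 & D <= order) * 1             # distance 1..k (self-pairs excluded)
  }
}

# Directed path 1 -> 2 -> 3 -> 4.
G <- matrix(0, 4, 4)
G[1, 2] <- 1
G[2, 3] <- 1
G[3, 4] <- 1
neighborhood_sketch(G, 2)                      # pairs (1, 3) and (2, 4)
neighborhood_sketch(G, 2, partial = FALSE)     # distances 1 and 2
```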

To illustrate sna’s egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below:

R> g <- rgraph(10, tp = 1.5/9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]
$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(g.comb, NROW) <= degree(g) + 1)
[1] TRUE
R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+ sapply(g.comb[ego.size > 2], function(x) gden(x[-1, -1]))
         1          2          3          4          5          6          7 
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000 
         8          9         10 
0.00000000 0.08333333 0.00000000 

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively. These are discussed in more detail below.
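Because ego always comes first, alter-only density takes one line to compute. The base-R sketch below mimics that convention with two hypothetical helpers, `ego_net_sketch` and `alter_density`; neither is an sna function. It assumes a loopless directed ego net. As a check, it uses the second ego net printed above, whose three alters share a single tie.

```r
# ego_net_sketch(): induced subgraph on ego v and its combined neighbors,
# with v placed in row/column 1 (mimicking the convention described above).
ego_net_sketch <- function(adj, v) {
  alters <- setdiff(which(adj[v, ] != 0 | adj[, v] != 0), v)
  keep <- c(v, alters)
  adj[keep, keep, drop = FALSE]
}

# alter_density(): density of a directed ego net with ego (row/col 1) removed,
# i.e. observed alter-alter ties over m * (m - 1) possible ordered pairs.
alter_density <- function(ego_net) {
  a <- ego_net[-1, -1, drop = FALSE]
  m <- nrow(a)
  if (m < 2) return(NA_real_)   # fewer than two alters: undefined
  diag(a) <- 0
  sum(a != 0) / (m * (m - 1))
}

# Second ego net from the output above (ego in row/column 1).
x2 <- matrix(c(0, 1, 0, 0,
               1, 0, 0, 0,
               1, 0, 0, 0,
               1, 0, 1, 0), nrow = 4, byrow = TRUE)
alter_density(x2)    # one tie among three alters: 1/6

# Ego net of vertex 2 in the directed path 1 -> 2 -> 3
p <- matrix(0, 3, 3)
p[cbind(1:2, 2:3)] <- 1
ego_net_sketch(p, 2) # rows/columns ordered 2, 1, 3
```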

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods, degree being the most trivial example. Other statistics (e.g., means or quantiles) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+ cmode = "freeman")/2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0

To obtain adjacency matrices for the neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+ gplot(neigh[i,,], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+ partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+ gplot(neigh[i,,], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. How quickly such neighborhoods “fill out” the network is instructive of properties such as local clustering. We will revisit this issue when we discuss the structure.statistics function below.

Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary “workhorse” routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels.

Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970). Layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement are also available. User-supplied functions can be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page.

For “target diagrams,” in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th-order partial neighborhood of $v$. [Nine panels: Partial Neighborhood of Order 1 to 9.]
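The sketch below illustrates the user-supplied layout mechanism described above by placing vertices on a spiral. It rests on two assumptions that should be checked against the gplot.layout manual page before relying on either:

- it follows the calling convention of sna's built-in layout routines, a function of `(d, layout.par)` that returns an n × 2 coordinate matrix;
- gplot resolves `mode = "spiral"` to a function named `gplot.layout.spiral`.

The spiral layout and its `radius.step` parameter are our own illustrative choices.

```r
# Hypothetical layout routine in the gplot.layout style (not shipped with sna).
# d: adjacency matrix (only its order is used here)
# layout.par: list of optional parameters; radius.step defaults to 0.1
gplot.layout.spiral <- function(d, layout.par) {
  n <- NROW(d)
  step <- layout.par$radius.step
  if (is.null(step)) step <- 0.1
  theta <- seq(0, by = 2 * pi / 5, length.out = n)  # fixed angular increment
  r <- 1 + step * (seq_len(n) - 1)                   # radius grows per vertex
  cbind(x = r * cos(theta), y = r * sin(theta))      # n x 2 coordinates
}

xy <- gplot.layout.spiral(matrix(0, 6, 6), list())
xy
# With sna loaded (assumption, see above): gplot(g, mode = "spiral")
```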

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th-order cumulative neighborhood of $v$. [Nine panels: Cumulative Neighborhood of Order 1 to 9.]

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. This is rarely useful as an exploratory tool. It can, however, be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, and labels are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+ vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+ displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options. [Panels: Default; Curved Edges; MDS Layout; Circular Layout; Sociomatrix; Multiple Options.]

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, gplot3d is used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation and zooming) are available when using gplot3d. This can in and of itself be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+ xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10)/10 * 2 * pi), sin((1:10)/10 * 2 * pi),
+ col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+ xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+ offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
+ arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+ 1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output or to create alternative display mechanisms. They may also be used in non-network contexts as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions. [Panels: gplot.vertex Example; gplot.arrow Example; gplot.loop Example.]

2.3 Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes:

- node-level indices (NLIs), which express properties of the positions of particular vertices;
- graph-level indices (GLIs), which express properties of entire graphs.

More formally, node-level indices can be thought of as mappings of the general form $f : V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f : \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon. We will, however, see an important counterexample below.

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5). All of them, however, intuitively reflect some sense in which a vertex occupies a prominent or “central” position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized “paring down” of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions.

Degree, a standard graph-theoretic concept, is given by $c_d(v, G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered:

- outdegree, $c_{d^+}(v, G) \equiv |N^+(v)|$;
- indegree, $c_{d^-}(v, G) \equiv |N^-(v)|$;
- total or “Freeman” degree, $c_{d^t}(v, G) \equiv c_{d^+}(v, G) + c_{d^-}(v, G)$.

All of these are supported via degree.

Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v, G) \equiv \sum_{(v', v'') \subset V \setminus v} \frac{g'(v', v, v'', G)}{g(v', v'', G)},$$

where $g(v, v', G)$ is the number of $(v, v')$ geodesics in $G$, and $g'(v, v', v'', G)$ is the number of $(v, v'')$ geodesics in $G$ containing $v'$. The ratio $g'(v', v, v'', G)/g(v', v'', G)$ is taken equal to 0 where $g(v', v'', G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953). This is implemented by stresscent in sna.

Finally, closeness is given by

$$c_c(v, G) \equiv \frac{n - 1}{\sum_{v' \in V} d(v, v')},$$

where $n = |V|$ and $d(v, v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected unless distances between disconnected vertices are taken to be infinite. In that case, $c_c(v, G) = 0$ for any $v$ lacking a path to at least one other vertex. Hence all closeness scores will be 0 for graphs having multiple weak components. Because of this fragility, closeness is less often deployed than Freeman's other two measures.
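The small base-R sketch below makes the betweenness, stress, and closeness definitions concrete by computing them from BFS geodesic distances and geodesic counts. It is not sna's implementation, which also handles valued graphs, undirected modes, and missing data. The helper names are hypothetical. Pairs are treated as ordered (the directed case), so for a symmetric adjacency matrix each unordered pair is counted twice.

```r
# geodesics(): BFS from every source s, recording geodesic distances
# dist[s, k] (Inf if k is unreachable) and geodesic counts cnt[s, k].
geodesics <- function(adj) {
  n <- nrow(adj)
  dist <- matrix(Inf, n, n)
  cnt  <- matrix(0, n, n)
  for (s in seq_len(n)) {
    dist[s, s] <- 0
    cnt[s, s] <- 1
    frontier <- s
    while (length(frontier) > 0) {
      nxt <- integer(0)
      for (u in frontier) {
        for (w in which(adj[u, ] != 0)) {
          if (is.infinite(dist[s, w])) {      # w reached for the first time
            dist[s, w] <- dist[s, u] + 1
            nxt <- c(nxt, w)
          }
          if (dist[s, w] == dist[s, u] + 1)   # u -> w extends a geodesic
            cnt[s, w] <- cnt[s, w] + cnt[s, u]
        }
      }
      frontier <- nxt
    }
  }
  list(dist = dist, cnt = cnt)
}

# Closeness: (n - 1) / sum of distances; an Inf distance yields a score of 0.
closeness_sketch <- function(adj) {
  (nrow(adj) - 1) / rowSums(geodesics(adj)$dist)
}

# Betweenness: sum over ordered pairs (s, k), both distinct from v, of the
# share of s -> k geodesics passing through v. stress = TRUE drops the
# denominator and simply counts those geodesics.
betweenness_sketch <- function(adj, stress = FALSE) {
  g <- geodesics(adj)
  n <- nrow(adj)
  sapply(seq_len(n), function(v) {
    total <- 0
    for (s in setdiff(seq_len(n), v)) {
      for (k in setdiff(seq_len(n), c(s, v))) {
        if (is.finite(g$dist[s, k]) &&
            g$dist[s, v] + g$dist[v, k] == g$dist[s, k]) {
          via_v <- g$cnt[s, v] * g$cnt[v, k]   # s -> k geodesics through v
          total <- total + if (stress) via_v else via_v / g$cnt[s, k]
        }
      }
    }
    total
  })
}

p3 <- matrix(c(0, 1, 0,
               1, 0, 1,
               0, 1, 0), nrow = 3, byrow = TRUE)  # undirected path 1 - 2 - 3
closeness_sketch(p3)                   # 2/3, 1, 2/3
betweenness_sketch(p3)                 # 0 2 0 (ordered pairs)

c4 <- matrix(0, 4, 4)                  # undirected 4-cycle 1 - 2 - 3 - 4 - 1
c4[cbind(1:4, c(2, 3, 4, 1))] <- 1
c4 <- c4 + t(c4)
betweenness_sketch(c4)                 # 1 1 1 1: two geodesics split credit
betweenness_sketch(c4, stress = TRUE)  # 2 2 2 2: stress counts geodesics
```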

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $\mathbf{A}$, where $\mathbf{A}$ is the graph adjacency matrix. It admits several interpretations:

- a measure of “coreness” (or membership in the largest dense cluster);
- “recursive” or “reflected” degree, i.e., $v$ is central to the extent that it has many ties to other central nodes;
- the ability of $v$ to reach other vertices through a multiplicity of short walks.

Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (\mathbf{I} - \beta \mathbf{A})^{-1} \mathbf{A} \mathbf{1}$, where a solution exists. This index approaches eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $\mathbf{A}$, and approaches degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats. As in the positive case, parameter magnitude reflects the degree of weight afforded distant edges.

The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. By convention, the scaling parameter $\alpha$ is set so that the squared centrality scores sum to $|V|$, i.e., the score vector has Euclidean length $\sqrt{|V|}$. In general, it should be remembered that this measure is uniquely defined only up to a rescaling operation.

Closely related to evcent and bonpow are prestige, which calculates various prestige measures, and infocent, which calculates the information centrality of Stephenson and Zelen (1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality indicates the extent to which each individual has a large number of short walks to other actors in the network. Like eigenvector centrality it is walk-based, but it weights short walks more heavily (and long walks less heavily) than eigenvector centrality does.
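The Bonacich formula can be evaluated directly with a linear solve. The sketch below illustrates the formula only; it is not sna's bonpow code, which adds options and error checks. It assumes $(\mathbf{I} - \beta \mathbf{A})$ is invertible and applies the rescaling convention just described, so that the squared scores sum to $|V|$. The function name is hypothetical.

```r
# bonpow_sketch(): c = alpha * (I - beta * A)^{-1} A 1, with alpha > 0 chosen
# so that sum(c^2) equals the number of vertices.
bonpow_sketch <- function(A, beta) {
  n <- nrow(A)
  raw <- drop(solve(diag(n) - beta * A, A %*% rep(1, n)))
  raw * sqrt(n / sum(raw^2))
}

A <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              0, 0, 0, 1,
              1, 0, 0, 0), nrow = 4, byrow = TRUE)
b0 <- bonpow_sketch(A, 0)
b0 / rowSums(A)                # constant: proportional to outdegree at beta = 0
sum(bonpow_sketch(A, -0.5)^2)  # 4, the number of vertices
```

The first check mirrors the `exponent = 0` comparison in the transcript below. A negative $\beta$ flips the sign of the indirect contributions, which is how the measure rewards ties to weak alters.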

A more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$. That is, it is the number of pairs for which $v$ serves as a local bridge.

Now let us posit a vector of states $s$ on $V$, such that $s_i$ is the state of $v_i \in V$. (“State” in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are:

- coordinating: $s_i = s_j = s_k$;
- itinerant: $s_i = s_k$, $s_i \neq s_j$;
- gatekeeping: $s_j = s_k$, $s_i \neq s_j$;
- representative: $s_i = s_j$, $s_j \neq s_k$;
- liaison: $s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$.

The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. It also provides first and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density), along with the z-tests suggested by Gould and Fernandez.

A caution applies to those tests. The authors did not prove that the statistics in question are asymptotically normal under the null model, so the statistical foundation for the associated tests is somewhat dubious. When in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
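The role definitions above reduce to a small classification rule. For each broker $v_j$, the following base-R sketch counts the locally bridged ordered pairs by role. It follows the text's definitions only: it is not sna's brokerage function, it does not compute the Gould-Fernandez moments or z-scores, and its function names are hypothetical.

```r
# classify_role(): Gould-Fernandez role of broker j for the ordered triad
# (i, j, k), given the states si, sj, sk (definitions as in the text).
classify_role <- function(si, sj, sk) {
  if (si == sj && sj == sk) return("coordinating")
  if (si == sk) return("itinerant")        # s_i = s_k, s_i != s_j
  if (sj == sk) return("gatekeeping")      # s_j = s_k, s_i != s_j
  if (si == sj) return("representative")   # s_i = s_j, s_j != s_k
  "liaison"                                # all three states differ
}

# brokerage_counts(): per-vertex role counts over locally bridged pairs,
# i.e. i -> j, j -> k, i != k, and no i -> k edge.
brokerage_counts <- function(adj, s) {
  roles <- c("coordinating", "itinerant", "gatekeeping",
             "representative", "liaison")
  n <- nrow(adj)
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {
    senders   <- setdiff(which(adj[, j] != 0), j)
    receivers <- setdiff(which(adj[j, ] != 0), j)
    for (i in senders) {
      for (k in receivers) {
        if (i == k || adj[i, k] != 0) next   # not a locally bridged pair
        role <- classify_role(s[i], s[j], s[k])
        out[j, role] <- out[j, role] + 1
      }
    }
  }
  cbind(out, total = rowSums(out))
}

p <- matrix(0, 3, 3)
p[cbind(1:2, 2:3)] <- 1                # 1 -> 2 -> 3, no 1 -> 3 tie
brokerage_counts(p, c("a", "b", "b"))  # vertex 2 acts as a gatekeeper
```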

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. For the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing the following indices on the same network:

- indegree, outdegree, and total degree;
- closeness, betweenness, and stress;
- Harary's graph centrality;
- eigenvector centrality and information centrality.

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrates, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. As already noted, bonpow's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters.

Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0. They also verify that it is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the measure in this case is essentially unrelated to both eigenvector centrality and degree. This reflects a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1/eigen(dat)$values[1], rescale = TRUE) -
+ evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here we randomly assign vertices to one of three groups and use the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table reports the observed network-wide brokerage totals for each of the brokerage roles:

- coordinator (w_I);
- itinerant broker (w_O);
- gatekeeper (b_IO);
- representative (b_OI);
- liaison (b_O);
- combined (t).

Alongside these totals it gives the corresponding expectations, standard deviations, z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. Note that very small groups cannot support certain brokerage roles, and likewise certain brokerage roles can only be realized when enough groups are present. z-scores are considered undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties. In the latter case, however, the properties in question pertain to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable).

Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return either of two alternatives:

- the fraction of non-null dyads which are symmetric;
- the fraction of reciprocated edges (the “edgewise” reciprocity).

All of these correspond to slightly different notions of reciprocity and are thus appropriate in somewhat different circumstances.

Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between its strong and weak forms:

- strong transitivity: $(i, j), (j, k) \in E \Leftrightarrow (i, k) \in E$ for $i, j, k \in V$;
- weak transitivity: $(i, j), (j, k) \in E \Rightarrow (i, k) \in E$.

Intuitively, weak transitivity embodies the familiar saying that “a friend of a friend is a friend”: where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of “third party support”: direct ties occur if and only if they are supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of ordered triads at risk that satisfy the appropriate condition, although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right] \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure.

In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless. It varies between 0, for a graph in which all vertices have the same centrality scores[2], and 1, for a graph of maximum concentration. Maximum centralization scores generally occur on the star graphs (i.e., $K_{1,n}$)[3], although this is not always the case. Eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$.

Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which uses them to generate the underlying score vector. In the normalized case, the centrality function is also asked to return the theoretical maximum deviation. This is handled transparently for all centrality functions included in sna. The mechanism may also be used with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

[2] For instance, when all vertices are automorphically equivalent.
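Equations (1) and (2) can be checked numerically. The base-R sketch below computes raw Freeman centralization from a score vector and can optionally normalize it by a theoretical maximum supplied by the caller. For undirected degree on $n$ vertices, we assume the maximum is $(n-1)(n-2)$, the value attained by the star $K_{1,n-1}$. That constant is specific to degree. All helper names are hypothetical.

```r
# freeman_centralization(): equation (1); divides by max_dev if one is given.
freeman_centralization <- function(scores, max_dev = NULL) {
  raw <- sum(max(scores) - scores)
  if (is.null(max_dev)) raw else raw / max_dev
}

undirected_degree <- function(adj) rowSums(adj != 0)

star <- function(n) {                 # K_{1,n-1}: vertex 1 is the hub
  a <- matrix(0, n, n)
  a[1, -1] <- 1
  a[-1, 1] <- 1
  a
}

ring <- function(n) {                 # cycle C_n: every vertex has degree 2
  a <- matrix(0, n, n)
  idx <- cbind(seq_len(n), c(2:n, 1))
  a[idx] <- 1
  a[idx[, 2:1]] <- 1
  a
}

n <- 6
deg_max <- (n - 1) * (n - 2)          # assumed theoretical maximum for degree
s_star <- undirected_degree(star(n))
freeman_centralization(s_star, deg_max)                      # 1
freeman_centralization(undirected_degree(ring(n)), deg_max)  # 0
# Equation (2): raw centralization equals |V| * (max - mean)
all.equal(freeman_centralization(s_star), n * (max(s_star) - mean(s_star)))
```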

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices, supplied respectively by connectedness, efficiency, hierarchy, and lubness, describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward: calling one with a graph, or a set of graphs, generally returns a vector of GLI scores, as in the following example. Note below the following contrasts:

- the default (dyadic) versus edgewise reciprocity;
- the standard versus “census” variants of gtrans;
- the various Krackhardt indices.

hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

[3] $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
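The base-R sketch below shows what the simplest of these indices count. For a single loopless digraph, it computes density, dyadic and edgewise reciprocity, and weak transitivity directly from the definitions given earlier. It is not sna's gden, grecip, or gtrans, which also handle missing data, undirected graphs, and graph stacks, and the function name is hypothetical. Weak transitivity is taken here as the share of two-paths $i \to j \to k$ ($i \neq k$) closed by an $i \to k$ edge, which may differ in detail from gtrans's default. Edgewise reciprocity works out to twice the number of mutual dyads divided by the number of edges. For the third graph above, for example, $26/49 \approx 0.5306$ corresponds to 13 mutual dyads among 49 edges.

```r
gli_sketch <- function(adj) {
  a <- (adj != 0) * 1
  diag(a) <- 0                        # ignore loops
  n <- nrow(a)
  up <- upper.tri(a)                  # one cell per unordered dyad {i, j}
  mutual     <- sum(a[up] & t(a)[up])
  asym       <- sum(xor(a[up] == 1, t(a)[up] == 1))
  null_dyads <- sum(up) - mutual - asym
  edges      <- sum(a)
  two <- a %*% a                      # two[i, k]: two-paths i -> j -> k
  diag(two) <- 0                      # drop i -> j -> i
  c(density   = edges / (n * (n - 1)),
    dyadic    = (mutual + null_dyads) / sum(up),
    edgewise  = if (edges > 0) 2 * mutual / edges else NaN,
    weaktrans = if (sum(two) > 0) sum(two * a) / sum(two) else NaN)
}

t3 <- matrix(0, 3, 3)
t3[cbind(c(1, 2, 1), c(2, 3, 3))] <- 1  # 1 -> 2, 2 -> 3, 1 -> 3
gli_sketch(t3)  # density 0.5, dyadic 0, edgewise 0, weaktrans 1
t3[2, 1] <- 1   # make the dyad {1, 2} mutual
gli_sketch(t3)  # dyadic 1/3, edgewise 2/4
```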

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified along with any additional arguments). By default, centralization scores are computed only for a single graph. R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...){
+ n <- NROW(dat)
+ if(tmaxdev)
+ return((n-1) * choose(n-1, 2))
+ odeg <- degree(dat, cmode = "outdegree")
+ choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization “for free” with their own centrality routines, so long as those routines support the required calling argument.

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j, k) \le i)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
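The definition of the structure statistics translates directly into code. The sketch below is plain base R and independent of sna (geodesic_dist and structure_stats are hypothetical helper names, not package functions): it computes all-pairs geodesic distances by breadth-first search on a 0/1 adjacency matrix, then counts, for each $i$, the fraction of ordered pairs with $d(j, k) \le i$. Unreachable pairs are given infinite distance, so they never contribute.

```r
# Hypothetical helper: all-pairs geodesic distances by breadth-first search.
# A is a 0/1 adjacency matrix with A[i, j] = 1 for an edge i -> j.
geodesic_dist <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)              # Inf marks unreachable pairs
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    d <- 0
    while (length(frontier) > 0) {
      d <- d + 1
      # vertices adjacent from the frontier that have not yet been reached
      nbrs <- which(colSums(A[frontier, , drop = FALSE]) > 0)
      nbrs <- nbrs[is.infinite(D[s, nbrs])]
      D[s, nbrs] <- d
      frontier <- nbrs
    }
  }
  D
}

# Structure statistics s_0, ..., s_{N-1}: N^-2 times the number of ordered
# pairs (j, k), including j = k, with d(j, k) <= i.
structure_stats <- function(A) {
  D <- geodesic_dist(A)
  N <- nrow(A)
  setNames(sapply(0:(N - 1), function(i) sum(D <= i) / N^2), 0:(N - 1))
}

A <- matrix(0, 4, 4)
A[cbind(1:4, c(2, 3, 4, 1))] <- 1     # directed 4-cycle 1 -> 2 -> 3 -> 4 -> 1
structure_stats(A)                    # 0.25 0.50 0.75 1.00
```

In the 4-cycle each vertex reaches exactly one new vertex per step, so the series rises in equal increments of 1/4; note that $s_0 = 1/N$ in general, because every vertex lies at distance 0 from itself.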

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $E[s(H)] = s(G), E[s'(H)] = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
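As a minimal sketch of both ideas, assuming a single directed 0/1 adjacency matrix, the code below computes the dyad census and then a one-tailed CUG test of the mutual-dyad count, conditioning on order and edge count. All function names are hypothetical; within sna the corresponding tools are dyad.census and cugtest, whose options this sketch does not attempt to reproduce.

```r
# Dyad census of a directed 0/1 adjacency matrix: counts of mutual,
# asymmetric, and null dyads among the n(n - 1)/2 unordered vertex pairs.
dyad_census <- function(A) {
  A <- A != 0
  up <- upper.tri(A)
  ij <- A[up]        # tie i -> j for each pair with i < j
  ji <- t(A)[up]     # tie j -> i for the same pairs, in the same order
  c(Mut = sum(ij & ji), Asym = sum(xor(ij, ji)), Null = sum(!ij & !ji))
}

# Uniform draw of a loopless digraph with n vertices and exactly m edges.
r_uniform_edges <- function(n, m) {
  A <- matrix(0, n, n)
  slots <- which(row(A) != col(A))     # off-diagonal cells
  A[sample(slots, m)] <- 1
  A
}

# One-tailed CUG p-values, Pr(t(H) >= t(G)) and Pr(t(H) <= t(G)),
# with H uniform given the order and edge count of G.
cug_edges <- function(A, stat, reps = 1000) {
  n <- nrow(A)
  m <- sum(A[row(A) != col(A)] != 0)
  obs <- stat(A)
  sim <- replicate(reps, stat(r_uniform_edges(n, m)))
  c(upper = mean(sim >= obs), lower = mean(sim <= obs))
}

set.seed(1)
A <- matrix(0, 10, 10)
A[cbind(1:9, 2:10)] <- 1              # path 1 - 2 - ... - 10 ...
A[cbind(2:10, 1:9)] <- 1              # ... with every tie reciprocated
dyad_census(A)                        # Mut 9, Asym 0, Null 36
p <- cug_edges(A, function(x) dyad_census(x)[["Mut"]], reps = 500)
p
```

Because the test graph's 18 edges are already arranged into the maximum possible nine mutual dyads, the lower-tail p-value is exactly 1, and the upper-tail p-value is essentially 0.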

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null 
 1.00 12.84 31.16 
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U 
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08 
 120C   210   300 
 0.30  0.00  0.00 

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null 
 8.84  9.26 26.90 
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U 
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74 
 120C   210   300 
 1.34  2.28  0.60 

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null 
 8.94 20.44 15.62 
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U 
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60 
 120C   210   300 
 8.40  7.38  1.50 

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9 
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00 

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299 
    p(f(rnd) <= f(d)): 0.708 

Test Diagnostics:
    Test Value (f(d)): 0.04444444 
    Replications: 1000 
    Distribution Summary:
        Min:   -0.3333333 
        1stQ:  -0.06666667 
        Med:   0 
        Mean:  -0.001288889 
        3rdQ:  0.06666667 
        Max:   0.3555556 

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967 
    p(f(rnd) <= f(d)): 0.039 

Test Diagnostics:
    Test Value (f(d)): 0.04444444 
    Replications: 1000 
    Distribution Summary:
        Min:   -0.06666667 
        1stQ:  0.1555556 
        Med:   0.2222222 
        Mean:  0.2215333 
        3rdQ:  0.2888889 
        Max:   0.5333333 

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, or blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
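The distance-then-cluster sequence can be imitated in a few lines of base R. The sketch below uses one common variant of Euclidean structural-equivalence distance (comparing both rows and columns while skipping the cells for the pair itself; the exact conventions used by sedist may differ) followed by average-linkage hclust and cutree. se_distance is a hypothetical helper, not an sna function.

```r
# Hypothetical helper: Euclidean structural-equivalence distances for a
# directed 0/1 graph. Vertices i and j are compared on their out-ties (rows)
# and in-ties (columns) to every third vertex k, skipping cells for i and j.
se_distance <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    k <- setdiff(seq_len(n), c(i, j))
    D[i, j] <- sqrt(sum((A[i, k] - A[j, k])^2) + sum((A[k, i] - A[k, j])^2))
  }
  D
}

A <- matrix(0, 6, 6)
A[1:3, 4:6] <- 1                      # vertices 1-3 send ties to 4-6 only
D <- se_distance(A)                   # 0 within each group, 2 between groups
hc <- hclust(as.dist(D), method = "average")
cl <- cutree(hc, k = 2)               # two positions: {1, 2, 3} and {4, 5, 6}
cl
```

Here the two groups are exactly structurally equivalent, so the within-group distances are zero and any reasonable linkage recovers the two positions.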

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.
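For the density block type, the image is simply the mean tie value within each ordered pair of classes. A minimal sketch, assuming a 0/1 adjacency matrix, a vector of class labels, and that diagonal cells are excluded (density_image is a hypothetical name):

```r
# Hypothetical helper: density block image. For classes a and b, the block
# value is the mean of A over rows in a and columns in b; diagonal cells
# (loops) are excluded. A singleton class yields NaN on its own diagonal block.
density_image <- function(A, cls) {
  A <- as.matrix(A)
  diag(A) <- NA
  k <- sort(unique(cls))
  img <- matrix(NA_real_, length(k), length(k), dimnames = list(k, k))
  for (a in seq_along(k)) for (b in seq_along(k))
    img[a, b] <- mean(A[cls == k[a], cls == k[b]], na.rm = TRUE)
  img
}

A <- matrix(0, 6, 6)
A[1:3, 4:6] <- 1
img <- density_image(A, c(1, 1, 1, 2, 2, 2))
img                                    # only block [1, 2] has density 1
```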

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
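Density-based expansion can be sketched the same way: assign each new vertex to a class, look up the image density for every ordered pair of classes, and take independent Bernoulli draws. The helper expand_density and its sizes argument (the number of new vertices per class) are assumptions for illustration; blockmodel.expand has its own interface.

```r
# Hypothetical helper: expand a density image into a new random graph.
# sizes[c] is the number of vertices assigned to class c; every ordered pair
# (i, j) gets an independent Bernoulli edge with probability img[class_i, class_j].
expand_density <- function(img, sizes) {
  cls <- rep(seq_along(sizes), sizes)   # class label for each new vertex
  P <- img[cls, cls, drop = FALSE]      # inhomogeneous Bernoulli parameter matrix
  diag(P) <- 0                          # no loops
  n <- length(cls)
  matrix(rbinom(n * n, 1, P), n, n)     # one draw per cell, in column-major order
}

img <- matrix(c(0, 0, 1, 0), 2)         # class 1 -> class 2 with certainty
set.seed(2)
g <- expand_density(img, c(2, 3))       # 5 vertices: 2 in class 1, 3 in class 2
g
```

Because sizes can be chosen freely, the expanded graph need not have the same order as the graph from which the image was estimated.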

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the Boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
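With 0/1 adjacency matrices, the composition just defined amounts to an ordinary matrix product thresholded at zero. The sketch below (compose_graphs is a hypothetical name standing in for the operator described above) also shows how loops arise: composing a single mutual tie with itself yields loops on both endpoints.

```r
# Hypothetical helper: Boolean composition of two graphs on the same vertex
# set. (v, v') is an edge iff some v'' has (v, v'') in G and (v'', v') in G',
# i.e., the matrix product A %*% B has a positive entry at [v, v'].
compose_graphs <- function(A, B) {
  (A %*% B > 0) * 1
}

A <- matrix(0, 3, 3)
A[1, 2] <- 1
A[2, 1] <- 1                 # a single mutual tie 1 <-> 2, no loops
compose_graphs(A, A)         # loops appear at vertices 1 and 2
```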

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \tag{3}$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
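The definitions above can be checked numerically in base R. The sketch assumes directed 0/1 adjacency matrices, ignores the diagonal, and uses the $|V|(|V| - 1)$ denominator of equation (3) (gcov's own denominator may differ, though the correlation is unaffected either way). offdiag, graph_cov, graph_cor, and hamming are hypothetical helper names.

```r
# Hypothetical helpers following equation (3): only off-diagonal cells are
# used, and means and covariances are taken over the |V|(|V| - 1) ordered pairs.
offdiag <- function(A) A[row(A) != col(A)]

graph_cov <- function(G, H) {
  g <- offdiag(G)
  h <- offdiag(H)
  sum((g - mean(g)) * (h - mean(h))) / length(g)
}

graph_cor <- function(G, H) {
  graph_cov(G, H) / sqrt(graph_cov(G, G) * graph_cov(H, H))
}

hamming <- function(G, H) sum(offdiag(G) != offdiag(H))

G <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, byrow = TRUE)   # directed 3-cycle 1 -> 2 -> 3 -> 1
H <- t(G)                                  # the same cycle, reversed
graph_cor(G, H)   # -1: every possible edge is present in exactly one graph
hamming(G, H)     # 6
```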

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
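For very small graphs, the structural correlation can be obtained exactly by brute force: maximize the graph correlation over every relabeling of one graph, in the spirit of exhaustive search. The sketch below is a base-R illustration under that assumption (perms and structural_cor are hypothetical helpers). With $n!$ permutations it is only practical for a handful of vertices, which is precisely why heuristics such as annealing are needed in general.

```r
# All permutations of 1:n as rows of a matrix (n! rows; small n only).
perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  p <- perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(i) cbind(i, ifelse(p >= i, p + 1L, p))))
}

# Hypothetical helper: exact structural correlation of two graphs of equal
# order, the maximum graph correlation over all relabelings of H. Pearson's
# cor() on the off-diagonal cells equals the graph correlation of equation (3).
structural_cor <- function(G, H) {
  offdiag <- function(A) A[row(A) != col(A)]
  P <- perms(nrow(H))
  best <- -Inf
  for (r in seq_len(nrow(P))) {
    p <- P[r, ]
    best <- max(best, cor(offdiag(G), offdiag(H[p, p])))
  }
  best
}

G <- matrix(0, 4, 4)
G[1, 2] <- 1; G[1, 3] <- 1; G[2, 3] <- 1   # transitive triple plus an isolate
H <- G[4:1, 4:1]                            # isomorphic copy with labels reversed
cor(G[row(G) != col(G)], H[row(H) != col(H)])   # labeled correlation: -1/3
structural_cor(G, H)                            # 1, since G and H are isomorphic
```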

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
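For binary graphs, a graph minimizing the summed Hamming distance to a set can be found cell by cell, since each edge variable contributes to the total independently: include an edge whenever it appears in more than half of the inputs. A minimal sketch, assuming the graphs arrive as an m x n x n array (central_graph is a hypothetical name, and ties at exactly one half are resolved toward 1 as an arbitrary choice):

```r
# Hypothetical helper: central graph of a stack of 0/1 adjacency matrices
# (an m x n x n array). Each cell contributes to the summed Hamming distance
# independently, so a cell-wise majority vote minimizes the total; cells present
# in exactly half of the graphs are set to 1 here, an arbitrary tie-break.
central_graph <- function(stack) {
  (apply(stack, c(2, 3), mean) >= 0.5) * 1
}

stack <- array(0, dim = c(3, 3, 3))
stack[1, 1, 2] <- 1
stack[2, 1, 2] <- 1          # edge 1 -> 2 appears in two of three graphs
stack[3, 2, 3] <- 1          # edge 2 -> 3 appears in one graph only
central_graph(stack)         # keeps 1 -> 2, drops 2 -> 3
```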

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
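The logic of a QAP test for a bivariate statistic can be sketched in base R: hold one graph fixed, repeatedly relabel the vertices of the other with a random permutation applied to rows and columns together, and compare the observed graph correlation with the resulting distribution. qap_cor_test is a hypothetical helper and does not reproduce qaptest's interface.

```r
# Hypothetical helper: QAP test of the graph correlation between G and H.
# Rows and columns of H are permuted together, so its structure is preserved
# while its vertex labels are randomly reassigned.
qap_cor_test <- function(G, H, reps = 1000) {
  offdiag <- function(A) A[row(A) != col(A)]
  obs <- cor(offdiag(G), offdiag(H))
  n <- nrow(H)
  sim <- replicate(reps, {
    p <- sample.int(n)
    cor(offdiag(G), offdiag(H[p, p]))
  })
  list(obs = obs, p_upper = mean(sim >= obs), p_lower = mean(sim <= obs))
}

set.seed(3)
G <- matrix(rbinom(400, 1, 0.3), 20, 20)
diag(G) <- 0
res <- qap_cor_test(G, G, reps = 500)   # a graph against itself
res$obs                                 # 1
res$p_upper                             # near 0
```

Correlating a graph with itself gives an observed value of 1, which random relabelings of a graph with no nontrivial automorphisms essentially never reach.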


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4*x[2,,] + 2*x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100% 
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14 

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000    
x1           1.000000e+00 1.000   0.000   0.000    
x2           4.000000e+00 1.000   0.000   0.000    
x3           2.000000e+00 1.000   0.000   0.000    
x4          -7.553990e-16 0.369   0.631   0.756    

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1 
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0 

Test Diagnostics:

    Null Hypothesis: qap 
    Replications: 1000 
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4*x[2,,] + 2*x[3,,]
R> yp <- apply(yl, c(1, 2), function(a) 1/(1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503    
x1           0.9411361  2.5628914 0.985   0.015   0.019    
x2           4.1473292 63.2648084 1.000   0.000   0.000    
x3           1.8630911  6.4436238 1.000   0.000   0.000    
x4          -0.1757242  0.8388493 0.318   0.682   0.642    

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0 
AIC: 184.1572   BIC: 203.8580 
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324 
    (Dn-Dr)/Dn: 0.6694004 
Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

    Total Fraction Correct: 0.8973684 
    Fraction Predicted 1s Correct: 0.8973684 
    Fraction Predicted 0s Correct: NaN 
    False Negative Rate: 0 
    False Positive Rate: 1 

Test Diagnostics:

    Null Hypothesis: qap 
    Replications: 1000 
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose; there are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (and the latter may be omitted if desired); estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
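When error rates are treated as known and the graph prior is inhomogeneous Bernoulli, the reporting model just described gives each edge's posterior probability in closed form, because the likelihood then factorizes over edges. The sketch below illustrates that special case only (bbnam itself also samples the error rates when they are unknown). The function edge_posterior and its arguments are hypothetical: x holds one 0/1 report per informant, ep the false positive rates, and em the false negative rates, all assumed strictly between 0 and 1.

```r
# Hypothetical helper: posterior probability that an edge is present, given
# 0/1 reports x from several informants with KNOWN false positive rates ep and
# false negative rates em (all in (0, 1)), and prior edge probability `prior`.
edge_posterior <- function(x, ep, em, prior = 0.5) {
  # log-likelihood of the reports if the edge is present / absent
  ll_present <- sum(x * log(1 - em) + (1 - x) * log(em))
  ll_absent  <- sum(x * log(ep) + (1 - x) * log(1 - ep))
  a <- log(prior) + ll_present
  b <- log(1 - prior) + ll_absent
  1 / (1 + exp(b - a))   # Bayes' rule, computed on the log scale for stability
}

# Three informants with ep = 0.05 and em = 0.3; two report the tie, one does not.
edge_posterior(c(1, 1, 0), ep = rep(0.05, 3), em = rep(0.3, 3))
# equals (0.7^2 * 0.3) / (0.7^2 * 0.3 + 0.05^2 * 0.95), about 0.984
```

For an informant x sender x receiver report array, the same function could be applied cell by cell with apply(reports, c(2, 3), edge_posterior, ep = ep, em = em), where reports is a placeholder name for such an array.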

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by $\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)}$, where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
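The conditional nomination probability is straightforward to evaluate once the bias-event probabilities and their counts are given. In the sketch below (biased_net_prob is a hypothetical name), p_base plays the role of $\Pr(B_e)$, p_bias holds the $\Pr(B_k)$, and s holds the counts $s_k(i, j, T)$; the particular values in the usage lines are illustrative assumptions, not estimates from any model.

```r
# Hypothetical helper: conditional probability that i nominates j,
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k.
#   p_base : Pr(B_e), the baseline nomination probability
#   p_bias : vector of bias-event probabilities Pr(B_k)
#   s      : vector of counts s_k(i, j, T) of each bias event for this dyad
biased_net_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Baseline 0.1; one bias event with probability 0.5 that occurred once, and
# another with probability 0.25 that occurred twice (illustrative values).
biased_net_prob(0.1, c(0.5, 0.25), c(1, 2))   # 1 - 0.9 * 0.5 * 0.75^2 = 0.746875
biased_net_prob(0.1, c(0.5, 0.25), c(0, 0))   # no bias events: 0.1
```

Because each bias event is an independent chance to force the nomination, additional occurrences can only raise the probability, never lower it.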

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

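$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$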

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
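As an illustration of one of the indices mentioned above, the following base-R sketch computes Moran's I in its standard form, $I = \frac{n}{\sum_{ij} w_{ij}} \cdot \frac{\sum_{ij} w_{ij} z_i z_j}{\sum_i z_i^2}$ with $z = y - \bar{y}$. morans_i is a hypothetical helper; nacf's own neighborhood options and normalizations are not reproduced.

```r
# Hypothetical helper: Moran's I for responses y and an n x n weight matrix W,
# I = (n / sum(W)) * sum_ij W_ij z_i z_j / sum_i z_i^2, with z = y - mean(y).
morans_i <- function(y, W) {
  z <- y - mean(y)
  n <- length(y)
  (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

W <- matrix(0, 4, 4)
W[1, 2] <- W[2, 1] <- 1
W[3, 4] <- W[4, 3] <- 1      # two disconnected dyads
morans_i(c(1, 1, 0, 0), W)   # 1: tied vertices have identical responses
morans_i(c(1, 0, 1, 0), W)   # -1: tied vertices have opposite responses
```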


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet), and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i,,] <- rgraph(20, 1, tprob = (g*(1 - em[i]) + (1 - g)*ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[,1] <- 2
R> pem[,2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[,1] <- 2
R> pep[,2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Rgt summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:
             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):
       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):
          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20 Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which e^- + e^+ > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
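To check how little prior mass the beta(2, 11) priors used above place on the "perverse" region, one can integrate directly. This base-R sketch assumes, as in the example, that each actor's false negative and false positive rates have independent beta(2, 11) priors; the variable names are illustrative. It computes P(e^- + e^+ > 1) = ∫ f(x) P(e^+ > 1 - x) dx and cross-checks it by simulation.

```r
# Prior probability that e^- + e^+ > 1 when both error rates have
# independent beta(2, 11) priors (the priors of the bbnam example).
# P(em + ep > 1) = integral over x of f_em(x) * P(ep > 1 - x)
perverse_mass <- integrate(
  function(x) dbeta(x, 2, 11) * pbeta(1 - x, 2, 11, lower.tail = FALSE),
  lower = 0, upper = 1
)$value
perverse_mass   # roughly 5.4e-05

# Monte Carlo cross-check of the same probability
set.seed(42)
draws   <- rbeta(1e6, 2, 11) + rbeta(1e6, 2, 11)
mc_mass <- mean(draws > 1)
c(integral = perverse_mass, monte_carlo = mc_mass)
```

A prior mass this small means the perverse region can gain substantial posterior weight only if the data strongly push it there, which is the diagnostic situation described above.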

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model.
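For intuition about three of the heuristic estimators, here is a hypothetical base-R sketch; it is not sna's consensus code. It assumes reports are stored as dat[k, i, j] (informant k's report of the i to j tie) and that informant k is also vertex k, which locally aggregated structures (LAS) require. The intersection LAS keeps a tie only if both endpoints report it, the union LAS keeps it if either does, and the central graph is sketched here as a majority rule over all informants (counting exact halves as present is an assumption of this sketch).

```r
# Heuristic consensus estimators for reports dat[k, i, j]
# (informant k's report of the i -> j tie; informant k is vertex k).
las_intersection <- function(dat) {
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n)
    out[i, j] <- dat[i, i, j] * dat[j, i, j]       # both i and j report it
  out
}

las_union <- function(dat) {
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n)
    out[i, j] <- max(dat[i, i, j], dat[j, i, j])   # i or j reports it
  out
}

central_graph <- function(dat) {
  # keep a tie if at least half of all informants report it
  (apply(dat, c(2, 3), mean) >= 0.5) * 1
}

# Toy data: 3 informants reporting on a 3-vertex digraph
toy <- array(0, dim = c(3, 3, 3))
toy[1, 1, 2] <- 1   # informant 1 reports 1 -> 2
toy[2, 1, 2] <- 1   # informant 2 agrees about 1 -> 2
toy[3, 2, 3] <- 1   # only informant 3 reports 2 -> 3
las_intersection(toy)   # keeps 1 -> 2 only
las_union(toy)          # keeps 1 -> 2 and 2 -> 3
central_graph(toy)      # keeps 1 -> 2 (2 of 3 informants)
```

Because the intersection rule drops any tie that either endpoint misses, it compounds false negatives, which matches its weak showing in the comparison below.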

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
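A quick binomial calculation shows why a majority-rule estimator such as the central graph breaks down as error rates approach 0.5. The sketch assumes 20 informants who err independently and homogeneously on a single true tie (a simplification of the heterogeneous rates in the example); majority_hit is a hypothetical helper name.

```r
# Probability that a majority rule recovers one true tie when each of m
# independent informants misses it with probability em (homogeneous errors).
majority_hit <- function(em, m = 20) {
  # the tie is kept if at least m/2 informants report it
  pbinom(ceiling(m / 2) - 1, size = m, prob = 1 - em, lower.tail = FALSE)
}

sapply(c(0.2, 0.375, 0.5, 0.6), majority_hit)
# at em = 0.5 the tie is recovered only about 59% of the time
```

The same reasoning applies to true nulls with false positive rate ep; once either rate passes 0.5, the majority vote is wrong more often than right, whereas a likelihood-based model can still reverse such systematic errors as long as e^- + e^+ < 1.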

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)
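The simulation below generates data from a single-matrix version of the model defined above. Writing W for the AR weight matrix (w1 in the code, with θ = r1) and Z for the MA weight matrix (w2, with ψ = r2), and assuming I - θW and I - ψZ are invertible, a sketch of the reduced form is:

```latex
\begin{aligned}
y &= \theta W y + X\beta + e, \qquad e = \psi Z e + \nu, \qquad \nu \sim \mathrm{Norm}(0, \sigma^2 I),\\
e &= (I - \psi Z)^{-1}\nu,\\
y &= (I - \theta W)^{-1}\bigl(X\beta + (I - \psi Z)^{-1}\nu\bigr).
\end{aligned}
```

The first qr.solve call computes e from ν, and the second computes y from Xβ + e, so the two linear solves correspond line by line to the last two equations.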

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Four panels: "Fitted vs. Observed Values," "Fitted Values vs. Estimated Disturbances," "Normal Q-Q Residual Plot" (theoretical vs. sample quantiles), and "Net Influence Plot."]

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks. II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008. Submitted: 2007-06-01. Accepted: 2007-12-25.



1. General in coverage, incorporating a range of different network analytic techniques;

2. Easily extensible, to allow for the timely incorporation of new methods and/or refinements;

3. Well-integrated with general purpose statistical, computational, and visualization tools, so as to facilitate the use of network analysis in conjunction with both end-user extensions and broader social science methodology;

4. Based on an open codebase, which is available for inspection (and hence emulation, correction, and improvement) by the network community;

5. Portable, to allow use by researchers on a variety of computing platforms; and

6. Freely available to network researchers, so as to encourage its use among the widest possible range of scientists, practitioners, and students.

This "wish list" of attributes would seem to be a great deal to ask of any single, stand-alone program; the emergence of open statistical computing platforms such as R (R Development Core Team 2007), however, has provided a feasible means of realizing such objectives. Using R (which is itself free software in the Stallmanian sense; see Stallman 2002), researchers can easily produce and share packages which supply specialized functionality, but which are interoperable with other statistical computing tools. In this vein, the sna package was created as a mechanism for fulfilling the above objectives within the R environment. Additional motivations for the introduction of sna were to encourage the migration of the social network community to open source and/or free software solutions, to facilitate the creation of a shared framework for dissemination of new methodological developments, to further the development of statistical network analysis methods by network analysts, and to ease the integration of network methods with those of "standard" statistical analysis.

1.1 Package history

sna began life as a loose collection of S routines (called "Various Useful Tools for Network Analysis in S," or networkStools) written by the author, which were disseminated locally to social network researchers in and around the research community at Carnegie Mellon University and the University of Pittsburgh. The first external use of the toolkit of which the author is aware was the netlogit analysis employed by Ingram and Roberts (2000). The first version of the collection to be generally disseminated (version 0.1) was released in August of 2000, with the first R package version (sna version 0.3) appearing in May of 2001. Multiple releases followed over subsequent years, with the package reaching the "1.0" landmark in August of 2005. Development has been ongoing; as of the time of this writing, the package is on version 1.5.

1.2 sna and statnet

As noted above, a major goal in introducing sna was the creation of a foundation for ongoing development of tools within the network analysis community. The statnet project (Handcock et al. 2003) represents the latest incarnation of that objective (much as BioConductor, Gentleman et al. 2004, serves as a site for tool development within the bioinformatics community); in some sense, then, statnet is the natural "successor" to sna. Reflecting this relationship, sna is now considered to be part of the statnet project, and is fully interoperable with other statnet packages (including network). sna may still be employed as a stand-alone package, however, for users who do not require the full range of functionality provided by statnet.

1.3 Functionality

At present, the sna package includes over 125 functions for the manipulation and analysis of network data. Supported functionality includes:

• Functions to compute descriptive indices at the graph or node level. This includes centrality and centralization indices, measures of hierarchy and prestige, brokerage, density, reciprocity, transitivity, connectedness, and the like, as well as dyad, triad, path, and cycle census statistics. Stand-alone routines to facilitate the comparison of index values across graphs via conditional uniform graph (CUG) tests are included.

• Functions to compute geodesic distances, component structure and distribution, and structure statistics (in the sense of Fararo and Sunshine 1964), and to identify isolates.

• Functions for positional and role analysis, including structural equivalence and blockmodeling.

• Functions for exploratory edge set comparison in the paradigm of Butts and Carley (2005). This includes structural covariance/correlation and distance routines, as well as tools for scaling and visualization of graph sets. Network regression (Krackhardt 1988), canonical correlation analysis, and logistic network regression are also supported. QAP (Hubert 1987; Krackhardt 1987b) and CUG tests are currently implemented for all three approaches.

• Functions to generate graph-valued deviates from various stochastic processes. So-called Erdős-Rényi graphs, inhomogeneous Bernoulli graphs, and dyad census conditioned graphs are supported, as are graphs produced by Watts-Strogatz rewiring processes (Watts and Strogatz 1998) and the biased net models of Skvoretz et al. (2004) and Rapoport (1957).

• Functions to fit network autocorrelation (also known as spatial autocorrelation; see Anselin 1988) and biased net models.

• Functions for network inference (i.e., inferring networks from multiple reports containing missing and/or error-prone data). This includes heuristic estimators such as Krackhardt's (Krackhardt 1987a) locally aggregated structure estimators and the central graph (Banks and Carley 1994), as well as model-based methods such as the Romney-Batchelder consensus model (Romney et al. 1986) and the error-rate models of Butts (2003).

• Functions for visualization and manipulation of network data (in adjacency matrix form). Standard graph layout methods such as those of Fruchterman and Reingold (1991) and Kamada and Kawai (1989), general multidimensional scaling/eigenstructure methods, and "target" diagrams (Brandes et al. 2003) are included by default, and custom layout routines are also supported. Functions are included to facilitate common tasks such as extracting neighborhoods and egocentric networks, symmetrization, application of functions to attribute information on neighborhoods (e.g., computing neighbors' mean attributes), dichotomization, permutation/relabeling, and the creation of interval graphs from spell data. Data import/export is supported for several basic file formats.
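As a small illustration of two of the categories listed above (graph-level descriptive indices and random graph generation), the following base-R sketch computes the density and dyad census of a simple digraph and draws a homogeneous Bernoulli (Erdős-Rényi-type) digraph. The function names are hypothetical and are not sna's; they only show what the indices count.

```r
# Density and dyad census of a simple digraph (adjacency matrix A, loops ignored),
# plus a homogeneous Bernoulli digraph generator. Illustrative sketch only.
density_digraph <- function(A) {
  n <- nrow(A)
  sum(A[row(A) != col(A)]) / (n * (n - 1))   # observed arcs / possible arcs
}

dyad_census <- function(A) {
  up <- upper.tri(A)                          # each unordered pair once
  s  <- A[up] + t(A)[up]                      # 0, 1, or 2 arcs per dyad
  c(Mutual = sum(s == 2), Asymmetric = sum(s == 1), Null = sum(s == 0))
}

bernoulli_digraph <- function(n, p) {
  A <- matrix(rbinom(n * n, 1, p), n, n)      # each arc present with prob. p
  diag(A) <- 0                                # simple graph: no loops
  A
}

A <- matrix(c(0, 1, 1,
              1, 0, 0,
              0, 0, 0), 3, 3, byrow = TRUE)
density_digraph(A)   # 3 arcs out of 6 possible: 0.5
dyad_census(A)       # Mutual 1, Asymmetric 1, Null 1
```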

The above includes many of the methods of what is sometimes called "classical" social network analysis (exemplified by Wasserman and Faust (1994), whose presentation is now canonical), as well as some more recent contributions to the literature. Although the focus of the package has been on social scientific applications, many of the included tools may also be useful for analyzing networks arising from other sources.

1.4 Terminology and data representation

As a special-purpose toolkit dedicated to social network analysis, describing sna's functionality requires us to refer to standard SNA concepts and methods; readers unfamiliar with network analysis may wish to consult the cited references (particularly Wasserman and Faust 1994) for additional details. Some specific terminology and notation is described below. Throughout this paper, we will be concerned with relational data consisting of a fixed set of entities (called vertices) and a multiset of relationships among those entities (called edges). Our particular focus is on dyadic relationships, in which edges consist of (possibly ordered) two-element multisets on the set of vertices. The elements of an edge are referred to as its endpoints, with the first element known as the tail (or sender) and the second known as the head (or receiver) in the ordered case. An edge whose endpoints are identical is called a loop. The combination of an edge set $E$ with vertex set $V$ is said to be a graph (denoted $G = (V, E)$). The size or order of a graph is the number of elements in its vertex set (denoted $|V|$, where $|\cdot|$ is the cardinality operator).

Specific types of graphs may be identified via the constraints satisfied by $E$. If the elements of $E$ are unordered multisets, $G$ is said to be an undirected graph; if edges are ordered multisets, by contrast, $G$ is said to be a directed graph (or digraph). For an undirected graph, the set of vertices tied (or adjacent) to vertex $v$ is called the neighborhood of $v$ (denoted $N(v)$). In the directed case, we distinguish between the set of vertices sending edges to $v$ (the in-neighborhood, or $N^-(v)$) and the set of vertices receiving edges from $v$ (the out-neighborhood, or $N^+(v)$). A graph (directed or otherwise) is simple if it has no loops and if there exists no edge having multiplicity greater than one. Finally, a graph's edge set may be associated with a set of variables such that each edge carries some value. A graph of this kind is said to be valued, as opposed to the contrary unvalued case.
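To make the neighborhood notation concrete, the following base-R sketch reads $N^+(v)$ and $N^-(v)$ off an adjacency matrix, using the convention that A[i, j] = 1 encodes an edge with tail i and head j. The helper names (out_nbhd, in_nbhd, has_loops) are hypothetical and are not sna functions.

```r
# In- and out-neighborhoods and loop detection for a digraph stored as an
# adjacency matrix: A[i, j] = 1 means an edge with tail i and head j.
A <- matrix(0, 4, 4)
A[1, 2] <- 1; A[1, 3] <- 1; A[3, 1] <- 1
A[4, 4] <- 1                                    # vertex 4 has a loop

out_nbhd  <- function(A, v) which(A[v, ] > 0)   # N+(v): heads of v's edges
in_nbhd   <- function(A, v) which(A[, v] > 0)   # N-(v): tails of edges into v
has_loops <- function(A) any(diag(A) != 0)

out_nbhd(A, 1)   # 2 3
in_nbhd(A, 1)    # 3
has_loops(A)     # TRUE, so this graph is not simple
```

In the undirected case, where A is symmetric, the two functions return the same set, which is the neighborhood $N(v)$.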

It is worth noting that use of terminology varies somewhat across the social network field, a perhaps unfortunate legacy of the field's strongly interdisciplinary nature (Freeman 2004). Thus, vertices may also be called "points" or "nodes" (or, in social contexts, "actors" or "agents"). Likewise, edges may be called "lines," "ties," or (if directed) "arcs." The term "network" is often used generically to refer to any relational structure; in other cases, it may be reserved to refer to the actually existing relational structure, with "graph" being employed for that structure's formal representation. In the latter instance, "tie" is frequently used as the corresponding term for an actually existing relationship, with "edge" denoting the formal representation of that relationship. While such terminological subtleties are not required to use sna, an awareness of them may reduce confusion among users seeking to make use of the literature cited within the package manual.

With rare exceptions, sna routines can be used with directed or undirected graphs, with or without loops. Edge values and missing data (i.e., edges whose states are unknown) are supported in many applications as well. Note, however, that many graph-theoretic concepts (e.g., connectedness) admit somewhat different definitions in the directed and undirected cases; it is thus important to verify that one is using the settings which are appropriate to the data at hand. Except for functions whose behavior is undefined in the directed case, sna's functions typically default to the assumption that one's data consists of one or more simple, unvalued digraphs.

Relational data can be represented in a number of ways, several of which are currently supported by the sna package. The most basic of these is the adjacency matrix, i.e., a square matrix $A$ whose elements are defined such that $A_{ij}$ is the value of the $(i, j)$ edge (or $\{i, j\}$ edge in the undirected case) in the corresponding graph. By convention, $A_{ij}$ is a dichotomous indicator variable where the corresponding graph is unvalued. Such matrices may be passed as matrix objects or as two-dimensional arrays. While adjacency matrices are convenient to work with, they are inefficient for large, sparse graphs. When working with such data, the use of network (Butts et al. 2007) or sparse matrix (SparseM; Koenker and Ng 2007) objects may be preferred. sna accepts all three such data types interchangeably.
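As an illustration of the adjacency-matrix convention, the base-R sketch below builds $A$ from a two-column edge list of (tail, head) pairs and then derives the undirected reading in which an $\{i, j\}$ edge is present if either ordered edge is. The function name edgelist_to_adj is hypothetical, and the sketch uses a dense matrix only; it does not touch the network or SparseM representations recommended above for large, sparse graphs.

```r
# Hypothetical helper (not part of sna): build an n x n adjacency matrix
# from a two-column edge list of (tail, head) rows; unvalued edges get 1.
edgelist_to_adj <- function(el, n, values = rep(1, nrow(el))) {
  A <- matrix(0, n, n)
  A[el] <- values        # matrix indexing: each row of el selects one cell
  A
}

el <- rbind(c(1, 2), c(2, 3), c(3, 1))   # a directed 3-cycle
A <- edgelist_to_adj(el, 3)

# Undirected reading: {i, j} is present if (i, j) or (j, i) is present.
U <- (A + t(A)) > 0
```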

In many instances, one may need to perform operations on multiple graphs at once. Where such graphs are of the same order (i.e., number of vertices), they may be conveniently represented by a three-dimensional array whose first dimension indexes the component adjacency matrices. Alternately, multiple graphs may be specified by means of a list. This allows the user to pass graph sets of varying orders where required. Within a graph list, single adjacency matrices, adjacency arrays, network objects, and sparse matrix objects may be mixed as desired; individual graphs are unpacked sequentially in ascending list and array index order prior to computation.
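A minimal base-R sketch of the two multi-graph representations just described: a three-dimensional array whose first index selects the graph, and a plain list that can hold graphs of different orders. The density helper dens is hypothetical and written out by hand for a loopless digraph; it only plays the role that sna's gden plays in the package.

```r
# Density of an unvalued, loopless digraph: edges present / n(n - 1) possible.
# (Illustrative helper; not sna's gden.)
dens <- function(A) {
  n <- nrow(A)
  sum(A[row(A) != col(A)]) / (n * (n - 1))
}

set.seed(1)
# A stack of three order-5 graphs; the FIRST index selects the graph.
gstack <- array(0, dim = c(3, 5, 5))
for (k in 1:3) {
  A <- matrix(rbinom(25, 1, k / 4), 5, 5)
  diag(A) <- 0
  gstack[k, , ] <- A
}
apply(gstack, 1, dens)          # one density per graph in the stack

# A list can hold graphs of different orders (here 5 and 2).
glist <- list(gstack[1, , ], matrix(c(0, 1, 1, 0), 2, 2))
sapply(glist, dens)             # second entry is 1: both possible edges present
```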

    Importing relational data into R

Another preliminary issue of obvious concern is the importation of relational data into R. Where such data is stored in matrix or array form, conventional R routines such as read.table and scan may be employed in the usual manner. Similarly, natively saved network objects may be loaded directly into memory without external representation. In addition to these methods, sna includes custom routines for importing relational data in OrgStat NOS and GraphViz DOT formats. Processed relational data can be saved via the above methods, or in the DL format widely used by packages such as Pajek and UCINET. (See also the Pajek import function in network.)

Beyond these network-specific approaches, sna also has facilities for converting spell data (i.e., data consisting of intervals in time or other quantities) into interval graphs (West 1996). The eponymously named interval.graph function serves in this capacity, converting an array of spell information into one or more interval graphs; spell-level categorical covariate information may also be included. In addition to simple interval graphs, interval.graph will compute the valued overlap graphs proposed by Butts and Pixley (2004) for use with life history data. In this case, the overlap quantities are stored as edge values in the output adjacency matrix (or matrices, if multiple spell sets were given).
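The basic idea of an interval graph can be sketched in a few lines of base R: each spell is a (start, end) interval, and two spells are adjacent when their intervals overlap. The helper interval_adj below is hypothetical and is not the interval.graph implementation. As an assumption of this sketch, it stores the plain length of each overlap as the edge value; the overlap quantities of Butts and Pixley (2004) may be defined differently.

```r
# Hypothetical helper: spells is a two-column matrix of (start, end) times.
# Returns a symmetric valued adjacency matrix whose (i, j) cell is the length
# of the overlap between spells i and j. Touching endpoints (zero-length
# overlap) are treated here as non-adjacent, an assumption of this sketch.
interval_adj <- function(spells) {
  n <- nrow(spells)
  A <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i != j) {
      ov <- min(spells[i, 2], spells[j, 2]) - max(spells[i, 1], spells[j, 1])
      A[i, j] <- max(ov, 0)
    }
  }
  A
}

spells <- rbind(c(0, 5), c(3, 8), c(9, 12))
V <- interval_adj(spells)   # valued overlap graph: only spells 1 and 2 overlap (by 2)
G <- (V > 0) * 1            # dichotomized: a simple interval graph
```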


    2 Package highlights

Given the wide scope of the methods implemented within the sna package, we cannot review them all in detail. In this section, however, we attempt to summarize the functionality of sna within a number of domains, highlighting specific functions and applications which are likely to be of general interest. Brief examples are also provided within each section to illustrate basic syntax and usage. Additional background and usage details are contained within the package manual, which is distributed with the package itself.

2.1 Random graph generation

sna has a range of tools for random graph generation. Chief among these is rgraph, a "workhorse" function for simulating deviates from both homogeneous and inhomogeneous Bernoulli graph distributions (Wasserman and Faust 1994). Given a set of tie probabilities (which may be specified by graph or by edge), it generates one or more graphs whose edge states are independent Bernoulli trials conditional on the specified parameters.¹

In addition to rgraph, sna has several other tools for random graph generation. These currently include rgnm (which draws uniform graphs and digraphs conditional on edge count), rguman (which draws uniform digraphs conditional on expected or realized dyad census statistics), rgws (which draws from a Watts-Strogatz graph process; Watts and Strogatz 1998), and rgbn (which simulates a Skvoretz-Fararo biased net process (Skvoretz et al. 2004); see also Section 2.7). Also useful are tools such as rmperm and the rewire functions, which alter an input graph by random row/column, edgewise, or dyadic permutations. Functions which condition on degree distribution and the triad census are anticipated in future versions of sna.

    Example

To provide a sense for the syntax involved (and options available) when generating random graphs in sna, we here provide a brief example of R code which draws graphs from a number of models. Note that the output type in each case is an adjacency matrix: although sna routines accept network and related objects as input (per Section 1.4), the package's current random graph generators produce output in adjacency matrix or array form. The range of output types may be expanded in future package versions. To begin, we first load the sna library and fix the random seed (for reproducibility):

R> library(sna)
R> set.seed(1913)

As noted above, rgraph can be used in various ways to obtain graphs (directed or otherwise) with different expected densities. For instance, three digraphs with respective expected densities 0.1, 0.9, and 0.5 can be drawn as follows:

R> g <- rgraph(10, 3, tprob = c(0.1, 0.9, 0.5))
R> gden(g)
[1] 0.1000000 0.8666667 0.5333333

¹ rgraph can also be employed to simulate valued graphs via a resampling procedure.


gden, which we shall encounter again later, is an sna function which returns the density of one or more input graphs; as expected, the observed densities here closely match their expectations. The tprob parameter, used above to set the probability of each edge on a per-graph basis, can also be used in other ways. For instance, passing a matrix of Bernoulli parameters to tprob will cause rgraph to sample from the corresponding inhomogeneous Bernoulli graph model (in which the probability of an $(i, j)$ edge is equal to tprob[i,j]). For example, consider a simple model for a digraph of order 10 in which the probability of an $(i, j)$ edge is equal to $j/10$. Such a graph can be drawn easily as follows:

R> gp <- sapply((1:10) / 10, rep, 10)
R> g <- rgraph(10, tprob = gp)
R> g
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    1    0    0    1    1     1
 [2,]    0    0    0    1    0    1    0    0    1     1
 [3,]    0    0    0    0    0    1    0    1    0     1
 [4,]    0    0    0    0    1    1    1    1    1     1
 [5,]    0    1    0    0    0    0    1    1    1     1
 [6,]    0    0    1    0    1    0    1    0    1     1
 [7,]    0    1    1    0    1    0    0    1    1     1
 [8,]    0    0    1    1    1    0    1    0    1     1
 [9,]    0    0    0    1    1    0    1    1    0     1
[10,]    0    0    0    0    0    0    1    1    1     0
R> apply(g, 2, mean)
 [1] 0.0 0.2 0.3 0.3 0.6 0.3 0.6 0.7 0.8 0.9

Since rgraph disallows loops by default, diagonal entries are ignored in the above cases; thus, the column means here have expectation $0.9(j/10)$. The observed means are quite close to this, but obviously vary due to the underlying Bernoulli process. For random graphs with exact constraints on edge count, we must use rgnm. For instance, to take 5 draws from the uniform distribution on the order 10 graphs having 12 edges, we would proceed as follows:

R> g <- rgnm(5, 10, 12)
R> apply(g, 1, sum)
[1] 12 12 12 12 12
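The contrast between the two models used so far can be sketched in base R without sna. In a homogeneous Bernoulli digraph, every off-diagonal cell is an independent trial, so the edge count varies from draw to draw; a uniform draw conditional on $m$ edges (the kind of draw rgnm provides) instead picks exactly $m$ of the $n(n-1)$ possible cells. The helper names bern_digraph and gnm_digraph are hypothetical, and loops are excluded as in rgraph's default.

```r
# Homogeneous Bernoulli digraph: each off-diagonal cell is an independent
# Bernoulli(p) trial, so the edge count is Binomial(n(n - 1), p).
bern_digraph <- function(n, p) {
  A <- matrix(rbinom(n * n, 1, p), n, n)
  diag(A) <- 0
  A
}

# Uniform digraph with exactly m edges: choose m of the n(n - 1)
# off-diagonal cells without replacement.
gnm_digraph <- function(n, m) {
  A <- matrix(0, n, n)
  off <- which(row(A) != col(A))
  A[sample(off, m)] <- 1
  A
}

set.seed(42)
sapply(1:5, function(i) sum(bern_digraph(10, 12 / 90)))  # edge counts vary
sapply(1:5, function(i) sum(gnm_digraph(10, 12)))        # always 12
```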

As the dyadic counterpart to both rgraph and rgnm, rguman models digraphs whose distributions are parameterized by dyad states. As each dyad corresponds to a pair of edge variables, it can be readily classified into the three isomorphism classes of mutual (both edges present), asymmetric (one edge present), or null (no edges present). The number of dyads in each class within a graph is known as its dyad census, and has been used as a simple basis for modeling network structure at least since the work of Holland and Leinhardt (1970). rguman can be employed either to generate uniform digraphs conditional on an exact dyad census constraint, or to draw from a multinomial graph model of independent dyads with fixed expected counts. The former case can be used to generate graphs of particular types. For instance, the trivial cases of complete, complete tournament, and null graphs can be generated by placing all dyads within the appropriate isomorphism class:

R> k10 <- rguman(1, 10, mut = 45, asym = 0, null = 0, method = "exact")
R> t10 <- rguman(1, 10, mut = 0, asym = 45, null = 0, method = "exact")
R> n10 <- rguman(1, 10, mut = 0, asym = 0, null = 45, method = "exact")
R> k10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    1    1    1    1    1    1    1     1
 [2,]    1    0    1    1    1    1    1    1    1     1
 [3,]    1    1    0    1    1    1    1    1    1     1
 [4,]    1    1    1    0    1    1    1    1    1     1
 [5,]    1    1    1    1    0    1    1    1    1     1
 [6,]    1    1    1    1    1    0    1    1    1     1
 [7,]    1    1    1    1    1    1    0    1    1     1
 [8,]    1    1    1    1    1    1    1    0    1     1
 [9,]    1    1    1    1    1    1    1    1    0     1
[10,]    1    1    1    1    1    1    1    1    1     0
R> t10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    1    0    0     0
 [2,]    1    0    1    0    1    1    0    0    0     1
 [3,]    1    0    0    1    1    0    0    1    0     0
 [4,]    1    1    0    0    0    1    0    1    0     1
 [5,]    1    0    0    1    0    1    1    1    1     0
 [6,]    1    0    1    0    0    0    1    1    1     0
 [7,]    0    1    1    1    0    0    0    1    1     0
 [8,]    1    1    0    0    0    0    0    0    1     1
 [9,]    1    1    1    1    0    0    0    0    0     0
[10,]    1    0    1    0    1    1    1    0    1     0
R> n10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
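The dyad census itself is easy to compute from an adjacency matrix. The base-R sketch below (dyad_census_sketch is a hypothetical name, and sna's own dyad census routine is not used) classifies each unordered pair once and confirms that complete, tournament, and null digraphs of order 10 place all 45 dyads in a single class. As an assumption standing in for rguman's exact mode, the tournament here is built by orienting each pair at random.

```r
# Hypothetical helper: counts of mutual, asymmetric, and null dyads in an
# unvalued digraph, visiting each unordered pair {i, j} (i < j) once.
dyad_census_sketch <- function(A) {
  up <- upper.tri(A)
  ij <- A[up] > 0            # is the (i, j) edge present?
  ji <- t(A)[up] > 0         # is the (j, i) edge present?
  c(mut = sum(ij & ji), asym = sum(xor(ij, ji)), null = sum(!ij & !ji))
}

n <- 10
complete <- matrix(1, n, n); diag(complete) <- 0
null_g   <- matrix(0, n, n)

# Random tournament: for i < j, U[i, j] = 1 means i -> j, 0 means j -> i.
U <- matrix(0, n, n)
U[upper.tri(U)] <- rbinom(n * (n - 1) / 2, 1, 0.5)
tourn <- U + t(upper.tri(U) * (1 - U))

dyad_census_sketch(complete)   # mut 45, asym 0, null 0
dyad_census_sketch(tourn)      # mut 0, asym 45, null 0
dyad_census_sketch(null_g)     # mut 0, asym 0, null 45
```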

When not in "exact" mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)
R> mean(g[upper.tri(g)] & t(g)[upper.tri(g)])
[1] 0.1482828
R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])
[1] 0.04646465
R> mean((!g)[upper.tri(g)] & t(!g)[upper.tri(g)])
[1] 0.8052525

By contrast with the expectation under the above model, a Bernoulli graph with the same expected density would have a mean mutuality rate of approximately 0.03 (with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4). Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
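The comparison above follows from a short calculation, sketched here in base R. Under the multinomial model, a mutual dyad contributes both of its two possible edges and an asymmetric dyad contributes one, so the expected density is $(2 \cdot 0.15 + 0.05)/2 = 0.175$. A Bernoulli graph with edge probability $p = 0.175$ then has mutual dyads with probability $p^2$ and asymmetric dyads with probability $2p(1-p)$.

```r
# Worked check of the comparison in the text (base R arithmetic only).
mut  <- 0.15                   # expected fraction of mutual dyads
asym <- 0.05                   # expected fraction of asymmetric dyads
p <- (2 * mut + asym) / 2      # expected density of the multinomial model: 0.175
p_mut  <- p^2                  # Bernoulli(p) chance a dyad is mutual, about 0.031
p_asym <- 2 * p * (1 - p)      # Bernoulli(p) chance a dyad is asymmetric, about 0.289
p_asym / p_mut                 # about 9.43
```

Set against the multinomial model's mutual rate of 0.15, the Bernoulli rate of roughly 0.03 shows how strongly the dyadic model favors reciprocity at the same density.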

More extensive departures from independence require alternatives to the simple independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. As we will see, these processes are specified in terms of the conditional probability of an edge given other edges within the graph; this immediately suggests the use of a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process. Such a sampler is implemented via the rgbn function, which uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. Thinning and burn-in parameters may be specified by the user, along with model parameters (which by default correspond to the uniform random digraph model). Parameters may be adjusted to produce "parent" or reciprocity biases (π), "sibling" or shared partner biases (σ), and "double role" biases or parent/sibling interaction effects (ρ), as well as baseline density effects (d); parameters vary from 0 to 1, with 0 indicating no bias. The command to draw a sample of 5 order 10 networks with both reciprocity and triangle formation biases will then look something like the following:

R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
+   d = 0.15))

with the magnitude of the specified effects depending on the exact choice of parameters.

Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) "rewiring" process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

R> g <- matrix(0, 10, 10)
R> g[1,] <- 1
R> g2 <- rewire.ws(g, 0.5)[1,,]
R> g2
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    1    1    1    1    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     1
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    0    1    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    1    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    1     0
R> sum(g - g2) == 0
[1] TRUE
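The edge-conserving logic of such a rewiring step can be sketched in base R. This is a simplified illustration under stated assumptions, not the rewire.ws source: each edge is, with probability p, moved to a currently null cell that keeps either its tail or its head (chosen at random), so exactly one endpoint changes and no edge is ever created or destroyed. The helper name rewire_sketch is hypothetical, and loops are not excluded, matching the example input above, which contains one.

```r
# Hypothetical helper (not the rewire.ws source): with probability p, move
# each existing edge (i, j) to a null cell sharing one endpoint with it.
rewire_sketch <- function(A, p) {
  edges <- which(A != 0, arr.ind = TRUE)       # snapshot of original edges
  for (k in seq_len(nrow(edges))) {
    if (runif(1) >= p) next                    # leave this edge in place
    i <- edges[k, 1]; j <- edges[k, 2]
    if (runif(1) < 0.5) {                      # keep tail i, draw a new head
      free <- which(A[i, ] == 0)
      if (length(free) == 0) next
      jn <- free[sample.int(length(free), 1)]
      A[i, jn] <- A[i, j]
    } else {                                   # keep head j, draw a new tail
      free <- which(A[, j] == 0)
      if (length(free) == 0) next
      tn <- free[sample.int(length(free), 1)]
      A[tn, j] <- A[i, j]
    }
    A[i, j] <- 0                               # vacate the old cell
  }
  A
}

set.seed(7)
g <- matrix(0, 10, 10)
g[1, ] <- 1
g2 <- rewire_sketch(g, 0.5)
sum(g2) == sum(g)   # TRUE: edges are moved, never created or destroyed
```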

Another example of an edge-preserving random transformation is the random permutation of vertex order. rmperm can be employed for this purpose, as in the following permutation of the graph g2 above:

R> g3 <- rmperm(g2)
R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))
[1] TRUE

Row/column permutation preserves the "unlabeled" structure of the input graph (i.e., it draws from the graph's isomorphism class), and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
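In matrix terms, relabeling vertices applies one permutation $\pi$ to rows and columns simultaneously, $A'_{ij} = A_{\pi(i)\pi(j)}$. The base-R sketch below, which is independent of rmperm's actual implementation, shows that this leaves the edge count, both sorted degree sequences, and the diagonal unchanged, and that the inverse permutation (order(perm)) recovers the original labeling.

```r
set.seed(2)
A <- matrix(rbinom(100, 1, 0.3), 10, 10)
diag(A) <- 0
perm <- sample(10)        # a uniform random permutation of 1:10
B <- A[perm, perm]        # the SAME permutation on rows and columns

c(sum(A) == sum(B),                              # edge count
  all(sort(rowSums(A)) == sort(rowSums(B))),     # outdegree sequence
  all(sort(colSums(A)) == sort(colSums(B))),     # indegree sequence
  all(diag(B) == 0))                             # no loops created
# TRUE TRUE TRUE TRUE
```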

2.2 Visualization and data manipulation

Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic: for instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward, i.e.:

R> g <- rgraph(5)
R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)
[1] 0.07291667

Unfortunately, the drawback to the flexibility of this routine is its inefficiency: eval.edgeperturbation cannot take advantage of any special properties of the change score being calculated, and hence is inefficient for properties such as triad counts, whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple exploratory applications, and does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
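The quantity described here can be written as $f(G^+_{ij}) - f(G^-_{ij})$, where $G^+_{ij}$ and $G^-_{ij}$ are $G$ with the $(i, j)$ edge forced present and forced absent, respectively. The base-R sketch below implements that definition for a single edge with an arbitrary statistic f; the helper perturb_sketch and the example statistic n_mutual are hypothetical names. Like the generic approach the text describes, it simply evaluates the statistic twice, which is what makes it general but slow.

```r
# Hypothetical helper: change in statistic f when edge (i, j) is forced
# present versus forced absent, all other edges held fixed.
perturb_sketch <- function(A, i, j, f, ...) {
  Aplus  <- A; Aplus[i, j]  <- 1      # force the (i, j) edge present
  Aminus <- A; Aminus[i, j] <- 0      # force it absent
  f(Aplus, ...) - f(Aminus, ...)
}

# Example statistic: number of mutual dyads.
n_mutual <- function(A) sum(A[upper.tri(A)] & t(A)[upper.tri(A)])

A <- matrix(0, 4, 4)
A[2, 1] <- 1
perturb_sketch(A, 1, 2, n_mutual)   # 1: adding (1, 2) completes a mutual dyad
perturb_sketch(A, 1, 3, n_mutual)   # 0: (3, 1) is absent, so no mutual dyad forms
```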

Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the "identities" of its vertices, while leaving the underlying "unlabeled" structure unchanged.) Although not graph manipulation functions per se, these routines are of importance for generating restricted permutations for use in QAP tests (Hubert 1987) and comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
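Both ideas can be sketched in base R. The first helper below draws a uniform permutation that only exchanges vertices within the same class (rperm_sketch is a hypothetical name, not sna's code). The second returns the $k$th permutation of $1, \ldots, n$ in lexical order via the factorial number system (nth_perm is also hypothetical). Whether numperm counts from 0 or from 1 is not stated here, so this sketch assumes the first permutation is number 1.

```r
# Hypothetical helper: uniform random permutation that maps each vertex
# only to a vertex of its own (user-supplied) class.
rperm_sketch <- function(classes) {
  p <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    p[idx] <- idx[sample.int(length(idx))]   # shuffle within the class
  }
  p
}

# Hypothetical helper: k-th permutation (k = 1, ..., n!) of 1:n in lexical
# order, using the factorial number system. Practical only for small n.
nth_perm <- function(n, k) {
  k <- k - 1                                 # 0-based rank
  pool <- seq_len(n)
  out <- integer(0)
  for (m in n:1) {
    f <- factorial(m - 1)
    pos <- k %/% f                           # index of the next element
    out <- c(out, pool[pos + 1])
    pool <- pool[-(pos + 1)]
    k <- k %% f
  }
  out
}

nth_perm(3, 1)   # 1 2 3
nth_perm(3, 4)   # 2 3 1
nth_perm(3, 6)   # 3 2 1
```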

In addition to the above, two families of graph manipulation functions bear discussing in more detail: functions to compute properties of neighborhoods, and functions for graph visualization. Here we briefly discuss each family in turn, before proceeding to a review of sna's descriptive index routines.

    Neighborhood and ego net functions

The egocentric network (or "ego net") of vertex $v$ in graph $G$ is defined as $G[\{v\} \cup N(v)]$ (i.e., the subgraph of $G$ induced by $v$ and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.
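A bare-bones version of ego-net extraction for a single vertex is just an induced-subgraph operation, as the base-R sketch below shows. The helper ego_sketch is hypothetical (it is not ego.extract itself), and its type argument stands in for the choice of incoming, outgoing, or combined neighborhoods. It places ego in the first row and column, the convention that ego.extract follows.

```r
# Hypothetical helper: induced subgraph on ego v and its neighbors,
# with ego placed first. A[i, j] > 0 means an edge from i to j.
ego_sketch <- function(A, v, type = c("combined", "out", "in")) {
  type <- match.arg(type)
  nb <- switch(type,
    out      = which(A[v, ] > 0),
    "in"     = which(A[, v] > 0),
    combined = which(A[v, ] > 0 | A[, v] > 0))
  keep <- c(v, setdiff(nb, v))       # ego first, then alters (loops ignored)
  A[keep, keep, drop = FALSE]
}

A <- matrix(0, 4, 4)
A[1, 2] <- 1; A[3, 1] <- 1; A[2, 3] <- 1
E <- ego_sketch(A, 1)                # ego 1 with alters 2 (out) and 3 (in)
E[-1, -1]                            # ties among alters only: one edge, 2 -> 3
```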

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.
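The core of a gapply-style computation (find each vertex's neighbors, then apply a function to their covariate values) can be sketched in base R for first-order neighborhoods. The helper nbr_apply is hypothetical; its margin argument mirrors the convention used in the gapply examples later in this section (1 for outgoing ties, 2 for incoming ties, c(1, 2) for either), but it is not the gapply interface and does not handle higher-order neighborhoods.

```r
# Hypothetical helper (not gapply itself): for each vertex v, apply FUN to
# the covariate values x of v's first-order neighbors.
nbr_apply <- function(A, margin, x, FUN) {
  sapply(seq_len(nrow(A)), function(v) {
    nb <- ((1 %in% margin) & A[v, ] > 0) |   # outgoing ties
          ((2 %in% margin) & A[, v] > 0)     # incoming ties
    nb[v] <- FALSE                           # ego is not its own neighbor
    FUN(x[nb])
  })
}

set.seed(3)
A <- matrix(rbinom(36, 1, 0.4), 6, 6)
diag(A) <- 0
nbr_apply(A, 1, rep(1, 6), sum)     # equals the outdegrees, rowSums(A)
nbr_apply(A, c(1, 2), 1:6, mean)    # mean label of each vertex's neighbors
```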

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose $(i, j)$ cell is an indicator for the membership of vertex $j$ in vertex $i$'s selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined to be the identity matrix; for orders $k > 0$, it depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation $R$ be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i, j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
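These definitions translate directly into code once geodesic distances are available. The base-R sketch below computes all-pairs geodesic distances on the base relation $R$ by breadth-first search, then reads off partial (distance exactly $k$) and cumulative (distance at most $k$) neighborhood matrices. The function names geodist_sketch and neigh_sketch are hypothetical, and this is not the neighborhood source. Following the definition literally, the cumulative matrix here includes the diagonal (distance 0); set it to zero if ego should be excluded.

```r
# Hypothetical helper: all-pairs geodesic distances by breadth-first search
# on a 0/1 relation R; unreachable pairs get Inf.
geodist_sketch <- function(R) {
  n <- nrow(R)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    while (length(frontier) > 0) {
      # unvisited vertices reachable in one step from the current frontier
      nxt <- which(colSums(R[frontier, , drop = FALSE]) > 0 & is.infinite(D[s, ]))
      D[s, nxt] <- D[s, frontier[1]] + 1
      frontier <- nxt
    }
  }
  D
}

# Hypothetical helper: partial or cumulative neighborhood matrix of order k.
neigh_sketch <- function(G, k, type = c("out", "in", "total"), partial = TRUE) {
  type <- match.arg(type)
  R <- switch(type,
    out   = G,                          # base relation: G itself
    "in"  = t(G),                       # transpose for incoming neighborhoods
    total = ((G + t(G)) > 0) * 1)       # underlying graph G U G^T
  D <- geodist_sketch(R)
  if (partial) (D == k) * 1 else (D <= k) * 1
}

G <- matrix(0, 4, 4)
G[cbind(1:3, 2:4)] <- 1                 # directed path 1 -> 2 -> 3 -> 4
neigh_sketch(G, 2)                      # partial, order 2: cells (1, 3) and (2, 4)
neigh_sketch(G, 2, partial = FALSE)     # cumulative, order 2
```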

To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below:

R> g <- rgraph(10, tp = 1.5 / 9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]
$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(g.comb, NROW) <= degree(g) + 1)
[1] TRUE
R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+   sapply(g.comb[ego.size > 2], function(x) gden(x[-1,-1]))
         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well:

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+   cmode = "freeman") / 2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2 / 9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i,,], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+   partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i,,], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

    Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams", in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order (panels: Partial Neighborhood of Order 1 through 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order partial neighborhood of $v$.

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to those of gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order (panels: Cumulative Neighborhood of Order 1 through 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurve = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1.5, vertex.sides = 3:8,
+   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options (panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options).

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+   xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+   col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+   xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+   offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+   arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example).

2.3 Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.
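The NLI/GLI distinction can be illustrated with the degree measures and density, as in the minimal base-R sketch below. The helper names are hypothetical; in the package itself, degree and gden provide these quantities. For an unvalued, loopless adjacency matrix, outdegree and indegree are row and column counts, total ("Freeman") degree is their sum, and each of these returns one value per vertex. Density, by contrast, maps the whole graph to a single number.

```r
# Node-level indices: one value per vertex (assumes no loops).
outdeg  <- function(A) rowSums(A > 0)          # |N+(v)|
indeg   <- function(A) colSums(A > 0)          # |N-(v)|
freedeg <- function(A) outdeg(A) + indeg(A)    # total ("Freeman") degree

# Graph-level index: a single value for the whole graph.
density_gli <- function(A) {
  n <- nrow(A)
  sum(A > 0) / (n * (n - 1))
}

A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- A[1, 3] <- A[4, 3] <- 1
outdeg(A)        # 2 1 0 1
indeg(A)         # 1 1 2 0
freedeg(A)       # 3 2 2 1
density_gli(A)   # 4 / 12
```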

    Node-level indices

    Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph-theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_d^+(v,G) \equiv |N^+(v)|$), indegree ($c_d^-(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_d^t(v,G) \equiv c_d^+(v,G) + c_d^-(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

    $$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

    where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and the ratio $g'(v',v,v'',G)/g(v',v'',G)$ is taken to be 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

    $$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

    where $n = |V|$ and $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected unless distances between disconnected vertices are taken to be infinite. In this case $c_c(v,G) = 0$ for any $v$ that cannot reach at least one other vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
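To make the geodesic-counting logic behind these definitions concrete, the following base-R sketch computes outdegree, indegree, closeness, betweenness, and stress directly from an adjacency matrix. It is not the sna implementation: the helper names `bfs_counts` and `freeman_indices` are hypothetical, the graph is assumed to be a loopless digraph with `A[i, j] = 1` for an edge from $i$ to $j$, and betweenness is summed over ordered pairs (the directed case).

```r
# Hypothetical helpers illustrating Freeman-style indices in base R.
# A: 0/1 adjacency matrix of a loopless digraph, A[i, j] = 1 for i -> j.

# Breadth-first search from s, returning geodesic distances (Inf if
# unreachable) and the number of geodesics from s to every vertex.
bfs_counts <- function(A, s) {
  n <- nrow(A)
  d <- rep(Inf, n)
  sigma <- rep(0, n)
  d[s] <- 0
  sigma[s] <- 1
  queue <- s
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (w in which(A[v, ] > 0)) {
      if (is.infinite(d[w])) {      # first visit: w is one step beyond v
        d[w] <- d[v] + 1
        queue <- c(queue, w)
      }
      if (d[w] == d[v] + 1) {       # v precedes w on some geodesic
        sigma[w] <- sigma[w] + sigma[v]
      }
    }
  }
  list(d = d, sigma = sigma)
}

freeman_indices <- function(A) {
  n <- nrow(A)
  D <- S <- matrix(0, n, n)          # distances and geodesic counts
  for (s in seq_len(n)) {
    r <- bfs_counts(A, s)
    D[s, ] <- r$d
    S[s, ] <- r$sigma
  }
  between <- stress <- numeric(n)
  for (v in seq_len(n)) {
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (i == v || k == v || i == k || S[i, k] == 0) next
        if (D[i, v] + D[v, k] == D[i, k]) {
          through <- S[i, v] * S[v, k]   # g'(i, v, k, G)
          between[v] <- between[v] + through / S[i, k]
          stress[v] <- stress[v] + through
        }
      }
    }
  }
  list(outdegree = rowSums(A),
       indegree = colSums(A),
       closeness = (n - 1) / rowSums(D),  # 0 if some vertex is unreachable
       betweenness = between,
       stress = stress)
}

# Usage: two geodesics from 1 to 4, one through 2 and one through 3
A <- matrix(0, 4, 4)
A[1, 2] <- A[1, 3] <- A[2, 4] <- A[3, 4] <- 1
f <- freeman_indices(A)
f$betweenness   # 0.0 0.5 0.5 0.0
f$stress        # 0 1 1 0
```

For an undirected graph stored as a symmetric matrix, each unordered pair is visited twice by this loop, so halving the betweenness and stress scores gives the undirected convention.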

    Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $\mathbf{A}$ (where $\mathbf{A}$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (\mathbf{I} - \beta \mathbf{A})^{-1} \mathbf{A} \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $\mathbf{A}$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector whose squared Euclidean length equals $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
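To see how the two spectral measures relate, the sketch below computes eigenvector centrality and the Bonacich power score directly from the formula $\alpha (\mathbf{I} - \beta \mathbf{A})^{-1} \mathbf{A} \mathbf{1}$. It assumes an undirected (symmetric) adjacency matrix, so that `eigen()` returns real values, and chooses $\alpha$ so that the squared scores sum to $|V|$; that scaling is our reading of the convention described above. `ev_cent` and `bon_pow` are hypothetical names, not sna functions.

```r
# Hypothetical helpers; assumes a symmetric 0/1 adjacency matrix A.
ev_cent <- function(A) {
  e <- eigen(A, symmetric = TRUE)   # eigenvalues sorted in decreasing order
  abs(e$vectors[, 1])               # principal eigenvector (unit length)
}

bon_pow <- function(A, beta) {
  n <- nrow(A)
  # Solve (I - beta A) x = A 1, i.e. x = (I - beta A)^{-1} A 1
  raw <- drop(solve(diag(n) - beta * A, A %*% rep(1, n)))
  # Choose alpha so the squared scores sum to n; signs are preserved
  raw * sqrt(n / sum(raw^2))
}

A <- matrix(0, 5, 5)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5))
A[edges] <- 1
A[edges[, 2:1]] <- 1                # symmetrize: undirected graph

lambda1 <- eigen(A, symmetric = TRUE)$values[1]
bon_pow(A, 0) / rowSums(A)          # constant ratio: proportional to degree
b <- bon_pow(A, 0.9999 / lambda1)
b / sqrt(sum(b^2)) - ev_cent(A)     # near 0: approaches eigenvector centrality
bon_pow(A, -0.3)                    # beta < 0: ties to weak alters help
```

Solving the linear system with `solve(a, b)` avoids forming the matrix inverse explicitly; $\beta$ must stay away from the reciprocal of any eigenvalue of $\mathbf{A}$ for the system to be solvable.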

    An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now let us posit a vector of states $s$ of length $|V|$, such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
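The role definitions above translate directly into a loop over ordered triads. The sketch below counts, for each vertex, the locally bridged pairs it brokers in each Gould-Fernandez role. It assumes a 0/1 adjacency matrix `A` (`A[i, j] = 1` for $v_i \to v_j$) and a state vector `s`; `brokerage_roles` is a hypothetical helper that computes raw counts only, without the moments or z-tests that sna's brokerage also returns.

```r
# Hypothetical helper: raw Gould-Fernandez brokerage counts by role.
brokerage_roles <- function(A, s) {
  n <- nrow(A)
  roles <- c("coordinator", "itinerant", "gatekeeper",
             "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {            # j is the brokering vertex
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (i == j || j == k || i == k) next
        # (v_i, v_j), (v_j, v_k) in E and (v_i, v_k) not in E
        if (A[i, j] == 1 && A[j, k] == 1 && A[i, k] == 0) {
          role <- if (s[i] == s[j] && s[j] == s[k]) {
            "coordinator"
          } else if (s[i] == s[k]) {
            "itinerant"              # s_i = s_k, s_j differs
          } else if (s[j] == s[k]) {
            "gatekeeper"             # s_j = s_k, s_i differs
          } else if (s[i] == s[j]) {
            "representative"         # s_i = s_j, s_k differs
          } else {
            "liaison"                # all three states differ
          }
          out[j, role] <- out[j, role] + 1
        }
      }
    }
  }
  cbind(out, total = rowSums(out))
}

# Usage: a two-path 1 -> 2 -> 3 with no 1 -> 3 edge
A <- matrix(0, 3, 3)
A[1, 2] <- 1
A[2, 3] <- 1
brokerage_roles(A, s = c(1, 1, 2))[2, ]   # one "representative" brokerage
```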

    To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network.

    R> dat <- rgraph(10)
    R> degree(dat, cmode = "indegree")
     [1] 4 4 8 2 4 5 4 4 3 6
    R> degree(dat, cmode = "outdegree")
     [1] 6 3 5 2 5 4 4 4 5 6
    R> degree(dat)
     [1] 10  7 13  4  9  9  8  8  8 12
    R> closeness(dat)
     [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
     [8] 0.6428571 0.6923077 0.7500000
    R> betweenness(dat)
     [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
     [7]  2.4500000  2.0333333  2.4166667  8.1833333
    R> stresscent(dat)
     [1] 21  6 27  1 14 15  6  7  7 21
    R> graphcent(dat)
     [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
     [8] 0.5000000 0.5000000 0.5000000
    R> evcent(dat)
     [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
     [8] 0.2734192 0.3642163 0.4121985
    R> infocent(dat)
     [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
     [9] 3.077481 3.704181

    As the above illustrates, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector centrality and degree, reflecting a very different set of assumptions regarding the underlying social process.

    R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
     [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
     [8] 0.2192645 0.2192645 0.2192645
    R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
    +   evcent(dat, rescale = TRUE)) < 1e-10)
    [1] TRUE
    R> bonpow(dat, exponent = -0.5)
     [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
     [7]  0.4613310  0.9226621  0.3075540  2.1528782

    As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

    R> memb <- sample(1:3, 10, replace = TRUE)
    R> summary(brokerage(dat, memb))

    Gould-Fernandez Brokerage Analysis

    Global Brokerage Properties
               t      E(t)     Sd(t)          z  Pr(>|z|)
    w_I   5.0000    5.8638    2.7314    -0.3162    0.7518
    w_O  25.0000   19.5459    7.0713     0.7713    0.4405
    b_IO 18.0000   19.5459    6.2244    -0.2484    0.8039
    b_OI 17.0000   19.5459    6.2244    -0.4090    0.6825
    b_O  28.0000   23.4551    5.3349     0.8519    0.3943
    t    93.0000   87.9565   13.6124     0.3705    0.7110

    Individual Properties (by Group)

    Group ID: 1
         w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
    [1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
    [2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
    [3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
    [4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
               b_O          t
    [1,] -1.186381  0.8682544
    [2,] -1.186381 -1.6099084
    [3,] -1.186381 -0.3708270
    [4,] -1.186381 -0.7838541

    Group ID: 2
         w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
    [1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
    [2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
                  t
    [1,] -0.7838541
    [2,]  1.4877951

    Group ID: 3
         w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
    [1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
    [2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
    [3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
    [4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
               b_O           t
    [1,] 3.0624213  2.31384939
    [2,] 0.6345344  0.45522729
    [3,] 0.6345344  0.04220016
    [4,] 0.6345344 -0.57734055

    Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs (as for the coordinator scores of the two-member group 2 above, since coordination requires three members of the same group).

    Graph-level indices

    Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i,j,k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

    Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

    $$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v,G) \right) - c(v_i,G) \right] \qquad (1)$$

    i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

    $$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \qquad (2)$$

    where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless and varies between 0 (for a graph in which all vertices have the same centrality scores[2]) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)[3], although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions, which are used to generate the underlying score vector, may be passed to centralization; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna, and the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

    [2] For instance, when all vertices are automorphically equivalent.
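Equations (1) and (2) can be checked numerically. The sketch below applies both forms to outdegree on an out-star and normalizes by $(|V|-1)^2$, which we take to be the largest possible total outdegree deviation on a loopless digraph of order $|V|$ (attained by the out-star). `centralize` is a hypothetical helper, not sna's centralization.

```r
# Hypothetical helper: Freeman centralization of a score vector, Eq. (1),
# optionally divided by a theoretical maximum deviation.
centralize <- function(scores, max_dev = NULL) {
  raw <- sum(max(scores) - scores)
  if (is.null(max_dev)) raw else raw / max_dev
}

n <- 5
A <- matrix(0, n, n)
A[1, 2:n] <- 1                        # out-star centred on vertex 1
od <- rowSums(A)                      # outdegrees: 4 0 0 0 0

centralize(od)                        # Eq. (1): 16
n * (max(od) - mean(od))              # Eq. (2): also 16
centralize(od, max_dev = (n - 1)^2)   # normalized: 1 for the out-star

ring <- matrix(0, n, n)
ring[cbind(1:n, c(2:n, 1))] <- 1      # directed cycle: every outdegree is 1
centralize(rowSums(ring))             # 0: no deviation from the maximum
```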

    In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree; hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.
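The density, reciprocity, and weak transitivity definitions given earlier in this subsection can be spelled out for a single loopless digraph in a few lines of base R, together with the reciprocity-based hierarchy score (one minus dyadic reciprocity). The helper names below are hypothetical; sna's gden, grecip, and gtrans additionally handle missing data, loops, undirected graphs, and stacks of graphs. Returning 1 when no two-path is at risk is a convention we assume here (it appears consistent with the gtrans output for the sparsest graph in the example below, whose weak census is 0).

```r
# Hypothetical helpers for three GLIs on a loopless digraph (0/1 matrix A).
g_density <- function(A) {
  n <- nrow(A)
  sum(A[row(A) != col(A)]) / (n * (n - 1))
}

g_recip <- function(A, measure = c("dyadic", "edgewise")) {
  measure <- match.arg(measure)
  up <- upper.tri(A)
  a <- A[up]                          # i -> j for each dyad with i < j
  b <- t(A)[up]                       # j -> i for the same dyads
  if (measure == "dyadic") {
    mean(a == b)                      # share of mutual or null dyads
  } else {
    2 * sum(a & b) / (sum(a) + sum(b))  # share of edges that are reciprocated
  }
}

g_trans_weak <- function(A) {
  diag(A) <- 0
  A2 <- A %*% A                       # A2[i, k] = number of two-paths i -> j -> k
  diag(A2) <- 0                       # ignore two-paths that return to i
  at_risk <- sum(A2)
  if (at_risk == 0) return(1)         # vacuous case (assumed convention)
  sum(A2 * A) / at_risk               # share of two-paths closed by i -> k
}

A <- matrix(0, 3, 3)
A[1, 2] <- A[2, 1] <- A[2, 3] <- A[1, 3] <- 1
g_density(A)                # 4 of 6 possible edges
g_recip(A)                  # 1 of 3 dyads is symmetric
g_recip(A, "edgewise")      # 2 of 4 edges are reciprocated
1 - g_recip(A)              # reciprocity-based hierarchy
g_trans_weak(A)             # both two-paths 1->2->3 and 2->1->3 are closed
```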

    The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

    R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
    R> gden(g)
    [1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
    R> grecip(g)
    [1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
    R> grecip(g, measure = "edgewise")
    [1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
    R> grecip(g) == 1 - hierarchy(g)
    [1] TRUE TRUE TRUE TRUE TRUE
    R> gtrans(g)
    [1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
    R> gtrans(g, measure = "weakcensus")
    [1]   0  21 106 254 582
    R> connectedness(g)
    [1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
    R> efficiency(g)
    [1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
    R> hierarchy(g, measure = "krackhardt")
    [1] 1.0 0.2 0.0 0.0 0.0
    R> lubness(g)
    [1] 0.2 1.0 1.0 1.0 1.0

    [3] $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.

    centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

    R> centralization(g, degree, cmode = "outdegree")
    [1] 0.1728395
    R> centralization(g, betweenness)
    [1] 0
    R> apply(g, 1, centralization, degree, cmode = "outdegree")
    [1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
    R> apply(g, 1, centralization, betweenness)
    [1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

    As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

    R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
    +   n <- NROW(dat)
    +   if (tmaxdev)   # theoretical maximum deviation, attained by the out-star
    +     return((n - 1) * choose(n - 1, 2))
    +   odeg <- degree(dat, cmode = "outdegree")
    +   choose(odeg, 2)   # number of outgoing two-stars at each vertex
    + }
    R> apply(g, 1, centralization, o2scent)
    [1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

    Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

    2.4. Connectivity and subgraph statistics

    Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i,j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

    $$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\left(d(j,k) \le i\right)$$

    and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
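The structure statistics follow directly from a matrix of geodesic distances. The sketch below obtains the distances by level-wise breadth-first search (unreachable pairs get `Inf`, so they never satisfy $d(j,k) \le i$) and then applies the formula for $s_i$. Both helpers, `geo_dist` and `structure_stats`, are hypothetical stand-ins for geodist and structure.statistics, not their implementations.

```r
# Hypothetical helper: all-pairs geodesic distances by level-wise BFS.
geo_dist <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  for (src in seq_len(n)) {
    D[src, src] <- 0
    frontier <- src
    while (length(frontier) > 0) {
      # vertices adjacent from any vertex in the current frontier
      nbrs <- which(colSums(A[frontier, , drop = FALSE]) > 0)
      fresh <- nbrs[is.infinite(D[src, nbrs])]
      if (length(fresh) == 0) break
      D[src, fresh] <- D[src, frontier[1]] + 1
      frontier <- fresh
    }
  }
  D
}

# s_i = N^-2 * #{(j, k) : d(j, k) <= i}, for i = 0, ..., N - 1
structure_stats <- function(A) {
  D <- geo_dist(A)
  N <- nrow(A)
  s <- sapply(0:(N - 1), function(i) sum(D <= i) / N^2)
  setNames(s, 0:(N - 1))
}

A <- matrix(0, 3, 3)
A[1, 2] <- 1
A[2, 3] <- 1                 # directed path 1 -> 2 -> 3
structure_stats(A)           # 3/9, 5/9, 6/9
```

Because the diagonal distances are 0, $s_0$ is always $1/N$, which matches the leading 0.10 in the order-10 example that follows.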

    At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

    Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbb{E}\,s(H) = s(G), \mathbb{E}\,s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
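A simple Monte Carlo version of the CUG test described above can condition on order and edge count by drawing digraphs uniformly from those with the same number of edges, then comparing a statistic such as the number of mutual dyads. The sketch below is illustrative only: `dyad_census`, `rand_digraph_m`, and `cug_test` are hypothetical helpers, and we make no claim that they reproduce cugtest's internals. The example graph consists entirely of reciprocated ties, so the upper-tail p-value should be near zero.

```r
# Hypothetical helpers: dyad census and a one-statistic CUG test that
# conditions on order and edge count (uniform draws given |E|).
dyad_census <- function(A) {
  up <- upper.tri(A)
  a <- A[up] == 1                     # i -> j for each dyad with i < j
  b <- t(A)[up] == 1                  # j -> i for the same dyads
  c(Mut = sum(a & b), Asym = sum(xor(a, b)), Null = sum(!a & !b))
}

rand_digraph_m <- function(n, m) {
  A <- matrix(0, n, n)
  off <- which(row(A) != col(A))      # loopless: only off-diagonal cells
  A[sample(off, m)] <- 1
  A
}

cug_test <- function(A, stat, reps = 1000) {
  n <- nrow(A)
  m <- sum(A[row(A) != col(A)])
  obs <- stat(A)
  sims <- replicate(reps, stat(rand_digraph_m(n, m)))
  c(obs = obs,
    p_upper = mean(sims >= obs),      # estimate of Pr(t(H) >= t(G))
    p_lower = mean(sims <= obs))      # estimate of Pr(t(H) <= t(G))
}

set.seed(42)
n <- 10
A <- matrix(0, n, n)
for (i in 1:n) {                      # undirected ring stored as mutual ties
  j <- i %% n + 1
  A[i, j] <- 1
  A[j, i] <- 1
}
dyad_census(A)                        # 10 mutual, 0 asymmetric, 35 null
cug_test(A, function(G) dyad_census(G)[["Mut"]])
```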

    Example

    To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

    R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
    R> apply(dyad.census(g1), 2, mean)
      Mut  Asym  Null
     1.00 12.84 31.16
    R> apply(triad.census(g1), 2, mean)
      003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
    40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
     120C   210   300
     0.30  0.00  0.00

    R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
    R> apply(dyad.census(g2), 2, mean)
      Mut  Asym  Null
     8.84  9.26 26.90
    R> apply(triad.census(g2), 2, mean)
      003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
    25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
     120C   210   300
     1.34  2.28  0.60

    R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
    R> apply(dyad.census(g3), 2, mean)
      Mut  Asym  Null
     8.94 20.44 15.62
    R> apply(triad.census(g3), 2, mean)
      003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
     4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
     120C   210   300
     8.40  7.38  1.50

    R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
    +   dyadic.tabulation = "bylength")$path.count
       Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
    1   35   8   3    9   2   10    9   3   10   8   8
    2  119  40  10   47   8   59   47  13   56  39  38
    3  346 155  41  180  35  223  185  52  211 149 153
    4  791 457 130  504 114  601  527 163  572 425 462
    5 1351 964 303 1000 282 1143 1061 375 1104 884 990
    R> kcycle.census(g3[1,,], maxlen = 5,
    +   cycle.comembership = "bylength")$cycle.count
      Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
    2   9  2  1  2  0  3  2  0  4  3   1
    3  24  7  1 11  0 15  9  2 12  8   7
    4  42 16  1 23  2 32 26  3 30 19  16
    5  72 39  5 48  8 60 54 10 57 36  43

    R> component.dist(g3[1,,])
    $membership
     [1] 1 1 1 1 1 1 1 1 1 1

    $csize
    [1] 10

    $cdist
     [1] 0 0 0 0 0 0 0 0 0 1

    R> structure.statistics(g3[1,,])
       0    1    2    3    4    5    6    7    8    9
    0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

    In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

    R> g4 <- g1[1:2,,]
    R> g4[2,,] <- g2[1,,]
    R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
    +   g1 = 1, g2 = 2)
    R> summary(cug)

    CUG Test Results

    Estimated p-values:
            p(f(rnd) >= f(d)): 0.299
            p(f(rnd) <= f(d)): 0.708

    Test Diagnostics:
            Test Value (f(d)): 0.04444444
            Replications: 1000
            Distribution Summary:
                    Min:     -0.3333333
                    1stQ:    -0.06666667
                    Med:     0
                    Mean:    -0.001288889
                    3rdQ:    0.06666667
                    Max:     0.3555556

    R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
    R> summary(cug)

    CUG Test Results

    Estimated p-values:
            p(f(rnd) >= f(d)): 0.967
            p(f(rnd) <= f(d)): 0.039

    Test Diagnostics:
            Test Value (f(d)): 0.04444444
            Replications: 1000
            Distribution Summary:
                    Min:     -0.06666667
                    1stQ:    0.1555556
                    Med:     0.2222222
                    Mean:    0.2215333
                    3rdQ:    0.2888889
                    Max:     0.5333333

    A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

    2.5. Position and role analysis

    The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is, without question, structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph-theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
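The definition of structural equivalence can be checked directly by comparing two vertices' rows and columns of the adjacency matrix while ignoring the entries that involve the pair itself; grouping equivalent vertices then yields the positions. The sketch below is a hypothetical illustration (`is_struct_equiv` and `se_positions` are not sna functions). It relies on structural equivalence being an equivalence relation, so each vertex only needs to be compared with the first member of a class.

```r
# Hypothetical helpers: exact structural equivalence on a 0/1 matrix A.
is_struct_equiv <- function(A, i, j) {
  others <- setdiff(seq_len(nrow(A)), c(i, j))
  all(A[i, others] == A[j, others]) &&   # same out-neighbours (excluding i, j)
    all(A[others, i] == A[others, j])    # same in-neighbours (excluding i, j)
}

se_positions <- function(A) {
  n <- nrow(A)
  pos <- rep(NA_integer_, n)
  k <- 0L
  for (i in seq_len(n)) {
    if (!is.na(pos[i])) next
    k <- k + 1L                          # i opens a new position
    pos[i] <- k
    for (j in seq_len(n)) {
      if (is.na(pos[j]) && is_struct_equiv(A, i, j)) pos[j] <- k
    }
  }
  pos
}

# Out-star: the hub forms one position, the four leaves another
star <- matrix(0, 5, 5)
star[1, 2:5] <- 1
se_positions(star)                       # 1 2 2 2 2

# Undirected 4-cycle 1-2-3-4-1: opposite vertices share their alters
C4 <- matrix(0, 4, 4)
C4[cbind(1:4, c(2, 3, 4, 1))] <- 1
C4 <- C4 + t(C4)
se_positions(C4)                         # 1 2 1 2
```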

    In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.

    This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
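When equivalence is only approximate, the same row-and-column comparison can be turned into a distance. The sketch below counts mismatches (a Hamming distance, one of the options listed above) and passes the result to R's built-in hclust and cutree, mirroring the sedist and equiv.clust workflow only in spirit. `se_hamming` is a hypothetical helper, and the choice of complete linkage is ours.

```r
# Hypothetical helper: Hamming distance between the row and column
# profiles of each vertex pair, ignoring entries that involve the pair.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      others <- setdiff(seq_len(n), c(i, j))
      D[i, j] <- sum(A[i, others] != A[j, others]) +   # out-ties
        sum(A[others, i] != A[others, j])              # in-ties
    }
  }
  D
}

A <- matrix(0, 6, 6)
A[1:3, 4:6] <- 1          # vertices 1-3 send ties to vertices 4-6
A[3, 6] <- 0              # one missing tie breaks exact equivalence
D <- se_hamming(A)
D[1, 3]                   # 1: approximately equivalent
D[1, 4]                   # 4: clearly different positions
hc <- hclust(as.dist(D), method = "complete")
cutree(hc, k = 2)         # two positions: {1, 2, 3} and {4, 5, 6}
```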

    After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.
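Given a membership vector, a density blockmodel is a table of mean cell values per pair of classes, with the diagonal neglected inside each class. The following hypothetical base-R sketch (not sna's blockmodel, which also accepts clusterings and supports other block types) makes that reduction explicit.

```r
# Hypothetical helper: density blockmodel from a 0/1 matrix and a
# membership vector with classes numbered 1, ..., k.
density_blockmodel <- function(A, memb) {
  k <- max(memb)
  B <- matrix(NA_real_, k, k)
  for (r in seq_len(k)) {
    for (s in seq_len(k)) {
      block <- A[memb == r, memb == s, drop = FALSE]
      if (r == s) {
        block <- block[row(block) != col(block)]   # drop self-ties
      }
      if (length(block) > 0) B[r, s] <- mean(block)  # NA if no cells remain
    }
  }
  B
}

A <- matrix(0, 6, 6)
A[1:3, 4:6] <- 1
A[3, 6] <- 0
memb <- c(1, 1, 1, 2, 2, 2)
density_blockmodel(A, memb)   # block (1, 2) has density 8/9; the rest are 0
```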

    The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
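Expanding a density image into a random graph amounts to independent Bernoulli draws with class-pair probabilities. Below is a minimal sketch that assumes a digraph with no loops. expand_density_image and its class_sizes argument are hypothetical names. class_sizes plays roughly the role of the per-class sizes passed to blockmodel.expand in this section's example, so the new graph's order can differ from the original's.

```python
import random


def expand_density_image(image, class_sizes, seed=None):
    """Draw a digraph from the inhomogeneous Bernoulli model implied by a
    density block image: i -> j is present with probability
    image[class(i)][class(j)], independently; no loops are drawn."""
    rng = random.Random(seed)
    membership = [a for a, size in enumerate(class_sizes) for _ in range(size)]
    n = len(membership)
    graph = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < image[membership[i]][membership[j]]:
                graph[i][j] = 1
    return graph


# Class 0 (2 vertices) always sends to class 1 (3 vertices); nothing else.
g = expand_density_image([[0.0, 1.0], [0.0, 0.0]], class_sizes=[2, 3], seed=1)
```

Densities of exactly 0 or 1 make the draw deterministic, which makes the example easy to check. Intermediate densities give a fresh random graph on each call.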

    Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean product of the graphs' respective adjacency matrices.) Note that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
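The definition of composition translates directly into a Boolean matrix product. The sketch below uses plain Python with a hypothetical compose function rather than sna's operator. It shows both a two-step path and the loops that appear when a mutual dyad is composed with itself.

```python
def compose(g, h):
    """Composition of two digraphs on the same vertex set.

    (v, w) is an edge of the result iff some vertex u has (v, u) in g and
    (u, w) in h; this is the Boolean product of the adjacency matrices.
    """
    n = len(g)
    return [[int(any(g[v][u] and h[u][w] for u in range(n)))
             for w in range(n)] for v in range(n)]


path = [[0, 1, 0],
        [0, 0, 1],
        [0, 0, 0]]     # 0 -> 1 -> 2
mutual = [[0, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]   # 0 <-> 1, no loops

two_step = compose(path, path)     # only 0 -> 2 survives
loops = compose(mutual, mutual)    # loops at 0 and 1 appear
```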

    Example

    To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities that are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original. This is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

    R> gp <- sapply(runif(20, 0, 1), rep, 20)
    R> g <- rgraph(20, tprob = gp)
    R> eq <- equiv.clust(g)
    R> b <- blockmodel(g, eq, h = 15)
    R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
    R> ge
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
     [1,]    0    0    1    1    0    0    1    0    0     1     1     1
     [2,]    0    0    1    1    0    0    1    1    0     1     1     1
     [3,]    0    0    0    0    1    1    1    1    0     0     0     0
     [4,]    0    0    1    0    1    1    1    1    0     0     0     0
     [5,]    0    0    0    0    0    0    0    0    1     1     0     0
     [6,]    0    1    1    0    0    0    1    0    1     1     0     0
     [7,]    0    0    1    1    0    1    0    1    1     1     0     1
     [8,]    0    0    1    1    0    0    1    0    0     1     0     1
     [9,]    0    0    0    1    1    1    0    1    0     0     0     0
    [10,]    0    0    1    1    0    1    1    1    1     0     1     1
    [11,]    0    0    0    0    0    0    1    1    0     0     0     1
    [12,]    0    1    1    1    0    0    0    1    0     0     1     0

    2.6. Exploratory edge set comparison

    One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert 1987; Krackhardt 1987a, 1988; Banks and Carley 1994; Butts and Carley 2005; Butts 2007, for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets. When entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

    $$\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V|-1)} \qquad (3)$$

    where $A^G$, $A^H$ are the respective adjacency matrices of $G$ and $H$, sums run over ordered pairs $(i,j)$ with $i \neq j$, and $\mu_X = \left(|V|\,(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
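Equation (3) and the Hamming distance can be computed directly from the off-diagonal cells. This plain-Python sketch follows equation (3) literally, dividing by |V|(|V|-1). It is not sna's gcov, gcor or hdist, and the function names are hypothetical. The correlation is undefined for empty or complete graphs, whose edge variance is zero.

```python
import math


def edge_values(adj):
    """Off-diagonal entries of an adjacency matrix, ordered pairs (i, j), i != j."""
    n = len(adj)
    return [adj[i][j] for i in range(n) for j in range(n) if i != j]


def gcov(g, h):
    """Graph covariance as in equation (3): divide by |V|(|V|-1)."""
    x, y = edge_values(g), edge_values(h)
    m = len(x)
    mu_x, mu_y = sum(x) / m, sum(y) / m
    return sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / m


def gcor(g, h):
    """Graph correlation; raises ZeroDivisionError for empty or complete graphs."""
    return gcov(g, h) / math.sqrt(gcov(g, g) * gcov(h, h))


def hamming(g, h):
    """Number of ordered pairs whose edge state differs between g and h."""
    return sum(a != b for a, b in zip(edge_values(g), edge_values(h)))


cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]                           # 0 -> 1 -> 2 -> 0
reverse = [list(col) for col in zip(*cycle)]  # transpose: 0 -> 2 -> 1 -> 0
```

A directed 3-cycle and its reversal share no edges, so their graph correlation is -1 and every one of the six ordered pairs differs.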

    The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings. This results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
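For very small graphs, the structural correlation can be found exactly by trying every relabeling of one graph. Below is a plain-Python sketch of that exhaustive search. It is not sna's gscor, and relabel and structural_correlation are hypothetical names. The |V|! permutations it enumerates are the reason heuristics such as annealing are needed for graphs of realistic size.

```python
import math
from itertools import permutations


def edge_values(adj):
    n = len(adj)
    return [adj[i][j] for i in range(n) for j in range(n) if i != j]


def gcor(g, h):
    """Graph correlation over ordered off-diagonal pairs, as in equation (3)."""
    x, y = edge_values(g), edge_values(h)
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    cov = lambda u, v, mu, mv: sum((a - mu) * (b - mv) for a, b in zip(u, v)) / m
    return cov(x, y, mx, my) / math.sqrt(cov(x, x, mx, mx) * cov(y, y, my, my))


def relabel(adj, perm):
    """Adjacency matrix in which vertex perm[i] of adj is renamed i."""
    n = len(adj)
    return [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def structural_correlation(g, h):
    """Maximum graph correlation between g and any relabeling of h, found by
    exhaustive search over all |V|! permutations (tiny graphs only)."""
    return max(gcor(g, relabel(h, p)) for p in permutations(range(len(h))))


cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
reverse = [list(col) for col in zip(*cycle)]  # isomorphic to cycle
```

The labeled correlation of the two cycles is -1, but because they are isomorphic their structural correlation is 1. This mirrors the g2/g3 contrast in the example that follows.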

    Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes the following:

    - centralgraph returns the graph minimizing the total Hamming distance to all graphs in the input set.
    - gclust.boxstats shows distributions of graph statistics based on a hierarchical clustering of networks.
    - gclust.centralgraph returns the central graphs for each element of a network clustering solution.
    - gdist.plotdiff plots distances between networks against differences in their properties.
    - gdist.plotstats displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure.

    Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by applying eigen to a graph covariance or correlation matrix. The ability to use standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
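For 0/1 graphs, the graph minimizing total Hamming distance can be built edge by edge. Each cell contributes independently to the total, and the sum of |c - x_k| over the inputs is smallest when c takes the majority value. The sketch below is plain Python, not sna's centralgraph. Breaking exact ties toward 1 is an arbitrary choice of this sketch, and central_graph and total_hamming are hypothetical names.

```python
def central_graph(graphs):
    """Edgewise majority of a list of 0/1 adjacency matrices.

    Minimizes the total Hamming distance to the inputs; cells where exactly
    half of the inputs have the edge are set to 1 (arbitrary tie rule).
    """
    m, n = len(graphs), len(graphs[0])
    return [[int(2 * sum(g[i][j] for g in graphs) >= m) for j in range(n)]
            for i in range(n)]


def total_hamming(c, graphs):
    """Sum of cellwise disagreements between c and every input graph."""
    n = len(c)
    return sum(c[i][j] != g[i][j] for g in graphs
               for i in range(n) for j in range(n))


graphs = [
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
    [[0, 1, 1], [0, 0, 0], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 1], [1, 0, 0]],
]
center = central_graph(graphs)
```

Edges reported in at least two of the three inputs are kept. The remaining disagreements (three cells, each held by a single input) are unavoidable for any candidate graph.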

    In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses. These test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available for cases where conditioning on the entire observed structure is inappropriate.
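The QAP logic can be sketched for the simplest bivariate statistic. The vertices of one graph are relabeled at random, with rows and columns permuted together so that the graph's structure is preserved, and the statistic is recomputed each time. This plain-Python sketch is not sna's qaptest. qap_test, its reps and seed arguments, and the small floating-point tolerance are assumptions of the sketch.

```python
import math
import random


def edge_values(adj):
    n = len(adj)
    return [adj[i][j] for i in range(n) for j in range(n) if i != j]


def gcor(g, h):
    x, y = edge_values(g), edge_values(h)
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def qap_test(g, h, reps=1000, seed=None):
    """Compare gcor(g, h) with its distribution under random relabeling of
    h's vertices. Returns (observed, estimated Pr(statistic >= observed))."""
    rng = random.Random(seed)
    n = len(h)
    observed = gcor(g, h)
    hits = 0
    for _ in range(reps):
        perm = list(range(n))
        rng.shuffle(perm)
        permuted = [[h[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if gcor(g, permuted) >= observed - 1e-12:  # tolerance for rounding
            hits += 1
    return observed, hits / reps


g = [[0, 1, 1, 0, 0],
     [0, 0, 1, 0, 1],
     [1, 0, 0, 1, 0],
     [0, 1, 0, 0, 1],
     [1, 0, 0, 0, 0]]
obs, p_upper = qap_test(g, g, reps=500, seed=42)
```

Comparing a graph with itself gives an observed correlation of 1. Only relabelings that are automorphisms can match it, so the upper-tail probability is small. (This particular graph has only the trivial automorphism.)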


    Example

    We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size. See the sna manual for additional options and details.

    R> g1 <- rgraph(5)
    R> g2 <- rgraph(5)
    R> g3 <- rmperm(g2)
    R> gcor(g1, g2)
    [1] -0.1336306
    R> gcor(g1, g3)
    [1] 0.08908708
    R> gcor(g2, g3)
    [1] -0.4583333
    R> gscor(g1, g2, reps = 1e5)
    [1] 0.5345225
    R> gscor(g1, g3, reps = 1e5)
    [1] 0.5345225
    R> gscor(g2, g3, reps = 1e5)
    [1] 1

    Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

    R> x <- rgraph(20, 4)
    R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
    R> nl <- netlm(y, x)
    R> summary(nl)

    OLS Network Model

    Residuals:
               0%           25%           50%           75%          100%
    -2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

    Coefficients:
                     Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
    (intercept) -1.467115e-14   0.000   1.000     0.000
    x1           1.000000e+00   1.000   0.000     0.000
    x2           4.000000e+00   1.000   0.000     0.000
    x3           2.000000e+00   1.000   0.000     0.000
    x4          -7.553990e-16   0.369   0.631     0.756

    Residual standard error: 1.169e-14 on 375 degrees of freedom
    Multiple R-squared: 1, Adjusted R-squared: 1
    F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

    Test Diagnostics:

         Null Hypothesis: qap
         Replications: 1000
         Coefficient Distribution Summary:

           (intercept)          x1          x2          x3          x4
    Min     -2.6048970  -2.9689678  -3.5940257  -2.9888472  -1.5687343
    1stQ    -0.6779707  -0.6739579  -0.6980733  -0.7469624  -0.9732831
    Median  -0.0841683  -0.0090468   0.0003289  -0.0116757  -0.4346029
    Mean    -0.0256936  -0.0249585  -0.0161372  -0.0055288  -0.0080178
    3rdQ     0.6930508   0.6393521   0.6352920   0.7064120   0.8601390
    Max      2.5434373   2.7231537   3.0464596   3.6938260   1.6294713

    As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

    R> x <- rgraph(20, 4)
    R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
    R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
    R> y <- rgraph(20, tprob = yp)
    R> nl <- netlogit(y, x)
    R> summary(nl)

    Network Logit Model

    Coefficients:
                  Estimate     Exp(b) Pr(<=b) Pr(>=b) Pr(>=|b|)
    (intercept)  0.3077180  1.3603173   0.680   0.320     0.503
    x1           0.9411361  2.5628914   0.985   0.015     0.019
    x2           4.1473292 63.2648084   1.000   0.000     0.000
    x3           1.8630911  6.4436238   1.000   0.000     0.000
    x4          -0.1757242  0.8388493   0.318   0.682     0.642

    Goodness of Fit Statistics:

    Null deviance: 526.7919 on 380 degrees of freedom
    Residual deviance: 174.1572 on 375 degrees of freedom
    Chi-Squared test of fit improvement:
         352.6347 on 5 degrees of freedom, p-value 0
    AIC: 184.1572     BIC: 203.8580
    Pseudo-R^2 Measures:
         (Dn-Dr)/(Dn-Dr+dfn): 0.481324
         (Dn-Dr)/Dn: 0.6694004
    Contingency Table (predicted (rows) x actual (cols)):

          0   1
      0   0   0
      1  39 341

         Total Fraction Correct: 0.8973684
         Fraction Predicted 1s Correct: 0.8973684
         Fraction Predicted 0s Correct: NaN
         False Negative Rate: 0
         False Positive Rate: 1

    Test Diagnostics:

         Null Hypothesis: qap
         Replications: 1000
         Distribution Summary:

           (intercept)        x1        x2        x3        x4
    Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
    1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
    Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
    Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
    3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
    Max       1.704128  1.408468  1.214650  1.100783  1.533500

    In this case, the model diagnostics indicate that the model is not very effective at predicting the absence of ties. This is largely a consequence of the high density of the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, the model's parameter estimates are quite close to the true values, and the QAP test correctly identifies the irrelevant predictors.

    2.7. Network inference and process models

    A final category of functions supplied by sna are those implementing various network inference and process models. The package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), but users are strongly encouraged to use the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as the locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter assumes that each data source has a base chance to "know" and correctly report the true value of an edge, and otherwise produces a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the bias parameters may be omitted if desired), and estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). Here the edge reporting process is parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. Note that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model whenever the sum of the false positive and false negative rates is less than 1. The two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992), in the simplified form of Gelman et al. (1995), can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates. The helper function npostpred makes it easy to obtain posterior predictive graph properties from a set of posterior draws.
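The potential scale reduction diagnostic compares between-chain and within-chain variability of a scalar quantity. This plain-Python sketch uses the common textbook form: var+ = (n - 1)/n · W + B/n and PSRF = sqrt(var+ / W). W is the mean within-chain variance and B/n is the variance of the chain means. It is offered as an illustration and has not been checked line by line against potscalered.mcmc. The name psrf is hypothetical.

```python
import statistics


def psrf(chains):
    """Potential scale reduction sqrt(R-hat) for one scalar quantity.

    chains: list of equal-length lists of draws (one list per chain; at
    least two chains of at least two draws each).
    """
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    w = statistics.fmean(statistics.variance(c) for c in chains)
    b_over_n = statistics.variance(means)  # B / n: variance of chain means
    var_plus = (n - 1) / n * w + b_over_n
    return (var_plus / w) ** 0.5


mixed = psrf([[0, 1, 0, 1], [1, 0, 1, 0]])                # chains agree
stuck = psrf([[0, 0.1, 0, 0.1], [5, 5.1, 5, 5.1]])        # chains disagree
```

Values near 1 (the bbnam output later in this section reports a median of about 0.999) suggest the chains have mixed. Values well above 1 indicate that the chains are still exploring different regions.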

    sna also supports the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

    $$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

    where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not, in general, known. As a result, estimation of the model parameters (bias event probabilities) is currently heuristic. bn implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
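The conditional nomination probability is a direct product over bias event types. This plain-Python sketch evaluates the formula for given inputs. It is not sna's bn. The function name and the two bias types in the usage line are hypothetical, labeled loosely as a reciprocity-type event and a closure-type event for illustration.

```python
def biased_net_tie_prob(p_base, bias_probs, bias_counts):
    """Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k)) ** s_k(i, j, T).

    p_base: baseline probability Pr(B_e);
    bias_probs[k]: probability Pr(B_k) of bias event type k;
    bias_counts[k]: s_k(i, j, T), how many type-k bias events could have
    fired for the directed dyad (i, j) in the current trace state T.
    """
    p_no_tie = 1.0 - p_base
    for p_k, s_k in zip(bias_probs, bias_counts):
        p_no_tie *= (1.0 - p_k) ** s_k  # each event independently fails
    return 1.0 - p_no_tie


# Hypothetical values: baseline 0.1; a reciprocity-type bias (0.3) that
# could fire once and a closure-type bias (0.2) that could fire twice.
p = biased_net_tie_prob(0.1, [0.3, 0.2], [1, 2])   # 1 - 0.9 * 0.7 * 0.64
```

Here p evaluates to 0.5968, up to floating-point rounding. With all counts at zero the formula reduces to the baseline probability.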

    While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990; see also Cliff and Ord 1973 and Anselin 1988 for the equivalent class of spatial autocorrelation models) constitute one important family of processes often used for this purpose. These models are of the form

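    $$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

    $$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$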

    where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the two effect types differ in that only the AR term carries the impact of neighbors' covariate scores. An AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) that computes sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation, in analogy to similar heuristics within the time series literature (see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
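Equations (4) and (5) define y implicitly. Solving (I - ψZ)ε = ν and then (I - θW)y = Xβ + ε yields a draw from the model. The lnam example in this section does this with R's qr.solve. The plain-Python sketch below swaps in Gaussian elimination with partial pivoting, restricted to a single AR matrix and a single MA matrix (w = z = 1). solve, identity_minus, simulate_lnam and the toy matrices are hypothetical names and values.

```python
import random


def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def identity_minus(scale, mat):
    """Return I - scale * mat."""
    n = len(mat)
    return [[(1.0 if i == j else 0.0) - scale * mat[i][j] for j in range(n)]
            for i in range(n)]


def simulate_lnam(w, z, theta, psi, x, beta, sigma, seed=None):
    """Draw y from equations (4)-(5) with one AR matrix w (coefficient theta)
    and one MA matrix z (coefficient psi)."""
    rng = random.Random(seed)
    n = len(w)
    nu = [rng.gauss(0.0, sigma) for _ in range(n)]
    eps = solve(identity_minus(psi, z), nu)          # (I - psi Z) eps = nu
    xb = [sum(xi * b for xi, b in zip(row, beta)) for row in x]
    y = solve(identity_minus(theta, w),              # (I - theta W) y = X beta + eps
              [u + e for u, e in zip(xb, eps)])
    return y, eps, nu


W = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
Z = [[0, 0, 1, 0], [0, 0, 1, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
X = [[1.0, 0.5], [1.0, -0.2], [1.0, 1.3], [1.0, 0.0]]
y, eps, nu = simulate_lnam(W, Z, theta=0.2, psi=0.3, x=X,
                           beta=[1.0, -1.0], sigma=0.1, seed=7)
```

With these small coefficients, both I - θW and I - ψZ are strictly diagonally dominant and hence invertible. The returned y satisfies equation (4) and ε satisfies equation (5) up to rounding.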


    Example

    To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we use an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary method for the returned object describes the resulting posterior properties, along with various diagnostics.

    R> g <- rgraph(20)
    R> ep <- rbeta(20, 1, 25)
    R> em <- rbeta(20, 15, 25)
    R> dat <- array(dim = c(20, 20, 20))
    R> for(i in 1:20)
    +    dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
    R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
    R> pem <- matrix(nrow = 20, ncol = 2)
    R> pem[,1] <- 2
    R> pem[,2] <- 11
    R> pep <- matrix(nrow = 20, ncol = 2)
    R> pep[,1] <- 2
    R> pep[,2] <- 11
    R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
    +    epprior = pep, burntime = 300, draws = 100)
    R> summary(b)

    Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

    Multiple Error Probability Model

    Marginal Posterior Network Distribution

    a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

    Journal of Statistical Software 41

    a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

    a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

    Marginal Posterior Global Error Distribution:

                 e^-       e^+
    Min    0.1443951 0.0004238
    1stQ   0.3126975 0.0167584
    Median 0.3678306 0.0294646
    Mean   0.3783663 0.0493688
    3rdQ   0.4423027 0.0574099
    Max    0.6909116 0.2262239

    Marginal Posterior Error Distribution (by observer):

    Probability of False Negatives (e^-):

           Min   1stQ Median   Mean   3rdQ    Max
    o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
    o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
    o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
    o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
    o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
    o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
    o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
    o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
    o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
    o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
    o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
    o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
    o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
    o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
    o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
    o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
    o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
    o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
    o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
    o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

    Probability of False Positives (e^+):

              Min      1stQ    Median      Mean      3rdQ       Max
    o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
    o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
    o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
    o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
    o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
    o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
    o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
    o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
    o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
    o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
    o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
    o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
    o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
    o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
    o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
    o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
    o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
    o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
    o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
    o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

    MCMC Diagnostics:

         Replicate Chains: 5
         Burn Time: 300
         Draws per Chain: 20 Total Draws: 100
         Potential Scale Reduction (G&R's sqrt(Rhat)):
              Max: 1.003116
              Med: 0.9992194
              IQR: 0.0004545115

    R> cor(em, apply(b$em, 2, median))
    [1] 0.9187894
    R> cor(ep, apply(b$ep, 2, median))
    [1] 0.971649
    R> mean(apply(b$net, c(2, 3), median) == g)
    [1] 1

    Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Two situations can indicate either excessively "perverse" priors or extreme disagreement among informants (e.g., as would result from systematic deception): multiple actors whose error rates satisfy this condition with high posterior probability, or strongly multimodal posterior graph distributions. Either possibility warrants a re-examination of both the user's modeling assumptions and the data.
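Why is em + ep > 1 "perverse"? Consider a single edge whose informants have known error rates. A reported tie multiplies the odds that the edge exists by (1 - em)/ep, which falls below 1 exactly when em + ep > 1, so the report then counts as evidence against the tie. This plain-Python sketch computes the posterior edge probability in that known-error-rate case, assuming informants err independently given the true edge. It is not bbnam, which also samples the error rates and the whole graph. edge_posterior is a hypothetical name, and all rates must lie strictly between 0 and 1.

```python
import math


def edge_posterior(reports, em, ep, prior=0.5):
    """Posterior probability that an edge exists, given 0/1 reports from
    several informants with known false-negative rates em and
    false-positive rates ep (independent errors given the true edge)."""
    log_present = math.log(prior)
    log_absent = math.log(1 - prior)
    for x, fn, fp in zip(reports, em, ep):
        log_present += math.log(1 - fn) if x else math.log(fn)
        log_absent += math.log(fp) if x else math.log(1 - fp)
    # Normalize in log space for numerical stability.
    top = max(log_present, log_absent)
    a = math.exp(log_present - top)
    b = math.exp(log_absent - top)
    return a / (a + b)


# Error rates near the example's means: two "yes" reports and one "no".
typical = edge_posterior([1, 1, 0], [0.375] * 3, [0.038] * 3)
# A single "yes" from a perverse informant (em + ep = 1.4 > 1).
perverse = edge_posterior([1], [0.8], [0.6])
```

With typical rates, the two positive reports push the posterior above 0.99. The perverse informant's "yes" instead lowers it from 0.5 to 0.25.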

    Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

    R> mean(consensus(dat, method = "LAS.intersection") == g)
    [1] 0.7725
    R> mean(consensus(dat, method = "LAS.union") == g)
    [1] 0.905
    R> mean(consensus(dat, method = "central.graph") == g)
    [1] 0.9575
    R> mean(consensus(dat, method = "romney.batchelder") == g)
    Estimated competency scores:
     [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
     [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
    [15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
    Estimated bias parameters:
     [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
     [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
    [13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
    [19] 0.04302486 0.10195838
    [1] 1

    For this scenario, the intersection LAS is an especially poor choice, since it exacerbates the effects of false negatives. The central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
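The locally aggregated structure uses only the two endpoints' own reports on each tie. The intersection rule keeps i -> j only if both i and j report it, so a true tie survives with probability (1 - em_i)(1 - em_j). At em = 0.375 that is 0.625^2 = 0.390625, which illustrates how the intersection rule compounds false negatives. The union rule instead compounds false positives. The plain-Python sketch below implements both rules, with dat[k][i][j] taken as informant k's report of i -> j as in the example's dat array. las is a hypothetical name, not sna's consensus.

```python
def las(dat, rule="intersection"):
    """Locally aggregated structure from cognitive social structure data.

    dat[k][i][j] is informant k's report of the tie i -> j. The LAS tie
    i -> j uses only the reports of the endpoints i and j: the
    intersection rule requires both to report it, the union rule either.
    """
    if rule not in ("intersection", "union"):
        raise ValueError("rule must be 'intersection' or 'union'")
    n = len(dat)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            a, b = dat[i][i][j], dat[j][i][j]
            out[i][j] = int((a and b) if rule == "intersection" else (a or b))
    return out


dat = [
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],  # informant 0 reports only 0 -> 1
    [[0, 0, 0], [0, 0, 1], [0, 0, 0]],  # informant 1 reports only 1 -> 2
    [[0, 0, 0], [0, 0, 1], [0, 0, 0]],  # informant 2 reports only 1 -> 2
]
both = las(dat, "intersection")  # keeps 1 -> 2 (informants 1 and 2 agree)
either = las(dat, "union")       # also keeps 0 -> 1 (only informant 0 saw it)
```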

    As a final example of sna's model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components and estimates both effects simultaneously. (This example requires the numDeriv package.)

    R> w1 <- rgraph(50)
    R> w2 <- rgraph(50)
    R> x <- matrix(rnorm(50 * 5), 50, 5)
    R> r1 <- 0.2
    R> r2 <- 0.3
    R> sigma <- 0.1
    R> beta <- rnorm(5)
    R> nu <- rnorm(50, 0, sigma)
    R> e <- qr.solve(diag(50) - r2 * w2, nu)
    R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
    R> fit <- lnam(y, x, w1, w2)
    R> summary(fit)

    Call:
    lnam(y = y, x = x, W1 = w1, W2 = w2)

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.52052 -0.18305  0.01156  0.15557  0.62082

    Coefficients:
            Estimate Std. Error Z value Pr(>|z|)
    X1     -0.331259   0.010831  -30.58   <2e-16 ***
    X2      0.535608   0.009448   56.69   <2e-16 ***
    X3     -0.685068   0.007138  -95.98   <2e-16 ***
    X4      0.691812   0.008417   82.19   <2e-16 ***
    X5      0.016491   0.007890    2.09   0.0366 *
    rho1.1  0.194935   0.002575   75.71   <2e-16 ***
    rho2.1  0.307491   0.021167   14.53   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

          Estimate Std. Error
    Sigma  0.09597   9.22e-05

    Goodness-of-Fit:
         Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
         Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
         Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
         AIC: -100.9 BIC: -85.65

         Null model: meanstd
         Null log likelihood: -82.48 on 48 degrees of freedom
         AIC: 169.0 BIC: 172.8
         AIC difference (model versus null): 269.9
         Heuristic Log Bayes Factor (model versus null): 258.4

    In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot", which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges. Pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

    3. Closing comments

    The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines covering many of the methods currently in wide use within the field. It is hoped that, together with the other packages of the statnet ensemble, the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

    Acknowledgments

    The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

    [Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.]

    References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions in Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

    Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software (http://www.jstatsoft.org/), published by the American Statistical Association (http://www.amstat.org/).

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.


in some sense, then, statnet is the natural "successor" to sna. Reflecting this relationship, sna is now considered to be part of the statnet project and is fully interoperable with other statnet packages (including network). sna may still be employed as a stand-alone package, however, by users who do not require the full range of functionality provided by statnet.

1.3 Functionality

At present, the sna package includes over 125 functions for the manipulation and analysis of network data. Supported functionality includes:

Functions to compute descriptive indices at the graph or node level. This includes centrality and centralization indices, measures of hierarchy and prestige, brokerage, density, reciprocity, transitivity, connectedness, and the like, as well as dyad, triad, path, and cycle census statistics. Stand-alone routines to facilitate the comparison of index values across graphs via conditional uniform graph (CUG) tests are included.

Functions to compute geodesic distances, component structure and distribution, and structure statistics (in the sense of Fararo and Sunshine 1964), and to identify isolates.

Functions for positional and role analysis, including structural equivalence and blockmodeling.

Functions for exploratory edge set comparison in the paradigm of Butts and Carley (2005). This includes structural covariance/correlation and distance routines, as well as tools for scaling and visualization of graph sets. Network regression (Krackhardt 1988), canonical correlation analysis, and logistic network regression are also supported. QAP (Hubert 1987; Krackhardt 1987b) and CUG tests are currently implemented for all three approaches.

Functions to generate graph-valued deviates from various stochastic processes. So-called Erdős-Rényi graphs, inhomogeneous Bernoulli graphs, and dyad census conditioned graphs are supported, as are graphs produced by Watts-Strogatz rewiring processes (Watts and Strogatz 1998) and the biased net models of Rapoport (1957) and Skvoretz et al. (2004).

Functions to fit network autocorrelation (also known as spatial autocorrelation; see Anselin 1988) and biased net models.

Functions for network inference (i.e., inferring networks from multiple reports containing missing and/or error-prone data). This includes heuristic estimators such as Krackhardt's (Krackhardt 1987a) locally aggregated structure estimators and the central graph (Banks and Carley 1994), as well as model-based methods such as the Romney-Batchelder consensus model (Romney et al. 1986) and the error-rate models of Butts (2003).

Functions for visualization and manipulation of network data (in adjacency matrix form). Standard graph layout methods, such as those of Fruchterman and Reingold (1991) and Kamada and Kawai (1989), general multidimensional scaling/eigenstructure methods, and "target" diagrams (Brandes et al. 2003) are included by default, and custom layout routines are also supported. Functions are included to facilitate common tasks such as extracting neighborhoods and egocentric networks, symmetrization, application of functions to attribute information on neighborhoods (e.g., computing neighbors' mean attributes), dichotomization, permutation/relabeling, and the creation of interval graphs from spell data. Data import/export is supported for several basic file formats.
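The CUG tests mentioned above compare an observed statistic with its distribution over uniform random graphs that share some property of the data. The following base-R sketch (not sna's own CUG routine) conditions on edge count and uses the number of mutual dyads as the test statistic; the observed graph, the helper names mutual_count and uniform_gnm, and the 1000 replications are all illustrative assumptions.

```r
# Number of mutual (reciprocated) dyads in a 0/1 adjacency matrix without loops
mutual_count <- function(A) sum(A & t(A)) / 2

# Uniform digraph of order n with exactly m edges and no loops
uniform_gnm <- function(n, m) {
  A <- matrix(0L, n, n)
  off <- which(row(A) != col(A))          # linear indices of off-diagonal cells
  A[off[sample.int(length(off), m)]] <- 1L
  A
}

set.seed(7)
n <- 12
# Hypothetical "observed" graph in which every tie is reciprocated
obs <- uniform_gnm(n, 20)
obs <- (obs | t(obs)) * 1L
m_obs <- sum(obs)

stat_obs   <- mutual_count(obs)
null_stats <- replicate(1000, mutual_count(uniform_gnm(n, m_obs)))

# Monte Carlo estimate of Pr(null statistic >= observed statistic)
p_upper <- mean(null_stats >= stat_obs)
c(observed = stat_obs, null_mean = mean(null_stats), p_upper = p_upper)
```

Conditioning on edge count keeps density fixed, so a small p_upper points to more reciprocity than density alone would produce.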

The above includes many of the methods of what is sometimes called "classical" social network analysis (exemplified by Wasserman and Faust (1994), whose presentation is now canonical), as well as some more recent contributions to the literature. Although the focus of the package has been on social scientific applications, many of the included tools may also be useful for analyzing networks arising from other sources.

1.4 Terminology and data representation

Because sna is a special-purpose toolkit dedicated to social network analysis, describing its functionality requires us to refer to standard SNA concepts and methods; readers unfamiliar with network analysis may wish to consult the cited references (particularly Wasserman and Faust 1994) for additional details. Some specific terminology and notation is described below. Throughout this paper, we will be concerned with relational data consisting of a fixed set of entities (called vertices) and a multiset of relationships among those entities (called edges). Our particular focus is on dyadic relationships, in which edges consist of (possibly ordered) two-element multisets on the set of vertices. The elements of an edge are referred to as its endpoints; in the ordered case, the first element is known as the tail (or sender) and the second as the head (or receiver). An edge whose endpoints are identical is called a loop. The combination of an edge set E with a vertex set V is said to be a graph (denoted G = (V, E)). The size or order of a graph is the number of elements in its vertex set (denoted |V|, where |·| is the cardinality operator). Specific types of graphs may be identified via the constraints satisfied by E. If the elements of E are unordered multisets, G is said to be an undirected graph; if edges are ordered multisets, by contrast, G is said to be a directed graph (or digraph). For an undirected graph, the set of vertices tied (or adjacent) to vertex v is called the neighborhood of v (denoted N(v)). In the directed case, we distinguish between the set of vertices sending edges to v (the in-neighborhood, or N⁻(v)) and the set of vertices receiving edges from v (the out-neighborhood, or N⁺(v)). A graph (directed or otherwise) is simple if it has no loops and no edge has multiplicity greater than one. Finally, a graph's edge set may be associated with a set of variables such that each edge carries some value. A graph of this kind is said to be valued, as opposed to the contrary unvalued case.
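To make the neighborhood notation concrete, here is a small base-R sketch (independent of sna; the helper names are hypothetical) that reads N⁺(v) and N⁻(v) off the rows and columns of an adjacency matrix, the representation described below, and checks whether the graph is simple. The undirected neighborhood is obtained by symmetrizing with the union of the two edge directions, which is one common choice among several.

```r
# A small digraph as a 0/1 adjacency matrix: A[i, j] = 1 means an edge i -> j
A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              1, 0, 0, 1,
              0, 0, 0, 0),
            nrow = 4, byrow = TRUE)

out_neighborhood <- function(A, v) which(A[v, ] != 0)  # N+(v): vertices v sends to
in_neighborhood  <- function(A, v) which(A[, v] != 0)  # N-(v): vertices sending to v

out_neighborhood(A, 3)   # vertices 1 and 4
in_neighborhood(A, 3)    # vertices 1 and 2

# Undirected view: keep a tie if it exists in either direction
U <- (A | t(A)) * 1
which(U[3, ] != 0)       # vertices 1, 2, and 4

# Simple graph check: no loops and only 0/1 entries (no multiplicities > 1)
is_simple <- all(diag(A) == 0) && all(A %in% c(0, 1))
is_simple
```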

It is worth noting that use of terminology varies somewhat across the social network field, a perhaps unfortunate legacy of the field's strongly interdisciplinary nature (Freeman 2004). Thus, vertices may also be called "points" or "nodes" (or, in social contexts, "actors" or "agents"). Likewise, edges may be called "lines", "ties", or (if directed) "arcs". The term "network" is often used generically to refer to any relational structure; in other cases, it may be reserved to refer to the actually existing relational structure, with "graph" being employed for that structure's formal representation. In the latter instance, "tie" is frequently used as the corresponding term for an actually existing relationship, with "edge" denoting the formal representation of that relationship. While such terminological subtleties are not required to use sna, an awareness of them may reduce confusion among users seeking to make use of the literature cited within the package manual.

With rare exceptions, sna routines can be used with directed or undirected graphs, with or without loops. Edge values and missing data (i.e., edges whose states are unknown) are supported in many applications as well. Note, however, that many graph-theoretic concepts (e.g., connectedness) admit somewhat different definitions in the directed and undirected cases; it is thus important to verify that one is using the settings which are appropriate to the data at hand. Except for functions whose behavior is undefined in the directed case, sna's functions typically default to the assumption that one's data consists of one or more simple, unvalued digraphs.

Relational data can be represented in a number of ways, several of which are currently supported by the sna package. The most basic of these is the adjacency matrix, i.e., a square matrix A whose elements are defined such that A_ij is the value of the (i, j) edge (or {i, j} edge in the undirected case) in the corresponding graph. By convention, A_ij is a dichotomous indicator variable where the corresponding graph is unvalued. Such matrices may be passed as matrix objects or as two-dimensional arrays. While adjacency matrices are convenient to work with, they are inefficient for large, sparse graphs. When working with such data, network objects (Butts et al. 2007) or sparse matrix objects (SparseM; Koenker and Ng 2007) may be preferred. sna accepts all three such data types interchangeably.
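A minimal base-R sketch of building an adjacency matrix from a hypothetical edge list, for both the directed and undirected cases. The density computed at the end is the usual ratio of edges present to edges possible for loopless graphs; we assume this matches what gden reports for such data by default, which is worth confirming against the package manual.

```r
# Hypothetical edge list for a 5-vertex digraph: one (tail, head) pair per row
edges <- rbind(c(1, 2), c(2, 3), c(3, 1), c(4, 5))
n <- 5

# Directed case: A[i, j] = 1 if there is an edge i -> j
A <- matrix(0, n, n)
A[edges] <- 1                      # a two-column index matrix sets one cell per row

# Undirected case: also set the mirrored cells, so A_ij = A_ji
Au <- A
Au[edges[, 2:1]] <- 1

# Densities: edges present / edges possible (no loops)
dens_directed   <- sum(A) / (n * (n - 1))        # 4 / 20
dens_undirected <- (sum(Au) / 2) / choose(n, 2)  # 4 / 10
c(dens_directed, dens_undirected, isSymmetric(Au))
```

The dense matrix stores all n² cells regardless of how many edges exist, which is the inefficiency for large sparse graphs noted above.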

In many instances, one may need to perform operations on multiple graphs at once. Where such graphs are of the same order (i.e., number of vertices), they may be conveniently represented by a three-dimensional array whose first dimension indexes the component adjacency matrices. Alternately, multiple graphs may be specified by means of a list, which allows the user to pass graph sets of varying orders where required. Within a graph list, single adjacency matrices, adjacency arrays, network objects, and sparse matrix objects may be mixed as desired; individual graphs are unpacked sequentially in ascending list and array index order prior to computation.
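The stacking convention can be sketched in base R as follows: a three-dimensional array whose first index selects the graph, and a list for graphs of unequal order. The simulated graphs and the density calculation are illustrative only and do not call sna.

```r
set.seed(3)
n <- 6; m <- 3
# Stack of m same-order graphs: G[k, , ] is the adjacency matrix of graph k
G <- array(rbinom(m * n * n, 1, 0.4), dim = c(m, n, n))
for (k in seq_len(m)) {
  a <- G[k, , ]
  diag(a) <- 0                      # drop loops slice by slice
  G[k, , ] <- a
}

# Per-graph density, iterating over the first (graph) dimension
dens <- apply(G, 1, function(a) sum(a) / (n * (n - 1)))
dens

# Graphs of differing order can instead be kept in a list
glist <- list(G[1, , ], matrix(0, 4, 4))
dens_list <- sapply(glist, function(a) sum(a) / (nrow(a) * (nrow(a) - 1)))
dens_list
```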

      Importing relational data into R

Another preliminary issue of obvious concern is the importation of relational data into R. Where such data is stored in matrix or array form, conventional R routines such as read.table and scan may be employed in the usual manner. Similarly, natively saved network objects may be loaded directly into memory without external representation. In addition to these methods, sna includes custom routines for importing relational data in OrgStat NOS and GraphViz DOT formats. Processed relational data can be saved via the above methods, or in the DL format widely used by packages such as Pajek and UCINET. (See also the Pajek import function in network.)
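For data already in matrix form, the base-R route looks roughly like this. The inline string stands in for a file on disk (with a real file one would pass its path to read.table or scan instead); sna's format-specific readers are not shown.

```r
# Inline text standing in for a whitespace-separated matrix file
txt <- "0 1 0
1 0 1
0 0 0"

# read.table returns a data frame; convert it and drop the V1, V2, ... labels
A <- as.matrix(read.table(text = txt, header = FALSE))
dimnames(A) <- NULL

# scan reads the same numbers as a vector; fill the matrix row by row
B <- matrix(scan(text = txt, quiet = TRUE), nrow = 3, byrow = TRUE)

all(A == B)   # TRUE: both routes give the same adjacency matrix
```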

Beyond these network-specific approaches, sna also has facilities for converting spell data (i.e., data consisting of intervals in time or other quantities) into interval graphs (West 1996). The eponymously named interval.graph function serves in this capacity, converting an array of spell information into one or more interval graphs; spell-level categorical covariate information may also be included. In addition to simple interval graphs, interval.graph will compute the valued overlap graphs proposed by Butts and Pixley (2004) for use with life history data. In this case, the overlap quantities are stored as edge values in the output adjacency matrix (or matrices, if multiple spell sets were given).
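The core logic of a simple interval graph can be sketched in base R: two spells are adjacent whenever their intervals intersect. This is not the interval.graph implementation. It assumes hypothetical spell data with start and end columns, treats intervals that merely touch as non-overlapping, and stores the length of the overlap as a stand-in edge value; it does not reproduce the overlap quantities of Butts and Pixley (2004).

```r
# Hypothetical spell data: one row per spell, with start and end times
spells <- data.frame(start = c(0, 2, 5, 9),
                     end   = c(4, 6, 8, 10))
n <- nrow(spells)

# Length of the overlap between spells i and j (0 if they do not intersect)
overlap <- outer(seq_len(n), seq_len(n), Vectorize(function(i, j) {
  max(0, min(spells$end[i], spells$end[j]) - max(spells$start[i], spells$start[j]))
}))
diag(overlap) <- 0   # no loops

# Simple interval graph: i and j are adjacent whenever their spells overlap
ig <- (overlap > 0) * 1
ig
```

Here spell 2 overlaps both spell 1 (by 2 units) and spell 3 (by 1 unit), while spell 4 is isolated.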


      2 Package highlights

Given the wide scope of the methods implemented within the sna package, we cannot review them all in detail. In this section, however, we attempt to summarize the functionality of sna within a number of domains, highlighting specific functions and applications which are likely to be of general interest. Brief examples are also provided within each section to illustrate basic syntax and usage. Additional background and usage details are contained within the package manual, which is distributed with the package itself.

2.1 Random graph generation

sna has a range of tools for random graph generation. Chief among these is rgraph, a "workhorse" function for simulating deviates from both homogeneous and inhomogeneous Bernoulli graph distributions (Wasserman and Faust 1994). Given a set of tie probabilities (which may be specified by graph or by edge), it generates one or more graphs whose edge states are independent Bernoulli trials conditional on the specified parameters.¹
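The homogeneous Bernoulli graph model is simple enough to sketch directly in base R. The helper below (bernoulli_digraphs, a hypothetical name) uses one tie probability per graph, disallows loops, and returns an array whose first dimension indexes graphs, mirroring the conventions of Section 1.4. It is not rgraph's implementation and omits per-edge probabilities, valued graphs, and undirected mode.

```r
# Hypothetical helper: m digraphs of order n, edges independent Bernoulli(p[k])
bernoulli_digraphs <- function(n, m, p) {
  p <- rep_len(p, m)                 # allow one probability for all graphs
  g <- array(0, dim = c(m, n, n))    # first dimension indexes the graphs
  for (k in seq_len(m)) {
    a <- matrix(rbinom(n * n, 1, p[k]), n, n)
    diag(a) <- 0                     # no loops
    g[k, , ] <- a
  }
  g
}

set.seed(1)
g <- bernoulli_digraphs(10, 3, p = c(0.1, 0.9, 0.5))
d <- apply(g, 1, function(a) sum(a) / (10 * 9))   # observed densities
d
```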

In addition to rgraph, sna has several other tools for random graph generation. These currently include rgnm (which draws uniform graphs and digraphs conditional on edge count), rguman (which draws uniform digraphs conditional on expected or realized dyad census statistics), rgws (which draws from a Watts-Strogatz graph process; Watts and Strogatz 1998), and rgbn (which simulates a Skvoretz-Fararo biased net process (Skvoretz et al. 2004); see also Section 2.7). Also useful are tools such as rmperm and the rewire functions, which alter an input graph by random row/column, edgewise, or dyadic permutations. Functions which condition on degree distribution and the triad census are anticipated in future versions of sna.

      Example

To provide a sense of the syntax involved (and the options available) when generating random graphs in sna, we here provide a brief example of R code which draws graphs from a number of models. Note that the output type in each case is an adjacency matrix: although sna routines accept network and related objects as input (per Section 1.4), the package's current random graph generators produce output in adjacency matrix or array form. The range of output types may be expanded in future package versions. To begin, we first load the sna library and fix the random seed (for reproducibility):

R> library(sna)
R> set.seed(1913)

As noted above, rgraph can be used in various ways to obtain graphs (directed or otherwise) with different expected densities. For instance, three digraphs with respective expected densities 0.1, 0.9, and 0.5 can be drawn as follows:

R> g <- rgraph(10, 3, tprob = c(0.1, 0.9, 0.5))
R> gden(g)
[1] 0.1000000 0.8666667 0.5333333

¹ rgraph can also be employed to simulate valued graphs via a resampling procedure.


gden, which we shall encounter again later, is an sna function which returns the density of one or more input graphs; as expected, the observed densities here closely match their expectations. The tprob parameter, used above to set the probability of each edge on a per-graph basis, can also be used in other ways. For instance, passing a matrix of Bernoulli parameters to tprob will cause rgraph to sample from the corresponding inhomogeneous Bernoulli graph model (in which the probability of an (i, j) edge is equal to tprob[i, j]). For example, consider a simple model for a digraph of order 10 in which the probability of an (i, j) edge is equal to j/10. Such a graph can be drawn easily as follows:

R> gp <- sapply((1:10) / 10, rep, 10)
R> g <- rgraph(10, tprob = gp)
R> g
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    1    0    0    1    1     1
 [2,]    0    0    0    1    0    1    0    0    1     1
 [3,]    0    0    0    0    0    1    0    1    0     1
 [4,]    0    0    0    0    1    1    1    1    1     1
 [5,]    0    1    0    0    0    0    1    1    1     1
 [6,]    0    0    1    0    1    0    1    0    1     1
 [7,]    0    1    1    0    1    0    0    1    1     1
 [8,]    0    0    1    1    1    0    1    0    1     1
 [9,]    0    0    0    1    1    0    1    1    0     1
[10,]    0    0    0    0    0    0    1    1    1     0
R> apply(g, 2, mean)
[1] 0.0 0.2 0.3 0.3 0.6 0.3 0.6 0.7 0.8 0.9
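One way to see where these column means should sit is to average them over many draws. The base-R sketch below simulates the j/10 model directly (rbinom in place of rgraph, with loops disallowed as in rgraph's default) and compares the long-run column means with 0.9(j/10), the value implied by one diagonal cell per column being fixed at zero. The number of replications is an arbitrary choice.

```r
n  <- 10
tp <- matrix(rep((1:n) / 10, each = n), n, n)   # tp[i, j] = j / 10

# One draw from the inhomogeneous Bernoulli model with loops disallowed
draw <- function() {
  a <- matrix(rbinom(n * n, 1, tp), n, n)       # probabilities taken cell by cell
  diag(a) <- 0
  a
}

set.seed(2)
sim_means <- rowMeans(replicate(5000, colMeans(draw())))
expected  <- 0.9 * (1:n) / 10    # 9 random cells per column plus one fixed zero
round(rbind(simulated = sim_means, expected = expected), 3)
```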

Since rgraph disallows loops by default, diagonal entries are ignored (and left at zero) in the above cases; thus, the column means here have expectation 0.9(j/10). The observed means are quite close to this, but obviously vary due to the underlying Bernoulli process. For random graphs with exact constraints on edge count, we must use rgnm. For instance, to take 5 draws from the uniform distribution on the order-10 graphs having 12 edges, we would proceed as follows:

R> g <- rgnm(5, 10, 12)
R> apply(g, 1, sum)
[1] 12 12 12 12 12
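Uniform sampling conditional on edge count amounts to choosing which cells of the adjacency matrix are filled. A minimal base-R sketch under that reading (uniform_gnm is a hypothetical helper, not rgnm's code) chooses m of the nv(nv - 1) off-diagonal cells without replacement, so every digraph with exactly m edges is equally likely. Note that replicate stacks the draws along the third dimension, unlike the first-dimension convention used by sna.

```r
# Uniform draw from digraphs of order nv with exactly m edges (no loops):
# choose m of the nv * (nv - 1) off-diagonal cells without replacement
uniform_gnm <- function(nv, m) {
  a <- matrix(0L, nv, nv)
  off <- which(row(a) != col(a))
  a[off[sample.int(length(off), m)]] <- 1L
  a
}

set.seed(5)
draws <- replicate(5, uniform_gnm(10, 12))   # 10 x 10 x 5 array
apply(draws, 3, sum)                         # every draw has exactly 12 edges
```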

As the dyadic counterpart to both rgraph and rgnm, rguman models digraphs whose distributions are parameterized by dyad states. As each dyad corresponds to a pair of edge variables, it can be readily classified into the three isomorphism classes of mutual (both edges present), asymmetric (one edge present), or null (no edges present). The number of dyads in each class within a graph is known as its dyad census, and has been used as a simple basis for modeling network structure at least since the work of Holland and Leinhardt (1970). rguman can be employed either to generate uniform digraphs conditional on an exact dyad census constraint, or to draw from a multinomial graph model of independent dyads with fixed expected counts. The former case can be used to generate graphs of particular types. For instance, the trivial cases of complete, complete tournament, and null graphs can be generated by placing all dyads within the appropriate isomorphism class:

R> k10 <- rguman(1, 10, mut = 45, asym = 0, null = 0, method = "exact")
R> t10 <- rguman(1, 10, mut = 0, asym = 45, null = 0, method = "exact")
R> n10 <- rguman(1, 10, mut = 0, asym = 0, null = 45, method = "exact")
R> k10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    1    1    1    1    1    1    1     1
 [2,]    1    0    1    1    1    1    1    1    1     1
 [3,]    1    1    0    1    1    1    1    1    1     1
 [4,]    1    1    1    0    1    1    1    1    1     1
 [5,]    1    1    1    1    0    1    1    1    1     1
 [6,]    1    1    1    1    1    0    1    1    1     1
 [7,]    1    1    1    1    1    1    0    1    1     1
 [8,]    1    1    1    1    1    1    1    0    1     1
 [9,]    1    1    1    1    1    1    1    1    0     1
[10,]    1    1    1    1    1    1    1    1    1     0
R> t10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    1    0    0     0
 [2,]    1    0    1    0    1    1    0    0    0     1
 [3,]    1    0    0    1    1    0    0    1    0     0
 [4,]    1    1    0    0    0    1    0    1    0     1
 [5,]    1    0    0    1    0    1    1    1    1     0
 [6,]    1    0    1    0    0    0    1    1    1     0
 [7,]    0    1    1    1    0    0    0    1    1     0
 [8,]    1    1    0    0    0    0    0    0    1     1
 [9,]    1    1    1    1    0    0    0    0    0     0
[10,]    1    0    1    0    1    1    1    0    1     0
R> n10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
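The dyad census used above can be computed directly from an adjacency matrix by comparing each upper-triangle cell with its mirror. The function below is a hypothetical base-R helper for illustration, not sna's own routine. Its three counts always sum to n(n - 1)/2, which is why each exact census above places 45 dyads for n = 10.

```r
# Dyad census of a digraph: counts of mutual, asymmetric, and null dyads
dyad_census_sketch <- function(A) {
  up   <- upper.tri(A)
  a_ij <- A[up] != 0          # edge i -> j for each pair i < j
  a_ji <- t(A)[up] != 0       # edge j -> i for the same pair
  c(Mut  = sum(a_ij & a_ji),
    Asym = sum(xor(a_ij, a_ji)),
    Null = sum(!a_ij & !a_ji))
}

# Small example: 1 <-> 2 mutual, 2 -> 3 asymmetric, all other pairs null
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- 1
A[2, 3] <- 1
dyad_census_sketch(A)   # Mut = 1, Asym = 1, Null = 4

# A complete digraph of order 10 puts all 45 dyads in the mutual class
K <- matrix(1, 10, 10); diag(K) <- 0
dyad_census_sketch(K)   # Mut = 45, Asym = 0, Null = 0
```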

When not in "exact" mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)
R> mean(g[upper.tri(g)] & t(g)[upper.tri(g)])
[1] 0.1482828
R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])
[1] 0.04646465
R> mean((!g)[upper.tri(g)] & t(!g)[upper.tri(g)])
[1] 0.8052525

By contrast with the expectation under the above model, a Bernoulli graph with the same expected density would have a mean mutuality rate of approximately 0.03 (with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4). Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
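The Bernoulli benchmark figures quoted above follow from matching expected densities. Under the multinomial model each mutual dyad contributes two edges and each asymmetric dyad one, so the expected density is p = (2(0.15) + 0.05)/2 = 0.175; a Bernoulli graph with that density has mutual and asymmetric dyad probabilities p² and 2p(1 - p). The arithmetic, worked in R:

```r
# Multinomial dyad model from the example above
p_mut <- 0.15; p_asym <- 0.05; p_null <- 0.80

# Expected density: a mutual dyad fills 2 of 2 possible edges, an asymmetric dyad 1 of 2
dens <- (2 * p_mut + p_asym) / 2            # 0.175

# Bernoulli graph with the same expected density: edges independent
mut_bern  <- dens^2                          # approx 0.0306
asym_bern <- 2 * dens * (1 - dens)           # approx 0.2888
ratio     <- asym_bern / mut_bern            # approx 9.43
round(c(density = dens, mutual = mut_bern, asym = asym_bern, ratio = ratio), 3)
```

Under the multinomial model itself the mutual rate is 0.15 and asymmetric dyads are outnumbered three to one, so the two families differ sharply even at equal density.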

More extensive departures from independence require alternatives to the simple independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. As we will see, these processes are specified in terms of the conditional probability of an edge given other edges within the graph; this immediately suggests the use of a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process. Such a sampler is implemented via the rgbn function, which uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. Thinning and burn-in parameters may be specified by the user, along with model parameters (which by default correspond to the uniform random digraph model). Parameters may be adjusted to produce "parent" or reciprocity biases (π), "sibling" or shared partner biases (σ), and "double role" biases or parent/sibling interaction effects (ρ), as well as baseline density effects (d); parameters vary from 0 to 1, with 0 indicating no bias. The command to draw a sample of 5 order-10 networks with both reciprocity and triangle formation biases will then look something like the following,

      R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
      +   d = 0.15))


      with the magnitude of the specified effects depending on the exact choice of parameters.
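To make the idea of an iterative edge-updating Gibbs sampler with burn-in and thinning concrete, here is a small base-R sketch. It is not rgbn, and its full conditional is an assumption chosen for illustration: $P(i \to j \mid \text{rest}) = 1 - (1-d)(1-\pi)^{y_{ji}}(1-\sigma)^{s_{ij}}(1-\rho)^{y_{ji}s_{ij}}$, where $s_{ij}$ counts the common senders (“parents”) of $i$ and $j$. The exact biased-net parameterization used by rgbn may differ, and the function name gibbs_biased_net is hypothetical.

```r
# Hypothetical Gibbs sampler for a biased-net-style digraph model.
# ASSUMED full conditional for the edge i -> j given all other edges:
#   1 - (1 - d) * (1 - pi)^y[j, i] * (1 - sigma)^s_ij * (1 - rho)^(y[j, i] * s_ij)
# where s_ij = number of vertices k (k != i, j) with k -> i and k -> j.
gibbs_biased_net <- function(draws, n, pi = 0, sigma = 0, rho = 0, d = 0.5,
                             burn = 100, thin = 10) {
  y <- matrix(0L, n, n)
  out <- array(0L, dim = c(draws, n, n))
  kept <- 0
  for (sweep in seq_len(burn + thin * draws)) {
    # One sweep: resample every off-diagonal cell from its full conditional
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        if (i == j) next
        others <- setdiff(seq_len(n), c(i, j))
        s_ij <- sum(y[others, i] * y[others, j])   # common senders of i and j
        r <- y[j, i]                               # is the reciprocal edge present?
        p <- 1 - (1 - d) * (1 - pi)^r * (1 - sigma)^s_ij * (1 - rho)^(r * s_ij)
        y[i, j] <- rbinom(1, 1, p)
      }
    }
    # After burn-in, keep every thin-th state
    if (sweep > burn && (sweep - burn) %% thin == 0) {
      kept <- kept + 1
      out[kept, , ] <- y
    }
  }
  out
}

set.seed(42)
g <- gibbs_biased_net(5, 10, pi = 0.05, sigma = 0.1, rho = 0.05, d = 0.15)
dim(g)   # 5 draws, each a 10 x 10 adjacency matrix
```

Each sweep visits every off-diagonal cell once; states are retained only after `burn` sweeps and then every `thin` sweeps, which is the role the burn-in and thinning parameters play in the sampler described above.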

      Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) “rewiring” process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

      R> g <- matrix(0, 10, 10)
      R> g[1,] <- 1
      R> g2 <- rewire.ws(g, 0.5)[1,,]
      R> g2
            [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
       [1,]    1    0    1    1    1    1    0    0    0     0
       [2,]    0    0    0    0    0    0    0    0    0     1
       [3,]    0    1    0    0    0    0    0    0    0     0
       [4,]    0    0    1    0    0    0    0    0    0     0
       [5,]    0    0    0    0    0    0    0    0    0     0
       [6,]    0    0    0    0    1    0    0    0    0     0
       [7,]    0    0    0    0    0    0    0    0    0     0
       [8,]    0    0    0    0    0    0    0    0    0     0
       [9,]    0    0    0    0    0    0    0    0    0     0
      [10,]    0    0    0    0    0    0    0    0    1     0
      R> sum(g - g2) == 0
      [1] TRUE
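Why rewiring conserves edges is easiest to see in a stripped-down version of the step. The base-R sketch below is not the code of rewire.ws (rewire_sketch is a hypothetical name): it visits each original edge once and, with probability p, moves it to a randomly chosen null cell in the same row (keeping the sender) or the same column (keeping the receiver). The diagonal is treated like any other cell, as in the output above, where g2[1,1] remains 1. Every move deletes one edge and creates one, so the total cannot change.

```r
# Simplified edge-conserving rewiring (an illustration, not rewire.ws):
# each edge, with probability p, moves to a random null cell sharing
# exactly one of its endpoints
rewire_sketch <- function(y, p) {
  edges <- which(y != 0, arr.ind = TRUE)   # original edges, visited once each
  for (e in seq_len(nrow(edges))) {
    if (runif(1) >= p) next
    i <- edges[e, 1]
    j <- edges[e, 2]
    if (runif(1) < 0.5) {
      cand <- which(y[i, ] == 0)           # keep sender i, pick a new receiver
      if (length(cand) == 0) next
      k <- cand[sample.int(length(cand), 1)]
      y[i, j] <- 0
      y[i, k] <- 1
    } else {
      cand <- which(y[, j] == 0)           # keep receiver j, pick a new sender
      if (length(cand) == 0) next
      k <- cand[sample.int(length(cand), 1)]
      y[i, j] <- 0
      y[k, j] <- 1
    }
  }
  y
}

g <- matrix(0, 10, 10)
g[1, ] <- 1
set.seed(7)
g2 <- rewire_sketch(g, 0.5)
sum(g) == sum(g2)   # TRUE: every move deletes one edge and adds one
```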

      Another example of an edge-preserving random transformation is the random permutation of vertex order; rmperm can be employed for this purpose, as, for example, in the following permutation of the graph g2 above:

      R> g3 <- rmperm(g2)
      R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))
      [1] TRUE

      Row/column permutation preserves the “unlabeled” structure of the input graph (i.e., it draws from the graph's isomorphism class), and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
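In base R, a uniform random relabeling amounts to applying a single random permutation to both the rows and the columns of the adjacency matrix. The sketch below is our own (random_relabel is a hypothetical name, not rmperm's source); it checks the same property as the example above, namely that degree sequences are preserved as multisets while vertex labels move.

```r
# Relabel vertices uniformly at random: one permutation applied to both
# rows and columns, so only the vertex labels change
random_relabel <- function(y) {
  perm <- sample.int(nrow(y))
  y[perm, perm]
}

set.seed(11)
y <- matrix(rbinom(64, 1, 0.3), 8, 8)
diag(y) <- 0
y3 <- random_relabel(y)

# Degree sequences survive as multisets, although vertex labels move
all(sort(colSums(y)) == sort(colSums(y3)))   # TRUE (indegrees)
all(sort(rowSums(y)) == sort(rowSums(y3)))   # TRUE (outdegrees)
```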

      2.2. Visualization and data manipulation

      Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic: for instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present, versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward:

      R> g <- rgraph(5)
      R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)
      [1] 0.07291667

      Unfortunately, the drawback to the flexibility of this routine is its inefficiency: eval.edgeperturbation cannot take advantage of any special properties of the change score being calculated, and hence is inefficient for properties such as triad counts, whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple exploratory applications, and does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
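The generic strategy behind eval.edgeperturbation, and the reason a specialized change score can be much cheaper, both fit in a few lines of base R. The helper names below are hypothetical. The generic version recomputes the statistic twice; for the number of mutual dyads, however, toggling $i \to j$ changes the count by exactly $y_{ji}$, so the specialized change score is a single lookup.

```r
# Generic change score: recompute the statistic with (i, j) forced on and off
edge_change <- function(y, i, j, stat, ...) {
  y_on  <- y
  y_on[i, j]  <- 1
  y_off <- y
  y_off[i, j] <- 0
  stat(y_on, ...) - stat(y_off, ...)
}

# Example statistic: number of mutual dyads (loops ignored)
mutual_count <- function(y) {
  diag(y) <- 0
  sum(y * t(y)) / 2
}

# Specialized change score for mutual_count: a single lookup
mutual_change <- function(y, i, j) y[j, i]

set.seed(5)
y <- matrix(rbinom(36, 1, 0.4), 6, 6)
diag(y) <- 0
pairs <- which(row(y) != col(y), arr.ind = TRUE)   # all off-diagonal cells
agree <- all(apply(pairs, 1, function(p)
  edge_change(y, p[1], p[2], mutual_count) == mutual_change(y, p[1], p[2])))
agree   # TRUE: both routes give the same change scores
```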

      Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the “identities” of its vertices, while leaving the underlying “unlabeled” structure unchanged.) Although not graph manipulation functions per se, these routines are of importance for generating restricted permutations for use in QAP tests (Hubert 1987) and comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
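Both ideas can be sketched in base R. restricted_perm (a hypothetical name) shuffles positions only within each equivalence class, in the spirit of rperm; nth_perm decodes an index into a permutation of $1, \ldots, n$ in lexicographic order via the factorial number system. Its 1-based indexing is our own convention and need not match numperm's.

```r
# Random permutation that only exchanges vertices within the same class
# (hypothetical helper; classes holds one class label per vertex)
restricted_perm <- function(classes) {
  perm <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    perm[idx] <- idx[sample.int(length(idx))]   # shuffle within the class
  }
  perm
}

# k-th permutation of 1:n in lexicographic order (k = 1 is the identity),
# decoded digit by digit in the factorial number system
nth_perm <- function(n, k) {
  k <- k - 1
  pool <- seq_len(n)
  out <- integer(0)
  for (m in n:1) {
    f <- factorial(m - 1)
    pos <- k %/% f                 # which remaining element comes next
    out <- c(out, pool[pos + 1])
    pool <- pool[-(pos + 1)]
    k <- k %% f
  }
  out
}

set.seed(9)
classes <- c(1, 1, 2, 2, 2, 3)
p <- restricted_perm(classes)
all(classes[p] == classes)                                      # TRUE
sapply(1:6, function(k) paste(nth_perm(3, k), collapse = ""))
# "123" "132" "213" "231" "312" "321"
```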

      In addition to the above, two families of graph manipulation functions bear discussing in more detail. These are functions to compute properties of neighborhoods, and functions for graph visualization. Here we briefly discuss each family in turn, before proceeding to a review of sna's descriptive index routines.

      Neighborhood and ego net functions

      The egocentric network (or “ego net”) of vertex $v$ in graph $G$ is defined as $G[v \cup N(v)]$ (i.e., the subgraph of $G$ induced by $v$ and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.

      While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.

      In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose $i,j$ cell is an indicator for the membership of vertex $j$ in vertex $i$'s selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix; for orders $k > 0$, it depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation $R$ be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i, j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
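The definitions above translate directly into a breadth-first search. The base-R sketch below (hypothetical helper names; not sna's implementation) builds the base relation $R$ for out, in, or total neighborhoods, computes geodesic distances, and then keeps pairs at distance exactly $k$ (partial) or between 1 and $k$ (cumulative). Excluding self-pairs from cumulative neighborhoods is an assumption of this sketch.

```r
# Geodesic distances by breadth-first search on a 0/1 relation r
geodesic_dist <- function(r) {
  n <- nrow(r)
  d <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    d[s, s] <- 0
    frontier <- s
    k <- 0
    while (length(frontier) > 0) {
      k <- k + 1
      # unvisited vertices reachable in one step from the current frontier
      nxt <- which(colSums(r[frontier, , drop = FALSE]) > 0 & is.infinite(d[s, ]))
      d[s, nxt] <- k
      frontier <- nxt
    }
  }
  d
}

# Partial or cumulative neighborhood of order k (hypothetical helper)
neigh_sketch <- function(y, k, type = c("out", "in", "total"), partial = TRUE) {
  type <- match.arg(type)
  if (k == 0) return(diag(nrow(y)))        # 0th order: identity matrix
  r <- switch(type,
              "out"   = y,
              "in"    = t(y),
              "total" = pmax(y, t(y)))     # underlying graph G union G^T
  d <- geodesic_dist((r != 0) * 1)
  if (partial) (d == k) * 1 else (d >= 1 & d <= k) * 1
}

path <- matrix(0, 3, 3)                    # directed path 1 -> 2 -> 3
path[1, 2] <- 1
path[2, 3] <- 1
neigh_sketch(path, 2)                      # only the pair (1, 3)
neigh_sketch(path, 2, partial = FALSE)     # (1, 2), (2, 3), and (1, 3)
```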

      To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below:

      R> g <- rgraph(10, tp = 1.5/9)
      R> g.in <- ego.extract(g, neighborhood = "in")
      R> g.out <- ego.extract(g, neighborhood = "out")
      R> g.comb <- ego.extract(g, neighborhood = "combined")
      R> g.comb[1:3]
      $`1`
           [,1] [,2] [,3] [,4]
      [1,]    0    1    1    0
      [2,]    1    0    0    0
      [3,]    0    0    0    0
      [4,]    1    0    0    0

      $`2`
           [,1] [,2] [,3] [,4]
      [1,]    0    1    0    0
      [2,]    1    0    0    0
      [3,]    1    0    0    0
      [4,]    1    0    1    0

      $`3`
           [,1] [,2] [,3] [,4]
      [1,]    0    1    1    0
      [2,]    0    0    0    0
      [3,]    0    0    0    0
      [4,]    1    1    0    0

      R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
      [1] TRUE
      R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
      [1] TRUE
      R> all(sapply(g.comb, NROW) <= degree(g) + 1)
      [1] TRUE
      R> ego.size <- sapply(g.comb, NROW)
      R> if(any(ego.size > 2))
      +   sapply(g.comb[ego.size > 2], function(x) gden(x[-1,-1]))
               1          2          3          4          5          6          7
      0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
               8          9         10
      0.00000000 0.08333333 0.00000000

      Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.
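The alter-only density computed above can be reproduced without sna. In the sketch below (helper names are hypothetical), the ego net places ego first, as the text notes ego.extract does, and density ignores the diagonal. It is checked against the second ego net printed earlier, whose three alters share a single tie, giving $1/6$.

```r
# Ego net: ego (placed first) plus its combined in/out neighborhood
ego_net <- function(y, v) {
  alters <- which((y[v, ] != 0 | y[, v] != 0) & seq_len(nrow(y)) != v)
  idx <- c(v, alters)
  y[idx, idx, drop = FALSE]
}

# Density of a digraph adjacency matrix, ignoring the diagonal
density_digraph <- function(m) {
  n <- nrow(m)
  sum(m[row(m) != col(m)]) / (n * (n - 1))
}

# Density among alters only: drop ego's row and column
alter_density <- function(y, v) {
  e <- ego_net(y, v)
  if (nrow(e) <= 2) return(NA_real_)   # fewer than two alters: undefined
  density_digraph(e[-1, -1, drop = FALSE])
}

# The ego net labelled `2` in the output above (ego in row/column 1)
m2 <- matrix(c(0, 1, 0, 0,
               1, 0, 0, 0,
               1, 0, 0, 0,
               1, 0, 1, 0), 4, 4, byrow = TRUE)
alter_density(m2, 1)   # 1/6: one tie among three alters
```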

      Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well:

      R> g <- rgraph(6)
      R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
      [1] TRUE
      R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
      [1] TRUE
      R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
      +   cmode = "freeman") / 2)
      [1] TRUE
      R> gapply(g, c(1, 2), 1:6, mean)
      [1] 4.00 3.00 3.00 5.50 3.25 3.25
      R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
      [1] 4.0 3.8 3.6 3.4 3.2 3.0
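A rough base-R analogue of gapply shows what the margin and distance arguments select. In this sketch (gapply_sketch is a hypothetical name, and its argument handling is our own), margin 1 uses out-neighbors, 2 in-neighbors, and c(1, 2) either direction; neighbors within distance $k$ are found from Boolean powers of the adjacency matrix, and the vertex itself is excluded.

```r
# Hypothetical gapply-like helper: apply fun to the covariate values of each
# vertex's neighbors within `distance` steps.
# margin = 1: out-neighbors, 2: in-neighbors, c(1, 2): either direction
gapply_sketch <- function(y, margin, stats, fun, distance = 1) {
  r <- if (setequal(margin, 1)) y else if (setequal(margin, 2)) t(y) else pmax(y, t(y))
  r <- (r != 0) * 1
  step <- r            # 0/1: reachable by a walk of exactly k steps
  reach <- r           # 0/1: reachable within k steps
  if (distance > 1) {
    for (k in 2:distance) {
      step <- ((step %*% r) > 0) * 1
      reach <- pmax(reach, step)
    }
  }
  diag(reach) <- 0     # a vertex is not its own neighbor
  sapply(seq_len(nrow(y)), function(v) fun(stats[reach[v, ] == 1]))
}

set.seed(2)
y <- matrix(rbinom(36, 1, 0.5), 6, 6)
diag(y) <- 0
all(gapply_sketch(y, 1, rep(1, 6), sum) == rowSums(y))   # TRUE: outdegree
all(gapply_sketch(y, 2, rep(1, 6), sum) == colSums(y))   # TRUE: indegree

k6 <- 1 - diag(6)                        # complete graph on six vertices
gapply_sketch(k6, c(1, 2), 1:6, mean)    # 4.0 3.8 3.6 3.4 3.2 3.0
```

On a complete graph every other vertex is a neighbor, so each mean is $(21 - v)/5$, the same pattern as the distance = 2 call above.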

      To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

      R> g <- rgraph(10, tp = 2/9)
      R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
      R> par(mfrow = c(3, 3))
      R> for(i in 1:9)
      +   gplot(neigh[i,,], main = paste("Partial Neighborhood of Order", i))
      R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
      +   partial = FALSE)
      R> par(mfrow = c(3, 3))
      R> for(i in 1:9)
      +   gplot(neigh[i,,], main = paste("Cumulative Neighborhood of Order", i))

      Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods “fill out” the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

      Visualization

      Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary “workhorse” routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For “target diagrams,” in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

      Figure 1: Sample partial neighborhoods of increasing order; vertex v is adjacent to vertex v′ in the ith panel iff v′ belongs to the ith order partial neighborhood of v. (Panels: Partial Neighborhood of Order 1 to 9.)

      While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

      Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

      Figure 2: Sample cumulative neighborhoods of increasing order; vertex v is adjacent to vertex v′ in the ith panel iff v′ belongs to the ith order cumulative neighborhood of v. (Panels: Cumulative Neighborhood of Order 1 to 9.)

      All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below), or when examining matrices which are too large to display effectively using the standard print method.

      gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

      R> g <- rgraph(5, diag = TRUE)
      R> par(mfrow = c(2, 3))
      R> gplot(g, main = "Default")
      R> gplot(g, usecurv = TRUE, main = "Curved Edges")
      R> gplot(g, mode = "mds", main = "MDS Layout")
      R> gplot(g, mode = "circle", main = "Circular Layout")
      R> plot.sociomatrix(g, main = "Sociomatrix")
      R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
      +   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
      +   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

      Output from the above is shown in Figure 3.

      Figure 3: Sample visualizations using gplot with multiple layout and display options. (Panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options.)

      Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

      R> gplot3d(rgws(1, 5, 3, 1, 0))
      R> gplot3d(rgws(1, 5, 3, 1, 0.05))
      R> gplot3d(rgws(1, 5, 3, 1, 0.2))

      Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

      Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

      As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

      R> par(mfrow = c(1, 3))
      R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
      +   xlab = "", ylab = "", main = "gplot.vertex Example")
      R> gplot.vertex(cos((1:10)/10 * 2 * pi), sin((1:10)/10 * 2 * pi),
      +   col = 1:10, sides = 3:12, radius = 0.1)
      R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
      R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
      R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
      +   xlab = "", ylab = "", main = "gplot.loop Example")
      R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
      +   offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
      +   arrowhead = TRUE)
      R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
      +   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

      The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

      Figure 5: Examples of the use of gplot supplemental functions. (Panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example.)

      2.3. Descriptive indices

      The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

      Node-level indices

      Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or “central” position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized “paring down” of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_d^+(v,G) \equiv |N^+(v)|$), indegree ($c_d^-(v,G) \equiv |N^-(v)|$), and total or “Freeman” degree ($c_d^t(v,G) \equiv c_d^+(v,G) + c_d^-(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

      $$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

      where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $\frac{g'(v',v,v'',G)}{g(v',v'',G)}$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

      $$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

      where $n = |V|$ and $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to at least one other vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
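The Freeman and stress definitions above can be evaluated literally once geodesic distances $d$ and geodesic counts $g$ are known. The base-R sketch below (our own naming, not sna's implementation) obtains both by breadth-first search from every vertex, then sums over ordered pairs, which is the digraph convention; an undirected graph stored symmetrically therefore has each pair counted twice. It is $O(n^3)$, applies no normalization, and treats unreachable vertices as infinitely distant, so closeness is 0 for any vertex that cannot reach every other.

```r
# Breadth-first search from each vertex: geodesic distances and counts
geodesics <- function(y) {
  n <- nrow(y)
  d <- matrix(Inf, n, n)     # d[s, w]: geodesic distance from s to w
  g <- matrix(0, n, n)       # g[s, w]: number of (s, w) geodesics
  for (s in seq_len(n)) {
    d[s, s] <- 0
    g[s, s] <- 1
    frontier <- s
    k <- 0
    while (length(frontier) > 0) {
      k <- k + 1
      nxt <- integer(0)
      for (u in frontier) {
        for (w in which(y[u, ] != 0)) {
          if (is.infinite(d[s, w])) {          # first time w is reached
            d[s, w] <- k
            nxt <- c(nxt, w)
          }
          if (d[s, w] == k) g[s, w] <- g[s, w] + g[s, u]
        }
      }
      frontier <- nxt
    }
  }
  list(d = d, g = g)
}

freeman_sketch <- function(y) {
  geo <- geodesics(y)
  d <- geo$d
  g <- geo$g
  n <- nrow(y)
  betw <- numeric(n)
  stress <- numeric(n)
  for (v in seq_len(n)) {
    for (s in seq_len(n)) {
      for (u in seq_len(n)) {
        if (s == u || v == s || v == u || is.infinite(d[s, u])) next
        if (d[s, v] + d[v, u] == d[s, u]) {    # v lies on an (s, u) geodesic
          through <- g[s, v] * g[v, u]         # (s, u) geodesics through v
          betw[v] <- betw[v] + through / g[s, u]
          stress[v] <- stress[v] + through     # stress: denominator set to 1
        }
      }
    }
  }
  close <- (n - 1) / rowSums(d)                # Inf distances give 0
  list(betweenness = betw, stress = stress, closeness = close)
}

p3 <- matrix(0, 3, 3)                 # directed path 1 -> 2 -> 3
p3[1, 2] <- 1
p3[2, 3] <- 1
freeman_sketch(p3)$betweenness        # 0 1 0

c4 <- matrix(0, 4, 4)
c4[cbind(1:4, c(2:4, 1))] <- 1
c4 <- pmax(c4, t(c4))                 # undirected 4-cycle, stored symmetrically
freeman_sketch(c4)$betweenness        # 1 1 1 1: two geodesics per opposite pair
```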

      Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $\mathbf{A}$ (where $\mathbf{A}$ is the graph adjacency matrix). This can be interpreted variously as a measure of “coreness” (or membership in the largest dense cluster), “recursive” or “reflected” degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha(\mathbf{I} - \beta\mathbf{A})^{-1}\mathbf{A}\mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $\mathbf{A}$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so that the squared length of the centrality vector (i.e., the sum of squared scores) equals $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
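Both spectral measures are short computations in base R. The sketch below (hypothetical helper names, not evcent or bonpow) takes the absolute principal eigenvector for eigenvector centrality and solves $(\mathbf{I} - \beta\mathbf{A})x = \mathbf{A}\mathbf{1}$ for the Bonacich score, choosing $\alpha$ so that the squared scores sum to $|V|$. The small undirected example graph is our own; it is used to check the degree limit ($\beta = 0$) and the eigenvector limit ($\beta$ near $1/\lambda_1$).

```r
# Eigenvector centrality: absolute principal eigenvector, scaled to unit length
evcent_sketch <- function(a) {
  v <- abs(Re(eigen(a)$vectors[, 1]))
  v / sqrt(sum(v^2))
}

# Bonacich power: alpha * (I - beta A)^{-1} A 1, with alpha chosen so that
# the squared scores sum to n
bonpow_sketch <- function(a, beta) {
  n <- nrow(a)
  x <- as.vector(solve(diag(n) - beta * a, a %*% rep(1, n)))
  x * sqrt(n / sum(x^2))
}

# A small connected undirected graph (symmetric adjacency matrix)
a <- matrix(c(0, 1, 1, 0, 0,
              1, 0, 1, 1, 0,
              1, 1, 0, 1, 0,
              0, 1, 1, 0, 1,
              0, 0, 0, 1, 0), 5, 5, byrow = TRUE)
lambda1 <- max(Re(eigen(a)$values))     # principal eigenvalue

bonpow_sketch(a, 0) / rowSums(a)        # constant: proportional to degree
bonpow_sketch(a, 0.9999 / lambda1)      # close to sqrt(5) * evcent_sketch(a)
bonpow_sketch(a, -0.3)                  # negative beta: a different ordering logic
```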

      An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$, that is, the number of pairs for which $v$ serves as a local bridge. Now let us posit a vector of states $s$ on $V$, such that $s_i$ is the state of $v_i \in V$. (“State” in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles), based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
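The role definitions translate into a direct count over ordered triads. The base-R sketch below (hypothetical names) returns raw role counts per vertex, using the same column labels as the brokerage output below; it does not compute the Gould-Fernandez moments or z-scores.

```r
# Gould-Fernandez role of broker j in the open two-path i -> j -> k,
# given the group labels si, sj, sk
classify_role <- function(si, sj, sk) {
  if (si == sj && sj == sk) return("w_I")   # coordinator
  if (si == sk) return("w_O")               # itinerant: i, k share a group, j differs
  if (sj == sk) return("b_IO")              # gatekeeper
  if (si == sj) return("b_OI")              # representative
  "b_O"                                     # liaison: all three groups differ
}

# Raw brokerage counts by role for every vertex (no moments or z-scores)
brokerage_sketch <- function(y, s) {
  n <- nrow(y)
  roles <- c("w_I", "w_O", "b_IO", "b_OI", "b_O")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (i == j || j == k || i == k) next
    # j brokers (i, k) when i -> j and j -> k exist but i -> k does not
    if (y[i, j] == 0 || y[j, k] == 0 || y[i, k] != 0) next
    role <- classify_role(s[i], s[j], s[k])
    out[j, role] <- out[j, role] + 1
  }
  cbind(out, t = rowSums(out))
}

chain <- matrix(0, 3, 3)              # open two-path 1 -> 2 -> 3
chain[1, 2] <- 1
chain[2, 3] <- 1
brokerage_sketch(chain, c(1, 2, 2))   # vertex 2 is a gatekeeper (b_IO)

closed <- chain
closed[1, 3] <- 1                     # closing the triad removes the bridge
brokerage_sketch(closed, c(1, 2, 3))  # all zeros
```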

      To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

      R> dat <- rgraph(10)
      R> degree(dat, cmode = "indegree")
      [1] 4 4 8 2 4 5 4 4 3 6
      R> degree(dat, cmode = "outdegree")
      [1] 6 3 5 2 5 4 4 4 5 6
      R> degree(dat)
      [1] 10  7 13  4  9  9  8  8  8 12
      R> closeness(dat)
      [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
      [8] 0.6428571 0.6923077 0.7500000
      R> betweenness(dat)
      [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
      [7]  2.4500000  2.0333333  2.4166667  8.1833333
      R> stresscent(dat)
      [1] 21  6 27  1 14 15  6  7  7 21
      R> graphcent(dat)
      [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
      [8] 0.5000000 0.5000000 0.5000000
      R> evcent(dat)
      [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
      [8] 0.2734192 0.3642163 0.4121985
      R> infocent(dat)
      [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
      [9] 3.077481 3.704181

      As the above examples illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

      R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
      [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
      [8] 0.2192645 0.2192645 0.2192645
      R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
      +   evcent(dat, rescale = TRUE)) < 1e-10)
      [1] TRUE
      R> bonpow(dat, exponent = -0.5)
      [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
      [7]  0.4613310  0.9226621  0.3075540  2.1528782

      As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

      R> memb <- sample(1:3, 10, replace = TRUE)
      R> summary(brokerage(dat, memb))
      Gould-Fernandez Brokerage Analysis

      Global Brokerage Properties
                 t    E(t)   Sd(t)       z Pr(>|z|)
      w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
      w_O  25.0000 19.5459  7.0713  0.7713   0.4405
      b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
      b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
      b_O  28.0000 23.4551  5.3349  0.8519   0.3943
      t    93.0000 87.9565 13.6124  0.3705   0.7110

      Individual Properties (by Group)

      Group ID: 1
           w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI       b_O          t
      [1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904 -1.186381  0.8682544
      [2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168 -1.186381 -1.6099084
      [3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953 -1.186381 -0.3708270
      [4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476 -1.186381 -0.7838541

      Group ID: 2
           w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O          t
      [1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719 -0.7838541
      [2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111  1.4877951

      Group ID: 3
           w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI       b_O           t
      [1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739 3.0624213  2.31384939
      [2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476 0.6345344  0.45522729
      [3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739 0.6345344  0.04220016
      [4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953 0.6345344 -0.57734055

      Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles (as with the coordinator scores of the two-member group 2 above), and, likewise, certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.
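Each z-score in the global table is the observed count standardized by the model's expectation and standard deviation, with a two-sided normal p-value. The short base-R check below simply re-enters the printed values by hand (the vector names are our own, not sna objects) and reproduces the z and Pr(>|z|) columns up to rounding.

```r
# Observed counts, expectations, and SDs copied by hand from the
# "Global Brokerage Properties" table above.
obs <- c(w_I = 5, w_O = 25, b_IO = 18, b_OI = 17, b_O = 28, t = 93)
ex  <- c(5.8638, 19.5459, 19.5459, 19.5459, 23.4551, 87.9565)
sdv <- c(2.7314, 7.0713, 6.2244, 6.2244, 5.3349, 13.6124)

z <- (obs - ex) / sdv        # standardized score for each role
p <- 2 * pnorm(-abs(z))      # two-sided normal p-value
round(cbind(z, p), 4)

# The combined score t is the sum of the five role counts.
sum(obs[1:5])                # 93
```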

      Graph-level indices

      Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j), (j,k) \in E \Leftrightarrow (i,k) \in E$ for $i, j, k \in V$) and weak ($(i,j), (j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from i to k, i should also be tied to k directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

      Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph G with respect to centrality measure c is given by

      $$C(G) = \sum_{i=1}^{|V|} \left[ \max_{v \in V} c(v, G) - c(v_i, G) \right] \qquad (1)$$

      i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

      $$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \qquad (2)$$

      where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing C(G) by its maximum across all graphs of the same order as G. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores[2]) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)[3], although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which uses them to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

      [2] For instance, when all vertices are automorphically equivalent.

      [3] $K_n$ is the complete graph on n vertices, with $K_{n,m}$ denoting the complete bipartite graph on n and m vertices, and $N_n$ the null or empty graph on n vertices.

      In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied, respectively, by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree; hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

      The use of sna's GLI routines is straightforward: calling them with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices; hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

      R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
      R> gden(g)
      [1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
      R> grecip(g)
      [1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
      R> grecip(g, measure = "edgewise")
      [1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
      R> grecip(g) == 1 - hierarchy(g)
      [1] TRUE TRUE TRUE TRUE TRUE
      R> gtrans(g)
      [1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
      R> gtrans(g, measure = "weakcensus")
      [1]   0  21 106 254 582
      R> connectedness(g)
      [1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
      R> efficiency(g)
      [1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
      R> hierarchy(g, measure = "krackhardt")
      [1] 1.0 0.2 0.0 0.0 0.0
      R> lubness(g)
      [1] 0.2 1.0 1.0 1.0 1.0
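The definitions behind gden, grecip, and gtrans can be written out directly. The base-R sketch below is illustrative only: the function names are hypothetical (not part of sna), it assumes a single loopless digraph given as a 0/1 adjacency matrix, and it ignores missing data and the other options those routines support.

```r
# Density: fraction of the n(n - 1) ordered pairs (i != j) that are edges.
density_sketch <- function(A) {
  n <- nrow(A)
  sum(A[row(A) != col(A)]) / (n * (n - 1))
}

# Dyadic reciprocity: fraction of unordered pairs {i, j} that are
# symmetric (mutual or null).
recip_dyadic_sketch <- function(A) {
  up <- upper.tri(A)
  mean(A[up] == t(A)[up])
}

# Edgewise reciprocity: fraction of edges whose reverse edge is also present.
recip_edgewise_sketch <- function(A) {
  off <- row(A) != col(A)
  sum(A[off] & t(A)[off]) / sum(A[off])
}

# Weak transitivity: among ordered triples of distinct vertices with
# i -> j and j -> k (the triples "at risk"), the fraction that also have i -> k.
trans_weak_sketch <- function(A) {
  n <- nrow(A)
  at_risk <- 0
  transitive <- 0
  for (i in 1:n) for (j in 1:n) for (k in 1:n) {
    if (i == j || j == k || i == k) next
    if (A[i, j] == 1 && A[j, k] == 1) {
      at_risk <- at_risk + 1
      transitive <- transitive + A[i, k]
    }
  }
  transitive / at_risk
}

# Example: a mutual dyad 1 <-> 2 plus the chain 2 -> 3 -> 4.
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- 1
A[2, 3] <- 1
A[3, 4] <- 1
density_sketch(A)         # 4 / 12 = 0.3333333
recip_dyadic_sketch(A)    # 4 of 6 dyads symmetric = 0.6666667
recip_edgewise_sketch(A)  # 2 of 4 edges reciprocated = 0.5
trans_weak_sketch(A)      # 2 two-paths at risk, none closed = 0
```

Note how the dyadic measure counts null dyads as symmetric, which is why it can exceed the edgewise measure in sparse graphs.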

      The usage of centralization differs somewhat from that of the GLI routines above, as it acts as a wrapper for centrality routines (which must be specified along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

      R> centralization(g, degree, cmode = "outdegree")
      [1] 0.1728395
      R> centralization(g, betweenness)
      [1] 0
      R> apply(g, 1, centralization, degree, cmode = "outdegree")
      [1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
      R> apply(g, 1, centralization, betweenness)
      [1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

      As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

      R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
      +   n <- NROW(dat)
      +   if(tmaxdev)
      +     return((n-1) * choose(n-1, 2))
      +   odeg <- degree(dat, cmode = "outdegree")
      +   choose(odeg, 2)
      + }
      R> apply(g, 1, centralization, o2scent)
      [1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

      Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
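Equations (1) and (2) can be checked numerically without sna. The sketch below uses hypothetical helper names and takes outdegree on a loopless digraph purely as an example index; it computes the raw Freeman centralization from a score vector and optionally divides by a supplied theoretical maximum, playing the role of the tmaxdev = TRUE call described above. For outdegree on digraphs of order n, that maximum is (n-1)^2, attained by an out-star.

```r
# Outdegree of each vertex in a loopless digraph (A[i, j] = 1 for i -> j).
outdegree_sketch <- function(A) rowSums(A) - diag(A)

# Freeman centralization: total deviation from the maximum score, eq. (1).
# If tmaxdev is supplied, the raw value is divided by it (normalized form).
freeman_centralization <- function(scores, tmaxdev = NULL) {
  raw <- sum(max(scores) - scores)
  if (is.null(tmaxdev)) raw else raw / tmaxdev
}

n <- 5
star <- matrix(0, n, n)
star[1, -1] <- 1                  # out-star: vertex 1 sends to every other vertex
s <- outdegree_sketch(star)       # c(4, 0, 0, 0, 0)

raw_c   <- freeman_centralization(s)                        # (n - 1)^2 = 16
norm_c  <- freeman_centralization(s, tmaxdev = (n - 1)^2)   # 1 for the out-star
eq2     <- n * (max(s) - mean(s))                           # eq. (2): equals raw_c
empty_c <- freeman_centralization(outdegree_sketch(matrix(0, n, n)))  # 0
```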

      2.4 Connectivity and subgraph statistics

      Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order N, and let $d(i, j)$ be the geodesic distance from vertex i to vertex j in G. The "structure statistics" of G are then given by the series $s_0, \ldots, s_{N-1}$, where

      $$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\left(d(j, k) \le i\right)$$

      and I is the standard indicator function. Intuitively, $s_i$ is the expected fraction of G which lies within distance i of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
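A minimal base-R sketch of the structure statistics, assuming an unweighted digraph stored as a 0/1 adjacency matrix: geodesic distances come from a breadth-first search out of each vertex (unreachable pairs get Inf), and $s_i$ is the share of all $N^2$ ordered pairs, including $j = k$, at distance at most i. The helper names are hypothetical; within sna, geodist and structure.statistics are the supported routines.

```r
# Geodesic distances by breadth-first search from each source vertex.
geodist_sketch <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  for (s in 1:n) {
    D[s, s] <- 0
    frontier <- s
    d <- 0
    while (length(frontier) > 0) {
      d <- d + 1
      # out-neighbours of the current frontier
      nbrs <- which(colSums(A[frontier, , drop = FALSE]) > 0)
      new <- nbrs[is.infinite(D[s, nbrs])]   # not yet reached from s
      D[s, new] <- d
      frontier <- new
    }
  }
  D
}

# Structure statistics s_0, ..., s_{N-1}.
structure_stats_sketch <- function(A) {
  N <- nrow(A)
  D <- geodist_sketch(A)
  s <- sapply(0:(N - 1), function(i) sum(D <= i) / N^2)
  names(s) <- 0:(N - 1)
  s
}

A <- matrix(0, 3, 3)
A[1, 2] <- 1
A[2, 3] <- 1                     # directed path 1 -> 2 -> 3
structure_stats_sketch(A)        # 3/9, 5/9, 6/9 for i = 0, 1, 2
```

Since every vertex is at distance 0 from itself, $s_0 = 1/N$ always, which matches the value 0.10 for the 10-vertex graphs in the example below.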

      At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of G, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

      Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic t for graph G is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where H is a random graph drawn uniformly given conditioning statistics $s(H) = s(G)$, $s'(H) = s'(G)$. Conditioning on the order of G is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbb{E}\,s(H) = s(G)$, $\mathbb{E}\,s'(H) = s'(G)$ for some s, s'. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics s, s'; the homogeneous Bernoulli graph with parameter p equal to the density of G is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
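The logic of a CUG test can be sketched in a few lines of base R. This is an illustration under stated assumptions, not cugtest itself: it conditions on order and edge count by sampling loopless digraphs uniformly among those with the observed number of edges, and it uses the number of mutual dyads from a hand-rolled dyad census as the test statistic. All function names are hypothetical.

```r
# Dyad census of a loopless digraph: mutual, asymmetric, and null dyads.
dyad_census_sketch <- function(A) {
  up <- upper.tri(A)
  a <- A[up] == 1          # i -> j for each unordered pair with i < j
  b <- t(A)[up] == 1       # j -> i for the same pair
  c(Mut = sum(a & b), Asym = sum(xor(a, b)), Null = sum(!a & !b))
}

# Uniform draw from loopless digraphs on n vertices with exactly m edges.
rgraph_fixed_edges <- function(n, m) {
  A <- matrix(0, n, n)
  off <- which(row(A) != col(A))
  A[off[sample.int(length(off), m)]] <- 1
  A
}

# One-tailed CUG p-values for `stat`, conditioning on order and edge count.
cug_edges_sketch <- function(A, stat, reps = 1000) {
  obs <- stat(A)
  n <- nrow(A)
  m <- sum(A[row(A) != col(A)])
  sims <- replicate(reps, stat(rgraph_fixed_edges(n, m)))
  c(upper = mean(sims >= obs), lower = mean(sims <= obs))
}

set.seed(1)
A <- matrix(0, 6, 6)
A[cbind(1:5, 2:6)] <- 1          # path 1 -> 2 -> ... -> 6
A[cbind(2:6, 1:5)] <- 1          # with every tie reciprocated
dyad_census_sketch(A)            # Mut = 5, Asym = 0, Null = 10
cug_edges_sketch(A, function(X) dyad_census_sketch(X)[["Mut"]], reps = 500)
```

With 10 edges, at most 5 dyads can be mutual, so the observed graph sits at the top of the null distribution: the lower-tail p-value is 1 and the upper-tail value is close to 0.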

      Example

      To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

      R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
      R> apply(dyad.census(g1), 2, mean)
        Mut  Asym  Null
       1.00 12.84 31.16
      R> apply(triad.census(g1), 2, mean)
        003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
      40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
       120C   210   300
       0.30  0.00  0.00
      R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
      R> apply(dyad.census(g2), 2, mean)
        Mut  Asym  Null
       8.84  9.26 26.90
      R> apply(triad.census(g2), 2, mean)
        003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
      25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
       120C   210   300
       1.34  2.28  0.60
      R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
      R> apply(dyad.census(g3), 2, mean)
        Mut  Asym  Null
       8.94 20.44 15.62
      R> apply(triad.census(g3), 2, mean)
        003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
       4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
       120C   210   300
       8.40  7.38  1.50
      R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
      +   dyadic.tabulation = "bylength")$path.count
          Agg   v1   v2   v3   v4   v5   v6   v7   v8   v9  v10
      1    35    8    3    9    2   10    9    3   10    8    8
      2   119   40   10   47    8   59   47   13   56   39   38
      3   346  155   41  180   35  223  185   52  211  149  153
      4   791  457  130  504  114  601  527  163  572  425  462
      5  1351  964  303 1000  282 1143 1061  375 1104  884  990
      R> kcycle.census(g3[1,,], maxlen = 5,
      +   cycle.comembership = "bylength")$cycle.count
          Agg   v1   v2   v3   v4   v5   v6   v7   v8   v9  v10
      2     9    2    1    2    0    3    2    0    4    3    1
      3    24    7    1   11    0   15    9    2   12    8    7
      4    42   16    1   23    2   32   26    3   30   19   16
      5    72   39    5   48    8   60   54   10   57   36   43
      R> component.dist(g3[1,,])
      $membership
       [1] 1 1 1 1 1 1 1 1 1 1

      $csize
      [1] 10

      $cdist
       [1] 0 0 0 0 0 0 0 0 0 1

      R> structure.statistics(g3[1,,])
         0    1    2    3    4    5    6    7    8    9
      0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

      In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

      R> g4 <- g1[1:2,,]
      R> g4[2,,] <- g2[1,,]
      R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
      +   g1 = 1, g2 = 2)
      R> summary(cug)

      CUG Test Results

      Estimated p-values:
              p(f(rnd) >= f(d)): 0.299
              p(f(rnd) <= f(d)): 0.708

      Test Diagnostics:
              Test Value (f(d)): 0.04444444
              Replications: 1000
              Distribution Summary:
                      Min:   -0.3333333
                      1stQ:  -0.06666667
                      Med:   0
                      Mean:  -0.001288889
                      3rdQ:  0.06666667
                      Max:   0.3555556

      R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
      R> summary(cug)

      CUG Test Results

      Estimated p-values:
              p(f(rnd) >= f(d)): 0.967
              p(f(rnd) <= f(d)): 0.039

      Test Diagnostics:
              Test Value (f(d)): 0.04444444
              Replications: 1000
              Distribution Summary:
                      Min:   -0.06666667
                      1stQ:  0.1555556
                      Med:   0.2222222
                      Mean:  0.2215333
                      3rdQ:  0.2888889
                      Max:   0.5333333

      A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

      2.5 Position and role analysis

      The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is, without question, structural equivalence. For a simple graph G, vertex v is said to be structurally equivalent to vertex v' iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when v and v' have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph-theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

      In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.

      This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
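The distance-then-cluster workflow that equiv.clust wraps can be sketched in base R. This is an illustration under assumptions, not sna's code: it uses the Hamming distance between the out-rows and in-columns of two vertices, skipping the entries that involve the pair itself (in the spirit of the $N(v) \setminus \{v'\}$ definition above), and then passes the result to hclust. The function name is hypothetical.

```r
# Hamming structural-equivalence distance between every pair of vertices:
# number of differing outgoing and incoming ties, ignoring ties within the pair.
se_hamming_sketch <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    others <- setdiff(1:n, c(i, j))
    D[i, j] <- sum(A[i, others] != A[j, others]) +   # outgoing ties
               sum(A[others, i] != A[others, j])     # incoming ties
  }
  D
}

# Vertices 1 and 2 are mutually tied; 3 and 4 are pendants receiving ties from 1.
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- 1
A[1, 3] <- A[1, 4] <- 1

D  <- se_hamming_sketch(A)        # D[3, 4] is 0: 3 and 4 are structurally equivalent
hc <- hclust(as.dist(D))          # complete linkage by default
cl <- cutree(hc, k = 3)           # 3 and 4 share a class; 1 and 2 do not
```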

      After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class, and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
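For the density block type, the reduction step amounts to averaging adjacency-matrix cells within each pair of classes. A minimal base-R sketch, assuming classes are coded 1, ..., k in a membership vector and that the diagonal (self-ties) is excluded from within-class blocks; the function name is hypothetical, and blockmodel itself offers many more options.

```r
block_density_sketch <- function(A, memb) {
  k <- max(memb)
  B <- matrix(NA_real_, k, k)
  for (r in 1:k) for (s in 1:k) {
    sub <- A[memb == r, memb == s, drop = FALSE]
    if (r == s) {
      m <- nrow(sub)
      # within-class block: ignore the diagonal; undefined for singletons
      B[r, s] <- if (m > 1) (sum(sub) - sum(diag(sub))) / (m * (m - 1)) else NA
    } else {
      B[r, s] <- mean(sub)
    }
  }
  B
}

A <- matrix(0, 4, 4)
A[1:2, 3:4] <- 1                      # class 1 sends every possible tie to class 2
memb <- c(1, 1, 2, 2)
B <- block_density_sketch(A, memb)    # image matrix: only B[1, 2] equals 1
```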

      The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
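Density-based expansion can likewise be sketched: each ordered pair of vertices in the expanded graph receives an edge independently, with the density of the block its classes fall in, so the expanded graph is a draw from an inhomogeneous Bernoulli model. This base-R version is illustrative only (hypothetical name; it treats NA block entries as zero and forbids loops) and is not blockmodel.expand.

```r
expand_density_sketch <- function(B, sizes) {
  memb <- rep(seq_along(sizes), times = sizes)   # class of each new vertex
  P <- B[memb, memb, drop = FALSE]               # edge probability per ordered pair
  P[is.na(P)] <- 0
  A <- matrix(rbinom(length(P), 1, P), nrow(P))  # independent Bernoulli draws
  diag(A) <- 0                                   # no self-ties
  A
}

set.seed(2)
B <- matrix(c(0, 0, 1, 0), 2)            # class 1 -> class 2 with probability 1
A <- expand_density_sketch(B, c(3, 2))   # 5 vertices: three in class 1, two in class 2
```

Because the class sizes are an argument, the expanded graph need not have the same order as the graph the image was estimated from.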

      Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition G'' of graphs G and G' on vertex set V is the graph on V such that $(v, v') \in E(G'')$ iff there exists a vertex v'' such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
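Because composition is a boolean matrix product, it can be sketched in one line of base R (the helper name is hypothetical; sna's own operator is %c%). The small example also shows a loop appearing even though neither input graph has one.

```r
# Boolean composition: (v, v') is an edge iff some v'' has v -> v'' in G
# and v'' -> v' in H.
compose_sketch <- function(G, H) (G %*% H > 0) * 1

G <- matrix(0, 2, 2); G[1, 2] <- 1   # 1 -> 2
H <- matrix(0, 2, 2); H[2, 1] <- 1   # 2 -> 1
C <- compose_sketch(G, H)            # only C[1, 1], a loop at vertex 1, equals 1
```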

      Example

      To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

      R> gp <- sapply(runif(20, 0, 1), rep, 20)
      R> g <- rgraph(20, tprob = gp)
      R> eq <- equiv.clust(g)
      R> b <- blockmodel(g, eq, h = 15)
      R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
      R> ge
            [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
       [1,]    0    0    1    1    0    0    1    0    0     1     1     1
       [2,]    0    0    1    1    0    0    1    1    0     1     1     1
       [3,]    0    0    0    0    1    1    1    1    0     0     0     0
       [4,]    0    0    1    0    1    1    1    1    0     0     0     0
       [5,]    0    0    0    0    0    0    0    0    1     1     0     0
       [6,]    0    1    1    0    0    0    1    0    1     1     0     0
       [7,]    0    0    1    1    0    1    0    1    1     1     0     1
       [8,]    0    0    1    1    0    0    1    0    0     1     0     1
       [9,]    0    0    0    1    1    1    0    1    0     0     0     0
      [10,]    0    0    1    1    0    1    1    1    1     0     1     1
      [11,]    0    0    0    0    0    0    1    1    0     0     0     1
      [12,]    0    1    1    1    0    0    0    1    0     0     1     0

      2.6 Exploratory edge set comparison

      One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

      $$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V| \left(|V| - 1\right)} \qquad (3)$$

      where $A^G, A^H$ are the respective adjacency matrices of G and H, $\mu_X = \left(|V|(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean, and the sums run over ordered pairs with $i \ne j$. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
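Equation (3) and the associated correlation and Hamming distance translate directly into base R once the off-diagonal cells are extracted. This sketch uses hypothetical helper names and assumes loopless digraphs with no missing data. As a sanity check, the correlation it produces equals R's cor applied to the two vectors of off-diagonal edge variables, since the $1/(|V|(|V|-1))$ factors cancel.

```r
offdiag <- function(A) A[row(A) != col(A)]        # the |V|(|V|-1) edge variables

gcov_sketch <- function(G, H) {
  g <- offdiag(G); h <- offdiag(H)
  sum((g - mean(g)) * (h - mean(h))) / length(g)  # eq. (3)
}
gcor_sketch <- function(G, H) {
  gcov_sketch(G, H) / sqrt(gcov_sketch(G, G) * gcov_sketch(H, H))
}
hamming_sketch <- function(G, H) sum(offdiag(G) != offdiag(H))

G <- matrix(0, 3, 3); G[1, 2] <- 1; G[2, 3] <- 1   # 1 -> 2, 2 -> 3
H <- matrix(0, 3, 3); H[1, 2] <- 1; H[3, 1] <- 1   # 1 -> 2, 3 -> 1
hamming_sketch(G, H)                               # 2 edge changes
gcor_sketch(G, H)                                  # 0.25
cor(offdiag(G), offdiag(H))                        # same value
```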

      The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching of arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
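For very small graphs, the structural correlation can be found by brute force: relabel the second graph under every vertex permutation and keep the largest graph correlation. The base-R sketch below does exactly that (hypothetical helpers; it enumerates n! relabelings, so it is only usable for a handful of vertices) and corresponds in spirit to exhaustive search rather than the default annealing heuristic.

```r
# All permutations of a vector (recursive; n! results, so keep n small).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  out
}

# Maximum graph correlation between G and any relabeling of H.
# Assumes neither graph is empty or complete (otherwise cor() is undefined).
gscor_exhaustive_sketch <- function(G, H) {
  off <- row(G) != col(G)
  best <- -Inf
  for (p in perms(seq_len(nrow(H)))) {
    Hp <- H[p, p]                        # relabel the vertices of H
    best <- max(best, cor(G[off], Hp[off]))
  }
  best
}

G <- matrix(0, 4, 4); G[1, 2] <- 1; G[2, 3] <- 1; G[3, 4] <- 1   # directed path
H <- G[c(3, 1, 4, 2), c(3, 1, 4, 2)]                             # isomorphic copy
gscor_exhaustive_sketch(G, H)   # 1, up to floating point, since G and H are isomorphic
```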

      Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

      In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, for cases where conditioning on the entire observed structure is inappropriate.
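The QAP null hypothesis can be sketched for the simplest case, a graph correlation: hold one graph fixed, relabel the other's vertices at random (permuting rows and columns together), and see where the observed correlation falls among the permuted values. The helper below is a hypothetical base-R illustration, not qaptest, and assumes loopless digraphs that are neither empty nor complete.

```r
qap_cor_sketch <- function(G, H, reps = 1000) {
  off <- row(G) != col(G)
  obs <- cor(G[off], H[off])
  n <- nrow(H)
  sims <- replicate(reps, {
    p <- sample.int(n)               # random relabeling of H's vertices
    Hp <- H[p, p]
    cor(G[off], Hp[off])
  })
  c(obs = obs, p_upper = mean(sims >= obs), p_lower = mean(sims <= obs))
}

set.seed(3)
G <- matrix(0, 6, 6)
G[cbind(1:6, c(2:6, 1))] <- 1        # directed 6-cycle
qap_cor_sketch(G, G, reps = 200)     # obs is 1 (up to floating point); p_lower is 1
```

Only relabelings that happen to be automorphisms of the cycle reproduce the observed value, so the upper-tail p-value is small.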


      Example

      We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

      R> g1 <- rgraph(5)
      R> g2 <- rgraph(5)
      R> g3 <- rmperm(g2)
      R> gcor(g1, g2)
      [1] -0.1336306
      R> gcor(g1, g3)
      [1] 0.08908708
      R> gcor(g2, g3)
      [1] -0.4583333
      R> gscor(g1, g2, reps = 1e5)
      [1] 0.5345225
      R> gscor(g1, g3, reps = 1e5)
      [1] 0.5345225
      R> gscor(g2, g3, reps = 1e5)
      [1] 1

      Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

      R> x <- rgraph(20, 4)
      R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
      R> nl <- netlm(y, x)
      R> summary(nl)

      OLS Network Model

      Residuals:
                 0%           25%           50%           75%          100%
      -2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

      Coefficients:
                       Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
      (intercept) -1.467115e-14   0.000   1.000     0.000
      x1           1.000000e+00   1.000   0.000     0.000
      x2           4.000000e+00   1.000   0.000     0.000
      x3           2.000000e+00   1.000   0.000     0.000
      x4          -7.553990e-16   0.369   0.631     0.756

      Residual standard error: 1.169e-14 on 375 degrees of freedom
      Multiple R-squared: 1   Adjusted R-squared: 1
      F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

      Test Diagnostics:

              Null Hypothesis: qap
              Replications: 1000
              Coefficient Distribution Summary:

             (intercept)          x1          x2          x3          x4
      Min     -2.6048970  -2.9689678  -3.5940257  -2.9888472  -1.5687343
      1stQ    -0.6779707  -0.6739579  -0.6980733  -0.7469624  -0.9732831
      Median  -0.0841683  -0.0090468   0.0003289  -0.0116757  -0.4346029
      Mean    -0.0256936  -0.0249585  -0.0161372  -0.0055288  -0.0080178
      3rdQ     0.6930508   0.6393521   0.6352920   0.7064120   0.8601390
      Max      2.5434373   2.7231537   3.0464596   3.6938260   1.6294713

      As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

      R> x <- rgraph(20, 4)
      R> y.l <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
      R> y.p <- apply(y.l, c(1, 2), function(a) 1 / (1 + exp(-a)))
      R> y <- rgraph(20, tprob = y.p)
      R> nl <- netlogit(y, x)
      R> summary(nl)

      Network Logit Model

      Coefficients:
                  Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
      (intercept)  0.3077180  1.3603173 0.680   0.320   0.503
      x1           0.9411361  2.5628914 0.985   0.015   0.019
      x2           4.1473292 63.2648084 1.000   0.000   0.000
      x3           1.8630911  6.4436238 1.000   0.000   0.000
      x4          -0.1757242  0.8388493 0.318   0.682   0.642

      Goodness of Fit Statistics:

      Null deviance: 526.7919 on 380 degrees of freedom
      Residual deviance: 174.1572 on 375 degrees of freedom
      Chi-Squared test of fit improvement:
        352.6347 on 5 degrees of freedom, p-value 0
      AIC: 184.1572    BIC: 203.8580
      Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

      Contingency Table (predicted (rows) x actual (cols)):

             0   1
        0    0   0
        1   39 341

        Total Fraction Correct: 0.8973684
        Fraction Predicted 1s Correct: 0.8973684
        Fraction Predicted 0s Correct: NaN
        False Negative Rate: 0
        False Positive Rate: 1

      Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Distribution Summary:

             (intercept)        x1        x2        x3        x4
      Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
      1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
      Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
      Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
      3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
      Max       1.704128  1.408468  1.214650  1.100783  1.533500

      It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density of the dependent graph (approximately 0.90) and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values (the data were generated with coefficients of 1, 4, and 2 on x1 to x3 and 0 on the intercept and x4) and that the QAP test correctly identifies the irrelevant predictors.
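The goodness-of-fit block of the netlogit summary can be reproduced from a handful of printed numbers. The sketch below takes the deviances and the contingency table from the output above and recomputes each statistic; the formulas (AIC = Dr + 2k, BIC = Dr + k log N with N the number of off-diagonal cells, and the two pseudo-R^2 ratios as labeled) are inferred from the fact that they match the printed values, so treat them as an assumption about netlogit's internals rather than its documented definition.

```r
Dn    <- 526.7919  # null deviance
Dr    <- 174.1572  # residual deviance
n_obs <- 380       # off-diagonal cells of a 20-vertex digraph (20 * 19)
dfn   <- n_obs     # null degrees of freedom, as printed
k     <- 5         # intercept + 4 predictors

chisq   <- Dn - Dr                       # fit improvement, on k df
aic     <- Dr + 2 * k
bic     <- Dr + k * log(n_obs)
pseudo1 <- (Dn - Dr) / (Dn - Dr + dfn)
pseudo2 <- (Dn - Dr) / Dn

# Contingency table: rows = predicted, columns = actual
tab <- matrix(c(0, 39, 0, 341), 2, 2,
              dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))
frac_correct   <- sum(diag(tab)) / sum(tab)
pred1_correct  <- tab["1", "1"] / sum(tab["1", ])
pred0_correct  <- tab["0", "0"] / sum(tab["0", ])  # 0 / 0 gives NaN
false_neg_rate <- tab["0", "1"] / sum(tab[, "1"])  # missed ties among actual ties
false_pos_rate <- tab["1", "0"] / sum(tab[, "0"])  # predicted ties among actual non-ties

round(c(chisq = chisq, AIC = aic, BIC = bic,
        pseudo1 = pseudo1, pseudo2 = pseudo2), 6)
```

The NaN for "Fraction Predicted 0s Correct" simply reflects that the model never predicts a 0, which is the rare-event problem described in the text.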

      2.7 Network inference and process models

      A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which it reports, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the bias parameters may be omitted if desired), and estimation is by maximum likelihood.

      A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of the false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
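The potential scale reduction statistic mentioned above compares between-chain and within-chain variability of a quantity across parallel MCMC runs. A minimal base-R sketch of the textbook sqrt(R-hat) for one scalar quantity follows; the function name psr is hypothetical, and sna's potscalered.mcmc may differ in details such as degrees-of-freedom corrections.

```r
# Potential scale reduction, sqrt(R-hat), for one scalar quantity.
# draws: matrix with one column per chain and one row per retained draw.
psr <- function(draws) {
  n <- nrow(draws)                     # draws per chain
  chain_means <- colMeans(draws)
  B <- n * var(chain_means)            # between-chain variance
  W <- mean(apply(draws, 2, var))      # average within-chain variance
  var_hat <- (n - 1) / n * W + B / n   # pooled estimate of posterior variance
  sqrt(var_hat / W)
}

set.seed(42)
mixed <- matrix(rnorm(1000 * 5), 1000, 5)           # 5 chains, same target
stuck <- sweep(mixed, 2, c(0, 2, 4, 6, 8), "+")     # chains centred in different places
psr(mixed)   # close to 1
psr(stuck)   # well above 1: the chains have not mixed
```

Values near 1 (as in the bbnam summary later in this section) suggest that the chains agree; values well above 1 suggest running the sampler longer.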

      Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

      $$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

      where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation of the model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
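The conditional nomination probability above is a product of "no event" probabilities, one for the baseline event and one per bias-event opportunity. A direct base-R transcription follows; the function name and the example probabilities and counts are hypothetical, chosen only to make the arithmetic easy to check.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
# p_base: baseline probability Pr(B_e)
# p_bias: vector of bias event probabilities Pr(B_k)
# s:      number of opportunities s_k for each bias event on this dyad
biased_net_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  # The tie fails to form only if the baseline event and every bias
  # event opportunity all fail (independent Bernoulli trials given T)
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Hypothetical values: baseline 0.1; two bias event types with
# probabilities 0.3 and 0.2 that had 1 and 2 opportunities, respectively
biased_net_prob(0.1, c(0.3, 0.2), c(1, 2))   # 1 - 0.9 * 0.7 * 0.64 = 0.5968
biased_net_prob(0.1, c(0.3, 0.2), c(0, 0))   # no bias opportunities: 0.1
```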

      While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

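      $$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

      $$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$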

      where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the AR term's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
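For orientation, the two autocorrelation indices named above can be written in a few lines of base R. The sketch uses the standard textbook definitions of Moran's I and Geary's C for a response vector y and a weight matrix W; the function names are hypothetical, and sna's nacf may differ in details (e.g., neighborhood orders, treatment of the diagonal, or normalization).

```r
# Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = y - mean(y)
morans_i <- function(y, W) {
  z <- y - mean(y)
  n <- length(y)
  (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Geary's C: (n - 1) * sum_ij w_ij (y_i - y_j)^2 / (2 * S0 * sum_i z_i^2)
gearys_c <- function(y, W) {
  z <- y - mean(y)
  n <- length(y)
  (n - 1) * sum(W * outer(y, y, "-")^2) / (2 * sum(W) * sum(z^2))
}

# Undirected 4-cycle 1-2-3-4-1 (symmetric W)
W <- matrix(0, 4, 4)
W[cbind(c(1, 2, 3, 4), c(2, 3, 4, 1))] <- 1
W <- W + t(W)
y_alt <- c(1, -1, 1, -1)         # every neighbor pair disagrees
morans_i(y_alt, W)               # -1: perfect negative autocorrelation
gearys_c(y_alt, W)               # 1.5: above 1, also signalling dissimilarity
morans_i(c(1, 1, -1, -1), W)     # 0: agreements and disagreements cancel
```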


      Example

      To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038 and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

      R> g <- rgraph(20)
      R> ep <- rbeta(20, 1, 25)
      R> em <- rbeta(20, 15, 25)
      R> dat <- array(dim = c(20, 20, 20))
      R> for(i in 1:20)
      +   dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
      R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
      R> pem <- matrix(nrow = 20, ncol = 2)
      R> pem[,1] <- 2
      R> pem[,2] <- 11
      R> pep <- matrix(nrow = 20, ncol = 2)
      R> pep[,1] <- 2
      R> pep[,2] <- 11
      R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
      +   epprior = pep, burntime = 300, draws = 100)
      R> summary(b)

      Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

      Multiple Error Probability Model

      Marginal Posterior Network Distribution

      a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

      Journal of Statistical Software 41

      a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

      a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

      Marginal Posterior Global Error Distribution:

                   e^-        e^+
      Min     0.1443951  0.0004238
      1stQ    0.3126975  0.0167584
      Median  0.3678306  0.0294646
      Mean    0.3783663  0.0493688
      3rdQ    0.4423027  0.0574099
      Max     0.6909116  0.2262239

      Marginal Posterior Error Distribution (by observer):

      Probability of False Negatives (e^-):

              Min   1stQ Median   Mean   3rdQ    Max
      o1   0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
      o2   0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
      o3   0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
      o4   0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
      o5   0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
      o6   0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
      o7   0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
      o8   0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
      o9   0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
      o10  0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
      o11  0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
      o12  0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
      o13  0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
      o14  0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
      o15  0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
      o16  0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
      o17  0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
      o18  0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
      o19  0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
      o20  0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

      Probability of False Positives (e^+):

                 Min      1stQ    Median      Mean      3rdQ       Max
      o1   0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
      o2   0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
      o3   0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
      o4   0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
      o5   0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
      o6   0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
      o7   0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
      o8   0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
      o9   0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
      o10  0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
      o11  0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
      o12  0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
      o13  0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
      o14  0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
      o15  0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
      o16  0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
      o17  0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
      o18  0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
      o19  0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
      o20  0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

      MCMC Diagnostics:

        Replicate Chains: 5
        Burn Time: 300
        Draws per Chain: 20   Total Draws: 100
        Potential Scale Reduction (G&R's sqrt(Rhat)):
          Max: 1.003116
          Med: 0.9992194
          IQR: 0.0004545115

      R> cor(em, apply(b$em, 2, median))
      [1] 0.9187894
      R> cor(ep, apply(b$ep, 2, median))
      [1] 0.971649
      R> mean(apply(b$net, c(2, 3), median) == g)
      [1] 1

      Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (the posterior medians correlate at about 0.92 and 0.97 with the true false negative and false positive rates, respectively) and the network itself, which is actually somewhat easier to estimate in many cases (here the elementwise posterior median graph matches g exactly). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
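Why the region em + ep > 1 is “perverse” is easiest to see for a single edge. The sketch below applies Bayes' rule to one edge given independent reports, holding each informant's error rates fixed (as in a single Gibbs update for the graph, or the case of known error rates); it is an illustration under those assumptions, not bbnam's sampler, and edge_posterior is a hypothetical name.

```r
# Posterior probability that an edge exists, given 0/1 reports from
# informants with known false positive (ep) and false negative (em) rates.
edge_posterior <- function(reports, ep, em, prior = 0.5) {
  # Log-likelihood of the reports if the edge is present / absent
  ll1 <- sum(ifelse(reports == 1, log(1 - em), log(em)))
  ll0 <- sum(ifelse(reports == 1, log(ep), log(1 - ep)))
  lp1 <- log(prior) + ll1
  lp0 <- log(1 - prior) + ll0
  1 / (1 + exp(lp0 - lp1))     # normalize on the log scale for stability
}

# An informant with error rates near the example's means: a "1" raises belief
edge_posterior(1, ep = 0.038, em = 0.375)   # about 0.943
# A "perverse" informant (ep + em > 1): a "1" lowers belief in the edge
edge_posterior(1, ep = 0.6, em = 0.7)       # 1/3
```

When ep + em > 1, a reported tie is more likely under "no edge" than under "edge", so the report is negatively informative; this is the behavior the text associates with systematic deception.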

      Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:
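For reference, the three data-analytic rules compared in this passage can be sketched in base R for an array dat[k, i, j] holding informant k's report of the i -> j edge. The definitions follow the usual descriptions (LAS uses only the reports of the edge's two endpoints; the central graph is a cellwise majority vote, with exact ties counted as edges here, which is an assumption); sna's consensus may treat diagonals, ties, or missing data differently, and the function names are hypothetical.

```r
# Locally aggregated structure: combine only the two endpoints' own reports
las <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next                          # leave the diagonal at 0
      r <- c(dat[i, i, j], dat[j, i, j])        # reports by i and by j
      out[i, j] <- if (rule == "intersection") all(r == 1) else any(r == 1)
    }
  }
  out
}

# Central graph: edge present if at least half the informants report it
central_graph <- function(dat) {
  g <- (apply(dat, c(2, 3), mean) >= 0.5) * 1
  diag(g) <- 0
  g
}

# Three informants reporting on a 3-vertex digraph
dat <- array(0, dim = c(3, 3, 3))
dat[1, 1, 2] <- 1; dat[2, 1, 2] <- 1   # both endpoints report 1 -> 2
dat[2, 2, 3] <- 1                      # only endpoint 2 reports 2 -> 3
dat[1, 3, 1] <- 1; dat[2, 3, 1] <- 1   # endpoint 1 and informant 2 report 3 -> 1
las(dat, "intersection")   # only 1 -> 2
las(dat, "union")          # 1 -> 2, 2 -> 3, and 3 -> 1
central_graph(dat)         # 1 -> 2 and 3 -> 1 (each reported by 2 of 3)
```

Because the intersection rule needs both endpoints to report a tie, each endpoint's false negatives compound, which is consistent with its weak showing in the comparison.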

      R> mean(consensus(dat, method = "LAS.intersection") == g)
      [1] 0.7725
      R> mean(consensus(dat, method = "LAS.union") == g)
      [1] 0.905
      R> mean(consensus(dat, method = "central.graph") == g)
      [1] 0.9575
      R> mean(consensus(dat, method = "romney.batchelder") == g)
      Estimated competency scores:
       [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
       [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
      [15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
      Estimated bias parameters:
       [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
       [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
      [13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
      [19] 0.04302486 0.10195838
      [1] 1

      For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives), with an accuracy of 0.7725; the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either the false positive or the false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
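The condition on total error rates can be seen from the link between the two parameterizations. As a sketch, assume that in the Romney-Batchelder model a source with competency c reports the true edge value with probability c and otherwise guesses 1 with probability b (its bias); this reading of “bias” is an assumption. The implied false positive and false negative rates are then (1 - c) b and (1 - c)(1 - b), which sum to 1 - c, so a source can only be written in competency/bias form when that sum is below 1. The function names are hypothetical.

```r
# Competency/bias -> error rates, assuming bias = Pr(guess 1)
rb_to_errors <- function(competency, bias) {
  list(ep = (1 - competency) * bias,        # Pr(report 1 | no edge)
       em = (1 - competency) * (1 - bias))  # Pr(report 0 | edge)
}

# Error rates -> competency/bias; only defined when ep + em < 1
# (bias is undefined, NaN, for a perfectly accurate source with ep = em = 0)
errors_to_rb <- function(ep, em) {
  total <- ep + em
  if (any(total >= 1)) stop("no competency/bias equivalent when ep + em >= 1")
  list(competency = 1 - total, bias = ep / total)
}

err <- rb_to_errors(0.6, 0.1)          # ep = 0.04, em = 0.36
back <- errors_to_rb(err$ep, err$em)   # competency 0.6, bias 0.1 again
```

Under this reading, the estimated competency and bias scores printed by consensus can be converted into error rates directly comparable with bbnam's per-observer summaries.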

      As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

      R> w1 <- rgraph(50)
      R> w2 <- rgraph(50)
      R> x <- matrix(rnorm(50 * 5), 50, 5)
      R> r1 <- 0.2
      R> r2 <- 0.3
      R> sigma <- 0.1
      R> beta <- rnorm(5)
      R> nu <- rnorm(50, 0, sigma)
      R> e <- qr.solve(diag(50) - r2 * w2, nu)
      R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
      R> fit <- lnam(y, x, w1, w2)
      R> summary(fit)

      Call:
      lnam(y = y, x = x, W1 = w1, W2 = w2)

      Residuals:
           Min       1Q   Median       3Q      Max
      -0.52052 -0.18305  0.01156  0.15557  0.62082

      Coefficients:
              Estimate Std. Error Z value Pr(>|z|)
      X1     -0.331259   0.010831  -30.58   <2e-16 ***
      X2      0.535608   0.009448   56.69   <2e-16 ***
      X3     -0.685068   0.007138  -95.98   <2e-16 ***
      X4      0.691812   0.008417   82.19   <2e-16 ***
      X5      0.016491   0.007890    2.09   0.0366 *
      rho1.1  0.194935   0.002575   75.71   <2e-16 ***
      rho2.1  0.307491   0.021167   14.53   <2e-16 ***
      ---
      Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

            Estimate Std. Error
      Sigma  0.09597   9.22e-05

      Goodness-of-Fit:
        Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
        Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
        Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
        AIC: -100.9 BIC: -85.65

        Null model: meanstd
        Null log likelihood: -82.48 on 48 degrees of freedom
        AIC: 169.0 BIC: 172.8
        AIC difference (model versus null): 269.9
        Heuristic Log Bayes Factor (model versus null): 258.4

      In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
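The simulation in this example draws y by solving equations (4) and (5) with one AR matrix and one MA matrix: first (I - r2 W2) e = nu, then (I - r1 W1) y = X beta + e. The base-R sketch below repeats that construction and checks that both structural equations hold for the result. To keep it self-contained it swaps in a hypothetical helper, rand_digraph, for sna's rgraph and uses solve in place of qr.solve; the parameter values mirror the example.

```r
set.seed(7)
n <- 50
# Stand-in for rgraph(n): Bernoulli(p) digraph without self-ties
rand_digraph <- function(n, p = 0.5) {
  m <- matrix(rbinom(n * n, 1, p), n, n)
  diag(m) <- 0
  m
}
w1 <- rand_digraph(n)                 # AR weight matrix (W_1 in eq. 4)
w2 <- rand_digraph(n)                 # MA weight matrix (Z_1 in eq. 5)
x <- matrix(rnorm(n * 5), n, 5)
r1 <- 0.2; r2 <- 0.3; sigma <- 0.1
beta <- rnorm(5)
nu <- rnorm(n, 0, sigma)

# Solve (I - r2 Z) e = nu, then (I - r1 W) y = X beta + e
e <- solve(diag(n) - r2 * w2, nu)
y <- solve(diag(n) - r1 * w1, x %*% beta + e)

# Residuals of equations (4) and (5); both should vanish up to rounding
ar_resid <- y - (r1 * w1 %*% y + x %*% beta + e)
ma_resid <- e - (r2 * w2 %*% e + nu)
max(abs(ar_resid))
max(abs(ma_resid))
```

Solving the linear systems, rather than iterating y = r1 W y + ..., avoids relying on a convergent series, which would not be guaranteed here because the unnormalized weight matrices have large row sums.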

      3 Closing comments

      The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines which covers many of the methods currently seeing wide use within the field. It is hoped that the inclusion of such tools, together with the other packages of the statnet ensemble, within a freely available and widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

      Acknowledgments

      The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

      [Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.]

      References

      Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

      Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

      Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

      Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

      Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.


      Boorman SA, White HC (1976). “Social Structure from Multiple Networks II: Role Structures.” American Journal of Sociology, 81, 1384–1446.

      Borgatti SP (2007). NetDraw Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

      Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

      Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

      Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

      Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

      Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

      Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

      Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

      Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

      Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

      Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

      Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

      Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

      Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

      Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

      Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.


      Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

      Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

      Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

      Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

      Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

      Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

      Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

      Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

      Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

      Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

      Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

      Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

      Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

      Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

      Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

      Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

      Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

      Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

      Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

      Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

      Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

      Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

      Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

      Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

      Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

      Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

      Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

      Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

      Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

      Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

      Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

      Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

      Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


      Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

      Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

      Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

      Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

      Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

      R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

      Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

      Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

      Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

      Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

      Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

      Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

      Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

      Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

      Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

      Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

      Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


      Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

      Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

      West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

      White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

      Affiliation

      Carter T. Butts
      Department of Sociology and Institute for Mathematical Behavioral Sciences
      University of California, Irvine
      Irvine, CA 92697-5100, United States of America
      E-mail: buttsc@uci.edu
      URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

      Journal of Statistical Software, http://www.jstatsoft.org/, published by the American Statistical Association, http://www.amstat.org/

      Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.

      • Introduction and overview
        • Package history
        • sna and statnet
        • Functionality
        • Terminology and data representation
          • Importing relational data into R
      • Package highlights
        • Random graph generation
          • Example
        • Visualization and data manipulation
          • Neighborhood and ego net functions
          • Visualization
        • Descriptive indices
          • Node-level indices
          • Graph-level indices
        • Connectivity and subgraph statistics
          • Example
        • Position and role analysis
          • Example
        • Exploratory edge set comparison
          • Example
        • Network inference and process models
          • Example
      • Closing comments


        custom layout routines are also supported. Functions are included to facilitate common tasks such as extracting neighborhoods and egocentric networks, symmetrization, application of functions to attribute information on neighborhoods (e.g., computing neighbors' mean attributes), dichotomization, permutation/relabeling, and the creation of interval graphs from spell data. Data import/export is supported for several basic file formats.

        The above includes many of the methods of what is sometimes called "classical" social network analysis (exemplified by Wasserman and Faust (1994), whose presentation is now canonical), as well as some more recent contributions to the literature. Although the focus of the package has been on social scientific applications, many of the included tools may also be useful for analyzing networks arising from other sources.

        1.4. Terminology and data representation

        Because sna is a special-purpose toolkit dedicated to social network analysis, describing its functionality requires us to refer to standard SNA concepts and methods; readers unfamiliar with network analysis may wish to consult the cited references (particularly Wasserman and Faust 1994) for additional details. Some specific terminology and notation is described below. Throughout this paper, we will be concerned with relational data consisting of a fixed set of entities (called vertices) and a multiset of relationships among those entities (called edges). Our particular focus is on dyadic relationships, in which edges consist of (possibly ordered) two-element multisets on the set of vertices. The elements of an edge are referred to as its endpoints, with the first element known as the tail (or sender) and the second known as the head (or receiver) in the ordered case. An edge whose endpoints are identical is called a loop. The combination of an edge set $E$ with vertex set $V$ is said to be a graph (denoted $G = (V, E)$). The size or order of a graph is the number of elements in its vertex set (denoted $|V|$, where $|\cdot|$ is the cardinality operator). Specific types of graphs may be identified via the constraints satisfied by $E$. If the elements of $E$ are unordered multisets, $G$ is said to be an undirected graph; if edges are ordered multisets, by contrast, $G$ is said to be a directed graph (or digraph). For an undirected graph, the set of vertices tied (or adjacent) to vertex $v$ is called the neighborhood of $v$ (denoted $N(v)$). In the directed case, we distinguish between the set of vertices sending edges to $v$ (the in-neighborhood, or $N^-(v)$) and the set of vertices receiving edges from $v$ (the out-neighborhood, or $N^+(v)$). A graph (directed or otherwise) is simple if it has no loops and if there exists no edge having multiplicity greater than one. Finally, a graph's edge set may be associated with a set of variables such that each edge carries some value. A graph of this kind is said to be valued, as opposed to the contrary unvalued case.
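To make this notation concrete, the following minimal base-R sketch (it does not use sna) stores a small digraph as an adjacency matrix, with a 1 in cell (i, j) denoting an edge with tail i and head j, and reads off out-neighborhoods, in-neighborhoods, the underlying undirected neighborhood, and the loop/simplicity checks. The helper names `out_nbhd`, `in_nbhd`, and `nbhd` are hypothetical, chosen only for illustration.

```r
# A 4-vertex digraph: A[i, j] = 1 encodes an edge with tail i and head j.
A <- matrix(0, 4, 4)
A[1, 2] <- 1; A[2, 1] <- 1   # mutual pair 1 <-> 2
A[2, 3] <- 1                 # 2 -> 3
A[4, 3] <- 1                 # 4 -> 3

# Out-neighborhood N+(v): vertices receiving an edge from v (row v).
out_nbhd <- function(A, v) which(A[v, ] != 0)
# In-neighborhood N-(v): vertices sending an edge to v (column v).
in_nbhd <- function(A, v) which(A[, v] != 0)
# Neighborhood N(v) in the underlying undirected graph (either direction).
nbhd <- function(A, v) which((A + t(A))[v, ] != 0)

order_V  <- nrow(A)             # |V|, the order of the graph
has_loop <- any(diag(A) != 0)   # a loop is an edge (v, v)
# In a 0/1 matrix no edge can have multiplicity above one,
# so simplicity reduces to the absence of loops.
is_simple <- !has_loop && all(A %in% c(0, 1))

out_nbhd(A, 2)  # 1 3
in_nbhd(A, 3)   # 2 4
```

A valued graph would simply store edge values other than 1 in the same matrix; the 0/1 check above would then (correctly) report that the matrix is not an unvalued simple graph.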

        It is worth noting that use of terminology varies somewhat across the social network field, a perhaps unfortunate legacy of the field's strongly interdisciplinary nature (Freeman 2004). Thus, vertices may also be called "points" or "nodes" (or, in social contexts, "actors" or "agents"). Likewise, edges may be called "lines," "ties," or (if directed) "arcs." The term "network" is often used generically to refer to any relational structure; in other cases, it may be reserved to refer to the actually existing relational structure, with "graph" being employed for that structure's formal representation. In the latter instance, "tie" is frequently used as the corresponding term for an actually existing relationship, with "edge" denoting the formal representation of that relationship. While such terminological subtleties are not required to use sna, an awareness of them may reduce confusion among users seeking to make use of the literature cited within the package manual.

        With rare exceptions, sna routines can be used with directed or undirected graphs, with or without loops. Edge values and missing data (i.e., edges whose states are unknown) are supported in many applications as well. Note, however, that many graph-theoretic concepts (e.g., connectedness) admit somewhat different definitions in the directed and undirected cases; it is thus important to verify that one is using the settings which are appropriate to the data at hand. Except for functions whose behavior is undefined in the directed case, sna's functions typically default to the assumption that one's data consists of one or more simple, unvalued digraphs.

        Relational data can be represented in a number of ways, several of which are currently supported by the sna package. The most basic of these is the adjacency matrix, i.e., a square matrix $A$ whose elements are defined such that $A_{ij}$ is the value of the $(i,j)$ edge (or $\{i,j\}$ edge in the undirected case) in the corresponding graph. By convention, $A_{ij}$ is a dichotomous indicator variable where the corresponding graph is unvalued. Such matrices may be passed as matrix objects or as two-dimensional arrays. While adjacency matrices are convenient to work with, they are inefficient for large, sparse graphs. When working with such data, the use of network (Butts et al. 2007) or sparse matrix (SparseM; Koenker and Ng 2007) objects may be preferred; sna accepts all three such data types interchangeably.
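As a small illustration of the adjacency-matrix convention, the base-R sketch below converts a hypothetical edge list (tail, head, value) into a valued adjacency matrix, its dichotomized counterpart, and an undirected reading in which the $\{i,j\}$ edge is present if either direction is. The object names are ours, not sna's, and the sketch deliberately ignores the sparse representations (network, SparseM) recommended above for large graphs.

```r
# Hypothetical edge list: one row per edge, columns tail, head, value.
edges <- data.frame(tail  = c(1, 1, 2, 4),
                    head  = c(2, 3, 3, 1),
                    value = c(2.5, 1, 4, 0.5))
n <- 4

# Valued adjacency matrix: A_valued[i, j] holds the value of the (i, j) edge.
A_valued <- matrix(0, n, n)
A_valued[cbind(edges$tail, edges$head)] <- edges$value

# Unvalued (dichotomous) version: a 0/1 edge indicator.
A_binary <- (A_valued != 0) * 1

# Undirected reading: the {i, j} edge is present if either (i, j) or (j, i) is.
A_undirected <- ((A_binary + t(A_binary)) > 0) * 1
```

Indexing with a two-column matrix (`cbind(tail, head)`) fills all cells in one vectorized assignment, which is the idiomatic base-R way to scatter an edge list into a matrix.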

        In many instances, one may need to perform operations on multiple graphs at once. Where such graphs are of the same order (i.e., number of vertices), they may be conveniently represented by a three-dimensional array whose first dimension indexes the component adjacency matrices. Alternately, it is also possible to specify multiple graphs by means of a list. This allows the user to pass graph sets of varying orders where required. Within a graph list, single adjacency matrices, adjacency arrays, network objects, and sparse matrix objects may be mixed as desired; individual graphs are unpacked sequentially in ascending list and array index order prior to computation.
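The stacking convention just described can be sketched in base R: two hypothetical graphs of the same order are placed in a three-dimensional array whose first index selects the graph, while a list holds graphs of different orders. Densities are computed by hand here (loops excluded) rather than with sna's gden, so that the example runs without the package.

```r
# Two graphs of order 3 and one of order 4 (hypothetical examples).
g1 <- matrix(c(0, 1, 0,
               0, 0, 1,
               1, 0, 0), 3, 3, byrow = TRUE)
g2 <- matrix(1, 3, 3); diag(g2) <- 0
g3 <- matrix(0, 4, 4); g3[1, 2] <- 1

# Same-order graphs: a 3-d array with the graph index first (m x n x n).
stack <- array(0, dim = c(2, 3, 3))
stack[1, , ] <- g1
stack[2, , ] <- g2

# Graphs of differing order must go in a list.
glist <- list(g1, g2, g3)

# Density of a loopless digraph: edges over the n(n - 1) ordered pairs.
dens <- function(a) {
  n <- nrow(a)
  sum(a[row(a) != col(a)]) / (n * (n - 1))
}

apply(stack, 1, dens)  # dens receives stack[i, , ] for each graph i
sapply(glist, dens)
```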

        Importing relational data into R

        Another preliminary issue of obvious concern is the importation of relational data into R. Where such data is stored in matrix or array form, conventional R routines such as read.table and scan may be employed in the usual manner. Similarly, natively saved network objects may be loaded directly into memory without external representation. In addition to these methods, sna includes custom routines for importing relational data in OrgStat NOS and GraphViz DOT formats. Processed relational data can be saved via the above methods, or in the DL format widely used by packages such as Pajek and UCINET. (See also the Pajek import function in network.)
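For the matrix-form case, a minimal sketch using base R's read.table and scan follows. The inline text stands in for a hypothetical data file; in practice one would pass a file path instead of the `text` argument.

```r
# A whitespace-delimited adjacency matrix, as it might appear in a text file.
txt <- "
0 1 0 0
0 0 1 1
1 0 0 0
0 0 1 0
"
# read.table returns a data frame; as.matrix turns it into an adjacency matrix.
A <- as.matrix(read.table(text = txt, header = FALSE))
dimnames(A) <- NULL  # drop the V1, V2, ... column names read.table adds

# scan() reads the same numbers as a flat vector in file (row-by-row) order,
# hence byrow = TRUE when reshaping.
A2 <- matrix(scan(text = txt, quiet = TRUE), nrow = 4, byrow = TRUE)
```

read.table is convenient when the file has a regular tabular layout; scan is the lower-level choice when values are simply a stream of numbers.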

        Beyond these network-specific approaches, sna also has facilities for converting spell data (i.e., data consisting of intervals in time or other quantities) into interval graphs (West 1996). The eponymously named interval.graph function serves in this capacity, converting an array of spell information into one or more interval graphs; spell-level categorical covariate information may also be included. In addition to simple interval graphs, interval.graph will compute the valued overlap graphs proposed by Butts and Pixley (2004) for use with life history data. In this case, the overlap quantities are stored as edge values in the output adjacency matrix (or matrices, if multiple spell sets were given).
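The simple interval graph construction can be sketched in base R: each spell is a hypothetical closed interval [start, end], and two spells are adjacent when their intervals intersect. The sketch also records the overlap length as an edge value, purely as an illustration of a valued variant; it is not claimed to reproduce the Butts and Pixley (2004) overlap measure or the exact output of interval.graph.

```r
# Hypothetical spells: one row per spell, closed intervals [start, end].
spells <- rbind(c(0, 5),
                c(3, 8),
                c(6, 9),
                c(10, 12))
n <- nrow(spells)

# Length of the intersection of spells i and j (0 if disjoint).
overlap <- function(i, j) {
  max(0, min(spells[i, 2], spells[j, 2]) - max(spells[i, 1], spells[j, 1]))
}

# Valued overlap matrix; the diagonal stays zero (no loops).
O <- matrix(0, n, n)
for (i in seq_len(n)) for (j in seq_len(n))
  if (i != j) O[i, j] <- overlap(i, j)

# Interval graph: i ~ j iff the closed intervals intersect. Intervals that
# merely touch at one point intersect but have overlap length 0, so adjacency
# is tested on the endpoints rather than derived from O > 0.
IG <- matrix(0, n, n)
for (i in seq_len(n)) for (j in seq_len(n))
  if (i != j && spells[i, 1] <= spells[j, 2] && spells[j, 1] <= spells[i, 2])
    IG[i, j] <- 1
```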


        2. Package highlights

        Given the wide scope of the methods implemented within the sna package, we cannot review them all in detail. In this section, however, we attempt to summarize the functionality of sna within a number of domains, highlighting specific functions and applications which are likely to be of general interest. Brief examples are also provided within each section to illustrate basic syntax and usage. Additional background and usage details are contained within the package manual, which is distributed with the package itself.

        2.1. Random graph generation

        sna has a range of tools for random graph generation. Chief among these is rgraph, a "workhorse" function for simulating deviates from both homogeneous and inhomogeneous Bernoulli graph distributions (Wasserman and Faust 1994). Given a set of tie probabilities (which may be specified by graph or by edge), it generates one or more graphs whose edge states are independent Bernoulli trials conditional on the specified parameters.¹
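A minimal base-R sketch of what a Bernoulli digraph draw involves: every off-diagonal cell is an independent Bernoulli trial, with either one probability for the whole graph (homogeneous case) or a matrix of edge-specific probabilities (inhomogeneous case). The function name `bernoulli_digraph` is hypothetical; rgraph itself offers more options (multiple draws, undirected mode, loops, valued graphs).

```r
# Hypothetical sketch of a Bernoulli digraph draw; not sna's rgraph.
# p is either a single probability or an n x n matrix of edge probabilities.
bernoulli_digraph <- function(n, p, loops = FALSE) {
  P <- if (length(p) == 1) matrix(p, n, n) else p
  stopifnot(all(dim(P) == c(n, n)))
  # Cell (i, j) is 1 with probability P[i, j], independently of all others.
  A <- (matrix(runif(n * n), n, n) < P) * 1
  if (!loops) diag(A) <- 0
  A
}

set.seed(42)
A_hom <- bernoulli_digraph(50, 0.1)                          # homogeneous
A_inh <- bernoulli_digraph(10, sapply((1:10) / 10, rep, 10)) # P[i, j] = j / 10
```

In the inhomogeneous draw the last column has probability 1, so every off-diagonal cell in that column is an edge, while the first column is sparse.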

        In addition to rgraph, sna has several other tools for random graph generation. These currently include rgnm (which draws uniform graphs and digraphs conditional on edge count), rguman (which draws uniform digraphs conditional on expected or realized dyad census statistics), rgws (which draws from a Watts-Strogatz graph process; Watts and Strogatz 1998), and rgbn (which simulates a Skvoretz-Fararo biased net process (Skvoretz et al. 2004); see also Section 2.7). Also useful are tools such as rmperm and the rewire functions, which alter an input graph by random row/column, edgewise, or dyadic permutations. Functions which condition on degree distribution and the triad census are anticipated in future versions of sna.

        Example

        To provide a sense for the syntax involved (and options available) when generating random graphs in sna, we here provide a brief example of R code which draws graphs from a number of models. Note that the output type in each case is an adjacency matrix: although sna routines accept network and related objects as input (per Section 1.4), the package's current random graph generators produce output in adjacency matrix or array form. The range of output types may be expanded in future package versions. To begin, we first load the sna library and fix the random seed (for reproducibility).

        R> library("sna")

        R> set.seed(1913)

        As noted above, rgraph can be used in various ways to obtain graphs (directed or otherwise) with different expected densities. For instance, three digraphs with respective expected densities 0.1, 0.9, and 0.5 can be drawn as follows:

        R> g <- rgraph(10, 3, tprob = c(0.1, 0.9, 0.5))

        R> gden(g)

        [1] 0.1000000 0.8666667 0.5333333

        ¹ rgraph can also be employed to simulate valued graphs via a resampling procedure.


        gden, which we shall encounter again later, is an sna function which returns the density of one or more input graphs; as expected, the observed densities here closely match their expectations. The tprob parameter, used above to set the probability of each edge on a per-graph basis, can also be used in other ways. For instance, passing a matrix of Bernoulli parameters to tprob will cause rgraph to sample from the corresponding inhomogeneous Bernoulli graph model (in which the probability of an $(i,j)$ edge is equal to tprob[i, j]). For example, consider a simple model for a digraph of order 10 in which the probability of an $(i,j)$ edge is equal to $j/10$. Such a graph can be drawn easily as follows:

        R> gp <- sapply((1:10)/10, rep, 10)

        R> g <- rgraph(10, tprob = gp)

        R> g

              [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
         [1,]    0    0    0    0    1    0    0    1    1     1
         [2,]    0    0    0    1    0    1    0    0    1     1
         [3,]    0    0    0    0    0    1    0    1    0     1
         [4,]    0    0    0    0    1    1    1    1    1     1
         [5,]    0    1    0    0    0    0    1    1    1     1
         [6,]    0    0    1    0    1    0    1    0    1     1
         [7,]    0    1    1    0    1    0    0    1    1     1
         [8,]    0    0    1    1    1    0    1    0    1     1
         [9,]    0    0    0    1    1    0    1    1    0     1
        [10,]    0    0    0    0    0    0    1    1    1     0

        R> apply(g, 2, mean)

        [1] 0.0 0.2 0.3 0.3 0.6 0.3 0.6 0.7 0.8 0.9

        Since rgraph disallows loops by default, diagonal entries are fixed at zero in the above case: each column mean averages nine Bernoulli draws with success probability $j/10$ and one structural zero, so the column means here have expectation $0.9(j/10)$. The observed means are quite close to this, but obviously vary due to the underlying Bernoulli process. For random graphs with exact constraints on edge count, we must use rgnm. For instance, to take 5 draws from the uniform distribution on the order 10 graphs having 12 edges, we would proceed as follows:

        R> g <- rgnm(5, 10, 12)

        R> apply(g, 1, sum)

        [1] 12 12 12 12 12
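The uniform draw conditional on edge count can be sketched in base R by choosing m of the n(n - 1) off-diagonal cells without replacement: every m-subset of cells is equally likely, which is exactly uniformity over loopless digraphs with m edges. This is an illustrative sketch covering only the directed case, not the implementation of rgnm.

```r
# Hypothetical sketch: uniform loopless digraph on n vertices with exactly m edges.
uniform_gnm_digraph <- function(n, m) {
  cells <- which(row(diag(n)) != col(diag(n)))  # linear indices of off-diagonal cells
  stopifnot(m <= length(cells))
  A <- matrix(0, n, n)
  A[sample(cells, m)] <- 1  # each m-subset of cells is equally likely
  A
}

set.seed(1)
draws <- replicate(5, uniform_gnm_digraph(10, 12))  # a 10 x 10 x 5 array
apply(draws, 3, sum)  # 12 edges in every draw
```

Note that replicate stacks graphs along the third dimension, whereas sna's convention (and the rgnm output above) places the graph index first.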

        As the dyadic counterpart to both rgraph and rgnm, rguman models digraphs whose distributions are parameterized by dyad states. As each dyad corresponds to a pair of edge variables, it can be readily classified into the three isomorphism classes of mutual (both edges present), asymmetric (one edge present), or null (no edges present). The number of dyads in each class within a graph is known as its dyad census, and has been used as a simple basis for modeling network structure at least since the work of Holland and Leinhardt (1970). rguman can be employed either to generate uniform digraphs conditional on an exact dyad census constraint, or to draw from a multinomial graph model of independent dyads with fixed expected counts. The former case can be used to generate graphs of particular types. For instance, the trivial cases of complete, complete tournament, and null graphs can be generated by placing all dyads within the appropriate isomorphism class:

        R> k10 <- rguman(1, 10, mut = 45, asym = 0, null = 0, method = "exact")

        R> t10 <- rguman(1, 10, mut = 0, asym = 45, null = 0, method = "exact")

        R> n10 <- rguman(1, 10, mut = 0, asym = 0, null = 45, method = "exact")

        R> k10

              [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
         [1,]    0    1    1    1    1    1    1    1    1     1
         [2,]    1    0    1    1    1    1    1    1    1     1
         [3,]    1    1    0    1    1    1    1    1    1     1
         [4,]    1    1    1    0    1    1    1    1    1     1
         [5,]    1    1    1    1    0    1    1    1    1     1
         [6,]    1    1    1    1    1    0    1    1    1     1
         [7,]    1    1    1    1    1    1    0    1    1     1
         [8,]    1    1    1    1    1    1    1    0    1     1
         [9,]    1    1    1    1    1    1    1    1    0     1
        [10,]    1    1    1    1    1    1    1    1    1     0

        R> t10

              [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
         [1,]    0    0    0    0    0    0    1    0    0     0
         [2,]    1    0    1    0    1    1    0    0    0     1
         [3,]    1    0    0    1    1    0    0    1    0     0
         [4,]    1    1    0    0    0    1    0    1    0     1
         [5,]    1    0    0    1    0    1    1    1    1     0
         [6,]    1    0    1    0    0    0    1    1    1     0
         [7,]    0    1    1    1    0    0    0    1    1     0
         [8,]    1    1    0    0    0    0    0    0    1     1
         [9,]    1    1    1    1    0    0    0    0    0     0
        [10,]    1    0    1    0    1    1    1    0    1     0

        R> n10

              [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
         [1,]    0    0    0    0    0    0    0    0    0     0
         [2,]    0    0    0    0    0    0    0    0    0     0
         [3,]    0    0    0    0    0    0    0    0    0     0
         [4,]    0    0    0    0    0    0    0    0    0     0
         [5,]    0    0    0    0    0    0    0    0    0     0
         [6,]    0    0    0    0    0    0    0    0    0     0
         [7,]    0    0    0    0    0    0    0    0    0     0
         [8,]    0    0    0    0    0    0    0    0    0     0
         [9,]    0    0    0    0    0    0    0    0    0     0
        [10,]    0    0    0    0    0    0    0    0    0     0

        When not in "exact" mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

        R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)

        R> mean(g[upper.tri(g)] & t(g)[upper.tri(g)])

        [1] 0.1482828

        R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])

        [1] 0.04646465

        R> mean((!g)[upper.tri(g)] & t(!g)[upper.tri(g)])

        [1] 0.8052525

        By contrast with the expectation under the above model, a Bernoulli graph with the same expected density (0.15 + 0.05/2 = 0.175) would have a mean mutuality rate of approximately 0.03 (with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4). Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
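The comparison above can be checked with a few lines of base R. The sketch defines a hypothetical `dyad_census` helper (counting mutual, asymmetric, and null dyads, in the spirit of the census described earlier but not sna's code) and a hypothetical `draw_dyads` sampler for the independent multinomial dyad model. It then works out the Bernoulli benchmark: with edge probability p = 0.175, a dyad is mutual with probability p² and asymmetric with probability 2p(1 - p).

```r
# Hypothetical helper: counts of mutual, asymmetric, and null dyads.
dyad_census <- function(A) {
  ut <- upper.tri(A)
  a <- A[ut] != 0      # edge i -> j for each unordered pair i < j
  b <- t(A)[ut] != 0   # edge j -> i for the same pair
  c(mut = sum(a & b), asym = sum(xor(a, b)), null = sum(!a & !b))
}

# Hypothetical sampler: independent multinomial dyads (n >= 2).
draw_dyads <- function(n, pm, pa) {
  A <- matrix(0, n, n)
  for (i in 1:(n - 1)) for (j in (i + 1):n) {
    u <- runif(1)
    if (u < pm) {
      A[i, j] <- 1; A[j, i] <- 1                            # mutual
    } else if (u < pm + pa) {
      if (runif(1) < 0.5) A[i, j] <- 1 else A[j, i] <- 1    # asymmetric
    }                                                       # otherwise null
  }
  A
}

# Expected edge density implied by mut = 0.15, asym = 0.05.
p_edge <- 0.15 + 0.05 / 2                  # 0.175
bern_mut  <- p_edge^2                      # 0.030625
bern_asym <- 2 * p_edge * (1 - p_edge)     # 0.28875
bern_asym / bern_mut                       # about 9.43

# Census of a 3-vertex digraph with 1 <-> 2 and 2 -> 3: one dyad of each type.
A <- matrix(0, 3, 3); A[1, 2] <- A[2, 1] <- A[2, 3] <- 1
dyad_census(A)
```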

        More extensive departures from independence require alternatives to the simple independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. As we will see, these processes are specified in terms of the conditional probability of an edge given other edges within the graph; this immediately suggests the use of a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process. Such a sampler is implemented via the rgbn function, which uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. Thinning and burn-in parameters may be specified by the user, along with model parameters (which by default correspond to the uniform random digraph model). Parameters may be adjusted to produce "parent" or reciprocity biases (π), "sibling" or shared partner biases (σ), and "double role" biases or parent/sibling interaction effects (ρ), as well as baseline density effects (d); parameters vary from 0 to 1, with 0 indicating no bias. The command to draw a sample of 5 order 10 networks with both reciprocity and triangle formation biases will then look something like the following,

        R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
        +   d = 0.15))


        with the magnitude of the specified effects depending on the exact choice of parameters.

        Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) "rewiring" process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

        R> g <- matrix(0, 10, 10)

        R> g[1, ] <- 1

        R> g2 <- rewire.ws(g, 0.5)[1, , ]

        R> g2

              [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
         [1,]    1    0    1    1    1    1    0    0    0     0
         [2,]    0    0    0    0    0    0    0    0    0     1
         [3,]    0    1    0    0    0    0    0    0    0     0
         [4,]    0    0    1    0    0    0    0    0    0     0
         [5,]    0    0    0    0    0    0    0    0    0     0
         [6,]    0    0    0    0    1    0    0    0    0     0
         [7,]    0    0    0    0    0    0    0    0    0     0
         [8,]    0    0    0    0    0    0    0    0    0     0
         [9,]    0    0    0    0    0    0    0    0    0     0
        [10,]    0    0    0    0    0    0    0    0    1     0

        R> sum(g - g2) == 0

        [1] TRUE
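The edge-conserving character of rewiring can be illustrated with a simplified base-R variant: each edge, with probability p, is moved to a randomly chosen empty, non-loop cell in the same row (keeping its tail and changing its head). This is a hypothetical sketch for illustration only; rewire.ws's exact rules for which endpoint is kept and how dyads are chosen may differ, as the output above suggests.

```r
# Hypothetical rewiring sketch (not sna's rewire.ws): keep the tail, move the head.
rewire_keep_tail <- function(A, p) {
  n <- nrow(A)
  edges <- which(A != 0, arr.ind = TRUE)  # snapshot of the original edges
  for (k in seq_len(nrow(edges))) {
    i <- edges[k, 1]; j <- edges[k, 2]
    if (runif(1) < p) {
      empty <- which(A[i, ] == 0 & seq_len(n) != i)  # null, non-loop cells in row i
      if (length(empty) > 0) {
        new_j <- empty[sample.int(length(empty), 1)] # avoids sample()'s 1:n pitfall
        A[i, j] <- 0
        A[i, new_j] <- 1
      }
    }
  }
  A
}

set.seed(7)
g <- matrix(0, 10, 10); g[1, 2:6] <- 1
g2 <- rewire_keep_tail(g, 0.5)
sum(g2) == sum(g)  # TRUE: each move deletes one edge and adds one
```

Because every move removes one edge and fills one previously empty cell, the edge count is invariant regardless of p.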

        Another example of an edge-preserving random transformation is the random permutation of vertex order. rmperm can be employed for this purpose, as, for example, in the following permutation of the graph g2 above:

        R> g3 <- rmperm(g2)

        R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))

        [1] TRUE

        Row/column permutation preserves the "unlabeled" structure of the input graph (i.e., it draws from the graph's isomorphism class), and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
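That row/column permutation leaves the unlabeled structure intact is easy to verify numerically. In the base-R sketch below, a permutation vector p relabels vertices via A[p, p], which equals P A Pᵀ for the corresponding permutation matrix P. Edge count and degree multisets are unchanged. This mirrors what rmperm does in spirit, but the code is our own.

```r
set.seed(3)
n <- 8
A <- (matrix(runif(n * n), n, n) < 0.3) * 1
diag(A) <- 0

p <- sample(n)   # a uniformly random permutation vector
B <- A[p, p]     # relabeled graph: B[i, j] = A[p[i], p[j]]

# Equivalent permutation-matrix form, with P[i, p[i]] = 1.
P <- diag(n)[p, ]
stopifnot(all(B == P %*% A %*% t(P)))

# Unlabeled properties survive: edge count and sorted in/out-degree sequences.
sum(A) == sum(B)
all(sort(colSums(A)) == sort(colSums(B)))
all(sort(rowSums(A)) == sort(rowSums(B)))
```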

        2.2. Visualization and data manipulation

        Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic: for instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward, i.e.:

        R> g <- rgraph(5)

        R> eval.edgeperturbation(g, 1, 2, "centralization", "betweenness")

        [1] 0.07291667

        Unfortunately, the drawback to the flexibility of this routine is its inefficiency: eval.edgeperturbation cannot take advantage of any special properties of the change score being calculated, and hence is inefficient for properties such as triad counts whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple exploratory applications, and does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
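The generic change score described above can be sketched in base R: force the chosen edge present, evaluate the statistic, force it absent, evaluate again, and take the difference. The helper `edge_change` is hypothetical, not sna's code. For density, the change score has the closed form 1/(n(n - 1)) in a loopless digraph, which illustrates why specialized change-score routines can skip recomputing the full statistic twice.

```r
# Hypothetical generic change score (not sna's implementation).
edge_change <- function(A, i, j, stat) {
  A1 <- A; A1[i, j] <- 1   # force the (i, j) edge present
  A0 <- A; A0[i, j] <- 0   # force it absent, all other edges held fixed
  stat(A1) - stat(A0)
}

# Density of a loopless digraph.
dens <- function(A) {
  n <- nrow(A)
  sum(A[row(A) != col(A)]) / (n * (n - 1))
}

# Number of mutual dyads, another simple graph-level statistic.
mutuals <- function(A) sum(A[upper.tri(A)] & t(A)[upper.tri(A)])

A <- matrix(0, 5, 5); A[2, 1] <- 1
edge_change(A, 1, 2, dens)     # 1/20, regardless of the rest of A
edge_change(A, 1, 2, mutuals)  # 1, because the reverse edge 2 -> 1 exists
edge_change(A, 1, 3, mutuals)  # 0, no reverse edge 3 -> 1
```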

        Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the "identities" of its vertices while leaving the underlying "unlabeled" structure unchanged.) Although not graph manipulation functions per se, these routines are of importance for generating restricted permutations for use in QAP tests (Hubert 1987) and comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
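Both ideas can be sketched in base R. `perm_within` (a hypothetical name) shuffles vertices only within user-supplied classes, in the spirit of rperm. `nth_perm` (also hypothetical) returns the nth permutation of 1:n in lexicographic order via the factorial number system, in the spirit of numperm. Whether numperm counts from 0 or 1, and its exact conventions, are not claimed here; this sketch counts from 1.

```r
# Random permutation that only exchanges vertices sharing a class label.
perm_within <- function(classes) {
  p <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    p[idx] <- idx[sample.int(length(idx))]  # shuffle positions within the class
  }
  p
}

# nth permutation (1-based) of 1:n in lexicographic order, via factoradic digits.
nth_perm <- function(n, k) {
  stopifnot(k >= 1, k <= factorial(n))
  k <- k - 1                  # convert to a 0-based rank
  pool <- seq_len(n)
  out <- integer(0)
  for (m in n:1) {
    f <- factorial(m - 1)
    d <- k %/% f              # index of the next element among those remaining
    k <- k %% f
    out <- c(out, pool[d + 1])
    pool <- pool[-(d + 1)]
  }
  out
}

nth_perm(3, 1)  # 1 2 3
nth_perm(3, 4)  # 2 3 1
nth_perm(3, 6)  # 3 2 1
```

Enumerating k = 1, ..., n! with `nth_perm` visits every permutation exactly once, which is what makes rank-based indexing useful for exhaustive search or sampling without replacement.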

        In addition to the above, two families of graph manipulation functions bear discussing in more detail. These are functions to compute properties of neighborhoods, and functions for graph visualization. Here, we briefly discuss each family in turn before proceeding to a review of sna's descriptive index routines.

        Neighborhood and ego net functions

        The egocentric network (or "ego net") of vertex $v$ in graph $G$ is defined as $G[\{v\} \cup N(v)]$ (i.e., the subgraph of $G$ induced by $v$ and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.

        While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.
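A base-R sketch of the same idea: for each vertex, find its out-, in-, or combined neighbors from the adjacency matrix and apply a function to their covariate values. The function `neighbor_apply` and its `type` argument are hypothetical and do not reproduce gapply's interface; in particular, higher-order neighborhoods are omitted, and vertices without neighbors return NA rather than calling the function on an empty vector.

```r
# Hypothetical sketch: apply FUN to the covariates of each vertex's neighbors.
neighbor_apply <- function(A, x, FUN, type = c("out", "in", "combined")) {
  type <- match.arg(type)
  R <- switch(type,
              out      = A,                  # j is a neighbor of i if i -> j
              "in"     = t(A),               # ... if j -> i
              combined = (A + t(A)) > 0)     # ... if either
  sapply(seq_len(nrow(A)), function(i) {
    nb <- which(R[i, ] != 0 & seq_len(nrow(A)) != i)
    if (length(nb) == 0) NA else FUN(x[nb])  # NA for empty neighborhoods
  })
}

A <- matrix(0, 4, 4); A[1, 2] <- A[1, 3] <- A[4, 1] <- 1
age <- c(30, 40, 50, 60)                       # a hypothetical vertex covariate
neighbor_apply(A, age, mean, "out")            # 45 NA NA 30
neighbor_apply(A, rep(1, 4), sum, "combined")  # combined degree: 3 1 1 1
```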

        In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose $(i,j)$ cell is an indicator for the membership of vertex $j$ in vertex $i$'s selected neighborhood. Specifically, the adjacency matrix associated with the 0th-order neighborhood is defined as the identity matrix, while for orders $k > 0$ it depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation $R$ be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i,j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
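The partial and cumulative definitions above can be sketched in base R by computing geodesic distances with breadth-first search on the base relation R and then thresholding. The names `geodesic_dist` and `nbhd_matrix` are hypothetical; this illustrates the definitions, not the neighborhood function's code. The cumulative matrix here excludes each vertex itself (distance 0); whether sna includes the diagonal is not claimed.

```r
# All-pairs geodesic distances by breadth-first search (Inf if unreachable).
geodesic_dist <- function(R) {
  n <- nrow(R)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    d <- 0
    while (length(frontier) > 0) {
      d <- d + 1
      nxt <- which(colSums(R[frontier, , drop = FALSE] != 0) > 0)  # successors
      nxt <- nxt[is.infinite(D[s, nxt])]                           # unvisited only
      D[s, nxt] <- d
      frontier <- nxt
    }
  }
  D
}

# Order-k neighborhood matrix; `type` selects the base relation as in the text.
nbhd_matrix <- function(G, k, type = c("out", "in", "total"), partial = TRUE) {
  type <- match.arg(type)
  if (k == 0) return(diag(nrow(G)))                 # 0th order: identity
  R <- switch(type, out = G, "in" = t(G), total = ((G + t(G)) > 0) * 1)
  D <- geodesic_dist(R)
  if (partial) (D == k) * 1 else (D >= 1 & D <= k) * 1
}

# Directed path 1 -> 2 -> 3 -> 4.
G <- matrix(0, 4, 4); G[1, 2] <- G[2, 3] <- G[3, 4] <- 1
nbhd_matrix(G, 2)                   # only (1,3) and (2,4)
nbhd_matrix(G, 2, partial = FALSE)  # adds (1,2), (2,3), (3,4)
```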

        To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below:

        R> g <- rgraph(10, tp = 1.5/9)

        R> g.in <- ego.extract(g, neighborhood = "in")

        R> g.out <- ego.extract(g, neighborhood = "out")

        R> g.comb <- ego.extract(g, neighborhood = "combined")

        R> g.comb[1:3]

        $`1`
             [,1] [,2] [,3] [,4]
        [1,]    0    1    1    0
        [2,]    1    0    0    0
        [3,]    0    0    0    0
        [4,]    1    0    0    0

        $`2`
             [,1] [,2] [,3] [,4]
        [1,]    0    1    0    0
        [2,]    1    0    0    0
        [3,]    1    0    0    0
        [4,]    1    0    1    0

        $`3`
             [,1] [,2] [,3] [,4]
        [1,]    0    1    1    0
        [2,]    0    0    0    0
        [3,]    0    0    0    0
        [4,]    1    1    0    0

        R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)

        [1] TRUE

        R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)

        [1] TRUE

        R> all(sapply(g.comb, NROW) <= degree(g) + 1)

        [1] TRUE

        R> ego.size <- sapply(g.comb, NROW)

        R> if(any(ego.size > 2))
        +   sapply(g.comb[ego.size > 2], function(x) gden(x[-1, -1]))

                 1          2          3          4          5          6          7
        0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
                 8          9         10
        0.00000000 0.08333333 0.00000000

        Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.
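For readers without sna at hand, the ego-net extraction and the alter-only density used above can be sketched in base R. `ego_net` and `alter_density` are hypothetical helpers. Like the convention described for ego.extract, `ego_net` places ego in the first row and column, and it uses combined (in or out) neighborhoods.

```r
# Hypothetical ego-net extractor: induced subgraph on {v} and its combined
# neighborhood, with ego placed first.
ego_net <- function(A, v) {
  alters <- which((A[v, ] != 0 | A[, v] != 0) & seq_len(nrow(A)) != v)
  keep <- c(v, alters)
  A[keep, keep, drop = FALSE]
}

# Density among alters only: drop ego's row and column, then divide the
# number of alter-alter edges by the m(m - 1) ordered alter pairs.
alter_density <- function(E) {
  X <- E[-1, -1, drop = FALSE]
  m <- nrow(X)
  if (m < 2) return(NA)
  sum(X[row(X) != col(X)]) / (m * (m - 1))
}

A <- matrix(0, 5, 5)
A[1, 2] <- A[1, 3] <- A[4, 1] <- 1   # ego 1 has alters 2, 3, 4
A[2, 3] <- 1                          # one tie among the alters
E1 <- ego_net(A, 1)
alter_density(E1)                     # 1/6
```

Returning NA for fewer than two alters mirrors the guard `if(any(ego.size > 2))` in the sna example, since alter density is undefined in that case.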

        Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well:

        R> g <- rgraph(6)

        R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))

        [1] TRUE

        R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))

        [1] TRUE

        R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
        +   cmode = "freeman")/2)

        [1] TRUE

        R> gapply(g, c(1, 2), 1:6, mean)

        [1] 4.00 3.00 3.00 5.50 3.25 3.25

        R> gapply(g, c(1, 2), 1:6, mean, distance = 2)

        [1] 4.0 3.8 3.6 3.4 3.2 3.0

        To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

        R> g <- rgraph(10, tp = 2/9)

        R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)

        R> par(mfrow = c(3, 3))

        R> for(i in 1:9)
        +   gplot(neigh[i, , ], main = paste("Partial Neighborhood of Order", i))

        R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
        +   partial = FALSE)

        R> par(mfrow = c(3, 3))

        R> for(i in 1:9)
        +   gplot(neigh[i, , ], main = paste("Cumulative Neighborhood of Order", i))

        Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

        Visualization

        Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

        Figure 1: Sample partial neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th-order partial neighborhood of $v$. (Panels in a 3 × 3 grid, titled "Partial Neighborhood of Order 1" through "Partial Neighborhood of Order 9".)

        While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to those of gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl and may vary slightly by platform.

        Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

        Figure 2: Sample cumulative neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$. [Nine panels titled "Cumulative Neighborhood of Order 1" through "Cumulative Neighborhood of Order 9".]

        All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

        gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

        R> g <- rgraph(5, diag = TRUE)
        R> par(mfrow = c(2, 3))
        R> gplot(g, main = "Default")
        R> gplot(g, usecurve = TRUE, main = "Curved Edges")
        R> gplot(g, mode = "mds", main = "MDS Layout")
        R> gplot(g, mode = "circle", main = "Circular Layout")
        R> plot.sociomatrix(g, main = "Sociomatrix")
        R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
        +   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
        +   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

        Output from the above is shown in Figure 3.

        Figure 3: Sample visualizations using gplot with multiple layout and display options. [Six panels: "Default," "Curved Edges," "MDS Layout," "Circular Layout," "Sociomatrix" (rows and columns labeled 1 to 5), and "Multiple Options" (vertices labeled 1 to 5).]

        Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

        R> gplot3d(rgws(1, 5, 3, 1, 0))
        R> gplot3d(rgws(1, 5, 3, 1, 0.05))
        R> gplot3d(rgws(1, 5, 3, 1, 0.2))

        Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

        Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

        As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

        R> par(mfrow = c(1, 3))
        R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
        +   xlab = "", ylab = "", main = "gplot.vertex Example")
        R> gplot.vertex(cos((1:10)/10 * 2 * pi), sin((1:10)/10 * 2 * pi),
        +   col = 1:10, sides = 3:12, radius = 0.1)
        R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
        R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
        R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
        +   xlab = "", ylab = "", main = "gplot.loop Example")
        R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
        +   offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
        +   arrowhead = TRUE)
        R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
        +   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

        The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

        Figure 5: Examples of the use of gplot supplemental functions. [Three panels: "gplot.vertex Example" (axes from -1.5 to 1.5), "gplot.arrow Example" (axes from 1.0 to 2.0), and "gplot.loop Example" (axes from -2 to 2).]

        2.3. Descriptive indices

        The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will, however, see an important counterexample below.

        Node-level indices

        Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph-theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_d^+(v,G) \equiv |N^+(v)|$), indegree ($c_d^-(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_d^t(v,G) \equiv c_d^+(v,G) + c_d^-(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

        $$c_b(v,G) \equiv \sum_{\{v',v''\} \subset V \setminus \{v\}} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

        where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $g'(v',v,v'',G)/g(v',v'',G)$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

        $$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

        where $n = |V|$ and $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to at least one other vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
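The geodesic-based definitions above can be made concrete with a minimal base-R sketch. It counts geodesics by breadth-first search from every vertex and then applies the betweenness, stress, and closeness formulas directly. It also returns the three degree variants.

Some points to note:

- The helper names geodesics and node_indices are hypothetical, and this is not sna's code.
- As is usual for digraphs, the sketch sums over ordered pairs $(v', v'')$.
- It follows the convention above that closeness is 0 when some vertex is unreachable.

```r
# Hypothetical helpers (not sna's implementation). A is a 0/1 adjacency
# matrix with A[i, j] = 1 for an edge i -> j; loops are ignored.

# Breadth-first search from every source: geodesic distances and counts.
geodesics <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)  # D[s, w]: geodesic distance from s to w
  S <- matrix(0, n, n)    # S[s, w]: number of distinct (s, w) geodesics
  for (s in seq_len(n)) {
    D[s, s] <- 0
    S[s, s] <- 1
    frontier <- s
    while (length(frontier) > 0) {
      nxt <- integer(0)
      for (u in frontier) {
        for (w in which(A[u, ] > 0)) {
          if (D[s, w] == Inf) {        # w reached for the first time
            D[s, w] <- D[s, u] + 1
            nxt <- c(nxt, w)
          }
          if (D[s, w] == D[s, u] + 1)  # u precedes w on some geodesic
            S[s, w] <- S[s, w] + S[s, u]
        }
      }
      frontier <- nxt
    }
  }
  list(dist = D, count = S)
}

node_indices <- function(A) {
  A <- 1 * (A > 0)
  diag(A) <- 0
  n <- nrow(A)
  g <- geodesics(A)
  D <- g$dist
  S <- g$count
  btw <- stress <- numeric(n)
  for (v in seq_len(n)) {
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (i == v || k == v || i == k || S[i, k] == 0) next
        if (D[i, v] + D[v, k] == D[i, k]) {   # v lies on an (i, k) geodesic
          through <- S[i, v] * S[v, k]        # (i, k) geodesics containing v
          btw[v] <- btw[v] + through / S[i, k]
          stress[v] <- stress[v] + through    # same count, denominator set to 1
        }
      }
    }
  }
  list(outdegree = rowSums(A), indegree = colSums(A),
       total = rowSums(A) + colSums(A),
       betweenness = btw, stress = stress,
       closeness = (n - 1) / rowSums(D))      # an Inf row sum gives closeness 0
}

# Usage: a symmetric star with center 1
star <- matrix(0, 4, 4); star[1, 2:4] <- 1; star[2:4, 1] <- 1
node_indices(star)$betweenness  # center lies on all 6 ordered leaf-to-leaf geodesics
```

The triple loop mirrors the summation in the betweenness formula term by term. This is convenient for checking small cases by hand, but it costs $O(n^3)$ after the searches. Production code would instead accumulate dependencies during each search.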

        Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $\mathbf{A}$ (where $\mathbf{A}$ is the graph adjacency matrix). This can be interpreted variously as:
        - a measure of "coreness" (or membership in the largest dense cluster);
        - "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes);
        - the ability of $v$ to reach other vertices through a multiplicity of short walks.

        Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (\mathbf{I} - \beta \mathbf{A})^{-1} \mathbf{A} \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $\mathbf{A}$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats. As with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation.

        Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
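The two spectral formulas can be written out directly in base R. The sketch below is an illustration, not sna's evcent or bonpow, and the names evcent_sketch and bonpow_sketch are hypothetical. It takes eigenvector centrality as the absolute principal eigenvector returned by eigen(). It evaluates the Bonacich formula with solve(), which fails when $\mathbf{I} - \beta\mathbf{A}$ is singular, i.e., exactly where no solution exists.

The choice of $\alpha$ is an assumption of the sketch: it is set so that the squared scores sum to $|V|$.

```r
# Hypothetical helpers (not sna's evcent/bonpow).
# Eigenvector centrality: absolute principal eigenvector (unit length, as eigen() returns).
evcent_sketch <- function(A) {
  e <- eigen(A)
  k <- which.max(Re(e$values))            # principal (largest real) eigenvalue
  abs(Re(e$vectors[, k]))
}

# Bonacich power: alpha * (I - beta * A)^{-1} A 1. Assumed normalization:
# alpha is chosen so that the squared scores sum to |V|.
bonpow_sketch <- function(A, beta) {
  n <- nrow(A)
  raw <- solve(diag(n) - beta * A, A %*% rep(1, n))  # errors if I - beta * A is singular
  raw <- as.vector(raw)
  raw * sqrt(n / sum(raw^2))
}

# Usage: the path 1 - 2 - 3 (principal eigenvalue sqrt(2))
P3 <- matrix(c(0, 1, 0,
               1, 0, 1,
               0, 1, 0), 3, 3, byrow = TRUE)
evcent_sketch(P3)                     # 0.5, sqrt(2)/2, 0.5
bonpow_sketch(P3, 0) / rowSums(P3)    # constant: proportional to degree at beta = 0
bonpow_sketch(P3, 0.999 / sqrt(2))    # nearly parallel to the eigenvector scores
```

With this normalization, $\beta = 0$ returns outdegree multiplied by $\sqrt{|V| / \sum_v c_d^+(v,G)^2}$. The outdegree sequence in the example below is 6 3 5 2 5 4 4 4 5 6, whose squares sum to 208. For that sequence, the factor is $\sqrt{10/208} \approx 0.2192645$, the same constant ratio bonpow reports there. This suggests, without establishing it, that bonpow uses the same scaling by default.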

        An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v',v'')$ such that $(v',v),(v,v'') \in E$ and $(v',v'') \notin E$, that is, the number of pairs for which $v$ serves as a local bridge. Now let us posit a vector of states $s$ on $V$ such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.)

        Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i,v_j,v_k)$ with brokering vertex $v_j$, the possible brokerage roles are:
        - coordinating ($s_i = s_j = s_k$);
        - itinerant ($s_i = s_k$, $s_i \neq s_j$);
        - gatekeeping ($s_j = s_k$, $s_i \neq s_j$);
        - representative ($s_i = s_j$, $s_j \neq s_k$);
        - liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$).

        The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious. When in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
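The role definitions translate almost literally into code. The base-R sketch below counts, for each vertex $j$, the ordered pairs $(i, k)$ it bridges and classifies each by the three group labels. It uses the role codes that appear in the brokerage output below:

- w_I: coordinator;
- w_O: itinerant;
- b_IO: gatekeeper;
- b_OI: representative;
- b_O: liaison.

It computes raw counts only, with no Gould-Fernandez moments or z-scores. It is not sna's brokerage routine, and the name brokerage_sketch is hypothetical.

```r
# Hypothetical helper (not sna's brokerage): Gould-Fernandez role counts.
# A: 0/1 adjacency matrix (A[i, j] = 1 for i -> j); s: group label per vertex.
brokerage_sketch <- function(A, s) {
  n <- nrow(A)
  roles <- c("w_I", "w_O", "b_IO", "b_OI", "b_O")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {                 # j is the candidate broker
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (i == j || k == j || i == k) next
        # j bridges (i, k) when i -> j -> k exists but i -> k does not
        if (A[i, j] == 0 || A[j, k] == 0 || A[i, k] != 0) next
        role <- if (s[i] == s[j] && s[j] == s[k]) "w_I" else  # coordinator
                if (s[i] == s[k]) "w_O" else                  # itinerant
                if (s[j] == s[k]) "b_IO" else                 # gatekeeper
                if (s[i] == s[j]) "b_OI" else                 # representative
                "b_O"                                         # liaison
        out[j, role] <- out[j, role] + 1
      }
    }
  }
  cbind(out, t = rowSums(out))            # append total brokerage
}

# Usage: 1 -> 2 -> 3 with no 1 -> 3 tie; vertex 2 shares vertex 3's group
A <- matrix(0, 3, 3); A[1, 2] <- A[2, 3] <- 1
brokerage_sketch(A, c(1, 2, 2))           # vertex 2 is a gatekeeper (b_IO = 1)
```

The if/else chain works because each test runs only after the earlier ones have failed. For example, the itinerant branch is reached only when the three labels are not all equal. In that case, $s_i = s_k$ forces $s_j$ to differ.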

        To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

        R> dat <- rgraph(10)
        R> degree(dat, cmode = "indegree")
         [1] 4 4 8 2 4 5 4 4 3 6
        R> degree(dat, cmode = "outdegree")
         [1] 6 3 5 2 5 4 4 4 5 6
        R> degree(dat)
         [1] 10  7 13  4  9  9  8  8  8 12
        R> closeness(dat)
         [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
         [8] 0.6428571 0.6923077 0.7500000
        R> betweenness(dat)
         [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
         [7]  2.4500000  2.0333333  2.4166667  8.1833333
        R> stresscent(dat)
         [1] 21  6 27  1 14 15  6  7  7 21
        R> graphcent(dat)
         [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
         [8] 0.5000000 0.5000000 0.5000000
        R> evcent(dat)
         [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
         [8] 0.2734192 0.3642163 0.4121985
        R> infocent(dat)
         [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
         [9] 3.077481 3.704181

        As the above illustrates, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify two points:
        - the bonpow measure is proportional to outdegree when exponent = 0;
        - it is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix.

        bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector centrality and degree, reflecting a very different set of assumptions regarding the underlying social process.

        R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
         [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
         [8] 0.2192645 0.2192645 0.2192645
        R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
        +   evcent(dat, rescale = TRUE)) < 1e-10)
        [1] TRUE
        R> bonpow(dat, exponent = -0.5)
         [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
         [7]  0.4613310  0.9226621  0.3075540  2.1528782

        As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

        R> memb <- sample(1:3, 10, replace = TRUE)
        R> summary(brokerage(dat, memb))
        Gould-Fernandez Brokerage Analysis

        Global Brokerage Properties
                   t    E(t)   Sd(t)       z Pr(>|z|)
        w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
        w_O  25.0000 19.5459  7.0713  0.7713   0.4405
        b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
        b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
        b_O  28.0000 23.4551  5.3349  0.8519   0.3943
        t    93.0000 87.9565 13.6124  0.3705   0.7110

        Individual Properties (by Group)

        Group ID 1
             w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
        [1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
        [2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
        [3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
        [4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
                   b_O          t
        [1,] -1.186381  0.8682544
        [2,] -1.186381 -1.6099084
        [3,] -1.186381 -0.3708270
        [4,] -1.186381 -0.7838541

        Group ID 2
             w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
        [1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
        [2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
                      t
        [1,] -0.7838541
        [2,]  1.4877951

        Group ID 3
             w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
        [1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
        [2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
        [3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
        [4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
                   b_O           t
        [1,] 3.0624213  2.31384939
        [2,] 0.6345344  0.45522729
        [3,] 0.6345344  0.04220016
        [4,] 0.6345344 -0.57734055

        Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate (network-wide) brokerage scores for each of the brokerage roles: coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t). These appear along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs (as for the coordinator role within the two-member group 2 above).

        Graph-level indices

        Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable).

        Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong form ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i,j,k \in V$) and its weak form ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$). Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

        Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

        $$C(G) = \sum_{i=1}^{|V|} \left[ \max_{v \in V} c(v,G) - c(v_i,G) \right], \tag{1}$$

        i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

        $$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \tag{2}$$

        where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$.

        Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which uses them to generate the underlying score vector. In the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna. The mechanism may also be employed with user-supplied functions, provided that they accept the required arguments; examples are supplied in the sna manual.

        ² For instance, when all vertices are automorphically equivalent.
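Equations (1) and (2) are easy to check numerically. The normalization step also becomes concrete for one case where the maximum is easy to derive: outdegree in a loop-free digraph. There, no vertex can exceed outdegree $n-1$, so the raw index is at most $(n-1)(n-1)$, and the out-star attains that bound.

The base-R sketch below covers only that outdegree case. Other centralities need their own theoretical maxima. The helper names are hypothetical, and this is not sna's centralization.

```r
# Hypothetical helpers illustrating equations (1) and (2); not sna's code.
centralization_eq1 <- function(scores) sum(max(scores) - scores)                       # eq. (1)
centralization_eq2 <- function(scores) length(scores) * (max(scores) - mean(scores))   # eq. (2)

# Normalized outdegree centralization of a loop-free digraph on n vertices.
# The maximum raw value, (n - 1)^2, is attained by the out-star: one vertex
# sends ties to all n - 1 others, and no other vertex sends any.
outdegree_centralization <- function(A) {
  diag(A) <- 0
  od <- rowSums(A > 0)
  n <- nrow(A)
  centralization_eq1(od) / (n - 1)^2
}

# Usage
centralization_eq1(c(5, 2, 2, 1))   # 10, the same as centralization_eq2(c(5, 2, 2, 1))
star <- matrix(0, 5, 5); star[1, 2:5] <- 1
outdegree_centralization(star)      # 1: maximal concentration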

        In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices, supplied respectively by connectedness, efficiency, hierarchy, and lubness, describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

        The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below:
        - the difference between the default (dyadic) and edgewise reciprocity;
        - the standard and "census" variants of gtrans;
        - the various Krackhardt indices.

        hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

        R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
        R> gden(g)
        [1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
        R> grecip(g)
        [1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
        R> grecip(g, measure = "edgewise")
        [1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
        R> grecip(g) == 1 - hierarchy(g)
        [1] TRUE TRUE TRUE TRUE TRUE
        R> gtrans(g)
        [1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
        R> gtrans(g, measure = "weakcensus")
        [1]   0  21 106 254 582
        R> connectedness(g)
        [1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
        R> efficiency(g)
        [1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
        R> hierarchy(g, measure = "krackhardt")
        [1] 1.0 0.2 0.0 0.0 0.0
        R> lubness(g)
        [1] 0.2 1.0 1.0 1.0 1.0

        ³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
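The definitions behind gden, grecip, and the weak form of gtrans can be cross-checked on small graphs. The base-R sketch below computes, for a loop-free digraph:

- density;
- dyadic reciprocity (the share of symmetric dyads);
- edgewise reciprocity (the share of edges that are reciprocated);
- weak transitivity (the share of two-paths $i \to j \to k$, with $i \neq k$, that are closed by an $i \to k$ tie).

It follows the verbal definitions given above. It is not sna's code and does not handle missing data or valued edges. The name gli_sketch is hypothetical.

```r
# Hypothetical helper (not sna's gden/grecip/gtrans) for a loop-free digraph.
gli_sketch <- function(A) {
  A <- 1 * (A > 0)
  diag(A) <- 0
  n <- nrow(A)
  up <- upper.tri(A)
  a <- A[up] == 1           # i -> j for each unordered pair i < j
  b <- t(A)[up] == 1        # j -> i for the same pair
  mut <- sum(a & b); asym <- sum(xor(a, b)); null <- sum(!a & !b)
  m <- sum(A)               # number of edges
  two_paths <- A %*% A      # [i, k] = number of two-paths i -> j -> k
  diag(two_paths) <- 0      # i -> j -> i is not at risk for transitivity
  c(density        = m / (n * (n - 1)),
    dyadic_recip   = (mut + null) / (mut + asym + null),  # symmetric dyads
    edgewise_recip = 2 * mut / m,                          # reciprocated edges
    weak_trans     = sum(two_paths * A) / sum(two_paths))  # closed two-paths
}

# Usage: the transitive triple 1 -> 2, 2 -> 3, 1 -> 3
T3 <- matrix(0, 3, 3); T3[1, 2] <- T3[2, 3] <- T3[1, 3] <- 1
gli_sketch(T3)   # density 0.5, no reciprocity, weak transitivity 1
```

In this sketch, a graph with no two-paths gives a weak-transitivity ratio of 0/0, which comes out as NaN. By contrast, gtrans reports 1 for the first graph in the example above, whose weak census is 0. How to treat such degenerate cases is a convention, and the sketch does not try to match sna's.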

        centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both the single-graph and the multiple-graph forms are illustrated in the following example:

        R> centralization(g, degree, cmode = "outdegree")
        [1] 0.1728395
        R> centralization(g, betweenness)
        [1] 0
        R> apply(g, 1, centralization, degree, cmode = "outdegree")
        [1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
        R> apply(g, 1, centralization, betweenness)
        [1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

        As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

        Journal of Statistical Software 27

        R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
        +   n <- NROW(dat)
        +   if(tmaxdev)
        +     return((n - 1) * choose(n - 1, 2))
        +   odeg <- degree(dat, cmode = "outdegree")
        +   choose(odeg, 2)
        + }
        R> apply(g, 1, centralization, o2scent)
        [1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

        Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

        2.4. Connectivity and subgraph statistics

        Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist.

        An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V,E)$ be a (possibly directed) graph of order $N$, and let $d(i,j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j,k) \le i)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
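The formula for $s_i$ can be transcribed almost directly into base R once geodesic distances are available. The sketch below computes them by breadth-first search. Unreachable pairs get distance Inf and so never count as lying within distance $i$. Because $d(j,j) = 0$, the series always starts at $s_0 = 1/N$.

The helper names geo_dist and structure_stats_sketch are hypothetical. sna's own routine for this is structure.statistics.

```r
# Hypothetical helpers (not sna's structure.statistics).
# All-pairs geodesic distances by breadth-first search; Inf = unreachable.
geo_dist <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    while (length(frontier) > 0) {
      # unreached vertices with an edge from some frontier vertex
      nbrs <- which(colSums(A[frontier, , drop = FALSE]) > 0 & D[s, ] == Inf)
      D[s, nbrs] <- D[s, frontier[1]] + 1   # all frontier vertices share one distance
      frontier <- nbrs
    }
  }
  D
}

# s_i = N^-2 * #{(j, k) : d(j, k) <= i}, for i = 0, ..., N - 1
structure_stats_sketch <- function(A) {
  N <- nrow(A)
  D <- geo_dist(A)
  vapply(0:(N - 1), function(i) sum(D <= i) / N^2, numeric(1))
}

# Usage: the directed path 1 -> 2 -> 3
A <- matrix(0, 3, 3); A[1, 2] <- A[2, 3] <- 1
structure_stats_sketch(A)   # 3/9, 5/9, 6/9: the path never reaches back
```

The series is non-decreasing by construction. It reaches 1 only if every vertex can reach every other vertex, i.e., only for strongly connected graphs.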

        At least since Davis and Leinhardt (1972) social network analysts have recognized the im-portance of subgraph frequencies as an indicator of underlying structural tendencies Thistheory has been considerably enriched in recent decades (see eg Frank and Strauss 1986Pattison and Robins 2002) particularly with respect to the connection between edgewisedependence conditions and structural biases (see Wasserman and Robins (2005) for an ap-proachable introduction) It has also been recognized that constraints on properties of small

        28 Social Network Analysis with sna

        subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

        Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbb{E}\,s(H) = s(G), \mathbb{E}\,s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
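        The following minimal base-R sketch makes the dyad census and the CUG p-values concrete. It does not call sna. The helper names dyad_census, runif_digraph_m, and cug_pvalues are hypothetical. The null model conditions only on order and number of arcs. The two p-values are the Monte Carlo fractions estimating Pr(t(H) >= t(G)) and Pr(t(H) <= t(G)) described above.

```r
# Dyad census of a directed 0/1 adjacency matrix (illustrative, not sna's code).
dyad_census <- function(A) {
  diag(A) <- 0
  up <- upper.tri(A)
  ij <- A[up]        # tie i -> j for each unordered pair with i < j
  ji <- t(A)[up]     # tie j -> i for the same pairs
  c(Mut  = sum(ij == 1 & ji == 1),
    Asym = sum(xor(ij == 1, ji == 1)),
    Null = sum(ij == 0 & ji == 0))
}

# Uniform draw from all loopless digraphs on n vertices with exactly m arcs.
runif_digraph_m <- function(n, m) {
  A <- matrix(0, n, n)
  off <- which(row(A) != col(A))
  A[sample(off, m)] <- 1
  A
}

# One-tailed CUG p-values for `stat`, conditioning on order and arc count.
cug_pvalues <- function(A, stat, reps = 1000) {
  n <- nrow(A)
  m <- sum(A[row(A) != col(A)])
  obs <- stat(A)
  sims <- replicate(reps, stat(runif_digraph_m(n, m)))
  c(upper = mean(sims >= obs), lower = mean(sims <= obs))
}

# Example: three fully reciprocated dyads on six vertices.
set.seed(1)
A <- matrix(0, 6, 6)
A[1, 2] <- A[2, 1] <- 1
A[3, 4] <- A[4, 3] <- 1
A[5, 6] <- A[6, 5] <- 1
dyad_census(A)                      # Mut 3, Asym 0, Null 12
mutual <- function(G) dyad_census(G)[["Mut"]]
p <- cug_pvalues(A, mutual, reps = 500)
p
```

        With only six arcs, three mutual dyads is the maximum possible. The lower-tail p-value is therefore 1 and the upper-tail p-value is close to 0.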

        Example

        To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

        R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))

        R> apply(dyad.census(g1), 2, mean)

          Mut  Asym  Null
         1.00 12.84 31.16

        R> apply(triad.census(g1), 2, mean)

          003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
        40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
         120C   210   300
         0.30  0.00  0.00

        R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))

        R> apply(dyad.census(g2), 2, mean)

          Mut  Asym  Null
         8.84  9.26 26.90

        R> apply(triad.census(g2), 2, mean)

          003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
        25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
         120C   210   300
         1.34  2.28  0.60

        R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))

        R> apply(dyad.census(g3), 2, mean)

          Mut  Asym  Null
         8.94 20.44 15.62

        R> apply(triad.census(g3), 2, mean)

          003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
         4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
         120C   210   300
         8.40  7.38  1.50

        R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
        +    dyadic.tabulation = "bylength")$path.count

           Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
        1   35   8   3    9   2   10    9   3   10   8   8
        2  119  40  10   47   8   59   47  13   56  39  38
        3  346 155  41  180  35  223  185  52  211 149 153
        4  791 457 130  504 114  601  527 163  572 425 462
        5 1351 964 303 1000 282 1143 1061 375 1104 884 990

        R> kcycle.census(g3[1,,], maxlen = 5,
        +    cycle.comembership = "bylength")$cycle.count

          Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
        2   9  2  1  2  0  3  2  0  4  3   1
        3  24  7  1 11  0 15  9  2 12  8   7
        4  42 16  1 23  2 32 26  3 30 19  16
        5  72 39  5 48  8 60 54 10 57 36  43


        R> component.dist(g3[1,,])

        $membership
         [1] 1 1 1 1 1 1 1 1 1 1

        $csize
        [1] 10

        $cdist
         [1] 0 0 0 0 0 0 0 0 0 1

        R> structure.statistics(g3[1,,])

           0    1    2    3    4    5    6    7    8    9
        0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

        In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities (OP = "-") as a test statistic. We first test against a CUG hypothesis conditioning only on order, and second against a CUG hypothesis conditioning on both order and density.

        R> g4 <- g1[1:2,,]

        R> g4[2,,] <- g2[1,,]

        R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
        +    g1 = 1, g2 = 2)

        R> summary(cug)

        CUG Test Results

        Estimated p-values:
                p(f(rnd) >= f(d)): 0.299
                p(f(rnd) <= f(d)): 0.708

        Test Diagnostics:
                Test Value (f(d)): 0.04444444
                Replications: 1000
                Distribution Summary:
                        Min:   -0.3333333
                        1stQ:  -0.06666667
                        Med:   0
                        Mean:  -0.001288889
                        3rdQ:  0.06666667
                        Max:   0.3555556

        R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)

        R> summary(cug)

        CUG Test Results

        Estimated p-values:
                p(f(rnd) >= f(d)): 0.967
                p(f(rnd) <= f(d)): 0.039

        Test Diagnostics:
                Test Value (f(d)): 0.04444444
                Replications: 1000
                Distribution Summary:
                        Min:   -0.06666667
                        1stQ:  0.1555556
                        Med:   0.2222222
                        Mean:  0.2215333
                        3rdQ:  0.2888889
                        Max:   0.5333333

        A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

        2.5 Position and role analysis

        The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph-theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
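        The directed form of the definition above translates almost literally into a check on adjacency-matrix rows and columns. The following base-R sketch illustrates it; the function name is_struct_equiv is hypothetical and is not part of sna. Two vertices are equivalent when their out-ties and in-ties agree on every vertex other than themselves.

```r
# TRUE iff vertices v and w of the digraph A are structurally equivalent:
# identical out-ties and in-ties to every vertex other than v and w.
is_struct_equiv <- function(A, v, w) {
  others <- setdiff(seq_len(nrow(A)), c(v, w))
  all(A[v, others] == A[w, others]) && all(A[others, v] == A[others, w])
}

# Example: an undirected star with center 1 and leaves 2, 3, 4.
A <- matrix(0, 4, 4)
A[1, 2:4] <- 1
A[2:4, 1] <- 1
is_struct_equiv(A, 2, 3)   # TRUE: both leaves have the same single alter
is_struct_equiv(A, 1, 2)   # FALSE: the center is tied to leaves 3 and 4, leaf 2 is not
```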

        In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
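        As an example of one such index, the following sketch computes Euclidean structural-equivalence distances over concatenated rows and columns. The function se_euclidean is hypothetical. It skips the cells that involve the pair itself, which mirrors the definition of exact equivalence; sedist's own treatment of those cells may differ. Exactly equivalent vertices are at distance 0.

```r
# Pairwise structural-equivalence distances (Euclidean) for a 0/1 digraph A.
# For each pair (v, w) we compare rows and columns, skipping entries that
# involve v or w themselves (a choice made here; sedist's handling may differ).
se_euclidean <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (v in seq_len(n)) {
    for (w in seq_len(n)) {
      others <- setdiff(seq_len(n), c(v, w))
      D[v, w] <- sqrt(sum((A[v, others] - A[w, others])^2) +
                      sum((A[others, v] - A[others, w])^2))
    }
  }
  D
}

A <- matrix(0, 4, 4)   # star: center 1, leaves 2-4
A[1, 2:4] <- 1
A[2:4, 1] <- 1
D <- se_euclidean(A)
D[2, 3]   # 0: leaves are exactly equivalent
D[1, 2]   # 2: two differing row cells plus two differing column cells
```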


        This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
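        The distance-then-cluster pipeline can be approximated with base R alone. The following sketch computes Hamming structural-equivalence distances (se_hamming is a hypothetical helper), passes them to hclust with complete linkage, and cuts the tree with cutree. The choice of Hamming distance and complete linkage is ours for illustration; consult the sna manual for equiv.clust's actual defaults.

```r
# Hamming structural-equivalence distances for a 0/1 digraph A, ignoring the
# cells that involve the compared pair itself.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (v in seq_len(n)) for (w in seq_len(n)) {
    others <- setdiff(seq_len(n), c(v, w))
    D[v, w] <- sum(A[v, others] != A[w, others]) +
               sum(A[others, v] != A[others, w])
  }
  D
}

# Two positions: vertices 1-3 send ties to every vertex in 4-6, and nothing else.
A <- matrix(0, 6, 6)
A[1:3, 4:6] <- 1
hc <- hclust(as.dist(se_hamming(A)), method = "complete")
classes <- cutree(hc, k = 2)   # cut the dendrogram into two positions
classes                        # 1 1 1 2 2 2
```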

        After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
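        The density block type is simply a mean over the cells of each block. The following sketch computes a density image from a membership vector (block_density is a hypothetical name, and classes are assumed to be coded 1..k). Following the convention above, self-ties on the diagonal of within-class blocks are excluded.

```r
# Density block image: mean tie value in each block of A defined by a
# membership vector cls (classes coded 1..k). Self-ties are excluded.
block_density <- function(A, cls) {
  k <- max(cls)
  img <- matrix(NA_real_, k, k)
  for (a in seq_len(k)) {
    for (b in seq_len(k)) {
      sub <- A[cls == a, cls == b, drop = FALSE]
      if (a == b) {
        na <- nrow(sub)
        # singleton classes have no off-diagonal cells, so they stay NA
        if (na > 1) img[a, b] <- (sum(sub) - sum(diag(sub))) / (na * (na - 1))
      } else {
        img[a, b] <- mean(sub)
      }
    }
  }
  img
}

A <- matrix(0, 6, 6)
A[1:3, 4:6] <- 1            # class 1 sends every possible tie to class 2
A[1, 2] <- 1                # plus one within-class tie
cls <- c(1, 1, 1, 2, 2, 2)
img <- block_density(A, cls)
img                         # [1,1] = 1/6, [1,2] = 1, row 2 all 0
```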

        The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
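        Expansion of a density image amounts to building the Bernoulli parameter matrix and drawing each cell independently. The following sketch does this with a hypothetical function expand_density_image. It takes a class-by-class density matrix and a vector of class sizes, so the new graph's order is set by those sizes rather than by any original graph.

```r
# Expand a density block image into a random graph: every cell (i, j) is an
# independent Bernoulli draw with probability img[class(i), class(j)].
expand_density_image <- function(img, sizes) {
  cls <- rep(seq_along(sizes), times = sizes)   # class of each new vertex
  P <- img[cls, cls]                            # Bernoulli parameter matrix
  G <- matrix(rbinom(length(P), 1, P), nrow(P))
  diag(G) <- 0                                  # no self-ties
  G
}

set.seed(2)
img <- matrix(c(0.9, 0.1, 0.6, 0.2), 2, 2)
G <- expand_density_image(img, sizes = c(4, 6))   # a 10-vertex draw
dim(G)
```

        Calling the function repeatedly with the same image yields a Monte Carlo sample from the corresponding inhomogeneous Bernoulli graph model.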

        Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
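        The Boolean inner product mentioned above takes one line in base R. The following sketch uses a hypothetical function graph_compose and does not use sna's operator. The second example shows the loop phenomenon: composing a single mutual dyad with itself puts ties on the diagonal.

```r
# Composition of two graphs on the same vertex set: (v, v') is an edge iff some
# v'' has (v, v'') in the first graph and (v'', v') in the second.
graph_compose <- function(A, B) (A %*% B > 0) * 1

A <- matrix(0, 3, 3); A[1, 2] <- 1
B <- matrix(0, 3, 3); B[2, 3] <- 1
C <- graph_compose(A, B)   # single edge 1 -> 3
C

M <- matrix(0, 3, 3); M[1, 2] <- M[2, 1] <- 1   # one mutual dyad, no loops
diag(graph_compose(M, M))  # 1 1 0: loops appear at vertices 1 and 2
```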

        Example

        To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

        R> gp <- sapply(runif(20, 0, 1), rep, 20)

        R> g <- rgraph(20, tprob = gp)

        R> eq <- equiv.clust(g)

        R> b <- blockmodel(g, eq, h = 15)

        R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))

        R> ge

              [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
         [1,]    0    0    1    1    0    0    1    0    0     1     1     1
         [2,]    0    0    1    1    0    0    1    1    0     1     1     1
         [3,]    0    0    0    0    1    1    1    1    0     0     0     0
         [4,]    0    0    1    0    1    1    1    1    0     0     0     0
         [5,]    0    0    0    0    0    0    0    0    1     1     0     0
         [6,]    0    1    1    0    0    0    1    0    1     1     0     0
         [7,]    0    0    1    1    0    1    0    1    1     1     0     1
         [8,]    0    0    1    1    0    0    1    0    0     1     0     1
         [9,]    0    0    0    1    1    1    0    1    0     0     0     0
        [10,]    0    0    1    1    0    1    1    1    1     0     1     1
        [11,]    0    0    0    0    0    0    1    1    0     0     0     1
        [12,]    0    1    1    1    0    0    0    1    0     0     1     0

        2.6 Exploratory edge set comparison

        One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

        $$\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \tag{3}$$

        where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean, and the sums run over all ordered pairs of distinct vertices. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
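        Equation (3) and the related measures can be checked directly in base R. In the following sketch, graph_cov, graph_cor, and hamming are hypothetical helpers, not the sna functions. They use only off-diagonal cells, matching the $|V|(|V|-1)$ normalization. The resulting graph correlation coincides with R's cor applied to the vectorized off-diagonal entries, because the normalizing constant cancels.

```r
# Graph covariance (equation 3), correlation, and Hamming distance for two
# digraphs on the same vertex set; only off-diagonal cells are used.
graph_cov <- function(A, B) {
  off <- row(A) != col(A)
  a <- A[off]; b <- B[off]
  sum((a - mean(a)) * (b - mean(b))) / length(a)   # length(a) = |V|(|V| - 1)
}
graph_cor <- function(A, B) graph_cov(A, B) / sqrt(graph_cov(A, A) * graph_cov(B, B))
hamming <- function(A, B) {
  off <- row(A) != col(A)
  sum(A[off] != B[off])
}

set.seed(3)
n <- 8
A <- matrix(rbinom(n * n, 1, 0.4), n); diag(A) <- 0
B <- A
B[1, 2] <- 1 - B[1, 2]; B[3, 5] <- 1 - B[3, 5]   # flip two cells
hamming(A, B)      # 2
graph_cor(A, B)    # high, but below 1
```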

        The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings. This results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
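        For very small graphs, the structural correlation can be computed exactly by trying every relabeling of one graph, which is the idea behind exhaustive search. The following sketch is limited to a handful of vertices, since there are n! relabelings. The helpers perms and struct_cor_exhaustive are hypothetical and unrelated to sna's internals. For an isomorphic pair, the maximum is 1 even when the labeled correlation is not.

```r
# All permutations of 1..n (n! rows), built recursively.
perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  p <- perms(n - 1)
  out <- NULL
  for (i in seq_len(n)) {
    # insert n at position i of every permutation of 1..(n - 1)
    out <- rbind(out, t(apply(p, 1, function(r) append(r, n, after = i - 1))))
  }
  out
}

graph_cor <- function(A, B) {
  off <- row(A) != col(A)
  cor(A[off], B[off])
}

# Structural correlation by exhaustive search: maximize over relabelings of B.
struct_cor_exhaustive <- function(A, B) {
  P <- perms(nrow(B))
  max(apply(P, 1, function(p) graph_cor(A, B[p, p])))
}

A <- matrix(0, 5, 5)
A[cbind(1:4, 2:5)] <- 1       # directed path 1 -> 2 -> 3 -> 4 -> 5
A[1, 3] <- 1
q <- c(3, 1, 5, 2, 4)
B <- A[q, q]                  # an isomorphic copy with scrambled labels
graph_cor(A, B)               # labeled correlation: generally below 1
struct_cor_exhaustive(A, B)   # equals 1 (up to floating point)
```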

        Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes the following functions:
        centralgraph returns the graph minimizing the Hamming distance to all graphs in the input set.
        gclust.boxstats shows distributions of graph statistics based on a hierarchical clustering of networks.
        gclust.centralgraph returns the central graphs for each element of a network clustering solution.
        gdist.plotdiff plots distances between networks against differences in their properties.
        gdist.plotstats displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure.
        Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
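        The "trivial" network PCA mentioned above can be sketched as follows. Build the graph covariance matrix among a set of networks, then pass it to eigen. The toy data here (six noisy copies of one base graph) and the helper graph_cov are illustrative assumptions. The eigenvalue shares indicate how much of the total graph variance each component captures.

```r
graph_cov <- function(A, B) {
  off <- row(A) != col(A)
  a <- A[off]; b <- B[off]
  mean((a - mean(a)) * (b - mean(b)))   # equation (3)
}

set.seed(4)
n <- 10
base <- matrix(rbinom(n * n, 1, 0.3), n)
graphs <- lapply(1:6, function(k) {
  G <- base
  flip <- matrix(rbinom(n * n, 1, 0.1), n) == 1   # perturb roughly 10% of cells
  G[flip] <- 1 - G[flip]
  diag(G) <- 0
  G
})

# Graph covariance matrix among the six networks, then its eigendecomposition.
S <- sapply(graphs, function(Gi) sapply(graphs, function(Gj) graph_cov(Gi, Gj)))
pca <- eigen(S, symmetric = TRUE)
round(pca$values / sum(pca$values), 3)   # share of total graph variance per component
pca$vectors[, 1]                          # loadings of each network on component 1
```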

        In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses. These test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
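        The QAP logic can be written in a few lines of base R: relabel one graph's vertices at random and recompute the statistic. The following sketch does this for the graph correlation. The helper qap_cor_test is hypothetical; sna's qaptest provides this functionality for arbitrary graph statistics. Rows and columns are permuted together, so each simulated graph is isomorphic to the observed one.

```r
graph_cor <- function(A, B) {
  off <- row(A) != col(A)
  cor(A[off], B[off])
}

# QAP test: compare the observed graph correlation with its distribution when
# B's vertices are randomly relabeled (rows and columns permuted together).
qap_cor_test <- function(A, B, reps = 1000) {
  obs <- graph_cor(A, B)
  n <- nrow(B)
  sims <- replicate(reps, {
    p <- sample.int(n)
    graph_cor(A, B[p, p])
  })
  c(obs = obs, p_upper = mean(sims >= obs), p_lower = mean(sims <= obs))
}

set.seed(5)
n <- 12
A <- matrix(rbinom(n * n, 1, 0.5), n); diag(A) <- 0
B <- A
flip <- sample(which(row(A) != col(A)), 10)   # B = A with 10 cells flipped
B[flip] <- 1 - B[flip]
res <- qap_cor_test(A, B, reps = 500)
res
```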


        Example

        We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled (structural) correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

        R> g1 <- rgraph(5)

        R> g2 <- rgraph(5)

        R> g3 <- rmperm(g2)

        R> gcor(g1, g2)

        [1] -0.1336306

        R> gcor(g1, g3)

        [1] 0.08908708

        R> gcor(g2, g3)

        [1] -0.4583333

        R> gscor(g1, g2, reps = 1e5)

        [1] 0.5345225

        R> gscor(g1, g3, reps = 1e5)

        [1] 0.5345225

        R> gscor(g2, g3, reps = 1e5)

        [1] 1

        Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

        R> x <- rgraph(20, 4)

        R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]

        R> nl <- netlm(y, x)

        R> summary(nl)

        OLS Network Model

        Residuals:
                   0%           25%           50%           75%          100%
        -2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

        Coefficients:
                    Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
        (intercept) -1.467115e-14 0.000   1.000   0.000
        x1           1.000000e+00 1.000   0.000   0.000
        x2           4.000000e+00 1.000   0.000   0.000
        x3           2.000000e+00 1.000   0.000   0.000
        x4          -7.553990e-16 0.369   0.631   0.756

        Residual standard error: 1.169e-14 on 375 degrees of freedom
        Multiple R-squared: 1    Adjusted R-squared: 1
        F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

        Test Diagnostics:

                Null Hypothesis: qap
                Replications: 1000
                Coefficient Distribution Summary:

               (intercept)         x1         x2         x3         x4
        Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
        1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
        Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
        Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
        3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
        Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

        As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

        R> x <- rgraph(20, 4)

        R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]

        R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))

        R> y <- rgraph(20, tprob = yp)

        R> nl <- netlogit(y, x)

        R> summary(nl)

        Network Logit Model

        Coefficients:
                    Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
        (intercept)  0.3077180  1.3603173 0.680   0.320   0.503
        x1           0.9411361  2.5628914 0.985   0.015   0.019
        x2           4.1473292 63.2648084 1.000   0.000   0.000
        x3           1.8630911  6.4436238 1.000   0.000   0.000
        x4          -0.1757242  0.8388493 0.318   0.682   0.642

        Goodness of Fit Statistics:

        Null deviance: 526.7919 on 380 degrees of freedom
        Residual deviance: 174.1572 on 375 degrees of freedom
        Chi-Squared test of fit improvement:
                 352.6347 on 5 degrees of freedom, p-value 0
        AIC: 184.1572    BIC: 203.8580
        Pseudo-R^2 Measures:
                (Dn-Dr)/(Dn-Dr+dfn): 0.481324
                (Dn-Dr)/Dn: 0.6694004

        Contingency Table (predicted (rows) x actual (cols)):

              0   1
        0     0   0
        1    39 341

                Total Fraction Correct: 0.8973684
                Fraction Predicted 1s Correct: 0.8973684
                Fraction Predicted 0s Correct: NaN
                False Negative Rate: 0
                False Positive Rate: 1

        Test Diagnostics:

                Null Hypothesis: qap
                Replications: 1000
                Distribution Summary:

               (intercept)        x1        x2        x3        x4
        Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
        1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
        Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
        Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
        3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
        Max       1.704128  1.408468  1.214650  1.100783  1.533500

        It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties. This is largely a consequence of the high density in the dependent graph (approximately 0.90; 341 of its 380 possible ties are present). It is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

        2.7 Network inference and process models

        A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired); estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1. The two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
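        To see how false positive and false negative rates turn reports into evidence, consider the simplified case in which the error rates are known. In that case, edges are a posteriori independent, and each edge's posterior follows from Bayes' rule. The following sketch covers only this special case; it is not bbnam, which also samples the error rates. The helper names edge_posterior and posterior_graph are hypothetical. The rates in the usage lines echo the means used in the example below.

```r
# Posterior probability that a single edge is present, given binary reports y
# from several informants with KNOWN false negative (em) and false positive (ep)
# rates and a Bernoulli prior (simplified fixed-error-rate special case).
edge_posterior <- function(y, em, ep, prior = 0.5) {
  log1 <- log(prior)     + sum(y * log(1 - em) + (1 - y) * log(em))
  log0 <- log(1 - prior) + sum(y * log(ep)     + (1 - y) * log(1 - ep))
  1 / (1 + exp(log0 - log1))
}

# Apply edge by edge to an informant x vertex x vertex array of reports.
posterior_graph <- function(dat, em, ep, prior = 0.5) {
  n <- dim(dat)[2]
  P <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n))
    if (i != j) P[i, j] <- edge_posterior(dat[, i, j], em, ep, prior)
  P
}

em <- rep(0.375, 20); ep <- rep(0.038, 20)
edge_posterior(c(rep(1, 10), rep(0, 10)), em, ep)   # half report the tie: near 1
edge_posterior(rep(0, 20), em, ep)                  # nobody reports it: near 0
```

        Because false positives are rare here, even half of the informants reporting a tie is strong evidence that it exists.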

        Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

        $$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)}$$

        where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
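        The conditional nomination probability above is a one-line function. The following sketch evaluates it for a baseline probability and two hypothetical bias events; the name bn_edge_prob and the chosen probabilities are illustrative. It then checks the formula with a direct simulation of "nominate if the baseline event or any bias trial succeeds".

```r
# Conditional probability that i nominates j given the trace state, where the
# baseline event has probability p_base and bias event k has probability
# p_bias[k] and has potentially occurred s[k] times for the (i, j) dyad.
bn_edge_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

bn_edge_prob(0.17, c(0.5, 0.25), s = c(0, 0))   # no bias events: 0.17
bn_edge_prob(0.17, c(0.5, 0.25), s = c(1, 2))   # 1 - 0.83 * 0.5 * 0.75^2 = 0.7665625

# Monte Carlo check: one baseline trial, one trial of bias 1, two of bias 2.
set.seed(6)
sim <- replicate(20000, any(runif(1) < 0.17, runif(1) < 0.5, runif(2) < 0.25))
mean(sim)   # close to 0.7666
```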

        While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models constitute one important family of processes which are often used for this purpose (see Doreian (1990); see Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models). These models are of the form

        $$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

        $$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$

        where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores. An AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects). An MA term, by contrast, implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
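        A pure network AR process (equation 4 with a single weight matrix and no MA term) can be simulated from its reduced form $y = (I - \theta W)^{-1}(X\beta + \varepsilon)$. The following sketch does so and then computes Moran's I. The random graph, the row normalization, and the parameter values are illustrative assumptions; morans_i is a hypothetical helper, not nacf, and lnam is not called.

```r
set.seed(7)
n <- 30
W <- matrix(rbinom(n * n, 1, 0.1), n)
diag(W) <- 0
W <- W / pmax(rowSums(W), 1)          # row-normalize; isolates keep a zero row
X <- cbind(1, rnorm(n))
beta <- c(1, 2)
theta <- 0.6
eps <- rnorm(n)
# Reduced form of equation (4) with one W and no MA term:
# y = theta * W y + X beta + eps  implies  y = (I - theta W)^{-1} (X beta + eps)
y <- drop(solve(diag(n) - theta * W, X %*% beta + eps))

# Moran's I for responses y under weight matrix W.
morans_i <- function(y, W) {
  z <- y - mean(y)
  (length(y) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}
morans_i(y, W)
```

        In the second check below, two disjoint mutual pairs whose members share the same response give I = 1, and alternating responses give I = -1.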


        Example

        To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary method for the returned object describes the resulting posterior properties, along with various diagnostics.

        R> g <- rgraph(20)

        R> ep <- rbeta(20, 1, 25)

        R> em <- rbeta(20, 15, 25)

        R> dat <- array(dim = c(20, 20, 20))

        R> for(i in 1:20)

        +    dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))

        R> pnet <- matrix(0.5, ncol = 20, nrow = 20)

        R> pem <- matrix(nrow = 20, ncol = 2)

        R> pem[,1] <- 2

        R> pem[,2] <- 11

        R> pep <- matrix(nrow = 20, ncol = 2)

        R> pep[,1] <- 2

        R> pep[,2] <- 11

        R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,

        +    epprior = pep, burntime = 300, draws = 100)

        R> summary(b)

        Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

        Multiple Error Probability Model

        Marginal Posterior Network Distribution:

              a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
        a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
        a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
        a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
        a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
        a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
        a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
        a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
        a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
        a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
        a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
        a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
        a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
        a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
        a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
        a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
        a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
        a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
        a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
        a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
        a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
             a16  a17  a18  a19  a20
        a1  1.00 1.00 1.00 0.00 0.00
        a2  1.00 0.00 0.00 1.00 1.00
        a3  0.00 0.00 1.00 0.00 1.00
        a4  0.00 1.00 0.00 1.00 1.00
        a5  1.00 1.00 0.00 0.00 1.00
        a6  0.00 0.00 0.00 1.00 0.00
        a7  1.00 0.00 0.00 0.00 0.00
        a8  0.00 0.00 1.00 0.00 1.00
        a9  1.00 1.00 1.00 1.00 0.00
        a10 0.00 1.00 1.00 1.00 0.00
        a11 1.00 1.00 0.00 1.00 1.00
        a12 1.00 0.00 1.00 1.00 0.00
        a13 0.00 0.00 1.00 0.00 1.00
        a14 0.00 0.00 0.00 0.00 0.00
        a15 1.00 0.00 1.00 0.00 1.00
        a16 0.00 0.00 1.00 0.00 0.00
        a17 0.00 0.00 1.00 0.00 1.00
        a18 0.00 0.00 0.00 1.00 0.00
        a19 0.00 0.00 0.00 0.00 1.00
        a20 1.00 1.00 1.00 1.00 0.00

        Marginal Posterior Global Error Distribution:

                     e^-       e^+
        Min    0.1443951 0.0004238
        1stQ   0.3126975 0.0167584
        Median 0.3678306 0.0294646
        Mean   0.3783663 0.0493688
        3rdQ   0.4423027 0.0574099
        Max    0.6909116 0.2262239

        Marginal Posterior Error Distribution (by observer):

        Probability of False Negatives (e^-):

               Min   1stQ Median   Mean   3rdQ    Max
        o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
        o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
        o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
        o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
        o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
        o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
        o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
        o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
        o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
        o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
        o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
        o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
        o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
        o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
        o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
        o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
        o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
        o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
        o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
        o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

        Probability of False Positives (e^+):

                  Min      1stQ    Median      Mean      3rdQ       Max
        o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
        o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
        o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
        o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
        o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
        o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
        o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
        o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
        o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
        o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
        o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
        o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
        o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
        o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
        o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
        o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
        o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
        o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
        o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
        o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

        MCMC Diagnostics:

            Replicate Chains: 5
            Burn Time: 300
            Draws per Chain: 20 Total Draws: 100
            Potential Scale Reduction (G&R's sqrt(Rhat)):
                Max:  1.003116
                Med:  0.9992194
                IQR:  0.0004545115

        R> cor(em, apply(b$em, 2, median))
        [1] 0.9187894
        R> cor(ep, apply(b$ep, 2, median))
        [1] 0.971649
        R> mean(apply(b$net, c(2, 3), median) == g)
        [1] 1

        Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which e^- + e^+ > 1 (i.e., where an informant's false negative and false positive rates sum to more than 1). Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can indicate either excessively “perverse” priors or extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user’s modeling assumptions and the data itself.
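        A one-line calculation shows why the region e^- + e^+ > 1 is called “perverse”. Writing e^- and e^+ for one informant's false negative and false positive probabilities (the quantities summarized in the bbnam output above), a report of a single tie behaves as follows:

```latex
\Pr(\text{report} = 1 \mid \text{tie present}) = 1 - e^-, \qquad
\Pr(\text{report} = 1 \mid \text{tie absent}) = e^+
\\
\text{a reported tie favours ``tie present'' only if } \;
1 - e^- > e^+ \iff e^- + e^+ < 1 .
```

        When e^- + e^+ > 1, a reported tie is more likely to occur when the tie is absent, so that informant's reports are systematically inverted; at e^- + e^+ = 1 the reports carry no information at all.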

        Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection locally aggregated structures (LAS), the central graph, and the Romney-Batchelder model:

        R> mean(consensus(dat, method = "LAS.intersection") == g)
        [1] 0.7725
        R> mean(consensus(dat, method = "LAS.union") == g)
        [1] 0.905
        R> mean(consensus(dat, method = "central.graph") == g)
        [1] 0.9575
        R> mean(consensus(dat, method = "romney.batchelder") == g)
        Estimated competency scores:
         [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
         [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
        [15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
        Estimated bias parameters:
         [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
         [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
        [13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
        [19] 0.04302486 0.10195838
        [1] 1

        For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives), while the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either the false positive or the false negative rate approaches or exceeds 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
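        To make the comparison above concrete, here is a base-R sketch of three of the classical rules, written from their textbook definitions rather than from sna's consensus code. Assumptions to note: dat[k, i, j] holds informant k's report of the (i, j) edge, the informants are the network's own vertices, the central graph is taken to be a simple majority vote (mean report of at least 0.5), and the error rates, seed, and helper name consensus_sketch are made up for illustration.

```r
# Base-R sketch of three classical consensus rules for cognitive social
# structure data (hypothetical helper, not sna's implementation).
# dat[k, i, j] is informant k's report of the (i, j) edge; informants are
# assumed to be the vertices themselves, so dim(dat) is c(n, n, n).
consensus_sketch <- function(dat) {
  n <- dim(dat)[2]
  # Central graph, taken here as a majority vote over all informants.
  central <- (apply(dat, c(2, 3), mean) >= 0.5) * 1
  # Locally aggregated structures: only the two endpoints' reports count.
  las_int <- las_uni <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    r <- c(dat[i, i, j], dat[j, i, j])   # i's and j's reports on (i, j)
    las_int[i, j] <- min(r)              # both endpoints must report it
    las_uni[i, j] <- max(r)              # either endpoint suffices
  }
  list(central = central, las_int = las_int, las_uni = las_uni)
}

# Hypothetical simulation with a high false negative rate.
set.seed(2008)
n <- 20
g <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(g) <- 0
em <- 0.35                               # assumed false negative rate
ep <- 0.05                               # assumed false positive rate
dat <- array(0, dim = c(n, n, n))
for (k in 1:n) {
  # present edges reported w.p. 1 - em, absent edges w.p. ep
  dat[k, , ] <- matrix(rbinom(n * n, 1, g * (1 - em) + (1 - g) * ep), n, n)
}
off <- row(g) != col(g)                  # score off-diagonal cells only
acc <- sapply(consensus_sketch(dat), function(h) mean(h[off] == g[off]))
round(acc, 3)
```

        With these assumed rates, the intersection rule loses any edge that either endpoint forgets to report, which is the mechanism behind its poor showing in the sna output above; exact accuracies will differ from that output, since the data are not the same.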

        As a final example of sna’s model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both autoregressive (AR) and moving average (MA) components, estimating both effects simultaneously. (This example requires the numDeriv package.)
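        For reference, the mixed AR/MA network autocorrelation model can be written as below, in the usual spatial-econometric form (cf. Anselin 1988); here W1 and W2 correspond to w1 and w2, and ρ1 and ρ2 to r1 and r2, in the simulation code. Solving each equation for its left-hand side gives the reduced form that the qr.solve calls implement:

```latex
y = \rho_1 W_1 y + X\beta + e, \qquad
e = \rho_2 W_2 e + \nu, \qquad
\nu \sim N(0, \sigma^2 I)
\\
\Longrightarrow \quad
e = (I - \rho_2 W_2)^{-1} \nu, \qquad
y = (I - \rho_1 W_1)^{-1} (X\beta + e)
```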

        R> w1 <- rgraph(50)
        R> w2 <- rgraph(50)
        R> x <- matrix(rnorm(50 * 5), 50, 5)
        R> r1 <- 0.2
        R> r2 <- 0.3
        R> sigma <- 0.1
        R> beta <- rnorm(5)
        R> nu <- rnorm(50, 0, sigma)
        R> e <- qr.solve(diag(50) - r2 * w2, nu)
        R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
        R> fit <- lnam(y, x, w1, w2)
        R> summary(fit)

        Call:
        lnam(y = y, x = x, W1 = w1, W2 = w2)

        Residuals:
             Min       1Q   Median       3Q      Max
        -0.52052 -0.18305  0.01156  0.15557  0.62082

        Coefficients:
                Estimate Std. Error Z value Pr(>|z|)
        X1     -0.331259   0.010831  -30.58   <2e-16 ***
        X2      0.535608   0.009448   56.69   <2e-16 ***
        X3     -0.685068   0.007138  -95.98   <2e-16 ***
        X4      0.691812   0.008417   82.19   <2e-16 ***
        X5      0.016491   0.007890    2.09   0.0366 *
        rho1.1  0.194935   0.002575   75.71   <2e-16 ***
        rho2.1  0.307491   0.021167   14.53   <2e-16 ***
        ---
        Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

              Estimate Std. Error
        Sigma  0.09597   9.22e-05

        Goodness-of-Fit:
            Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
            Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
            Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
            AIC: -100.9 BIC: -85.65

            Null model: meanstd
            Null log likelihood: -82.48 on 48 degrees of freedom
            AIC: 169.0 BIC: 172.8
            AIC difference (model versus null): 269.9
            Heuristic Log Bayes Factor (model versus null): 258.4

        In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot”, which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i’s net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i’s net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

        3. Closing comments

        The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

        Acknowledgments

        The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

        [Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances (ν); Normal Q-Q Residual Plot (Theoretical vs. Sample Quantiles); Net Influence Plot.]

        References

        Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

        Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

        Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

        Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

        Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

        Boorman SA, White HC (1976). “Social Structure from Multiple Networks II: Role Structures.” American Journal of Sociology, 81, 1384–1446.

        Borgatti SP (2007). NetDraw: Network Visualization Software, Version 2.067. URL http://www.analytictech.com/.

        Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

        Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

        Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

        Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

        Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

        Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

        Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

        Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

        Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2. URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

        Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

        Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

        Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

        Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

        Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

        Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

        Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

        Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

        Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

        Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

        Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

        Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

        Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

        Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

        Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

        Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

        Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

        Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

        Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

        Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

        Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

        Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

        Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

        Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

        Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

        Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

        Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

        Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

        Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

        Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

        Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

        Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

        Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

        Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

        Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

        Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

        Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

        Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

        Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

        Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

        Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

        Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

        Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

        Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

        R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

        Richards WD, Seary AJ (2006). MultiNet for Windows, Version 4.75. URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

        Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

        Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

        Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

        Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

        Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis, Version 3.1. URL http://stat.gamma.rug.nl/snijders/siena.html.

        Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

        Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

        Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

        Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

        Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

        Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

        Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

        West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

        White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

        Affiliation

        Carter T. Butts
        Department of Sociology and Institute for Mathematical Behavioral Sciences
        University of California, Irvine
        Irvine, CA 92697-5100, United States of America
        E-mail: buttsc@uci.edu
        URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

        Journal of Statistical Software, http://www.jstatsoft.org/
        published by the American Statistical Association, http://www.amstat.org/

        Volume 24, Issue 6                Submitted: 2007-06-01
        February 2008                     Accepted: 2007-12-25



          literature cited within the package manual.

          With rare exceptions, sna routines can be used with directed or undirected graphs, with or without loops. Edge values and missing data (i.e., edges whose states are unknown) are supported in many applications as well. Note, however, that many graph theoretic concepts (e.g., connectedness) admit somewhat different definitions in the directed and undirected cases; it is thus important to verify that one is using the settings which are appropriate to the data at hand. Except for functions whose behavior is undefined in the directed case, sna’s functions typically default to the assumption that one’s data consists of one or more simple unvalued digraphs.

          Relational data can be represented in a number of ways, several of which are currently supported by the sna package. The most basic of these is the adjacency matrix, i.e., a square matrix A whose elements are defined such that A_ij is the value of the (i, j) edge (or {i, j} edge in the undirected case) in the corresponding graph. By convention, A_ij is a dichotomous indicator variable where the corresponding graph is unvalued. Such matrices may be passed as matrix objects or as two-dimensional arrays. While adjacency matrices are convenient to work with, they are inefficient for large, sparse graphs. When working with such data, the use of network (Butts et al. 2007) or sparse matrix (Koenker and Ng 2007, SparseM) objects may be preferred; sna accepts all three such data types interchangeably.
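          The adjacency-matrix convention can be illustrated with a short base-R sketch. It uses no sna functions, and the four-edge digraph is a made-up example; per the text above, a matrix built this way is the kind of object sna routines accept directly.

```r
# Minimal base-R sketch (no sna calls): build the adjacency matrix A of a
# small digraph from a hypothetical two-column edge list, so that
# A[i, j] = 1 exactly when the (i, j) edge is present.
edges <- matrix(c(1, 2,
                  2, 3,
                  3, 1,
                  3, 4), ncol = 2, byrow = TRUE)
n <- 4
A <- matrix(0, nrow = n, ncol = n)
A[edges] <- 1             # each row of 'edges' indexes one (i, j) cell

# Undirected reading of the same edges: the {i, j} edge fills both cells.
A_undir <- pmax(A, t(A))

# Dense storage always uses n^2 cells however few edges exist, which is
# why sparse formats pay off for large, sparse graphs.
c(cells = length(A), edges = sum(A))
```

          Indexing a matrix with a two-column matrix, as in A[edges] <- 1, sets exactly the listed (i, j) cells, which keeps the construction free of explicit loops.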

          In many instances, one may need to perform operations on multiple graphs at once. Where such graphs are of the same order (i.e., number of vertices), they may be conveniently represented by a three-dimensional array whose first dimension indexes the component adjacency matrices. Alternately, it is also possible to specify multiple graphs by means of a list, which allows the user to pass graph sets of varying orders where required. Within a graph list, single adjacency matrices, adjacency arrays, network objects, and sparse matrix objects may be mixed as desired; individual graphs are unpacked sequentially, in ascending list and array index order, prior to computation.
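          Both multi-graph layouts can be sketched in base R. The random digraphs, the seed, and the names gstack and glist are hypothetical; the point is only the shape of the containers (graph index first in the array, free mixing of orders in the list).

```r
# Base-R sketch of the two multi-graph layouts, using hypothetical random
# digraphs. Array layout: the FIRST dimension indexes graphs, so
# gstack[k, , ] is the k-th adjacency matrix.
set.seed(42)
m <- 3                                   # number of graphs
n <- 5                                   # common order
gstack <- array(0, dim = c(m, n, n))
for (k in 1:m) {
  gk <- matrix(rbinom(n * n, 1, 0.3), n, n)
  diag(gk) <- 0                          # simple digraphs: no loops
  gstack[k, , ] <- gk
}
dim(gstack)                              # 3 5 5

# List layout: graphs of different orders can sit side by side.
glist <- list(gstack[1, , ], matrix(0, 7, 7))
sapply(glist, nrow)                      # 5 7
```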

          Importing relational data into R

          Another preliminary issue of obvious concern is the importation of relational data into R. Where such data is stored in matrix or array form, conventional R routines such as read.table and scan may be employed in the usual manner. Similarly, natively saved network objects may be loaded directly into memory without external representation. In addition to these methods, sna includes custom routines for importing relational data in OrgStat NOS and GraphViz DOT formats. Processed relational data can be saved via the above methods, or in the DL format widely used by packages such as Pajek and UCINET. (See also the Pajek import function in network.)

          Beyond these network-specific approaches, sna also has facilities for converting spell data (i.e., data consisting of intervals in time or other quantities) into interval graphs (West 1996). The eponymously named interval.graph function serves in this capacity, converting an array of spell information into one or more interval graphs; spell-level categorical covariate information may also be included. In addition to simple interval graphs, interval.graph will compute the valued overlap graphs proposed by Butts and Pixley (2004) for use with life history data. In this case, the overlap quantities are stored as edge values in the output adjacency matrix (or matrices, if multiple spell sets were given).
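          The idea of an interval graph can be shown with a small base-R sketch on hypothetical spells. The valued version below simply records the length of each overlap; this is one easy choice for illustration and is not claimed to match the Butts-Pixley overlap quantities or interval.graph's output format. Intervals that merely touch are treated as non-adjacent here.

```r
# Base-R sketch of an interval graph built from hypothetical spell data.
# Each row of 'spells' is a (start, end) interval; i and j are adjacent
# when their intervals overlap by a positive length.
spells <- rbind(c(0, 5),
                c(3, 8),
                c(6, 10),
                c(11, 12))
n <- nrow(spells)
overlap <- matrix(0, n, n)
for (i in 1:n) {
  for (j in 1:n) {
    if (i != j) {
      len <- min(spells[i, 2], spells[j, 2]) - max(spells[i, 1], spells[j, 1])
      overlap[i, j] <- max(len, 0)       # length of the shared interval
    }
  }
}
adj <- (overlap > 0) * 1                 # dichotomous interval graph
overlap                                  # valued version: overlap lengths
adj
```

          Spells 1 and 2 share [3, 5] and spells 2 and 3 share [6, 8], so only those two pairs are linked; spell 4 is isolated.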


          2. Package highlights

          Given the wide scope of the methods implemented within the sna package, we cannot review them all in detail. In this section, however, we attempt to summarize the functionality of sna within a number of domains, highlighting specific functions and applications which are likely to be of general interest. Brief examples are also provided within each section to illustrate basic syntax and usage. Additional background and usage details are contained within the package manual, which is distributed with the package itself.

          2.1. Random graph generation

          sna has a range of tools for random graph generation. Chief among these is rgraph, a “workhorse” function for simulating deviates from both homogeneous and inhomogeneous Bernoulli graph distributions (Wasserman and Faust 1994). Given a set of tie probabilities (which may be specified by graph or by edge), it generates one or more graphs whose edge states are independent Bernoulli trials conditional on the specified parameters.¹
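          To make the Bernoulli graph model concrete, here is a base-R sketch of what such a sampler does; it is written independently and is not rgraph's implementation. The helper name bern_graph is hypothetical; it takes either one probability (homogeneous model) or an n x n probability matrix (inhomogeneous model), and disallows loops.

```r
# Base-R sketch of Bernoulli graph sampling (hypothetical helper).
# 'p' is one probability (homogeneous) or an n x n matrix (inhomogeneous);
# every off-diagonal edge is an independent Bernoulli draw.
bern_graph <- function(n, p) {
  pmat <- matrix(p, n, n)                # a scalar is recycled to n x n
  g <- (matrix(runif(n * n), n, n) < pmat) * 1
  diag(g) <- 0                           # disallow loops
  g
}

set.seed(1)
g_hom <- bern_graph(200, 0.1)
off <- row(g_hom) != col(g_hom)
mean(g_hom[off])                         # observed density, near 0.1

# Inhomogeneous case: only edges into vertex 5 are possible.
pm <- matrix(0, 5, 5)
pm[, 5] <- 1
bern_graph(5, pm)
```

          Because runif never returns exactly 0 or 1, cells with probability 0 are always empty and cells with probability 1 are always filled, which makes the inhomogeneous example deterministic.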

          In addition to rgraph, sna has several other tools for random graph generation. These currently include rgnm (which draws uniform graphs and digraphs conditional on edge count), rguman (which draws uniform digraphs conditional on expected or realized dyad census statistics), rgws (which draws from a Watts-Strogatz graph process; Watts and Strogatz 1998), and rgbn (which simulates a Skvoretz-Fararo biased net process (Skvoretz et al. 2004); see also Section 2.7). Also useful are tools such as rmperm and the rewire functions, which alter an input graph by random row/column, edgewise, or dyadic permutations. Functions which condition on degree distribution and the triad census are anticipated in future versions of sna.
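          One of the permutation ideas mentioned above, applying a single random permutation to both rows and columns, can be sketched in base R. This amounts to relabeling vertices at random, so the output is isomorphic to the input. The helper name relabel is hypothetical, and whether rmperm uses exactly this scheme should be checked against its manual page.

```r
# Base-R sketch: random vertex relabeling via one permutation applied to
# both rows and columns (hypothetical helper 'relabel').
relabel <- function(g) {
  o <- sample(nrow(g))                   # random permutation of 1..n
  g[o, o]
}

set.seed(7)
g <- matrix(rbinom(100, 1, 0.3), 10, 10)
diag(g) <- 0
h <- relabel(g)
c(sum(g), sum(h))                              # equal edge counts
identical(sort(rowSums(g)), sort(rowSums(h)))  # same out-degree multiset
```

          Since only labels move, edge count, degree distributions, and the absence of loops are all preserved, which is what makes such permutations useful as null-model reference sets.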

          Example

          To provide a sense for the syntax involved (and options available) when generating random graphs in sna, we here provide a brief example of R code which draws graphs from a number of models. Note that the output type in each case is an adjacency matrix: although sna routines accept network and related objects as input (per Section 1.4), the package’s current random graph generators produce output in adjacency matrix or array form. The range of output types may be expanded in future package versions. To begin, we first load the sna library and fix the random seed (for reproducibility):

          R> library(sna)
          R> set.seed(1913)

          As noted above, rgraph can be used in various ways to obtain graphs (directed or otherwise) with different expected densities. For instance, three digraphs with respective expected densities 0.1, 0.9, and 0.5 can be drawn as follows:

          R> g <- rgraph(10, 3, tprob = c(0.1, 0.9, 0.5))
          R> gden(g)
          [1] 0.1000000 0.8666667 0.5333333

          ¹ rgraph can also be employed to simulate valued graphs via a resampling procedure.

          gden, which we shall encounter again later, is an sna function which returns the density of one or more input graphs; as expected, the observed densities here closely match their expectations. The tprob parameter, used above to set the probability of each edge on a per-graph basis, can also be used in other ways. For instance, passing a matrix of Bernoulli parameters to tprob will cause rgraph to sample from the corresponding inhomogeneous Bernoulli graph model (in which the probability of an (i, j) edge is equal to tprob[i,j]). For example, consider a simple model for a digraph of order 10 in which the probability of an (i, j) edge is equal to j/10. Such a graph can be drawn easily as follows:

          R> gp <- sapply((1:10) / 10, rep, 10)
          R> g <- rgraph(10, tprob = gp)
          R> g
                [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
           [1,]    0    0    0    0    1    0    0    1    1     1
           [2,]    0    0    0    1    0    1    0    0    1     1
           [3,]    0    0    0    0    0    1    0    1    0     1
           [4,]    0    0    0    0    1    1    1    1    1     1
           [5,]    0    1    0    0    0    0    1    1    1     1
           [6,]    0    0    1    0    1    0    1    0    1     1
           [7,]    0    1    1    0    1    0    0    1    1     1
           [8,]    0    0    1    1    1    0    1    0    1     1
           [9,]    0    0    0    1    1    0    1    1    0     1
          [10,]    0    0    0    0    0    0    1    1    1     0
          R> apply(g, 2, mean)
          [1] 0.0 0.2 0.3 0.3 0.6 0.3 0.6 0.7 0.8 0.9

          Since rgraph disallows loops by default, diagonal entries are ignored in the above cases; each column j thus has 9 off-diagonal cells with edge probability j/10, so the column means (which divide by 10) have expectation 0.9(j/10). The observed means are quite close to this, but obviously vary due to the underlying Bernoulli process. For random graphs with exact constraints on edge count, we must use rgnm. For instance, to take 5 draws from the uniform distribution on the order 10 graphs having 12 edges, we would proceed as follows:

          R> g <- rgnm(5, 10, 12)
          R> apply(g, 1, sum)
          [1] 12 12 12 12 12
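          The uniform edge-count model that rgnm draws from can be sketched in a few lines of base R: choose exactly m of the n(n - 1) off-diagonal cells uniformly at random. The helper gnm_digraph is hypothetical and covers only the directed, loopless case.

```r
# Base-R sketch of the uniform G(n, m) digraph model: exactly m of the
# n(n - 1) off-diagonal cells are filled, chosen uniformly at random.
# 'gnm_digraph' is a hypothetical helper, not an sna function.
gnm_digraph <- function(n, m) {
  g <- matrix(0, n, n)
  cells <- which(row(g) != col(g))       # linear indices of admissible cells
  g[sample(cells, m)] <- 1
  g
}

set.seed(1913)
edge_counts <- replicate(5, sum(gnm_digraph(10, 12)))
edge_counts                              # 12 12 12 12 12
```

          Unlike a Bernoulli graph with expected density 12/90, every draw here has exactly 12 edges, mirroring the apply(g, 1, sum) check above.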

          As the dyadic counterpart to both rgraph and rgnm, rguman models digraphs whose distributions are parameterized by dyad states. As each dyad corresponds to a pair of edge variables, it can be readily classified into the three isomorphism classes of mutual (both edges present), asymmetric (one edge present), or null (no edges present). The number of dyads in each class within a graph is known as its dyad census, and has been used as a simple basis for modeling network structure at least since the work of Holland and Leinhardt (1970). rguman can be employed either to generate uniform digraphs conditional on an exact dyad census constraint, or to draw from a multinomial graph model of independent dyads with fixed expected counts. The former case can be used to generate graphs of particular types. For instance, the trivial cases of complete, complete tournament, and null graphs can be generated by placing all dyads within the appropriate isomorphism class:

          R> k10 <- rguman(1, 10, mut = 45, asym = 0, null = 0, method = "exact")

          R> t10 <- rguman(1, 10, mut = 0, asym = 45, null = 0, method = "exact")

          R> n10 <- rguman(1, 10, mut = 0, asym = 0, null = 45, method = "exact")

          R> k10

                [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
           [1,]    0    1    1    1    1    1    1    1    1     1
           [2,]    1    0    1    1    1    1    1    1    1     1
           [3,]    1    1    0    1    1    1    1    1    1     1
           [4,]    1    1    1    0    1    1    1    1    1     1
           [5,]    1    1    1    1    0    1    1    1    1     1
           [6,]    1    1    1    1    1    0    1    1    1     1
           [7,]    1    1    1    1    1    1    0    1    1     1
           [8,]    1    1    1    1    1    1    1    0    1     1
           [9,]    1    1    1    1    1    1    1    1    0     1
          [10,]    1    1    1    1    1    1    1    1    1     0

          R> t10

                [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
           [1,]    0    0    0    0    0    0    1    0    0     0
           [2,]    1    0    1    0    1    1    0    0    0     1
           [3,]    1    0    0    1    1    0    0    1    0     0
           [4,]    1    1    0    0    0    1    0    1    0     1
           [5,]    1    0    0    1    0    1    1    1    1     0
           [6,]    1    0    1    0    0    0    1    1    1     0
           [7,]    0    1    1    1    0    0    0    1    1     0
           [8,]    1    1    0    0    0    0    0    0    1     1
           [9,]    1    1    1    1    0    0    0    0    0     0
          [10,]    1    0    1    0    1    1    1    0    1     0

          R> n10

                [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
           [1,]    0    0    0    0    0    0    0    0    0     0
           [2,]    0    0    0    0    0    0    0    0    0     0
           [3,]    0    0    0    0    0    0    0    0    0     0
           [4,]    0    0    0    0    0    0    0    0    0     0
           [5,]    0    0    0    0    0    0    0    0    0     0
           [6,]    0    0    0    0    0    0    0    0    0     0
           [7,]    0    0    0    0    0    0    0    0    0     0
           [8,]    0    0    0    0    0    0    0    0    0     0
           [9,]    0    0    0    0    0    0    0    0    0     0
          [10,]    0    0    0    0    0    0    0    0    0     0
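The dyad census itself is easy to compute from an adjacency matrix. In the base R sketch below (the function name dyad_census_sketch is hypothetical and this is not the sna routine), each unordered pair i < j is classified by its two edge variables. Applied to a complete digraph, a tournament, and a null graph of order 10, it reproduces the 45/0/0, 0/45/0, and 0/0/45 censuses used to build the examples above.

```r
# Hypothetical helper (not the sna routine): dyad census of a 0/1 digraph.
dyad_census_sketch <- function(g) {
  up <- upper.tri(g)
  a <- g[up] == 1      # edge i -> j, for each pair with i < j
  b <- t(g)[up] == 1   # edge j -> i, for the same pairs
  c(mut = sum(a & b), asym = sum(xor(a, b)), null = sum(!a & !b))
}

k10 <- matrix(1, 10, 10); diag(k10) <- 0   # complete digraph
t10 <- 1 * upper.tri(k10)                  # a (transitive) tournament
n10 <- matrix(0, 10, 10)                   # null graph

rbind(k10 = dyad_census_sketch(k10),
      t10 = dyad_census_sketch(t10),
      n10 = dyad_census_sketch(n10))
```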

          When not in “exact” mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

          R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)

          R> mean(g[upper.tri(g)] & t(g)[upper.tri(g)])

          [1] 0.1482828

          R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])

          [1] 0.04646465

          R> mean((!g)[upper.tri(g)] & t(!g)[upper.tri(g)])

          [1] 0.8052525

          By contrast with the expectation under the above model, a Bernoulli graph with the same expected density would have a mean mutuality rate of approximately 0.03 (with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4). Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
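The Bernoulli comparison can be checked by direct calculation. A mutual dyad holds two edges and an asymmetric dyad one, so the expected density of the multinomial model above is (2 × 0.15 + 0.05)/2 = 0.175. A Bernoulli graph with that density has independent edges, so its dyad-type probabilities are products of edge probabilities:

```r
# Dyad-type probabilities of the rguman call above
p_mut <- 0.15; p_asym <- 0.05; p_null <- 0.80

# Expected density: mutual dyads hold 2 of 2 possible edges, asymmetric dyads 1 of 2
dens <- (2 * p_mut + p_asym) / 2                  # 0.175

# Bernoulli graph with the same density: each edge present independently
bern_mut  <- dens^2                               # both directions present
bern_asym <- 2 * dens * (1 - dens)                # exactly one direction present
bern_null <- (1 - dens)^2                         # neither direction present

round(c(mutual = bern_mut, asym_to_mut = bern_asym / bern_mut), 3)   # 0.031, 9.429
```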

          More extensive departures from independence require alternatives to the simple independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. As we will see, these processes are specified in terms of the conditional probability of an edge given other edges within the graph; this immediately suggests the use of a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process. Such a sampler is implemented via the rgbn function, which uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. Thinning and burn-in parameters may be specified by the user, along with model parameters (which by default correspond to the uniform random digraph model). Parameters may be adjusted to produce “parent” or reciprocity biases ($\pi$), “sibling” or shared partner biases ($\sigma$), and “double role” biases or parent/sibling interaction effects ($\rho$), as well as baseline density effects ($d$); parameters vary from 0 to 1, with 0 indicating no bias. The command to draw a sample of 5 order 10 networks with both reciprocity and triangle formation biases will then look something like the following:

          R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,

          +   d = 0.15))


          with the magnitude of the specified effects depending on the exact choice of parameters.

          Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) “rewiring” process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

          R> g <- matrix(0, 10, 10)

          R> g[1,] <- 1

          R> g2 <- rewire.ws(g, 0.5)[1,,]

          R> g2

                [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
           [1,]    1    0    1    1    1    1    0    0    0     0
           [2,]    0    0    0    0    0    0    0    0    0     1
           [3,]    0    1    0    0    0    0    0    0    0     0
           [4,]    0    0    1    0    0    0    0    0    0     0
           [5,]    0    0    0    0    0    0    0    0    0     0
           [6,]    0    0    0    0    1    0    0    0    0     0
           [7,]    0    0    0    0    0    0    0    0    0     0
           [8,]    0    0    0    0    0    0    0    0    0     0
           [9,]    0    0    0    0    0    0    0    0    0     0
          [10,]    0    0    0    0    0    0    0    0    1     0

          R> sum(g - g2) == 0

          [1] TRUE
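Why rewiring conserves edges is easiest to see in code: each rewired edge is moved, never created or destroyed. The sketch below is a simplified, hypothetical version of such a rule in base R, not the code behind rewire.ws. It always keeps the sending endpoint i, moves the edge to a randomly chosen null dyad (i, k) with k ≠ i, and ignores loops. The choice of which endpoint to keep is an assumption made for the example.

```r
# Hypothetical simplified rewiring (not sna's rewire.ws): with probability p,
# move each edge i -> j to i -> k, where i -> k is currently null and k != i.
rewire_sketch <- function(g, p) {
  n <- nrow(g)
  edges <- which(g == 1 & row(g) != col(g), arr.ind = TRUE)  # snapshot of off-diagonal edges
  for (e in seq_len(nrow(edges))) {
    i <- edges[e, 1]; j <- edges[e, 2]
    if (runif(1) >= p) next                                  # keep this edge in place
    candidates <- which(g[i, ] == 0 & seq_len(n) != i)       # null dyads sharing endpoint i
    if (length(candidates) == 0) next
    k <- candidates[sample.int(length(candidates), 1)]
    g[i, j] <- 0   # remove the old edge ...
    g[i, k] <- 1   # ... and place it on the chosen null dyad
  }
  g
}

set.seed(42)
g <- matrix(0, 10, 10)
g[cbind(1:10, c(2:10, 1))] <- 1   # directed ring: 10 edges, one out-edge per vertex
g2 <- rewire_sketch(g, 0.5)
sum(g2) == sum(g)                 # TRUE: edges are moved, not created or destroyed
```

Keeping the sender fixed also preserves every out-degree, which is a stronger invariant than the one the text relies on.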

          Another example of an edge-preserving random transformation is the random permutation of vertex order; rmperm can be employed for this purpose, as, for example, in the following permutation of the graph g2 above:

          R> g3 <- rmperm(g2)

          R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))

          [1] TRUE

          Row/column permutation preserves the “unlabeled” structure of the input graph (i.e., it draws from the graph's isomorphism class), and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
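The invariance used in the rmperm example can be checked without sna. The sketch below (the helper permute_vertices is hypothetical) applies one permutation to both rows and columns, so that vertex p[i] of the input becomes vertex i of the output. Degrees are therefore relabeled rather than changed, which is why the sorted column sums agree.

```r
# Hypothetical helper: relabel vertices by applying one permutation to both
# the rows and the columns of an adjacency matrix.
permute_vertices <- function(g, perm = sample.int(nrow(g))) {
  g[perm, perm]
}

set.seed(7)
g <- matrix(rbinom(100, 1, 0.3), 10, 10)
diag(g) <- 0
p <- sample.int(10)
g3 <- permute_vertices(g, p)

# Vertex p[i] of g is vertex i of g3, so degrees move with the labels
all(colSums(g3) == colSums(g)[p])            # TRUE (indegrees)
all(rowSums(g3) == rowSums(g)[p])            # TRUE (outdegrees)
all(sort(colSums(g)) == sort(colSums(g3)))   # TRUE, as in the rmperm example
```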

          2.2 Visualization and data manipulation

          Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic: for instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present, versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward, i.e.:

          R> g <- rgraph(5)

          R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)

          [1] 0.07291667

          Unfortunately, the drawback to the flexibility of this routine is its inefficiency: eval.edgeperturbation cannot take advantage of any special properties of the change-score being calculated, and hence is inefficient for properties such as triad counts, whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple exploratory applications, and does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
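The efficiency point can be made concrete with a statistic whose change score has a closed form. In the sketch below (base R; all function names are hypothetical), a generic perturbation recomputes the mutual-dyad count twice, once with edge i → j forced present and once forced absent. The specialized change score only needs to inspect the reverse cell (j, i). The generic version mirrors the idea behind eval.edgeperturbation but is not its code.

```r
# Generic (hypothetical) change score: recompute a statistic with edge (i, j)
# forced present and forced absent, holding all other edges constant.
change_score_generic <- function(g, i, j, stat) {
  g1 <- g; g1[i, j] <- 1
  g0 <- g; g0[i, j] <- 0
  stat(g1) - stat(g0)
}

# Statistic: number of mutual dyads (cost grows with the size of the graph)
mutual_count <- function(g) sum(g[upper.tri(g)] == 1 & t(g)[upper.tri(g)] == 1)

# Specialized change score: adding i -> j creates a mutual dyad exactly when
# j -> i is already present, so only one cell needs to be inspected.
change_score_mutual <- function(g, i, j) g[j, i]

set.seed(3)
g <- matrix(rbinom(400, 1, 0.2), 20, 20); diag(g) <- 0
c(generic = change_score_generic(g, 1, 2, mutual_count),
  local   = change_score_mutual(g, 1, 2))   # the two agree
```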

          Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the “identities” of its vertices, while leaving the underlying “unlabeled” structure unchanged.) Although not graph manipulation functions per se, these routines are of importance for generating restricted permutations for use in QAP tests (Hubert 1987) and comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
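The two kinds of permutation can be sketched in base R as follows. Both helpers are hypothetical analogues written for illustration. restricted_perm shuffles vertices only within user-supplied classes; nth_perm decodes a rank in the factorial number system to return the nth permutation in lexical order. The convention that n counts from 1 is an assumption and may not match numperm's indexing.

```r
# Hypothetical analogue of a restricted permutation (not sna's rperm):
# vertices are only shuffled among members of the same equivalence class.
restricted_perm <- function(classes) {
  p <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    p[idx] <- idx[sample.int(length(idx))]   # safe even when a class has one member
  }
  p
}

# Hypothetical analogue of a lexical-order lookup (not sna's numperm):
# returns the n-th permutation of 1:olength in lexical order, counting from 1.
nth_perm <- function(olength, n) {
  items <- seq_len(olength)
  k <- n - 1                          # zero-based rank, decoded in factorial base
  out <- integer(0)
  for (i in olength:1) {
    f <- factorial(i - 1)
    pos <- k %/% f                    # index of the next item among those remaining
    out <- c(out, items[pos + 1])
    items <- items[-(pos + 1)]
    k <- k %% f
  }
  out
}

set.seed(11)
classes <- c("a", "a", "b", "b", "b", "c")
restricted_perm(classes)   # positions 1-2 and 3-5 shuffled within class; 6 fixed
nth_perm(3, 1)             # 1 2 3
nth_perm(3, 2)             # 1 3 2
nth_perm(3, 6)             # 3 2 1
```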

          In addition to the above, two families of graph manipulation functions bear discussing in more detail: functions to compute properties of neighborhoods, and functions for graph visualization. Here, we briefly discuss each family in turn, before proceeding to a review of sna's descriptive index routines.

          Neighborhood and ego net functions

          The egocentric network (or “ego net”) of vertex $v$ in graph $G$ is defined as $G[v \cup N(v)]$ (i.e., the subgraph of $G$ induced by $v$ and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.
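A base R sketch of ego net extraction follows. The helpers ego_net and alter_density are hypothetical (they are not sna's ego.extract or gden). Like ego.extract, ego_net places ego in the first row and column, which makes it easy to drop ego when computing density among alters.

```r
# Hypothetical base-R analogue of ego-net extraction (not sna's ego.extract):
# the subgraph induced by ego and its neighbors, with ego in row/column 1.
ego_net <- function(g, v, type = c("combined", "out", "in")) {
  type <- match.arg(type)
  nb <- switch(type,
    out      = which(g[v, ] == 1),
    "in"     = which(g[, v] == 1),
    combined = which(g[v, ] == 1 | g[, v] == 1))
  keep <- c(v, setdiff(nb, v))     # ego first, then alters (self-loops ignored)
  g[keep, keep, drop = FALSE]
}

# Directed density among alters only, dropping ego's row and column
alter_density <- function(ego) {
  a <- ego[-1, -1, drop = FALSE]
  k <- nrow(a)
  if (k < 2) return(NA)
  sum(a[row(a) != col(a)]) / (k * (k - 1))
}

g <- matrix(0, 5, 5)
g[1, 2] <- g[1, 3] <- g[4, 1] <- g[2, 3] <- 1
e1 <- ego_net(g, 1)    # ego 1 with alters 2, 3, 4
dim(e1)                # 4 4
alter_density(e1)      # one tie (2 -> 3) among 3 alters: 1/6
```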

          While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.
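The core of this computation can be sketched in a few lines. The helper neighbor_apply below is hypothetical and handles only first-order neighborhoods (gapply additionally supports higher-order ones). It applies a user-supplied function to the covariate values of each vertex's out-, in-, or combined neighbors.

```r
# Hypothetical analogue of gapply (first-order neighborhoods only): for each
# vertex, apply FUN to the covariate values x of its neighbors.
neighbor_apply <- function(g, x, FUN, type = c("out", "in", "combined")) {
  type <- match.arg(type)
  rel <- switch(type, out = g, "in" = t(g), combined = (g | t(g)) * 1)
  diag(rel) <- 0                          # a vertex is not its own neighbor
  sapply(seq_len(nrow(g)), function(v) FUN(x[rel[v, ] == 1]))
}

g <- matrix(0, 4, 4)
g[1, 2] <- g[1, 3] <- g[2, 3] <- g[4, 1] <- 1
age <- c(30, 40, 50, 60)                  # illustrative vertex covariate
neighbor_apply(g, rep(1, 4), sum)         # outdegree: 2 1 0 1
neighbor_apply(g, age, mean, "combined")  # mean alter age: 50 40 35 30
```

Summing a vector of ones recovers degree, which is the same check the gapply examples below perform.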

          In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose $(i,j)$ cell is an indicator for the membership of vertex $j$ in vertex $i$'s selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix; for orders $k > 0$, it depends on the type of adjacency involved. For input graph $G = (V,E)$, let the base relation $R$ be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i,j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
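This definition can be sketched in base R by computing geodesic distances in the base relation $R$ and thresholding them. The helpers geodesic_dist and neighborhood_sketch below are hypothetical. They mirror the definition in the text, but their argument names are not those of sna's neighborhood.

```r
# Breadth-first geodesic distances in a 0/1 relation R (Inf if unreachable)
geodesic_dist <- function(R) {
  n <- nrow(R)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    d <- 0
    while (length(frontier) > 0) {
      d <- d + 1
      reached <- which(colSums(R[frontier, , drop = FALSE]) > 0)  # successors of the frontier
      frontier <- reached[!is.finite(D[s, reached])]              # keep only new vertices
      D[s, frontier] <- d
    }
  }
  D
}

# Hypothetical version of the text's definition: order 0 is the identity;
# partial = pairs at distance exactly k, cumulative = pairs at distance 1..k.
neighborhood_sketch <- function(g, order, type = c("out", "in", "combined"),
                                partial = TRUE) {
  if (order == 0) return(diag(nrow(g)))
  type <- match.arg(type)
  R <- switch(type, out = g, "in" = t(g), combined = (g | t(g)) * 1)
  D <- geodesic_dist(R)
  M <- if (partial) D == order else (D >= 1 & D <= order)
  M * 1
}

g <- matrix(0, 4, 4)
g[1, 2] <- g[2, 3] <- g[3, 4] <- 1            # directed path 1 -> 2 -> 3 -> 4
neighborhood_sketch(g, 2)                     # only 1 -> 3 and 2 -> 4
neighborhood_sketch(g, 2, partial = FALSE)    # adds the three order-1 pairs
```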

          To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below:

          R> g <- rgraph(10, tp = 1.5/9)

          R> g.in <- ego.extract(g, neighborhood = "in")

          R> g.out <- ego.extract(g, neighborhood = "out")

          R> g.comb <- ego.extract(g, neighborhood = "combined")

          R> g.comb[1:3]

          $`1`
               [,1] [,2] [,3] [,4]
          [1,]    0    1    1    0
          [2,]    1    0    0    0
          [3,]    0    0    0    0
          [4,]    1    0    0    0

          $`2`
               [,1] [,2] [,3] [,4]
          [1,]    0    1    0    0
          [2,]    1    0    0    0
          [3,]    1    0    0    0
          [4,]    1    0    1    0

          $`3`
               [,1] [,2] [,3] [,4]
          [1,]    0    1    1    0
          [2,]    0    0    0    0
          [3,]    0    0    0    0
          [4,]    1    1    0    0

          R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)

          [1] TRUE

          R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)

          [1] TRUE

          R> all(sapply(g.comb, NROW) <= degree(g) + 1)

          [1] TRUE

          R> ego.size <- sapply(g.comb, NROW)

          R> if(any(ego.size > 2))

          +   sapply(g.comb[ego.size > 2], function(x) gden(x[-1, -1]))

                   1          2          3          4          5          6          7
          0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
                   8          9         10
          0.00000000 0.08333333 0.00000000

          Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.

          Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well:

          R> g <- rgraph(6)

          R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))

          [1] TRUE

          R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))

          [1] TRUE

          R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),

          +   cmode = "freeman")/2)

          [1] TRUE

          R> gapply(g, c(1, 2), 1:6, mean)

          [1] 4.00 3.00 3.00 5.50 3.25 3.25

          R> gapply(g, c(1, 2), 1:6, mean, distance = 2)

          [1] 4.0 3.8 3.6 3.4 3.2 3.0

          To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

          R> g <- rgraph(10, tp = 2/9)

          R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)

          R> par(mfrow = c(3, 3))

          R> for(i in 1:9)

          +   gplot(neigh[i,,], main = paste("Partial Neighborhood of Order", i))

          R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,

          +   partial = FALSE)

          R> par(mfrow = c(3, 3))

          R> for(i in 1:9)

          +   gplot(neigh[i,,], main = paste("Cumulative Neighborhood of Order", i))

          Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods “fill out” the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

          Figure 1: Sample partial neighborhoods of increasing order (panels: Partial Neighborhood of Order 1 through Order 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order partial neighborhood of $v$.

          Visualization

          Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary “workhorse” routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For “target diagrams,” in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.
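As a rough illustration of the multidimensional-scaling family of layouts mentioned above, the following base R sketch places vertices by classical scaling (cmdscale, from R's stats package) of shortest-path distances on the symmetrized graph. The function name and the treatment of disconnected pairs are assumptions made for the example; gplot's own "mds" mode may use a different distance matrix and options.

```r
# Hypothetical MDS-style layout (illustration only, not sna's layout code):
# classical scaling of shortest-path distances on the symmetrized graph.
mds_layout_sketch <- function(g) {
  n <- nrow(g)
  R <- (g | t(g)) * 1                       # ignore edge direction for layout
  D <- ifelse(R == 1, 1, Inf)
  diag(D) <- 0
  for (k in seq_len(n)) {                   # Floyd-Warshall shortest paths
    D <- pmin(D, outer(D[, k], D[k, ], "+"))
  }
  if (any(is.infinite(D))) {                # assumption: unreachable pairs are put
    D[is.infinite(D)] <- max(D[is.finite(D)]) + 1   # one step beyond the diameter
  }
  cmdscale(D, k = 2)                        # n x 2 matrix of coordinates
}

ring <- matrix(0, 6, 6)
ring[cbind(1:6, c(2:6, 1))] <- 1            # directed 6-cycle
xy <- mds_layout_sketch(ring)
round(as.matrix(dist(xy)), 2)               # neighbors close, opposite vertices far
```

For the 6-cycle the two leading scaling dimensions place the vertices on a regular hexagon in ring order; the coordinates can be passed to plot() or used as fixed coordinates for a network display.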

          While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

          Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

          Figure 2: Sample cumulative neighborhoods of increasing order (panels: Cumulative Neighborhood of Order 1 through Order 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$.

          All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below), or when examining matrices which are too large to display effectively using the standard print method.

          gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

          R> g <- rgraph(5, diag = TRUE)

          Figure 3: Sample visualizations using gplot with multiple layout and display options. Panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options.

          R> par(mfrow = c(2, 3))

          R> gplot(g, main = "Default")

          R> gplot(g, usecurve = TRUE, main = "Curved Edges")

          R> gplot(g, mode = "mds", main = "MDS Layout")

          R> gplot(g, mode = "circle", main = "Circular Layout")

          R> plot.sociomatrix(g, main = "Sociomatrix")

          R> gplot(g, diag = TRUE, vertex.cex = 1.5, vertex.sides = 3:8,

          +   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4)*72,

          +   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

          Output from the above is shown in Figure 3.

          Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

          R> gplot3d(rgws(1, 5, 3, 1, 0))

          R> gplot3d(rgws(1, 5, 3, 1, 0.05))

          Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

          R> gplot3d(rgws(1, 5, 3, 1, 0.2))

          Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

          As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

          R> par(mfrow = c(1, 3))

          R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,

          +   xlab = "", ylab = "", main = "gplot.vertex Example")

          R> gplot.vertex(cos((1:10)/10*2*pi), sin((1:10)/10*2*pi),

          +   col = 1:10, sides = 3:12, radius = 0.1)

          R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")

          R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")

          R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,

          +   xlab = "", ylab = "", main = "gplot.loop Example")

          R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,

          +   offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,

          +   arrowhead = TRUE)

          R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,

          +   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

          The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

          Figure 5: Examples of the use of gplot supplemental functions. Panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example.

          2.3 Descriptive indices

          The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

          Node-level indices

          Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi 1966, and Brandes and Erlebach 2005, Chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or “central” position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized “paring down” of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_d^+(v,G) \equiv |N^+(v)|$), indegree ($c_d^-(v,G) \equiv |N^-(v)|$), and total or “Freeman” degree ($c_d^t(v,G) \equiv c_d^+(v,G) + c_d^-(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

          $$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus \{v\}} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

          where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $g'(v',v,v'',G)/g(v',v'',G)$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

          $$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

          where $n = |V|$ and $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to at least one other vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
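The definitions above translate directly into code. The following base R sketch (function names hypothetical; this is not how sna computes these indices) runs a breadth-first search from each vertex to obtain geodesic distances $d(s,t)$ and geodesic counts $g(s,t)$. It then uses the fact that the number of $(s,t)$ geodesics through $v$ equals $g(s,v)\,g(v,t)$ whenever $d(s,v) + d(v,t) = d(s,t)$. Sums run over ordered pairs, so for undirected graphs the betweenness and stress values are twice the unordered-pair versions; closeness follows the infinite-distance convention described above.

```r
# Hypothetical sketch: all-pairs BFS on a 0/1 digraph (A[i, j] = 1 for i -> j),
# returning geodesic distances and numbers of geodesics.
geodesic_counts <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)   # geodesic distances d(s, t)
  S <- matrix(0, n, n)     # geodesic counts g(s, t)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    S[s, s] <- 1
    queue <- s
    while (length(queue) > 0) {          # breadth-first search from s
      u <- queue[1]
      queue <- queue[-1]
      for (w in which(A[u, ] > 0)) {
        if (is.infinite(D[s, w])) {      # w reached for the first time
          D[s, w] <- D[s, u] + 1
          queue <- c(queue, w)
        }
        if (D[s, w] == D[s, u] + 1) {    # u precedes w on an s-w geodesic
          S[s, w] <- S[s, w] + S[s, u]
        }
      }
    }
  }
  list(dist = D, count = S)
}

centrality_sketch <- function(A) {
  gc <- geodesic_counts(A)
  D <- gc$dist; S <- gc$count
  n <- nrow(A)
  btw <- numeric(n); strs <- numeric(n)
  for (v in seq_len(n)) for (s in seq_len(n)) for (e in seq_len(n)) {
    if (s == e || s == v || e == v || is.infinite(D[s, e])) next
    if (D[s, v] + D[v, e] == D[s, e]) {  # v lies on some s-e geodesic
      through <- S[s, v] * S[v, e]       # s-e geodesics passing through v
      btw[v] <- btw[v] + through / S[s, e]
      strs[v] <- strs[v] + through
    }
  }
  list(betweenness = btw, stress = strs,
       closeness = (n - 1) / rowSums(D)) # 0 whenever some vertex is unreachable
}

star <- matrix(0, 5, 5); star[1, 2:5] <- 1; star[2:5, 1] <- 1   # center = vertex 1
res <- centrality_sketch(star)
res$betweenness   # 12 0 0 0 0 (ordered pairs)
res$closeness     # 1 for the center, 4/7 for each leaf
```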

          Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of “coreness” (or membership in the largest dense cluster), “recursive” or “reflected” degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
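The two spectral measures can be compared directly in base R. The sketch below (function names hypothetical; not the bonpow or evcent source) evaluates $\alpha (I - \beta A)^{-1} A \mathbf{1}$. As an assumption made for easy comparison, $\alpha$ is chosen to give a vector of unit Euclidean length rather than following bonpow's convention. On a small connected, non-bipartite undirected graph it shows both limits described above: $\beta = 0$ gives scores proportional to degree, and $\beta$ just below $1/\lambda_1$ gives nearly the principal eigenvector.

```r
# Hypothetical sketches (not sna's bonpow/evcent) for an adjacency matrix A.
bonacich_sketch <- function(A, beta) {
  n <- nrow(A)
  cb <- as.vector(solve(diag(n) - beta * A, A %*% rep(1, n)))  # (I - beta A)^{-1} A 1
  cb / sqrt(sum(cb^2))        # assumption: alpha chosen to give unit Euclidean length
}

eigen_sketch <- function(A) {
  v <- Re(eigen(A)$vectors[, 1])   # eigenvector of the leading eigenvalue
  abs(v) / sqrt(sum(v^2))
}

# Undirected "paw" graph: triangle 1-2-3 with pendant vertex 4 attached to 3
A <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 0), 4, 4, byrow = TRUE)
lambda1 <- max(eigen(A)$values)

bonacich_sketch(A, 0)                 # proportional to degree: 2, 2, 3, 1
bonacich_sketch(A, 0.999 / lambda1)   # close to eigen_sketch(A)
eigen_sketch(A)
bonacich_sketch(A, -0.3)              # negative beta: sense of dependence reversed
```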

          An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v',v'')$ such that $(v',v), (v,v'') \in E$ and $(v',v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ on $V$, such that $s_i$ is the state of $v_i \in V$. (“State” in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles), based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
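To make the five role definitions concrete, the following base R sketch counts, for every broker $v_j$, the ordered triads $(v_i, v_j, v_k)$ with $v_i \to v_j \to v_k$ present and $v_i \to v_k$ absent, and classifies each by the states $s_i, s_j, s_k$. The function brokerage_sketch is hypothetical. It reproduces only the raw counts, not the moments or z-tests reported by sna's brokerage function.

```r
# Hypothetical sketch of Gould-Fernandez role counts (not sna's brokerage).
# A: 0/1 adjacency matrix with A[i, j] = 1 for i -> j; s: vector of vertex states.
brokerage_sketch <- function(A, s) {
  n <- nrow(A)
  roles <- c("coordinator", "itinerant", "gatekeeper", "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (i == j || j == k || i == k) next
    if (A[i, j] == 1 && A[j, k] == 1 && A[i, k] == 0) {   # j bridges i -> k
      role <- if (s[i] == s[j] && s[j] == s[k]) {
        "coordinator"
      } else if (s[i] == s[k]) {
        "itinerant"
      } else if (s[j] == s[k]) {
        "gatekeeper"
      } else if (s[i] == s[j]) {
        "representative"
      } else {
        "liaison"
      }
      out[j, role] <- out[j, role] + 1
    }
  }
  cbind(out, total = rowSums(out))
}

A <- matrix(0, 3, 3); A[1, 2] <- 1; A[2, 3] <- 1   # 1 -> 2 -> 3, no 1 -> 3
brokerage_sketch(A, c("a", "b", "b"))[2, ]         # vertex 2 acts as gatekeeper
```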

          To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

          R> dat <- rgraph(10)

          R> degree(dat, cmode = "indegree")

           [1] 4 4 8 2 4 5 4 4 3 6

          R> degree(dat, cmode = "outdegree")

           [1] 6 3 5 2 5 4 4 4 5 6

          R> degree(dat)

           [1] 10  7 13  4  9  9  8  8  8 12

          R> closeness(dat)

           [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
           [8] 0.6428571 0.6923077 0.7500000

          R> betweenness(dat)

           [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
           [7]  2.4500000  2.0333333  2.4166667  8.1833333

          R> stresscent(dat)

           [1] 21  6 27  1 14 15  6  7  7 21

          R> graphcent(dat)

           [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
           [8] 0.5000000 0.5000000 0.5000000

          R> evcent(dat)

           [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
           [8] 0.2734192 0.3642163 0.4121985

          R> infocent(dat)

           [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
           [9] 3.077481 3.704181

          As the above examples illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector centrality and degree, reflecting a very different set of assumptions regarding the underlying social process.

          R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
          [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
          [8] 0.2192645 0.2192645 0.2192645
          R> all(abs(bonpow(dat, exponent = 1/eigen(dat)$values[1], rescale = TRUE) -
          +   evcent(dat, rescale = TRUE)) < 1e-10)
          [1] TRUE
          R> bonpow(dat, exponent = -0.5)
          [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
          [7]  0.4613310  0.9226621  0.3075540  2.1528782
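To see where the exponent enters, the following sketch evaluates the usual closed form of Bonacich's power score, $c(\alpha, \beta) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, in base R. The helper `bonacich_power` and the choice `alpha = 1` are assumptions for illustration; sna's bonpow applies its own normalization, so its values should be compared with these only up to a constant factor.

```r
# Hypothetical helper: Bonacich power c(alpha, beta) = alpha * (I - beta A)^{-1} A 1.
# alpha only rescales the scores; sna's bonpow() may scale them differently.
bonacich_power <- function(A, beta, alpha = 1) {
  n <- nrow(A)
  # Solve (I - beta A) x = A 1 instead of forming the inverse explicitly
  alpha * as.vector(solve(diag(n) - beta * A, A %*% rep(1, n)))
}

A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              1, 0, 0, 1,
              0, 0, 1, 0), 4, 4, byrow = TRUE)
bonacich_power(A, 0)       # the outdegrees: 2 1 2 1
bonacich_power(A, -0.25)   # negative decay: ties to weak alters raise ego's score
```

With beta = 0 the score reduces to $A\mathbf{1}$, the outdegree vector, which matches the proportionality in the first bonpow call above. For small |beta| the score can also be read as the series $\sum_k \beta^k A^{k+1}\mathbf{1}$, so a negative beta alternately penalizes and rewards walks of increasing length.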

          As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

          R> memb <- sample(1:3, 10, replace = TRUE)
          R> summary(brokerage(dat, memb))
          Gould-Fernandez Brokerage Analysis

          Global Brokerage Properties
                     t    E(t)   Sd(t)       z Pr(>|z|)
          w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
          w_O  25.0000 19.5459  7.0713  0.7713   0.4405
          b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
          b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
          b_O  28.0000 23.4551  5.3349  0.8519   0.3943
          t    93.0000 87.9565 13.6124  0.3705   0.7110

          Individual Properties (by Group)

          Group ID: 1
               w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
          [1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
          [2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
          [3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
          [4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
                     b_O          t
          [1,] -1.186381  0.8682544
          [2,] -1.186381 -1.6099084
          [3,] -1.186381 -0.3708270
          [4,] -1.186381 -0.7838541

          Group ID: 2
               w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
          [1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
          [2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
                        t
          [1,] -0.7838541
          [2,]  1.4877951

          Group ID: 3
               w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
          [1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
          [2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
          [3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
          [4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
                     b_O           t
          [1,] 3.0624213  2.31384939
          [2,] 0.6345344  0.45522729
          [3,] 0.6345344  0.04220016
          [4,] 0.6345344 -0.57734055

          Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate (network-wide) brokerage scores for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present; z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

          Graph-level indices

          Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j), (j,k) \in E \Leftrightarrow (i,k) \in E$ for $i, j, k \in V$) and weak ($(i,j), (j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from i to k, i should also be tied to k directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

          Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph G with respect to centrality measure c is given by

          $$C(G) = \sum_{i=1}^{|V|} \left[ \max_{v \in V} c(v, G) - c(v_i, G) \right] \qquad (1)$$

          i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

          $$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \qquad (2)$$

          where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing C(G) by its maximum across all graphs of the same order as G. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

          ²For instance, when all vertices are automorphically equivalent.
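Equations (1) and (2) can be checked numerically. The sketch below (hypothetical helper names, base R only) computes the raw Freeman index from a vector of scores and a normalized degree centralization for undirected graphs, assuming a symmetric 0/1 adjacency matrix without loops; in that setting the star attains the maximum deviation $(n-1)(n-2)$. This normalizer is specific to undirected degree and is not the general tmaxdev mechanism used by sna.

```r
# Raw Freeman centralization, equation (1): total deviation from the maximum score
freeman_centralization <- function(scores) sum(max(scores) - scores)

# Normalized degree centralization for an undirected graph (assumption: symmetric
# 0/1 adjacency matrix, no loops). The star K_{1,n-1} attains (n - 1)(n - 2).
degree_centralization <- function(A) {
  n <- nrow(A)
  freeman_centralization(rowSums(A)) / ((n - 1) * (n - 2))
}

star <- matrix(0, 5, 5); star[1, -1] <- 1; star[-1, 1] <- 1
ring <- matrix(0, 5, 5)
for (i in 1:5) { j <- i %% 5 + 1; ring[i, j] <- 1; ring[j, i] <- 1 }
degree_centralization(star)   # 1: maximal concentration
degree_centralization(ring)   # 0: every vertex has degree 2
```

The identity between (1) and (2) holds for any score vector: `sum(max(x) - x)` equals `length(x) * (max(x) - mean(x))`.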

          In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree; hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

          The use of sna's GLI routines is straightforward: calling them with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices; hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

          R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
          R> gden(g)
          [1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
          R> grecip(g)
          [1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
          R> grecip(g, measure = "edgewise")
          [1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
          R> grecip(g) == 1 - hierarchy(g)
          [1] TRUE TRUE TRUE TRUE TRUE
          R> gtrans(g)
          [1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
          R> gtrans(g, measure = "weakcensus")
          [1]   0  21 106 254 582
          R> connectedness(g)
          [1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
          R> efficiency(g)
          [1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
          R> hierarchy(g, measure = "krackhardt")
          [1] 1.0 0.2 0.0 0.0 0.0
          R> lubness(g)
          [1] 0.2 1.0 1.0 1.0 1.0

          ³$K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
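For readers who want to see exactly what is being counted, here is a base-R sketch of density, dyadic and edgewise reciprocity, and the weak transitivity ratio for a single digraph. The helper `gli_sketch` is hypothetical and follows the definitions given earlier in this section, ignoring loops and missing data; it is not guaranteed to reproduce every option of gden, grecip, or gtrans.

```r
# Hypothetical helper: basic GLIs for a 0/1 digraph adjacency matrix (loops ignored)
gli_sketch <- function(A) {
  n <- nrow(A)
  diag(A) <- 0
  up <- upper.tri(A)
  at_risk <- 0; transitive <- 0
  for (i in seq_len(n)) for (j in seq_len(n)) for (k in seq_len(n)) {
    # ordered triads with a two-path i -> j -> k are "at risk" for weak transitivity
    if (i != j && j != k && i != k && A[i, j] == 1 && A[j, k] == 1) {
      at_risk <- at_risk + 1
      transitive <- transitive + A[i, k]
    }
  }
  c(density    = sum(A) / (n * (n - 1)),
    dyadic     = mean(A[up] == t(A)[up]),   # symmetric (mutual or null) dyads
    edgewise   = sum(A * t(A)) / sum(A),    # reciprocated edges; NaN for an empty graph
    weak_trans = transitive / at_risk)      # NaN if there are no two-paths
}

# 1 <-> 2 mutual, plus 2 -> 3: the lone two-path 1 -> 2 -> 3 is not closed
A <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 0, 0), 3, 3, byrow = TRUE)
gli_sketch(A)
```

Here two of the three dyads are symmetric and two of the three edges are reciprocated, so both reciprocity variants equal 2/3, while the density is 3/6.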

          centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

          R> centralization(g, degree, cmode = "outdegree")
          [1] 0.1728395
          R> centralization(g, betweenness)
          [1] 0
          R> apply(g, 1, centralization, degree, cmode = "outdegree")
          [1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
          R> apply(g, 1, centralization, betweenness)
          [1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

          As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

          R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
          +   n <- NROW(dat)
          +   if(tmaxdev)
          +     return((n-1) * choose(n-1, 2))
          +   odeg <- degree(dat, cmode = "outdegree")
          +   choose(odeg, 2)
          + }
          R> apply(g, 1, centralization, o2scent)
          [1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

          Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

          2.4. Connectivity and subgraph statistics

          Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

          $$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\left(d(j, k) \le i\right)$$

          and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
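The structure statistics depend only on the matrix of geodesic distances. The following sketch (hypothetical helper names, base R only) obtains those distances by breadth-first search from each vertex, treating unreachable pairs as infinitely distant, and then applies the formula for $s_i$ directly. It is meant to illustrate the definition, not to replace geodist or structure.statistics.

```r
# Hypothetical helper: breadth-first geodesic distances for a 0/1 digraph
geodesic_distances <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)                 # Inf marks unreachable pairs
  for (src in seq_len(n)) {
    D[src, src] <- 0
    frontier <- src
    d <- 0
    while (length(frontier) > 0) {
      d <- d + 1
      # out-neighbours of the current frontier that have no distance yet
      nxt <- which(colSums(A[frontier, , drop = FALSE]) > 0 & is.infinite(D[src, ]))
      D[src, nxt] <- d
      frontier <- nxt
    }
  }
  D
}

# Hypothetical helper: s_i = N^-2 * #{(j, k) : d(j, k) <= i}, for i = 0, ..., N - 1
structure_stats <- function(A) {
  N <- nrow(A)
  D <- geodesic_distances(A)
  sapply(0:(N - 1), function(i) sum(D <= i) / N^2)
}

# Directed path 1 -> 2 -> 3: s = (3, 5, 6) / 9
A <- matrix(0, 3, 3); A[1, 2] <- 1; A[2, 3] <- 1
structure_stats(A)
```

Because every vertex is at distance 0 from itself, $s_0$ always equals $1/N$; in the sna example that follows, $N = 10$ gives the leading 0.10.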

          At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of G, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

          Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic t for graph G is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where H is a random graph drawn uniformly given conditioning statistics $s(H) = s(G)$, $s'(H) = s'(G)$. Conditioning on the order of G is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $E[s(H)] = s(G)$, $E[s'(H)] = s'(G)$ for some $s, s'$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s'$; the homogeneous Bernoulli graph with parameter p equal to the density of G is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
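As a concrete instance of the CUG logic described above, the sketch below computes a dyad census and compares the observed number of mutual dyads against digraphs drawn uniformly at random given the observed order and number of edges. All function names are hypothetical, loops are excluded, and the p-values are simple Monte Carlo proportions; sna's cugtest offers other conditioning choices (e.g., order only, or density).

```r
# Hypothetical helper: counts of mutual, asymmetric, and null dyads
dyad_census <- function(A) {
  up <- upper.tri(A)
  s <- A[up] + t(A)[up]          # 2 = mutual, 1 = asymmetric, 0 = null
  c(Mut = sum(s == 2), Asym = sum(s == 1), Null = sum(s == 0))
}

# Hypothetical helper: a digraph drawn uniformly given order n and edge count m
runif_digraph_m <- function(n, m) {
  A <- matrix(0, n, n)
  cells <- which(row(A) != col(A))   # off-diagonal cells only (no loops)
  A[sample(cells, m)] <- 1
  A
}

# One-tailed CUG p-values Pr(t(H) >= t(G)) and Pr(t(H) <= t(G)),
# conditioning on order and number of edges
cug_edges <- function(A, f, reps = 1000) {
  obs <- f(A)
  sims <- replicate(reps, f(runif_digraph_m(nrow(A), sum(A))))
  c(upper = mean(sims >= obs), lower = mean(sims <= obs))
}

# Usage: is the number of mutual dyads unusual for a graph of this size and density?
set.seed(1)
G <- runif_digraph_m(10, 20)
cug_edges(G, function(A) dyad_census(A)[["Mut"]], reps = 500)
```

Conditioning on the edge count rather than only on order keeps density from driving the result, which matters for reciprocity-type statistics.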

          Example

          To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

          R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
          R> apply(dyad.census(g1), 2, mean)
            Mut  Asym  Null
           1.00 12.84 31.16
          R> apply(triad.census(g1), 2, mean)
            003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
          40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
           120C   210   300
           0.30  0.00  0.00
          R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
          R> apply(dyad.census(g2), 2, mean)
            Mut  Asym  Null
           8.84  9.26 26.90
          R> apply(triad.census(g2), 2, mean)
            003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
          25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
           120C   210   300
           1.34  2.28  0.60
          R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
          R> apply(dyad.census(g3), 2, mean)
            Mut  Asym  Null
           8.94 20.44 15.62
          R> apply(triad.census(g3), 2, mean)
            003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
           4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
           120C   210   300
           8.40  7.38  1.50
          R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
          +   dyadic.tabulation = "bylength")$path.count
             Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
          1   35   8   3    9   2   10    9   3   10   8   8
          2  119  40  10   47   8   59   47  13   56  39  38
          3  346 155  41  180  35  223  185  52  211 149 153
          4  791 457 130  504 114  601  527 163  572 425 462
          5 1351 964 303 1000 282 1143 1061 375 1104 884 990
          R> kcycle.census(g3[1,,], maxlen = 5,
          +   cycle.comembership = "bylength")$cycle.count
            Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
          2   9  2  1  2  0  3  2  0  4  3   1
          3  24  7  1 11  0 15  9  2 12  8   7
          4  42 16  1 23  2 32 26  3 30 19  16
          5  72 39  5 48  8 60 54 10 57 36  43
          R> component.dist(g3[1,,])
          $membership
          [1] 1 1 1 1 1 1 1 1 1 1

          $csize
          [1] 10

          $cdist
          [1] 0 0 0 0 0 0 0 0 0 1

          R> structure.statistics(g3[1,,])
             0    1    2    3    4    5    6    7    8    9
          0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

          In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

          R> g4 <- g1[1:2,,]
          R> g4[2,,] <- g2[1,,]
          R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
          +   g1 = 1, g2 = 2)
          R> summary(cug)

          CUG Test Results

          Estimated p-values:
                  p(f(rnd) >= f(d)): 0.299
                  p(f(rnd) <= f(d)): 0.708

          Test Diagnostics:
                  Test Value (f(d)): 0.04444444
                  Replications: 1000
                  Distribution Summary:
                          Min:     -0.3333333
                          1stQ:    -0.06666667
                          Med:     0
                          Mean:    -0.001288889
                          3rdQ:    0.06666667
                          Max:     0.3555556

          R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
          R> summary(cug)

          CUG Test Results

          Estimated p-values:
                  p(f(rnd) >= f(d)): 0.967
                  p(f(rnd) <= f(d)): 0.039

          Test Diagnostics:
                  Test Value (f(d)): 0.04444444
                  Replications: 1000
                  Distribution Summary:
                          Min:     -0.06666667
                          1stQ:    0.1555556
                          Med:     0.2222222
                          Mean:    0.2215333
                          3rdQ:    0.2888889
                          Max:     0.5333333

          A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

          2.5. Position and role analysis

          The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph G, vertex v is said to be structurally equivalent to vertex v′ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when v and v′ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

          In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities or differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


          This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
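A minimal version of this workflow (Euclidean structural equivalence distances followed by hierarchical clustering) can be written with base R alone. The helper `se_distance` is hypothetical; it compares both rows (out-ties) and columns (in-ties) and, following one common convention, skips the cells involving the two vertices being compared. It only approximates what sedist and equiv.clust do.

```r
# Hypothetical helper: Euclidean structural-equivalence distances between vertices
se_distance <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    keep <- setdiff(seq_len(n), c(i, j))   # skip the cells linking i and j themselves
    D[i, j] <- sqrt(sum((A[i, keep] - A[j, keep])^2) +   # out-ties (rows)
                    sum((A[keep, i] - A[keep, j])^2))    # in-ties (columns)
  }
  D
}

# Star on 5 vertices: the four leaves share the same alter and are equivalent
star <- matrix(0, 5, 5); star[1, -1] <- 1; star[-1, 1] <- 1
D <- se_distance(star)
cl <- cutree(hclust(as.dist(D)), k = 2)   # two positions: centre vs. leaves
cl
```

The leaves are at distance 0 from one another, so any cut into two groups separates the centre from the leaves.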

          After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class, and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null or 1-covered). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

          The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
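The density block image and its expansion can be sketched in a few lines of base R. The helper names `block_density` and `expand_blockmodel` are hypothetical; the sketch assumes class labels 1, ..., k, excludes loops when averaging, and draws each new edge independently with its block's density as probability, which is the inhomogeneous Bernoulli interpretation described above.

```r
# Hypothetical helper: density image = mean tie value in each class-by-class block
block_density <- function(A, cls) {
  k <- max(cls)
  M <- A
  diag(M) <- NA                      # loops are excluded from the block means
  B <- matrix(NA_real_, k, k)
  for (a in seq_len(k)) for (b in seq_len(k))
    B[a, b] <- mean(M[cls == a, cls == b], na.rm = TRUE)
  B[is.nan(B)] <- 0                  # a singleton class has no off-diagonal cells in its own block
  B
}

# Hypothetical helper: expand an image into a Bernoulli draw with the given class sizes
expand_blockmodel <- function(B, sizes) {
  cls <- rep(seq_along(sizes), sizes)   # class label of every new vertex
  P <- B[cls, cls]                      # edge probability for each ordered pair
  G <- matrix(rbinom(length(P), 1, P), nrow(P))
  diag(G) <- 0
  G
}

# Class 1 sends every tie to class 2; the image keeps that pattern
A <- matrix(0, 4, 4); A[1:2, 3:4] <- 1
B <- block_density(A, c(1, 1, 2, 2))
G <- expand_blockmodel(B, c(3, 3))      # a new graph of order 6, not 4
```

As in the sna example below, the expanded graph need not have the same order as the original.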

          Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition G″ of graphs G and G′ on vertex set V is the graph on V such that $(v, v') \in E(G'')$ iff there exists a vertex v″ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
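In adjacency-matrix terms, composition is a Boolean matrix product. The one-line helper `compose_graphs` below is a hypothetical sketch of that definition, not sna's own composition operator; the example also shows how a loop can appear in the composition of two loop-free graphs.

```r
# Hypothetical helper: (v, v') is an edge iff some v'' has v -> v'' in A and v'' -> v' in B
compose_graphs <- function(A, B) (A %*% B > 0) * 1

A <- matrix(0, 3, 3); A[1, 2] <- 1                  # 1 -> 2
B <- matrix(0, 3, 3); B[2, 3] <- 1; B[2, 1] <- 1    # 2 -> 3 and 2 -> 1
C <- compose_graphs(A, B)   # edges 1 -> 3 and the loop 1 -> 1
```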

          Example

          To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

          R> gp <- sapply(runif(20, 0, 1), rep, 20)
          R> g <- rgraph(20, tprob = gp)
          R> eq <- equiv.clust(g)
          R> b <- blockmodel(g, eq, h = 15)
          R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
          R> ge
                [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
           [1,]    0    0    1    1    0    0    1    0    0     1     1     1
           [2,]    0    0    1    1    0    0    1    1    0     1     1     1
           [3,]    0    0    0    0    1    1    1    1    0     0     0     0
           [4,]    0    0    1    0    1    1    1    1    0     0     0     0
           [5,]    0    0    0    0    0    0    0    0    1     1     0     0
           [6,]    0    1    1    0    0    0    1    0    1     1     0     0
           [7,]    0    0    1    1    0    1    0    1    1     1     0     1
           [8,]    0    0    1    1    0    0    1    0    0     1     0     1
           [9,]    0    0    0    1    1    1    0    1    0     0     0     0
          [10,]    0    0    1    1    0    1    1    1    1     0     1     1
          [11,]    0    0    0    0    0    0    1    1    0     0     0     1
          [12,]    0    1    1    1    0    0    0    1    0     0     1     0

          2.6. Exploratory edge set comparison

          One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

          $$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu^G\right)\left(A^H_{ij} - \mu^H\right)}{|V|\left(|V| - 1\right)} \qquad (3)$$

          where $A^G, A^H$ are the respective adjacency matrices of G and H, $\mu^X = \left(|V|(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean, and both sums run over ordered pairs with $i \neq j$. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
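Equation (3) translates directly into base R by vectorizing the off-diagonal cells. The helpers below are hypothetical (not gcov, gcor, or hdist themselves); they assume two 0/1 digraphs on the same, identically ordered vertex set and ignore the diagonal, matching the divisor $|V|(|V|-1)$.

```r
# Edge variables for the ordered pairs i != j
offdiag <- function(A) A[row(A) != col(A)]

# Hypothetical helpers following equation (3) and the definitions after it
graph_cov <- function(G, H) {
  g <- offdiag(G); h <- offdiag(H)
  sum((g - mean(g)) * (h - mean(h))) / length(g)   # length(g) = |V|(|V| - 1)
}
graph_cor <- function(G, H) graph_cov(G, H) / sqrt(graph_cov(G, G) * graph_cov(H, H))
hamming   <- function(G, H) sum(offdiag(G) != offdiag(H))

G <- matrix(0, 3, 3); G[1, 2] <- 1; G[2, 3] <- 1; G[3, 1] <- 1   # directed 3-cycle
H <- matrix(0, 3, 3); H[1, 2] <- 1; H[2, 3] <- 1; H[1, 3] <- 1   # transitive triple
graph_cor(G, H)   # 1/3 for this pair
hamming(G, H)     # 2: the graphs differ on the edges 3 -> 1 and 1 -> 3
```

Because the common divisor cancels, `graph_cor` agrees with R's `cor` applied to the off-diagonal cells.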

          The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

          Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by applying eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
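The base-R sketch below (hypothetical helper names; not sna's implementation) shows how these pieces fit together for a set of graphs on a common vertex set: pairwise Hamming distances fed to cmdscale, a central graph, and a network PCA via eigen. For 0/1 graphs, the graph minimizing total Hamming distance can be found edge by edge: an edge is included when it appears in more than half of the graphs (exact ties could go either way). The sketch assumes that rule.

```r
# Hypothetical helpers; not part of sna.
offdiag <- function(A) A[row(A) != col(A)]

set.seed(7)
m <- 10; n <- 8                      # ten graphs on eight vertices
G <- array(rbinom(m * n * n, 1, 0.3), dim = c(m, n, n))

# One row of edge variables per graph (diagonal ignored)
V <- t(apply(G, 1, offdiag))

# For 0/1 vectors, Manhattan distance equals Hamming distance
D <- as.matrix(dist(V, method = "manhattan"))
coords <- cmdscale(D, k = 2)         # classical (metric) MDS of the graph set

# Central graph: edgewise majority minimizes total Hamming distance
central <- (apply(G, c(2, 3), mean) > 0.5) * 1
diag(central) <- 0
total_dist <- function(C) sum(abs(sweep(V, 2, offdiag(C))))

# Network PCA: eigen-decomposition of the graph covariance matrix
# (cov() divides by N - 1 rather than N, which only rescales the eigenvalues)
S <- cov(t(V))                       # m x m covariances between graphs
pca <- eigen(S, symmetric = TRUE)
round(pca$values / sum(pca$values), 3)   # share of variance per component
```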

          In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net* versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, for cases where conditioning on the entire observed structure is inappropriate.
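The logic of a QAP test can be sketched in a few lines of base R. The sketch below (hypothetical helper names; it is not sna's qaptest) compares an observed graph correlation against its distribution when the rows and columns of one adjacency matrix are permuted together, which relabels vertices while preserving that graph's structure.

```r
# Hypothetical helpers; not part of sna.
offdiag <- function(A) A[row(A) != col(A)]
graph_cor <- function(AG, AH) cor(offdiag(AG), offdiag(AH))

qap_cor_test <- function(AG, AH, reps = 1000) {
  obs <- graph_cor(AG, AH)
  n <- nrow(AH)
  null <- replicate(reps, {
    p <- sample(n)                 # random reallocation of vertices to positions
    graph_cor(AG, AH[p, p])        # permute rows and columns of AH together
  })
  list(statistic = obs,
       p.greater = mean(null >= obs),  # one-sided: null at least as large
       p.less = mean(null <= obs))     # one-sided: null at least as small
}

set.seed(3)
n <- 15
g <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(g) <- 0
flip <- matrix(rbinom(n * n, 1, 0.1), n, n)
h <- abs(g - flip); diag(h) <- 0     # a noisy copy of g
res <- qap_cor_test(g, h, reps = 500)
res$statistic
res$p.greater
```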


          Example

          We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled (structural) correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

          R> g1 <- rgraph(5)
          R> g2 <- rgraph(5)
          R> g3 <- rmperm(g2)
          R> gcor(g1, g2)
          [1] -0.1336306
          R> gcor(g1, g3)
          [1] 0.08908708
          R> gcor(g2, g3)
          [1] -0.4583333
          R> gscor(g1, g2, reps = 1e5)
          [1] 0.5345225
          R> gscor(g1, g3, reps = 1e5)
          [1] 0.5345225
          R> gscor(g2, g3, reps = 1e5)
          [1] 1

          Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

          R> x <- rgraph(20, 4)
          R> y <- x[1,,] + 4*x[2,,] + 2*x[3,,]
          R> nl <- netlm(y, x)
          R> summary(nl)


          OLS Network Model

          Residuals:
                     0%           25%           50%           75%          100%
          -2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

          Coefficients:
                      Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
          (intercept) -1.467115e-14 0.000   1.000   0.000
          x1           1.000000e+00 1.000   0.000   0.000
          x2           4.000000e+00 1.000   0.000   0.000
          x3           2.000000e+00 1.000   0.000   0.000
          x4          -7.553990e-16 0.369   0.631   0.756

          Residual standard error: 1.169e-14 on 375 degrees of freedom
          Multiple R-squared: 1, Adjusted R-squared: 1
          F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

          Test Diagnostics:

            Null Hypothesis: qap
            Replications: 1000
            Coefficient Distribution Summary:

                 (intercept)         x1         x2         x3         x4
          Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -15.687343
          1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624  -0.9732831
          Median  -0.0841683 -0.0090468  0.0003289 -0.0116757  -0.4346029
          Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288  -0.0080178
          3rdQ     0.6930508  0.6393521  0.6352920  0.7064120   0.8601390
          Max      2.5434373  2.7231537  3.0464596  3.6938260  16.294713
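As a check on the statement that these models are analogous to conventional ones fitted to vectorized adjacency matrices, the base-R sketch below rebuilds the same design with lm. Stacking only the off-diagonal cells (380 = 20 × 19 dyads, matching the 375 + 5 degrees of freedom above) is our assumption about how the dyads are arranged. Note that lm's own standard errors and p-values assume independent observations, which dyads are not; that is precisely the gap netlm's QAP-based tail probabilities are meant to fill.

```r
# Base-R analogue of the netlm example (not sna code).
offdiag <- function(A) A[row(A) != col(A)]

set.seed(11)
n <- 20
x <- array(rbinom(4 * n * n, 1, 0.5), dim = c(4, n, n))  # four random predictor graphs
y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]               # same response as above

# Stack each graph's off-diagonal cells into one column per variable
d <- data.frame(y = offdiag(y), sapply(1:4, function(k) offdiag(x[k, , ])))
names(d) <- c("y", paste0("x", 1:4))

fit <- lm(y ~ x1 + x2 + x3 + x4, data = d)
round(coef(fit), 6)   # intercept and x4 essentially 0; x1, x2, x3 essentially 1, 4, 2
```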

          As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred; its usage is directly analogous, as in the following example:

          R> x <- rgraph(20, 4)
          R> y.l <- x[1,,] + 4*x[2,,] + 2*x[3,,]
          R> y.p <- apply(y.l, c(1, 2), function(a){1/(1 + exp(-a))})
          R> y <- rgraph(20, tprob = y.p)
          R> nl <- netlogit(y, x)
          R> summary(nl)

          Network Logit Model

          Coefficients:
                      Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
          (intercept)  0.3077180  1.3603173 0.680   0.320   0.503
          x1           0.9411361  2.5628914 0.985   0.015   0.019
          x2           4.1473292 63.2648084 1.000   0.000   0.000
          x3           1.8630911  6.4436238 1.000   0.000   0.000
          x4          -0.1757242  0.8388493 0.318   0.682   0.642

          Goodness of Fit Statistics:

          Null deviance: 526.7919 on 380 degrees of freedom
          Residual deviance: 174.1572 on 375 degrees of freedom
          Chi-Squared test of fit improvement:
                   352.6347 on 5 degrees of freedom, p-value 0
          AIC: 184.1572    BIC: 203.8580
          Pseudo-R^2 Measures:
                  (Dn-Dr)/(Dn-Dr+dfn): 0.481324
                  (Dn-Dr)/Dn: 0.6694004
          Contingency Table (predicted (rows) x actual (cols)):

                0    1
          0     0    0
          1    39  341

                  Total Fraction Correct: 0.8973684
                  Fraction Predicted 1s Correct: 0.8973684
                  Fraction Predicted 0s Correct: NaN
                  False Negative Rate: 0
                  False Positive Rate: 1

          Test Diagnostics:

            Null Hypothesis: qap
            Replications: 1000
            Distribution Summary:

                 (intercept)        x1        x2        x3        x4
          Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
          1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
          Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
          Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
          3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
          Max       1.704128  1.408468  1.214650  1.100783  1.533500

          It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties: every dyad is predicted to be a tie, so the fraction of predicted 0s correct is undefined (NaN) and the false positive rate is 1. This is largely a consequence of the high density of the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, the model's parameter estimates are quite close to the true values, and the QAP test correctly identifies the irrelevant predictors (x4 and the intercept, whose true values are 0).
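In the same spirit, the logistic case can be mimicked with glm on the vectorized dyads. This is a sketch under the same off-diagonal stacking assumption as for OLS, not sna's netlogit. It also cross-tabulates predicted ties (cutoff 0.5) against observed ones; for a graph this dense, nearly every dyad tends to be predicted as a tie, which is the pattern discussed above.

```r
# Base-R analogue of the netlogit example (not sna code).
offdiag <- function(A) A[row(A) != col(A)]

set.seed(21)
n <- 20
x <- array(rbinom(4 * n * n, 1, 0.5), dim = c(4, n, n))
eta <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]      # true log-odds, as in the example
y <- matrix(rbinom(n * n, 1, plogis(eta)), n, n)  # Bernoulli ties with those log-odds

d <- data.frame(y = offdiag(y), sapply(1:4, function(k) offdiag(x[k, , ])))
names(d) <- c("y", paste0("x", 1:4))

fit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = d)
round(coef(fit), 3)
mean(d$y)                                          # density of the dependent graph
table(predicted = as.integer(fitted(fit) > 0.5), actual = d$y)
```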

          2.7 Network inference and process models

          A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), users are strongly encouraged to employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns an estimate of an unknown graph from a series of observed graphs. Supported methods include data analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which it reports, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired), and estimation is by maximum likelihood.

          A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of the false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992), in the simplified form of Gelman et al. (1995), can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
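To make the reporting model concrete, the sketch below computes the posterior probability of a single tie given informants' reports when the false negative (e−) and false positive (e+) rates are treated as known. It is a hypothetical base-R helper, not a call into bbnam, which additionally samples the error rates and the full graph. It also shows what a negatively informative report means: when e− + e+ > 1, a positive report multiplies the odds of a tie by (1 − e−)/e+ < 1.

```r
# Hypothetical helper; not part of sna.
# reports: 0/1 reports on one dyad, one per informant
# em, ep:  each informant's false negative and false positive rates (assumed known)
# prior:   prior probability that the tie exists
edge_posterior <- function(reports, em, ep, prior = 0.5) {
  lik1 <- prod(ifelse(reports == 1, 1 - em, em))   # Pr(reports | tie present)
  lik0 <- prod(ifelse(reports == 1, ep, 1 - ep))   # Pr(reports | tie absent)
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

em <- rep(0.4, 5); ep <- rep(0.04, 5)
edge_posterior(c(1, 1, 0, 1, 0), em, ep)   # three of five report the tie
edge_posterior(c(0, 0, 0, 0, 0), em, ep)   # nobody reports it

# Negatively informative rates (em + ep > 1): one positive report
# pulls the tie probability below the prior of 0.5
edge_posterior(1, em = 0.7, ep = 0.5)
```

With many informants the products can underflow; summing log-likelihoods instead would be the natural refinement.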

          Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members in turn may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex i nominates vertex j when reached. Then the conditional probability of $e_{ij}$ is given by

          $$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

          where T is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the (i, j) directed dyad). Bias events are taken to be independent Bernoulli trials given T, such that $e_{ij}$ occurs with certainty if any bias event (or the baseline event) occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
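The conditional tie probability above is easy to evaluate once the bias events have been defined. The base-R function below is a hypothetical illustration of the formula, not part of sna's bn; the particular bias events in the example (a reciprocity-type event with one opportunity and a closure-type event with two) and their probabilities are made up for illustration.

```r
# Hypothetical helper; not part of sna.
# p_base: Pr(B_e), the baseline nomination probability
# p_bias: Pr(B_k) for each bias event type k
# s:      s_k(i, j, T), how many times each bias event could have fired for (i, j)
biased_tie_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  # The tie fails only if the baseline event and every bias trial all fail
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Illustrative values: baseline 0.05; a reciprocity-type bias (0.3) with
# one opportunity and a closure-type bias (0.1) with two opportunities
biased_tie_prob(0.05, c(0.3, 0.1), c(1, 2))
biased_tie_prob(0.05, c(0.3, 0.1), c(0, 0))   # no bias opportunities: baseline only
```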

          While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990; and, for the equivalent class of spatial autocorrelation models, Cliff and Ord 1973 and Anselin 1988) constitute one important family of processes often used for this purpose. These models are of the form

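          $$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

          $$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$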

          where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of i.i.d. disturbances. Z and ψ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, W and θ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
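The sketch below, in base R with hypothetical helper names, simulates one draw from equations (4) and (5) with a single AR weight matrix W and a single MA matrix Z, and implements the textbook forms of Moran's I and Geary's C. Row-normalizing the random weight graphs is our assumption, made so that I − θW and I − ψZ are invertible for the chosen parameters. This is not sna's lnam or nacf, whose conventions (neighborhood orders, normalization) may differ.

```r
# Hypothetical helpers; not part of sna.
moran_I <- function(y, W) {
  d <- y - mean(y)
  (length(y) / sum(W)) * sum(W * outer(d, d)) / sum(d^2)
}
geary_C <- function(y, W) {
  d <- y - mean(y)
  ((length(y) - 1) / (2 * sum(W))) * sum(W * outer(y, y, "-")^2) / sum(d^2)
}

set.seed(5)
n <- 30
rand_graph <- function(n, p) {
  A <- matrix(rbinom(n * n, 1, p), n, n); diag(A) <- 0; A
}
row_norm <- function(A) A / pmax(rowSums(A), 1)   # rows sum to 1 (or 0 if isolated)

W <- row_norm(rand_graph(n, 0.1))   # AR weights (theta * W)
Z <- row_norm(rand_graph(n, 0.1))   # MA weights (psi * Z)
X <- cbind(1, rnorm(n)); beta <- c(1, 2)
theta <- 0.4; psi <- 0.3
nu <- rnorm(n, 0, 0.5)

eps <- solve(diag(n) - psi * Z, nu)                       # equation (5)
y <- drop(solve(diag(n) - theta * W, X %*% beta + eps))   # equation (4)
moran_I(y, W)
geary_C(y, W)

# Small check by hand: path 1-2-3-4 with y = 1:4 gives I = 1/3 and C = 0.3
P <- matrix(0, 4, 4); P[cbind(1:3, 2:4)] <- 1; P <- P + t(P)
moran_I(1:4, P)
geary_C(1:4, P)
```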


          Example

          To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary method for the returned object describes the resulting posterior properties, along with various diagnostics.

          R> g <- rgraph(20)
          R> ep <- rbeta(20, 1, 25)
          R> em <- rbeta(20, 15, 25)
          R> dat <- array(dim = c(20, 20, 20))
          R> for(i in 1:20)
          +   dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
          R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
          R> pem <- matrix(nrow = 20, ncol = 2)
          R> pem[,1] <- 2
          R> pem[,2] <- 11
          R> pep <- matrix(nrow = 20, ncol = 2)
          R> pep[,1] <- 2
          R> pep[,2] <- 11
          R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
          +   epprior = pep, burntime = 300, draws = 100)
          R> summary(b)

          Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

          Multiple Error Probability Model

          Marginal Posterior Network Distribution:

                 a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
          a1   0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
          a2   0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
          a3   0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
          a4   0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
          a5   1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
          a6   0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
          a7   1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
          a8   0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
          a9   0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
          a10  0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
          a11  0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
          a12  1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
          a13  0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
          a14  1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
          a15  1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
          a16  0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
          a17  1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
          a18  1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
          a19  0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
          a20  0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

                a16  a17  a18  a19  a20
          a1   1.00 1.00 1.00 0.00 0.00
          a2   1.00 0.00 0.00 1.00 1.00
          a3   0.00 0.00 1.00 0.00 1.00
          a4   0.00 1.00 0.00 1.00 1.00
          a5   1.00 1.00 0.00 0.00 1.00
          a6   0.00 0.00 0.00 1.00 0.00
          a7   1.00 0.00 0.00 0.00 0.00
          a8   0.00 0.00 1.00 0.00 1.00
          a9   1.00 1.00 1.00 1.00 0.00
          a10  0.00 1.00 1.00 1.00 0.00
          a11  1.00 1.00 0.00 1.00 1.00
          a12  1.00 0.00 1.00 1.00 0.00
          a13  0.00 0.00 1.00 0.00 1.00
          a14  0.00 0.00 0.00 0.00 0.00
          a15  1.00 0.00 1.00 0.00 1.00
          a16  0.00 0.00 1.00 0.00 0.00
          a17  0.00 0.00 1.00 0.00 1.00
          a18  0.00 0.00 0.00 1.00 0.00
          a19  0.00 0.00 0.00 0.00 1.00
          a20  1.00 1.00 1.00 1.00 0.00

          Marginal Posterior Global Error Distribution:
                       e^-       e^+
          Min    0.1443951 0.0004238
          1stQ   0.3126975 0.0167584
          Median 0.3678306 0.0294646
          Mean   0.3783663 0.0493688
          3rdQ   0.4423027 0.0574099
          Max    0.6909116 0.2262239

          Marginal Posterior Error Distribution (by observer):
          Probability of False Negatives (e^-):

                  Min   1stQ Median   Mean   3rdQ    Max
          o1   0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
          o2   0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
          o3   0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
          o4   0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
          o5   0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
          o6   0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
          o7   0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
          o8   0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
          o9   0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
          o10  0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
          o11  0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
          o12  0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
          o13  0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
          o14  0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
          o15  0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
          o16  0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
          o17  0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
          o18  0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
          o19  0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
          o20  0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

          Probability of False Positives (e^+):

                     Min      1stQ    Median      Mean      3rdQ       Max
          o1   0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
          o2   0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
          o3   0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
          o4   0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
          o5   0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
          o6   0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
          o7   0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
          o8   0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
          o9   0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
          o10  0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
          o11  0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
          o12  0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
          o13  0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
          o14  0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
          o15  0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
          o16  0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
          o17  0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
          o18  0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
          o19  0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
          o20  0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

          MCMC Diagnostics:

            Replicate Chains: 5
            Burn Time: 300
            Draws per Chain: 20 Total Draws: 100
            Potential Scale Reduction (G&R's sqrt(Rhat)):
              Max: 1.003116
              Med: 0.9992194
              IQR: 0.0004545115

          R> cor(em, apply(b$em, 2, median))
          [1] 0.9187894
          R> cor(ep, apply(b$ep, 2, median))
          [1] 0.971649
          R> mean(apply(b$net, c(2, 3), median) == g)
          [1] 1

          Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions that are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.

          Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

          R> mean(consensus(dat, method = "LAS.intersection") == g)
          [1] 0.7725
          R> mean(consensus(dat, method = "LAS.union") == g)
          [1] 0.905
          R> mean(consensus(dat, method = "central.graph") == g)
          [1] 0.9575
          R> mean(consensus(dat, method = "romney.batchelder") == g)
          Estimated competency scores:
           [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
           [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
          [15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
          Estimated bias parameters:
           [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
           [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
          [13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
          [19] 0.04302486 0.10195838
          [1] 1

          For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives), while the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
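A back-of-the-envelope calculation shows why a majority-rule (central graph) estimate degrades as error rates approach 0.5. The sketch assumes, for simplicity, that all m informants err independently with the same probability e on a given dyad and that a strict majority is needed. This symmetric setup is a simplification of the heterogeneous scenario above, and the helper name is hypothetical.

```r
# Hypothetical helper; not part of sna.
# Probability that a strict majority of m informants reports a dyad correctly
# when each errs independently with probability e.
majority_correct <- function(e, m) 1 - pbinom(floor(m / 2), m, 1 - e)

e <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
round(majority_correct(e, m = 20), 3)
```

Adding informants sharpens the majority only while e stays below 0.5; beyond that point, more informants make the majority more reliably wrong. Likelihood-based methods instead weight informants by their estimated error rates, which is roughly why they can remain informative so long as e− + e+ < 1.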

          As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

          R> w1 <- rgraph(50)
          R> w2 <- rgraph(50)
          R> x <- matrix(rnorm(50 * 5), 50, 5)
          R> r1 <- 0.2
          R> r2 <- 0.3
          R> sigma <- 0.1
          R> beta <- rnorm(5)
          R> nu <- rnorm(50, 0, sigma)
          R> e <- qr.solve(diag(50) - r2 * w2, nu)
          R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
          R> fit <- lnam(y, x, w1, w2)
          R> summary(fit)

          Call:
          lnam(y = y, x = x, W1 = w1, W2 = w2)

          Residuals:
               Min       1Q   Median       3Q      Max
          -0.52052 -0.18305  0.01156  0.15557  0.62082

          Coefficients:
                  Estimate Std. Error Z value Pr(>|z|)
          X1     -0.331259   0.010831  -30.58   <2e-16 ***
          X2      0.535608   0.009448   56.69   <2e-16 ***
          X3     -0.685068   0.007138  -95.98   <2e-16 ***
          X4      0.691812   0.008417   82.19   <2e-16 ***
          X5      0.016491   0.007890    2.09   0.0366 *
          rho1.1  0.194935   0.002575   75.71   <2e-16 ***
          rho2.1  0.307491   0.021167   14.53   <2e-16 ***
          ---
          Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                Estimate Std. Error
          Sigma  0.09597   9.22e-05

          Goodness-of-Fit:
            Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
            Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
            Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
            AIC: -100.9 BIC: -85.65

            Null model: meanstd
            Null log likelihood: -82.48 on 48 degrees of freedom
            AIC: 169.0 BIC: 172.8
            AIC difference (model versus null): 269.9
            Heuristic Log Bayes Factor (model versus null): 258.4

          In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

          3 Closing comments

          The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines that is diverse and that covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

          Acknowledgments

          The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

          [Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.]

          References

          Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

          Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

          Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek.

          Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

          Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.


          Boorman SA, White HC (1976). “Social Structure from Multiple Networks. II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

          Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067. URL http://www.analytictech.com.

          Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

          Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com.

          Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

          Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

          Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

          Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

          Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

          Burt RS (1976). “Positions in Networks.” Social Forces, 55, 93–122.

          Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2. URL http://faculty.chicagogsb.edu/ronald.burt/teaching.

          Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

          Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

          Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

          Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

          Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

          Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.


          Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

          Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

          Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

          Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

          Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

          Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

          Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

          Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

          Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

          Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

          Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

          Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

          Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

          Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

          Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

          Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

          Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

          Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

          Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

          Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

          Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

          Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

          Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

          Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

          Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

          Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

          Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

          Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

          Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

          Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

          Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

          Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

          Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


          Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

          Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

          Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

          Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

          Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

          R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

          Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

          Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

          Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

          Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

          Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

          Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

          Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

          Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

          Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

          Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

          Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


          Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

          Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

          West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

          White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

          Affiliation

          Carter T. Butts
          Department of Sociology and Institute for Mathematical Behavioral Sciences
          University of California, Irvine
          Irvine, CA 92697-5100, United States of America
          E-mail: buttsc@uci.edu
          URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

          Journal of Statistical Software, http://www.jstatsoft.org/
          published by the American Statistical Association, http://www.amstat.org/

          Volume 24, Issue 6, February 2008
          Submitted: 2007-06-01; Accepted: 2007-12-25


            2. Package highlights

            Given the wide scope of the methods implemented within the sna package, we cannot review them all in detail. In this section, however, we attempt to summarize the functionality of sna within a number of domains, highlighting specific functions and applications which are likely to be of general interest. Brief examples are also provided within each section to illustrate basic syntax and usage. Additional background and usage details are contained within the package manual, which is distributed with the package itself.

            2.1. Random graph generation

            sna has a range of tools for random graph generation. Chief among these is rgraph, a "workhorse" function for simulating deviates from both homogeneous and inhomogeneous Bernoulli graph distributions (Wasserman and Faust 1994). Given a set of tie probabilities (which may be specified by graph or by edge), it generates one or more graphs whose edge states are independent Bernoulli trials conditional on the specified parameters.¹
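To make the model concrete, here is a minimal base-R sketch of an inhomogeneous Bernoulli digraph draw. It is not sna's rgraph, and the function name bernoulli_digraph is hypothetical. Each off-diagonal cell is an independent Bernoulli trial with its own probability, and passing a single probability gives the homogeneous case.

```r
# Minimal sketch of a Bernoulli digraph draw (not sna's rgraph).
# `p` is either one probability (homogeneous model) or an n x n matrix
# whose (i, j) cell is the probability of an (i, j) edge (inhomogeneous model).
bernoulli_digraph <- function(n, p) {
  probs <- matrix(p, n, n)                      # recycle a scalar into a matrix
  g <- matrix(rbinom(n * n, 1, probs), n, n)    # one independent trial per cell
  diag(g) <- 0                                  # no loops (mirrors rgraph's default)
  g
}

set.seed(1913)
g_hom <- bernoulli_digraph(10, 0.1)             # homogeneous: every edge has p = 0.1
p_ij  <- sapply((1:10) / 10, rep, 10)           # P(i -> j) = j/10, as in the text
g_inh <- bernoulli_digraph(10, p_ij)            # inhomogeneous draw
colMeans(g_inh)                                 # roughly 0.9 * (j/10) in expectation
```

Column 10 has probability 1, so every off-diagonal cell in that column is an edge. This gives a quick sanity check on the probability-matrix orientation.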

            In addition to rgraph, sna has several other tools for random graph generation. These currently include rgnm (which draws uniform graphs and digraphs conditional on edge count), rguman (which draws uniform digraphs conditional on expected or realized dyad census statistics), rgws (which draws from a Watts-Strogatz graph process; Watts and Strogatz 1998), and rgbn (which simulates a Skvoretz-Fararo biased net process (Skvoretz et al. 2004); see also Section 2.7). Also useful are tools such as rmperm and the rewire functions, which alter an input graph by random row/column, edgewise, or dyadic permutations. Functions which condition on degree distribution and the triad census are anticipated in future versions of sna.

            Example

            To provide a sense for the syntax involved (and options available) when generating random graphs in sna, we here provide a brief example of R code which draws graphs from a number of models. Note that the output type in each case is an adjacency matrix: although sna routines accept network and related objects as input (per Section 1.4), the package's current random graph generators produce output in adjacency matrix or array form. The range of output types may be expanded in future package versions. To begin, we first load the sna library and fix the random seed (for reproducibility):

            R> library(sna)
            R> set.seed(1913)

            As noted above, rgraph can be used in various ways to obtain graphs (directed or otherwise) with different expected densities. For instance, three digraphs with respective expected densities 0.1, 0.9, and 0.5 can be drawn as follows:

            R> g <- rgraph(10, 3, tprob = c(0.1, 0.9, 0.5))
            R> gden(g)
            [1] 0.1000000 0.8666667 0.5333333

            ¹ rgraph can also be employed to simulate valued graphs via a resampling procedure.


            gden, which we shall encounter again later, is an sna function which returns the density of one or more input graphs; as expected, the observed densities here closely match their expectations. The tprob parameter, used above to set the probability of each edge on a per-graph basis, can also be used in other ways. For instance, passing a matrix of Bernoulli parameters to tprob will cause rgraph to sample from the corresponding inhomogeneous Bernoulli graph model (in which the probability of an (i, j) edge is equal to tprob[i,j]). For example, consider a simple model for a digraph of order 10 in which the probability of an (i, j) edge is equal to j/10. Such a graph can be drawn easily as follows:

            R> gp <- sapply((1:10)/10, rep, 10)
            R> g <- rgraph(10, tprob = gp)
            R> g
                  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
             [1,]    0    0    0    0    1    0    0    1    1     1
             [2,]    0    0    0    1    0    1    0    0    1     1
             [3,]    0    0    0    0    0    1    0    1    0     1
             [4,]    0    0    0    0    1    1    1    1    1     1
             [5,]    0    1    0    0    0    0    1    1    1     1
             [6,]    0    0    1    0    1    0    1    0    1     1
             [7,]    0    1    1    0    1    0    0    1    1     1
             [8,]    0    0    1    1    1    0    1    0    1     1
             [9,]    0    0    0    1    1    0    1    1    0     1
            [10,]    0    0    0    0    0    0    1    1    1     0
            R> apply(g, 2, mean)
            [1] 0.0 0.2 0.3 0.3 0.6 0.3 0.6 0.7 0.8 0.9

            Since rgraph disallows loops by default, diagonal entries are ignored in the above cases; thus, the column means here have expectation 0.9(j/10). The observed means are quite close to this, but obviously vary due to the underlying Bernoulli process. For random graphs with exact constraints on edge count, we must use rgnm. For instance, to take 5 draws from the uniform distribution on the order 10 graphs having 12 edges, we would proceed as follows:

            R> g <- rgnm(5, 10, 12)
            R> apply(g, 1, sum)
            [1] 12 12 12 12 12
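The exact edge-count constraint can be illustrated without sna. If m of the n(n - 1) off-diagonal cells are chosen uniformly at random without replacement, every labeled loopless digraph with m edges is equally likely. The sketch below (hypothetical name uniform_gnm_digraph) assumes directed, loopless graphs. Unlike rgnm, it stacks the draws along the third array dimension.

```r
# Minimal sketch of a uniform draw from order-n loopless digraphs with exactly
# m edges (the constraint rgnm imposes); not sna's implementation.
uniform_gnm_digraph <- function(n, m) {
  off_diag <- which(row(diag(n)) != col(diag(n)))   # linear indices of candidate cells
  stopifnot(m <= length(off_diag))
  g <- matrix(0, n, n)
  g[off_diag[sample.int(length(off_diag), m)]] <- 1 # m distinct cells, all subsets equally likely
  g
}

set.seed(1913)
draws <- replicate(5, uniform_gnm_digraph(10, 12))  # a 10 x 10 x 5 array
apply(draws, 3, sum)                                # every draw has exactly 12 edges
```

An undirected variant would sample from the upper-triangle cells instead and then symmetrize.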

            As the dyadic counterpart to both rgraph and rgnm, rguman models digraphs whose distributions are parameterized by dyad states. As each dyad corresponds to a pair of edge variables, it can be readily classified into the three isomorphism classes of mutual (both edges present), asymmetric (one edge present), or null (no edges present). The number of dyads in each class within a graph is known as its dyad census, and has been used as a simple basis for modeling network structure at least since the work of Holland and Leinhardt (1970). rguman can be employed either to generate uniform digraphs conditional on an exact dyad census constraint, or to draw from a multinomial graph model of independent dyads with fixed expected counts. The former case can be used to generate graphs of particular types. For instance, the trivial cases of complete, complete tournament, and null graphs can be generated by placing all dyads within the appropriate isomorphism class:

            R> k10 <- rguman(1, 10, mut = 45, asym = 0, null = 0, method = "exact")
            R> t10 <- rguman(1, 10, mut = 0, asym = 45, null = 0, method = "exact")
            R> n10 <- rguman(1, 10, mut = 0, asym = 0, null = 45, method = "exact")
            R> k10
                  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
             [1,]    0    1    1    1    1    1    1    1    1     1
             [2,]    1    0    1    1    1    1    1    1    1     1
             [3,]    1    1    0    1    1    1    1    1    1     1
             [4,]    1    1    1    0    1    1    1    1    1     1
             [5,]    1    1    1    1    0    1    1    1    1     1
             [6,]    1    1    1    1    1    0    1    1    1     1
             [7,]    1    1    1    1    1    1    0    1    1     1
             [8,]    1    1    1    1    1    1    1    0    1     1
             [9,]    1    1    1    1    1    1    1    1    0     1
            [10,]    1    1    1    1    1    1    1    1    1     0
            R> t10
                  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
             [1,]    0    0    0    0    0    0    1    0    0     0
             [2,]    1    0    1    0    1    1    0    0    0     1
             [3,]    1    0    0    1    1    0    0    1    0     0
             [4,]    1    1    0    0    0    1    0    1    0     1
             [5,]    1    0    0    1    0    1    1    1    1     0
             [6,]    1    0    1    0    0    0    1    1    1     0
             [7,]    0    1    1    1    0    0    0    1    1     0
             [8,]    1    1    0    0    0    0    0    0    1     1
             [9,]    1    1    1    1    0    0    0    0    0     0
            [10,]    1    0    1    0    1    1    1    0    1     0
            R> n10
                  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
             [1,]    0    0    0    0    0    0    0    0    0     0
             [2,]    0    0    0    0    0    0    0    0    0     0
             [3,]    0    0    0    0    0    0    0    0    0     0
             [4,]    0    0    0    0    0    0    0    0    0     0
             [5,]    0    0    0    0    0    0    0    0    0     0
             [6,]    0    0    0    0    0    0    0    0    0     0
             [7,]    0    0    0    0    0    0    0    0    0     0
             [8,]    0    0    0    0    0    0    0    0    0     0
             [9,]    0    0    0    0    0    0    0    0    0     0
            [10,]    0    0    0    0    0    0    0    0    0     0
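The dyad census underlying rguman's "exact" mode can be computed directly from an adjacency matrix. The base-R sketch below counts each unordered pair once. The name dyad_census_sketch is hypothetical; this is not an sna function. The example confirms that each of the three trivial graphs above places all 45 dyads in a single class.

```r
# Minimal sketch of the dyad census of a 0/1 digraph adjacency matrix.
dyad_census_sketch <- function(g) {
  up <- upper.tri(g)            # each unordered pair {i, j} with i < j, once
  ij <- g[up] == 1              # edge i -> j present?
  ji <- t(g)[up] == 1           # edge j -> i present?
  c(mut  = sum(ij & ji),        # both directions
    asym = sum(xor(ij, ji)),    # exactly one direction
    null = sum(!ij & !ji))      # neither direction
}

complete   <- 1 - diag(10)                   # every dyad mutual
empty      <- matrix(0, 10, 10)              # every dyad null
tournament <- matrix(0, 10, 10)
tournament[upper.tri(tournament)] <- 1       # i -> j for all i < j: every dyad asymmetric

dyad_census_sketch(complete)     # mut = 45, asym = 0, null = 0
dyad_census_sketch(tournament)   # mut = 0, asym = 45, null = 0
dyad_census_sketch(empty)        # mut = 0, asym = 0, null = 45
```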

            When not in "exact" mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

            R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)
            R> mean(g[upper.tri(g)] & t(g)[upper.tri(g)])
            [1] 0.1482828
            R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])
            [1] 0.04646465
            R> mean((!g)[upper.tri(g)] & t(!g)[upper.tri(g)])
            [1] 0.8052525

            By contrast with the expectation under the above model, a Bernoulli graph with the same expected density would have a mean mutuality rate of approximately 0.03 (with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4). Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
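The Bernoulli comparison above is plain arithmetic and can be checked directly. The only assumption is the standard one: in a Bernoulli graph, edges are independent with a common probability equal to the expected density.

```r
# Worked check of the multinomial-versus-Bernoulli comparison (arithmetic only).
p_mut <- 0.15; p_asym <- 0.05; p_null <- 0.80   # dyad model used in the text

# A mutual dyad fills both of its edge slots; an asymmetric dyad fills one of two.
dens <- (2 * p_mut + p_asym) / 2                # expected density = 0.175

# Bernoulli graph with the same density: each edge independent with prob. dens.
bern_mut  <- dens^2                             # P(both edges present)
bern_asym <- 2 * dens * (1 - dens)              # P(exactly one edge present)

c(density = dens, mut = bern_mut, asym_to_mut = bern_asym / bern_mut)
# density = 0.175, mut ~ 0.0306, asym_to_mut ~ 9.43
```

The ratio simplifies to 2(1 - p)/p, which makes clear why sparse Bernoulli graphs are dominated by asymmetric dyads.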

            More extensive departures from independence require alternatives to the simple independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. As we will see, these processes are specified in terms of the conditional probability of an edge given other edges within the graph; this immediately suggests the use of a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process. Such a sampler is implemented via the rgbn function, which uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. Thinning and burn-in parameters may be specified by the user, along with model parameters (which by default correspond to the uniform random digraph model). Parameters may be adjusted to produce "parent" or reciprocity biases (π), "sibling" or shared partner biases (σ), and "double role" biases or parent/sibling interaction effects (ρ), as well as baseline density effects (d); parameters vary from 0 to 1, with 0 indicating no bias. The command to draw a sample of 5 order 10 networks with both reciprocity and triangle formation biases will then look something like the following:

            R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
            +   d = 0.15))

            with the magnitude of the specified effects depending on the exact choice of parameters.

            Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) "rewiring" process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

            R> g <- matrix(0, 10, 10)
            R> g[1,] <- 1
            R> g2 <- rewire.ws(g, 0.5)[1,,]
            R> g2
                  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
             [1,]    1    0    1    1    1    1    0    0    0     0
             [2,]    0    0    0    0    0    0    0    0    0     1
             [3,]    0    1    0    0    0    0    0    0    0     0
             [4,]    0    0    1    0    0    0    0    0    0     0
             [5,]    0    0    0    0    0    0    0    0    0     0
             [6,]    0    0    0    0    1    0    0    0    0     0
             [7,]    0    0    0    0    0    0    0    0    0     0
             [8,]    0    0    0    0    0    0    0    0    0     0
             [9,]    0    0    0    0    0    0    0    0    0     0
            [10,]    0    0    0    0    0    0    0    0    1     0
            R> sum(g - g2) == 0
            [1] TRUE
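A simplified version of the rewiring idea is sketched below in base R. It is not sna's rewire.ws. As a simplifying assumption, each selected edge keeps its tail vertex i and moves to a random head k with no existing i -> k tie. The new dyad therefore shares exactly one endpoint with the old one, and the edge count is conserved by construction. The name rewire_sketch is hypothetical.

```r
# Simplified sketch of edge-conserving rewiring on a digraph (not sna's rewire.ws).
rewire_sketch <- function(g, p) {
  n <- nrow(g)
  edges <- which(g == 1 & row(g) != col(g), arr.ind = TRUE)  # snapshot of non-loop edges
  for (e in seq_len(nrow(edges))) {
    i <- edges[e, 1]; j <- edges[e, 2]
    if (runif(1) >= p) next                        # leave this edge alone w.p. 1 - p
    cand <- which(g[i, ] == 0 & seq_len(n) != i)   # null dyads sharing endpoint i
    if (length(cand) == 0) next                    # nowhere to move it
    k <- cand[sample.int(length(cand), 1)]
    g[i, j] <- 0                                   # drop the old edge ...
    g[i, k] <- 1                                   # ... and add the new one
  }
  g
}

set.seed(1913)
ring <- matrix(0, 10, 10)
ring[cbind(1:10, c(2:10, 1))] <- 1   # directed 10-cycle, a typical starting lattice
ring2 <- rewire_sketch(ring, 0.5)
sum(ring2) == sum(ring)              # edge count is unchanged
```

Because only the head moves, out-degrees are preserved as well. sna's own rule for choosing the replacement dyad may differ.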

            Another example of an edge-preserving random transformation is the random permutation of vertex order; rmperm can be employed for this purpose, as for example in the following permutation of the graph g2 above:

            R> g3 <- rmperm(g2)
            R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))
            [1] TRUE

            Row/column permutation preserves the "unlabeled" structure of the input graph (i.e., it draws from the graph's isomorphism class) and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
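Row/column permutation is easy to express in base R. Applying the same random permutation p to rows and columns, g[p, p], relabels the vertices without changing the unlabeled structure. The sketch below (hypothetical name permute_vertices) illustrates the operation described for rmperm and checks that the degree multisets survive.

```r
# Minimal sketch of a random vertex relabeling: the same permutation is applied
# to rows and columns, so the result is isomorphic to the input.
permute_vertices <- function(g) {
  p <- sample.int(nrow(g))   # uniform random permutation of vertex labels
  g[p, p]
}

set.seed(1913)
g <- matrix(rbinom(100, 1, 0.3), 10, 10); diag(g) <- 0
g3 <- permute_vertices(g)

# Degree multisets (and hence the edge count) are invariant under relabeling
all(sort(colSums(g)) == sort(colSums(g3)))
all(sort(rowSums(g)) == sort(rowSums(g3)))
```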

            2.2. Visualization and data manipulation

            Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic: for instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward, i.e.:

            R> g <- rgraph(5)
            R> eval.edgeperturbation(g, 1, 2, "centralization", "betweenness")
            [1] 0.07291667

            Unfortunately, the drawback to the flexibility of this routine is its inefficiency: eval.edgeperturbation cannot take advantage of any special properties of the change-score being calculated, and hence is inefficient for properties such as triad counts, whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple exploratory applications, and does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
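The quantity eval.edgeperturbation reports can be sketched generically. Evaluate the statistic once with the edge forced on and once with it forced off, then subtract. The helper below is a hypothetical base-R illustration, not sna's implementation. Its two full evaluations of the statistic per call are exactly the inefficiency noted above.

```r
# Minimal sketch of an edge-perturbation (change) score:
# stat(graph with edge i -> j present) - stat(graph with it absent).
# `stat` may be any function of an adjacency matrix.
edge_change_score <- function(g, i, j, stat) {
  g_on  <- g; g_on[i, j]  <- 1
  g_off <- g; g_off[i, j] <- 0
  stat(g_on) - stat(g_off)
}

# Example statistic: number of mutual (reciprocated) dyads
n_mutual <- function(g) sum(g & t(g)) / 2

g <- matrix(0, 5, 5)
g[2, 1] <- 1                              # a lone 2 -> 1 tie
edge_change_score(g, 1, 2, n_mutual)      # 1: adding 1 -> 2 reciprocates it
edge_change_score(g, 1, 3, n_mutual)      # 0: nothing to reciprocate
```

A specialized change score for n_mutual would simply check g[j, i]. This is the kind of shortcut that a generic wrapper cannot exploit.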

            Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the "identities" of its vertices while leaving the underlying "unlabeled" structure unchanged.) Although not graph manipulation functions per se, these routines are of importance for generating restricted permutations for use in QAP tests (Hubert 1987) and comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
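The two permutation utilities can be sketched in base R as follows. These are illustrations, not sna's rperm and numperm, and the function names are hypothetical. restricted_perm takes a vector of class labels. nth_lex_perm indexes the permutations of 1:k from 1 to k! using the factorial number system; sna's own indexing convention may differ.

```r
# Random permutation that only exchanges vertices within the same class.
restricted_perm <- function(classes) {
  p <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    p[idx] <- idx[sample.int(length(idx))]   # shuffle positions within this class
  }
  p
}

# The n-th permutation of 1:k in lexical order, n = 1, ..., k!.
nth_lex_perm <- function(n, k) {
  stopifnot(n >= 1, n <= factorial(k))
  pool <- seq_len(k)
  r <- n - 1                      # zero-based rank
  out <- integer(0)
  for (pos in seq_len(k)) {
    f <- factorial(k - pos)       # permutations sharing each choice at this position
    idx <- r %/% f + 1            # which remaining element comes next
    out <- c(out, pool[idx])
    pool <- pool[-idx]
    r <- r %% f
  }
  out
}

nth_lex_perm(1, 3)   # 1 2 3
nth_lex_perm(2, 3)   # 1 3 2
nth_lex_perm(6, 3)   # 3 2 1
```

Walking n from 1 to k! enumerates every permutation exactly once. Drawing distinct values of n gives permutations sampled without replacement.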

            In addition to the above, two families of graph manipulation functions bear discussing in more detail. These are functions to compute properties of neighborhoods, and functions for graph visualization. Here we briefly discuss each family in turn, before proceeding to a review of sna's descriptive index routines.

            Neighborhood and ego net functions

            The egocentric network (or "ego net") of vertex v in graph G is defined as G[{v} ∪ N(v)] (i.e., the subgraph of G induced by v and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.
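Below is a minimal base-R sketch of ego net extraction for a 0/1 adjacency matrix. The function ego_net is hypothetical, not sna's ego.extract. Like ego.extract, it places ego in the first row and column, which makes it easy to drop ego when computing density among alters.

```r
# Minimal sketch of ego-net extraction: the subgraph induced by v and its
# in-, out-, or combined neighborhood, with ego in row/column 1.
ego_net <- function(g, v, type = c("combined", "out", "in")) {
  type <- match.arg(type)
  nbrs <- switch(type,
    out      = which(g[v, ] != 0),                 # v -> alter
    "in"     = which(g[, v] != 0),                 # alter -> v
    combined = which(g[v, ] != 0 | g[, v] != 0))   # either direction
  keep <- c(v, setdiff(nbrs, v))                   # ego first, then alters
  g[keep, keep, drop = FALSE]
}

# Density among alters only (ego removed), for a directed ego net
alter_density <- function(x) {
  a <- x[-1, -1, drop = FALSE]
  n <- nrow(a)
  if (n < 2) NA else sum(a) / (n * (n - 1))
}

g <- matrix(0, 5, 5)
g[1, 2] <- g[3, 1] <- g[2, 3] <- g[4, 5] <- 1
e1 <- ego_net(g, 1)    # ego 1 with alters 2 (out-neighbor) and 3 (in-neighbor)
alter_density(e1)      # one 2 -> 3 tie out of 2 possible: 0.5
```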

            While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.
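The core of the gapply idea, restricted to first-order neighborhoods, can be sketched in base R. The function neighbor_apply is hypothetical. Following the gapply calls used in this section, margin 1 selects out-neighbors (rows), margin 2 selects in-neighbors (columns), and c(1, 2) selects both.

```r
# Minimal sketch: apply FUN to the covariate values x of each vertex's
# first-order neighbors in a digraph (not sna's gapply).
neighbor_apply <- function(g, margin, x, FUN) {
  n <- nrow(g)
  sapply(seq_len(n), function(v) {
    nb <- rep(FALSE, n)
    if (1 %in% margin) nb <- nb | g[v, ] != 0   # out-neighbors
    if (2 %in% margin) nb <- nb | g[, v] != 0   # in-neighbors
    nb[v] <- FALSE                              # ego is not its own neighbor
    FUN(x[nb])
  })
}

set.seed(1913)
g <- matrix(rbinom(36, 1, 0.4), 6, 6); diag(g) <- 0
neighbor_apply(g, 1, rep(1, 6), sum)     # out-degree, same as rowSums(g)
neighbor_apply(g, 2, rep(1, 6), sum)     # in-degree, same as colSums(g)
neighbor_apply(g, c(1, 2), 1:6, mean)    # mean label of all neighbors (NaN if isolated)
```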

            In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose (i, j) cell is an indicator for the membership of vertex j in vertex i's selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix, and for orders k > 0 it depends on the type of adjacency involved. For input graph G = (V, E), let the base relation R be given by the underlying graph of G (i.e., G ∪ G^T) if total neighborhoods are sought, the transpose of G if incoming neighborhoods are sought, or G otherwise. The partial neighborhood structure of order k > 0 on R is then defined to be the digraph on V whose edge set consists of the ordered pairs (i, j) having geodesic distance k in R. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to k in R. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
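The definitions above translate directly into code. This base-R sketch uses the hypothetical names geodesic_dist and neighborhood_sketch; it is not sna's neighborhood. It finds geodesic distances on the base relation R by breadth-first expansion with Boolean matrix products, then thresholds them. Because it follows the cumulative definition literally, distance-0 pairs (the diagonal) are included in every cumulative neighborhood; sna's handling of the diagonal may differ.

```r
# Geodesic distances via breadth-first expansion: a pair first reached by a
# walk of length k is at distance k; unreachable pairs stay at Inf.
geodesic_dist <- function(R) {
  n <- nrow(R)
  A <- (R != 0) * 1
  d <- matrix(Inf, n, n); diag(d) <- 0
  reach <- diag(n)                           # walks of length 0
  for (k in seq_len(n - 1)) {
    reach <- ((reach %*% A) > 0) * 1         # pairs joined by a walk of length k
    d[reach == 1 & is.infinite(d)] <- k      # first arrival = geodesic distance
  }
  d
}

# Partial (distance == order) or cumulative (distance <= order) neighborhoods
neighborhood_sketch <- function(g, order, type = c("out", "in", "total"),
                                partial = TRUE) {
  type <- match.arg(type)
  R <- switch(type,
    out     = g,                              # G itself
    "in"    = t(g),                           # transpose of G
    total   = ((g + t(g)) > 0) * 1)           # underlying graph G u G^T
  d <- geodesic_dist(R)
  if (partial) (d == order) * 1 else (d <= order) * 1
}

g <- matrix(0, 4, 4)
g[cbind(1:3, 2:4)] <- 1                 # directed path 1 -> 2 -> 3 -> 4
neighborhood_sketch(g, 2)               # ones at (1, 3) and (2, 4)
neighborhood_sketch(g, 0)               # the identity matrix, as defined above
```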

            To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below:

            R> g <- rgraph(10, tp = 1.5/9)
            R> g.in <- ego.extract(g, neighborhood = "in")
            R> g.out <- ego.extract(g, neighborhood = "out")
            R> g.comb <- ego.extract(g, neighborhood = "combined")
            R> g.comb[1:3]
            $`1`
                 [,1] [,2] [,3] [,4]
            [1,]    0    1    1    0
            [2,]    1    0    0    0
            [3,]    0    0    0    0
            [4,]    1    0    0    0

            $`2`
                 [,1] [,2] [,3] [,4]
            [1,]    0    1    0    0
            [2,]    1    0    0    0
            [3,]    1    0    0    0
            [4,]    1    0    1    0

            $`3`
                 [,1] [,2] [,3] [,4]
            [1,]    0    1    1    0
            [2,]    0    0    0    0
            [3,]    0    0    0    0
            [4,]    1    1    0    0

            R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
            [1] TRUE
            R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
            [1] TRUE
            R> all(sapply(g.comb, NROW) <= degree(g) + 1)
            [1] TRUE
            R> ego.size <- sapply(g.comb, NROW)
            R> if(any(ego.size > 2))
            +   sapply(g.comb[ego.size > 2], function(x) gden(x[-1, -1]))
                     1          2          3          4          5          6          7
            0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
                     8          9         10
            0.00000000 0.08333333 0.00000000

            Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.

            Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well:

            R> g <- rgraph(6)
            R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
            [1] TRUE
            R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
            [1] TRUE
            R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
            +   cmode = "freeman")/2)
            [1] TRUE
            R> gapply(g, c(1, 2), 1:6, mean)
            [1] 4.00 3.00 3.00 5.50 3.25 3.25
            R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
            [1] 4.0 3.8 3.6 3.4 3.2 3.0

            To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

            R> g <- rgraph(10, tp = 2/9)
            R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
            R> par(mfrow = c(3, 3))
            R> for(i in 1:9)
            +   gplot(neigh[i,,], main = paste("Partial Neighborhood of Order", i))
            R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
            +   partial = FALSE)
            R> par(mfrow = c(3, 3))
            R> for(i in 1:9)
            +   gplot(neigh[i,,], main = paste("Cumulative Neighborhood of Order", i))

            Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

            Visualization

            Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

            Figure 1: Sample partial neighborhoods of increasing order; vertex v is adjacent to vertex v′ in the ith panel iff v′ belongs to the ith order partial neighborhood of v. [Panels: Partial Neighborhood of Order 1 through Order 9.]
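As an illustration of the MDS-based layout family mentioned above, the following base-R sketch applies classical multidimensional scaling (stats::cmdscale) to geodesic distances on the underlying undirected graph. It is not sna's layout code. The function name mds_layout and the choice to replace infinite distances by n are assumptions.

```r
# Minimal sketch of an MDS-based layout: classical MDS on geodesic distances
# of the underlying undirected graph (not sna's gplot.layout code).
mds_layout <- function(g) {
  n <- nrow(g)
  A <- ((g + t(g)) > 0) * 1              # underlying undirected graph
  d <- matrix(Inf, n, n); diag(d) <- 0
  reach <- diag(n)
  for (k in seq_len(n - 1)) {            # breadth-first geodesic distances
    reach <- ((reach %*% A) > 0) * 1
    d[reach == 1 & is.infinite(d)] <- k
  }
  d[is.infinite(d)] <- n                 # stand-in distance for unreachable pairs (assumption)
  cmdscale(d, k = 2)                     # n x 2 matrix of vertex coordinates
}

g <- matrix(0, 6, 6)
g[cbind(1:6, c(2:6, 1))] <- 1            # a directed 6-cycle
xy <- mds_layout(g)

if (interactive()) {                     # draw only in an interactive session
  plot(xy, pch = 19, asp = 1, xlab = "", ylab = "")
  e <- which(g == 1, arr.ind = TRUE)
  arrows(xy[e[, 1], 1], xy[e[, 1], 2], xy[e[, 2], 1], xy[e[, 2], 2], length = 0.1)
}
```

For the 6-cycle, the two leading eigenvalues are equal and positive, so the vertices land on a regular hexagon.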

            While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Figure 2: Sample cumulative neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th-order cumulative neighborhood of $v$. (Panels: Cumulative Neighborhood of Order 1 through Order 9.)

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurve = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+    vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+    displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Figure 3: Sample visualizations using gplot with multiple layout and display options. (Panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options.)

Output from the above is shown in Figure 3.

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+    xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10)/10 * 2 * pi), sin((1:10)/10 * 2 * pi),
+    col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+    xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+    offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
+    arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25),
+    c(1.25, 1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

2.3 Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Figure 5: Examples of the use of gplot supplemental functions. (Panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example.)

            Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph-theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d^+}(v,G) \equiv |N^+(v)|$), indegree ($c_{d^-}(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{d^t}(v,G) \equiv c_{d^+}(v,G) + c_{d^-}(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{(v',v'') \subseteq V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and the ratio $g'(v',v,v'',G)/g(v',v'',G)$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

where $n = |V|$ and $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ that lacks a path to at least one other vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
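To make these definitions concrete, the following base R sketch (hypothetical helper names, not sna's implementation) computes degree, closeness, betweenness, and stress for a small loopless digraph. It relies on one observation: a walk from $s$ to $t$ whose length equals $d(s,t)$ cannot revisit a vertex, so it is a geodesic, and the $(s,t)$ entry of $\mathbf{A}^{d(s,t)}$ therefore counts the $(s,t)$ geodesics. Betweenness is summed over ordered pairs, the natural reading of the formula for digraphs. The matrix-power approach is only intended for small graphs.

```r
# Hypothetical base R sketches of Freeman's indices (not sna's own code).
# A is a loopless 0/1 adjacency matrix with A[i, j] = 1 meaning an edge i -> j.

geodesics <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n); diag(D) <- 0   # geodesic distances d(s, t)
  G <- diag(n)                           # geodesic counts g(s, t)
  P <- diag(n)
  for (k in seq_len(n - 1)) {
    P <- P %*% A                         # P = A^k counts walks of length k
    hit <- P > 0 & is.infinite(D)        # pairs first reached at length k
    D[hit] <- k
    G[hit] <- P[hit]                     # walks of length d(s, t) are geodesics
  }
  list(dist = D, count = G)
}

degree_sketch <- function(A, cmode = c("freeman", "indegree", "outdegree")) {
  cmode <- match.arg(cmode)
  switch(cmode,
         indegree  = colSums(A),
         outdegree = rowSums(A),
         freeman   = colSums(A) + rowSums(A))
}

closeness_sketch <- function(A) {
  D <- geodesics(A)$dist
  (nrow(A) - 1) / rowSums(D)             # an Inf in a row (unreachable vertex) gives 0
}

betweenness_sketch <- function(A, stress = FALSE) {
  geo <- geodesics(A)
  D <- geo$dist; G <- geo$count
  n <- nrow(A)
  score <- numeric(n)
  for (v in seq_len(n)) for (from in seq_len(n)) for (to in seq_len(n)) {
    if (from == to || from == v || to == v || is.infinite(D[from, to])) next
    if (D[from, v] + D[v, to] == D[from, to]) {   # v lies on some geodesic
      through <- G[from, v] * G[v, to]            # geodesics passing through v
      score[v] <- score[v] + (if (stress) through else through / G[from, to])
    }
  }
  score
}

# Usage: a "diamond" 1 -> {2, 3} -> 4 with two geodesics from 1 to 4.
diamond <- matrix(0, 4, 4)
diamond[cbind(c(1, 1, 2, 3), c(2, 3, 4, 4))] <- 1
betweenness_sketch(diamond)                 # each middle vertex gets 1/2
betweenness_sketch(diamond, stress = TRUE)  # stress drops the denominator
```

For an undirected graph stored as a symmetric matrix, this ordered-pair sum counts each unordered pair twice, so halve it if the unordered convention is wanted.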

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $\mathbf{A}$ (where $\mathbf{A}$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), as "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or as the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (\mathbf{I} - \beta\mathbf{A})^{-1}\mathbf{A}\mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $\mathbf{A}$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
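The limiting behavior described above can be checked numerically. The base R sketch below uses hypothetical helpers, not sna's bonpow or evcent. It solves $(\mathbf{I} - \beta\mathbf{A})x = \mathbf{A}\mathbf{1}$ and chooses $\alpha$ so that the squared scores sum to $|V|$. That reading of the scaling convention is an assumption, though it is consistent with the bonpow(dat, exponent = 0) output later in this section, where 0.2192645 equals sqrt(10/208) and 208 is the sum of the squared outdegrees. Eigenvector centrality is taken as the absolute principal eigenvector. A small undirected graph is used so that eigen() returns real eigenvalues in decreasing order.

```r
# Hypothetical sketch of Bonacich power and eigenvector centrality (base R only).
bonacich_sketch <- function(A, beta) {
  n <- nrow(A)
  # Solve (I - beta A) x = A 1 rather than forming the inverse explicitly.
  raw <- as.vector(solve(diag(n) - beta * A, A %*% rep(1, n)))
  raw * sqrt(n / sum(raw^2))             # assumed scaling: sum of squared scores = |V|
}

evcent_sketch <- function(A) {
  abs(eigen(A, symmetric = TRUE)$vectors[, 1])   # principal eigenvector, sign removed
}

# Usage: a triangle {1, 2, 3} with a pendant vertex 4 attached to vertex 1.
A <- matrix(0, 4, 4)
A[cbind(c(1, 1, 2, 1), c(2, 3, 3, 4))] <- 1
A <- A + t(A)                                    # undirected (symmetric) adjacency
lambda1 <- eigen(A, symmetric = TRUE)$values[1]  # principal eigenvalue

b_zero <- bonacich_sketch(A, 0)                  # proportional to degree
b_near <- bonacich_sketch(A, 0.9999 / lambda1)   # close to eigenvector centrality
b_neg  <- bonacich_sketch(A, -0.25)              # negative decay: weak alters help
```

After both are normalized to unit sum, b_near agrees with evcent_sketch(A) to within 1e-3. The sketch stops just short of beta = 1/lambda1 because the linear system is singular at that value in exact arithmetic.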

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v',v'')$ such that $(v',v), (v,v'') \in E$ and $(v',v'') \notin E$, that is, the number of pairs for which $v$ serves as a local bridge. Now let us posit a vector of states $s$, indexed by $V$, such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles), based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
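The role definitions above translate directly into a brute-force count over ordered triads. This base R sketch uses hypothetical function names and reproduces only the raw role counts, not sna's expectations or z-tests. It assumes a loopless 0/1 adjacency matrix and requires the three vertices of a triad to be distinct. The column labels mirror the w_I, w_O, b_IO, b_OI, b_O abbreviations that appear in the brokerage summary later in this section.

```r
# Hypothetical sketch of Gould-Fernandez raw brokerage counts (base R only).
classify_role <- function(si, sj, sk) {
  if (si == sj && sj == sk) return("w_I")   # coordinating: all three share a state
  if (si == sk) return("w_O")               # itinerant: s_i = s_k, broker differs
  if (sj == sk) return("b_IO")              # gatekeeping: s_j = s_k, s_i differs
  if (si == sj) return("b_OI")              # representative: s_i = s_j, s_k differs
  "b_O"                                     # liaison: all three states differ
}

brokerage_counts <- function(A, s) {
  n <- nrow(A)
  roles <- c("w_I", "w_O", "b_IO", "b_OI", "b_O")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (i == j || j == k || i == k) next
    # j brokers the ordered pair (i, k) when i -> j -> k but not i -> k
    if (A[i, j] == 1 && A[j, k] == 1 && A[i, k] == 0) {
      role <- classify_role(s[i], s[j], s[k])
      out[j, role] <- out[j, role] + 1
    }
  }
  cbind(out, t = rowSums(out))              # t = total brokerage
}

# Usage: a directed two-path 1 -> 2 -> 3 under a three-group assignment.
path <- matrix(0, 3, 3); path[1, 2] <- 1; path[2, 3] <- 1
brokerage_counts(path, c(1, 2, 3))          # vertex 2 acts as a liaison once
```

Because every locally bridged triad falls into exactly one role, the five role columns always sum to the total t.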

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
[1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
[1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
[1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
[1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
[8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
[1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
[7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
[1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
[1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
[8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
[1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
[8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
[1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
[9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
[1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
[8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1/eigen(dat)$values[1], rescale = TRUE) -
+    evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
[1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
[7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

            Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i,j,k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v,G) \right) - c(v_i,G) \right], \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|}\sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which uses them to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
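Equations (1) and (2) and the normalization step can be sketched in a few lines of base R. The helper names are hypothetical and this is not sna's centralization function. For the normalized version we use outdegree in a loopless digraph of order $n$, where the out-star attains the largest possible total deviation, $(n-1)^2$: the maximal vertex contributes nothing, and each of the other $n-1$ vertices can fall at most $n-1$ below it.

```r
# Hypothetical sketch of Freeman centralization (base R only).
centralization_raw <- function(scores) {
  sum(max(scores) - scores)                      # equation (1): total deviation from the max
}

centralization_eq2 <- function(scores) {
  length(scores) * (max(scores) - mean(scores))  # equation (2): the same quantity
}

outdegree_centralization <- function(A) {
  n <- nrow(A)
  # Normalize by the theoretical maximum, attained by the out-star of order n.
  centralization_raw(rowSums(A)) / (n - 1)^2
}

# Usage: an out-star (vertex 1 sends ties to all others) versus a directed 5-cycle.
star <- matrix(0, 5, 5); star[1, 2:5] <- 1
ring5 <- matrix(0, 5, 5); ring5[cbind(1:5, c(2:5, 1))] <- 1
outdegree_centralization(star)                   # maximal concentration: 1
outdegree_centralization(ring5)                  # all vertices equal: 0
```

Because equations (1) and (2) are algebraically identical, centralization_raw and centralization_eq2 agree on any score vector up to floating-point rounding.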

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree; hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices; hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
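To see the definitions behind gden, grecip, and gtrans in isolation, the following base R sketch recomputes density, dyadic and edgewise reciprocity, and the weak transitivity ratio for a single digraph. It is an illustration under stated assumptions (hypothetical helper name, no loops, no missing data), not a reimplementation of sna's options. In particular, it returns NaN when no two-paths are at risk, and we make no claim that this matches sna's handling of that case.

```r
# Hypothetical sketch of basic graph-level indices for one loopless 0/1 digraph.
gli_sketch <- function(A) {
  n <- nrow(A)
  off <- row(A) != col(A)                   # off-diagonal cells (loops excluded)
  up <- upper.tri(A)                        # each unordered dyad once
  # twopath[i, k] = number of j with i -> j -> k, kept only for i != k
  twopath <- (A %*% A) * off
  c(density    = sum(A[off]) / (n * (n - 1)),
    dyadic     = mean(A[up] == t(A)[up]),         # mutual or null dyads
    edgewise   = sum(A * t(A)) / sum(A),          # fraction of edges reciprocated
    weak_trans = sum(twopath * A) / sum(twopath)) # two-paths closed by i -> k
}

# Usage: 1 <-> 2 and 2 -> 3, then add 1 -> 3.
A1 <- matrix(0, 3, 3)
A1[1, 2] <- 1; A1[2, 1] <- 1; A1[2, 3] <- 1
A2 <- A1; A2[1, 3] <- 1
gli_sketch(A1)
gli_sketch(A2)
```

For A1 the sketch gives density 1/2, dyadic and edgewise reciprocity 2/3, and weak transitivity 0. Adding the edge 1 -> 3 closes both at-risk two-paths (1 -> 2 -> 3 and 2 -> 1 -> 3), so weak transitivity becomes 1.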

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if(tmaxdev)
+      return((n - 1) * choose(n - 1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+  }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V,E)$ be a (possibly directed) graph of order $N$, and let $d(i,j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\left(d(j,k) \le i\right)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
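A direct transcription of the structure statistics might look like the following base R sketch. The helper names are hypothetical and this is not sna's own routine. Distances come from the Floyd-Warshall algorithm. Unreachable pairs stay at Inf, so they never satisfy $d(j,k) \le i$, and $s_0 = 1/N$ because only the $N$ self-pairs lie at distance 0.

```r
# Hypothetical sketch of Fararo-Sunshine structure statistics (base R only).
floyd_warshall <- function(A) {
  n <- nrow(A)
  D <- ifelse(A > 0, 1, Inf)          # one step along each edge, Inf otherwise
  diag(D) <- 0
  for (m in seq_len(n)) {
    # allow paths routed through intermediate vertex m
    D <- pmin(D, outer(D[, m], D[m, ], "+"))
  }
  D
}

structure_statistics_sketch <- function(A) {
  N <- nrow(A)
  D <- floyd_warshall(A)
  # s_i = N^(-2) * #{(j, k) : d(j, k) <= i}, for i = 0, ..., N - 1
  sapply(0:(N - 1), function(i) sum(D <= i) / N^2)
}

# Usage: a directed path 1 -> 2 -> 3 and its undirected counterpart.
path <- matrix(0, 3, 3); path[1, 2] <- 1; path[2, 3] <- 1
structure_statistics_sketch(path)            # 3/9, 5/9, 6/9
structure_statistics_sketch(path + t(path))  # 3/9, 7/9, 9/9
```

Since no finite geodesic is longer than $N-1$, $s_{N-1}$ is the fraction of ordered pairs (self-pairs included) joined by some path, so $s_{N-1} < 1$ signals that $G$ is not strongly connected.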

            At least since Davis and Leinhardt (1972) social network analysts have recognized the im-portance of subgraph frequencies as an indicator of underlying structural tendencies Thistheory has been considerably enriched in recent decades (see eg Frank and Strauss 1986Pattison and Robins 2002) particularly with respect to the connection between edgewisedependence conditions and structural biases (see Wasserman and Robins (2005) for an ap-proachable introduction) It has also been recognized that constraints on properties of small

            28 Social Network Analysis with sna

subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of G, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, the dyad census, and the degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}\,s(H) = s(G), \mathbf{E}\,s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., the difference in triangle counts) to those expected under one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
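As a concrete illustration of what the dyad census counts, the following base R sketch classifies every unordered vertex pair of a single directed graph. It is not sna's dyad.census (which also accepts stacks of graphs and other input types); the helper name dyad_census_sketch is hypothetical, and the sketch assumes a 0/1 adjacency matrix with no meaningful diagonal.

```r
# Hypothetical helper, not sna's dyad.census(): classify each unordered
# pair {i, j} of a directed 0/1 adjacency matrix as mutual, asymmetric or null.
dyad_census_sketch <- function(A) {
  A <- (A != 0) * 1                 # binarize; the diagonal is never inspected
  up <- upper.tri(A)                # selects each pair i < j exactly once
  fwd <- A[up]                      # tie i -> j
  bwd <- t(A)[up]                   # tie j -> i, in the same pair order
  c(Mut  = sum(fwd == 1 & bwd == 1),
    Asym = sum(fwd != bwd),
    Null = sum(fwd == 0 & bwd == 0))
}

# Example: four vertices, one mutual dyad and two asymmetric ones
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- 1
A[2, 3] <- 1
A[4, 1] <- 1
dc <- dyad_census_sketch(A)
print(dc)                           # Mut 1, Asym 2, Null 3
stopifnot(sum(dc) == choose(4, 2))  # every pair falls in exactly one class
```

Because the three classes partition the pairs, the counts always sum to n(n - 1)/2, which is a convenient sanity check on any dyad census.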

            Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count

   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1, , ], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count

  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1, , ])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities (that of the first graph minus that of the second) as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2, , ]
R> g4[2, , ] <- g2[1, , ]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

            CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:  0
        Mean: -0.001288889
        3rdQ: 0.06666667
        Max:  0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ: 0.1555556
        Med:  0.2222222
        Mean: 0.2215333
        3rdQ: 0.2888889
        Max:  0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.
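The logic of a one-sample CUG test can be made explicit with a small base R sketch. This is not sna's cugtest; the helper names are hypothetical, and the null model assumed here is the uniform distribution over loop-free digraphs with the same order and the same number of edges as the observed graph, with the number of mutual dyads as the test statistic.

```r
# Hypothetical sketch of a one-sample CUG test (not sna's cugtest): null
# graphs H are drawn uniformly from all loop-free digraphs with the same
# order and the same number of edges as the observed graph G.
mutual_count <- function(A) sum(A * t(A)) / 2      # number of mutual dyads

runif_fixed_edges <- function(n, m) {
  A <- matrix(0, n, n)
  cells <- which(row(A) != col(A))                 # off-diagonal cells only
  A[cells[sample.int(length(cells), m)]] <- 1      # m edges placed uniformly
  A
}

cug_sketch <- function(G, stat, reps = 1000) {
  obs <- stat(G)
  null <- replicate(reps, stat(runif_fixed_edges(nrow(G), sum(G))))
  c(obs = obs,
    p_upper = mean(null >= obs),                   # Pr(t(H) >= t(G))
    p_lower = mean(null <= obs))                   # Pr(t(H) <= t(G))
}

# Example: ten vertices whose ten edges form five mutual dyads
set.seed(1)
G <- matrix(0, 10, 10)
for (k in 1:5) G[k, k + 5] <- G[k + 5, k] <- 1
res <- cug_sketch(G, mutual_count, reps = 500)
res
```

Since ten edges can form at most five mutual dyads, the lower-tail p-value here is exactly 1, while the upper-tail p-value is essentially 0: the observed graph is far more reciprocal than a uniform graph of the same size and density.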

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities or differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
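The Euclidean variant of this idea, followed by a hierarchical clustering of the resulting distances, can be sketched in base R as follows. The sketch is not sna's sedist or equiv.clust; the helper name se_euclid is hypothetical, and it assumes one graph given as a 0/1 adjacency matrix, excluding entries that involve the pair itself so that, per the definition above, only ties to third parties are compared.

```r
# Hypothetical sketch, not sna's sedist(): Euclidean distance between the
# rows and columns of vertices i and j, skipping entries that involve i or j
# themselves, so that only ties to third parties are compared.
se_euclid <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    k <- setdiff(1:n, c(i, j))
    D[i, j] <- sqrt(sum((A[i, k] - A[j, k])^2) + sum((A[k, i] - A[k, j])^2))
  }
  D
}

# Example: an undirected star; the three leaves are structurally equivalent
A <- matrix(0, 4, 4)
A[1, 2:4] <- 1
A[2:4, 1] <- 1
D <- se_euclid(A)
hc <- hclust(as.dist(D))   # cluster analysis of the distances
cutree(hc, k = 2)          # center in one class, leaves in the other
```

Exactly equivalent vertices sit at distance zero, so any reasonable clustering merges them first; approximate equivalence shows up as small but non-zero distances.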


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (the mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
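For the density block type, the reduction step amounts to averaging the adjacency matrix within each pair of classes. The base R sketch below is not sna's blockmodel; the helper density_image is hypothetical, and it assumes a single graph and a membership vector whose classes are numbered 1, ..., k, excluding diagonal cells as the text describes.

```r
# Hypothetical sketch of a density blockmodel image (not sna's blockmodel()):
# block (r, s) is the mean of A[i, j] over i in class r and j in class s,
# excluding diagonal cells. memb gives each vertex's class as 1, 2, ..., k.
density_image <- function(A, memb) {
  k <- max(memb)
  B <- matrix(NA_real_, k, k)
  offdiag <- row(A) != col(A)
  for (r in 1:k) for (s in 1:k) {
    cells <- (outer(memb == r, memb == s) == 1) & offdiag
    if (any(cells)) B[r, s] <- mean(A[cells])
  }
  B
}

# Example: class 1 = {1, 2} is internally complete and sends all ties to
# class 2 = {3, 4}; class 2 sends nothing.
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- 1
A[1:2, 3:4] <- 1
B <- density_image(A, c(1, 1, 2, 2))
B   # rows: sending class; columns: receiving class
```

Because these classes are exactly structurally equivalent, every block in the image is a pure 0 or 1, as noted in the text; approximate classes would give intermediate densities.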

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
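The expansion step can be sketched in base R by building the Bernoulli parameter matrix from a density image and a vector of class sizes. This is a hypothetical stand-in for blockmodel.expand, not its implementation; it assumes a digraph without loops, and the new class sizes (ev) need not match the original ones.

```r
# Hypothetical sketch of density-based expansion (not sna's blockmodel.expand()):
# ev gives the number of vertices to create in each class; every off-diagonal
# cell becomes an independent Bernoulli draw with its block's density.
expand_density <- function(B, ev) {
  memb <- rep(seq_along(ev), ev)     # class of each new vertex
  P <- B[memb, memb]                 # inhomogeneous Bernoulli parameter matrix
  diag(P) <- 0                       # no loops
  n <- length(memb)
  # rbinom reads P in column-major order, which matrix() then reproduces
  matrix(rbinom(n * n, 1, P), n, n)
}

# Example: a 2 x 2 density image expanded to 3 + 2 vertices
B <- matrix(c(1, 0, 1, 0), 2, 2)
ge <- expand_density(B, c(3, 2))
ge
```

With densities strictly between 0 and 1, each call returns a different graph, so calling the function repeatedly yields the Monte Carlo sample described above.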

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
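The boolean inner product mentioned above takes one line of base R. The helper compose_graphs below is a hypothetical stand-in for the composition operator described in the text, assuming 0/1 adjacency matrices on a common vertex set; the second example shows how loops arise.

```r
# Base R stand-in (hypothetical name) for the boolean composition described
# above: (v, v') is an edge of the result iff v -> v'' in A and v'' -> v' in B
# for at least one intermediate vertex v''.
compose_graphs <- function(A, B) (A %*% B > 0) * 1

# A directed path 1 -> 2 -> 3 composed with itself yields the single edge 1 -> 3
P <- matrix(0, 3, 3)
P[1, 2] <- 1
P[2, 3] <- 1
compose_graphs(P, P)

# A mutual dyad composed with itself yields loops on both vertices
M <- matrix(c(0, 1, 1, 0), 2, 2)
compose_graphs(M, M)
```

The mutual-dyad case is the simplest instance of the warning in the text: neither input has a loop, but the composition has one on every vertex.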

            Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V|-1)} \tag{3}$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, $\mu_X = \left(|V|\,(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean, and the sums run over ordered pairs of distinct vertices. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
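Equation (3) and the quantities derived from it translate directly into base R. The helpers below are hypothetical and cover only the directed, unvalued case; they are not sna's gcov, gcor, or hdist, which offer more options.

```r
# Hypothetical base R versions of Equation (3) and the quantities built on it
# (sna's gcov, gcor and hdist offer more options, e.g., for undirected data).
# Only the |V|(|V| - 1) off-diagonal cells of each adjacency matrix are used.
offdiag <- function(A) A[row(A) != col(A)]

graph_cov <- function(A, B) {
  a <- offdiag(A); b <- offdiag(B)
  sum((a - mean(a)) * (b - mean(b))) / length(a)
}
graph_cor <- function(A, B) {
  graph_cov(A, B) / sqrt(graph_cov(A, A) * graph_cov(B, B))
}
hamming <- function(A, B) sum(offdiag(A) != offdiag(B))

# Example: two random digraphs on 10 vertices
set.seed(42)
A <- matrix(rbinom(100, 1, 0.4), 10, 10)
B <- matrix(rbinom(100, 1, 0.4), 10, 10)
c(cov = graph_cov(A, B), cor = graph_cor(A, B), hamming = hamming(A, B))
```

Because the normalizing factor $1/(|V|(|V|-1))$ cancels in the ratio, the graph correlation coincides with R's ordinary cor() applied to the two vectors of off-diagonal cells.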

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances and covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance or covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
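For very small graphs, the optimal matching can be found by brute force, which makes the definition of a structural correlation concrete. The sketch below is hypothetical (it is neither gscor nor lab.optimize): it maximizes the graph correlation over all $n!$ relabelings of the second graph, which is exactly why heuristic search is needed for graphs of realistic size.

```r
# Hypothetical brute-force sketch of a structural (unlabeled) correlation:
# the graph correlation is maximized over every relabeling of the second
# graph. With n! relabelings this is feasible only for very small graphs.
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  out
}
offdiag_cor <- function(A, B) {
  off <- row(A) != col(A)
  cor(A[off], B[off])
}
struct_cor <- function(A, B) {
  max(sapply(perms(seq_len(nrow(B))), function(p) offdiag_cor(A, B[p, p])))
}

# Example: B is A with its vertices relabeled, so the maximum is 1
A <- matrix(0, 5, 5)
A[1, 2] <- A[2, 3] <- A[3, 1] <- A[4, 5] <- 1
q <- c(3, 1, 5, 2, 4)
B <- A[q, q]
c(labeled = offdiag_cor(A, B), structural = struct_cor(A, B))
```

The structural value can never fall below the labeled one, since the identity relabeling is among those searched.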

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by applying eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
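For binary graphs, the central graph has a simple closed form: each cell contributes to the summed Hamming distance independently, so the cell-wise majority minimizes that sum. The base R sketch below illustrates this reasoning under the assumption of unvalued graphs stored as an m x n x n array; it is a hypothetical illustration, not sna's centralgraph.

```r
# Hypothetical sketch of a central graph for binary digraphs (not sna's
# centralgraph): each cell contributes to the summed Hamming distance
# independently, so the cell-wise majority minimizes that sum
# (exact ties can go either way; here they are resolved to 0).
central_graph <- function(graphs) {            # graphs: m x n x n array
  m <- dim(graphs)[1]
  (apply(graphs, c(2, 3), sum) > m / 2) * 1
}
total_hamming <- function(G, graphs) {
  sum(apply(graphs, 1, function(H) sum(H != G)))
}

# Example: three graphs on three vertices
graphs <- array(0, dim = c(3, 3, 3))
graphs[1, 1, 2] <- 1
graphs[2, 1, 2] <- 1
graphs[3, 2, 3] <- 1
C <- central_graph(graphs)
C
total_hamming(C, graphs)   # 2, versus 3 for the empty graph
```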

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available for cases in which conditioning on the entire observed structure is inappropriate.
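The QAP idea can be illustrated with a short base R permutation test for the graph correlation. The sketch is hypothetical and not sna's qaptest: it assumes two digraphs on the same vertex set and builds the null distribution by jointly permuting the rows and columns of one graph, which keeps each graph's structure intact while breaking the alignment of vertices across graphs.

```r
# Hypothetical sketch of a QAP test for the graph correlation (not sna's
# qaptest): the null distribution is obtained by relabeling the vertices of
# one graph (permuting its rows and columns jointly) and recomputing the
# statistic, which preserves each graph's structure but breaks the
# alignment between them.
offdiag_cor <- function(A, B) {
  off <- row(A) != col(A)
  cor(A[off], B[off])
}
qap_sketch <- function(A, B, reps = 1000) {
  obs <- offdiag_cor(A, B)
  null <- replicate(reps, {
    p <- sample.int(nrow(B))
    offdiag_cor(A, B[p, p])
  })
  c(obs = obs, p_upper = mean(null >= obs), p_lower = mean(null <= obs))
}

# Example: B agrees with A except in the first row
set.seed(7)
A <- matrix(rbinom(400, 1, 0.3), 20, 20)
B <- A
B[1, ] <- 1 - B[1, ]
res <- qap_sketch(A, B, reps = 500)
res
```

Unlike a CUG test, the null graphs here are not drawn from a generic random graph model; they are relabelings of the observed graph, so degree distributions and other label-invariant properties are held fixed.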


            Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

            OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

            Test Diagnostics

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

            Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

            Goodness of Fit Statistics

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

            Contingency Table (predicted (rows) x actual (cols))

      0   1
  0   0   0
  1  39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

            Test Diagnostics

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density of the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns an estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which it reports, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired), and estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of the false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
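To make the reporting process concrete, the sketch below computes the posterior probability of a single edge when each informant's false negative and false positive rates are treated as known. This is only one conditional step of the kind a Gibbs sampler would use; bbnam itself also samples the error rates. The function name and arguments are hypothetical.

```r
# Hypothetical sketch of the reporting model for one edge when the error
# rates are treated as known: informant k reports a tie with probability
# 1 - em[k] if the tie is present and ep[k] if it is absent. bbnam itself
# also samples the error rates; this is only the conditional posterior for
# a single edge given those rates.
edge_posterior <- function(r, em, ep, prior = 0.5) {
  l1 <- prior * prod(ifelse(r == 1, 1 - em, em))        # true edge present
  l0 <- (1 - prior) * prod(ifelse(r == 1, ep, 1 - ep))  # true edge absent
  l1 / (l1 + l0)
}

# Three informants: two report the tie, one does not (about 0.984)
edge_posterior(r = c(1, 1, 0), em = rep(0.3, 3), ep = rep(0.05, 3))
# No informant reports the tie (about 0.031)
edge_posterior(r = c(0, 0, 0), em = rep(0.3, 3), ep = rep(0.05, 3))
```

A positive report multiplies the odds of a tie by (1 - em)/ep, which exceeds 1 exactly when em + ep < 1; this is the condition on the summed error rates mentioned in the text, and reports violating it are negatively informative.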

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)}$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
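The conditional tie probability above can be evaluated directly. The function below is a hypothetical illustration of the formula, not part of sna; the bias probabilities and event counts in the example are arbitrary illustrative values, not fitted parameters.

```r
# Sketch of the conditional tie probability above (names are illustrative,
# not part of sna): p_e = Pr(B_e), p_k = Pr(B_k) for each bias type, and
# s_k = number of bias events of each type that may have occurred for (i, j).
biased_tie_prob <- function(p_e, p_k, s_k) {
  # The tie is absent only if the baseline event and every bias event fail
  1 - (1 - p_e) * prod((1 - p_k)^s_k)
}

# No bias events: the baseline probability is returned
biased_tie_prob(0.1, p_k = c(0.5, 0.2), s_k = c(0, 0))   # 0.1
# One event of the first type and two of the second
biased_tie_prob(0.1, p_k = c(0.5, 0.2), s_k = c(1, 2))   # 1 - 0.9 * 0.5 * 0.64 = 0.712
```

Each additional bias event can only raise the tie probability, which is how biases such as reciprocity or triad closure push the traced network away from a homogeneous Bernoulli graph.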

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations and covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
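Solving Equations (4) and (5) for their left-hand sides shows how data arise under such a model, which can be useful when checking an estimator such as lnam on simulated data. The base R sketch below is hypothetical, not part of sna: it assumes a single AR weight matrix and a single MA weight matrix, with parameter values for which both $I - \theta W$ and $I - \psi Z$ are invertible.

```r
# Hypothetical simulation sketch of Equations (4)-(5) with a single AR weight
# matrix W and a single MA weight matrix Z (lnam estimates such models; this
# only generates data). Solving each equation for its left-hand side gives
# eps = (I - psi Z)^{-1} nu and y = (I - theta W)^{-1} (X beta + eps).
sim_lnam <- function(X, beta, W, theta, Z, psi, sigma = 1) {
  n <- nrow(X)
  I <- diag(n)
  nu <- rnorm(n, 0, sigma)                       # iid disturbances
  eps <- solve(I - psi * Z, nu)                  # network MA structure
  drop(solve(I - theta * W, X %*% beta + eps))   # network AR structure
}

# Example: a row-normalized ring of five actors used for both W and Z
n <- 5
W <- matrix(0, n, n)
for (i in 1:n) {
  W[i, i %% n + 1] <- 0.5                        # next actor on the ring
  W[i, (i - 2) %% n + 1] <- 0.5                  # previous actor on the ring
}
X <- cbind(1, 1:n)
beta <- c(1, 0.5)
set.seed(1)
y <- sim_lnam(X, beta, W, theta = 0.4, Z = W, psi = 0.3)
y
```

With sigma = 0 the simulated responses satisfy Equation (4) exactly with a zero disturbance, which gives a simple check on the algebra.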


            Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

            Multiple Error Probability Model

            Marginal Posterior Network Distribution

            a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

            Journal of Statistical Software 41

            a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

            a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

            Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

            Marginal Posterior Error Distribution (by observer)

            Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

            Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

            MCMC Diagnostics

Replicate Chains 5
Burn Time 300
Draws per Chain 20 Total Draws 100
Potential Scale Reduction (G&R’s sqrt(Rhat))
  Max 1.003116
  Med 0.9992194
  IQR 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which in many cases is actually somewhat easier to estimate). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high posterior probability, or posterior graph distributions that are strongly multimodal, can indicate either excessively “perverse” priors or extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user’s modeling assumptions and the data itself.
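The diagnostic just described can be computed directly from posterior draws. The base-R sketch below is a hypothetical helper, not part of sna; it assumes, as the apply(b$em, 2, median) calls above suggest, that b$em and b$ep store one row per draw and one column per informant. Small toy matrices stand in for real draws so the block runs on its own.

```r
# Hypothetical helper, not part of sna: for each informant, the share of
# posterior draws that fall in the "perverse" region em + ep > 1.
# em, ep: numeric matrices with one row per draw and one column per informant.
perverse_prob <- function(em, ep) {
  stopifnot(identical(dim(em), dim(ep)))
  colMeans(em + ep > 1)
}

# Toy draws standing in for b$em and b$ep (3 draws, 2 informants).
em_draws <- matrix(c(0.30, 0.35, 0.40,    # informant 1: moderate false negatives
                     0.70, 0.80, 0.60),   # informant 2: very high false negatives
                   nrow = 3)
ep_draws <- matrix(c(0.05, 0.02, 0.04,
                     0.40, 0.30, 0.30),
                   nrow = 3)
pp <- perverse_prob(em_draws, ep_draws)
pp                  # 0 and 2/3: informant 2 is perverse in 2 of 3 draws
which(pp > 0.5)     # informants flagged under an arbitrary 0.5 cutoff
```

With the fitted object above, the corresponding call would be perverse_prob(b$em, b$ep); the 0.5 cutoff is purely illustrative.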

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection locally aggregated structures (LAS), the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives), while the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can remain quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
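The three classical rules compared above can be sketched in a few lines of base R. These are hypothetical helpers written for illustration, not sna’s consensus code. We assume that reports[k, i, j] is informant k’s report of the i → j edge (matching how dat was filled above), that the central graph keeps an edge when at least half of the reports contain it (sna’s treatment of exact ties may differ), and that for the LAS rules the informants are the network’s own actors in the same order. The closing lines give, under an independence assumption, the error arithmetic behind the remark that the intersection rule compounds false negatives.

```r
# Hypothetical base-R sketches, not sna's consensus code.
# reports: 0/1 array indexed [informant, sender, receiver], filled like dat above.
# For the LAS rules the informants are assumed to be the network's own actors.

central_graph_sketch <- function(reports) {
  share <- apply(reports, c(2, 3), mean)   # fraction of informants reporting each edge
  (share >= 0.5) * 1                       # assumed rule: keep if at least half report it
}

las_sketch <- function(reports, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  combine <- if (rule == "intersection") min else max
  n <- dim(reports)[2]
  out <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    by_sender   <- reports[i, i, j]        # i's report of its own tie to j
    by_receiver <- reports[j, i, j]        # j's report of the same tie
    out[i, j] <- combine(by_sender, by_receiver)
  }
  out
}

# Toy reports: 3 actors describing a 3-vertex digraph.
toy <- array(0, dim = c(3, 3, 3))
toy[1, 1, 2] <- 1   # actor 1 reports its own tie 1 -> 2
toy[2, 1, 2] <- 1   # actor 2 confirms 1 -> 2
toy[3, 2, 3] <- 1   # actor 3 reports 2 -> 3, which actor 2 does not
central_graph_sketch(toy)         # only 1 -> 2 (2 of 3 reports)
las_sketch(toy, "intersection")   # only 1 -> 2 (sender and receiver agree)
las_sketch(toy, "union")          # 1 -> 2 and 2 -> 3

# Error arithmetic, assuming both parties err independently at the mean
# rates used in the simulation (em = 0.375, ep = 0.038):
1 - (1 - 0.375)^2   # intersection misses a true tie: 0.609375
0.375^2             # union misses it only if both parties do: 0.140625
1 - (1 - 0.038)^2   # but union's false positive rate rises to about 0.0746
```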

As a final example of sna’s model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components and estimates both effects simultaneously. (This example requires the numDeriv package.)
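Read as equations, the simulation commands that follow generate data from the model below. This is our transcription of the code, with $\rho_1$ = r1, $\rho_2$ = r2, $X$ = x, and $\nu$ = nu; $W_1$ carries the AR component and $W_2$ the MA component:

$$
y = \rho_1 W_1 y + X\beta + e, \qquad e = \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I),
$$

$$
y = (I - \rho_1 W_1)^{-1}\left[X\beta + (I - \rho_2 W_2)^{-1}\nu\right].
$$

The two qr.solve calls compute $(I - \rho_2 W_2)^{-1}\nu$ and then $(I - \rho_1 W_1)^{-1}(X\beta + e)$ by solving linear systems, without forming either inverse explicitly.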

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58  < 2e-16 ***
X2      0.535608   0.009448   56.69  < 2e-16 ***
X3     -0.685068   0.007138  -95.98  < 2e-16 ***
X4      0.691812   0.008417   82.19  < 2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71  < 2e-16 ***
rho2.1  0.307491   0.021167   14.53  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts in network form the total influence of each vertex on every other vertex. Pairs (i, j) for which i’s net influence on j is estimated to be at least two standard deviations greater than the mean net influence are drawn as green edges, while pairs for which i’s net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean are drawn as red edges. Sample output for the above example is provided in Figure 6.
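The two-standard-deviation rule can be illustrated with a small base-R sketch. This is not sna’s plot.lnam code, and the helper net_influence_flags is hypothetical. The key assumption, which we have not checked against sna, is that net influence for the AR term is measured by the entries of $(I - \rho_1 W_1)^{-1}$: entry [j, i] of that inverse is the total (direct plus indirect) effect on $y_j$ of a unit shock at vertex i, so the sketch transposes it to read influence[i, j] as i’s net influence on j.

```r
# Hypothetical sketch, not sna's plot.lnam. ASSUMES net influence is given by
# (I - rho1 * W1)^{-1}; the transpose makes influence[i, j] = i's effect on j.
net_influence_flags <- function(W1, rho1, k = 2) {
  n <- nrow(W1)
  influence <- t(solve(diag(n) - rho1 * W1))
  off <- row(influence) != col(influence)      # ignore self-influence
  m <- mean(influence[off])
  s <- sd(influence[off])
  list(influence = influence,
       green = off & influence >= m + k * s,   # at least k SDs above the mean
       red   = off & influence <= m - k * s)   # at least k SDs below the mean
}

# Toy example: y_2 depends on y_1 only, so vertex 1 influences vertex 2.
W1 <- matrix(0, 5, 5)
W1[2, 1] <- 1
flags <- net_influence_flags(W1, rho1 = 0.5)
which(flags$green, arr.ind = TRUE)   # the single pair (1, 2)
sum(flags$red)                       # 0
```

In this toy case $W_1^2 = 0$, so the inverse is simply $I + 0.5 W_1$ and the only nonzero off-diagonal influence is 0.5 from vertex 1 to vertex 2.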

            3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines that covers many of the methods currently in wide use within the field. It is hoped that making such tools available, together with the other packages of the statnet ensemble, within a freely available and widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

            Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam (panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot).

            References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks II: Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

            Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software http://www.jstatsoft.org/
published by the American Statistical Association http://www.amstat.org/

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01. Accepted: 2007-12-25.



gden, which we shall encounter again later, is an sna function that returns the density of one or more input graphs; as expected, the observed densities here closely match their expectations. The tprob parameter, used above to set the probability of each edge on a per-graph basis, can also be used in other ways. For instance, passing a matrix of Bernoulli parameters to tprob will cause rgraph to sample from the corresponding inhomogeneous Bernoulli graph model, in which the probability of an (i, j) edge is equal to tprob[i,j]. For example, consider a simple model for a digraph of order 10 in which the probability of an (i, j) edge is equal to j/10. Such a graph can be drawn easily as follows:

R> gp <- sapply((1:10)/10, rep, 10)
R> g <- rgraph(10, tprob = gp)
R> g
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    1    0    0    1    1     1
 [2,]    0    0    0    1    0    1    0    0    1     1
 [3,]    0    0    0    0    0    1    0    1    0     1
 [4,]    0    0    0    0    1    1    1    1    1     1
 [5,]    0    1    0    0    0    0    1    1    1     1
 [6,]    0    0    1    0    1    0    1    0    1     1
 [7,]    0    1    1    0    1    0    0    1    1     1
 [8,]    0    0    1    1    1    0    1    0    1     1
 [9,]    0    0    0    1    1    0    1    1    0     1
[10,]    0    0    0    0    0    0    1    1    1     0

R> apply(g, 2, mean)
 [1] 0.0 0.2 0.3 0.3 0.6 0.3 0.6 0.7 0.8 0.9
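For readers who want to see the mechanics, here is a minimal base-R sketch of the inhomogeneous Bernoulli model just used. It is not sna’s rgraph; the helper rbern_digraph is a hypothetical name, and we assume loops are excluded, as they are by default in rgraph. Because each column then has nine free cells, averaging many draws should bring the column means close to 0.9(j/10).

```r
# Hypothetical helper (not sna's rgraph): one draw from an inhomogeneous
# Bernoulli digraph model, where edge (i, j) is present with probability
# tprob[i, j], independently of every other edge. Loops are excluded.
rbern_digraph <- function(tprob) {
  n <- nrow(tprob)
  # rbinom recycles the probability matrix element by element (column-major),
  # matching the order in which matrix() fills the result.
  g <- matrix(rbinom(n * n, 1, tprob), n, n)
  diag(g) <- 0
  g
}

gp <- sapply((1:10) / 10, rep, 10)   # column j holds the probability j/10
set.seed(1)
# Average the column means over many draws and compare with 0.9 * (j/10).
col_means <- replicate(2000, apply(rbern_digraph(gp), 2, mean))
round(rowMeans(col_means), 2)
round(0.9 * (1:10) / 10, 2)
```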

Since rgraph disallows loops by default, diagonal entries are ignored in the above cases; thus the column means here have expectation 0.9(j/10). The observed means are quite close to this, but vary due to the underlying Bernoulli process. For random graphs with exact constraints on edge count, we must use rgnm. For instance, to take 5 draws from the uniform distribution on the order 10 graphs having 12 edges, we would proceed as follows:

R> g <- rgnm(5, 10, 12)
R> apply(g, 1, sum)
[1] 12 12 12 12 12
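A uniform draw with a fixed edge count can be obtained by choosing which cells hold the edges. The base-R sketch below is not sna’s rgnm; rgnm_sketch is a hypothetical helper that picks m of the n(n - 1) off-diagonal cells without replacement, so every digraph with exactly m edges is equally likely.

```r
# Hypothetical helper (not sna's rgnm): uniform draw from the order-n digraphs
# with exactly m edges, by choosing m of the n(n - 1) off-diagonal cells
# without replacement.
rgnm_sketch <- function(n, m) {
  g <- matrix(0, n, n)
  cells <- which(row(g) != col(g))            # linear indices of cells with i != j
  g[cells[sample.int(length(cells), m)]] <- 1
  g
}

set.seed(42)
draws <- replicate(5, rgnm_sketch(10, 12), simplify = FALSE)
sapply(draws, sum)                             # 12 12 12 12 12
```

sample.int is used instead of sample(cells, m) so the call behaves correctly even when only one candidate cell remains.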

As the dyadic counterpart to both rgraph and rgnm, rguman models digraphs whose distributions are parameterized by dyad states. Since each dyad corresponds to a pair of edge variables, it can be readily classified into one of three isomorphism classes: mutual (both edges present), asymmetric (one edge present), or null (no edges present). The number of dyads in each class within a graph is known as its dyad census, and has been used as a simple basis for modeling network structure at least since the work of Holland and Leinhardt (1970). rguman can be employed either to generate uniform digraphs conditional on an exact dyad census constraint, or to draw from a multinomial graph model of independent dyads with fixed expected counts. The former case can be used to generate graphs of particular types. For instance, the trivial cases of complete, tournament, and null graphs can be generated by placing all dyads within the appropriate isomorphism class:

R> k10 <- rguman(1, 10, mut = 45, asym = 0, null = 0, method = "exact")
R> t10 <- rguman(1, 10, mut = 0, asym = 45, null = 0, method = "exact")
R> n10 <- rguman(1, 10, mut = 0, asym = 0, null = 45, method = "exact")
R> k10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    1    1    1    1    1    1    1     1
 [2,]    1    0    1    1    1    1    1    1    1     1
 [3,]    1    1    0    1    1    1    1    1    1     1
 [4,]    1    1    1    0    1    1    1    1    1     1
 [5,]    1    1    1    1    0    1    1    1    1     1
 [6,]    1    1    1    1    1    0    1    1    1     1
 [7,]    1    1    1    1    1    1    0    1    1     1
 [8,]    1    1    1    1    1    1    1    0    1     1
 [9,]    1    1    1    1    1    1    1    1    0     1
[10,]    1    1    1    1    1    1    1    1    1     0

R> t10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    1    0    0     0
 [2,]    1    0    1    0    1    1    0    0    0     1
 [3,]    1    0    0    1    1    0    0    1    0     0
 [4,]    1    1    0    0    0    1    0    1    0     1
 [5,]    1    0    0    1    0    1    1    1    1     0
 [6,]    1    0    1    0    0    0    1    1    1     0
 [7,]    0    1    1    1    0    0    0    1    1     0
 [8,]    1    1    0    0    0    0    0    0    1     1
 [9,]    1    1    1    1    0    0    0    0    0     0
[10,]    1    0    1    0    1    1    1    0    1     0

R> n10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
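The dyad census described above is easy to compute directly. The base-R helper below is a hypothetical sketch, not an sna function: it reads each unordered pair {i, j} with i < j once from the upper triangle and compares the two directed edges. The transitive tournament used here is our own example and differs from the random tournament t10 above.

```r
# Hypothetical helper (not an sna function): the dyad census of a digraph g,
# classifying each unordered pair {i, j}, i < j, as mutual, asymmetric, or null.
dyad_census_sketch <- function(g) {
  up <- upper.tri(g)
  ij <- g[up]              # edge i -> j for each pair with i < j
  ji <- t(g)[up]           # edge j -> i for the same pairs
  c(mut  = sum(ij == 1 & ji == 1),
    asym = sum(ij != ji),
    null = sum(ij == 0 & ji == 0))
}

complete10 <- matrix(1, 10, 10); diag(complete10) <- 0
tourn10 <- matrix(0, 10, 10); tourn10[upper.tri(tourn10)] <- 1   # transitive tournament
dyad_census_sketch(complete10)   # mut = 45, asym = 0, null = 0
dyad_census_sketch(tourn10)      # mut = 0, asym = 45, null = 0
```

Applied to the k10, t10, and n10 objects above, the same helper should return (45, 0, 0), (0, 45, 0), and (0, 0, 45), since those graphs were generated with exactly these censuses.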

When not in “exact” mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)
R> mean(g[upper.tri(g)] & t(g)[upper.tri(g)])
[1] 0.1482828
R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])
[1] 0.04646465
R> mean((!g)[upper.tri(g)] & t(!g)[upper.tri(g)])
[1] 0.8052525

By contrast, a Bernoulli graph with the same expected density as the above model would have a mean mutuality rate of approximately 0.03, with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4. Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
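The numbers in this comparison follow from a few lines of arithmetic, shown here in R as our own check rather than sna output. Under the multinomial model the expected density is P(mutual) + P(asymmetric)/2; a Bernoulli graph with that density p has mutual dyads with probability p^2 and asymmetric dyads with probability 2p(1 - p).

```r
# Worked arithmetic behind the comparison above (our own check, not sna output).
p_mut <- 0.15; p_asym <- 0.05; p_null <- 0.80   # rguman dyad-type probabilities
# Expected density: a mutual dyad fills both of its two edge slots,
# an asymmetric dyad fills one of the two.
dens <- p_mut + p_asym / 2                      # 0.175
p_asym / p_mut                                  # multinomial model: 1/3 asym per mutual
# Bernoulli graph with the same density: each edge present independently w.p. dens.
bern_mut  <- dens^2                             # both directions: 0.030625
bern_asym <- 2 * dens * (1 - dens)              # exactly one direction: 0.28875
bern_asym / bern_mut                            # about 9.43 asym per mutual dyad
```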

More extensive departures from independence require alternatives to the simple independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. As we will see, these processes are specified in terms of the conditional probability of an edge given the other edges within the graph; this immediately suggests the use of a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process. Such a sampler is implemented via the rgbn function, which uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. Thinning and burn-in parameters may be specified by the user, along with model parameters (which by default correspond to the uniform random digraph model). Parameters may be adjusted to produce “parent” or reciprocity biases (π), “sibling” or shared partner biases (σ), and “double role” biases or parent/sibling interaction effects (ρ), as well as baseline density effects (d); parameters vary from 0 to 1, with 0 indicating no bias. The command to draw a sample of 5 order 10 networks with both reciprocity and triangle formation biases will then look something like the following:

R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
+   d = 0.15))


with the magnitude of the specified effects depending on the exact choice of parameters.
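sna’s rgbn targets the full Skvoretz-Fararo process; the sketch below does not. It is a deliberately reduced, hypothetical Gibbs sampler that keeps only a baseline density d and a reciprocity bias (called recip here), with an assumed full conditional P(Y[i, j] = 1 | rest) = 1 - (1 - d)(1 - recip)^Y[j, i]. The sibling and double-role terms, and sna’s exact parameterization, are omitted. The point is only to show the structure described above: repeated single-edge updates from a conditional probability, a burn-in period, and thinning between retained draws.

```r
# Hypothetical, deliberately simplified Gibbs sampler (not sna's rgbn). Only a
# baseline density d and a reciprocity bias recip are kept; the sibling (sigma)
# and double-role (rho) terms are omitted. Assumed full conditional per edge:
#   P(Y[i, j] = 1 | all other edges) = 1 - (1 - d) * (1 - recip)^Y[j, i]
gibbs_recip_sketch <- function(m, n, d = 0.5, recip = 0, burn = 100, thin = 10) {
  g <- matrix(rbinom(n * n, 1, d), n, n)
  diag(g) <- 0
  pairs <- which(row(g) != col(g), arr.ind = TRUE)   # all ordered pairs, i != j
  sweep_once <- function(g) {
    for (k in seq_len(nrow(pairs))) {
      i <- pairs[k, 1]; j <- pairs[k, 2]
      p_on <- 1 - (1 - d) * (1 - recip)^g[j, i]      # uses the current reverse edge
      g[i, j] <- rbinom(1, 1, p_on)
    }
    g
  }
  for (s in seq_len(burn)) g <- sweep_once(g)        # burn-in sweeps, discarded
  out <- array(0, dim = c(m, n, n))
  for (r in seq_len(m)) {
    for (s in seq_len(thin)) g <- sweep_once(g)      # thinning between kept draws
    out[r, , ] <- g
  }
  out
}

set.seed(3)
sims <- gibbs_recip_sketch(5, 10, d = 0.15, recip = 0.05, burn = 50, thin = 5)
dim(sims)   # 5 10 10
```

With only the reciprocity term, the conditionals describe independent dyads, so this chain is easy to check; adding shared-partner terms would make each update depend on many other edges, which is where a Gibbs scheme like rgbn’s earns its keep.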

Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) “rewiring” process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process conserves edges, e.g.:

R> g <- matrix(0, 10, 10)
R> g[1,] <- 1
R> g2 <- rewire.ws(g, 0.5)[1,,]
R> g2
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    1    1    1    1    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     1
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    0    1    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    1    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    1     0
R> sum(g - g2) == 0

              [1] TRUE
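The rewiring rule can be sketched in base R to make the edge-conservation argument concrete. This is not sna’s rewire.ws; rewire_sketch is a hypothetical helper. It visits each starting edge once and, with probability p, moves it to a randomly chosen empty cell that shares exactly one endpoint with it (same sender or same receiver). As a simplifying choice of ours, it never creates loops. Every move deletes one edge and fills one previously empty cell, so the edge count cannot change.

```r
# Hypothetical sketch of the rewiring rule described above (not sna's rewire.ws).
# Each edge present at the start is visited once; with probability p it is moved
# to a randomly chosen empty cell sharing exactly one endpoint with it (same
# sender or same receiver). Loops are never created in this sketch.
rewire_sketch <- function(g, p) {
  n <- nrow(g)
  edges <- which(g != 0, arr.ind = TRUE)     # snapshot of the starting edges
  for (k in seq_len(nrow(edges))) {
    if (runif(1) >= p) next                  # leave this edge in place
    i <- edges[k, 1]; j <- edges[k, 2]
    if (runif(1) < 0.5) {                    # keep sender i, choose a new receiver
      cand <- which(g[i, ] == 0 & seq_len(n) != i)
      if (length(cand) == 0) next
      g[i, j] <- 0
      g[i, cand[sample.int(length(cand), 1)]] <- 1
    } else {                                 # keep receiver j, choose a new sender
      cand <- which(g[, j] == 0 & seq_len(n) != j)
      if (length(cand) == 0) next
      g[i, j] <- 0
      g[cand[sample.int(length(cand), 1)], j] <- 1
    }
  }
  g
}

star <- matrix(0, 10, 10); star[1, -1] <- 1    # vertex 1 sends to everyone else
set.seed(7)
rewired <- rewire_sketch(star, 0.5)
sum(rewired) == sum(star)                      # TRUE: edges are conserved
```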

Another example of an edge-preserving random transformation is the random permutation of vertex order. rmperm can be employed for this purpose, as in the following permutation of the graph g2 above:

R> g3 <- rmperm(g2)
R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))

              [1] TRUE

Row/column permutation preserves the "unlabeled" structure of the input graph (i.e., it draws from the graph's isomorphism class) and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
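The claim that relabeling leaves unlabeled structure intact can be checked directly in base R. The sketch below (the helper name permute_graph is ours, not sna's) applies one permutation to both rows and columns, which is what a vertex relabeling amounts to. It confirms that degree sequences and the edge count survive, and that the inverse permutation, order(perm), recovers the original labeling.

```r
# Relabel vertices: apply the same permutation to rows and columns,
# i.e., h[a, b] = g[perm[a], perm[b]].
permute_graph <- function(g, perm) g[perm, perm]

set.seed(2)
g <- matrix(rbinom(36, 1, 0.4), 6, 6)
diag(g) <- 0
perm <- sample.int(6)
h <- permute_graph(g, perm)

# Unlabeled invariants are preserved ...
all(sort(rowSums(g)) == sort(rowSums(h)))    # out-degree sequence: TRUE
all(sort(colSums(g)) == sort(colSums(h)))    # in-degree sequence: TRUE
sum(g) == sum(h)                             # edge count: TRUE
# ... and the inverse permutation recovers the original labeling.
identical(permute_graph(h, order(perm)), g)  # TRUE
```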

2.2. Visualization and data manipulation

Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic. For instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays. gvectorize and sr2css convert network data from one form to another. symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets. add.isolates adds isolates to one or more input graphs, and stackcount determines the number of graphs in an input stack. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible and can be used with any graph-level index function. Its use is straightforward, e.g.:

R> g <- rgraph(5)
R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)
[1] 0.07291667

Unfortunately, the drawback to the flexibility of this routine is its inefficiency. eval.edgeperturbation cannot take advantage of any special properties of the change score being calculated. It is therefore inefficient for properties such as triad counts, whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple exploratory applications, and does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
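The following base-R sketch shows the idea behind such a wrapper and why it is slow. The names edge_perturb, density_stat, and mutual_stat are hypothetical illustrations, not sna functions. The wrapper recomputes the full statistic twice, whereas a specialized change-score function for the mutual-dyad count could simply read off g[j, i] for an off-diagonal edge (i, j).

```r
# Hypothetical helper illustrating the idea behind eval.edgeperturbation:
# the change in a statistic when edge (i, j) is forced present versus absent,
# with all other edges held fixed. The statistic is recomputed from scratch
# twice, which is exactly why this generic approach is slow.
edge_perturb <- function(g, i, j, stat) {
  g_on <- g
  g_on[i, j] <- 1
  g_off <- g
  g_off[i, j] <- 0
  stat(g_on) - stat(g_off)
}

# Two example graph-level statistics written in base R.
density_stat <- function(g) {
  n <- nrow(g)
  sum(g[row(g) != col(g)]) / (n * (n - 1))   # loops ignored
}
mutual_stat <- function(g) {                 # number of mutual dyads
  diag(g) <- 0
  sum(g * t(g)) / 2
}

g <- matrix(0, 5, 5)
g[2, 1] <- 1
edge_perturb(g, 1, 2, density_stat)  # 0.05, i.e. 1 / (5 * 4)
edge_perturb(g, 1, 2, mutual_stat)   # 1: (1, 2) would complete a mutual dyad
edge_perturb(g, 1, 3, mutual_stat)   # 0: there is no (3, 1) edge to match
```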

Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the "identities" of its vertices while leaving the underlying "unlabeled" structure unchanged.) Although not graph manipulation functions per se, these routines are important for generating restricted permutations for use in QAP tests (Hubert 1987) and for the comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order. This is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
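Both ideas can be sketched in a few lines of base R. The helpers restricted_perm and nth_perm below are hypothetical and are not the code of rperm or numperm. In particular, nth_perm uses 1-based indexing by assumption, and sna's own indexing convention may differ. The restricted permutation shuffles positions within each class separately, and the lexical decoding uses the factorial number system.

```r
# Hypothetical helpers illustrating the two ideas (not sna's rperm/numperm).

# Uniform random permutation that only exchanges vertices of the same
# equivalence class: positions within each class are shuffled separately.
restricted_perm <- function(classes) {
  perm <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    perm[idx] <- idx[sample.int(length(idx))]
  }
  perm
}

# The n-th permutation of 1:k in lexical order (n is 1-based here),
# decoded digit by digit via the factorial number system.
nth_perm <- function(k, n) {
  stopifnot(n >= 1, n <= factorial(k))
  remaining <- seq_len(k)
  r <- n - 1
  out <- integer(0)
  for (pos in k:1) {
    f <- factorial(pos - 1)       # permutations of the remaining tail
    q <- r %/% f                  # which remaining element comes next
    out <- c(out, remaining[q + 1])
    remaining <- remaining[-(q + 1)]
    r <- r %% f
  }
  out
}

set.seed(3)
classes <- c(1, 1, 2, 2, 2, 3)
p <- restricted_perm(classes)
all(classes[p] == classes)   # TRUE: vertices only move within their class
nth_perm(3, 1)               # 1 2 3
nth_perm(3, 2)               # 1 3 2
nth_perm(3, 6)               # 3 2 1
```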

In addition to the above, two families of graph manipulation functions bear discussing in more detail: functions to compute properties of neighborhoods, and functions for graph visualization. Here we briefly discuss each family in turn before proceeding to a review of sna's descriptive index routines.

              Neighborhood and ego net functions

The egocentric network (or "ego net") of vertex $v$ in graph $G$ is defined as $G[\{v\} \cup N(v)]$ (i.e., the subgraph of $G$ induced by $v$ and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.
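To see what an induced ego net amounts to, the following base-R sketch extracts one directly from an adjacency matrix. The helper ego_net is hypothetical and much simpler than ego.extract (one graph, one vertex, no missing-data handling). It does follow the same convention of placing ego in the first row and column, so alter-only statistics can be computed by dropping that row and column.

```r
# Hypothetical helper (not sna's ego.extract): the subgraph induced by
# vertex v and its neighbors, with ego placed in the first row/column.
ego_net <- function(g, v, type = c("combined", "out", "in")) {
  type <- match.arg(type)
  out_nb <- which(g[v, ] != 0)
  in_nb <- which(g[, v] != 0)
  nb <- switch(type,
               combined = union(out_nb, in_nb),
               out = out_nb,
               "in" = in_nb)
  keep <- c(v, setdiff(nb, v))   # ego first; a self-loop adds no alter
  g[keep, keep, drop = FALSE]
}

g <- matrix(0, 5, 5)
g[1, 2] <- 1; g[3, 1] <- 1; g[2, 4] <- 1; g[4, 5] <- 1
ego_net(g, 1)               # ego 1 with alters 2 (out) and 3 (in)
nrow(ego_net(g, 1, "out"))  # 2
nrow(ego_net(g, 1, "in"))   # 2
```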

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood. Neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.
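A first-order version of this idea fits in a few lines of base R. The helper neighbor_apply below is hypothetical, not gapply itself. It mirrors the margin convention described for gapply (1 for rows, i.e., out-neighbors; 2 for columns, i.e., in-neighbors; c(1, 2) for either) but omits higher-order neighborhoods.

```r
# Hypothetical helper (not sna's gapply): apply FUN to the covariate values
# of each vertex's first-order neighbors.
neighbor_apply <- function(g, margin, stats, FUN) {
  adj <- matrix(FALSE, nrow(g), ncol(g))
  if (1 %in% margin) adj <- adj | (g != 0)      # out-neighbors (rows)
  if (2 %in% margin) adj <- adj | t(g != 0)     # in-neighbors (columns)
  diag(adj) <- FALSE                            # ego is not its own neighbor
  sapply(seq_len(nrow(g)), function(v) FUN(stats[adj[v, ]]))
}

g <- matrix(0, 4, 4)
g[1, 2] <- 1; g[1, 3] <- 1; g[4, 1] <- 1
neighbor_apply(g, 1, rep(1, 4), sum)   # out-degrees: 2 0 0 1
neighbor_apply(g, 2, rep(1, 4), sum)   # in-degrees:  1 1 1 0
neighbor_apply(g, c(1, 2), 1:4, mean)  # mean neighbor label: 3 1 1 1
```

Summing a vector of ones over neighbors recovers degree, which is the same trick used with gapply in the examples later in this section.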

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations. For a given graph, it returns the adjacency matrix whose $(i, j)$ cell is an indicator for the membership of vertex $j$ in vertex $i$'s selected neighborhood. Specifically, the adjacency matrix associated with the 0th-order neighborhood is defined as the identity matrix; for orders $k > 0$, it depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation $R$ be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i, j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
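The definitions above translate almost line by line into base R. The sketch below is hypothetical (neighborhood_sketch and geodist_bfs are our names, not sna's). It computes geodesic distances on the base relation by breadth-first search and then thresholds them. It follows the text literally, so a cumulative neighborhood of order $k$ includes the distance-0 self-pairs; sna's own treatment of the diagonal is not something this sketch claims to reproduce.

```r
# Geodesic distances on a 0/1 adjacency matrix by breadth-first search.
geodist_bfs <- function(r) {
  n <- nrow(r)
  d <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    d[s, s] <- 0
    frontier <- s
    k <- 0
    while (length(frontier) > 0) {
      k <- k + 1
      # unvisited vertices reachable in one step from the frontier
      nxt <- which(colSums(r[frontier, , drop = FALSE]) > 0 & is.infinite(d[s, ]))
      d[s, nxt] <- k
      frontier <- nxt
    }
  }
  d
}

# Hypothetical helper: order-k neighborhood matrix on the base relation R.
neighborhood_sketch <- function(g, k, type = c("out", "in", "total"),
                                partial = TRUE) {
  type <- match.arg(type)
  r <- switch(type,
              out = g,
              "in" = t(g),
              total = pmax(g, t(g)))   # underlying graph, G union t(G)
  d <- geodist_bfs((r != 0) * 1)
  if (partial) (d == k) * 1 else (d <= k) * 1
}

g <- matrix(0, 4, 4)
g[1, 2] <- 1; g[2, 3] <- 1; g[3, 4] <- 1         # directed path 1 -> 2 -> 3 -> 4
neighborhood_sketch(g, 2)                        # partial: only (1, 3) and (2, 4)
sum(neighborhood_sketch(g, 2, partial = FALSE))  # 9: 4 self-pairs + 3 + 2
```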

To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below:

R> g <- rgraph(10, tp = 1.5/9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]

$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(g.comb, NROW) <= degree(g) + 1)
[1] TRUE
R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+   sapply(g.comb[ego.size > 2], function(x) gden(x[-1, -1]))
         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example). Other statistics (e.g., means or quantiles) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+   cmode = "freeman")/2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0

To obtain adjacency matrices for the neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i, , ], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+   partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i, , ], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

              Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order (nine panels, titled "Partial Neighborhood of Order 1" through "Partial Neighborhood of Order 9"); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th-order partial neighborhood of $v$.
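gplot's own layout code is not reproduced here. As an illustration of the general idea behind a distance-based layout, the sketch below swaps in classical multidimensional scaling from base R (stats::cmdscale) applied to shortest-path distances on the underlying undirected graph. The helper name mds_layout is hypothetical, and placing unreachable pairs at distance n is an arbitrary assumption for disconnected inputs.

```r
# Sketch of an MDS-style layout: classical MDS (stats::cmdscale) applied to
# geodesic distances on the underlying undirected graph.
mds_layout <- function(g) {
  r <- (g != 0 | t(g != 0)) * 1
  n <- nrow(r)
  d <- ifelse(r == 1, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(n)) {                  # Floyd-Warshall shortest paths
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  d[is.infinite(d)] <- n                   # assumption: unreachable pairs at distance n
  cmdscale(d, k = 2)                       # n x 2 matrix of coordinates
}

# Usage: a 6-cycle should come out as a regular hexagon.
n <- 6
g <- matrix(0, n, n)
for (v in 1:n) g[v, v %% n + 1] <- 1
xy <- mds_layout(g)
xy
# plot(xy, asp = 1, pch = 19)  # draw the layout if a graphics device is open
```

Because classical MDS centers its output, every vertex of the cycle ends up at the same distance from the origin.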

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order (nine panels, titled "Cumulative Neighborhood of Order 1" through "Cumulative Neighborhood of Order 9"); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th-order cumulative neighborhood of $v$.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, and labels are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1.5, vertex.sides = 3:8,
+   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options (panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options).

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation and zooming) are available when using gplot3d. This can in and of itself be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+   xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10)/10 * 2 * pi), sin((1:10)/10 * 2 * pi),
+   col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+   xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+   offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
+   arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example).

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

              Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3-5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph-theoretic concept, is given by $c_d(v, G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d^+}(v, G) \equiv |N^+(v)|$), indegree ($c_{d^-}(v, G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{d^t}(v, G) \equiv c_{d^+}(v, G) + c_{d^-}(v, G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v, G) \equiv \sum_{(v', v'') \subseteq V \setminus \{v\}} \frac{g'(v', v, v'', G)}{g(v', v'', G)},$$

where $g(v, v', G)$ is the number of $(v, v')$ geodesics in $G$, $g'(v, v', v'', G)$ is the number of $(v, v'')$ geodesics in $G$ containing $v'$, and the ratio $g'(v', v, v'', G)/g(v', v'', G)$ is taken equal to 0 where $g(v', v'', G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v, G) \equiv \frac{n - 1}{\sum_{v' \in V} d(v, v')},$$

where $d(v, v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v, G) = 0$ for any $v$ lacking a path to at least one other vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
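The definitions above can be checked on a small example with a direct, if slow, base-R sketch. The helpers geodesic_counts, path_centrality, and closeness_sketch are hypothetical and unnormalized, and they handle unweighted graphs only; they are not how degree, betweenness, stresscent, or closeness are implemented in sna. The betweenness sum runs over ordered pairs, the natural reading for digraphs, so for a symmetric graph each unordered pair is counted twice.

```r
# Geodesic distances and geodesic counts from every source, by BFS.
geodesic_counts <- function(g) {
  a <- (g != 0) * 1
  diag(a) <- 0
  n <- nrow(a)
  dist <- matrix(Inf, n, n)
  sigma <- matrix(0, n, n)          # sigma[s, u] = number of (s, u) geodesics
  for (s in seq_len(n)) {
    dist[s, s] <- 0
    sigma[s, s] <- 1
    frontier <- s
    k <- 0
    while (length(frontier) > 0) {
      k <- k + 1
      # shortest paths reaching each vertex via the current frontier
      reach <- colSums(a[frontier, , drop = FALSE] * sigma[s, frontier])
      nxt <- which(reach > 0 & is.infinite(dist[s, ]))
      dist[s, nxt] <- k
      sigma[s, nxt] <- reach[nxt]
      frontier <- nxt
    }
  }
  list(dist = dist, sigma = sigma)
}

# Betweenness (or stress, if stress = TRUE) summed over ordered pairs.
path_centrality <- function(g, stress = FALSE) {
  gc <- geodesic_counts(g)
  d <- gc$dist
  sg <- gc$sigma
  n <- nrow(d)
  sapply(seq_len(n), function(v) {
    total <- 0
    for (s in seq_len(n)) for (u in seq_len(n)) {
      if (s == v || u == v || s == u || is.infinite(d[s, u])) next
      if (d[s, v] + d[v, u] == d[s, u]) {   # v lies on some (s, u) geodesic
        through <- sg[s, v] * sg[v, u]      # number of geodesics through v
        total <- total + if (stress) through else through / sg[s, u]
      }
    }
    total
  })
}

# Closeness: (n - 1) / sum of distances; 0 if some vertex is unreachable.
closeness_sketch <- function(g) {
  d <- geodesic_counts(g)$dist
  (nrow(d) - 1) / rowSums(d)
}

# Diamond: 1 -> 2 -> 4 and 1 -> 3 -> 4 are two geodesics from 1 to 4.
g <- matrix(0, 4, 4)
g[1, 2] <- 1; g[1, 3] <- 1; g[2, 4] <- 1; g[3, 4] <- 1
colSums(g)                 # indegree:    0 1 1 2
rowSums(g)                 # outdegree:   2 1 1 0
path_centrality(g)         # betweenness: 0 0.5 0.5 0
path_centrality(g, TRUE)   # stress:      0 1 1 0
closeness_sketch(g)        # 0.75 0 0 0
```

The diamond shows the difference between the two path-based measures: vertices 2 and 3 each carry one of two geodesics from 1 to 4, so betweenness splits the credit while stress does not. It also shows closeness collapsing to 0 for every vertex that cannot reach all others.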

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$, where $A$ is the graph adjacency matrix. This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), as "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or as the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to

$$c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1},$$

where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats. As with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so that the squared scores sum to $|V|$ (i.e., the centrality vector has Euclidean length $\sqrt{|V|}$). In general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
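Both spectral measures can be written directly from their formulas in base R. The helpers evcent_sketch and bonpow_sketch below are hypothetical, not the sna code. The choice of $\alpha$ (squared scores summing to $|V|$) is the convention stated above; it also reproduces the constant ratio 0.2192645 = sqrt(10/208) printed for exponent = 0 in the example later in this section, where 208 is the sum of the squared outdegrees shown there. The test graph, a triangle with one pendant vertex, is an arbitrary illustration.

```r
# Eigenvector centrality: absolute value of the principal eigenvector of A.
evcent_sketch <- function(a) {
  e <- eigen(a)
  abs(Re(e$vectors[, 1]))
}

# Bonacich power: alpha * (I - beta A)^{-1} A 1, with alpha chosen so that
# the squared scores sum to |V|.
bonpow_sketch <- function(a, beta) {
  n <- nrow(a)
  raw <- as.vector(solve(diag(n) - beta * a, a %*% rep(1, n)))
  raw * sqrt(n / sum(raw^2))
}

# Triangle 1-2-3 with a pendant vertex 4 attached to 3 (undirected).
a <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 0), 4, 4, byrow = TRUE)
which.max(evcent_sketch(a))       # 3: the best-connected vertex
bonpow_sketch(a, 0) / rowSums(a)  # constant: beta = 0 gives scaled degree
bonpow_sketch(a, -0.3)            # negative beta: ties to weak alters pay off
```

Taking $\beta$ just below the reciprocal of the principal eigenvalue makes the rescaled Bonacich scores nearly indistinguishable from the eigenvector scores, as the text describes.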

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$, that is, the number of pairs for which $v$ serves as a local bridge. Now let us posit a vector of states $s$ on $V$, such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model. The statistical foundation for their associated tests is therefore somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
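The role definitions are mechanical enough to enumerate directly. The base-R sketch below is a hypothetical helper (brokerage_counts is not sna's brokerage). It computes only the raw per-vertex role counts, using the same role labels that sna prints, and omits the null-model moments and z-tests. Each ordered triad in which $v_j$ bridges $v_i \to v_k$ is classified by comparing the three states.

```r
# Hypothetical helper (not sna's brokerage): raw Gould-Fernandez role counts.
# s is a vector of vertex states (e.g., group labels).
brokerage_counts <- function(g, s) {
  n <- nrow(g)
  roles <- c("w_I", "w_O", "b_IO", "b_OI", "b_O")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (i == j || j == k || i == k) next
    # j brokers the ordered pair (i, k) only if i -> j -> k and not i -> k
    if (g[i, j] == 0 || g[j, k] == 0 || g[i, k] != 0) next
    role <- if (s[i] == s[j] && s[j] == s[k]) "w_I" else  # coordinator
      if (s[i] == s[k]) "w_O" else                       # itinerant
      if (s[j] == s[k]) "b_IO" else                      # gatekeeper
      if (s[i] == s[j]) "b_OI" else                      # representative
      "b_O"                                              # liaison
    out[j, role] <- out[j, role] + 1
  }
  cbind(out, t = rowSums(out))
}

g <- matrix(0, 3, 3)
g[1, 2] <- 1; g[2, 3] <- 1                  # 1 -> 2 -> 3, no 1 -> 3 edge
brokerage_counts(g, c("a", "a", "a"))[2, ]  # one coordinator (w_I) triad
brokerage_counts(g, c("a", "b", "c"))[2, ]  # one liaison (b_O) triad
```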

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrates, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector centrality and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0)/degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1/eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

              Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed global (aggregate) brokerage scores for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles and, likewise, certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs; in the output above, for instance, the w_I z-scores for group 2 are NaN because a coordinator role requires at least three members of the same group, and group 2 has only two.
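To make the five roles concrete, the following Python sketch (written independently of sna; the name brokerage_roles is hypothetical) classifies each two-path i -> j -> k by the group memberships of the sender, the broker, and the receiver. It assumes the usual Gould-Fernandez convention that only open two-paths (no direct i -> k tie) among three distinct vertices count. It tallies raw scores only and does not compute the z-scores reported by brokerage, whose internal details may differ.

```python
from collections import Counter


def brokerage_roles(adj, group):
    """Count Gould-Fernandez brokerage roles for each broker j.

    adj[i][j] == 1 means a directed tie i -> j; group[v] is v's group label.
    Assumption: a two-path i -> j -> k counts only if i, j, k are distinct
    and there is no direct tie i -> k.
    """
    n = len(adj)
    counts = [Counter() for _ in range(n)]
    for j in range(n):                      # candidate broker
        for i in range(n):                  # sender
            if i == j or not adj[i][j]:
                continue
            for k in range(n):              # receiver
                if k in (i, j) or not adj[j][k] or adj[i][k]:
                    continue
                gi, gj, gk = group[i], group[j], group[k]
                if gi == gj == gk:
                    role = "w_I"    # coordinator: everyone in the broker's group
                elif gi == gk:
                    role = "w_O"    # itinerant: i and k share a group, j does not
                elif gj == gk:
                    role = "b_IO"   # gatekeeper: outsider i reaches j's group
                elif gi == gj:
                    role = "b_OI"   # representative: j's group reaches outsider k
                else:
                    role = "b_O"    # liaison: three different groups
                counts[j][role] += 1
                counts[j]["t"] += 1  # combined (total) brokerage
    return counts


# Usage: a single open two-path 0 -> 1 -> 2.
path = [[0, 1, 0],
        [0, 0, 1],
        [0, 0, 0]]
print(dict(brokerage_roles(path, ["A", "B", "B"])[1]))  # {'b_IO': 1, 't': 1}
```

Changing only the group labels of the same path moves the single brokerage event between roles, which is why the role counts depend so strongly on the membership vector.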

              Graph-level indices

Like node-level indices, graph-level indices (GLIs) are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i,j,k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from i to k, i should also be tied to k directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph G with respect to centrality measure c is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \max_{v \in V} c(v, G) - c(v_i, G) \right], \tag{1}$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \tag{2}$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing C(G) by its maximum across all graphs of the same order as G. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which uses them to generate the underlying score vector; in the normalized case, the centrality function is also asked to return the theoretical maximum deviation. This is handled transparently for all centrality functions included within sna, and the mechanism may also be employed with user-supplied functions, provided that they support the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
³ $K_n$ is the complete graph on n vertices, with $K_{n,m}$ denoting the complete bipartite graph on n and m vertices, and $N_n$ the null (or empty) graph on n vertices.

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices, supplied respectively by connectedness, efficiency, hierarchy, and lubness, describe the extent to which the structure of an input graph approaches that of an outtree; hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward: calling them with a graph (or set thereof) generally results in a vector of GLI scores, as in the following example. Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available; similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582

R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0
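The density, reciprocity, and weak transitivity definitions given earlier in this subsection can be written out directly. The Python sketch below is an illustration only, independent of gden, grecip, and gtrans: it assumes a loopless digraph with no missing data, omits strong transitivity and the non-null dyadic variant, and returns NaN where a quantity is undefined (sna's own conventions for such cases may differ). All function names are hypothetical.

```python
import math


def density(adj):
    """Fraction of the n(n - 1) possible directed edges (no loops) that are present."""
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return edges / (n * (n - 1))


def dyadic_reciprocity(adj):
    """Fraction of unordered dyads {i, j} that are symmetric (mutual or null)."""
    n = len(adj)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(adj[i][j] == adj[j][i] for i, j in pairs) / len(pairs)


def edgewise_reciprocity(adj):
    """Fraction of edges i -> j whose reverse j -> i is also present."""
    n = len(adj)
    edges = [(i, j) for i in range(n) for j in range(n) if i != j and adj[i][j]]
    if not edges:
        return math.nan  # assumption: undefined for an empty graph
    return sum(adj[j][i] for i, j in edges) / len(edges)


def weak_transitivity(adj):
    """Fraction of two-paths i -> j -> k (i, j, k distinct) closed by an edge i -> k."""
    n = len(adj)
    at_risk = closed = 0
    for i in range(n):
        for j in range(n):
            if i == j or not adj[i][j]:
                continue
            for k in range(n):
                if k in (i, j) or not adj[j][k]:
                    continue
                at_risk += 1          # this ordered triad is "at risk"
                closed += adj[i][k]   # ... and satisfies the weak condition
    return closed / at_risk if at_risk else math.nan  # assumption: no two-paths


g = [[0, 1, 0],
     [1, 0, 1],
     [0, 0, 0]]
print(density(g), dyadic_reciprocity(g), edgewise_reciprocity(g), weak_transitivity(g))
```

For this three-vertex graph (a mutual tie between 0 and 1 plus an edge 1 -> 2) the density is 0.5, both reciprocity variants equal 2/3, and the single two-path 0 -> 1 -> 2 is open, so weak transitivity is 0.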

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)
+     return((n-1) * choose(n-1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
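Equations (1) and (2), together with the normalization step, can be traced by hand for outdegree. The Python sketch below is not sna's centralization: the function names are hypothetical, and the normalizer (n - 1)^2 is an assumption derived from the star argument in the text. In a loopless digraph no vertex can exceed n - 1 out-ties and none can fall below 0, so an out-star (one vertex sending to all others) yields the largest possible summed deviation.

```python
def outdegree(adj):
    """Out-degree of each vertex of a loopless digraph."""
    n = len(adj)
    return [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]


def freeman_centralization(scores, max_deviation=None):
    """Eq. (1): summed deviation from the maximum score; normalized if a maximum is given."""
    top = max(scores)
    raw = sum(top - c for c in scores)
    return raw if max_deviation is None else raw / max_deviation


def outdegree_centralization(adj):
    n = len(adj)
    # Assumption: the out-star attains the maximal summed deviation (n - 1) * (n - 1).
    return freeman_centralization(outdegree(adj), (n - 1) ** 2)


star = [[0, 1, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
cycle = [[0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1],
         [1, 0, 0, 0]]
print(outdegree_centralization(star), outdegree_centralization(cycle))  # 1.0 0.0
```

The raw value for the star is 9, which also equals |V| times the gap between the maximum (3) and mean (0.75) outdegree, as Eq. (2) requires. The directed cycle, in which every vertex is automorphically equivalent, sits at the other extreme with centralization 0.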

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with the numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let G = (V, E) be a (possibly directed) graph of order N, and let d(i, j) be the geodesic distance from vertex i to vertex j in G. The "structure statistics" of G are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\left(d(j,k) \le i\right)$$

and I is the standard indicator function. Intuitively, $s_i$ is the expected fraction of G which lies within distance i of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models; see Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
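The definition of $s_i$ only needs all-pairs geodesic distances, which breadth-first search provides for unweighted graphs. The following Python sketch of the formula is independent of sna's structure.statistics and geodist; the function names are hypothetical, and unreachable pairs are assigned an infinite distance so that they never satisfy the indicator.

```python
import math
from collections import deque


def geodesic_distances(adj):
    """All-pairs shortest path lengths by breadth-first search (inf if unreachable)."""
    n = len(adj)
    dist = [[math.inf] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[s][v] == math.inf:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def structure_statistics(adj):
    """s_i = N^-2 * #{(j, k): d(j, k) <= i} for i = 0, ..., N - 1."""
    n = len(adj)
    d = geodesic_distances(adj)
    return [sum(d[j][k] <= i for j in range(n) for k in range(n)) / n ** 2
            for i in range(n)]


path = [[0, 1, 0],
        [0, 0, 1],
        [0, 0, 0]]
print(structure_statistics(path))  # equals [3/9, 5/9, 6/9]
```

Two values follow immediately from the definition for a loopless digraph: $s_0 = 1/N$ (each vertex is within distance 0 only of itself) and $s_1 = (N + |E|)/N^2$.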

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references therein), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of G, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic t for graph G is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where H is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of G is routine; the number of edges, the dyad census, and the degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $E[s(H)] = s(G), E[s'(H)] = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter p equal to the density of G is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., the difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
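The dyad census and a one-tailed CUG test combine naturally. The Python sketch below (hypothetical function names, not sna's dyad.census or cugtest) conditions on order and edge count, one of the conditioning choices mentioned above. Uniform draws are taken by sampling exactly m of the n(n - 1) possible directed edges, and both tail probabilities are estimated by Monte Carlo.

```python
import random


def dyad_census(adj):
    """Return (mutual, asymmetric, null) counts over unordered dyads."""
    n = len(adj)
    mut = asym = null = 0
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j] and adj[j][i]:
                mut += 1
            elif adj[i][j] or adj[j][i]:
                asym += 1
            else:
                null += 1
    return mut, asym, null


def uniform_digraph(n, m, rng):
    """Draw uniformly from loopless digraphs of order n with exactly m edges."""
    slots = [(i, j) for i in range(n) for j in range(n) if i != j]
    adj = [[0] * n for _ in range(n)]
    for i, j in rng.sample(slots, m):
        adj[i][j] = 1
    return adj


def cug_test(adj, stat, reps=1000, seed=None):
    """Monte Carlo CUG p-values for stat(adj), conditioning on order and edge count."""
    rng = random.Random(seed)
    n = len(adj)
    m = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    observed = stat(adj)
    sims = [stat(uniform_digraph(n, m, rng)) for _ in range(reps)]
    p_upper = sum(s >= observed for s in sims) / reps  # estimates Pr(t(H) >= t(G))
    p_lower = sum(s <= observed for s in sims) / reps  # estimates Pr(t(H) <= t(G))
    return observed, p_upper, p_lower


def mutual_count(adj):
    return dyad_census(adj)[0]


# Usage: a 6-cycle of mutual ties, compared with uniform graphs of the same size.
n = 6
ring = [[1 if (j - i) % n in (1, n - 1) else 0 for j in range(n)] for i in range(n)]
print(dyad_census(ring))  # (6, 0, 9)
print(cug_test(ring, mutual_count, reps=500, seed=42))
```

With 12 edges, six mutual dyads is the largest possible value, so the lower-tail estimate is 1 and the upper-tail estimate is very small: the observed reciprocity is far higher than edge count alone would suggest.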

              Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null 
 1.00 12.84 31.16 
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U 
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08 
 120C   210   300 
 0.30  0.00  0.00 

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null 
 8.84  9.26 26.90 
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U 
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74 
 120C   210   300 
 1.34  2.28  0.60 

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null 
 8.94 20.44 15.62 
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U 
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60 
 120C   210   300 
 8.40  7.38  1.50 

R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1, , ], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1, , ])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])
   0    1    2    3    4    5    6    7    8    9 
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00 

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities (computed with OP = "-") as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2, , ]
R> g4[2, , ] <- g2[1, , ]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:   -0.3333333
                1stQ:  -0.06666667
                Med:   0
                Mean:  -0.001288889
                3rdQ:  0.06666667
                Max:   0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:   -0.06666667
                1stQ:  0.1555556
                Med:   0.2222222
                Mean:  0.2215333
                3rdQ:  0.2888889
                Max:   0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph G, vertex v is said to be structurally equivalent to vertex v' iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when v and v' have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph-theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities or differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
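One of the similarity measures just listed, the Euclidean distance between row and column profiles, can be sketched in a few lines of Python. This is an illustration, not sna's sedist: the function name is hypothetical, and skipping the entries that involve the pair itself (mirroring $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$) is an assumption about how self and pair ties are handled. Exactly equivalent vertices get distance 0.

```python
import math


def se_distance(adj):
    """Euclidean structural-equivalence distance between every pair of vertices.

    Compares out-tie (row) and in-tie (column) profiles.
    Assumption: entries involving the pair itself (k == i or k == j) are skipped.
    """
    n = len(adj)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0
            for k in range(n):
                if k in (i, j):
                    continue
                total += (adj[i][k] - adj[j][k]) ** 2  # out-ties
                total += (adj[k][i] - adj[k][j]) ** 2  # in-ties
            dist[i][j] = math.sqrt(total)
    return dist


# Usage: vertex 0 sends ties to 1, 2, and 3, so 1, 2, and 3 are structurally equivalent.
star = [[0, 1, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
d = se_distance(star)
print(d[1][2], d[0][1])  # 0.0 1.4142135623730951
```

A matrix like d is what a hierarchical clustering routine would then take as input when grouping vertices into approximate positions.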

This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (the mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.
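For the density block type, the reduction step amounts to averaging the adjacency cells of each class-by-class block while neglecting the diagonal. The Python sketch below illustrates that reduction only (the function name is hypothetical, and none of blockmodel's other block types are covered). A block with no off-diagonal cells, such as the within-class block of a singleton class, is reported as NaN.

```python
import math


def density_blockmodel(adj, classes):
    """Reduce a graph to a density image: the mean tie value of each class-by-class block.

    classes[v] is the position of vertex v; diagonal cells (loops) are neglected.
    Returns (labels, image) with image[a][b] the density from labels[a] to labels[b].
    """
    n = len(adj)
    labels = sorted(set(classes))
    image = []
    for a in labels:
        row = []
        for b in labels:
            cells = [adj[i][j] for i in range(n) for j in range(n)
                     if i != j and classes[i] == a and classes[j] == b]
            row.append(sum(cells) / len(cells) if cells else math.nan)
        image.append(row)
    return labels, image


# Usage: two positions; every vertex in position 0 sends to every vertex in position 1.
g = [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
print(density_blockmodel(g, [0, 0, 1, 1]))  # ([0, 1], [[0.0, 1.0], [0.0, 0.0]])
```

Because vertices 0 and 1 (and likewise 2 and 3) are structurally equivalent here, every block is pure 0s or 1s, as the parenthetical remark above describes.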

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
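The expansion rule described above (read each interclass density as an edge probability, then draw independent Bernoulli edges) is short enough to sketch in Python. This is a hypothetical stand-in for blockmodel.expand, not its implementation. The class sizes are free parameters, so the expanded graph need not match the order of the original.

```python
import random


def expand_density_blockmodel(image, sizes, seed=None):
    """Draw one inhomogeneous Bernoulli digraph from a density image.

    image[a][b] is read as the probability of a tie from any vertex of class a to
    any vertex of class b; sizes[a] is the number of vertices to create in class a.
    """
    rng = random.Random(seed)
    classes = [a for a, size in enumerate(sizes) for _ in range(size)]
    n = len(classes)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Independent Bernoulli draw for each off-diagonal cell.
            if i != j and rng.random() < image[classes[i]][classes[j]]:
                adj[i][j] = 1
    return classes, adj


# Usage: a 2-class image expanded to 2 + 3 = 5 vertices.
classes, adj = expand_density_blockmodel([[0.0, 1.0], [0.0, 0.0]], [2, 3], seed=7)
print(classes)  # [0, 0, 1, 1, 1]
```

Calling the function repeatedly with different seeds yields the kind of Monte Carlo sample under an inhomogeneous Bernoulli graph model that the paragraph above describes.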

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition G'' of graphs G and G' on vertex set V is the graph on V such that $(v, v') \in E(G'')$ iff there exists a vertex v'' such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
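The Boolean inner product mentioned in the parenthetical above can be written directly from the definition. The Python sketch below (a hypothetical compose function, not sna's operator) also shows how a loop can appear in the composition even though neither input graph has one.

```python
def compose(g, h):
    """Boolean product: result[v][u] == 1 iff v -> w in g and w -> u in h for some w."""
    n = len(g)
    return [[int(any(g[v][w] and h[w][u] for w in range(n))) for u in range(n)]
            for v in range(n)]


# Usage: composing the tie 0 -> 1 with the reverse tie 1 -> 0 yields a loop at vertex 0.
g = [[0, 1],
     [0, 0]]
h = [[0, 0],
     [1, 0]]
print(compose(g, h))  # [[1, 0], [0, 0]]
```

Composing a graph with itself in the same way yields its two-step relation, the basic building block of a role algebra.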

              Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is the direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$\operatorname{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu^G\right)\left(A^H_{ij} - \mu^H\right)}{|V|\left(|V| - 1\right)}, \tag{3}$$

where $A^G, A^H$ are the respective adjacency matrices of G and H, the sums run over ordered pairs $i \neq j$, and $\mu^X = \left(|V|(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\operatorname{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \operatorname{cov}(G, H) / \sqrt{\operatorname{cov}(G, G)\operatorname{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
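Equation (3) and the graph correlation translate directly into code once the off-diagonal cells are flattened into a vector. The Python sketch below follows the equation's |V|(|V| - 1) denominator (sna's gcov may use a different normalization). Its function names are hypothetical, and the correlation is undefined (division by zero) when either graph is empty or complete, since its graph variance is then 0.

```python
import math


def _edge_vector(adj):
    """Off-diagonal cells of the adjacency matrix, i.e., all ordered pairs i != j."""
    n = len(adj)
    return [adj[i][j] for i in range(n) for j in range(n) if i != j]


def gcov(g, h):
    """Graph covariance as in Eq. (3), dividing by |V|(|V| - 1)."""
    x, y = _edge_vector(g), _edge_vector(h)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def gcor(g, h):
    """Graph correlation: cov(G, H) / sqrt(cov(G, G) * cov(H, H))."""
    return gcov(g, h) / math.sqrt(gcov(g, g) * gcov(h, h))


def hamming(g, h):
    """Number of edge changes needed to turn g into h."""
    return sum(a != b for a, b in zip(_edge_vector(g), _edge_vector(h)))


g = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
h = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
print(gcor(g, h), hamming(g, h))
```

The two graphs share two of their three edges and differ on the cells (0, 2) and (2, 0), giving a Hamming distance of 2 and a graph correlation of 1/3.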

The above situation becomes more complex when there is no unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the "structural" component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
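For very small graphs, the structural correlation can be computed exactly by trying every relabeling of one graph and keeping the best correlation. This brute-force search is what the heuristic methods approximate. The Python sketch below uses hypothetical names and is not sna's gscor: it relabels vertices by permutation and maximizes the graph correlation, at a cost of n! evaluations, which is why exhaustive search quickly becomes prohibitive.

```python
import math
from itertools import permutations


def gcor(g, h):
    """Graph correlation over the off-diagonal cells (ordered pairs i != j)."""
    n = len(g)
    x = [g[i][j] for i in range(n) for j in range(n) if i != j]
    y = [h[i][j] for i in range(n) for j in range(n) if i != j]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relabel(adj, perm):
    """Relabeled graph: result[i][j] = adj[perm[i]][perm[j]]."""
    n = len(adj)
    return [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def structural_correlation(g, h):
    """Maximum correlation between g and any relabeling of h (exhaustive, n! cost)."""
    n = len(h)
    return max(gcor(g, relabel(h, p)) for p in permutations(range(n)))


# Usage: h is a scrambled copy of the directed path 0 -> 1 -> 2 -> 3.
g = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
h = relabel(g, (2, 0, 3, 1))
print(gcor(g, h), structural_correlation(g, h))
```

The labeled correlation between g and its scrambled copy h is -1/3, while the structural correlation recovers the value 1 expected for isomorphic graphs, mirroring the g2/g3 comparison in the example below.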

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to the graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by applying eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
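The central graph has a simple closed form. The summed Hamming distance decomposes cell by cell, so including an edge exactly when more than half of the input graphs contain it minimizes the total. The Python sketch below (hypothetical names, not sna's centralgraph) uses that majority rule; in an exact tie either choice is optimal, and this sketch leaves the edge out.

```python
def hamming(g, h):
    """Number of off-diagonal cells on which g and h disagree."""
    n = len(g)
    return sum(g[i][j] != h[i][j] for i in range(n) for j in range(n) if i != j)


def central_graph(graphs):
    """Edgewise majority graph, which minimizes the summed Hamming distance to `graphs`."""
    m, n = len(graphs), len(graphs[0])
    return [[int(i != j and 2 * sum(g[i][j] for g in graphs) > m) for j in range(n)]
            for i in range(n)]


gs = [[[0, 1, 0], [0, 0, 1], [0, 0, 0]],
      [[0, 1, 1], [0, 0, 1], [0, 0, 0]],
      [[0, 1, 0], [1, 0, 0], [0, 0, 0]]]
c = central_graph(gs)
print(c)  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
```

Here the tie 0 -> 1 appears in all three graphs and 1 -> 2 in two of them, so both enter the central graph; its summed distance to the set (3) is no larger than that of any input graph.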

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available for cases where conditioning on the entire observed structure is inappropriate.
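The logic of a QAP test for graph correlation can be sketched in a few lines. Randomly relabel the vertices of one graph, which preserves its structure but breaks its alignment with the other, and record how often the relabeled correlation reaches the observed one. The Python sketch below reports only the upper tail. Its names are hypothetical, it is not sna's qaptest, and the small tolerance guards against floating-point ties.

```python
import math
import random


def gcor(g, h):
    """Graph correlation over the off-diagonal cells (ordered pairs i != j)."""
    n = len(g)
    x = [g[i][j] for i in range(n) for j in range(n) if i != j]
    y = [h[i][j] for i in range(n) for j in range(n) if i != j]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relabel(adj, perm):
    """Relabeled graph: result[i][j] = adj[perm[i]][perm[j]]."""
    n = len(adj)
    return [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def qap_test(g, h, reps=1000, seed=None):
    """Observed gcor(g, h) and its upper-tail QAP p-value under random relabeling of h."""
    rng = random.Random(seed)
    n = len(h)
    observed = gcor(g, h)
    hits = 0
    for _ in range(reps):
        perm = list(range(n))
        rng.shuffle(perm)
        if gcor(g, relabel(h, perm)) >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    return observed, hits / reps


# Usage: a directed path on 6 vertices compared with itself.
path = [[int(j == i + 1) for j in range(6)] for i in range(6)]
obs, p = qap_test(path, path, reps=500, seed=3)
print(obs, p)
```

A directed path has no nontrivial automorphisms, so only the identity relabeling reproduces the perfect correlation and the estimated p-value is close to 0.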


              Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1
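The difference between gcor and gscor can be made concrete with a brute-force sketch: the structural correlation is the maximum graph correlation over all relabelings of one graph, which is what exhaustive search enumerates. The base-R helpers below (perms, gcor_offdiag, gscor_exhaustive, rand_digraph) are hypothetical illustrations rather than sna code, and they are only practical for very small graphs, since n vertices give n! permutations.

```r
# Hypothetical helpers (not sna code): brute-force structural correlation.
gcor_offdiag <- function(a, b) {
  keep <- row(a) != col(a)                 # ignore loops
  cor(a[keep], b[keep])
}

perms <- function(n) {
  # All permutations of 1:n, one per row (n! rows)
  if (n == 1) return(matrix(1L, 1, 1))
  p <- perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(i)
    t(apply(p, 1, function(r) append(r, n, after = i - 1)))))
}

gscor_exhaustive <- function(a, b) {
  P <- perms(nrow(b))
  # Maximize the graph correlation over every relabeling of b
  max(apply(P, 1, function(p) gcor_offdiag(a, b[p, p])))
}

rand_digraph <- function(n) {
  m <- matrix(rbinom(n * n, 1, 0.5), n, n)
  diag(m) <- 0
  m
}

set.seed(5)
g1 <- rand_digraph(5)
g2 <- rand_digraph(5)
p <- sample.int(5)
g3 <- g2[p, p]                   # relabeled (isomorphic) copy of g2
gcor_offdiag(g2, g3)             # labeled correlation, usually below 1
gscor_exhaustive(g2, g3)         # structural correlation: 1, up to rounding
```

With five vertices the search covers only 120 relabelings; heuristics such as annealing become necessary precisely because this count grows factorially.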

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)


              OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
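Since netlm's coefficients are those of an ordinary least squares fit to the vectorized off-diagonal adjacency entries, the estimates above can be reproduced in base R with lm. The sketch below is a hedged illustration of that equivalence, not sna's implementation: it builds four random digraphs by hand instead of with rgraph and omits the QAP testing step. It also shows where the 375 residual degrees of freedom come from: 20 x 19 = 380 ordered dyads minus 5 coefficients.

```r
# Base-R sketch of OLS on vectorized adjacency matrices (not sna's netlm).
set.seed(1)
n <- 20
# Four random digraphs with tie probability 0.5, as a list of matrices
x <- replicate(4, {
  m <- matrix(rbinom(n * n, 1, 0.5), n, n)
  diag(m) <- 0
  m
}, simplify = FALSE)
y <- x[[1]] + 4 * x[[2]] + 2 * x[[3]]   # exact linear combination, as above

# Keep only off-diagonal cells: n * (n - 1) = 380 dyads for n = 20
offdiag <- row(y) != col(y)
yv <- y[offdiag]
X <- sapply(x, function(m) m[offdiag])  # 380 x 4 design matrix
colnames(X) <- paste0("x", 1:4)

fit <- lm(yv ~ X)
round(coef(fit), 6)   # intercept and x4 essentially 0; x1, x2, x3 = 1, 4, 2
df.residual(fit)      # 380 - 5 = 375
```

The fit is exact up to floating-point error, which is why netlm reports residuals of order 1e-13 and an R-squared of 1.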

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

              Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
  352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
  (Dn-Dr)/(Dn-Dr+dfn): 0.481324
  (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

      0   1
  0   0   0
  1  39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500
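The goodness-of-fit block can be reproduced from the two printed deviances, which is a useful way to read it. In this hedged sketch, m = 380 is the number of ordered dyads (20 x 19), k = 5 is the number of fitted coefficients, and dfn in the pseudo-R^2 formula is assumed to be the null degrees of freedom (380). The printed null deviance also numerically equals 2 m log 2, the deviance of a model that assigns every dyad a tie probability of 0.5.

```r
# Values copied from the netlogit summary above
Dn <- 526.7919   # null deviance
Dr <- 174.1572   # residual deviance
m  <- 380        # ordered dyads, 20 * 19 (assumed equal to dfn)
k  <- 5          # intercept + four graph coefficients

gof <- c(
  null_if_p_half = 2 * m * log(2),          # deviance if every tie prob. = 0.5
  chisq          = Dn - Dr,                 # printed as 352.6347
  AIC            = Dr + 2 * k,              # printed as 184.1572
  BIC            = Dr + k * log(m),         # printed as 203.8580
  pseudo_r2_a    = (Dn - Dr) / (Dn - Dr + m),
  pseudo_r2_b    = (Dn - Dr) / Dn
)
round(gof, 6)
```

The contingency table follows the same logic: every dyad is predicted to be a tie, so 341 of 380 predictions are correct (0.8973684) and no 0s are predicted at all, hence the NaN.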

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties. This is largely a consequence of the high density of the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.
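Because the netlogit point estimates correspond to an ordinary logistic regression on the vectorized adjacency matrices (the network-specific machinery changes how they are tested, not how they are estimated), the estimation step can be sketched with base R's glm. This is a minimal illustration under that assumption, not sna's implementation; it mirrors the simulation above with hand-built random digraphs in place of rgraph. Note that glm's null deviance comes from an intercept-only model, so it need not match the null deviance netlogit prints.

```r
# Base-R sketch of logistic regression on vectorized adjacency matrices.
set.seed(11)
n <- 20
offdiag <- matrix(TRUE, n, n); diag(offdiag) <- FALSE

# Four random digraphs (tie probability 0.5) as predictors
x <- replicate(4, {
  m <- matrix(rbinom(n * n, 1, 0.5), n, n)
  diag(m) <- 0
  m
}, simplify = FALSE)
eta <- x[[1]] + 4 * x[[2]] + 2 * x[[3]]   # true coefficients 1, 4, 2, 0
p <- 1 / (1 + exp(-eta))                   # inverse logit, as with yp above
y <- matrix(rbinom(n * n, 1, p), n, n)     # Bernoulli draw for each dyad

dd <- data.frame(y = y[offdiag], sapply(x, function(m) m[offdiag]))
names(dd) <- c("y", paste0("x", 1:4))
fit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = dd)
round(coef(fit), 3)
c(null_dev = fit$null.deviance, resid_dev = deviance(fit))
```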

2.7 Network inference and process models

A final category of functions supplied by sna comprises those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the bias parameters may be omitted if desired), and estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
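To make the reporting model concrete, the sketch below computes the posterior probability of a single edge given the informants' reports when the error rates are treated as known and the edge prior is Bernoulli. This is a deliberate simplification: bbnam itself samples the error rates and the whole graph jointly with a Gibbs sampler, and edge_posterior is a hypothetical helper, not an sna function. The last call illustrates a negatively informative report from an informant whose error rates sum to more than 1.

```r
# Hypothetical helper (not part of sna): Pr(tie present | reports) for one
# directed pair, with informant error rates treated as known.
edge_posterior <- function(reports, em, ep, prior = 0.5) {
  # reports: 0/1 vector, one entry per informant
  # em: false negative rates; ep: false positive rates (same length)
  like1 <- prod(ifelse(reports == 1, 1 - em, em))   # Pr(reports | tie)
  like0 <- prod(ifelse(reports == 1, ep, 1 - ep))   # Pr(reports | no tie)
  prior * like1 / (prior * like1 + (1 - prior) * like0)
}

k <- 20
em <- rep(0.375, k)   # set to the mean rates used in the bbnam example below
ep <- rep(0.038, k)
edge_posterior(c(rep(1, 8), rep(0, 12)), em, ep)  # 8 of 20 report the tie
edge_posterior(rep(0, k), em, ep)                 # nobody reports it

# A "perverse" informant (em + ep > 1): a positive report lowers the posterior
edge_posterior(1, em = 0.9, ep = 0.6)   # 0.1 / (0.1 + 0.6), about 0.143
```

Even with a false negative rate of 0.375, a handful of positive reports is decisive because false positives are so rare; this asymmetry is one reason the network itself can be easier to recover than the error rates.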

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
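Because the baseline and bias events are independent given the trace, the nomination probability is one minus the probability that none of them fires. The helper below evaluates that expression for a single directed pair. biased_net_prob and the numerical values are hypothetical, chosen only to show the arithmetic; this is not how bn estimates the parameters.

```r
# Hypothetical helper: Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
biased_net_prob <- function(p_base, p_bias = numeric(0), s = numeric(0)) {
  # p_base: baseline probability Pr(B_e)
  # p_bias: probability Pr(B_k) of each bias event type
  # s:      s_k(i, j, T), opportunities for bias event k given the trace
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

biased_net_prob(0.05)                        # no bias events: just 0.05
biased_net_prob(0.05, c(0.3, 0.1), c(1, 2))  # 1 - 0.95 * 0.7 * 0.9^2 = 0.46135
biased_net_prob(0.05, 1, 1)                  # a certain bias event forces the tie: 1
```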

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990; see Cliff and Ord 1973 and Anselin 1988 for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

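$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$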

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the AR term's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
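For reference, the textbook form of Moran's I for a vertex attribute $y$ and weight matrix $W$ is $I = \frac{n}{S_0} \cdot \frac{z^\top W z}{z^\top z}$, with $z = y - \bar{y}$ and $S_0$ the sum of all weights. The sketch below computes that form directly in base R on a 4-cycle; morans_i is a hypothetical helper, and nacf's own normalization and neighborhood options may differ from this bare version.

```r
# Hypothetical helper: textbook Moran's I for a vertex attribute y.
morans_i <- function(y, W) {
  # y: numeric vector of vertex attributes; W: n x n weight (adjacency) matrix
  z <- y - mean(y)
  n <- length(y)
  (n / sum(W)) * drop(t(z) %*% W %*% z) / sum(z^2)
}

# 4-cycle 1-2-3-4-1, stored as a symmetric adjacency matrix
W <- matrix(0, 4, 4)
for (e in list(c(1, 2), c(2, 3), c(3, 4), c(4, 1))) W[e[1], e[2]] <- W[e[2], e[1]] <- 1

morans_i(c(1, -1, 1, -1), W)   # -1: every tie joins opposite values
morans_i(c(1, 1, -1, -1), W)   # 0: half the ties join like values
```

Strongly positive values of such an index for a candidate neighborhood matrix suggest it as a weight matrix worth trying in lnam; strongly negative values indicate alternating (disassortative) patterns.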


              Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary method for the returned bbnam object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

              Marginal Posterior Network Distribution

              a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

              Journal of Statistical Software 41

              a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

              a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

              e^-       e^+
Min     0.1443951 0.0004238
1stQ    0.3126975 0.0167584
Median  0.3678306 0.0294646
Mean    0.3783663 0.0493688
3rdQ    0.4423027 0.0574099
Max     0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1   0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2   0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3   0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4   0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5   0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6   0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7   0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8   0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9   0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10  0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11  0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12  0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13  0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14  0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15  0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16  0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17  0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18  0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19  0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20  0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1   0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2   0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3   0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4   0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5   0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6   0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7   0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8   0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9   0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10  0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11  0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12  0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13  0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14  0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15  0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16  0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17  0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18  0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19  0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20  0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

  Replicate Chains: 5
  Burn Time: 300
  Draws per Chain: 20 Total Draws: 100
  Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
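The advice about the "perverse" region can be checked for the priors used in this example. For independent error rates em, ep ~ Beta(a, b), Pr(em + ep > 1) is the integral of f(x) Pr(ep > 1 - x) over x, which base R can evaluate numerically. perverse_mass is a hypothetical helper; the Beta(1, 1) case is included only as a contrast, since two independent uniform rates sum to more than 1 with probability 1/2.

```r
# Hypothetical helper: prior probability that em + ep > 1 when both are
# independent Beta(a, b) draws.
perverse_mass <- function(a, b) {
  integrand <- function(x) dbeta(x, a, b) * pbeta(1 - x, a, b, lower.tail = FALSE)
  integrate(integrand, lower = 0, upper = 1)$value
}

perverse_mass(2, 11)   # the Beta(2, 11) priors used with bbnam above
perverse_mass(1, 1)    # uniform priors, for contrast: 0.5

# Rough Monte Carlo cross-check of the Beta(2, 11) value
set.seed(3)
mean(rbeta(1e6, 2, 11) + rbeta(1e6, 2, 11) > 1)
```

The Beta(2, 11) priors place only a tiny fraction of their mass on the perverse region, which is consistent with the well-behaved posterior summarized above.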

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
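The simplest classical estimators are easy to state directly. The sketch below implements a majority-rule central graph and the LAS union and intersection rules for an informant-by-sender-by-receiver array, assuming (as in the example above) that informant k is also vertex k, so that the sender's and receiver's reports of each tie can be located. central_graph and las are hypothetical helpers, not sna's consensus, and counting an exact half-way split as a tie in central_graph is an assumption of this sketch.

```r
# Hypothetical helpers (not sna code). dat[k, i, j]: informant k's report of i -> j
central_graph <- function(dat) {
  1 * (apply(dat, c(2, 3), mean) >= 0.5)   # majority of all informants
}

las <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  est <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    ego   <- dat[i, i, j]   # sender i's report of i -> j
    alter <- dat[j, i, j]   # receiver j's report of i -> j
    est[i, j] <- if (rule == "intersection") min(ego, alter) else max(ego, alter)
  }
  est
}

# Simulated reports following the same error scheme as the bbnam example
set.seed(7)
n <- 20
g <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(g) <- 0
em <- rbeta(n, 15, 25)
ep <- rbeta(n, 1, 25)
dat <- array(0, dim = c(n, n, n))
for (k in 1:n) {
  tp <- g * (1 - em[k]) + (1 - g) * ep[k]
  dat[k, , ] <- matrix(rbinom(n * n, 1, tp), n, n)
}

off <- row(g) != col(g)
acc <- function(est) mean(est[off] == g[off])   # accuracy over off-diagonal dyads
c(LAS_intersection = acc(las(dat, "intersection")),
  LAS_union        = acc(las(dat, "union")),
  central_graph    = acc(central_graph(dat)))
```

Because a tie survives the intersection rule only if both parties report it, every false negative from either party is passed through; with false negative rates near 0.375 this is exactly the weakness noted above.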

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4
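The information criteria in the goodness-of-fit block can be checked by hand from the printed log likelihoods, using AIC = -2 logL + 2k and BIC = -2 logL + k log(n). The parameter counts in this sketch are assumptions inferred from the printout rather than taken from lnam's documentation: k = 8 for the fitted model (five regression coefficients, rho1.1, rho2.1, and Sigma) and k = 2 for the mean/standard deviation null model, with n = 50 vertices. Small discrepancies come from the rounding of the printed log likelihoods.

```r
# Log likelihoods copied from the lnam summary; parameter counts are assumed
n <- 50
ll_model <- 58.47;  k_model <- 8   # 5 betas + rho1.1 + rho2.1 + Sigma
ll_null  <- -82.48; k_null  <- 2   # mean + standard deviation

aic <- function(ll, k) -2 * ll + 2 * k
bic <- function(ll, k, n) -2 * ll + k * log(n)

c(model = aic(ll_model, k_model), null = aic(ll_null, k_null))
c(model = bic(ll_model, k_model, n), null = bic(ll_null, k_null, n))
aic(ll_null, k_null) - aic(ll_model, k_model)   # approx. 269.9, as printed
```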

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot", which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

              3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines which covers many of the methods currently seeing wide use within the field. It is hoped that the inclusion of such tools, together with the other packages of the statnet ensemble, within a freely available and widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

              Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.]

              References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks. II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software, Version 2.067. URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions in Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2. URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project <http://statnetproject.org/>, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2", pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future", pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-Directed Placement." Software: Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project <http://statnetproject.org/>, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory", pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

              Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software (http://www.jstatsoft.org/), published by the American Statistical Association (http://www.amstat.org/)

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.

• Introduction and overview
  • Package history
  • sna and statnet
  • Functionality
  • Terminology and data representation
    • Importing relational data into R
• Package highlights
  • Random graph generation
    • Example
  • Visualization and data manipulation
    • Neighborhood and ego net functions
    • Visualization
  • Descriptive indices
    • Node-level indices
    • Graph-level indices
  • Connectivity and subgraph statistics
    • Example
  • Position and role analysis
    • Example
  • Exploratory edge set comparison
    • Example
  • Network inference and process models
    • Example
• Closing comments

or to draw from a multinomial graph model of independent dyads with fixed expected counts. The former case can be used to generate graphs of particular types. For instance, the trivial cases of complete, complete tournament, and null graphs can be generated by placing all dyads within the appropriate isomorphism class:

R> k10 <- rguman(1, 10, mut = 45, asym = 0, null = 0, method = "exact")
R> t10 <- rguman(1, 10, mut = 0, asym = 45, null = 0, method = "exact")
R> n10 <- rguman(1, 10, mut = 0, asym = 0, null = 45, method = "exact")
R> k10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    1    1    1    1    1    1    1     1
 [2,]    1    0    1    1    1    1    1    1    1     1
 [3,]    1    1    0    1    1    1    1    1    1     1
 [4,]    1    1    1    0    1    1    1    1    1     1
 [5,]    1    1    1    1    0    1    1    1    1     1
 [6,]    1    1    1    1    1    0    1    1    1     1
 [7,]    1    1    1    1    1    1    0    1    1     1
 [8,]    1    1    1    1    1    1    1    0    1     1
 [9,]    1    1    1    1    1    1    1    1    0     1
[10,]    1    1    1    1    1    1    1    1    1     0

R> t10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    1    0    0     0
 [2,]    1    0    1    0    1    1    0    0    0     1
 [3,]    1    0    0    1    1    0    0    1    0     0
 [4,]    1    1    0    0    0    1    0    1    0     1
 [5,]    1    0    0    1    0    1    1    1    1     0
 [6,]    1    0    1    0    0    0    1    1    1     0
 [7,]    0    1    1    1    0    0    0    1    1     0
 [8,]    1    1    0    0    0    0    0    0    1     1
 [9,]    1    1    1    1    0    0    0    0    0     0
[10,]    1    0    1    0    1    1    1    0    1     0

R> n10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
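To make the “exact” placements concrete, here is a small base-R sketch (not rguman’s implementation) that builds the three trivial cases for 10 vertices by hand: all 45 dyads mutual, all 45 asymmetric with a randomly chosen orientation, or all 45 null. The helper `dyad_counts` is a hypothetical name introduced only for this illustration; it tallies mutual, asymmetric, and null dyads so each placement can be checked.

```r
# Sketch (base R, not sna): complete, tournament, and null digraphs on n vertices.
n <- 10
u <- upper.tri(matrix(0, n, n))   # TRUE once per unordered dyad {i, j}, i < j

# Complete graph: every dyad is mutual.
k10 <- matrix(1, n, n)
diag(k10) <- 0

# Complete tournament: every dyad is asymmetric, with a random orientation.
set.seed(1)
up <- matrix(0, n, n)
up[u] <- as.numeric(runif(sum(u)) < 0.5)  # 1 means the arc runs i -> j
t10 <- up + t(u * (1 - up))               # otherwise the arc runs j -> i

# Null graph: every dyad is null.
n10 <- matrix(0, n, n)

# Hypothetical helper: counts of mutual, asymmetric, and null dyads.
dyad_counts <- function(g) {
  ut <- upper.tri(g)
  a <- g[ut]      # tie i -> j for i < j
  b <- t(g)[ut]   # tie j -> i for the same dyads
  c(mut = sum(a == 1 & b == 1), asym = sum(a != b), null = sum(a == 0 & b == 0))
}

rbind(k10 = dyad_counts(k10), t10 = dyad_counts(t10), n10 = dyad_counts(n10))
```

Each row of the final table should place all 45 dyads in a single class, which is exactly what “exact” mode is described as doing.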

When not in “exact” mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)
R> mean(g[upper.tri(g)] & t(g)[upper.tri(g)])
[1] 0.1482828
R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])
[1] 0.04646465
R> mean((!g)[upper.tri(g)] & t(!g)[upper.tri(g)])
[1] 0.8052525
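The non-exact mode can likewise be mimicked in base R. The sketch below uses a hypothetical `sample_dyads` function (not rguman itself) that draws each dyad’s type independently with the given probabilities, orients asymmetric dyads by a fair coin, and then recomputes the three proportions used above. With 4950 dyads, they should land close to 0.15, 0.05, and 0.80, as in the rguman output.

```r
# Sketch: independent multinomial dyad types (mutual / asymmetric / null).
sample_dyads <- function(n, mut, asym, null) {
  g <- matrix(0, n, n)
  dyads <- which(upper.tri(g), arr.ind = TRUE)   # one row per pair i < j
  type <- sample(c("mut", "asym", "null"), nrow(dyads), replace = TRUE,
                 prob = c(mut, asym, null))
  for (k in seq_len(nrow(dyads))) {
    i <- dyads[k, 1]
    j <- dyads[k, 2]
    if (type[k] == "mut") {
      g[i, j] <- 1
      g[j, i] <- 1
    } else if (type[k] == "asym") {
      if (runif(1) < 0.5) g[i, j] <- 1 else g[j, i] <- 1  # random direction
    }
    # null dyads are left empty
  }
  g
}

set.seed(2008)
g <- sample_dyads(100, mut = 0.15, asym = 0.05, null = 0.8)
ut <- upper.tri(g)
props <- c(mut  = mean(g[ut] & t(g)[ut]),
           asym = mean(g[ut] != t(g)[ut]),
           null = mean(!g[ut] & !t(g)[ut]))
round(props, 3)
```

Because every dyad falls into exactly one of the three classes, the three proportions always sum to one.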

By contrast with the expectation under the above model, a Bernoulli graph with the same expected density would have a mean mutuality rate of approximately 0.03 (with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4). Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
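The Bernoulli figures quoted above follow from a short calculation, shown here in base R. Under the dyad model, a mutual dyad contributes both of its two possible arcs and an asymmetric dyad one, so the expected density is (2 × 0.15 + 0.05)/2 = 0.175. In a Bernoulli graph with that density the two arcs of a dyad are independent, so a dyad is mutual with probability p^2 and asymmetric with probability 2p(1 - p).

```r
# Dyad-type probabilities from the rguman example above
mut  <- 0.15
asym <- 0.05

# Expected density: mutual dyads supply 2 of 2 arcs, asymmetric dyads 1 of 2
p <- (2 * mut + asym) / 2

# Bernoulli graph with the same density: each arc present independently with prob. p
bern_mut  <- p^2              # both arcs of a dyad present
bern_asym <- 2 * p * (1 - p)  # exactly one arc present

c(density = p, mutual = bern_mut, asymmetric = bern_asym,
  asym_to_mut_ratio = bern_asym / bern_mut)
```

The mutual rate of about 0.031 and the ratio of about 9.43 correspond to the “approximately 0.03” and “approximately 9.4” in the text, compared with a mutual rate of 0.15 under the dyad model.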

More extensive departures from independence require alternatives to the simple independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. As we will see, these processes are specified in terms of the conditional probability of an edge given other edges within the graph; this immediately suggests the use of a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process. Such a sampler is implemented via the rgbn function, which uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. Thinning and burn-in parameters may be specified by the user, along with model parameters (which by default correspond to the uniform random digraph model). Parameters may be adjusted to produce “parent” or reciprocity biases (π), “sibling” or shared partner biases (σ), and “double role” biases or parent/sibling interaction effects (ρ), as well as baseline density effects (d); parameters vary from 0 to 1, with 0 indicating no bias. The command to draw a sample of 5 order-10 networks with both reciprocity and triangle formation biases will then look something like the following,

R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
+   d = 0.15))


with the magnitude of the specified effects depending on the exact choice of parameters.

Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) “rewiring” process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

R> g <- matrix(0, 10, 10)
R> g[1,] <- 1
R> g2 <- rewire.ws(g, 0.5)[1,,]
R> g2
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    1    1    1    1    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     1
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    0    1    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    1    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    1     0
R> sum(g - g2) == 0
[1] TRUE
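A rough base-R analogue of a single rewiring pass is sketched below; `rewire_sketch` is a hypothetical helper, not rewire.ws, and it simplifies the process. It assumes no self-loops and, for each existing arc (i, j) selected with probability p, keeps the receiving endpoint j and moves the arc to a randomly chosen empty cell (k, j) with k different from i and j, so the new dyad shares exactly one endpoint with the old one. Because every move deletes one arc and fills one previously empty cell, the edge count cannot change.

```r
# Sketch: edge-preserving rewiring that keeps each arc's receiving endpoint.
rewire_sketch <- function(g, p) {
  n <- nrow(g)
  arcs <- which(g != 0, arr.ind = TRUE)   # arcs present at the start
  for (e in seq_len(nrow(arcs))) {
    if (runif(1) >= p) next               # rewire each arc with probability p
    i <- arcs[e, 1]
    j <- arcs[e, 2]
    # Candidate new senders: empty cells (k, j), excluding the loop k == j.
    # k == i is excluded automatically because g[i, j] is still 1 here.
    cand <- which(g[, j] == 0 & seq_len(n) != j)
    if (length(cand) == 0) next
    k <- cand[sample.int(length(cand), 1)]
    g[i, j] <- 0
    g[k, j] <- 1
  }
  g
}

# Directed ring on 10 vertices: i -> i + 1 (mod 10)
n <- 10
g <- matrix(0, n, n)
g[cbind(1:n, c(2:n, 1))] <- 1

set.seed(5)
g2 <- rewire_sketch(g, 0.5)
sum(g2) == sum(g)   # edge count is conserved
```

Keeping the receiver fixed also preserves every vertex’s indegree; a version that sometimes kept the sender instead would conserve edges just as well.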

Another example of an edge-preserving random transformation is the random permutation of vertex order; rmperm can be employed for this purpose, as, for example, in the following permutation of the graph g2 above:

R> g3 <- rmperm(g2)
R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))
[1] TRUE

Row/column permutation preserves the “unlabeled” structure of the input graph (i.e., it draws from the graph’s isomorphism class) and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
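In base R, the same relabeling amounts to indexing rows and columns by a single permutation. The short sketch below uses a hypothetical helper `permute_graph` (not rmperm) to check the invariants just described: the edge count and the sorted in- and out-degree sequences survive, and applying the inverse permutation recovers the original labeling.

```r
# Relabel vertices by applying one permutation to both rows and columns.
permute_graph <- function(g, perm) g[perm, perm]

set.seed(3)
g <- matrix(rbinom(64, 1, 0.3), 8, 8)
diag(g) <- 0
perm <- sample(8)
g3 <- permute_graph(g, perm)

# Edge count and sorted in/out-degree sequences are unchanged...
stopifnot(sum(g3) == sum(g),
          all(sort(colSums(g3)) == sort(colSums(g))),
          all(sort(rowSums(g3)) == sort(rowSums(g))))

# ...and the inverse permutation, order(perm), restores the original labels.
stopifnot(all(permute_graph(g3, order(perm)) == g))
```

Permuting rows alone (g[perm, ]) would not be a relabeling: it would generally change the structure, which is why the same permutation must be applied to both margins.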

2.2. Visualization and data manipulation

Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic: for instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R’s diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present, versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible and can be used with any graph-level index function. Its use is straightforward, i.e.:

R> g <- rgraph(5)
R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)
[1] 0.07291667

Unfortunately, the drawback to the flexibility of this routine is its inefficiency: eval.edgeperturbation cannot take advantage of any special properties of the change score being calculated, and hence is inefficient for properties such as triad counts, whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple exploratory applications, but it does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
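The efficiency point can be seen in a base-R sketch. `change_score` below is a hypothetical generic helper in the spirit of eval.edgeperturbation (not its code): it recomputes the full statistic twice. For the number of mutual dyads, however, the change from forcing arc (i, j) present versus absent is simply g[j, i], which a specialized routine could return directly without touching the rest of the graph.

```r
# Generic (slow) change score: statistic with arc (i, j) forced present
# minus the statistic with it forced absent; all other arcs held fixed.
change_score <- function(g, i, j, stat) {
  g_on <- g
  g_on[i, j] <- 1
  g_off <- g
  g_off[i, j] <- 0
  stat(g_on) - stat(g_off)
}

# Two simple graph statistics on a 0/1 adjacency matrix
n_arcs   <- function(g) sum(g[row(g) != col(g)])
n_mutual <- function(g) {
  ut <- upper.tri(g)
  sum(g[ut] & t(g)[ut])
}

set.seed(11)
g <- matrix(rbinom(36, 1, 0.4), 6, 6)
diag(g) <- 0

# The generic change score for mutuality equals the local shortcut g[j, i].
agree <- TRUE
for (i in 1:6) for (j in 1:6) {
  if (i != j) agree <- agree && change_score(g, i, j, n_mutual) == g[j, i]
}
agree
```

The generic version costs two full evaluations of the statistic per arc, while the shortcut is a single lookup; the gap widens for costlier statistics such as triad counts.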

Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph’s adjacency matrix is equivalent to altering the “identities” of its vertices, while leaving the underlying “unlabeled” structure unchanged.) Although not graph manipulation functions per se, these routines are important for generating restricted permutations for use in QAP tests (Hubert 1987) and for comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
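Both ideas are easy to sketch in base R, with the caveat that the helpers below are hypothetical and are not sna’s code. `restricted_perm` shuffles vertices only within their (user-supplied) classes, so every vertex is mapped to a member of its own class; `nth_perm` returns the nth permutation of 1:k in lexical order using the factorial number system. The sketch counts n from 1; numperm’s own indexing convention is not assumed here.

```r
# Uniform random permutation that only exchanges vertices within the same class.
restricted_perm <- function(classes) {
  perm <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    perm[idx] <- idx[sample.int(length(idx))]  # shuffle within this class
  }
  perm
}

# nth permutation of 1:k in lexical order (n = 1 is the identity).
nth_perm <- function(k, n) {
  stopifnot(n >= 1, n <= factorial(k))
  r <- n - 1                   # zero-based rank
  pool <- seq_len(k)           # values not yet placed
  out <- integer(0)
  for (pos in k:1) {
    f <- factorial(pos - 1)    # permutations per choice of the next value
    d <- r %/% f               # index of the next value within the pool
    out <- c(out, pool[d + 1])
    pool <- pool[-(d + 1)]
    r <- r %% f
  }
  out
}

classes <- c("a", "a", "b", "b", "b", "c")
set.seed(9)
p <- restricted_perm(classes)
all(classes[p] == classes)                  # every vertex stays within its class
t(sapply(1:6, function(n) nth_perm(3, n)))  # all 3! permutations, in lexical order
```

Looping n over 1:factorial(k) enumerates every permutation exactly once, which is the exhaustive-search use case mentioned above.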

In addition to the above, two families of graph manipulation functions bear discussing in more detail: functions to compute properties of neighborhoods, and functions for graph visualization. Here, we briefly discuss each family in turn, before proceeding to a review of sna’s descriptive index routines.

                Neighborhood and ego net functions

The egocentric network (or “ego net”) of vertex $v$ in graph $G$ is defined as $G[v \cup N(v)]$ (i.e., the subgraph of $G$ induced by $v$ and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.
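The definition above translates directly into base R. The hypothetical `ego_net` helper below (not ego.extract itself) collects v’s out-, in-, or combined neighbors and returns the induced submatrix, placing ego in the first row and column.

```r
# Sketch: extract the ego net of v, i.e., the subgraph induced by v and its neighbors.
ego_net <- function(g, v, neighborhood = c("combined", "in", "out")) {
  neighborhood <- match.arg(neighborhood)
  out_nb <- which(g[v, ] != 0)   # arcs v -> j
  in_nb  <- which(g[, v] != 0)   # arcs j -> v
  alters <- switch(neighborhood,
                   out = out_nb,
                   `in` = in_nb,
                   combined = union(out_nb, in_nb))
  keep <- c(v, setdiff(alters, v))   # ego first; a self-loop is not an alter
  g[keep, keep, drop = FALSE]
}

# Small example: 1 -> 2, 3 -> 1, 2 -> 4
g <- matrix(0, 4, 4)
g[1, 2] <- 1
g[3, 1] <- 1
g[2, 4] <- 1
ego_net(g, 1)          # ego 1 with alters 2 and 3
ego_net(g, 1, "out")   # ego 1 with alter 2 only
```

Without self-loops, an in-neighborhood ego net has indegree + 1 rows, and an out-neighborhood ego net has outdegree + 1 rows.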

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors’ covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex’s 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.
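A stripped-down base-R version of this idea is sketched below. `neighbor_apply` is hypothetical and handles only first-order neighborhoods (higher-order neighborhoods are not reproduced). Following R’s row/column convention, margin 1 uses out-neighbors (rows), margin 2 in-neighbors (columns), and c(1, 2) either.

```r
# Sketch: apply FUN to the covariate values of each vertex's first-order neighbors.
neighbor_apply <- function(g, margin, stats, FUN) {
  sapply(seq_len(nrow(g)), function(v) {
    nb <- integer(0)
    if (1 %in% margin) nb <- union(nb, which(g[v, ] != 0))  # out-neighbors
    if (2 %in% margin) nb <- union(nb, which(g[, v] != 0))  # in-neighbors
    nb <- setdiff(nb, v)                                    # ignore self-loops
    FUN(stats[nb])
  })
}

set.seed(6)
g <- matrix(rbinom(36, 1, 0.3), 6, 6)
diag(g) <- 0
neighbor_apply(g, 1, rep(1, 6), sum)   # out-degrees: summing a vector of ones
neighbor_apply(g, c(1, 2), 1:6, mean)  # mean label of each vertex's neighbors
```

Summing a vector of ones over a neighborhood counts its members, which is why degree is the trivial special case; a vertex with no neighbors yields NaN under mean.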

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose $(i, j)$ cell is an indicator for the membership of vertex $j$ in vertex $i$’s selected neighborhood. Specifically, the adjacency matrix associated with the 0th-order neighborhood is defined as the identity matrix; for orders $k > 0$, it depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation $R$ be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i, j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
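The definitions above can be sketched in base R: choose the base relation R according to direction, compute all geodesic distances in R by breadth-first search, and then threshold the distances. `geodist_bfs` and `neighborhood_sketch` are hypothetical helpers, not sna functions. One assumption is labeled in the code: cumulative neighborhoods here exclude the distance-0 (self) pairs, and whether sna includes them is not assumed.

```r
# All-pairs geodesic distances by breadth-first search (Inf if unreachable).
geodist_bfs <- function(r) {
  n <- nrow(r)
  d <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    d[s, s] <- 0
    frontier <- s
    k <- 0
    while (length(frontier) > 0) {
      k <- k + 1
      # Unvisited vertices reachable in one step from the current frontier
      nxt <- which(colSums(r[frontier, , drop = FALSE]) > 0 & is.infinite(d[s, ]))
      d[s, nxt] <- k
      frontier <- nxt
    }
  }
  d
}

# Partial (distance == k) or cumulative (1 <= distance <= k) neighborhood of order k.
# Assumption: self pairs are excluded from cumulative neighborhoods for k > 0.
neighborhood_sketch <- function(g, k, type = c("out", "in", "total"), partial = TRUE) {
  type <- match.arg(type)
  if (k == 0) return(diag(nrow(g)))       # order 0: identity matrix
  r <- switch(type,
              out = g,                     # R = G
              `in` = t(g),                 # R = transpose of G
              total = ((g + t(g)) > 0) * 1)  # R = underlying graph of G
  d <- geodist_bfs(r)
  nb <- if (partial) d == k else d >= 1 & d <= k
  nb * 1
}

# Directed path 1 -> 2 -> 3 -> 4
g <- matrix(0, 4, 4)
g[1, 2] <- 1
g[2, 3] <- 1
g[3, 4] <- 1
neighborhood_sketch(g, 2, "out")                   # pairs at distance exactly 2
neighborhood_sketch(g, 2, "out", partial = FALSE)  # pairs at distance 1 or 2
```

Breadth-first search is used because geodesic distance in an unweighted graph is just the number of BFS layers; any all-pairs shortest-path routine would serve equally well.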

To illustrate sna’s egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below:

R> g <- rgraph(10, tp = 1.5/9)
R> gin <- ego.extract(g, neighborhood = "in")
R> gout <- ego.extract(g, neighborhood = "out")
R> gcomb <- ego.extract(g, neighborhood = "combined")
R> gcomb[1:3]

$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(gin, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(gout, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(gcomb, NROW) <= degree(g) + 1)
[1] TRUE
R> egosize <- sapply(gcomb, NROW)
R> if(any(egosize > 2))
+   sapply(gcomb[egosize > 2], function(x) gden(x[-1, -1]))
         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego’s contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.
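The alter-only density can be reproduced by hand. The hypothetical `alter_density` helper below drops row and column 1 (ego) and divides the number of arcs among the remaining m alters by the m(m - 1) ordered pairs of a digraph. Applied to the ego nets of vertices 1 and 3 printed above, it gives 0 and 1/6, matching the first and third values in the gden output.

```r
# Sketch: density among alters only, assuming ego occupies row/column 1.
alter_density <- function(ego_mat) {
  a <- ego_mat[-1, -1, drop = FALSE]        # drop ego
  m <- nrow(a)
  if (m < 2) return(NA_real_)               # undefined with fewer than two alters
  sum(a[row(a) != col(a)]) / (m * (m - 1))  # directed: m(m - 1) ordered pairs
}

# Ego nets of vertices 1 and 3 as printed above (ego in row/column 1)
ego1 <- matrix(c(0, 1, 1, 0,
                 1, 0, 0, 0,
                 0, 0, 0, 0,
                 1, 0, 0, 0), 4, 4, byrow = TRUE)
ego3 <- matrix(c(0, 1, 1, 0,
                 0, 0, 0, 0,
                 0, 0, 0, 0,
                 1, 1, 0, 0), 4, 4, byrow = TRUE)
c(alter_density(ego1), alter_density(ego3))
```

In ego3, the only arc among alters runs from the fourth row to the second column, giving one arc out of six ordered pairs.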

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well:

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+   cmode = "freeman")/2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0
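The distance-2 output above has a simple reading. It matches what one obtains if every vertex reaches all five others within two steps (an inference from the printed values, not something the example states), in which case vertex v’s result is just the mean of 1:6 with v removed. The base-R check below computes those means directly.

```r
# If vertex v's distance-2 neighborhood is every other vertex, then
# gapply(..., 1:6, mean, distance = 2) reduces to the mean of 1:6 without v.
vals <- 1:6
dist2_means <- sapply(vals, function(v) mean(vals[-v]))
dist2_means
```

For example, vertex 1 gives (2 + 3 + 4 + 5 + 6)/5 = 4.0, and vertex 6 gives (1 + 2 + 3 + 4 + 5)/5 = 3.0.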

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i,,], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+   partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i,,], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods “fill out” the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

Figure 1: Sample partial neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th-order partial neighborhood of $v$. (Panels: Partial Neighborhood of Order 1 through Order 9.)

Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary “workhorse” routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For “target diagrams,” in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.
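The simplest of the layouts listed above, circular placement, fits in a few lines of base R. `circle_layout` is a hypothetical helper (not one of sna’s gplot.layout functions) that spaces n vertices evenly on the unit circle and returns an n × 2 coordinate matrix, the kind of object one could then supply to gplot through its coord argument.

```r
# Sketch: place n vertices evenly on the unit circle, starting at angle 0.
circle_layout <- function(n) {
  theta <- 2 * pi * (seq_len(n) - 1) / n
  cbind(x = cos(theta), y = sin(theta))
}

xy <- circle_layout(5)
xy
```

Every vertex lies at distance 1 from the origin, and consecutive vertices are equally spaced, so no vertex is visually privileged; this is one reason circular layouts are a common neutral baseline.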

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R’s optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to those of gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Figure 2: Sample cumulative neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th-order cumulative neighborhood of $v$. (Panels: Cumulative Neighborhood of Order 1 through Order 9.)

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below), or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurve = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options. (Panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options.)

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can, in and of itself, be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+   xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10)/10 * 2 * pi), sin((1:10)/10 * 2 * pi),
+   col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+   xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+   offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
+   arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R’s built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions. (Panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example.)

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will, however, see an important counterexample below.

                Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5). All of them, however, intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph-theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_d^+(v,G) \equiv |N^+(v)|$), indegree ($c_d^-(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_d^t(v,G) \equiv c_d^+(v,G) + c_d^-(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, and $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$. The ratio $g'(v',v,v'',G)/g(v',v'',G)$ is taken to be 0 wherever $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953). It is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

where $n = |V|$ and $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs that are not strongly connected unless distances between disconnected vertices are taken to be infinite. In that case, $c_c(v,G) = 0$ for any $v$ lacking a path to at least one other vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Because of this fragility, closeness is deployed less often than Freeman's other two measures.
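To make the three Freeman definitions concrete, the following is a minimal base-R sketch that computes them from a 0/1 adjacency matrix without using sna. It assumes that `A[i, j] = 1` encodes an edge i -> j and that there are no loops. The helper names (`indeg`, `bfs_geodesics`, `closeness_c`, `betweenness_c`, and so on) are hypothetical. The betweenness routine uses Brandes' dependency accumulation over ordered pairs. That algorithm is our choice for illustration and is not a description of sna's internals.

```r
# Hypothetical helpers (not part of sna). A[i, j] = 1 encodes an edge i -> j.
indeg  <- function(A) colSums(A)              # c_d^-(v, G)
outdeg <- function(A) rowSums(A)              # c_d^+(v, G)
totdeg <- function(A) colSums(A) + rowSums(A) # c_d^t(v, G), "Freeman" degree

# Breadth-first search from s: geodesic distances, geodesic counts (sigma),
# geodesic predecessors, and the order in which vertices were reached.
bfs_geodesics <- function(A, s) {
  n <- nrow(A)
  dist <- rep(Inf, n); dist[s] <- 0
  sigma <- rep(0, n); sigma[s] <- 1
  pred <- vector("list", n)
  visited <- integer(0)
  queue <- s
  while (length(queue) > 0) {
    v <- queue[1]; queue <- queue[-1]
    visited <- c(visited, v)
    for (w in which(A[v, ] > 0)) {
      if (is.infinite(dist[w])) {            # first time w is reached
        dist[w] <- dist[v] + 1
        queue <- c(queue, w)
      }
      if (dist[w] == dist[v] + 1) {          # v lies just before w on a geodesic
        sigma[w] <- sigma[w] + sigma[v]
        pred[[w]] <- c(pred[[w]], v)
      }
    }
  }
  list(dist = dist, sigma = sigma, pred = pred, visited = visited)
}

# Closeness (n - 1) / sum_v' d(v, v'); any infinite distance yields 0.
closeness_c <- function(A) {
  n <- nrow(A)
  sapply(seq_len(n), function(v) (n - 1) / sum(bfs_geodesics(A, v)$dist))
}

# Betweenness via Brandes-style back-propagation of pair dependencies,
# summed over ordered pairs (v', v''); halve it for a symmetric (undirected) A.
betweenness_c <- function(A) {
  n <- nrow(A)
  cb <- rep(0, n)
  for (s in seq_len(n)) {
    g <- bfs_geodesics(A, s)
    delta <- rep(0, n)
    for (w in rev(g$visited)) {              # farthest vertices first
      for (v in g$pred[[w]]) {
        delta[v] <- delta[v] + g$sigma[v] / g$sigma[w] * (1 + delta[w])
      }
      if (w != s) cb[w] <- cb[w] + delta[w]
    }
  }
  cb
}

# Usage: the directed path 1 -> 2 -> 3
P <- matrix(0, 3, 3); P[1, 2] <- 1; P[2, 3] <- 1
betweenness_c(P)   # vertex 2 lies on the only 1 -> 3 geodesic
closeness_c(P)     # only vertex 1 reaches every other vertex
```

On the directed path, vertices 2 and 3 cannot reach vertex 1, so their closeness is 0. This is the fragility described above.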

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$, where $A$ is the graph adjacency matrix. It can be interpreted in several ways:

- as a measure of "coreness" (or membership in the largest dense cluster);
- as "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes);
- as the ability of $v$ to reach other vertices through a multiplicity of short walks.

Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$ (with $\mathbf{1}$ a vector of ones), where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and approaches degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats. As in the positive case, the magnitude of the parameter reflects the degree of weight afforded distant edges.

The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. By convention, the scaling parameter $\alpha$ is set so as to result in a centrality vector of length equal to $|V|$. In general, it should be remembered that this measure is uniquely defined only up to a rescaling operation.

Closely related to evcent and bonpow are prestige, which calculates various prestige measures, and infocent, which calculates the information centrality of Stephenson and Zelen (1989). Although a range of indices is included within prestige, all of them measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality indicates the extent to which each individual has a large number of short walks to other actors in the network. Like eigenvector centrality it is walk-based, but it weights short walks more heavily (and long walks less heavily) than eigenvector centrality does.
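The Bonacich formula can be evaluated directly by solving the linear system $(I - \beta A)c = A\mathbf{1}$ instead of forming the inverse. The base-R sketch below is illustrative only. `evcent_sketch` and `bonpow_sketch` are hypothetical names, the scaling constant $\alpha$ is left at 1 rather than following sna's normalization convention, and `A[i, j] = 1` is assumed to encode an edge i -> j. On an undirected star, the sketch shows the three regimes described in the text.

```r
# Eigenvector centrality: absolute value of the principal eigenvector of A.
# eigen() lists the largest eigenvalue first (by value for symmetric A,
# by modulus otherwise) and returns unit-length eigenvectors.
evcent_sketch <- function(A) {
  abs(Re(eigen(A)$vectors[, 1]))
}

# Bonacich power: alpha * (I - beta A)^{-1} A 1, valid when I - beta A is invertible.
bonpow_sketch <- function(A, beta, alpha = 1) {
  n <- nrow(A)
  drop(alpha * solve(diag(n) - beta * A, A %*% rep(1, n)))
}

S <- matrix(0, 4, 4); S[1, 2:4] <- 1; S[2:4, 1] <- 1   # undirected star K_{1,3}
bonpow_sketch(S, 0)        # beta = 0: outdegree, i.e., 3 1 1 1
bonpow_sketch(S, -0.5)     # beta < 0: hub 6, leaves -2 (ties to weak alters pay)
b <- bonpow_sketch(S, 0.999 / max(eigen(S)$values))
b / sqrt(sum(b^2))         # near 1/lambda_1: close to the eigenvector scores
evcent_sketch(S)
```

Both the near-$1/\lambda_1$ Bonacich scores and the eigenvector scores are normalized to unit length here. This reflects the point that the measure is defined only up to rescaling.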

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v',v'')$ such that $(v',v),(v,v'') \in E$ and $(v',v'') \notin E$, that is, the number of pairs for which $v$ serves as a local bridge. Now let us posit a vector of states $s$ on $V$, such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i,v_j,v_k)$ with brokering vertex $v_j$, the possible brokerage roles are:

- coordinating ($s_i = s_j = s_k$);
- itinerant ($s_i = s_k$, $s_i \neq s_j$);
- gatekeeping ($s_j = s_k$, $s_i \neq s_j$);
- representative ($s_i = s_j$, $s_j \neq s_k$);
- liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$).

The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these scores (and the total) for all vertices, as well as the total amount of brokerage within each role performed throughout the network. It also provides the first and second moments of the brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density), together with the z-tests suggested by Gould and Fernandez. Note, however, that the authors did not prove that the statistics in question are asymptotically normal under the null model, so the statistical foundation of these tests is somewhat dubious. When in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test instead.
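The role definitions translate almost line for line into code. The sketch below is a hypothetical base-R version that only counts raw scores per broker; it is not sna's brokerage(), and it does not compute the moments or z-tests. It assumes a loopless adjacency matrix with `A[i, j] = 1` for an edge i -> j, and a membership vector `memb` giving each vertex's group. The column labels reuse the w_I/w_O/b_IO/b_OI/b_O abbreviations from the sna output shown later in this section.

```r
# Classify one locally bridged ordered triad (i, j, k) with broker j.
gf_role <- function(si, sj, sk) {
  if (si == sj && sj == sk) return("w_I")  # coordinator: all in one group
  if (si == sk) return("w_O")              # itinerant: i, k share a group; j differs
  if (sj == sk) return("b_IO")             # gatekeeper: j, k share; i differs
  if (si == sj) return("b_OI")             # representative: i, j share; k differs
  "b_O"                                    # liaison: all three differ
}

# Raw Gould-Fernandez role counts for every vertex acting as broker.
brokerage_counts <- function(A, memb) {
  n <- nrow(A)
  roles <- c("w_I", "w_O", "b_IO", "b_OI", "b_O")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {
    for (i in which(A[, j] > 0)) {         # i -> j
      for (k in which(A[j, ] > 0)) {       # j -> k
        if (i == k || A[i, k] > 0) next    # j is not a local bridge for (i, k)
        r <- gf_role(memb[i], memb[j], memb[k])
        out[j, r] <- out[j, r] + 1
      }
    }
  }
  cbind(out, t = rowSums(out))             # t: total brokerage
}

P <- matrix(0, 3, 3); P[1, 2] <- 1; P[2, 3] <- 1   # path 1 -> 2 -> 3
brokerage_counts(P, memb = c(2, 1, 1))             # vertex 2 acts as gatekeeper
```

Adding the edge 1 -> 3 to the path removes every role, because vertex 2 is then no longer a local bridge.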

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network.

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrates, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties that can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector centrality and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1/eigen(dat)$values[1], rescale = TRUE) -
+    evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups and use the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores.

- The first table gives the observed network-wide brokerage scores for each of the brokerage roles: coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t). Alongside these it reports the corresponding expectations, standard deviations, z-scores, and p-values under the Gould-Fernandez random association model, to which the caveats noted earlier apply.
- The second set of tables similarly gives the observed brokerage scores and G-F z-scores for each individual, organized by group.

Note that very small groups cannot support certain brokerage roles, and certain roles can only be realized when a sufficient number of groups is present. z-scores are considered undefined when the preconditions for their role are unmet, and are returned as NaN.
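As a quick arithmetic check on how the global table fits together: the z column is $(t - E(t))/Sd(t)$, and the reported p-values agree with a two-sided standard normal reference. That normal reference is exactly the asymptotic assumption cautioned against earlier. The values below are copied from the "Global Brokerage Properties" table above, and only base R is used.

```r
# Values copied from the "Global Brokerage Properties" table above.
obs <- c(w_I = 5, w_O = 25, b_IO = 18, b_OI = 17, b_O = 28, t = 93)
Et  <- c(5.8638, 19.5459, 19.5459, 19.5459, 23.4551, 87.9565)
Sdt <- c(2.7314, 7.0713, 6.2244, 6.2244, 5.3349, 13.6124)

z <- (obs - Et) / Sdt       # Gould-Fernandez z-score
p <- 2 * pnorm(-abs(z))     # two-sided p-value under a normal reference
round(cbind(z, p), 4)       # agrees with the z and Pr(>|z|) columns
```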

                Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in their case, however, the properties in question pertain to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges that are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable).

Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads that are symmetric (i.e., mutual or null) within the input graph(s). It can, however, also return the fraction of non-null dyads that are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). Each of these corresponds to a slightly different notion of reciprocity, and they are thus appropriate in somewhat different circumstances.

Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong form ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i,j,k \in V$) and its weak form ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$). Intuitively, weak transitivity embodies the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if they are supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads that satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices that can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v,G) \right) - c(v_i,G) \right] \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless. It varies between 0, for a graph in which all vertices have the same centrality scores², and 1, for a graph of maximum concentration. Maximum centralization scores generally occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$.

Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which uses them to generate the underlying score vector. In the normalized case, the centrality function is also asked to return the theoretical maximum deviation. This is handled transparently for all centrality functions included in sna. The mechanism may also be employed with user-supplied functions, provided that they support the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
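Equations (1) and (2) can be checked numerically. The base-R sketch below computes the raw Freeman centralization of an outdegree vector both ways. It then normalizes by $(n-1)^2$, the total deviation attained by an out-star on $n$ vertices. Treating that value as the theoretical maximum for outdegree on loopless digraphs is our assumption. It is, however, consistent with the outdegree centralization values in the example output later in this section (e.g., 0.1728395 = 14/81 for $n = 10$). `centralization_raw` is a hypothetical name.

```r
# Raw Freeman centralization, Equation (1): total deviation from the maximum.
centralization_raw <- function(cv) sum(max(cv) - cv)

# Outdegree on a directed out-star K_{1,4}: vertex 1 -> 2, ..., 5.
n <- 5
star <- matrix(0, n, n); star[1, 2:n] <- 1
cv <- rowSums(star)                        # outdegrees 4 0 0 0 0

raw1 <- centralization_raw(cv)             # Equation (1): 16
raw2 <- n * (max(cv) - mean(cv))           # Equation (2): |V| (c* - mean), also 16
raw1 / (n - 1)^2                           # normalized: 1 for the star

# A directed 5-cycle is regular, so every vertex has zero deviation.
cyc <- matrix(0, n, n); cyc[cbind(1:n, c(2:n, 1))] <- 1
centralization_raw(rowSums(cyc))           # 0
```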

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied by connectedness, efficiency, hierarchy, and lubness, respectively) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.
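For readers who want the definitions of density, dyadic and edgewise reciprocity, and weak transitivity given at the start of this subsection spelled out, here is a hypothetical base-R sketch for a single loopless digraph. These functions are not sna's gden, grecip, or gtrans, and they ignore loops and missing data. Returning 1 when no two-paths are at risk is our assumption for the vacuous case.

```r
# A[i, j] = 1 encodes an edge i -> j; the diagonal is assumed to be 0.
density_sketch <- function(A) {
  n <- nrow(A)
  sum(A) / (n * (n - 1))               # present / possible ordered pairs
}

recip_dyadic <- function(A) {
  up <- A[upper.tri(A)]                # A[i, j] for i < j
  lo <- t(A)[upper.tri(A)]             # matching A[j, i]
  mean(up == lo)                       # share of mutual or null dyads
}

recip_edgewise <- function(A) sum(A * t(A)) / sum(A)  # share of reciprocated edges

trans_weak <- function(A) {
  A2 <- A %*% A                        # A2[i, k] = number of two-paths i -> j -> k
  diag(A2) <- 0                        # drop i -> j -> i
  if (sum(A2) == 0) return(1)          # nothing at risk (assumed vacuously transitive)
  sum(A2 * A) / sum(A2)                # share of two-paths closed by an edge i -> k
}

P <- matrix(0, 3, 3); P[1, 2] <- 1; P[2, 3] <- 1   # path 1 -> 2 -> 3
c(density_sketch(P), recip_dyadic(P), recip_edgewise(P), trans_weak(P))
```

For the path, one dyad is null and two are asymmetric, so the dyadic reciprocity is 1/3 while the edgewise reciprocity is 0. This illustrates why the two notions can diverge sharply on sparse graphs.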

The use of sna's GLI routines is straightforward: calling them with a graph (or set thereof) generally returns a vector of GLI scores, as in the following example. Note the difference between the default (dyadic) and edgewise reciprocity, between the standard and "census" variants of gtrans, and among the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines, which must be specified along with any additional arguments. By default, centralization scores are computed for a single graph only. R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both the single-graph and multiple-graph forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function that returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if(tmaxdev)
+      return((n-1) * choose(n-1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+  }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as those routines support the required calling argument.

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features.

Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness, although unilaterally connected components may not be uniquely defined. The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist. They return, respectively, the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph.

The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist.

Fararo and Sunshine's (1964) structure statistics show how these concepts may be combined. Let $G = (V,E)$ be a (possibly directed) graph of order $N$, and let $d(i,j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j,k) \le i)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ that lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models; see Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.
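The structure statistics follow directly from a geodesic distance matrix. The base-R sketch below computes all-pairs distances by level-synchronous breadth-first search, leaving unreachable pairs at Inf, and then evaluates $s_i = N^{-2}\sum_j\sum_k I(d(j,k) \le i)$. Its helper names are hypothetical; these are not sna's geodist or structure.statistics. The same distance matrix also yields the reachability graph as `is.finite(D)`.

```r
# All-pairs geodesic distances by breadth-first search (A[i, j] = 1 for i -> j).
geodist_sketch <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    level <- 0
    while (length(frontier) > 0) {
      level <- level + 1
      # unvisited vertices adjacent from any vertex in the current frontier
      nxt <- which(colSums(A[frontier, , drop = FALSE]) > 0 & is.infinite(D[s, ]))
      D[s, nxt] <- level
      frontier <- nxt
    }
  }
  D
}

# Structure statistics s_0, ..., s_{N-1}.
structure_stats <- function(A) {
  n <- nrow(A)
  D <- geodist_sketch(A)
  s <- sapply(0:(n - 1), function(i) sum(D <= i) / n^2)
  names(s) <- 0:(n - 1)
  s
}

C3 <- matrix(0, 3, 3); C3[1, 2] <- 1; C3[2, 3] <- 1; C3[3, 1] <- 1  # directed 3-cycle
structure_stats(C3)            # 1/3, 2/3, 1
is.finite(geodist_sketch(C3))  # reachability: all TRUE (strongly connected)
```

Note that $s_0 = 1/N$ always, since each vertex is at distance 0 from itself. Likewise, $s_1 = (N + |E|)/N^2$ for a loopless digraph.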

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references therein), a connection that also motivates the use of such measures.

The most fundamental subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities, while mutuality returns only the number of mutual dyads. The triad census, that is, the frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. There are four such classes in the undirected case versus 16 in the directed case. It is thus important to specify the directedness of one's data when employing this routine, or triad.classify, which can be used to classify specific triads.

Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, these functions give co-membership and incidence statistics by vertex, where requested. Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, so counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can, however, usually be calculated for sufficiently sparse graphs.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ for the upper test or $\Pr(t(H) \le t(G))$ for the lower test. Here $H$ is a random graph drawn uniformly given the conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, the dyad census, and the degree distribution are also widely used.

A somewhat weaker family of null distributions consists of those satisfying the conditions $E[s(H)] = s(G), E[s'(H)] = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$. The homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible.

Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Combined with the gliop routine, cugtest can compare functions of statistics on graph pairs (e.g., the difference in triangle counts) to those expected under one or more simple null models. (Compare qaptest, discussed in Section 2.6.)
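A CUG test is easy to approximate by simulation. The hypothetical base-R sketch below computes a dyad census. It then conditions on order and edge count by drawing digraphs uniformly from those with the same numbers of vertices and edges, and estimates the upper and lower one-tailed p-values defined above. It stands in for sna's cugtest and is not that function. The function names and the default of 1000 replicates are illustrative choices.

```r
# Dyad census of a loopless digraph (A[i, j] = 1 for i -> j).
dyad_census_sketch <- function(A) {
  up <- A[upper.tri(A)] > 0
  lo <- t(A)[upper.tri(A)] > 0
  c(Mut = sum(up & lo), Asym = sum(xor(up, lo)), Null = sum(!up & !lo))
}

# Uniform draw from the digraphs of order n with exactly m edges.
rand_digraph_m <- function(n, m) {
  A <- matrix(0, n, n)
  slots <- which(row(A) != col(A))     # off-diagonal cells (ordered pairs)
  A[sample(slots, m)] <- 1
  A
}

# Monte Carlo estimates of Pr(t(H) >= t(G)) and Pr(t(H) <= t(G)),
# with H uniform given the order and edge count of G.
cug_edges <- function(A, stat, reps = 1000) {
  obs <- stat(A)
  sims <- replicate(reps, stat(rand_digraph_m(nrow(A), sum(A))))
  c(upper = mean(sims >= obs), lower = mean(sims <= obs))
}

# Usage: three mutual dyads among six vertices (six edges in total).
G <- matrix(0, 6, 6)
G[cbind(c(1, 2, 3, 4, 5, 6), c(2, 1, 4, 3, 6, 5))] <- 1
dyad_census_sketch(G)                  # Mut 3, Asym 0, Null 12
set.seed(1)
cug_edges(G, function(A) dyad_census_sketch(A)[["Mut"]])
```

With only six edges placed at random among 30 ordered pairs, three mutual dyads are very unlikely. The upper p-value should therefore be small, while the lower one is 1, since no draw can exceed three mutual dyads.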

                Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00
R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50
R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1,,], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43
R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities (first graph minus second) as a test statistic. We first test against a CUG hypothesis conditioning only on order, and then against one conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.3333333
                1stQ:    -0.06666667
                Med:     0
                Mean:    -0.001288889
                3rdQ:    0.06666667
                Max:     0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.06666667
                1stQ:    0.1555556
                Med:     0.2222222
                Mean:    0.2215333
                3rdQ:    0.2888889
                Max:     0.5333333

A broader range of similar Monte Carlo tests can be carried out by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$, i.e., when $v$ and $v'$ have the same alters. In the directed case, the same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods.

Structurally equivalent vertices are copies in a graph-theoretic sense, and are necessarily identical with respect to all structural properties. Graph permutations that exchange only structurally equivalent vertices are therefore necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
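One of these indices, the Hamming distance between row and column profiles, can be sketched in base R as follows. The function name se_hamming is hypothetical and the code is not sedist itself; it assumes a loopless binary digraph and sets aside the cells that involve the pair being compared, so that a distance of 0 corresponds exactly to structural equivalence.

```r
# Hypothetical sketch (not sna's sedist): Hamming distance between the out- and
# in-tie profiles of every pair of vertices, ignoring the cells involving the
# pair itself. For a loopless binary digraph, D[i, j] == 0 iff i and j are
# structurally equivalent.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      k <- setdiff(seq_len(n), c(i, j))
      D[i, j] <- D[j, i] <- sum(A[i, k] != A[j, k]) +  # out-tie disagreements
                            sum(A[k, i] != A[k, j])    # in-tie disagreements
    }
  }
  D
}

A <- matrix(0, 4, 4)
A[1, 2:4] <- 1          # out-star centred on vertex 1
D <- se_hamming(A)
D                       # leaves are at distance 0 from one another, 2 from the centre
```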


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
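The distance-then-cluster pipeline can be imitated with base R's dist, hclust, and cutree. This is a sketch under a simplifying assumption of our own: distances are plain Euclidean distances between the concatenated out- and in-tie profiles (rows of [A | t(A)]), which, unlike sedist, do not set aside the cells involving the pair being compared. The planted two-position structure is likewise hypothetical.

```r
# Sketch of an equiv.clust-style pipeline with base R only (simplified distances).
set.seed(42)
n <- 10
P <- matrix(0.05, n, n)
P[1:5, 6:10] <- 0.9                       # hypothetical positions: 1-5 send ties to 6-10
A <- matrix(rbinom(n * n, 1, P), n, n)
diag(A) <- 0

profiles <- cbind(A, t(A))                # row i: ties sent by i, then ties received by i
d  <- dist(profiles)                      # Euclidean distances between vertex profiles
hc <- hclust(d, method = "complete")      # hierarchical clustering of the vertices
classes <- cutree(hc, k = 2)              # cut the tree into two candidate positions
print(classes)
```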

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.
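The density block type is easy to state in code. The helper below, block_density, is a hypothetical stand-in rather than sna's blockmodel: it assumes classes are coded 1, ..., k and takes each block as the mean of the corresponding adjacency cells with the diagonal neglected. On a graph with two structural equivalence classes, every block comes out as exactly 0 or 1, as described above.

```r
# Hypothetical sketch of density-type blockmodel reduction (not sna's blockmodel).
# The (a, b) block is the mean of the cells from class a to class b, diagonal neglected.
block_density <- function(A, classes) {
  k <- max(classes)
  M <- A
  diag(M) <- NA                              # neglect self-ties
  img <- matrix(NA_real_, k, k)
  for (a in seq_len(k)) {
    for (b in seq_len(k)) {
      # NaN if a block has no off-diagonal cells (e.g., a one-vertex diagonal block)
      img[a, b] <- mean(M[classes == a, classes == b], na.rm = TRUE)
    }
  }
  img
}

A <- matrix(0, 4, 4)
A[1:2, 3:4] <- 1                             # class {1, 2} sends all ties to class {3, 4}
classes <- c(1, 1, 2, 2)
img <- block_density(A, classes)
img                                          # a single 1 in block (1, 2), zeros elsewhere
```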

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
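A minimal sketch of density expansion, assuming a hypothetical helper expand_density_image rather than blockmodel.expand itself: each new vertex is assigned a class, the k x k image is expanded into an n x n Bernoulli parameter matrix, and an independent-edge digraph is drawn. Repeating the draw yields a Monte Carlo sample whose block densities scatter around the image values.

```r
# Hypothetical sketch of density-image expansion (not sna's blockmodel.expand).
expand_density_image <- function(img, sizes) {
  cls <- rep(seq_along(sizes), times = sizes)   # class of each new vertex
  P <- img[cls, cls]                            # n x n matrix of edge probabilities
  n <- length(cls)
  G <- matrix(rbinom(n * n, 1, P), n, n)        # independent Bernoulli edges
  diag(G) <- 0                                  # no loops
  list(graph = G, prob = P, classes = cls)
}

img <- matrix(c(0.1, 0.2, 0.8, 0.05), 2, 2)    # img[1, 2] = 0.8: class 1 -> class 2
set.seed(7)
x <- expand_density_image(img, c(3, 3))         # new graph of order 6
sims <- replicate(200, expand_density_image(img, c(3, 3))$graph, simplify = FALSE)
mean(sapply(sims, function(G) mean(G[1:3, 4:6])))  # close to img[1, 2] = 0.8
```

Note that the class sizes passed in need not match the original graph, mirroring the point that expanded graphs can differ in order from the one used to fit the blockmodel.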

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus diagonals should not be neglected when analyzing the results of graph compositions.
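The boolean-product characterization gives a one-line base-R analogue. compose_graphs is a hypothetical name used only for illustration; the small example shows how a loop can appear in the composition of two loopless graphs.

```r
# Hypothetical base-R analogue of graph composition: (v, v') is an edge iff some
# v'' has (v, v'') in G and (v'', v') in H, i.e., a boolean matrix product.
compose_graphs <- function(G, H) (G %*% H > 0) * 1

G <- matrix(0, 3, 3); G[1, 2] <- 1                 # 1 -> 2
H <- matrix(0, 3, 3); H[2, 3] <- 1; H[2, 1] <- 1   # 2 -> 3 and 2 -> 1
C <- compose_graphs(G, H)
C   # edges 1 -> 3 and the loop 1 -> 1, although neither G nor H has loops
```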

                Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- t(sapply(runif(20, 0, 1), rep, 20))
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j):\, i \neq j} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \qquad (3)$$

where $A^G$, $A^H$ are the respective adjacency matrices of G and H, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j):\, i \neq j} A^X_{ij}$ is the graph mean. The graph variance is then cov(G, G), and the graph correlation is $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
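To make Eq. (3) concrete, the sketch below computes the graph covariance, graph correlation, and Hamming distance directly from the off-diagonal cells of two adjacency matrices. The helper names are hypothetical; sna's own routines for these quantities are gcov, gcor, and hdist. Because the $1/(|V|(|V|-1))$ factors cancel, graph_cor coincides with the ordinary Pearson correlation of the vectorized off-diagonal cells.

```r
# Hypothetical helpers illustrating Eq. (3); sna's own routines are gcov, gcor, hdist.
offdiag <- function(A) A[row(A) != col(A)]       # the |V|(|V| - 1) edge variables

graph_cov <- function(G, H) {
  g <- offdiag(G); h <- offdiag(H)
  mean((g - mean(g)) * (h - mean(h)))            # sum of products / (|V|(|V| - 1))
}
graph_cor <- function(G, H) graph_cov(G, H) / sqrt(graph_cov(G, G) * graph_cov(H, H))
hamming   <- function(G, H) sum(offdiag(G) != offdiag(H))

G <- H <- matrix(0, 4, 4)
G[cbind(c(1, 2, 3, 1), c(2, 3, 1, 3))] <- 1      # edges 1->2, 2->3, 3->1, 1->3
H[cbind(c(1, 2, 3, 4), c(2, 3, 2, 1))] <- 1      # edges 1->2, 2->3, 3->2, 4->1
graph_cor(G, H)   # 0.25: two shared edges out of four in each graph
hamming(G, H)     # 4
```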

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
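For very small graphs, the optimal matching can be found by brute force, which clarifies what the heuristic methods are approximating. The sketch below is an assumption-laden illustration, not gscor: it treats every relabeling of H as a permissible matching, enumerates all n! permutations with a hypothetical helper all_perms, and keeps the largest graph correlation. The cost grows factorially, which is why heuristic search is the default in practice.

```r
# Hypothetical sketch: exact structural correlation by exhaustive search over all
# relabelings of H. Feasible only for very small graphs (n! permutations).
offdiag <- function(A) A[row(A) != col(A)]

all_perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  p <- all_perms(n - 1)
  # insert n at every position of every permutation of 1..(n - 1)
  do.call(rbind, lapply(seq_len(n), function(pos)
    t(apply(p, 1, function(r) append(r, n, after = pos - 1)))))
}

structural_cor <- function(G, H) {
  perms <- all_perms(nrow(H))
  best <- -Inf
  for (i in seq_len(nrow(perms))) {
    p <- perms[i, ]
    best <- max(best, cor(offdiag(G), offdiag(H[p, p])))
  }
  best
}

G <- matrix(0, 4, 4)
G[cbind(c(1, 2, 3, 1), c(2, 3, 1, 3))] <- 1
H <- G[c(3, 1, 4, 2), c(3, 1, 4, 2)]     # an isomorphic (relabeled) copy of G
cor(offdiag(G), offdiag(H))              # labeled correlation: -0.125
structural_cor(G, H)                     # 1, up to rounding
```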

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
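As an illustration of the last point, the sketch below applies eigen to the graph correlation matrix of a small, hypothetical set of networks (noisy copies of a common base graph). The data generation is our own assumption; the decomposition step is simply base R's eigen on an m x m correlation matrix.

```r
# Hypothetical sketch of network PCA: eigen() on the graph correlation matrix
# of m graphs defined on a common vertex set.
offdiag <- function(A) A[row(A) != col(A)]

set.seed(11)
n <- 8; m <- 6
base <- matrix(rbinom(n * n, 1, 0.4), n, n)
graphs <- lapply(seq_len(m), function(k) {
  flip <- matrix(rbinom(n * n, 1, 0.1), n, n)   # select roughly 10% of cells
  abs(base - flip)                               # flip the selected cells (XOR)
})

X  <- sapply(graphs, offdiag)         # n(n - 1) x m matrix of edge variables
R  <- cor(X)                          # m x m graph correlation matrix
pc <- eigen(R, symmetric = TRUE)
round(pc$values / sum(pc$values), 3)  # share of total variance by component
round(pc$vectors[, 1], 3)             # loading of each graph on the first component
```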

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available for cases where conditioning on the entire observed structure is inappropriate.
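The QAP idea can be sketched in base R for the graph correlation. The helper qap_cor_test is hypothetical (sna's stand-alone routine is qaptest): the vertices of H are randomly relabeled, which preserves its structure but breaks its alignment with G, and the observed correlation is compared with the resulting permutation distribution. The simulated data, G plus about 5% flipped cells, are likewise an assumption for illustration.

```r
# Hypothetical sketch of a QAP test for the graph correlation (cf. sna's qaptest).
offdiag <- function(A) A[row(A) != col(A)]

qap_cor_test <- function(G, H, reps = 1000) {
  obs <- cor(offdiag(G), offdiag(H))
  n <- nrow(H)
  null <- replicate(reps, {
    p <- sample(n)                                 # random relabeling of H's vertices
    cor(offdiag(G), offdiag(H[p, p]))
  })
  list(obs = obs, p_ge = mean(null >= obs), p_le = mean(null <= obs), null = null)
}

set.seed(3)
n <- 15
G <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(G) <- 0
noise <- matrix(rbinom(n * n, 1, 0.05), n, n)
H <- abs(G - noise); diag(H) <- 0                  # G with about 5% of cells flipped
res <- qap_cor_test(G, H, reps = 500)
c(obs = res$obs, p_ge = res$p_ge)                  # strong correlation, small p-value
```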


                Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
         352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004
Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

        Total Fraction Correct: 0.8973684
        Fraction Predicted 1s Correct: 0.8973684
        Fraction Predicted 0s Correct: NaN
        False Negative Rate: 0
        False Positive Rate: 1

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90) and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values and that the QAP test correctly identifies the irrelevant predictors.

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose; there are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
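The role of the error rates, including what "negatively informative" means, can be seen for a single edge. The sketch below is a simplification of our own, not bbnam: error rates are treated as known, the edge has a Bernoulli prior, and the hypothetical function edge_posterior applies Bayes' rule with Pr(report of a tie | tie present) = 1 - e_neg and Pr(report of a tie | tie absent) = e_pos. When e_pos + e_neg > 1, a reported tie lowers the posterior below the prior.

```r
# Hypothetical single-edge sketch of the false positive / false negative reporting
# model with known error rates and a Bernoulli prior (not sna's bbnam).
edge_posterior <- function(y, e_pos, e_neg, prior = 0.5) {
  # y: 0/1 reports on one (i, j) edge; e_pos, e_neg: informants' error rates
  lik1 <- prod(ifelse(y == 1, 1 - e_neg, e_neg))   # Pr(reports | edge present)
  lik0 <- prod(ifelse(y == 1, e_pos, 1 - e_pos))   # Pr(reports | edge absent)
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

# One informant reports a tie.
edge_posterior(1, e_pos = 0.05, e_neg = 0.35)   # 0.65 / (0.65 + 0.05), about 0.929
# With e_pos + e_neg > 1, the same report is negatively informative.
edge_posterior(1, e_pos = 0.70, e_neg = 0.60)   # 0.40 / (0.40 + 0.70), about 0.364
# Several informants, each with their own error rates.
edge_posterior(c(1, 1, 0), e_pos = c(0.05, 0.1, 0.02), e_neg = c(0.3, 0.4, 0.5))
```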

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex i nominates vertex j when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where T is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the (i, j) directed dyad). Bias events are taken to be independent Bernoulli trials given T, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
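The conditional nomination probability above can be transcribed directly. The function name biased_net_prob and the parameter values are hypothetical, chosen only to show how each opportunity for a bias event raises the probability of a tie above its baseline.

```r
# Direct transcription of the biased net conditional nomination probability.
# p_base = Pr(B_e); p_bias = (Pr(B_1), ..., Pr(B_K)); s = (s_1, ..., s_K), the number
# of bias events of each kind that have potentially occurred for the (i, j) dyad.
biased_net_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Hypothetical values: baseline 0.1 and two bias events (say, reciprocity and
# triadic closure) with probabilities 0.3 and 0.2.
biased_net_prob(0.1, c(0.3, 0.2), c(0, 0))   # 0.1: baseline only
biased_net_prob(0.1, c(0.3, 0.2), c(1, 0))   # 1 - 0.9 * 0.7 = 0.37
biased_net_prob(0.1, c(0.3, 0.2), c(2, 1))   # 1 - 0.9 * 0.7^2 * 0.8 = 0.6472
```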

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. Z and ψ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, W and θ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
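To see how Eqs. (4) and (5) generate data, the sketch below simulates a response with a single AR weight matrix W and a single MA matrix Z (w = z = 1) by solving the two linear systems, then computes Moran's I for the result. Several choices are our own assumptions rather than requirements of lnam or nacf: W is row-normalized so that I - θW is invertible for |θ| < 1, the same network is reused for Z, and moran_I is a hypothetical helper implementing the textbook formula.

```r
# Hypothetical sketch: simulate Eqs. (4)-(5) with w = z = 1, then compute Moran's I.
set.seed(5)
n <- 30
W <- matrix(rbinom(n * n, 1, 0.1), n, n); diag(W) <- 0
W <- W / pmax(rowSums(W), 1)          # row-normalize (all-zero rows stay zero)
Z <- W                                # assumed: the same network carries the MA term

X <- cbind(1, rnorm(n)); beta <- c(1, 2)
theta <- 0.4; psi <- 0.3; sigma <- 0.5
nu  <- rnorm(n, 0, sigma)
eps <- solve(diag(n) - psi * Z, nu)                  # Eq. (5): eps = psi Z eps + nu
y   <- solve(diag(n) - theta * W, X %*% beta + eps)  # Eq. (4): y = theta W y + X beta + eps

# Moran's I of a response vector under weight matrix W.
moran_I <- function(y, W) {
  z <- y - mean(y)
  (length(y) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}
moran_I(as.vector(y), W)
```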


                Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038 and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

        Replicate Chains: 5
        Burn Time: 300
        Draws per Chain: 20 Total Draws: 100
        Potential Scale Reduction (G&R's sqrt(Rhat)):
                Max: 1.003116
                Med: 0.9992194
                IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
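Whether a prior puts much mass on the perverse region is easy to check by simulation. The sketch below assumes independent priors on the two error rates of an informant and uses a hypothetical helper, perverse_mass, to estimate Pr(e^- + e^+ > 1) by Monte Carlo, both for the beta(2, 11) priors used above and, for contrast, for flat beta(1, 1) priors.

```r
# Monte Carlo estimate of the prior mass on the "perverse" region e^- + e^+ > 1,
# assuming independent beta(a, b) priors on the two error rates.
set.seed(123)
draws <- 1e5
perverse_mass <- function(a, b) {
  e_neg <- rbeta(draws, a, b)
  e_pos <- rbeta(draws, a, b)
  mean(e_neg + e_pos > 1)
}
p_informative <- perverse_mass(2, 11)   # the priors used above: well under 1%
p_flat <- perverse_mass(1, 1)           # flat priors: about one half
c(beta_2_11 = p_informative, beta_1_1 = p_flat)
```

Under flat priors, half of the prior mass sits in the perverse region, which illustrates why weakly informative priors that favor small error rates are a reasonable default here.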

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice, since it exacerbates the effects of false negatives. The central graph and Romney-Batchelder models are far better. However, the performance of the central graph will degrade quickly when either the false positive or the false negative rate approaches or exceeds 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that the total error rate (false positive plus false negative) is less than 1.
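To see why the intersection rule compounds false negatives, the locally aggregated structures and a simple majority-vote consensus can be sketched in base R. The sketch assumes a cognitive social structure stored as an m × n × n array dat with m = n, where dat[k, i, j] is informant k's report of an i → j tie (Krackhardt 1987a). Under the LAS rules, i → j is kept if both i and j report it (intersection) or if either does (union). The function majority_graph is a hypothetical majority-rule consensus in the spirit of the central graph; it is not sna's implementation, and all names here are illustrative.

```r
# Locally aggregated structures (LAS) from a cognitive social structure.
# dat[k, i, j] = 1 if informant k reports a tie from i to j (requires m == n).
las <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    by_i <- dat[i, i, j]   # the sender's own report
    by_j <- dat[j, i, j]   # the receiver's own report
    out[i, j] <- if (rule == "intersection") by_i * by_j else max(by_i, by_j)
  }
  out
}

# Hypothetical majority-rule consensus: keep i -> j if more than half of
# all informants report it.
majority_graph <- function(dat) {
  (apply(dat, c(2, 3), mean) > 0.5) * 1
}

# Toy data: three informants reporting on a three-vertex digraph.
dat <- array(0, dim = c(3, 3, 3))
dat[1, 1, 2] <- 1   # informant 1 (the sender) reports 1 -> 2
dat[2, 1, 2] <- 1   # informant 2 (the receiver) also reports 1 -> 2
dat[3, 2, 3] <- 1   # informant 3 (the receiver) reports 2 -> 3
dat[1, 2, 3] <- 1   # informant 1, a third party, also reports 2 -> 3
las(dat, "intersection")
las(dat, "union")
majority_graph(dat)
```

The 2 → 3 tie is missed by its sender but reported by two of the three informants. It is lost in the intersection but kept by the union and by majority rule.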

As a final example of sna's model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components and estimates the two effects simultaneously. (This example requires the numDeriv package.)
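The commands below appear to generate data from the standard AR/MA network autocorrelation form (cf. Anselin 1988; Doreian 1990). Reading it off the simulation code, in notation chosen here rather than taken from sna's documentation, the model is:

```latex
\begin{aligned}
y &= \rho_1 W_1 y + X\beta + e, \qquad e = \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I),\\
y &= (I - \rho_1 W_1)^{-1}\left[X\beta + (I - \rho_2 W_2)^{-1}\nu\right].
\end{aligned}
```

The two qr.solve calls compute the inverses in the second line. The coefficients rho1.1 and rho2.1 in the summary thus correspond to ρ1 (set to 0.2 in the code) and ρ2 (set to 0.3).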

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. Edge colors mark pairs (i, j) whose net influence departs strongly from the mean net influence:

- green: i's net influence on j is estimated to be at least two standard deviations greater than the mean;
- red: i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean.

Sample output for the above example is provided in Figure 6.
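The edge-coloring rule just described can be sketched as follows. The net influence matrix M, with M[i, j] the estimated net influence of i on j, is taken as given; how lnam's plot method computes it is not reproduced here. The helper name influence_colors is hypothetical. Taking the mean and standard deviation over off-diagonal cells only is also an illustrative assumption.

```r
# Hypothetical helper: label (i, j) pairs whose net influence lies at least two
# standard deviations above ("green") or below ("red") the mean net influence.
# M: square matrix with M[i, j] = net influence of i on j (diagonal ignored).
influence_colors <- function(M) {
  off <- row(M) != col(M)                  # ignore self-influence on the diagonal
  m <- mean(M[off])
  s <- sd(M[off])
  colors <- matrix("none", nrow(M), ncol(M))
  colors[off & M >= m + 2 * s] <- "green"  # much more positive than average
  colors[off & M <= m - 2 * s] <- "red"    # much more negative than average
  colors
}

# Toy influence matrix with one strongly positive and one strongly negative pair
M <- matrix(0, 6, 6)
M[1, 2] <- 10
M[3, 4] <- -10
influence_colors(M)[c(1, 3), c(2, 4)]
```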

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines which covers many of the methods currently in wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that making such tools available within a free, widely used statistical computing platform will help further integrate network analytic methods with more conventional approaches to modern data analysis.

                Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam (panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot).

                References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). “Social Structure from Multiple Networks II: Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software, Version 2.067. URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows, Version 4.75. URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis, Version 3.1. URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25



 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0

When not in “exact” mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)
R> mean(g[upper.tri(g)] & t(g)[upper.tri(g)])
[1] 0.1482828
R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])
[1] 0.04646465
R> mean((!g)[upper.tri(g)] & t(!g)[upper.tri(g)])
[1] 0.8052525

By contrast with the expectation under the above model, a Bernoulli graph with the same expected density would have a mean mutuality rate of approximately 0.03, with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4. Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
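The Bernoulli comparison above follows from a short calculation. Under the multinomial model, a mutual dyad contributes two of its two possible edges and an asymmetric dyad one. The expected density is therefore p = mut + asym/2 = 0.175. A Bernoulli graph with density p has mutual dyads with probability p² and asymmetric dyads with probability 2p(1 − p). The variable names below are chosen for this sketch only:

```r
# Dyad type probabilities used in the rguman() call above
mut  <- 0.15
asym <- 0.05
# Expected density: mutual dyads hold 2 of 2 possible edges, asymmetric ones 1 of 2
p <- mut + asym / 2
# Expected dyad type rates for a Bernoulli graph with the same density
p_mut  <- p^2
p_asym <- 2 * p * (1 - p)
c(density = p, mutual = p_mut, asymmetric = p_asym, ratio = p_asym / p_mut)
```

This gives a mutuality rate of about 0.031 and an asymmetric-to-mutual ratio of about 9.43. The multinomial model, by contrast, fixes these rates directly at 0.15 and 0.05.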

More extensive departures from independence require alternatives to the simple independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. These processes are specified in terms of the conditional probability of an edge given other edges within the graph. This immediately suggests using a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process.

Such a sampler is implemented via the rgbn function. rgbn uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution is the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. The user may specify thinning and burn-in parameters, along with model parameters; by default, the latter correspond to the uniform random digraph model. The parameters can be adjusted to produce:

- “parent” or reciprocity biases (π);
- “sibling” or shared partner biases (σ);
- “double role” biases, or parent/sibling interaction effects (ρ);
- baseline density effects (d).

Each parameter varies from 0 to 1, with 0 indicating no bias. A command to draw a sample of five order-10 networks with both reciprocity and triangle formation biases will then look something like the following:

R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
+   d = 0.15))


with the magnitude of the specified effects depending on the exact choice of parameters.
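To make the Gibbs-sampling idea concrete, here is a minimal base R sketch of an edge-wise Gibbs sampler for a digraph. Each edge is redrawn from its full conditional, given the rest of the graph. The conditional used here is a deliberately simplified, hypothetical biased-net form with only a baseline density d and a reciprocity bias pi (names mirroring rgbn's parameters): P(i -> j | rest) = 1 - (1 - d) * (1 - pi)^y[j, i]. It is not the full Skvoretz-Fararo conditional implemented by rgbn, and gibbs_digraph is a hypothetical name.

```r
# Minimal edge-wise Gibbs sampler for a digraph (illustrative only).
# Hypothetical full conditional with baseline density d and reciprocity bias pi:
#   P(y[i, j] = 1 | rest) = 1 - (1 - d) * (1 - pi)^y[j, i]
gibbs_digraph <- function(n, d = 0.15, pi = 0.05, burn = 100, draws = 5, thin = 10) {
  y <- matrix(0, n, n)
  pairs <- which(row(y) != col(y), arr.ind = TRUE)  # all ordered pairs i != j
  out <- array(0, dim = c(draws, n, n))
  sweep_once <- function(y) {
    for (k in sample(nrow(pairs))) {                # visit pairs in random order
      i <- pairs[k, 1]
      j <- pairs[k, 2]
      p <- 1 - (1 - d) * (1 - pi)^y[j, i]           # depends on the reciprocal edge
      y[i, j] <- rbinom(1, 1, p)
    }
    y
  }
  for (s in seq_len(burn)) y <- sweep_once(y)       # burn-in sweeps
  for (r in seq_len(draws)) {
    for (s in seq_len(thin)) y <- sweep_once(y)     # thinning sweeps between draws
    out[r, , ] <- y
  }
  out
}

set.seed(42)
g <- gibbs_digraph(10)
dim(g)
```

In this reduced form, each conditional depends only on the reciprocal edge. In the full process, the σ and ρ terms also make each update depend on shared partners, which is why a Markov chain rather than independent dyad draws is needed.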

Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) “rewiring” process takes an input network and, with specified probability, exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

R> g <- matrix(0, 10, 10)
R> g[1, ] <- 1
R> g2 <- rewire.ws(g, 0.5)[1, , ]
R> g2
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    1    1    1    1    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     1
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    0    1    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    1    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    1     0
R> sum(g - g2) == 0
[1] TRUE
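The rewiring step can be sketched in base R. This is a simplified, hypothetical version (rewire_sketch is not sna's rewire.ws). With probability p, each off-diagonal edge i -> j is moved to a null pair i -> k that keeps the sender i, so the new pair shares exactly one endpoint with the old one. The sketch assumes a binary adjacency matrix and leaves self-loops alone. Like the process described above, it conserves the number of edges.

```r
# Simplified rewiring sketch (hypothetical; not sna's rewire.ws).
# With probability p, each off-diagonal edge i -> j is moved to a randomly
# chosen null pair i -> k (k != i), keeping the sender fixed. Binary g assumed.
rewire_sketch <- function(g, p) {
  edges <- which(g != 0 & row(g) != col(g), arr.ind = TRUE)
  for (e in seq_len(nrow(edges))) {
    i <- edges[e, 1]
    j <- edges[e, 2]
    if (runif(1) < p) {
      # null pairs in row i, excluding the loop; k != j since g[i, j] is nonzero
      cand <- which(g[i, ] == 0 & seq_len(ncol(g)) != i)
      if (length(cand) > 0) {
        k <- cand[sample.int(length(cand), 1)]  # safe even with one candidate
        g[i, j] <- 0
        g[i, k] <- 1
      }
    }
  }
  g
}

set.seed(7)
n <- 10
g <- matrix(0, n, n)
g[cbind(1:n, c(2:n, 1))] <- 1   # directed ring 1 -> 2 -> ... -> 10 -> 1
g2 <- rewire_sketch(g, 0.5)
sum(g) == sum(g2)               # edge count is conserved
```

Because each move empties one cell and fills a previously empty one, the edge count (and here, each vertex's outdegree) is unchanged.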

Another example of an edge-preserving random transformation is the random permutation of vertex order. rmperm can be employed for this purpose, as in the following permutation of the graph g2 above:

R> g3 <- rmperm(g2)
R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))
[1] TRUE

Row/column permutation preserves the “unlabeled” structure of the input graph (i.e., it draws from the graph's isomorphism class). It plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
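Row/column permutation itself is a one-liner in base R. Given a permutation vector perm, g[perm, perm] relabels the vertices while leaving the unlabeled structure intact. The edge count, the sorted degree sequences, and other isomorphism-invariant quantities are therefore unchanged. A minimal sketch with an arbitrary random example graph:

```r
set.seed(3)
g <- matrix(rbinom(100, 1, 0.3), 10, 10)
diag(g) <- 0
perm <- sample(10)        # uniform random permutation of vertex labels
g3 <- g[perm, perm]       # permute rows and columns together
# The unlabeled structure is preserved: same sorted in- and outdegree sequences
identical(sort(colSums(g)), sort(colSums(g3)))
identical(sort(rowSums(g)), sort(rowSums(g3)))
```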

2.2. Visualization and data manipulation

Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic:

- diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays;
- gvectorize and sr2css convert network data from one form to another;
- symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets;
- add.isolates adds isolates to one or more input graphs;
- stackcount determines the number of graphs in an input stack.

Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the change in a graph statistic between two versions of the graph: one with the selected edge or edges forced to be present, and one with them forced to be absent, holding all other edges constant. Such differences are used extensively in simulation and inference for exponential random graph processes (see, e.g., Snijders 2002). They have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible and can be used with any graph-level index function. Its use is straightforward, e.g.:

R> g <- rgraph(5)
R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)
[1] 0.07291667

Unfortunately, the price of this flexibility is inefficiency. eval.edgeperturbation cannot take advantage of any special properties of the change score being calculated. It is therefore inefficient for properties such as triad counts, whose changes can be calculated much more quickly than the base statistic. The function is hence a useful utility for simple exploratory applications, but it does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
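The computation that eval.edgeperturbation wraps is easy to sketch in base R, and the sketch shows why it is generic but slow: the statistic is evaluated twice from scratch. The helper perturb_effect is hypothetical (not sna's code) and accepts any function of an adjacency matrix. density_index is an illustrative stand-in for a graph-level index.

```r
# Hypothetical generic change statistic: value of FUN with edge(s) (i, j) forced
# present minus its value with the same edge(s) forced absent.
perturb_effect <- function(g, i, j, FUN, ...) {
  g1 <- g
  g1[cbind(i, j)] <- 1    # force edge(s) present
  g0 <- g
  g0[cbind(i, j)] <- 0    # force edge(s) absent
  FUN(g1, ...) - FUN(g0, ...)
}

# Illustrative graph-level index: digraph density, ignoring loops
density_index <- function(g) {
  n <- nrow(g)
  sum(g[row(g) != col(g)]) / (n * (n - 1))
}

set.seed(11)
g <- matrix(rbinom(25, 1, 0.5), 5, 5)
diag(g) <- 0
perturb_effect(g, 1, 2, density_index)   # one pair out of n(n - 1) = 20
```

For density, the change score is always 1/(n(n − 1)) and could be written down directly. Exploiting such closed forms, instead of running two full evaluations, is what specialized change-score code does.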

Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the “identities” of its vertices while leaving the underlying “unlabeled” structure unchanged.) These routines are not graph manipulation functions per se. They are nonetheless important for generating restricted permutations for use in QAP tests (Hubert 1987) and for comparing partially labeled graphs (Butts and Carley 2005).

- rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class.
- numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order. This is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
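Both kinds of permutation vector can be sketched in base R. The functions below are hypothetical stand-ins, not sna's rperm and numperm. restricted_perm shuffles vertices only within user-supplied equivalence classes. nth_perm returns the n-th permutation of 1:k in lexical order, using the factorial number system and 1-based indexing; the indexing convention of numperm itself may differ.

```r
# Random permutation that only exchanges vertices within the same class.
restricted_perm <- function(classes) {
  perm <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    perm[idx] <- idx[sample.int(length(idx))]   # shuffle within the class
  }
  perm
}

# n-th permutation of 1:k in lexical order (1-based), via the factorial
# number system.
nth_perm <- function(k, n) {
  stopifnot(n >= 1, n <= factorial(k))
  remaining <- seq_len(k)
  r <- n - 1                                    # 0-based rank
  out <- integer(0)
  for (m in k:1) {
    f <- factorial(m - 1)
    pos <- r %/% f                              # index of the next element
    out <- c(out, remaining[pos + 1])
    remaining <- remaining[-(pos + 1)]
    r <- r %% f
  }
  out
}

set.seed(5)
restricted_perm(c("a", "a", "b", "b", "b"))
nth_perm(3, 4)   # 2 3 1
```

Looping n from 1 to factorial(k) over nth_perm enumerates every permutation exactly once, which is the exhaustive-search use mentioned above.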

In addition to the above, two families of graph manipulation functions bear discussing in more detail: functions to compute properties of neighborhoods, and functions for graph visualization. Here we briefly discuss each family in turn, before proceeding to a review of sna's descriptive index routines.

                  Neighborhood and ego net functions

The egocentric network (or “ego net”) of vertex v in graph G is defined as G[v ∪ N(v)], i.e., the subgraph of G induced by v and its neighborhood. ego.extract is a utility function which extracts the egocentric networks of one or more vertices from a given input graph (or set thereof). This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.
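A base R sketch of ego net extraction follows. It forms the subgraph induced by v and its outgoing, incoming, or combined neighbors, and it places ego in the first row and column, the convention ego.extract also follows. ego_net is a hypothetical helper, not sna's implementation.

```r
# Hypothetical helper: ego net of vertex v in digraph g (adjacency matrix),
# using outgoing, incoming, or combined neighborhoods; ego is placed first.
ego_net <- function(g, v, type = c("out", "in", "combined")) {
  type <- match.arg(type)
  nb <- switch(type,
    out      = which(g[v, ] != 0),
    "in"     = which(g[, v] != 0),
    combined = which(g[v, ] != 0 | g[, v] != 0))
  keep <- c(v, setdiff(nb, v))   # ego first, then alters (self-loops ignored)
  g[keep, keep, drop = FALSE]    # subgraph induced by v and N(v)
}

g <- matrix(0, 4, 4)
g[1, 2] <- 1
g[1, 3] <- 1
g[4, 1] <- 1
g[2, 3] <- 1
ego_net(g, 1, "out")        # vertices 1, 2, 3
ego_net(g, 1, "combined")   # vertices 1, 2, 3, 4
```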

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of the vertex's neighborhood. Neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates, which may be supplied as a numeric vector. This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, or the average value of its alters on a specified covariate.

It is also sometimes useful to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). The neighborhood function provides for such computations. For a given graph, it returns the adjacency matrix whose (i, j) cell indicates whether vertex j belongs to vertex i's selected neighborhood.

The 0th order neighborhood is defined to be the identity matrix. For orders k > 0, the result depends on the type of adjacency involved. For input graph G = (V, E), let the base relation R be:

- the underlying graph of G (i.e., G ∪ Gᵀ) if total neighborhoods are sought;
- the transpose of G if incoming neighborhoods are sought;
- G otherwise.

The partial neighborhood structure of order k > 0 on R is then the digraph on V whose edge set consists of the ordered pairs (i, j) having geodesic distance k in R. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to k in R. neighborhood computes either partial or cumulative neighborhoods, of arbitrary order and with arbitrary choice of edge direction.
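These definitions translate directly into base R once geodesic distances are available. The sketch below uses two hypothetical helpers, geodist_bfs and neighborhood_sketch, neither of which is sna's code. It computes distances in the base relation R by breadth-first search and then thresholds them. Order 0 gives the identity matrix. Partial order k keeps pairs at distance exactly k, and cumulative order k keeps pairs at distance 1 through k. Excluding the diagonal (distance 0) from cumulative neighborhoods of order k > 0 is an assumption of this sketch.

```r
# Geodesic distances by breadth-first search from each vertex.
# R: adjacency matrix of the base relation; unreachable pairs get Inf.
geodist_bfs <- function(R) {
  n <- nrow(R)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    d <- 0
    while (length(frontier) > 0) {
      d <- d + 1
      # unvisited vertices reachable in one step from the current frontier
      nxt <- which(colSums(R[frontier, , drop = FALSE]) > 0 & is.infinite(D[s, ]))
      D[s, nxt] <- d
      frontier <- nxt
    }
  }
  D
}

# Partial (distance == k) or cumulative (1 <= distance <= k) neighborhood of
# order k; order 0 is the identity. type selects the base relation R.
neighborhood_sketch <- function(g, k, type = c("out", "in", "total"), partial = TRUE) {
  type <- match.arg(type)
  R <- switch(type, out = g, "in" = t(g), total = ((g + t(g)) > 0) * 1)
  if (k == 0) return(diag(nrow(g)))
  D <- geodist_bfs(R)
  (if (partial) D == k else D >= 1 & D <= k) * 1
}

g <- matrix(0, 4, 4)
g[1, 2] <- 1
g[2, 3] <- 1
g[3, 4] <- 1                                # directed path 1 -> 2 -> 3 -> 4
neighborhood_sketch(g, 2)                   # partial: pairs (1, 3) and (2, 4)
neighborhood_sketch(g, 2, partial = FALSE)  # cumulative: distance 1 or 2
```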

To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below:

R> g <- rgraph(10, tp = 1.5/9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]

$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(g.comb, NROW) <= degree(g) + 1)
[1] TRUE
R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+   sapply(g.comb[ego.size > 2], function(x) gden(x[-1, -1]))
         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, which makes it easy to remove ego where required. This example also uses degree and gden to calculate degree and graph density, respectively; these functions are discussed in more detail below.

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+ cmode = "freeman")/2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0
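To make the neighborhood computation concrete, here is a minimal base-R sketch of the idea behind gapply, assuming a 0/1 adjacency matrix with senders in rows. The function neighbor_apply is hypothetical, ignores loops, and does not implement gapply's distance argument; it only shows how margin 1 (out-neighbors), margin 2 (in-neighbors), and c(1, 2) (either) select the values passed to FUN.

```r
# Hypothetical base-R analogue of the gapply idea (not sna's implementation):
# for each vertex v, gather stat[] over its out-neighbors (margin 1),
# in-neighbors (margin 2) or both (c(1, 2)), then apply FUN. Loops ignored.
neighbor_apply <- function(A, margin, stat, FUN) {
  diag(A) <- 0
  n <- nrow(A)
  sapply(seq_len(n), function(v) {
    nb <- rep(FALSE, n)
    if (1 %in% margin) nb <- nb | A[v, ] != 0
    if (2 %in% margin) nb <- nb | A[, v] != 0
    FUN(stat[nb])
  })
}

A <- matrix(0, 4, 4)
A[1, 2] <- A[1, 3] <- A[2, 3] <- A[4, 1] <- 1   # 1->2, 1->3, 2->3, 4->1
neighbor_apply(A, 1, rep(1, 4), sum)    # outdegrees: 2 1 0 1
neighbor_apply(A, 2, rep(1, 4), sum)    # indegrees:  1 1 2 0
neighbor_apply(A, c(1, 2), 1:4, mean)   # mean label of all neighbors: 3 2 1.5 1
```

Summing a vector of ones over neighbors recovers degree, mirroring the first two checks in the transcript above.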

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+ gplot(neigh[i,,], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+ partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+ gplot(neigh[i,,], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.
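One way to state the partial/cumulative distinction is in terms of geodesic distance: the order-i partial neighborhood of v holds the vertices at distance exactly i from v, while the order-i cumulative neighborhood holds those at distance 1 through i. The base-R sketch below assumes that reading and follows out-edges only; out_distances and neighborhood_matrix are hypothetical helpers, not sna functions.

```r
# Hypothetical helper: geodesic out-distances by breadth-first search from
# every vertex (Inf where no directed path exists).
out_distances <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    while (length(frontier) > 0) {
      reached <- which(colSums(A[frontier, , drop = FALSE] != 0) > 0)
      fresh <- reached[is.infinite(D[s, reached])]   # not yet visited
      D[s, fresh] <- D[s, frontier[1]] + 1
      frontier <- fresh
    }
  }
  D
}

# Hypothetical helper: order-i neighborhood as an adjacency matrix.
# partial = TRUE keeps distance exactly i; FALSE keeps distance 1..i.
neighborhood_matrix <- function(D, i, partial = TRUE) {
  keep <- if (partial) (D == i) else (D >= 1 & D <= i)
  keep * 1
}

P <- matrix(0, 4, 4)
P[1, 2] <- P[2, 3] <- P[3, 4] <- 1                 # directed path 1->2->3->4
D <- out_distances(P)
neighborhood_matrix(D, 2)[1, ]                    # 0 0 1 0
neighborhood_matrix(D, 2, partial = FALSE)[1, ]   # 0 1 1 0
```

Under this reading, the cumulative neighborhood of order i is the sum of the partial neighborhoods of orders 1 through i, which is why the cumulative panels can only gain ties as the order grows.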

                  Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order (panels: Partial Neighborhood of Order 1 through 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order partial neighborhood of $v$.
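As an illustration of the user-supplied layout mechanism described above, the sketch below defines a hypothetical layout routine that spaces vertices evenly on the unit circle. It assumes (to be checked against the gplot.layout manual page) that such routines are called with the adjacency matrix d and a parameter list layout.par, and return an n × 2 matrix of x, y coordinates; the name gplot.layout.evencircle is our own invention.

```r
# Hypothetical user-supplied layout in the assumed gplot.layout.* style:
# arguments d (adjacency matrix) and layout.par (parameter list, unused here);
# returns one row of (x, y) coordinates per vertex, evenly spaced on a circle.
gplot.layout.evencircle <- function(d, layout.par) {
  n <- NROW(d)
  theta <- 2 * pi * (seq_len(n) - 1) / n
  cbind(cos(theta), sin(theta))
}

coords <- gplot.layout.evencircle(matrix(0, 6, 6), list())
round(coords, 3)   # six points on the unit circle, starting at (1, 0)
```

With sna loaded, a call such as gplot(g, mode = "evencircle") would then be expected to find this routine by its gplot.layout. prefix; treat that lookup as an assumption rather than documented behavior.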

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order (panels: Cumulative Neighborhood of Order 1 through 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurve = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+ vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+ displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options (panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options).

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can, in and of itself, be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+ xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10)/10 * 2 * pi), sin((1:10)/10 * 2 * pi),
+ col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+ xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+ offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
+ arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+ 1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

2.3 Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example).

                  Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph-theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_d^+(v,G) \equiv |N^+(v)|$), indegree ($c_d^-(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_d^t(v,G) \equiv c_d^+(v,G) + c_d^-(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{\{v',v''\} \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $g'(v',v,v'',G)/g(v',v'',G)$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

where $n = |V|$ and $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
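The Freeman indices above can all be computed from two all-pairs quantities: geodesic distances and geodesic counts. The following base-R sketch is an illustration under stated assumptions, not sna's implementation: it takes a 0/1 adjacency matrix (senders in rows), drops loops, and sums betweenness and stress over ordered pairs (s, k), so for a symmetric (undirected) matrix each value is twice what the unordered-pair sum in the formula gives. The helpers geodesics and freeman_sketch are hypothetical.

```r
# Hypothetical helper: all-pairs breadth-first search on a 0/1 matrix.
# D[s, k] = geodesic distance s -> k (Inf if unreachable);
# S[s, k] = number of distinct s -> k geodesics.
geodesics <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  S <- matrix(0, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    S[s, s] <- 1
    frontier <- s
    while (length(frontier) > 0) {
      nxt <- integer(0)
      for (u in frontier) {
        for (w in which(A[u, ] != 0)) {
          if (is.infinite(D[s, w])) {        # first time w is reached
            D[s, w] <- D[s, u] + 1
            nxt <- c(nxt, w)
          }
          if (D[s, w] == D[s, u] + 1) {      # u precedes w on a geodesic
            S[s, w] <- S[s, w] + S[s, u]
          }
        }
      }
      frontier <- nxt
    }
  }
  list(D = D, S = S)
}

# Hypothetical helper: degree, closeness, betweenness and stress.
freeman_sketch <- function(A) {
  diag(A) <- 0
  n <- nrow(A)
  g <- geodesics(A)
  bet <- numeric(n)
  stress <- numeric(n)
  for (v in seq_len(n)) {
    for (s in seq_len(n)) {
      for (k in seq_len(n)) {
        if (v == s || v == k || s == k || is.infinite(g$D[s, k])) next
        if (g$D[s, v] + g$D[v, k] == g$D[s, k]) {
          through <- g$S[s, v] * g$S[v, k]   # s -> k geodesics passing via v
          bet[v] <- bet[v] + through / g$S[s, k]
          stress[v] <- stress[v] + through
        }
      }
    }
  }
  list(outdegree = rowSums(A != 0), indegree = colSums(A != 0),
       closeness = (n - 1) / rowSums(g$D),   # an infinite distance gives 0
       betweenness = bet, stress = stress)
}

P <- matrix(0, 3, 3)
P[1, 2] <- P[2, 1] <- P[2, 3] <- P[3, 2] <- 1   # undirected path 1 - 2 - 3
res <- freeman_sketch(P)
res$betweenness   # 0 2 0: ordered pairs 1 -> 3 and 3 -> 1 both pass via 2
res$closeness     # 2/3, 1, 2/3
```

The only difference between the betweenness and stress accumulators is the division by the number of geodesics, which is exactly the distinction the text draws between the two measures.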

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $\mathbf{A}$ (where $\mathbf{A}$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (\mathbf{I} - \beta \mathbf{A})^{-1} \mathbf{A} \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $\mathbf{A}$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
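The three walk-based measures above reduce to a few lines of linear algebra. The base-R sketch below is a hedged illustration, not sna's code. Assumptions: eigenvector centrality takes the first eigenvector returned by eigen() (the principal one for symmetric matrices); the Bonacich scores are rescaled so that their squares sum to $|V|$, a normalization we infer because it reproduces the constant ratio 0.2192645 = sqrt(10/208) in the exponent = 0 transcript later in this section (the outdegrees there have squared sum 208); and information centrality uses the Stephenson–Zelen formula on a graph symmetrized by the "either direction" rule. spectral_sketch is a hypothetical name.

```r
# Hypothetical helper: eigenvector, Bonacich power and information centrality.
spectral_sketch <- function(A, beta) {
  diag(A) <- 0
  n <- nrow(A)
  # Eigenvector centrality: |first eigenvector| as returned by eigen()
  ev <- abs(Re(eigen(A)$vectors[, 1]))
  # Bonacich power: solve (I - beta A) x = A 1, then rescale so sum(x^2) = n
  bp <- as.vector(solve(diag(n) - beta * A, A %*% rep(1, n)))
  bp <- bp * sqrt(n / sum(bp^2))
  # Information centrality on the symmetrized graph X:
  # C = (D - X + J)^{-1}, I_i = 1 / (C_ii + (trace(C) - 2 * rowsum(C)) / n)
  X <- ((A + t(A)) > 0) * 1
  C <- solve(diag(rowSums(X)) - X + 1)
  info <- 1 / (diag(C) + (sum(diag(C)) - 2 * sum(C[1, ])) / n)
  list(evcent = ev, bonpow = bp, infocent = info)
}

star <- matrix(0, 4, 4)
star[1, 2:4] <- 1
star[2:4, 1] <- 1                 # undirected star with hub 1
sc <- spectral_sketch(star, beta = 0)
sc$evcent[1] / sc$evcent[2]       # sqrt(3): hub versus leaf
sc$bonpow                         # proportional to degree when beta = 0
sc$infocent                       # 4/3 for the hub, 0.8 for each leaf
```

The call to solve() fails when beta is exactly the reciprocal of an eigenvalue of A, which is the "where a solution exists" caveat in the text.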

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$, that is, the number of pairs for which $v$ serves as a local bridge. Now let us posit a vector of states $s$ with one entry per vertex, such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
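The role definitions above translate directly into a triple loop over ordered triads. This base-R sketch counts raw role scores only (no moments or z-tests) and is an illustration, not sna's brokerage: it assumes a 0/1 adjacency matrix with senders in rows, requires the three vertices to be distinct, and ignores loops. brokerage_counts is a hypothetical name.

```r
# Hypothetical helper: raw Gould-Fernandez role counts per broker j.
# A triad (i, j, k) counts when i -> j and j -> k are present and i -> k is absent.
brokerage_counts <- function(A, s) {
  diag(A) <- 0
  n <- nrow(A)
  roles <- c("coordinator", "itinerant", "gatekeeper", "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {            # j is the candidate broker
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (i == j || j == k || i == k) next
        if (A[i, j] == 0 || A[j, k] == 0 || A[i, k] != 0) next
        role <- if (s[i] == s[j] && s[j] == s[k]) {
          "coordinator"              # all three share a state
        } else if (s[i] == s[k]) {
          "itinerant"                # endpoints share a state, broker differs
        } else if (s[j] == s[k]) {
          "gatekeeper"               # broker shares the receiver's state
        } else if (s[i] == s[j]) {
          "representative"           # broker shares the sender's state
        } else {
          "liaison"                  # all three states differ
        }
        out[j, role] <- out[j, role] + 1
      }
    }
  }
  cbind(out, total = rowSums(out))
}

A <- matrix(0, 3, 3)
A[1, 2] <- A[2, 3] <- 1                        # 1 -> 2 -> 3, no 1 -> 3 tie
brokerage_counts(A, c("x", "y", "y"))[2, ]     # one gatekeeping triad for vertex 2
```

Ordering the if/else tests this way means each later branch can rely on the earlier inequalities, so the five cases partition all bridged triads.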

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrates, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector centrality and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1/eigen(dat)$values[1], rescale = TRUE) -
+ evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles and, likewise, certain brokerage roles can only be realized when a sufficient number of groups are present; z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

                  Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i, j, k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v,G) \right) - c(v_i,G) \right], \tag{1}$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \tag{2}$$

where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
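Equation (2) and the normalization step can be sketched in a few lines of base R. In this hedged illustration, freeman_centralization and outdegree_max_dev are hypothetical helpers, not sna's centralization. The bound used for outdegree is our own derivation: on an n-vertex digraph without loops the out-star (one vertex sending to all others) attains the largest possible total deviation, (n − 1)², which for n = 10 gives 81; the outdegree centralization 0.1728395 in the transcript later in this section equals 14/81.

```r
# Hypothetical helper: Freeman centralization from a vector of scores,
# raw by default, normalized when a theoretical maximum deviation is given.
freeman_centralization <- function(scores, max_dev = NULL) {
  raw <- length(scores) * (max(scores) - mean(scores))   # equation (2)
  if (is.null(max_dev)) raw else raw / max_dev
}

# Hypothetical helper: maximum total outdegree deviation over n-vertex
# loopless digraphs, attained by the out-star: (n - 1) * (n - 1).
outdegree_max_dev <- function(n) (n - 1)^2

freeman_centralization(c(3, 0, 0, 0), outdegree_max_dev(4))   # 1 (out-star)
freeman_centralization(rep(2, 5))                              # 0 (all equal)
```

Writing the raw index as |V| times (maximum minus mean) avoids an explicit loop and makes the equivalence with equation (1) easy to check numerically.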

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices, supplied respectively by connectedness, efficiency, hierarchy, and lubness, describe the extent to which the structure of an input graph approaches that of an outtree; hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.
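Two of the Krackhardt indices reduce to counting pairs in a reachability graph. The base-R sketch below follows our reading of Krackhardt (1994), not sna's source: connectedness as the fraction of unordered vertex pairs joined by a path once edge direction is ignored, and hierarchy as one minus the fraction of reachable pairs that are mutually reachable. The helper names are hypothetical, and efficiency and lubness are not sketched.

```r
# Hypothetical helper: reachability (transitive closure) by repeated
# squaring until no new pairs appear; the diagonal is cleared at the end.
reachability_sketch <- function(A) {
  R <- (A != 0) * 1
  repeat {
    R_next <- ((R + R %*% R) > 0) * 1
    if (all(R_next == R)) break
    R <- R_next
  }
  diag(R) <- 0
  R
}

# Hypothetical helper: Krackhardt connectedness and hierarchy.
krackhardt_sketch <- function(A) {
  diag(A) <- 0
  n <- nrow(A)
  R <- reachability_sketch(A)
  Rs <- reachability_sketch((A + t(A)) > 0)   # ignore edge direction
  pairs <- n * (n - 1) / 2
  mutual <- sum(R * t(R)) / 2                 # pairs reachable both ways
  reach_any <- sum((R + t(R)) > 0) / 2        # pairs reachable either way
  list(connectedness = sum(Rs) / 2 / pairs,
       hierarchy = 1 - mutual / reach_any)    # NaN if nothing is reachable
}

E2 <- matrix(0, 4, 4)
E2[1, 2] <- E2[3, 4] <- 1          # two separate ties
krackhardt_sketch(E2)              # connectedness 1/3, hierarchy 1
C3 <- matrix(0, 3, 3)
C3[1, 2] <- C3[2, 3] <- C3[3, 1] <- 1
krackhardt_sketch(C3)              # directed 3-cycle: connectedness 1, hierarchy 0
```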

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)  # theoretical maximum deviation (attained by an out-star)
+     return((n-1) * choose(n-1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)  # number of out-2-stars centered on each vertex
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
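A Freeman-type centralization of the kind this wrapper computes divides the summed deviations from the maximum score by the largest value that sum can take. The base-R sketch below illustrates this for outdegree. The function names are hypothetical. The maximum deviation of $(n-1)^2$ for loopless digraphs (an out-star) is our own derivation rather than something taken from sna's source.

```r
# Freeman-style centralization: summed deviations from the largest score,
# divided by the largest value that sum can take for graphs of this order.
freeman_centralization <- function(scores, max_dev) {
  sum(max(scores) - scores) / max_dev
}

# Outdegree centralization of a loopless n-vertex digraph. The maximum
# deviation (n - 1)^2 is attained by an out-star: one vertex sends ties to
# all others and no other ties exist (our derivation, see lead-in).
outdegree_centralization <- function(A) {
  A <- (A != 0) * 1
  diag(A) <- 0
  n <- nrow(A)
  freeman_centralization(rowSums(A), (n - 1)^2)
}

star <- matrix(0, 5, 5); star[1, 2:5] <- 1         # out-star
cycle <- matrix(0, 5, 5)
for (i in 1:5) cycle[i, i %% 5 + 1] <- 1            # directed 5-cycle
outdegree_centralization(star)    # 1
outdegree_centralization(cycle)   # 0
```

Swapping in a different score vector and its own maximum deviation (for example, the out-2-star counts above with $(n-1)\binom{n-1}{2}$) reuses the same normalizing step.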

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist. They return, respectively, the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\big(d(j, k) \le i\big)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
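The structure statistics can be sketched in base R by computing all geodesic distances with breadth-first search and then applying the definition of $s_i$ directly. This minimal illustration assumes an unweighted 0/1 adjacency matrix, with an edge $i \to j$ whenever A[i, j] is nonzero. The helper names are hypothetical, and this is not sna's implementation.

```r
# All-pairs geodesic distances by breadth-first search from each vertex.
# Edge i -> j when A[i, j] != 0; unreachable pairs are Inf.
geodesic_distances <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    while (length(frontier) > 0) {
      # vertices reachable in one step from the current BFS level
      nbrs <- which(colSums(A[frontier, , drop = FALSE] != 0) > 0)
      nbrs <- nbrs[is.infinite(D[s, nbrs])]   # keep only unvisited ones
      D[s, nbrs] <- D[s, frontier[1]] + 1     # whole level shares one distance
      frontier <- nbrs
    }
  }
  D
}

# Structure statistics: s_i = N^-2 * sum_j sum_k I(d(j, k) <= i), i = 0..N-1
structure_stats <- function(A) {
  D <- geodesic_distances(A)
  N <- nrow(A)
  s <- vapply(0:(N - 1), function(i) sum(D <= i) / N^2, numeric(1))
  names(s) <- 0:(N - 1)
  s
}

P <- matrix(0, 3, 3); P[1, 2] <- P[2, 3] <- 1   # directed path 1 -> 2 -> 3
structure_stats(P)                              # 3/9, 5/9, 6/9
```

Since $d(j, j) = 0$, the first statistic is always $s_0 = 1/N$. Unreachable pairs (distance Inf) never count, so $s_{N-1} < 1$ exactly when some vertex cannot reach another.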

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references therein), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities, with mutuality returning only the number of mutual dyads. The triad census, or the frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case. It is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively). Here $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, the dyad census, and the degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbb{E}\,s(H) = s(G), \mathbb{E}\,s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$. The homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., the difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
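A conditional uniform graph test can be sketched as a Monte Carlo loop. Draw graphs uniformly among those with the observed order and edge count, recompute the statistic, and report the two tail proportions. The base-R sketch below uses the number of mutual dyads as the statistic. The helper names are hypothetical, and this is not the cugtest implementation.

```r
# Number of mutual dyads in a 0/1 digraph (loops ignored)
mutual_count <- function(A) {
  up <- upper.tri(A)
  sum(A[up] != 0 & t(A)[up] != 0)
}

# Uniform draw among loopless digraphs with n vertices and exactly m edges
runif_graph_edges <- function(n, m) {
  A <- matrix(0, n, n)
  offdiag <- which(row(A) != col(A))
  A[sample(offdiag, m)] <- 1
  A
}

# Monte Carlo CUG test conditioning on order and edge count. Returns the
# observed statistic and estimates of Pr(t(H) >= t(G)) and Pr(t(H) <= t(G)).
cug_edges <- function(A, stat, reps = 1000) {
  diag(A) <- 0
  obs <- stat(A)
  m <- sum(A != 0)
  sims <- replicate(reps, stat(runif_graph_edges(nrow(A), m)))
  c(observed = obs, upper = mean(sims >= obs), lower = mean(sims <= obs))
}

set.seed(1)
G <- matrix(0, 6, 6)
G[1, 2] <- G[2, 1] <- G[3, 4] <- G[4, 3] <- G[5, 6] <- G[6, 5] <- 1
cug_edges(G, mutual_count, reps = 500)   # 3 mutual dyads among 6 edges
```

Conditioning only on order would instead draw each edge independently with probability 0.5. Replacing runif_graph_edges with such a draw gives that weaker null.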

                  Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00
R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50
R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43
R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic. We first test against a CUG hypothesis conditioning only on order, and then against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.3333333
                1stQ:    -0.06666667
                Med:     0
                Mean:    -0.001288889
                3rdQ:    0.06666667
                Max:     0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.06666667
                1stQ:    0.1555556
                Med:     0.2222222
                Mean:    0.2215333
                3rdQ:    0.2888889
                Max:     0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties. Graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. All vertices occupying a given position connect to other positions in precisely the same way. Analyses of relations among positions (via their reduced form, the blockmodel; see below) can therefore often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities or differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
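A Hamming-distance version of approximate structural equivalence can be sketched in base R by comparing each pair of vertices' rows (out-ties) and columns (in-ties). Following the definition above, the sketch skips the cells involving the pair itself. Whether sna's sedist treats those cells the same way is an assumption we have not checked, and the function name is hypothetical. The resulting matrix can be handed to hclust via as.dist.

```r
# Hamming structural-equivalence distances for a 0/1 digraph. For vertices
# i and j, count disagreements between their rows (out-ties) and columns
# (in-ties), skipping the cells that involve i or j themselves, as in the
# definition N(v) \ {v'} = N(v') \ {v}.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i == j) next
    others <- setdiff(seq_len(n), c(i, j))
    D[i, j] <- sum(A[i, others] != A[j, others]) +
               sum(A[others, i] != A[others, j])
  }
  D
}

S <- matrix(0, 4, 4); S[1, 2:4] <- 1      # vertex 1 sends to 2, 3, 4
D <- se_hamming(S)
hc <- hclust(as.dist(D))                  # vertices 2, 3, 4 merge at height 0
```

A distance of zero marks exact structural equivalence. Small positive distances are the "approximately equivalent" pairs that the clustering step then groups.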


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. The blockmodel function takes a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs. It forms a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or by number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class, and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (the mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives. Categorical types (e.g., null, 1-covered, etc.) are also supported. Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability. Random graphs are then drawn from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
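The density block type and its expansion can be illustrated with two small base-R helpers. One averages tie values within each pair of classes (ignoring the diagonal). The other turns those densities back into edge probabilities for a new graph with chosen class sizes. The names and argument conventions are hypothetical; this sketches the idea and is not sna's blockmodel code.

```r
# Density blockmodel: mean tie value within each (class, class) block,
# ignoring the diagonal of the adjacency matrix.
block_density <- function(A, cls) {
  diag(A) <- NA
  k <- sort(unique(cls))
  B <- matrix(NA_real_, length(k), length(k), dimnames = list(k, k))
  for (a in seq_along(k)) for (b in seq_along(k)) {
    B[a, b] <- mean(A[cls == k[a], cls == k[b]], na.rm = TRUE)
  }
  B
}

# Expansion: read each block density as an edge probability and draw a new
# graph with sizes[a] vertices in class a (an inhomogeneous Bernoulli graph).
block_expand <- function(B, sizes) {
  cls <- rep(seq_len(nrow(B)), times = sizes)
  P <- B[cls, cls, drop = FALSE]
  P[is.nan(P)] <- 0          # singleton classes have no off-diagonal cells
  G <- matrix(rbinom(length(P), 1, P), nrow(P))
  diag(G) <- 0
  G
}

A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- A[1, 3] <- A[1, 4] <- A[2, 3] <- 1
B <- block_density(A, c(1, 1, 2, 2))
set.seed(2)
G <- block_expand(B, c(3, 2))   # 5 vertices: 3 in class 1, 2 in class 2
```

The expanded graph has a different order from the original but the same block image in expectation. This is the property that makes expansion useful for simulation.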

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not. Diagonals should therefore not be neglected when analyzing the results of graph compositions.
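Because the composition is a Boolean matrix product, it takes one line of base R. The sketch below uses a hypothetical name and is not sna's operator. It keeps the diagonal so that loops created by composition remain visible.

```r
# Composition of two graphs on one vertex set: (v, w) is an edge iff some u
# has v -> u in the first graph and u -> w in the second, i.e. the Boolean
# product of the adjacency matrices. The diagonal is kept on purpose.
compose_graphs <- function(A1, A2) {
  ((A1 != 0) %*% (A2 != 0) > 0) * 1
}

M <- matrix(c(0, 1, 1, 0), 2)               # mutual pair 1 <-> 2
compose_graphs(M, M)                        # loops at 1 and 2, nothing else
P <- matrix(0, 3, 3); P[1, 2] <- P[2, 3] <- 1
compose_graphs(P, P)                        # only 1 -> 3
```

The mutual pair shows why diagonals matter: composing a loopless graph with itself yields a loop at every vertex that has a reciprocated tie.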

                  Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original. This is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- t(sapply(runif(20, 0, 1), rep, 20))  # row i: sender i's tie probability
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets. When entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \tag{3}$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, the sums run over ordered pairs of distinct vertices, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
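Equation (3) and the associated correlation and Hamming distance translate directly into base R. The sketch below assumes 0/1 (or valued) adjacency matrices on a common vertex set. It uses only off-diagonal cells, matching the $|V|(|V|-1)$ normalization. The helper names are hypothetical and are not sna's gcov, gcor, and hdist.

```r
# Off-diagonal cells, i.e. the edge variables for ordered pairs (i, j), i != j
offdiag_values <- function(A) A[row(A) != col(A)]

# Graph covariance as in Equation (3): means and sums over the |V|(|V| - 1)
# off-diagonal cells
graph_cov <- function(AG, AH) {
  g <- offdiag_values(AG)
  h <- offdiag_values(AH)
  sum((g - mean(g)) * (h - mean(h))) / length(g)
}

graph_cor <- function(AG, AH) {
  graph_cov(AG, AH) / sqrt(graph_cov(AG, AG) * graph_cov(AH, AH))
}

# Hamming distance: number of edge variables on which the graphs disagree
hamming <- function(AG, AH) sum(offdiag_values(AG) != offdiag_values(AH))

A <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 1, 0), 3, byrow = TRUE)
graph_cor(A, A)       # 1
graph_cor(A, 1 - A)   # -1 (off-diagonal complement)
hamming(A, 1 - A)     # 6 = |V|(|V| - 1)
```

The common denominator cancels in the correlation, so graph_cor agrees with an ordinary Pearson correlation of the two vectors of off-diagonal cells.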

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances and covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings. This decomposes the total distance or covariance into two parts: one attributable to fixed aspects of the structure (the structural component), and one that depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
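For very small graphs, the structural correlation can be found exactly by brute force. Relabel the second graph under every vertex permutation and keep the largest graph correlation. The base-R sketch below does this with hypothetical helper names. Its cost grows as $n!$, which is why heuristic search is used in practice.

```r
# All permutations of 1..n as rows of an n! x n matrix (fine for small n)
all_perms <- function(n) {
  if (n == 1) return(matrix(1, 1, 1))
  p <- all_perms(n - 1)
  # insert n at each position k of every permutation of 1..(n - 1)
  do.call(rbind, lapply(seq_len(n), function(k) {
    cbind(p[, seq_len(k - 1), drop = FALSE], n,
          p[, seq_len(n - 1) >= k, drop = FALSE])
  }))
}

graph_cor <- function(AG, AH) {
  off <- row(AG) != col(AG)
  cor(AG[off], AH[off])   # NA (with a warning) if either graph is constant
}

# Structural correlation by exhaustive search: the largest graph correlation
# between G and any relabeling of H (rows and columns permuted together)
structural_cor <- function(AG, AH) {
  perms <- all_perms(nrow(AH))
  max(apply(perms, 1, function(p) graph_cor(AG, AH[p, p])))
}

A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 3] <- A[3, 1] <- A[3, 4] <- 1
B <- A[c(3, 1, 4, 2), c(3, 1, 4, 2)]    # same graph, vertices relabeled
graph_cor(A, B)         # -0.125
structural_cor(A, B)    # 1
```

The labeled correlation of two isomorphic graphs can be far from 1, while the structural correlation recovers the isomorphism. This is the same contrast shown with gcor and gscor in the example below.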

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes the following:

- centralgraph, which returns the graph minimizing the Hamming distance to all graphs in the input set;
- gclust.boxstats, which shows distributions of graph statistics based on a hierarchical clustering of networks;
- gclust.centralgraph, which returns the central graphs for each element of a network clustering solution;
- gdist.plotdiff, which plots distances between networks against differences in their properties;
- gdist.plotstats, which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure.

Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
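Two of the ideas above are easy to sketch in base R for a stack of graphs stored as an m x n x n array. The first is a central graph by edgewise majority, which for binary graphs minimizes the summed Hamming distance to the inputs. The second is network principal components via eigen applied to the matrix of graph correlations. The names are hypothetical; this is not sna's centralgraph.

```r
# Central graph of a stack of 0/1 graphs stored as an m x n x n array:
# keep an edge iff it appears in more than half of the graphs. Edgewise
# majority minimizes the summed Hamming distance to the inputs (ties, which
# can occur when m is even, are resolved toward 0 here).
central_graph <- function(G) {
  (apply(G, c(2, 3), mean) > 0.5) * 1
}

# Network PCA sketch: eigen-decomposition of the m x m matrix of graph
# correlations computed over off-diagonal cells
net_pca <- function(G) {
  off <- row(G[1, , ]) != col(G[1, , ])
  X <- sapply(seq_len(dim(G)[1]), function(k) G[k, , ][off])  # column = graph
  eigen(cor(X), symmetric = TRUE)
}

G <- array(0, c(3, 3, 3))
G[1, 1, 2] <- G[2, 1, 2] <- 1    # graphs 1 and 2: edge 1 -> 2
G[3, 2, 3] <- 1                  # graph 3: edge 2 -> 3
central_graph(G)                 # only 1 -> 2 survives
net_pca(G)$values                # eigenvalues sum to 3 (number of graphs)
```

The majority rule works because the summed Hamming distance splits into independent per-cell terms, each minimized by the more common value in that cell.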

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses. These test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available where conditioning on the entire observed structure is inappropriate.
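A QAP test for the graph correlation can be sketched by repeatedly relabeling the vertices of one graph and recomputing the correlation; relabeling permutes its rows and columns together. The base-R version below uses hypothetical names and a plain Monte Carlo estimate. It is not the qaptest implementation.

```r
graph_cor <- function(AG, AH) {
  off <- row(AG) != col(AG)
  cor(AG[off], AH[off])
}

# Monte Carlo QAP test for the graph correlation: the null distribution is
# obtained by relabeling the vertices of the second graph at random
# (permuting its rows and columns together) and recomputing the statistic.
qap_cor_test <- function(AG, AH, reps = 1000) {
  obs <- graph_cor(AG, AH)
  sims <- replicate(reps, {
    p <- sample(nrow(AH))
    graph_cor(AG, AH[p, p])
  })
  list(observed = obs,
       p_greater = mean(sims >= obs),   # estimate of Pr(rho_perm >= rho_obs)
       p_less    = mean(sims <= obs))   # estimate of Pr(rho_perm <= rho_obs)
}

set.seed(3)
A <- matrix(rbinom(64, 1, 0.4), 8); diag(A) <- 0
r <- qap_cor_test(A, A, reps = 500)
r$observed     # 1, since the graph is compared with itself
```

Permuting whole vertices, rather than individual cells, keeps each graph's internal structure (degrees, reciprocity, and so on) intact under the null. This is what distinguishes QAP from a naive cellwise permutation test.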


                  Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size. See the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1     Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
         352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

   0   1
0  0   0
1 39 341

        Total Fraction Correct: 0.8973684
        Fraction Predicted 1s Correct: 0.8973684
        Fraction Predicted 0s Correct: NaN
        False Negative Rate: 0
        False Positive Rate: 1

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties. This is largely a consequence of the high density of the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. The package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar). For this purpose, however, users are strongly recommended to employ the more modern tools of the ergm package. There are several other models for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators. Model-based approaches such as the consensus model of Batchelder and Romney (1988) are also supported. The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. The likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model whenever the sum of false positive and false negative rates is less than 1. The two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992), in the simplified form of Gelman et al. (1995), can be applied via potscalered.mcmc to assess convergence. bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates. The helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
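The base-R sketch below illustrates how known error rates turn reports into evidence about a single edge. It applies Bayes' rule under three simplifying assumptions:

- the false positive and false negative rates are fixed and known;
- reports are conditionally independent given the true edge state;
- the prior is a single Bernoulli probability.

It is not bbnam, which samples the whole graph and (where unknown) the error rates by Gibbs sampling. The function name is hypothetical.

```r
# Posterior probability that one edge exists, given binary reports from
# several sources, a prior edge probability, and known false positive (fp)
# and false negative (fn) rates (scalars, or one value per source). Reports
# are assumed conditionally independent given the true edge state.
edge_posterior <- function(reports, prior, fp, fn) {
  lik1 <- prod(ifelse(reports == 1, 1 - fn, fn))   # P(reports | edge)
  lik0 <- prod(ifelse(reports == 1, fp, 1 - fp))   # P(reports | no edge)
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

edge_posterior(1, prior = 0.5, fp = 0.1, fn = 0.2)            # 8/9
edge_posterior(c(1, 0, 1), prior = 0.2, fp = 0.5, fn = 0.5)   # 0.2
```

The second call shows an uninformative source (fp = fn = 0.5), for which the posterior equals the prior. If fp + fn exceeds 1, a positive report actually lowers the posterior. This is the "negatively informative" case mentioned above.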

sna also supports methods for estimating biased net parameters, as described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members in turn may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not known in general; as such, estimation of the model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
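The conditional tie probability above is easy to evaluate directly once its ingredients are known. The following base R sketch assumes the baseline probability Pr(B_e), the bias event probabilities Pr(B_k), and the counts s_k(i, j, T) are already given; the function name biased_tie_prob and the two example bias types are hypothetical and are not part of the bn interface.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^{s_k(i, j, T)}
biased_tie_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s), all(s >= 0))
  # The tie fails to form only if the baseline event and every one of the
  # s_k independent bias events of each type all fail to occur.
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Baseline 0.1; bias type 1 (prob 0.3) has had one opportunity,
# bias type 2 (prob 0.2) has had two.
biased_tie_prob(0.1, p_bias = c(0.3, 0.2), s = c(1, 2))
# 1 - 0.9 * 0.7 * 0.64 = 0.5968
```

With all counts set to zero the expression reduces to the baseline probability, which is a quick sanity check on any implementation.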

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990; and see Cliff and Ord 1973 and Anselin 1988 for the equivalent class of spatial autocorrelation models) constitute one important family of processes often used for this purpose. These models are of the form

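$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$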

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the AR term's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computing sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
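As a point of reference for the autocorrelation indices mentioned above, here is a minimal base R sketch of the textbook Moran's I and Geary's C for a response vector y and a weight matrix W. It illustrates the standard formulas only; nacf's own conventions (for instance, neighborhood order or weight normalization) may differ, and the function names morans_i and gearys_c are hypothetical.

```r
# Textbook Moran's I: (n / S0) * sum_ij w_ij d_i d_j / sum_i d_i^2,
# where d = y - mean(y) and S0 = sum_ij w_ij.
morans_i <- function(y, W) {
  d <- y - mean(y)
  n <- length(y)
  (n / sum(W)) * sum(W * outer(d, d)) / sum(d^2)
}

# Textbook Geary's C: (n - 1) * sum_ij w_ij (y_i - y_j)^2 / (2 * S0 * sum_i d_i^2).
gearys_c <- function(y, W) {
  d <- y - mean(y)
  n <- length(y)
  (n - 1) * sum(W * outer(y, y, "-")^2) / (2 * sum(W) * sum(d^2))
}

# A symmetric path 1 - 2 - 3 - 4 with a steadily increasing response.
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)
y <- c(1, 2, 3, 4)
morans_i(y, W)  # 1/3: adjacent vertices have similar values
gearys_c(y, W)  # 0.3: below 1, also indicating positive autocorrelation
```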


                  Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary method for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

                  Multiple Error Probability Model

                  Marginal Posterior Network Distribution

                  a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

                  Journal of Statistical Software 41

                  a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

                  a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

                  Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

                  Marginal Posterior Error Distribution (by observer)

                  Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

                  Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

                  MCMC Diagnostics

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
  Max: 1.003116
  Med: 0.9992194
  IQR: 0.0004545115
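The potential scale reduction reported above can also be computed by hand. The base R sketch below uses the simplified Gelman and Rubin form (between-chain and within-chain variances, no degrees-of-freedom correction), as we understand it; it is not the code of potscalered.mcmc, and the function name psrf is hypothetical. Each column of the input holds one chain of post-burn-in draws for a single scalar quantity.

```r
# Simplified Gelman-Rubin potential scale reduction, sqrt(Rhat), for one
# scalar parameter. Columns of `chains` are chains; rows are draws.
psrf <- function(chains) {
  chains <- as.matrix(chains)
  n <- nrow(chains)                      # draws per chain
  m <- ncol(chains)                      # number of chains
  stopifnot(n > 1, m > 1)
  B <- n * var(colMeans(chains))         # between-chain variance
  W <- mean(apply(chains, 2, var))       # mean within-chain variance
  var_plus <- (n - 1) / n * W + B / n    # pooled estimate of the posterior variance
  sqrt(var_plus / W)
}

set.seed(42)
mixed <- matrix(rnorm(500), ncol = 5)    # five well-mixed chains
psrf(mixed)                              # close to 1
stuck <- cbind(rnorm(100), rnorm(100, mean = 5))
psrf(stuck)                              # far above 1: chains disagree
```

Values near 1, like those in the bbnam summary, suggest the chains are sampling the same distribution; values well above 1 indicate that more burn-in or more draws are needed.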

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions that are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data themselves.
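Two pieces of arithmetic help unpack this paragraph. The first compares the beta(2, 11) prior mean with the means of the beta(1, 25) and beta(15, 25) distributions used to simulate the error rates; the second computes how much a single report shifts the odds of a tie under the reporting model, which shows why e^- + e^+ > 1 is called “perverse”. This is a minimal base R sketch; the helper names beta_mean and report_lr are hypothetical.

```r
# Mean of a beta(a, b) distribution.
beta_mean <- function(a, b) a / (a + b)

beta_mean(1, 25)   # simulated false positive rates: about 0.038
beta_mean(15, 25)  # simulated false negative rates: 0.375
beta_mean(2, 11)   # prior mean used for both: about 0.154

# Likelihood ratio Pr(report | tie) / Pr(report | no tie) for one report.
report_lr <- function(e_minus, e_plus) {
  c(positive = (1 - e_minus) / e_plus,   # informant says "tie present"
    negative = e_minus / (1 - e_plus))   # informant says "tie absent"
}

report_lr(0.375, 0.038)  # positive > 1, negative < 1: reports are informative
report_lr(0.7, 0.5)      # e^- + e^+ > 1: positive < 1, negative > 1
```

A positive report raises the odds of a tie exactly when 1 - e^- > e^+, that is, when e^- + e^+ < 1; in the perverse region each report points the wrong way, which is why priors concentrated there can mislead the estimator.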

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection LAS, the central graph, and the Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives), while the central graph and Romney-Batchelder models perform far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
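The three data-analytic estimators compared above are simple enough to sketch in base R. The sketch assumes, as in the example, an array dat in which dat[k, i, j] is informant k's report of the i -> j tie and informant k is also vertex k. The LAS rules use only the reports of the two vertices involved, and the central graph is taken here to be the majority-rule graph (ties reported by at least half of the informants). This is an illustrative re-implementation with hypothetical function names, not sna's consensus code, whose details (e.g., handling of exactly one half) may differ.

```r
# Locally aggregated structure: the i -> j tie is judged using only the
# reports of i and j themselves (Krackhardt 1987a).
las_estimate <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  stopifnot(dim(dat)[1] == n)            # one informant per vertex
  est <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i == j) next
    r_i <- dat[i, i, j]                  # sender's own report
    r_j <- dat[j, i, j]                  # receiver's own report
    est[i, j] <- if (rule == "intersection") min(r_i, r_j) else max(r_i, r_j)
  }
  est
}

# Majority-rule graph: keep ties reported by at least half of all informants.
central_graph <- function(dat) {
  (apply(dat, c(2, 3), mean) >= 0.5) * 1
}

# Toy data: three informants reporting on three vertices.
dat <- array(0, dim = c(3, 3, 3))
dat[1, 1, 2] <- 1; dat[2, 1, 2] <- 1     # both endpoints report 1 -> 2
dat[3, 2, 3] <- 1; dat[1, 2, 3] <- 1     # receiver 3 and third party 1 report 2 -> 3
sum(las_estimate(dat, "intersection"))   # 1: only 1 -> 2 survives
sum(las_estimate(dat, "union"))          # 2
sum(central_graph(dat))                  # 2: each tie has 2 of 3 reports
```

The toy case shows the mechanism behind the intersection LAS's poor showing: a single false negative from either endpoint is enough to delete a tie, whereas the central graph can still recover it from third-party reports.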

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example that includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
AIC: -100.9 BIC: -85.65

Null model: meanstd
Null log likelihood: -82.48 on 48 degrees of freedom
AIC: 169.0 BIC: 172.8
AIC difference (model versus null): 269.9
Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot”, which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
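For intuition about the net influence plot, consider the pure AR case of Equation 4 with a single weight matrix: solving for y gives y = (I - θW)^{-1}(Xβ + ε), so entry [j, i] of (I - θW)^{-1} measures how a unit shock at i propagates to j, directly and along longer walks. The base R sketch below computes that matrix (transposed, so that row i, column j is i's influence on j) and flags off-diagonal pairs at least k standard deviations from the mean. This is our reading of the idea, not the code behind plot(fit); in particular, how lnam combines the AR and MA terms and treats the diagonal may differ, and the function names are hypothetical.

```r
# Influence matrix for y = theta * W y + X beta + e, i.e.
# y = solve(I - theta * W) %*% (X beta + e). Transposed so that
# infl[i, j] is the effect of a unit shock at i on j's response.
net_influence <- function(W, theta) {
  n <- nrow(W)
  t(solve(diag(n) - theta * W))
}

# Flag off-diagonal pairs at least k standard deviations above or below the
# mean off-diagonal influence (the rule described in the text uses k = 2).
flag_influence <- function(infl, k = 2) {
  off <- row(infl) != col(infl)
  m <- mean(infl[off])
  s <- sd(infl[off])
  list(high = off & infl >= m + k * s,
       low  = off & infl <= m - k * s)
}

# Toy chain: W[i, i + 1] = 1, so each vertex's response depends on the next one.
W <- matrix(0, 5, 5)
W[cbind(1:4, 2:5)] <- 1
infl <- net_influence(W, theta = 0.3)
infl[5, 1]                               # 0.3^4 = 0.0081: a four-step path
sum(flag_influence(infl, k = 1)$high)    # 4: the direct, one-step effects
```

In this toy chain, influence decays geometrically with path length, so only the one-step effects stand out once a threshold is applied.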

                  3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines that covers many of the methods currently seeing wide use within the field. It is hoped that the inclusion of such tools, together with the other packages of the statnet ensemble, within a freely available and widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                  Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Four panels: “Fitted vs. Observed Values”; “Fitted Values vs. Estimated Disturbances” (ν); “Normal Q-Q Residual Plot” (theoretical vs. sample quantiles); and “Net Influence Plot”.]

                  References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). “Social Structure from Multiple Networks. II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project <http://statnetproject.org/>, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software: Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project <http://statnetproject.org/>, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                  Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008
Submitted: 2007-06-01; Accepted: 2007-12-25



with the magnitude of the specified effects depending on the exact choice of parameters.

Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) "rewiring" process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

R> g <- matrix(0, 10, 10)
R> g[1,] <- 1
R> g2 <- rewire.ws(g, 0.5)[1,,]
R> g2
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    1    1    1    1    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     1
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    0    1    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    1    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    1     0
R> sum(g - g2) == 0
[1] TRUE
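The rewiring idea can also be illustrated without sna. What follows is a minimal base-R sketch under simplifying assumptions, so it is not a reimplementation of rewire.ws. The function name rewire_sketch is hypothetical, loops are never created, and a selected edge always keeps its first endpoint. Its only purpose is to show why a process of this kind conserves edges.

```r
# Hypothetical, simplified sketch of edge-conserving rewiring (not sna's rewire.ws).
# Each non-loop edge (i, j) is, with probability p, moved to a currently null
# dyad (i, k) that shares the endpoint i. Always keeping i is an assumption here.
rewire_sketch <- function(A, p) {
  n <- nrow(A)
  edges <- which(A == 1 & row(A) != col(A), arr.ind = TRUE)  # original non-loop edges
  for (e in seq_len(nrow(edges))) {
    i <- edges[e, 1]
    j <- edges[e, 2]
    if (runif(1) < p) {
      cand <- which(A[i, ] == 0 & seq_len(n) != i)  # null dyads (i, k), no loops
      if (length(cand) > 0) {
        k <- cand[sample.int(length(cand), 1)]
        A[i, j] <- 0   # remove the old edge ...
        A[i, k] <- 1   # ... and place it on the chosen null dyad
      }
    }
  }
  A
}

set.seed(1)
ring <- matrix(0, 10, 10)
ring[cbind(1:10, c(2:10, 1))] <- 1    # directed cycle 1 -> 2 -> ... -> 10 -> 1
ring2 <- rewire_sketch(ring, 0.5)
sum(ring) == sum(ring2)               # edges are conserved
all(rowSums(ring) == rowSums(ring2))  # outdegrees are conserved too, since i is kept
```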

Another example of an edge-preserving random transformation is the random permutation of vertex order. rmperm can be employed for this purpose, as in the following permutation of the graph g2 above:

R> g3 <- rmperm(g2)
R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))
[1] TRUE

Row/column permutation preserves the "unlabeled" structure of the input graph (i.e., it draws from the graph's isomorphism class) and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
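In base R, relabeling vertices comes down to applying a single permutation to the rows and the columns together, that is, A[p, p]. The sketch below does not use sna or rmperm. It simply confirms that several invariants of the unlabeled structure are unchanged, and that the inverse permutation recovers the original matrix.

```r
# Base-R illustration of vertex relabeling (not sna's rmperm implementation).
set.seed(7)
n <- 10
A <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(A) <- 0
p <- sample.int(n)        # uniform random permutation of 1:n
A_perm <- A[p, p]         # permute rows and columns with the same p

sum(A_perm) == sum(A)                            # edge count is unchanged
all(sort(colSums(A_perm)) == sort(colSums(A)))   # indegree sequence is unchanged
all(sort(rowSums(A_perm)) == sort(rowSums(A)))   # outdegree sequence is unchanged
all(A_perm[order(p), order(p)] == A)             # the inverse permutation recovers A
```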

2.2 Visualization and data manipulation

Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic. For instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edge.perturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002) and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edge.perturbation is flexible and can be used with any graph-level index function. Its use is straightforward, e.g.:

R> g <- rgraph(5)
R> eval.edge.perturbation(g, 1, 2, centralization, betweenness)
[1] 0.07291667

Unfortunately, the drawback to the flexibility of this routine is its inefficiency. eval.edge.perturbation cannot take advantage of any special properties of the change score being calculated, and hence is inefficient for properties such as triad counts, whose changes can be calculated much more quickly than the base statistic. This function is therefore a useful utility for simple exploratory applications, and does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
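The change-score idea behind eval.edge.perturbation fits in a few lines of base R. In this sketch the helper names edge_change and mutuals are hypothetical, and the mutual-dyad count is simply a convenient example statistic. The last two lines compare the generic computation, which evaluates the full statistic twice, with a specialized change score. Adding edge (i, j) creates a mutual dyad exactly when the reverse tie (j, i) is already present.

```r
# Hypothetical base-R sketch of the idea behind eval.edge.perturbation (not sna's code):
# the change in a graph statistic when edge (i, j) is forced present vs. forced absent.
edge_change <- function(g, i, j, stat) {
  g_on <- g
  g_on[i, j] <- 1     # force the edge present
  g_off <- g
  g_off[i, j] <- 0    # force the edge absent
  stat(g_on) - stat(g_off)
}

# Example statistic: number of mutual (reciprocated) dyads, ignoring loops.
mutuals <- function(g) sum((g * t(g))[upper.tri(g)])

set.seed(42)
g <- matrix(rbinom(100, 1, 0.4), 10, 10)
diag(g) <- 0
edge_change(g, 1, 2, mutuals)   # generic: recomputes the whole statistic twice
g[2, 1]                         # specialized change score: only the reverse tie matters
```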

Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the "identities" of its vertices while leaving the underlying "unlabeled" structure unchanged.) Although not graph manipulation functions per se, these routines are important for generating restricted permutations for use in QAP tests (Hubert 1987) and for comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
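Both kinds of permutation vector are easy to sketch in base R. The functions restricted_perm and nth_perm below are hypothetical stand-ins rather than sna's rperm and numperm. Their argument conventions, a vector of class labels for the first and a 1-based rank for the second, are assumptions. The second function uses the factorial number system to go directly to a given position in lexical order without enumerating the earlier permutations.

```r
# Hypothetical base-R sketches (not sna's rperm/numperm; conventions are assumptions).

# Restricted random permutation: positions are shuffled only within their class.
restricted_perm <- function(classes) {
  p <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    p[idx] <- idx[sample.int(length(idx))]   # shuffle indices within this class
  }
  p
}

# n-th permutation (1-based) of 1:k in lexical order, via the factorial number system.
nth_perm <- function(k, n) {
  pool <- seq_len(k)
  r <- n - 1                 # 0-based rank
  out <- integer(0)
  for (m in k:1) {
    f <- factorial(m - 1)
    d <- r %/% f             # index of the next element among those remaining
    out <- c(out, pool[d + 1])
    pool <- pool[-(d + 1)]
    r <- r %% f
  }
  out
}

set.seed(11)
restricted_perm(c(1, 1, 2, 2, 2, 3))   # vertex 6 (alone in its class) stays fixed
nth_perm(3, 4)                         # 4th of 123, 132, 213, 231, 312, 321: 2 3 1
```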

In addition to the above, two families of graph manipulation functions deserve more detailed discussion: functions to compute properties of neighborhoods, and functions for graph visualization. Here we briefly discuss each family in turn, before proceeding to a review of sna's descriptive index routines.

                    Neighborhood and ego net functions

The egocentric network (or "ego net") of vertex $v$ in graph $G$ is defined as $G[v \cup N(v)]$ (i.e., the subgraph of $G$ induced by $v$ and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.
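A minimal base-R sketch of ego net extraction follows. The name ego_net and its arguments are hypothetical. It follows the definition above and puts ego in the first row and column. Because of this, an outgoing ego net on a loopless graph has exactly outdegree + 1 rows.

```r
# Hypothetical base-R sketch of ego net extraction (not sna's ego.extract).
# Returns the subgraph induced by v and its neighborhood, with ego in row/column 1.
ego_net <- function(A, v, neighborhood = c("combined", "in", "out")) {
  neighborhood <- match.arg(neighborhood)
  out_nb <- which(A[v, ] != 0)   # alters that v sends ties to
  in_nb  <- which(A[, v] != 0)   # alters that send ties to v
  nb <- switch(neighborhood,
               "out" = out_nb,
               "in" = in_nb,
               "combined" = union(out_nb, in_nb))
  keep <- c(v, setdiff(nb, v))   # ego first; loops are ignored
  A[keep, keep, drop = FALSE]
}

set.seed(3)
A <- matrix(rbinom(100, 1, 0.2), 10, 10)
diag(A) <- 0
sizes_out <- sapply(1:10, function(v) nrow(ego_net(A, v, "out")))
all(sizes_out == rowSums(A) + 1)   # out ego net size is outdegree + 1
```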

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood. Neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.
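The neighbor-covariate computation can be sketched in base R for first-order neighborhoods only. The function neighbor_apply is hypothetical. Its margin convention (1 for out-neighbors taken from rows, 2 for in-neighbors taken from columns) is an assumption modeled on the gapply calls that appear later in this section. Summing a vector of ones over out-neighbors reproduces the outdegree.

```r
# Hypothetical sketch in the spirit of gapply (not sna's implementation): apply FUN
# to a covariate x over each vertex's first-order neighbors. margin = 1 uses
# out-neighbors (rows), 2 uses in-neighbors (columns), c(1, 2) uses both.
neighbor_apply <- function(A, margin, x, FUN) {
  sapply(seq_len(nrow(A)), function(v) {
    nb <- integer(0)
    if (1 %in% margin) nb <- union(nb, which(A[v, ] != 0))
    if (2 %in% margin) nb <- union(nb, which(A[, v] != 0))
    FUN(x[setdiff(nb, v)])   # ego itself is excluded
  })
}

set.seed(5)
A <- matrix(rbinom(36, 1, 0.5), 6, 6)
diag(A) <- 0
neighbor_apply(A, 1, rep(1, 6), sum)   # equals the outdegrees, rowSums(A)
neighbor_apply(A, c(1, 2), 1:6, mean)  # mean label among each vertex's alters
```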

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose $(i, j)$ cell is an indicator for the membership of vertex $j$ in vertex $i$'s selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix, and for orders $k > 0$ depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation $R$ be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i, j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
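The partial and cumulative definitions can be checked on a small directed path using the base-R sketch below. The function names are hypothetical, and geodesic distances on the base relation $R$ are found by breadth-first search. The sketch leaves each vertex out of its own cumulative neighborhood. That choice belongs to this sketch and is not a description of sna's behavior.

```r
# Hypothetical base-R sketch of partial and cumulative neighborhoods (not sna's code).
geodist_bfs <- function(R) {
  n <- nrow(R)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    d <- 0
    while (length(frontier) > 0) {
      d <- d + 1
      # unvisited vertices one step away from the current frontier
      nxt <- which(colSums(R[frontier, , drop = FALSE]) > 0 & is.infinite(D[s, ]))
      D[s, nxt] <- d
      frontier <- nxt
    }
  }
  D
}

neighborhood_sketch <- function(A, order, type = c("out", "in", "total"), partial = TRUE) {
  type <- match.arg(type)
  if (order == 0) return(diag(nrow(A)))   # order 0: identity matrix
  R <- switch(type, "out" = A, "in" = t(A), "total" = 1 * ((A + t(A)) > 0))
  D <- geodist_bfs(R)
  # Partial: distance exactly k. Cumulative: distance 1..k (self pairs excluded here).
  if (partial) 1 * (D == order) else 1 * (D >= 1 & D <= order)
}

A <- matrix(0, 4, 4)
A[1, 2] <- 1; A[2, 3] <- 1; A[3, 4] <- 1     # directed path 1 -> 2 -> 3 -> 4
neighborhood_sketch(A, 2)                    # 1s at (1, 3) and (2, 4)
neighborhood_sketch(A, 2, partial = FALSE)   # adds the order-1 pairs as well
```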

To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets can then be passed easily to other analyses, as seen below:

R> g <- rgraph(10, tp = 1.5/9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]
$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(g.comb, NROW) <= degree(g) + 1)
[1] TRUE
R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+   sapply(g.comb[ego.size > 2], function(x) gden(x[-1, -1]))
         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.
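As a worked check of the alter-only density: the ego net printed above for vertex 2 contains a single tie among its three alters. Over $3 \cdot 2 = 6$ ordered alter pairs this gives $1/6 \approx 0.1667$, which is the value reported for vertex 2. The base-R helper below is hypothetical. Like the text, it assumes that ego occupies row and column 1, and it counts ordered pairs excluding loops.

```r
# Hypothetical helper: density of ties among alters only, assuming ego is in row/column 1.
alter_density <- function(x) {
  a <- x[-1, -1, drop = FALSE]              # drop ego's row and column
  m <- nrow(a)
  if (m < 2) return(NA_real_)
  (sum(a) - sum(diag(a))) / (m * (m - 1))   # directed density, loops ignored
}

# Ego net of vertex 2, copied from the printed output above
x2 <- matrix(c(0, 1, 0, 0,
               1, 0, 0, 0,
               1, 0, 0, 0,
               1, 0, 1, 0), nrow = 4, byrow = TRUE)
alter_density(x2)   # 1 tie among 3 alters: 1/6
```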

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+   cmode = "freeman")/2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0

To obtain adjacency matrices for the neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i,,], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+   partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i,,], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The speed with which such neighborhoods "fill out" the network is informative about properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

                    Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout.*), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams", in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order partial neighborhood of $v$. (Panels: "Partial Neighborhood of Order 1" through "Partial Neighborhood of Order 9".)
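To make the idea of an MDS-based layout concrete, the base-R sketch below applies classical scaling (stats::cmdscale) to the geodesic distances of a cycle. This is one common variant of an MDS layout. It is assumed here, not claimed, to resemble what gplot's mds mode computes. For a cycle the vertices land on a circle, which the final check confirms.

```r
# Hypothetical sketch of an MDS-style layout: classical scaling of geodesic distances
# with base R's cmdscale(). Not necessarily what sna's MDS layout computes.
n <- 8
idx <- seq_len(n)
# Geodesic distances on an undirected cycle of n vertices
D <- outer(idx, idx, function(i, j) pmin(abs(i - j), n - abs(i - j)))
coords <- cmdscale(D, k = 2)   # n x 2 matrix of vertex positions
plot(coords, asp = 1, pch = 19, xlab = "", ylab = "", main = "MDS layout of a cycle")
nxt <- c(idx[-1], 1)           # successor of each vertex around the cycle
segments(coords[idx, 1], coords[idx, 2], coords[nxt, 1], coords[nxt, 2])

radii <- sqrt(rowSums(coords^2))
max(radii) - min(radii)        # essentially 0: all vertices lie on one circle
```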

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to those of gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods follow rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$. (Panels: "Cumulative Neighborhood of Order 1" through "Cumulative Neighborhood of Order 9".)

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below), or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4)*72,
+   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options. (Panels: "Default", "Curved Edges", "MDS Layout", "Circular Layout", "Sociomatrix", "Multiple Options".)

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can, in and of itself, be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+   xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10)/10*2*pi), sin((1:10)/10*2*pi),
+   col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+   xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+   offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
+   arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

2.3 Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will, however, see an important counterexample below.

Figure 5: Examples of the use of gplot supplemental functions. (Panels: "gplot.vertex Example", "gplot.arrow Example", "gplot.loop Example".)

                    Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v, G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_d^+(v, G) \equiv |N^+(v)|$), indegree ($c_d^-(v, G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_d^t(v, G) \equiv c_d^+(v, G) + c_d^-(v, G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v, G) \equiv \sum_{(v', v'') \subseteq V \setminus v} \frac{g'(v', v, v'', G)}{g(v', v'', G)},$$

where $g(v, v', G)$ is the number of $(v, v')$ geodesics in $G$, $g'(v, v', v'', G)$ is the number of $(v, v'')$ geodesics in $G$ containing $v'$, and $\frac{g'(v', v, v'', G)}{g(v', v'', G)}$ is taken equal to 0 where $g(v', v'', G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v, G) \equiv \frac{n - 1}{\sum_{v' \in V} d(v, v')},$$

where $n = |V|$ and $d(v, v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v, G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
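The definitions of betweenness, stress, and closeness can be checked directly with a brute-force base-R sketch. The function names are hypothetical and the sketch is not sna's implementation. Sums run over ordered pairs, as for a digraph, and the $O(n^3)$ loops make it practical only for small graphs. On the directed path 1 -> 2 -> 3, vertex 2 is the only intermediary. Vertex 3 reaches no other vertex, so its closeness is 0.

```r
# Hypothetical base-R sketch (not sna's code) of the geodesic-based indices above,
# for a digraph with adjacency matrix A; sums run over ordered pairs (i, k).
geo_counts <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)   # D[s, w]: geodesic distance d(s, w)
  S <- matrix(0, n, n)     # S[s, w]: number of (s, w) geodesics g(s, w)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    S[s, s] <- 1
    frontier <- s
    while (length(frontier) > 0) {   # breadth-first search, one level at a time
      nxt <- integer(0)
      for (u in frontier) {
        for (w in which(A[u, ] != 0)) {
          if (is.infinite(D[s, w])) {
            D[s, w] <- D[s, u] + 1
            nxt <- c(nxt, w)
          }
          # every geodesic to u extends to a geodesic to w
          if (D[s, w] == D[s, u] + 1) S[s, w] <- S[s, w] + S[s, u]
        }
      }
      frontier <- nxt
    }
  }
  list(D = D, S = S)
}

freeman_sketch <- function(A) {
  gc <- geo_counts(A)
  D <- gc$D
  S <- gc$S
  n <- nrow(A)
  btw <- numeric(n)
  stress <- numeric(n)
  for (v in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (v == i || v == k || i == k || S[i, k] == 0) next  # ratio taken as 0 if g = 0
    if (D[i, v] + D[v, k] == D[i, k]) {                  # v lies on an (i, k) geodesic
      through <- S[i, v] * S[v, k]                        # g'(i, v, k, G)
      btw[v] <- btw[v] + through / S[i, k]
      stress[v] <- stress[v] + through                    # stress: denominator set to 1
    }
  }
  closeness <- (n - 1) / rowSums(D)   # 0 whenever some vertex is unreachable
  list(betweenness = btw, stress = stress, closeness = closeness)
}

P <- matrix(0, 3, 3)
P[1, 2] <- 1
P[2, 3] <- 1          # directed path 1 -> 2 -> 3
freeman_sketch(P)
```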

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $\mathbf{A}$ (where $\mathbf{A}$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (\mathbf{I} - \beta \mathbf{A})^{-1} \mathbf{A} \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $\mathbf{A}$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats. As in the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so that the squared scores sum to $|V|$ (i.e., the centrality vector has Euclidean length $\sqrt{|V|}$). In general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality indicates the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
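Both limiting cases of the Bonacich measure can be verified numerically with a base-R sketch of the formula above. The functions bonacich_sketch and eigen_sketch are hypothetical and are not sna's bonpow and evcent. The scale $\alpha$ is chosen so that the squared scores sum to $|V|$. The example is undirected so that the principal eigenvector is real and unique.

```r
# Hypothetical base-R sketch (not sna's bonpow/evcent) of
# c_bp = alpha (I - beta A)^{-1} A 1, with alpha set so the squared scores sum to |V|.
bonacich_sketch <- function(A, beta) {
  n <- nrow(A)
  raw <- as.vector(solve(diag(n) - beta * A, A %*% rep(1, n)))  # (I - beta A)^{-1} A 1
  raw * sqrt(n / sum(raw^2))                                     # fix the scale alpha
}

eigen_sketch <- function(A) {
  v <- abs(Re(eigen(A)$vectors[, 1]))  # principal eigenvector, sign removed
  v / sqrt(sum(v^2))                   # unit length
}

# Small undirected example: a triangle 1-2-3 with a tail 3-4-5
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5))
A <- matrix(0, 5, 5)
A[edges] <- 1
A[edges[, 2:1]] <- 1

deg <- rowSums(A)
b0 <- bonacich_sketch(A, 0)
b0 / deg                                       # constant: proportional to degree

lambda1 <- eigen(A)$values[1]
b_near <- bonacich_sketch(A, 0.9999 / lambda1)
max(abs(b_near / sqrt(5) - eigen_sketch(A)))   # small: approaches eigenvector centrality

bonacich_sketch(A, -0.2)                       # negative beta: less central alters help
```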

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$, that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ of length $|V|$, such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles), based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. Note, however, that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious. When in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
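The role definitions translate directly into a triple loop. The base-R sketch below counts roles for each broker. The function name brokerage_sketch is hypothetical. It returns only the raw counts and omits the moments and z-tests that brokerage also reports. On the path 1 -> 2 -> 3, the role of vertex 2 depends only on the three states.

```r
# Hypothetical base-R sketch of Gould-Fernandez role counts (not sna's brokerage()):
# for each broker j, classify every ordered pair (i, k) with i -> j -> k and no i -> k tie.
brokerage_sketch <- function(A, s) {
  n <- nrow(A)
  roles <- c("coordinating", "itinerant", "gatekeeping", "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (i == j || j == k || i == k) next
    if (A[i, j] == 0 || A[j, k] == 0 || A[i, k] != 0) next  # not a locally bridged pair
    role <- if (s[i] == s[j] && s[j] == s[k]) "coordinating"
      else if (s[i] == s[k]) "itinerant"        # s_i = s_k, s_i != s_j
      else if (s[j] == s[k]) "gatekeeping"      # s_j = s_k, s_i != s_j
      else if (s[i] == s[j]) "representative"   # s_i = s_j, s_j != s_k
      else "liaison"                            # all three states differ
    out[j, role] <- out[j, role] + 1
  }
  cbind(out, total = rowSums(out))
}

P <- matrix(0, 3, 3)
P[1, 2] <- 1
P[2, 3] <- 1                          # path 1 -> 2 -> 3
brokerage_sketch(P, s = c(1, 2, 1))   # vertex 2 acts as an itinerant broker
```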

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. For the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above examples illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument). This parameter determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0. They also verify that it is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector centrality and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782
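The decay-parameter behavior shown above follows from the usual closed form of Bonacich's power measure, $c(\alpha, \beta) = \alpha (I - \beta A)^{-1} A \mathbf{1}$. The base-R sketch below assumes that form and an arbitrary scale $\alpha = 1$. `bonacich_power` is a hypothetical helper, and sna's bonpow applies its own scaling constant. Only proportionality, not the exact values printed above, should be expected to carry over.

```r
# Hypothetical helper: Bonacich power c(alpha, beta) = alpha * (I - beta*A)^{-1} A 1
bonacich_power <- function(A, beta, alpha = 1) {
  n <- nrow(A)
  # Solve (I - beta*A) c = alpha * A 1 instead of forming the inverse explicitly
  drop(solve(diag(n) - beta * A, alpha * (A %*% rep(1, n))))
}

set.seed(42)
A <- matrix(rbinom(100, 1, 0.4), 10, 10)   # random digraph on 10 vertices
diag(A) <- 0                               # no loops

# beta = 0: (I - 0*A) is the identity, so the score is just outdegree
all.equal(bonacich_power(A, beta = 0), rowSums(A))   # TRUE
# Negative decay: ties to weakly connected alters raise ego's score
bonacich_power(A, beta = -0.5)
```

Solving the linear system directly is cheaper and numerically safer than computing $(I - \beta A)^{-1}$. It fails, as it should, when $1/\beta$ is an eigenvalue of $A$.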

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518


w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed network-wide brokerage scores for each of the brokerage roles: coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t). Alongside these, it reports the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles. Likewise, certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

                    Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable).

Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong and weak forms. The strong form requires $(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i, j, k \in V$. The weak form requires $(i,j),(j,k) \in E \Rightarrow (i,k) \in E$. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right] \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³. This is not always the case, however: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which uses them to generate the underlying score vector. In the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna. The mechanism may also be employed with user-supplied functions, provided that they support the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
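Equations (1) and (2), and the normalization step, can be checked with a few lines of base R. The sketch below uses degree scores on small undirected graphs. `freeman_centralization` is a hypothetical helper, not sna code. The normalizing constant $(n-1)(n-2)$ is the standard maximum for undirected degree centralization on graphs of order $n$, attained by the star. It is assumed here rather than taken from sna.

```r
# Hypothetical helper: raw Freeman centralization, Equation (1)
freeman_centralization <- function(scores) sum(max(scores) - scores)

# Undirected degree scores on five vertices
star  <- c(4, 1, 1, 1, 1)   # K_{1,4}: one hub tied to four leaves
cycle <- c(2, 2, 2, 2, 2)   # C_5: every vertex has degree 2

freeman_centralization(star)                 # 12
length(star) * (max(star) - mean(star))      # Equation (2): 12, up to rounding

# Normalize by the maximum over graphs of order n: (n - 1)(n - 2) for undirected degree
n <- length(star)
freeman_centralization(star)  / ((n - 1) * (n - 2))   # 1 for the star
freeman_centralization(cycle) / ((n - 1) * (n - 2))   # 0 when all scores are equal
```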

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices, supplied respectively by connectedness, efficiency, hierarchy, and lubness, describe the extent to which the structure of an input graph approaches that of an outtree. The hierarchy function can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
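The density, reciprocity, and weak transitivity definitions given earlier in this section can be sketched in base R for a single binary digraph without loops. These helpers are hypothetical illustrations of the definitions, not sna code. In particular, they ignore missing data and the other options that gden, grecip, and gtrans support.

```r
# Hypothetical base-R versions of some GLIs (binary digraph, diagonal ignored)
off_diag <- function(A) row(A) != col(A)

gli_density <- function(A) mean(A[off_diag(A)])     # edges / n(n - 1)

gli_recip_dyadic <- function(A) {                    # symmetric (mutual or null) dyads
  up <- upper.tri(A)
  mean(A[up] == t(A)[up])
}

gli_recip_edgewise <- function(A) {                  # reciprocated edges / edges
  off <- off_diag(A)
  sum(A[off] & t(A)[off]) / sum(A[off])
}

gli_trans_weak <- function(A) {
  diag(A) <- 0
  two <- A %*% A            # two[i, k] = number of two-paths i -> j -> k
  diag(two) <- 0            # drop two-paths that return to their start (i == k)
  sum(two * A) / sum(two)   # share of two-paths whose endpoints are directly tied
}

# Mutual tie 1 <-> 2 plus an asymmetric edge 2 -> 3
A <- matrix(0, 3, 3)
A[1, 2] <- 1; A[2, 1] <- 1; A[2, 3] <- 1
c(density = gli_density(A), dyadic = gli_recip_dyadic(A),
  edgewise = gli_recip_edgewise(A), weak_trans = gli_trans_weak(A))
# density 0.5, dyadic 2/3, edgewise 2/3, weak_trans 0
```

The transitivity helper counts "at risk" triads as two-paths $i \to j \to k$ with distinct endpoints. This matches the weak condition stated above.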

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


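R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   # Maximum possible deviation, attained by an out-star: (n-1) * choose(n-1, 2)
+   if(tmaxdev)
+     return((n-1) * choose(n-1, 2))
+   # Node-level score: number of unordered pairs of out-neighbors
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173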

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist. They return, respectively, the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist.

An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j, k) \leq i)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
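Once geodesic distances are available, the structure statistics are easy to compute. The sketch below is a hypothetical base-R stand-in for geodist and structure.statistics, not their actual implementation. It finds distances by breadth-first search from every vertex, with unreachable pairs left at `Inf`, and then applies the definition of $s_i$ directly.

```r
# Hypothetical stand-in for geodist(): BFS from every vertex of a digraph.
# A[i, j] != 0 means an edge i -> j; unreachable pairs are left at Inf.
geodesic_dist <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    d <- 0
    while (length(frontier) > 0) {
      d <- d + 1
      # vertices adjacent from the current frontier ...
      nbrs <- which(colSums(A[frontier, , drop = FALSE] != 0) > 0)
      # ... that have not been reached yet
      nbrs <- nbrs[is.infinite(D[s, nbrs])]
      D[s, nbrs] <- d
      frontier <- nbrs
    }
  }
  D
}

# Hypothetical stand-in for structure.statistics():
# s_i = N^{-2} * #{(j, k) : d(j, k) <= i}, for i = 0, ..., N - 1
structure_stats <- function(A) {
  N <- nrow(A)
  D <- geodesic_dist(A)
  setNames(sapply(0:(N - 1), function(i) sum(D <= i) / N^2), 0:(N - 1))
}

A <- matrix(0, 4, 4)
A[1, 2] <- 1; A[2, 3] <- 1; A[3, 4] <- 1   # directed path 1 -> 2 -> 3 -> 4
structure_stats(A)                         # 4/16, 7/16, 9/16, 10/16
```

Since $d(j, j) = 0$ for every vertex, $s_0$ is always $N / N^2 = 1/N$. This is the value of 0.10 seen for the order-10 graph in the example that follows.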

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census is the set of frequencies of each triadic isomorphism class observed as induced subgraphs of $G$. It is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case. It is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \geq t(G))$ or $\Pr(t(H) \leq t(G))$ (for the upper and lower tests, respectively). Here $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbb{E}\,s(H) = s(G), \mathbb{E}\,s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$. The homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected under one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
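The dyad census, and a CUG test conditioned on order and edge count, can both be sketched in base R. The helpers below are hypothetical; they are not sna's dyad.census or cugtest. `runif_graph_m` draws uniformly from all loopless digraphs with the observed number of edges by placing that many edges on randomly sampled ordered pairs. The reported p-values are Monte Carlo estimates of the upper- and lower-tail probabilities defined above.

```r
# Hypothetical helper: dyad census of a binary digraph
dyad_census <- function(A) {
  up <- upper.tri(A)
  a <- A[up] != 0          # i -> j for i < j
  b <- t(A)[up] != 0       # j -> i for i < j
  c(Mut = sum(a & b), Asym = sum(xor(a, b)), Null = sum(!a & !b))
}

# Uniform draw from loopless digraphs of order n with exactly m edges
runif_graph_m <- function(n, m) {
  A <- matrix(0, n, n)
  off <- which(row(A) != col(A))
  A[off[sample.int(length(off), m)]] <- 1
  A
}

# Monte Carlo CUG test of `stat`, conditioning on order and edge count
cug_edges <- function(A, stat, reps = 1000) {
  n <- nrow(A)
  m <- sum(A[row(A) != col(A)] != 0)
  obs <- stat(A)
  sims <- replicate(reps, stat(runif_graph_m(n, m)))
  c(obs = obs, p_upper = mean(sims >= obs), p_lower = mean(sims <= obs))
}

set.seed(7)
A <- runif_graph_m(10, 20)
dyad_census(A)
cug_edges(A, function(G) dyad_census(G)[["Mut"]], reps = 500)
```

Conditioning on the exact edge count, rather than only on order, is what makes the null distribution "CUG given density". Swapping in a different sampler changes the conditioning set.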

                    Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08


 120C   210   300
 0.30  0.00  0.00
R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50
R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the (signed) difference in reciprocities as a test statistic. We first test against a CUG hypothesis conditioning only on order, and then against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.3333333
                1stQ:    -0.06666667
                Med:     0
                Mean:    -0.001288889
                3rdQ:    0.06666667
                Max:     0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)


CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.06666667
                1stQ:    0.1555556
                Med:     0.2222222
                Mean:    0.2215333
                3rdQ:    0.2888889
                Max:     0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties. Graph permutations which exchange only structurally equivalent vertices are therefore necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. All vertices occupying a given position connect to other positions in precisely the same way. Analyses of relations among positions (via their reduced form, the blockmodel; see below) can therefore often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
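The definition of structural equivalence translates directly into a check on the rows and columns of the adjacency matrix. The helper below is a hypothetical sketch for the directed, loopless case. It compares the out-ties (rows) and in-ties (columns) of $v$ and $v'$ with every third vertex. This is exactly the neighborhood condition once $v$ and $v'$ themselves are removed.

```r
# Hypothetical helper: exact structural equivalence of vertices i and j in a
# loopless digraph with adjacency matrix A (rows = out-ties, columns = in-ties)
is_struct_equiv <- function(A, i, j) {
  others <- setdiff(seq_len(nrow(A)), c(i, j))
  all(A[i, others] == A[j, others]) &&   # same out-neighbors among third parties
    all(A[others, i] == A[others, j])    # same in-neighbors among third parties
}

# Vertices 1 and 2 both send ties to 3 and 4; vertices 3 and 4 send none
A <- matrix(0, 4, 4)
A[1:2, 3:4] <- 1
is_struct_equiv(A, 1, 2)   # TRUE
is_struct_equiv(A, 1, 3)   # FALSE
```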

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust. It is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
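The pipeline above (distances between vertex profiles, then hclust, then cutree) has a rough base-R analogue, sketched below. It is a simplification under stated assumptions rather than equiv.clust itself. Each vertex's profile is taken to be its row followed by its column of the adjacency matrix. Manhattan distance on these 0/1 profiles equals their Hamming distance. Unlike a careful structural-equivalence distance, this sketch does not adjust for the cells that involve the pair being compared.

```r
# Profiles: out-ties (row) followed by in-ties (column) for each vertex
A <- matrix(0, 4, 4)
A[1:2, 3:4] <- 1                     # two positions: senders {1, 2}, receivers {3, 4}
profiles <- cbind(A, t(A))

# For 0/1 data, Manhattan distance between profiles = Hamming distance
d <- dist(profiles, method = "manhattan")
hc <- hclust(d)                      # complete linkage by default
memb <- cutree(hc, k = 2)            # membership vector with two classes
memb
```

The resulting membership vector is the kind of input that, as the next paragraph explains, can be handed to blockmodel.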

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s). The blockmodel is based on the classes in question and uses the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.
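The density block type can be illustrated with a small hypothetical helper. Given an adjacency matrix and a membership vector, it averages the cells of each block, dropping self-ties in the diagonal blocks as described above. This is a sketch of the reduction step only, not sna's blockmodel.

```r
# Hypothetical helper: density image of A under the classes in memb
block_density <- function(A, memb) {
  cls <- sort(unique(memb))
  B <- matrix(NA_real_, length(cls), length(cls), dimnames = list(cls, cls))
  for (a in seq_along(cls)) {
    for (b in seq_along(cls)) {
      blk <- A[memb == cls[a], memb == cls[b], drop = FALSE]
      if (a == b) blk <- blk[row(blk) != col(blk)]   # neglect the diagonal
      B[a, b] <- mean(blk)                           # NaN for a singleton class
    }
  }
  B
}

A <- matrix(0, 4, 4)
A[1:2, 3:4] <- 1
block_density(A, c(1, 1, 2, 2))   # class 1 -> class 2 block has density 1, others 0
```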

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion interprets the interclass density as an edge probability. It then draws random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
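Density-block expansion amounts to repeating each class's row and column of the image, then drawing independent Bernoulli edges. The helper below is a hypothetical base-R sketch of that idea, not blockmodel.expand. It takes an image matrix and a vector of class sizes.

```r
# Hypothetical helper: draw one digraph from an expanded density image B
expand_density_image <- function(B, sizes) {
  memb <- rep(seq_along(sizes), sizes)     # class of each new vertex
  P <- B[memb, memb, drop = FALSE]         # edge probability for every ordered pair
  n <- length(memb)
  A <- matrix(rbinom(n * n, 1, P), n, n)   # independent Bernoulli draws, column-major
  diag(A) <- 0                             # no loops
  A
}

set.seed(3)
B <- matrix(c(0.1, 0.2, 0.9, 0.05), 2, 2)  # illustrative image: class 1 -> 2 is dense
expand_density_image(B, c(2, 3))           # a new graph of order 5
```

Because the class sizes are free, the new graph need not have the same order as the one the image was estimated from.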

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
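Composition as defined above is a boolean matrix product, which a one-line hypothetical helper captures in base R (a sketch, not sna's %c% implementation). The example also shows a loop appearing even though neither input graph has one.

```r
# Hypothetical helper: composition of G (matrix A) and G' (matrix B);
# entry (v, v') is 1 iff some v'' has v -> v'' in A and v'' -> v' in B
compose_graphs <- function(A, B) (A %*% B > 0) * 1

A <- matrix(0, 2, 2); A[1, 2] <- 1   # G:  1 -> 2
B <- matrix(0, 2, 2); B[2, 1] <- 1   # G': 2 -> 1
compose_graphs(A, B)                 # a loop at vertex 1: entry (1, 1) is 1
```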

                    Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original. This is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another. This leads to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets. When entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \qquad (3)$$


where $A^G, A^H$ are the respective adjacency matrices of G and H, and $\mu_X = \left(|V|(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\operatorname{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \operatorname{cov}(G,H) / \sqrt{\operatorname{cov}(G,G)\operatorname{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively; Hamming distances for graph sets can be similarly obtained using hdist.
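To make Equation (3) and the definitions after it concrete, here is a minimal Python sketch, assuming graphs are supplied as 0/1 adjacency matrices (lists of lists) and that self-ties are excluded, so every sum runs over the |V|(|V|-1) ordered pairs. The function names are illustrative only; in R the corresponding sna functions are gcor, gcov, and hdist.

```python
from math import sqrt


def edge_variables(a):
    """Edge variables of a directed graph: a[i][j] for all ordered pairs with i != j."""
    n = len(a)
    return [a[i][j] for i in range(n) for j in range(n) if i != j]


def graph_cov(g, h):
    """Graph covariance per Equation (3): centered cross-products averaged over |V|(|V|-1) pairs."""
    x, y = edge_variables(g), edge_variables(h)
    m = len(x)                            # |V|(|V| - 1)
    mu_g, mu_h = sum(x) / m, sum(y) / m   # graph means
    return sum((a - mu_g) * (b - mu_h) for a, b in zip(x, y)) / m


def graph_cor(g, h):
    """Graph correlation rho(G, H) = cov(G, H) / sqrt(cov(G, G) cov(H, H))."""
    return graph_cov(g, h) / sqrt(graph_cov(g, g) * graph_cov(h, h))


def hamming(g, h):
    """Hamming distance: number of ordered pairs whose edge state differs."""
    return sum(a != b for a, b in zip(edge_variables(g), edge_variables(h)))


# Usage: a directed 3-cycle and the same cycle with every edge reversed.
G = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
H = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(graph_cor(G, H), hamming(G, H))  # -1.0 6
```

The reversed cycle occupies exactly the ordered pairs the original leaves empty, so the two edge-variable vectors are perfectly anti-correlated and differ in all six positions.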

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
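The idea of an optimal matching can be illustrated by brute force. The Python sketch below assumes the fully unlabeled case (every vertex permutation is a permissible matching) and simply maximizes the graph correlation over all |V|! relabelings of the second graph. This is exhaustive search, not sna's annealing heuristic, and it is only practical for very small graphs; it also assumes neither graph is empty or complete, so both variances are nonzero.

```python
from itertools import permutations
from math import sqrt


def graph_cor(g, h):
    """Graph correlation over ordered pairs i != j (diagonal excluded)."""
    n = len(g)
    x = [g[i][j] for i in range(n) for j in range(n) if i != j]
    y = [h[i][j] for i in range(n) for j in range(n) if i != j]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def relabel(h, p):
    """Return the graph in which vertex i takes the role of vertex p[i] of h."""
    n = len(h)
    return [[h[p[i]][p[j]] for j in range(n)] for i in range(n)]


def structural_cor_exhaustive(g, h):
    """Largest graph correlation between g and any relabeling of h (visits all |V|! permutations)."""
    return max(graph_cor(g, relabel(h, p)) for p in permutations(range(len(h))))


# Usage: the same directed path a -> b -> c drawn with two different labelings.
g = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # 0 -> 1 -> 2
h = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]  # 2 -> 0 -> 1
print(round(graph_cor(g, h), 4), round(structural_cor_exhaustive(g, h), 4))  # 0.25 1.0
```

The labeled correlation is low only because the vertices are named differently; once the best matching is found, the isomorphic graphs correlate perfectly.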

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
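For dichotomous graphs the central graph has a simple form: the total Hamming distance splits into one term per ordered pair, so each term is minimized by the edge state that most input graphs share. The Python sketch below uses that edge-wise majority rule; breaking exact ties toward 1 is an assumption of this sketch, and the function names are hypothetical rather than sna's.

```python
def central_graph(graphs):
    """Edge-wise majority graph of 0/1 adjacency matrices on a common vertex set.

    Total Hamming distance sum_k d(C, G_k) decomposes into independent terms, one
    per ordered pair (i, j), so each is minimized by the value most graphs agree on.
    Exact ties (half present) are set to 1 here by assumption.
    """
    n, k = len(graphs[0]), len(graphs)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and 2 * sum(g[i][j] for g in graphs) >= k:
                c[i][j] = 1
    return c


def total_hamming(c, graphs):
    """Sum of Hamming distances (diagonal excluded) from c to every graph in graphs."""
    n = len(c)
    return sum(c[i][j] != g[i][j]
               for g in graphs for i in range(n) for j in range(n) if i != j)


G1 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
G2 = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
G3 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
C = central_graph([G1, G2, G3])
print(C, total_hamming(C, [G1, G2, G3]))  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]] 3
```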

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
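The QAP logic can be sketched in a few lines. In the Python sketch below (hypothetical names, in the spirit of qaptest applied to a graph correlation, not its actual implementation), each replicate draws one random vertex permutation and applies it to both the rows and the columns of the second graph, which preserves that graph's structure while randomly reallocating individuals to positions. The returned fractions are one-sided Monte Carlo p-values.

```python
import random
from math import sqrt


def graph_cor(g, h):
    """Graph correlation over ordered pairs i != j (diagonal excluded)."""
    n = len(g)
    x = [g[i][j] for i in range(n) for j in range(n) if i != j]
    y = [h[i][j] for i in range(n) for j in range(n) if i != j]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def qap_test(g, h, reps=1000, seed=1):
    """QAP permutation test for the graph correlation of g and h.

    Returns (observed, fraction of replicates >= observed, fraction <= observed).
    """
    rng = random.Random(seed)
    n = len(h)
    observed = graph_cor(g, h)
    n_ge = n_le = 0
    for _ in range(reps):
        p = list(range(n))
        rng.shuffle(p)  # one permutation, applied to rows and columns together
        hp = [[h[p[i]][p[j]] for j in range(n)] for i in range(n)]
        stat = graph_cor(g, hp)
        n_ge += stat >= observed
        n_le += stat <= observed
    return observed, n_ge / reps, n_le / reps


# Usage: a graph compared with itself should look extreme under the QAP null.
g = [[int(i != j and (3 * i + 5 * j) % 7 < 3) for j in range(8)] for i in range(8)]
obs, p_ge, p_le = qap_test(g, g)
print(round(obs, 6), p_ge)
```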


                    Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%          50%          75%         100%
-2.136676e-13 -6.547650e-16 5.123264e-16 1.345843e-15 7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
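The point estimates above are those of an ordinary regression on the vectorized, off-diagonal cells of the adjacency matrices (20 × 19 = 380 cells, hence 375 residual degrees of freedom). The Python sketch below reproduces that exact recovery on noiseless simulated data; random_digraph and ols_on_graphs are hypothetical helpers, the normal equations are solved by plain Gaussian elimination rather than whatever numerics sna uses, and the QAP diagnostics are not reproduced.

```python
import random


def edge_variables(a):
    """Vectorize a graph: a[i][j] for ordered pairs i != j (diagonal dropped)."""
    n = len(a)
    return [a[i][j] for i in range(n) for j in range(n) if i != j]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols_on_graphs(y, xs):
    """OLS of vectorized y on an intercept plus vectorized predictor graphs xs."""
    yv = edge_variables(y)
    cols = [[1.0] * len(yv)] + [edge_variables(x) for x in xs]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, yv)) for ci in cols]
    return solve(xtx, xty)  # normal equations (X'X) b = X'y


def random_digraph(n, rng, p=0.5):
    """Hypothetical stand-in for rgraph: independent Bernoulli(p) ties, empty diagonal."""
    return [[int(i != j and rng.random() < p) for j in range(n)] for i in range(n)]


rng = random.Random(42)
xs = [random_digraph(20, rng) for _ in range(4)]
y = [[xs[0][i][j] + 4 * xs[1][i][j] + 2 * xs[2][i][j] for j in range(20)]
     for i in range(20)]
coef = ols_on_graphs(y, xs)
print([round(b, 6) for b in coef])  # approximately [0, 1, 4, 2, 0]
```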

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued; in this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

                    Network Logit Model

                    Coefficients


            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
     352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
     (Dn-Dr)/(Dn-Dr+dfn): 0.481324
     (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties: it predicts a tie for every dyad, so none of the 39 observed null dyads is classified correctly. This is largely a consequence of the high density in the dependent graph (approximately 0.90) and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values and that the QAP test correctly identifies the irrelevant predictors.
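The high density is built into the simulation design, as a short worked calculation shows. Assuming rgraph's default tie probability of 0.5, every cell of each predictor graph is an independent fair coin, so (x1, x2, x3) takes each of its eight on/off patterns with probability 1/8, and the expected density of y is the average of the logistic tie probability over those patterns.

```python
from itertools import product
from math import exp


def logistic(a):
    return 1.0 / (1.0 + exp(-a))


# Each of x1, x2, x3 is 0 or 1 with probability 1/2 per cell, so the 8 patterns are
# equally likely; x4 has coefficient 0 and drops out of the linear predictor.
expected_density = sum(logistic(x1 + 4 * x2 + 2 * x3)
                       for x1, x2, x3 in product((0, 1), repeat=3)) / 8
print(round(expected_density, 4))  # 0.8795
```

An expected density near 0.88 is consistent with the realized value of roughly 0.90 (341 of 380 dyads), which leaves the model very few absent ties to learn from.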

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose; there are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (and the latter may be omitted if desired); estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
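Two pieces of this machinery are small enough to sketch directly. The Python sketch below first computes the posterior probability of a single edge when each informant's false negative rate e^- and false positive rate e^+ are treated as known and reports are conditionally independent given the true edge state; this is only the per-edge core of the reporting model, not bbnam's Gibbs sampler, and the names are hypothetical. It shows why e^- + e^+ = 1 makes a report uninformative and e^- + e^+ > 1 makes it negatively informative. Second, it computes a simplified potential scale reduction, assuming var+ = (n-1)/n W + B/n with W the mean within-chain variance and B/n the variance of the chain means; potscalered.mcmc may differ in its details.

```python
from math import sqrt
from statistics import mean, variance


def edge_posterior(reports, em, ep, prior=0.5):
    """Posterior Pr(edge present) for one ordered pair from 0/1 reports.

    em[k], ep[k]: informant k's false negative and false positive rates (assumed known).
    """
    like1, like0 = prior, 1.0 - prior
    for r, m, p in zip(reports, em, ep):
        like1 *= (1 - m) if r else m   # true edge: reported with prob 1 - e^-
        like0 *= p if r else (1 - p)   # absent edge: reported with prob e^+
    return like1 / (like1 + like0)


def psrf(chains):
    """Simplified sqrt(R-hat) for one scalar quantity from equal-length chains."""
    n = len(chains[0])
    w = mean(variance(c) for c in chains)           # mean within-chain variance
    b_over_n = variance([mean(c) for c in chains])  # between-chain variance / n
    var_plus = (n - 1) / n * w + b_over_n
    return sqrt(var_plus / w)


print(round(edge_posterior([1, 1, 1], [0.1] * 3, [0.1] * 3), 4))  # 0.9986
print(round(edge_posterior([1], [0.3], [0.7], prior=0.2), 4))      # 0.2 (report uninformative)
print(round(psrf([[1, 2, 3, 4], [1, 2, 3, 4]]), 4))                # 0.866
```

Values slightly below 1, as for the identical chains in the last line, are possible under this simplified form; values well above 1 signal chains that have not mixed.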

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex i nominates vertex j when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where T is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the (i, j) directed dyad). Bias events are taken to be independent Bernoulli trials given T, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
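The conditional tie probability above is easy to evaluate once the event probabilities and counts are fixed. The Python sketch below implements only that formula; the probabilities and counts in the usage lines are made-up values, and nothing here reproduces bn's estimators.

```python
def biased_net_tie_prob(p_base, bias_probs, bias_counts):
    """Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k)) ** s_k(i, j, T).

    p_base:      baseline probability Pr(B_e).
    bias_probs:  Pr(B_k) for each bias event type k.
    bias_counts: s_k(i, j, T), number of type-k bias events that may have occurred.
    """
    q = 1.0 - p_base                  # the baseline event fails
    for pk, sk in zip(bias_probs, bias_counts):
        q *= (1.0 - pk) ** sk         # every potential bias event must fail as well
    return 1.0 - q                    # the tie appears if any single event succeeds


# Made-up values: baseline 0.1 plus one bias type with probability 0.5 that occurred once.
print(round(biased_net_tie_prob(0.1, [0.5], [1]), 4))            # 0.55
print(round(biased_net_tie_prob(0.1, [0.5, 0.2], [1, 2]), 4))    # 0.712
```

With all counts at zero the expression reduces to the baseline probability, which is the unbiased (Bernoulli) special case.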

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

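$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$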

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. Z and ψ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, W and θ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
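As an illustration of the kind of index nacf reports, here is a Python sketch of the standard Moran's I, assuming the weight matrix is given directly (w[i][j] is the weight of tie i → j) and the attribute is a plain list. It does not reproduce nacf's neighborhood-order options, and the function name is hypothetical.

```python
def morans_i(y, w):
    """Moran's I for attribute vector y and weight matrix w.

    I = (n / S0) * sum_ij w_ij (y_i - ybar)(y_j - ybar) / sum_i (y_i - ybar)^2,
    where S0 is the sum of all weights.
    """
    n = len(y)
    ybar = sum(y) / n
    z = [v - ybar for v in y]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * num / den


# Usage: an undirected path 0 - 1 - 2 - 3 stored as a symmetric 0/1 matrix.
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(round(morans_i([1, 1, 0, 0], path), 4))  # 0.3333 (similar neighbors)
print(round(morans_i([1, 0, 1, 0], path), 4))  # -1.0 (neighbors always differ)
```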


                    Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00


a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer)

Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995


o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics

  Replicate Chains: 5
  Burn Time: 300


  Draws per Chain: 20  Total Draws: 100
  Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which $e_m + e_p > 1$. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
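For readers who want to reproduce the structure of this experiment outside R, the data-generating step can be mirrored as below. This is a sketch under stated assumptions: Python's random.betavariate stands in for rbeta, the random streams differ from R's so the numbers will not match the output above, and diagonal cells are fixed at 0 to mirror rgraph's default of no self-ties. The two prior means also confirm the "about ten times higher" remark: 0.375 / (1/26) = 9.75.

```python
import random


def simulate_reports(g, em, ep, rng):
    """dat[k][i][j]: informant k's 0/1 report of the tie i -> j.

    A true tie is reported with probability 1 - em[k] (false negative rate em[k]);
    an absent tie is reported with probability ep[k] (false positive rate ep[k]).
    """
    n = len(g)
    return [[[int(i != j and rng.random() < ((1 - em[k]) if g[i][j] else ep[k]))
              for j in range(n)] for i in range(n)] for k in range(len(em))]


rng = random.Random(2008)
n = 20
g = [[int(i != j and rng.random() < 0.5) for j in range(n)] for i in range(n)]
ep = [rng.betavariate(1, 25) for _ in range(n)]   # Beta(1, 25): mean 1/26, about 0.038
em = [rng.betavariate(15, 25) for _ in range(n)]  # Beta(15, 25): mean 15/40 = 0.375
dat = simulate_reports(g, em, ep, rng)

ratio = (15 / 40) / (1 / 26)  # ratio of the two means
print(len(dat), len(dat[0]), len(dat[0][0]), round(ratio, 2))  # 20 20 20 9.75
```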

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)


Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
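Why the intersection LAS suffers here can be seen from its definition. The Python sketch below (a hypothetical function, assuming one informant per vertex as in this example) keeps a tie i → j only when both i and j report it, or, for the union rule, when either does. Plugging in the example's mean error rates, a simplification since the rates vary by informant, a true tie survives the intersection only about 39% of the time, while the union admits a spurious tie only about 7.5% of the time.

```python
def las(dat, rule="intersection"):
    """Locally aggregated structure (LAS) from cognitive social structure data.

    dat[k][i][j] is informant k's 0/1 report of tie i -> j, one informant per vertex.
    Only the two people in a dyad are consulted: i's and j's reports of i -> j.
    "intersection" requires both reports; "union" requires either.
    """
    agg = min if rule == "intersection" else max
    n = len(dat)
    return [[agg(dat[i][i][j], dat[j][i][j]) if i != j else 0 for j in range(n)]
            for i in range(n)]


# Mean error rates from the example: e^- about 0.375, e^+ about 0.038.
em, ep = 0.375, 0.038
print(round((1 - em) ** 2, 3), round(1 - (1 - ep) ** 2, 3))  # 0.391 0.075
```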

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)
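The simulation in the following example draws y from the reduced form of Equations (4) and (5). Solving (5) for ε and substituting into (4), and assuming both matrices in parentheses are invertible:

$$\varepsilon = \left(I - \sum_{i=1}^{z} \psi_i Z_i\right)^{-1} \nu, \qquad y = \left(I - \sum_{i=1}^{w} \theta_i W_i\right)^{-1} \left(X\beta + \varepsilon\right)$$

With w = z = 1, θ₁ = r1 = 0.2 on W₁ = w1 (the AR term) and ψ₁ = r2 = 0.3 on Z₁ = w2 (the MA term), these two equations correspond to the two qr.solve calls that construct e and y.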

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58  < 2e-16 ***
X2      0.535608   0.009448   56.69  < 2e-16 ***
X3     -0.685068   0.007138  -95.98  < 2e-16 ***


X4      0.691812   0.008417   82.19  < 2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71  < 2e-16 ***
rho2.1  0.307491   0.021167   14.53  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

                    3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                    Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.]

                    References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks. II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software, Version 2.067. URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2. URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows, Version 4.75. URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis, Version 3.1. URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

                    Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software (http://www.jstatsoft.org/), published by the American Statistical Association (http://www.amstat.org/)

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01. Accepted: 2007-12-25.



extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward, e.g.:

R> g <- rgraph(5)
R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)
[1] 0.07291667

Unfortunately, the drawback to the flexibility of this routine is its inefficiency: eval.edgeperturbation cannot take advantage of any special properties of the change-score being calculated, and hence is slow for properties such as triad counts, whose changes can be calculated much more quickly than the base statistic. This function is thus a useful utility for simple exploratory applications, but it does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
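To make the change-score idea concrete, the following base-R sketch computes the same kind of difference by brute force for a single edge. The helper names (edge_perturbation, density_stat, mutual_dyads) are hypothetical and are not part of sna; the sketch assumes a directed, loop-free 0/1 adjacency matrix. For the mutual-dyad count, the change from forcing edge (i, j) present versus absent is simply the reverse tie y_ji, which illustrates why a specialized change-score can be far cheaper than evaluating the whole statistic twice.

```r
# Hypothetical helper (not sna's eval.edgeperturbation): brute-force change score
edge_perturbation <- function(adj, i, j, stat, ...) {
  with_edge <- adj
  with_edge[i, j] <- 1            # force edge (i, j) present
  without_edge <- adj
  without_edge[i, j] <- 0         # force edge (i, j) absent
  stat(with_edge, ...) - stat(without_edge, ...)
}

# Density of a directed graph without loops
density_stat <- function(adj) {
  n <- nrow(adj)
  sum(adj[row(adj) != col(adj)]) / (n * (n - 1))
}

# Number of reciprocated dyads, each unordered pair counted once
mutual_dyads <- function(adj) {
  sum(adj * t(adj) * upper.tri(adj))
}

set.seed(1)
n <- 6
adj <- matrix(rbinom(n * n, 1, 0.4), n, n)
diag(adj) <- 0

edge_perturbation(adj, 1, 2, density_stat)   # 1 / (n * (n - 1)) for any edge
edge_perturbation(adj, 1, 2, mutual_dyads)   # equals the reverse tie adj[2, 1]
```

The brute-force version recomputes the full statistic twice per edge; the identity "change in mutual dyads = y_ji" needs a single lookup, which is the kind of shortcut a dedicated change-score function can exploit.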

Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the "identities" of its vertices while leaving the underlying "unlabeled" structure unchanged.) Although not graph manipulation functions per se, these routines are important for generating restricted permutations for use in QAP tests (Hubert 1987) and for comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class; numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order. The latter is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
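Both ideas can be sketched in a few lines of base R. restricted_perm and nth_perm below are hypothetical stand-ins written for illustration, not sna's rperm and numperm; in particular, nth_perm uses a 0-based index in lexicographic order, and no claim is made about the indexing convention numperm itself uses. The final lines check that relabeling vertices with a restricted permutation leaves the degree sequence (an "unlabeled" property) unchanged.

```r
# Uniform random permutation that only exchanges vertices within the same class
restricted_perm <- function(classes) {
  perm <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    perm[idx] <- idx[sample.int(length(idx))]   # shuffle positions within class
  }
  perm
}

# k-th permutation of 1:n in lexicographic order (k = 0, ..., n! - 1),
# decoded via the factorial number system
nth_perm <- function(n, k) {
  stopifnot(k >= 0, k < factorial(n))
  pool <- seq_len(n)
  out <- integer(0)
  for (m in n:1) {
    f <- factorial(m - 1)
    pos <- k %/% f                  # index of the next element among those left
    out <- c(out, pool[pos + 1])
    pool <- pool[-(pos + 1)]
    k <- k %% f
  }
  out
}

# Exhaustive enumeration of a small set: all 3! permutations, in order
perms <- t(sapply(0:5, function(k) nth_perm(3, k)))
print(perms)

# Restricted relabeling of a random digraph preserves its degree sequence
set.seed(2)
classes <- c("a", "a", "b", "b", "b", "c")
adj <- matrix(rbinom(36, 1, 0.3), 6, 6)
diag(adj) <- 0
p <- restricted_perm(classes)
adj_p <- adj[p, p]
identical(sort(rowSums(adj_p)), sort(rowSums(adj)))
```

Iterating k from 0 to n! - 1 visits every permutation exactly once, which is what makes an index-based generator convenient for exhaustive search or for sampling permutations without replacement (sample the indices, then decode them).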

In addition to the above, two families of graph manipulation functions bear discussing in more detail: functions to compute properties of neighborhoods, and functions for graph visualization. Here we briefly discuss each family in turn, before proceeding to a review of sna's descriptive index routines.

                      Neighborhood and ego net functions

The egocentric network (or "ego net") of vertex $v$ in graph $G$ is defined as $G[\{v\} \cup N(v)]$ (i.e., the subgraph of $G$ induced by $v$ and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.
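The definition of an ego net translates directly into code. The following base-R sketch is an illustration only (ego_nets is a hypothetical name, not sna's ego.extract): for each vertex it collects the in-, out-, or combined neighbors in a directed 0/1 adjacency matrix and returns the induced subgraph, with ego placed in the first row and column.

```r
ego_nets <- function(adj, type = c("combined", "in", "out")) {
  type <- match.arg(type)
  lapply(seq_len(nrow(adj)), function(v) {
    nb <- switch(type,
      out = which(adj[v, ] > 0),                       # v sends a tie
      "in" = which(adj[, v] > 0),                      # v receives a tie
      combined = which(adj[v, ] > 0 | adj[, v] > 0))   # either direction
    keep <- c(v, setdiff(nb, v))     # ego first, then alters (self-loops ignored)
    adj[keep, keep, drop = FALSE]    # subgraph induced by {v} and N(v)
  })
}

set.seed(3)
n <- 10
adj <- matrix(rbinom(n * n, 1, 1.5 / 9), n, n)
diag(adj) <- 0
egos_in <- ego_nets(adj, "in")
egos_out <- ego_nets(adj, "out")
egos_comb <- ego_nets(adj, "combined")
sapply(egos_comb, nrow)   # ego-net sizes (ego plus alters)
```

Because ego always occupies position 1, quantities that should exclude ego can be computed on x[-1, -1], the same convention the example below relies on.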

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.
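A rough base-R analogue of this neighborhood-then-apply pattern is sketched below. neighbor_apply is a hypothetical helper, not gapply itself: it assumes a directed 0/1 adjacency matrix, treats margin 1 as out-neighbors (rows), margin 2 as in-neighbors (columns), and c(1, 2) as their union, and it builds higher-order neighborhoods cumulatively from Boolean matrix powers. gapply's own argument handling (thresholds, loops, and so on) is not reproduced.

```r
neighbor_apply <- function(adj, margin, stats, fun, distance = 1) {
  # Base relation: rows (out-ties), columns (in-ties), or both (union)
  R <- if (length(margin) == 2) {
    (adj + t(adj) > 0) * 1
  } else if (margin == 1) {
    adj
  } else {
    t(adj)
  }
  reach <- R
  step <- R
  if (distance > 1) {
    for (d in 2:distance) {
      step <- (step %*% R > 0) * 1      # pairs joined by a walk of length d
      reach <- (reach + step > 0) * 1   # pairs within d steps
    }
  }
  diag(reach) <- 0                      # a vertex is not its own neighbor
  sapply(seq_len(nrow(adj)), function(v) fun(stats[reach[v, ] > 0]))
}

set.seed(4)
n <- 6
adj <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(adj) <- 0
neighbor_apply(adj, 1, rep(1, n), sum)                   # out-degree
neighbor_apply(adj, c(1, 2), 1:n, mean)                  # mean label of neighbors
neighbor_apply(adj, c(1, 2), 1:n, mean, distance = 2)    # within two steps
```

Summing a vector of ones over each neighborhood recovers degree, which is the same sanity check used in the gapply example later in this section.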

In addition to the above, it is sometimes useful to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose $(i, j)$ cell is an indicator for the membership of vertex $j$ in vertex $i$'s selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix; for orders $k > 0$, it depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation $R$ be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i, j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, with arbitrary choice of edge direction.
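These definitions can be checked with a short base-R sketch. Geodesic distances in the base relation $R$ are found by breadth-first search; the partial neighborhood of order $k$ is then the indicator of distance exactly $k$, and the cumulative neighborhood the indicator of distance at most $k$. The function names are hypothetical and this is not sna's neighborhood; note also that, following the text literally, the cumulative matrix keeps the diagonal (distance 0), which sna's own diagonal handling may treat differently.

```r
# Base relation R: G itself ("out"), its transpose ("in"), or G union G^T ("total")
base_relation <- function(G, type = c("out", "in", "total")) {
  type <- match.arg(type)
  switch(type,
    out = G,
    "in" = t(G),
    total = (G + t(G) > 0) * 1)
}

# All-pairs geodesic distances by breadth-first search (Inf if unreachable)
geodesic_dist <- function(R) {
  n <- nrow(R)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    d <- 0
    while (length(frontier) > 0) {
      d <- d + 1
      # unvisited vertices reachable in one step from the current frontier
      nxt <- which(colSums(R[frontier, , drop = FALSE]) > 0 & is.infinite(D[s, ]))
      D[s, nxt] <- d
      frontier <- nxt
    }
  }
  D
}

# Partial (distance == k) or cumulative (distance <= k) neighborhood matrix
neighborhood_matrix <- function(G, k, type = "out", partial = TRUE) {
  D <- geodesic_dist(base_relation(G, type))
  if (partial) (D == k) * 1 else (D <= k) * 1
}

# Directed path 1 -> 2 -> 3 -> 4
G <- matrix(0, 4, 4)
G[cbind(1:3, 2:4)] <- 1
neighborhood_matrix(G, 2)                    # ties 1 -> 3 and 2 -> 4
neighborhood_matrix(G, 1, type = "in")       # the transpose of G
neighborhood_matrix(G, 3, partial = FALSE)   # vertex 1 reaches everyone
```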

To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below.

R> g <- rgraph(10, tp = 1.5/9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]
$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(g.comb, NROW) <= degree(g) + 1)
[1] TRUE
R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+   sapply(g.comb[ego.size > 2], function(x) gden(x[-1, -1]))
         1          2          3          4          5          6          7 
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000 
         8          9         10 
0.00000000 0.08333333 0.00000000 

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.
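For concreteness, the alter-only density just described can be written out. The notation is ours and assumes gden's default treatment of the alter subgraph as a loop-free digraph. Let $A_v$ be the set of alters in $v$'s ego net, $k_v = |A_v| \geq 2$, and $y_{ij}$ the $(i, j)$ adjacency indicator. Then

$$\mathrm{den}(v) = \frac{1}{k_v (k_v - 1)} \sum_{i \in A_v} \sum_{j \in A_v,\, j \neq i} y_{ij}.$$

For ego 2 in the output above, $k_2 = 3$ and exactly one alter-alter tie is present (row 4, column 3 of its matrix), so $\mathrm{den}(2) = 1/6 \approx 0.1667$, which matches the printed value.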

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+   cmode = "freeman") / 2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i, , ], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+   partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i, , ], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

                      Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order (panels: Partial Neighborhood of Order 1 through 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order partial neighborhood of $v$.
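As an illustration of the eigenstructure idea behind Hall's (1970) placement, the sketch below computes a two-dimensional layout from the eigenvectors of the graph Laplacian $L = D - A$ associated with its second- and third-smallest eigenvalues (the smallest, zero, has a constant eigenvector when the graph is connected). It is a minimal sketch assuming a connected graph with at least three vertices, with edge direction ignored by symmetrizing; it is not sna's own layout code, which may scale or otherwise post-process the coordinates.

```r
hall_layout <- function(adj) {
  A <- (adj + t(adj) > 0) * 1        # ignore edge direction
  diag(A) <- 0
  L <- diag(rowSums(A)) - A          # graph Laplacian D - A
  e <- eigen(L, symmetric = TRUE)    # eigenvalues in decreasing order
  n <- nrow(A)
  coords <- e$vectors[, c(n - 1, n - 2)]   # 2nd and 3rd smallest eigenvalues
  colnames(coords) <- c("x", "y")
  coords
}

# A directed 8-cycle: its Hall layout is a regular octagon (up to rotation/reflection)
n <- 8
cyc <- matrix(0, n, n)
cyc[cbind(1:n, c(2:n, 1))] <- 1
xy <- hall_layout(cyc)
round(sqrt(rowSums(xy^2)), 6)   # equal radii: every vertex lies on one circle
```

Minimizing the sum of squared edge lengths under a unit-norm constraint pulls adjacent vertices together, which is why a cycle unrolls into a circle under this layout.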

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to those of gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order (panels: Cumulative Neighborhood of Order 1 through 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below), or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1.5, vertex.sides = 3:8,
+   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options. (Panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options.)

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.
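The rewiring step itself is simple to sketch. The code below uses a one-dimensional ring lattice rather than the three-dimensional lattices generated with rgws above, and implements one common variant of the Watts and Strogatz (1998) procedure: each clockwise lattice edge is visited once and, with probability p, its far endpoint is moved to a uniformly chosen vertex that is not already a neighbor. The function names are hypothetical and this is not sna's rgws algorithm. Rewiring preserves the number of edges, and p = 0 returns the lattice unchanged.

```r
# Undirected ring lattice: each vertex tied to its k nearest neighbors on each side
ring_lattice <- function(n, k) {
  adj <- matrix(0L, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(k)) {
      nb <- (i + j - 1) %% n + 1     # j steps clockwise, wrapping around
      adj[i, nb] <- 1L
      adj[nb, i] <- 1L
    }
  }
  adj
}

# Rewire each clockwise lattice edge with probability p
ws_rewire <- function(adj, k, p) {
  n <- nrow(adj)
  for (j in seq_len(k)) {
    for (i in seq_len(n)) {
      nb <- (i + j - 1) %% n + 1
      if (adj[i, nb] == 1L && runif(1) < p) {
        # candidate endpoints: no self-loops, no duplicate edges
        candidates <- which(adj[i, ] == 0L & seq_len(n) != i)
        if (length(candidates) > 0) {
          target <- candidates[sample.int(length(candidates), 1)]
          adj[i, nb] <- 0L
          adj[nb, i] <- 0L
          adj[i, target] <- 1L
          adj[target, i] <- 1L
        }
      }
    }
  }
  adj
}

set.seed(5)
lat <- ring_lattice(20, 2)
rewired <- lapply(c(0, 0.05, 0.2), function(p) ws_rewire(lat, 2, p))
sapply(rewired, function(a) sum(a != lat) / 2)   # ties differing from the lattice
```

At low rewiring rates most local structure survives while a few long-range shortcuts appear, which is the behavior the three-dimensional displays are meant to convey.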

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+   xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10)/10 * 2 * pi), sin((1:10)/10 * 2 * pi),
+   col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+   xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+   offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
+   arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example; gplot.arrow Example; gplot.loop Example).

2.3 Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

                      Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_d^+(v,G) \equiv |N^+(v)|$), indegree ($c_d^-(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_d^t(v,G) \equiv c_d^+(v,G) + c_d^-(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{(v',v'') \subseteq V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $\frac{g'(v',v,v'',G)}{g(v',v'',G)}$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

where $n = |V|$ and $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to at least one other vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
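To make the degree, betweenness, stress, and closeness definitions above concrete, here is a small from-scratch sketch in Python. It is not sna's implementation. It assumes a dense 0/1 adjacency matrix with `adj[i][j] = 1` for an edge from i to j, ignores the diagonal, sums betweenness over ordered pairs (the directed case), and treats unreachable vertices as infinitely distant, so closeness drops to 0. All function names are hypothetical.

```python
# From-scratch sketch (not sna's code); adj[i][j] = 1 means an edge i -> j.
from collections import deque


def out_degree(adj):
    """Outdegree c_d^+(v): row sums of the adjacency matrix, diagonal ignored."""
    n = len(adj)
    return [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]


def in_degree(adj):
    """Indegree c_d^-(v): column sums of the adjacency matrix, diagonal ignored."""
    n = len(adj)
    return [sum(adj[i][j] for i in range(n) if i != j) for j in range(n)]


def geodesics_from(adj, s):
    """BFS from s: distances d(s, .) (None if unreachable) and geodesic counts g(s, .)."""
    n = len(adj)
    dist, count = [None] * n, [0] * n
    dist[s], count[s] = 0, 1
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in range(n):
            if w == u or not adj[u][w]:
                continue
            if dist[w] is None:            # first time w is reached
                dist[w] = dist[u] + 1
                queue.append(w)
            if dist[w] == dist[u] + 1:     # u precedes w on a geodesic
                count[w] += count[u]
    return dist, count


def betweenness(adj, stress=False):
    """Sum over ordered pairs (s, t) of g'(s, v, t) / g(s, t); stress uses denominator 1."""
    n = len(adj)
    dist, count = zip(*(geodesics_from(adj, s) for s in range(n)))
    scores = [0.0] * n
    for v in range(n):
        for s in range(n):
            for t in range(n):
                if len({s, v, t}) < 3 or dist[s][t] is None:
                    continue               # ratio taken as 0 when g(s, t) = 0
                if dist[s][v] is None or dist[v][t] is None:
                    continue
                if dist[s][v] + dist[v][t] == dist[s][t]:
                    through = count[s][v] * count[v][t]   # s-t geodesics via v
                    scores[v] += through if stress else through / count[s][t]
    return scores


def closeness(adj):
    """(n - 1) / sum of distances; 0 if any vertex is unreachable (infinite distance)."""
    n = len(adj)
    scores = []
    for v in range(n):
        dist, _ = geodesics_from(adj, v)
        if any(d is None for d in dist):
            scores.append(0.0)
        else:
            scores.append((n - 1) / sum(dist))
    return scores


if __name__ == "__main__":
    # Undirected 4-cycle 0-1-2-3-0, stored as a symmetric matrix.
    cycle4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
    print(betweenness(cycle4))               # [1.0, 1.0, 1.0, 1.0]
    print(betweenness(cycle4, stress=True))  # [2.0, 2.0, 2.0, 2.0]
    print(closeness(cycle4))                 # [0.75, 0.75, 0.75, 0.75]
```

On the 4-cycle, each vertex carries one of the two geodesics between its neighbours in each direction, so betweenness credits it with 1/2 twice while stress counts the path itself twice.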

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so that the sum of squared centrality scores equals $|V|$ (i.e., the score vector has Euclidean length $\sqrt{|V|}$); in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
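The spectral definitions above can be sketched in plain Python. The sketch below is an illustration, not evcent or bonpow. It computes the principal eigenvector by power iteration on $A + I$, a standard shift with the same eigenvectors that avoids oscillation on bipartite graphs, and assumes a connected graph. It computes $\alpha (I - \beta A)^{-1} A \mathbf{1}$ by Gaussian elimination, with $\alpha$ chosen so that the squared scores sum to $|V|$. The function names are hypothetical.

```python
# From-scratch sketch (not sna's evcent/bonpow); adj[i][j] = 1 means an edge i -> j.
import math


def eigenvector_centrality(adj, iters=10_000, tol=1e-12):
    """Principal eigenvector of A, unit length, by power iteration on A + I.

    The identity shift keeps the eigenvectors but removes the oscillation plain
    power iteration shows on bipartite/periodic graphs. Assumes a connected graph."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [float(bi)] for row, bi in zip(m, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("I - beta*A is singular for this beta")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - rest) / aug[r][r]
    return x


def bonacich_power(adj, beta):
    """alpha * (I - beta A)^{-1} A 1, with alpha chosen so that sum(c_i^2) = |V|."""
    n = len(adj)
    m = [[(1.0 if i == j else 0.0) - beta * adj[i][j] for j in range(n)]
         for i in range(n)]
    a1 = [sum(row) for row in adj]        # A 1: row sums, i.e. outdegrees
    raw = solve(m, a1)
    ss = sum(v * v for v in raw)
    if ss == 0:
        return raw                        # empty graph: nothing to rescale
    alpha = math.sqrt(n / ss)
    return [alpha * v for v in raw]


if __name__ == "__main__":
    star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]  # K_{1,3}
    print(eigenvector_centrality(star))   # centre ~0.7071, leaves ~0.4082
    print(bonacich_power(star, 0.0))      # proportional to degree (3, 1, 1, 1)
    print(bonacich_power(star, -0.5))     # centre positive, leaves negative
```

With $\beta = 0$ the result is the outdegree vector rescaled, which mirrors the proportionality check shown for bonpow later in this section. With $\beta = -0.5$ on the star, the leaves are penalized for being attached to a strong centre.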

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v',v'')$ such that $(v',v),(v,v'') \in E$ and $(v',v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now let us posit a vector of states $s$ over $V$, such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i,v_j,v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
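The role definitions above reduce to a few equality checks on the states of a locally bridged triad. The following Python sketch counts raw role scores per broker under those definitions. It is illustrative only: it does not reproduce the Gould-Fernandez moments or z-scores that brokerage reports, and the names `brokerage_role` and `brokerage_scores` are hypothetical.

```python
# From-scratch sketch (not sna's brokerage); adj[i][j] = 1 means an edge i -> j,
# states[i] is any hashable group label for vertex i.
ROLES = ("coordinating", "itinerant", "gatekeeping", "representative", "liaison")


def brokerage_role(si, sj, sk):
    """Role of broker j in the locally bridged triad (i, j, k)."""
    if si == sj == sk:
        return "coordinating"
    if si == sk:          # hence si != sj
        return "itinerant"
    if sj == sk:          # hence si != sj
        return "gatekeeping"
    if si == sj:          # hence sj != sk
        return "representative"
    return "liaison"      # all three states differ


def brokerage_scores(adj, states):
    """Per-vertex counts of each role, plus the total brokerage."""
    n = len(adj)
    scores = [dict.fromkeys(ROLES + ("total",), 0) for _ in range(n)]
    for j in range(n):
        for i in range(n):
            for k in range(n):
                if len({i, j, k}) < 3:
                    continue
                # j is a local bridge: i -> j -> k with no direct i -> k tie
                if adj[i][j] and adj[j][k] and not adj[i][k]:
                    role = brokerage_role(states[i], states[j], states[k])
                    scores[j][role] += 1
                    scores[j]["total"] += 1
    return scores


if __name__ == "__main__":
    path = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]       # 0 -> 1 -> 2
    print(brokerage_scores(path, "ABA")[1])        # one itinerant brokerage
```

Because the five roles partition all possible state patterns, each bridged triad increments exactly one role and the total.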

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network.

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrates, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0)/degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1/eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z  Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162    0.7518
w_O  25.0000 19.5459  7.0713  0.7713    0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484    0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090    0.6825
b_O  28.0000 23.4551  5.3349  0.8519    0.3943
t    93.0000 87.9565 13.6124  0.3705    0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present; z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

                      Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i,j,k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \max_{v \in V} c(v,G) - c(v_i,G) \right] \tag{1}$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \tag{2}$$

where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless and varies between 0 (for a graph in which all vertices have the same centrality scores, as when all vertices are automorphically equivalent) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$; see footnote 3), although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which uses them to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they support the required arguments. Examples are supplied in the sna manual.
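Equations (1) and (2) can be checked numerically with a few lines of Python. The sketch below is not sna's centralization. It assumes undirected degree as the underlying index and uses $(n-1)(n-2)$, the deviation sum attained by the star $K_{1,n-1}$ on $n$ vertices, as the normalizing maximum. The helper names are hypothetical.

```python
# From-scratch sketch (not sna's centralization); symmetric 0/1 adjacency, no loops.


def freeman_centralization(scores, max_deviation=None):
    """Equation (1): sum of deviations from the maximum score.

    If max_deviation (the largest value this sum can take over all graphs of the
    same order) is supplied, the result is normalized to lie in [0, 1]."""
    top = max(scores)
    raw = sum(top - c for c in scores)
    if max_deviation is None:
        return raw
    return raw / max_deviation


def undirected_degree(adj):
    """Degree for a symmetric 0/1 adjacency matrix without loops."""
    return [sum(row) for row in adj]


def star(n):
    """Star K_{1,n-1} on n vertices, centred on vertex 0."""
    return [[1 if (i == 0) != (j == 0) else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    n = 5
    deg = undirected_degree(star(n))                          # [4, 1, 1, 1, 1]
    print(freeman_centralization(deg))                        # 12
    print(n * (max(deg) - sum(deg) / n))                      # equation (2): same value
    print(freeman_centralization(deg, (n - 1) * (n - 2)))     # 1.0
```

The star reaches the normalizing bound exactly, while any regular graph (for instance a cycle) gives 0, matching the two extremes described above.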

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree; hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices; hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

Footnote 3: $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
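The density, reciprocity, and transitivity definitions used in this example can be written out directly. The following Python sketch follows the textual definitions rather than gden, grecip, or gtrans themselves, so edge cases may differ. In particular, returning 1.0 when no triads are at risk is an assumption of this sketch. Loopless directed graphs and the function names are likewise assumptions.

```python
# From-scratch sketch (not sna's gden/grecip/gtrans); adj[i][j] = 1 means i -> j.
from itertools import permutations


def n_edges(adj):
    n = len(adj)
    return sum(adj[i][j] for i in range(n) for j in range(n) if i != j)


def density(adj):
    """Fraction of the n(n-1) possible directed edges (no loops) that are present."""
    n = len(adj)
    return n_edges(adj) / (n * (n - 1))


def dyad_census(adj):
    """Counts of (mutual, asymmetric, null) dyads."""
    n = len(adj)
    mut = asym = null = 0
    for i in range(n):
        for j in range(i + 1, n):
            ties = adj[i][j] + adj[j][i]
            if ties == 2:
                mut += 1
            elif ties == 1:
                asym += 1
            else:
                null += 1
    return mut, asym, null


def reciprocity(adj, measure="dyadic"):
    """'dyadic': share of symmetric (mutual or null) dyads;
    'edgewise': share of edges that are reciprocated."""
    mut, asym, null = dyad_census(adj)
    if measure == "dyadic":
        return (mut + null) / (mut + asym + null)
    if measure == "edgewise":
        edges = 2 * mut + asym
        return 2 * mut / edges if edges else float("nan")   # undefined with no edges
    raise ValueError(f"unknown measure: {measure}")


def transitivity(adj, measure="weak", census=False):
    """Weak: (i,j),(j,k) in E => (i,k) in E, over triads with a two-path (at risk).
    Strong: two-path <=> direct tie, over all ordered triads of distinct vertices."""
    at_risk = satisfied = 0
    for i, j, k in permutations(range(len(adj)), 3):
        two_path = bool(adj[i][j] and adj[j][k])
        direct = bool(adj[i][k])
        if measure == "weak":
            if two_path:
                at_risk += 1
                satisfied += direct
        elif measure == "strong":
            at_risk += 1
            satisfied += (two_path == direct)
        else:
            raise ValueError(f"unknown measure: {measure}")
    if census:
        return satisfied
    return satisfied / at_risk if at_risk else 1.0   # assumption: vacuously 1
```

On a directed 3-cycle, for example, every dyad is asymmetric (dyadic reciprocity 0) and no two-path is closed (weak transitivity 0). On the transitive triad 0 → 1, 0 → 2, 1 → 2, weak transitivity is 1 while strong transitivity is only 4/6, because two ordered triads have a direct tie without a supporting two-path.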

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if (tmaxdev)
+     return((n - 1) * choose(n - 1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
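The same calling convention (one call for the scores, a second call with a `tmaxdev` flag for the normalizing constant) can be mimicked outside R. The Python sketch below is a hypothetical analogue of the o2scent example, not sna code. Both `centralization` and `o2scent` here are illustrative re-implementations, and the `tmaxdev` keyword simply copies the argument name used above.

```python
# Hypothetical Python analogue of the o2scent example (not sna's API).
from math import comb


def centralization(adj, fun, normalize=True, **kwargs):
    """Freeman centralization of an arbitrary node-level index `fun`.

    Assumed convention: fun(adj, **kwargs) returns the score vector, and
    fun(adj, tmaxdev=True, **kwargs) returns its theoretical maximum deviation."""
    scores = fun(adj, **kwargs)
    raw = sum(max(scores) - c for c in scores)
    if not normalize:
        return raw
    return raw / fun(adj, tmaxdev=True, **kwargs)


def o2scent(adj, tmaxdev=False):
    """Pairs of out-neighbours per vertex: choose(outdegree, 2)."""
    n = len(adj)
    if tmaxdev:
        return (n - 1) * comb(n - 1, 2)   # attained by an out-star
    odeg = [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]
    return [comb(d, 2) for d in odeg]


if __name__ == "__main__":
    out_star = [[0, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(centralization(out_star, o2scent))   # 1.0
```

Keeping the normalizing constant inside the index function means the wrapper never needs to know which graphs maximize a given index.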

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V,E)$ be a (possibly directed) graph of order $N$, and let $d(i,j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\left(d(j,k) \le i\right)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
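The structure statistics follow directly from all-pairs geodesic distances, which breadth-first search provides on unweighted graphs. The Python sketch below computes $s_0, \ldots, s_{N-1}$ exactly as defined above, counting $d(j,j) = 0$ and treating unreachable pairs as infinitely distant. It also labels weak components by ignoring edge direction. It is an illustration, not sna's structure.statistics or component.dist, and the function names are hypothetical.

```python
# From-scratch sketch (not sna's code); adj[i][j] = 1 means an edge i -> j.
from collections import deque


def distances_from(adj, s):
    """BFS geodesic distances from s along directed edges (None if unreachable)."""
    n = len(adj)
    dist = [None] * n
    dist[s] = 0
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in range(n):
            if adj[u][w] and dist[w] is None:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def structure_statistics(adj):
    """s_i = N^-2 * #{(j, k): d(j, k) <= i} for i = 0, ..., N - 1."""
    n = len(adj)
    dists = [distances_from(adj, s) for s in range(n)]
    return [sum(1 for row in dists for d in row if d is not None and d <= i) / n ** 2
            for i in range(n)]


def weak_components(adj):
    """Component label (1, 2, ...) for each vertex, ignoring edge direction."""
    n = len(adj)
    label = [None] * n
    current = 0
    for s in range(n):
        if label[s] is not None:
            continue
        current += 1
        label[s] = current
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in range(n):
                if (adj[u][w] or adj[w][u]) and label[w] is None:
                    label[w] = current
                    queue.append(w)
    return label


if __name__ == "__main__":
    path = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # 0 -> 1 -> 2
    print(structure_statistics(path))         # s_0 = 3/9, s_1 = 5/9, s_2 = 6/9
```

Note that $s_{N-1}$ reaches 1 only when every vertex can reach every other, so on the directed path it stops at 6/9.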

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G)$, $s'(H) = s'(G)$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbb{E}\,s(H) = s(G)$, $\mathbb{E}\,s'(H) = s'(G)$ for some $s, s'$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s'$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
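The CUG logic above amounts to drawing graphs uniformly under the conditioning constraints and counting how often the simulated statistic is at least (or at most) the observed one. The Python sketch below does this for the number of mutual dyads under two simple conditions: order only (every loopless digraph of that order equally likely) and order plus edge count. It is a Monte Carlo illustration under those assumptions, not cugtest's interface, and all names are hypothetical.

```python
# Monte Carlo CUG sketch (not sna's cugtest); adj[i][j] = 1 means an edge i -> j.
import random


def mutuality(adj):
    """Number of mutual dyads."""
    n = len(adj)
    return sum(1 for i in range(n) for j in range(i + 1, n) if adj[i][j] and adj[j][i])


def uniform_digraph(n, rng, n_edges=None):
    """Uniform draw over loopless digraphs of order n, or over those with
    exactly n_edges edges when n_edges is given."""
    slots = [(i, j) for i in range(n) for j in range(n) if i != j]
    if n_edges is None:
        chosen = [s for s in slots if rng.random() < 0.5]   # each digraph equally likely
    else:
        chosen = rng.sample(slots, n_edges)                 # uniform m-edge subset
    adj = [[0] * n for _ in range(n)]
    for i, j in chosen:
        adj[i][j] = 1
    return adj


def cug_test(adj, stat, condition="order", reps=1000, seed=0):
    """Return t(G) and Monte Carlo estimates of Pr(t(H) >= t(G)), Pr(t(H) <= t(G))."""
    if condition not in ("order", "edges"):
        raise ValueError(f"unknown condition: {condition}")
    rng = random.Random(seed)
    n = len(adj)
    m = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    observed = stat(adj)
    draws = [stat(uniform_digraph(n, rng, m if condition == "edges" else None))
             for _ in range(reps)]
    p_upper = sum(d >= observed for d in draws) / reps
    p_lower = sum(d <= observed for d in draws) / reps
    return observed, p_upper, p_lower
```

Conditioning on the edge count removes density as an explanation for an observed statistic. For a complete digraph, for instance, the edge-conditioned null can only reproduce the observed graph, so both tail probabilities are 1.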

                      Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00
R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

                      Rgt kpathcensus(g3[1] maxlen = 5 pathcomembership = bylength

                      + dyadictabulation = bylength)$pathcount

                      Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v101 35 8 3 9 2 10 9 3 10 8 82 119 40 10 47 8 59 47 13 56 39 383 346 155 41 180 35 223 185 52 211 149 1534 791 457 130 504 114 601 527 163 572 425 4625 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph (CUG) tests. Here, for example, we employ the difference in reciprocities as a test statistic. We first test against a CUG hypothesis conditioning only on order, and then against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.3333333
                1stQ:    -0.06666667
                Med:     0
                Mean:    -0.001288889
                3rdQ:    0.06666667
                Max:     0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)


CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.06666667
                1stQ:    0.1555556
                Med:     0.2222222
                Mean:    0.2215333
                3rdQ:    0.2888889
                Max:     0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.
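The logic of the two CUG tests above can be sketched in base R. The helpers below are hypothetical and only approximate what cugtest does. Reciprocity is taken as the share of symmetric (mutual or null) dyads. The order-only null draws each off-diagonal edge with probability 0.5, which is uniform over all digraphs of that order. As an assumption of this sketch, conditioning on density is implemented by fixing each graph's edge count. This is not a claim about how cugtest draws its null graphs.

```r
# Share of dyads that are symmetric (mutual or null).
dyadic_recip <- function(A) {
  up <- upper.tri(A)
  mean(A[up] == t(A)[up])
}

# Uniform random digraph on n vertices; if m is given, exactly m edges.
rand_digraph <- function(n, m = NULL) {
  A <- matrix(0L, n, n)
  off <- which(row(A) != col(A))               # off-diagonal cells only
  if (is.null(m)) {
    A[off] <- rbinom(length(off), 1, 0.5)      # condition on order only
  } else {
    A[sample(off, m)] <- 1L                    # condition on order and edge count
  }
  A
}

# CUG test of the difference dyadic_recip(A1) - dyadic_recip(A2).
cug_recip_diff <- function(A1, A2, reps = 1000, fix_edges = FALSE) {
  obs <- dyadic_recip(A1) - dyadic_recip(A2)
  n <- nrow(A1)
  m1 <- if (fix_edges) sum(A1) else NULL
  m2 <- if (fix_edges) sum(A2) else NULL
  sims <- replicate(reps,
    dyadic_recip(rand_digraph(n, m1)) - dyadic_recip(rand_digraph(n, m2)))
  c(obs = obs, p_ge = mean(sims >= obs), p_le = mean(sims <= obs))
}

set.seed(1)
A1 <- rand_digraph(10, 8)
A2 <- rand_digraph(10, 20)
cug_recip_diff(A1, A2, reps = 500)                    # order only
cug_recip_diff(A1, A2, reps = 500, fix_edges = TRUE)  # order and edge count
```

As in the second test above, fixing density can shift the null distribution far from zero. This happens because sparser graphs have more null (hence symmetric) dyads.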

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of “role” and “position” have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph G, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties. Graph permutations which exchange only structurally equivalent vertices are therefore necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions can often be used in place of analyses of relations among vertices. These position-level analyses work via the positions' reduced form, the blockmodel (see below). Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
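The definition of exact structural equivalence can be checked directly on an adjacency matrix. The base R sketch below compares two vertices' out-ties and in-ties to every other vertex, excluding the two vertices themselves as the definition requires. It then confirms on a small undirected example that swapping two equivalent vertices leaves the graph unchanged. The function name `is_struct_equiv` is hypothetical.

```r
# Exact structural equivalence of vertices v and w in a 0/1 adjacency matrix:
# same out-ties and same in-ties to every *other* vertex.
is_struct_equiv <- function(A, v, w) {
  others <- setdiff(seq_len(nrow(A)), c(v, w))
  all(A[v, others] == A[w, others]) &&   # identical out-neighbourhoods
    all(A[others, v] == A[others, w])    # identical in-neighbourhoods
}

# Undirected star: centre 1, leaves 2, 3, 4.
A <- matrix(0, 4, 4)
A[1, 2:4] <- 1
A[2:4, 1] <- 1
is_struct_equiv(A, 2, 3)   # TRUE: the leaves share the same alter
is_struct_equiv(A, 1, 2)   # FALSE
p <- c(1, 3, 2, 4)         # permutation swapping the two equivalent leaves
identical(A[p, p], A)      # TRUE: the swap is an automorphism
```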

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are “similar” in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
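As a rough illustration of what equiv.clust automates, the sketch below computes Hamming-type distances between the rows and columns of each vertex pair. It then passes those distances to hclust and cuts the tree with cutree. The helper `se_hamming` is hypothetical. Excluding each pair's own cells from the comparison is a choice made here, and sedist's exact conventions may differ. The "complete" linkage is likewise an assumption of the sketch.

```r
# Structural-equivalence Hamming distances between all vertex pairs:
# number of mismatched out- and in-ties to the other n - 2 vertices.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    others <- setdiff(seq_len(n), c(i, j))
    D[i, j] <- sum(A[i, others] != A[j, others]) + sum(A[others, i] != A[others, j])
  }
  D
}

A <- matrix(0, 4, 4)          # undirected star with centre 1
A[1, 2:4] <- 1
A[2:4, 1] <- 1
D <- se_hamming(A)
hc <- hclust(as.dist(D), method = "complete")   # R's built-in hierarchical clustering
cl <- cutree(hc, h = 0.5)     # centre in one class, the three leaves in another
cl
```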

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. The classes may be supplied as an equiv.clust object, an hclust object, or a membership vector. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to “blocks” of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms and categorical types. The quantitative forms include density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives. The categorical types include null, 1-covered, and so on. Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.
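The density block type amounts to averaging the adjacency matrix over each pair of classes. The base R sketch below computes a density image from a membership vector, neglecting self-ties in the within-class blocks as described above. The function `block_density` is hypothetical and is not the blockmodel implementation. Returning NA for a singleton class's diagonal block is a choice made here.

```r
# Density blockmodel: mean tie value within each (class a, class b) block,
# neglecting the diagonal (self-ties) in the within-class blocks.
block_density <- function(A, cl) {
  k <- sort(unique(cl))
  B <- matrix(NA_real_, length(k), length(k), dimnames = list(k, k))
  for (a in seq_along(k)) for (b in seq_along(k)) {
    rows <- which(cl == k[a])
    cols <- which(cl == k[b])
    keep <- outer(rows, cols, "!=")             # drop i == j cells
    if (any(keep)) B[a, b] <- mean(A[rows, cols, drop = FALSE][keep])
  }
  B
}

A <- matrix(0, 4, 4)
A[1:2, 3:4] <- 1          # class 1 sends ties to every member of class 2
cl <- c(1, 1, 2, 2)
B <- block_density(A, cl)
B                         # 1 in block (1, 2), 0 elsewhere
```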

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability. Random graphs are then drawn from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
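Expansion of a density image can be sketched in a few lines of base R. Each new vertex is assigned a class, the image is indexed by class to form the Bernoulli parameter matrix, and each cell is drawn independently. The function `expand_density` is hypothetical. Treating undefined (NA) blocks as probability 0 is an assumption of this sketch.

```r
# Expand a density block image B into a random graph with sizes[k] vertices in
# class k: each i -> j edge is an independent Bernoulli draw with probability
# B[class(i), class(j)].
expand_density <- function(B, sizes) {
  cl <- rep(seq_along(sizes), times = sizes)   # class of each new vertex
  P <- B[cl, cl, drop = FALSE]                  # Bernoulli parameter matrix
  P[is.na(P)] <- 0                              # assumption: undefined blocks -> no ties
  diag(P) <- 0                                  # no loops
  n <- length(cl)
  matrix(rbinom(n * n, 1, P), n, n)
}

B <- matrix(c(1, 0, 0, 1), 2, 2)   # ties only within classes
G <- expand_density(B, c(2, 3))    # 5 vertices: 2 in class 1, 3 in class 2
G
```

Because class sizes are an argument, the expanded graph need not have the order of the original one.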

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976). Role algebras seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean matrix product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not. Thus, diagonals should not be neglected when analyzing the results of graph compositions.
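The Boolean-product characterization of composition translates into base R as an ordinary matrix product thresholded at zero. The helper `compose_graphs` below is hypothetical and is not sna's %c% operator. The example shows how a loop arises even though neither input graph has one.

```r
# Boolean composition: (v, v') is an edge of the result iff there is some v''
# with (v, v'') in G and (v'', v') in G'.
compose_graphs <- function(A, B) (A %*% B > 0) * 1

A <- matrix(0, 3, 3); A[1, 2] <- 1   # G:  1 -> 2
B <- matrix(0, 3, 3); B[2, 1] <- 1   # G': 2 -> 1
C <- compose_graphs(A, B)
C                                    # only entry [1, 1] is 1: a loop
```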

                      Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original. This is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets. When entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\left(|V|-1\right)} \tag{3}$$

where $A^G$ and $A^H$ are the respective adjacency matrices of G and H, and $\mu_X = \left(|V|\left(|V|-1\right)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can similarly be obtained using hdist.
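Equation (3), the graph correlation, and the Hamming distance translate directly into base R. In the sketch below, the sums run over the $|V|(|V|-1)$ ordered off-diagonal pairs implied by the normalization in Equation (3). The helper names are hypothetical, and this is not the code behind gcor, gcov, or hdist. Because the divisors cancel, the graph correlation coincides with an ordinary Pearson correlation of the off-diagonal cells.

```r
# Graph covariance (Equation 3), graph correlation, and Hamming distance
# over the |V|(|V|-1) off-diagonal cells of two adjacency matrices.
off_diag <- function(A) A[row(A) != col(A)]

graph_cov <- function(G, H) {
  g <- off_diag(G); h <- off_diag(H)
  sum((g - mean(g)) * (h - mean(h))) / length(g)   # length(g) = |V|(|V|-1)
}
graph_cor <- function(G, H) graph_cov(G, H) / sqrt(graph_cov(G, G) * graph_cov(H, H))
hamming <- function(G, H) sum(off_diag(G) != off_diag(H))

set.seed(1)
G <- matrix(rbinom(36, 1, 0.5), 6, 6)
H <- matrix(rbinom(36, 1, 0.5), 6, 6)
graph_cor(G, H)
hamming(G, 1 - G)   # every off-diagonal cell differs: 6 * 5 = 30
```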

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings. This results in a decomposition of the total distance/covariance into two parts. One part is attributable to fixed aspects of the structure (the structural component), and the other depends on the (potentially variable) matching (the “labeling” component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
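For very small graphs, the structural correlation can be computed exactly by searching over all relabelings of one graph. This is the exhaustive strategy, which is exact but scales as $n!$; the default annealing is a heuristic that avoids that cost. The base R sketch below uses hypothetical helper names and is not gscor's implementation.

```r
# Structural (label-free) correlation by exhaustive search over relabelings of H:
# the maximum over permutations p of the graph correlation between G and H[p, p].
# Feasible only for very small graphs (n! permutations).
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (rest in all_perms(v[-i])) out[[length(out) + 1]] <- c(v[i], rest)
  out
}
off_diag <- function(A) A[row(A) != col(A)]
graph_cor <- function(G, H) cor(off_diag(G), off_diag(H))   # same as the Eq. (3) ratio

struct_cor <- function(G, H) {
  max(vapply(all_perms(seq_len(nrow(H))),
             function(p) graph_cor(G, H[p, p]), numeric(1)))
}

G <- matrix(0, 5, 5)
G[1, 2] <- G[2, 3] <- G[3, 1] <- G[4, 5] <- 1
q <- c(3, 5, 1, 2, 4)
H <- G[q, q]          # isomorphic copy of G with scrambled labels
graph_cor(G, H)       # labeled correlation, below 1 here
struct_cor(G, H)      # 1: some relabeling recovers G exactly
```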

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes the following:

- centralgraph, which returns the graph minimizing the total Hamming distance to all graphs in the input set;
- gclust.boxstats, which shows distributions of graph statistics based on a hierarchical clustering of networks;
- gclust.centralgraph, which returns the central graphs for each element of a network clustering solution;
- gdist.plotdiff, which plots distances between networks against differences in their properties;
- gdist.plotstats, which displays a metric MDS of networks with star-like figures showing graph-level covariates for each structure.

Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
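Network principal component analysis can be sketched by stacking each graph's off-diagonal edge variables as a column and then calling eigen on their correlation matrix. Using the correlation rather than the covariance matrix is a choice made here. The function name `net_pca` and the random example graphs are hypothetical.

```r
# Network PCA sketch: eigen-decomposition of the graph correlation matrix
# of a list of graphs on a common vertex set.
net_pca <- function(graphs) {
  off <- row(graphs[[1]]) != col(graphs[[1]])
  X <- sapply(graphs, function(A) A[off])   # one column of edge variables per graph
  R <- cor(X)                                # graph correlation matrix
  e <- eigen(R, symmetric = TRUE)
  list(cor = R, values = e$values, loadings = e$vectors)
}

set.seed(2)
gs <- replicate(4, matrix(rbinom(64, 1, 0.3), 8, 8), simplify = FALSE)
res <- net_pca(gs)
res$values / sum(res$values)   # share of variance per component
```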

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses. These test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, for cases where conditioning on the entire observed structure is inappropriate.
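The QAP idea is to permute the rows and columns of one adjacency matrix together and recompute the statistic. This relabels vertices while preserving the graph's structure. The sketch below applies that idea to the graph correlation in base R. The name `qap_cor_test` is hypothetical and is not qaptest's interface. The example graph H, a copy of G with some cells zeroed, is arbitrary.

```r
# QAP test for the graph correlation between G and H: compare the observed value
# with correlations obtained after randomly relabeling H's vertices.
qap_cor_test <- function(G, H, reps = 1000) {
  off <- row(G) != col(G)
  obs <- cor(G[off], H[off])
  sims <- replicate(reps, {
    p <- sample.int(nrow(H))        # random vertex permutation
    cor(G[off], H[p, p][off])       # rows and columns permuted together
  })
  c(obs = obs, p_ge = mean(sims >= obs), p_le = mean(sims <= obs))
}

set.seed(3)
G <- matrix(rbinom(100, 1, 0.3), 10, 10)
H <- G
H[sample(100, 10)] <- 0            # H: G with some cells set to zero
qap_cor_test(G, H, reps = 500)
```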


                      Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size. See the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1     Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
         352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.858
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

        Total Fraction Correct: 0.8973684
        Fraction Predicted 1s Correct: 0.8973684
        Fraction Predicted 0s Correct: NaN
        False Negative Rate: 0
        False Positive Rate: 1

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties. This is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7 Network inference and process models

A final category of functions supplied by sna consists of those implementing various network inference and process models. The package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar). However, it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns an estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators. They also include model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which it reports. Otherwise, the source produces a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the bias parameters may be omitted if desired. Estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian. Error rate priors (where applicable) are specified as beta distributions, and graph priors are specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model. This holds for cases in which the sum of the false positive and false negative rates is less than 1. The two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) can be applied via potscalered.mcmc to assess convergence; sna uses the simplified form of Gelman et al. (1995). bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates. The helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
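When the false positive and false negative rates are treated as known (one of the options mentioned above), the reporting model for a single edge reduces to Bayes' rule. This assumes the reports are conditionally independent given the true edge state. The base R sketch below computes that posterior. It illustrates the likelihood only and is not bbnam's Gibbs sampler. The name `edge_posterior` is hypothetical. The rates 0.375 and 0.038 echo the means used in the example below.

```r
# Posterior probability that a single edge is present, given binary reports x
# from several informants with known false negative rates (e_minus) and false
# positive rates (e_plus), and a Bernoulli prior probability for the edge.
edge_posterior <- function(x, e_minus, e_plus, prior = 0.5) {
  lik1 <- prod(ifelse(x == 1, 1 - e_minus, e_minus))  # Pr(reports | edge present)
  lik0 <- prod(ifelse(x == 1, e_plus, 1 - e_plus))    # Pr(reports | edge absent)
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

em <- rep(0.375, 3)   # false negative rates
ep <- rep(0.038, 3)   # false positive rates
edge_posterior(c(1, 0, 0), em, ep)   # one report of a tie, two of no tie
edge_posterior(c(0, 0, 0), em, ep)
```

With these rates, a single positive report outweighs two negative ones (posterior about 0.71). This reflects that false positives are far rarer than false negatives.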

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members in turn may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex i nominates vertex j when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where T is the current state of the trace and $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$. The $B_k$ are “bias events”, and $s_k(i,j,T)$ is the number of bias events of type k that have potentially occurred for the (i, j) directed dyad. Bias events are taken to be independent Bernoulli trials given T, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events, which in turn influence the structure of the network. The joint graph distribution under such a model is not in general known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as are asymptotic goodness-of-fit tests for dyad and triad statistics.
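The conditional nomination probability above is a one-line computation once the baseline probability, the bias event probabilities, and the counts $s_k$ are known. The base R sketch below evaluates it. The function name `biased_net_prob` is hypothetical. The probabilities 0.5 and 0.25 are arbitrary illustration values, and no mapping to rgbn's pi, sigma, or rho arguments is implied.

```r
# Conditional probability of the nomination event e_ij under a biased net model:
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
biased_net_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Baseline 0.17, with two bias types of probability 0.5 and 0.25.
biased_net_prob(0.17, c(0.5, 0.25), c(0, 0))   # no bias events: 0.17
biased_net_prob(0.17, c(0.5, 0.25), c(1, 0))   # one event of the first type: 0.585
biased_net_prob(0.17, c(0.5, 0.25), c(1, 2))   # 0.7665625
```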

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models constitute one important family of processes which are often used for this purpose. For these models, see Doreian (1990); for the equivalent class of spatial autocorrelation models, see Cliff and Ord (1973) and Anselin (1988). These models are of the form

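$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$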

where $y \in \mathbb{R}^n$ is a vector of responses and $X \in \mathbb{R}^{n \times x}$ is a covariate matrix. $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, and $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters. Finally, $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. Z and ψ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, W and θ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is that the AR term includes the impact of neighbors' covariate scores. An AR term implies that each individual's response depends on that of their neighbors, including all covariate, disturbance, and higher-order neighborhood effects. An MA term, by contrast, implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders. It can also compute autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation, in analogy to similar heuristics within the time series literature (see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
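A pure network AR process is the special case of Equation (4) with a single weight matrix and no MA term. It can be simulated by solving $(I - \theta W) y = X\beta + \varepsilon$. Its autocorrelation can then be summarized with Moran's I, one of the indices nacf reports. The base R sketch below does both. It is not lnam or nacf code. The function names are hypothetical, and the ring network and parameter values are arbitrary choices for illustration.

```r
# Simulate the pure network AR case of Equation (4) with a single weight
# matrix W (no MA term): y = theta * W y + X beta + eps, i.e.
# y = (I - theta W)^{-1} (X beta + eps).
simulate_net_ar <- function(W, X, beta, theta, sigma = 1) {
  n <- nrow(W)
  eps <- rnorm(n, 0, sigma)
  as.vector(solve(diag(n) - theta * W, X %*% beta + eps))
}

# Moran's I for responses y and weight matrix W.
morans_I <- function(y, W) {
  z <- y - mean(y)
  (length(y) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Ring network on n vertices, row-normalised so each vertex averages its two neighbours.
n <- 30
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  W[i, i %% n + 1] <- 0.5
  W[i, (i - 2) %% n + 1] <- 0.5
}
set.seed(4)
X <- cbind(1, rnorm(n))
y <- simulate_net_ar(W, X, beta = c(0, 1), theta = 0.8)
morans_I(y, W)
```

As a sanity check, a response that alternates between 1 and -1 around the ring has Moran's I of exactly -1 under this W.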


                      Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038. Their false negative rates (em) are likewise beta distributed, with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[,1] <- 2
R> pem[,2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[,1] <- 2
R> pep[,2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network Estimation/Informant Accuracy

                      Multiple Error Probability Model

                      Marginal Posterior Network Distribution

                      a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

                      Journal of Statistical Software 41

                      a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

                      a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer)

Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics

  Replicate Chains 5
  Burn Time 300
  Draws per Chain 20 Total Draws 100
  Potential Scale Reduction (G&R's sqrt(Rhat))
    Max 1.003116
    Med 0.9992194
    IQR 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which e^- + e^+ > 1. Multiple actors whose error rates satisfy this condition with high posterior probability, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
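The "perverse region" check described above can be carried out directly on posterior draws. The following Python sketch is an illustration only, not sna code: it assumes the draws are held as lists with one row per draw and one column per observer (the same layout as b$em and b$ep in the R session above), and the helper names perverse_mass and flag_observers, the 0.5 cutoff for "high probability", and the toy numbers are all hypothetical.

```python
def perverse_mass(em_draws, ep_draws):
    """Fraction of posterior draws in which e^- + e^+ > 1, per observer.

    em_draws and ep_draws: lists of draws, each draw a list holding one error
    rate per observer (rows = draws, columns = observers).
    """
    if not em_draws or len(em_draws) != len(ep_draws):
        raise ValueError("need the same, non-zero number of e^- and e^+ draws")
    n_obs = len(em_draws[0])
    hits = [0] * n_obs
    for em_row, ep_row in zip(em_draws, ep_draws):
        for k in range(n_obs):
            if em_row[k] + ep_row[k] > 1:  # draw falls in the perverse region
                hits[k] += 1
    return [h / len(em_draws) for h in hits]


def flag_observers(em_draws, ep_draws, threshold=0.5):
    """Indices of observers with more than `threshold` posterior mass on e^- + e^+ > 1."""
    return [k for k, m in enumerate(perverse_mass(em_draws, ep_draws)) if m > threshold]


# Hypothetical draws for three observers (not taken from the fit above).
em = [[0.30, 0.70, 0.20], [0.35, 0.80, 0.25], [0.40, 0.60, 0.30], [0.32, 0.75, 0.22]]
ep = [[0.05, 0.40, 0.10], [0.04, 0.35, 0.08], [0.06, 0.30, 0.12], [0.05, 0.45, 0.09]]
print(perverse_mass(em, ep))   # [0.0, 0.75, 0.0]
print(flag_observers(em, ep))  # [1]
```

Observer 1 is flagged because three of its four draws have e^- + e^+ above 1; in a real analysis such a flag would prompt the re-examination of priors and data described in the paragraph above.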

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
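The contrast between the classical estimators follows from how each combines reports. The Python sketch below is an illustration, not sna's consensus code: it assumes reports[k][i][j] = 1 when informant k reports a tie i -> j, with the informants being the actors themselves (as the LAS rules require). The LAS intersection keeps a tie only if both endpoints report it, the LAS union keeps it if either does, and the central graph is taken here as a majority rule with a cutoff of 0.5; that cutoff, the helper names, and the toy data are assumptions. Accuracy counts all cells, diagonal included, as mean(... == g) does above.

```python
def las_intersection(reports):
    """Tie i -> j kept only if both i and j report it."""
    n = len(reports)
    return [[int(bool(reports[i][i][j] and reports[j][i][j])) for j in range(n)]
            for i in range(n)]


def las_union(reports):
    """Tie i -> j kept if either i or j reports it."""
    n = len(reports)
    return [[int(bool(reports[i][i][j] or reports[j][i][j])) for j in range(n)]
            for i in range(n)]


def central_graph(reports, cutoff=0.5):
    """Tie i -> j kept if at least `cutoff` of all informants report it (assumed rule)."""
    m, n = len(reports), len(reports[0])
    return [[int(sum(r[i][j] for r in reports) / m >= cutoff) for j in range(n)]
            for i in range(n)]


def accuracy(est, truth):
    """Share of adjacency cells (diagonal included) on which est agrees with truth."""
    n = len(truth)
    hits = sum(est[i][j] == truth[i][j] for i in range(n) for j in range(n))
    return hits / (n * n)


# Hypothetical 3-actor example: one false negative and one false positive.
truth = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
reports = [
    [[0, 0, 0], [0, 0, 1], [1, 0, 0]],  # informant 0 misses its own tie 0 -> 1
    [[0, 1, 0], [0, 0, 1], [1, 0, 0]],  # informant 1 reports the truth
    [[0, 1, 1], [0, 0, 1], [1, 0, 0]],  # informant 2 adds a false tie 0 -> 2
]
for name, method in [("LAS intersection", las_intersection),
                     ("LAS union", las_union),
                     ("central graph", central_graph)]:
    print(name, accuracy(method(reports), truth))
```

A single false negative by an endpoint is enough to delete a tie from the intersection, and a single false positive by an endpoint is enough to add one to the union, while the majority rule absorbs both errors here.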

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)
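For reference, the model fit in this example can be written as follows. This is the linear network autocorrelation form described in the lnam documentation, written for a single W1 and a single W2 as used here; the symbols ρ1, ρ2, W1, W2, X, β, e, ν and σ correspond to r1, r2, w1, w2, x, beta, e, nu and sigma in the session that follows.

```latex
\begin{aligned}
y &= \rho_1 W_1 y + X\beta + e, \\
e &= \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I), \\
\Rightarrow\quad y &= (I - \rho_1 W_1)^{-1}\bigl[X\beta + (I - \rho_2 W_2)^{-1}\nu\bigr].
\end{aligned}
```

Solving the two linear systems in turn, disturbances first, is what the two qr.solve calls in the simulation do.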

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit
  Residual standard error 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared 0.96, Adjusted R-Squared 0.9534
  Model log likelihood 58.47 on 42 degrees of freedom (w/Sigma)
  AIC -100.9 BIC -85.65

  Null model meanstd
  Null log likelihood -82.48 on 48 degrees of freedom
  AIC 169.0 BIC 172.8
  AIC difference (model versus null) 269.9
  Heuristic Log Bayes Factor (model versus null) 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot", which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
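The two-standard-deviation colouring rule for the net influence plot can be written down directly. The Python sketch below is illustrative only: it assumes an n × n matrix M of estimated net influences is already available (how sna's plot method derives that matrix from the fitted lnam model is not reproduced here), and the helper name flag_influence, the exclusion of the diagonal, and the use of the population standard deviation are assumptions of the sketch.

```python
from statistics import mean, pstdev


def flag_influence(M, k=2.0):
    """Split off-diagonal pairs (i, j) of a net-influence matrix M into
    'green' (at least k SDs above the mean) and 'red' (at least k SDs below).

    Mean and SD are taken over off-diagonal entries only (an assumption).
    """
    n = len(M)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    values = [M[i][j] for i, j in pairs]
    mu, sd = mean(values), pstdev(values)
    if sd == 0:  # all influences equal: nothing stands out
        return [], []
    green = [(i, j) for i, j in pairs if M[i][j] >= mu + k * sd]
    red = [(i, j) for i, j in pairs if M[i][j] <= mu - k * sd]
    return green, red


# Hypothetical 4 x 4 net-influence matrix with one strong positive
# and one strong negative pair.
M = [[0, 10, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, -10],
     [0, 0, 0, 0]]
green, red = flag_influence(M)
print(green, red)  # [(0, 1)] [(2, 3)]
```

With 12 off-diagonal cells, the two extreme entries lie about 2.45 standard deviations from the mean, so each crosses its threshold while the zeros do not.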

                      3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                      Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: "Fitted vs. Observed Values"; "Fitted Values vs. Estimated Disturbances"; "Normal Q-Q Residual Plot" (Theoretical Quantiles vs. Sample Quantiles); "Net Influence Plot".]

                      References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw Network Visualization Software, Version 2.067. URL http://www.analytictech.com/

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Second edition. Springer-Verlag, New York.

Burt RS (1976). "Positions in Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2. URL http://faculty.chicagogsb.edu/ronald.burt/teaching/

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 1.3. URL http://CRAN.R-project.org/package=network

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software: Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0. URL http://CRAN.R-project.org/package=statnet

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73. URL http://CRAN.R-project.org/package=SparseM

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1. URL http://www.R-project.org/

Richards WD, Seary AJ (2006). MultiNet for Windows, Version 4.75. URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis, Version 3.1. URL http://stat.gamma.rug.nl/snijders/siena.html

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008. Submitted: 2007-06-01. Accepted: 2007-12-25.


is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, or the average value of its alters on a specified covariate.
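The identify-then-apply pattern behind gapply can be sketched in a few lines of Python. This illustrates the idea and is not sna's implementation: neighbors_within and neighbor_apply are hypothetical names, adj[i][j] = 1 is assumed to mean a tie i -> j, and direction takes the strings "out", "in" or "combined" (playing the role of gapply's row, column, and combined margins). Ego is excluded from its own neighborhood, and neighbors up to `distance` steps away are found by breadth-first search.

```python
from collections import deque


def neighbors_within(adj, v, direction="out", distance=1):
    """Vertices reachable from v in 1..distance steps, excluding v itself."""
    if direction not in ("out", "in", "combined"):
        raise ValueError("direction must be 'out', 'in' or 'combined'")
    n = len(adj)

    def step(u):
        nxt = set()
        if direction in ("out", "combined"):
            nxt.update(w for w in range(n) if adj[u][w])  # follow ties forward
        if direction in ("in", "combined"):
            nxt.update(w for w in range(n) if adj[w][u])  # follow ties backward
        return nxt

    depth = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if depth[u] == distance:
            continue
        for w in step(u):
            if w not in depth:
                depth[w] = depth[u] + 1
                queue.append(w)
    return sorted(w for w in depth if w != v)


def neighbor_apply(adj, stats, fun, direction="out", distance=1):
    """Apply fun (which must accept an empty list) to each vertex's neighbors' covariates."""
    return [fun([stats[w] for w in neighbors_within(adj, v, direction, distance)])
            for v in range(len(adj))]


# Directed path 0 -> 1 -> 2 -> 3 with covariate values 10, 20, 30, 40.
adj = [[0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]
print(neighbor_apply(adj, [1] * 4, sum, "out"))  # out-degrees: [1, 1, 1, 0]
print(neighbor_apply(adj, [10, 20, 30, 40], sum, "combined", distance=2))  # [50, 80, 70, 50]
```

Summing a vector of ones reproduces degree, which is the same check the R example below performs with gapply and degree.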

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose (i, j) cell is an indicator for the membership of vertex j in vertex i's selected neighborhood. Specifically, the adjacency matrix associated with the 0th-order neighborhood is defined as the identity matrix, while for orders k > 0 it depends on the type of adjacency involved. For input graph G = (V, E), let the base relation R be given by the underlying graph of G (i.e., G ∪ G^T) if total neighborhoods are sought, the transpose of G if incoming neighborhoods are sought, or G otherwise. The partial neighborhood structure of order k > 0 on R is then defined to be the digraph on V whose edge set consists of the ordered pairs (i, j) having geodesic distance k in R. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to k in R. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, with arbitrary choice of edge direction.
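The definitions above translate almost line for line into code. The Python sketch below follows them literally: it builds the base relation R (G, its transpose, or G ∪ G^T), computes geodesic distances by breadth-first search, and marks (i, j) when the distance equals k (partial) or is at most k (cumulative, which therefore includes the diagonal at distance 0). It is an illustration under those definitions; the function names are hypothetical, and no claim is made that sna's neighborhood treats the diagonal of cumulative neighborhoods the same way.

```python
from collections import deque


def base_relation(adj, kind):
    """R = G ('out'), the transpose of G ('in'), or the underlying graph G ∪ G^T ('total')."""
    n = len(adj)
    if kind == "out":
        return [list(row) for row in adj]
    if kind == "in":
        return [[adj[j][i] for j in range(n)] for i in range(n)]
    if kind == "total":
        return [[int(bool(adj[i][j] or adj[j][i])) for j in range(n)] for i in range(n)]
    raise ValueError("kind must be 'out', 'in' or 'total'")


def geodesic_distances(R):
    """All-pairs shortest-path lengths in R via BFS; None marks unreachable pairs."""
    n = len(R)
    dist = [[None] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in range(n):
                if R[u][w] and dist[s][w] is None:
                    dist[s][w] = dist[s][u] + 1
                    queue.append(w)
    return dist


def neighborhood_matrix(adj, k, kind="out", partial=True):
    """(i, j) = 1 iff d_R(i, j) == k (partial) or d_R(i, j) <= k (cumulative)."""
    d = geodesic_distances(base_relation(adj, kind))
    n = len(adj)
    if partial:
        return [[int(d[i][j] == k) for j in range(n)] for i in range(n)]
    return [[int(d[i][j] is not None and d[i][j] <= k) for j in range(n)] for i in range(n)]


# Directed path 0 -> 1 -> 2 -> 3.
path = [[0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 0]]
for k in range(4):
    print(k, neighborhood_matrix(path, k))
```

Because partial neighborhoods of different orders are disjoint, the cumulative neighborhood of order k is the sum of the partial ones of orders 0 through k.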

To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below:

R> g <- rgraph(10, tp = 1.5/9)
R> gin <- ego.extract(g, neighborhood = "in")
R> gout <- ego.extract(g, neighborhood = "out")
R> gcomb <- ego.extract(g, neighborhood = "combined")
R> gcomb[1:3]
$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(gin, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(gout, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(gcomb, NROW) <= degree(g) + 1)
[1] TRUE
R> egosize <- sapply(gcomb, NROW)
R> if(any(egosize > 2))
+    sapply(gcomb[egosize > 2], function(x) gden(x[-1,-1]))
         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.
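The ego-first convention and the alters-only density can be mirrored in a small Python sketch. This is an illustration, not ego.extract or gden themselves: ego_extract, digraph_density and alter_density are hypothetical names, adj[i][j] = 1 is assumed to mean a tie i -> j, and density is taken as ties present over n(n - 1) possible ordered pairs, ignoring loops. As in the R example's egosize > 2 guard, the alter density is left undefined (None) when there are fewer than two alters.

```python
def ego_extract(adj, ego, kind="combined"):
    """Ego net adjacency matrix with ego placed in row/column 0."""
    n = len(adj)
    if kind == "out":
        alters = [j for j in range(n) if j != ego and adj[ego][j]]
    elif kind == "in":
        alters = [j for j in range(n) if j != ego and adj[j][ego]]
    elif kind == "combined":
        alters = [j for j in range(n) if j != ego and (adj[ego][j] or adj[j][ego])]
    else:
        raise ValueError("kind must be 'out', 'in' or 'combined'")
    order = [ego] + alters
    return [[adj[a][b] for b in order] for a in order]


def digraph_density(m):
    """Ties present / ties possible in a loopless digraph with at least 2 vertices."""
    n = len(m)
    ties = sum(m[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


def alter_density(ego_net):
    """Density among alters only: drop ego's row and column (index 0)."""
    if len(ego_net) < 3:  # fewer than two alters: density undefined
        return None
    return digraph_density([row[1:] for row in ego_net[1:]])


# Hypothetical digraph: 0 -> 1, 0 -> 2, 1 -> 2, 3 -> 0.
adj = [[0, 1, 1, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 0],
       [1, 0, 0, 0]]
print(ego_extract(adj, 2))                 # [[0, 0, 0], [1, 0, 1], [1, 0, 0]]
print(alter_density(ego_extract(adj, 2)))  # 0.5
```

Vertex 2's alters are 0 and 1, with the single tie 0 -> 1 among them, so one of two possible ordered pairs is present.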

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means or quantiles) can be used as well:

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+    cmode = "freeman")/2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0

To obtain adjacency matrices for the neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i,,], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+    partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i,,], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods “fill out” the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.
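The distinction between partial and cumulative neighborhoods can be sketched directly from geodesic distances: following the figure captions, the order-k partial neighborhood of v holds the vertices exactly k steps away, and the cumulative one holds those 1 to k steps away. The base-R code below is a hypothetical illustration (bfs_distances and neighborhood_sketch are not sna functions) and covers only out-neighborhoods, with A[i, j] != 0 denoting a tie from i to j.

```r
# Hypothetical sketch (not sna's neighborhood): out-neighborhoods of a given
# order, built from breadth-first-search distances on adjacency matrix A.
bfs_distances <- function(A, s) {
  d <- rep(Inf, nrow(A))
  d[s] <- 0
  frontier <- s
  step <- 0
  while (length(frontier) > 0) {
    step <- step + 1
    # Unvisited vertices receiving a tie from the current frontier
    nxt <- which(colSums(A[frontier, , drop = FALSE] != 0) > 0 & is.infinite(d))
    d[nxt] <- step
    frontier <- nxt
  }
  d
}

neighborhood_sketch <- function(A, order, partial = TRUE) {
  n <- nrow(A)
  D <- t(sapply(seq_len(n), function(s) bfs_distances(A, s)))  # D[s, t]
  # Partial: exactly `order` steps away; cumulative: 1 to `order` steps away
  N <- if (partial) D == order else D >= 1 & D <= order
  N * 1   # convert to a 0/1 adjacency matrix
}

A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 3] <- A[3, 4] <- 1                # directed path 1 -> 2 -> 3 -> 4
neighborhood_sketch(A, 2)[1, ]                    # 0 0 1 0
neighborhood_sketch(A, 2, partial = FALSE)[1, ]   # 0 1 1 0
```

Row v of the returned matrix plays the role of one panel's row in Figures 1 and 2: an entry of 1 means the column vertex belongs to v's neighborhood of that order.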

                        Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary “workhorse” routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For “target diagrams,” in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order; vertex v is adjacent to vertex v′ in the ith panel iff v′ belongs to the ith order partial neighborhood of v. [Panels: Partial Neighborhood of Order 1 through 9.]
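To make the idea of a layout function concrete, here is a minimal base-R sketch of the simplest layout mentioned above, circular placement, together with a bare-bones sociogram. It is not gplot's implementation: circle_layout and draw_sociogram are hypothetical names, and the only convention assumed is that a layout is an n × 2 coordinate matrix with one row per vertex.

```r
# Hypothetical sketch (not gplot's code): a circular layout, i.e. vertices
# evenly spaced on the unit circle, plus a bare-bones sociogram in base graphics.
circle_layout <- function(n) {
  theta <- 2 * pi * (seq_len(n) - 1) / n
  cbind(x = cos(theta), y = sin(theta))
}

draw_sociogram <- function(A, coord = circle_layout(nrow(A))) {
  plot(coord, type = "n", asp = 1, axes = FALSE, xlab = "", ylab = "")
  e <- which(A != 0 & row(A) != col(A), arr.ind = TRUE)  # columns: sender, receiver
  arrows(coord[e[, 1], 1], coord[e[, 1], 2], coord[e[, 2], 1], coord[e[, 2], 2],
         length = 0.1)
  points(coord, pch = 21, bg = "red", cex = 2)
  text(coord, labels = seq_len(nrow(A)), pos = 3)
}

xy <- circle_layout(5)
# draw_sociogram(matrix(rbinom(25, 1, 0.5), 5, 5))  # opens a graphics device
```

Any function returning such a coordinate matrix can be swapped in through the coord argument, which is the same kind of modularity the gplot.layout family provides.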

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R’s optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order; vertex v is adjacent to vertex v′ in the ith panel iff v′ belongs to the ith order cumulative neighborhood of v. [Panels: Cumulative Neighborhood of Order 1 through 9.]

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurve = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options. [Panels: Default; Curved Edges; MDS Layout; Circular Layout; Sociomatrix; Multiple Options.]

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+   xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10)/10 * 2 * pi), sin((1:10)/10 * 2 * pi),
+   col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+   xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+   offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
+   arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts as polygon-based alternatives to R’s built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions. [Panels: gplot.vertex Example; gplot.arrow Example; gplot.loop Example.]

2.3 Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

                        Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or “central” position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized “paring down” of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_d^+(v,G) \equiv |N^+(v)|$), indegree ($c_d^-(v,G) \equiv |N^-(v)|$), and total or “Freeman” degree ($c_d^t(v,G) \equiv c_d^+(v,G) + c_d^-(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $g'(v',v,v'',G)/g(v',v'',G)$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v,G) \equiv \frac{|V| - 1}{\sum_{v' \in V} d(v,v')},$$

where $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman’s measures.
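The definitions above can be checked by computing them from first principles. The base-R sketch below is an illustration, not sna's implementation: geodesic_counts and freeman_sketch are hypothetical names, A is assumed to be a binary adjacency matrix with A[i, j] != 0 meaning an edge i -> j, and betweenness is summed over ordered pairs as for digraphs (for an undirected graph each pair is counted twice, so the scores would be halved). Geodesic counts come from breadth-first search, and a vertex v lies on an (i, k) geodesic exactly when d(i, v) + d(v, k) = d(i, k).

```r
# Hypothetical sketch (not sna's code): geodesic distances and counts by
# breadth-first search, where A[i, j] != 0 denotes an edge i -> j.
geodesic_counts <- function(A) {
  n <- nrow(A)
  d <- matrix(Inf, n, n)    # d[s, t]: geodesic distance from s to t
  sigma <- matrix(0, n, n)  # sigma[s, t]: number of (s, t) geodesics
  for (s in seq_len(n)) {
    d[s, s] <- 0
    sigma[s, s] <- 1
    queue <- s
    while (length(queue) > 0) {
      u <- queue[1]
      queue <- queue[-1]
      for (w in which(A[u, ] != 0)) {
        if (is.infinite(d[s, w])) {   # w reached for the first time
          d[s, w] <- d[s, u] + 1
          queue <- c(queue, w)
        }
        if (d[s, w] == d[s, u] + 1)   # u immediately precedes w on a geodesic
          sigma[s, w] <- sigma[s, w] + sigma[s, u]
      }
    }
  }
  list(d = d, sigma = sigma)
}

# Degree, betweenness, stress and closeness as defined in the text
freeman_sketch <- function(A) {
  n <- nrow(A)
  diag(A) <- 0                        # ignore loops
  g <- geodesic_counts(A)
  d <- g$d
  sigma <- g$sigma
  btw <- stress <- numeric(n)
  for (v in seq_len(n)) {
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (v == i || v == k || i == k || sigma[i, k] == 0) next
        if (d[i, v] + d[v, k] == d[i, k]) {      # v lies on an (i, k) geodesic
          through <- sigma[i, v] * sigma[v, k]   # (i, k) geodesics via v
          stress[v] <- stress[v] + through
          btw[v] <- btw[v] + through / sigma[i, k]
        }
      }
    }
  }
  list(outdegree = rowSums(A != 0), indegree = colSums(A != 0),
       betweenness = btw, stress = stress,
       closeness = (n - 1) / rowSums(d))  # 0 whenever some vertex is unreachable
}

A <- matrix(0, 4, 4)
A[1, 2] <- A[1, 3] <- A[2, 4] <- A[3, 4] <- 1   # two geodesics from 1 to 4
res <- freeman_sketch(A)
res$betweenness   # 0, 0.5, 0.5, 0
res$closeness     # 0.75, 0, 0, 0
```

The closeness result illustrates the fragility noted above: only vertex 1 reaches every other vertex, so every other score collapses to 0.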

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $\mathbf{A}$ (where $\mathbf{A}$ is the graph adjacency matrix). This can be interpreted variously as a measure of “coreness” (or membership in the largest dense cluster), “recursive” or “reflected” degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (\mathbf{I} - \beta \mathbf{A})^{-1} \mathbf{A} \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $\mathbf{A}$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so that the squared length of the centrality vector (i.e., the sum of squared scores) equals $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
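The relationship between the Bonacich power measure, degree and eigenvector centrality can be checked numerically with a short base-R sketch. This is not sna's evcent or bonpow: eigen_centrality and bonacich_power are hypothetical names, rows of A are assumed to be senders, and the scaling takes the sum of squared scores to equal |V| (our reading of the convention described above).

```r
# Hypothetical sketch (not sna's evcent/bonpow): spectral centralities for an
# adjacency matrix A, with rows as senders.
eigen_centrality <- function(A) {
  # Eigenvector of the eigenvalue with largest modulus; for a connected
  # undirected graph this is the principal (Perron) eigenvector.
  abs(Re(eigen(A)$vectors[, 1]))
}

bonacich_power <- function(A, beta) {
  n <- nrow(A)
  # Solve (I - beta A) x = A 1 rather than forming the inverse explicitly
  raw <- as.vector(solve(diag(n) - beta * A, A %*% rep(1, n)))
  # alpha chosen so that the sum of squared scores equals n (assumed convention)
  raw * sqrt(n / sum(raw^2))
}

# Undirected star K_{1,3}; vertex 1 is the hub
star <- matrix(0, 4, 4)
star[1, 2:4] <- star[2:4, 1] <- 1
lambda1 <- max(eigen(star)$values)       # sqrt(3)
bonacich_power(star, 0)                  # proportional to degree (3, 1, 1, 1)
bonacich_power(star, 0.999 / lambda1)    # close to 2 * eigen_centrality(star)
bonacich_power(star, -0.5)               # negative beta: leaves score below zero
```

With beta = -0.5 the unscaled solution is (6, -2, -2, -2): the hub gains from being attached to weak alters, while the leaves, each attached only to a strong alter, fall below zero.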

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now let us posit a vector of states $s$, with one element per vertex, such that $s_i$ is the state of $v_i \in V$. (“State” in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
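The role definitions above translate directly into a triple loop over ordered triads. The base-R sketch below computes only the raw role counts per vertex; it omits the null-model moments and z-tests that brokerage also reports, and the names broker_role and brokerage_counts are hypothetical. It assumes a directed binary adjacency matrix A and a state vector s with one entry per vertex, and uses the same role labels as the brokerage output shown later (w_I, w_O, b_IO, b_OI, b_O, t).

```r
# Hypothetical sketch (not sna's brokerage): raw Gould-Fernandez role counts.
broker_role <- function(si, sj, sk) {
  if (si == sj && sj == sk) return("w_I")  # coordinating
  if (si == sk) return("w_O")              # itinerant: the broker j is the outsider
  if (sj == sk) return("b_IO")             # gatekeeping
  if (si == sj) return("b_OI")             # representative
  "b_O"                                    # liaison: all three states differ
}

brokerage_counts <- function(A, s) {
  n <- nrow(A)
  roles <- c("w_I", "w_O", "b_IO", "b_OI", "b_O")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (i == j || j == k || i == k) next
        # j brokers (i, k) when i -> j and j -> k are present but i -> k is absent
        if (A[i, j] == 0 || A[j, k] == 0 || A[i, k] != 0) next
        r <- broker_role(s[i], s[j], s[k])
        out[j, r] <- out[j, r] + 1
      }
    }
  }
  cbind(out, t = rowSums(out))   # t: total brokerage
}

A <- matrix(0, 3, 3)
A[1, 2] <- A[2, 3] <- 1                # path 1 -> 2 -> 3; vertex 2 is the broker
brokerage_counts(A, c(1, 2, 1))[2, ]   # itinerant (w_O)
brokerage_counts(A, c(1, 1, 2))[2, ]   # representative (b_OI)
```

Because the roles partition the locally bridged triads, the t column equals the total brokerage defined at the start of the paragraph.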

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary’s graph centrality, eigenvector centrality, and information centrality on the same network.

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
[1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
[1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
[1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
[1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
[8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
[1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
[7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
[1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
[1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
[8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
[1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
[8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
[1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
[9] 3.077481 3.704181

As the above illustrates, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score’s behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego’s dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow’s most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
[1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
[8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1/eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
[1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
[7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed brokerage scores, aggregated over all groups, for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

                        Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the “edgewise” reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i, j, k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that “a friend of a friend is a friend”: where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of “third party support”: direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \max_{v \in V} c(v,G) - c(v_i,G) \right] \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
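Equations (1) and (2), and the normalization step, can be verified on a small example. The helpers below are hypothetical base-R sketches, not sna's centralization. For normalized outdegree centralization on a loopless digraph of order n, the sketch assumes the theoretical maximum deviation is (n - 1)^2, attained by an out-star; this is consistent with the normalized outdegree values in the R session below, each of which is an integer divided by 81 when n = 10.

```r
# Hypothetical helpers (not sna's centralization) for a vector of scores
centralization_eq1 <- function(scores) sum(max(scores) - scores)  # equation (1)
centralization_eq2 <- function(scores) {                          # equation (2)
  length(scores) * (max(scores) - mean(scores))
}

# Normalized outdegree centralization for a loopless digraph of order n.
# Assumption: the maximum deviation, attained by an out-star, is (n - 1)^2.
outdegree_centralization <- function(A) {
  A <- (A != 0) * 1
  diag(A) <- 0
  n <- nrow(A)
  centralization_eq1(rowSums(A)) / (n - 1)^2
}

star <- matrix(0, 5, 5)
star[1, 2:5] <- 1                   # out-star: vertex 1 sends to all others
centralization_eq1(rowSums(star))   # 16
centralization_eq2(rowSums(star))   # 16, the same quantity
outdegree_centralization(star)      # 1 (maximum concentration)
cycle <- matrix(0, 5, 5)
cycle[cbind(1:5, c(2:5, 1))] <- 1   # directed 5-cycle: every outdegree is 1
outdegree_centralization(cycle)     # 0 (all scores equal)
```

The two endpoints match the range described above: 0 when all vertices have the same score, 1 for the star.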

In addition to the above, sna includes functions for GLIs such as Krackhardt’s (1994) measures of informal organization. These indices, supplied respectively by connectedness, efficiency, hierarchy, and lubness, describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.
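Several of the GLIs discussed in this subsection reduce to a few lines of matrix arithmetic. The base-R sketches below are illustrations under stated assumptions, not sna's gden, grecip, gtrans or connectedness: all function names are hypothetical, A is a directed binary adjacency matrix with loops ignored, weak transitivity is taken as the share of two-paths i -> j -> k (with i != k) closed by a direct tie i -> k, and Krackhardt connectedness is taken (our reading) as the share of unordered vertex pairs that are linked by some path when edge direction is ignored.

```r
# Hypothetical sketches (not sna's code); A is a directed binary adjacency matrix.
gli_density <- function(A) {
  off <- row(A) != col(A)
  sum(A[off] != 0) / sum(off)                # edges / n(n - 1)
}

recip_dyadic <- function(A) {
  B <- A != 0
  up <- upper.tri(B)
  mean(B[up] == t(B)[up])                    # share of mutual or null dyads
}

recip_edgewise <- function(A) {
  B <- A != 0
  diag(B) <- FALSE
  sum(B & t(B)) / sum(B)                     # share of edges that are reciprocated
}

weak_transitivity <- function(A) {
  B <- (A != 0) * 1
  diag(B) <- 0
  P <- B %*% B                               # P[i, k]: two-paths i -> j -> k
  at_risk <- sum(P) - sum(diag(P))           # two-paths with i != k
  sum(P * B) / at_risk                       # share closed by a direct i -> k tie
}

krackhardt_connectedness <- function(A) {
  n <- nrow(A)
  R <- (A != 0 | t(A) != 0) | diag(n) > 0    # ties ignoring direction, plus self
  repeat {                                   # transitive closure by squaring
    R2 <- (R %*% R) > 0
    if (all(R2 == R)) break
    R <- R2
  }
  sum(R[upper.tri(R)]) / choose(n, 2)        # share of pairs linked by a path
}

A <- matrix(0, 3, 3)
A[1, 2] <- A[2, 3] <- A[1, 3] <- 1           # transitive triad
c(gli_density(A), recip_dyadic(A), weak_transitivity(A))   # 0.5, 0, 1
A[3, 1] <- 1                                 # add the tie 3 -> 1
c(recip_edgewise(A), weak_transitivity(A))   # 0.5, 1/3
```

Adding the tie 3 -> 1 creates two new two-paths (2 -> 3 -> 1 and 3 -> 1 -> 2), neither of which is closed, so weak transitivity drops from 1 to 1/3.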

The use of sna’s GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and “census” variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.

centralization’s usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R’s apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)
+     return((n-1) * choose(n-1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
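The calling convention just described can be mimicked in a few lines of base R. In the following sketch, centralization_sketch, outdeg_sketch, and o2s_sketch are hypothetical stand-ins (not sna functions): each index returns node scores by default and its theoretical maximum deviation when tmaxdev = TRUE, and the wrapper computes a Freeman-style ratio. We assume that the out-degree maximum for a loopless digraph is (n - 1)^2, the value attained by an out-star.

```r
# Hypothetical base-R sketch of the tmaxdev convention (not sna's code):
# FUN(dat, ...) returns node scores; FUN(dat, tmaxdev = TRUE, ...) returns
# the largest possible value of sum(max(score) - score).
centralization_sketch <- function(dat, FUN, ...) {
  scores <- FUN(dat, ...)
  sum(max(scores) - scores) / FUN(dat, tmaxdev = TRUE, ...)
}

# Out-degree of a loopless digraph; an out-star attains (n - 1)^2.
outdeg_sketch <- function(dat, tmaxdev = FALSE) {
  n <- NROW(dat)
  if (tmaxdev) return((n - 1)^2)
  rowSums(dat) - diag(dat)
}

# Base-R analogue of o2scent: pairs of out-neighbors per vertex.
o2s_sketch <- function(dat, tmaxdev = FALSE) {
  n <- NROW(dat)
  if (tmaxdev) return((n - 1) * choose(n - 1, 2))
  choose(outdeg_sketch(dat), 2)
}

star <- matrix(0, 4, 4)
star[1, 2:4] <- 1                            # vertex 1 sends to all others
centralization_sketch(star, outdeg_sketch)   # 1
centralization_sketch(star, o2s_sketch)      # 1
```

The out-star reaches the maximum for both indices, so both ratios equal 1; any other graph on four vertices scores lower.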

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i,j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\left(d(j,k) \le i\right)$ and $I$ is the standard indicator function (with $d(j,j) = 0$, so that $s_0 = 1/N$). Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983); Skvoretz et al. (2004) for related results.)
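The definition of the structure statistics translates directly into code. This base-R sketch (hypothetical helpers geodist_sketch and structure_stats_sketch, not sna's geodist or structure.statistics) obtains geodesic distances by breadth-first search from each vertex, treating unreachable pairs as infinitely distant, and then evaluates $s_i$ for $i = 0, \ldots, N-1$.

```r
# Hypothetical helpers (not sna's geodist or structure.statistics).
# Geodesic distances by breadth-first search; Inf marks unreachable pairs.
geodist_sketch <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    d <- 0
    while (length(frontier) > 0) {
      d <- d + 1
      # vertices with an incoming edge from the frontier...
      nbrs <- which(colSums(A[frontier, , drop = FALSE]) > 0)
      # ...that have not been reached yet
      nbrs <- nbrs[is.infinite(D[s, nbrs])]
      D[s, nbrs] <- d
      frontier <- nbrs
    }
  }
  D
}

# s_i = N^-2 * #{(j, k) : d(j, k) <= i}, for i = 0, ..., N - 1
structure_stats_sketch <- function(A) {
  n <- nrow(A)
  D <- geodist_sketch(A)
  s <- sapply(0:(n - 1), function(i) sum(D <= i) / n^2)
  names(s) <- 0:(n - 1)
  s
}

A <- matrix(0, 3, 3)
A[1, 2] <- 1; A[2, 3] <- 1         # directed path 1 -> 2 -> 3
structure_stats_sketch(A)          # 3/9, 5/9, 6/9
```

Because vertex 1 cannot be reached from vertices 2 or 3, the series stops short of 1 for this directed path; for a strongly connected graph it reaches 1 at the diameter.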

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small


subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}\,s(H) = s(G), \mathbf{E}\,s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
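As a minimal illustration of the ideas above, the following base-R sketch computes a dyad census and performs a one-tailed CUG test that conditions on order and edge count, by placing the observed edges uniformly at random over all ordered pairs. The names dyad_census_sketch and cug_edges_sketch are hypothetical; sna's dyad.census and cugtest are more general (cugtest, for instance, offers several conditioning options).

```r
# Hypothetical sketch (not sna's dyad.census or cugtest).
dyad_census_sketch <- function(A) {
  up <- upper.tri(A)
  mut  <- sum(A[up] == 1 & t(A)[up] == 1)
  null <- sum(A[up] == 0 & t(A)[up] == 0)
  c(Mut = mut, Asym = sum(up) - mut - null, Null = null)
}

# One-tailed CUG p-values for stat(), conditioning on order and edge count:
# each draw places the observed number of edges uniformly at random among
# the off-diagonal cells.
cug_edges_sketch <- function(A, stat, reps = 1000) {
  off <- row(A) != col(A)
  obs <- stat(A)
  sims <- replicate(reps, {
    H <- matrix(0, nrow(A), ncol(A))
    H[off] <- sample(A[off])
    stat(H)
  })
  c(upper = mean(sims >= obs), lower = mean(sims <= obs))
}

A <- matrix(0, 6, 6)
A[cbind(1:6, c(2, 1, 4, 3, 6, 5))] <- 1   # three mutual dyads, six edges
dyad_census_sketch(A)                      # Mut 3, Asym 0, Null 12
set.seed(1)
cug_edges_sketch(A, function(g) dyad_census_sketch(g)[["Mut"]])
```

Because six edges can form at most three mutual dyads, the observed value here is the largest attainable, so the lower-tail p-value is necessarily 1 and the upper-tail p-value is small.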

                        Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities (first graph minus second) as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.3333333
        1stQ:  -0.06666667
        Med:   0
        Mean:  -0.001288889
        3rdQ:  0.06666667
        Max:   0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.06666667
        1stQ:  0.1555556
        Med:   0.2222222
        Mean:  0.2215333
        3rdQ:  0.2888889
        Max:   0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel, discussed below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
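The definition can be checked mechanically. In this hypothetical helper (struct_equiv, not an sna function), two vertices of a digraph are compared on their out- and in-neighborhoods with the pair itself set aside; the example also confirms that exchanging two structurally equivalent leaves of an out-star leaves the adjacency matrix unchanged, i.e., that the exchange is an automorphism.

```r
# Hypothetical helper (not in sna): are v and w structurally equivalent in
# the digraph with 0/1 adjacency matrix A? Out- and in-neighborhoods must
# agree once v and w themselves are set aside.
struct_equiv <- function(A, v, w) {
  k <- setdiff(seq_len(nrow(A)), c(v, w))
  all(A[v, k] == A[w, k]) && all(A[k, v] == A[k, w])
}

star <- matrix(0, 4, 4)
star[1, 2:4] <- 1                 # out-star centred on vertex 1
struct_equiv(star, 2, 3)          # TRUE: the leaves are copies
struct_equiv(star, 1, 2)          # FALSE
p <- c(1, 3, 2, 4)                # exchange leaves 2 and 3
all(star[p, p] == star)           # TRUE: the exchange is an automorphism
```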

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
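A rough base-R analogue of this pipeline, assuming Hamming distances on concatenated row and column profiles (one of the options listed above), is sketched below. se_hamming is a hypothetical helper, not sedist or equiv.clust; it ignores the dyad between the two vertices being compared, as well as loops, and its result is passed to hclust and then cut with cutree.

```r
# Hypothetical sketch (not sedist or equiv.clust): Hamming distance between
# the row-and-column profiles of each pair of vertices, ignoring the pair's
# own dyad and any loops, followed by hierarchical clustering.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    k <- setdiff(seq_len(n), c(i, j))
    D[i, j] <- sum(A[i, k] != A[j, k]) + sum(A[k, i] != A[k, j])
  }
  D
}

star <- matrix(0, 5, 5)
star[1, 2:5] <- 1                 # out-star centred on vertex 1
D <- se_hamming(star)
cl <- hclust(as.dist(D))          # complete linkage is hclust's default
cutree(cl, k = 2)                 # the centre alone; the four leaves together
```

The four leaves are at distance 0 from one another (they are exactly structurally equivalent), so any reasonable linkage merges them before joining the centre.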

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
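For the density block type, the reduction step amounts to averaging adjacency cells within each pair of classes. The following sketch (block_density is a hypothetical name, not sna's blockmodel) takes a membership vector and leaves the diagonal out of the averages; a block with no off-diagonal cells, such as the diagonal block of a singleton class, is returned as NA.

```r
# Hypothetical sketch of a density blockmodel (not sna's blockmodel()):
# block (r, s) is the mean of A over cells from class r to class s,
# leaving out the diagonal; NA if a block contains no such cells.
block_density <- function(A, memb) {
  k <- max(memb)
  off <- row(A) != col(A)
  B <- matrix(NA_real_, k, k)
  for (r in seq_len(k)) for (s in seq_len(k)) {
    cells <- outer(memb == r, memb == s) & off
    if (any(cells)) B[r, s] <- mean(A[cells])
  }
  B
}

# Two classes: {1, 2} sends to {3, 4, 5}; there are no other edges
A <- matrix(0, 5, 5)
A[1:2, 3:5] <- 1
memb <- c(1, 1, 2, 2, 2)
block_density(A, memb)   # 1 in block (1, 2), 0 elsewhere
```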

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
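Expansion of a density image can likewise be sketched in base R: each vertex in the new graph is assigned a class, the block densities are copied into an n-by-n matrix of edge probabilities, and one Bernoulli draw is made per ordered pair. expand_density is a hypothetical helper, not blockmodel.expand, and it assumes that the image has no missing (NA) blocks.

```r
# Hypothetical sketch of density-block expansion (not blockmodel.expand()):
# block densities become edge probabilities, and a new Bernoulli graph is
# drawn with the requested number of vertices in each class.
expand_density <- function(B, sizes) {
  memb <- rep(seq_along(sizes), sizes)   # class label of each new vertex
  P <- B[memb, memb]                     # inhomogeneous Bernoulli parameters
  diag(P) <- 0                           # no loops
  n <- length(memb)
  matrix(rbinom(n * n, 1, P), n, n)
}

set.seed(42)
B <- matrix(c(0.9, 0.1,
              0.2, 0.5), 2, byrow = TRUE)
G <- expand_density(B, c(3, 4))          # 7 vertices: 3 in class 1, 4 in class 2
```

As in the text, the class sizes of the new graph need not match those of the graph from which the image was estimated.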

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
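The Boolean product mentioned above is a one-liner in base R. compose_sketch below is a hypothetical stand-in (sna's own operator is %c%), and the example shows how a loop can arise from two loop-free graphs.

```r
# Boolean composition of two graphs on the same vertex set: (v, v') is an
# edge iff some v'' has (v, v'') in the first graph and (v'', v') in the
# second. A hypothetical base-R stand-in, not sna's operator.
compose_sketch <- function(A, B) (A %*% B > 0) * 1

G1 <- matrix(0, 3, 3); G1[1, 2] <- 1   # 1 -> 2
G2 <- matrix(0, 3, 3); G2[2, 1] <- 1   # 2 -> 1
compose_sketch(G1, G2)                 # only cell (1, 1): a loop appears
```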

                        Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V|-1)} \tag{3}$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
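Equation (3) and the quantities defined after it can be computed directly from the off-diagonal adjacency cells. In this sketch, off_cells, gcov_sketch, gcor_sketch, and hdist_sketch are hypothetical names (not sna's gcov, gcor, and hdist), and loops are assumed to be absent. Because the normalizing constants cancel, the graph correlation coincides with the ordinary Pearson correlation of the two vectors of off-diagonal cells.

```r
# Hypothetical sketch of Equation (3) and its relatives (not sna's gcov,
# gcor, hdist). Only off-diagonal cells enter, matching the |V|(|V|-1)
# normalization for directed graphs without loops.
off_cells <- function(A) A[row(A) != col(A)]

gcov_sketch <- function(G, H) {
  g <- off_cells(G)
  h <- off_cells(H)
  sum((g - mean(g)) * (h - mean(h))) / length(g)  # length(g) = |V|(|V|-1)
}

gcor_sketch <- function(G, H) {
  gcov_sketch(G, H) / sqrt(gcov_sketch(G, G) * gcov_sketch(H, H))
}

hdist_sketch <- function(G, H) sum(off_cells(G) != off_cells(H))

G <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              1, 0, 0, 1,
              0, 1, 0, 0), 4, byrow = TRUE)
H <- G
H[1, 2] <- 0; H[4, 1] <- 1        # move one edge
gcor_sketch(G, H)                 # equals cor(off_cells(G), off_cells(H))
hdist_sketch(G, H)                # 2
```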

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
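For very small graphs, the maximization over matchings can be carried out exhaustively, which makes the idea behind structural correlation easy to see. This sketch (perms, gcor_offdiag, and gscor_exhaustive are hypothetical helpers; it is not gscor) takes the set of permissible matchings to be all relabelings of the second graph and returns the largest graph correlation. Because it enumerates all n! permutations, it quickly becomes infeasible as n grows, which is why heuristic search is used in practice.

```r
# Hypothetical exhaustive-search sketch (not sna's gscor): maximize the
# graph correlation over all relabelings of H. Feasible only for tiny n.
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  out
}

gcor_offdiag <- function(G, H) {
  off <- row(G) != col(G)
  cor(G[off], H[off])
}

gscor_exhaustive <- function(G, H) {
  max(sapply(perms(seq_len(nrow(H))), function(p) gcor_offdiag(G, H[p, p])))
}

G <- matrix(c(0, 1, 1, 0, 0,
              0, 0, 1, 0, 0,
              0, 0, 0, 1, 0,
              0, 0, 0, 0, 1,
              1, 0, 0, 0, 0), 5, byrow = TRUE)
q <- c(3, 5, 1, 2, 4)
H <- G[q, q]                 # an isomorphic copy with shuffled labels
gcor_offdiag(G, H)           # below 1: the labels no longer line up
gscor_exhaustive(G, H)       # 1, since some relabeling recovers G
```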

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
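The central graph has a simple closed form for dichotomous data: total Hamming distance is a sum over cells, so a cellwise majority vote minimizes it. The following sketch (central_graph_sketch and total_hamming are hypothetical names, not centralgraph) breaks ties toward 1, an arbitrary choice that only matters when an even number of graphs is supplied and that does not change the total distance.

```r
# Hypothetical sketch (not sna's centralgraph): cellwise majority vote,
# which minimizes the total Hamming distance to a set of 0/1 graphs.
central_graph_sketch <- function(graphs) {
  m <- length(graphs)
  votes <- Reduce(`+`, graphs)           # number of graphs with each edge
  (votes >= m / 2) * 1
}

total_hamming <- function(C, graphs) sum(sapply(graphs, function(G) sum(C != G)))

gs <- list(matrix(c(0, 1, 1, 0), 2),
           matrix(c(0, 1, 0, 0), 2),
           matrix(c(0, 0, 1, 0), 2))
C <- central_graph_sketch(gs)
C                        # both off-diagonal cells are present in 2 of 3 graphs
total_hamming(C, gs)     # 2
```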

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
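A QAP test for graph correlation can be sketched in base R by relabeling one graph at random (permuting its rows and columns together) and recomputing the statistic. qap_cor_sketch is a hypothetical helper rather than sna's qaptest; it assumes loopless graphs and reports the observed correlation together with upper- and lower-tail Monte Carlo p-values.

```r
# Hypothetical sketch (not sna's qaptest): QAP test for graph correlation.
# Rows and columns of H are permuted together, which relabels its vertices
# while preserving its structure.
qap_cor_sketch <- function(G, H, reps = 1000) {
  off <- row(G) != col(G)
  stat <- function(X) cor(G[off], X[off])
  obs <- stat(H)
  sims <- replicate(reps, {
    p <- sample(nrow(H))
    stat(H[p, p])
  })
  c(obs = obs, upper = mean(sims >= obs), lower = mean(sims <= obs))
}

set.seed(7)
n <- 12
G <- matrix(rbinom(n * n, 1, 0.3), n)
diag(G) <- 0
H <- G
H[sample(which(row(H) != col(H)), 10)] <- 0   # delete up to 10 edges
qap_cor_sketch(G, H, reps = 500)
```

Since H is a lightly perturbed copy of G, the observed correlation should sit far in the upper tail of the permutation distribution.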


                        Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
   352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
   (Dn-Dr)/(Dn-Dr+dfn): 0.481324
   (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

  Total Fraction Correct: 0.8973684
  Fraction Predicted 1s Correct: 0.8973684
  Fraction Predicted 0s Correct: NaN
  False Negative Rate: 0
  False Positive Rate: 1

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties (it predicts no 0s at all, so the false positive rate is 1); this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values (0, 1, 4, 2, and 0 for the intercept and x1 through x4), and that the QAP test correctly identifies the irrelevant predictors (the intercept and x4).

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which they report, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (and the latter may be omitted if desired); estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
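The reparameterization claim above can be checked with a little algebra. Under the Batchelder-Romney model, an informant who knows the edge state (probability d) reports it correctly and otherwise reports a tie with probability g, so Pr(report tie | tie) = d + (1 - d)g = 1 - e^- and Pr(report tie | no tie) = (1 - d)g = e^+. Solving gives d = 1 - e^+ - e^- and g = e^+/(e^+ + e^-), so the competency is positive only when e^+ + e^- < 1. The base R sketch below encodes this mapping in both directions; the helper names error_to_br and br_to_error are hypothetical and are not part of sna, and the sketch says nothing about the prior structure that distinguishes the two approaches.

```r
# Hypothetical helpers (not part of sna): convert between the error-rate
# parameterization (false positive rate ep, false negative rate em) and the
# Batchelder-Romney competency/bias parameterization.
error_to_br <- function(ep, em) {
  # Only defined outside the "perverse" region; the bias is undefined
  # when an informant makes no errors at all (ep + em == 0).
  stopifnot(all(ep >= 0), all(em >= 0), all(ep + em > 0), all(ep + em < 1))
  competency <- 1 - ep - em   # chance the informant "knows" the true edge state
  bias <- ep / (ep + em)      # chance a non-knowing informant guesses "tie"
  list(competency = competency, bias = bias)
}

br_to_error <- function(competency, bias) {
  stopifnot(all(competency >= 0 & competency <= 1), all(bias >= 0 & bias <= 1))
  list(ep = (1 - competency) * bias,        # Pr(report tie | no true tie)
       em = (1 - competency) * (1 - bias))  # Pr(report no tie | true tie)
}

# Example: an informant with 5% false positives and 35% false negatives.
r <- error_to_br(0.05, 0.35)
print(r)                                  # competency about 0.6, bias about 0.125
print(br_to_error(r$competency, r$bias))  # recovers ep = 0.05 and em = 0.35
```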

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
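To make the nomination probability concrete, the base R sketch below evaluates Pr(e_ij | T) from a baseline probability, a vector of bias event probabilities, and the counts s_k(i, j, T), and checks the result against a direct Monte Carlo simulation in which the baseline event and every copy of each bias event are drawn as independent Bernoulli trials. The function names and the numbers used (a baseline probability of 0.1 and two hypothetical bias event types with probabilities 0.3 and 0.2) are illustrative assumptions; this is not the code used by bn.

```r
# Hypothetical helper (not sna's bn()): conditional probability that i
# nominates j, given the baseline probability of e_ij, the probability of
# each bias event type, and the counts s_k(i, j, T) of bias events.
biased_net_prob <- function(base, bias, s) {
  stopifnot(length(bias) == length(s), all(s >= 0),
            all(c(base, bias) >= 0), all(c(base, bias) <= 1))
  # e_ij fails only if the baseline event and every bias event all fail.
  p_none <- (1 - base) * prod((1 - bias)^s)
  1 - p_none
}

# Monte Carlo check: draw the baseline event and the s_k independent copies
# of each bias event, and record whether at least one of them occurs.
simulate_nomination <- function(base, bias, s, reps = 1e5) {
  hit <- runif(reps) < base
  for (k in seq_along(bias)) {
    if (s[k] > 0) hit <- hit | (rbinom(reps, s[k], bias[k]) > 0)
  }
  mean(hit)
}

set.seed(1)
biased_net_prob(0.1, c(0.3, 0.2), c(1, 2))      # 1 - 0.9 * 0.7 * 0.8^2 = 0.5968
simulate_nomination(0.1, c(0.3, 0.2), c(1, 2))  # close to 0.5968
```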

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990; and Cliff and Ord 1973 and Anselin 1988 for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

$$y = \Bigl(\sum_{i=1}^{w} \theta_i W_i\Bigr) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \Bigl(\sum_{i=1}^{z} \psi_i Z_i\Bigr) \varepsilon + \nu \tag{5}$$

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of i.i.d. disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the AR term's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
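To show what the two autocorrelation indices mentioned above measure, the following base R sketch implements their textbook definitions: Moran's I = (n / S0) Σ_ij w_ij z_i z_j / Σ_i z_i² and Geary's C = (n - 1) Σ_ij w_ij (y_i - y_j)² / (2 S0 Σ_i z_i²), where z_i = y_i - mean(y) and S0 = Σ_ij w_ij. The helper names morans_i and gearys_c are hypothetical, and nacf may differ in details such as how neighborhoods of different orders are constructed and normalized.

```r
# Textbook Moran's I and Geary's C for a response vector y and a weight
# (adjacency) matrix W; a sketch, not the nacf() implementation.
morans_i <- function(y, W) {
  n <- length(y)
  stopifnot(is.matrix(W), nrow(W) == n, ncol(W) == n, sum(W) > 0)
  z <- y - mean(y)                         # deviations from the mean
  (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

gearys_c <- function(y, W) {
  n <- length(y)
  stopifnot(is.matrix(W), nrow(W) == n, ncol(W) == n, sum(W) > 0)
  z <- y - mean(y)
  diffsq <- outer(y, y, "-")^2             # (y_i - y_j)^2 for every ordered pair
  (n - 1) * sum(W * diffsq) / (2 * sum(W) * sum(z^2))
}

# Example: a path graph 1 - 2 - 3 - 4 with a steadily increasing response.
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)                              # symmetrize (undirected ties)
y <- c(1, 2, 3, 4)
morans_i(y, W)  # 1/3: positive autocorrelation
gearys_c(y, W)  # 0.3: below 1, also indicating positive autocorrelation
```

With an alternating response such as c(1, -1, 1, -1) on the same graph, the same functions give I = -1 and C = 1.5, the signature of negative autocorrelation.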


                        Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary of the returned bbnam object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer)

Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20 Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
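The check for actors in the “perverse” region can be automated along the following lines. This base R sketch assumes, as the apply(b$em, 2, median) calls in the example do, that posterior draws of each error rate are stored as a draws × observers matrix; flag_perverse and its default threshold of 0.5 are hypothetical choices, not part of sna.

```r
# Hypothetical helper (not an sna function): return the observers whose
# posterior probability of lying in the region em + ep > 1 exceeds a
# threshold. em_draws and ep_draws are draws x observers matrices.
flag_perverse <- function(em_draws, ep_draws, threshold = 0.5) {
  stopifnot(identical(dim(em_draws), dim(ep_draws)))
  p_perverse <- colMeans((em_draws + ep_draws) > 1)  # per-observer probability
  which(p_perverse > threshold)
}

# Toy draws for two observers: the second is usually in the perverse region.
em_draws <- cbind(c(0.30, 0.40, 0.35), c(0.90, 0.95, 0.70))
ep_draws <- cbind(c(0.05, 0.02, 0.04), c(0.20, 0.30, 0.10))
flag_perverse(em_draws, ep_draws)  # observer 2 (2 of its 3 draws exceed 1)
```

With a fitted model, the same call would be flag_perverse(b$em, b$ep), assuming the draws-by-observers layout described above.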

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
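A rough calculation shows why the intersection LAS suffers here. Assuming independent errors and plugging in the mean error rates of the simulation, a true tie survives the intersection rule only if both its sender and its receiver report it, which happens with probability about (1 - 0.375)^2 ≈ 0.39; the union rule loses a true tie only with probability about 0.375^2 ≈ 0.14, at the cost of a false positive rate of about 1 - (1 - 0.038)^2 ≈ 0.075. The base R sketch below implements simplified versions of the LAS and central graph rules so their behavior can be inspected on toy data. The helper names and the >= 0.5 central graph cutoff are assumptions of this sketch; consensus should be used for real analyses.

```r
# Hypothetical helpers (not sna's consensus()). dat is an m x n x n array of
# reports; for LAS, informant k is vertex k, so m must equal n, and
# dat[k, i, j] is informant k's report of the i -> j tie.
las_estimate <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  stopifnot(dim(dat)[1] == n, dim(dat)[3] == n)
  est <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      sender <- dat[i, i, j]    # i's own report of the i -> j tie
      receiver <- dat[j, i, j]  # j's report of the same tie
      est[i, j] <- if (rule == "intersection") min(sender, receiver) else max(sender, receiver)
    }
  }
  est
}

# Central graph: keep a tie if at least a `threshold` fraction of informants
# report it (the >= 0.5 majority cutoff is an assumption of this sketch).
central_graph <- function(dat, threshold = 0.5) {
  (apply(dat, c(2, 3), mean) >= threshold) * 1
}

# Toy data: 3 informants reporting on 3 vertices.
dat <- array(0, dim = c(3, 3, 3))
dat[1, 1, 2] <- 1  # informant 1 reports 1 -> 2
dat[2, 1, 2] <- 1  # informant 2 also reports 1 -> 2
dat[2, 2, 3] <- 1  # informant 2 reports 2 -> 3; informant 3 does not
las_estimate(dat, "intersection")  # only the 1 -> 2 tie survives
las_estimate(dat, "union")         # both 1 -> 2 and 2 -> 3
central_graph(dat)                 # 1 -> 2 only (reported by 2 of 3 informants)
```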

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)
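For reference, solving equations (4) and (5) for $y$ gives the reduced form from which the simulated responses in this example are drawn, assuming that $I - \sum_i \theta_i W_i$ and $I - \sum_i \psi_i Z_i$ are invertible:

$$\varepsilon = \Bigl(I - \sum_{i=1}^{z} \psi_i Z_i\Bigr)^{-1} \nu, \qquad y = \Bigl(I - \sum_{i=1}^{w} \theta_i W_i\Bigr)^{-1} \bigl(X\beta + \varepsilon\bigr).$$

Here $w = z = 1$, with $\theta_1$ = r1 = 0.2, $W_1$ = w1, $\psi_1$ = r2 = 0.3, and $Z_1$ = w2. The two qr.solve calls in the session compute $\varepsilon$ and $y$ by solving these linear systems rather than forming the inverses explicitly, and the resulting rho estimates (about 0.195 and 0.307) and Sigma estimate (about 0.096) in the summary output can be compared with the true values 0.2, 0.3, and 0.1.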

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58  < 2e-16 ***
X2      0.535608   0.009448   56.69  < 2e-16 ***
X3     -0.685068   0.007138  -95.98  < 2e-16 ***
X4      0.691812   0.008417   82.19  < 2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71  < 2e-16 ***
rho2.1  0.307491   0.021167   14.53  < 2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

                        3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (theoretical vs. sample quantiles); Net Influence Plot.]

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

                        References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks. II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Second edition. Springer-Verlag, New York.

Burt RS (1976). “Positions in Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                        Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/, published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.



[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(g.comb, NROW) <= degree(g) + 1)
[1] TRUE
R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+   sapply(g.comb[ego.size > 2], function(x) gden(x[-1, -1]))
         1          2          3          4          5          6          7 
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000 
         8          9         10 
0.00000000 0.08333333 0.00000000 

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.
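The alter-only density computed above can be mimicked in base R, which may help clarify what the transcript does. The following is a minimal sketch under our own assumptions, not the ego.extract or gden source: the helper names ego_net(), dir_density(), and alter_density() are hypothetical, the input is a binary digraph without missing data, and the ego net is the combined (in plus out) neighborhood with ego placed first.

```r
# Hypothetical helpers (not part of sna): combined ego net with ego in
# row/column 1, and density among alters once ego is removed.
ego_net <- function(adj, v) {
  alters <- which(adj[v, ] > 0 | adj[, v] > 0)   # out- or in-neighbors of v
  alters <- setdiff(alters, v)                   # drop v itself if it has a loop
  idx <- c(v, alters)                            # ego first, then alters
  adj[idx, idx, drop = FALSE]
}

# Directed density without loops: observed ties / n(n - 1) possible ties.
dir_density <- function(m) {
  n <- nrow(m)
  if (n < 2) return(NaN)                         # no possible ties
  sum(m[row(m) != col(m)] > 0) / (n * (n - 1))
}

alter_density <- function(adj, v) {
  e <- ego_net(adj, v)
  dir_density(e[-1, -1, drop = FALSE])           # remove ego's row and column
}

# Small illustrative digraph: 1->2, 1->3, 2->3, 4->1.
adj <- matrix(c(0, 1, 1, 0,
                0, 0, 1, 0,
                0, 0, 0, 0,
                1, 0, 0, 0), nrow = 4, byrow = TRUE)
nrow(ego_net(adj, 1))   # 4: ego plus alters 2, 3, 4
alter_density(adj, 1)   # 1/6: only the 2->3 tie among the three alters
```

As in the transcript, the size of each combined ego net can never exceed total degree plus one, since each alter contributes at least one incident tie.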

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+   cmode = "freeman")/2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0
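A gapply-style computation can be sketched in base R as follows. This is an illustration only, not sna's gapply: neighbor_apply() is a hypothetical name, only distance-1 neighborhoods of a binary digraph are handled, and MARGIN follows the convention of the transcript above (1 for out-neighbors, 2 for in-neighbors, c(1, 2) for either).

```r
# Hypothetical gapply-like helper (not sna's implementation): apply FUN to
# the covariate values of each vertex's distance-1 neighbors.
neighbor_apply <- function(adj, margin, stats, FUN) {
  n <- nrow(adj)
  sapply(seq_len(n), function(v) {
    nb <- logical(n)
    if (1 %in% margin) nb <- nb | adj[v, ] > 0   # out-neighbors (row v)
    if (2 %in% margin) nb <- nb | adj[, v] > 0   # in-neighbors (column v)
    nb[v] <- FALSE                               # exclude ego itself
    FUN(stats[nb])
  })
}

# Directed 3-cycle 1->2->3->1 with a vertex covariate.
adj <- matrix(c(0, 1, 0,
                0, 0, 1,
                1, 0, 0), nrow = 3, byrow = TRUE)
x <- c(10, 20, 30)
neighbor_apply(adj, 1, x, mean)        # 20 30 10
neighbor_apply(adj, c(1, 2), x, mean)  # 25 20 15
```

Summing a vector of ones over MARGIN 1 or 2 reproduces out- or indegree, mirroring the first two checks in the transcript.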

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i, , ], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+   partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i, , ], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods “fill out” the network is informative about properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.
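The difference between the two kinds of neighborhood can be phrased in terms of geodesic distance: under the reading used here, v′ lies in the order-k partial out-neighborhood of v when d(v, v′) = k, and in the cumulative one when 1 ≤ d(v, v′) ≤ k. The base-R sketch below (hypothetical helper names; binary digraph; out-neighborhoods only; not the neighborhood source) computes both from breadth-first search distances.

```r
# Breadth-first search geodesic distances for a binary digraph
# (Inf marks unreachable pairs).
geodist_bfs <- function(adj) {
  n <- nrow(adj)
  d <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    d[s, s] <- 0
    frontier <- s
    k <- 0
    while (length(frontier) > 0) {
      k <- k + 1
      reach <- which(colSums(adj[frontier, , drop = FALSE] > 0) > 0)  # one step out
      reach <- reach[is.infinite(d[s, reach])]                      # unvisited only
      d[s, reach] <- k
      frontier <- reach
    }
  }
  d
}

# Order-k out-neighborhood as a 0/1 matrix: row v marks the vertices in the
# partial (distance exactly k) or cumulative (distance 1..k) neighborhood.
neighborhood_k <- function(adj, k, partial = TRUE) {
  d <- geodist_bfs(adj)
  nb <- if (partial) d == k else (d >= 1 & d <= k)
  nb * 1
}

# Directed path 1->2->3->4.
adj <- matrix(0, 4, 4)
adj[cbind(1:3, 2:4)] <- 1
neighborhood_k(adj, 2)[1, ]                   # 0 0 1 0
neighborhood_k(adj, 2, partial = FALSE)[1, ]  # 0 1 1 0
```

Because partial neighborhoods of different orders are disjoint, summing orders 1 through k recovers the cumulative neighborhood of order k.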

                          Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary “workhorse” routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For “target diagrams,” in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order (panels: Partial Neighborhood of Order 1 through Order 9); vertex v is adjacent to vertex v′ in the ith panel iff v′ belongs to the ith order partial neighborhood of v.
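As an illustration of a user-supplied layout, the following hypothetical function places vertices on a unit circle in decreasing order of total degree. It assumes the naming convention gplot.layout.<mode> and the (d, layout.par) argument list described in the gplot.layout manual page, and it further assumes that d arrives as an adjacency matrix; depending on the sna version, d may be passed in another network representation and need converting first. The function body itself uses only base R.

```r
# Hypothetical layout routine (not shipped with sna). If the conventions
# above hold, it could be selected with gplot(g, mode = "degcircle").
gplot.layout.degcircle <- function(d, layout.par) {
  d <- as.matrix(d)                          # assumed: an n x n adjacency matrix
  n <- nrow(d)
  deg <- rowSums(d != 0) + colSums(d != 0)   # total (in + out) degree
  ord <- order(deg, decreasing = TRUE)       # vertex ids, highest degree first
  theta <- 2 * pi * (seq_len(n) - 1) / n     # evenly spaced angles
  coords <- matrix(0, n, 2)
  coords[ord, ] <- cbind(cos(theta), sin(theta))
  coords                                     # n x 2 matrix of (x, y) positions
}

# Undirected star with center 3: the center is placed at angle 0, i.e., (1, 0).
star <- matrix(0, 4, 4)
star[3, -3] <- 1
star[-3, 3] <- 1
gplot.layout.degcircle(star, list())
```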

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to those of gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order (panels: Cumulative Neighborhood of Order 1 through Order 9); vertex v is adjacent to vertex v′ in the ith panel iff v′ belongs to the ith order cumulative neighborhood of v.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options (panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options).

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation and zooming) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+   xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10)/10 * 2 * pi), sin((1:10)/10 * 2 * pi),
+   col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+   xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+   offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
+   arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example).

2.3 Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will, however, see an important counterexample below.

                          Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or “central” position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized “paring down” of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph-theoretic concept, is given by $c_d(v, G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d^+}(v, G) \equiv |N^+(v)|$), indegree ($c_{d^-}(v, G) \equiv |N^-(v)|$), and total or “Freeman” degree ($c_{d^t}(v, G) \equiv c_{d^+}(v, G) + c_{d^-}(v, G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v, G) \equiv \sum_{(v', v'') \subset V \setminus \{v\}} \frac{g'(v', v, v'', G)}{g(v', v'', G)},$$

where $g(v, v', G)$ is the number of $(v, v')$ geodesics in $G$, $g'(v, v', v'', G)$ is the number of $(v, v'')$ geodesics in $G$ containing $v'$, and the ratio $g'(v', v, v'', G)/g(v', v'', G)$ is taken equal to 0 where $g(v', v'', G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v, G) \equiv \frac{|V| - 1}{\sum_{v' \in V} d(v, v')},$$

where $d(v, v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v, G) = 0$ for any $v$ lacking a path to at least one other vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
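The definitions above can be checked against a small base-R sketch. This is our illustration, not sna's implementation (which is considerably more efficient), and the helper names are hypothetical. It assumes a binary digraph without loops, counts geodesics by breadth-first search, sums betweenness and stress over ordered pairs (v′, v″), and follows the infinite-distance convention for closeness. For an undirected graph stored as a symmetric matrix, each unordered pair is counted twice, so betweenness and stress should be halved if unordered counting is wanted.

```r
# Geodesic distances d(s, t) and geodesic counts g(s, t) from every source s.
geodesics <- function(adj) {
  n <- nrow(adj)
  dist <- matrix(Inf, n, n)
  count <- matrix(0, n, n)
  for (s in seq_len(n)) {
    dist[s, s] <- 0
    count[s, s] <- 1
    queue <- s
    while (length(queue) > 0) {
      u <- queue[1]
      queue <- queue[-1]
      for (w in which(adj[u, ] > 0)) {
        if (is.infinite(dist[s, w])) {        # first visit to w
          dist[s, w] <- dist[s, u] + 1
          queue <- c(queue, w)
        }
        if (dist[s, w] == dist[s, u] + 1)     # u precedes w on a geodesic
          count[s, w] <- count[s, w] + count[s, u]
      }
    }
  }
  list(dist = dist, count = count)
}

centrality_sketch <- function(adj) {
  n <- nrow(adj)
  geo <- geodesics(adj)
  d <- geo$dist
  g <- geo$count
  btw <- stress <- numeric(n)
  for (v in seq_len(n)) {
    for (src in seq_len(n)) {
      for (dst in seq_len(n)) {
        if (src == dst || src == v || dst == v || g[src, dst] == 0) next
        if (d[src, v] + d[v, dst] == d[src, dst]) {
          via <- g[src, v] * g[v, dst]        # geodesics src -> dst through v
          btw[v] <- btw[v] + via / g[src, dst]
          stress[v] <- stress[v] + via        # stress: denominator set to 1
        }
      }
    }
  }
  data.frame(indegree = colSums(adj > 0),
             outdegree = rowSums(adj > 0),
             freeman = colSums(adj > 0) + rowSums(adj > 0),
             closeness = (n - 1) / rowSums(d),  # 0 whenever some d(v, v') is Inf
             betweenness = btw,
             stress = stress)
}

# Undirected 4-cycle 1-2-3-4-1, stored as a symmetric matrix.
cyc <- matrix(0, 4, 4)
cyc[cbind(1:4, c(2, 3, 4, 1))] <- 1
cyc <- cyc + t(cyc)
centrality_sketch(cyc)   # every vertex: closeness 0.75, betweenness 1, stress 2
```

On the 4-cycle each opposite pair is joined by two geodesics, so each intermediate vertex receives 1/2 per ordered pair in betweenness but a full 1 in stress, which is exactly the difference between the two measures described above.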

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $\mathbf{A}$ (where $\mathbf{A}$ is the graph adjacency matrix). This can be interpreted variously as a measure of “coreness” (or membership in the largest dense cluster), as “recursive” or “reflected” degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or as the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (\mathbf{I} - \beta \mathbf{A})^{-1} \mathbf{A} \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $\mathbf{A}$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the weight afforded to distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so that the squared scores sum to $|V|$ (i.e., the centrality vector has squared length $|V|$); in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
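The Bonacich measure can be computed directly from its definition. The sketch below is ours, not the bonpow source: it solves $(\mathbf{I} - \beta\mathbf{A})x = \mathbf{A}\mathbf{1}$ rather than forming the inverse, chooses $\alpha$ so that the squared scores sum to $|V|$, and compares the result with eigenvector centrality, taken here as the absolute principal eigenvector of a symmetric (undirected, connected) adjacency matrix. The function names are hypothetical.

```r
# Bonacich power centrality from its definition:
# c_bp = alpha (I - beta A)^{-1} A 1, with alpha chosen so that sum(c^2) = |V|.
bonacich_power <- function(A, beta) {
  n <- nrow(A)
  raw <- as.vector(solve(diag(n) - beta * A, A %*% rep(1, n)))
  raw * sqrt(n / sum(raw^2))
}

# Eigenvector centrality for a symmetric A: absolute value of the unit-length
# eigenvector for the largest eigenvalue (eigen() sorts these decreasingly).
eigen_centrality <- function(A) {
  e <- eigen(A, symmetric = TRUE)
  abs(e$vectors[, 1])
}

# Small connected undirected graph (5 vertices, 6 edges).
A <- matrix(0, 5, 5)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(2, 4), c(3, 4), c(4, 5))
A[edges] <- 1
A[edges[, 2:1]] <- 1                             # symmetrize
deg <- rowSums(A)

bonacich_power(A, 0) / deg                       # constant ratio: beta = 0 gives degree
lambda1 <- max(eigen(A, symmetric = TRUE)$values)
bp <- bonacich_power(A, (1 - 1e-6) / lambda1)    # beta just below 1/lambda1
bp / sqrt(sum(bp^2)) - eigen_centrality(A)       # differences close to 0
```

The same scaling explains the constant ratio 0.2192645 in the exponent = 0 example below: the outdegrees there have a sum of squares of 208, and $\sqrt{10/208} \approx 0.2192645$.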

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now let us posit a vector of states $s$, with one element per vertex, such that $s_i$ is the state of $v_i \in V$. (“State” in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles), based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
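The role definitions above translate directly into a count over locally bridged ordered triads. The base-R sketch below computes raw brokerage scores only (no Gould-Fernandez moments or z-tests); it is our illustration rather than the brokerage source, assumes a binary digraph, and uses hypothetical function names. Because the five role conditions are mutually exclusive, testing them in a fixed order simply shortens each check.

```r
# Classify an ordered triad (i, j, k) with broker j by the states s_i, s_j, s_k.
gf_role <- function(si, sj, sk) {
  if (si == sj && sj == sk) return("coordinator")   # all in one group
  if (si == sk) return("itinerant")                  # s_i = s_k != s_j
  if (sj == sk) return("gatekeeper")                 # s_i != s_j = s_k
  if (si == sj) return("representative")             # s_i = s_j != s_k
  "liaison"                                          # all three differ
}

# Raw brokerage scores: for each broker j, count ordered pairs (i, k) with
# i -> j, j -> k, and no i -> k tie, broken down by role.
brokerage_counts <- function(adj, s) {
  n <- nrow(adj)
  roles <- c("coordinator", "itinerant", "gatekeeper", "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {
    for (i in which(adj[, j] > 0)) {
      for (k in which(adj[j, ] > 0)) {
        if (i == j || k == j || i == k || adj[i, k] > 0) next
        r <- gf_role(s[i], s[j], s[k])
        out[j, r] <- out[j, r] + 1
      }
    }
  }
  cbind(out, total = rowSums(out))
}

# Directed path 1 -> 2 -> 3 -> 4 with two groups.
adj <- matrix(0, 4, 4)
adj[cbind(1:3, 2:4)] <- 1
brokerage_counts(adj, s = c(1, 1, 2, 2))
# vertex 2 is a representative (1 -> 2 -> 3); vertex 3 is a gatekeeper (2 -> 3 -> 4)
```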

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above results illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector centrality and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0)/degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1/eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t      E(t)    Sd(t)        z  Pr(>|z|)
w_I   5.0000    5.8638   2.7314  -0.3162    0.7518
w_O  25.0000   19.5459   7.0713   0.7713    0.4405
b_IO 18.0000   19.5459   6.2244  -0.2484    0.8039
b_OI 17.0000   19.5459   6.2244  -0.4090    0.6825
b_O  28.0000   23.4551   5.3349   0.8519    0.3943
t    93.0000   87.9565  13.6124   0.3705    0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate (network-wide) brokerage scores for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group; in each of these tables, the first six columns are raw scores and the remaining six are the corresponding z-scores. It should be noted that very small groups cannot support certain brokerage roles and, likewise, certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs (as for the coordinator role in the two-member group 2 above).

                          Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two other fundamental classes of GLIs are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the “edgewise” reciprocity). All of these correspond to slightly different notions of reciprocity and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i, j), (j, k) \in E \Leftrightarrow (i, k) \in E$ for distinct $i, j, k \in V$) and weak ($(i, j), (j, k) \in E \Rightarrow (i, k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that “a friend of a friend is a friend”: where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of “third party support”: direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v,G) \right) - c(v_i,G) \right] \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|}\sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing C(G) by its maximum across all graphs of the same order as G. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
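To make Equations (1) and (2) concrete, here is a minimal base-R sketch (not sna's centralization code) that computes the raw Freeman index for a vector of centrality scores in both forms, then normalizes it by a supplied theoretical maximum. The helper names are hypothetical. The maximum used for outdegree, $(|V|-1)^2$, is the value attained by an out-star on a loopless digraph: the central vertex scores |V|-1 and each of the other |V|-1 vertices scores 0.

```r
# Hypothetical helpers illustrating Equations (1) and (2); not part of sna.

# Equation (1): total deviation of each score from the maximum score.
# If tmaxdev is supplied, the raw index is divided by it (normalized form).
freeman_centralization <- function(scores, tmaxdev = NULL) {
  raw <- sum(max(scores) - scores)
  if (is.null(tmaxdev)) raw else raw / tmaxdev
}

# Equation (2): |V| times the gap between the maximum and mean scores.
freeman_centralization_eq2 <- function(scores) {
  length(scores) * (max(scores) - mean(scores))
}

# Out-star on 4 vertices: vertex 1 sends ties to vertices 2, 3 and 4.
A <- matrix(0, 4, 4)
A[1, 2:4] <- 1
odeg <- rowSums(A)                                 # c(3, 0, 0, 0)
n <- nrow(A)

freeman_centralization(odeg)                       # raw index: 9
freeman_centralization_eq2(odeg)                   # 9 again, via Equation (2)
freeman_centralization(odeg, tmaxdev = (n - 1)^2)  # normalized: 1
freeman_centralization(rep(2, 5))                  # equal scores: 0
```

The last two calls reproduce the two endpoints of the normalized index described above: 1 for the star and 0 when every vertex has the same score.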

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied, respectively, by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree; hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices; hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on n vertices, with $K_{n,m}$ denoting the complete bipartite graph on n and m vertices, and $N_n$ the null or empty graph on n vertices.
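The three GLI families described above can be written down directly. The following base-R sketch (hypothetical function names, not sna's gden, grecip, or gtrans) assumes a loopless 0/1 digraph adjacency matrix with no missing data. Its transitivity helper treats ordered triples i -> j -> k with i != k as the triads "at risk", which is one reading of the weak condition and may differ in detail from sna's census options.

```r
# Hypothetical helpers; A is a 0/1 digraph adjacency matrix, no missing data.

# Density: observed edges over the n(n - 1) possible ordered pairs (no loops).
density_sketch <- function(A) {
  n <- nrow(A)
  diag(A) <- 0
  sum(A) / (n * (n - 1))
}

# Dyadic reciprocity: share of unordered dyads that are mutual or null.
recip_dyadic <- function(A) {
  up <- upper.tri(A)
  mean(A[up] == t(A)[up])
}

# Edgewise reciprocity: share of edges whose reverse edge is also present
# (NaN for a graph with no edges).
recip_edgewise <- function(A) {
  diag(A) <- 0
  sum(A * t(A)) / sum(A)
}

# Weak transitivity: of all ordered triples i -> j -> k with i != k (those
# "at risk"), the fraction for which i -> k is also present, plus the count.
trans_weak <- function(A) {
  diag(A) <- 0
  two_paths <- A %*% A          # [i, k] = number of j with i -> j -> k
  diag(two_paths) <- 0          # discard i == k
  hits <- sum(two_paths * A)    # two-paths closed by a direct tie
  c(fraction = hits / sum(two_paths), count = hits)
}

# A transitive triad: 1 -> 2, 2 -> 3, 1 -> 3
A <- matrix(0, 3, 3)
A[1, 2] <- A[2, 3] <- A[1, 3] <- 1
density_sketch(A)   # 0.5
recip_dyadic(A)     # 0: every dyad is asymmetric
recip_edgewise(A)   # 0
trans_weak(A)       # fraction 1, count 1
```

Note how a single mutual dyad in an otherwise empty graph yields dyadic reciprocity 1 (null dyads count as symmetric) as well as edgewise reciprocity 1, while the two measures diverge on graphs that mix null and asymmetric dyads.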

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if(tmaxdev)
+      return((n-1) * choose(n-1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
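The tmaxdev convention is simple enough to mimic outside sna. In the sketch below, centralization_sketch is a hypothetical stand-in for sna's wrapper: it calls a score function once normally and once with tmaxdev = TRUE, then divides the raw Freeman index by the returned maximum. o2_sketch re-expresses the o2scent example in base R (outdegree via rowSums on a loopless 0/1 matrix), and outdeg_sketch uses the out-star maximum $(n-1)^2$. None of these names belong to sna.

```r
# Hypothetical wrapper: normalized Freeman centralization for any score
# function that returns its theoretical maximum deviation when tmaxdev = TRUE.
centralization_sketch <- function(A, FUN, ...) {
  scores <- FUN(A, ...)
  sum(max(scores) - scores) / FUN(A, tmaxdev = TRUE, ...)
}

# Base-R analogue of o2scent: number of pairs of out-neighbours per vertex.
o2_sketch <- function(A, tmaxdev = FALSE, ...) {
  n <- NROW(A)
  if (tmaxdev)
    return((n - 1) * choose(n - 1, 2))  # maximum used in the o2scent example
  diag(A) <- 0
  choose(rowSums(A), 2)
}

# Outdegree with the same calling convention; (n - 1)^2 is the out-star maximum.
outdeg_sketch <- function(A, tmaxdev = FALSE, ...) {
  n <- NROW(A)
  if (tmaxdev)
    return((n - 1)^2)
  diag(A) <- 0
  rowSums(A)
}

star <- matrix(0, 5, 5)
star[1, 2:5] <- 1                            # out-star on 5 vertices
centralization_sketch(star, outdeg_sketch)   # 1
centralization_sketch(star, o2_sketch)       # 1
```

On an out-star both indices reach their maximum, so both calls return 1; the wrapper itself never needs to know anything about the index beyond the tmaxdev argument.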

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let G = (V,E) be a (possibly directed) graph of order N, and let d(i,j) be the geodesic distance from vertex i to vertex j in G. The "structure statistics" of G are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2}\sum_{j=1}^{N}\sum_{k=1}^{N} I\left(d(j,k) \le i\right)$ and I is the standard indicator function. Intuitively, $s_i$ is the expected fraction of G which lies within distance i of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
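The structure statistics follow directly from the geodesic distance matrix. The base-R sketch below computes distances by breadth-first search from each vertex (a simple stand-in for geodist, with unreachable pairs set to Inf) and then evaluates $s_i$ as defined above. Both function names are hypothetical.

```r
# Geodesic distances in an unweighted digraph by breadth-first search from
# each vertex; unreachable pairs get Inf. A is a 0/1 adjacency matrix.
geodesic_sketch <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    d <- 0
    while (length(frontier) > 0) {
      d <- d + 1
      # vertices one step beyond the frontier that have not been reached yet
      nxt <- which(colSums(A[frontier, , drop = FALSE]) > 0 & is.infinite(D[s, ]))
      D[s, nxt] <- d
      frontier <- nxt
    }
  }
  D
}

# s_i = N^-2 * #{(j, k) : d(j, k) <= i}, for i = 0, ..., N - 1
structure_stats_sketch <- function(A) {
  D <- geodesic_sketch(A)
  N <- nrow(A)
  s <- sapply(0:(N - 1), function(i) sum(D <= i) / N^2)
  names(s) <- 0:(N - 1)
  s
}

P <- matrix(0, 3, 3)
P[1, 2] <- P[2, 3] <- 1        # directed path 1 -> 2 -> 3
structure_stats_sketch(P)      # 3/9, 5/9, 6/9
```

For any graph of order N, $s_0 = 1/N$ (each vertex reaches only itself), which matches the leading 0.10 for the 10-vertex graph in the example listing below.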

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of G, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic t for graph G is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where H is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of G is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $E[s(H)] = s(G), E[s'(H)] = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter p equal to the density of G is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
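As a rough illustration of the dyad census and of a CUG test, the following base-R sketch conditions on order and edge count by placing the observed number of edges uniformly at random among the $|V|(|V|-1)$ off-diagonal cells, which is one way to draw uniformly from digraphs with a fixed number of edges. The helpers are hypothetical and far less general than sna's dyad.census and cugtest; the statistic used (number of mutual dyads) is just an example.

```r
# Hypothetical helpers; a loopless 0/1 digraph adjacency matrix is assumed.

# Dyad census: counts of mutual, asymmetric and null unordered dyads.
dyad_census_sketch <- function(A) {
  up <- upper.tri(A)
  s <- A[up] + t(A)[up]          # 2 = mutual, 1 = asymmetric, 0 = null
  c(Mut = sum(s == 2), Asym = sum(s == 1), Null = sum(s == 0))
}

# Uniform draw from digraphs of order n with exactly m edges:
# choose m of the n(n - 1) off-diagonal cells at random.
draw_same_edges <- function(n, m) {
  A <- matrix(0, n, n)
  cells <- which(row(A) != col(A))
  A[cells[sample.int(length(cells), m)]] <- 1
  A
}

# One-tailed CUG p-values, conditioning on order and edge count.
cug_sketch <- function(A, stat, reps = 1000) {
  n <- nrow(A)
  m <- sum(A[row(A) != col(A)])
  obs <- stat(A)
  sim <- replicate(reps, stat(draw_same_edges(n, m)))
  c(obs = obs, p_upper = mean(sim >= obs), p_lower = mean(sim <= obs))
}

n_mutual <- function(A) dyad_census_sketch(A)[["Mut"]]

set.seed(1)
A <- draw_same_edges(10, 20)        # an arbitrary "observed" graph
dyad_census_sketch(A)               # the three counts sum to choose(10, 2) = 45
cug_sketch(A, n_mutual, reps = 500)
```

Conditioning on the edge count rather than on density in expectation is what makes this a CUG test in the strict sense used above; swapping draw_same_edges for independent Bernoulli draws would give the weaker, expectation-matching null.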

                          Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00
R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50
R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1,,], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43
R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities (that of the first graph minus that of the second) as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.3333333
        1stQ:  -0.06666667
        Med:   0
        Mean:  -0.001288889
        3rdQ:  0.06666667
        Max:   0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.06666667
        1stQ:  0.1555556
        Med:   0.2222222
        Mean:  0.2215333
        3rdQ:  0.2888889
        Max:   0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph G, vertex v is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when v and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
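Checking exact structural equivalence for a pair of vertices in a digraph only requires comparing their out- and in-ties to all third parties. The base-R sketch below (hypothetical helper name) applies the definition above directly, setting aside any ties between the two vertices themselves.

```r
# Exact structural equivalence of vertices v and w in a 0/1 digraph:
# identical out- and in-ties to every third party (ties between v and w
# themselves are ignored, as in the definition N(v) \ {w} = N(w) \ {v}).
struct_equiv <- function(A, v, w) {
  others <- setdiff(seq_len(nrow(A)), c(v, w))
  all(A[v, others] == A[w, others]) && all(A[others, v] == A[others, w])
}

S <- matrix(0, 4, 4)
S[1, 2:4] <- 1             # out-star: vertex 1 sends to 2, 3 and 4
struct_equiv(S, 2, 3)      # TRUE: both receive only from vertex 1
struct_equiv(S, 1, 2)      # FALSE
```

The pendants of a star, as here, are a standard case of exact structural equivalence: exchanging any two of them is an automorphism.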

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
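The distance-then-cluster pipeline can be approximated in a few lines of base R. The sketch below uses a Hamming-type distance between vertices' rows and columns (skipping the cells that involve the pair itself, which is an assumption of this sketch; sedist offers several variants) and then calls R's hclust and cutree, roughly in the way the text describes equiv.clust and blockmodel doing. se_hamming is a hypothetical name.

```r
# Hamming-type structural-equivalence distance between every pair of vertices:
# number of disagreements in their out-ties and in-ties to third parties.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (v in seq_len(n)) for (w in seq_len(n)) {
    if (v == w) next
    others <- setdiff(seq_len(n), c(v, w))
    D[v, w] <- sum(A[v, others] != A[w, others]) +
               sum(A[others, v] != A[others, w])
  }
  D
}

# Two positions: senders 1 and 2 each tie to receivers 3 and 4.
B <- matrix(0, 4, 4)
B[1:2, 3:4] <- 1
D <- se_hamming(B)
hc <- hclust(as.dist(D))       # complete linkage (hclust's default)
cutree(hc, k = 2)              # vertices 1, 2 in one class; 3, 4 in the other
```

Cutting by number of clusters (k) or by height (h) corresponds to the two ways of forming classes described in the next paragraph.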

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i,j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class, and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
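A density blockmodel and its expansion can likewise be sketched in base R. block_density averages tie values within each pair of classes (ignoring self-ties), and expand_density draws a new inhomogeneous Bernoulli digraph whose class sizes may differ from the original, in the spirit of blockmodel.expand. Both names are hypothetical, and treating undefined densities (e.g., the diagonal block of a singleton class) as 0 is an assumption of this sketch.

```r
# Density reduction: mean tie value between every pair of classes,
# excluding self-ties on the diagonal of within-class blocks.
block_density <- function(A, memb) {
  k <- max(memb)
  out <- matrix(NA_real_, k, k)
  for (r in seq_len(k)) for (s in seq_len(k)) {
    cells <- A[memb == r, memb == s, drop = FALSE]
    if (r == s) cells <- cells[row(cells) != col(cells)]
    if (length(cells) > 0) out[r, s] <- mean(cells)
  }
  out
}

# Expansion: each block density becomes the edge probability for every
# ordered pair of vertices in the corresponding classes.
expand_density <- function(dens, sizes) {
  memb <- rep(seq_along(sizes), sizes)
  P <- dens[memb, memb]
  P[is.na(P)] <- 0                   # undefined blocks treated as empty (assumption)
  n <- length(memb)
  G <- matrix(as.integer(runif(n * n) < P), n, n)
  diag(G) <- 0L
  G
}

A <- matrix(0, 4, 4)
A[1:2, 3:4] <- 1
dens <- block_density(A, c(1, 1, 2, 2))
dens                                 # rows c(0, 1) and c(0, 0)
set.seed(42)
expand_density(dens, c(2, 3))        # 2 senders tied to all 3 receivers
```

Because the example densities are exactly 0 or 1, this expansion is deterministic and changes only the order of the graph; with intermediate densities, repeated calls give the kind of Monte Carlo sample described above.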

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs G and $G'$ on vertex set V is the graph on V such that $(v,v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v,v'') \in E(G)$ and $(v'',v') \in E(G')$. (This is equivalent to the graph formed by the Boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
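Graph composition as defined above is a Boolean matrix product. The base-R sketch below (hypothetical helper name, not sna's operator) makes this explicit; note the loop it produces from two loop-free inputs, which illustrates the caution about diagonals.

```r
# Boolean composition of two graphs on the same vertex set:
# (v, w) is an edge iff some u has v -> u in G and u -> w in H.
compose_sketch <- function(G, H) {
  (G %*% H > 0) * 1
}

G <- matrix(0, 3, 3); G[1, 2] <- 1              # 1 -> 2
H <- matrix(0, 3, 3); H[2, 3] <- H[2, 1] <- 1   # 2 -> 3 and 2 -> 1
compose_sketch(G, H)   # edges 1 -> 3 and the loop 1 -> 1
```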

                          Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\left(|V|-1\right)} \qquad (3)$$


where $A^G, A^H$ are the respective adjacency matrices of G and H, and $\mu_X = \left(|V|(|V|-1)\right)^{-1}\sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then cov(G,G), and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H)/\sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
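Equation (3) and the associated correlation and Hamming distance can be computed from the off-diagonal cells of the adjacency matrices. The base-R sketch below uses hypothetical helper names (not sna's gcov, gcor, or hdist) and, like Equation (3), divides by the number of ordered pairs |V|(|V|-1); the correlation does not depend on that divisor, so it agrees with stats::cor applied to the same cells.

```r
# Off-diagonal cells (ordered pairs i != j) of an adjacency matrix.
offdiag <- function(A) A[row(A) != col(A)]

# Graph covariance, Equation (3): average cross-product of deviations.
gcov_sketch <- function(G, H) {
  g <- offdiag(G); h <- offdiag(H)
  mean((g - mean(g)) * (h - mean(h)))
}

# Graph correlation: covariance scaled by the two graph standard deviations.
gcor_sketch <- function(G, H) {
  gcov_sketch(G, H) / sqrt(gcov_sketch(G, G) * gcov_sketch(H, H))
}

# Hamming distance: number of ordered pairs whose edge status differs.
hamming_sketch <- function(G, H) sum(offdiag(G) != offdiag(H))

G <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)   # directed 3-cycle
H <- 1 - G
diag(H) <- 0                                  # complement of G (no loops)
gcor_sketch(G, G)      # 1
gcor_sketch(G, H)      # -1
hamming_sketch(G, H)   # 6: every ordered pair differs
```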

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the "structural" component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
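For very small graphs, the structural correlation can be found by brute force rather than by annealing, analogous to the exhaustive search option mentioned in the example later in this section. The base-R sketch below (hypothetical helper names) maximizes the graph correlation over every relabeling of the second graph; since there are n! relabelings, it is only practical for a handful of vertices.

```r
# All permutations of a vector, as a list (n! of them; small n only).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (rest in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], rest)
  }
  out
}

# Structural correlation by exhaustive search: the largest graph correlation
# between G and any relabelling H[p, p] of H (off-diagonal cells only).
structural_cor_sketch <- function(G, H) {
  offdiag <- function(A) A[row(A) != col(A)]
  best <- -Inf
  for (p in perms(seq_len(nrow(H)))) {
    r <- cor(offdiag(G), offdiag(H[p, p]))
    if (!is.na(r) && r > best) best <- r
  }
  best
}

G <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              0, 0, 0, 1,
              1, 0, 0, 0), 4, 4, byrow = TRUE)
H <- G[c(4, 2, 1, 3), c(4, 2, 1, 3)]    # a relabelled (isomorphic) copy of G
offd <- function(A) A[row(A) != col(A)]
cor(offd(G), offd(H))                   # labelled graph correlation
structural_cor_sketch(G, H)             # 1 up to rounding: G and H are isomorphic
```

This mirrors the g2/g3 case in the example below: the labelled correlation of two isomorphic graphs can be far from 1, while the structural correlation recovers the match.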

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
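Two of the ideas above are easy to reproduce in base R. For 0/1 graphs, the summed Hamming distance decomposes cell by cell, so a cell-wise majority vote yields a graph minimizing it (the idea behind centralgraph). Network principal component analysis amounts to calling eigen on the graph covariance matrix. The helper names are hypothetical, and breaking ties in the vote toward 0 is an arbitrary choice of this sketch.

```r
# Hypothetical helpers; each graph is a loopless 0/1 adjacency matrix of the same order.

# Central graph: cell-wise majority vote, which minimizes the summed Hamming
# distance to the input graphs (ties broken towards 0).
central_graph_sketch <- function(graphs) {
  votes <- Reduce(`+`, graphs)
  (votes > length(graphs) / 2) * 1
}

# Network PCA: eigen-decomposition of the graph covariance matrix, using the
# |V|(|V| - 1) divisor of Equation (3) rather than cov()'s m - 1.
network_pca_sketch <- function(graphs) {
  X <- sapply(graphs, function(A) A[row(A) != col(A)])  # one column per graph
  m <- nrow(X)
  eigen(cov(X) * (m - 1) / m, symmetric = TRUE)
}

G1 <- matrix(c(0, 1, 0,
               0, 0, 1,
               1, 0, 0), 3, 3, byrow = TRUE)   # directed 3-cycle
G3 <- t(G1)                                    # the reversed cycle
graphs <- list(G1, G1, G3)
central_graph_sketch(graphs)                   # identical to G1
network_pca_sketch(graphs)$values              # variances of the components
```

Here the three graphs differ only along one axis (G1 versus its reversal), so essentially all of the total graph variance loads on the first component.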

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
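A simple QAP test for graph correlation can be sketched by relabeling one graph with random permutations (applied to rows and columns together) and recomputing the statistic each time. The helper below is hypothetical and much simpler than sna's qaptest; the simulated graphs are illustrative only.

```r
# Hypothetical QAP helper for graph correlation on loopless 0/1 digraphs.
qap_cor_sketch <- function(G, H, reps = 1000) {
  offdiag <- function(A) A[row(A) != col(A)]
  obs <- cor(offdiag(G), offdiag(H))
  sim <- replicate(reps, {
    p <- sample.int(nrow(H))               # random relabelling of H's vertices
    cor(offdiag(G), offdiag(H[p, p]))      # rows and columns permuted together
  })
  c(obs = obs, p_upper = mean(sim >= obs), p_lower = mean(sim <= obs))
}

set.seed(7)
n <- 10
G <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(G) <- 0
H <- G
flip <- matrix(runif(n * n) < 0.1, n, n)   # perturb roughly 10% of the cells
H[flip] <- 1 - H[flip]
diag(H) <- 0
qap_cor_sketch(G, H, reps = 500)
```

Permuting vertices rather than individual cells preserves each graph's internal dependence structure (degrees, reciprocity, and so on), which is the point of the QAP null relative to a naive cell-wise permutation test.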


                          Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

                          Rgt x lt- rgraph(20 4)

                          Rgt y lt- x[1] + 4 x[2] + 2 x[3]

                          Rgt nl lt- netlm(y x)

                          Rgt summary(nl)

                          36 Social Network Analysis with sna

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
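Since netlm's estimates are those of ordinary least squares on the vectorized adjacency matrices, the coefficient estimates above can be reproduced with plain lm. The base-R sketch below assumes directed graphs with loops excluded, so each 20-vertex graph contributes 20 × 19 = 380 observations; this also accounts for the 375 residual degrees of freedom (380 observations minus an intercept and four slopes). The predictors are drawn with rbinom as a stand-in for rgraph(20, 4), and the QAP p-values are not reproduced here.

```r
set.seed(3)
n <- 20
k <- 4
# k random directed graphs on n vertices (density 0.5), stacked as x[i, , ]
x <- array(rbinom(k * n * n, 1, 0.5), dim = c(k, n, n))
y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]

# Keep only off-diagonal cells: loops are not treated as observations
off <- row(y) != col(y)
dat <- data.frame(y = y[off])
for (i in seq_len(k)) dat[[paste0("x", i)]] <- x[i, , ][off]

fit <- lm(y ~ ., data = dat)
nrow(dat)           # 380 = 20 * 19 ordered pairs
df.residual(fit)    # 375 = 380 - 5 estimated coefficients
round(coef(fit), 10)
```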

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
         352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.858
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

         Actual
Predicted    0    1
        0    0    0
        1   39  341

        Total Fraction Correct: 0.8973684
        Fraction Predicted 1s Correct: 0.8973684
        Fraction Predicted 0s Correct: NaN
        False Negative Rate: 0
        False Positive Rate: 1

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties. This is largely a consequence of the high density of the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.
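The contingency table shows where these diagnostics come from: the fitted model predicts a tie for every one of the 380 dyads. The base-R sketch below recomputes the summary rates from the printed table. The rate definitions are the conventional ones (false negative rate = actual ties predicted absent divided by actual ties; false positive rate = actual non-ties predicted present divided by actual non-ties). They reproduce the printed values, but that sna computes them in exactly this way is our assumption.

```r
# Contingency table from the netlogit summary: rows = predicted, columns = actual
tab <- matrix(c(0, 39, 0, 341), nrow = 2,
              dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))

total_correct <- (tab["0", "0"] + tab["1", "1"]) / sum(tab)
pred1_correct <- tab["1", "1"] / sum(tab["1", ])
pred0_correct <- tab["0", "0"] / sum(tab["0", ])  # 0 / 0: nothing was predicted absent
fnr <- tab["0", "1"] / sum(tab[, "1"])            # actual ties predicted absent
fpr <- tab["1", "0"] / sum(tab[, "0"])            # actual non-ties predicted present
density <- sum(tab[, "1"]) / sum(tab)             # share of dyads that are ties

c(total = total_correct, pred1 = pred1_correct, pred0 = pred0_correct,
  fnr = fnr, fpr = fpr, density = density)
```

The total fraction correct equals the observed density (341/380), which is exactly what a constant "always a tie" prediction would score; the NaN arises because no dyad was predicted to be empty.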

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired); estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of the false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
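The reparameterization mentioned above can be made concrete. Under the consensus model as described, a source with competency d reports the true value with probability d and otherwise guesses "1" with probability g. A true tie is then reported with probability d + (1 - d)g and a non-tie is reported as present with probability (1 - d)g, so the false negative rate is em = (1 - d)(1 - g), the false positive rate is ep = (1 - d)g, and em + ep = 1 - d. The map can only be inverted with d > 0, i.e., em + ep < 1. This derivation and the helper names rb_to_errors and errors_to_rb are our own illustration, not code from sna.

```r
# Consensus-model parameters -> error rates
# d: competency (chance of knowing the true edge value); g: probability a guess is 1
rb_to_errors <- function(d, g) {
  c(em = (1 - d) * (1 - g),  # true tie: the source guesses, and guesses 0
    ep = (1 - d) * g)        # true non-tie: the source guesses, and guesses 1
}

# Error rates -> consensus-model parameters; requires 0 < em + ep < 1
errors_to_rb <- function(em, ep) {
  stopifnot(em + ep > 0, em + ep < 1)
  c(d = 1 - em - ep, g = ep / (em + ep))
}

rb_to_errors(0.6, 0.1)    # em = 0.36, ep = 0.04
errors_to_rb(0.36, 0.04)  # d = 0.6, g = 0.1
```

Error rates with em + ep > 1 have no counterpart here, since they would require a negative competency; this is the negatively informative case the bbnam parameterization can still represent.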

Also supported by sna are the methods for estimating biased net parameters of Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation of the model parameters (bias event probabilities) is currently heuristic. The bn function currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
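The conditional tie probability above is simply the chance that at least one of a set of independent Bernoulli events occurs: the baseline event, or any of the $s_k$ realized copies of each bias event. A minimal base-R sketch with a Monte Carlo check follows. The function name bn_edge_prob and all parameter values are hypothetical, and the counts $s_k$ are supplied directly rather than derived from a trace state $T$.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
bn_edge_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Hypothetical values: baseline 0.1; two bias types with probabilities 0.2 and
# 0.3, which have occurred once and twice, respectively, for this dyad
p_exact <- bn_edge_prob(0.1, c(0.2, 0.3), c(1, 2))

# Monte Carlo check: draw every Bernoulli trial and record whether any fired
set.seed(42)
sim_once <- function() {
  fired <- c(runif(1) < 0.1,  # baseline event B_e
             runif(1) < 0.2,  # one occurrence of bias event B_1
             runif(2) < 0.3)  # two occurrences of bias event B_2
  any(fired)
}
p_mc <- mean(replicate(1e5, sim_once()))

c(exact = p_exact, monte_carlo = p_mc)
```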

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990; see Cliff and Ord 1973 and Anselin 1988 for the equivalent class of spatial autocorrelation models) constitute one important family of processes that are often used for this purpose. These models are of the form

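$$y = \left( \sum_{i=1}^{w} \theta_i W_i \right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left( \sum_{i=1}^{z} \psi_i Z_i \right) \varepsilon + \nu \tag{5}$$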

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
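Moran's I and Geary's C can be written in a few lines. The sketch below uses their textbook forms with a single weight matrix w; nacf's own normalization and neighborhood construction may differ, and the helper names moran_i and geary_c are hypothetical. On a four-vertex path with steadily increasing responses, neighbors have similar values, so I is positive (its null expectation is -1/(n - 1)) and C falls below 1.

```r
# Moran's I: (n / S0) * sum_ij w_ij d_i d_j / sum_i d_i^2, with d = y - mean(y)
moran_i <- function(y, w) {
  d <- y - mean(y)
  n <- length(y)
  (n / sum(w)) * sum(w * outer(d, d)) / sum(d^2)
}

# Geary's C: (n - 1) * sum_ij w_ij (y_i - y_j)^2 / (2 * S0 * sum_i d_i^2)
geary_c <- function(y, w) {
  d <- y - mean(y)
  n <- length(y)
  (n - 1) * sum(w * outer(y, y, "-")^2) / (2 * sum(w) * sum(d^2))
}

# Undirected path 1 - 2 - 3 - 4 as a symmetric weight matrix
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)
y <- c(1, 2, 3, 4)

moran_i(y, w)  # 1/3: neighbors have similar values
geary_c(y, w)  # 0.3: below 1, again indicating positive autocorrelation
```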


                          Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

       a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1   0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2   0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3   0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4   0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5   1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6   0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7   1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8   0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9   0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10  0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11  0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12  1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13  0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14  1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15  1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16  0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17  1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18  1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19  0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20  0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

      a16  a17  a18  a19  a20
a1   1.00 1.00 1.00 0.00 0.00
a2   1.00 0.00 0.00 1.00 1.00
a3   0.00 0.00 1.00 0.00 1.00
a4   0.00 1.00 0.00 1.00 1.00
a5   1.00 1.00 0.00 0.00 1.00
a6   0.00 0.00 0.00 1.00 0.00
a7   1.00 0.00 0.00 0.00 0.00
a8   0.00 0.00 1.00 0.00 1.00
a9   1.00 1.00 1.00 1.00 0.00
a10  0.00 1.00 1.00 1.00 0.00
a11  1.00 1.00 0.00 1.00 1.00
a12  1.00 0.00 1.00 1.00 0.00
a13  0.00 0.00 1.00 0.00 1.00
a14  0.00 0.00 0.00 0.00 0.00
a15  1.00 0.00 1.00 0.00 1.00
a16  0.00 0.00 1.00 0.00 0.00
a17  0.00 0.00 1.00 0.00 1.00
a18  0.00 0.00 0.00 1.00 0.00
a19  0.00 0.00 0.00 0.00 1.00
a20  1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

        Min   1stQ Median   Mean   3rdQ    Max
o1   0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2   0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3   0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4   0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5   0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6   0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7   0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8   0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9   0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10  0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11  0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12  0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13  0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14  0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15  0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16  0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17  0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18  0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19  0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20  0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

           Min      1stQ    Median      Mean      3rdQ       Max
o1   0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2   0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3   0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4   0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5   0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6   0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7   0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8   0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9   0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10  0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11  0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12  0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13  0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14  0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15  0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16  0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17  0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18  0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19  0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20  0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

  Replicate Chains: 5
  Burn Time: 300
  Draws per Chain: 20  Total Draws: 100
  Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions that are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
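Why em + ep > 1 is "perverse" can be seen from a single edge. The base-R sketch below computes the posterior probability of one tie from several informants' 0/1 reports, treating the error rates as known and the reports as conditionally independent given the true graph. This is only the kind of edge-level update a Gibbs sampler conditions on, not bbnam itself, and the function name posterior_tie is hypothetical. When em + ep > 1 we have 1 - em < ep, so a report of a tie is more likely when the tie is absent and therefore lowers its posterior probability.

```r
# Posterior probability that a tie exists, given 0/1 reports from informants
# with known false negative (em) and false positive (ep) rates
posterior_tie <- function(reports, em, ep, prior = 0.5) {
  l1 <- prod(ifelse(reports == 1, 1 - em, em))  # likelihood if the tie is present
  l0 <- prod(ifelse(reports == 1, ep, 1 - ep))  # likelihood if the tie is absent
  prior * l1 / (prior * l1 + (1 - prior) * l0)
}

# One reported tie from an informant with the example's mean error rates
p_typical <- posterior_tie(1, em = 0.375, ep = 0.038)   # approximately 0.943
# The same report from an informant in the perverse region (em + ep > 1)
p_perverse <- posterior_tie(1, em = 0.7, ep = 0.5)      # 0.375, below the prior
# Mixed reports from three informants with the example's mean error rates
p_mixed <- posterior_tie(c(1, 0, 1), em = 0.375, ep = 0.038)

c(typical = p_typical, perverse = p_perverse, mixed = p_mixed)
```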

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either the false positive or the false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
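The gap between the two LAS variants follows from how each combines reports. Under LAS, the i -> j tie is judged from the reports of i and j only: the intersection rule needs both to report it, the union rule either one. The central graph instead keeps an edge reported by at least half of all informants. The base-R sketch below implements these three rules on an informant x sender x receiver array laid out like dat in the example; the function names are hypothetical, and the >= 0.5 threshold for an evenly split vote is our assumption rather than sna's documented rule. It also computes the chance that a true tie survives each LAS rule when the two reporters independently miss it with the example's mean false negative rate of 0.375.

```r
# dat[k, i, j]: informant k's report of the tie i -> j (informants are the actors)
las <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      r <- c(dat[i, i, j], dat[j, i, j])  # sender's and receiver's reports on i -> j
      out[i, j] <- if (rule == "intersection") all(r == 1) else any(r == 1)
    }
  }
  out
}

# Central graph: keep an edge reported by at least half of all informants
central_graph <- function(dat) {
  g <- (apply(dat, c(2, 3), mean) >= 0.5) * 1
  diag(g) <- 0
  g
}

# Three informants: actor 1 and a third party report 1 -> 2; actor 2 does not
dat <- array(0, dim = c(3, 3, 3))
dat[1, 1, 2] <- 1
dat[3, 1, 2] <- 1
las(dat, "intersection")[1, 2]  # 0: the receiver does not confirm the tie
las(dat, "union")[1, 2]         # 1
central_graph(dat)[1, 2]        # 1: two of three informants report it

# Chance that a true tie is recovered when each reporter misses it w.p. em
em <- 0.375
c(intersection = (1 - em)^2, union = 1 - em^2)
```

With these rates the intersection rule keeps a true tie only about 39% of the time, against about 86% for the union, which is consistent with the ordering of the LAS accuracies reported above.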

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example that includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58  < 2e-16 ***
X2      0.535608   0.009448   56.69  < 2e-16 ***
X3     -0.685068   0.007138  -95.98  < 2e-16 ***
X4      0.691812   0.008417   82.19  < 2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71  < 2e-16 ***
rho2.1  0.307491   0.021167   14.53  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
        Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
        Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
        Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
        AIC: -100.9 BIC: -85.65

        Null model: meanstd
        Null log likelihood: -82.48 on 48 degrees of freedom
        AIC: 169.0 BIC: 172.8
        AIC difference (model versus null): 269.9
        Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot", which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
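The information criteria in the lnam summary can be checked by hand from the printed log-likelihoods using the usual definitions AIC = -2 log L + 2k and BIC = -2 log L + k log n. The parameter counts below are inferred from the printed degrees of freedom (50 - 42 = 8 for the model, i.e., five coefficients, two rho terms, and sigma; 50 - 48 = 2 for the mean/standard-deviation null model) rather than taken from sna's documentation. Because the log-likelihoods are printed rounded to two decimals, the results agree with the summary only to within rounding.

```r
n <- 50            # vertices (observations)
k_model <- 8       # 5 coefficients + rho1.1 + rho2.1 + sigma (inferred from 50 - 42)
k_null <- 2        # mean and standard deviation (inferred from 50 - 48)
ll_model <- 58.47  # rounded log-likelihoods from the summary
ll_null <- -82.48

aic <- function(ll, k) -2 * ll + 2 * k
bic <- function(ll, k, n) -2 * ll + k * log(n)

round(c(model_aic = aic(ll_model, k_model),
        model_bic = bic(ll_model, k_model, n),
        null_aic = aic(ll_null, k_null),
        null_bic = bic(ll_null, k_null, n),
        aic_diff = aic(ll_null, k_null) - aic(ll_model, k_model)), 2)
```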

                          3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines that is diverse and that covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                          Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. (Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot, theoretical vs. sample quantiles; Net Influence Plot.)

                          References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks. II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw Network Visualization Software, Version 2.067. URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25



R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+   cmode = "freeman")/2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0
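The identities checked above can be reproduced without sna. The base-R sketch below uses a hypothetical helper, apply_over_neighbors, which mimics the margin convention assumed here for gapply (1 for out-neighbors, 2 for in-neighbors, c(1, 2) for either) and applies a function to the covariate values of each vertex's neighbors. It is an illustration only, not sna's implementation, and it ignores gapply's distance argument.

```r
# Hypothetical base-R analogue of a neighbor-wise apply.
# A: adjacency matrix (rows send ties to columns); margin: 1 = out-neighbors,
# 2 = in-neighbors, c(1, 2) = either; stats: one covariate value per vertex.
apply_over_neighbors <- function(A, margin, stats, fun) {
  n <- nrow(A)
  sapply(seq_len(n), function(i) {
    out_nb <- if (1 %in% margin) A[i, ] > 0 else rep(FALSE, n)
    in_nb <- if (2 %in% margin) A[, i] > 0 else rep(FALSE, n)
    nb <- out_nb | in_nb
    nb[i] <- FALSE  # a vertex is never its own neighbor
    fun(stats[nb])
  })
}

# A small random digraph without self-loops
set.seed(7)
A <- matrix(rbinom(36, 1, 0.4), 6, 6)
diag(A) <- 0

# Summing a vector of ones over neighbors recovers degree-type counts
apply_over_neighbors(A, 1, rep(1, 6), sum)   # same as rowSums(A): outdegree
apply_over_neighbors(A, 2, rep(1, 6), sum)   # same as colSums(A): indegree
apply_over_neighbors(A, c(1, 2), 1:6, mean)  # mean label of each vertex's neighbors
```

Summing over either kind of neighbor counts each adjacent vertex once, which is why the c(1, 2) case corresponds to half the Freeman degree of the symmetrized graph.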

To obtain adjacency matrices for the neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i,,], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+   partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i,,], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is informative about properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.
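The difference between the two kinds of neighborhood reduces to a condition on geodesic distance: $v'$ is in the order-$k$ partial neighborhood of $v$ when $d(v, v') = k$, and in the order-$k$ cumulative neighborhood when $1 \le d(v, v') \le k$. The base-R sketch below assumes out-neighborhoods on a binary digraph; the function names are hypothetical, and this is not how sna's neighborhood function is implemented internally.

```r
# Breadth-first search from every vertex along out-ties; Inf marks unreachable vertices.
out_distances <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    while (length(frontier) > 0) {
      # unreached vertices that receive a tie from the current frontier
      nxt <- which(colSums(A[frontier, , drop = FALSE]) > 0 & is.infinite(D[s, ]))
      D[s, nxt] <- D[s, frontier[1]] + 1
      frontier <- nxt
    }
  }
  D
}

# Order-k neighborhood adjacency matrix (hypothetical helper).
# partial = TRUE: distance exactly k; partial = FALSE: distance 1 through k.
neighborhood_sketch <- function(A, k, partial = TRUE) {
  D <- out_distances(A)
  N <- if (partial) D == k else (D >= 1 & D <= k)
  N * 1  # logical -> 0/1 adjacency matrix
}

# Directed path 1 -> 2 -> 3 -> 4
P <- matrix(0, 4, 4)
P[1, 2] <- P[2, 3] <- P[3, 4] <- 1
neighborhood_sketch(P, 2)                   # row 1 marks only vertex 3
neighborhood_sketch(P, 2, partial = FALSE)  # row 1 marks vertices 2 and 3
```

Once $k$ reaches the longest geodesic in the graph, the cumulative neighborhood stops growing, which is the "filling out" visible in the later panels of Figure 2.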

Figure 1: Sample partial neighborhoods of increasing order (panels: Partial Neighborhood of Order 1 through Order 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order partial neighborhood of $v$.

Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; the required arguments are described in the gplot.layout manual page. For "target diagrams", in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.
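Two of the simpler layout families mentioned above, circular placement and placement by multidimensional scaling, can be sketched in a few lines of base R. This is an assumption-laden illustration rather than sna's gplot.layout code: it symmetrizes the graph, uses geodesic distances (computed with a Floyd-Warshall loop) as dissimilarities, places unreachable pairs just beyond the graph's diameter, and passes the result to classical MDS via stats::cmdscale. The helper names are hypothetical.

```r
# Vertices equally spaced on the unit circle
circle_layout <- function(n) {
  theta <- 2 * pi * (seq_len(n) - 1) / n
  cbind(x = cos(theta), y = sin(theta))
}

# Classical MDS on geodesic distances of the symmetrized graph
mds_layout <- function(A) {
  n <- nrow(A)
  S <- (A + t(A)) > 0                     # ignore edge direction for layout purposes
  D <- ifelse(S, 1, Inf)
  diag(D) <- 0
  for (k in seq_len(n)) D <- pmin(D, outer(D[, k], D[k, ], "+"))  # Floyd-Warshall
  D[is.infinite(D)] <- max(D[is.finite(D)]) + 1  # disconnected pairs: just past the diameter
  cmdscale(D, k = 2)                      # two-dimensional coordinates
}

# Draw edges as segments and vertices as filled circles with base graphics
draw_layout <- function(A, xy) {
  plot(xy, type = "n", asp = 1, axes = FALSE, xlab = "", ylab = "")
  e <- which(A > 0, arr.ind = TRUE)
  segments(xy[e[, 1], 1], xy[e[, 1], 2], xy[e[, 2], 1], xy[e[, 2], 2])
  points(xy, pch = 21, bg = "gray70", cex = 2)
}

# A 6-cycle laid out both ways
ring <- matrix(0, 6, 6)
for (i in 1:6) {
  j <- i %% 6 + 1
  ring[i, j] <- ring[j, i] <- 1
}
xy_circle <- circle_layout(6)
xy_mds <- mds_layout(ring)
```

Calling draw_layout(ring, xy_mds) in an interactive session plots the MDS placement. For a cycle, both layouts should place adjacent vertices closer together than opposite ones; force-directed methods such as Fruchterman-Reingold instead iterate toward an equilibrium and are not sketched here.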

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to those of gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl and may vary slightly by platform.

Figure 2: Sample cumulative neighborhoods of increasing order (panels: Cumulative Neighborhood of Order 1 through Order 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.
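Polygonal vertex glyphs of the kind drawn by gplot.vertex (a number of sides, a radius, and a rotation per vertex, as with the vertex.sides and vertex.rot arguments in the gplot examples of this section) are, at bottom, regular polygons. The base-R sketch below computes such a polygon and draws it with graphics::polygon. regular_polygon and draw_vertices are hypothetical helpers, and treating the rotation as degrees is an assumption based on the vertex.rot = (0:4) * 72 usage shown in this section, not on sna's source.

```r
# Corner coordinates of a regular polygon centered at (x, y).
# sides: number of corners; radius: center-to-corner distance; rot: rotation in degrees.
regular_polygon <- function(x, y, radius = 0.1, sides = 4, rot = 0) {
  theta <- 2 * pi * (seq_len(sides) - 1) / sides + rot * pi / 180
  cbind(x = x + radius * cos(theta), y = y + radius * sin(theta))
}

# Ten polygons with 3 to 12 sides placed around a circle, using base graphics only
draw_vertices <- function() {
  plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
       xlab = "", ylab = "")
  for (i in 1:10) {
    p <- regular_polygon(cos(i / 10 * 2 * pi), sin(i / 10 * 2 * pi),
                         radius = 0.1, sides = i + 2)
    polygon(p, col = i)
  }
}
```

Calling draw_vertices() in an open graphics device produces a display in the spirit of the gplot.vertex panel of Figure 5.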

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices that are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc., are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options (panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options).

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While this is not evident from the sampled output, the usual interactive features of rgl (e.g., rotation and zooming) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.
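The process shown in Figure 4 starts from a regular lattice and rewires edges at random with some probability. The following base-R sketch illustrates the idea for the one-dimensional (ring) case only. rgws itself handles lattices of arbitrary dimension, and its exact rewiring rule may differ from the assumption made here (one endpoint of each selected edge is moved to a uniformly chosen non-neighbor, so the edge count is preserved). Both helper names are hypothetical.

```r
# One-dimensional ring lattice: each vertex tied to its z nearest neighbors on each side
ring_lattice <- function(n, z) {
  A <- matrix(0L, n, n)
  for (i in seq_len(n)) {
    for (k in seq_len(z)) {
      j <- ((i - 1 + k) %% n) + 1  # k steps clockwise, wrapping around the ring
      A[i, j] <- A[j, i] <- 1L
    }
  }
  A
}

# With probability p, move one endpoint of each original edge to a random non-neighbor
rewire_edges <- function(A, p) {
  edges <- which(upper.tri(A) & A == 1L, arr.ind = TRUE)
  for (e in seq_len(nrow(edges))) {
    if (runif(1) >= p) next
    i <- edges[e, 1]
    j <- edges[e, 2]
    candidates <- setdiff(which(A[i, ] == 0L), i)  # not i itself, not already tied to i
    if (length(candidates) == 0) next
    k <- candidates[sample.int(length(candidates), 1)]
    A[i, j] <- A[j, i] <- 0L
    A[i, k] <- A[k, i] <- 1L
  }
  A
}

set.seed(1)
lattice <- ring_lattice(20, 2)
rewired <- rewire_edges(lattice, 0.2)
c(sum(lattice), sum(rewired)) / 2  # 40 40: rewiring leaves the edge count unchanged
```

With p = 0 the lattice is returned untouched, and as p grows, more long-range shortcuts appear, which is the progression visible across the three panels.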

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+   xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10)/10 * 2 * pi), sin((1:10)/10 * 2 * pi),
+   col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+   xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+   offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
+   arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output or to create alternative display mechanisms. They may also be used within non-network contexts as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example).

2.3 Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will, however, see an important counterexample below.
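The distinction can be made concrete with two toy functions in base R, both hypothetical: node_outdegree takes a vertex and a graph (an NLI of the form $f: V \times \mathcal{G} \mapsto \mathbb{R}$), while graph_density takes only a graph (a GLI of the form $f: \mathcal{G} \mapsto \mathbb{R}$). Applying an NLI to every vertex yields the familiar vector of node scores.

```r
# Node-level index f(v, G): outdegree of vertex v, ignoring self-loops
node_outdegree <- function(v, A) sum(A[v, -v])

# Graph-level index f(G): density of a digraph (ties present / ordered pairs)
graph_density <- function(A) {
  n <- nrow(A)
  sum(A[row(A) != col(A)]) / (n * (n - 1))
}

# Directed path 1 -> 2 -> 3
P <- matrix(0, 3, 3)
P[1, 2] <- P[2, 3] <- 1
sapply(seq_len(nrow(P)), node_outdegree, A = P)  # one score per vertex: 1 1 0
graph_density(P)                                  # one score per graph: 2/6
```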

                            Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions.

Degree, a standard graph-theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_d^+(v,G) \equiv |N^+(v)|$), indegree ($c_d^-(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_d^t(v,G) \equiv c_d^+(v,G) + c_d^-(v,G)$). All of these are supported via degree.

Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus \{v\}} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and the ratio $g'(v',v,v'',G)/g(v',v'',G)$ is taken to equal 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna.

Finally, closeness is given by

$$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

where $n = |V|$ and $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs that are not strongly connected unless distances between disconnected vertices are taken to be infinite. In this case $c_c(v,G) = 0$ for any $v$ that lacks a path to at least one other vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
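The definitions above translate directly into code. The base-R sketch below (hypothetical helper names; a plain transcription of the formulas, not the algorithms used inside sna) computes all-pairs geodesic distances with a Floyd-Warshall loop, counts the geodesics between each ordered pair, and then evaluates degree, closeness, betweenness, and stress for a binary digraph without self-loops. Pairs with no connecting path contribute 0 to betweenness, and an infinite distance drives closeness to 0, as described in the text.

```r
# All-pairs geodesic distances by Floyd-Warshall (binary digraph; Inf = no path)
geodesics <- function(A) {
  n <- nrow(A)
  D <- ifelse(A > 0, 1, Inf)
  diag(D) <- 0
  for (k in seq_len(n)) D <- pmin(D, outer(D[, k], D[k, ], "+"))
  D
}

# Number of geodesics between every ordered pair, filled in order of distance
geodesic_counts <- function(A, D) {
  n <- nrow(A)
  S <- diag(n)  # one trivial geodesic from each vertex to itself
  for (s in seq_len(n)) {
    reach <- which(is.finite(D[s, ]) & D[s, ] > 0)
    for (t in reach[order(D[s, reach])]) {
      pred <- which(A[, t] > 0 & D[s, ] == D[s, t] - 1)  # last steps into t
      S[s, t] <- sum(S[s, pred])
    }
  }
  S
}

# Degree variants (rows of A send ties)
outdegree <- function(A) rowSums(A)
indegree <- function(A) colSums(A)
freeman_degree <- function(A) rowSums(A) + colSums(A)

# Closeness: (n - 1) / sum of distances; any infinite distance gives 0
closeness_sketch <- function(A) (nrow(A) - 1) / rowSums(geodesics(A))

# Betweenness over ordered pairs (s, t); stress = TRUE sets the denominator to 1
betweenness_sketch <- function(A, stress = FALSE) {
  n <- nrow(A)
  D <- geodesics(A)
  S <- geodesic_counts(A, D)
  cb <- numeric(n)
  for (v in seq_len(n)) for (s in seq_len(n)) for (t in seq_len(n)) {
    if (v == s || v == t || s == t || !is.finite(D[s, t])) next
    if (D[s, v] + D[v, t] == D[s, t]) {  # v lies on at least one s-t geodesic
      through <- S[s, v] * S[v, t]       # number of s-t geodesics passing through v
      cb[v] <- cb[v] + if (stress) through else through / S[s, t]
    }
  }
  cb
}

# Diamond digraph: 1 -> 2 -> 4 and 1 -> 3 -> 4
G4 <- matrix(0, 4, 4)
G4[1, 2] <- G4[1, 3] <- G4[2, 4] <- G4[3, 4] <- 1
betweenness_sketch(G4)                 # 0.0 0.5 0.5 0.0: two 1-4 geodesics share the credit
betweenness_sketch(G4, stress = TRUE)  # 0 1 1 0: stress counts every geodesic in full
closeness_sketch(G4)                   # only vertex 1 reaches all others, so the rest score 0
```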

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $\mathbf{A}$ (where $\mathbf{A}$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), as "recursive" or "reflected" degree (i.e., $v$ is central to the extent that it has many ties to other central nodes), or as the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to

$$c_{bp}(G) = \alpha \left(\mathbf{I} - \beta \mathbf{A}\right)^{-1} \mathbf{A} \mathbf{1},$$

where a solution exists. This index approaches eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $\mathbf{A}$, and approaches degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as in the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so that the squared scores sum to $|V|$ (i.e., the centrality vector has Euclidean length $\sqrt{|V|}$); in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation.

Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality indicates the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
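A compact base-R transcription of the two spectral measures follows. It rests on two stated assumptions: the row sums $\mathbf{A}\mathbf{1}$ are used, so $\beta = 0$ gives outdegree (matching the exponent = 0 check in the example later in this section), and $\alpha$ is chosen so that the squared scores sum to $|V|$. The function names are hypothetical and this is not the bonpow or evcent source; the test graph (a triangle with one pendant vertex) is undirected so that the principal eigenvector is real.

```r
# Eigenvector centrality: absolute value of the principal eigenvector of A
eigen_centrality <- function(A) {
  v <- eigen(A)$vectors[, 1]  # for symmetric A, eigen() lists the largest eigenvalue first
  abs(Re(v))
}

# Bonacich power: alpha * (I - beta A)^{-1} A 1, rescaled so that sum(c^2) = n
bonacich_power <- function(A, beta) {
  n <- nrow(A)
  raw <- solve(diag(n) - beta * A, drop(A %*% rep(1, n)))
  raw * sqrt(n / sum(raw^2))
}

# Triangle {1, 2, 3} with vertex 4 hanging off vertex 3 (undirected)
A <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 0), 4, 4, byrow = TRUE)
lambda1 <- max(eigen(A)$values)

bonacich_power(A, 0)                  # proportional to degree (2, 2, 3, 1)
bonacich_power(A, 0.99999 / lambda1)  # nearly a rescaled eigen_centrality(A)
bonacich_power(A, -0.5)               # beta < 0, as discussed in the text
```

As $\beta$ approaches $1/\lambda_1$, the principal eigenvector dominates the solution of the linear system, which is why the rescaled scores converge to eigenvector centrality.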

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$, that is, the number of pairs for which $v$ serves as a local bridge. Now let us posit a vector of states $\mathbf{s}$ on $V$, such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $\mathbf{s}$ and the expected density) are also provided, as are the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
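The role definitions above can be checked by direct enumeration. The base-R sketch below (hypothetical names; not the sna brokerage function, and it computes neither the Gould-Fernandez moments nor their z-tests) counts, for each broker $v_j$, the ordered pairs $(v_i, v_k)$ with $v_i \to v_j \to v_k$ and no $v_i \to v_k$ tie, and classifies each pair by the states $\mathbf{s}$. It then follows the suggestion in the text with a simple permutation test that shuffles the state labels over vertices while holding the graph fixed.

```r
# Role of broker j for the ordered pair (i, k), following the state conditions above
classify_role <- function(si, sj, sk) {
  if (si == sj && sj == sk) return("coordinating")
  if (si == sk) return("itinerant")       # s_i = s_k, broker differs
  if (sj == sk) return("gatekeeping")     # broker shares the receiver's state
  if (si == sj) return("representative")  # broker shares the sender's state
  "liaison"                               # all three states differ
}

# Role counts per broker: pairs (i, k) with i -> j -> k and no i -> k tie
brokerage_counts <- function(A, s) {
  n <- nrow(A)
  roles <- c("coordinating", "itinerant", "gatekeeping", "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {
    senders <- setdiff(which(A[, j] > 0), j)
    receivers <- setdiff(which(A[j, ] > 0), j)
    if (length(senders) == 0 || length(receivers) == 0) next
    pairs <- expand.grid(i = senders, k = receivers)
    bridged <- pairs$i != pairs$k & A[cbind(pairs$i, pairs$k)] == 0
    for (r in which(bridged)) {
      role <- classify_role(s[pairs$i[r]], s[j], s[pairs$k[r]])
      out[j, role] <- out[j, role] + 1
    }
  }
  out
}

# Permutation test: shuffle states over vertices, keep the graph fixed, and
# compare the network-wide count for one role with its permutation distribution
brokerage_permutation_test <- function(A, s, role, n_perm = 999) {
  observed <- sum(brokerage_counts(A, s)[, role])
  null <- replicate(n_perm, sum(brokerage_counts(A, sample(s))[, role]))
  p_value <- (1 + sum(null >= observed)) / (1 + n_perm)  # upper-tail p-value
  list(observed = observed, p_value = p_value)
}

# Directed path 1 -> 2 -> 3: vertex 2 is the only possible broker
P <- matrix(0, 3, 3)
P[1, 2] <- P[2, 3] <- 1
brokerage_counts(P, c(1, 2, 1))  # vertex 2 acts as an itinerant broker
```

Because the permutation distribution is generated from the observed graph itself, this test makes no normality assumption; its cost grows with the number of permutations times the cost of one enumeration.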

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrates, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties that can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector centrality and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present; z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.
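The role classification behind these scores can be sketched in base R. The hypothetical helper below (not sna's brokerage, and without the Gould-Fernandez expectations or z-scores) examines every two-path i → j → k with distinct endpoints and credits broker j with one of the five roles according to the group memberships of i, j, and k. It assumes, following the usual Gould-Fernandez convention, that only open two-paths (no direct i → k tie) count as brokerage.

```r
# Hypothetical helper (not sna's brokerage()): raw Gould-Fernandez role counts.
# A is a 0/1 adjacency matrix; memb gives each vertex's group.
# Assumption: only open two-paths i -> j -> k (no i -> k tie) are counted.
brokerage_counts <- function(A, memb) {
  n <- nrow(A)
  roles <- c("w_I", "w_O", "b_IO", "b_OI", "b_O")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (i == j || j == k || i == k) next
        if (A[i, j] == 0 || A[j, k] == 0 || A[i, k] == 1) next
        gi <- memb[i]; gj <- memb[j]; gk <- memb[k]
        role <- if (gi == gj && gj == gk) {
          "w_I"    # coordinator:    A -> A -> A
        } else if (gi == gk) {
          "w_O"    # itinerant:      A -> B -> A
        } else if (gj == gk) {
          "b_IO"   # gatekeeper:     A -> B -> B
        } else if (gi == gj) {
          "b_OI"   # representative: A -> A -> B
        } else {
          "b_O"    # liaison:        A -> B -> C
        }
        out[j, role] <- out[j, role] + 1
      }
    }
  }
  cbind(out, t = rowSums(out))   # t = total brokerage per vertex
}

# 1 -> 2 -> 3, with vertices 1 and 2 in group 1 and vertex 3 in group 2
A <- matrix(0, 3, 3)
A[1, 2] <- 1
A[2, 3] <- 1
res <- brokerage_counts(A, c(1, 1, 2))
res   # vertex 2 acts once as a representative
```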

                            Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i,j,k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from i to k, i should also be tied to k directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph G with respect to centrality measure c is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \max_{v \in V} c(v, G) - c(v_i, G) \right] \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing C(G) by its maximum across all graphs of the same order as G. This index is dimensionless and varies between 0 (for a graph in which all vertices have the same centrality scores; see footnote 2) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$; see footnote 3), although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

Footnote 2: For instance, when all vertices are automorphically equivalent.
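Equations (1) and (2) are easy to check numerically. The base-R sketch below (a hypothetical helper, independent of sna's centralization) computes C(G) from a vector of centrality scores and optionally divides by a supplied theoretical maximum. It uses undirected degree, for which the maximum deviation over graphs of order n is (n − 1)(n − 2), attained by the star; for other measures that constant would have to be derived separately.

```r
# Sketch of Freeman centralization: C(G) = sum_i [max_v c(v) - c(v_i)],
# optionally normalized by the theoretical maximum deviation max_dev.
freeman_centralization <- function(scores, max_dev = NULL) {
  raw <- sum(max(scores) - scores)     # equals n * (max - mean), as in eq. (2)
  if (is.null(max_dev)) raw else raw / max_dev
}

n <- 5
star <- matrix(0, n, n)
star[1, -1] <- 1
star <- star + t(star)                 # K_{1,4}: vertex 1 is the hub
complete <- matrix(1, n, n) - diag(n)  # K_5

# Undirected degree: maximum deviation (n - 1)(n - 2), attained on the star
max_dev <- (n - 1) * (n - 2)
freeman_centralization(rowSums(star), max_dev)      # 1
freeman_centralization(rowSums(complete), max_dev)  # 0
```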

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices, supplied respectively by connectedness, efficiency, hierarchy, and lubness, describe the extent to which the structure of an input graph approaches that of an outtree; hierarchy can also compute a hierarchy score based on simple reciprocity, as with grecip.
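As an illustration of one of these indices, Krackhardt's connectedness is usually defined as one minus the fraction of unordered vertex pairs that cannot reach each other once edge direction is ignored. The base-R sketch below implements that definition with a hypothetical helper; it is not sna's connectedness code and ignores missing data.

```r
# Sketch of Krackhardt connectedness: the fraction of unordered vertex pairs
# joined by some path in the underlying undirected graph.
krack_connectedness <- function(A) {
  n <- nrow(A)
  S <- (A + t(A)) > 0                   # symmetrize: ignore edge direction
  R <- S | diag(n) > 0                  # pairs reachable in at most one step
  for (step in seq_len(n)) {
    R <- (R %*% S) > 0 | R              # extend reachability by one step
  }
  pairs <- R[upper.tri(R)]
  sum(pairs) / length(pairs)
}

# Two disjoint edges on four vertices: only 2 of the 6 pairs are connected
A <- matrix(0, 4, 4)
A[1, 2] <- 1
A[4, 3] <- 1
krack_connectedness(A)   # 1/3
```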

The use of sna's GLI routines is straightforward: calling with a graph (or set thereof) generally results in a vector of GLI scores, as in the following example. Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices; hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

Footnote 3: $K_n$ is the complete graph on n vertices, with $K_{n,m}$ denoting the complete bipartite graph on n and m vertices, and $N_n$ the null (or empty) graph on n vertices.
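The dyad-level quantities behind the density, reciprocity, and transitivity indices are simple to compute by hand. The base-R sketch below is a hypothetical helper (not sna's gden, grecip, or gtrans, and ignoring missing data); it returns density, the dyad census, dyadic and edgewise reciprocity, and weak transitivity measured as the share of two-paths i → j → k (with i ≠ k) that are closed by an i → k edge. For the three-vertex example it should return density 2/3, dyadic reciprocity 1/3, edgewise reciprocity 1/2, and weak transitivity 1.

```r
# Minimal sketch of several GLIs for one 0/1 digraph (loops ignored).
gli_sketch <- function(A) {
  n <- nrow(A)
  diag(A) <- 0
  m <- sum(A)                                  # number of directed edges
  up <- upper.tri(A)                           # each unordered dyad once
  a_ij <- A[up]
  a_ji <- t(A)[up]
  mut  <- sum(a_ij == 1 & a_ji == 1)           # mutual dyads
  asym <- sum(a_ij != a_ji)                    # asymmetric dyads
  null <- sum(a_ij == 0 & a_ji == 0)           # null dyads
  two <- A %*% A                               # two[i, k]: two-paths i -> j -> k
  diag(two) <- 0                               # drop i == k
  c(density    = m / (n * (n - 1)),
    mut = mut, asym = asym, null = null,
    dyadic     = (mut + null) / (mut + asym + null),
    edgewise   = 2 * mut / m,                  # share of edges reciprocated
    weak_trans = sum(two * A) / sum(two))      # two-paths closed by i -> k
}

# 1 <-> 2, 1 -> 3, 2 -> 3
A <- matrix(0, 3, 3)
A[1, 2] <- 1; A[2, 1] <- 1; A[1, 3] <- 1; A[2, 3] <- 1
s <- gli_sketch(A)
s
```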

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


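R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)  # theoretical maximum deviation: an out-star on n vertices
+     return((n - 1) * choose(n - 1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)  # number of outgoing two-stars centered on each vertex
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173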

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order N, and let $d(i, j)$ be the geodesic distance from vertex i to vertex j in G. The "structure statistics" of G are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\big(d(j,k) \le i\big)$$

and I is the standard indicator function. Intuitively, $s_i$ is the expected fraction of G which lies within distance i of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
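The structure statistics can be computed directly from a geodesic distance matrix. The base-R sketch below uses hypothetical helpers (sna's geodist and structure.statistics are the package routines): it finds distances by breadth-first search, with Inf marking unreachable pairs, and then applies the formula for $s_i$. Since $d(j, j) = 0$, $s_0$ always equals 1/N.

```r
# Sketch: all-pairs geodesic distances by breadth-first search (Inf when
# unreachable), then s_i = N^{-2} * sum_j sum_k I(d(j, k) <= i).
geodesic_distances <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    while (length(frontier) > 0) {
      reached <- which(colSums(A[frontier, , drop = FALSE]) > 0)  # out-neighbours
      reached <- reached[is.infinite(D[s, reached])]              # new vertices only
      D[s, reached] <- D[s, frontier[1]] + 1                      # one layer further
      frontier <- reached
    }
  }
  D
}

structure_statistics <- function(A) {
  N <- nrow(A)
  D <- geodesic_distances(A)
  setNames(sapply(0:(N - 1), function(i) sum(D <= i) / N^2), 0:(N - 1))
}

# Directed path 1 -> 2 -> 3
A <- matrix(0, 3, 3)
A[1, 2] <- 1
A[2, 3] <- 1
structure_statistics(A)   # 3/9, 5/9, 6/9
```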

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of G, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic t for graph G is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where H is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of G is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $E\,s(H) = s(G), E\,s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter p equal to the density of G is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
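The Monte Carlo logic of a CUG test can be sketched in a few lines of base R. The hypothetical helper below is not the cugtest interface: it conditions on order and edge count only, drawing each H uniformly by shuffling G's edge indicators over the n(n − 1) off-diagonal cells, and estimates both one-tailed p-values as the fraction of draws at least (or at most) as extreme as t(G). The mutual-dyad statistic is only an example choice. As a sanity check, if t is the edge count itself, every draw equals t(G) and both p-values are 1.

```r
# Hypothetical helper: one-tailed CUG p-values, conditioning on order and
# edge count. Each H is uniform given G's edge count: G's edge indicators
# are randomly permuted across the n(n - 1) off-diagonal cells.
cug_edges <- function(A, stat, reps = 1000) {
  n <- nrow(A)
  off <- row(A) != col(A)                    # admissible (non-loop) cells
  obs <- stat(A)
  sims <- replicate(reps, {
    H <- matrix(0, n, n)
    H[off] <- sample(A[off])                 # same number of edges, random cells
    stat(H)
  })
  c(obs = obs,
    p_upper = mean(sims >= obs),             # estimates Pr(t(H) >= t(G))
    p_lower = mean(sims <= obs))             # estimates Pr(t(H) <= t(G))
}

set.seed(1)
A <- matrix(rbinom(100, 1, 0.2), 10, 10)
diag(A) <- 0
mutual_dyads <- function(X) sum(X * t(X)) / 2   # each mutual pair counted once
cug_edges(A, mutual_dyads, reps = 500)
```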

                            Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null 
 1.00 12.84 31.16 
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U 
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08 
 120C   210   300 
 0.30  0.00  0.00 
R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null 
 8.84  9.26 26.90 
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U 
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74 
 120C   210   300 
 1.34  2.28  0.60 
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null 
 8.94 20.44 15.62 
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U 
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60 
 120C   210   300 
 8.40  7.38  1.50 
R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1, , ], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43
R> component.dist(g3[1, , ])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])
   0    1    2    3    4    5    6    7    8    9 
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00 

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2, , ]
R> g4[2, , ] <- g2[1, , ]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values
        p(f(rnd) >= f(d))  0.299
        p(f(rnd) <= f(d))  0.708

Test Diagnostics
        Test Value (f(d))  0.04444444
        Replications       1000
        Distribution Summary
                Min    -0.3333333
                1stQ   -0.06666667
                Med     0
                Mean   -0.001288889
                3rdQ    0.06666667
                Max     0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values
        p(f(rnd) >= f(d))  0.967
        p(f(rnd) <= f(d))  0.039

Test Diagnostics
        Test Value (f(d))  0.04444444
        Replications       1000
        Distribution Summary
                Min    -0.06666667
                1stQ    0.1555556
                Med     0.2222222
                Mean    0.2215333
                3rdQ    0.2888889
                Max     0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph G, vertex v is said to be structurally equivalent to vertex v' iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when v and v' have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities or differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
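A base-R sketch of this two-step procedure follows. The hypothetical helper uses the Euclidean option (one of the measures listed above) on each vertex's combined row and column, skipping the cells that involve the two vertices being compared; that treatment of the pair's own tie and of loops is an assumption here, and sna's sedist offers its own options. The distances are then passed to R's hclust and cutree. In a star, the leaves are structurally equivalent and should end up in one class, apart from the hub.

```r
# Sketch: Euclidean structural-equivalence distance between vertices i and j,
# comparing out-ties (rows) and in-ties (columns) over all other vertices.
# Skipping the cells involving i and j themselves is an assumed convention.
se_distance <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      others <- setdiff(seq_len(n), c(i, j))
      D[i, j] <- sqrt(sum((A[i, others] - A[j, others])^2) +
                      sum((A[others, i] - A[others, j])^2))
    }
  }
  D
}

# Undirected star with hub 1: the three leaves are structurally equivalent
A <- matrix(0, 4, 4)
A[1, -1] <- 1
A <- A + t(A)
D <- se_distance(A)
clusters <- cutree(hclust(as.dist(D)), k = 2)
clusters   # hub in one class, leaves 2-4 in the other
```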


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are comprised entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
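For the density block type, the reduction step amounts to averaging the adjacency matrix within blocks. A minimal sketch with a hypothetical helper follows; it takes a membership vector only (sna's blockmodel also accepts clustering objects and supports other block types), and it neglects the diagonal, so a block made up solely of loop cells is undefined (NaN).

```r
# Sketch of a density blockmodel: block (r, s) is the mean tie value from
# members of class r to members of class s, with the diagonal ignored.
density_blockmodel <- function(A, classes) {
  diag(A) <- NA                              # neglect loops
  k <- sort(unique(classes))
  B <- matrix(NA_real_, length(k), length(k), dimnames = list(k, k))
  for (r in seq_along(k)) {
    for (s in seq_along(k)) {
      B[r, s] <- mean(A[classes == k[r], classes == k[s]], na.rm = TRUE)
    }
  }
  B
}

# Star with hub 1; classes: hub = 1, leaves = 2
A <- matrix(0, 4, 4)
A[1, -1] <- 1
A <- A + t(A)
B <- density_blockmodel(A, c(1, 2, 2, 2))
B   # hub-leaf blocks have density 1, the leaf-leaf block density 0
```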

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
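The expansion step can be sketched as follows. The hypothetical helper (not sna's blockmodel.expand) assigns each new vertex to a class according to a vector of class sizes, builds the Bernoulli parameter matrix from the block densities, and draws each edge independently. Treating undefined (NA) blocks as empty is an assumption of this sketch. With block densities of 0 and 1 the draw is deterministic, which makes the result easy to check.

```r
# Sketch of density-based blockmodel expansion: vertex i gets class c(i), and
# each ordered pair (i, j) receives an edge with probability B[c(i), c(j)].
expand_blockmodel <- function(B, sizes) {
  classes <- rep(seq_len(nrow(B)), sizes)
  P <- B[classes, classes, drop = FALSE]   # Bernoulli parameter matrix
  P[is.na(P)] <- 0                         # assumption: undefined blocks are empty
  G <- matrix(rbinom(length(P), 1, P), nrow(P), ncol(P))
  diag(G) <- 0                             # no loops
  G
}

# Two classes that only send ties to each other, with 2 and 3 vertices
B <- matrix(c(0, 1, 1, 0), 2, 2)
G <- expand_blockmodel(B, c(2, 3))
G
```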

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition G'' of graphs G and G' on vertex set V is the graph on V such that $(v, v') \in E(G'')$ iff there exists a vertex v'' such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
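Since composition is a boolean matrix product, a one-line base-R sketch suffices; the helper name compose is hypothetical and is not sna's composition operator. The example also illustrates the caveat about loops: composing 1 → 2 with 2 → 1 yields a loop at vertex 1, although neither input graph has one.

```r
# Sketch of graph composition: (v, v') is an edge of the composite iff some
# v'' has (v, v'') in G and (v'', v') in G', i.e. a boolean matrix product.
compose <- function(A, B) (A %*% B > 0) * 1

A <- matrix(0, 2, 2)
A[1, 2] <- 1          # G:  1 -> 2
B <- t(A)             # G': 2 -> 1
C <- compose(A, B)
C                     # a loop at vertex 1 and nothing else
```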

                            Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

                            One important alternative to graph comparison using structural indices or subgraph statisticsis direct comparison of edge sets Within this general paradigm (see Hubert (1987) Krack-hardt (1987a 1988) Banks and Carley (1994) Butts and Carley (2005) Butts (2007) forexamples) comparison is based on establishing a matching between the edges of one graphand the edges of another leading to a measure of correspondence between the two In thesimplest case of multiple graphs on the same vertex set the matching in question may be be-tween those edges having the same (ordered) endpoints One natural correspondence measureis then the Hamming distance ie the number of edge changes needed to take one graph intothe other Another useful measure is Hubertrsquos Γ or the uncentered product-moment betweenthe two sets of edge variables For appropriate transformations of the original data Γ canbe interpreted as the correlation or covariance between the edge variable sets when entireadjacency matrices are compared in this way the result is known as the graph correlation orgraph covariance (respectively) For a directed graph pair GH for instance the latter isgiven by

$$\operatorname{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V|-1)} \tag{3}$$

where $A^G$ and $A^H$ are the respective adjacency matrices of G and H, and $\mu_X = \left(|V|\,(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The sums run over all ordered pairs (i, j) with i ≠ j. The graph variance is then cov(G, G), and the graph correlation is $\rho(G,H) = \operatorname{cov}(G,H) / \sqrt{\operatorname{cov}(G,G)\,\operatorname{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained using gcor and gcov, respectively. Hamming distances for graph sets can be obtained in the same way using hdist.

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings. This results in a decomposition of the total distance/covariance into two parts: that which is attributable to fixed aspects of the structure (the structural component), and that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching with respect to arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
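For graphs small enough that all n! relabelings can be enumerated, the structural correlation can be sketched by brute force. The idea is to maximize the graph correlation over every permutation of H's vertices, which is what an exhaustive search does in principle. This sketch assumes that all vertex relabelings are permissible (fully unlabeled graphs) and makes no attempt at sna's heuristic search. The helpers perms and structural_cor are hypothetical. The example parallels the g2/g3 case in the example below, where a relabeled copy has a low labeled correlation but a structural correlation of 1.

```r
offdiag <- function(A) A[row(A) != col(A)]

# All permutations of 1:n as rows of a matrix (n! rows, so keep n small)
perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  p <- perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(i) cbind(i, ifelse(p >= i, p + 1L, p))))
}

# Structural correlation by exhaustive search: the largest graph correlation
# between G and any relabeling H[o, o] of H
structural_cor <- function(G, H) {
  P <- perms(nrow(H))
  best <- -Inf
  for (k in seq_len(nrow(P))) {
    o <- P[k, ]
    r <- cor(offdiag(G), offdiag(H[o, o]))  # equals the graph correlation
    if (!is.na(r) && r > best) best <- r
  }
  best
}

G <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 0, 0), nrow = 3, byrow = TRUE)
H <- G[c(3, 1, 2), c(3, 1, 2)]    # an isomorphic, relabeled copy of G

cor(offdiag(G), offdiag(H))       # -1/3: the labeled graph correlation
structural_cor(G, H)              # 1: some relabeling recovers G exactly
```

Since n! grows very quickly, this approach only works for a handful of vertices. That cost is why heuristic search is the practical default.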

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes the following:

- centralgraph returns the graph minimizing the summed Hamming distance to all graphs in the input set.
- gclust.boxstats shows distributions of graph statistics based on a hierarchical clustering of networks.
- gclust.centralgraph returns the central graphs for each element of a network clustering solution.
- gdist.plotdiff plots distances between networks against differences in their properties.
- gdist.plotstats displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure.

Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by applying eigen to a graph covariance or correlation matrix. The ability to use standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
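The sketch below combines several of these pieces using only base R. It works on a hypothetical set of noisy copies of one base graph and computes three things:

1. The central graph. For binary graphs, an elementwise majority vote minimizes the summed Hamming distance.
2. A classical MDS solution via cmdscale, plus a hierarchical clustering via hclust.
3. A network principal component analysis via eigen on the graph correlation matrix.

This is an illustration under these assumptions, not a replacement for centralgraph or the gdist and gclust routines.

```r
set.seed(42)
n <- 6   # vertices
m <- 8   # graphs in the (hypothetical) input set
offdiag <- function(A) A[row(A) != col(A)]

# Hypothetical data: m noisy copies of one base digraph, each with about
# 10% of its edge variables flipped
base <- matrix(rbinom(n * n, 1, 0.4), n, n)
diag(base) <- 0
gs <- array(0, dim = c(m, n, n))
for (k in seq_len(m)) {
  flip <- matrix(rbinom(n * n, 1, 0.1), n, n)
  diag(flip) <- 0
  gs[k, , ] <- abs(base - flip)   # XOR of two 0/1 matrices
}

# Central graph: for binary data the elementwise majority minimizes the
# summed Hamming distance to the input graphs
central <- (apply(gs, c(2, 3), mean) > 0.5) * 1

# m x n(n-1) matrix of vectorized edge variables, one row per graph
vecs <- t(apply(gs, 1, offdiag))

# Hamming distances; for 0/1 vectors their square roots are Euclidean
# distances, which classical MDS (cmdscale) handles exactly
hd <- as.matrix(dist(vecs, method = "manhattan"))
coords <- cmdscale(sqrt(hd), k = 2)
clusters <- hclust(as.dist(hd))

# Network PCA: eigendecomposition of the m x m graph correlation matrix
gc <- cor(t(vecs))
pca <- eigen(gc, symmetric = TRUE)
pca$values / sum(pca$values)   # share of variance per component
```

The square root is taken before cmdscale because Hamming distance on 0/1 vectors equals squared Euclidean distance. With that transformation, the MDS coordinates are an exact Euclidean embedding.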

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses. These test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available for cases where conditioning on the entire observed structure is inappropriate.
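To make the QAP logic concrete, the following base-R sketch fits OLS to the vectorized off-diagonal edge variables, as gvectorize would supply them. It then builds a null distribution by permuting the rows and columns of the dependent graph together. This is the simplest Y-permutation form of QAP. The function qap_lm and the simulated data are hypothetical, and netlm's own null hypothesis options and defaults may differ.

```r
offdiag <- function(A) A[row(A) != col(A)]

# OLS on vectorized edge variables plus a Y-permutation QAP test: rows and
# columns of the dependent graph are permuted together (vertex relabeling)
qap_lm <- function(y, xs, reps = 1000) {
  X <- cbind(1, sapply(xs, offdiag))   # intercept + one column per predictor graph
  obs <- lm.fit(X, offdiag(y))$coefficients
  n <- nrow(y)
  null <- replicate(reps, {
    o <- sample.int(n)
    lm.fit(X, offdiag(y[o, o]))$coefficients
  })
  # Two-sided p-value: share of permuted |b| at least as large as the observed |b|
  list(coef = obs, p = rowMeans(abs(null) >= abs(obs)))
}

set.seed(7)
n <- 15
rand_graph <- function(p) {
  A <- matrix(rbinom(n * n, 1, p), n, n)
  diag(A) <- 0
  A
}
x1 <- rand_graph(0.5)
x2 <- rand_graph(0.5)
x3 <- rand_graph(0.5)   # irrelevant predictor
y <- x1 + 2 * x2 + matrix(rnorm(n * n, sd = 0.5), n, n)

res <- qap_lm(y, list(x1, x2, x3), reps = 500)
round(res$coef, 2)   # intercept, x1, x2, x3
res$p
```

Permuting whole vertices, rather than individual cells, preserves the dependence structure within each graph. That property is what distinguishes QAP from a naive cell-level permutation test.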


                            Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled (i.e., structural) correlation between g2 and g3 here is 1, since the graphs are isomorphic. However, the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size. See the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004
Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

    Total Fraction Correct: 0.8973684
    Fraction Predicted 1s Correct: 0.8973684
    Fraction Predicted 0s Correct: NaN
    False Negative Rate: 0
    False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

In this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties: it predicts no 0s at all. This is largely a consequence of the high density of the dependent graph (approximately 0.90). It is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, the model's parameter estimates are quite close to the true values (0, 1, 4, 2 and 0 for the intercept and x1 through x4). The QAP test also correctly identifies the irrelevant predictors (the intercept and x4).

2.7 Network inference and process models

A final category of functions supplied by sna consists of those implementing various network inference and process models. The package still contains a legacy function (pstar) for fitting simple exponential random graph models via maximum pseudo-likelihood methods. However, users are strongly encouraged to employ the more modern tools of the ergm package for this purpose. There are several other models for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003).

Several classical methods of this type are implemented by the consensus function, which returns an estimate of an unknown graph from a series of observed graphs. Supported methods include data-analytic tools, such as the locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators. They also include model-based approaches, such as the consensus model of Batchelder and Romney (1988). The latter assumes that each data source has a base chance to "know" and correctly generate the true value of an edge on which it reports. Otherwise, the source produces a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the bias parameters may be omitted if desired), and estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). In this case the edge reporting process is parameterized in terms of false positive and false negative error rates. These rates may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian. Error rate priors (where applicable) are specified as beta distributions, and graph priors are specified in inhomogeneous Bernoulli form. The likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model whenever the sum of the false positive and false negative rates is less than 1. The two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception).

bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992), in the simplified form of Gelman et al. (1995), can be applied via potscalered.mcmc to assess convergence. bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates. The helper function npostpred can be employed to obtain posterior predictive graph properties from a set of posterior draws.
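As an illustration of the locally aggregated structure idea, the sketch below assumes the informant-report layout used in the bbnam example below, where dat[k, i, j] is informant k's report of a tie from i to j. Under the intersection rule, the tie from i to j is kept only if both i and j report it. Under the union rule, it is kept if either does. The function las is hypothetical and is not consensus's implementation. The simulation also simplifies the later example by using common error rates (false negative 0.375, false positive 0.038) instead of per-informant beta draws.

```r
# dat[k, i, j] = 1 if informant k reports a tie from i to j
las <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  est <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      ego <- dat[i, i, j]     # i's report on its own tie to j
      alter <- dat[j, i, j]   # j's report on the same tie
      est[i, j] <- if (rule == "intersection") min(ego, alter) else max(ego, alter)
    }
  }
  est
}

# Hypothetical simulation with common error rates for all informants
set.seed(3)
n <- 20
g <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(g) <- 0
p_report <- g * (1 - 0.375) + (1 - g) * 0.038
dat <- array(0, dim = c(n, n, n))
for (k in seq_len(n)) dat[k, , ] <- matrix(rbinom(n * n, 1, p_report), n, n)

mean(las(dat, "intersection") == g)   # accuracy of the intersection rule
mean(las(dat, "union") == g)          # accuracy of the union rule
```

With a high false negative rate, requiring two independent reports (intersection) discards many true ties. This is the pattern discussed after the consensus example below.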

sna also supports the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members in turn may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex i nominates vertex j when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where T is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the (i, j) directed dyad). Bias events are taken to be independent Bernoulli trials given T, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events, which in turn influence the structure of the network. The joint graph distribution under such a model is not in general known, so estimation of the model parameters (bias event probabilities) is currently heuristic. bn implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004). It also implements a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
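The conditional nomination probability above is straightforward to evaluate once the bias event probabilities and the counts s_k are known. In this sketch, the function name and the two bias events with their probabilities and counts are hypothetical (for instance, one reciprocity-type and one closure-type bias). When no bias events have had a chance to occur, the probability reduces to the baseline Pr(B_e).

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^{s_k(i, j, T)}
biased_net_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Hypothetical values: baseline 0.1; two bias event types with probabilities
# 0.3 and 0.2, which have had 1 and 2 opportunities to occur for this dyad
biased_net_prob(0.1, c(0.3, 0.2), c(1, 2))   # 1 - 0.9 * 0.7 * 0.8^2 = 0.5968
biased_net_prob(0.1, c(0.3, 0.2), c(0, 0))   # no bias events: 0.1
```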

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models constitute one important family of processes often used for this purpose. See Doreian (1990) for these models, and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models. These models are of the form

$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, and $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays. The free parameters are $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$ and $\psi \in \mathbb{R}^z$, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. Z and ψ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, W and θ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the two effect types differ in that the AR term includes the impact of neighbors' covariate scores. An AR term implies that each individual's response depends on that of their neighbors, including all covariate, disturbance, and higher-order neighborhood effects. An MA term, by contrast, implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly.

Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computing sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders. It can also compute autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation, in analogy to similar heuristics within the time series literature (see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence). See Leenders (2002) for a useful discussion.
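As a concrete illustration of the autocorrelation indices mentioned above, the following sketch computes Moran's I and Geary's C in their standard textbook forms for a response vector y and a weight matrix W. nacf's own options and normalizations may differ, and the helper names are hypothetical. The path-graph example is also hypothetical. It is chosen so that the values can be checked by hand: for a linear trend along a 5-vertex path, I = 0.5 and C = 0.2.

```r
# Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = y - mean(y)
morans_i <- function(y, W) {
  z <- y - mean(y)
  (length(y) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Geary's C: (n - 1) * sum_ij w_ij (y_i - y_j)^2 / (2 * S0 * sum_i z_i^2)
gearys_c <- function(y, W) {
  z <- y - mean(y)
  (length(y) - 1) * sum(W * outer(y, y, "-")^2) / (2 * sum(W) * sum(z^2))
}

# Hypothetical example: an undirected path 1 - 2 - 3 - 4 - 5 with a linear trend
W <- matrix(0, 5, 5)
W[cbind(1:4, 2:5)] <- 1
W <- W + t(W)
y <- 1:5
morans_i(y, W)   # 0.5: neighbors have similar values (positive autocorrelation)
gearys_c(y, W)   # 0.2: values below 1 also indicate positive autocorrelation
```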


                            Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set. In it, we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038. Their false negative rates (em) are likewise beta distributed, with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors: an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary method for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:
             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):
Probability of False Negatives (e^-):
       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):
          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20 Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates. It also recovers the network itself, which is actually somewhat easier to estimate in many cases. In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Two signs call for caution: multiple actors whose error rates satisfy this condition with high posterior probability, and strongly multimodal posterior graph distributions. Either can indicate excessively "perverse" priors or extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data themselves.
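Whether a prior puts much mass on the perverse region can be checked numerically. Assume independent beta priors on e^- and e^+ with common shape parameters, as in the example above. The prior probability that em + ep > 1 is then a one-dimensional integral, which the sketch below evaluates with integrate. The helper name perverse_mass is hypothetical. For the beta(2, 11) priors used above the mass is tiny, whereas a diffuse beta(2, 2) prior puts exactly half its mass in that region by symmetry.

```r
# Prior probability of the "perverse" region em + ep > 1 when em and ep have
# independent beta(a, b) priors: integral of f(x) * Pr(ep > 1 - x) over x
perverse_mass <- function(a, b) {
  integrate(function(x) dbeta(x, a, b) * pbeta(1 - x, a, b, lower.tail = FALSE),
            lower = 0, upper = 1)$value
}

perverse_mass(2, 11)   # the beta(2, 11) priors from the example: very small
perverse_mass(2, 2)    # a diffuse beta(2, 2) prior: 0.5 by symmetry
```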

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of them, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice because it exacerbates the effects of false negatives. The central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either the false positive or the false negative rate approaches or exceeds 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
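A back-of-the-envelope calculation shows why majority-type estimators degrade once an error rate passes 0.5 while likelihood-based ones need not. For binary reports, the Hamming-central graph amounts to an elementwise majority vote. Under the simplifying, hypothetical assumption of 20 informants with common, known error rates, a true tie survives a strict majority vote with probability P(Bin(20, 1 - em) > 10). A likelihood-ratio rule with even prior odds only needs enough reports to make the log-likelihood ratio positive. That remains possible whenever em + ep < 1, because the ratio then increases with each additional report. The functions below are illustrative only; they do not reproduce how bbnam or consensus compute their estimates.

```r
N <- 20   # informants (hypothetical, all with the same known error rates)

# Probability that a true tie is recovered by a strict majority of reports
majority_recall <- function(em) pbinom(N / 2, N, 1 - em, lower.tail = FALSE)

# Likelihood-ratio rule with even prior odds: the fewest "tie present" reports
# (out of N) for which a tie is more likely present than absent
lr_threshold <- function(em, ep) {
  k <- 0:N
  llr <- k * log((1 - em) / ep) + (N - k) * log(em / (1 - ep))
  min(k[llr > 0])
}

# Probability that a true tie receives at least the threshold number of reports
lr_recall <- function(em, ep) {
  pbinom(lr_threshold(em, ep) - 1, N, 1 - em, lower.tail = FALSE)
}

# False negative rate 0.6 (> 0.5), false positive rate 0.05, so em + ep < 1
majority_recall(0.6)       # about 0.13
lr_threshold(0.6, 0.05)    # 4 reports suffice
lr_recall(0.6, 0.05)       # about 0.98
```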

As a final example of sna's model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components and estimates both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are shown as green edges. Pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean are shown as red edges. Sample output for the above example is provided in Figure 6.
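One way to think about net influence in the AR part of the model is as follows. With a single weight matrix W and AR parameter ρ (θ in Equation (4)), solving for y gives y = (I - ρW)^{-1}(Xβ + ε). Entry (j, i) of the inverse is therefore the total effect on j's response of a unit change in i's own term. The sketch below computes that matrix for a hypothetical random W and ρ. It then flags pairs at least two standard deviations above or below the mean, mirroring the plot's green and red edges. This is an assumption about what the plot summarizes, not a description of sna's internal computation.

```r
set.seed(11)
n <- 10
W <- matrix(rbinom(n * n, 1, 0.2), n, n)   # hypothetical AR weight matrix
diag(W) <- 0
rho <- 0.2                                  # hypothetical AR parameter

# y = (I - rho W)^{-1} (X beta + e), so M[j, i] is the total effect on y_j of a
# unit change in i's own term (covariates or disturbance)
M <- solve(diag(n) - rho * W)

infl <- t(M)          # infl[i, j]: net influence of i on j
diag(infl) <- NA      # compare off-diagonal pairs only
mu <- mean(infl, na.rm = TRUE)
s <- sd(as.vector(infl), na.rm = TRUE)

# Pairs at least two standard deviations above or below the mean net influence
high <- which(infl >= mu + 2 * s, arr.ind = TRUE)   # "green" pairs
low  <- which(infl <= mu - 2 * s, arr.ind = TRUE)   # "red" pairs
high
```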

                            3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines that covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, these tools are available within a freely available, widely used statistical computing platform. It is hoped that this will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                            Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: "Fitted vs. Observed Values"; "Fitted Values vs. Estimated Disturbances"; "Normal Q-Q Residual Plot" (theoretical vs. sample quantiles); "Net Influence Plot".]

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks. II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software: Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                            Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.

• Introduction and overview
  • Package history
  • sna and statnet
  • Functionality
  • Terminology and data representation
    • Importing relational data into R
• Package highlights
  • Random graph generation
    • Example
  • Visualization and data manipulation
    • Neighborhood and ego net functions
    • Visualization
  • Descriptive indices
    • Node-level indices
    • Graph-level indices
  • Connectivity and subgraph statistics
    • Example
  • Position and role analysis
    • Example
  • Exploratory edge set comparison
    • Example
  • Network inference and process models
    • Example
• Closing comments

Figure 1: Sample partial neighborhoods of increasing order (panels: Partial Neighborhood of Order 1 through Order 9); vertex v is adjacent to vertex v′ in the ith panel iff v′ belongs to the ith-order partial neighborhood of v.

and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.
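Setting coordinates manually is straightforward, since gplot accepts user-specified vertex coordinates as a two-column matrix through its coord argument. The sketch below computes a circular layout (vertices evenly spaced on the unit circle) in base R. The helper name `circle_coords` is hypothetical, and this illustrates the idea rather than reproducing the code behind gplot's built-in circular layout.

```r
# Hypothetical helper: place n vertices evenly around the unit circle.
circle_coords <- function(n) {
  theta <- 2 * pi * (seq_len(n) - 1) / n   # equally spaced angles, starting at 0
  cbind(x = cos(theta), y = sin(theta))    # n x 2 matrix of (x, y) positions
}

xy <- circle_coords(5)
# With sna loaded and a 5-vertex graph g, these could be supplied as
# gplot(g, coord = xy).
```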

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to those of gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order (panels: Cumulative Neighborhood of Order 1 through Order 9); vertex v is adjacent to vertex v′ in the ith panel iff v′ belongs to the ith-order cumulative neighborhood of v.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> library("sna")
R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurve = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1.5, vertex.sides = 3:8,
+    vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+    displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options (panels: Default; Curved Edges; MDS Layout; Circular Layout; Sociomatrix; Multiple Options).

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.
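The rewiring process visualized above can be sketched in a simplified, one-dimensional form in base R. This is a generic Watts-Strogatz-style sketch under stated assumptions, not the implementation used by rgws (which also handles multidimensional lattices): a ring of n vertices, each tied to its z nearest neighbors on each side (n > 2z is assumed), with each lattice edge rewired with probability p to a uniformly chosen non-neighbor. The name `ws_ring` is hypothetical.

```r
# Simplified 1-D Watts-Strogatz-style process (assumes n > 2 * z).
ws_ring <- function(n, z, p) {
  A <- matrix(0L, n, n)
  # Lattice edges: each vertex i is tied to the next z vertices around the ring.
  from <- rep(seq_len(n), each = z)
  to <- (from - 1 + rep(seq_len(z), times = n)) %% n + 1
  A[cbind(from, to)] <- 1L
  A[cbind(to, from)] <- 1L
  for (e in seq_along(from)) {
    if (runif(1) >= p) next                # keep this edge with probability 1 - p
    i <- from[e]
    j <- to[e]
    candidates <- which(A[i, ] == 0L & seq_len(n) != i)  # current non-neighbors of i
    if (length(candidates) == 0) next
    w <- candidates[sample.int(length(candidates), 1)]
    A[i, j] <- A[j, i] <- 0L               # drop the lattice edge...
    A[i, w] <- A[w, i] <- 1L               # ...and reattach it to a random non-neighbor
  }
  A
}

set.seed(1)
lattice <- ws_ring(20, 2, 0)    # p = 0: the unrewired ring lattice
rewired <- ws_ring(20, 2, 0.3)  # some edges replaced by random shortcuts
```

Because each rewiring step removes one edge and adds one new edge, the number of edges stays at n × z for every p; only the pattern of ties changes.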

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+    xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10)/10 * 2 * pi), sin((1:10)/10 * 2 * pi),
+    col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+    xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+    offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
+    arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25),
+    c(1.25, 1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output or to create alternative display mechanisms. They may also be used within non-network contexts as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example; gplot.arrow Example; gplot.loop Example).

2.3 Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.
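The two signatures can be made concrete with one toy function of each type. The names `nli_outdegree` and `gli_density` are hypothetical illustrations, not sna functions; density is used here only as a familiar example of a graph-level index.

```r
# Node-level index, f(v, G) -> R: the outdegree of vertex v (any self-loop ignored).
nli_outdegree <- function(v, A) sum(A[v, -v])

# Graph-level index, f(G) -> R: the density of a loopless digraph.
gli_density <- function(A) {
  n <- nrow(A)
  sum(A[row(A) != col(A)]) / (n * (n - 1))
}

A <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0), nrow = 3, byrow = TRUE)
sapply(1:3, nli_outdegree, A = A)  # one value per vertex
gli_density(A)                     # one value for the whole graph
```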

                              Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph-theoretic concept, is given by $c_d(v, G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d^+}(v, G) \equiv |N^+(v)|$), indegree ($c_{d^-}(v, G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{d^t}(v, G) \equiv c_{d^+}(v, G) + c_{d^-}(v, G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v, G) \equiv \sum_{(v', v'') \subset V \setminus v} \frac{g'(v', v, v'', G)}{g(v', v'', G)},$$

where $g(v, v', G)$ is the number of $(v, v')$ geodesics in $G$, $g'(v, v', v'', G)$ is the number of $(v, v'')$ geodesics in $G$ containing $v'$, and $\frac{g'(v', v, v'', G)}{g(v', v'', G)}$ is taken equal to 0 where $g(v', v'', G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v, G) \equiv \frac{n - 1}{\sum_{v' \in V} d(v, v')},$$

where $n = |V|$ and $d(v, v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v, G) = 0$ for any $v$ lacking a path to at least one other vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
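The definitions above can be checked with a small base-R sketch that needs no packages. It assumes a 0/1 adjacency matrix `A` with no self-loops (`A[i, j] = 1` for an edge from i to j, symmetric for undirected graphs); the helper names `geodesics` and `freeman_indices` are hypothetical, not sna functions. Betweenness and stress are summed over ordered pairs, following the formula as written, so on undirected graphs the values are twice those obtained from unordered pairs; sna's own routines may apply different conventions or rescaling. Unreachable pairs get infinite distance, so closeness drops to 0 exactly as described.

```r
# Breadth-first search from every vertex, recording geodesic distances (dist)
# and the number of distinct geodesics (sigma) for each ordered pair.
geodesics <- function(A) {
  n <- nrow(A)
  dist <- matrix(Inf, n, n)
  sigma <- matrix(0, n, n)
  for (s in seq_len(n)) {
    dist[s, s] <- 0
    sigma[s, s] <- 1
    queue <- s
    while (length(queue) > 0) {
      u <- queue[1]
      queue <- queue[-1]
      for (w in which(A[u, ] != 0)) {
        if (is.infinite(dist[s, w])) {       # first visit to w
          dist[s, w] <- dist[s, u] + 1
          queue <- c(queue, w)
        }
        if (dist[s, w] == dist[s, u] + 1)    # u precedes w on an s-w geodesic
          sigma[s, w] <- sigma[s, w] + sigma[s, u]
      }
    }
  }
  list(dist = dist, sigma = sigma)
}

# Degree variants, betweenness, stress, and closeness from the geodesic counts.
freeman_indices <- function(A) {
  n <- nrow(A)
  g <- geodesics(A)
  btw <- stress <- numeric(n)
  for (v in seq_len(n)) for (s in seq_len(n)) for (t in seq_len(n)) {
    if (s == t || s == v || t == v || is.infinite(g$dist[s, t])) next
    if (g$dist[s, v] + g$dist[v, t] == g$dist[s, t]) {
      through <- g$sigma[s, v] * g$sigma[v, t]  # s-t geodesics passing through v
      btw[v] <- btw[v] + through / g$sigma[s, t]
      stress[v] <- stress[v] + through          # stress: denominator set to 1
    }
  }
  indeg <- colSums(A != 0)
  outdeg <- rowSums(A != 0)
  list(indegree = indeg, outdegree = outdeg, freeman = indeg + outdeg,
       betweenness = btw, stress = stress,
       closeness = (n - 1) / rowSums(g$dist))  # 0 if any vertex is unreachable
}

# Usage: a directed path 1 -> 2 -> 3
P <- matrix(0, 3, 3)
P[1, 2] <- 1
P[2, 3] <- 1
freeman_indices(P)$betweenness  # vertex 2 lies on the only 1 -> 3 geodesic
```

On the path, only vertex 1 reaches every other vertex, so vertices 2 and 3 receive closeness 0, illustrating the fragility noted above.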

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
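For an undirected graph, both spectral measures can be computed directly with base R's eigen and solve. This is a minimal sketch, not the code behind evcent or bonpow: the function names are hypothetical, `A` is assumed symmetric, and $\alpha$ is chosen so that the squared scores sum to $|V|$, which is our reading of the scaling convention described above.

```r
# Eigenvector centrality: |principal eigenvector| of a symmetric adjacency matrix.
eigen_centrality <- function(A) {
  e <- eigen(A, symmetric = TRUE)   # eigenvalues are returned in decreasing order
  abs(e$vectors[, 1])               # unit Euclidean length
}

# Bonacich power: alpha * (I - beta A)^{-1} A 1, rescaled so sum(scores^2) = n.
bonacich_power <- function(A, beta) {
  n <- nrow(A)
  raw <- solve(diag(n) - beta * A, A %*% rep(1, n))  # errors if (I - beta A) is singular
  raw <- as.vector(raw)
  raw * sqrt(n / sum(raw^2))
}

# Usage: a star with center 1 and leaves 2-4 (principal eigenvalue sqrt(3))
S <- matrix(0, 4, 4)
S[1, 2:4] <- 1
S[2:4, 1] <- 1
eigen_centrality(S)
bonacich_power(S, 0)      # proportional to degree: center/leaf ratio of 3
bonacich_power(S, 0.57)   # beta near 1/sqrt(3): ratio near the eigenvector's sqrt(3)
bonacich_power(S, -0.5)   # negative beta: leaves tied to a powerful center score below 0
```

The three calls trace the limiting behavior stated above: $\beta = 0$ recovers degree, and $\beta$ just below the reciprocal of the principal eigenvalue recovers the eigenvector pattern.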

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now let us posit a vector of states $s$ with $|V|$ elements, such that $s_i$ is the state of $v_i \in V$. (“State” in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
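To make the role definitions concrete, here is a minimal base-R sketch that counts only the raw role scores; it does not compute the Gould-Fernandez moments or z-tests that brokerage also reports. `brokerage_counts()` is a hypothetical helper, assuming a 0/1 adjacency matrix with `A[i, j] = 1` for $i \to j$ and a membership vector `s`; the role labels follow the w_I/w_O/b_IO/b_OI/b_O naming used in sna's output.

```r
# Hypothetical helper (base R, not sna's brokerage()): raw Gould-Fernandez
# brokerage role counts per vertex. A[i, j] = 1 encodes an edge i -> j;
# s is a vector of group memberships (vertex states).
brokerage_counts <- function(A, s) {
  n <- nrow(A)
  roles <- c("w_I", "w_O", "b_IO", "b_OI", "b_O")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {                   # j is the candidate broker
    for (i in which(A[, j] == 1)) {         # i -> j
      for (k in which(A[j, ] == 1)) {       # j -> k
        # only open two-paths i -> j -> k (no direct i -> k tie) count
        if (i == j || k == j || i == k || A[i, k] == 1) next
        role <- if (s[i] == s[j] && s[j] == s[k]) {
          "w_I"   # coordinator: all three in one group
        } else if (s[i] == s[k]) {
          "w_O"   # itinerant: i and k share a group that j is not in
        } else if (s[j] == s[k]) {
          "b_IO"  # gatekeeper: j and k share a group, i is outside
        } else if (s[i] == s[j]) {
          "b_OI"  # representative: i and j share a group, k is outside
        } else {
          "b_O"   # liaison: three different groups
        }
        out[j, role] <- out[j, role] + 1
      }
    }
  }
  cbind(out, t = rowSums(out))  # t = total brokerage
}

# Usage: a single open two-path 1 -> 2 -> 3
A <- matrix(0, 3, 3)
A[1, 2] <- 1
A[2, 3] <- 1
brokerage_counts(A, s = c(1, 2, 2))  # vertex 2 acts as a gatekeeper (b_IO)
```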

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary’s graph centrality, eigenvector centrality, and information centrality on the same network.

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
[1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
[1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
[1] 10 7 13 4 9 9 8 8 8 12
R> closeness(dat)
[1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
[8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
[1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
[7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
[1] 21 6 27 1 14 15 6 7 7 21
R> graphcent(dat)
[1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
[8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
[1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
[8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
[1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
[9] 3.077481 3.704181

As the above illustrates, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score’s behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego’s dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow’s most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector centrality and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
[1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
[8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
[1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
[7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
    w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
          b_O          t
[1] -1.186381  0.8682544
[2] -1.186381 -1.6099084
[3] -1.186381 -0.3708270
[4] -1.186381 -0.7838541

Group ID: 2
    w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
             t
[1] -0.7838541
[2]  1.4877951

Group ID: 3
    w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
          b_O           t
[1] 3.0624213  2.31384939
[2] 0.6345344  0.45522729
[3] 0.6345344  0.04220016
[4] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles and, likewise, certain brokerage roles can only be realized when a sufficient number of groups are present. The z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

                              Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the “edgewise” reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i, j, k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that “a friend of a friend is a friend”: where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of “third party support”: direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right] \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores, as when all vertices are automorphically equivalent) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which uses them to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna, and the mechanism may also be employed with user-supplied functions, provided that they support the required arguments. Examples are supplied in the sna manual.
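Equations (1) and (2) can be checked numerically with a small base-R sketch. `freeman_centralization()` is a hypothetical helper that takes a vector of centrality scores and an optional theoretical maximum deviation. The maximum of $(n-1)^2$ used for outdegree is derived here for loopless digraphs rather than taken from sna: since $\sum_i (c^* - c_i) = n c^* - \sum_i c_i \le (n-1) c^* \le (n-1)^2$, and the out-star attains it.

```r
# Sketch of Freeman centralization (equations (1) and (2)) in base R.
# freeman_centralization() is a hypothetical helper, not sna's centralization().
freeman_centralization <- function(scores, tmaxdev = NULL) {
  raw <- sum(max(scores) - scores)               # equation (1)
  if (is.null(tmaxdev)) raw else raw / tmaxdev   # optional normalization
}

n <- 5
star <- matrix(0, n, n)
star[1, -1] <- 1        # out-star: vertex 1 sends a tie to every other vertex
od <- rowSums(star)     # outdegrees (A[i, j] = 1 encodes i -> j)

raw <- freeman_centralization(od)
print(raw)
# Equation (2): |V| times (maximum minus mean) gives the same value
print(all.equal(raw, n * (max(od) - mean(od))))
# Normalizing by the largest possible outdegree deviation, (n - 1)^2
print(freeman_centralization(od, tmaxdev = (n - 1)^2))
```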

In addition to the above, sna includes functions for GLIs such as Krackhardt’s (1994) measures of informal organization. These indices, supplied respectively by connectedness, efficiency, hierarchy, and lubness, describe the extent to which the structure of an input graph approaches that of an outtree; hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna’s GLI routines is straightforward: calling them with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and “census” variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
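For readers who want to see exactly what the default dyadic reciprocity, edgewise reciprocity, and weak transitivity ratio count, the following base-R sketch computes them for a single loopless digraph. `gli_sketch()` is hypothetical: it ignores missing data, and it simply returns NaN when no two-paths are at risk, so it may not match gden, grecip, or gtrans in such degenerate cases.

```r
# Base-R sketch of density, dyadic and edgewise reciprocity, and the weak
# transitivity ratio for one loopless digraph (A[i, j] = 1 encodes i -> j).
gli_sketch <- function(A) {
  diag(A) <- 0
  n <- nrow(A)
  edges <- sum(A)
  mut <- sum(A * t(A)) / 2           # mutual dyads
  asym <- edges - 2 * mut            # asymmetric dyads
  dyads <- n * (n - 1) / 2
  nulls <- dyads - mut - asym        # null dyads
  A2 <- A %*% A                      # A2[i, k] = number of two-paths i -> j -> k
  diag(A2) <- 0                      # drop i -> j -> i
  c(density   = edges / (n * (n - 1)),
    dyadic    = (mut + nulls) / dyads,  # fraction of symmetric dyads
    edgewise  = 2 * mut / edges,        # fraction of reciprocated edges
    weaktrans = sum(A2 * A) / sum(A2))  # transitive two-paths / two-paths at risk
}

# Usage: a directed 3-cycle is intransitive; replacing 3 -> 1 by the
# shortcut 1 -> 3 yields a transitive triad
cyc <- matrix(0, 3, 3); cyc[1, 2] <- 1; cyc[2, 3] <- 1; cyc[3, 1] <- 1
tri <- matrix(0, 3, 3); tri[1, 2] <- 1; tri[2, 3] <- 1; tri[1, 3] <- 1
rbind(cycle = gli_sketch(cyc), transitive = gli_sketch(tri))
```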

centralization’s usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R’s apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


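R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)   # theoretical maximum deviation, attained by the out-star
+     return((n-1) * choose(n-1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)   # number of outgoing two-stars centered on each vertex
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173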

Thus, users can employ centralization “for free” when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine’s (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The “structure statistics” of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j, k) \le i)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
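The structure statistics combine geodesic distances with a simple count, which a short base-R sketch makes explicit. `geodist_sketch()` (breadth-first search from every vertex) and `structure_stats_sketch()` are hypothetical stand-ins for geodist and structure.statistics, assuming a 0/1 adjacency matrix with `A[i, j] = 1` for $i \to j$ and using `Inf` for unreachable pairs.

```r
# Base-R sketch of geodesic distances (breadth-first search) and
# Fararo-Sunshine structure statistics; A[i, j] = 1 encodes an edge i -> j.
geodist_sketch <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)               # Inf marks unreachable pairs
  for (src in seq_len(n)) {
    D[src, src] <- 0
    frontier <- src
    d <- 0
    while (length(frontier) > 0) {
      d <- d + 1
      # unreached vertices with an incoming edge from the current frontier
      nxt <- which(colSums(A[frontier, , drop = FALSE]) > 0 & is.infinite(D[src, ]))
      D[src, nxt] <- d
      frontier <- nxt
    }
  }
  D
}

structure_stats_sketch <- function(A) {
  n <- nrow(A)
  D <- geodist_sketch(A)
  # s_i = N^{-2} * #{(j, k) : d(j, k) <= i}, for i = 0, ..., N - 1
  s <- vapply(0:(n - 1), function(i) sum(D <= i) / n^2, numeric(1))
  setNames(s, 0:(n - 1))
}

# Usage: a directed path 1 -> 2 -> 3
P <- matrix(0, 3, 3)
P[1, 2] <- 1
P[2, 3] <- 1
geodist_sketch(P)
structure_stats_sketch(P)   # 3/9, 5/9, 6/9
```

Because $d(j, j) = 0$ for every vertex, $s_0 = 1/N$ always; this is the 0.10 reported for the 10-vertex graph in the example below.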

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or the frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one’s data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}\, s(H) = s(G), \mathbf{E}\, s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., the difference in triangle counts) to those expected under one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
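As an illustration of the census-plus-baseline logic, the following base-R sketch computes a dyad census and runs a Monte Carlo CUG test that conditions on order and number of edges (one of the conditioning choices mentioned above). All three functions are hypothetical and are not sna's dyad.census or cugtest; `r_fixed_edges()` draws uniformly from loopless digraphs with a fixed edge count by sampling edge positions without replacement.

```r
# Base-R sketch: dyad census plus a Monte Carlo CUG test conditioning on
# order and edge count. A[i, j] = 1 encodes an edge i -> j.
dyad_census_sketch <- function(A) {
  diag(A) <- 0
  n <- nrow(A)
  mut <- sum(A * t(A)) / 2
  asym <- sum(A) - 2 * mut
  c(Mut = mut, Asym = asym, Null = n * (n - 1) / 2 - mut - asym)
}

# Uniform draw from all loopless digraphs of order n with exactly m edges
r_fixed_edges <- function(n, m) {
  A <- matrix(0, n, n)
  off <- which(row(A) != col(A))     # cells of all possible non-loop edges
  A[sample(off, m)] <- 1
  A
}

cug_sketch <- function(A, stat, reps = 1000) {
  diag(A) <- 0
  obs <- stat(A)
  sims <- replicate(reps, stat(r_fixed_edges(nrow(A), sum(A))))
  c(obs = obs,
    p_upper = mean(sims >= obs),     # Pr(t(H) >= t(G))
    p_lower = mean(sims <= obs))     # Pr(t(H) <= t(G))
}

# Usage: an undirected 8-cycle stored as a digraph has 8 mutual dyads,
# far more than a random digraph with the same 16 edges tends to show
set.seed(1)
ring <- matrix(0, 8, 8)
for (i in 1:8) {
  j <- i %% 8 + 1
  ring[i, j] <- 1
  ring[j, i] <- 1
}
mutual <- function(A) dyad_census_sketch(A)[["Mut"]]
dyad_census_sketch(ring)
cug_sketch(ring, mutual, reps = 500)
```

Because ties between simulated and observed values are counted in both tails, the two estimated p-values always sum to at least 1.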

                              Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00
R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50
R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43
R> component.dist(g3[1,,])
$membership
[1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
[1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.3333333
                1stQ:    -0.06666667
                Med:     0
                Mean:    -0.001288889
                3rdQ:    0.06666667
                Max:     0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.06666667
                1stQ:    0.1555556
                Med:     0.2222222
                Mean:    0.2215333
                3rdQ:    0.2888889
                Max:     0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of “role” and “position” have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is, without question, structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph-theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are “similar” in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities or differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R’s built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
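A rough base-R sketch of the distance-then-cluster workflow follows. `se_hamming()` is a hypothetical Hamming-type structural equivalence distance (the number of differing row and column entries, ignoring entries that involve the two vertices being compared, in the spirit of the $N(v) \setminus v'$ definition above); sna's sedist and equiv.clust provide several distance choices and may handle these entries differently. Clustering uses R's built-in hclust and cutree.

```r
# Base-R sketch of a Hamming-type structural equivalence distance followed
# by hierarchical clustering; A[i, j] = 1 encodes an edge i -> j.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      keep <- setdiff(seq_len(n), c(i, j))        # ignore ties involving i or j
      D[i, j] <- sum(A[i, keep] != A[j, keep]) +  # outgoing ties that differ
                 sum(A[keep, i] != A[keep, j])    # incoming ties that differ
    }
  }
  D
}

# Usage: vertices 2 and 3 both receive a tie from 1 and send a tie to 4
A <- matrix(0, 5, 5)
A[1, 2:3] <- 1
A[2:3, 4] <- 1
A[4, 5] <- 1
D <- se_hamming(A)
D[2, 3]                                  # 0: 2 and 3 are structurally equivalent
hc <- hclust(as.dist(D), method = "complete")
cutree(hc, k = 3)                        # 2 and 3 land in the same class
```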

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $(i, j)$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to “blocks” of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R’s cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (the mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
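The density block type can be sketched in a few lines of base R. `block_density()` is a hypothetical helper, assuming class memberships coded 1, ..., k; within-class blocks neglect the diagonal, following the convention mentioned above, and are left undefined (NA) for singleton classes.

```r
# Base-R sketch of a density blockmodel image; memb assigns each vertex
# to a class coded 1, ..., k, and A[i, j] = 1 encodes an edge i -> j.
block_density <- function(A, memb) {
  k <- max(memb)
  img <- matrix(NA_real_, k, k)
  for (a in seq_len(k)) {
    for (b in seq_len(k)) {
      blk <- A[memb == a, memb == b, drop = FALSE]
      if (a == b) {
        m <- nrow(blk)
        # within-class block: neglect the diagonal (self-ties)
        img[a, b] <- if (m > 1) (sum(blk) - sum(diag(blk))) / (m * (m - 1)) else NA
      } else {
        img[a, b] <- mean(blk)   # mean cell value = interclass density
      }
    }
  }
  img
}

# Usage: class 1 = {1, 2} sends every possible tie to class 2 = {3, 4}
A <- matrix(0, 4, 4)
A[1:2, 3:4] <- 1
block_density(A, memb = c(1, 1, 2, 2))   # 1-block at [1, 2], 0-blocks elsewhere
```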

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
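Density expansion can be sketched in a few lines of base R. The helper expand_density is hypothetical, not sna's blockmodel.expand. It assumes the image is read as class-to-class edge probabilities and that loops are excluded.

```r
# Hypothetical helper: draw a digraph from a density block image.
# img[a, b] is read as the probability of an edge from class a to class b;
# sizes[a] is the number of vertices to create in class a.
expand_density <- function(img, sizes) {
  classes <- rep(seq_along(sizes), sizes)   # class of each new vertex
  p <- img[classes, classes]                 # Bernoulli parameter matrix
  g <- matrix(rbinom(length(p), 1, p), nrow(p), ncol(p))
  diag(g) <- 0                               # no loops
  g
}

# Monte Carlo sample of 100 graphs on 3 + 4 = 7 vertices
set.seed(1)
img <- matrix(c(0.1, 0.2, 0.9, 0.1), 2, 2)
sim <- replicate(100, expand_density(img, c(3, 4)), simplify = FALSE)

# Average density of the class 1 -> class 2 block across the sample
mean(sapply(sim, function(s) mean(s[1:3, 4:7])))
```

As in the text, the new graphs need not have the same order as the graph the image came from: the class sizes are free inputs.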

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean matrix product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
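The Boolean-product characterization translates directly into base R. The sketch below is an illustration of that definition, not sna's %c% operator. It also shows how loops can appear in the composition of a loop-free graph.

```r
# Boolean composition of two digraphs on the same vertex set:
# (v, v') is an edge iff some v'' has v -> v'' in a and v'' -> v' in b
compose_graphs <- function(a, b) (a %*% b > 0) * 1

# Loop-free digraph with edges 1 -> 2, 2 -> 1, and 2 -> 3
g <- matrix(0, 3, 3)
g[cbind(c(1, 2, 2), c(2, 1, 3))] <- 1
comp <- compose_graphs(g, g)
comp
```

The result has edges 1 -> 1, 2 -> 2 (via the reciprocated pair) and 1 -> 3 (via vertex 2). This is why the diagonal of a composition should not be discarded.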

                              Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original. This is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equivclust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert 1987; Krackhardt 1987a, 1988; Banks and Carley 1994; Butts and Carley 2005; Butts 2007 for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets. When entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$\mathrm{cov}(G,H) = \frac{1}{|V|\,(|V|-1)} \sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right) \qquad (3)$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. Both sums run over ordered pairs $(i,j)$ with $i \neq j$. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can similarly be obtained using hdist.
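Equation (3) and the Hamming distance can be computed directly in base R. The following sketch uses hypothetical helpers (offdiag, graph_cov, graph_cor, hamming), not sna's gcor, gcov and hdist. The package routines may differ in details such as missing-data handling and options for the diagonal.

```r
# Off-diagonal edge variables of a digraph, as a vector
offdiag <- function(a) a[row(a) != col(a)]

# Graph covariance following equation (3): the mean cross-product of
# centered edge variables over the |V|(|V|-1) ordered pairs
graph_cov <- function(a, b) {
  x <- offdiag(a); y <- offdiag(b)
  mean((x - mean(x)) * (y - mean(y)))
}
graph_cor <- function(a, b) graph_cov(a, b) / sqrt(graph_cov(a, a) * graph_cov(b, b))

# Hamming distance: number of ordered pairs on which the graphs disagree
hamming <- function(a, b) sum(offdiag(a) != offdiag(b))

g <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)   # cycle 1 -> 2 -> 3 -> 1
h <- t(g)                                     # the same cycle, reversed
c(cov = graph_cov(g, h), cor = graph_cor(g, h), hamming = hamming(g, h))
```

Reversing a directed 3-cycle changes every ordered pair, so the graph correlation is -1 and the Hamming distance is 6, even though the two graphs are isomorphic. This motivates the structural measures discussed next.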

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings. This results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching of arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
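For very small graphs, the structural correlation can be computed exactly by searching over all relabelings. The sketch below does this in base R with hypothetical helpers (perms, struct_cor). It corresponds in spirit to exhaustive search, not to sna's annealing heuristic. The search costs n! correlations, which is why heuristics are needed beyond a handful of vertices.

```r
offdiag <- function(a) a[row(a) != col(a)]
graph_cor <- function(a, b) cor(offdiag(a), offdiag(b))

# All permutations of 1..n as rows of a matrix (n! rows; tiny n only)
perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  p <- perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(i) cbind(i, p + (p >= i))))
}

# Structural correlation: maximize the graph correlation over all
# simultaneous row/column permutations (relabelings) of b
struct_cor <- function(a, b) {
  pm <- perms(nrow(b))
  max(apply(pm, 1, function(p) graph_cor(a, b[p, p])))
}

g <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)   # directed 3-cycle
c(labeled = graph_cor(g, t(g)), structural = struct_cor(g, t(g)))
```

The labeled correlation between the cycle and its reversal is -1, while the structural correlation is 1, since some relabeling maps one graph exactly onto the other.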

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes:

- centralgraph, which returns the graph minimizing the total Hamming distance to all graphs in the input set;
- gclust.boxstats, which shows distributions of graph statistics based on a hierarchical clustering of networks;
- gclust.centralgraph, which returns the central graphs for each element of a network clustering solution;
- gdist.plotdiff, which plots distances between networks against differences in their properties;
- gdist.plotstats, which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure.

Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by applying eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
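These exploratory steps can be sketched with base R alone. The example below computes a central graph, a metric MDS of the Hamming distances, and a network PCA. It assumes the central graph is obtained by elementwise majority vote, which minimizes the summed Hamming distance for binary graphs; ties at exactly one half are set to 0 here. It is an illustration on random data, not a reproduction of sna's centralgraph.

```r
set.seed(42)
m <- 8; n <- 6
# m random digraphs on n vertices, stored as an m x n x n array
gs <- array(rbinom(m * n * n, 1, 0.4), dim = c(m, n, n))
for (k in seq_len(m)) { gk <- gs[k, , ]; diag(gk) <- 0; gs[k, , ] <- gk }

# Central graph: elementwise majority vote across the input graphs
cg <- (apply(gs, c(2, 3), mean) > 0.5) * 1

# Hamming distance matrix among the graphs, then classical (metric) MDS
hd <- matrix(0, m, m)
for (i in seq_len(m)) for (j in seq_len(m)) hd[i, j] <- sum(gs[i, , ] != gs[j, , ])
coords <- cmdscale(hd, k = 2)

# Network PCA: eigen-decomposition of the covariance matrix among graphs,
# computed from their vectorized off-diagonal edge variables (cov() divides
# by N - 1 rather than N, which rescales eigenvalues but not eigenvectors)
ev <- apply(gs, 1, function(a) a[row(a) != col(a)])   # n(n-1) x m
pca <- eigen(cov(ev))
```

The rows of coords give a two-dimensional map of the graphs. The eigenvectors in pca$vectors give each graph's loading on the principal components of the edge structure.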

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses. These test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, for cases where conditioning on the entire observed structure is inappropriate.
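The logic of a QAP test for graph correlation can be sketched in base R. Rows and columns of one graph are permuted together, so the graph's structure is preserved while vertex identities are reshuffled. The helper qap_cor_test is hypothetical; it does not reproduce qaptest's interface or output.

```r
offdiag <- function(a) a[row(a) != col(a)]
graph_cor <- function(a, b) cor(offdiag(a), offdiag(b))

# QAP test: compare the observed statistic with its distribution under
# random relabeling of b's vertices (rows and columns permuted together)
qap_cor_test <- function(a, b, reps = 1000) {
  obs <- graph_cor(a, b)
  null <- replicate(reps, {
    p <- sample(nrow(b))
    graph_cor(a, b[p, p])
  })
  c(obs = obs, p_ge = mean(null >= obs), p_le = mean(null <= obs))
}

set.seed(3)
n <- 15
a <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(a) <- 0
b <- a
flip <- matrix(rbinom(n * n, 1, 0.1), n, n)   # perturb about 10% of cells
b[flip == 1] <- 1 - b[flip == 1]; diag(b) <- 0
res <- qap_cor_test(a, b, reps = 200)
res
```

Because b is a lightly perturbed copy of a, the observed correlation is far in the upper tail of the permutation distribution. The one-sided p-value p_ge should therefore be near zero.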


                              Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size. See the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
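As noted above, netlm's point estimates are those of ordinary least squares applied to the vectorized edge variables. The same coefficients can therefore be recovered with lm, as the base-R sketch below shows. It assumes, as in the example, that loops are excluded and an intercept is included, which gives 20 × 19 = 380 observations. Only the point estimates carry over. lm's standard errors and p-values ignore the dependence among edge variables, which is what the QAP machinery is for.

```r
set.seed(7)
n <- 20
# Four random 0/1 predictor graphs, stored as a 4 x n x n array
x <- array(rbinom(4 * n * n, 1, 0.5), dim = c(4, n, n))
y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]

# Vectorize the off-diagonal cells: one row per ordered pair (i, j), i != j
off <- row(y) != col(y)
dat <- data.frame(y = y[off], sapply(1:4, function(k) x[k, , ][off]))
fit <- lm(y ~ ., data = dat)
coef(fit)   # 0, 1, 4, 2, 0 up to floating-point error (exact linear fit)
nrow(dat)   # 380 edge variables
```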

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
   352.6347 on 5 degrees of freedom, p-value: 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
   (Dn-Dr)/(Dn-Dr+dfn): 0.481324
   (Dn-Dr)/Dn: 0.6694004
Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

   Total Fraction Correct: 0.8973684
   Fraction Predicted 1s Correct: 0.8973684
   Fraction Predicted 0s Correct: NaN
   False Negative Rate: 0
   False Positive Rate: 1

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500
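The goodness-of-fit figures in this summary can be reproduced from the two printed deviances alone. The arithmetic check below assumes that dfn denotes the null degrees of freedom (380) and that the model has k = 5 estimated coefficients. Both assumptions are consistent with the printed values. The null deviance also equals 380 · 2 log 2, the deviance of a model assigning probability 0.5 to every edge variable.

```r
# Deviances as printed in the netlogit summary above
dn  <- 526.7919   # null deviance
dr  <- 174.1572   # residual deviance
dfn <- 380        # null degrees of freedom (20 * 19 edge variables)
k   <- 5          # estimated coefficients (intercept + x1..x4)

chisq <- dn - dr                      # chi-squared fit improvement, k df
pr2_a <- chisq / (chisq + dfn)        # (Dn-Dr)/(Dn-Dr+dfn)
pr2_b <- chisq / dn                   # (Dn-Dr)/Dn
aic   <- dr + 2 * k
bic   <- dr + k * log(dfn)
round(c(chisq = chisq, pr2_a = pr2_a, pr2_b = pr2_b, aic = aic, bic = bic), 4)
```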

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties. This is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003).

Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data-analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1. The two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992), in the simplified form of Gelman et al. (1995), can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
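Locally aggregated structures are simple enough to sketch directly in base R. In the sketch below, the tie i -> j is judged only by the reports of its two endpoints, i and j. These reports are combined by intersection (both must claim the tie) or union (either may). This follows the usual description of Krackhardt's estimator and is not sna's code; in sna the route is consensus with a LAS method. The data layout dat[k, i, j] (informant k's report of i -> j) is assumed to match the dat array in the bbnam example below.

```r
# dat[k, i, j] = 1 if informant k reports a tie from i to j.
# Locally aggregated structure: keep only the reports of the two endpoints
# i and j about the i -> j tie, then combine them.
las <- function(dat, type = c("intersection", "union")) {
  type <- match.arg(type)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      r <- c(dat[i, i, j], dat[j, i, j])   # reports of i and of j
      out[i, j] <- if (type == "intersection") min(r) else max(r)
    }
  }
  out
}

dat <- array(0, dim = c(3, 3, 3))
dat[1, 1, 2] <- 1   # informant 1 claims the tie 1 -> 2
dat[3, 1, 2] <- 1   # a third party agrees, but LAS ignores third parties
las(dat, "intersection")[1, 2]
las(dat, "union")[1, 2]
```

The intersection requires both endpoints to agree, so a single missed report (false negative) by one of them removes the tie.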

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members in turn may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events, which in turn influence the structure of the network. The joint graph distribution under such a model is not in general known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
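The conditional nomination probability above is a one-line computation. The sketch below evaluates it for illustrative parameter values. The helper name and the numbers are hypothetical. For instance, the two bias events might stand for reciprocity and closure opportunities, but nothing here depends on that reading.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
biased_net_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Baseline 0.1, plus two bias events with probabilities 0.3 and 0.2
# that have occurred once and twice, respectively
biased_net_prob(0.1, c(0.3, 0.2), c(1, 2))   # 1 - 0.9 * 0.7 * 0.64 = 0.5968
```

With no bias events ($s_k = 0$ for all $k$), the expression reduces to the baseline probability $\Pr(B_e)$.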

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990; and Cliff and Ord 1973; Anselin 1988 for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

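$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$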

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores. An AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects). An MA term, by contrast, implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation, in analogy to similar heuristics within the time series literature (see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
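Moran's I, one of the autocorrelation indices mentioned above, can be written in a few lines of base R. The sketch uses the standard textbook formula $I = (n / S_0) \sum_{i,j} w_{ij} z_i z_j / \sum_i z_i^2$, with $z = y - \bar{y}$ and $S_0 = \sum_{i,j} w_{ij}$. The helper moran_i is hypothetical. nacf's neighborhood options and normalization may differ.

```r
# Moran's I for responses y and weight matrix w (w[i, j] links i to j)
moran_i <- function(y, w) {
  z <- y - mean(y)
  (length(y) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Two disconnected dyads, {1, 2} and {3, 4}
w <- matrix(0, 4, 4)
w[cbind(c(1, 2, 3, 4), c(2, 1, 4, 3))] <- 1
moran_i(c(1, 1, -1, -1), w)   # neighbors alike: I = 1
moran_i(c(1, -1, 1, -1), w)   # neighbors unlike: I = -1
```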


                              Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:
             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):
Probability of False Negatives (e^-):
       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):
          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:
  Replicate Chains: 5
  Burn Time: 300
  Draws per Chain: 20   Total Draws: 100
  Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Two warning signs deserve attention: multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal. Either can indicate excessively "perverse" priors or extreme disagreement among informants (e.g., as would result from systematic deception). Both possibilities warrant a re-examination of the user's modeling assumptions and of the data itself.
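Why the region em + ep > 1 is "perverse" can be seen from a single edge. Suppose error rates are known and each edge has an independent prior probability, as in the known-error-rate case of the reporting model described earlier. Then the posterior probability of the edge follows from Bayes' rule. The sketch below is that simplification, not bbnam's Gibbs sampler, and the helper name is hypothetical. A "yes" report multiplies the odds by (1 - em)/ep, which is below 1 exactly when em + ep > 1.

```r
# Posterior probability that a single edge is present, given binary reports x
# from informants with known false negative (em) and false positive (ep) rates
edge_posterior <- function(x, em, ep, prior = 0.5) {
  l1 <- prod(ifelse(x == 1, 1 - em, em))   # Pr(reports | edge present)
  l0 <- prod(ifelse(x == 1, ep, 1 - ep))   # Pr(reports | edge absent)
  prior * l1 / (prior * l1 + (1 - prior) * l0)
}

edge_posterior(1, em = 0.3, ep = 0.05)   # em + ep < 1: "yes" raises belief
edge_posterior(1, em = 0.7, ep = 0.6)    # em + ep > 1: "yes" lowers belief
```

With a prior of 0.5, the first report yields 0.7 / 0.75 ≈ 0.93. The second yields 0.3 / 0.9 = 1/3: the report is negatively informative.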

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
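To make the three classical rules concrete, here is a base-R sketch (not sna's consensus code). It assumes the informant reports are stored as an m x n x n array `dat` with `dat[k, i, j] = 1` if informant k reports an i -> j tie, and that informant k is vertex k. The LAS rules consult only the two parties to each dyad; the central graph is taken here to be a majority rule (ties reported by at least half of the informants), which is our assumption about the threshold. The accuracy figures above are the share of adjacency-matrix cells on which an estimate agrees with the true graph, i.e., `mean(est == g)`.

```r
# dat: m x n x n array; dat[k, i, j] = 1 if informant k reports an i -> j tie.
# Assumes informant k is vertex k, so m == n (needed for the LAS rules).
las_consensus <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  est <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      r_i <- dat[i, i, j]  # sender i's report of the i -> j tie
      r_j <- dat[j, i, j]  # receiver j's report of the same tie
      est[i, j] <- if (rule == "intersection") min(r_i, r_j) else max(r_i, r_j)
    }
  }
  est
}

# Central graph sketched as a majority rule: keep a tie if at least half of
# all informants report it (threshold assumed, not taken from sna).
central_graph <- function(dat) {
  (apply(dat, c(2, 3), mean) >= 0.5) * 1
}

# Accuracy as used in the text: share of cells matching the true graph g.
accuracy <- function(est, g) mean(est == g)
```

Because the intersection rule needs both parties to report a tie, one false negative from either side deletes it, which is why that rule suffers most when false negatives are common.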

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)
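For reference, the simulation that follows generates data from a network autocorrelation model with an autoregressive term on $y$ and a moving-average term on the disturbances. In our reading of the code (an interpretation, not a quotation of the lnam documentation), `r1`, `r2`, `w1`, `w2`, `x`, `beta`, `nu`, and `sigma` play the roles of $\rho_1$, $\rho_2$, $W_1$, $W_2$, $X$, $\beta$, $\nu$, and $\sigma$:

```latex
y = \rho_1 W_1 y + X\beta + e, \qquad e = \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I)
\quad\Longrightarrow\quad
y = (I - \rho_1 W_1)^{-1}\left[ X\beta + (I - \rho_2 W_2)^{-1}\nu \right]
```

The two qr.solve calls evaluate the inverse terms by solving the linear systems $(I - \rho_2 W_2)e = \nu$ and $(I - \rho_1 W_1)y = X\beta + e$, so the inverses are never formed explicitly.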

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4
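The information criteria in this output can be checked by hand. The sketch below assumes the standard definitions AIC = -2 log L + 2k and BIC = -2 log L + k log n, with n = 50 observations and k read off the reported degrees of freedom (k = 50 - 42 = 8 for the model, counting sigma, and k = 50 - 48 = 2 for the mean/standard-deviation null model). Because the printed log likelihoods are rounded, results may differ from the output in the last digit.

```r
# Conventional AIC and BIC for a model with log likelihood `loglik`,
# k estimated parameters, and n observations.
info_crit <- function(loglik, k, n) {
  c(AIC = -2 * loglik + 2 * k, BIC = -2 * loglik + k * log(n))
}

n <- 50
model_fit <- info_crit(58.47, k = n - 42, n = n)   # 5 betas, 2 rhos, sigma
null_fit  <- info_crit(-82.48, k = n - 48, n = n)  # mean and sd only
print(round(model_fit, 2))
print(round(null_fit, 2))
print(round(null_fit - model_fit, 1))  # differences (null minus model)
```

The AIC difference reproduces the reported 269.9, and in this example the reported heuristic log Bayes factor (258.4) coincides with the BIC difference.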

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. Pairs $(i, j)$ for which $i$'s net influence on $j$ is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which $i$'s net influence on $j$ is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
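The two-standard-deviation colouring rule can be sketched as follows. This is our own illustration: it takes as given a hypothetical n x n matrix `inf` of estimated net influences (how lnam derives that matrix is not reproduced here), and it assumes the mean and standard deviation are computed over off-diagonal pairs only.

```r
# Classify (i, j) pairs of a net influence matrix by the 2-SD rule.
# inf: hypothetical n x n matrix; inf[i, j] = estimated net influence of i on j.
# Returns a character matrix: "green" (>= mean + 2 sd), "red" (<= mean - 2 sd),
# or "" otherwise. Diagonal (self-influence) is ignored by assumption.
flag_influence <- function(inf) {
  off <- row(inf) != col(inf)
  m <- mean(inf[off])
  s <- sd(inf[off])
  out <- matrix("", nrow(inf), ncol(inf))
  out[off & inf >= m + 2 * s] <- "green"
  out[off & inf <= m - 2 * s] <- "red"
  out
}
```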

                              3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                              Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam (panels: Fitted vs Observed Values; Fitted Values vs Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot).

                              References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks. II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software, Version 2.067. URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2. URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                              Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.



Figure 2: Sample cumulative neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$ (panels: cumulative neighborhoods of orders 1 through 9).

elements within arbitrary displays. Options for these functions are flexible and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)


Figure 3: Sample visualizations using gplot with multiple layout and display options (panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options).

R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurve = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1.5, vertex.sides = 3:8,
+   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.
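For readers who want to see what a circular layout amounts to, here is a minimal base-R sketch. It is not gplot's own layout code, and details such as the starting angle and the unit radius are our assumptions. It spaces the n vertices evenly around a circle and turns an adjacency matrix into segment endpoints for drawing.

```r
# Evenly spaced positions on a circle (starting angle 0, radius 1 by default).
circle_layout <- function(n, radius = 1) {
  theta <- 2 * pi * (seq_len(n) - 1) / n
  cbind(x = radius * cos(theta), y = radius * sin(theta))
}

# Endpoints of each non-loop edge i -> j, ready for segments() or arrows().
edge_segments <- function(adj, xy) {
  e <- which(adj != 0 & row(adj) != col(adj), arr.ind = TRUE)
  cbind(x0 = xy[e[, 1], 1], y0 = xy[e[, 1], 2],
        x1 = xy[e[, 2], 1], y1 = xy[e[, 2], 2])
}
```

After plot(xy, asp = 1), the columns of the returned matrix can be passed to base R's arrows() or segments() to draw the edges.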

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.
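The rewiring idea behind these displays can be illustrated in one dimension. The sketch below is a generic Watts-Strogatz-style procedure written for illustration, not sna's rgws implementation (which works on d-dimensional lattices). It builds a ring in which each vertex is tied to its z nearest neighbours on each side, then, with probability p, moves one endpoint of each original edge to a randomly chosen vertex not already adjacent. The number of edges is preserved, and p = 0 returns the lattice unchanged.

```r
# Ring lattice: vertex i is tied to its z nearest neighbours on each side.
ring_lattice <- function(n, z = 1) {
  a <- matrix(0L, n, n)
  for (i in seq_len(n)) {
    for (k in seq_len(z)) {
      j <- ((i - 1 + k) %% n) + 1
      a[i, j] <- 1L
      a[j, i] <- 1L
    }
  }
  a
}

# Generic rewiring (not sna's rgws): with probability p, each original edge
# {i, j} is replaced by {i, k}, where k is drawn uniformly from the vertices
# not currently adjacent to i. Edge count is preserved.
rewire_ws <- function(a, p) {
  n <- nrow(a)
  edges <- which(upper.tri(a) & a == 1L, arr.ind = TRUE)
  for (e in seq_len(nrow(edges))) {
    if (runif(1) < p) {
      i <- edges[e, 1]
      j <- edges[e, 2]
      candidates <- which(a[i, ] == 0L & seq_len(n) != i)
      if (length(candidates) > 0) {
        k <- candidates[sample.int(length(candidates), 1)]
        a[i, j] <- 0L; a[j, i] <- 0L
        a[i, k] <- 1L; a[k, i] <- 1L
      }
    }
  }
  a
}
```

Indexing with sample.int avoids the pitfall of sample() treating a single candidate as a range.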

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+   xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10)/10 * 2 * pi), sin((1:10)/10 * 2 * pi),
+   col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+   xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+   offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
+   arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25),
+   c(1.25, 1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output or to create alternative display mechanisms. They may also be used within non-network contexts as polygon-based alternatives to R's built-in points and arrows commands.
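The polygon-based approach is easy to mimic. As a rough sketch (our own helper, not the gplot.vertex source), a vertex with a given number of sides, radius, and rotation is a regular polygon whose corners are spaced evenly in angle; rotation is taken in degrees here, which is an assumption.

```r
# Corner coordinates of a regular polygon centred at (x, y).
# sides: number of corners; radius: distance from centre to each corner;
# rot: rotation in degrees (assumed unit).
polygon_vertex_coords <- function(x, y, radius = 1, sides = 4, rot = 0) {
  theta <- 2 * pi * (seq_len(sides) - 1) / sides + rot * pi / 180
  cbind(x = x + radius * cos(theta), y = y + radius * sin(theta))
}
```

Passing the result to polygon() draws the shape; for example, polygon(polygon_vertex_coords(0, 0, 0.1, 6), col = 2) adds a small hexagon to an open plot.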

2.3 Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will, however, see an important counterexample below.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example).
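The distinction between the two kinds of index shows up directly in their return types. In this base-R sketch (illustrative functions of our own, not sna's), a node-level index returns one value per vertex, $f(v, G)$ for every $v \in V$, while a graph-level index collapses the whole graph to a single number. Total degree and density serve as simple examples, and loops are ignored by assumption.

```r
# Node-level index f: V x G -> R, evaluated for every vertex at once.
# Here: total (in + out) degree of each vertex in a directed graph.
nli_total_degree <- function(adj) {
  diag(adj) <- 0                        # ignore loops (assumption)
  rowSums(adj != 0) + colSums(adj != 0)
}

# Graph-level index f: G -> R. Here: density of a loopless directed graph,
# the number of edges divided by the n(n - 1) possible ordered pairs.
gli_density <- function(adj) {
  n <- nrow(adj)
  diag(adj) <- 0
  sum(adj != 0) / (n * (n - 1))
}
```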

                                Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v, G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d^+}(v, G) \equiv |N^+(v)|$), indegree ($c_{d^-}(v, G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{d^t}(v, G) \equiv c_{d^+}(v, G) + c_{d^-}(v, G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v, G) \equiv \sum_{(v', v'') \subset V \setminus v} \frac{g'(v', v, v'', G)}{g(v', v'', G)},$$

where $g(v, v', G)$ is the number of $(v, v')$ geodesics in $G$, $g'(v, v', v'', G)$ is the number of $(v, v'')$ geodesics in $G$ containing $v'$, and $g'(v', v, v'', G)/g(v', v'', G)$ is taken equal to 0 where $g(v', v'', G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v, G) \equiv \frac{n - 1}{\sum_{v' \in V} d(v, v')},$$

where $n = |V|$ and $d(v, v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v, G) = 0$ for any $v$ that cannot reach at least one other vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
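The definitions of degree, betweenness, stress, and closeness above can be made concrete with a small, dependency-free Python sketch. This is not sna's code. The function names mirror the sna functions only for readability, and graphs are assumed to be 0/1 adjacency matrices stored as lists of lists. Betweenness and stress sum over ordered pairs, as is natural for digraphs, so an undirected graph stored symmetrically counts each unordered pair twice. Unreachable targets make closeness 0, following the infinite-distance convention described above.

```python
from collections import deque

def out_nbrs(adj, v):
    """Out-neighbourhood N+(v) of v in a 0/1 adjacency matrix (loops ignored)."""
    return [u for u, x in enumerate(adj[v]) if x and u != v]

def bfs(adj, s):
    """Geodesic distances d(s, .) and geodesic counts g(s, .) from source s."""
    n = len(adj)
    dist, count = [None] * n, [0] * n
    dist[s], count[s] = 0, 1
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for u in out_nbrs(adj, v):
            if dist[u] is None:            # first time u is reached
                dist[u] = dist[v] + 1
                queue.append(u)
            if dist[u] == dist[v] + 1:     # v precedes u on some geodesic
                count[u] += count[v]
    return dist, count

def degree(adj, cmode="freeman"):
    """Outdegree, indegree, or total ("freeman") degree for every vertex."""
    n = len(adj)
    outd = [len(out_nbrs(adj, v)) for v in range(n)]
    ind = [sum(1 for u in range(n) if u != v and adj[u][v]) for v in range(n)]
    if cmode == "outdegree":
        return outd
    if cmode == "indegree":
        return ind
    return [o + i for o, i in zip(outd, ind)]

def closeness(adj):
    """(n - 1) / sum of distances; 0 when some vertex is unreachable."""
    n = len(adj)
    scores = []
    for v in range(n):
        dist, _ = bfs(adj, v)
        if any(d is None for d in dist):   # infinite distance => score 0
            scores.append(0.0)
        else:
            scores.append((n - 1) / sum(dist))
    return scores

def _geodesic_scores(adj, ratio):
    """Sum over ordered pairs (s, t) of the geodesics passing through v,
    divided by g(s, t) if ratio is True (betweenness), undivided otherwise (stress)."""
    n = len(adj)
    info = [bfs(adj, s) for s in range(n)]   # info[s] = (dist, count)
    scores = [0.0] * n
    for s in range(n):
        for t in range(n):
            d_st = info[s][0][t]
            if s == t or d_st is None:       # term taken as 0 when g(s, t) = 0
                continue
            for v in range(n):
                if v in (s, t):
                    continue
                d_sv, d_vt = info[s][0][v], info[v][0][t]
                if d_sv is not None and d_vt is not None and d_sv + d_vt == d_st:
                    through_v = info[s][1][v] * info[v][1][t]
                    scores[v] += through_v / info[s][1][t] if ratio else through_v
    return scores

def betweenness(adj):
    return _geodesic_scores(adj, ratio=True)

def stress(adj):
    return _geodesic_scores(adj, ratio=False)
```

On the undirected 4-cycle, for example, every vertex gets betweenness 1, because each vertex carries half of the geodesics in both directions between its two neighbours.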

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $\mathbf{A}$ (where $\mathbf{A}$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), of "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (\mathbf{I} - \beta \mathbf{A})^{-1} \mathbf{A} \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $\mathbf{A}$, and degree (outdegree, in the directed case) as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats. As with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so that the squared length of the centrality vector equals $|V|$. In general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
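The two spectral definitions can be compared side by side in a dependency-free Python sketch. The names echo evcent and bonpow, but the code is ours, not sna's.

- Eigenvector centrality is found by power iteration on $\mathbf{A} + \mathbf{I}$. This standard shift leaves the eigenvectors unchanged and avoids oscillation on bipartite graphs; it assumes the principal eigenvalue is unique, as it is for connected graphs.
- The Bonacich measure solves $(\mathbf{I} - \beta \mathbf{A})c = \mathbf{A}\mathbf{1}$ directly.
- We assume $\alpha$ is the positive constant that makes the squared scores sum to $|V|$.

```python
import math

def mat_vec(a, x):
    """Matrix-vector product for lists of lists."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def evcent(adj, iters=10000, tol=1e-12):
    """Principal eigenvector of A (absolute values, unit Euclidean length),
    by power iteration on A + I (same eigenvectors, no bipartite oscillation)."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        y = [xi + axi for xi, axi in zip(x, mat_vec(adj, x))]   # (A + I) x
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        done = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if done:
            break
    return [abs(v) for v in x]

def solve(m, b):
    """Solve m x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(bi)] for row, bi in zip(m, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("I - beta * A is (numerically) singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def bonpow(adj, beta):
    """alpha * (I - beta A)^(-1) A 1, with alpha > 0 chosen so sum(c_i^2) = |V|."""
    n = len(adj)
    m = [[(1.0 if i == j else 0.0) - beta * adj[i][j] for j in range(n)]
         for i in range(n)]
    raw = solve(m, mat_vec(adj, [1.0] * n))
    alpha = math.sqrt(n / sum(v * v for v in raw))   # assumed normalization
    return [alpha * v for v in raw]
```

On the star $K_{1,3}$, `bonpow(star, 0)` is proportional to degree. With $\beta$ just below $1/\sqrt{3}$ (the reciprocal of the star's principal eigenvalue), the rescaled scores line up with `evcent(star)`, as the text describes.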

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now let us posit a vector of states $s$ indexed by $V$, such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are:

- coordinating ($s_i = s_j = s_k$);
- itinerant ($s_i = s_k$, $s_i \neq s_j$);
- gatekeeping ($s_j = s_k$, $s_i \neq s_j$);
- representative ($s_i = s_j$, $s_j \neq s_k$);
- liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$).

The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious. When in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
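The role definitions can be checked mechanically. The Python sketch below is a hypothetical helper, not sna's brokerage function, and it omits the Gould-Fernandez moments and z-tests. It enumerates ordered triads $(v_i, v_j, v_k)$ in which $v_i \to v_j$ and $v_j \to v_k$ are present and $v_i \to v_k$ is absent. It then credits the broker $v_j$ with the role implied by the three states. The w_I, w_O, b_IO, b_OI and b_O labels in the comments follow the codes shown in sna's summary output later in this section.

```python
from collections import Counter

def brokerage_roles(adj, states):
    """Count Gould-Fernandez brokerage roles for each vertex of a 0/1 digraph.

    A triad (i, j, k) is brokered by j when i->j and j->k are present, i->k is
    absent, and i, j, k are distinct. Returns one Counter of roles per vertex.
    """
    n = len(adj)
    roles = [Counter() for _ in range(n)]
    for j in range(n):
        for i in range(n):
            for k in range(n):
                if len({i, j, k}) < 3:
                    continue
                if not (adj[i][j] and adj[j][k]) or adj[i][k]:
                    continue                       # not a locally bridged pair
                si, sj, sk = states[i], states[j], states[k]
                if si == sj == sk:
                    role = "coordinator"           # w_I
                elif si == sk:
                    role = "itinerant"             # w_O: s_i = s_k != s_j
                elif sj == sk:
                    role = "gatekeeper"            # b_IO: s_i != s_j = s_k
                elif si == sj:
                    role = "representative"        # b_OI: s_i = s_j != s_k
                else:
                    role = "liaison"               # b_O: all three differ
                roles[j][role] += 1
                roles[j]["total"] += 1
    return roles
```

On the two-path $0 \to 1 \to 2$, changing only the state vector cycles vertex 1 through all five roles. Adding the edge $0 \to 2$ removes the brokerage opportunity altogether.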

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network.

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
[1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
[1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
[1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
[1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
[8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
[1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
[7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
[1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
[1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
[8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
[1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
[8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
[1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
[9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector centrality and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
[1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
[8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1/eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
[1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
[7]  0.4613310  0.9226621  0.3075540  2.1528782
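The constant ratio printed by the first command above can be reproduced by hand from the outdegree vector shown earlier. This assumes that bonpow rescales its scores so that their squares sum to $|V|$. The short Python check below (illustrative only) carries out that arithmetic.

```python
import math

# Outdegrees printed earlier by degree(dat, cmode = "outdegree")
outdeg = [6, 3, 5, 2, 5, 4, 4, 4, 5, 6]
n = len(outdeg)

# Assumed normalization: scores = alpha * outdegree with sum(scores^2) = n,
# so the constant ratio bonpow / outdegree is alpha = sqrt(n / sum(outdeg^2)).
alpha = math.sqrt(n / sum(d * d for d in outdeg))
print(round(alpha, 7))  # 0.2192645
```

Because $\sum d_i^2 = 208$ here, $\alpha = \sqrt{10/208} \approx 0.2192645$, which agrees with the printed ratio to all seven digits.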

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores for each of the brokerage roles: coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t). These are given along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

                                Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable).

Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i, j), (j, k) \in E \Leftrightarrow (i, k) \in E$ for $i, j, k \in V$) and weak ($(i, j), (j, k) \in E \Rightarrow (i, k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \tag{1}$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \tag{2}$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case. Eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which uses them to generate the underlying score vector. In the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna. The mechanism may also be employed with user-supplied functions, provided that they support the required arguments. Examples are supplied in the sna manual.

²For instance, when all vertices are automorphically equivalent.
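Equations (1) and (2), together with normalization by the maximum over all graphs of the same order, can be sketched in Python for the outdegree case. This illustrates the formulas and is not sna's centralization function. The normalizing constant $(n-1)^2$ is the largest raw outdegree centralization over loopless digraphs of order $n$, attained by an out-star. Treating it as the constant sna itself uses is our assumption.

```python
def freeman_centralization(scores):
    """Raw Freeman centralization, eq. (1): total deviation from the maximum."""
    top = max(scores)
    return sum(top - c for c in scores)

def outdegree_centralization(adj, normalize=True):
    """Outdegree centralization of a loopless 0/1 digraph."""
    n = len(adj)
    outdeg = [sum(1 for j in range(n) if j != i and adj[i][j]) for i in range(n)]
    raw = freeman_centralization(outdeg)
    if not normalize:
        return raw
    # Maximum over loopless digraphs of order n: one vertex sends all n - 1
    # possible ties and nobody else sends any, so n(n - 1) - (n - 1) = (n - 1)^2.
    return raw / (n - 1) ** 2
```

For scores $(3, 1, 1, 1)$, equation (1) gives $0 + 2 + 2 + 2 = 6$. Equation (2) gives $4 \times (3 - 1.5) = 6$ as well.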

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices, supplied respectively by connectedness, efficiency, hierarchy, and lubness, describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.
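The density, reciprocity, and weak transitivity indices defined earlier in this subsection can be written out in a few lines of Python. These helpers are hypothetical and are not gden, grecip, or gtrans, although they follow the definitions given in the text. They assume loopless 0/1 digraphs with no missing data. Two conventions are our own choices: weak transitivity returns 1.0 when no two-path is at risk, and edgewise reciprocity returns 0.0 for an empty graph. The first choice appears consistent with the value 1 and weak census of 0 reported for the sparsest graph in the example that follows.

```python
def dyads(adj):
    """Counts of (mutual, asymmetric, null) dyads in a 0/1 digraph."""
    n = len(adj)
    mut = asym = null = 0
    for i in range(n):
        for j in range(i + 1, n):
            ties = adj[i][j] + adj[j][i]
            if ties == 2:
                mut += 1
            elif ties == 1:
                asym += 1
            else:
                null += 1
    return mut, asym, null

def density(adj):
    """Fraction of the n(n - 1) possible directed edges (no loops) present."""
    n = len(adj)
    return sum(adj[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))

def reciprocity(adj, edgewise=False):
    """Dyadic: share of dyads that are symmetric (mutual or null).
    Edgewise: share of edges that are reciprocated."""
    mut, asym, null = dyads(adj)
    if edgewise:
        return 2 * mut / (2 * mut + asym) if mut + asym else 0.0
    return (mut + null) / (mut + asym + null)

def weak_transitivity(adj):
    """Share of two-paths i->j->k (i, j, k distinct) closed by the edge i->k."""
    n = len(adj)
    at_risk = closed = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) == 3 and adj[i][j] and adj[j][k]:
                    at_risk += 1
                    closed += adj[i][k]
    return closed / at_risk if at_risk else 1.0
```

A directed 3-cycle illustrates how the measures part ways. Its density is 0.5, yet both reciprocity variants and weak transitivity are 0, since no dyad is symmetric and no two-path is closed.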

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³$K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; when given a stack of graphs, only the first is used, as the output below shows. R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both the single-graph and multiple-graph forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)
+     return((n-1) * choose(n-1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
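The same calling convention carries over to other languages. The Python sketch below is a hypothetical analogue and is not part of sna. Here `centralization` accepts any scoring function and asks it for its theoretical maximum deviation through a `tmaxdev` keyword. `o2scent` is transcribed from the R example above: each vertex scores $\binom{\text{outdegree}}{2}$, and the maximum deviation $(n-1)\binom{n-1}{2}$ is attained by an out-star.

```python
from math import comb

def centralization(adj, fun, normalize=True):
    """Freeman centralization for any score function `fun` that returns a list
    of scores, or its theoretical maximum deviation when tmaxdev=True."""
    scores = fun(adj)
    raw = sum(max(scores) - c for c in scores)
    return raw / fun(adj, tmaxdev=True) if normalize else raw

def o2scent(adj, tmaxdev=False):
    """Python transcription of the o2scent example: C(outdegree, 2) per vertex."""
    n = len(adj)
    if tmaxdev:
        return (n - 1) * comb(n - 1, 2)    # attained by an out-star
    outdeg = [sum(1 for j in range(n) if j != i and adj[i][j]) for i in range(n)]
    return [comb(d, 2) for d in outdeg]
```

On a five-vertex out-star the raw score is 24, which equals the theoretical maximum, so the normalized centralization is exactly 1.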

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist. They return, respectively, the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist.

An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\left(d(j, k) \le i\right)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
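The structure statistics formula can be transcribed directly into Python, using breadth-first search for geodesic distances. This is an illustrative sketch, not sna's structure.statistics. Unreachable pairs are never counted, which matches treating their distance as infinite. For a loopless graph, $s_0 = 1/N$ and $s_1 = (N + |E|)/N^2$ (with $|E|$ counting directed edges), which gives a quick sanity check.

```python
from collections import deque

def distances_from(adj, s):
    """BFS geodesic distances from s in a 0/1 digraph; None if unreachable."""
    n = len(adj)
    dist = [None] * n
    dist[s] = 0
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for u in range(n):
            if u != v and adj[v][u] and dist[u] is None:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def structure_statistics(adj):
    """s_i = N^-2 * #{(j, k) : d(j, k) <= i} for i = 0, ..., N - 1."""
    n = len(adj)
    d = [distances_from(adj, s) for s in range(n)]
    return [sum(1 for j in range(n) for k in range(n)
                if d[j][k] is not None and d[j][k] <= i) / n ** 2
            for i in range(n)]
```

For the directed path $0 \to 1 \to 2$, the series is $(3/9, 5/9, 6/9)$. It never reaches 1 because no vertex can reach its predecessors.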

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references therein), a connection which also motivates the use of such measures.

Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case. It is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-hard problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively). Here $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G)$, $s'(H) = s'(G)$, $\ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbb{E}\, s(H) = s(G)$, $\mathbb{E}\, s'(H) = s'(G)$, $\ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$. The homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
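A CUG test is easy to simulate directly. The Python sketch below uses hypothetical helpers, not sna's cugtest. It computes the dyad census, then estimates both one-tailed p-values for a statistic by drawing digraphs uniformly from those with the same order and number of edges. In other words, it conditions on order and edge count. A fixed seed is used only to make the illustration reproducible.

```python
import random

def dyad_census(adj):
    """(mutual, asymmetric, null) dyad counts of a 0/1 digraph."""
    n = len(adj)
    counts = [0, 0, 0]
    for i in range(n):
        for j in range(i + 1, n):
            ties = adj[i][j] + adj[j][i]
            counts[2 - ties] += 1      # 2 ties -> mutual, 1 -> asym, 0 -> null
    return tuple(counts)

def uniform_digraph_with_edges(n, m, rng):
    """Uniform draw from loopless digraphs of order n with exactly m edges."""
    slots = [(i, j) for i in range(n) for j in range(n) if i != j]
    adj = [[0] * n for _ in range(n)]
    for i, j in rng.sample(slots, m):
        adj[i][j] = 1
    return adj

def cug_test(adj, stat, reps=1000, seed=0):
    """Observed statistic plus Monte Carlo estimates of
    Pr(t(H) >= t(G)) and Pr(t(H) <= t(G)), conditioning on order and edges."""
    rng = random.Random(seed)
    n = len(adj)
    m = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    observed = stat(adj)
    sims = [stat(uniform_digraph_with_edges(n, m, rng)) for _ in range(reps)]
    p_upper = sum(s >= observed for s in sims) / reps
    p_lower = sum(s <= observed for s in sims) / reps
    return observed, p_upper, p_lower
```

Consider a graph made of two mutual pairs on four vertices. Its mutuality (2) is the largest any four-edge digraph can have, so the lower-tail estimate is exactly 1 and the upper-tail estimate is small.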

                                Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50
R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1,,])
$membership
[1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
[1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic. We first test against a CUG hypothesis conditioning only on order, and then against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.3333333
                1stQ:    -0.06666667
                Med:     0
                Mean:    -0.001288889
                3rdQ:    0.06666667
                Max:     0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.06666667
                1stQ:    0.1555556
                Med:     0.2222222
                Mean:    0.2215333
                3rdQ:    0.2888889
                Max:     0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. All vertices occupying a given position connect to other positions in precisely the same way. Analyses of relations among positions (via their reduced form, the blockmodel; see below) can therefore often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
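The definition translates into a few lines of Python. This sketch uses hypothetical helpers that are not part of sna. It tests exact structural equivalence in a directed 0/1 graph by comparing out- and in-neighborhoods with $v$ and $v'$ themselves excluded. It then groups vertices into positions by comparing each vertex with one representative per class. That shortcut is valid only because structural equivalence is an equivalence relation.

```python
def structurally_equivalent(adj, v, w):
    """True iff v and w have the same out- and in-neighbours, ignoring
    entries that involve v or w themselves (N(v) minus w == N(w) minus v)."""
    others = [u for u in range(len(adj)) if u not in (v, w)]
    same_out = all(adj[v][u] == adj[w][u] for u in others)
    same_in = all(adj[u][v] == adj[u][w] for u in others)
    return same_out and same_in

def positions(adj):
    """Partition vertices into structural equivalence classes (positions)."""
    classes = []
    for v in range(len(adj)):
        for cls in classes:
            if structurally_equivalent(adj, v, cls[0]):
                cls.append(v)
                break
        else:                      # no existing class matched: open a new one
            classes.append([v])
    return classes
```

In the star $K_{1,3}$, the three leaves form one position and the hub forms another. In a complete graph, all vertices share a single position.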

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
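As a rough illustration of this pipeline, the base-R sketch below computes a Hamming-type structural equivalence distance for every pair of vertices in one digraph (comparing rows and columns of the adjacency matrix while ignoring the cells that involve the two vertices being compared, in line with the definition above) and then clusters the distances with hclust. The helper se_hamming is hypothetical and is not sna's sedist, whose conventions and options may differ.

```r
# Hypothetical helper (not part of sna): Hamming-type structural equivalence
# distance between every pair of vertices of a digraph given as a 0/1
# adjacency matrix. Cells involving the two vertices being compared are
# ignored, mirroring the condition N(v) minus {v'} = N(v') minus {v}.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      keep <- setdiff(seq_len(n), c(i, j))
      out_diff <- sum(A[i, keep] != A[j, keep])  # differing out-ties
      in_diff <- sum(A[keep, i] != A[keep, j])   # differing in-ties
      D[i, j] <- out_diff + in_diff
    }
  }
  D
}

set.seed(1)
A <- matrix(rbinom(64, 1, 0.4), 8, 8)
A[2, ] <- A[1, ]   # give vertex 2 the same out-ties as vertex 1 ...
A[, 2] <- A[, 1]   # ... and the same in-ties
diag(A) <- 0       # no loops

D <- se_hamming(A)
D[1, 2]                    # 0: vertices 1 and 2 are structurally equivalent
hc <- hclust(as.dist(D))   # complete-linkage clustering (R's default)
cutree(hc, k = 3)          # membership vector for a 3-position reduction
```

The resulting membership vector has the same form as the class vectors that blockmodel accepts.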


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.
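To make the density block type concrete, here is a minimal base-R sketch that reduces a 0/1 adjacency matrix to its density image given a membership vector, neglecting the diagonal as described above. The helper block_density is hypothetical and only mirrors the idea, not blockmodel's actual implementation or output object.

```r
# Hypothetical helper (not sna's blockmodel): density image of a 0/1
# adjacency matrix A under a membership vector cls. Diagonal cells are
# excluded; a singleton class therefore has an undefined (NaN) self-block.
block_density <- function(A, cls) {
  A <- as.matrix(A)
  diag(A) <- NA                           # neglect the diagonal
  k <- sort(unique(cls))
  img <- matrix(NA_real_, length(k), length(k), dimnames = list(k, k))
  for (a in seq_along(k)) {
    for (b in seq_along(k)) {
      cells <- A[cls == k[a], cls == k[b], drop = FALSE]
      img[a, b] <- mean(cells, na.rm = TRUE)  # share of edges present
    }
  }
  img
}

# Two structurally equivalent pairs: {1, 2} send to everyone, {3, 4} to no one.
A <- rbind(c(0, 1, 1, 1),
           c(1, 0, 1, 1),
           c(0, 0, 0, 0),
           c(0, 0, 0, 0))
img <- block_density(A, cls = c(1, 1, 2, 2))
img   # a pure image: 1-blocks in the first row, 0-blocks in the second
```

Because the classes here are exact structural equivalence classes, every block comes out as all 1s or all 0s, which is the pattern the parenthetical above describes.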

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
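The expansion step can be sketched in base R as follows, assuming a density image stored as a plain k x k matrix and a vector giving how many vertices to create in each class. The helper expand_density is hypothetical; it illustrates the Bernoulli parameter matrix construction described above rather than reproducing blockmodel.expand.

```r
# Hypothetical helper (not sna's blockmodel.expand): expand a density image
# into a random digraph. img is a k x k matrix of inter-class densities and
# sizes gives the number of vertices to create in each class, which need not
# match the original graph. Each (i, j) edge is an independent Bernoulli draw.
expand_density <- function(img, sizes) {
  cls <- rep(seq_len(nrow(img)), times = sizes)  # class of each new vertex
  P <- img[cls, cls, drop = FALSE]               # Bernoulli parameter matrix
  n <- length(cls)
  G <- matrix(rbinom(n * n, 1, P), n, n)         # column-major, matching P
  diag(G) <- 0                                   # no loops
  G
}

set.seed(42)
img <- rbind(c(0.9, 0.1),
             c(0.2, 0.0))
G <- expand_density(img, sizes = c(3, 2))   # 5 vertices from a 2-class image
G
```

Calling expand_density repeatedly yields the kind of Monte Carlo sample from an inhomogeneous Bernoulli graph model mentioned in the text.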

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
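The boolean-product characterization is easy to check directly. In the base-R sketch below, compose_graphs is a hypothetical name (sna's own tool for this is the composition operator described above); composing a directed 3-cycle with its reverse produces a loop at every vertex even though neither input has loops.

```r
# Graph composition via the boolean product of adjacency matrices:
# (v, v') is an edge of the result iff some v'' has v -> v'' in the first
# graph and v'' -> v' in the second. compose_graphs is a hypothetical name.
compose_graphs <- function(A, B) {
  (A %*% B > 0) * 1   # count 2-paths, then reduce to presence/absence
}

A <- rbind(c(0, 1, 0),
           c(0, 0, 1),
           c(1, 0, 0))    # directed 3-cycle without loops
B <- t(A)                 # the same cycle traversed backwards
compose_graphs(A, B)      # identity matrix: every vertex acquires a loop
compose_graphs(A, A)      # two steps along the cycle: 1->3, 2->1, 3->2
```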

                                Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- t(sapply(runif(20, 0, 1), rep, 20))  # row i: vertex i's sending probability
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \qquad (3)$$


where $A^G$, $A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
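Equation (3) and the associated correlation and Hamming distance translate directly into a few lines of base R, as sketched below for digraphs without loops. The function names are illustrative; the sketch should agree with gcor, gcov, and hdist on simple cases, but we have not checked how those functions treat details such as missing data or valued edges.

```r
# Graph covariance, correlation and Hamming distance for two digraphs on the
# same vertex set, following equation (3): only the |V|(|V| - 1)
# off-diagonal edge variables enter. Function names are illustrative.
offdiag <- function(A) A[row(A) != col(A)]

graph_cov <- function(A, B) {
  a <- offdiag(A)
  b <- offdiag(B)
  sum((a - mean(a)) * (b - mean(b))) / length(a)   # length(a) = |V|(|V| - 1)
}

graph_cor <- function(A, B) {
  graph_cov(A, B) / sqrt(graph_cov(A, A) * graph_cov(B, B))
}

hamming <- function(A, B) sum(offdiag(A) != offdiag(B))

G <- rbind(c(0, 1, 0, 0, 1),
           c(1, 0, 1, 0, 0),
           c(0, 0, 0, 1, 1),
           c(1, 0, 0, 0, 0),
           c(0, 1, 1, 0, 0))
H <- 1 - G
diag(H) <- 0       # complement digraph
graph_cor(G, G)    # 1
graph_cor(G, H)    # -1
hamming(G, H)      # 20: every edge variable differs
```

Since the $|V|(|V|-1)$ normalization cancels in $\rho$, graph_cor coincides with an ordinary Pearson correlation of the vectorized off-diagonal cells.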

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
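For very small graphs, the structural correlation can be computed exactly by brute force: try every relabeling of the second graph and keep the largest graph correlation. The base-R sketch below does this. The names are hypothetical, and this is the exhaustive idea only, not gscor's annealing search. It assumes that neither graph is empty or complete, since cor() is undefined for constant vectors.

```r
# Brute-force sketch of the structural (unlabeled) graph correlation: the
# largest graph correlation attainable by relabeling the vertices of the
# second graph. The n! search is feasible only for tiny graphs. Names are
# illustrative; neither graph may be empty or complete (cor() would be NA).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

offdiag <- function(A) A[row(A) != col(A)]

structural_cor <- function(A, B) {
  best <- -Inf
  for (p in perms(seq_len(nrow(B)))) {
    r <- cor(offdiag(A), offdiag(B[p, p]))   # permute rows and columns jointly
    if (r > best) best <- r
  }
  best
}

G2 <- rbind(c(0, 1, 1, 0, 0),
            c(0, 0, 1, 0, 0),
            c(0, 0, 0, 1, 0),
            c(1, 0, 0, 0, 1),
            c(0, 0, 0, 0, 0))
G3 <- G2[c(3, 5, 1, 4, 2), c(3, 5, 1, 4, 2)]   # an isomorphic relabeling
cor(offdiag(G2), offdiag(G3))                  # labeled graph correlation
structural_cor(G2, G3)                         # 1 (up to rounding)
```

Already at 10 vertices the search would need about 3.6 million evaluations, which is why heuristic matching is the practical default.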

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by applying eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
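The central graph has a simple closed form for 0/1 digraphs. Total Hamming distance is a sum over edge variables, so each edge can be chosen independently, and an edgewise majority vote minimizes it. The sketch below is a hypothetical base-R illustration of that reasoning. Exact ties are resolved toward 0 here, and sna's centralgraph may handle them differently.

```r
# Hypothetical sketch of a central graph: the 0/1 digraph minimizing the total
# Hamming distance to a set of digraphs on a common vertex set. The distance is
# a sum over edge variables, so each edge is chosen separately and an edgewise
# majority vote is a minimizer (exact ties are resolved toward 0 here).
central_graph <- function(graphs) {
  # graphs: m x n x n array; graphs[k, , ] is the k-th adjacency matrix
  m <- dim(graphs)[1]
  counts <- apply(graphs, c(2, 3), sum)   # number of inputs containing each edge
  (counts > m / 2) * 1
}

gs <- array(0, dim = c(3, 3, 3))
gs[1, 1, 2] <- 1
gs[2, 1, 2] <- 1   # edge 1 -> 2 appears in two of the three graphs
gs[3, 2, 3] <- 1   # edge 2 -> 3 appears in only one
cg <- central_graph(gs)
cg                 # keeps 1 -> 2 only
```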

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
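The logic of a QAP test can be sketched in base R without sna. Permute the rows and columns of one graph together, recompute the statistic, and compare the observed value with the resulting distribution. The function qap_cor_test below is a hypothetical illustration for the graph correlation only. Its two p-value estimates are meant to parallel the p(f(rnd) >= f(d)) and p(f(rnd) <= f(d)) lines that sna prints, not to reproduce qaptest exactly.

```r
# Minimal QAP sketch for the graph correlation of two digraphs (illustrative,
# not sna's qaptest): the observed statistic is compared with its distribution
# when the vertices of one graph are relabeled at random, i.e. its rows and
# columns are permuted together.
offdiag <- function(A) A[row(A) != col(A)]
gcor_vec <- function(A, B) cor(offdiag(A), offdiag(B))

qap_cor_test <- function(A, B, reps = 1000) {
  obs <- gcor_vec(A, B)
  n <- nrow(B)
  dist <- replicate(reps, {
    p <- sample.int(n)
    gcor_vec(A, B[p, p])
  })
  list(statistic = obs,
       p_greater = mean(dist >= obs),   # estimate of p(f(rnd) >= f(d))
       p_less = mean(dist <= obs),      # estimate of p(f(rnd) <= f(d))
       dist = dist)
}

set.seed(123)
n <- 10
A <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(A) <- 0
flip <- matrix(rbinom(n * n, 1, 0.1), n, n) == 1
B <- A
B[flip] <- 1 - B[flip]   # B: A with roughly 10% of cells flipped
diag(B) <- 0
res <- qap_cor_test(A, B, reps = 500)
res$statistic
res$p_greater
```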


                                Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
                 Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14   0.000   1.000     0.000
x1           1.000000e+00   1.000   0.000     0.000
x2           4.000000e+00   1.000   0.000     0.000
x3           2.000000e+00   1.000   0.000     0.000
x4          -7.553990e-16   0.369   0.631     0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)          x1          x2          x3          x4
Min     -2.6048970  -2.9689678  -3.5940257  -2.9888472  -15.687343
1stQ    -0.6779707  -0.6739579  -0.6980733  -0.7469624  -0.9732831
Median  -0.0841683  -0.0090468   0.0003289  -0.0116757  -0.4346029
Mean    -0.0256936  -0.0249585  -0.0161372  -0.0055288  -0.0080178
3rdQ     0.6930508   0.6393521   0.6352920   0.7064120   0.8601390
Max      2.5434373   2.7231537   3.0464596   3.6938260   16.294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
              Estimate     Exp(b) Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173   0.680   0.320     0.503
x1           0.9411361  2.5628914   0.985   0.015     0.019
x2           4.1473292 63.2648084   1.000   0.000     0.000
x3           1.8630911  6.4436238   1.000   0.000     0.000
x4          -0.1757242  0.8388493   0.318   0.682     0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004
Contingency Table (predicted (rows) x actual (cols)):

   0   1
0  0   0
1 39 341

    Total Fraction Correct: 0.8973684
    Fraction Predicted 1s Correct: 0.8973684
    Fraction Predicted 0s Correct: NaN
    False Negative Rate: 0
    False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)          x1          x2          x3          x4
Min      -1.253710   -1.160806   -1.270806   -1.295749   -1.252300
1stQ     -0.215404   -0.236393   -0.229377   -0.278976   -0.250322
Median    0.078514    0.022337   -0.001591   -0.020205    0.001053
Mean      0.093105    0.025854    0.004520   -0.017570   -0.002262
3rdQ      0.408121    0.269836    0.239821    0.236166    0.252251
Max       1.704128    1.408468    1.214650    1.100783    1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90) and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values and that the QAP test correctly identifies the irrelevant predictors.
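Because these models are conventional regressions on vectorized adjacency matrices, their point estimates can be reproduced with base R's glm once the off-diagonal cells are stacked into a data frame. Only the QAP-based inference is specific to the network versions. The sketch below loosely mirrors the example above but uses milder, hypothetical coefficients (and no sna calls) so that the fit stays well away from separation.

```r
# Sketch (base R only): a network logit model's point estimates are those of an
# ordinary logistic regression fitted to the vectorized off-diagonal cells of
# the adjacency matrices; the network versions differ in their QAP-based
# inference. Coefficients below are hypothetical and milder than above.
set.seed(2008)
n <- 20
k <- 4
x <- array(rbinom(k * n * n, 1, 0.5), dim = c(k, n, n))   # 4 predictor graphs
eta <- -1 + x[1, , ] + 2 * x[2, , ] - x[3, , ]              # x[4, , ] is irrelevant
y <- matrix(rbinom(n * n, 1, plogis(eta)), n, n)

off <- row(y) != col(y)                     # drop the diagonal: 380 dyads
dat <- data.frame(y = y[off],
                  x1 = x[1, , ][off], x2 = x[2, , ][off],
                  x3 = x[3, , ][off], x4 = x[4, , ][off])
fit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = dat)
round(coef(fit), 3)
```

The 380 rows match the 380 degrees of freedom of the null deviance reported by netlogit above.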

2.7 Network inference and process models

A final category of functions supplied by sna consists of those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired); estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the model of Butts (2003) can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
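To see how the false positive/false negative parameterization turns reports into evidence, consider a single potential edge with error rates treated as known. Informant k reports the edge with probability 1 - em[k] if it exists and ep[k] if it does not. Bayes' rule with a Bernoulli prior then gives the posterior probability that the edge exists. The base-R sketch below (edge_posterior is a hypothetical name) illustrates only this building block. bbnam additionally treats the error rates as unknown and samples them jointly with the whole graph, which the sketch does not attempt.

```r
# Sketch of the edge-reporting likelihood for one potential edge, with error
# rates treated as known. Informant k reports the edge with probability
# 1 - em[k] if it exists and ep[k] if it does not; a Bernoulli prior then
# gives the posterior by Bayes' rule. edge_posterior is a hypothetical name.
edge_posterior <- function(reports, em, ep, prior = 0.5) {
  # reports: 0/1 vector with one report per informant
  l1 <- sum(log(ifelse(reports == 1, 1 - em, em)))   # log P(reports | edge)
  l0 <- sum(log(ifelse(reports == 1, ep, 1 - ep)))   # log P(reports | no edge)
  1 / (1 + exp(log(1 - prior) + l0 - log(prior) - l1))
}

em <- rep(0.375, 5)    # false negative rates
ep <- rep(0.038, 5)    # false positive rates
edge_posterior(c(1, 1, 0, 0, 0), em, ep)   # about 0.94
```

With these (hypothetical) rates, even two positive reports out of five make the edge very likely, because false positives are much rarer than false negatives. Working on the log scale avoids underflow when there are many informants.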

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
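The conditional tie probability above is simply the probability that at least one of the baseline or bias events fires. The base-R sketch below evaluates it for made-up inputs. The helper biased_net_prob and all probabilities and counts are hypothetical; it evaluates the formula only and does not estimate anything, which is bn's job.

```r
# Conditional tie probability in the biased net model, as in the formula above:
#   Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k(i, j, T)
# base: baseline probability Pr(B_e); bias: probabilities Pr(B_k) of the bias
# events; s: number of opportunities s_k(i, j, T) for each bias event.
biased_net_prob <- function(base, bias, s) {
  stopifnot(length(bias) == length(s))
  1 - (1 - base) * prod((1 - bias)^s)
}

biased_net_prob(0.1, bias = c(0.5, 0.2), s = c(0, 0))  # 0.1: only the baseline applies
biased_net_prob(0.1, bias = c(0.5, 0.2), s = c(1, 0))  # 1 - 0.9 * 0.5 = 0.55
biased_net_prob(0.1, bias = c(0.5, 0.2), s = c(1, 2))  # 1 - 0.9 * 0.5 * 0.8^2 = 0.712
```

Each additional opportunity for a bias event multiplies the "no tie" probability by another factor of $1 - \Pr(B_k)$, so the tie probability rises monotonically as the trace accumulates relevant structure.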

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

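$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$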

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
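Solving (5) and then (4) for the unknowns gives reduced forms that make simulation straightforward: $\varepsilon = (I - \psi Z)^{-1}\nu$ and $y = (I - \theta W)^{-1}(X\beta + \varepsilon)$ in the single-matrix case. The base-R sketch below draws once from such a model, checks that the draw satisfies (4) and (5), and computes Moran's I in its standard textbook form. The ring network, the parameter values, and the helper morans_I are hypothetical choices for illustration. lnam remains the tool for estimation, and nacf's exact normalization of Moran's I may differ from this version.

```r
# Sketch: one draw from the network autocorrelation model (4)-(5) with a single
# AR matrix W and a single MA matrix Z, via the reduced forms
#   eps = (I - psi Z)^{-1} nu  and  y = (I - theta W)^{-1} (X beta + eps).
# The ring network and all parameter values are hypothetical.
set.seed(1)
n <- 30
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  W[i, if (i == n) 1 else i + 1] <- 1    # next vertex on the ring
  W[i, if (i == 1) n else i - 1] <- 1    # previous vertex on the ring
}
W <- W / rowSums(W)                       # row-normalize
Z <- W
X <- cbind(1, rnorm(n))                   # intercept and one covariate
beta <- c(1, 2)
theta <- 0.4
psi <- 0.3
I_n <- diag(n)

nu <- rnorm(n)                                   # iid N(0, 1) disturbances
eps <- solve(I_n - psi * Z, nu)                  # network MA disturbances
y <- solve(I_n - theta * W, X %*% beta + eps)    # network AR responses

# Moran's I (textbook form) for a response vector on weight matrix W.
morans_I <- function(y, W) {
  d <- y - mean(y)
  (length(y) / sum(W)) * sum(W * outer(d, d)) / sum(d^2)
}
morans_I(as.vector(y), W)
```

With $|\theta| < 1$ and a row-normalized $W$, the matrix $I - \theta W$ is guaranteed to be invertible, which is why these parameter values are safe here.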


                                Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038 and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:
             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):
       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):
          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

                                MCMC Diagnostics

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20
Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
  Max: 1.003116
  Med: 0.9992194
  IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)

                                [1] 1
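The "Potential Scale Reduction" figures in the MCMC diagnostics above are Gelman and Rubin (1992) statistics; values near 1, as here, suggest that the replicate chains have mixed. As a rough illustration only, the sketch below computes the classic, un-split form of the statistic in base R for a single scalar parameter. The helper name psrf and the draws-by-chains matrix layout are assumptions, and bbnam's own computation may differ in detail (for example, in degrees-of-freedom corrections).

```r
# Classic (un-split) Gelman-Rubin potential scale reduction, sqrt(Rhat),
# for one scalar parameter. Assumed layout: 'draws' is an n x m matrix
# holding n post-burn-in draws (rows) from each of m chains (columns).
psrf <- function(draws) {
  n <- nrow(draws)
  B <- n * var(colMeans(draws))          # between-chain variance
  W <- mean(apply(draws, 2, var))        # mean within-chain variance
  var_hat <- (n - 1) / n * W + B / n     # pooled estimate of posterior variance
  sqrt(var_hat / W)
}

set.seed(42)
# 5 chains x 20 draws, matching the bbnam run above in size:
# one set of chains that agree, one set stuck at different levels.
well_mixed <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)
stuck <- sweep(matrix(rnorm(20 * 5), nrow = 20, ncol = 5), 2, c(0, 3, 6, 9, 12), "+")
psrf(well_mixed)  # near 1
psrf(stuck)       # far above 1
```

Because this simplified form uses no small-sample correction, it can dip slightly below 1 for well-mixed chains, much as the reported median (0.999) does.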

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which e^- + e^+ > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user’s modeling assumptions and the data itself.
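Why the region where the false negative and false positive rates sum to more than 1 is “perverse” can be seen from a single informant report. Under the error model, a reported tie has probability 1 - e^- if the tie exists and e^+ if it does not, so a report raises the posterior probability of a tie only when 1 - e^- > e^+, that is, when e^- + e^+ < 1. The hypothetical helper tie_posterior below applies Bayes' rule to one report; it is a toy illustration, not part of sna's bbnam.

```r
# Hypothetical helper: posterior probability that a tie exists after one
# informant report, given a prior, false-negative rate em, false-positive rate ep.
tie_posterior <- function(report, prior, em, ep) {
  lik_tie    <- ifelse(report == 1, 1 - em, em)   # P(report | tie present)
  lik_no_tie <- ifelse(report == 1, ep, 1 - ep)   # P(report | tie absent)
  prior * lik_tie / (prior * lik_tie + (1 - prior) * lik_no_tie)
}

tie_posterior(1, prior = 0.5, em = 0.1, ep = 0.1)  # 0.9: a reported tie is evidence for the tie
tie_posterior(1, prior = 0.5, em = 0.6, ep = 0.6)  # 0.4: with em + ep > 1 it is evidence against
```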

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)


Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
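To make the classical estimators concrete, here is a minimal base-R sketch of the two LAS rules (locally aggregated structures, after Krackhardt 1987a) and of a majority-rule central graph. It assumes the k x n x n array layout used for cognitive social structure data, in which dat[k, i, j] is informant k's 0/1 report of the i -> j tie and informant k is also vertex k. The helper names are hypothetical, and sna's consensus may treat details (such as the diagonal, or ties reported by exactly half the informants) differently.

```r
# LAS rules: combine only the two "local" reports of the i -> j tie.
las_base <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    ego   <- dat[i, i, j]   # sender i's own report of i -> j
    alter <- dat[j, i, j]   # receiver j's report of i -> j
    out[i, j] <- if (rule == "intersection") min(ego, alter) else max(ego, alter)
  }
  out
}

# Central graph (assumed majority rule): keep a tie if at least half of all informants report it.
central_graph_base <- function(dat) {
  (apply(dat, c(2, 3), mean) >= 0.5) * 1
}

# Toy CSS with 3 informants: informants 1 and 3 report the tie 1 -> 2; informant 2 does not.
dat <- array(0, dim = c(3, 3, 3))
dat[1, 1, 2] <- 1
dat[3, 1, 2] <- 1
las_base(dat, "intersection")[1, 2]  # 0: receiver 2 did not confirm it
las_base(dat, "union")[1, 2]         # 1: sender 1 reported it
central_graph_base(dat)[1, 2]        # 1: 2 of 3 informants reported it
```

The toy case shows why the intersection rule is sensitive to false negatives: a single omission by either party deletes the tie.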

As a final example of sna’s model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)
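For reference, the data-generating process simulated in the session that follows can be written as below, with ρ1 and ρ2 standing for r1 and r2, W1 and W2 for w1 and w2, and σ for sigma. This is a reading of the simulation code, not a statement of lnam's internal parameterization:

```latex
\begin{aligned}
y &= \rho_1 W_1 y + X\beta + e, \qquad e = \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I),\\
\Rightarrow\quad e &= (I - \rho_2 W_2)^{-1}\nu, \qquad y = (I - \rho_1 W_1)^{-1}\left(X\beta + e\right).
\end{aligned}
```

The two qr.solve calls compute exactly these two inverses applied to vectors, and the two rho estimates in the summary (about 0.195 and 0.307) can be compared with the simulated values 0.2 and 0.3.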

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i’s net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i’s net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
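One way to make “net influence” concrete for the AR term is through the inverse (I - ρ1 W1)^{-1}: since y = (I - ρ1 W1)^{-1}(Xβ + e), entry (j, i) of that inverse gives the effect on y_j of a unit disturbance at i. The sketch below, built around the hypothetical helper net_influence, flags off-diagonal pairs at least two standard deviations above or below the mean, mirroring the green/red rule described in the text. It is an assumption-laden illustration; the lnam plot method's own definition of net influence (for instance, how it combines the AR and MA terms) may differ.

```r
# Hypothetical helper: flag unusually strong positive/negative net influence,
# assuming i's net influence on j is M[j, i] with M = (I - rho * W)^{-1}.
net_influence <- function(W, rho, k = 2) {
  n <- nrow(W)
  M <- solve(diag(n) - rho * W)      # y = M %*% (X beta + e)
  inf <- t(M)                        # inf[i, j] = M[j, i]: effect of i's disturbance on y_j
  off <- row(inf) != col(inf)        # exclude self-influence
  m <- mean(inf[off])
  s <- sd(inf[off])
  if (is.na(s) || s == 0) s <- Inf   # no variation: flag nothing
  list(M = M,
       green = off & inf >= m + k * s,   # unusually large positive net influence
       red   = off & inf <= m - k * s)   # unusually negative net influence
}

set.seed(1)
W <- matrix(rbinom(20 * 20, 1, 0.1), 20, 20)
diag(W) <- 0
res <- net_influence(W, rho = 0.2)
which(res$green, arr.ind = TRUE)   # (i, j) pairs that would be drawn as green edges
```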

                                3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. The sna package provides a diverse collection of routines, covering many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam (panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances (ν); Normal Q-Q Residual Plot, Theoretical Quantiles vs. Sample Quantiles; Net Influence Plot).

                                References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). “Social Structure from Multiple Networks. II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25



Figure 3: Sample visualizations using gplot with multiple layout and display options (panels: Default; Curved Edges; MDS Layout; Circular Layout; Sociomatrix; Multiple Options).

R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1.5, vertex.sides = 3:8,
+   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.
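The circle and MDS layouts used above can be approximated in a few lines of base R. In this sketch (helper names hypothetical), circle_layout spaces vertices evenly around the unit circle, and mds_layout applies classical multidimensional scaling (stats::cmdscale) to geodesic distances computed by Floyd-Warshall. The "mds" mode of gplot may use a different input distance, so treat this as an illustration of the idea rather than a reproduction of its output.

```r
# Place n vertices evenly on the unit circle, starting at angle 0.
circle_layout <- function(n) {
  theta <- 2 * pi * (seq_len(n) - 1) / n
  cbind(x = cos(theta), y = sin(theta))
}

# All-pairs geodesic distances for a 0/1 adjacency matrix (Floyd-Warshall).
geodist_base <- function(A) {
  D <- ifelse(A > 0, 1, Inf)
  diag(D) <- 0
  for (k in seq_len(nrow(A))) D <- pmin(D, outer(D[, k], D[k, ], "+"))
  D
}

# Classical MDS on (symmetrized) geodesic distances.
mds_layout <- function(A) {
  D <- geodist_base(pmax(A, t(A)))
  D[is.infinite(D)] <- max(D[is.finite(D)]) + 1  # crude stand-in for disconnected pairs
  cmdscale(D, k = 2)
}

A <- matrix(0, 5, 5)
for (i in 1:5) { j <- i %% 5 + 1; A[i, j] <- A[j, i] <- 1 }   # a 5-cycle
circle_layout(5)
mds_layout(A)
```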

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.
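For readers unfamiliar with the rewiring process, the following base-R sketch builds a one-dimensional ring lattice in which each vertex is tied to its r nearest neighbors on each side, and then, with probability p, moves each lattice edge's far endpoint to a uniformly chosen non-neighbor. This follows the classic description in Watts and Strogatz (1998); sna's rgws works on d-dimensional lattices and may differ in its rewiring details, and ws_ring is a hypothetical name.

```r
# Hypothetical helper: 1-D Watts-Strogatz ring lattice with rewiring probability p.
ws_ring <- function(n, r = 1, p = 0) {
  A <- matrix(0L, n, n)
  for (i in seq_len(n)) for (s in seq_len(r)) {
    j <- ((i - 1 + s) %% n) + 1          # neighbor s steps clockwise
    A[i, j] <- A[j, i] <- 1L
  }
  if (p > 0) {
    edges <- which(upper.tri(A) & A == 1L, arr.ind = TRUE)  # original lattice edges
    for (e in seq_len(nrow(edges))) {
      if (runif(1) < p) {
        i <- edges[e, 1]; j <- edges[e, 2]
        candidates <- which(A[i, ] == 0L & seq_len(n) != i)  # avoid loops and multi-edges
        if (length(candidates) > 0) {
          k <- candidates[sample.int(length(candidates), 1)]
          A[i, j] <- A[j, i] <- 0L       # drop the lattice edge...
          A[i, k] <- A[k, i] <- 1L       # ...and reattach it to a random vertex
        }
      }
    }
  }
  A
}

set.seed(1)
A <- ws_ring(20, r = 2, p = 0.1)
sum(A) / 2   # rewiring preserves the edge count: 40
```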

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+   xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10)/10 * 2 * pi), sin((1:10)/10 * 2 * pi),
+   col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+   xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+   offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
+   arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R’s built-in points and arrows commands.
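As a minimal sketch of the geometry behind polygon-based vertex drawing, the hypothetical helper polygon_coords below returns the corners of a regular polygon with a given number of sides, radius, and rotation (in degrees), which base R's polygon can then fill. It redraws the layout of the gplot.vertex example above with plain base graphics; it is not the actual code of gplot.vertex.

```r
# Hypothetical helper: corners of a regular polygon centered at (x, y).
polygon_coords <- function(x, y, sides, radius, rot = 0) {
  theta <- (rot * pi / 180) + 2 * pi * (seq_len(sides) - 1) / sides
  cbind(x = x + radius * cos(theta), y = y + radius * sin(theta))
}

# Ten polygons with 3 to 12 sides arranged on a circle, as in the gplot.vertex example.
plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
     xlab = "", ylab = "")
for (k in 1:10) {
  p <- polygon_coords(cos(k / 10 * 2 * pi), sin(k / 10 * 2 * pi),
                      sides = k + 2, radius = 0.1)
  polygon(p[, "x"], p[, "y"], col = k)
}
```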

2.3 Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example; gplot.arrow Example; gplot.loop Example).

                                  Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph-theoretic concept, is given by $c_d(v, G) \equiv |N(v)|$ for undirected G. In the directed case, three notions of degree are generally encountered: outdegree ($c_d^+(v, G) \equiv |N^+(v)|$), indegree ($c_d^-(v, G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_d^t(v, G) \equiv c_d^+(v, G) + c_d^-(v, G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v, G) \equiv \sum_{(v', v'') \subseteq V \setminus v} \frac{g'(v', v, v'', G)}{g(v', v'', G)},$$

where $g(v, v', G)$ is the number of $(v, v')$ geodesics in G, $g'(v, v', v'', G)$ is the number of $(v, v'')$ geodesics in G containing $v'$, and $g'(v', v, v'', G)/g(v', v'', G)$ is taken equal to 0 where $g(v', v'', G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v, G) \equiv \frac{|V| - 1}{\sum_{v' \in V} d(v, v')},$$

where $d(v, v')$ is the geodesic distance from vertex v to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v, G) = 0$ for any v that cannot reach every other vertex, and hence all


closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
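The definitions above are easy to check by hand on small graphs. The base-R sketch below does not call sna, and the helper names `geodesics_sketch`, `closeness_sketch`, and `flow_sketch` are hypothetical. It counts geodesic distances and geodesics by breadth-first search, then evaluates closeness, betweenness, and stress directly from the formulas. Following the natural convention for digraphs, it sums over ordered pairs. It assumes an unweighted 0/1 adjacency matrix in which A[i, j] = 1 means an edge from i to j. It is meant for illustration, not efficiency.

```r
# Hypothetical helper (not part of sna): geodesic distances D[i, j] and
# geodesic counts G[i, j] for an unweighted 0/1 adjacency matrix A,
# where A[i, j] != 0 means an edge i -> j. Unreachable pairs get D = Inf.
geodesics_sketch <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  G <- matrix(0, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    G[s, s] <- 1
    queue <- s
    while (length(queue) > 0) {
      u <- queue[1]
      queue <- queue[-1]
      for (w in which(A[u, ] != 0)) {
        if (is.infinite(D[s, w])) {          # first visit: w is one step beyond u
          D[s, w] <- D[s, u] + 1
          queue <- c(queue, w)
        }
        if (D[s, w] == D[s, u] + 1) {        # u precedes w on some s -> w geodesic
          G[s, w] <- G[s, w] + G[s, u]
        }
      }
    }
  }
  list(D = D, G = G)
}

# Closeness: (n - 1) / sum of distances. An unreachable vertex makes the sum
# infinite, so the score becomes 0, as discussed in the text.
closeness_sketch <- function(A) {
  D <- geodesics_sketch(A)$D
  (nrow(A) - 1) / rowSums(D)
}

# Betweenness (normalize = TRUE) or stress (normalize = FALSE), summed over
# ordered pairs (i, k) of vertices distinct from v; pairs with no geodesic
# contribute 0.
flow_sketch <- function(A, normalize = TRUE) {
  geo <- geodesics_sketch(A)
  D <- geo$D
  G <- geo$G
  n <- nrow(A)
  score <- numeric(n)
  for (v in seq_len(n)) {
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (i == v || k == v || i == k || G[i, k] == 0) next
        if (D[i, v] + D[v, k] == D[i, k]) {
          through_v <- G[i, v] * G[v, k]   # i -> k geodesics that pass through v
          score[v] <- score[v] + if (normalize) through_v / G[i, k] else through_v
        }
      }
    }
  }
  score
}
```

For an undirected graph stored as a symmetric matrix, each unordered pair is visited twice, so the output of `flow_sketch` should be halved to obtain the usual undirected scores.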

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of A (where A is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., v is central to the extent to which it has many ties to other central nodes), or the ability of v to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches eigenvector centrality as β approaches the reciprocal of the principal eigenvalue of A, and approaches degree as β approaches 0. Setting β < 0 reverses the sense of the dependence of centrality scores across vertices: where β is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats. As in the positive case, the parameter's magnitude reflects the weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of β. The scaling parameter α is by convention set so that the squared scores sum to |V|. In general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality indicates the extent to which each individual has a large number of short walks to other actors in the network. Like eigenvector centrality it is walk-based, but it weights short walks more heavily (and long walks less heavily).
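Both spectral measures take only a few lines of base R. The sketch below does not reproduce sna's internals, and the helper names `evcent_sketch` and `bonpow_sketch` are hypothetical. It takes the principal eigenvector from `eigen()` and evaluates $\alpha (I - \beta A)^{-1} A \mathbf{1}$ with `solve()`. It makes two assumptions: α is chosen so that the squared scores sum to |V|, and the principal eigenvalue is real and simple, which holds, for example, for a connected undirected graph.

```r
# Eigenvector centrality: absolute value of the principal eigenvector of A.
# eigen() orders eigenvalues by decreasing value for symmetric A (by modulus
# otherwise); this sketch assumes the leading one is real and simple.
evcent_sketch <- function(A) {
  abs(Re(eigen(A)$vectors[, 1]))
}

# Bonacich power: alpha * (I - beta A)^{-1} A 1, defined when I - beta A is
# invertible. By default alpha rescales so that sum(score^2) == nrow(A).
bonpow_sketch <- function(A, beta, alpha = NULL) {
  n <- nrow(A)
  raw <- drop(solve(diag(n) - beta * A, A %*% rep(1, n)))
  if (is.null(alpha)) alpha <- sqrt(n / sum(raw^2))
  alpha * raw
}

# Example: the "paw" graph (triangle 1-2-3 with a pendant vertex 4 on 3)
paw <- matrix(0, 4, 4)
paw[cbind(c(1, 1, 2, 3), c(2, 3, 3, 4))] <- 1
paw <- paw + t(paw)
lambda1 <- max(eigen(paw)$values)
bonpow_sketch(paw, 0) / rowSums(paw)   # constant ratio: proportional to degree
bonpow_sketch(paw, 0.999 / lambda1)    # close to a rescaled evcent_sketch(paw)
bonpow_sketch(paw, -0.3)               # negative beta reverses dependence on alters
```

The sketch stays just below 1/λ1 rather than using it exactly. At β = 1/λ1 the matrix I − βA is singular, so the eigenvector case arises only as a limit.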

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex v is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$, that is, the number of pairs for which v serves as a local bridge. Now let us posit a vector of states s on V, such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles), based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex v with respect to a particular role is defined as the number of ordered triads of the appropriate type for which v is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed s and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under


the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
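The role definitions translate directly into code. The base-R sketch below computes only the raw counts, not the Gould-Fernandez moments or z-tests, and its helper `brokerage_counts` is hypothetical. It enumerates open two-paths i -> j -> k, meaning i ≠ k and no i -> k edge, and classifies each by the states of its three vertices. Its columns correspond to the w_I, w_O, b_IO, b_OI, and b_O labels used by brokerage. It assumes a 0/1 adjacency matrix and a state vector of matching length.

```r
# Hypothetical helper: raw Gould-Fernandez brokerage counts for each vertex.
# A[i, j] != 0 means an edge i -> j; s[i] is the state (group) of vertex i.
brokerage_counts <- function(A, s) {
  n <- nrow(A)
  roles <- c("coordinator", "itinerant", "gatekeeper", "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {                      # j is the candidate broker
    for (i in which(A[, j] != 0)) {            # edges i -> j
      for (k in which(A[j, ] != 0)) {          # edges j -> k
        # Only open two-paths count: distinct endpoints and no direct i -> k tie
        if (i == j || k == j || i == k || A[i, k] != 0) next
        role <- if (s[i] == s[j] && s[j] == s[k]) {
          "coordinator"                        # s_i = s_j = s_k
        } else if (s[i] == s[k]) {
          "itinerant"                          # s_i = s_k, s_i != s_j
        } else if (s[j] == s[k]) {
          "gatekeeper"                         # s_j = s_k, s_i != s_j
        } else if (s[i] == s[j]) {
          "representative"                     # s_i = s_j, s_j != s_k
        } else {
          "liaison"                            # all three states differ
        }
        out[j, role] <- out[j, role] + 1
      }
    }
  }
  cbind(out, total = rowSums(out))
}

# Example: a directed path 1 -> 2 -> 3 in which vertex 2 brokers for (1, 3)
p <- matrix(0, 3, 3)
p[1, 2] <- 1
p[2, 3] <- 1
brokerage_counts(p, c("a", "b", "b"))          # vertex 2 acts as a gatekeeper
```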

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network.

R> dat <- rgraph(10)

R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6

R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6

R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12

R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000

R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333

R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21

R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000

R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985


R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above output illustrates, the standard centrality measures differ greatly in scale, although they are generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the measure in this case is essentially unrelated to both eigenvector centrality and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645

R> all(abs(bonpow(dat, exponent = 1/eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE

R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

                                  Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t     E(t)    Sd(t)        z  Pr(>|z|)
w_I   5.0000   5.8638   2.7314  -0.3162    0.7518
w_O  25.0000  19.5459   7.0713   0.7713    0.4405
b_IO 18.0000  19.5459   6.2244  -0.2484    0.8039
b_OI 17.0000  19.5459   6.2244  -0.4090    0.6825
b_O  28.0000  23.4551   5.3349   0.8519    0.3943
t    93.0000  87.9565  13.6124   0.3705    0.7110

                                  Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table gives the observed network-wide brokerage scores for each of the brokerage roles: coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t). It also reports the corresponding expectations, standard deviations, z-scores, and p-values under the Gould-Fernandez random association model, to which the caveats noted earlier apply. The second set of tables similarly provides the observed brokerage scores and G-F z-scores


for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles and, likewise, certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered undefined when their associated role preconditions are unmet, and are returned as NaNs.

                                  Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two other fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i, j), (j, k) \in E \Leftrightarrow (i, k) \in E$ for $i, j, k \in V$) and weak ($(i, j), (j, k) \in E \Rightarrow (i, k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from i to k, i should also be tied to k directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph G with respect to centrality measure c is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right] \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing C(G) by its maximum across all graphs of the same order as G. This index is dimensionless and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$),³ although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which uses them to generate the underlying score vector; in the normalized case, the centrality function is also asked to return the theoretical maximum deviation. This is handled transparently for all included centrality functions within sna, and the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
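Equation (2) reduces the raw index to a one-liner. In the base-R sketch below, the helper names are hypothetical and are not sna functions. The code computes Freeman centralization from any vector of centrality scores and, when a theoretical maximum deviation is supplied, its normalized form. For undirected degree on n vertices, the maximum is attained by the star $K_{1,n-1}$: its centre has degree n - 1 and each leaf has degree 1, so the summed deviation is (n - 1)(n - 2).

```r
# Freeman centralization from a score vector, following equation (2):
# C(G) = |V| * (max score - mean score). If max_dev is supplied, the result
# is normalized by that theoretical maximum.
centralization_sketch <- function(scores, max_dev = NULL) {
  raw <- length(scores) * (max(scores) - mean(scores))
  if (is.null(max_dev)) raw else raw / max_dev
}

# Degree of each vertex in an undirected graph (symmetric 0/1 matrix, no loops)
undirected_degree <- function(A) rowSums(A != 0)

# Star on n vertices: vertex 1 tied to every other vertex
star_graph <- function(n) {
  A <- matrix(0, n, n)
  A[1, -1] <- 1
  A[-1, 1] <- 1
  A
}

n <- 5
max_dev <- (n - 1) * (n - 2)                 # summed degree deviations in a star
centralization_sketch(undirected_degree(star_graph(n)), max_dev)   # 1
path4 <- matrix(0, 4, 4)
path4[cbind(1:3, 2:4)] <- 1
path4 <- path4 + t(path4)
centralization_sketch(undirected_degree(path4), 3 * 2)             # 1/3
```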

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices, supplied respectively by connectedness, efficiency, hierarchy, and lubness, describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.
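Two of these indices have short definitions that can be sketched in base R. The code below is an illustration under stated assumptions, not sna's implementation, and its helper names are hypothetical. Connectedness is taken as the fraction of unordered vertex pairs that can reach one another once edge direction is ignored. Krackhardt's hierarchy is taken as the fraction of non-null dyads in the reachability graph that are asymmetric. Reachability is computed with Warshall's transitive-closure algorithm.

```r
# Reachability by Warshall's algorithm: R[i, j] is TRUE if j can be reached
# from i by a directed path of length >= 1. Self-reachability is dropped.
reach_sketch <- function(A) {
  R <- A != 0
  for (k in seq_len(nrow(R))) R <- R | outer(R[, k], R[k, ], "&")
  diag(R) <- FALSE
  R
}

# Connectedness (assumed definition): share of unordered pairs joined by a
# path when edge direction is ignored
connectedness_sketch <- function(A) {
  S <- (A != 0) | t(A != 0)
  R <- reach_sketch(S)
  sum(R[upper.tri(R)]) / choose(nrow(A), 2)
}

# Hierarchy (assumed definition): share of non-null reachability dyads
# that are asymmetric
hierarchy_sketch <- function(A) {
  R <- reach_sketch(A)
  up <- R[upper.tri(R)]                       # i reaches j, for i < j
  lo <- t(R)[upper.tri(R)]                    # j reaches i, for i < j
  sum(xor(up, lo)) / sum(up | lo)
}

# Example: a directed path 1 -> 2 -> 3 is fully connected and fully hierarchical
p3 <- matrix(0, 3, 3)
p3[1, 2] <- 1
p3[2, 3] <- 1
c(connectedness = connectedness_sketch(p3), hierarchy = hierarchy_sketch(p3))
```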

The use of sna's GLI routines is straightforward: calling one with a graph or set of graphs generally returns a vector of GLI scores, as in the following example. Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))

R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582

³ $K_n$ is the complete graph on n vertices, with $K_{n,m}$ denoting the complete bipartite graph on n and m vertices, and $N_n$ the null or empty graph on n vertices.

R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0
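To show what the density, reciprocity, and transitivity scores above actually count, here is a base-R sketch for a single loopless digraph. The helpers are hypothetical, are not sna's implementations, and do not handle missing data. The definitions it assumes are as follows. Density is |E| / (n(n - 1)). Dyadic reciprocity is the share of dyads that are mutual or null. Edgewise reciprocity is the share of edges that are reciprocated. Weak transitivity is the share of two-paths i -> j -> k (i ≠ k) that are closed by an i -> k edge.

```r
# Density of a loopless digraph: observed edges over n(n - 1) possible edges
gden_sketch <- function(A) {
  B <- A != 0
  diag(B) <- FALSE
  n <- nrow(B)
  sum(B) / (n * (n - 1))
}

# Reciprocity: "dyadic" = share of symmetric (mutual or null) dyads,
# "edgewise" = share of edges that are reciprocated
grecip_sketch <- function(A, measure = c("dyadic", "edgewise")) {
  measure <- match.arg(measure)
  B <- A != 0
  diag(B) <- FALSE
  n <- nrow(B)
  mut <- sum(B & t(B)) / 2                   # mutual dyads
  asym <- sum(B) - 2 * mut                   # asymmetric dyads
  null <- choose(n, 2) - mut - asym          # null dyads
  switch(measure,
         dyadic = (mut + null) / choose(n, 2),
         edgewise = 2 * mut / sum(B))
}

# Weak transitivity: share of two-paths i -> j -> k (i != k) closed by i -> k.
# Returns NaN when the graph contains no two-paths.
gtrans_weak_sketch <- function(A) {
  B <- (A != 0) * 1
  diag(B) <- 0
  P <- B %*% B                               # P[i, k]: number of two-paths i -> j -> k
  diag(P) <- 0                               # ignore i == k
  sum(P * B) / sum(P)
}

# Example: the transitive triple 1 -> 2, 2 -> 3, 1 -> 3
tri <- matrix(0, 3, 3)
tri[cbind(c(1, 2, 1), c(2, 3, 3))] <- 1
c(density = gden_sketch(tri), recip = grecip_sketch(tri),
  trans = gtrans_weak_sketch(tri))           # 0.5, 0, 1
```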

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. The following example illustrates both single-graph and multiple-graph usage.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395

R> centralization(g, betweenness)
[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)    # maximum deviation, attained by an out-star
+     return((n - 1) * choose(n - 1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)    # number of pairs of out-neighbors
+ }

R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" with their own centrality routines, so long as those routines support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist. They return, respectively, the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let G = (V, E) be a (possibly directed) graph of order N, and let d(i, j) be the geodesic distance from vertex i to vertex j in G. The "structure statistics" of G are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\big(d(j, k) \le i\big)$$

and I is the standard indicator function. Intuitively, $s_i$ is the expected fraction of G which lies within distance i of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
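The structure statistics follow directly from a matrix of geodesic distances. The base-R sketch below uses a hypothetical helper, `structure_stats_sketch`; sna's own routine is structure.statistics. It obtains distances by level-by-level breadth-first search, taking d(j, j) = 0 and treating unreachable pairs as being at distance Inf, so that $s_0 = 1/N$.

```r
# Hypothetical helper: Fararo-Sunshine structure statistics s_0, ..., s_{N-1}.
# A[i, j] != 0 means an edge i -> j.
structure_stats_sketch <- function(A) {
  N <- nrow(A)
  D <- matrix(Inf, N, N)                      # geodesic distances, Inf = unreachable
  for (j in seq_len(N)) {
    D[j, j] <- 0
    frontier <- j
    while (length(frontier) > 0) {
      # vertices not yet reached that receive an edge from the current frontier
      hit <- colSums(A[frontier, , drop = FALSE] != 0) > 0
      nxt <- which(hit & is.infinite(D[j, ]))
      D[j, nxt] <- D[j, frontier[1]] + 1
      frontier <- nxt
    }
  }
  # s_i = N^{-2} * #{(j, k) : d(j, k) <= i}
  setNames(sapply(0:(N - 1), function(i) sum(D <= i) / N^2), 0:(N - 1))
}

# Example: directed path 1 -> 2 -> 3
p3 <- matrix(0, 3, 3)
p3[1, 2] <- 1
p3[2, 3] <- 1
structure_stats_sketch(p3)                   # 3/9, 5/9, 6/9
```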

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small


subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of G, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case. It is thus important to specify the directedness of one's data when employing this routine, or triad.classify, which can be used to classify specific triads. Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-hard problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic t for graph G is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where H is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of G is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $E\,s(H) = s(G), E\,s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$. The homogeneous Bernoulli graph with parameter p equal to the density of G is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)

                                  Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))

R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we use the difference between the two graphs' reciprocities as a test statistic. We first test it against a CUG hypothesis conditioning only on order, and then against one conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.3333333
                1stQ:    -0.06666667
                Med:     0
                Mean:    -0.001288889
                3rdQ:    0.06666667
                Max:     0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)


                                  CUG Test Results

Estimated p-values:
  p(f(rnd) >= f(d)): 0.967
  p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
  Test Value (f(d)): 0.04444444
  Replications: 1000
  Distribution Summary:
    Min:  -0.06666667
    1stQ:  0.1555556
    Med:   0.2222222
    Mean:  0.2215333
    3rdQ:  0.2888889
    Max:   0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.
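The logic of the two CUG tests above can be sketched in base R without sna. This is a minimal sketch under stated assumptions: the helper names (dyadic_recip, rbern_digraph, cug_recip_diff) are hypothetical; reciprocity is taken to be the share of unordered dyads whose two edge variables agree; and the density condition is imposed by Bernoulli draws at each graph's observed density. sna's own cugtest and grecip may differ in these details.

```r
# Minimal CUG-test sketch (base R only). Assumptions, not sna internals:
#   * reciprocity = share of unordered dyads whose two edge variables agree;
#   * "order" baseline: uniform random digraphs (tie probability 0.5);
#   * "density" baseline: Bernoulli digraphs at each graph's observed density.

dyadic_recip <- function(A) {
  ut <- upper.tri(A)
  mean(A[ut] == t(A)[ut])
}

graph_density <- function(A) {
  n <- nrow(A)
  sum(A[row(A) != col(A)]) / (n * (n - 1))
}

# Bernoulli random digraph with tie probability p and no loops
rbern_digraph <- function(n, p) {
  A <- matrix(rbinom(n * n, 1, p), n, n)
  diag(A) <- 0
  A
}

# Test statistic: (signed) difference in reciprocity between two graphs
recip_diff <- function(A, B) dyadic_recip(A) - dyadic_recip(B)

cug_recip_diff <- function(A, B, cmode = c("order", "density"), reps = 1000) {
  cmode <- match.arg(cmode)
  n <- nrow(A)
  pA <- if (cmode == "order") 0.5 else graph_density(A)
  pB <- if (cmode == "order") 0.5 else graph_density(B)
  obs <- recip_diff(A, B)
  sims <- replicate(reps, recip_diff(rbern_digraph(n, pA), rbern_digraph(n, pB)))
  list(obs = obs,
       p_greater = mean(sims >= obs),  # estimate of p(f(rnd) >= f(d))
       p_less = mean(sims <= obs),     # estimate of p(f(rnd) <= f(d))
       dist = summary(sims))
}

set.seed(1)
g_a <- rbern_digraph(10, 0.2)
g_b <- rbern_digraph(10, 0.7)
res_order <- cug_recip_diff(g_a, g_b, cmode = "order", reps = 500)
res_density <- cug_recip_diff(g_a, g_b, cmode = "density", reps = 500)
```

Because every simulated value is either at least or at most the observed one, the two estimated p-values always sum to at least 1; ties account for any excess, as in the sna output above.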

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005) and remains a popular means of reducing the complexity of large structures. Although many notions of “role” and “position” have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are “similar” in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
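A Euclidean variant of these structural equivalence distances, followed by hierarchical clustering, can be sketched in base R. This is a minimal sketch, not sna's sedist: the function name se_euclid is hypothetical, and entries that refer to the compared pair itself are simply skipped (echoing the $N(v) \setminus \{v'\}$ condition), which may differ from how sedist handles them.

```r
# Euclidean structural-equivalence distances between all vertex pairs.
# Simplifying assumption: entries involving u and v themselves are skipped,
# echoing N(v) \ {v'} = N(v') \ {v}; sna's sedist may treat them differently.
se_euclid <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (u in seq_len(n - 1)) {
    for (v in (u + 1):n) {
      keep <- setdiff(seq_len(n), c(u, v))
      d2 <- sum((A[u, keep] - A[v, keep])^2) +   # out-neighborhoods (rows)
            sum((A[keep, u] - A[keep, v])^2)     # in-neighborhoods (columns)
      D[u, v] <- D[v, u] <- sqrt(d2)
    }
  }
  D
}

# Example: an undirected star; the three leaves are exactly structurally equivalent
star <- matrix(0, 4, 4)
star[1, 2:4] <- 1
star[2:4, 1] <- 1
D <- se_euclid(star)
classes <- cutree(hclust(as.dist(D)), k = 2)  # two positions: hub and leaves
```

Exactly equivalent vertices sit at distance 0, so any cut of the dendrogram keeps them together; approximate equivalence shows up as small but nonzero distances.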


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $(i,j)$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to “blocks” of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.
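The density block type described above can be sketched in base R for a single graph and a membership vector. This is a minimal illustration, not sna's blockmodel: the function name density_blockmodel is hypothetical, only the density block type is covered, and blocks with no off-diagonal cells (e.g., a singleton class with itself) are left as NA by assumption.

```r
# Density block image: mean tie value within each (i, j) block of vertex
# classes, ignoring the diagonal. Covers the density block type only.
density_blockmodel <- function(A, classes) {
  k <- sort(unique(classes))
  B <- matrix(NA_real_, length(k), length(k), dimnames = list(k, k))
  offdiag <- row(A) != col(A)
  for (a in seq_along(k)) {
    for (b in seq_along(k)) {
      # cells whose sender is in class a and receiver in class b
      cells <- outer(classes == k[a], classes == k[b], "&") & offdiag
      if (any(cells)) B[a, b] <- mean(A[cells])
    }
  }
  B
}

# Example: the star graph, with the hub in class 1 and the leaves in class 2
star <- matrix(0, 4, 4)
star[1, 2:4] <- 1
star[2:4, 1] <- 1
B <- density_blockmodel(star, c(1, 2, 2, 2))
```

Because hub and leaves are structural equivalence classes, every defined block is pure (all 1s or all 0s), as the text notes.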

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability and drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
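Density expansion can be sketched in base R as follows. This is a minimal sketch, not sna's blockmodel.expand: the function name expand_density_blockmodel is hypothetical, class sizes are passed as a plain vector, and undefined (NA) blocks are treated as probability 0 by assumption.

```r
# Expand a density image B into a new graph: class sizes fix the order of the
# new graph, and B[a, b] is used as the tie probability from class a to class b.
# Assumption: undefined (NA) blocks are treated as probability 0.
expand_density_blockmodel <- function(B, sizes) {
  classes <- rep(seq_along(sizes), times = sizes)
  P <- B[classes, classes, drop = FALSE]  # inhomogeneous Bernoulli parameters
  P[is.na(P)] <- 0
  A <- matrix(rbinom(length(P), 1, P), nrow(P), ncol(P))
  diag(A) <- 0
  A
}

# Example: two classes with ties only between them; 2 + 3 vertices
B <- matrix(c(0, 1,
              1, 0), 2, 2, byrow = TRUE)
set.seed(42)
A_new <- expand_density_blockmodel(B, sizes = c(2, 3))
```

With 0/1 densities the draw is deterministic; with intermediate densities, repeated calls give the Monte Carlo sample described above.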

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean matrix product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
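The Boolean-product definition of composition can be sketched in base R. The function name compose_graphs is hypothetical (sna's own operator is %c%, which this sketch does not call); the example shows how a loop arises even though neither input graph has one.

```r
# Composition of two graphs on the same vertex set via the Boolean matrix
# product: (v, v') is an edge iff some v'' has v -> v'' in G and v'' -> v' in G'.
compose_graphs <- function(A, B) {
  (A %*% B > 0) * 1
}

G <- matrix(0, 3, 3)
G[1, 2] <- 1; G[2, 3] <- 1      # G:  1 -> 2, 2 -> 3
Gp <- matrix(0, 3, 3)
Gp[2, 1] <- 1; Gp[3, 1] <- 1    # G': 2 -> 1, 3 -> 1
C <- compose_graphs(G, Gp)      # 1 -> 2 -> 1 gives a loop at vertex 1
```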

                                  Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\operatorname{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\left(|V|-1\right)} \tag{3}$$


where $A^G$ and $A^H$ are the respective adjacency matrices of $G$ and $H$, the sums run over ordered pairs of distinct vertices, and $\mu_X = \left(|V|(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\operatorname{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \operatorname{cov}(G,H) / \sqrt{\operatorname{cov}(G,G)\operatorname{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained using gcor and gcov, respectively. Hamming distances for graph sets can similarly be obtained using hdist.
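Equation (3), the graph correlation, and the Hamming distance can be computed directly from the off-diagonal cells in base R. This is a minimal sketch for single directed graphs on a common vertex set; the helper names (offdiag_vec, graph_cov, graph_cor, hamming) are hypothetical and are not sna's gcov, gcor, or hdist.

```r
# Graph covariance (Equation 3), graph correlation, and Hamming distance for
# two directed graphs on the same vertex set; loops (the diagonal) are ignored.
offdiag_vec <- function(A) A[row(A) != col(A)]

graph_cov <- function(G, H) {
  g <- offdiag_vec(G)
  h <- offdiag_vec(H)
  n <- nrow(G)
  # mean(g) is the graph mean mu_G: sum of A^G_ij over i != j, / (|V|(|V|-1))
  sum((g - mean(g)) * (h - mean(h))) / (n * (n - 1))
}

graph_cor <- function(G, H) graph_cov(G, H) / sqrt(graph_cov(G, G) * graph_cov(H, H))

hamming <- function(G, H) sum(offdiag_vec(G) != offdiag_vec(H))

set.seed(7)
G <- matrix(rbinom(25, 1, 0.5), 5, 5); diag(G) <- 0
H <- matrix(rbinom(25, 1, 0.5), 5, 5); diag(H) <- 0
```

Since the $|V|(|V|-1)$ normalization cancels in $\rho(G,H)$, graph_cor agrees with R's ordinary cor applied to the vectorized edge variables.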

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the “labeling” component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching of arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
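For very small graphs, the structural correlation can be obtained exactly by maximizing the graph correlation over every relabeling of one graph. The sketch below does this by brute force in base R. It is an illustration only: the helper names (all_perms, structural_cor) are hypothetical, and the $n!$ cost is exactly why sna's gscor relies on heuristics such as annealing by default.

```r
# Exhaustive structural correlation for small graphs: maximize the graph
# correlation over all relabelings of H (n! of them).
offdiag_vec <- function(A) A[row(A) != col(A)]
graph_cor <- function(G, H) cor(offdiag_vec(G), offdiag_vec(H))  # same as Eq. (3) ratio

# All permutations of 1..n, one per row
all_perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  sub <- all_perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(i) {
    rest <- setdiff(seq_len(n), i)
    unname(cbind(i, matrix(rest[sub], nrow(sub))))
  }))
}

structural_cor <- function(G, H) {
  perms <- all_perms(nrow(H))
  best <- -Inf
  for (r in seq_len(nrow(perms))) {
    p <- perms[r, ]
    best <- max(best, graph_cor(G, H[p, p]))  # relabel H, then compare
  }
  best
}

G <- matrix(0, 5, 5)
G[1, 2] <- G[2, 3] <- G[3, 1] <- G[4, 5] <- 1
q <- c(3, 5, 1, 2, 4)
H <- G[q, q]  # an isomorphic (relabeled) copy of G
```

Here the labeled correlation between G and H is negative, while the structural correlation is 1, mirroring the g2/g3 case in the example later in this section.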

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
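Two of the ideas above, the central graph and network PCA via eigen, can be sketched in base R on a stack of graphs stored as an m × n × n array (sna's convention). This is a hedged sketch, not sna's centralgraph: the helper names are hypothetical, it assumes binary graphs (for which the cellwise majority vote minimizes summed Hamming distance), and it breaks exact ties toward 0 arbitrarily.

```r
# Graph stacks follow the m x n x n array convention (graph, sender, receiver).

# Central graph: for binary graphs, the cellwise majority vote minimizes the
# summed Hamming distance to the stack (exact ties set to 0, an arbitrary choice).
central_graph <- function(graphs) {
  share <- apply(graphs, c(2, 3), mean)
  C <- (share > 0.5) * 1
  diag(C) <- 0
  C
}

# Graph correlation matrix among the m graphs (diagonal cells ignored)
graph_cor_matrix <- function(graphs) {
  n <- dim(graphs)[2]
  off <- row(diag(n)) != col(diag(n))
  V <- sapply(seq_len(dim(graphs)[1]), function(k) graphs[k, , ][off])
  cor(V)
}

set.seed(11)
n <- 6
g1 <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(g1) <- 0
g2 <- g1; g2[1, 2] <- 1 - g2[1, 2]          # g1 with one edge toggled
g3 <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(g3) <- 0
stack <- array(0, dim = c(3, n, n))
stack[1, , ] <- g1; stack[2, , ] <- g2; stack[3, , ] <- g3

C <- central_graph(stack)
pca <- eigen(graph_cor_matrix(stack), symmetric = TRUE)  # network PCA
```

Since g1 and g2 agree everywhere except one cell, the central graph can differ from g1 in at most that cell; the eigenvalues of a 3 × 3 correlation matrix sum to 3.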

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available where conditioning on the entire observed structure is inappropriate.
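The two points above (point estimates equal ordinary regression on vectorized adjacency matrices; QAP tests permute vertex labels) can be sketched in base R. This is a minimal sketch with hypothetical helper names (rdigraph, net_ols_coef, qap_cor_test); it is not sna's netlm or qaptest, and it implements only the simplest QAP null (jointly permuting the rows and columns of one graph).

```r
offdiag_vec <- function(A) A[row(A) != col(A)]

rdigraph <- function(n, p = 0.5) {
  A <- matrix(rbinom(n * n, 1, p), n, n)
  diag(A) <- 0
  A
}

# OLS on vectorized adjacency matrices: the point estimates a netlm-style
# model shares with ordinary lm(); only the inference differs.
net_ols_coef <- function(y, xs) {
  Y <- offdiag_vec(y)
  X <- sapply(xs, offdiag_vec)  # one column per predictor graph
  unname(coef(lm(Y ~ X)))
}

# QAP test of a graph correlation: jointly permute rows and columns of H,
# i.e., randomly reassign vertices to structural positions.
qap_cor_test <- function(G, H, reps = 1000) {
  obs <- cor(offdiag_vec(G), offdiag_vec(H))
  n <- nrow(H)
  sims <- replicate(reps, {
    p <- sample.int(n)
    cor(offdiag_vec(G), offdiag_vec(H[p, p]))
  })
  c(obs = obs, p_greater = mean(sims >= obs), p_less = mean(sims <= obs))
}

set.seed(3)
xs <- lapply(1:3, function(i) rdigraph(10))
y <- xs[[1]] + 4 * xs[[2]] + 2 * xs[[3]]
b <- net_ols_coef(y, xs)  # intercept, then one slope per predictor graph
qap <- qap_cor_test(y, xs[[2]], reps = 500)
```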


                                  Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)


                                  OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

                                  Network Logit Model

                                  Coefficients


            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
  352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
  (Dn-Dr)/(Dn-Dr+dfn): 0.481324
  (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

         Actual
Predicted   0   1
        0   0   0
        1  39 341

  Total Fraction Correct: 0.8973684
  Fraction Predicted 1s Correct: 0.8973684
  Fraction Predicted 0s Correct: NaN
  False Negative Rate: 0
  False Positive Rate: 1

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which they report, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired), and estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
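The reporting model above implies a simple edge-wise posterior once the error rates are fixed. The base R sketch below computes it under that simplifying assumption: error rates are treated as known (bbnam instead samples them), reports are conditionally independent given the true graph, and rates lie strictly in (0, 1). The function name edge_posterior is hypothetical, and the simulated data only loosely follow the bbnam example later in this section.

```r
# Posterior Pr(edge) per dyad given informant reports, with KNOWN error rates
# (a simplification: bbnam also samples the rates). Rates must lie in (0, 1).
#   reports: m x n x n array of 0/1 reports (informant, sender, receiver)
#   em, ep : per-informant false negative / false positive rates
#   prior  : n x n matrix of prior edge probabilities (Bernoulli graph prior)
edge_posterior <- function(reports, em, ep, prior) {
  log_on <- log(prior)       # log prior + log-likelihood if the edge is present
  log_off <- log(1 - prior)  # log prior + log-likelihood if the edge is absent
  for (k in seq_len(dim(reports)[1])) {
    Y <- reports[k, , ]
    log_on <- log_on + Y * log(1 - em[k]) + (1 - Y) * log(em[k])
    log_off <- log_off + Y * log(ep[k]) + (1 - Y) * log(1 - ep[k])
  }
  post <- 1 / (1 + exp(log_off - log_on))
  diag(post) <- 0
  post
}

# Simulated reports: 20 informants on 20 vertices, error rates as in the example
set.seed(5)
n <- 20; m <- 20
g <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(g) <- 0
ep <- rbeta(m, 1, 25)
em <- rbeta(m, 15, 25)
reports <- array(0, dim = c(m, n, n))
for (k in seq_len(m)) {
  tprob <- g * (1 - em[k]) + (1 - g) * ep[k]
  Yk <- matrix(rbinom(n * n, 1, tprob), n, n)
  diag(Yk) <- 0
  reports[k, , ] <- Yk
}
post <- edge_posterior(reports, em, ep, prior = matrix(0.5, n, n))
off <- row(g) != col(g)
accuracy <- mean((post[off] > 0.5) == (g[off] == 1))
```

In the full model, this edge-wise step alternates with draws of the error rates from their beta full conditionals, which is what makes the sampler a Gibbs sampler.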

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members in turn may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
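The conditional tie probability above translates directly into a one-line function. The base R sketch below is illustrative only: the function name biased_net_tie_prob and the two bias types in the example (described loosely as reciprocity and shared-partner closure) are hypothetical, and the function evaluates the formula for one dyad rather than estimating anything, as sna's bn does.

```r
# Conditional tie probability in a biased net:
#   Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^{s_k(i, j, T)}
# pr_base = Pr(B_e); pr_bias = vector of Pr(B_k); s = counts s_k(i, j, T).
biased_net_tie_prob <- function(pr_base, pr_bias, s) {
  stopifnot(length(pr_bias) == length(s), all(s >= 0))
  1 - (1 - pr_base) * prod((1 - pr_bias)^s)
}

# Two hypothetical bias types; each extra opportunity raises the tie probability
biased_net_tie_prob(0.1, c(0.3, 0.2), c(0, 0))  # 0.1 (baseline only)
biased_net_tie_prob(0.1, c(0.3, 0.2), c(1, 0))  # 0.37
biased_net_tie_prob(0.1, c(0.3, 0.2), c(1, 2))  # 0.5968
```

The product form encodes the independence of the baseline and bias events: the tie fails to appear only if every one of those Bernoulli trials fails.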

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990); and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

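$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$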

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly.

Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
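The special case of Equation (4) with a single weight matrix and no MA term can be simulated by solving the linear system $(I - \theta W) y = X\beta + \nu$, and Moran's I can then be computed from its standard definition. This base R sketch is illustrative and does not call lnam or nacf; the function names sim_network_ar and morans_i are hypothetical, and it assumes $I - \theta W$ is invertible (true here because W is row-normalized and $|\theta| < 1$).

```r
# Pure network AR model with one weight matrix (w = 1, no MA term):
#   y = theta W y + X beta + nu   =>   y = (I - theta W)^{-1} (X beta + nu)
sim_network_ar <- function(W, X, beta, theta, sigma = 1) {
  n <- nrow(W)
  nu <- rnorm(n, 0, sigma)
  drop(solve(diag(n) - theta * W, X %*% beta + nu))
}

# Moran's I for attribute y under weights W:
#   I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = y - mean(y)
morans_i <- function(y, W) {
  z <- y - mean(y)
  (length(y) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Example: row-normalized adjacency of a random undirected graph as W
set.seed(9)
n <- 30
A <- matrix(rbinom(n * n, 1, 0.15), n, n)
A[lower.tri(A)] <- t(A)[lower.tri(A)]  # symmetrize
diag(A) <- 0
W <- A / pmax(rowSums(A), 1)           # isolates keep an all-zero row
X <- cbind(1, rnorm(n))
y <- sim_network_ar(W, X, beta = c(1, 2), theta = 0.6)
morans_i(y, W)
```

For a quick check of morans_i: on the path 1–2–3 with $y = (1, 0, 1)$, every tie joins a high value to a low one, and the index is exactly $-1$.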


                                  Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary of the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network Estimation/Informant Accuracy

                                  Multiple Error Probability Model

                                  Marginal Posterior Network Distribution

                                  a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

                                  Journal of Statistical Software 41

                                  a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

                                  a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

                                  Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

                                  Marginal Posterior Error Distribution (by observer)

                                  Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

                                  Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

                                  MCMC Diagnostics

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
 Max: 1.003116
 Med: 0.9992194
 IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)

                                  [1] 1
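The potential scale reduction factor reported under MCMC Diagnostics compares between-chain and within-chain variability (Gelman and Rubin 1992). The plain-Python sketch below computes the basic, uncorrected form of sqrt(Rhat) for one scalar quantity sampled by several chains. It is an illustration under that assumption and not sna's code; bbnam may use a different variant (for instance, one with a degrees-of-freedom correction), and the function name psrf is hypothetical.

```python
import statistics


def psrf(chains):
    """Basic Gelman-Rubin potential scale reduction factor, sqrt(Rhat).

    chains: list of m >= 2 equal-length lists (length n >= 2) of draws of
    a single scalar quantity. No degrees-of-freedom correction is applied.
    """
    m = len(chains)
    n = len(chains[0]) if m else 0
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances
    w = statistics.fmean(statistics.variance(c) for c in chains)
    # B: n times the sample variance of the chain means
    b = n * statistics.variance(chain_means)
    # Pooled estimate of the marginal posterior variance
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5


# Two chains that agree closely versus two chains stuck in different regions
print(psrf([[0.1, 0.3, 0.2, 0.4], [0.2, 0.4, 0.1, 0.3]]))
print(psrf([[0, 1, 0, 1], [10, 11, 10, 11]]))
```

Values close to 1 are conventionally read as showing no evidence against convergence, while markedly larger values suggest the chains have not yet mixed. Values slightly below 1 can also occur when between-chain variation happens to be small relative to within-chain variation.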

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which $e^- + e^+ > 1$. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can indicate either excessively "perverse" priors or extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
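To see why $e^- + e^+ > 1$ is the "perverse" region, consider the simple error model that bbnam builds on (Butts 2003): an observer misses a true tie with probability $e^-$ and reports an absent tie with probability $e^+$. A reported tie then favors the presence of the tie only when $1 - e^- > e^+$, that is, when $e^- + e^+ < 1$. The plain-Python sketch below treats each observer's error rates as known and the reports as independent given the true network; bbnam instead places priors on the error rates and samples from the joint posterior, so this is only an illustration. All function names are hypothetical.

```python
def report_likelihood(report, tie, e_minus, e_plus):
    """P(observer reports `report` (0/1) | true edge state `tie` (0/1)).

    e_minus: probability of missing a true tie (false negative).
    e_plus:  probability of reporting an absent tie (false positive).
    """
    p_report_tie = (1.0 - e_minus) if tie else e_plus
    return p_report_tie if report else 1.0 - p_report_tie


def tie_evidence(e_minus, e_plus):
    """Likelihood ratio P(report tie | tie) / P(report tie | no tie)."""
    return (report_likelihood(1, 1, e_minus, e_plus)
            / report_likelihood(1, 0, e_minus, e_plus))


def posterior_tie(reports, error_rates, prior=0.5):
    """Posterior P(tie | reports) for a single dyad.

    reports:     list of 0/1 reports, one per observer.
    error_rates: list of (e_minus, e_plus) pairs, one per observer (known).
    prior:       prior probability of the tie.
    """
    like_tie, like_none = prior, 1.0 - prior
    for y, (em, ep) in zip(reports, error_rates):
        like_tie *= report_likelihood(y, 1, em, ep)
        like_none *= report_likelihood(y, 0, em, ep)
    return like_tie / (like_tie + like_none)


# Plausible error rates: a reported tie is strong evidence for the tie.
print(tie_evidence(0.3, 0.05))
# "Perverse" rates (0.7 + 0.5 > 1): a reported tie counts against the tie.
print(tie_evidence(0.7, 0.5))
# Two of three observers report the tie under plausible error rates.
print(posterior_tie([1, 1, 0], [(0.3, 0.05)] * 3))
```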

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
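The simpler estimators compared above can be sketched for a cognitive social structure (CSS) in which every actor also reports on the whole network, stored here as a nested list css with css[k][i][j] = 1 when observer k reports a tie from i to j. Following Krackhardt (1987a), the intersection LAS keeps a tie i → j only if both i and j report it, and the union LAS keeps it if either does; the central graph keeps a tie reported by at least half of all observers (resolving an exact one-half split in favor of the edge is an assumption of this sketch). This plain-Python version is illustrative only and is not sna's consensus code; the function names are hypothetical, and accuracy is computed over off-diagonal cells only.

```python
def las(css, mode):
    """Locally aggregated structure (LAS) from a CSS array.

    css[k][i][j] == 1 if observer k reports a tie i -> j (actors are observers).
    mode == "intersection": keep i -> j only if both i and j report it.
    mode == "union":        keep i -> j if i or j reports it.
    """
    if mode not in ("intersection", "union"):
        raise ValueError("mode must be 'intersection' or 'union'")
    n = len(css)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                by_i, by_j = css[i][i][j], css[j][i][j]
                keep = (by_i and by_j) if mode == "intersection" else (by_i or by_j)
                out[i][j] = int(bool(keep))
    return out


def central_graph(css):
    """Keep i -> j if at least half of the observers report it."""
    m, n = len(css), len(css[0])
    return [[int(i != j and 2 * sum(obs[i][j] for obs in css) >= m)
             for j in range(n)] for i in range(n)]


def accuracy(est, truth):
    """Share of off-diagonal cells on which est agrees with truth."""
    n = len(truth)
    hits = sum(est[i][j] == truth[i][j]
               for i in range(n) for j in range(n) if i != j)
    return hits / (n * (n - 1))


# Toy example: true ties 0 -> 1 and 1 -> 2; observers make false negatives only.
truth = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
css = [
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],  # observer 0 misses 1 -> 2
    [[0, 0, 0], [0, 0, 1], [0, 0, 0]],  # observer 1 misses 0 -> 1
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],  # observer 2 sees both ties
]
for name, est in [("LAS intersection", las(css, "intersection")),
                  ("LAS union", las(css, "union")),
                  ("central graph", central_graph(css))]:
    print(name, accuracy(est, truth))
```

The toy example reproduces the failure mode described in the text: observer 1 does not report the tie 0 → 1, so the intersection LAS drops it, whereas the union LAS and the central graph recover the true network.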

As a final example of sna's model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components, and estimates both effects simultaneously. (This example requires the numDeriv package.)
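In equation form, and assuming the model structure documented for lnam (a network autoregressive term with weight matrix $W_1$ and an autocorrelated disturbance with weight matrix $W_2$), the simulation that follows draws data from the model below. The objects r1, r2, sigma, beta, w1, and w2 in the R code play the roles of $\rho_1$, $\rho_2$, $\sigma$, $\beta$, $W_1$, and $W_2$, and the two qr.solve calls compute the two matrix inverses in the last line.

```latex
\begin{aligned}
y &= \rho_1 W_1 y + X\beta + e, \\
e &= \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I_n), \\
\Longrightarrow\quad y &= (I_n - \rho_1 W_1)^{-1}\bigl[X\beta + (I_n - \rho_2 W_2)^{-1}\nu\bigr].
\end{aligned}
```

The reduced form in the last line requires both $I_n - \rho_1 W_1$ and $I_n - \rho_2 W_2$ to be nonsingular.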

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot", which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

                                  3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines covering many of the methods currently in wide use within the field. It is hoped that, together with the other packages of the statnet ensemble, the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Figure 6: Plot method output for lnam (panels: Fitted vs Observed Values; Fitted Values vs Estimated Disturbances; Normal Q-Q Residual Plot, Theoretical Quantiles vs Sample Quantiles; Net Influence Plot).

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

                                  References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks. II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software, Version 2.067. URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2. URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project <http://statnetproject.org/>, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project <http://statnetproject.org/>, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows, Version 4.75. URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis, Version 3.1. URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                  Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25



Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the static snapshots, the usual interactive features of rgl (e.g., rotation and zooming) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.
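For readers unfamiliar with the process shown in Figure 4, the plain-Python sketch below implements the classic one-dimensional rewiring procedure of Watts and Strogatz (1998): start from a ring lattice in which each vertex is tied to its z nearest neighbors on each side, then visit each lattice edge once and, with probability p, move one of its endpoints to a uniformly chosen vertex. It is illustrative only. sna's rgws works on d-dimensional lattices (as we read its documented argument order of number of graphs, vertices per dimension, dimension, neighborhood size z, and rewiring probability p, the call above uses a 5 × 5 × 5 lattice), and its exact rewiring rule may differ from this one. The function names are hypothetical.

```python
import random


def ring_lattice(n, z):
    """Undirected ring lattice: vertex i is tied to i+1, ..., i+z (mod n)."""
    if n <= 2 * z:
        raise ValueError("need n > 2z so that lattice edges are distinct")
    return {frozenset((i, (i + k) % n)) for i in range(n) for k in range(1, z + 1)}


def ws_rewire(n, z, p, seed=None):
    """Watts-Strogatz rewiring of ring_lattice(n, z) with probability p.

    Each lattice edge (i, i+k) is visited once; with probability p it is
    replaced by (i, j) for a uniformly chosen j that creates neither a
    self-loop nor a duplicate edge. The edge count is preserved.
    """
    rng = random.Random(seed)
    edges = ring_lattice(n, z)
    for k in range(1, z + 1):
        for i in range(n):
            old = frozenset((i, (i + k) % n))
            if rng.random() >= p:
                continue  # keep this lattice edge
            candidates = [j for j in range(n)
                          if j != i and frozenset((i, j)) not in edges]
            if candidates:
                edges.remove(old)
                edges.add(frozenset((i, rng.choice(candidates))))
    return edges


for p in (0.0, 0.05, 0.2):
    g = ws_rewire(30, 2, p, seed=42)
    shortcuts = g - ring_lattice(30, 2)
    print(p, len(g), len(shortcuts))
```

With p = 0 the lattice is returned unchanged, and with p = 1 nearly every edge is rewired; intermediate values produce the mix of local clustering and long-range shortcuts studied by Watts and Strogatz (1998).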

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+   xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10)/10 * 2 * pi), sin((1:10)/10 * 2 * pi),
+   col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+   xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+   offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
+   arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output or to create alternative display mechanisms. They may also be used in non-network contexts as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example).

2.3 Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will, however, see an important counterexample below.
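As a concrete illustration of the two classes, the plain-Python sketch below computes several node-level indices, each returning one value per vertex (in-, out-, and total degree, plus the stress and betweenness scores defined in the following subsection), and one graph-level index, density, which returns a single number for the whole graph. It assumes a directed 0/1 adjacency matrix stored as a list of lists with no self-loops, and it sums over ordered pairs of vertices. It is not sna's implementation, and options of sna's functions such as gmode and rescale are not reproduced. For an undirected graph stored symmetrically, summing over ordered pairs counts each unordered pair twice, so stress and betweenness would need halving to match an unordered-pair convention. The function names are hypothetical.

```python
from collections import deque


def degrees(adj):
    """Node-level indices: (indegree, outdegree, total/Freeman degree) lists."""
    n = len(adj)
    outdeg = [sum(adj[v]) for v in range(n)]
    indeg = [sum(adj[u][v] for u in range(n)) for v in range(n)]
    total = [i + o for i, o in zip(indeg, outdeg)]
    return indeg, outdeg, total


def density(adj):
    """Graph-level index: share of the n(n-1) possible directed ties present."""
    n = len(adj)
    return sum(map(sum, adj)) / (n * (n - 1))


def geodesic_counts(adj, s):
    """Breadth-first search from s: distance to, and number of geodesics to,
    every vertex (distance None if unreachable)."""
    n = len(adj)
    dist, count = [None] * n, [0] * n
    dist[s], count[s] = 0, 1
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in range(n):
            if not adj[u][w]:
                continue
            if dist[w] is None:
                dist[w] = dist[u] + 1
                queue.append(w)
            if dist[w] == dist[u] + 1:
                count[w] += count[u]
    return dist, count


def stress_and_betweenness(adj):
    """Stress: sum of g'(s, v, t); betweenness: sum of g'(s, v, t) / g(s, t),
    over ordered pairs (s, t) with s, t != v; pairs with g(s, t) = 0 add 0."""
    n = len(adj)
    geo = [geodesic_counts(adj, s) for s in range(n)]
    stress, betweenness = [0] * n, [0.0] * n
    for s in range(n):
        dist_s, g_s = geo[s]
        for t in range(n):
            if t == s or dist_s[t] is None:
                continue  # no s-t geodesic: contributes 0
            for v in range(n):
                if v in (s, t) or dist_s[v] is None:
                    continue
                dist_v, g_v = geo[v]
                # v lies on an s-t geodesic iff d(s,v) + d(v,t) == d(s,t);
                # each such geodesic splits into an s-v and a v-t geodesic.
                if dist_v[t] is not None and dist_s[v] + dist_v[t] == dist_s[t]:
                    through = g_s[v] * g_v[t]
                    stress[v] += through
                    betweenness[v] += through / g_s[t]
    return stress, betweenness


path = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # directed path 0 -> 1 -> 2
print(degrees(path), density(path))
print(stress_and_betweenness(path))
```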

                                    Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_d^+(v,G) \equiv |N^+(v)|$), indegree ($c_d^-(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_d^t(v,G) \equiv c_d^+(v,G) + c_d^-(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $\frac{g'(v',v,v'',G)}{g(v',v'',G)}$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v,G) \equiv \frac{|V| - 1}{\sum_{v' \in V} d(v,v')},$$

where $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to at least one other vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
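The definitions above can be checked by hand on small graphs. The following Python sketch (our own illustration, not code from sna, which is an R package) computes degree, closeness, betweenness, and stress for an undirected graph stored as a list of neighbour sets. It counts geodesics by breadth-first search, uses the fact that $v$ lies on a $(v',v'')$ geodesic exactly when $d(v',v) + d(v,v'') = d(v',v'')$, and treats distances between disconnected vertices as infinite, so that closeness drops to 0. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, s):
    """Return (dist, sigma): geodesic distances from s and geodesic counts."""
    n = len(adj)
    dist = [None] * n
    sigma = [0] * n
    dist[s], sigma[s] = 0, 1
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] is None:          # first time w is reached
                dist[w] = dist[u] + 1
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on some geodesic
                sigma[w] += sigma[u]
    return dist, sigma


def freeman_indices(adj):
    """Degree, closeness, betweenness and stress for an undirected graph.

    adj[v] is the set of neighbours of vertex v (vertices are 0..n-1).
    """
    n = len(adj)
    info = [bfs_counts(adj, s) for s in range(n)]
    degree = [len(adj[v]) for v in range(n)]
    closeness = []
    for v in range(n):
        dist = info[v][0]
        if any(d is None for d in dist):   # an infinite distance gives score 0
            closeness.append(0.0)
        else:
            closeness.append((n - 1) / sum(dist))
    betweenness = [0.0] * n
    stress = [0] * n
    for a, b in combinations(range(n), 2):   # unordered pairs (v', v'')
        dist_a, sig_a = info[a]
        if dist_a[b] is None:                # g(v', v'', G) = 0: ratio taken as 0
            continue
        for v in range(n):
            if v in (a, b):
                continue
            dist_v, sig_v = info[v]
            if (dist_a[v] is not None and dist_v[b] is not None
                    and dist_a[v] + dist_v[b] == dist_a[b]):
                through_v = sig_a[v] * sig_v[b]   # g'(a, v, b, G)
                betweenness[v] += through_v / sig_a[b]
                stress[v] += through_v            # stress: denominator set to 1
    return degree, closeness, betweenness, stress
```

On the star $K_{1,3}$, for instance, the centre receives betweenness 3 (it lies on the unique geodesic between each of the three pairs of leaves), while on the 4-cycle each vertex lies on one of the two geodesics joining its neighbours, so its betweenness is 0.5 but its stress is 1.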

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $\mathbf{A}$ (where $\mathbf{A}$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (\mathbf{I} - \beta \mathbf{A})^{-1} \mathbf{A} \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $\mathbf{A}$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is, by convention, set so that the squared Euclidean length of the centrality vector equals $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
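To make the spectral definition concrete, here is a Python sketch of eigenvector centrality for a connected undirected graph, computed by power iteration. Power iteration is our choice of method for illustration; we make no claim about how evcent itself is implemented. Because the adjacency matrix is non-negative and the iteration starts from a positive vector, the result is already non-negative, so no absolute value needs to be taken.

```python
import math


def eigenvector_centrality(adj_matrix, tol=1e-12, max_iter=10000):
    """Principal eigenvector of a symmetric adjacency matrix (unit length).

    Iterates with A + I rather than A: the eigenvectors are unchanged, but the
    shift keeps the iteration from oscillating on bipartite graphs, whose
    spectra contain both lambda and -lambda.
    """
    n = len(adj_matrix)
    x = [1.0] * n
    for _ in range(max_iter):
        y = [x[i] + sum(adj_matrix[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x
```

For the star $K_{1,3}$ the principal eigenvalue is $\sqrt{3}$, and the centre's score is $\sqrt{3}$ times that of each leaf.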

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$, that is, the number of pairs for which $v$ serves as a local bridge. Now let us posit a vector of states $s$ on $V$, such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
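The role definitions translate directly into code. The Python sketch below (illustrative only; it is not sna's brokerage function and omits the null-model moments and z-tests) classifies each locally bridged ordered triad and tallies per-vertex role counts. The role labels reuse the column names that appear in the brokerage output later in this section.

```python
from itertools import permutations


def brokerage_role(si, sj, sk):
    """Gould-Fernandez role of broker j in the locally bridged pair i -> j -> k."""
    if si == sj == sk:
        return "w_I"    # coordinator
    if si == sk:
        return "w_O"    # itinerant (s_i = s_k, s_i != s_j)
    if sj == sk:
        return "b_IO"   # gatekeeper (s_j = s_k, s_i != s_j)
    if si == sj:
        return "b_OI"   # representative (s_i = s_j, s_j != s_k)
    return "b_O"        # liaison (all three states differ)


def brokerage_scores(edges, states):
    """Count, for every vertex v, the ordered triads (i, v, k) that v brokers.

    edges is a set of directed (tail, head) pairs; states[v] is v's group.
    A triad counts when (i, v) and (v, k) are edges but (i, k) is not.
    """
    n = len(states)
    roles = ("w_I", "w_O", "b_IO", "b_OI", "b_O")
    scores = [dict.fromkeys(roles, 0) for _ in range(n)]
    for i, v, k in permutations(range(n), 3):
        if (i, v) in edges and (v, k) in edges and (i, k) not in edges:
            scores[v][brokerage_role(states[i], states[v], states[k])] += 1
    for s in scores:
        s["t"] = sum(s[r] for r in roles)   # total brokerage
    return scores
```

With the single two-path $0 \to 1 \to 2$, for example, vertex 1 is a gatekeeper when the states are (0, 1, 1) and a liaison when they are (0, 1, 2); adding the edge $0 \to 2$ removes the brokerage opportunity altogether.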

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network.

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above results illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector centrality and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782
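The following Python sketch solves the Bonacich system $(\mathbf{I} - \beta\mathbf{A})\mathbf{c} = \mathbf{A}\mathbf{1}$ by Gaussian elimination and rescales the result so that the squared length of the score vector equals $|V|$. It is an independent illustration, not bonpow's source. With $\beta = 0$ the scores reduce to outdegrees times the constant $\sqrt{|V| / \sum_i d_i^2}$; for the outdegrees of dat printed above (whose squares sum to 208), this constant is $\sqrt{10/208} \approx 0.2192645$, matching the ratio reported by bonpow.

```python
import math


def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] + [b[i]] for i, row in enumerate(m)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("I - beta*A is singular for this beta")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def bonacich_power(adj_matrix, beta):
    """c = alpha (I - beta A)^{-1} A 1, with alpha chosen so sum(c_i^2) == n."""
    n = len(adj_matrix)
    a1 = [sum(row) for row in adj_matrix]                 # A 1 = outdegrees
    m = [[(1.0 if i == j else 0.0) - beta * adj_matrix[i][j] for j in range(n)]
         for i in range(n)]                              # I - beta A
    c = solve(m, a1)
    alpha = math.sqrt(n / sum(v * v for v in c))
    return [alpha * v for v in c]
```

On the star $K_{1,3}$ with $\beta = -0.5$, for example, the centre scores positive while every leaf scores negative, at one third of the centre's magnitude: leaves tied to a powerful alter are weak, as the negative-decay interpretation predicts.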

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t     E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000   5.8638  2.7314 -0.3162   0.7518
w_O  25.0000  19.5459  7.0713  0.7713   0.4405
b_IO 18.0000  19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000  19.5459  6.2244 -0.4090   0.6825
b_O  28.0000  23.4551  5.3349  0.8519   0.3943
t    93.0000  87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present; z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

                                    Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i,j,k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v,G) \right) - c(v_i,G) \right], \tag{1}$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \tag{2}$$

where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which uses them to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna, and the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied, respectively, by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree; hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.
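The following Python sketch illustrates two of these indices under an explicitly assumed reading of Krackhardt (1994): connectedness as the fraction of unordered pairs joined by a path once edge direction is ignored, and hierarchy as one minus the fraction of reachable pairs (pairs in which at least one vertex can reach the other) whose members can reach each other. These definitions are our assumption rather than a transcription of sna's code, and efficiency and lubness are not covered.

```python
from itertools import combinations


def reachability(n, edges):
    """reach[i][j] is True when a directed path leads from i to j (i != j)."""
    succ = {i: [] for i in range(n)}
    for i, j in edges:
        succ[i].append(j)
    reach = [[False] * n for _ in range(n)]
    for s in range(n):
        stack = list(succ[s])
        while stack:                      # depth-first search from s
            v = stack.pop()
            if v != s and not reach[s][v]:
                reach[s][v] = True
                stack.extend(succ[v])
    return reach


def krackhardt_connectedness(n, edges):
    """Assumed definition: share of unordered pairs weakly connected by a path."""
    sym = set(edges) | {(j, i) for i, j in edges}   # ignore edge direction
    reach = reachability(n, sym)
    joined = sum(reach[i][j] for i, j in combinations(range(n), 2))
    return joined / (n * (n - 1) / 2)


def krackhardt_hierarchy(n, edges):
    """Assumed definition: 1 - mutually reachable pairs / reachable pairs."""
    reach = reachability(n, edges)
    mutual = linked = 0
    for i, j in combinations(range(n), 2):
        if reach[i][j] or reach[j][i]:
            linked += 1
            mutual += reach[i][j] and reach[j][i]
    return 1 - mutual / linked if linked else float("nan")
```

Under these definitions a directed path is perfectly hierarchical (score 1), whereas a directed cycle, in which every vertex reaches every other, has hierarchy 0.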

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices; hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
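For readers who want to see the arithmetic behind gden, grecip, and gtrans, here is a Python sketch of density, dyadic and edgewise reciprocity, and weak transitivity for a loopless directed graph stored as a set of (tail, head) pairs. It is an independent illustration following the verbal definitions in the text, not sna's implementation; in particular, returning NaN when no two-paths are at risk is a convention chosen for this sketch. As a check against the first graph above: with 6 edges on 10 vertices and no mutual dyads, density is 6/90 ≈ 0.0667 and dyadic reciprocity is 39/45 ≈ 0.8667.

```python
from itertools import permutations


def density(n, edges):
    """Fraction of the n(n-1) possible directed edges (no loops) present."""
    return len(edges) / (n * (n - 1))


def dyadic_reciprocity(n, edges):
    """Fraction of unordered dyads that are symmetric (mutual or null)."""
    symmetric = sum(((i, j) in edges) == ((j, i) in edges)
                    for i in range(n) for j in range(i + 1, n))
    return symmetric / (n * (n - 1) / 2)


def edgewise_reciprocity(edges):
    """Fraction of edges whose reverse is also present."""
    return sum((j, i) in edges for i, j in edges) / len(edges)


def weak_transitivity(n, edges):
    """Fraction of two-paths i -> j -> k (i != k) closed by an edge i -> k."""
    at_risk = closed = 0
    for i, j, k in permutations(range(n), 3):
        if (i, j) in edges and (j, k) in edges:
            at_risk += 1
            closed += (i, k) in edges
    return closed / at_risk if at_risk else float("nan")  # sketch convention
```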

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)  # maximum deviation: one vertex sends ties to all others
+     return((n-1) * choose(n-1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)  # number of outward 2-stars centered on each vertex
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
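The wrapper pattern just described can be mimicked outside R. In this Python sketch (illustrative only; the tmaxdev keyword simply borrows sna's argument name), an index function returns either a score vector or, when asked, its theoretical maximum deviation, and the wrapper applies Equation (1), dividing by that maximum when normalizing. For undirected degree, the maximum deviation over graphs of order $n$ is $(n-1)(n-2)$, attained by the star.

```python
def degree_index(adj, tmaxdev=False):
    """Undirected degree; with tmaxdev=True, its maximum centralization sum."""
    n = len(adj)
    if tmaxdev:
        return (n - 1) * (n - 2)      # star: centre n-1, each of n-1 leaves has 1
    return [len(nbrs) for nbrs in adj]


def centralization(adj, index, normalize=True):
    """Freeman centralization: sum over vertices of (max score - score)."""
    scores = index(adj)
    top = max(scores)
    raw = sum(top - s for s in scores)   # Equation (1)
    if not normalize:
        return raw
    return raw / index(adj, tmaxdev=True)
```

A star on five vertices scores 1 (raw value 12, which also equals $|V|(c^* - \bar{c}) = 5(4 - 1.6)$ as in Equation (2)), a path on four vertices scores 1/3, and any cycle scores 0.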

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V,E)$ be a (possibly directed) graph of order $N$, and let $d(i,j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\left(d(j,k) \le i\right)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
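The structure statistics follow directly from a table of geodesic distances. The Python sketch below (our own illustration, not sna's structure.statistics) obtains distances by breadth-first search and counts, for each $i$, the ordered pairs $(j,k)$ with $d(j,k) \le i$; unreachable pairs never count. Since each vertex is at distance 0 from itself, $s_0$ is always $1/N$.

```python
from collections import deque


def geodesic_distances(n, edges):
    """All-pairs geodesic distances by BFS; None marks unreachable pairs."""
    succ = [[] for _ in range(n)]
    for i, j in edges:
        succ[i].append(j)
    dist = []
    for s in range(n):
        d = [None] * n
        d[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in succ[u]:
                if d[w] is None:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist.append(d)
    return dist


def structure_statistics(n, edges):
    """s_i = N^-2 * #{(j, k): d(j, k) <= i}, for i = 0, ..., N-1."""
    dist = geodesic_distances(n, edges)
    return [sum(d is not None and d <= i for row in dist for d in row) / n ** 2
            for i in range(n)]
```

For a directed path $0 \to 1 \to 2$, for instance, the series is 3/9, 5/9, 6/9: the three pairs running against the direction of the path are never reached.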

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G)$, $s'(H) = s'(G)$, $\ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}\,s(H) = s(G)$, $\mathbf{E}\,s'(H) = s'(G)$, $\ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
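As a small illustration of census statistics, the Python sketch below computes the dyad census of a directed graph and, for undirected graphs, the triad census. In the undirected case the four isomorphism classes are distinguished simply by how many of a triad's three possible edges are present, which reduces the census to a short loop. The sketch is ours, not sna's dyad.census or triad.census, and it omits the 16-class directed triad census.

```python
from itertools import combinations


def dyad_census(n, edges):
    """Counts of mutual, asymmetric and null dyads in a directed graph."""
    counts = {"Mut": 0, "Asym": 0, "Null": 0}
    for i, j in combinations(range(n), 2):
        ties = ((i, j) in edges) + ((j, i) in edges)   # 0, 1 or 2
        counts[("Null", "Asym", "Mut")[ties]] += 1
    return counts


def undirected_triad_census(n, edges):
    """Counts of triads containing 0, 1, 2 and 3 edges in an undirected graph."""
    und = {frozenset(e) for e in edges}
    census = [0, 0, 0, 0]
    for a, b, c in combinations(range(n), 3):
        k = sum(frozenset(p) in und for p in ((a, b), (a, c), (b, c)))
        census[k] += 1
    return census
```

A triangle plus one isolated vertex, for example, yields one complete triad and three triads containing a single edge.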

                                    Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00
R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50
R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43
R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph (CUG) tests. Here, for example, we employ the difference in reciprocities (that of the first graph minus that of the second) as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density (cugtest's default).

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
  p(f(rnd) >= f(d)): 0.299
  p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
  Test Value (f(d)): 0.04444444
  Replications: 1000
  Distribution Summary:
    Min:   -0.3333333
    1stQ:  -0.06666667
    Med:    0
    Mean:  -0.001288889
    3rdQ:   0.06666667
    Max:    0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
  p(f(rnd) >= f(d)): 0.967
  p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
  Test Value (f(d)): 0.04444444
  Replications: 1000
  Distribution Summary:
    Min:   -0.06666667
    1stQ:   0.1555556
    Med:    0.2222222
    Mean:   0.2215333
    3rdQ:   0.2888889
    Max:    0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.
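For readers who want to see the mechanics of such a test, the following dependency-free Python sketch (not sna code, which is R; all function names are hypothetical) runs a Monte Carlo CUG test on the difference in dyadic reciprocity between two digraphs. It assumes that conditioning on order means drawing uniformly from all digraphs of that order (tie probability 0.5), and it approximates conditioning on density by Bernoulli draws at each graph's observed density; cugtest's exact conditioning scheme is not reproduced here.

```python
import random


def dyadic_reciprocity(adj):
    """Share of unordered dyads {i, j} that are symmetric (mutual or null)."""
    n = len(adj)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(adj[i][j] == adj[j][i] for i, j in pairs) / len(pairs)


def density(adj):
    """Share of ordered off-diagonal pairs (i, j) carrying a tie."""
    n = len(adj)
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


def bernoulli_digraph(n, p, rng):
    """Loopless digraph whose off-diagonal cells are independent Bernoulli(p) draws."""
    return [[1 if i != j and rng.random() < p else 0 for j in range(n)]
            for i in range(n)]


def cug_difference_test(g1, g2, stat, condition="order", reps=1000, seed=None):
    """Monte Carlo CUG test of stat(g1) - stat(g2).

    Returns (observed value, Pr(null >= observed), Pr(null <= observed)).
    """
    rng = random.Random(seed)
    observed = stat(g1) - stat(g2)
    if condition == "order":
        p1 = p2 = 0.5                      # uniform over digraphs of this order
    elif condition == "density":
        p1, p2 = density(g1), density(g2)  # density held fixed in expectation only
    else:
        raise ValueError("condition must be 'order' or 'density'")
    draws = [stat(bernoulli_digraph(len(g1), p1, rng))
             - stat(bernoulli_digraph(len(g2), p2, rng))
             for _ in range(reps)]
    p_ge = sum(d >= observed for d in draws) / reps
    p_le = sum(d <= observed for d in draws) / reps
    return observed, p_ge, p_le
```

Calling it once with condition="order" and once with condition="density" mirrors the structure of the two cugtest calls above, although the null distributions will not match cugtest's exactly.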

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005) and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph-theoretic sense and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
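The definition translates directly into code. A minimal Python sketch (hypothetical function names, not part of sna), assuming a digraph stored as a nested 0/1 list in which row i holds vertex i's outgoing ties:

```python
def out_neighbors(adj, v):
    """Vertices that v sends a tie to (loops ignored)."""
    return {u for u in range(len(adj)) if u != v and adj[v][u]}


def in_neighbors(adj, v):
    """Vertices that send a tie to v (loops ignored)."""
    return {u for u in range(len(adj)) if u != v and adj[u][v]}


def structurally_equivalent(adj, v, w):
    """True iff N(v) minus {w} equals N(w) minus {v}, for both out- and in-neighborhoods."""
    return (out_neighbors(adj, v) - {w} == out_neighbors(adj, w) - {v}
            and in_neighbors(adj, v) - {w} == in_neighbors(adj, w) - {v})
```

In an out-star on four vertices, for instance, the three leaves are pairwise structurally equivalent (each receives a tie from the hub and sends none), an instance of equivalent pendant vertices.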

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
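As an illustration of the Hamming option, the Python sketch below compares the row (outgoing) and column (incoming) profiles of two vertices while skipping the cells that involve the pair itself, so that exactly structurally equivalent vertices score 0. This is one convention among several, chosen for the sketch; sedist's own handling of these cells may differ, and the function names are hypothetical.

```python
def se_hamming(adj, i, j):
    """Number of third-party cells on which i and j differ, out-ties and in-ties counted separately."""
    others = [k for k in range(len(adj)) if k not in (i, j)]
    out_diff = sum(adj[i][k] != adj[j][k] for k in others)  # rows: ties sent
    in_diff = sum(adj[k][i] != adj[k][j] for k in others)   # columns: ties received
    return out_diff + in_diff


def se_distance_matrix(adj):
    """Symmetric matrix of se_hamming distances for all vertex pairs (0 on the diagonal)."""
    n = len(adj)
    return [[se_hamming(adj, i, j) if i != j else 0 for j in range(n)]
            for i in range(n)]
```

The resulting matrix can be handed to any hierarchical clustering or multidimensional scaling routine.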


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
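To make the clustering step concrete, here is a small Python sketch of complete-linkage agglomerative clustering (the default method of R's hclust) followed by a cut at height h, roughly what hclust and cutree do together. It is a naive illustration suited only to small inputs, not the R or sna code, and the function name is hypothetical.

```python
def complete_linkage_cut(dist, h):
    """Complete-linkage clustering of a symmetric distance matrix, cut at height h.

    Returns a membership vector; clusters are numbered 1, 2, ... by their
    smallest member so that the labels are deterministic.
    """
    n = len(dist)
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        # Complete linkage: the largest distance between members of the two clusters.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > h:
            break  # every remaining merge would happen above the cut height
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected

    membership = [0] * n
    for label, members in enumerate(sorted(clusters, key=min), start=1):
        for v in members:
            membership[v] = label
    return membership
```

Because complete-linkage merge heights never decrease, stopping at the first merge above h yields the same partition as building the whole tree and then cutting it at h.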

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $(i,j)$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated block of the adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.
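The density block type can be sketched as follows: given a membership vector, each block's value is the mean of the adjacency cells running from one class to another, skipping the diagonal. This Python version is an illustration under stated assumptions (classes are plain labels; a block with no off-diagonal cells, such as a singleton class's self-block, is reported as NaN) and not blockmodel's implementation; the function name is hypothetical.

```python
import math


def density_image(adj, membership):
    """Return {(a, b): density of ties from class a to class b}, diagonal cells excluded."""
    n = len(adj)
    labels = sorted(set(membership))
    image = {}
    for a in labels:
        for b in labels:
            cells = [adj[i][j] for i in range(n) for j in range(n)
                     if i != j and membership[i] == a and membership[j] == b]
            image[(a, b)] = sum(cells) / len(cells) if cells else math.nan
    return image
```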

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
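A corresponding Python sketch of density-based expansion: each new vertex is assigned to a class, and each ordered pair of distinct vertices receives a tie with the interclass density as its Bernoulli probability. The per-class sizes play the role of the rep(2, ...) argument in the example of this section; the function and its arguments are hypothetical, not blockmodel.expand's interface.

```python
import random


def expand_density_image(image, sizes, seed=None):
    """Draw a loopless digraph from a density image.

    image[a][b]: tie probability from class a to class b (0-based classes).
    sizes[a]: number of vertices given to class a in the new graph.
    """
    rng = random.Random(seed)
    cls = [a for a, size in enumerate(sizes) for _ in range(size)]  # class of each new vertex
    n = len(cls)
    return [[1 if i != j and rng.random() < image[cls[i]][cls[j]] else 0
             for j in range(n)] for i in range(n)]
```

Repeating the draw with different seeds yields a Monte Carlo sample from the inhomogeneous Bernoulli model.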

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
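The Boolean product is short to write out. A Python sketch (hypothetical function name, not the %c% operator itself), assuming nested 0/1 adjacency lists with rows as senders:

```python
def compose(a, b):
    """Boolean matrix product: (v, w) is a tie iff some u has a[v][u] and b[u][w]."""
    n = len(a)
    return [[1 if any(a[v][u] and b[u][w] for u in range(n)) else 0
             for w in range(n)] for v in range(n)]
```

Composing a single mutual dyad with itself returns only loops ([[1, 0], [0, 1]]), which is exactly why diagonals matter when reading compositions.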

                                    Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert 1987; Krackhardt 1987a, 1988; Banks and Carley 1994; Butts and Carley 2005; Butts 2007, for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V|-1)} \tag{3}$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, the sums run over all ordered pairs $(i,j)$ with $i \neq j$, and $\mu_X = \left(|V|\,(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
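Equation (3) and the Hamming distance can be written out directly. This Python sketch is an illustration rather than the gcor, gcov, or hdist code; it assumes digraphs stored as nested 0/1 lists and, like Equation (3), uses only the $|V|(|V|-1)$ off-diagonal cells. The helper names are hypothetical.

```python
import math


def _offdiag(adj):
    """Off-diagonal cells of an adjacency matrix, in row-major order."""
    n = len(adj)
    return [adj[i][j] for i in range(n) for j in range(n) if i != j]


def graph_cov(g, h):
    """Graph covariance of Equation (3): mean cross-product of centered off-diagonal cells."""
    x, y = _offdiag(g), _offdiag(h)
    m = len(x)  # |V|(|V| - 1) ordered pairs
    mx, my = sum(x) / m, sum(y) / m
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / m


def graph_cor(g, h):
    """Graph correlation: covariance scaled by the two graph standard deviations."""
    return graph_cov(g, h) / math.sqrt(graph_cov(g, g) * graph_cov(h, h))


def hamming(g, h):
    """Number of ordered off-diagonal cells on which g and h differ."""
    return sum(a != b for a, b in zip(_offdiag(g), _offdiag(h)))
```

A directed 3-cycle and its reversal disagree on every off-diagonal cell, so their Hamming distance is 6 and their graph correlation is -1.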

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
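Because the total Hamming distance to a set of graphs is a sum of independent per-cell mismatch counts, a central graph can be found cell by cell: include a tie exactly when it appears in more than half of the input graphs. A Python sketch built on that reasoning (hypothetical name; at an exact 50/50 split either choice is optimal and this version picks 0, which need not match centralgraph's own tie handling):

```python
def central_graph(graphs):
    """Cell-wise majority graph: minimizes the summed Hamming distance to `graphs`.

    Diagonal cells are left at 0; a cell held by exactly half of the graphs is set to 0
    (either value is optimal there).
    """
    k = len(graphs)
    n = len(graphs[0])
    return [[1 if i != j and 2 * sum(g[i][j] for g in graphs) > k else 0
             for j in range(n)] for i in range(n)]
```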

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net-prefixed versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available where conditioning on the entire observed structure is inappropriate.
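The QAP logic for a bivariate statistic can be sketched in Python as follows (not sna's qaptest; names are hypothetical). The vertices of one graph are relabeled at random, permuting its rows and columns together so that its structure is preserved, and the observed graph correlation is compared with the resulting reference distribution.

```python
import math
import random


def _offdiag(adj):
    n = len(adj)
    return [adj[i][j] for i in range(n) for j in range(n) if i != j]


def graph_cor(g, h):
    """Pearson correlation of the off-diagonal cells of g and h."""
    x, y = _offdiag(g), _offdiag(h)
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def qap_cor_test(g, h, reps=1000, seed=None):
    """Returns (observed correlation, Pr(null >= observed), Pr(null <= observed))."""
    rng = random.Random(seed)
    n = len(h)
    observed = graph_cor(g, h)
    draws = []
    for _ in range(reps):
        perm = list(range(n))
        rng.shuffle(perm)
        # Relabel h's vertices: rows and columns are permuted together.
        h_perm = [[h[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        draws.append(graph_cor(g, h_perm))
    p_ge = sum(d >= observed for d in draws) / reps
    p_le = sum(d <= observed for d in draws) / reps
    return observed, p_ge, p_le
```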


                                    Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the structural (unlabeled) correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4*x[2,,] + 2*x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4*x[2,,] + 2*x[3,,]
R> yp <- apply(yl, c(1, 2), function(a) 1/(1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
  352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
  (Dn-Dr)/(Dn-Dr+dfn): 0.481324
  (Dn-Dr)/Dn: 0.6694004
Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

  Total Fraction Correct: 0.8973684
  Fraction Predicted 1s Correct: 0.8973684
  Fraction Predicted 0s Correct: NaN
  False Negative Rate: 0
  False Positive Rate: 1

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties: it predicts a tie for all 380 dyads, so none of the 39 null dyads is recovered (a false positive rate of 1). This is largely a consequence of the high density in the dependent graph (approximately 0.90) and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values and that the QAP test correctly identifies the irrelevant predictor (x4).

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data-analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which it reports, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (and the latter may be omitted if desired); estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
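To see how this reporting model turns informant reports into an edge estimate, consider the special case in which the error rates are known and the graph prior is an independent Bernoulli for each edge; the posterior then factors over edges. The Python sketch below (hypothetical names; a closed-form calculation, not bbnam's Gibbs sampler, which also learns the error rates) uses report probabilities of 1 - em for a true tie and ep for a true null, matching the way the simulated data are generated in the example of this section.

```python
def edge_posterior(reports, ep, em, prior=0.5):
    """Posterior probability that a single edge is present, given informant reports.

    reports[k]: informant k's report (1 = tie reported, 0 = no tie reported).
    ep[k], em[k]: informant k's false positive and false negative rates (treated as known).
    prior: prior probability of the edge (independent Bernoulli graph prior).
    """
    weight_tie = prior          # prior times likelihood if the edge is present
    weight_null = 1.0 - prior   # prior times likelihood if the edge is absent
    for y, fp, fn in zip(reports, ep, em):
        weight_tie *= (1.0 - fn) if y else fn
        weight_null *= fp if y else (1.0 - fp)
    return weight_tie / (weight_tie + weight_null)
```

With ep = em = 0.9 for a single informant, a reported tie pushes the posterior below the prior (to 0.1), an instance of the negatively informative reports the Butts (2003) model allows.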

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
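The conditional tie probability above is straightforward to evaluate once the bias event probabilities and counts are given. A Python sketch (hypothetical names; it evaluates the formula only and does not estimate parameters as bn does):

```python
def biased_net_tie_prob(p_base, bias_probs, bias_counts):
    """Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k)) ** s_k(i, j, T).

    p_base: baseline probability Pr(B_e).
    bias_probs[k]: probability Pr(B_k) of bias event type k.
    bias_counts[k]: s_k(i, j, T), how many type-k bias events could have occurred for (i, j).
    """
    prob_no_tie = 1.0 - p_base
    for p_k, s_k in zip(bias_probs, bias_counts):
        prob_no_tie *= (1.0 - p_k) ** s_k  # every potential bias event must fail
    return 1.0 - prob_no_tie
```

For example, with a baseline probability of 0.1 and a single bias type of probability 0.5 that has had two opportunities to occur for the dyad, the tie probability is 1 - 0.9 * 0.5^2 = 0.775.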

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990, and Cliff and Ord 1973; Anselin 1988, for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

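$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$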

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
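As a concrete example of one of the indices mentioned above, Moran's I for an attribute vector $y$ and weight matrix $W$ is $I = (n / S_0) \sum_{i \neq j} w_{ij} (y_i - \bar{y})(y_j - \bar{y}) / \sum_i (y_i - \bar{y})^2$, where $S_0$ is the sum of the off-diagonal weights. A Python sketch of that textbook formula (hypothetical name; nacf's options and normalizations are not reproduced):

```python
def morans_i(y, w):
    """Moran's I of attribute vector y under weight matrix w (diagonal of w ignored)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    s0 = sum(w[i][j] for i, j in pairs)                     # total weight
    num = sum(w[i][j] * dev[i] * dev[j] for i, j in pairs)  # weighted cross-products
    den = sum(d * d for d in dev)                           # total squared deviation
    return (n / s0) * num / den
```

On an undirected path of four vertices, the attribute (1, 1, 0, 0) gives I = 1/3 (neighbors tend to agree), while the alternating attribute (1, 0, 1, 0) gives I = -1.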


                                    Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[,1] <- 2
R> pem[,2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[,1] <- 2
R> pep[,2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network Estimation/Informant Accuracy

                                    Multiple Error Probability Model

                                    Marginal Posterior Network Distribution

                                    a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

                                    Journal of Statistical Software 41

                                    a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

                                    a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

                                    Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

                                    Marginal Posterior Error Distribution (by observer)

                                    Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

                                    Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

                                    MCMC Diagnostics

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20   Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
   Max: 1.003116
   Med: 0.9992194
   IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
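Why em + ep > 1 is "perverse" can be seen from a single dyad. The sketch below applies Bayes' rule to one tie under a simplifying assumption: the informants' error rates are known and their reports independent (bbnam itself treats the rates as unknown and samples them, together with the network, by MCMC). The helper dyad_posterior is hypothetical.

```r
# Hypothetical helper: posterior probability that one tie exists, given
# independent 0/1 reports from informants with KNOWN false-negative (em)
# and false-positive (ep) rates. A simplification of bbnam, which instead
# treats the error rates as unknown and samples them by MCMC.
dyad_posterior <- function(reports, em, ep, prior = 0.5) {
  # If the tie is present, an informant reports it with probability 1 - em.
  lik_tie <- prod(ifelse(reports == 1, 1 - em, em))
  # If the tie is absent, an informant reports it with probability ep.
  lik_none <- prod(ifelse(reports == 1, ep, 1 - ep))
  prior * lik_tie / (prior * lik_tie + (1 - prior) * lik_none)
}

dyad_posterior(1, em = 0.3, ep = 0.05)  # about 0.933: em + ep < 1
dyad_posterior(1, em = 0.9, ep = 0.6)   # about 0.143: em + ep > 1 ("perverse")
dyad_posterior(1, em = 0.4, ep = 0.6)   # 0.5: em + ep = 1, report is uninformative
```

A reported tie raises the posterior only when em + ep < 1; at em + ep = 1 the report carries no information, and beyond that boundary it points the wrong way, which is why priors with much mass there can mislead the estimator.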

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection LAS, the central graph, and the Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
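The difference between the simpler rules can be made concrete with a base-R sketch of the LAS intersection, LAS union, and central graph rules as commonly defined: the LAS rules consult only the two endpoints' own reports of a tie, while the central graph takes a majority vote over all informants. The helpers las_consensus and central_graph are hypothetical, and the strict-majority threshold is an assumption of this sketch rather than a description of consensus()'s internals.

```r
# Hypothetical base-R helpers (not sna's consensus() internals).
# dat[k, i, j] is informant k's report (0/1) of the tie i -> j; the
# informants are assumed to be the network's own actors, so dim(dat)
# is c(n, n, n).
las_consensus <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      from_i <- dat[i, i, j]  # the sender's own report of i -> j
      from_j <- dat[j, i, j]  # the receiver's own report of i -> j
      out[i, j] <- if (rule == "intersection") min(from_i, from_j) else max(from_i, from_j)
    }
  }
  out
}

# Central graph sketch: keep a tie when a strict majority of all
# informants report it (the exact threshold is an assumption).
central_graph <- function(dat) {
  (apply(dat, c(2, 3), mean) > 0.5) * 1
}

# Usage: informant 2 omits the tie 1 -> 2 that informants 1 and 3 report.
dat <- array(0, dim = c(3, 3, 3))
dat[1, 1, 2] <- 1
dat[3, 1, 2] <- 1
las_consensus(dat, "intersection")[1, 2]  # 0: one omission deletes the tie
las_consensus(dat, "union")[1, 2]         # 1
central_graph(dat)[1, 2]                  # 1: 2 of 3 informants agree
```

Under the intersection rule a single false negative by either endpoint removes the tie, which is why that rule compounds omission errors when em is high.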

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
   Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
   Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
   Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
   AIC: -100.9 BIC: -85.65

   Null model: meanstd
   Null log likelihood: -82.48 on 48 degrees of freedom
   AIC: 169.0 BIC: 172.8
   AIC difference (model versus null): 269.9
   Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean are designated by red edges. Sample output for the above example is provided in Figure 6.
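The two-standard-deviation flagging rule described above can be sketched in base R. What the text does not specify is the definition of net influence itself; this sketch ASSUMES, for a pure network AR model $y = \rho W y + X\beta + \nu$, that the influence of vertex i on vertex j is entry [j, i] of $(I - \rho W)^{-1}$, the multiplier that carries a shock at i into j's response through all network paths. The function net_influence_flags and that definition are hypothetical and may not match what sna's plot method for lnam objects computes.

```r
# Hypothetical helper, assuming a pure network AR model
#   y = rho * W %*% y + X %*% beta + nu,
# and ASSUMING that the net influence of vertex i on vertex j is entry
# [j, i] of (I - rho * W)^{-1}. This definition is illustrative and may
# differ from the one used by sna's plot method for lnam objects.
net_influence_flags <- function(W, rho) {
  n <- nrow(W)
  M <- solve(diag(n) - rho * W)  # total multipliers over all network paths
  infl <- t(M)                   # infl[i, j]: influence of i on j
  diag(infl) <- NA               # ignore self-influence
  m <- mean(infl, na.rm = TRUE)
  s <- sd(as.vector(infl), na.rm = TRUE)
  list(
    green = !is.na(infl) & infl >= m + 2 * s,  # at least 2 SD above the mean
    red   = !is.na(infl) & infl <= m - 2 * s   # at least 2 SD below the mean
  )
}

# Usage: in a 10-vertex network, only y_2 depends (directly) on y_1.
W <- matrix(0, 10, 10)
W[2, 1] <- 1
flags <- net_influence_flags(W, rho = 0.5)
which(flags$green, arr.ind = TRUE)  # one pair: row 1, col 2 (1 influences 2)
sum(flags$red)                      # 0
```

Using the full inverse rather than W alone means indirect paths (i to k to j) also count toward i's influence on j, which matches the text's description of "total" influence.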

                                    3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines which covers many of the methods currently seeing wide use within the field. It is hoped that including such tools, together with the other packages of the statnet ensemble, in a freely available and widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                    Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q–Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.

                                    References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                    Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25



Figure 5: Examples of the use of gplot supplemental functions. Panels: gplot.vertex Example; gplot.arrow Example; gplot.loop Example.

all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.
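A minimal base-R sketch of the distinction, assuming a directed graph stored as a 0/1 adjacency matrix: the degree variants return one value per vertex (node-level indices), while density returns a single number for the whole graph (a graph-level index). The helper names are hypothetical; in sna, degree() and gden() play these roles.

```r
# Hypothetical base-R helpers for a directed graph stored as a 0/1
# adjacency matrix A, where A[i, j] = 1 means a tie from i to j.
outdegree <- function(A) rowSums(A)   # NLI: |N+(v)| for every vertex v
indegree  <- function(A) colSums(A)   # NLI: |N-(v)| for every vertex v
freeman_degree <- function(A) outdegree(A) + indegree(A)  # NLI: total degree

# GLI: one number for the whole graph, the share of possible
# (non-loop) directed ties that are present.
graph_density <- function(A) {
  n <- nrow(A)
  sum(A[row(A) != col(A)]) / (n * (n - 1))
}

# Usage: ties 1 -> 2, 1 -> 3, 2 -> 3
A <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0), nrow = 3, byrow = TRUE)
outdegree(A)       # 2 1 0   (one value per vertex)
indegree(A)        # 0 1 2
freeman_degree(A)  # 2 2 2
graph_density(A)   # 0.5     (one value per graph: 3 of 6 possible ties)
```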

                                      Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph-theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_d^+(v,G) \equiv |N^+(v)|$), indegree ($c_d^-(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_d^t(v,G) \equiv c_d^+(v,G) + c_d^-(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $g'(v',v,v'',G)/g(v',v'',G)$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

where $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to at least one other vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
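To make the definitions of degree, betweenness, stress, and closeness concrete, here is a minimal base-R sketch that works directly from a binary adjacency matrix. It is not sna's implementation: the helper names geo_counts and freeman_indices are hypothetical, loops are assumed absent, and betweenness is summed over ordered pairs $(v', v'')$ as in the digraph case (so for a symmetric matrix each unordered pair is counted twice). Geodesic counts come from a breadth-first search from every vertex; $v$ lies on a $(v', v'')$ geodesic exactly when $d(v',v) + d(v,v'') = d(v',v'')$, in which case it lies on $g(v',v)\,g(v,v'')$ of them.

```r
# Sketch only: Freeman-style node-level indices from a binary adjacency matrix.
# A[i, j] > 0 means an edge i -> j; loops are assumed absent.

# Breadth-first search from every source s.
# dist[s, k]  = geodesic distance from s to k (Inf if unreachable)
# count[s, k] = number of distinct s -> k geodesics
geo_counts <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  S <- matrix(0, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    S[s, s] <- 1
    frontier <- s
    while (length(frontier) > 0) {
      nxt <- integer(0)
      for (u in frontier) {
        for (w in which(A[u, ] > 0)) {
          if (is.infinite(D[s, w])) {        # first time w is reached from s
            D[s, w] <- D[s, u] + 1
            nxt <- c(nxt, w)
          }
          if (D[s, w] == D[s, u] + 1) {      # u immediately precedes w on a geodesic
            S[s, w] <- S[s, w] + S[s, u]
          }
        }
      }
      frontier <- unique(nxt)
    }
  }
  list(dist = D, count = S)
}

freeman_indices <- function(A) {
  n <- nrow(A)
  g <- geo_counts(A)
  D <- g$dist
  S <- g$count
  betw <- numeric(n)
  stress <- numeric(n)
  for (v in seq_len(n)) {
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (i == k || i == v || k == v || S[i, k] == 0) next
        if (D[i, v] + D[v, k] == D[i, k]) {  # v lies on at least one i -> k geodesic
          through <- S[i, v] * S[v, k]       # number of i -> k geodesics via v
          stress[v] <- stress[v] + through
          betw[v] <- betw[v] + through / S[i, k]
        }
      }
    }
  }
  list(
    indegree    = colSums(A > 0),
    outdegree   = rowSums(A > 0),
    closeness   = (n - 1) / rowSums(D),      # 0 if any vertex is unreachable
    betweenness = betw,
    stress      = stress
  )
}

# Undirected path 1 - 2 - 3 (symmetric adjacency matrix).
A <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)
res <- freeman_indices(A)
res$betweenness  # 0 2 0: vertex 2 bridges the ordered pairs (1, 3) and (3, 1)
```

On a 4-cycle, each vertex lies on one of the two geodesics between its two non-neighbours, so its betweenness is 1 while its stress is 2; this is exactly the difference the denominator makes.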

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $\mathbf{A}$ (where $\mathbf{A}$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (\mathbf{I} - \beta \mathbf{A})^{-1} \mathbf{A} \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $\mathbf{A}$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so that the squared centrality scores sum to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
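The two spectral formulas above can be checked numerically with a minimal base-R sketch. The helper names bonacich_power and eigen_centrality are hypothetical and this is not the bonpow or evcent source; the sketch assumes a square numeric adjacency matrix for which $\mathbf{I} - \beta\mathbf{A}$ is invertible, and it applies the scaling convention that the squared scores sum to $|V|$. With $\beta = 0$ the scores are proportional to outdegree (row sums of $\mathbf{A}$), and as $\beta$ approaches $1/\lambda_1$ they approach the principal eigenvector.

```r
# Sketch only: Bonacich power and eigenvector centrality in base R.

# c_bp = alpha * (I - beta A)^{-1} A 1, with alpha chosen so that sum(c_bp^2) = n.
bonacich_power <- function(A, beta) {
  n <- nrow(A)
  raw <- rowSums(solve(diag(n) - beta * A) %*% A)  # (I - beta A)^{-1} A 1
  raw * sqrt(n / sum(raw^2))
}

# Absolute value of the principal eigenvector of A (unit Euclidean length).
eigen_centrality <- function(A) {
  abs(Re(eigen(A)$vectors[, 1]))
}

# Undirected path 1 - 2 - 3.
A <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)
bonacich_power(A, 0) / rowSums(A)       # constant ratio: proportional to degree
lambda1 <- max(Re(eigen(A)$values))
bonacich_power(A, 0.999 / lambda1)      # close to the rescaled eigenvector below
eigen_centrality(A) * sqrt(nrow(A))     # eigenvector under the same scaling
bonacich_power(A, -0.25)                # negative beta: ties to weakly tied alters pay off
```

Using solve keeps the sketch close to the closed form; for large graphs, solving the linear system against $\mathbf{A}\mathbf{1}$ directly would avoid forming the full inverse.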

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v',v'')$ such that $(v',v),(v,v'') \in E$ and $(v',v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now let us posit a vector of states $s$ on $V$, such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i,v_j,v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
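The role definitions above translate directly into a triple loop over ordered triads. The following base-R sketch uses a hypothetical helper, broker_roles, and reports raw counts only, not the Gould-Fernandez moments or z-scores that brokerage also returns. It classifies each locally bridged triad $(v_i, v_j, v_k)$ by the states of its members and credits the broker $v_j$.

```r
# Sketch only: raw Gould-Fernandez brokerage role counts.
# A: binary adjacency matrix (A[i, j] > 0 means i -> j); s: vector of vertex states.
broker_roles <- function(A, s) {
  n <- nrow(A)
  roles <- c("coordinating", "itinerant", "gatekeeping", "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {                  # j is the candidate broker
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (i == j || j == k || i == k) next
        # j locally bridges the ordered pair (i, k): i -> j -> k but not i -> k
        if (A[i, j] > 0 && A[j, k] > 0 && A[i, k] == 0) {
          role <- if (s[i] == s[j] && s[j] == s[k]) "coordinating" else
                  if (s[i] == s[k]) "itinerant" else
                  if (s[j] == s[k]) "gatekeeping" else
                  if (s[i] == s[j]) "representative" else "liaison"
          out[j, role] <- out[j, role] + 1
        }
      }
    }
  }
  cbind(out, total = rowSums(out))
}

# Directed path 1 -> 2 -> 3 with vertices 1 and 3 in the same group.
A <- matrix(0, 3, 3)
A[1, 2] <- 1
A[2, 3] <- 1
broker_roles(A, s = c("a", "b", "a"))   # vertex 2 acts once as an itinerant broker
```

The cascade of if/else tests mirrors the definitions: once "all states equal" is ruled out, each remaining condition identifies which pair of the three states coincides.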

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network.

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
[1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
[1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
[1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
[1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
[8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
[1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
[7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
[1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
[1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
[8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
[1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
[8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
[1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
[9] 3.077481 3.704181

As the above illustrates, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
[1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
[8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1/eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
[1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
[7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed network-wide brokerage scores for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present; z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

                                      Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i,j,k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \max_{v \in V} c(v,G) - c(v_i,G) \right] \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centrality score obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which uses them to generate the underlying score vector; in the normalized case, the centrality function is also asked to return the theoretical maximum deviation. This is handled transparently for all included centrality functions within sna, and the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices, supplied respectively by connectedness, efficiency, hierarchy, and lubness, describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582

R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0
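For reference, the density, reciprocity, and weak transitivity definitions given earlier can be written for a single binary digraph in a few lines of base R. This is a sketch under stated assumptions, with hypothetical helper names (density_dg, recip_dg, weak_trans): it drops loops, does not handle missing data, and returns NA for transitivity when the graph has no two-paths. It should therefore not be expected to reproduce every option or edge case of gden, grecip, or gtrans.

```r
# Sketch only: density, reciprocity, and weak transitivity for one binary digraph.
# Loops are dropped and missing data are not handled.
density_dg <- function(A) {
  n <- nrow(A)
  B <- A > 0
  diag(B) <- FALSE
  sum(B) / (n * (n - 1))                 # observed / possible ordered pairs
}

recip_dg <- function(A, measure = c("dyadic", "edgewise")) {
  measure <- match.arg(measure)
  B <- A > 0
  diag(B) <- FALSE
  if (measure == "dyadic") {
    mean((B == t(B))[upper.tri(B)])      # share of dyads that are mutual or null
  } else {
    sum(B & t(B)) / sum(B)               # share of edges that are reciprocated
  }
}

weak_trans <- function(A) {
  B <- (A > 0) * 1
  diag(B) <- 0
  P <- B %*% B                           # P[i, k] = number of two-paths i -> j -> k
  diag(P) <- 0                           # ignore i -> j -> i
  if (sum(P) == 0) return(NA_real_)      # no triads at risk
  sum(P * B) / sum(P)                    # share of two-paths closed by i -> k
}

A <- matrix(0, 3, 3)
A[1, 2] <- 1
A[2, 3] <- 1                             # two-path 1 -> 2 -> 3, not closed
c(density = density_dg(A), dyadic = recip_dg(A),
  edgewise = recip_dg(A, "edgewise"), weak = weak_trans(A))
```

Counting two-paths with the matrix product keeps weak transitivity to a single line; strong transitivity would additionally require checking that every edge $(i,k)$ is supported by at least one two-path.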

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)
+     return((n-1) * choose(n-1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)
+ }

R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
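Equations (1) and (2), together with the normalization step, can be sketched in base R independently of sna. The helper freeman_centralization is hypothetical and takes a vector of centrality scores plus an optional theoretical maximum deviation. The second helper encodes an assumption derived from the star argument above rather than from the sna source: for outdegree on $n$-vertex digraphs without loops, the maximum deviation is $(n-1)^2$, attained by an out-star.

```r
# Sketch only: Freeman centralization from a vector of centrality scores.
freeman_centralization <- function(scores, max_dev = NULL) {
  raw <- sum(max(scores) - scores)                 # equation (1)
  if (is.null(max_dev)) raw else raw / max_dev     # optional normalization
}

# Assumed maximum deviation for outdegree on n-vertex digraphs without loops:
# an out-star (one vertex sending to all others) gives (n - 1) * (n - 1).
outdegree_max_dev <- function(n) (n - 1)^2

scores <- c(3, 1, 0, 0)
freeman_centralization(scores)                          # raw C(G) = 8
length(scores) * (max(scores) - mean(scores))           # equation (2): also 8
freeman_centralization(scores, outdegree_max_dev(4))    # normalized: 8/9
```

As with centralization itself, the index function supplies the scores and the maximum deviation, so the same helper serves any node-level index whose theoretical maximum is known.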

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V,E)$ be a (possibly directed) graph of order $N$, and let $d(i,j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\big(d(j,k) \le i\big)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
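The structure statistics translate almost literally into base R. This sketch is not sna's structure.statistics, and the helper names geodesic_dist and structure_stats are hypothetical. It assumes an unweighted adjacency matrix and computes all geodesic distances with the Floyd-Warshall recurrence, a swapped-in choice (breadth-first search would serve equally well). Unreachable pairs are left at distance Inf, so they never count toward any $s_i$.

```r
# Sketch only: Fararo-Sunshine structure statistics s_0, ..., s_{N-1}.
# All-pairs geodesic distances via the Floyd-Warshall recurrence.
geodesic_dist <- function(A) {
  n <- nrow(A)
  D <- ifelse(A > 0, 1, Inf)
  diag(D) <- 0
  for (k in seq_len(n)) {
    D <- pmin(D, outer(D[, k], D[k, ], "+"))   # allow paths through vertex k
  }
  D
}

structure_stats <- function(A) {
  n <- nrow(A)
  D <- geodesic_dist(A)
  s <- vapply(0:(n - 1), function(i) sum(D <= i) / n^2, numeric(1))
  setNames(s, 0:(n - 1))
}

A <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)
structure_stats(A)   # 3/9, 7/9, 1: s_0 = 1/N, rising to 1 once all pairs are in range
```

Because every vertex is at distance 0 from itself, $s_0 = 1/N$ always (0.10 for the 10-vertex example later in this section), and $s_1 = (N + |E|)/N^2$ for a loopless graph.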

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $E[s(H)] = s(G), E[s'(H)] = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
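The dyad census and the logic of a one-tailed CUG test can be sketched in base R as follows. This is not cugtest, and the helper names (dyad_census_dg, random_digraph_m, cug_mutual) are hypothetical. The null model draws digraphs uniformly given order and edge count, one of the conditioning choices mentioned above, and the test statistic is the number of mutual dyads. The two returned p-values are Monte Carlo estimates of $\Pr(t(H) \ge t(G))$ and $\Pr(t(H) \le t(G))$.

```r
# Sketch only: dyad census and a Monte Carlo CUG test in base R.
dyad_census_dg <- function(A) {
  B <- A > 0
  diag(B) <- FALSE
  up <- upper.tri(B)
  mut  <- sum((B & t(B))[up])        # both directions present
  asym <- sum(xor(B, t(B))[up])      # exactly one direction present
  c(Mut = mut, Asym = asym, Null = sum(up) - mut - asym)
}

# Uniform draw from digraphs with n vertices and exactly m edges (no loops).
random_digraph_m <- function(n, m) {
  A <- matrix(0, n, n)
  off_diag <- which(row(A) != col(A))
  A[off_diag[sample.int(length(off_diag), m)]] <- 1
  A
}

# One-tailed CUG p-values for the mutual-dyad count, conditioning on order and edges.
cug_mutual <- function(A, reps = 1000) {
  n <- nrow(A)
  m <- sum(A > 0 & row(A) != col(A))
  obs <- dyad_census_dg(A)[["Mut"]]
  sims <- replicate(reps, dyad_census_dg(random_digraph_m(n, m))[["Mut"]])
  c(observed = obs,
    p_upper = mean(sims >= obs),     # Pr(t(H) >= t(G))
    p_lower = mean(sims <= obs))     # Pr(t(H) <= t(G))
}

set.seed(1)
A <- matrix(0, 6, 6)
A[cbind(c(1, 2, 3, 4, 5), c(2, 1, 4, 3, 6))] <- 1   # two mutual dyads, one asymmetric
dyad_census_dg(A)                                   # Mut 2, Asym 1, Null 12
cug_mutual(A, reps = 500)
```

Swapping random_digraph_m for a Bernoulli draw with edge probability equal to the observed density would give the weaker, expectation-matching null described above instead of exact conditioning on the edge count.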

                                      Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00
R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50
R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

                                      30 Social Network Analysis with sna

R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph (CUG) tests. Here, for example, we employ the difference in reciprocity between two graphs as a test statistic, first testing against a CUG hypothesis that conditions only on order, and then against one that conditions on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:  0
        Mean: -0.001288889
        3rdQ: 0.06666667
        Max:  0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ: 0.1555556
        Med:  0.2222222
        Mean: 0.2215333
        3rdQ: 0.2888889
        Max:  0.5333333

A broader range of similar Monte Carlo tests can be performed by comparing observed statistics against those arising from rgbn, rguman, or other models included in the package.
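The logic of a CUG test can be illustrated without sna. The following base-R sketch is our own illustration, not the code of cugtest: it compares a statistic on an observed digraph against the same statistic on uniform random digraphs with the same order and exact number of edges (one way of conditioning on density). The reciprocity definition and all function names are assumptions made for the example.

```r
# Sketch of a one-graph CUG test (base R only; helper names are hypothetical).
# Statistic: "dyadic" reciprocity, taken here to mean the share of unordered
# dyads that are symmetric (mutual or null).
dyadic_recip <- function(a) {
  ut <- upper.tri(a)
  mean(a[ut] == t(a)[ut])
}

# Uniform random digraph on n vertices with exactly m edges and no loops;
# this conditions on order and density (via the exact edge count).
rand_digraph_m <- function(n, m) {
  a <- matrix(0L, n, n)
  off <- which(row(a) != col(a))   # linear indices of off-diagonal cells
  a[sample(off, m)] <- 1L
  a
}

cug_sketch <- function(a, stat, reps = 1000) {
  n <- nrow(a)
  m <- sum(a[row(a) != col(a)])
  obs <- stat(a)
  sims <- replicate(reps, stat(rand_digraph_m(n, m)))
  c(obs  = obs,
    p_ge = mean(sims >= obs),      # estimate of Pr(f(rnd) >= f(d))
    p_le = mean(sims <= obs))      # estimate of Pr(f(rnd) <= f(d))
}

# Example: a 10-vertex ring in which every tie is reciprocated.
set.seed(1)
ring <- matrix(0L, 10, 10)
for (i in 1:10) {
  j <- i %% 10 + 1
  ring[i, j] <- 1L
  ring[j, i] <- 1L
}
res <- cug_sketch(ring, dyadic_recip, reps = 500)
res
```

Because every tie in the ring is reciprocated, essentially no random graph with the same edge count reaches the observed reciprocity, so the upper-tail estimate is near 0. Conditioning only on order would instead draw each possible edge with probability 0.5.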

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005) and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$, i.e., when $v$ and $v'$ have the same alters. In the directed case, this same general property is required (mutatis mutandis) to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph-theoretic sense and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
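To make the idea concrete, the base-R sketch below computes a Hamming-type structural equivalence distance between every pair of vertices and feeds it to hclust. It is an illustration under our own conventions, not sna's code: in particular, skipping the cells that involve the pair itself (mirroring the exclusion of $v$ and $v'$ in the definition of structural equivalence) is our choice, and sedist's treatment of those cells may differ. The function name is hypothetical.

```r
# Structural-equivalence distances via Hamming distance (base R sketch).
# d[u, v] counts third parties k whose ties to and from u differ from
# those to and from v; cells involving u and v themselves are skipped.
se_hamming <- function(a) {
  n <- nrow(a)
  d <- matrix(0, n, n)
  for (u in seq_len(n)) {
    for (v in seq_len(n)) {
      k <- setdiff(seq_len(n), c(u, v))
      d[u, v] <- sum(a[u, k] != a[v, k]) + sum(a[k, u] != a[k, v])
    }
  }
  d
}

# Vertices 1 and 2 both send to 3 and 4 and receive from 5, so they are
# structurally equivalent; so are 3 and 4 (both receive from 1 and 2 only).
a <- matrix(0L, 5, 5)
a[1, c(3, 4)] <- 1L
a[2, c(3, 4)] <- 1L
a[5, c(1, 2)] <- 1L
d <- se_hamming(a)
hc <- hclust(as.dist(d))      # complete-linkage clustering of the distances
cl <- cutree(hc, k = 3)
cl
```

Cutting the tree into three groups recovers the two exact equivalence classes plus vertex 5 on its own.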


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions; by default, it uses structural equivalence. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (the mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
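The density block type can be illustrated directly. In this base-R sketch (our own, with a hypothetical function name, not blockmodel's implementation), a membership vector assigns vertices to classes, and each block's value is the mean of the corresponding off-diagonal adjacency cells, in keeping with the convention of neglecting the diagonal.

```r
# Density block image from a membership vector (base R sketch).
# Block (r, s) is the mean of a[i, j] over i in class r and j in class s,
# skipping the diagonal cells a[i, i].
block_density <- function(a, memb) {
  cls <- sort(unique(memb))
  off <- row(a) != col(a)
  img <- matrix(NA_real_, length(cls), length(cls),
                dimnames = list(cls, cls))
  for (r in seq_along(cls)) {
    for (s in seq_along(cls)) {
      cells <- outer(memb == cls[r], memb == cls[s], "&") & off
      if (any(cells)) img[r, s] <- mean(a[cells])
    }
  }
  img
}

# Vertices 1-3 send every possible tie to vertices 4-5 and nothing else.
a <- matrix(0L, 5, 5)
a[1:3, 4:5] <- 1L
img <- block_density(a, memb = c(1, 1, 1, 2, 2))
img
```

A class with a single member has no off-diagonal cells in its own block, so that entry is left as NA.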

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability and drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
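The expansion step can likewise be sketched in base R: each new vertex is assigned to a class, and each potential edge is an independent Bernoulli draw whose probability is the density of the block linking the two classes. The function below is a hypothetical stand-in, not blockmodel.expand itself; it assumes the block image is a plain numeric matrix and that `sizes` gives the number of vertices to create in each class, which we take to play the role of the class-size vector passed to blockmodel.expand in the example below.

```r
# Density blockmodel expansion (base R sketch; function name hypothetical).
#   img:   block image of interclass densities (classes x classes)
#   sizes: number of vertices to generate in each class
expand_density <- function(img, sizes) {
  memb <- rep(seq_along(sizes), times = sizes)  # class of each new vertex
  p <- img[memb, memb]                          # Bernoulli parameter matrix
  n <- length(memb)
  g <- matrix(rbinom(n * n, 1, p), n, n)        # independent edge draws
  diag(g) <- 0L                                 # drop loops
  g
}

set.seed(42)
img <- matrix(c(0.9, 0.1,
                0.0, 0.5), nrow = 2, byrow = TRUE)
g <- expand_density(img, sizes = c(3, 2))
g
```

Calling the function repeatedly yields a sample of graphs from the inhomogeneous Bernoulli model implied by the image.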

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
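The definition translates into a Boolean matrix product. This base-R sketch (our own, not the implementation of sna's composition operator) composes two loop-free graphs and shows a loop appearing in the result.

```r
# Graph composition as a Boolean inner product (base R sketch).
# Entry (v, w) is 1 iff some u has a[v, u] = 1 and b[u, w] = 1.
compose <- function(a, b) (a %*% b > 0) * 1L

g  <- matrix(0L, 3, 3); g[1, 2]  <- 1L   # G:  edge 1 -> 2
gp <- matrix(0L, 3, 3); gp[2, 1] <- 1L   # G': edge 2 -> 1
gg <- compose(g, gp)
gg   # the only edge of the composition is the loop 1 -> 1
```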

                                      Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities that depend only on the sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- t(sapply(runif(20, 0, 1), rep, 20))  # row i: sender i's tie probability

R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$\operatorname{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\left(|V| - 1\right)} \qquad (3)$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, the sum runs over ordered pairs $(i, j)$ with $i \neq j$, and $\mu_X = \left(|V|(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\operatorname{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \operatorname{cov}(G, H) / \sqrt{\operatorname{cov}(G, G)\operatorname{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained using gcor and gcov, respectively. Hamming distances for graph sets can similarly be obtained using hdist.
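Equation (3) and the Hamming distance translate directly into base R. The sketch below assumes directed graphs without loops stored as 0/1 adjacency matrices, so all sums run over the |V|(|V| - 1) off-diagonal cells; the function names are ours, and the code is not intended to reproduce every option of gcov, gcor, or hdist.

```r
# Graph covariance (equation 3), correlation, and Hamming distance for two
# directed, loop-free graphs on the same vertex set (base R sketch).
offdiag <- function(a) a[row(a) != col(a)]   # edge variables with i != j

graph_cov <- function(g, h) {
  x <- offdiag(g)
  y <- offdiag(h)
  sum((x - mean(x)) * (y - mean(y))) / length(x)  # length(x) = |V|(|V| - 1)
}
graph_cor <- function(g, h) {
  graph_cov(g, h) / sqrt(graph_cov(g, g) * graph_cov(h, h))
}
hamming <- function(g, h) sum(offdiag(g) != offdiag(h))

set.seed(7)
g <- matrix(rbinom(36, 1, 0.5), 6, 6)
diag(g) <- 0
h <- g
h[1, 2] <- 1 - h[1, 2]                        # change a single edge variable
c(cor = graph_cor(g, h), hamming = hamming(g, h))
```

Because the common divisor cancels, graph_cor agrees with R's cor applied to the two vectors of off-diagonal cells.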

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching of arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
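For very small graphs, the optimal matching can be found by brute force. This base-R sketch (ours; it uses plain enumeration rather than the heuristics sna employs) maximizes the graph correlation over all |V|! relabelings of the second graph, which is only practical for a handful of vertices. As in the gscor example later in this section, a relabeled copy of a graph has a structural correlation of 1 with the original even when its labeled correlation is low. The function names are hypothetical.

```r
# Structural correlation by exhaustive search over relabelings (base R sketch).
offdiag <- function(a) a[row(a) != col(a)]
gcor_plain <- function(g, h) cor(offdiag(g), offdiag(h))

perms <- function(n) {            # all permutations of 1:n, one per row
  if (n == 1) return(matrix(1L, 1, 1))
  p <- perms(n - 1)
  unname(do.call(rbind, lapply(seq_len(n), function(i)
    cbind(i, ifelse(p >= i, p + 1L, p)))))
}

struct_cor <- function(g, h) {
  P <- perms(nrow(h))
  # Relabel h by permuting its rows and columns together; keep the best fit.
  max(apply(P, 1, function(o) gcor_plain(g, h[o, o])))
}

set.seed(3)
g <- matrix(rbinom(25, 1, 0.5), 5, 5)
diag(g) <- 0
o <- sample(5)
h <- g[o, o]                      # isomorphic to g, with shuffled labels
c(labeled = gcor_plain(g, h), structural = struct_cor(g, h))
```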

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by applying eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
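Network PCA reduces to standard tools once graphs are vectorized. This base-R sketch (our own illustration, on simulated noisy copies of a single random digraph) builds a graph correlation matrix from the off-diagonal cells of each graph and applies eigen to it; within sna, gcor could be used to produce the same kind of matrix.

```r
# Network PCA sketch: eigendecomposition of a graph correlation matrix.
offdiag <- function(a) a[row(a) != col(a)]

set.seed(11)
core <- matrix(rbinom(100, 1, 0.3), 10, 10)
graphs <- lapply(1:6, function(k) {         # six noisy copies of one structure
  flip <- matrix(rbinom(100, 1, 0.1), 10, 10)
  g <- abs(core - flip)                     # toggle roughly 10% of the cells
  diag(g) <- 0
  g
})
X <- sapply(graphs, offdiag)                # one column of edge variables per graph
R <- cor(X)                                 # 6 x 6 graph correlation matrix
pca <- eigen(R, symmetric = TRUE)
round(pca$values / sum(pca$values), 3)      # share of variance by component
```

Since all six graphs share one underlying structure, the first component should account for most of the variance.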

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available for cases where conditioning on the entire observed structure is inappropriate.
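A QAP test can be sketched in a few lines of base R: relabel the vertices of one graph at random (permuting rows and columns together), recompute the statistic, and locate the observed value in the resulting distribution. This is our illustration for the graph correlation, not the qaptest implementation, and the function names are hypothetical.

```r
# QAP test sketch for the graph correlation (base R).
offdiag <- function(a) a[row(a) != col(a)]
gcor_plain <- function(g, h) cor(offdiag(g), offdiag(h))

qap_sketch <- function(g, h, reps = 1000) {
  obs <- gcor_plain(g, h)
  n <- nrow(h)
  sims <- replicate(reps, {
    o <- sample(n)
    gcor_plain(g, h[o, o])      # random relabeling of h's vertices
  })
  list(obs = obs, p_ge = mean(sims >= obs), p_le = mean(sims <= obs))
}

set.seed(5)
g <- matrix(rbinom(400, 1, 0.3), 20, 20)
diag(g) <- 0
noise <- matrix(rbinom(400, 1, 0.1), 20, 20)
h <- abs(g - noise)             # h is a noisy copy of g
diag(h) <- 0
res <- qap_sketch(g, h, reps = 500)
res
```

Permuting rows and columns together preserves each graph's internal structure while breaking only the correspondence between the two vertex labelings, which is exactly what the null hypothesis is about.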


                                      Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled (structural) correlation between g2 and g3 here is 1, since the graphs are isomorphic, but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
     352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004
Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

    Total Fraction Correct: 0.8973684
    Fraction Predicted 1s Correct: 0.8973684
    Fraction Predicted 0s Correct: NaN
    False Negative Rate: 0
    False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; indeed, it predicts no absent ties at all. This is largely a consequence of the high density of the dependent graph (approximately 0.90) and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values (0, 1, 4, 2, and 0 for the intercept and x1 through x4), and that the QAP test correctly identifies the irrelevant predictors.

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns an estimate of an unknown graph from a series of observed graphs. Methods supported include data-analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of the false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992), in the simplified form of Gelman et al. (1995), can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
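If the error rates were known, the reporting model described above would factor over edges, and Bayes' rule would give each edge's posterior probability directly. The base-R sketch below assumes exactly that: informant r reports a tie with probability 1 - em[r] when it exists and ep[r] when it does not (the same reporting model used to simulate data in the example below), and the tie has prior probability p. It is our simplification for illustration, with a hypothetical function name; bbnam additionally places priors on the error rates and samples them jointly with the graph.

```r
# Posterior probability that a single edge exists, given binary reports
# from several informants with KNOWN error rates (base R sketch).
#   reports: 0/1 vector, one entry per informant
#   ep: false positive rates, Pr(report 1 | no tie)
#   em: false negative rates, Pr(report 0 | tie)
#   p:  prior probability of the tie
edge_posterior <- function(reports, ep, em, p = 0.5) {
  like1 <- prod(ifelse(reports == 1, 1 - em, em))   # Pr(reports | tie)
  like0 <- prod(ifelse(reports == 1, ep, 1 - ep))   # Pr(reports | no tie)
  p * like1 / (p * like1 + (1 - p) * like0)
}

# Three informants with low false positive and high false negative rates.
ep <- c(0.04, 0.04, 0.04)
em <- c(0.4, 0.4, 0.4)
edge_posterior(c(1, 0, 0), ep, em)   # one informant reports the tie
edge_posterior(c(0, 0, 0), ep, em)   # nobody reports the tie
```

With these rates, a single report already pushes the posterior to about 0.72, because silence from the other informants is weak evidence when false negatives are common.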

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members in turn may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
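Once the baseline probability, the bias-event probabilities, and the counts $s_k$ are fixed, the formula for $\Pr(e_{ij} \mid T)$ above is straightforward to evaluate. The R function below is a direct transcription; the function and argument names and the example values (two bias events, with probabilities 0.3 and 0.2) are illustrative assumptions rather than anything taken from bn.

```r
# Conditional nomination probability in a biased net (direct transcription).
#   pe: baseline probability Pr(B_e)
#   pk: vector of bias-event probabilities Pr(B_k)
#   sk: number of potential occurrences of each bias event for (i, j) given T
nominate_prob <- function(pe, pk, sk) {
  1 - (1 - pe) * prod((1 - pk)^sk)
}

# Two illustrative bias events with probabilities 0.3 and 0.2.
nominate_prob(0.1, pk = c(0.3, 0.2), sk = c(0, 0))  # no opportunities: equals pe
nominate_prob(0.1, pk = c(0.3, 0.2), sk = c(1, 2))  # 1 - 0.9 * 0.7 * 0.8^2
```

Each additional opportunity for a bias event can only increase the nomination probability.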

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990); see Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

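$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$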

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
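To make the AR part of equation (4) concrete: with a single weight matrix $W$ and no MA term, the model has the reduced form $y = (I - \theta W)^{-1}(X\beta + \nu)$ whenever $I - \theta W$ is invertible. The base-R sketch below simulates from that reduced form on a ring network and then computes Moran's I from its usual formula, $\frac{n}{S_0}\,\frac{\sum_{i,j} w_{ij} z_i z_j}{\sum_i z_i^2}$, with $z$ the centered responses and $S_0$ the sum of the weights. The network, parameter values, and function names are our assumptions; this is neither lnam nor nacf.

```r
# Simulating a network AR model (equation 4 with one weight matrix and no
# MA term) and computing Moran's I (base R sketch; values are illustrative).
n <- 30
W <- matrix(0, n, n)
for (i in 1:n) {                      # ring: each vertex tied to both neighbors
  W[i, i %% n + 1] <- 1
  W[i, (i - 2) %% n + 1] <- 1
}
W <- W / rowSums(W)                   # row-normalize the weights

set.seed(2)
theta <- 0.7
beta <- c(1, 2)
X <- cbind(1, rnorm(n))
nu <- rnorm(n)
# Reduced form: y = (I - theta W)^{-1} (X beta + nu)
y <- as.vector(solve(diag(n) - theta * W, X %*% beta + nu))

morans_i <- function(y, W) {
  z <- y - mean(y)
  (length(y) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}
morans_i(y, W)
```

For this ring, an alternating response pattern gives a Moran's I of exactly -1, while a smoothly varying one gives a value close to 1.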


                                      Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer)

Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics

Replicate Chains 5
Burn Time 300
Draws per Chain 20 Total Draws 100
Potential Scale Reduction (G&R's sqrt(Rhat))
Max 1.003116
Med 0.9992194
IQR 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high posterior probability, or posterior graph distributions that are strongly multimodal, can indicate either excessively "perverse" priors or extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
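Why em + ep > 1 is "perverse" can be seen from the reporting probability used to simulate the data above: an informant reports a tie with probability 1 - em if it exists and ep if it does not. The base R sketch below (report_lr is a hypothetical helper) computes the likelihood ratio a single report carries about a single dyad; it illustrates the argument only and is not part of bbnam's posterior computation.

```r
# Error model from the simulation: P(report) = g * (1 - em) + (1 - g) * ep
# Likelihood ratio of a reported tie: P(report | tie) / P(report | no tie)
report_lr <- function(em, ep) (1 - em) / ep

report_lr(0.375, 0.038)  # about 16.4: a report is strong evidence FOR a tie
report_lr(0.7, 0.4)      # 0.75: em + ep > 1, so a report is evidence AGAINST a tie

# report_lr < 1 exactly when em + ep > 1, since 1 - em < ep <=> em + ep > 1
em <- c(0.1, 0.5, 0.7, 0.9)
ep <- c(0.2, 0.6, 0.45, 0.05)
cbind(em_plus_ep = em + ep, lr = report_lr(em, ep))
```

In the perverse region, an informant's reports are informative only once inverted, which is why such priors (or such posterior estimates) signal a problem with the model or the data.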

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
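A rough base R sketch of three of the classical rules compared above, assuming (as in the example) that dat[k, i, j] holds informant k's report of the i to j tie and that informant k is vertex k. las_sketch and central_graph_sketch are hypothetical helpers, not sna's consensus; in particular, keeping a central-graph tie when exactly half of the informants report it is an assumption.

```r
# Locally aggregated structure: the (i, j) tie is judged only by informants i and j
las_sketch <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      reports <- c(dat[i, i, j], dat[j, i, j])  # ego's and alter's reports
      out[i, j] <- if (rule == "intersection") all(reports == 1) else any(reports == 1)
    }
  }
  out
}

# Central graph: keep a tie if at least half of all informants report it
central_graph_sketch <- function(dat) {
  1 * (apply(dat, c(2, 3), mean) >= 0.5)
}

# Three informants, three vertices
dat <- array(0, dim = c(3, 3, 3))
dat[1, 1, 2] <- 1; dat[2, 1, 2] <- 1  # both endpoints report 1 -> 2
dat[1, 2, 3] <- 1; dat[2, 2, 3] <- 1  # 2 -> 3 reported by a bystander and by vertex 2
dat[3, 3, 1] <- 1                     # 3 -> 1 reported by vertex 3 only

las_sketch(dat, "intersection")  # only 1 -> 2
las_sketch(dat, "union")         # 1 -> 2, 2 -> 3, 3 -> 1
central_graph_sketch(dat)        # 1 -> 2 and 2 -> 3 (2 of 3 informants each)
```

Treating the two endpoints' errors as independent and using the mean simulated rates, an intersection LAS retains a true tie with probability of only about (1 - 0.375)^2 ≈ 0.39, which is why it fares so badly here.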

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58  < 2e-16 ***
X2      0.535608   0.009448   56.69  < 2e-16 ***
X3     -0.685068   0.007138  -95.98  < 2e-16 ***
X4      0.691812   0.008417   82.19  < 2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71  < 2e-16 ***
rho2.1  0.307491   0.021167   14.53  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
AIC: -100.9 BIC: -85.65

Null model: meanstd
Null log likelihood: -82.48 on 48 degrees of freedom
AIC: 169.0 BIC: 172.8
AIC difference (model versus null): 269.9
Heuristic Log Bayes Factor (model versus null): 258.4
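The simulation in this example generates data by solving two linear systems. The base R sketch below checks that the result satisfies the structural equations y = ρ1 W1 y + Xβ + e and e = ρ2 W2 e + ν, which is our reading of how lnam's W1 (AR) and W2 (MA) arguments map onto the general model (W and θ for AR, Z and ψ for MA). rand_graph is a hypothetical stand-in for rgraph, so sna is not needed to run it.

```r
set.seed(42)
n <- 50

# Hypothetical stand-in for rgraph(50): directed Bernoulli(0.5) graph without loops
rand_graph <- function(n, p = 0.5) {
  a <- matrix(rbinom(n * n, 1, p), n, n)
  diag(a) <- 0
  a
}

w1 <- rand_graph(n)
w2 <- rand_graph(n)
x <- matrix(rnorm(n * 5), n, 5)
r1 <- 0.2; r2 <- 0.3; sigma <- 0.1
beta <- rnorm(5)
nu <- rnorm(n, 0, sigma)

# Reduced form, as in the example: e = (I - r2 W2)^{-1} nu, y = (I - r1 W1)^{-1} (X beta + e)
e <- qr.solve(diag(n) - r2 * w2, nu)
y <- qr.solve(diag(n) - r1 * w1, x %*% beta + e)

# Structural form: both gaps should be zero up to floating-point error
ar_gap <- max(abs(y - r1 * (w1 %*% y) - x %*% beta - e))  # AR: y = r1 W1 y + X beta + e
ma_gap <- max(abs(e - r2 * (w2 %*% e) - nu))              # MA: e = r2 W2 e + nu
c(ar_gap = ar_gap, ma_gap = ma_gap)
```

The reduced form only exists when I - ρW is invertible, which holds with probability one for random graphs like these.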

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

                                      3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.]

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

                                      References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions in Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project <http://statnetproject.org/>, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project <http://statnetproject.org/>, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                      Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008. Submitted: 2007-06-01. Accepted: 2007-12-25.



closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
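To see where this fragility comes from, here is a base R sketch of Freeman closeness, (|V| - 1) divided by the sum of a vertex's geodesic distances, with distances found by breadth-first search. Unreachable vertices receive an infinite distance, which drives the score to 0. geodesic_dist and closeness_sketch are hypothetical helpers, not sna's closeness.

```r
# All-pairs geodesic distances by breadth-first search on a 0/1 adjacency matrix
geodesic_dist <- function(a) {
  n <- nrow(a)
  d <- matrix(Inf, n, n)            # Inf marks "not (yet) reached"
  for (s in seq_len(n)) {
    d[s, s] <- 0
    frontier <- s
    while (length(frontier) > 0) {
      nxt <- integer(0)
      for (v in frontier) {
        for (u in which(a[v, ] > 0)) {
          if (is.infinite(d[s, u])) { # first visit = shortest distance
            d[s, u] <- d[s, v] + 1
            nxt <- c(nxt, u)
          }
        }
      }
      frontier <- nxt
    }
  }
  d
}

# Freeman closeness: (n - 1) / sum of distances from each vertex
closeness_sketch <- function(a) {
  (nrow(a) - 1) / rowSums(geodesic_dist(a))
}

path3 <- matrix(c(0, 1, 0,
                  1, 0, 1,
                  0, 1, 0), 3, 3, byrow = TRUE)
closeness_sketch(path3)   # 2/3, 1, 2/3

split3 <- matrix(c(0, 1, 0,
                   1, 0, 0,
                   0, 0, 0), 3, 3, byrow = TRUE)
closeness_sketch(split3)  # 0, 0, 0: every vertex has an unreachable alter
```

A single unreachable alter makes the distance sum infinite, so one isolate is enough to zero out closeness for the entire graph.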

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), of "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and approaches degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
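To make the spectral definitions above concrete, here is a minimal base-R sketch; it does not call sna and is not the package's implementation. The helper names evc_sketch and bonpow_sketch are hypothetical, and the choice of $\alpha$ (scaling the scores so that their squares sum to $|V|$) is an assumed reading of the length convention described above.

```r
# Minimal base-R sketch (not sna's code) of eigenvector centrality and
# Bonacich power centrality for an adjacency matrix A.

evc_sketch <- function(A) {
  e <- eigen(A)
  # Principal eigenvector, in absolute value, scaled to unit Euclidean length
  v <- abs(Re(e$vectors[, 1]))
  v / sqrt(sum(v^2))
}

bonpow_sketch <- function(A, beta) {
  n <- nrow(A)
  # Solve (I - beta A) x = A 1, i.e. x = (I - beta A)^{-1} A 1;
  # solve() stops with an error if (I - beta A) is singular (no solution)
  x <- solve(diag(n) - beta * A, A %*% rep(1, n))
  # Assumed convention: choose alpha so the squared scores sum to n = |V|
  as.vector(x) * sqrt(n / sum(x^2))
}

# Small example digraph in which every vertex has at least one out-tie
A <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 1,
              0, 1, 0, 1,
              1, 0, 0, 0), nrow = 4, byrow = TRUE)

bonpow_sketch(A, 0) / rowSums(A)   # constant ratio: proportional to outdegree
bonpow_sketch(A, -0.5)             # negative beta: ties to weak alters help
evc_sketch(A)
```

With $\beta = 0$ the solve step reduces to $A\mathbf{1}$, which is why the first usage line recovers outdegree up to a constant.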

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$, that is, the number of pairs for which $v$ serves as a local bridge. Now let us posit a vector of states $s$ of length $|V|$, such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
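The role definitions can be checked by brute force. The sketch below uses a hypothetical base-R helper, broker_roles, which is not sna's brokerage function and omits its null-model moments and z-tests; it simply enumerates ordered triads $(v_i, v_j, v_k)$ with $v_i \to v_j \to v_k$ present and $v_i \to v_k$ absent, and classifies each by the state rules stated above.

```r
# Hypothetical helper (not sna's brokerage()): brute-force Gould-Fernandez
# role counts for a directed 0/1 adjacency matrix A and state vector s.
broker_roles <- function(A, s) {
  n <- nrow(A)
  roles <- c("coordinator", "itinerant", "gatekeeper", "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {            # j is the brokering vertex
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (i == j || j == k || i == k) next
        # j bridges the ordered pair (i, k): i -> j -> k present, i -> k absent
        if (A[i, j] != 0 && A[j, k] != 0 && A[i, k] == 0) {
          role <- if (s[i] == s[j] && s[j] == s[k]) {
            "coordinator"
          } else if (s[i] == s[k]) {     # broker differs from both endpoints
            "itinerant"
          } else if (s[j] == s[k]) {     # broker shares the target's state
            "gatekeeper"
          } else if (s[i] == s[j]) {     # broker shares the source's state
            "representative"
          } else {                       # all three states differ
            "liaison"
          }
          out[j, role] <- out[j, role] + 1
        }
      }
    }
  }
  cbind(out, total = rowSums(out))
}

# Usage: a single two-path 1 -> 2 -> 3 with vertex 2 as broker
A <- matrix(0, 3, 3); A[1, 2] <- 1; A[2, 3] <- 1
broker_roles(A, s = c("a", "b", "b"))[2, ]   # gatekeeper = 1, total = 1
```

The cubic loop is deliberately naive; it is meant only to make the classification rules explicit, not to scale to large graphs.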

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network.

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
[1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
[1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
[1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
[1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
[8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
[1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
[7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
[1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
[1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
[8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
[1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
[8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
[1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
[9] 3.077481 3.704181

As the above examples illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector centrality and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
[1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
[8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
[1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
[7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present; z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

                                        Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i, j), (j, k) \in E \Leftrightarrow (i, k) \in E$ for $i, j, k \in V$) and weak ($(i, j), (j, k) \in E \Rightarrow (i, k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \max_{v \in V} c(v, G) - c(v_i, G) \right] \tag{1}$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \tag{2}$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
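Equations (1) and (2) translate directly into code. The following is a minimal sketch using a hypothetical helper, freeman_centralization, rather than sna's centralization; the normalizing constant $(|V|-1)^2$ in the usage lines is the largest possible total outdegree deviation in a loopless digraph (attained by an out-star), supplied by hand here instead of through a tmaxdev call.

```r
# Hypothetical helper: raw Freeman centralization (eq. 1) of a score vector,
# optionally divided by a theoretical maximum deviation to normalize it.
freeman_centralization <- function(scores, max_dev = NULL) {
  raw <- sum(max(scores) - scores)   # total deviation from the maximum score
  if (is.null(max_dev)) raw else raw / max_dev
}

A <- matrix(0, 4, 4); A[1, 2:4] <- 1    # out-star centred on vertex 1
odeg <- rowSums(A)                       # outdegree scores: 3 0 0 0
n <- nrow(A)

freeman_centralization(odeg)                       # raw score: 9
freeman_centralization(odeg, max_dev = (n - 1)^2)  # normalized: 1
# Eq. (2): |V| times (maximum minus mean) gives the same raw value
all.equal(freeman_centralization(odeg), n * (max(odeg) - mean(odeg)))  # TRUE
```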

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree; hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.
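Two of the Krackhardt indices have short definitions that can be sketched in base R. The helpers below are hypothetical and assume the common reading of Krackhardt (1994): connectedness is the fraction of unordered vertex pairs that are reachable once edge direction is ignored, and hierarchy is one minus the fraction of reachable pairs that are mutually reachable. Efficiency and LUB-ness are omitted, and sna's treatment of edge cases may differ.

```r
# Reachability (transitive closure) by Warshall's algorithm:
# R[i, j] is TRUE if j can be reached from i by a directed path.
reach_sketch <- function(A) {
  R <- A != 0
  for (k in seq_len(nrow(R))) R <- R | outer(R[, k], R[k, ], "&")
  R
}

# Hypothetical: fraction of unordered pairs that are weakly connected
krack_connectedness <- function(A) {
  W <- reach_sketch((A != 0) | t(A != 0))   # ignore edge direction
  sum(W[upper.tri(W)]) / choose(nrow(A), 2)
}

# Hypothetical: one minus the share of reachable pairs that are mutual
krack_hierarchy <- function(A) {
  R <- reach_sketch(A)
  up <- upper.tri(R)
  mutual <- sum((R & t(R))[up])             # pairs reachable both ways
  reachable <- sum((R | t(R))[up])          # pairs reachable at least one way
  if (reachable == 0) return(NA_real_)      # no reachable pairs at all
  1 - mutual / reachable
}

# Usage: directed path 1 -> 2 -> 3 plus an isolate (vertex 4)
A <- matrix(0, 4, 4); A[1, 2] <- 1; A[2, 3] <- 1
krack_connectedness(A)   # 3 of 6 pairs connected: 0.5
krack_hierarchy(A)       # no mutual reachability: 1
A[3, 1] <- 1             # closing the cycle makes 1, 2, 3 mutually reachable
krack_hierarchy(A)       # 0
```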

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices; hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
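For reference, the default quantities reported by gden, grecip, and gtrans can be written out for a single loopless digraph. This base-R sketch (hypothetical helper gli_sketch) follows the definitions in the text for density and for dyadic and edgewise reciprocity; for weak transitivity it assumes the ratio of two-paths $i \to j \to k$ ($i \neq k$) that are closed by a direct tie $i \to k$. Returning NA when a ratio has no cases at risk is a choice made here, not necessarily sna's convention.

```r
# Hypothetical helper: density, reciprocity and weak transitivity of a
# loopless digraph given as a 0/1 adjacency matrix A.
gli_sketch <- function(A) {
  A <- (A != 0) * 1
  diag(A) <- 0
  n <- nrow(A)
  up <- upper.tri(A)
  a <- A[up] == 1; b <- t(A)[up] == 1       # the two directions of each dyad
  mut <- sum(a & b); asym <- sum(xor(a, b)); nul <- sum(!a & !b)

  P <- A %*% A                               # P[i, k] = number of two-paths i -> j -> k
  diag(P) <- 0                               # drop i -> j -> i
  at_risk <- sum(P)

  c(density    = sum(A) / (n * (n - 1)),
    dyadic     = (mut + nul) / choose(n, 2), # symmetric (mutual or null) dyads
    edgewise   = if (mut + asym == 0) NA_real_ else 2 * mut / (2 * mut + asym),
    weak_trans = if (at_risk == 0) NA_real_ else sum(P * A) / at_risk)
}

# Usage on a random digraph
set.seed(2)
A <- matrix(rbinom(100, 1, 0.3), 10, 10)
gli_sketch(A)
```

Computing the two-path counts as $A^2$ avoids an explicit triple loop; each ordered triad at risk is counted exactly once.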

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

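R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   # Theoretical maximum deviation: an out-star, whose centre has
+   # choose(n - 1, 2) out-2-stars while every other vertex has none
+   if (tmaxdev)
+     return((n - 1) * choose(n - 1, 2))
+   # Score: number of out-2-stars centred on each vertex
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173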

Thus users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\left(d(j, k) \le i\right)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
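The structure statistics can be sketched in base R by computing geodesic distances with breadth-first search. Both helpers are hypothetical stand-ins for geodist and structure.statistics; unreachable pairs are given infinite distance, and each vertex counts as being at distance 0 from itself, so $s_0 = 1/N$.

```r
# Hypothetical helper: geodesic distances by breadth-first search
# (Inf where no directed path exists).
geodist_sketch <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  for (src in seq_len(n)) {
    D[src, src] <- 0
    frontier <- src
    while (length(frontier) > 0) {
      nxt <- integer(0)
      for (u in frontier) {
        for (w in which(A[u, ] != 0)) {
          if (is.infinite(D[src, w])) {      # first visit = shortest distance
            D[src, w] <- D[src, u] + 1
            nxt <- c(nxt, w)
          }
        }
      }
      frontier <- nxt
    }
  }
  D
}

# Hypothetical helper: s_i = N^{-2} * #{(j, k) : d(j, k) <= i}, i = 0, ..., N - 1
structure_stats_sketch <- function(A) {
  D <- geodist_sketch(A)
  N <- nrow(A)
  setNames(sapply(0:(N - 1), function(i) sum(D <= i) / N^2), 0:(N - 1))
}

# Usage: directed path 1 -> 2 -> 3
A <- matrix(0, 3, 3); A[1, 2] <- 1; A[2, 3] <- 1
structure_stats_sketch(A)   # 3/9, 5/9, 6/9
```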

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $E\,s(H) = s(G), E\,s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
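The dyad census and the logic of a one-tailed CUG test fit in a few lines of base R. The helpers below are hypothetical (they are not dyad.census or cugtest): the null model conditions on order and edge count by scattering the observed number of edges uniformly over the off-diagonal cells, and both p-values are Monte Carlo estimates.

```r
# Hypothetical helper: mutual, asymmetric and null dyad counts of a digraph
dyad_census_sketch <- function(A) {
  up <- upper.tri(A)
  a <- A[up] != 0                 # tie i -> j, for each pair i < j
  b <- t(A)[up] != 0              # tie j -> i, for the same pairs
  c(Mut = sum(a & b), Asym = sum(xor(a, b)), Null = sum(!a & !b))
}

# Hypothetical helper: Monte Carlo CUG test of statistic `stat`,
# conditioning on order and number of edges (no loops)
cug_edges_sketch <- function(A, stat, reps = 1000) {
  n <- nrow(A)
  off <- which(row(A) != col(A))          # linear indices of off-diagonal cells
  m <- sum(A[off] != 0)                    # observed edge count
  obs <- stat(A)
  sims <- replicate(reps, {
    H <- matrix(0, n, n)
    H[sample(off, m)] <- 1                 # uniform draw given order and edges
    stat(H)
  })
  c(p_upper = mean(sims >= obs),           # estimate of Pr(t(H) >= t(G))
    p_lower = mean(sims <= obs))           # estimate of Pr(t(H) <= t(G))
}

# Usage: is there more mutuality than expected given order and edge count?
set.seed(1)
A <- matrix(rbinom(100, 1, 0.2), 10, 10); diag(A) <- 0
dyad_census_sketch(A)
cug_edges_sketch(A, function(X) dyad_census_sketch(X)[["Mut"]])
```

Because ties are counted in both tails, the two estimated p-values always sum to at least 1.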

                                        Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00
R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50
R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1, , ], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43
R> component.dist(g3[1, , ])
$membership
[1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
[1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2, , ]
R> g4[2, , ] <- g2[1, , ]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.3333333
        1stQ:  -0.06666667
        Med:   0
        Mean:  -0.001288889
        3rdQ:  0.06666667
        Max:   0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.06666667
        1stQ:  0.1555556
        Med:   0.2222222
        Mean:  0.2215333
        3rdQ:  0.2888889
        Max:   0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005) and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph-theoretic sense and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
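As an illustration of the distance-then-cluster approach, this base-R sketch computes a Hamming-type structural equivalence distance and passes it to R's hclust. The helper se_hamming is hypothetical and is not sna's sedist; it compares each pair's out-tie rows and in-tie columns, skipping cells that involve the pair itself, in line with the $N(v) \setminus \{v'\}$ form of the definition.

```r
# Hypothetical helper: Hamming distance between the out-tie rows and in-tie
# columns of each pair of vertices, ignoring cells involving the pair itself.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      keep <- setdiff(seq_len(n), c(i, j))
      D[i, j] <- sum(A[i, keep] != A[j, keep]) +   # differing out-ties
                 sum(A[keep, i] != A[keep, j])     # differing in-ties
    }
  }
  D
}

# Usage: an out-star, whose three leaves are structurally equivalent
A <- matrix(0, 4, 4); A[1, 2:4] <- 1
D <- se_hamming(A)
hc <- hclust(as.dist(D))   # complete linkage by default
cutree(hc, k = 2)          # vertex 1 alone; vertices 2-4 share a position
```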


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R’s built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
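As a rough illustration of what such a front-end does, the base-R sketch below computes Euclidean structural-equivalence distances directly from an adjacency matrix and passes them to `hclust`. The helper `se_euclid` is hypothetical and is not `sedist`, whose handling of diagonals and pair entries may differ; it assumes a dichotomous digraph and skips the cells involving the pair being compared, following the definition of structural equivalence given earlier.

```r
# Hypothetical helper (not sna::sedist): Euclidean structural-equivalence
# distances for a digraph with adjacency matrix A. Out-tie (row) and
# in-tie (column) profiles are compared over third parties only, so a tie
# between i and j does not by itself break their equivalence.
se_euclid <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      k <- setdiff(seq_len(n), c(i, j))            # third parties
      D[i, j] <- sqrt(sum((A[i, k] - A[j, k])^2) + # out-ties
                      sum((A[k, i] - A[k, j])^2))  # in-ties
    }
  }
  D
}

# A star: vertex 1 is tied (both ways) to pendants 2, 3 and 4
A <- matrix(0, 4, 4)
A[1, 2:4] <- 1
A[2:4, 1] <- 1
D <- se_euclid(A)
D[2, 3]  # 0: the pendants are exactly structurally equivalent
D[1, 2]  # 2: hub and pendant differ on two out-ties and two in-ties

# Cluster the distances and cut the tree into two positions
hc <- hclust(as.dist(D))
cl <- cutree(hc, k = 2)
cl
```

Because the three pendants are at distance zero from one another, any two-cluster cut separates them from the hub.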

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to “blocks” of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R’s cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
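The density block type can be sketched in a few lines of base R. The helper `density_blocks` below is hypothetical: it only mimics the density reduction for a single graph and a given membership vector, and is not `blockmodel` itself.

```r
# Hypothetical helper (not sna::blockmodel): density image of adjacency
# matrix A under class memberships cl. Each block is the mean of its
# cells with the diagonal (self-ties) ignored; a diagonal block for a
# one-member class has no usable cells and comes out as NaN.
density_blocks <- function(A, cl) {
  k <- sort(unique(cl))
  diag(A) <- NA                         # changes the local copy only
  B <- matrix(NA_real_, length(k), length(k), dimnames = list(k, k))
  for (a in seq_along(k)) {
    for (b in seq_along(k)) {
      B[a, b] <- mean(A[cl == k[a], cl == k[b]], na.rm = TRUE)
    }
  }
  B
}

A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- 1   # mutual tie inside class 1
A[1:2, 3:4] <- 1          # class 1 sends to every member of class 2
A[3, 4] <- 1              # one of the two possible ties inside class 2
B <- density_blocks(A, cl = c(1, 1, 2, 2))
B  # row 1: densities 1 and 1; row 2: densities 0 and 0.5
```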

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
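Under the same assumptions, density expansion amounts to indexing the image by class labels and drawing independent Bernoulli edges. `expand_density` is a hypothetical helper, not `blockmodel.expand`, and the two-class image `B` is written out by hand for illustration.

```r
# Hypothetical helper (not sna::blockmodel.expand): expand a density
# image B into an edge probability matrix for classes of the given sizes,
# then draw one graph with independent Bernoulli edges.
expand_density <- function(B, sizes) {
  cl <- rep(seq_along(sizes), sizes)     # class of each new vertex
  P <- B[cl, cl, drop = FALSE]           # interclass density as edge probability
  diag(P) <- 0                           # no loops
  n <- length(cl)
  G <- matrix(rbinom(n * n, 1, P), n, n) # one Bernoulli draw per cell
  list(P = P, G = G)
}

B <- matrix(c(1, 0, 1, 0.5), 2, 2)       # image: rows/columns are classes 1, 2
set.seed(1)
out <- expand_density(B, sizes = c(2, 3)) # the new graph has 5 vertices
out$G

# Repeated draws give a Monte Carlo sample under the same model
sims <- replicate(100, expand_density(B, c(2, 3))$G)
dim(sims)  # 5 5 100
```

As in the package, the class sizes of the expanded graph are free, so the simulated network need not have the same order as the one the image came from.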

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna’s support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean inner product of the graphs’ respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
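The Boolean-product characterization of composition translates directly into base R. The `compose` function below is a hypothetical illustration (in sna the corresponding operator is `%c%`); the small example also produces the loops that the text warns about.

```r
# Hypothetical helper: composition of two digraphs on the same vertex set
# as the Boolean product of their adjacency matrices. Entry (i, j) is 1 iff
# some k has i -> k in A and k -> j in B.
compose <- function(A, B) (A %*% B > 0) * 1

A <- matrix(0, 3, 3)
A[1, 2] <- A[2, 1] <- 1   # mutual tie 1 <-> 2
A[2, 3] <- 1              # tie 2 -> 3
C <- compose(A, A)
C
# C[1, 3] is 1 via the path 1 -> 2 -> 3, and C[1, 1] and C[2, 2] are 1
# via 1 -> 2 -> 1 and 2 -> 1 -> 2: loops appear although A has none.
```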

                                        Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert’s Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\operatorname{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \tag{3}$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\operatorname{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \operatorname{cov}(G, H) / \sqrt{\operatorname{cov}(G, G)\,\operatorname{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
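Equation (3) and the associated definitions can be checked with a short base-R sketch. The helpers `offdiag`, `graph_cov`, `graph_cor`, and `hamming` are hypothetical; they treat every ordered pair of distinct vertices as an edge variable and normalize as in (3), which is not claimed to match every option of `gcov` or `hdist`.

```r
# Edge variables of a digraph: all ordered pairs (i, j) with i != j
offdiag <- function(A) A[row(A) != col(A)]

# Graph covariance as in (3): average over the |V|(|V|-1) ordered pairs
graph_cov <- function(G, H) {
  g <- offdiag(G)
  h <- offdiag(H)
  mean((g - mean(g)) * (h - mean(h)))
}

# Graph correlation: covariance scaled by the two graph variances
graph_cor <- function(G, H) graph_cov(G, H) / sqrt(graph_cov(G, G) * graph_cov(H, H))

# Hamming distance: number of edge variables on which G and H disagree
hamming <- function(G, H) sum(offdiag(G) != offdiag(H))

# A directed 3-cycle and its reversal
A <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)
B <- t(A)
graph_cov(A, B)  # -0.25
graph_cor(A, B)  # -1: every edge of A is a non-edge of B and vice versa
hamming(A, B)    # 6
```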

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the “labeling” component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching of arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
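For very small graphs, the maximization over matchings can be done exhaustively; the number of relabelings grows as $n!$, which is why heuristics are needed for larger graphs. The base-R sketch below assumes the permissible matchings are all vertex permutations and returns the largest graph correlation over them. The helpers `perms` and `max_struct_cor` are hypothetical and are not `gscor` or `lab.optimize`.

```r
offdiag <- function(A) A[row(A) != col(A)]
graph_cor <- function(G, H) {
  g <- offdiag(G); h <- offdiag(H)
  mean((g - mean(g)) * (h - mean(h))) /
    sqrt(mean((g - mean(g))^2) * mean((h - mean(h))^2))
}

# All permutations of a vector (n! of them, so only feasible for small n)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (rest in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], rest)
  }
  out
}

# Structural correlation by exhaustive search: the largest graph
# correlation between G and any relabeling of H. Empty or complete
# graphs have zero variance and give NaN.
max_struct_cor <- function(G, H) {
  best <- -Inf
  for (p in perms(seq_len(nrow(H)))) {
    best <- max(best, graph_cor(G, H[p, p]))
  }
  best
}

A <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)
B <- t(A)              # the reversed 3-cycle is isomorphic to A
graph_cor(A, B)        # -1 under the fixed labeling
max_struct_cor(A, B)   # 1 once the best relabeling is found
```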

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by applying eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
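To make the network PCA remark concrete, the sketch below builds a few random graphs, computes their pairwise graph correlations with `cor` on the vectorized edge variables, and applies `eigen`; it also runs `cmdscale` on Hamming distances. The graphs, the seed, and the helper `offdiag` are illustrative assumptions, and no sna functions are used.

```r
offdiag <- function(A) A[row(A) != col(A)]

set.seed(2)
n <- 10
graphs <- replicate(6, matrix(rbinom(n * n, 1, 0.3), n, n), simplify = FALSE)
M <- sapply(graphs, offdiag)   # one column of edge variables per graph

# Pearson correlation of the columns equals the graph correlation,
# since the normalizing constants cancel in the ratio
R <- cor(M)
pca <- eigen(R, symmetric = TRUE)
pca$values / sum(pca$values)   # share of variance by component
pca$vectors[, 1]               # loadings of each graph on the first component

# Metric MDS of Hamming distances (Manhattan distance on 0/1 vectors)
H <- dist(t(M), method = "manhattan")
coords <- cmdscale(H, k = 2)
```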

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
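The analogy with conventional regression, and the logic of a QAP test, can be sketched in base R. `net_ols` fits `lm` to vectorized adjacency matrices, and `qap_ols` relabels the dependent graph (rows and columns jointly) to build a permutation distribution for each coefficient. Both helpers are hypothetical; this classical dependent-graph permutation is only one of the null hypotheses sna offers, so its p-values need not agree with those of `netlm`.

```r
offdiag <- function(A) A[row(A) != col(A)]

# OLS on vectorized adjacency matrices (intercept first, then one
# coefficient per predictor graph)
net_ols <- function(y, xs) {
  X <- sapply(xs, offdiag)
  unname(coef(lm(offdiag(y) ~ X)))
}

# QAP: permute the vertex labels of y, refit, and compare
qap_ols <- function(y, xs, reps = 500) {
  obs <- net_ols(y, xs)
  n <- nrow(y)
  null <- t(replicate(reps, {
    p <- sample.int(n)
    net_ols(y[p, p], xs)
  }))
  # two-sided permutation p-values: Pr(|b_perm| >= |b_obs|)
  pval <- colMeans(abs(null) >= rep(abs(obs), each = reps))
  list(coef = obs, p = pval)
}

set.seed(3)
n <- 20
xs <- replicate(4, matrix(rbinom(n * n, 1, 0.5), n, n), simplify = FALSE)
y <- xs[[1]] + 4 * xs[[2]] + 2 * xs[[3]] + matrix(rnorm(n * n), n, n)
fit <- qap_ols(y, xs)
fit$coef  # near the true values 0, 1, 4, 2, 0
fit$p
```

The fourth predictor graph is irrelevant by construction, while no relabeling comes close to the large coefficient on the second, mirroring the behavior of the netlm example that follows.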


                                        Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
  352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572  BIC: 203.8580
Pseudo-R^2 Measures:
  (Dn-Dr)/(Dn-Dr+dfn): 0.481324
  (Dn-Dr)/Dn: 0.6694004
Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

  Total Fraction Correct: 0.8973684
  Fraction Predicted 1s Correct: 0.8973684
  Fraction Predicted 0s Correct: NaN
  False Negative Rate: 0
  False Positive Rate: 1

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not effective at predicting the absence of ties (it predicts a tie for every dyad, so all 39 null dyads are misclassified). This is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model’s parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns an estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which it reports, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired), and estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former’s allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
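The reporting process behind bbnam can be illustrated for a single edge if the error rates are taken as known. That is a simplifying assumption: bbnam treats the rates as unknown and samples them. With a Bernoulli prior, each edge's posterior then follows from Bayes' rule. The function `edge_posterior` and the rates used are hypothetical.

```r
# Posterior probability of one edge, assuming known error rates.
# reports: 0/1 reports on the edge, one per informant
# em, ep:  informants' false negative and false positive rates
# prior:   prior probability of the edge (Bernoulli prior)
edge_posterior <- function(reports, em, ep, prior = 0.5) {
  like_tie   <- prod(ifelse(reports == 1, 1 - em, em))  # Pr(reports | edge)
  like_notie <- prod(ifelse(reports == 1, ep, 1 - ep))  # Pr(reports | no edge)
  prior * like_tie / (prior * like_tie + (1 - prior) * like_notie)
}

em <- rep(0.3, 3)   # three informants who often miss ties
ep <- rep(0.05, 3)  # but rarely report ties that do not exist
edge_posterior(c(1, 1, 1), em, ep)     # about 0.9996
edge_posterior(c(0, 0, 0), em, ep)     # about 0.0305
edge_posterior(c(1, 0, 0), em, ep)     # about 0.583: one report outweighs two denials
edge_posterior(1, em = 0.8, ep = 0.6)  # 0.25
```

The last call shows a negatively informative report: when em + ep > 1, a reported tie lowers the posterior probability of the tie below its prior of 0.5.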

Also supported by sna are the methods for estimating biased net parameters proposed by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members in turn may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. The bn function currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
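The conditional tie probability above is a one-line computation. `biased_tie_prob` is a hypothetical helper that evaluates it for given baseline and bias-event probabilities and counts $s_k(i,j,T)$; unlike bn, it estimates nothing, and the two bias types in the example are made up.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
# base: Pr(B_e); bias: Pr(B_k) per bias type; s: s_k(i, j, T) per bias type
biased_tie_prob <- function(base, bias, s) {
  stopifnot(length(bias) == length(s))
  1 - (1 - base) * prod((1 - bias)^s)
}

biased_tie_prob(0.1, 0.3, 0)                # 0.1: no bias events, baseline only
biased_tie_prob(0.1, 0.3, 2)                # 0.559 = 1 - 0.9 * 0.7^2
biased_tie_prob(0.1, c(0.3, 0.2), c(1, 2))  # 0.5968 = 1 - 0.9 * 0.7 * 0.8^2
```

Each additional bias event can only raise the tie probability, which is how the model produces non-uniform growth of the trace.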

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

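$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$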

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the AR term’s inclusion of impact from neighbors’ covariate scores: an AR term implies that each individual’s response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran’s I (Moran 1950) and Geary’s C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
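Equations (4) and (5) can be simulated directly by solving two linear systems, shown here for the single-matrix case $w = z = 1$, together with Moran's I computed from its usual definition. `sim_lnam` and `morans_i` are hypothetical base-R helpers; the 4-cycle network and the parameter values are made up, and neither function reproduces lnam or nacf.

```r
# Simulate (4)-(5) with one AR matrix W and one MA matrix Z
sim_lnam <- function(X, beta, W, theta, Z, psi, sigma = 1) {
  n <- nrow(X)
  nu <- rnorm(n, 0, sigma)                           # iid disturbances
  eps <- solve(diag(n) - psi * Z, nu)                # (5): eps = psi Z eps + nu
  y <- solve(diag(n) - theta * W, X %*% beta + eps)  # (4): y = theta W y + X beta + eps
  list(y = drop(y), eps = drop(eps), nu = nu)
}

# Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, z = y - mean(y)
morans_i <- function(y, W) {
  z <- y - mean(y)
  (length(y) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

A <- matrix(0, 4, 4)
A[cbind(1:4, c(2, 3, 4, 1))] <- 1  # ties 1-2, 2-3, 3-4, 4-1
A <- A + t(A)                      # symmetrize: an undirected 4-cycle
W <- A / rowSums(A)                # row-normalized weights
X <- cbind(1, 1:4)                 # intercept and one covariate
set.seed(4)
sim <- sim_lnam(X, beta = c(1, 0.5), W = W, theta = 0.3, Z = W, psi = 0.2)
sim$y

morans_i(c(1, 0, 1, 0), A)  # -1: every neighbor has the opposite value
morans_i(c(1, 1, 0, 0), A)  #  0: half the neighbor pairs agree
```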


                                        Example

To demonstrate the use of sna’s network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants’ false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[,1] <- 2
R> pem[,2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[,1] <- 2
R> pep[,2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

                                        Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

                                        Multiple Error Probability Model

                                        Marginal Posterior Network Distribution

                                        a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

                                        Journal of Statistical Software 41

                                        a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

                                        a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

                                        Marginal Posterior Global Error Distribution

                                        e^- e^+Min 01443951 000042381stQ 03126975 00167584Median 03678306 00294646Mean 03783663 004936883rdQ 04423027 00574099Max 06909116 02262239

                                        Marginal Posterior Error Distribution (by observer)

                                        Probability of False Negatives (e^-)

                                        Min 1stQ Median Mean 3rdQ Maxo1 03132 03599 03798 03864 04073 05071o2 02613 02944 03115 03187 03419 03995

                                        42 Social Network Analysis with sna

                                        o3 04148 04724 04937 04948 05213 05649o4 02511 03075 03246 03257 03448 04085o5 01814 02417 02681 02678 02887 03434o6 02881 03531 03761 03766 04046 04488o7 02395 03028 03211 03244 03449 03951o8 01444 02011 02209 02212 02398 02922o9 03708 04358 04529 04578 04787 05503o10 03210 03724 03967 03982 04259 04751o11 03064 03847 04093 04109 04371 05007o12 02367 03132 03354 03349 03607 04455o13 03534 04144 04386 04382 04600 05337o14 02438 02985 03235 03229 03452 04184o15 02585 03299 03510 03519 03706 04704o16 02502 03298 03481 03509 03699 04268o17 01759 02273 02488 02503 02668 03372o18 03959 04468 04646 04710 04922 05812o19 04944 05736 06007 05975 06189 06909o20 03737 04433 04631 04671 04916 05607

                                        Probability of False Positives (e^+)

                                        Min 1stQ Median Mean 3rdQ Maxo1 00195433 00397919 00490722 00510872 00585109 01069030o2 01067928 01395067 01555455 01569023 01714084 02262239o3 00084268 00165518 00224858 00236948 00293221 00551761o4 00712109 01047058 01137249 01180402 01320136 01723854o5 00034994 00103378 00150617 00169536 00212638 00468961o6 00004238 00040509 00068522 00082363 00098606 00279960o7 00061597 00136434 00192100 00207973 00266508 00484633o8 00072124 00204896 00260316 00282562 00350608 00593586o9 00804463 01092987 01213202 01246571 01372326 01935724o10 00065188 00135991 00194675 00223006 00278075 00594150o11 00173415 00358252 00445098 00464278 00551955 00828446o12 00185894 00416346 00499440 00516976 00573815 01202316o13 00029818 00108936 00155202 00170049 00209790 00401566o14 00044849 00108034 00166631 00178764 00226294 00486647o15 00084143 00199868 00271149 00290795 00355966 00606914o16 00009067 00078736 00124531 00139218 00187929 00455700o17 00066611 00216195 00273388 00290307 00346110 00691573o18 00846863 01344580 01508170 01485688 01628176 02036186o19 00037608 00117982 00171030 00179751 00225298 00466090o20 00214701 00348032 00433397 00448676 00516594 00936080

MCMC Diagnostics:

  Replicate Chains: 5
  Burn Time: 300
  Draws per Chain: 20 Total Draws: 100
  Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions that are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
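Why the region em + ep > 1 is “perverse” can be seen directly: an informant reports an existing tie with probability 1 - em and a non-existent tie with probability ep, so when em + ep > 1 a reported tie is more likely to be absent than present. The base R sketch below estimates, for each informant, the posterior probability of that condition from posterior draws. It assumes, as the apply(b$em, 2, median) calls above suggest, that draws are stored with one row per draw and one column per informant; perverse_prob is a hypothetical helper, not part of sna, and the toy draws are illustrative only.

```r
# Hypothetical helper (not part of sna): posterior probability that each
# informant's error rates fall in the "perverse" region em + ep > 1.
# em_draws, ep_draws: one row per posterior draw, one column per informant
# (layout assumed to match apply(b$em, 2, median) above).
perverse_prob <- function(em_draws, ep_draws) {
  stopifnot(all(dim(em_draws) == dim(ep_draws)))
  colMeans(em_draws + ep_draws > 1)
}

# Toy draws for three informants (illustrative values, not bbnam output).
set.seed(42)
em_draws <- cbind(rbeta(100, 2, 8), rbeta(100, 8, 2), rbeta(100, 5, 5))
ep_draws <- cbind(rbeta(100, 2, 8), rbeta(100, 8, 2), rbeta(100, 1, 20))
p <- perverse_prob(em_draws, ep_draws)
p
```

Informants whose value is close to 1 are the ones whose reports are, in effect, inverted relative to the true structure.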

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS (locally aggregated structures), the central graph, and the Romney-Batchelder model.
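To make the three simplest of these rules concrete, here is a base R sketch using hypothetical function names (this is not sna's consensus code). It assumes the reports are held in an m x n x n array, reports[k, i, j] being informant k's claim about the tie i -> j, and that informant k is also vertex k, so the LAS rules use only the two endpoint reports for each dyad. The 0.5 majority threshold in the central graph sketch is an assumption; sna's own convention may differ.

```r
# LAS rules: combine the reports of the two endpoints of each dyad
# (informant k is assumed to be vertex k, so m == n).
las_union <- function(reports) {
  n <- dim(reports)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Tie present if either endpoint reports it.
      out[i, j] <- as.numeric(reports[i, i, j] == 1 || reports[j, i, j] == 1)
    }
  }
  out
}

las_intersection <- function(reports) {
  n <- dim(reports)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Tie present only if both endpoints report it.
      out[i, j] <- as.numeric(reports[i, i, j] == 1 && reports[j, i, j] == 1)
    }
  }
  out
}

# Majority rule over all informants; the >= 0.5 threshold is assumed.
central_graph_sketch <- function(reports) {
  1 * (apply(reports, c(2, 3), mean) >= 0.5)
}

# Toy example: 3 informants reporting on 3 vertices.
reports <- array(0, c(3, 3, 3))
reports[1, 1, 2] <- 1                          # only ego 1 claims 1 -> 2
reports[2, 2, 3] <- 1; reports[3, 2, 3] <- 1   # both endpoints claim 2 -> 3
las_union(reports)
las_intersection(reports)
central_graph_sketch(reports)
```

The intersection rule needs two independent reports, which is why false negatives hurt it so much.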

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either the false positive or the false negative rate approaches or exceeds 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
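A back-of-the-envelope calculation shows why a majority rule breaks down near 0.5. Assume, purely for illustration, m informants who independently omit a present tie with the same false negative rate em, and that the tie is recovered when at least half of them report it. Recovery is then a binomial tail probability; the function name majority_recovery is hypothetical.

```r
# Probability that a majority rule recovers a tie that is present, when each of
# m informants independently omits it with probability em (assumed model;
# a tie vote counts as "present").
majority_recovery <- function(em, m = 20) {
  k_min <- ceiling(m / 2)                        # reports needed
  pbinom(k_min - 1, size = m, prob = 1 - em, lower.tail = FALSE)
}

rates <- c(0.1, 0.3, 0.45, 0.5, 0.6)
round(sapply(rates, majority_recovery), 3)
```

With 20 informants, recovery is essentially certain at em = 0.1 but falls to about 0.59 at em = 0.5 and below one half beyond it, whereas a likelihood-based model can in principle reinterpret such reports as long as em + ep < 1.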

As a final example of sna's model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components and estimates both effects simultaneously. (This example requires the numDeriv package.)
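Reading off the simulation code, the data-generating process combines an autoregressive term on y (weight matrix W1, parameter rho_1, set by r1) with an autocorrelated disturbance (W2, rho_2, set by r2). In matrix form, as a sketch consistent with the two qr.solve calls:

```latex
\begin{aligned}
y &= \rho_1 W_1 y + X\beta + e, \\
e &= \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I), \\
\Rightarrow\quad y &= (I - \rho_1 W_1)^{-1}\left[X\beta + (I - \rho_2 W_2)^{-1}\nu\right].
\end{aligned}
```

The estimates rho1.1 = 0.195 and rho2.1 = 0.307 in the summary below sit close to the simulated values r1 = 0.2 and r2 = 0.3.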

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
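The edge-coloring rule can be sketched in base R. The helper below (influence_edges, a hypothetical name) takes a matrix whose (i, j) entry is i's net influence on j and colors off-diagonal pairs lying at least two standard deviations above or below the mean of the off-diagonal entries. How lnam itself computes net influence is not stated here; in the usage lines we assume, purely for illustration, an AR-only model in which the influence matrix is the transpose of (I - rho W)^{-1}, because in y = (I - rho W)^{-1} u the (j, i) entry of that inverse carries the effect of i's disturbance on y_j.

```r
# Color ordered pairs (i, j) whose net influence is extreme relative to
# the mean and standard deviation of all off-diagonal entries.
influence_edges <- function(infl) {
  off <- row(infl) != col(infl)                  # ignore self-influence
  mu <- mean(infl[off])
  s <- sd(infl[off])
  colors <- matrix(NA_character_, nrow(infl), ncol(infl))
  colors[off & infl >= mu + 2 * s] <- "green"    # unusually strong influence
  colors[off & infl <= mu - 2 * s] <- "red"      # unusually negative influence
  colors
}

# Illustrative use with an assumed AR-only influence model and a random W.
set.seed(1)
n <- 20
W <- matrix(rbinom(n * n, 1, 0.1), n, n)
diag(W) <- 0
infl <- t(solve(diag(n) - 0.2 * W))   # row i, column j: influence of i on j
table(influence_edges(infl), useNA = "ifany")
```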

                                        3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines that is diverse and that covers many of the methods currently seeing wide use within the field. It is hoped that, together with the other packages of the statnet ensemble, the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                        Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.

                                        References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks II: Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project <http://statnetproject.org/>, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project <http://statnetproject.org/>, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                        Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software (http://www.jstatsoft.org/), published by the American Statistical Association (http://www.amstat.org/)
Volume 24, Issue 6, February 2008. Submitted: 2007-06-01. Accepted: 2007-12-25.


the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
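A minimal base R sketch of such a simulation-based test is shown below. It conditions on the number of vertices and ties (a uniform null given size and edge count) and uses the number of mutual dyads as the test statistic. The function names are hypothetical, and this is not sna's own test routine.

```r
# Statistic: number of mutual (reciprocated) dyads in a 0/1 digraph.
mutual_dyads <- function(A) sum(A * t(A)) / 2

# Uniform random digraph on n vertices with exactly m ties and no loops.
rand_digraph_m <- function(n, m) {
  A <- matrix(0, n, n)
  off <- which(row(A) != col(A))   # linear indices of off-diagonal cells
  A[sample(off, m)] <- 1
  A
}

# Conditional uniform graph test, conditioning on size and tie count.
cug_edges_test <- function(A, stat = mutual_dyads, reps = 1000) {
  n <- nrow(A)
  m <- sum(A[row(A) != col(A)])
  obs <- stat(A)
  sims <- replicate(reps, stat(rand_digraph_m(n, m)))
  list(obs = obs,
       p_greater = mean(sims >= obs),   # Pr(X >= obs) under the null
       p_less    = mean(sims <= obs))   # Pr(X <= obs) under the null
}

# Example: five fully reciprocated pairs on ten vertices.
set.seed(123)
A <- matrix(0, 10, 10)
for (i in c(1, 3, 5, 7, 9)) A[i, i + 1] <- A[i + 1, i] <- 1
res <- cug_edges_test(A)
res$obs
res$p_greater
```

Any function of an adjacency matrix can be passed as stat, which is what makes the simulation approach attractive when the analytic null is in doubt.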

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network.

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181
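To show what the simplest of these indices compute, the base R sketch below reproduces indegree, outdegree, total degree, and closeness from an adjacency matrix. It assumes sna's convention that A[i, j] = 1 denotes a tie from i to j (total degree above is indeed indegree plus outdegree, e.g. 4 + 6 = 10). It takes closeness to be (n - 1) divided by the sum of geodesic distances from a vertex, which is consistent with the values above (0.75 = 9/12, 0.5625 = 9/16) but is our reading rather than a statement of sna's code. All function names are hypothetical.

```r
# Degree scores under the convention A[i, j] = 1 for a tie i -> j.
in_deg  <- function(A) colSums(A)   # ties received
out_deg <- function(A) rowSums(A)   # ties sent

# All-pairs geodesic distances by Floyd-Warshall; Inf marks unreachable pairs.
geodesic_dist <- function(A) {
  n <- nrow(A)
  D <- ifelse(A > 0, 1, Inf)
  diag(D) <- 0
  for (k in seq_len(n)) D <- pmin(D, outer(D[, k], D[k, ], "+"))
  D
}

# Closeness as (n - 1) / (sum of distances from each vertex).
closeness_sketch <- function(A) (nrow(A) - 1) / rowSums(geodesic_dist(A))

A <- matrix(0, 3, 3)
A[1, 2] <- A[1, 3] <- A[2, 3] <- A[3, 1] <- 1
in_deg(A)                 # 1 1 2
out_deg(A)                # 2 1 1
in_deg(A) + out_deg(A)    # total degree: 3 2 3
closeness_sketch(A)       # 1.0000000 0.6666667 0.6666667
```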

As the above illustrates, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties that can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0 and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector centrality and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+    evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782
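The exponent = 0 case can be reproduced from Bonacich's definition c = (I - beta A)^{-1} A 1 with beta = exponent. The constant ratio 0.2192645 printed above equals sqrt(10/208), where 208 is the sum of squared outdegrees from the earlier output, which suggests that the scores are rescaled so that their squares sum to n. The sketch below adopts that scaling as an assumption; bonacich_power is a hypothetical name, not sna's implementation.

```r
# Bonacich power: c = (I - exponent * A)^{-1} A 1, rescaled so sum(c^2) == n
# (scaling inferred from the printed ratio, assumed rather than documented).
bonacich_power <- function(A, exponent) {
  n <- nrow(A)
  raw <- as.vector(solve(diag(n) - exponent * A, A %*% rep(1, n)))
  raw * sqrt(n / sum(raw^2))
}

A <- matrix(c(0, 1, 1,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)
bonacich_power(A, 0) / rowSums(A)   # constant, so proportional to outdegree
bonacich_power(A, -0.5)             # negative decay: ties to weak alters help
```

With exponent = 0 the inverse is the identity, so the raw score is just A 1, the vector of outdegrees, and only the rescaling constant remains.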

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

                                          Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present; z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs. (In the output above, for instance, the coordinator role w_I requires the broker and both alters to belong to the same group, so its z-scores are NaN for the two-member Group 2.)
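The z-scores and p-values in these tables are ordinary standard scores, $z = (t - E(t))/\mathrm{Sd}(t)$, with Pr(>|z|) read from a standard normal reference. The short base-R check below reproduces the w_O row of the global table. brokerage_z is a hypothetical helper written for this illustration (it is not an sna function), and the two-sided normal p-value is inferred from the printed column header rather than from sna's source.

```r
# Standard score and two-sided normal p-value for an observed brokerage count.
# brokerage_z() is a hypothetical helper for illustration, not part of sna.
brokerage_z <- function(t, expected, sd) {
  z <- (t - expected) / sd
  p <- 2 * pnorm(-abs(z))   # Pr(>|z|) under a standard normal reference
  c(z = z, p = p)
}

# Values copied from the w_O (itinerant broker) row of the global table
res <- brokerage_z(t = 25, expected = 19.5459, sd = 7.0713)
round(res, 4)               # z of about 0.7713 and p of about 0.4405
```

The same arithmetic applies to the individual-level tables, where the expectation and standard deviation are specific to each vertex's group.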

                                          Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i,j,k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from i to k, i should also be tied to k directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph G with respect to centrality measure c is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \max_{v \in V} c(v,G) - c(v_i,G) \right] \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing C(G) by its maximum across all graphs of the same order as G. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, where they are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they support the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
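Equations (1) and (2) are easy to verify numerically. The sketch below is plain base R, not sna's centralization; it takes outdegree as the centrality measure and assumes, for the normalization step, that the theoretical maximum deviation for outdegree in a loopless digraph of order n is $(n-1)^2$, attained by the out-star (the center has outdegree n-1 and every other vertex has 0).

```r
# Freeman centralization for outdegree, written out from Equations (1) and (2).
# Base-R illustration only; sna's centralization() is the general tool.
outdeg <- function(A) rowSums(A)                                # outdegree (no loops assumed)
freeman_raw <- function(cv) sum(max(cv) - cv)                   # Equation (1)
freeman_alt <- function(cv) length(cv) * (max(cv) - mean(cv))   # Equation (2)

n <- 6
star <- matrix(0, n, n)
star[1, -1] <- 1                          # out-star: vertex 1 sends to every other vertex
ring <- matrix(0, n, n)
for (i in 1:n) ring[i, i %% n + 1] <- 1   # directed cycle: every outdegree is 1

freeman_raw(outdeg(star))                 # 25, i.e., (n - 1)^2
freeman_alt(outdeg(star))                 # same value via Equation (2)
freeman_raw(outdeg(star)) / (n - 1)^2     # normalized: 1 for the out-star
freeman_raw(outdeg(ring)) / (n - 1)^2     # normalized: 0 when all scores are equal
```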

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an out-tree; hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward: calling one with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices; hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on n vertices, with $K_{n,m}$ denoting the complete bipartite graph on n and m vertices, and $N_n$ the null or empty graph on n vertices.
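The definitions of density, dyadic and edgewise reciprocity, and weak transitivity can be written out directly for a single 0/1 adjacency matrix. The following base-R sketch is illustrative only: the function names are ours, it ignores missing data and loops, and it is not guaranteed to match gden, grecip, or gtrans digit for digit in every case.

```r
# Base-R versions of three GLIs for a single loopless digraph (0/1 matrix).
# Illustrative sketch; sna's gden(), grecip() and gtrans() are the package tools.
density_dg <- function(A) {
  n <- nrow(A)
  sum(A[row(A) != col(A)]) / (n * (n - 1))   # edges over ordered vertex pairs
}

dyad_counts <- function(A) {
  up <- upper.tri(A)
  s <- A[up] + t(A)[up]                      # 2 = mutual, 1 = asymmetric, 0 = null
  c(mut = sum(s == 2), asym = sum(s == 1), null = sum(s == 0))
}

recip_dyadic <- function(A) {                # fraction of dyads that are symmetric
  d <- dyad_counts(A)
  unname((d["mut"] + d["null"]) / sum(d))
}

recip_edgewise <- function(A) {              # fraction of edges that are reciprocated
  d <- dyad_counts(A)
  unname(2 * d["mut"] / (2 * d["mut"] + d["asym"]))
}

trans_weak <- function(A) {                  # share of two-paths i->j->k (i != k) closed by i->k
  diag(A) <- 0
  two <- A %*% A                             # two[i, k] = number of two-paths from i to k
  diag(two) <- 0
  sum(two * A) / sum(two)
}

A <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              0, 0, 0, 1,
              0, 0, 0, 0), 4, 4, byrow = TRUE)
density_dg(A)      # 5 edges / 12 ordered pairs
recip_dyadic(A)    # (1 mutual + 2 null) / 6 dyads = 0.5
recip_edgewise(A)  # 2 of 5 edges are reciprocated = 0.4
trans_weak(A)      # 2 of 4 two-paths are closed = 0.5
```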

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified along with any additional arguments). By default, centralization scores are computed only for a single graph (here, the first graph in the stack); R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following function, which scores each vertex by its number of outgoing two-stars:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)
+     return((n-1) * choose(n-1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
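The maximum deviation returned by o2scent can be checked by hand. On an out-star of order n the center has choose(n-1, 2) outgoing two-stars and every other vertex has none, so the total deviation from the maximum is $(n-1)\binom{n-1}{2}$; since no vertex can exceed choose(n-1, 2) and none can fall below 0, no graph of that order does better. A base-R sketch (hypothetical helper names, no sna required) confirms that the resulting normalized index runs from 0 to 1:

```r
# Out-two-star counts per vertex: choose(outdegree, 2), as in o2scent above.
o2s_scores <- function(A) choose(rowSums(A), 2)

# Freeman-style normalized centralization, dividing by the theoretical maximum
# deviation (n - 1) * choose(n - 1, 2) that o2scent reports when tmaxdev = TRUE.
o2s_centralization <- function(A) {
  n <- nrow(A)
  cv <- o2s_scores(A)
  sum(max(cv) - cv) / ((n - 1) * choose(n - 1, 2))
}

n <- 10
star <- matrix(0, n, n)
star[1, -1] <- 1                      # out-star centered on vertex 1
o2s_centralization(star)              # 1: the bound is attained

full <- matrix(1, n, n)
diag(full) <- 0                       # complete digraph
o2s_centralization(full)              # 0: every vertex has the same score
```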

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V,E)$ be a (possibly directed) graph of order N, and let $d(i,j)$ be the geodesic distance from vertex i to vertex j in G. The "structure statistics" of G are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j,k) \le i)$ and I is the standard indicator function. Intuitively, $s_i$ is the expected fraction of G which lies within distance i of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
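The structure statistics follow directly from the geodesic distance matrix. This base-R sketch computes distances by breadth-first search (unreachable pairs keep distance Inf, so they never count toward any $s_i$) and then applies the formula for $s_i$. The helper names are illustrative; geodist and structure.statistics remain the sna routines.

```r
# Geodesic distances by breadth-first search from each vertex (base R).
geodesic_dist <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  for (s in 1:n) {
    D[s, s] <- 0
    frontier <- s
    while (length(frontier) > 0) {
      # unvisited vertices reachable in one step from the current BFS layer
      nxt <- which(colSums(A[frontier, , drop = FALSE]) > 0 & is.infinite(D[s, ]))
      D[s, nxt] <- D[s, frontier[1]] + 1   # all frontier vertices share one distance
      frontier <- nxt
    }
  }
  D
}

# Fararo-Sunshine structure statistics s_0, ..., s_{N-1}.
structure_stats <- function(A) {
  N <- nrow(A)
  D <- geodesic_dist(A)
  s <- sapply(0:(N - 1), function(i) sum(D <= i) / N^2)
  names(s) <- 0:(N - 1)
  s
}

# Directed path 1 -> 2 -> 3 -> 4: not every vertex reaches every other,
# so the statistics never reach 1.
P <- matrix(0, 4, 4)
P[cbind(1:3, 2:4)] <- 1
structure_stats(P)   # 0.25 0.4375 0.5625 0.625
```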

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of G, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic t for graph G is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where H is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of G is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $E\,s(H) = s(G), E\,s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter p equal to the density of G is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
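For undirected graphs, the four triad classes are determined simply by how many of a triple's three possible edges are present, so a census can be taken by enumerating all $\binom{n}{3}$ triples. The base-R sketch below (helper name and class labels are ours) is practical only for small graphs, but makes the definition concrete; triad.census is the sna routine, and it also handles the 16 directed classes.

```r
# Undirected triad census by enumeration: classify every vertex triple by the
# number of its three possible edges that are present (0, 1, 2, or 3).
triad_census_undirected <- function(A) {
  A <- (A | t(A)) * 1                     # symmetrize: treat any tie as an edge
  trip <- combn(nrow(A), 3)               # one column per vertex triple
  k <- apply(trip, 2, function(v) A[v[1], v[2]] + A[v[1], v[3]] + A[v[2], v[3]])
  table(factor(k, levels = 0:3,
               labels = c("empty", "one edge", "two-path", "triangle")))
}

E <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 1), c(1, 3))   # 4-cycle plus a chord
A <- matrix(0, 5, 5)
A[E] <- 1                                 # vertex 5 is an isolate
tc <- triad_census_undirected(A)
tc                                        # empty 1, one edge 5, two-path 2, triangle 2
sum(tc) == choose(5, 3)                   # every triple is counted exactly once
```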

                                          Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.3333333
        1stQ:  -0.06666667
        Med:   0
        Mean:  -0.001288889
        3rdQ:  0.06666667
        Max:   0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.06666667
        1stQ:  0.1555556
        Med:   0.2222222
        Mean:  0.2215333
        3rdQ:  0.2888889
        Max:   0.5333333
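The logic of a CUG test can be sketched in a few lines of base R. Here the null distribution is taken (as one common choice) to be uniform over loopless digraphs with the same order and exact edge count as the observed graph, and the statistic is edgewise reciprocity; both the helper functions and this conditioning choice are assumptions for illustration, not a description of cugtest internals.

```r
# Monte Carlo CUG test sketch: compare an observed statistic with its distribution
# over digraphs drawn uniformly given order and edge count. Hypothetical helpers.
recip_edgewise <- function(A) {
  m <- sum(A)
  if (m == 0) return(0)
  sum(A * t(A)) / m                    # fraction of edges that are reciprocated
}

rand_digraph_fixed_m <- function(n, m) {
  A <- matrix(0, n, n)
  off <- which(row(A) != col(A))       # linear indices of off-diagonal cells
  A[sample(off, m)] <- 1               # place m edges uniformly at random
  A
}

cug_sketch <- function(A, stat, reps = 1000) {
  n <- nrow(A); m <- sum(A)
  obs <- stat(A)
  sims <- replicate(reps, stat(rand_digraph_fixed_m(n, m)))
  c(obs = obs,
    p_upper = mean(sims >= obs),       # estimate of Pr(t(H) >= t(G))
    p_lower = mean(sims <= obs))       # estimate of Pr(t(H) <= t(G))
}

set.seed(1)
# Three mutual dyads on six vertices: every edge is reciprocated
A <- matrix(0, 6, 6)
A[cbind(c(1, 2, 3, 4, 5, 6), c(2, 1, 4, 3, 6, 5))] <- 1
res <- cug_sketch(A, recip_edgewise)
res                                    # obs = 1, p_upper near 0, p_lower = 1
```

Conditioning only on order would instead draw each graph with every edge present independently with probability 0.5, which changes the reference distribution in the same way the two sna calls above differ.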

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph G, vertex v is said to be structurally equivalent to vertex v′ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when v and v′ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
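The pipeline that equiv.clust wraps (positional distances followed by hclust) can be imitated in base R. This sketch assumes a Hamming-type structural equivalence distance that compares two vertices' out- and in-ties to all third parties, excluding the entries involving the pair itself, in line with the definition of structural equivalence above; sedist's exact conventions may differ, and se_hamming is a hypothetical helper.

```r
# Structural-equivalence distance (Hamming-type) between every pair of vertices,
# comparing ties to third parties only, then complete-linkage clustering.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    keep <- setdiff(1:n, c(i, j))                  # ignore cells involving i or j
    D[i, j] <- sum(A[i, keep] != A[j, keep]) +     # outgoing ties differ
               sum(A[keep, i] != A[keep, j])       # incoming ties differ
  }
  D
}

# Vertices 1 and 2 send to 3 and 4; vertices 3 and 4 send to 5
A <- matrix(0, 5, 5)
A[1, 3:4] <- 1; A[2, 3:4] <- 1
A[3, 5] <- 1; A[4, 5] <- 1

D <- se_hamming(A)
hc <- hclust(as.dist(D), method = "complete")
cutree(hc, h = 0.5)    # exact SE classes: 1 1 2 2 3
```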

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i,j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.
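A density block image can be computed from any membership vector by averaging adjacency cells within each pair of classes. In this base-R sketch (density_blocks is a hypothetical helper), within-class blocks exclude the diagonal, so a singleton class has an undefined (NaN) self-block. With exact structural equivalence classes, as in the toy graph used here, every defined block density is 0 or 1.

```r
# Density block image: entry (a, b) is the mean of the adjacency cells from
# members of class a to members of class b (self-ties excluded when a == b).
density_blocks <- function(A, memb) {
  cls <- sort(unique(memb))
  M <- matrix(NA_real_, length(cls), length(cls), dimnames = list(cls, cls))
  for (a in seq_along(cls)) for (b in seq_along(cls)) {
    ia <- which(memb == cls[a]); ib <- which(memb == cls[b])
    block <- A[ia, ib, drop = FALSE]
    if (a == b) block <- block[row(block) != col(block)]   # drop the diagonal
    M[a, b] <- mean(block)
  }
  M
}

A <- matrix(0, 5, 5)
A[1, 3:4] <- 1; A[2, 3:4] <- 1; A[3, 5] <- 1; A[4, 5] <- 1
memb <- c(1, 1, 2, 2, 3)          # structural equivalence classes of this graph
M <- density_blocks(A, memb)
M                                 # 1 for blocks 1->2 and 2->3, 0 elsewhere, NaN for 3->3
```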

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
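Expansion of a density image reverses the reduction: each new vertex is assigned a class, and each possible edge is drawn independently with the density of its block as its probability. This base-R sketch (expand_density is a hypothetical helper, and treating undefined blocks as empty is our assumption) uses a 0/1 image so that the output is deterministic and easy to check; with fractional densities, repeated calls give a Monte Carlo sample from the implied inhomogeneous Bernoulli graph.

```r
# Expand a density image into an inhomogeneous Bernoulli digraph.
expand_density <- function(M, sizes) {
  memb <- rep(seq_along(sizes), sizes)         # class of each new vertex
  P <- M[memb, memb]                           # edge probability for every ordered pair
  P[is.na(P)] <- 0                             # undefined blocks treated as empty
  A <- matrix(rbinom(length(P), 1, P), nrow(P))
  diag(A) <- 0                                 # no loops
  A
}

M <- matrix(c(0, 1, 0,
              0, 0, 1,
              0, 0, 0), 3, 3, byrow = TRUE)    # 0/1 image on three classes
set.seed(42)
G <- expand_density(M, sizes = c(2, 2, 2))     # new graph of order 6
G
```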

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition G″ of graphs G and G′ on vertex set V is the graph on V such that $(v,v') \in E(G'')$ iff there exists a vertex v″ such that $(v,v'') \in E(G)$ and $(v'',v') \in E(G')$. (This is equivalent to the graph formed by the Boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
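Because composition is a Boolean matrix product, it can be computed in base R by thresholding an ordinary matrix product. The sketch below (compose_graphs is a hypothetical helper; sna's own operator is %c%) composes a two-path with its reverse and shows how loops arise even though neither input has any.

```r
# Composition via the Boolean product of adjacency matrices:
# (v, v') is an edge iff some v'' has (v, v'') in G and (v'', v') in G'.
compose_graphs <- function(A, B) (A %*% B > 0) * 1

G  <- matrix(c(0, 1, 0,
               0, 0, 1,
               0, 0, 0), 3, 3, byrow = TRUE)   # 1 -> 2 -> 3
Gp <- t(G)                                      # reversed: 3 -> 2 -> 1
Gpp <- compose_graphs(G, Gp)
Gpp
diag(Gpp)   # 1 1 0: loops at vertices 1 and 2, though neither G nor Gp has any
```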

                                          Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure. In the example, each class of the blockmodel is expanded to two vertices, so the 20-vertex input yields a graph of order 12.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$
\operatorname{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \tag{3}
$$


where $A^G$, $A^H$ are the respective adjacency matrices of $G$ and $H$, $\mu_X = \left(|V|(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean, and the sums run over ordered pairs of distinct vertices. The graph variance is then $\operatorname{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \operatorname{cov}(G,H) / \sqrt{\operatorname{cov}(G,G)\,\operatorname{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained using gcor and gcov, respectively; Hamming distances for graph sets can be obtained similarly using hdist.
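As a minimal base-R sketch (not sna's implementation), equation (3) and the derived graph correlation and Hamming distance can be written directly for directed graphs stored as 0/1 adjacency matrices, assuming loops are ignored. The helper names graph_mean, graph_cov, graph_cor, and hamming are hypothetical; gcor, gcov, and hdist offer options (such as mode and diagonal handling) that this sketch omits.

```r
# Off-diagonal mask: loops are excluded, matching the |V|(|V|-1) normalization in (3)
offdiag <- function(A) row(A) != col(A)

# Graph mean mu_X: average over the |V|(|V|-1) off-diagonal cells
graph_mean <- function(A) mean(A[offdiag(A)])

# Graph covariance cov(G, H), following equation (3)
graph_cov <- function(A, B) {
  m <- offdiag(A)
  mean((A[m] - graph_mean(A)) * (B[m] - graph_mean(B)))
}

# Graph correlation rho(G, H) = cov(G, H) / sqrt(cov(G, G) cov(H, H))
graph_cor <- function(A, B) {
  graph_cov(A, B) / sqrt(graph_cov(A, A) * graph_cov(B, B))
}

# Hamming distance: number of off-diagonal cells on which two graphs differ
hamming <- function(A, B) sum(A[offdiag(A)] != B[offdiag(B)])

# Small illustrative digraphs (random 0/1 matrices with empty diagonals)
set.seed(42)
n <- 6
A <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(A) <- 0
B <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(B) <- 0
c(cor = graph_cor(A, B), hamming = hamming(A, B))
```

Because the $|V|(|V|-1)$ normalization cancels in the ratio, graph_cor agrees with R's cor applied to the two vectors of off-diagonal cells.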

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
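The structural correlation maximizes the graph correlation over relabelings of one graph. For a handful of vertices that maximization can be done by brute force, which is the idea behind exhaustive search. The sketch below uses hypothetical helpers (all_perms, struct_cor; not sna code) that enumerate all n! permutations, so it becomes infeasible very quickly as n grows, which is why heuristic search is the practical default.

```r
# Every permutation of 1:n as the rows of a matrix (n! rows, so small n only)
all_perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  p <- all_perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(i) {
    # Put i first; shift the remaining labels >= i up by one
    cbind(i, ifelse(p >= i, p + 1L, p))
  }))
}

# Labeled graph correlation over off-diagonal cells
graph_cor <- function(A, B) {
  m <- row(A) != col(A)
  cor(A[m], B[m])
}

# Structural (unlabeled) correlation by brute force: relabel B's vertices
# (same permutation on rows and columns) and keep the best correlation
struct_cor <- function(A, B) {
  perms <- all_perms(nrow(B))
  max(apply(perms, 1, function(p) graph_cor(A, B[p, p])))
}

set.seed(7)
n <- 5
A <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(A) <- 0
p <- sample(n)
B <- A[p, p]                       # a relabeled copy of A, hence isomorphic
C <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(C) <- 0
c(labeled = graph_cor(A, B), structural = struct_cor(A, B))
```

For isomorphic graphs the structural correlation is 1 even when the labeled correlation is not, which is the pattern in the g2/g3 example in this section.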

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
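Once graphs are reduced to vectors of edge variables, the exploratory steps just described need only base R. The sketch below uses simulated graphs (an assumption; any stack of adjacency matrices on a common vertex set would do) to compute Hamming distances, a majority-vote central graph, a metric MDS, a hierarchical clustering, and a network PCA via eigen on the graph correlation matrix. Edgewise majority voting minimizes the summed Hamming distance; how sna's centralgraph breaks exact ties is not assumed here.

```r
set.seed(3)
n <- 8   # vertices per graph
m <- 6   # number of graphs
# Illustrative data: m random digraphs on a common vertex set
graphs <- replicate(m, {
  A <- matrix(rbinom(n * n, 1, 0.4), n, n)
  diag(A) <- 0
  A
}, simplify = FALSE)

# Edge-variable matrix: one column of off-diagonal cells per graph
mask <- row(graphs[[1]]) != col(graphs[[1]])
E <- sapply(graphs, function(A) A[mask])

# Hamming distances between graphs (Manhattan distance on 0/1 vectors)
H <- as.matrix(dist(t(E), method = "manhattan"))

# Central graph: edgewise majority vote (ties at one half go to "absent")
central <- as.integer(rowMeans(E) > 0.5)
C <- matrix(0L, n, n)
C[mask] <- central

# Standard exploratory tools applied to the distances
coords <- cmdscale(H, k = 2)     # metric MDS
tree <- hclust(as.dist(H))       # hierarchical clustering

# Network PCA: eigendecomposition of the graph correlation matrix
pca <- eigen(cor(E))
round(pca$values / sum(pca$values), 3)   # share of variance per component
```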

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available where conditioning on the entire observed structure is inappropriate.
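A QAP test compares an observed statistic with its distribution under random relabeling of vertices, applying the same permutation to rows and columns so that each graph's structure is preserved. A base-R sketch for the graph correlation follows; qap_cor_test is a hypothetical helper (not qaptest) that simply reports one-sided exceedance fractions.

```r
# QAP test for the graph correlation between two digraphs A and B
# (hypothetical helper, not sna's qaptest)
qap_cor_test <- function(A, B, reps = 1000) {
  mask <- row(A) != col(A)                      # ignore loops
  stat <- function(X, Y) cor(X[mask], Y[mask])  # graph correlation
  obs <- stat(A, B)
  # Null distribution: relabel B's vertices (rows and columns together)
  null <- replicate(reps, {
    p <- sample(nrow(B))
    stat(A, B[p, p])
  })
  list(obs = obs,
       p_greater = mean(null >= obs),   # fraction of null draws >= observed
       p_less = mean(null <= obs),      # fraction of null draws <= observed
       null = null)
}

set.seed(11)
n <- 15
A <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(A) <- 0
# B: a noisy copy of A (about 10% of cells flipped), so the graphs are related
flip <- matrix(rbinom(n * n, 1, 0.1), n, n)
B <- abs(A - flip); diag(B) <- 0
res <- qap_cor_test(A, B, reps = 500)
c(observed = res$obs, p_greater = res$p_greater)
```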


                                          Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Here g3 is a random relabeling of g2 (obtained with rmperm), so the unlabeled (i.e., structural) correlation between g2 and g3 is 1, since the graphs are isomorphic; the value returned by gscor, however, may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100% 
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14 

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000    
x1           1.000000e+00 1.000   0.000   0.000    
x2           4.000000e+00 1.000   0.000   0.000    
x3           2.000000e+00 1.000   0.000   0.000    
x4          -7.553990e-16 0.369   0.631   0.756    

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
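Since netlm's point estimates are those of conventional OLS on vectorized adjacency matrices, they can be cross-checked with lm. The base-R sketch below mirrors the simulation above, assuming rgraph's default density of 0.5 and an excluded diagonal; the random draws differ, so only the recovered coefficients (0, 1, 4, 2, 0) and the 375 residual degrees of freedom should match. lm's standard errors and p-values are not QAP-based and are not a substitute for the test diagnostics.

```r
set.seed(5)
n <- 20
k <- 4
# k random digraphs (density 0.5, empty diagonals) in a k x n x n array,
# the same layout as the stack returned by rgraph(20, 4)
x <- array(rbinom(k * n * n, 1, 0.5), dim = c(k, n, n))
for (i in 1:k) diag(x[i, , ]) <- 0

# Dependent graph: an exact linear combination, as in the netlm example
y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]

# Vectorize the off-diagonal cells and fit ordinary least squares
mask <- row(y) != col(y)
X <- sapply(1:k, function(i) x[i, , ][mask])
fit <- lm(y[mask] ~ X)
round(coef(fit), 6)
df.residual(fit)   # 20 * 19 dyads minus 5 parameters
```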

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:


            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
   352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
   (Dn-Dr)/(Dn-Dr+dfn): 0.481324
   (Dn-Dr)/Dn: 0.6694004
Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

   Total Fraction Correct: 0.8973684
   Fraction Predicted 1s Correct: 0.8973684
   Fraction Predicted 0s Correct: NaN
   False Negative Rate: 0
   False Positive Rate: 1

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.
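The goodness-of-fit figures in the netlogit summary follow from the two deviances and the contingency table by simple arithmetic. The sketch below recomputes them, assuming the conventional definitions AIC = Dr + 2k and BIC = Dr + k log(N) with k = 5 parameters and N = 380 dyads. That these assumptions reproduce the printed values is a useful consistency check, not a statement about netlogit's source code.

```r
# Quantities printed in the netlogit summary above
Dn    <- 526.7919   # null deviance
Dr    <- 174.1572   # residual deviance
dfn   <- 380        # null degrees of freedom
ndyad <- 20 * 19    # ordered dyads (diagonal excluded)
npar  <- 5          # intercept plus four slopes

chisq <- Dn - Dr                    # fit improvement
r2_a  <- chisq / (chisq + dfn)      # (Dn-Dr)/(Dn-Dr+dfn)
r2_b  <- chisq / Dn                 # (Dn-Dr)/Dn
aic   <- Dr + 2 * npar
bic   <- Dr + log(ndyad) * npar

# Contingency table: predicted (rows) x actual (cols)
tab <- matrix(c(0, 39, 0, 341), 2, 2,
              dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))
frac_correct <- sum(diag(tab)) / sum(tab)
fnr <- tab["0", "1"] / sum(tab[, "1"])   # actual ties predicted absent
fpr <- tab["1", "0"] / sum(tab[, "0"])   # actual non-ties predicted present

round(c(chisq = chisq, r2_a = r2_a, r2_b = r2_b, aic = aic, bic = bic,
        correct = frac_correct, fnr = fnr, fpr = fpr), 7)
```

The false positive rate of 1 makes the point of the paragraph above concrete: every one of the 39 absent ties is predicted present.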

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired), and estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is, in this case, parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
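Both models rest on a reporting process in which a true tie is reported with probability 1 - e^- and a true non-tie is reported as a tie with probability e^+. With error rates treated as known, Bayes' rule gives the posterior probability of a single edge from a set of reports; this is the kind of full-conditional calculation a Gibbs sampler for the true graph repeats edge by edge. The helper edge_posterior is hypothetical and is not bbnam's code. The last test case illustrates how a report becomes negatively informative when e^- + e^+ > 1.

```r
# Posterior Pr(edge present | reports) for one edge, with error rates known.
# Hypothetical helper, not bbnam's code.
#   reports: 0/1 reports of the edge, one per informant
#   em: false negative rates e^-; ep: false positive rates e^+
#   prior: prior probability that the edge is present
edge_posterior <- function(reports, em, ep, prior = 0.5) {
  lik1 <- prod(ifelse(reports == 1, 1 - em, em))   # reports given edge present
  lik0 <- prod(ifelse(reports == 1, ep, 1 - ep))   # reports given edge absent
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

# Five fairly reliable informants, four of whom report the edge
edge_posterior(c(1, 1, 1, 1, 0), em = rep(0.3, 5), ep = rep(0.05, 5))
```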

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$
\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},
$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials, given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
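The conditional nomination probability above translates directly into code. In the sketch below, biased_tie_prob is a hypothetical helper, and the two bias types with counts s = (1, 2) are illustrative values, not estimates produced by bn.

```r
# Conditional probability that i nominates j given the trace state T:
#   Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k(i, j, T)
# Hypothetical helper; bn estimates such parameters rather than taking them as given.
#   base: baseline probability Pr(B_e)
#   bias: probabilities Pr(B_k) of the bias event types
#   s:    number of potentially occurring events of each type for (i, j)
biased_tie_prob <- function(base, bias = numeric(0), s = numeric(0)) {
  stopifnot(length(bias) == length(s))
  1 - (1 - base) * prod((1 - bias)^s)
}

# Baseline only, then two illustrative bias types with 1 and 2
# potentially occurring events for the dyad
biased_tie_prob(0.1)
biased_tie_prob(0.1, bias = c(0.3, 0.2), s = c(1, 2))
```

Each potentially occurring bias event can only raise the nomination probability, and a bias event with probability 1 makes the tie certain, as the text describes.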

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

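$$
y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}
$$

$$
\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}
$$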

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays (with $W_i$ and $Z_i$ their $i$th $n \times n$ slices), $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of i.i.d. disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the AR term's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
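Moran's I and Geary's C, as usually defined for a response vector and a weight matrix, can be sketched in a few lines of base R. These are the textbook formulas under hypothetical helper names; nacf's own normalization and neighborhood options are not reproduced here.

```r
# Textbook Moran's I and Geary's C for responses y and weight matrix W
# (hypothetical helpers; nacf's own normalization may differ)
morans_i <- function(y, W) {
  z <- y - mean(y)
  (length(y) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}
gearys_c <- function(y, W) {
  z <- y - mean(y)
  (length(y) - 1) * sum(W * outer(y, y, "-")^2) / (2 * sum(W) * sum(z^2))
}

# Example: a smooth pattern around an undirected ring of n vertices
n <- 10
ring <- matrix(0, n, n)
for (i in 1:n) {
  j <- i %% n + 1
  ring[i, j] <- 1
  ring[j, i] <- 1
}
s <- sin(2 * pi * (1:n) / n)
c(moran = morans_i(s, ring), geary = gearys_c(s, ring))
```

Neighbors on the ring have similar values, so Moran's I is positive and Geary's C falls below 1; on a complete graph the two indices take the fixed values -1/(n-1) and 1 whatever the responses.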


                                          Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00


a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:
             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995


o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

   Replicate Chains: 5
   Burn Time: 300


   Draws per Chain: 20   Total Draws: 100
   Potential Scale Reduction (G&R's sqrt(Rhat)):
      Max: 1.003116
      Med: 0.9992194
      IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)


Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
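The classical estimators compared above are simple to state when the informants are the vertices themselves: LAS decides the tie i -> j from the reports of i and j only (requiring both for the intersection, either one for the union), while the central graph takes a majority over all informants. The sketch below uses hypothetical helpers (las, central_graph; not the consensus code) and simulates data in the same spirit as the example, with fresh random draws, so its accuracies will differ from those printed above.

```r
# dat[k, i, j]: informant k's 0/1 report of the tie i -> j, with informants
# identified with vertices (as LAS requires)
las <- function(dat, combine = c("intersection", "union")) {
  combine <- match.arg(combine)
  n <- dim(dat)[2]
  est <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    own <- dat[i, i, j]     # sender's report
    other <- dat[j, i, j]   # receiver's report
    est[i, j] <- if (combine == "intersection") own * other else max(own, other)
  }
  est
}

# Central graph: edgewise majority across all informants
central_graph <- function(dat) (apply(dat, c(2, 3), mean) > 0.5) * 1

set.seed(9)
n <- 20
g <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(g) <- 0
em <- rbeta(n, 15, 25)   # false negative rates (mean 0.375)
ep <- rbeta(n, 1, 25)    # false positive rates (mean about 0.038)
dat <- array(0, dim = c(n, n, n))
for (k in 1:n) {
  p <- g * (1 - em[k]) + (1 - g) * ep[k]
  dat[k, , ] <- matrix(rbinom(n * n, 1, p), n, n)
  diag(dat[k, , ]) <- 0
}
c(intersection = mean(las(dat, "intersection") == g),
  union = mean(las(dat, "union") == g),
  central = mean(central_graph(dat) == g))
```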

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.52052 -0.18305  0.01156  0.15557  0.62082 

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)    
X1     -0.331259   0.010831  -30.58  < 2e-16 ***
X2      0.535608   0.009448   56.69  < 2e-16 ***
X3     -0.685068   0.007138  -95.98  < 2e-16 ***


                                          X4 0691812 0008417 8219 lt2e-16 X5 0016491 0007890 209 00366 rho11 0194935 0002575 7571 lt2e-16 rho21 0307491 0021167 1453 lt2e-16 ---Signif codes 0 ` 0001 ` 001 ` 005 ` 01 ` 1

                                          Estimate Std ErrorSigma 009597 922e-05

                                          Goodness-of-FitResidual standard error 02913 on 43 degrees of freedom (wo Sigma)Multiple R-Squared 096 Adjusted R-Squared 09534Model log likelihood 5847 on 42 degrees of freedom (wSigma)AIC -1009 BIC -8565

                                          Null model meanstdNull log likelihood -8248 on 48 degrees of freedomAIC 1690 BIC 1728AIC difference (model versus null) 2699Heuristic Log Bayes Factor (model versus null) 2584
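The two qr.solve calls above are what turn the structural equations into data. The base-R sketch below repeats the simulation without sna and checks that both equations hold for the generated y and e. The helper random_digraph is hypothetical. It is assumed to approximate rgraph(50) by drawing each off-diagonal tie independently with probability 0.5. lnam itself is not called, so this says nothing about the fitting step.

```r
# Base-R sketch of the lnam simulation, without sna.
# random_digraph() is a hypothetical stand-in for rgraph(n): each ordered
# pair (i, j), i != j, receives a tie independently with probability p.
random_digraph <- function(n, p = 0.5) {
  A <- matrix(rbinom(n * n, 1, p), n, n)
  diag(A) <- 0
  A
}

set.seed(42)
n <- 50
w1 <- random_digraph(n)
w2 <- random_digraph(n)
x <- matrix(rnorm(n * 5), n, 5)
r1 <- 0.2; r2 <- 0.3; sigma <- 0.1
beta <- rnorm(5)
nu <- rnorm(n, 0, sigma)

# MA part: e = r2 * W2 e + nu, i.e. e = (I - r2 W2)^{-1} nu
e <- qr.solve(diag(n) - r2 * w2, nu)
# AR part: y = r1 * W1 y + X beta + e, i.e. y = (I - r1 W1)^{-1} (X beta + e)
y <- qr.solve(diag(n) - r1 * w1, x %*% beta + e)

# Both structural equations should hold up to floating-point error
ar_gap <- max(abs(y - (r1 * w1 %*% y + x %*% beta + e)))
ma_gap <- max(abs(e - (r2 * w2 %*% e + nu)))
c(ar_gap = ar_gap, ma_gap = ma_gap)
```

Both gaps should be tiny. Each step of the simulation is just the reduced form of one structural equation.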

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges. Corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
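The two-standard-deviation coloring rule can be sketched independently of how net influence is computed. The function below takes any numeric influence matrix M, where M[i, j] is taken to be i's net influence on j, and labels each pair "green", "red", or "none". Two points are assumptions: how lnam derives M is not reproduced here, and the mean and standard deviation are computed over off-diagonal entries only. The name flag_influence is hypothetical.

```r
# Sketch of the net influence plot's edge-coloring rule.
# M: hypothetical square matrix, M[i, j] = estimated net influence of i on j.
# Assumption: self-influence (the diagonal) is excluded from mean and sd.
flag_influence <- function(M) {
  off <- row(M) != col(M)
  mu <- mean(M[off])
  s <- sd(M[off])
  flags <- matrix("none", nrow(M), ncol(M))
  flags[off & M >= mu + 2 * s] <- "green"   # at least 2 sd above the mean
  flags[off & M <= mu - 2 * s] <- "red"     # at least 2 sd below the mean
  flags
}

# Hypothetical input: one strongly positive and one strongly negative pair
M <- matrix(0, 6, 6)
M[1, 2] <- 10
M[3, 4] <- -10
which(flag_influence(M) != "none", arr.ind = TRUE)
```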

                                          3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines that covers many of the methods currently seeing wide use within the field. It is hoped that the inclusion of such tools (together with the other packages of the statnet ensemble) within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                          Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (theoretical vs. sample quantiles); Net Influence Plot.

                                          References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions in Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                          Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.


R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrates, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify two properties. First, the bonpow measure is proportional to outdegree when exponent = 0. Second, it is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree centrality, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1/eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782
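To see why exponent = 0 reduces to degree, recall the textbook Bonacich (1987) form $c(\beta) = \alpha (I - \beta A)^{-1} A \mathbf{1}$. The scaling constant $\alpha$ is chosen so that the squared scores sum to the number of vertices. With $\beta = 0$ this leaves $\alpha A \mathbf{1}$, a constant multiple of the row sums (outdegree). The base-R sketch below is written from that formula and is not sna's bonpow source. The example graph and the function name bonacich_power are hypothetical.

```r
# Bonacich power centrality from the textbook formula
#   c(beta) = alpha * (I - beta * A)^{-1} A 1,
# with alpha chosen so that sum(c^2) equals the number of vertices.
# Illustrative sketch only, not sna's implementation.
bonacich_power <- function(A, beta) {
  n <- nrow(A)
  raw <- solve(diag(n) - beta * A, A %*% rep(1, n))
  as.vector(raw * sqrt(n / sum(raw^2)))
}

# Hypothetical undirected graph: vertex 1 tied to 2, 3, 4; plus a 4-5 tie
A <- matrix(0, 5, 5)
edges <- rbind(c(1, 2), c(1, 3), c(1, 4), c(4, 5))
A[edges] <- 1
A[edges[, 2:1]] <- 1          # symmetrize

# beta = 0: the ratio of scores to degree is the same for every vertex
range(bonacich_power(A, 0) / rowSums(A))

# beta < 0: ties to weakly connected alters now count in ego's favor
round(bonacich_power(A, -0.3), 3)
```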

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores for each of the brokerage roles: coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t). These are reported along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs. An example is the coordinator (w_I) column for the two-member group 2 above, since a coordinator path needs three distinct members of one group.
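The five roles are defined by the group memberships along a two-path i → v → j, where v is the broker. The base-R sketch below classifies each such path for every vertex and tallies the counts. It is not sna's brokerage code, and it computes no expectations or z-scores. It also relies on an assumption: a path is skipped when i already sends a tie directly to j, following the reading of brokerage as mediation between otherwise unconnected actors. This behavior is exposed as the hypothetical argument exclude_direct.

```r
# Sketch: Gould-Fernandez brokerage role counts for each vertex of a
# directed 0/1 adjacency matrix A, given a vector of group labels.
# Roles for a two-path i -> v -> j with broker v:
#   w_I  coordinator       A -> A -> A
#   w_O  itinerant broker  A -> B -> A
#   b_IO gatekeeper        A -> B -> B
#   b_OI representative    A -> A -> B
#   b_O  liaison           A -> B -> C
gf_brokerage_counts <- function(A, groups, exclude_direct = TRUE) {
  n <- nrow(A)
  roles <- c("w_I", "w_O", "b_IO", "b_OI", "b_O")
  counts <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (v in seq_len(n)) {
    for (i in which(A[, v] != 0)) {      # i sends a tie to v
      for (j in which(A[v, ] != 0)) {    # v sends a tie to j
        if (i == v || j == v || i == j) next
        # Assumption: no brokerage if i already reaches j directly
        if (exclude_direct && A[i, j] != 0) next
        gi <- groups[i]; gv <- groups[v]; gj <- groups[j]
        role <- if (gi == gv && gv == gj) {
          "w_I"
        } else if (gi == gj) {
          "w_O"
        } else if (gv == gj) {
          "b_IO"
        } else if (gi == gv) {
          "b_OI"
        } else {
          "b_O"
        }
        counts[v, role] <- counts[v, role] + 1
      }
    }
  }
  cbind(counts, t = rowSums(counts))
}

# Hypothetical example: 1 -> 2 -> 3 and 2 -> 4; vertex 2 is the only broker
A <- matrix(0, 4, 4)
A[1, 2] <- 1; A[2, 3] <- 1; A[2, 4] <- 1
gf_brokerage_counts(A, groups = c(1, 1, 2, 3))
```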

                                            Graph-level indices

Like node-level indices, graph-level indices (GLIs) are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for all $i, j, k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from i to k, i should also be tied to k directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph G with respect to centrality measure c is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \max_{v \in V} c(v, G) - c(v_i, G) \right] \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing C(G) by its maximum across all graphs of the same order as G. This index is dimensionless and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which uses them to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna, and the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

²For instance, when all vertices are automorphically equivalent.
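Equation (2) follows from (1) by splitting the sum: $\sum_{i=1}^{|V|}[c^*(G) - c(v_i, G)] = |V| c^*(G) - |V| \bar{c}(G)$. The base-R sketch below computes raw Freeman centralization both ways for plain undirected degree, then normalizes by $(n-1)(n-2)$. That is the maximum of this quantity over undirected graphs of order n, and the star attains it. The helpers star_graph and cycle_graph are hypothetical. sna's centralization and degree may use different conventions, so these numbers are not meant to reproduce sna output.

```r
# Freeman centralization, eq. (1): total deviation from the maximum score
freeman_c1 <- function(scores) sum(max(scores) - scores)

# Equivalent form, eq. (2): |V| times (maximum minus mean)
freeman_c2 <- function(scores) length(scores) * (max(scores) - mean(scores))

# Hypothetical helpers: undirected star and cycle graphs on n vertices
star_graph <- function(n) {
  A <- matrix(0, n, n)
  A[1, -1] <- 1
  A[-1, 1] <- 1
  A
}
cycle_graph <- function(n) {
  A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    j <- i %% n + 1            # next vertex around the cycle
    A[i, j] <- 1
    A[j, i] <- 1
  }
  A
}

n <- 6
deg_star <- rowSums(star_graph(n))
deg_cycle <- rowSums(cycle_graph(n))

# Normalizing constant for undirected degree, attained by the star
max_dev <- (n - 1) * (n - 2)
freeman_c1(deg_star) / max_dev    # 1 for the star
freeman_c1(deg_cycle) / max_dev   # 0: every vertex has the same degree
```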

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. The hierarchy function can also be used to calculate hierarchy based on simple reciprocity, as with grecip.
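Two of Krackhardt's dimensions can be stated in terms of reachability. Connectedness is the share of unordered vertex pairs that fall in the same weak component. Hierarchy (in Krackhardt's sense) is one minus the share of reachable ordered pairs that are mutually reachable. The base-R sketch below encodes these definitions as I understand them from Krackhardt (1994). It is not sna's code, efficiency and lubness are not covered, and the function names are hypothetical.

```r
# Boolean reachability (paths of length >= 1) by repeated squaring
reachability <- function(A) {
  R <- (A != 0)
  repeat {
    R_next <- R | ((R %*% R) > 0)
    if (identical(R_next, R)) break
    R <- R_next
  }
  diag(R) <- FALSE               # ignore self-reachability
  R
}

# Connectedness: share of unordered pairs joined by a path, ignoring direction
krack_connectedness <- function(A) {
  W <- reachability((A != 0) | t(A != 0))
  sum(W[upper.tri(W)]) / choose(nrow(A), 2)
}

# Krackhardt hierarchy: 1 - (mutually reachable pairs / reachable pairs)
krack_hierarchy <- function(A) {
  R <- reachability(A)
  reachable <- sum(R)            # ordered pairs (i, j) with i reaching j
  if (reachable == 0) return(NA_real_)   # undefined in this sketch
  1 - sum(R & t(R)) / reachable
}

# Hypothetical examples: a directed path plus an isolate, and a directed cycle
path_iso <- matrix(0, 4, 4); path_iso[1, 2] <- 1; path_iso[2, 3] <- 1
cyc <- matrix(0, 3, 3); cyc[1, 2] <- 1; cyc[2, 3] <- 1; cyc[3, 1] <- 1
c(connectedness = krack_connectedness(path_iso), hierarchy = krack_hierarchy(path_iso))
c(connectedness = krack_connectedness(cyc), hierarchy = krack_hierarchy(cyc))
```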

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582

R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified along with any additional arguments). By default, centralization scores are computed only for a single graph. R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395

R> centralization(g, betweenness)
[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)
+     return((n-1) * choose(n-1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist. They return, respectively, the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist.

An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j, k) \le i)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
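The definition of $s_i$ can be checked directly. The base-R sketch below uses our own helper names, geodesic_distances and structure_stats, which are not sna functions. It computes directed geodesic distances by breadth-first search and then averages the indicator over all $N^2$ ordered pairs. Because the pairs include $j = k$, the first statistic is $s_0 = 1/N$.

```r
# Directed geodesic distances by breadth-first search (Inf = unreachable).
geodesic_distances <- function(adj) {
  n <- nrow(adj)
  d <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    d[s, s] <- 0
    frontier <- s
    step <- 0
    while (length(frontier) > 0) {
      step <- step + 1
      nbrs <- which(colSums(adj[frontier, , drop = FALSE]) > 0)  # out-neighbors
      nbrs <- nbrs[is.infinite(d[s, nbrs])]                        # not yet reached
      d[s, nbrs] <- step
      frontier <- nbrs
    }
  }
  d
}

# s_i = N^-2 * #{(j, k) : d(j, k) <= i}, for i = 0, ..., N - 1.
structure_stats <- function(adj) {
  n <- nrow(adj)
  d <- geodesic_distances(adj)
  sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
}

path <- matrix(0, 3, 3)
path[1, 2] <- 1; path[2, 3] <- 1    # directed path 1 -> 2 -> 3
structure_stats(path)               # 3/9, 5/9, 6/9
```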

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures.

Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or the frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case. It is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively). Here $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $E[s(H)] = s(G), E[s'(H)] = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$. The homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., the difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
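As a rough illustration of the CUG logic, and not of cugtest itself, the base-R sketch below first computes a dyad census. It then estimates the upper-tail p-value $\Pr(t(H) \ge t(G))$ by Monte Carlo, drawing $H$ uniformly from loop-free digraphs with the same order and number of edges as $G$. The functions dyad_census, rand_digraph_m, and cug_upper_p are hypothetical helpers introduced here.

```r
# Dyad census of a 0/1 digraph: mutual, asymmetric, and null dyads.
dyad_census <- function(adj) {
  up <- upper.tri(adj)
  a <- adj[up]              # i -> j for i < j
  b <- t(adj)[up]           # j -> i for the same pairs
  c(Mut = sum(a & b), Asym = sum(xor(a, b)), Null = sum(!a & !b))
}

# Uniform draw from loop-free digraphs with n vertices and exactly m arcs.
rand_digraph_m <- function(n, m) {
  adj <- matrix(0, n, n)
  cells <- which(row(adj) != col(adj))
  adj[sample(cells, m)] <- 1
  adj
}

# Monte Carlo estimate of the upper-tail CUG p-value Pr(t(H) >= t(G)),
# conditioning on order and edge count.
cug_upper_p <- function(adj, stat, reps = 1000) {
  obs <- stat(adj)
  sims <- replicate(reps, stat(rand_digraph_m(nrow(adj), sum(adj))))
  mean(sims >= obs)
}

g <- matrix(0, 5, 5)
g[1, 2] <- g[2, 1] <- 1     # a mutual dyad
g[2, 3] <- g[3, 2] <- 1     # another mutual dyad
dyad_census(g)              # Mut 2, Asym 0, Null 8
set.seed(42)
cug_upper_p(g, function(a) dyad_census(a)[["Mut"]])
```

Conditioning on further statistics, such as the full dyad census, would require a different sampler in place of rand_digraph_m.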

                                            Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the (signed) difference in reciprocities as a test statistic. We first test against a CUG hypothesis conditioning only on order, and second test against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.3333333
                1stQ:    -0.06666667
                Med:     0
                Mean:    -0.001288889
                3rdQ:    0.06666667
                Max:     0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.06666667
                1stQ:    0.1555556
                Med:     0.2222222
                Mean:    0.2215333
                3rdQ:    0.2888889
                Max:     0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph-theoretic sense, and are necessarily identical with respect to all structural properties. Consequently, graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. All vertices occupying a given position connect to other positions in precisely the same way. Analyses of relations among positions (via their reduced form, the blockmodel; see below) can therefore often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
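To make the Hamming option concrete, the base-R sketch below uses the hypothetical helpers se_hamming and se_distance_matrix, which are not sedist itself. For each vertex pair, it counts the mismatches between their out-ties and in-ties to all third parties, so that exactly structurally equivalent vertices lie at distance 0. The resulting matrix is then passed to R's hclust.

```r
# Hamming-style structural-equivalence distance between vertices i and j:
# mismatched out-ties (rows) plus mismatched in-ties (columns) to third parties.
# Ties between i and j themselves are skipped, as in N(v) \ {v'} = N(v') \ {v}.
se_hamming <- function(adj, i, j) {
  others <- setdiff(seq_len(nrow(adj)), c(i, j))
  sum(adj[i, others] != adj[j, others]) + sum(adj[others, i] != adj[others, j])
}

se_distance_matrix <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i != j) d[i, j] <- se_hamming(adj, i, j)
  }
  d
}

# Star on four vertices: the three leaves are exactly structurally equivalent.
star <- matrix(0, 4, 4)
star[1, -1] <- 1
star[-1, 1] <- 1
d <- se_distance_matrix(star)
d[2, 3]                                   # 0
d[1, 2]                                   # 4
cutree(hclust(as.dist(d)), k = 2)         # 1 2 2 2: center vs. leaves
```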


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $(i, j)$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (the mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives. Categorical types (e.g., null, 1-covered, etc.) are also supported. Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
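For the density block type, reduction amounts to averaging the adjacency matrix within each pair of classes. The minimal base-R sketch below assumes that classes are labeled 1, ..., k and that self-ties are excluded. block_densities is our own name, not an sna function.

```r
# Density blockmodel: mean tie value within each (class a, class b) block,
# excluding self-ties. Classes are assumed to be labeled 1, ..., k.
block_densities <- function(adj, memb) {
  k <- max(memb)
  off <- adj
  diag(off) <- NA                       # self-ties do not count
  dens <- matrix(NA_real_, k, k)
  for (a in seq_len(k)) for (b in seq_len(k)) {
    dens[a, b] <- mean(off[memb == a, memb == b], na.rm = TRUE)
  }
  dens
}

# Two classes: {1, 2} are tied to each other and send to everyone in {3, 4}.
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 1] <- 1
adj[1:2, 3:4] <- 1
block_densities(adj, c(1, 1, 2, 2))
#      [,1] [,2]
# [1,]    1    1
# [2,]    0    0
```

A singleton class yields NaN in its own diagonal block, since that block has no off-diagonal cells.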

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
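The expansion step can likewise be sketched in base R. In the hypothetical expand_blockmodel helper below, each new vertex is assigned to a class, and each cell inherits the density of its block as an edge probability. One inhomogeneous Bernoulli digraph is then drawn. The requested class sizes need not match the original ones.

```r
# Expand a k x k block-density image into an n x n edge-probability matrix
# (class sizes chosen by the user), then draw one Bernoulli digraph from it.
expand_blockmodel <- function(dens, sizes) {
  memb <- rep(seq_along(sizes), times = sizes)   # class of each new vertex
  p <- dens[memb, memb]                          # block density -> edge probability
  diag(p) <- 0                                   # no self-ties
  adj <- matrix(rbinom(length(p), 1, p), nrow(p), ncol(p))
  list(membership = memb, adj = adj)
}

dens <- matrix(c(0.9, 0.1, 0.6, 0.2), 2, 2)      # an illustrative 2 x 2 image
set.seed(3)
sim <- expand_blockmodel(dens, sizes = c(3, 4))  # 7 vertices
sim$membership                                   # 1 1 1 2 2 2 2
```

Calling expand_blockmodel repeatedly yields the kind of Monte Carlo sample described above.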

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not. Thus, diagonals should not be neglected when analyzing the results of graph compositions.
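Because composition is a Boolean inner product, it reduces to a thresholded matrix product. The base-R sketch below uses compose_graphs, our own name; within sna the operator is %c%. The example also shows how a loop can arise from two loop-free graphs.

```r
# Composition of two graphs on the same vertex set: (v, v') is a tie iff some
# v'' has A[v, v''] = 1 and B[v'', v'] = 1, i.e. a thresholded matrix product.
compose_graphs <- function(A, B) (A %*% B > 0) * 1

A <- matrix(0, 3, 3); A[1, 2] <- 1                # 1 -> 2
B <- matrix(0, 3, 3); B[2, 1] <- 1; B[2, 3] <- 1  # 2 -> 1, 2 -> 3
C <- compose_graphs(A, B)
C[1, 1]   # 1: a loop, although neither A nor B has one
C[1, 3]   # 1
sum(C)    # 2
```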

                                            Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original. This is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets. When entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\operatorname{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \tag{3}$$


where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. Both sums run over all ordered pairs of distinct vertices. The graph variance is then $\operatorname{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \operatorname{cov}(G, H) / \sqrt{\operatorname{cov}(G, G)\,\operatorname{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
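Equation (3) and the derived correlation translate directly into base R. The sketch below restricts attention to the $|V|(|V|-1)$ off-diagonal cells, which is our reading of the denominator. Under that reading, the graph correlation is simply the Pearson correlation of the vectorized off-diagonal entries. The helper names offdiag, graph_cov, graph_cor, and hamming are ours, not sna's.

```r
offdiag <- function(A) A[row(A) != col(A)]   # the |V|(|V| - 1) ordered pairs

graph_cov <- function(A, B) {
  a <- offdiag(A); b <- offdiag(B)
  sum((a - mean(a)) * (b - mean(b))) / length(a)   # equation (3)
}
graph_cor <- function(A, B) graph_cov(A, B) / sqrt(graph_cov(A, A) * graph_cov(B, B))
hamming <- function(A, B) sum(offdiag(A) != offdiag(B))

A <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 1, 0), 3, 3, byrow = TRUE)
B <- 1 - A; diag(B) <- 0                     # complement of A
graph_cor(A, A)   # 1
graph_cor(A, B)   # -1
hamming(A, B)     # 6: every off-diagonal cell differs
```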

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings. This results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the "structural" component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
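For very small graphs, the structural correlation can be computed by brute force, which makes the idea concrete: maximize the graph correlation over every relabeling of one graph. The base-R sketch below does exactly that, using our own names perms, offdiag_cor, and structural_cor. It is not how sna's heuristic optimizers work, and its cost grows as $n!$.

```r
# All n! orderings of 1..n, one per row (practical only for small n).
perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  p <- perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(i) cbind(i, ifelse(p >= i, p + 1L, p))))
}

# Graph correlation: Pearson correlation of the off-diagonal cells.
offdiag_cor <- function(A, B) {
  off <- row(A) != col(A)
  cor(A[off], B[off])
}

# Exhaustive structural correlation: best graph correlation over relabelings of B.
structural_cor <- function(A, B) {
  P <- perms(nrow(B))
  best <- -Inf
  for (r in seq_len(nrow(P))) {
    o <- P[r, ]
    best <- max(best, offdiag_cor(A, B[o, o]))
  }
  best
}

A <- matrix(c(0, 1, 0, 0, 1,
              0, 0, 1, 0, 0,
              1, 0, 0, 1, 0,
              0, 0, 0, 0, 1,
              0, 1, 0, 0, 0), 5, 5, byrow = TRUE)
q <- c(3, 1, 5, 2, 4)
B <- A[q, q]              # an isomorphic copy of A with shuffled labels
offdiag_cor(A, B)         # the labeled graph correlation
structural_cor(A, B)      # 1 (up to rounding): some relabeling recovers A
```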

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes:

- centralgraph, which returns the graph minimizing the Hamming distance to all graphs in the input set;
- gclust.boxstats, which shows distributions of graph statistics based on a hierarchical clustering of networks;
- gclust.centralgraph, which returns the central graphs for each element of a network clustering solution;
- gdist.plotdiff, which plots distances between networks against differences in their properties;
- gdist.plotstats, which displays a metric MDS of networks with star-like figures showing graph-level covariates for each structure.

Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
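For 0/1 graphs, the graph minimizing the summed Hamming distance to a set can be found cell by cell, since each cell contributes independently: take the majority value. The sketch below assumes the graphs are stacked in an m × n × n array with the graph index first, as in the array outputs of the earlier examples. Ties are broken toward 0. central_graph and hamming_matrix are our own helper names, and the resulting distance matrix is handed to cmdscale.

```r
# Graph sets stored as an m x n x n array, first index = graph.
central_graph <- function(graphs) {
  (apply(graphs, c(2, 3), mean) > 0.5) * 1   # cellwise majority; ties go to 0
}

hamming_matrix <- function(graphs) {
  m <- dim(graphs)[1]
  d <- matrix(0, m, m)
  for (i in seq_len(m)) for (j in seq_len(m)) {
    d[i, j] <- sum(graphs[i, , ] != graphs[j, , ])
  }
  d
}

graphs <- array(0, c(3, 3, 3))
graphs[1, 1, 2] <- 1; graphs[1, 2, 3] <- 1   # graph 1: 1 -> 2 -> 3
graphs[2, , ] <- graphs[1, , ]               # graph 2: same as graph 1
graphs[3, 1, 2] <- 1                         # graph 3: 1 -> 2 only
central_graph(graphs)                        # equals graph 1
d <- hamming_matrix(graphs)
cmdscale(as.dist(d), k = 1)                  # one-dimensional metric MDS of the graphs
```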

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses. These test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available where conditioning on the entire observed structure is inappropriate.
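The QAP logic can be illustrated for the graph correlation. Relabel the vertices of one graph at random, recompute the statistic, and compare the observed value to the resulting permutation distribution. The base-R sketch below illustrates that reading and is not qaptest's implementation. qap_cor_test is a hypothetical name.

```r
# QAP test for the graph correlation: randomly relabel the vertices of B,
# recompute the correlation, and compare the observed value with that
# permutation distribution.
qap_cor_test <- function(A, B, reps = 1000) {
  off <- row(A) != col(A)                  # ordered pairs (i, j), i != j
  obs <- cor(A[off], B[off])
  sims <- replicate(reps, {
    o <- sample(nrow(B))                   # a random relabeling
    Bp <- B[o, o]
    cor(A[off], Bp[off])
  })
  c(obs = obs, p_upper = mean(sims >= obs))
}

set.seed(11)
A <- matrix(rbinom(100, 1, 0.3), 10, 10); diag(A) <- 0
B <- A; B[sample(which(row(B) != col(B)), 10)] <- 0   # a noisy copy of A
qap_cor_test(A, B, reps = 500)
```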


                                            Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306

R> gcor(g1, g3)
[1] 0.08908708

R> gcor(g2, g3)
[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

                                            Rgt x lt- rgraph(20 4)

                                            Rgt y lt- x[1] + 4 x[2] + 2 x[3]

                                            Rgt nl lt- netlm(y x)

                                            Rgt summary(nl)

                                            36 Social Network Analysis with sna

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
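The Test Diagnostics block above summarizes each coefficient's distribution under the qap null hypothesis, i.e., under random relabeling of the vertices. The same permutation logic can be sketched in base R for a single graph correlation. This is an illustration only: qap_cor_test is a hypothetical helper that permutes the second graph directly, not sna's qaptest or netlm code (which provide their own permutation schemes), and the noisy copy of a random graph is made-up data.

```r
# QAP-style permutation test for the correlation of two graphs: relabel the
# vertices of b at random (rows and columns together), recompute the
# correlation each time, and compare the observed value with those draws.
qap_cor_test <- function(a, b, reps = 1000) {
  off <- row(a) != col(a)                 # off-diagonal cells only
  obs <- cor(a[off], b[off])
  null <- replicate(reps, {
    p <- sample.int(nrow(b))              # random vertex relabeling
    bp <- b[p, p]
    cor(a[off], bp[off])
  })
  list(obs = obs,
       p_ge = mean(null >= obs),          # share of draws at least as large
       p_le = mean(null <= obs))          # share of draws at least as small
}

set.seed(1)
n <- 10
x <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(x) <- 0
flip <- matrix(runif(n * n) < 0.1, n, n)  # perturb about 10% of the cells
y <- abs(x - flip)                        # XOR with flip: a noisy copy of x
diag(y) <- 0
res <- qap_cor_test(x, y)
print(c(obs = res$obs, p_ge = res$p_ge, p_le = res$p_le))
```

Because the rows and columns of b are permuted together, each null draw preserves the graph's own structure (degrees, reciprocity, and so on) and breaks only its alignment with a, which is the point of using QAP rather than permuting individual cells.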

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued (i.e., dichotomous). In this case netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

         Actual
Predicted    0    1
        0    0    0
        1   39  341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties (indeed, it predicts no absent ties at all). This is largely a consequence of the high density of the dependent graph (approximately 0.90) and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values and that the QAP test correctly identifies the irrelevant predictors.
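The goodness-of-fit block of the netlogit output can be reproduced by hand from the two deviances and the contingency table, which is a useful check on what each line reports. The sketch below simply plugs in the printed values. It assumes (consistently with those printed numbers) five estimated coefficients, 380 ordered dyads (20 × 19) as the sample size for BIC, and the null degrees of freedom as dfn in the first pseudo-R^2 measure.

```r
# Printed values from the netlogit summary above
Dn  <- 526.7919  # null deviance, 380 df
Dr  <- 174.1572  # residual deviance, 375 df
k   <- 5         # estimated coefficients: intercept + x1..x4
n   <- 20 * 19   # ordered dyads in a 20-vertex digraph without loops
dfn <- 380       # null degrees of freedom

chisq <- Dn - Dr                 # chi-squared test of fit improvement
aic   <- Dr + 2 * k
bic   <- Dr + k * log(n)
r2_a  <- chisq / (chisq + dfn)   # (Dn-Dr)/(Dn-Dr+dfn)
r2_b  <- chisq / Dn              # (Dn-Dr)/Dn

# Contingency table (predicted rows x actual cols): every dyad predicted as 1
tab <- matrix(c(0, 0, 39, 341), 2, 2, byrow = TRUE,
              dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))
frac_correct   <- sum(diag(tab)) / sum(tab)
false_pos_rate <- tab["1", "0"] / sum(tab[, "0"])  # actual 0s predicted as 1

print(round(c(chisq = chisq, AIC = aic, BIC = bic, r2_a = r2_a, r2_b = r2_b,
              frac_correct = frac_correct, fpr = false_pos_rate), 7))
```

The false positive rate of 1 is the arithmetic form of the remark above: all 39 absent ties were predicted as present.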

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally aggregated structure (LAS; Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired), and estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
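The reparameterization mentioned above can be checked directly. Writing $D_i$ for informant $i$'s competency (the chance of "knowing" the edge) and $g_i$ for the probability that a guess reports a tie (both symbols are our own notation, not sna's), the Batchelder-Romney reporting probabilities map onto the false negative and false positive rates $e_i^-$ and $e_i^+$ of the Butts (2003) model as follows:

```latex
\begin{aligned}
\Pr(\text{report } 1 \mid \text{tie present}) &= D_i + (1 - D_i)\,g_i = 1 - e_i^{-},\\
\Pr(\text{report } 1 \mid \text{tie absent})  &= (1 - D_i)\,g_i = e_i^{+},\\
\Longrightarrow\quad e_i^{-} &= (1 - D_i)(1 - g_i), \qquad e_i^{+} + e_i^{-} = 1 - D_i,\\
D_i &= 1 - e_i^{+} - e_i^{-}, \qquad g_i = \frac{e_i^{+}}{e_i^{+} + e_i^{-}}.
\end{aligned}
```

A positive competency $D_i$ therefore corresponds exactly to $e_i^+ + e_i^- < 1$; error rates summing to more than 1 would require $D_i < 0$, which is the negatively informative case that only the Butts (2003) parameterization admits.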

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
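The conditional nomination probability above is one minus the probability that the baseline event and every potentially occurring bias event all fail. A minimal base-R sketch evaluates it and checks it against a direct simulation of the independent Bernoulli events. biased_net_prob and mc_nomination_rate are hypothetical helpers (this is not sna's bn, which estimates these probabilities from data rather than taking them as given), and the baseline probability 0.1 and the two bias events with probabilities 0.3 and 0.2 are made-up values.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
#   p_base: Pr(B_e), the baseline nomination probability
#   p_bias: Pr(B_k) for each bias event type k
#   s:      s_k(i, j, T), how many times bias event k has had a chance to occur
biased_net_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Monte Carlo check: draw the baseline event and each bias-event opportunity
# as independent Bernoulli trials; i nominates j if any of them occurs.
mc_nomination_rate <- function(p_base, p_bias, s, n = 1e5) {
  hit <- runif(n) < p_base
  for (k in seq_along(p_bias)) {
    for (r in seq_len(s[k])) {
      hit <- hit | (runif(n) < p_bias[k])
    }
  }
  mean(hit)
}

exact <- biased_net_prob(0.1, c(0.3, 0.2), c(1, 2))  # 1 - 0.9 * 0.7 * 0.8^2 = 0.5968
set.seed(2)
approx <- mc_nomination_rate(0.1, c(0.3, 0.2), c(1, 2))
print(c(exact = exact, simulated = approx))
```

With all $s_k = 0$ the expression reduces to the baseline probability, which is the sense in which bias events only ever raise the chance of a nomination.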

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990; and Cliff and Ord 1973 and Anselin 1988 for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

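$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$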

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the AR term's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly.

Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
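As a reference point for the autocorrelation indices mentioned above, here is a base-R sketch of the textbook forms of Moran's I and Geary's C for a single weight matrix W. The helper names are hypothetical, and nacf's own normalization (for example for higher-order or directed neighborhoods) may differ, so this is illustrative rather than a reimplementation of nacf.

```r
# Moran's I: (n / S0) * sum_ij w_ij (y_i - ybar)(y_j - ybar) / sum_i (y_i - ybar)^2
morans_i <- function(y, W) {
  d <- y - mean(y)
  n <- length(y)
  (n / sum(W)) * sum(W * outer(d, d)) / sum(d^2)
}

# Geary's C: (n - 1) * sum_ij w_ij (y_i - y_j)^2 / (2 * S0 * sum_i (y_i - ybar)^2)
gearys_c <- function(y, W) {
  d <- y - mean(y)
  n <- length(y)
  (n - 1) * sum(W * outer(y, y, "-")^2) / (2 * sum(W) * sum(d^2))
}

# A path graph 1 - 2 - 3 - 4 (symmetric, unit weights) with a smooth trend in y:
# neighbours have similar values, so I > 0 and C < 1 (positive autocorrelation).
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)
y <- c(1, 2, 3, 4)
print(c(moran = morans_i(y, W), geary = gearys_c(y, W)))  # 1/3 and 0.3
```

Here $S_0 = 6$ is the total weight, the neighbour cross-products of deviations sum to 2.5, and the squared neighbour differences sum to 6, which gives $I = (4/6)(2.5/5) = 1/3$ and $C = 3 \cdot 6 / (2 \cdot 6 \cdot 5) = 0.3$.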


                                            Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038 and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

       a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1   0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2   0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3   0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4   0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5   1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6   0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7   1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8   0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9   0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10  0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11  0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12  1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13  0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14  1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15  1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16  0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17  1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18  1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19  0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20  0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

      a16  a17  a18  a19  a20
a1   1.00 1.00 1.00 0.00 0.00
a2   1.00 0.00 0.00 1.00 1.00
a3   0.00 0.00 1.00 0.00 1.00
a4   0.00 1.00 0.00 1.00 1.00
a5   1.00 1.00 0.00 0.00 1.00
a6   0.00 0.00 0.00 1.00 0.00
a7   1.00 0.00 0.00 0.00 0.00
a8   0.00 0.00 1.00 0.00 1.00
a9   1.00 1.00 1.00 1.00 0.00
a10  0.00 1.00 1.00 1.00 0.00
a11  1.00 1.00 0.00 1.00 1.00
a12  1.00 0.00 1.00 1.00 0.00
a13  0.00 0.00 1.00 0.00 1.00
a14  0.00 0.00 0.00 0.00 0.00
a15  1.00 0.00 1.00 0.00 1.00
a16  0.00 0.00 1.00 0.00 0.00
a17  0.00 0.00 1.00 0.00 1.00
a18  0.00 0.00 0.00 1.00 0.00
a19  0.00 0.00 0.00 0.00 1.00
a20  1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-        e^+
Min    0.1443951  0.0004238
1stQ   0.3126975  0.0167584
Median 0.3678306  0.0294646
Mean   0.3783663  0.0493688
3rdQ   0.4423027  0.0574099
Max    0.6909116  0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

        Min   1stQ Median   Mean   3rdQ    Max
o1   0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2   0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3   0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4   0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5   0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6   0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7   0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8   0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9   0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10  0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11  0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12  0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13  0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14  0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15  0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16  0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17  0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18  0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19  0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20  0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

           Min      1stQ    Median      Mean      3rdQ       Max
o1   0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2   0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3   0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4   0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5   0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6   0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7   0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8   0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9   0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10  0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11  0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12  0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13  0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14  0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15  0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16  0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17  0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18  0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19  0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20  0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20    Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
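One way to follow this advice is to compute how much joint prior mass a candidate pair of error-rate priors places on the region em + ep > 1. The base-R sketch below assumes, as in the example above, independent beta priors for an informant's two error rates; perverse_mass is a hypothetical helper, and the flat Beta(1, 1) case is included only as a contrast.

```r
# Prior probability that e_minus + e_plus > 1 when the two error rates have
# independent Beta(a_m, b_m) and Beta(a_p, b_p) priors:
#   P = integral_0^1 f_m(x) * Pr(e_plus > 1 - x) dx
perverse_mass <- function(a_m, b_m, a_p, b_p) {
  integrand <- function(x) {
    dbeta(x, a_m, b_m) * pbeta(1 - x, a_p, b_p, lower.tail = FALSE)
  }
  integrate(integrand, lower = 0, upper = 1)$value
}

informative <- perverse_mass(2, 11, 2, 11)  # Beta(2, 11) on both rates, as above
flat <- perverse_mass(1, 1, 1, 1)           # uniform priors: exactly 1/2
print(c(beta_2_11 = informative, flat = flat))
```

The Beta(2, 11) priors used above put well under one percent of their joint mass in the perverse region: since em + ep > 1 forces at least one rate above 0.5, the union bound 2 × Pr(Beta(2, 11) > 0.5) = 2 × 13/4096 already caps it at about 0.0063. Flat priors, by contrast, put half of their mass there.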

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
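The classical rules compared above can be sketched in a few lines of base R, which also shows why the intersection LAS is hit hardest by false negatives. The sketch assumes the data layout of the example (dat[k, i, j] is informant k's report of the tie i -> j, and informant k is also vertex k). las_consensus and central_graph are hypothetical helpers, not sna's consensus, which supports further options and may resolve ties at exactly one half differently.

```r
# Locally aggregated structure: use only the two informants involved in the
# tie, i.e. the sender's report dat[i, i, j] and the receiver's dat[j, i, j].
las_consensus <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      sender <- dat[i, i, j]
      receiver <- dat[j, i, j]
      out[i, j] <- if (rule == "intersection") min(sender, receiver) else max(sender, receiver)
    }
  }
  out
}

# Central graph: a tie is present if at least half of all informants report it.
central_graph <- function(dat) {
  out <- (apply(dat, c(2, 3), mean) >= 0.5) * 1
  diag(out) <- 0
  out
}

# Three informants reporting on a three-vertex digraph (made-up reports)
dat <- array(0, dim = c(3, 3, 3))
dat[1, 1, 2] <- 1; dat[1, 1, 3] <- 1; dat[1, 2, 3] <- 1  # informant 1
dat[2, 1, 2] <- 1; dat[2, 2, 1] <- 1                      # informant 2
dat[3, 1, 3] <- 1; dat[3, 3, 2] <- 1; dat[3, 2, 3] <- 1  # informant 3

las_int <- las_consensus(dat, "intersection")
las_uni <- las_consensus(dat, "union")
cg <- central_graph(dat)
print(list(intersection = las_int, union = las_uni, central = cg))
```

The tie 2 -> 3 is reported by two of three informants and survives in the central graph, but the intersection rule drops it because the sender (informant 2) omitted it: a single false negative by either party is enough to erase a tie.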

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
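The simulated data in the lnam example were generated from equations (4) and (5) with a single AR weight matrix and a single MA weight matrix ($w = z = 1$), by solving each equation for its left-hand side. Assuming $(I - \theta W)$ and $(I - \psi Z)$ are invertible, the reduced form is as follows; the first line corresponds to the qr.solve call that builds e and the second to the one that builds y, with $\theta$ = r1, $\psi$ = r2, $W$ = w1, and $Z$ = w2:

```latex
\begin{aligned}
\varepsilon &= \psi Z \varepsilon + \nu
  \;\Longrightarrow\; \varepsilon = (I - \psi Z)^{-1} \nu, \\
y &= \theta W y + X\beta + \varepsilon
  \;\Longrightarrow\; y = (I - \theta W)^{-1} \left( X\beta + (I - \psi Z)^{-1} \nu \right).
\end{aligned}
```

The second line also makes the AR/MA distinction drawn earlier visible: the AR filter $(I - \theta W)^{-1}$ acts on $X\beta$ as well as on the disturbances, so neighbors' covariates feed into each response, whereas the MA filter $(I - \psi Z)^{-1}$ acts on $\nu$ alone.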

                                            3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines covering many of the methods currently seeing wide use within the field. It is hoped that the inclusion of such tools, together with the other packages of the statnet ensemble, within a freely available and widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                            Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. (Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.)

                                            References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks. II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions in Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.



w_O  25.0000 19.5459  7.0713  0.7713 0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484 0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090 0.6825
b_O  28.0000 23.4551  5.3349  0.8519 0.3943
t    93.0000 87.9565 13.6124  0.3705 0.7110

                                              Individual Properties (by Group)

Group ID 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs; in the output above, for instance, w_I is NaN for both members of group 2, since the coordinator role requires at least three members of the same group.
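The five Gould-Fernandez roles differ only in which of the source i, broker j, and destination k of a two-path i → j → k share a group. The following base-R sketch makes that classification explicit by tallying raw role counts for each vertex. It is an illustration under stated assumptions, not sna's brokerage routine: the helper names gf_role and gf_roles are hypothetical, no expectations or z-scores are computed, and only open two-paths (with no direct i → k tie) are assumed to count as brokerage.

```r
# Hypothetical helper (not part of sna): Gould-Fernandez role of broker j
# on a two-path i -> j -> k, given the group labels of i, j, and k.
gf_role <- function(gi, gj, gk) {
  if (gi == gj && gj == gk) return("w_I")  # coordinator:      A -> A -> A
  if (gi == gk) return("w_O")              # itinerant broker: A -> B -> A
  if (gj == gk) return("b_IO")             # gatekeeper:       A -> B -> B
  if (gi == gj) return("b_OI")             # representative:   A -> A -> B
  "b_O"                                    # liaison:          A -> B -> C
}

# Hypothetical helper: raw role counts per vertex, plus their total t.
# Assumption: only open two-paths count, i.e. i -> j -> k with no i -> k tie.
gf_roles <- function(adj, grp) {
  n <- nrow(adj)
  roles <- c("w_I", "w_O", "b_IO", "b_OI", "b_O")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {                  # j: candidate broker
    for (i in seq_len(n)) {                # i: source
      if (i == j || adj[i, j] == 0) next
      for (k in seq_len(n)) {              # k: destination
        if (k == i || k == j || adj[j, k] == 0 || adj[i, k] != 0) next
        r <- gf_role(grp[i], grp[j], grp[k])
        out[j, r] <- out[j, r] + 1
      }
    }
  }
  cbind(out, t = rowSums(out))
}

# Example: chain 1 -> 2 -> 3 -> 4; vertices 1-2 in group "a", 3-4 in group "b".
adj <- matrix(0, 4, 4)
adj[cbind(1:3, 2:4)] <- 1
gf_roles(adj, c("a", "a", "b", "b"))
```

In the example, vertex 2 acts as a representative on the two-path 1 → 2 → 3 and vertex 3 as a gatekeeper on 2 → 3 → 4; every other role count is zero.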

                                              Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i, j, k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from i to k, i should also be tied to k directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph G with respect to centrality measure c is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right] \tag{1}$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \tag{2}$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing C(G) by its maximum across all graphs of the same order as G. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration).

² For instance, when all vertices are automorphically equivalent.

Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which uses them to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna, and the mechanism may also be employed with user-supplied functions, provided that they support the required arguments. Examples are supplied in the sna manual.
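Equations (1) and (2) can be checked numerically. The base-R sketch below (the helper names cent_eq1, cent_eq2, and norm_outdeg_cent are hypothetical, and sna is not used) computes the raw out-degree centralization of an out-star both ways, then normalizes by $(n-1)^2$. That value bounds the raw score for out-degree on loopless digraphs of order n, since at most n − 1 deviations are positive and each is at most n − 1; the out-star attains it.

```r
# Freeman centralization as in equation (1): total deviation from the maximum score.
cent_eq1 <- function(scores) sum(max(scores) - scores)

# Equation (2): |V| times the gap between the maximum and the mean score.
cent_eq2 <- function(scores) length(scores) * (max(scores) - mean(scores))

# Hypothetical helper: normalized out-degree centralization for a loopless digraph.
# The raw score cannot exceed (n - 1)^2, and the out-star attains that bound.
norm_outdeg_cent <- function(adj) {
  n <- nrow(adj)
  cent_eq1(rowSums(adj != 0)) / (n - 1)^2
}

n <- 5
star <- matrix(0, n, n); star[1, -1] <- 1              # out-star: vertex 1 sends to all others
path <- matrix(0, n, n); path[cbind(1:4, 2:5)] <- 1    # directed path 1 -> 2 -> ... -> 5

odeg <- rowSums(star)                                   # 4 0 0 0 0
c(eq1 = cent_eq1(odeg), eq2 = cent_eq2(odeg))           # both equal 16
c(star = norm_outdeg_cent(star), path = norm_outdeg_cent(path))  # 1 and 0.0625
```

The path has out-degrees 1, 1, 1, 1, 0, so only the last vertex deviates from the maximum, giving a normalized score of 1/16.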

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree; hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.
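The two simplest Krackhardt indices can be sketched directly from reachability. The base-R code below assumes the following readings, which should be checked against the sna manual before use: connectedness is the fraction of unordered pairs joined by some path once edge direction is ignored, and Krackhardt hierarchy is the fraction of non-null dyads in the reachability graph that are asymmetric. Efficiency and lubness are omitted, and all function names are hypothetical.

```r
# Reachability by Warshall's algorithm: r[i, j] is TRUE if j can be reached from i.
reach <- function(adj) {
  r <- adj != 0
  for (k in seq_len(nrow(r))) r <- r | outer(r[, k], r[k, ], "&")
  diag(r) <- FALSE
  r
}

# Assumed reading of connectedness: fraction of unordered pairs that are
# joined once edge direction is ignored.
krack_connectedness <- function(adj) {
  u <- reach(adj != 0 | t(adj != 0))
  sum(u[upper.tri(u)]) / choose(nrow(adj), 2)
}

# Assumed reading of Krackhardt hierarchy: fraction of non-null dyads in the
# reachability graph that are asymmetric.
krack_hierarchy <- function(adj) {
  r <- reach(adj)
  mut  <- sum(r & t(r)) / 2     # pairs reachable in both directions
  asym <- sum(r & !t(r))        # pairs reachable in one direction only
  asym / (mut + asym)           # NaN if no pair is reachable
}

tree <- matrix(0, 4, 4); tree[cbind(c(1, 1, 2), c(2, 3, 4))] <- 1  # out-tree rooted at 1
cyc  <- matrix(0, 4, 4); cyc[cbind(1:3, c(2, 3, 1))] <- 1          # 3-cycle plus an isolate
c(krack_connectedness(tree), krack_hierarchy(tree))  # 1 1
c(krack_connectedness(cyc),  krack_hierarchy(cyc))   # 0.5 0
```

An out-tree scores 1 on both indices, while the 3-cycle plus an isolate is half connected and has no hierarchy, since every reachable pair within the cycle is reachable in both directions.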

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices; hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on n vertices, with $K_{n,m}$ denoting the complete bipartite graph on n and m vertices, and $N_n$ the null or empty graph on n vertices.
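For a single directed graph with no loops and no missing data, the default quantities reported above reduce to simple counts. The base-R sketch below computes density, dyadic and edgewise reciprocity, and the weak transitivity ratio as the fraction of two-paths i → j → k (i, j, k distinct) that are closed by an i → k edge. The name gli_sketch is hypothetical, and edge-case conventions (such as the value returned when no two-paths exist) may differ from those of gden, grecip, and gtrans.

```r
# Minimal sketch for one loopless directed graph with no missing data.
gli_sketch <- function(adj) {
  a <- adj != 0
  diag(a) <- FALSE
  n <- nrow(a)
  edges <- sum(a)
  mut   <- sum(a & t(a)) / 2              # mutual dyads
  asym  <- sum(a & !t(a))                 # asymmetric dyads
  nul   <- choose(n, 2) - mut - asym      # null dyads
  two <- a %*% a                          # two[i, k]: number of two-paths i -> j -> k
  diag(two) <- 0                          # drop two-paths that return to i
  c(density        = edges / (n * (n - 1)),
    dyadic_recip   = (mut + nul) / choose(n, 2),  # symmetric dyads / all dyads
    edgewise_recip = 2 * mut / edges,             # reciprocated edges / all edges
    weak_trans     = sum(two[a]) / sum(two))      # closed two-paths / all two-paths
}

m <- matrix(0, 3, 3)
m[cbind(c(1, 2, 2, 3), c(2, 1, 3, 1))] <- 1   # 1 <-> 2, 2 -> 3, 3 -> 1
gli_sketch(m)
```

Here one of the three dyads is mutual and the other two asymmetric, so dyadic reciprocity is 1/3 while edgewise reciprocity is 2/4. Of the three two-paths, only 2 → 3 → 1 is closed (by 2 → 1), giving a weak transitivity ratio of 1/3.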

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified along with any additional arguments). By default, centralization scores are computed only for a single graph (when given a stack of graphs, the first is used); R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both single-graph and multi-graph calls are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

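As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following function, which scores each vertex by the number of pairs of vertices to which it sends ties: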

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)      # maximum deviation, attained by the out-star
+     return((n-1) * choose(n-1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)  # pairs of out-neighbors for each vertex
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
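The calling convention that centralization relies on can be imitated in a few lines. In the base-R sketch below, centralize is a hypothetical stand-in for sna's centralization (it is not that function), and o2s mirrors the o2scent example with rowSums in place of sna's degree, assuming a loopless adjacency matrix. Any score function that honors the tmaxdev argument can be normalized the same way.

```r
# Hypothetical wrapper (not sna's centralization): FUN(dat, ...) must return node
# scores, and FUN(dat, tmaxdev = TRUE, ...) the theoretical maximum deviation.
centralize <- function(dat, FUN, normalize = TRUE, ...) {
  scores <- FUN(dat, ...)
  cent <- sum(max(scores) - scores)              # Freeman's equation (1)
  if (normalize) cent <- cent / FUN(dat, tmaxdev = TRUE, ...)
  cent
}

# Base-R analogue of o2scent: pairs of out-neighbors per vertex (no loops assumed).
o2s <- function(dat, tmaxdev = FALSE, ...) {
  n <- NROW(dat)
  if (tmaxdev) return((n - 1) * choose(n - 1, 2))  # attained by the out-star
  choose(rowSums(dat != 0), 2)
}

g5 <- matrix(0, 5, 5)
g5[cbind(c(1, 1, 1, 2, 2), c(2, 3, 4, 3, 4))] <- 1  # out-degrees 3, 2, 0, 0, 0
centralize(g5, o2s, normalize = FALSE)   # 11
centralize(g5, o2s)                      # 11/24
```

For g5, the out-degrees 3, 2, 0, 0, 0 give scores 3, 1, 0, 0, 0, a raw centralization of 11, and a normalized value of 11/24 against the out-star maximum of 4 × choose(4, 2) = 24.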

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order N, and let $d(i, j)$ be the geodesic distance from vertex i to vertex j in G. The "structure statistics" of G are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j,k) \le i)$ and I is the standard indicator function. Intuitively, $s_i$ is the expected fraction of G which lies within distance i of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
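The definition of the structure statistics translates directly into code once geodesic distances are available. The base-R sketch below computes distances by breadth-first search (treating unreachable pairs as infinitely distant) and then evaluates $s_i$ for $i = 0, \ldots, N-1$. The helper names are hypothetical, and the sketch makes no claim to reproduce sna's own routines.

```r
# Geodesic distances by breadth-first search from every vertex (Inf if unreachable).
geodesic_dist <- function(adj) {
  n <- nrow(adj)
  d <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    d[s, s] <- 0
    frontier <- s
    while (length(frontier) > 0) {
      # vertices adjacent from any frontier vertex that have not been reached yet
      nbrs <- which(colSums(adj[frontier, , drop = FALSE] != 0) > 0)
      fresh <- nbrs[is.infinite(d[s, nbrs])]
      d[s, fresh] <- d[s, frontier[1]] + 1   # all frontier vertices share one distance
      frontier <- fresh
    }
  }
  d
}

# Structure statistics: s_i = N^-2 * number of ordered pairs (j, k) with d(j, k) <= i.
structure_stats <- function(adj) {
  n <- nrow(adj)
  d <- geodesic_dist(adj)
  sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
}

p3 <- matrix(0, 3, 3); p3[1, 2] <- 1; p3[2, 3] <- 1   # directed path 1 -> 2 -> 3
structure_stats(p3)   # 3/9, 5/9, 6/9
```

For the directed path, $s_0 = 3/9$ counts only the diagonal, $s_1$ adds the two arcs, and $s_2$ adds the single distance-2 pair; pairs running against the arcs never count.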

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of G, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic t for graph G is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where H is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of G is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbb{E}\,s(H) = s(G), \mathbb{E}\,s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter p equal to the density of G is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)

                                              Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count

   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count

  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1,,])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities (OP = "-") as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.3333333
                1stQ:    -0.06666667
                Med:     0
                Mean:    -0.001288889
                3rdQ:    0.06666667
                Max:     0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.06666667
                1stQ:    0.1555556
                Med:     0.2222222
                Mean:    0.2215333
                3rdQ:    0.2888889
                Max:     0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.
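The logic of a one-tailed CUG test can be sketched in a few lines of base R. This sketch conditions on order and edge count by placing the observed number of edges uniformly at random among the off-diagonal cells, and uses the number of mutual dyads as the test statistic. All helper names (rgnm_digraph, n_mutual, cug_sketch) are hypothetical; this is an illustration of the idea, not a reimplementation of cugtest.

```r
# Hypothetical helper: uniform random digraph with n vertices and exactly m edges (no loops).
rgnm_digraph <- function(n, m) {
  A <- matrix(0, n, n)
  cells <- which(row(A) != col(A))       # indices of off-diagonal cells
  A[sample(cells, m)] <- 1
  A
}

# Number of mutual dyads, a simple reciprocity statistic.
n_mutual <- function(A) sum(A * t(A)) / 2

# One-tailed CUG p-values, conditioning on order and edge count.
cug_sketch <- function(A, stat, reps = 1000) {
  n <- nrow(A)
  m <- sum(A[row(A) != col(A)])
  obs <- stat(A)
  sims <- replicate(reps, stat(rgnm_digraph(n, m)))
  c(obs = obs,
    p_upper = mean(sims >= obs),         # estimate of Pr(t(H) >= t(G))
    p_lower = mean(sims <= obs))         # estimate of Pr(t(H) <= t(G))
}

set.seed(1)
G <- matrix(0, 10, 10)
for (i in 1:9) { G[i, i + 1] <- 1; G[i + 1, i] <- 1 }   # fully reciprocated path
cug_sketch(G, n_mutual)
```

Because all 18 edges of G are paired into 9 mutual dyads, which is the maximum possible for 18 edges, the lower-tail p-value is exactly 1 and the upper-tail p-value is essentially 0.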

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of “role” and “position” have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph G, vertex v is said to be structurally equivalent to vertex v′ iff N(v) \ {v′} = N(v′) \ {v} (i.e., when v and v′ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph-theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates, and pendants attached to the same vertex, being two important exceptions). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are “similar” in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
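As a rough illustration of a Hamming-type structural equivalence distance followed by hierarchical clustering, the base-R sketch below compares the rows (out-ties) and columns (in-ties) of each vertex pair, ignoring entries that involve the two vertices themselves, in the spirit of N(v) \ {v′} = N(v′) \ {v}. The helper se_hamming is hypothetical, and sedist may treat these entries differently; it is not a reimplementation of sna's routine.

```r
# Hypothetical helper: Hamming-type structural-equivalence distances for a digraph.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (u in seq_len(n)) for (v in seq_len(n)) {
    k <- setdiff(seq_len(n), c(u, v))    # third parties only
    D[u, v] <- sum(abs(A[u, k] - A[v, k])) +   # differences in out-ties
               sum(abs(A[k, u] - A[k, v]))     # differences in in-ties
  }
  D
}

# Vertices 1 and 2 both send to 3 and 4 and both receive from 5.
A <- matrix(0, 5, 5)
A[1, 3:4] <- 1; A[2, 3:4] <- 1; A[5, 1] <- 1; A[5, 2] <- 1
D <- se_hamming(A)
D[1, 2]                                  # 0: exactly structurally equivalent
hc <- hclust(as.dist(D))                 # complete-linkage clustering
cutree(hc, h = 0)                        # classes of exactly equivalent vertices
```

Cutting at height 0 recovers the exact classes {1, 2}, {3, 4}, and {5}; cutting higher merges approximately equivalent vertices, which is the reduction the paragraph describes.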


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to “blocks” of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
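The density block type can be illustrated with a short base-R sketch: given a membership vector, each block's value is the mean of the off-diagonal adjacency cells running from one class to the other. The helper density_blockmodel is hypothetical; sna's blockmodel additionally handles clustering objects, multiple graphs, and the other block types listed above.

```r
# Hypothetical helper: density block image of a digraph given a class vector.
# Diagonal (self-tie) cells are ignored when computing block densities.
density_blockmodel <- function(A, cls) {
  K <- sort(unique(cls))
  M <- matrix(NA_real_, length(K), length(K), dimnames = list(K, K))
  offdiag <- row(A) != col(A)
  for (a in seq_along(K)) for (b in seq_along(K)) {
    cells <- outer(cls == K[a], cls == K[b], "&") & offdiag
    M[a, b] <- if (any(cells)) mean(A[cells]) else NA_real_
  }
  M
}

A <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              0, 0, 0, 1,
              0, 0, 1, 0), 4, 4, byrow = TRUE)
cls <- c(1, 1, 2, 2)
density_blockmodel(A, cls)   # rows/cols = classes: [1, 0.5; 0, 1]
```

Here both within-class blocks are complete (density 1), class 1 sends to half of the possible class-2 cells, and class 2 sends nothing to class 1.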

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
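Density-based expansion amounts to building a Bernoulli parameter matrix from the block image and drawing each off-diagonal cell independently. The base-R sketch below assumes the image is a K x K matrix of block densities and that sizes gives the number of vertices per class in the new graph; the helper expand_density is hypothetical, not sna's blockmodel.expand.

```r
# Hypothetical helper: expand a density image into one inhomogeneous Bernoulli draw.
expand_density <- function(image, sizes) {
  cls <- rep(seq_along(sizes), times = sizes)   # class of each new vertex
  P <- image[cls, cls, drop = FALSE]            # edge probability matrix
  diag(P) <- 0                                  # no self-ties
  n <- length(cls)
  matrix(rbinom(n * n, 1, P), n, n)             # independent Bernoulli cells
}

set.seed(42)
img <- matrix(c(1, 0, 0.5, 1), 2, 2)            # within-class 1, class 1 -> 2 0.5, class 2 -> 1 0
g_new <- expand_density(img, c(3, 3))           # 6 vertices, 3 per class
g_new
```

Calling expand_density repeatedly yields a Monte Carlo sample under the inhomogeneous Bernoulli model; the new graph's order is set by sizes, not by the original graph.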

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition G′′ of graphs G and G′ on vertex set V is the graph on V such that (v, v′) ∈ E(G′′) iff there exists a vertex v′′ such that (v, v′′) ∈ E(G) and (v′′, v′) ∈ E(G′). (This is equivalent to the graph formed by the Boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
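The Boolean inner product described above is easy to write in base R, and a tiny example shows how a loop can appear even though neither input graph has one. The helper compose_graphs is hypothetical and simply mirrors the definition; sna's %c% operator is the supported route.

```r
# Boolean composition: (v, v') is an edge iff some v'' has v -> v'' in G
# and v'' -> v' in G'. Any positive path count becomes a 1.
compose_graphs <- function(A, B) {
  ((A %*% B) > 0) * 1
}

G  <- matrix(0, 3, 3); G[1, 2] <- 1; G[2, 1] <- 1    # mutual tie 1 <-> 2
Gp <- matrix(0, 3, 3); Gp[2, 1] <- 1; Gp[1, 3] <- 1  # 2 -> 1 and 1 -> 3
C <- compose_graphs(G, Gp)
C   # edges (1, 1) via 1 -> 2 -> 1, and (2, 3) via 2 -> 1 -> 3
```

Neither G nor Gp contains a loop, yet C[1, 1] = 1, which is exactly why the diagonal should not be discarded after composition.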

                                              Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by receiving vertex (each column of gp repeats a single probability). (This is equivalent to drawing from a p1 model containing only attractiveness (popularity) and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$
\operatorname{cov}(G, H) = \frac{\sum_{(i,j),\, i \neq j} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \qquad (3)
$$

where $A^G$, $A^H$ are the respective adjacency matrices of G and H, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j),\, i \neq j} A^X_{ij}$ is the graph mean. The graph variance is then cov(G, G), and the graph correlation is $\rho(G, H) = \operatorname{cov}(G, H) / \sqrt{\operatorname{cov}(G, G)\,\operatorname{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
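Equation (3) and the Hamming distance translate directly into base R by treating the off-diagonal cells of two adjacency matrices as paired observations. The helper names graph_cov, graph_cor, and hamming are hypothetical stand-ins for the roles played by gcov, gcor, and hdist, not their implementations.

```r
# Graph covariance as in equation (3): off-diagonal cells are paired data,
# and the sum of cross-products is divided by |V|(|V| - 1).
graph_cov <- function(A, B) {
  off <- row(A) != col(A)
  a <- A[off]; b <- B[off]
  sum((a - mean(a)) * (b - mean(b))) / length(a)   # length(a) = |V|(|V| - 1)
}
graph_cor <- function(A, B) {
  graph_cov(A, B) / sqrt(graph_cov(A, A) * graph_cov(B, B))
}
# Hamming distance: number of off-diagonal cells on which A and B disagree.
hamming <- function(A, B) {
  off <- row(A) != col(A)
  sum(A[off] != B[off])
}

A <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)   # directed 3-cycle
B <- t(A)                                     # same cycle, every edge reversed
graph_cov(A, A)   # 0.25 (the graph variance)
graph_cor(A, B)   # -1: B has ties exactly where A has none
hamming(A, B)     # 6
```

Reversing every edge of the 3-cycle fills exactly the off-diagonal cells A leaves empty, so the two edge sets are perfectly negatively correlated and disagree on all six cells.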

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the “labeling” component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching of arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
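For very small graphs, the structural correlation can be computed exactly by maximizing the graph correlation over every relabeling of one graph, which is the idea behind an exhaustive search. The base-R sketch below does this with a hypothetical recursive permutation generator; it is only practical for a handful of vertices (n! relabelings), which is why sna relies on heuristics such as annealing by default.

```r
# Graph correlation over off-diagonal cells (as in equation (3)).
graph_cor <- function(A, B) {
  off <- row(A) != col(A)
  cor(A[off], B[off])
}

# All permutations of 1:n, one per row (feasible only for very small n).
perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  p <- perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(i) cbind(i, ifelse(p >= i, p + 1L, p))))
}

# Exhaustive structural correlation: best graph correlation over relabelings of B.
structural_cor <- function(A, B) {
  P <- perms(nrow(B))
  max(apply(P, 1, function(p) graph_cor(A, B[p, p])))
}

A <- matrix(0, 4, 4)
A[1, 2] <- 1; A[2, 3] <- 1; A[3, 4] <- 1; A[1, 3] <- 1
q <- c(4, 2, 1, 3)
B <- A[q, q]                 # an isomorphic copy with shuffled labels
graph_cor(A, B)              # labeled correlation: -0.125 here
structural_cor(A, B)         # 1, since B is a relabeling of A
```

The gap between the labeled value (-0.125) and the structural value (1) is the “labeling” component described above.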

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by applying eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
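To show how little is needed for network principal component analysis, the sketch below vectorizes the off-diagonal cells of several random graphs, forms their graph correlation matrix with cor, and passes it to eigen. The simulated graphs are purely illustrative assumptions; with real data, the correlation matrix could equally come from sna's gcor applied to a graph stack.

```r
set.seed(7)
n <- 8   # vertices per graph
m <- 5   # number of graphs
graphs <- replicate(m, {
  A <- matrix(rbinom(n * n, 1, 0.3), n, n)
  diag(A) <- 0
  A
}, simplify = FALSE)

off <- row(graphs[[1]]) != col(graphs[[1]])
X <- sapply(graphs, function(A) A[off])   # one column of edge variables per graph
R <- cor(X)                               # m x m matrix of graph correlations
pca <- eigen(R, symmetric = TRUE)
round(pca$values / sum(pca$values), 3)    # share of variance per component
pca$vectors[, 1]                          # loadings of each graph on the first component
```

Because the Pearson correlation of the vectorized cells equals the graph correlation of equation (3), no network-specific machinery is required beyond the vectorization step.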

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
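A QAP test for a bivariate statistic can be sketched by repeatedly permuting the rows and columns of one graph together (a relabeling of its vertices) and recomputing the statistic. The helper qap_sketch and the simulated graphs below are hypothetical illustrations, not the qaptest implementation.

```r
# Graph correlation over off-diagonal cells.
graph_cor <- function(A, B) {
  off <- row(A) != col(A)
  cor(A[off], B[off])
}

# QAP sketch: relabel B's vertices at random and compare with the observed value.
qap_sketch <- function(A, B, reps = 1000) {
  obs <- graph_cor(A, B)
  n <- nrow(B)
  sims <- replicate(reps, { p <- sample(n); graph_cor(A, B[p, p]) })
  c(obs = obs, p_upper = mean(sims >= obs), p_lower = mean(sims <= obs))
}

set.seed(3)
n <- 15
A <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(A) <- 0
B <- A
flip <- matrix(rbinom(n * n, 1, 0.1), n, n) == 1   # perturb roughly 10% of cells
B[flip] <- 1 - B[flip]; diag(B) <- 0
qap_sketch(A, B)
```

Permuting rows and columns jointly preserves every structural property of B (density, reciprocity, degree sequence up to relabeling), so the null distribution reflects only the loss of the vertex correspondence between the two graphs.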


                                              Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4*x[2,,] + 2*x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> y.l <- x[1,,] + 4*x[2,,] + 2*x[3,,]
R> y.p <- apply(y.l, c(1, 2), function(a){1/(1 + exp(-a))})
R> y <- rgraph(20, tprob = y.p)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
         352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties: it predicts no 0s at all, so all 39 null dyads are misclassified (a false positive rate of 1). This is largely a consequence of the high density of the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values (0, 1, 4, 2, and 0 for the intercept and x1 through x4), and that the QAP test correctly identifies the irrelevant predictors.
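Since the text notes that these models are analogous to their conventional counterparts applied to vectorized adjacency matrices, the point-estimation step of a network logit can be sketched with glm on the off-diagonal cells. This sketch simulates data in the same spirit as the example above, using base R's rbinom in place of rgraph (an assumption), and it reproduces only the coefficient estimates: the QAP inference that distinguishes netlogit is not reproduced here.

```r
set.seed(11)
n <- 20
x <- array(rbinom(4 * n * n, 1, 0.5), dim = c(4, n, n))  # four predictor graphs
eta <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]            # true logit; x4 is irrelevant
y <- matrix(rbinom(n * n, 1, plogis(eta)), n, n)         # dependent graph

off <- row(y) != col(y)                                  # drop self-ties
dat <- data.frame(y  = y[off],
                  x1 = x[1, , ][off], x2 = x[2, , ][off],
                  x3 = x[3, , ][off], x4 = x[4, , ][off])
fit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = dat)
round(coef(fit), 2)
```

The standard errors reported by glm assume independent edge variables, which network data rarely satisfy; that is precisely why permutation-based (QAP) tests are preferred for inference.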

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which they report, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
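As a rough guide to what a potential scale reduction diagnostic measures, the base-R sketch below computes the basic Gelman-Rubin statistic for one scalar quantity from several chains: the square root of a pooled variance estimate divided by the mean within-chain variance. The helper psrf is hypothetical and implements the textbook form; the exact formula used by potscalered.mcmc may differ in detail.

```r
# Basic potential scale reduction factor for one scalar quantity.
# 'chains' is an n x m matrix: n draws (rows) from each of m chains (columns).
psrf <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))         # mean within-chain variance
  B_over_n <- var(colMeans(chains))        # between-chain variance B, divided by n
  var_plus <- (n - 1) / n * W + B_over_n   # pooled estimate of the posterior variance
  sqrt(var_plus / W)
}

set.seed(5)
mixed <- matrix(rnorm(4000), 1000, 4)                                  # four well-mixed chains
stuck <- sweep(matrix(rnorm(4000), 1000, 4), 2, c(0, 0, 3, 3), "+")    # two chains offset by 3
psrf(mixed)   # close to 1
psrf(stuck)   # well above 1, signalling non-convergence
```

Values near 1 indicate that the chains agree; values well above 1 suggest that the sampler should be run longer before draws are used for inference.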

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex i nominates vertex j when reached. Then the conditional probability

of $e_{ij}$ is given by

$$
\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},
$$

where T is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the (i, j) directed dyad). Bias events are taken to be independent Bernoulli trials given T, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
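The conditional tie probability of the biased net model is a one-line computation once the baseline probability, the bias event probabilities, and the counts of potentially occurring bias events are known. The helper tie_prob and the parameter values below are hypothetical illustrations; they are not the parameterization used by rgbn or bn.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^{s_k(i, j, T)}
# p_base: Pr(B_e); p_bias: Pr(B_k) for each bias type k;
# s: number of potentially occurring bias events of each type for dyad (i, j).
tie_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)   # 1 minus Pr(no event at all)
}

# Illustrative dyad: baseline 0.17, one event of a first bias type (0.5)
# and two events of a second bias type (0.25 each).
tie_prob(0.17, p_bias = c(0.5, 0.25), s = c(1, 2))   # 0.7665625
tie_prob(0.17, p_bias = c(0.5, 0.25), s = c(0, 0))   # no bias events: 0.17
```

The expression is simply one minus the probability that neither the baseline event nor any of the independent bias events occurs, which is why a single successful bias event guarantees the tie.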

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

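$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$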

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. Z and ψ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, W and θ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is that the AR term includes the impact of neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct weight matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
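For reference, the global forms of the two indices mentioned above are $I = \frac{n}{S_0} \frac{\sum_{i,j} w_{ij}(y_i - \bar y)(y_j - \bar y)}{\sum_i (y_i - \bar y)^2}$ and $C = \frac{n - 1}{2 S_0} \frac{\sum_{i,j} w_{ij}(y_i - y_j)^2}{\sum_i (y_i - \bar y)^2}$, with $S_0 = \sum_{i,j} w_{ij}$. The base-R sketch below computes both for a response vector and a single weight matrix. It illustrates the definitions only and does not reproduce nacf, whose handling of neighborhood order, direction, and partial versus complete neighborhoods is not modeled here; the function names are hypothetical.

```r
# Global Moran's I and Geary's C for a vertex attribute y and a weight
# matrix W (e.g., an adjacency matrix with a zero diagonal).
morans_i <- function(y, W) {
  n <- length(y)
  d <- y - mean(y)
  (n / sum(W)) * sum(W * outer(d, d)) / sum(d^2)
}

gearys_c <- function(y, W) {
  n <- length(y)
  d <- y - mean(y)
  ((n - 1) / (2 * sum(W))) * sum(W * outer(y, y, "-")^2) / sum(d^2)
}

# Example: the undirected 4-cycle 1-2-3-4-1 with alternating attribute values.
W <- matrix(0, 4, 4)
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 1))
W[edges] <- 1
W[edges[, 2:1]] <- 1  # make the ties symmetric
y <- c(1, -1, 1, -1)
morans_i(y, W)  # -1: every neighbor has the opposite value
gearys_c(y, W)  # 1.5: values above 1 indicate negative autocorrelation
```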


                                              Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary method for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)                    # true network
R> ep <- rbeta(20, 1, 25)             # false positive rates, mean 1/26
R> em <- rbeta(20, 15, 25)            # false negative rates, mean 15/40
R> dat <- array(dim = c(20, 20, 20))  # dat[i, , ] is informant i's report
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

                                              Multiple Error Probability Model

                                              Marginal Posterior Network Distribution

                                              a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

                                              Journal of Statistical Software 41

                                              a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

                                              a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

                                              Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

                                              Marginal Posterior Error Distribution (by observer)

                                              Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

                                              Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

                                              MCMC Diagnostics

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20
Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
  Max: 1.003116
  Med: 0.9992194
  IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which $e^- + e^+ > 1$. Multiple actors whose error rates satisfy this condition with high posterior probability, or posterior graph distributions that are strongly multimodal, can indicate either excessively “perverse” priors or extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
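One simple screen for the “perverse” region is the posterior probability, for each informant, that the false negative and false positive rates sum to more than 1. The sketch below assumes em_draws and ep_draws are matrices with one row per posterior draw and one column per informant, which is the layout the apply(b$em, 2, median) calls above imply for b$em and b$ep. The function names and the 0.5 flagging threshold are hypothetical choices, not part of sna.

```r
# Hypothetical helper (not part of sna): posterior probability that each
# informant's error rates fall in the "perverse" region e^- + e^+ > 1.
# em_draws, ep_draws: draws x informants matrices (layout assumed for b$em, b$ep).
perverse_prob <- function(em_draws, ep_draws) {
  stopifnot(identical(dim(em_draws), dim(ep_draws)))
  colMeans(em_draws + ep_draws > 1)
}

# Indices of informants whose posterior mass in that region exceeds a threshold.
flag_perverse <- function(em_draws, ep_draws, threshold = 0.5) {
  which(perverse_prob(em_draws, ep_draws) > threshold)
}

# Usage with the fitted model above (not run here):
# flag_perverse(b$em, b$ep)
```

For the simulated example, whose true error rates sum to well under 1, one would expect no informant to be flagged.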

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection LAS, the central graph, and the Romney-Batchelder model.
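The first three of these estimators have simple definitions, sketched below in base R for an array laid out like dat (dat[k, i, j] is informant k's report of the tie from i to j). Under the usual definitions, the LAS intersection keeps the i → j tie only when both i and j report it, the LAS union keeps it when either does, and the central graph keeps ties reported by at least half of the informants. The function names and the treatment of an exact one-half vote are assumptions of this sketch, which is not sna's implementation; consensus should be used in practice.

```r
# Hypothetical sketches of classical consensus estimators (not sna code).
# dat: array of dimension m x n x n (m == n for the LAS methods), where
# dat[k, i, j] is informant k's report of the tie i -> j.
las_consensus <- function(dat, type = c("intersection", "union")) {
  type <- match.arg(type)
  n <- dim(dat)[2]
  stopifnot(dim(dat)[1] == n, dim(dat)[3] == n)
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      sender_says <- dat[i, i, j]    # i's report on its own tie to j
      receiver_says <- dat[j, i, j]  # j's report on the same tie
      out[i, j] <- if (type == "intersection") {
        sender_says * receiver_says
      } else {
        max(sender_says, receiver_says)
      }
    }
  }
  out
}

central_graph <- function(dat) {
  # Majority rule: keep a tie if at least half of the informants report it.
  (apply(dat, c(2, 3), mean) >= 0.5) * 1
}
```

With the simulated dat and g above, mean(las_consensus(dat, "union") == g) can be set beside the corresponding consensus result; exact agreement is not guaranteed, since sna may treat the diagonal or ties differently.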

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either the false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
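The sensitivity of majority-rule estimates to error rates near 0.5 can be seen with a back-of-the-envelope calculation. Assuming, for illustration only, m informants who err independently with a common false negative rate em (the simulated informants above do not share a common rate), a true tie is kept by a rule that requires at least half of the informants to report it with probability $\Pr(X \ge \lceil m/2 \rceil)$, where $X \sim \mathrm{Binomial}(m, 1 - e^-)$. The function name is hypothetical.

```r
# Probability that a majority-rule estimate keeps a true tie, assuming
# m informants who err independently with a common false negative rate em.
# The tie is kept when at least half of the informants report it.
majority_recovery <- function(m, em) {
  k_min <- ceiling(m / 2)  # number of reports needed to reach one half
  # P(X >= k_min) for X ~ Binomial(m, 1 - em)
  pbinom(k_min - 1, size = m, prob = 1 - em, lower.tail = FALSE)
}

# With 20 informants, recovery declines steadily as em grows.
em_grid <- c(0.2, 0.375, 0.45, 0.5, 0.6)
round(sapply(em_grid, function(em) majority_recovery(20, em)), 3)
```

At em = 0.6 the recovery probability is roughly one quarter, so most true ties are lost; an analogous calculation with $X \sim \mathrm{Binomial}(m, e^+)$ gives the chance that an absent tie is wrongly included.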

As a final example of sna's model-based methods, we illustrate here the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)                         # AR weight matrix
R> w2 <- rgraph(50)                         # MA weight matrix
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)             # MA disturbances
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e) # AR responses
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot”, which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
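To make “total influence” concrete for the AR part of Equation 4 with a single weight matrix: the reduced form is $y = (I - \theta W)^{-1}(X\beta + \varepsilon)$, so entry (j, i) of $(I - \theta W)^{-1}$ measures how a unit shock at vertex i propagates to vertex j along all paths. The base-R sketch below computes that matrix and applies the two-standard-deviation rule described above. Treating this matrix as the quantity drawn by plot(fit) is an assumption; sna's exact computation (for instance, its handling of the MA term or the diagonal) may differ, and both function names are hypothetical.

```r
# Hypothetical sketch (not sna's plot code): total influence implied by a
# single network AR term y = theta * W %*% y + X %*% beta + e.
# Entry [j, i] of solve(I - theta * W) is the effect of a shock at i on j.
net_influence <- function(W, theta) {
  n <- nrow(W)
  # Transpose so that [i, j] holds i's total influence on j.
  t(solve(diag(n) - theta * W))
}

# Flag off-diagonal pairs at least two standard deviations above or below
# the mean off-diagonal influence.
flag_influence <- function(infl) {
  off_diag <- row(infl) != col(infl)
  m <- mean(infl[off_diag])
  s <- sd(infl[off_diag])
  list(high = which(off_diag & infl >= m + 2 * s, arr.ind = TRUE),
       low  = which(off_diag & infl <= m - 2 * s, arr.ind = TRUE))
}

# Example: a directed chain in which 2 depends on 1 and 3 depends on 2.
W <- matrix(0, 3, 3)
W[2, 1] <- 1
W[3, 2] <- 1
infl <- net_influence(W, theta = 0.5)
infl[1, 3]  # 0.25: vertex 1 reaches 3 only through 2, giving theta^2
```

For the example above, flag_influence(net_influence(w1, 0.195)) would use the rho1.1 estimate from the summary, ignoring the MA term.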

                                              3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines which covers many of the methods currently in wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                              Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Sample Quantiles vs. Theoretical Quantiles); Net Influence Plot.

                                              References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). “Social Structure from Multiple Networks. II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Second edition. Springer-Verlag, New York.

Burt RS (1976). “Positions in Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                              Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

• Introduction and overview
  • Package history
  • sna and statnet
  • Functionality
  • Terminology and data representation
    • Importing relational data into R
• Package highlights
  • Random graph generation
    • Example
  • Visualization and data manipulation
    • Neighborhood and ego net functions
    • Visualization
  • Descriptive indices
    • Node-level indices
    • Graph-level indices
  • Connectivity and subgraph statistics
    • Example
  • Position and role analysis
    • Example
  • Exploratory edge set comparison
    • Example
  • Network inference and process models
    • Example
• Closing comments

for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present; z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

                                                Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i, j, k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from i to k, i should also be tied to k directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph G with respect to centrality measure c is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \max_{v \in V} c(v, G) - c(v_i, G) \right] \tag{1}$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \tag{2}$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as G. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, where they are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna, and the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
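As an informal check on Equations 1 and 2, here is a minimal base-R sketch (it does not use sna) that computes raw and normalized Freeman centralization for outdegree on a single loopless digraph. The helper names `outdeg` and `freeman_centralization` are hypothetical, and the normalizing constant $(n-1)^2$ is an assumption specific to outdegree, attained by the star in which one vertex sends to all others; other centrality measures need their own theoretical maxima.

```r
# Hypothetical helper: outdegree of each vertex in a loopless adjacency
# matrix whose rows are senders.
outdeg <- function(adj) rowSums(adj != 0) - (diag(adj) != 0)

# Hypothetical helper: Freeman centralization (Equation 1), i.e. the total
# deviation of each score from the maximum; divided by tmaxdev if supplied.
freeman_centralization <- function(scores, tmaxdev = NULL) {
  raw <- sum(max(scores) - scores)
  if (is.null(tmaxdev)) raw else raw / tmaxdev
}

n <- 5
star <- matrix(0, n, n)
star[1, -1] <- 1                     # vertex 1 sends a tie to every other vertex
full <- matrix(1, n, n)
diag(full) <- 0                      # complete digraph: all outdegrees equal
maxdev <- (n - 1)^2                  # assumed maximum for outdegree (the star)

c_star <- outdeg(star)               # 4 0 0 0 0
freeman_centralization(c_star)       # raw score: 16
# Equation 2 gives the same value: |V| * (max - mean)
eq2 <- length(c_star) * (max(c_star) - mean(c_star))
freeman_centralization(c_star, maxdev)        # normalized: 1
freeman_centralization(outdeg(full), maxdev)  # normalized: 0
```

Passing the maximum deviation separately mirrors, in spirit, the way the text describes normalization being delegated to the centrality routine.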

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree; hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices; hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on n vertices, with $K_{n,m}$ denoting the complete bipartite graph on n and m vertices, and $N_n$ the null or empty graph on n vertices.
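For readers who want to see exactly what such indices count, the following base-R sketch computes density, dyadic and edgewise reciprocity, and a weak transitivity ratio for a single loopless, unvalued digraph. It is not sna's code: the function names are hypothetical, missing data are ignored, and treating every ordered 2-path i → j → k (with i ≠ k) as a triad "at risk" is an assumed convention that may differ in detail from gtrans.

```r
# Hypothetical helpers for a loopless, unvalued digraph (rows send ties).
offdiag <- function(adj) adj[row(adj) != col(adj)]

# Density: share of the n(n - 1) possible directed edges that are present.
density_dg <- function(adj) mean(offdiag(adj) != 0)

# Dyadic reciprocity: share of unordered dyads that are mutual or null.
# Edgewise reciprocity: share of edges whose reverse edge is also present.
recip_dg <- function(adj, measure = c("dyadic", "edgewise")) {
  measure <- match.arg(measure)
  up <- upper.tri(adj)
  a <- adj[up] != 0                   # i -> j for each dyad with i < j
  b <- t(adj)[up] != 0                # j -> i for the same dyads
  if (measure == "dyadic") mean(a == b)
  else 2 * sum(a & b) / sum(offdiag(adj) != 0)
}

# Weak transitivity ratio: among 2-paths i -> j -> k with i != k,
# the share for which the edge i -> k is also present.
weak_trans_dg <- function(adj) {
  adj <- (adj != 0) * 1
  diag(adj) <- 0
  two <- adj %*% adj                  # two[i, k] = number of 2-paths i -> j -> k
  diag(two) <- 0                      # discard i -> j -> i
  sum(two * adj) / sum(two)
}

g <- matrix(0, 3, 3)
g[cbind(c(1, 2, 2, 1), c(2, 1, 3, 3))] <- 1   # 1 <-> 2, 2 -> 3, 1 -> 3
density_dg(g)              # 4 of 6 possible edges
recip_dg(g)                # 1 of 3 dyads symmetric
recip_dg(g, "edgewise")    # 2 of 4 edges reciprocated
weak_trans_dg(g)           # both 2-paths are closed: 1

cyc <- matrix(0, 3, 3)
cyc[cbind(1:3, c(2, 3, 1))] <- 1             # directed 3-cycle
weak_trans_dg(cyc)         # no 2-path is closed: 0
```

Squaring the adjacency matrix keeps the transitivity count vectorized; for large sparse graphs an edge-list approach would scale better.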

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)
+     return((n-1) * choose(n-1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order N, and let $d(i, j)$ be the geodesic distance from vertex i to vertex j in G. The "structure statistics" of G are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\left(d(j, k) \le i\right)$$

and I is the standard indicator function. Intuitively, $s_i$ is the expected fraction of G which lies within distance i of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
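The structure statistics can be computed directly from geodesic distances. The sketch below is a plain base-R illustration, not sna's geodist or structure.statistics: it assumes an unvalued adjacency matrix with no missing data, finds geodesics by breadth-first search, and uses hypothetical function names. A pair counts as reachable exactly when its distance is finite, which also gives a simple test for strong connectedness.

```r
# Hypothetical helper: all-pairs geodesic distances by breadth-first search.
# adj[i, j] != 0 means an edge i -> j; unreachable pairs are left at Inf.
geodesic_dist <- function(adj) {
  n <- nrow(adj)
  d <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    d[s, s] <- 0
    frontier <- s
    step <- 0
    while (length(frontier) > 0) {
      step <- step + 1
      # out-neighbours of the current frontier that have not been reached yet
      nbrs <- which(colSums(adj[frontier, , drop = FALSE] != 0) > 0)
      nbrs <- nbrs[is.infinite(d[s, nbrs])]
      d[s, nbrs] <- step
      frontier <- nbrs
    }
  }
  d
}

# Structure statistics: s_i = N^-2 * #{(j, k) : d(j, k) <= i}, i = 0, ..., N-1.
structure_stats <- function(adj) {
  n <- nrow(adj)
  d <- geodesic_dist(adj)
  s <- vapply(0:(n - 1), function(i) sum(d <= i) / n^2, numeric(1))
  names(s) <- 0:(n - 1)
  s
}

# Strongly connected iff every vertex reaches every other (all distances finite).
is_strongly_connected <- function(adj) all(is.finite(geodesic_dist(adj)))

p4 <- matrix(0, 4, 4)
p4[cbind(1:3, 2:4)] <- 1          # directed path 1 -> 2 -> 3 -> 4
structure_stats(p4)               # 0.25 0.4375 0.5625 0.625
is_strongly_connected(p4)         # FALSE
c4 <- p4
c4[4, 1] <- 1                     # close the path into a directed 4-cycle
is_strongly_connected(c4)         # TRUE
```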

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references therein), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of G, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic t for graph G is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where H is a random graph drawn uniformly given conditioning statistics $s(H) = s(G)$, $s'(H) = s'(G)$. Conditioning on the order of G is routine; the number of edges, the dyad census, and the degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbb{E}\,s(H) = s(G)$, $\mathbb{E}\,s'(H) = s'(G)$ for some $s, s'$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s'$; the homogeneous Bernoulli graph with parameter p equal to the density of G is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., the difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
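A minimal base-R sketch of the two simplest censuses follows: the dyad census of a digraph and the four-class triad census of an undirected graph (the 16-class directed triad census is omitted for brevity). The function names are hypothetical; inputs are assumed to be unvalued, loopless adjacency matrices (symmetric in the undirected case), and this is not how dyad.census or triad.census are implemented.

```r
# Hypothetical helper: dyad census of a loopless digraph.
dyad_census_dg <- function(adj) {
  up <- upper.tri(adj)
  a <- adj[up] != 0          # i -> j for each unordered pair {i, j}, i < j
  b <- t(adj)[up] != 0       # j -> i for the same pairs
  c(Mut = sum(a & b), Asym = sum(xor(a, b)), Null = sum(!a & !b))
}

# Hypothetical helper: undirected triad census, i.e. the number of vertex
# triples inducing 0, 1, 2 or 3 edges (adj assumed symmetric).
triad_census_ug <- function(adj) {
  trip <- combn(nrow(adj), 3)          # one column per vertex triple
  k <- apply(trip, 2, function(v)
    (adj[v[1], v[2]] != 0) + (adj[v[1], v[3]] != 0) + (adj[v[2], v[3]] != 0))
  setNames(tabulate(k + 1, nbins = 4), c("0", "1", "2", "3"))
}

g <- matrix(0, 3, 3)
g[cbind(c(1, 2, 2, 1), c(2, 1, 3, 3))] <- 1   # 1 <-> 2, 2 -> 3, 1 -> 3
dyad_census_dg(g)          # Mut 1, Asym 2, Null 0

path <- matrix(0, 4, 4)
path[cbind(1:3, 2:4)] <- 1
path <- path + t(path)     # undirected path 1 - 2 - 3 - 4
triad_census_ug(path)      # 0 2 2 0
```

Enumerating all triples with combn is fine for small graphs; the count grows as $\binom{n}{3}$, so this is only a didactic sketch.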

                                                Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00
R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50
R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43
R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the (signed) difference in reciprocities between two graphs as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.3333333
                1stQ:    -0.06666667
                Med:     0
                Mean:    -0.001288889
                3rdQ:    0.06666667
                Max:     0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.06666667
                1stQ:    0.1555556
                Med:     0.2222222
                Mean:    0.2215333
                3rdQ:    0.2888889
                Max:     0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.
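To make the CUG logic concrete, the following base-R sketch draws graphs uniformly from those with the same order and number of edges as the observed digraph, and returns upper- and lower-tail Monte Carlo p-values for a user-supplied statistic. It is an illustration under stated assumptions, not cugtest: the function name `cug_edges` and its arguments are hypothetical, only edge-count conditioning is implemented, and estimated p-values depend on the random seed.

```r
# Hypothetical helper: Monte Carlo CUG test conditioning on order and edge
# count. H is drawn uniformly from loopless digraphs on the same vertices
# with the same number of edges as the observed graph.
cug_edges <- function(adj, stat, reps = 1000) {
  n <- nrow(adj)
  off <- which(row(adj) != col(adj))   # cells that may hold an edge
  m <- sum(adj[off] != 0)              # observed number of edges
  obs <- stat(adj)
  sims <- replicate(reps, {
    h <- matrix(0, n, n)
    h[off[sample.int(length(off), m)]] <- 1
    stat(h)
  })
  list(obs = obs,
       p_upper = mean(sims >= obs),    # estimate of Pr(t(H) >= t(G))
       p_lower = mean(sims <= obs))    # estimate of Pr(t(H) <= t(G))
}

# Test statistic: dyadic reciprocity (share of symmetric dyads).
recip <- function(a) {
  up <- upper.tri(a)
  mean(a[up] == t(a)[up])
}

set.seed(1)
# Five mutual pairs on ten vertices: far more reciprocity than chance.
mg <- matrix(0, 10, 10)
mut_pairs <- cbind(c(1, 3, 5, 7, 9), c(2, 4, 6, 8, 10))
mg[mut_pairs] <- 1
mg[mut_pairs[, 2:1]] <- 1
res_m <- cug_edges(mg, recip)
res_m$p_upper                          # near 0

# A complete digraph admits only one graph with its edge count.
kg <- matrix(1, 5, 5)
diag(kg) <- 0
res_k <- cug_edges(kg, recip, reps = 100)
```

Conditioning on order alone would instead draw each cell independently with probability 0.5; swapping the sampling line is enough to get that variant.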

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph G, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.

This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
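A rough base-R analogue of this workflow is sketched below: a Hamming-type structural equivalence distance, followed by hclust and cutree. The function name is hypothetical, and skipping the cells that involve the pair being compared (so that a tie between the two vertices does not count against them) is an assumption that follows the neighborhood definition of structural equivalence given earlier, not necessarily sedist's exact behavior.

```r
# Hypothetical helper: Hamming-type structural equivalence distances.
# For each pair (i, j), count disagreements between their rows (out-ties)
# and columns (in-ties), skipping the cells that involve i or j themselves.
se_hamming <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      keep <- setdiff(seq_len(n), c(i, j))
      d[i, j] <- sum(adj[i, keep] != adj[j, keep]) +
        sum(adj[keep, i] != adj[keep, j])
    }
  }
  d
}

# Undirected star: vertex 1 is the centre, vertices 2-4 are leaves.
star <- matrix(0, 4, 4)
star[1, -1] <- 1
star[-1, 1] <- 1
d <- se_hamming(star)
hc <- hclust(as.dist(d))            # complete linkage by default
cutree(hc, k = 2)                   # 1 2 2 2: the leaves form one position
```

The leaves are exactly structurally equivalent (distance 0), so any linkage rule groups them before joining the centre.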

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (the mean value of the cells in the associated block of the adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
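Density reduction and expansion can be sketched in base R as follows. The helper names are hypothetical and this is not blockmodel or blockmodel.expand; it assumes a membership vector with classes numbered 1 to k, ignores the diagonal when computing block densities, and would yield NaN for a diagonal block whose class contains a single vertex.

```r
# Hypothetical helper: density blockmodel. Entry (a, b) is the mean of the
# adjacency cells from class a to class b, with the diagonal ignored.
block_densities <- function(adj, memb) {
  diag(adj) <- NA
  k <- max(memb)
  dens <- matrix(NA_real_, k, k)
  for (a in seq_len(k)) {
    for (b in seq_len(k)) {
      dens[a, b] <- mean(adj[memb == a, memb == b], na.rm = TRUE)
    }
  }
  dens
}

# Hypothetical helper: expansion. Each block density becomes the edge
# probability between every pair of vertices in the corresponding classes,
# and one inhomogeneous Bernoulli graph is drawn.
expand_blockmodel <- function(dens, new_memb) {
  n <- length(new_memb)
  p <- dens[new_memb, new_memb]
  g <- matrix(rbinom(n * n, 1, p), n, n)
  diag(g) <- 0
  g
}

memb <- c(1, 1, 1, 2, 2, 2)
adj <- outer(memb, memb, "==") * 1    # complete within classes, null between
diag(adj) <- 0
dens <- block_densities(adj, memb)    # 1 on the diagonal blocks, 0 elsewhere

set.seed(42)
new_memb <- c(1, 1, 2, 2, 2, 2, 2)    # a different order: 7 vertices
new_g <- expand_blockmodel(dens, new_memb)
```

Because the new membership vector may be longer or shorter than the original, the expanded graph need not have the same order as the graph the blockmodel was fitted to.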

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
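The Boolean-product definition, including the way loops can appear, can be checked with a few lines of base R. This is a sketch of the definition only, not sna's %c% implementation; `compose_graphs` is a hypothetical name.

```r
# Hypothetical helper: Boolean composition of two adjacency matrices.
# (v, v') is an edge iff some v'' has v -> v'' in G and v'' -> v' in G2.
compose_graphs <- function(G, G2) {
  (G %*% G2 > 0) * 1   # (G %*% G2)[v, v'] counts the intermediate vertices v''
}

G  <- matrix(0, 3, 3); G[1, 2]  <- 1   # 1 -> 2
G2 <- matrix(0, 3, 3); G2[2, 1] <- 1   # 2 -> 1
compose_graphs(G, G2)                  # only edge is the loop 1 -> 1
```

Neither input has a loop, yet the composition does, which is why diagonals matter when analyzing compositions.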

                                                Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$\operatorname{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\left(|V| - 1\right)} \qquad (3)$$

where $A^G$, $A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\operatorname{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \operatorname{cov}(G, H) / \sqrt{\operatorname{cov}(G, G)\operatorname{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
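Equation (3) and the Hamming distance translate directly into base R. The helpers below are hypothetical illustrations, not sna's gcor, gcov, or hdist; they assume directed 0/1 graphs and drop the diagonal, so the number of edge variables is |V|(|V| - 1), matching the denominator of (3).

```r
# Edge variables for a directed graph: every ordered pair (i, j) with i != j.
offdiag <- function(A) A[row(A) != col(A)]

# Graph covariance, Eq. (3): mean cross-product of deviations over |V|(|V| - 1) cells.
graph_cov <- function(G, H) {
  g <- offdiag(G); h <- offdiag(H)
  sum((g - mean(g)) * (h - mean(h))) / length(g)
}
graph_cor <- function(G, H) graph_cov(G, H) / sqrt(graph_cov(G, G) * graph_cov(H, H))
hamming   <- function(G, H) sum(offdiag(G) != offdiag(H))

G <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 0, 0), 3, 3, byrow = TRUE)
H <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0), 3, 3, byrow = TRUE)
hamming(G, H)     # 2 edge changes: drop 2 -> 1, add 1 -> 3
graph_cor(G, H)   # 1/3
```

Because the normalizing constants cancel in the ratio, the graph correlation equals the ordinary Pearson correlation of the two vectorized edge sets.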

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching of arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
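For very small graphs, the maximization over matchings can be done by brute force. The sketch below assumes that every relabeling of the second graph is a permissible matching and searches all n! of them; it corresponds in spirit to exhaustive search rather than to sna's default annealing, and the names `perms` and `struct_cor` are hypothetical.

```r
# All permutations of v (n! of them; feasible only for very small n).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}
offdiag <- function(A) A[row(A) != col(A)]

# Structural (unlabeled) correlation by exhaustive search over relabelings of H.
struct_cor <- function(G, H) {
  best <- -Inf
  for (p in perms(seq_len(nrow(H)))) {
    r <- suppressWarnings(cor(offdiag(G), offdiag(H[p, p])))  # NA if a graph is empty/complete
    if (!is.na(r) && r > best) best <- r
  }
  best
}

G <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              0, 0, 0, 1,
              0, 0, 0, 0), 4, 4, byrow = TRUE)
H <- G[c(3, 1, 4, 2), c(3, 1, 4, 2)]   # same graph, relabeled
struct_cor(G, H)                        # 1: some relabeling matches exactly
```

The factorial growth of `perms` is exactly why heuristic search is needed for graphs of any realistic size.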

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
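As a base-R illustration of the workflow just described, the sketch below vectorizes a few random digraphs, applies eigen to their graph correlation matrix (network principal components), runs cmdscale on their Hamming distances, and forms a central graph by elementwise majority vote. The graphs are simulated placeholders, and the majority-vote construction is our own stand-in for centralgraph, not its code.

```r
set.seed(1)
# Six simulated 10-vertex digraphs (density 0.3), vectorized without diagonals.
nets <- replicate(6, matrix(rbinom(100, 1, 0.3), 10, 10), simplify = FALSE)
X <- sapply(nets, function(A) A[row(A) != col(A)])  # 90 x 6: one column per graph

R   <- cor(X)                                       # 6 x 6 graph correlation matrix
pcs <- eigen(R, symmetric = TRUE)                   # network principal components
D   <- as.matrix(dist(t(X), method = "manhattan"))  # Hamming distances between graphs
mds <- cmdscale(D, k = 2)                           # 2-D metric MDS of the six graphs

# Central graph: elementwise majority vote minimizes summed Hamming distance
# (cells tied at exactly half are set to 0 here).
central <- as.numeric(rowMeans(X) > 0.5)
```

On 0/1 data the Manhattan distance is exactly the Hamming distance, so no special distance function is required.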

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available where conditioning on the entire observed structure is inappropriate.
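The logic of a QAP test for the graph correlation can be sketched in base R: permute the rows and columns of one graph together, recompute the statistic, and compare the observed value against that permutation distribution. `qap_cor_test` is a hypothetical helper, not sna's qaptest, and reports only a one-sided upper-tail probability.

```r
# Hypothetical helper (not sna's qaptest): QAP test of the graph correlation.
qap_cor_test <- function(G, H, reps = 1000) {
  od  <- function(A) A[row(A) != col(A)]
  obs <- cor(od(G), od(H))
  null <- replicate(reps, {
    p <- sample(nrow(H))          # random relabeling of H's vertices
    cor(od(G), od(H[p, p]))       # rows and columns permuted together
  })
  list(stat = obs,
       p_greater = mean(null >= obs),   # one-sided: Pr(null >= observed)
       null = null)
}

set.seed(3)
G <- matrix(rbinom(400, 1, 0.3), 20, 20)
H <- G; flip <- sample(400, 40); H[flip] <- 1 - H[flip]   # noisy copy of G
res <- qap_cor_test(G, H, reps = 500)
res$stat; res$p_greater
```

Permuting vertices, rather than individual edge variables, preserves each graph's internal dependence structure, which is what distinguishes QAP from a naive permutation test on the vectorized edges.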

                                                Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
   352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.858
Pseudo-R^2 Measures:
   (Dn-Dr)/(Dn-Dr+dfn): 0.481324
   (Dn-Dr)/Dn: 0.6694004
Contingency Table (predicted (rows) x actual (cols)):

      0   1
0     0   0
1    39 341

   Total Fraction Correct: 0.8973684
   Fraction Predicted 1s Correct: 0.8973684
   Fraction Predicted 0s Correct: NaN
   False Negative Rate: 0
   False Positive Rate: 1

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties. This is largely a consequence of the high density of the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values and that the QAP test correctly identifies the irrelevant predictors.
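Because these models are the conventional ones applied to vectorized adjacency matrices, the point estimates can be approximated with glm alone. The following base-R sketch re-creates the simulation above without sna (our own random draws, so the numbers will differ) and fits an ordinary logistic regression to the off-diagonal edge variables; it reproduces only the estimates, not the QAP-based inference that distinguishes netlogit.

```r
set.seed(2008)
n  <- 20
od <- function(A) A[row(A) != col(A)]                        # drop the diagonal
x  <- replicate(4, matrix(rbinom(n * n, 1, 0.5), n, n), simplify = FALSE)
eta <- x[[1]] + 4 * x[[2]] + 2 * x[[3]]                       # x4 has no effect
y  <- matrix(rbinom(n * n, 1, plogis(eta)), n, n)
X  <- sapply(x, od)                                          # 380 x 4 design matrix
fit <- glm(od(y) ~ X, family = binomial)                     # ordinary logit on edge variables
coef(fit)
```

With a dependent graph this dense, glm may warn about fitted probabilities near 1; that is the same rare-event difficulty noted above, not a coding error.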

2.7 Network inference and process models

A final category of functions supplied by sna consists of those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired), and estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
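The role of the false positive and false negative rates can be seen in a single-edge calculation. The sketch below is a hypothetical helper, not part of sna: it treats the error rates as known (one of the options mentioned above), assumes reports are conditionally independent given the true edge state, and applies Bayes' rule with a Bernoulli edge prior. bbnam itself instead samples the whole graph and the error rates jointly.

```r
# Hypothetical helper (not part of sna): posterior probability that one edge exists,
# given binary reports r[k] from K informants whose error rates are treated as known.
edge_posterior <- function(r, e_minus, e_plus, prior = 0.5) {
  lik1 <- prod(ifelse(r == 1, 1 - e_minus, e_minus))  # Pr(reports | edge present)
  lik0 <- prod(ifelse(r == 1, e_plus, 1 - e_plus))    # Pr(reports | edge absent)
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

# Two of three informants report the edge (e- = 0.2, e+ = 0.1 for everyone):
edge_posterior(c(1, 1, 0), rep(0.2, 3), rep(0.1, 3))   # 0.128 / 0.137, about 0.934

# A "perverse" informant (e- + e+ > 1): a reported tie lowers the posterior.
edge_posterior(1, e_minus = 0.7, e_plus = 0.6)          # 1/3
```

The second call shows what a negatively informative report means in practice: once the two error rates sum to more than 1, a reported tie is evidence against the tie.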

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
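The conditional tie probability above is a one-line function. In this sketch (a hypothetical helper, not sna's bn), the bias event probabilities and the counts $s_k(i, j, T)$ are supplied by the caller; what each bias event means (for example, a reciprocity event that can occur once j has already nominated i) is part of the model specification, and the numbers used here are made up for illustration.

```r
# Hypothetical helper (not sna's bn): conditional tie probability under a biased net.
# p_base = Pr(B_e); p_bias[k] = Pr(B_k); s[k] = s_k(i, j, T), the number of
# opportunities for bias event k on the (i, j) dyad given the trace state T.
biased_net_prob <- function(p_base, p_bias, s) {
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

biased_net_prob(0.1, p_bias = c(0.2, 0.5), s = c(0, 0))  # no bias events: 0.1
biased_net_prob(0.1, p_bias = c(0.2, 0.5), s = c(2, 1))  # 1 - 0.9 * 0.64 * 0.5 = 0.712
```

The tie fails to appear only if the baseline event and every bias event all fail, which is why the formula is one minus a product of failure probabilities.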

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

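$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$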

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
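A small base-R sketch ties Eq. (4) to Moran's I. It draws responses from a pure network AR model (one weight matrix, no MA term, so $\varepsilon = \nu$) by solving $y = (I - \theta W)^{-1}(X\beta + \nu)$, then computes the standard textbook form of Moran's I. The random weight matrix, its row normalization, and the parameter values are illustrative assumptions, and nacf's own computation may differ in details such as neighborhood order or normalization.

```r
# Moran's I (standard form): I = (n / S0) * (z' W z) / (z' z), z = y - mean(y), S0 = sum(W).
moran_I <- function(y, W) {
  z <- y - mean(y)
  (length(y) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Draw from a pure network AR model, Eq. (4) with w = 1 and no MA term:
# y = theta * W y + X beta + nu  =>  y = (I - theta W)^{-1} (X beta + nu)
set.seed(1)
n <- 30
W <- matrix(rbinom(n * n, 1, 0.1), n, n); diag(W) <- 0
W <- W / pmax(rowSums(W), 1)             # row-normalize; isolates keep zero rows
X <- cbind(1, rnorm(n))
beta <- c(1, 2); theta <- 0.5
nu <- rnorm(n)
y <- drop(solve(diag(n) - theta * W, X %*% beta + nu))
moran_I(y, W)

# Alternating values on a 4-cycle give perfect negative autocorrelation.
ring <- matrix(0, 4, 4); ring[cbind(1:4, c(2, 3, 4, 1))] <- 1; ring <- ring + t(ring)
moran_I(c(1, -1, 1, -1), ring)           # -1
```

Solving the linear system, rather than iterating $y \leftarrow \theta W y + X\beta + \nu$, gives the AR equilibrium directly; with a row-normalized W and $|\theta| < 1$, the matrix $I - \theta W$ is guaranteed to be invertible.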

                                                Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)

R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

       a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1   0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2   0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3   0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4   0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5   1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6   0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7   1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8   0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9   0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10  0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11  0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12  1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13  0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14  1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15  1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16  0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17  1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18  1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19  0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20  0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
      a16  a17  a18  a19  a20
a1   1.00 1.00 1.00 0.00 0.00
a2   1.00 0.00 0.00 1.00 1.00
a3   0.00 0.00 1.00 0.00 1.00
a4   0.00 1.00 0.00 1.00 1.00
a5   1.00 1.00 0.00 0.00 1.00
a6   0.00 0.00 0.00 1.00 0.00
a7   1.00 0.00 0.00 0.00 0.00
a8   0.00 0.00 1.00 0.00 1.00
a9   1.00 1.00 1.00 1.00 0.00
a10  0.00 1.00 1.00 1.00 0.00
a11  1.00 1.00 0.00 1.00 1.00
a12  1.00 0.00 1.00 1.00 0.00
a13  0.00 0.00 1.00 0.00 1.00
a14  0.00 0.00 0.00 0.00 0.00
a15  1.00 0.00 1.00 0.00 1.00
a16  0.00 0.00 1.00 0.00 0.00
a17  0.00 0.00 1.00 0.00 1.00
a18  0.00 0.00 0.00 1.00 0.00
a19  0.00 0.00 0.00 0.00 1.00
a20  1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:
             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):
Probability of False Negatives (e^-):
       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):
          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

  Replicate Chains: 5
  Burn Time: 300
  Draws per Chain: 20 Total Draws: 100
  Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Two patterns can indicate either excessively “perverse” priors or extreme disagreement among informants (e.g., as would result from systematic deception): multiple actors whose error rates satisfy this condition with high posterior probability, or posterior graph distributions that are strongly multimodal. Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
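The “perverse” region em + ep > 1 described above can be checked directly from posterior draws. The `apply(b$em, 2, median)` calls earlier suggest that b$em and b$ep hold one row per draw and one column per observer, and this sketch assumes that layout. The helper `perverse_prob` and its 0.5 flagging threshold are hypothetical choices for illustration, not part of sna.

```r
# For each observer, estimate the posterior probability that its sampled
# error rates fall in the "perverse" region em + ep > 1, and flag observers
# whose probability exceeds `threshold` (an arbitrary cut-off).
# em_draws, ep_draws: one row per posterior draw, one column per observer.
perverse_prob <- function(em_draws, ep_draws, threshold = 0.5) {
  stopifnot(identical(dim(em_draws), dim(ep_draws)))
  p <- colMeans(em_draws + ep_draws > 1)
  list(prob = p, flagged = which(p > threshold))
}

# Toy draws: observers 1-3 have small error rates, observer 4 does not
set.seed(42)
n_draws <- 1000
em_toy <- cbind(matrix(rbeta(3 * n_draws, 2, 20), ncol = 3), rbeta(n_draws, 20, 5))
ep_toy <- cbind(matrix(rbeta(3 * n_draws, 2, 20), ncol = 3), rbeta(n_draws, 20, 5))
res <- perverse_prob(em_toy, ep_toy)
round(res$prob, 3)
res$flagged          # observer 4
```

With a fitted model, the analogous call would be `perverse_prob(b$em, b$ep)`, provided the storage layout matches the assumption above.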

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of them, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice, since it exacerbates the effects of false negatives. The central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either the false positive or the false negative rate approaches or exceeds 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
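To make the contrast between the classical estimators concrete, here is a base-R sketch of three of the rules compared above. It assumes a particular data layout: css is an m × n × n array with css[k, i, j] = 1 when informant k reports a tie from i to j, and the informants are the network members themselves. The function names are hypothetical. sna's consensus may differ in details such as tie-breaking when exactly half of the informants report a tie.

```r
# Base-R sketch of three classical consensus rules for cognitive social
# structure data. Assumed layout: css[k, i, j] == 1 when informant k reports
# a tie i -> j, with informants = network members (m == n).

# Intersection LAS: keep i -> j only if both i and j report it.
las_intersection <- function(css) {
  n <- dim(css)[2]
  g <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n)
    g[i, j] <- css[i, i, j] * css[j, i, j]
  g
}

# Union LAS: keep i -> j if either i or j reports it.
las_union <- function(css) {
  n <- dim(css)[2]
  g <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n)
    g[i, j] <- max(css[i, i, j], css[j, i, j])
  g
}

# Central graph: keep i -> j if at least half of all informants report it.
central_graph <- function(css) {
  (apply(css, c(2, 3), mean) >= 0.5) * 1
}

css <- array(0, dim = c(3, 3, 3))
css[1, 1, 2] <- 1   # informant 1 reports 1 -> 2
css[2, 1, 2] <- 1   # informant 2 reports 1 -> 2
css[3, 2, 3] <- 1   # only informant 3 reports 2 -> 3
las_intersection(css)   # keeps 1 -> 2; drops 2 -> 3 because vertex 2 is silent
las_union(css)          # keeps both ties
central_graph(css)      # keeps 1 -> 2 (2 of 3 informants); drops 2 -> 3
```

The intersection rule needs both endpoints to report a tie, so a single false negative by either party removes it. This is why false negatives hurt it most.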

As a final example of sna's model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components and estimates both effects simultaneously. (This example requires the numDeriv package.)
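For reference, the data-generating process simulated in the following session can be written as below. In the code, r1, r2, w1, w2, x, beta, and nu play the roles of ρ1, ρ2, W1, W2, X, β, and ν. This is a transcription of the simulation code, with W1 carrying the AR term and W2 the MA term, not a quotation from the lnam documentation.

```latex
y = \rho_1 W_1 y + X\beta + e, \qquad
e = \rho_2 W_2 e + \nu, \qquad
\nu \sim N\!\left(0, \sigma^2 I_n\right),

\text{so that} \quad
y = (I_n - \rho_1 W_1)^{-1}\left[ X\beta + (I_n - \rho_2 W_2)^{-1}\nu \right].
```

The two `qr.solve` calls in the session compute the inner inverse, (I_n - ρ2 W2)^{-1} ν, and then the outer one.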

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58  < 2e-16 ***
X2      0.535608   0.009448   56.69  < 2e-16 ***
X3     -0.685068   0.007138  -95.98  < 2e-16 ***
X4      0.691812   0.008417   82.19  < 2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71  < 2e-16 ***
rho2.1  0.307491   0.021167   14.53  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are drawn as green edges. Pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean are drawn as red edges. Sample output for the above example is provided in Figure 6.
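The two-standard-deviation flagging rule described above can be sketched as follows. This requires a definition of “net influence,” and here we assume one. For the AR part, y = (I - ρ1 W1)^{-1}(Xβ + e), so entry [j, i] of that inverse is the total effect on y_j of a unit disturbance at vertex i. We transpose it so that [i, j] reads “net influence of i on j.” Both the definition and the helper names are illustrative assumptions, not a transcription of plot.lnam.

```r
# Hypothetical net influence matrix for the AR term: entry [i, j] is the
# total effect on y_j of a unit disturbance at vertex i, i.e. the transpose
# of (I - rho1 * W1)^{-1}. Assumed definition, for illustration only.
net_influence <- function(W1, rho1) {
  n <- nrow(W1)
  t(solve(diag(n) - rho1 * W1))
}

# Flag ordered pairs at least k standard deviations above ("green") or
# below ("red") the mean off-diagonal net influence.
flag_influence <- function(infl, k = 2) {
  off <- infl
  diag(off) <- NA                          # ignore self-influence
  m <- mean(off, na.rm = TRUE)
  s <- sd(off, na.rm = TRUE)
  list(high = which(off >= m + k * s, arr.ind = TRUE),
       low  = which(off <= m - k * s, arr.ind = TRUE))
}

# Toy example: vertex 1's outcome depends on vertex 2's (W1[1, 2] = 1)
W <- matrix(0, 3, 3)
W[1, 2] <- 1
infl <- net_influence(W, rho1 = 0.5)
infl[2, 1]                  # influence of 2 on 1: 0.5
flag_influence(infl)$high   # the single pair (row 2, col 1)
```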

                                                3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines that covers many of the methods currently in wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                                Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. [Four panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.]

                                                References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks. II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software, Version 2.067. URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows, Version 4.75. URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis, Version 3.1. URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

                                                Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software (http://www.jstatsoft.org/), published by the American Statistical Association (http://www.amstat.org/).

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.


centration). Generally, maximum centralization scores occur on the star graphs (i.e., K_{1,n}),³ although this is not always the case: eigenvector centralization, for instance, is maximized for the family K_2 ∪ N_n. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which uses them to generate the underlying score vector. In the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna, and the mechanism may also be employed with user-supplied functions, provided that they support the required arguments. Examples are supplied in the sna manual.

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices are supplied respectively by connectedness, efficiency, hierarchy, and lubness, and they describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.
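Of Krackhardt's indices, connectedness is the simplest to state. It is the share of unordered vertex pairs joined by some path once edge direction is ignored. The base-R sketch below assumes that standard definition and computes reachability by repeated Boolean squaring of the symmetrized adjacency matrix. It is not sna's implementation, and `krackhardt_connectedness` is a hypothetical name.

```r
# Krackhardt-style connectedness: fraction of unordered pairs {i, j} that are
# mutually reachable in the underlying undirected graph.
krackhardt_connectedness <- function(g) {
  n <- nrow(g)
  u <- (g + t(g)) > 0                  # ignore edge direction
  reach <- u | diag(n) > 0             # walks of length 0 or 1
  repeat {                             # square until no new pairs appear
    nxt <- (reach %*% reach) > 0
    if (all(nxt == reach)) break
    reach <- nxt
  }
  sum(reach[upper.tri(reach)]) / choose(n, 2)
}

two_pairs <- matrix(0, 4, 4)
two_pairs[1, 2] <- 1
two_pairs[3, 4] <- 1
krackhardt_connectedness(two_pairs)   # 2 of 6 pairs joined: 1/3

path <- matrix(0, 3, 3)
path[1, 2] <- 1
path[2, 3] <- 1
krackhardt_connectedness(path)        # direction ignored, all pairs joined: 1
```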

The use of sna's GLI routines is straightforward: calling one with a graph or set of graphs generally returns a vector of GLI scores, as in the following example. Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and “census” variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1] 0 21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ K_n is the complete graph on n vertices, with K_{n,m} denoting the complete bipartite graph on n and m vertices, and N_n the null or empty graph on n vertices.
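The two reciprocity measures contrasted in the example above can be sketched in a few lines of base R. The function names are hypothetical. We assume a 0/1 adjacency matrix without self-ties, and we return NaN for an edgeless graph (an arbitrary choice).

```r
# Dyadic reciprocity: share of unordered dyads {i, j} that are symmetric,
# i.e. mutual (i <-> j) or null (no tie in either direction).
recip_dyadic <- function(g) {
  up <- upper.tri(g)
  mean(g[up] == t(g)[up])
}

# Edgewise reciprocity: share of edges whose reverse edge is also present.
# sum(g * t(g)) counts each mutual dyad twice, so this equals 2M / E.
recip_edgewise <- function(g) {
  n_edges <- sum(g)
  if (n_edges == 0) return(NaN)
  sum(g * t(g)) / n_edges
}

g3 <- matrix(0, 3, 3)
g3[1, 2] <- 1; g3[2, 1] <- 1   # mutual dyad {1, 2}
g3[2, 3] <- 1                  # asymmetric dyad {2, 3}
g3[3, 1] <- 1                  # asymmetric dyad {1, 3}
recip_dyadic(g3)               # 1 of 3 dyads symmetric
recip_edgewise(g3)             # 2 of 4 edges reciprocated: 0.5
```

Null dyads count as symmetric under the dyadic measure but contribute nothing to the edgewise one. This explains why the sparsest graph above scores 0.8666667 on the first and 0 on the second.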

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified along with any additional arguments). By default, centralization scores are computed for a single graph only. R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both the single-graph and the multiple-graph forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)  # maximum deviation, attained by an out-star
+     return((n-1) * choose(n-1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)  # outgoing 2-stars per vertex
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
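The normalization behind centralization can be made concrete with a small base-R sketch. It assumes Freeman's definition (the summed deviations of each vertex's score from the largest score, divided by the theoretical maximum of that sum, which is what an index returns under tmaxdev = TRUE). The helpers freeman_centralization, out_degree and out_two_stars are hypothetical, not sna functions, and the two maxima are worked out here for loopless digraphs only.

```r
# Hypothetical helpers for illustration; not part of sna.

# Freeman-style centralization: total shortfall of each vertex's score
# relative to the largest score, divided by the largest possible total.
freeman_centralization <- function(scores, tmaxdev) {
  sum(max(scores) - scores) / tmaxdev
}

# Out-degree of each vertex in a directed adjacency matrix (loops ignored).
out_degree <- function(adj) rowSums(adj) - diag(adj)

# Outgoing 2-stars per vertex, the same index as o2scent above.
out_two_stars <- function(adj) choose(out_degree(adj), 2)

n <- 10

# Out-star: vertex 1 sends a tie to every other vertex.
star <- matrix(0, n, n)
star[1, -1] <- 1

# Directed ring: every vertex has out-degree exactly 1.
ring <- matrix(0, n, n)
ring[cbind(1:n, c(2:n, 1))] <- 1

# Theoretical maximum deviations for loopless digraphs; both are
# attained by the out-star.
deg_maxdev <- (n - 1)^2
o2s_maxdev <- (n - 1) * choose(n - 1, 2)

star_deg <- freeman_centralization(out_degree(star), deg_maxdev)
star_o2s <- freeman_centralization(out_two_stars(star), o2s_maxdev)
ring_deg <- freeman_centralization(out_degree(ring), deg_maxdev)
c(star_deg, star_o2s, ring_deg)  # 1 1 0
```

Because the out-star attains both maxima, it scores exactly 1 on either index, while the ring, in which every vertex has the same out-degree, scores 0.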

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\big(d(j, k) \le i\big)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
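To make the definition concrete, the base-R sketch below computes all geodesic distances by breadth-first search and then evaluates $s_i$ directly from the formula above; it also derives strong components from mutual reachability. The helper names (geodesic_distances, structure_stats, strong_membership) are hypothetical and are not the sna functions geodist, structure.statistics or component.dist; the input is assumed to be a square 0/1 matrix with adj[i, j] marking an edge from i to j.

```r
# Hypothetical helpers for illustration; not part of sna.

# All-pairs geodesic distances by breadth-first search from each vertex.
# Unreachable pairs stay at Inf.
geodesic_distances <- function(adj) {
  n <- nrow(adj)
  d <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    d[s, s] <- 0
    frontier <- s
    while (length(frontier) > 0) {
      # unreached vertices adjacent from any vertex in the current frontier
      nbrs <- colSums(adj[frontier, , drop = FALSE] != 0) > 0
      nxt <- which(nbrs & is.infinite(d[s, ]))
      d[s, nxt] <- d[s, frontier[1]] + 1
      frontier <- nxt
    }
  }
  d
}

# Structure statistics: s_i = N^-2 * #{(j, k) : d(j, k) <= i}, j = k included.
structure_stats <- function(adj) {
  n <- nrow(adj)
  d <- geodesic_distances(adj)
  s <- vapply(0:(n - 1), function(i) sum(d <= i) / n^2, numeric(1))
  names(s) <- 0:(n - 1)
  s
}

# Strong components: j and k share a component iff each can reach the other.
strong_membership <- function(adj) {
  d <- geodesic_distances(adj)
  mutual <- is.finite(d) & is.finite(t(d))
  first <- apply(mutual, 1, function(r) min(which(r)))
  match(first, unique(first))  # relabel components as 1, 2, ...
}

# Undirected path 1 - 2 - 3 - 4, stored as a symmetric matrix.
path4 <- matrix(0, 4, 4)
path4[cbind(1:3, 2:4)] <- 1
path4 <- path4 + t(path4)
structure_stats(path4)  # 0.250 0.625 0.875 1.000

# Directed path 1 -> 2 -> 3: no vertex can reach an earlier one.
dpath3 <- matrix(0, 3, 3)
dpath3[cbind(1:2, 2:3)] <- 1
strong_membership(dpath3)  # 1 2 3
```

Since every vertex lies at distance 0 from itself, $s_0 = 1/N$ under this definition, consistent with the value of 0.10 reported by structure.statistics for a 10-vertex graph in the example later in this section.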

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references therein), a connection which also motivates the use of such measures. The most fundamental subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbb{E}\,s(H) = s(G)$, $\mathbb{E}\,s'(H) = s'(G)$ for some $s, s'$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s'$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., the difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
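For reference, the dyad census is easy to compute from an adjacency matrix. The sketch below is a hypothetical base-R helper (dyad_census, not sna's dyad.census) that classifies each unordered pair of distinct vertices exactly once and ignores loops.

```r
# Hypothetical helper for illustration; not part of sna.
# Dyad census of a directed adjacency matrix: counts of mutual, asymmetric
# and null dyads among the n(n-1)/2 unordered vertex pairs (loops ignored).
dyad_census <- function(adj) {
  a <- adj != 0
  up <- upper.tri(a)  # each unordered pair {i, j} with i < j, once
  c(Mut  = sum((a & t(a))[up]),
    Asym = sum(xor(a, t(a))[up]),
    Null = sum((!a & !t(a))[up]))
}

# 1 <-> 2 is mutual, 2 -> 3 is asymmetric, {1, 3} is null.
g <- matrix(0, 3, 3)
g[1, 2] <- g[2, 1] <- g[2, 3] <- 1
dyad_census(g)  # Mut 1, Asym 1, Null 1
```

The three counts always sum to $n(n-1)/2$ (45 for the 10-vertex graphs used in the example below), which makes a convenient sanity check.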

                                                  Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50
R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities between two graphs as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:  0
        Mean: -0.001288889
        3rdQ: 0.06666667
        Max:  0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ: 0.1555556
        Med:  0.2222222
        Mean: 0.2215333
        3rdQ: 0.2888889
        Max:  0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.
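The mechanics of a CUG test can be sketched in a few lines of base R. This hypothetical version conditions on order and edge count (so the reference distribution is uniform over loopless digraphs with the observed number of edges) and uses the number of mutual dyads as the statistic purely for illustration; it is not the implementation of cugtest, and runif_digraph_m, mutual_dyads and cug_edges are our own names.

```r
# Hypothetical helpers for illustration; not part of sna.

# Draw a loopless digraph uniformly from all digraphs on n vertices with
# exactly m edges.
runif_digraph_m <- function(n, m) {
  a <- matrix(0, n, n)
  cells <- which(row(a) != col(a))  # off-diagonal cells = possible edges
  a[cells[sample.int(length(cells), m)]] <- 1
  a
}

# Example statistic: number of mutual dyads (assumes no loops).
mutual_dyads <- function(a) sum(a != 0 & t(a) != 0) / 2

# Monte Carlo CUG p-values conditioning on order and edge count:
# estimates of Pr(t(H) >= t(G)) and Pr(t(H) <= t(G)).
cug_edges <- function(adj, stat, reps = 1000) {
  n <- nrow(adj)
  m <- sum(adj[row(adj) != col(adj)] != 0)
  obs <- stat(adj)
  sims <- replicate(reps, stat(runif_digraph_m(n, m)))
  c(upper = mean(sims >= obs), lower = mean(sims <= obs))
}

set.seed(42)
# Three mutual dyads among six vertices (six edges in total).
g <- matrix(0, 6, 6)
g[cbind(c(1, 2, 3, 4, 5, 6), c(2, 1, 4, 3, 6, 5))] <- 1
cug_edges(g, mutual_dyads)
```

Because every simulated value satisfies at least one of the two inequalities, the upper and lower p-values always sum to at least 1, as in the summaries shown above (0.299 + 0.708 and 0.967 + 0.039).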

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
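A rough base-R sketch of this workflow is shown below: Euclidean distances between vertices' concatenated rows and columns, followed by hierarchical clustering with hclust. The helper se_distances is hypothetical rather than sna's sedist, and unlike a careful treatment it does not drop the entries for the pair being compared, so two vertices tied to each other are never at distance 0 under it.

```r
# Hypothetical helper for illustration; not part of sna.
# Euclidean structural-equivalence distances: compare each vertex's row
# (out-ties) and column (in-ties) of the adjacency matrix.
se_distances <- function(adj) {
  profiles <- cbind(adj, t(adj))  # row i = out-ties of i, then in-ties of i
  as.matrix(dist(profiles, method = "euclidean"))
}

# Vertex 1 sends to 2 and 3; vertices 2 and 3 both send to 4.
g <- matrix(0, 4, 4)
g[1, c(2, 3)] <- 1
g[c(2, 3), 4] <- 1

d <- se_distances(g)
d[2, 3]  # 0: vertices 2 and 3 are structurally equivalent

# Hierarchical clustering of the distances, cut below the first nonzero height.
hc <- hclust(as.dist(d))
cutree(hc, h = 0.5)  # vertices 2 and 3 share a cluster
```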


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $(i, j)$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class, and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated block of the adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.
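The density block type can be illustrated with a short base-R sketch: given a vector of class memberships, average the adjacency cells within each block while neglecting the diagonal. The function block_density is hypothetical and covers only this one block type; it is not sna's blockmodel.

```r
# Hypothetical helper for illustration; not part of sna.
# Density block image: mean of the adjacency cells within each (i, j) block
# of vertex classes, neglecting the diagonal (loops).
block_density <- function(adj, classes) {
  k <- sort(unique(classes))
  a <- adj
  diag(a) <- NA
  dens <- matrix(NA_real_, length(k), length(k), dimnames = list(k, k))
  for (i in seq_along(k)) {
    for (j in seq_along(k)) {
      cells <- a[classes == k[i], classes == k[j]]
      dens[i, j] <- mean(cells, na.rm = TRUE)  # NaN for a lone vertex's own block
    }
  }
  dens
}

# Classes {1, 2} and {3, 4}: class 1 sends to every member of class 2,
# and has one of its two possible internal ties (1 -> 2).
g <- matrix(0, 4, 4)
g[c(1, 2), c(3, 4)] <- 1
g[1, 2] <- 1
block_density(g, c(1, 1, 2, 2))
```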

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
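Density expansion can be sketched as follows, assuming a density block image (a square matrix of interclass densities) and a vector giving how many vertices to create in each class. The helper expand_density_image is hypothetical, not blockmodel.expand; it simply treats each block density as the edge probability for every ordered pair of vertices drawn from the corresponding classes.

```r
# Hypothetical helper for illustration; not part of sna.
# Expand a density block image into an inhomogeneous Bernoulli graph.
expand_density_image <- function(dens, sizes) {
  classes <- rep(seq_along(sizes), times = sizes)  # class of each new vertex
  p <- dens[classes, classes]                      # vertex-level edge probabilities
  n <- length(classes)
  g <- matrix(rbinom(n * n, 1, p), n, n)
  diag(g) <- 0                                     # no loops
  g
}

set.seed(1)
# Two classes of sizes 2 and 3: dense within class 1, sparse between classes.
dens <- matrix(c(0.9, 0.1, 0.2, 0.5), 2, 2)
expand_density_image(dens, c(2, 3))
```

With densities of exactly 0 and 1 the draw is deterministic, which makes the block structure easy to check; with intermediate densities, repeated calls yield a Monte Carlo sample from the inhomogeneous Bernoulli model.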

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
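The composition rule amounts to a Boolean matrix product, as the following base-R sketch shows. compose_graphs is a hypothetical stand-in written only to make the definition concrete, not the %c% operator itself; the second example reproduces the point about loops.

```r
# Hypothetical helper for illustration; not part of sna.
# Composition G'' of G and G' on the same vertex set: (v, v') is an edge iff
# some v'' has (v, v'') in G and (v'', v') in G' (Boolean matrix product).
compose_graphs <- function(a, b) ((a != 0) %*% (b != 0) > 0) * 1

a <- matrix(0, 3, 3); a[1, 2] <- 1  # G:  1 -> 2
b <- matrix(0, 3, 3); b[2, 3] <- 1  # G': 2 -> 3
compose_graphs(a, b)[1, 3]          # 1: the path 1 -> 2 -> 3

# Loops can appear even though neither input has any.
b2 <- matrix(0, 3, 3); b2[2, 1] <- 1  # G': 2 -> 1
diag(compose_graphs(a, b2))           # 1 0 0
```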

                                                  Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\operatorname{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \qquad (3)$$


where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\operatorname{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \operatorname{cov}(G, H) / \sqrt{\operatorname{cov}(G, G)\,\operatorname{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained using gcor and gcov, respectively; Hamming distances for graph sets can be similarly obtained using hdist.
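Equation (3) and the associated correlation and Hamming distance can be sketched directly in base R. The helpers below (edge_vars, graph_cov, graph_cor, hamming) are hypothetical rather than sna's gcov, gcor and hdist; they follow equation (3) as written, and the normalizing constant $|V|(|V|-1)$ cancels in the correlation.

```r
# Hypothetical helpers for illustration; not part of sna.

# Edge variables of a digraph: all off-diagonal cells (loops excluded).
edge_vars <- function(a) a[row(a) != col(a)]

# Graph covariance as in equation (3): centered cross-products averaged
# over the |V|(|V| - 1) ordered pairs.
graph_cov <- function(g, h) {
  x <- edge_vars(g)
  y <- edge_vars(h)
  sum((x - mean(x)) * (y - mean(y))) / length(x)
}

# Graph correlation; undefined (NaN) if either graph is empty or complete.
graph_cor <- function(g, h) graph_cov(g, h) / sqrt(graph_cov(g, g) * graph_cov(h, h))

# Hamming distance: number of edge changes taking g into h.
hamming <- function(g, h) sum(edge_vars(g) != edge_vars(h))

g <- matrix(0, 4, 4)
g[cbind(1:3, 2:4)] <- 1   # directed path 1 -> 2 -> 3 -> 4
comp <- 1 - g
diag(comp) <- 0           # complement of g

graph_cor(g, g)     # 1
graph_cor(g, comp)  # -1
hamming(g, comp)    # 12: every one of the 4 * 3 edge variables differs
```

A graph and its complement are perfectly negatively correlated and differ on every edge variable, which gives two easy checks.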

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching of arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
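For very small graphs, a structural correlation can be computed exactly by trying every relabeling. The base-R sketch below does this under the assumption that the correlation is taken over off-diagonal cells; permutations, labeled_cor and structural_cor are hypothetical names, and this is not how gscor is implemented. Its $n!$ cost is precisely why heuristic optimization is needed beyond a handful of vertices.

```r
# Hypothetical helpers for illustration; not part of sna.

# All permutations of a vector (feasible only for very small n).
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (rest in permutations(v[-i])) out[[length(out) + 1]] <- c(v[i], rest)
  }
  out
}

# Labeled graph correlation over off-diagonal edge variables.
labeled_cor <- function(g, h) cor(g[row(g) != col(g)], h[row(h) != col(h)])

# Structural correlation by exhaustive search: the largest correlation
# between g and any relabeling h[p, p] of h.
structural_cor <- function(g, h) {
  best <- -Inf
  for (p in permutations(seq_len(nrow(h)))) best <- max(best, labeled_cor(g, h[p, p]))
  best
}

g <- matrix(0, 4, 4)
g[cbind(1:3, 2:4)] <- 1  # directed path 1 -> 2 -> 3 -> 4
q <- c(3, 1, 4, 2)
h <- g[q, q]             # the same graph under new labels

labeled_cor(g, h)     # -1/3: the labeled edge sets do not overlap
structural_cor(g, h)  # 1 (up to rounding): some relabeling recovers g
```

Here the labeled correlation between the two copies is negative even though they are isomorphic, while the maximum over relabelings recovers the perfect match.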

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
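As an illustration of the point about network principal component analysis, the base-R sketch below builds the correlation matrix among a small set of random graphs on a common vertex set and applies eigen to it. The helper graph_cor_matrix and the simulated graphs are hypothetical; the correlation is taken over off-diagonal edge variables.

```r
# Hypothetical helper and simulated data for illustration; not part of sna.
# Correlation matrix among graphs on a common vertex set, treating each
# graph's off-diagonal edge variables as one variable.
graph_cor_matrix <- function(graphs) {
  X <- sapply(graphs, function(a) a[row(a) != col(a)])
  cor(X)
}

set.seed(7)
n <- 8
graphs <- replicate(5, {
  a <- matrix(rbinom(n * n, 1, 0.4), n, n)
  diag(a) <- 0
  a
}, simplify = FALSE)

rho <- graph_cor_matrix(graphs)
pca <- eigen(rho, symmetric = TRUE)
pca$values                    # variance carried by each network component
pca$values / sum(pca$values)  # proportion of variance explained
pca$vectors[, 1]              # loadings of the five graphs on component 1
```

Since a correlation matrix has unit diagonal, the eigenvalues sum to the number of graphs, so the ratio of each eigenvalue to their sum is that component's share of the total variance.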

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available where conditioning on the entire observed structure is inappropriate.
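The core of a QAP test for network regression can be sketched in base R: vectorize the off-diagonal cells, fit OLS with lm, and refit after jointly permuting the rows and columns of the dependent graph. This Y-permutation scheme is only one of several QAP variants and is not necessarily what netlm does by default; qap_ols and its helpers are hypothetical names.

```r
# Hypothetical helpers for illustration; not part of sna.

# Off-diagonal cells of an adjacency matrix, in a fixed order.
vec_offdiag <- function(a) a[row(a) != col(a)]

# OLS network regression with a simple QAP (Y-permutation) test.
qap_ols <- function(y, x_list, reps = 500) {
  X <- sapply(x_list, vec_offdiag)  # one column per predictor graph
  fit <- function(yy) unname(coef(lm(vec_offdiag(yy) ~ X)))
  obs <- fit(y)
  n <- nrow(y)
  sims <- replicate(reps, {
    p <- sample.int(n)
    fit(y[p, p])                    # relabel rows and columns together
  })
  # two-sided p-value: share of permuted |b| at least as large as observed |b|
  list(coef = obs, p = rowMeans(abs(sims) >= abs(obs)))
}

set.seed(3)
n <- 20
rand_graph <- function() { a <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(a) <- 0; a }
x <- replicate(4, rand_graph(), simplify = FALSE)
y <- x[[1]] + 4 * x[[2]] + 2 * x[[3]]  # x[[4]] is irrelevant
res <- qap_ols(y, x)
round(res$coef, 6)  # intercept, x1..x4: approximately 0 1 4 2 0
res$p
```

Relabeling rows and columns together preserves the dependent graph's structure while breaking its alignment with the predictors, which is what distinguishes QAP from simply shuffling the vectorized cells.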


                                                  Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

                                                  Rgt x lt- rgraph(20 4)

                                                  Rgt yl lt- x[1] + 4 x[2] + 2 x[3]

                                                  Rgt yp lt- apply(yl c(1 2) function(a)1 (1 + exp(-a)))

                                                  Rgt y lt- rgraph(20 tprob = yp)

                                                  Rgt nl lt- netlogit(y x)

                                                  Rgt summary(nl)

                                                  Network Logit Model

                                                  Coefficients

                                                  Journal of Statistical Software 37

                                                  Estimate Exp(b) Pr(lt=b) Pr(gt=b) Pr(gt=|b|)(intercept) 03077180 13603173 0680 0320 0503x1 09411361 25628914 0985 0015 0019x2 41473292 632648084 1000 0000 0000x3 18630911 64436238 1000 0000 0000x4 -01757242 08388493 0318 0682 0642

                                                  Goodness of Fit Statistics

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
     352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
     (Dn-Dr)/(Dn-Dr+dfn): 0.481324
     (Dn-Dr)/Dn: 0.6694004

                                                  Contingency Table (predicted (rows) x actual (cols))

     0    1
0    0    0
1   39  341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

                                                  Test Diagnostics

Null Hypothesis: qap
Replications: 1000
Distribution Summary:

        (intercept)         x1         x2         x3         x4
Min       -1.253710  -1.160806  -1.270806  -1.295749  -1.252300
1stQ      -0.215404  -0.236393  -0.229377  -0.278976  -0.250322
Median     0.078514   0.022337  -0.001591  -0.020205   0.001053
Mean       0.093105   0.025854   0.004520  -0.017570  -0.002262
3rdQ       0.408121   0.269836   0.239821   0.236166   0.252251
Max        1.704128   1.408468   1.214650   1.100783   1.533500
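The goodness-of-fit block above can be checked by hand. The short Python sketch below (Python is used only for illustration; `logit_fit_summary` is a hypothetical helper, not an sna function) recomputes the chi-squared improvement, AIC, BIC, and the two pseudo-R^2 measures from the printed deviances. It assumes k = 5 estimated coefficients and uses the 380 modeled dyads (20 x 19 ordered pairs) both as the sample size in the BIC penalty and as the null degrees of freedom. It reproduces the printed figures up to rounding; we have not checked it against netlogit's own source.

```python
import math


def logit_fit_summary(null_dev, resid_dev, k, n_obs, null_df):
    """Recompute deviance-based fit statistics for a logit model.

    null_dev, resid_dev: null (Dn) and residual (Dr) deviances
    k: number of estimated coefficients
    n_obs: number of modeled dyads (used in the BIC penalty)
    null_df: degrees of freedom of the null deviance (dfn)
    """
    chi_sq = null_dev - resid_dev              # fit improvement, on k df
    aic = resid_dev + 2 * k
    bic = resid_dev + k * math.log(n_obs)
    pseudo_r2_a = chi_sq / (chi_sq + null_df)  # (Dn-Dr)/(Dn-Dr+dfn)
    pseudo_r2_b = chi_sq / null_dev            # (Dn-Dr)/Dn
    return {"chi_sq": chi_sq, "aic": aic, "bic": bic,
            "pseudo_r2_a": pseudo_r2_a, "pseudo_r2_b": pseudo_r2_b}


# Values copied from the netlogit output above (20 vertices -> 380 dyads)
stats = logit_fit_summary(526.7919, 174.1572, k=5, n_obs=380, null_df=380)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```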

It may be noted that in this case the model diagnostics indicate that the model is not very effective at predicting the absence of ties (indeed, it predicts no 0s at all). This is largely a consequence of the high density of the dependent graph (approximately 0.90) and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values (0, 1, 4, 2, and 0 for the intercept and x1 through x4) and that the QAP test correctly identifies the irrelevant predictors.
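The QAP tests reported above compare an observed statistic with its distribution under random relabelings of the vertices. As a rough illustration of that idea only, here is a minimal Python sketch of the classic QAP test for the correlation between two graphs. It is not a reimplementation of netlogit's test, which works with regression coefficients and, depending on the null hypothesis chosen, may permute differently. The function names and the (hits + 1) / (reps + 1) p-value convention are our own choices.

```python
import random
from statistics import mean


def off_diagonal(m):
    """Flatten an adjacency matrix, skipping the diagonal (no self-ties)."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def relabel(m, perm):
    """Apply the same vertex permutation to rows and columns."""
    n = len(m)
    return [[m[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def qap_correlation_test(g1, g2, reps=1000, seed=None):
    """One-sided QAP test of graph correlation.

    Returns the observed correlation and the share of relabelings of g2
    whose correlation with g1 is at least as large, counting the observed
    labeling itself: (hits + 1) / (reps + 1).
    """
    rng = random.Random(seed)
    x = off_diagonal(g1)
    observed = pearson(x, off_diagonal(g2))
    n = len(g2)
    hits = 0
    for _ in range(reps):
        perm = list(range(n))
        rng.shuffle(perm)
        if pearson(x, off_diagonal(relabel(g2, perm))) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (reps + 1)


rng = random.Random(42)
n = 15
g1 = [[0 if i == j else int(rng.random() < 0.3) for j in range(n)] for i in range(n)]
# g2 keeps each tie of g1 with probability 0.8 and adds a few random ones
g2 = [[0 if i == j else int((g1[i][j] and rng.random() < 0.8) or rng.random() < 0.1)
       for j in range(n)] for i in range(n)]
r, p = qap_correlation_test(g1, g2, reps=500, seed=1)
print(f"observed correlation {r:.3f}, QAP p-value {p:.3f}")
```

Applying the same relabeling to rows and columns keeps each graph's internal structure intact under the null; only the alignment between the two graphs is randomized.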

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data-analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired); estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the model of Butts (2003) can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of the false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992), in the simplified form of Gelman et al. (1995), can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
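To make the link between the two reporting models concrete, the Python sketch below maps a Batchelder-Romney informant with competence c and guessing bias g onto false positive and false negative rates, e+ = (1 - c) g and e- = (1 - c)(1 - g), so that e+ + e- = 1 - c, which is below 1 whenever c > 0. The inverse map c = 1 - (e+ + e-), g = e+ / (e+ + e-) exists only when that sum lies in (0, 1). The sketch then computes the posterior probability of a single edge from several independent reports. It is a simplification under stated assumptions: error rates are treated as known (bbnam instead samples them), edges are independent a priori, and all function names are hypothetical rather than part of sna.

```python
from math import prod


def rb_to_error_rates(competence, bias):
    """Map a Batchelder-Romney informant (competence c, guessing bias g)
    onto Butts-style error rates (e_plus, e_minus).

    With probability c the informant knows, and reports, the true edge state;
    otherwise they guess "tie" with probability g.
    """
    e_plus = (1 - competence) * bias           # Pr(report 1 | no edge)
    e_minus = (1 - competence) * (1 - bias)    # Pr(report 0 | edge)
    return e_plus, e_minus


def error_rates_to_rb(e_plus, e_minus):
    """Inverse map; only defined when 0 < e_plus + e_minus < 1."""
    total = e_plus + e_minus
    if not 0 < total < 1:
        raise ValueError("requires 0 < e_plus + e_minus < 1")
    return 1 - total, e_plus / total


def edge_posterior(reports, e_plus, e_minus, prior=0.5):
    """Pr(edge | reports) for conditionally independent informants with known
    error rates and a Bernoulli(prior) edge prior."""
    like_edge = prod((1 - em) if r else em for r, em in zip(reports, e_minus))
    like_none = prod(ep if r else (1 - ep) for r, ep in zip(reports, e_plus))
    num = prior * like_edge
    return num / (num + (1 - prior) * like_none)


# Three hypothetical informants given as (competence, bias); the third misses the tie
rates = [rb_to_error_rates(c, g) for c, g in [(0.6, 0.25), (0.7, 0.3), (0.5, 0.2)]]
e_plus = [ep for ep, _ in rates]
e_minus = [em for _, em in rates]
print(rates)
print(edge_posterior([1, 1, 0], e_plus, e_minus))
```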

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method-of-moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
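The conditional tie probability above is easy to evaluate once the bias events and their counts are fixed: the tie fails to appear only if the baseline trial and every bias trial fail. In this Python sketch the function name and the two bias types ("reciprocity" and "shared partner") are hypothetical choices made for illustration, and the counts $s_k(i,j,T)$ are supplied by hand rather than derived from a simulated trace.

```python
from math import prod


def biased_net_tie_prob(p_base, bias_probs, bias_counts):
    """Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k)) ** s_k(i, j, T).

    p_base: baseline probability Pr(B_e)
    bias_probs: Pr(B_k) for each bias event type k
    bias_counts: s_k(i, j, T), how many opportunities for bias event k the
                 current trace T offers the (i, j) dyad
    """
    if len(bias_probs) != len(bias_counts):
        raise ValueError("need one count per bias event type")
    # Probability that the baseline trial and every bias trial all fail
    p_miss = (1 - p_base) * prod((1 - pk) ** sk for pk, sk in zip(bias_probs, bias_counts))
    return 1 - p_miss


# Hypothetical parameters: baseline 0.05, a "reciprocity" bias of 0.3 and a
# "shared partner" bias of 0.2; the dyad has 1 and 2 opportunities respectively.
print(biased_net_tie_prob(0.05, [0.3, 0.2], [1, 2]))
```

With no bias opportunities (all counts zero) the expression reduces to the baseline probability, which is a quick sanity check on any implementation.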

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990; see Cliff and Ord 1973 and Anselin 1988 for the equivalent class of spatial autocorrelation models) constitute one important family of processes that are often used for this purpose. These models are of the form

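$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$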

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
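For readers unfamiliar with the two indices mentioned above, the sketch below computes Moran's I and Geary's C in their usual textbook form for a vertex attribute vector $y$ and a weight matrix $w$: $I = \frac{n}{S_0} \sum_{ij} w_{ij} z_i z_j / \sum_i z_i^2$ and $C = \frac{n-1}{2 S_0} \sum_{ij} w_{ij} (y_i - y_j)^2 / \sum_i z_i^2$, where $z_i = y_i - \bar{y}$ and $S_0 = \sum_{ij} w_{ij}$. It is written in Python with hypothetical function names; nacf's own normalization and neighborhood conventions may differ, so treat it as an illustration of the indices rather than of nacf's output.

```python
def _centered(y):
    ybar = sum(y) / len(y)
    return [v - ybar for v in y]


def morans_i(y, w):
    """Moran's I; w[i][j] is the weight linking vertex i to vertex j."""
    n = len(y)
    z = _centered(y)
    s0 = sum(map(sum, w))
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(v * v for v in z)


def gearys_c(y, w):
    """Geary's C; values below 1 suggest positive autocorrelation."""
    n = len(y)
    z = _centered(y)
    s0 = sum(map(sum, w))
    num = sum(w[i][j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(n))
    return ((n - 1) / (2 * s0)) * num / sum(v * v for v in z)


# Two disconnected pairs whose members share values: strong positive autocorrelation
w = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
y = [1.0, 1.0, -1.0, -1.0]
print(morans_i(y, w), gearys_c(y, w))  # 1.0 0.0
```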


                                                  Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038 and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[,1] <- 2
R> pem[,2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[,1] <- 2
R> pep[,2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

                                                  Multiple Error Probability Model

                                                  Marginal Posterior Network Distribution

                                                  a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

                                                  Journal of Statistical Software 41

                                                  a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

                                                  a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

                                                  Marginal Posterior Global Error Distribution

              e^-        e^+
Min     0.1443951  0.0004238
1stQ    0.3126975  0.0167584
Median  0.3678306  0.0294646
Mean    0.3783663  0.0493688
3rdQ    0.4423027  0.0574099
Max     0.6909116  0.2262239

                                                  Marginal Posterior Error Distribution (by observer)

                                                  Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

                                                  Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

                                                  MCMC Diagnostics

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20   Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
     Max: 1.003116
     Med: 0.9992194
     IQR: 0.0004545115
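The potential scale reduction reported above compares between-chain and within-chain variability across the five chains. The Python sketch below computes sqrt(R-hat) for one scalar quantity in a simplified Gelman-Rubin form without the degrees-of-freedom correction. It is our illustration (the function name is hypothetical), and we have not verified that it matches potscalered.mcmc line for line. Because the correction is omitted, values slightly below 1 are possible for well-mixed chains, as in the median of 0.999 shown above.

```python
import random
from statistics import mean, variance


def potential_scale_reduction(chains):
    """sqrt(R-hat) for one scalar quantity from m parallel chains of length n.

    Simplified Gelman-Rubin form (no degrees-of-freedom correction):
      B = n/(m-1) * sum_j (mean_j - grand_mean)^2   (between-chain)
      W = mean_j s_j^2                               (within-chain)
      V = (n-1)/n * W + B/n,   R-hat = V / W
    """
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    means = [mean(c) for c in chains]
    grand = mean(means)
    b = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)
    w = mean([variance(c) for c in chains])
    v_hat = (n - 1) / n * w + b / n
    return (v_hat / w) ** 0.5


# 5 chains of 20 draws, mirroring the chain layout in the bbnam summary
rng = random.Random(7)
mixed = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(5)]
stuck = [[rng.gauss(3 * k, 1) for _ in range(20)] for k in range(5)]
print(f"well mixed: {potential_scale_reduction(mixed):.3f}")
print(f"chains stuck in different regions: {potential_scale_reduction(stuck):.3f}")
```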

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high posterior probability, or posterior graph distributions that are strongly multimodal, can indicate either excessively "perverse" priors or extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data themselves.
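Why the boundary sits at em + ep = 1 can be seen from a single report. The likelihood ratio of a reported tie, Pr(report = 1 | tie) / Pr(report = 1 | no tie) = (1 - e-) / e+, exceeds 1 exactly when e+ + e- < 1. Once the sum passes 1, the ratio drops below 1 and a reported tie lowers the posterior probability of that tie, which is what makes such reports negatively informative. In this Python sketch the function names are hypothetical; the first rate pair uses the mean error rates of the simulated informants above, and the second is an arbitrary perverse pair.

```python
def report_likelihood_ratio(e_plus, e_minus):
    """Pr(report = 1 | tie) / Pr(report = 1 | no tie) for one informant."""
    return (1 - e_minus) / e_plus


def posterior_after_one_report(prior, e_plus, e_minus):
    """Pr(tie | one reported tie) under a Bernoulli(prior) edge prior."""
    num = prior * (1 - e_minus)
    return num / (num + (1 - prior) * e_plus)


for e_plus, e_minus in [(0.038, 0.375), (0.6, 0.7)]:
    lr = report_likelihood_ratio(e_plus, e_minus)
    post = posterior_after_one_report(0.5, e_plus, e_minus)
    print(f"e+={e_plus}, e-={e_minus}, sum={e_plus + e_minus:.3f}, "
          f"LR={lr:.2f}, Pr(tie | report)={post:.3f}")
```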

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
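The contrast between the LAS rules and the central graph can be reproduced outside R. The Python sketch below implements the three data-analytic estimators in simplified form, under our own assumptions: reports are stored as reports[k][i][j] (informant k's report of the i -> j tie, mirroring dat[k,,] in the R example), the central graph keeps a tie when at least half the reports include it (the tie-breaking rule is our choice), and accuracy is averaged over all n x n cells, like mean(... == g) above. Function names are hypothetical, and the simulation only mimics the error-rate distributions used earlier, so its accuracies will not match the R output exactly.

```python
import random


def las(reports, rule):
    """Locally aggregated structure: for the (i, j) tie use only the reports of
    informants i and j. "intersection" needs both, "union" needs either."""
    combine = {"intersection": lambda a, b: a and b,
               "union": lambda a, b: a or b}[rule]
    n = len(reports)
    return [[int(i != j and bool(combine(reports[i][i][j], reports[j][i][j])))
             for j in range(n)] for i in range(n)]


def central_graph(reports):
    """Keep a tie if at least half of all informants report it."""
    m, n = len(reports), len(reports[0])
    return [[int(i != j and 2 * sum(r[i][j] for r in reports) >= m)
             for j in range(n)] for i in range(n)]


def accuracy(estimate, truth):
    """Share of all n x n cells (diagonal included) where estimate matches truth."""
    n = len(truth)
    return sum(estimate[i][j] == truth[i][j] for i in range(n) for j in range(n)) / (n * n)


# Simulated data loosely following the R example: 20 vertices, 20 informants
rng = random.Random(2008)
n = 20
truth = [[int(i != j and rng.random() < 0.5) for j in range(n)] for i in range(n)]
e_plus = [rng.betavariate(1, 25) for _ in range(n)]
e_minus = [rng.betavariate(15, 25) for _ in range(n)]
reports = [[[int(i != j and rng.random() < ((1 - e_minus[k]) if truth[i][j] else e_plus[k]))
             for j in range(n)] for i in range(n)] for k in range(n)]
for name, est in [("LAS intersection", las(reports, "intersection")),
                  ("LAS union", las(reports, "union")),
                  ("central graph", central_graph(reports))]:
    print(f"{name}: {accuracy(est, truth):.3f}")
```

Because the intersection rule needs both endpoints to report a tie, a present tie survives only with probability (1 - e-_i)(1 - e-_j), which is why high false negative rates hurt it most.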

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example that includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are drawn as green edges, while pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are drawn as red edges. Sample output for the above example is provided in Figure 6.
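One way to express how effects propagate through the AR structure is the reduced form of equation (4) with a single weight matrix, y = (I - theta W)^{-1} (X beta + epsilon). Entry [i][j] of (I - theta W)^{-1} is the total (direct plus indirect) change in y_i caused by a unit change in the j-th element of X beta + epsilon; under this reading, "i's influence on j" corresponds to entry [j][i]. The Python sketch below is an illustration of that matrix only, not a description of exactly what lnam's plot method computes; it ignores the MA term, and the function names are hypothetical. It uses a plain Gauss-Jordan inverse rather than a power series, because for weight matrices like the dense rgraph(50) graphs above, r1 * w1 has spectral radius well above 1 and a series expansion would diverge.

```python
def invert(m):
    """Gauss-Jordan inverse with partial pivoting (fine for small dense matrices)."""
    n = len(m)
    # Augment m with the identity matrix: [m | I]
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ar_influence_matrix(w, rho):
    """(I - rho * W)^{-1} for the AR model y = rho * W y + X beta + epsilon."""
    n = len(w)
    return invert([[float(i == j) - rho * w[i][j] for j in range(n)] for i in range(n)])


# Directed 3-cycle; w[i][j] = 1 means y_i responds directly to y_j
w = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
for row in ar_influence_matrix(w, 0.5):
    print([round(x, 3) for x in row])
```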

                                                  3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines that covers many of the methods currently seeing wide use within the field. It is hoped that the inclusion of such tools, together with the other packages of the statnet ensemble, within a freely available and widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                                  Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.]

                                                  References

                                                  Anselin L (1988) Spatial Econometrics Methods and Models Kluwer Norwell MA

                                                  Banks D Carley KM (1994) ldquoMetric Inference for Social Networksrdquo Journal of Classification11(1) 121ndash149

                                                  Batagelj V Mrvar A (2007) Pajek Package for Large Network Analysis University ofLjubljana Slovenia URL httpvladofmfuni-ljsipubnetworkspajek

                                                  Batchelder WH Romney AK (1988) ldquoTest Theory Without an Answer Keyrdquo Psychometrika53(1) 71ndash92

                                                  Bonacich P (1987) ldquoPower and Centrality A Family of Measuresrdquo American Journal ofSociology 92 1170ndash1182

                                                  Journal of Statistical Software 47

                                                  Boorman SA White HC (1976) ldquoSocial Structure from Multiple Networks II Role Struc-turesrdquo American Journal of Sociology 81 1384ndash1446

                                                  Borgatti SP (2007) NetDraw Network Visualization Software Version 2067 URL httpwwwanalytictechcom

                                                  Borgatti SP Carley K Krackhardt D (2006) ldquoRobustness of Centrality Measures UnderConditions of Imperfect Datardquo Social Networks 28 124ndash136

                                                  Borgatti SP Everett MG Freeman LC (1999) UCINET 60 for Windows Software forSocial Network Analysis Analytic Technologies Natick URL httpwwwanalytictechcom

                                                  Boyd JP (1969) ldquoThe Algebra of Group Kinshiprdquo Journal of Mathematical Psychology 6139ndash167

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.



                                                    [1] 0 21 106 254 582

R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

centralization’s usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R’s apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395

R> centralization(g, betweenness)
[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)  # theoretical maximum deviation, attained by an out-star
+     return((n-1) * choose(n-1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)  # number of out-two-stars centered on each vertex
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization “for free” when working with their own centrality routines, so long as those routines accept the required tmaxdev argument.
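The wrapper pattern described above can be sketched outside R. The following is a minimal plain-Python illustration, not sna's implementation: graphs are assumed to be 0/1 adjacency matrices stored as lists of lists, the diagonal is ignored, and the names `centralization`, `outdegree_index`, and `o2s_index` are hypothetical stand-ins for sna's centralization, degree, and the o2scent function above. The normalizing constants are the theoretical maxima attained by an out-star: $(n-1)^2$ for outdegree and $(n-1)\binom{n-1}{2}$ for the out-two-star score.

```python
from math import comb


def out_degrees(adj):
    """Out-degree of each vertex in a 0/1 adjacency matrix, ignoring the diagonal."""
    n = len(adj)
    return [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]


def outdegree_index(adj, tmaxdev=False):
    """Hypothetical outdegree index; with tmaxdev=True returns (n-1)^2."""
    n = len(adj)
    if tmaxdev:
        return (n - 1) ** 2
    return out_degrees(adj)


def o2s_index(adj, tmaxdev=False):
    """Hypothetical analogue of o2scent: C(outdegree, 2) per vertex.
    With tmaxdev=True returns (n-1) * C(n-1, 2); assumes n >= 3."""
    n = len(adj)
    if tmaxdev:
        return (n - 1) * comb(n - 1, 2)
    return [comb(d, 2) for d in out_degrees(adj)]


def centralization(adj, index):
    """Freeman centralization: summed deviations from the largest score,
    divided by the theoretical maximum the index reports via tmaxdev."""
    scores = index(adj)
    top = max(scores)
    return sum(top - s for s in scores) / index(adj, tmaxdev=True)


# Example: edges 0->1, 0->2, 1->2 on three vertices.
g = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
print(centralization(g, outdegree_index))  # 0.75
print(centralization(g, o2s_index))        # 1.0
```

Because `centralization` relies only on the two calling conventions (plain scores, and the maximum deviation under `tmaxdev=True`), any index written this way can be passed in unchanged, which is the point the text makes about user-supplied routines.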

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with the numbers of geodesics between pairs) by geodist.

An example of how these concepts may be combined is provided by Fararo and Sunshine’s (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The “structure statistics” of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\big(d(j, k) \le i\big)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
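The structure statistics are straightforward to compute once geodesic distances are known. The sketch below is a plain-Python illustration of the formula above, not sna's structure.statistics: it assumes an unweighted (di)graph given as a 0/1 adjacency matrix, obtains $d(j, k)$ by breadth-first search from each vertex, and treats unreachable pairs as infinitely distant, so they never satisfy $d(j, k) \le i$.

```python
from collections import deque

INF = float("inf")


def geodesic_distances(adj):
    """All-pairs geodesic distances by breadth-first search from each vertex.
    adj[u][v] == 1 means an edge u -> v; unreachable pairs stay at infinity."""
    n = len(adj)
    dist = [[INF] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[src][v] == INF:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def structure_statistics(adj):
    """s_i = N^-2 * #{(j, k) : d(j, k) <= i}, for i = 0, ..., N-1."""
    n = len(adj)
    dist = geodesic_distances(adj)
    return [sum(dist[j][k] <= i for j in range(n) for k in range(n)) / n ** 2
            for i in range(n)]


# Directed path 0 -> 1 -> 2.
path = [[0, 1, 0],
        [0, 0, 1],
        [0, 0, 0]]
print(structure_statistics(path))  # s_0 = 3/9, s_1 = 5/9, s_2 = 6/9
```

Since every vertex is at distance 0 from itself, $s_0 = 1/N$ for any graph; the series reaches 1 only if every vertex can reach every other, which the directed path above does not allow.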

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references therein), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one’s data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-hard problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbb{E}\,s(H) = s(G), \mathbb{E}\,s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., the difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
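The CUG logic can be illustrated with a small Monte Carlo sketch. This is plain Python rather than sna's cugtest, and it makes specific assumptions: graphs are loopless digraphs stored as 0/1 adjacency matrices, the test statistic is the mutual-dyad count from a dyad census, and the null distribution is uniform over all digraphs with the same order and the same number of edges (one of the conditioning choices mentioned above, not necessarily what cugtest does by default). The helper names are hypothetical.

```python
import random


def dyad_census(adj):
    """Counts of mutual, asymmetric, and null dyads (diagonal ignored)."""
    n = len(adj)
    census = {"Mut": 0, "Asym": 0, "Null": 0}
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j] and adj[j][i]:
                census["Mut"] += 1
            elif adj[i][j] or adj[j][i]:
                census["Asym"] += 1
            else:
                census["Null"] += 1
    return census


def uniform_digraph_given_edges(n, m, rng):
    """Uniform draw from the loopless n-vertex digraphs with exactly m edges."""
    cells = [(i, j) for i in range(n) for j in range(n) if i != j]
    adj = [[0] * n for _ in range(n)]
    for i, j in rng.sample(cells, m):
        adj[i][j] = 1
    return adj


def cug_test(adj, stat, reps=1000, seed=None):
    """Estimate (Pr[t(H) >= t(G)], Pr[t(H) <= t(G)]), H uniform given order and edge count."""
    rng = random.Random(seed)
    n = len(adj)
    m = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    observed = stat(adj)
    sims = [stat(uniform_digraph_given_edges(n, m, rng)) for _ in range(reps)]
    upper = sum(s >= observed for s in sims) / reps
    lower = sum(s <= observed for s in sims) / reps
    return upper, lower


def mutual(adj):
    return dyad_census(adj)["Mut"]


# Three mutual dyads among six vertices (six edges in total).
g = [[0] * 6 for _ in range(6)]
for i, j in [(0, 1), (2, 3), (4, 5)]:
    g[i][j] = g[j][i] = 1
print(dyad_census(g))  # {'Mut': 3, 'Asym': 0, 'Null': 12}
print(cug_test(g, mutual, reps=500, seed=1))
```

Because ties count toward both tails, the two estimated p-values can sum to more than 1. In this example three mutual dyads is the most that six edges allow, so the lower-tail estimate is exactly 1 and the upper-tail estimate is expected to be very small.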

                                                    Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model (all bias parameters set to zero), and then introduce in turn a reciprocity bias (pi) and a triad formation bias (sigma). As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))

R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1,,])
$membership
[1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
[1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocity between two graphs as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:   -0.3333333
                1stQ:  -0.06666667
                Med:   0
                Mean:  -0.001288889
                3rdQ:  0.06666667
                Max:   0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:   -0.06666667
                1stQ:  0.1555556
                Med:   0.2222222
                Mean:  0.2215333
                3rdQ:  0.2888889
                Max:   0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of “role” and “position” have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
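The definition of structural equivalence translates directly into a check on neighborhoods. The following plain-Python sketch uses hypothetical helper names and is not part of sna. It assumes a digraph given as a 0/1 adjacency matrix, compares in- and out-neighborhoods after removing each other's vertex, and groups vertices into positions. Grouping by comparison with a single representative per class relies on structural equivalence being an equivalence relation, as stated above.

```python
def neighborhoods(adj, v):
    """Out- and in-neighborhoods of v, excluding v itself."""
    n = len(adj)
    out_nb = {j for j in range(n) if j != v and adj[v][j]}
    in_nb = {i for i in range(n) if i != v and adj[i][v]}
    return out_nb, in_nb


def structurally_equivalent(adj, v, w):
    """True iff N(v) minus {w} equals N(w) minus {v}, for both in- and out-neighborhoods."""
    out_v, in_v = neighborhoods(adj, v)
    out_w, in_w = neighborhoods(adj, w)
    return out_v - {w} == out_w - {v} and in_v - {w} == in_w - {v}


def positions(adj):
    """Partition the vertex set into structural equivalence classes."""
    classes = []
    for v in range(len(adj)):
        for cls in classes:
            if structurally_equivalent(adj, cls[0], v):
                cls.append(v)
                break
        else:  # no existing class matched: v starts a new position
            classes.append([v])
    return classes


# Out-star: vertex 0 sends ties to 1, 2, and 3.
star = [[0, 1, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
print(positions(star))  # [[0], [1, 2, 3]]
```

Removing the partner vertex before comparing is what lets two vertices joined by a mutual tie (and tied to nothing else) count as equivalent. Without that step, each would list the other as an alter and the neighborhoods would never match.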

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are “similar” in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
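To illustrate one of the distances mentioned above, the sketch below computes Hamming distances between vertices' rows and columns of the adjacency matrix. It is plain Python, not sna's sedist, and it assumes one particular convention: the cells for the pair itself ($k \in \{i, j\}$) are skipped, so that a distance of zero coincides with exact structural equivalence under the definition given earlier. Other conventions for those cells are possible and would give somewhat different values.

```python
def hamming_se_distance(adj):
    """Pairwise Hamming distances between vertices' rows and columns,
    skipping the cells that involve the pair itself (k in {i, j})."""
    n = len(adj)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = sum(abs(adj[i][k] - adj[j][k])    # outgoing ties differ
                       + abs(adj[k][i] - adj[k][j])  # incoming ties differ
                       for k in range(n) if k != i and k != j)
            d[i][j] = d[j][i] = diff
    return d


star = [[0, 1, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
print(hamming_se_distance(star))
# [[0, 2, 2, 2], [2, 0, 0, 0], [2, 0, 0, 0], [2, 0, 0, 0]]
```

The resulting symmetric matrix is the kind of input that conventional clustering or multidimensional scaling routines expect; the three leaves of the star are at distance 0 from one another, as exact equivalence requires.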


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R’s built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to “blocks” of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R’s cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (the mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.
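The density block type can be sketched as follows. This plain-Python function is a hypothetical illustration, not sna's blockmodel. It assumes a 0/1 adjacency matrix and a membership vector with class labels 0, ..., K-1, such as labels obtained by cutting an equivalence clustering. It neglects the diagonal and reports NaN for a within-class block containing a single vertex, since such a block has no off-diagonal cells.

```python
import math


def density_blockmodel(adj, classes):
    """Density image: block (a, b) is the mean of adj[i][j] over i in class a,
    j in class b, i != j. Classes are labelled 0, ..., K-1."""
    n = len(adj)
    k = max(classes) + 1
    totals = [[0] * k for _ in range(k)]
    counts = [[0] * k for _ in range(k)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # neglect the diagonal
            a, b = classes[i], classes[j]
            totals[a][b] += adj[i][j]
            counts[a][b] += 1
    return [[totals[a][b] / counts[a][b] if counts[a][b] else math.nan
             for b in range(k)]
            for a in range(k)]


# Out-star with the center in class 0 and the leaves in class 1.
star = [[0, 1, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
print(density_blockmodel(star, [0, 1, 1, 1]))  # [[nan, 1.0], [0.0, 0.0]]
```

Because the classes here are exact structural equivalence classes, every defined block is entirely 1s or entirely 0s, matching the observation in the text.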

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
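Density-based expansion can be sketched as follows, assuming the image is a matrix of interclass densities and each class is expanded to a chosen number of vertices (mirroring the class-size vector passed to blockmodel.expand in the example that follows). The helper expand_density is hypothetical and draws a single graph; calling it repeatedly yields a Monte Carlo sample.

```r
# Hypothetical helper (not sna's blockmodel.expand): one draw from the
# inhomogeneous Bernoulli graph implied by a density block image.
expand_density <- function(img, class_sizes) {
  membership <- rep(seq_along(class_sizes), times = class_sizes)
  P <- img[membership, membership, drop = FALSE]   # n x n edge probabilities
  diag(P) <- 0                                     # no self-ties
  n <- length(membership)
  matrix(rbinom(n * n, size = 1, prob = P), n, n)  # column-major, like P
}

set.seed(1)
# img[r, s] is the density of ties from class r to class s
img <- matrix(c(0.8, 0.6,
                0.1, 0.2), 2, 2, byrow = TRUE)
G <- expand_density(img, class_sizes = c(3, 3))   # order need not match the original
print(G)
```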

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
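The boolean-product characterization of composition translates directly into base R. This sketch is independent of the composition operator provided by sna: it simply thresholds the ordinary matrix product, whose (v, v') entry counts the two-paths from v to v'. The helper compose_graphs is hypothetical.

```r
# Composition via the boolean inner product: (v, v') is a tie of the
# composition iff some v'' has v -> v'' in G and v'' -> v' in G'.
compose_graphs <- function(A, B) {
  (A %*% B > 0) * 1    # count 2-paths, then threshold to 0/1
}

A <- matrix(c(0, 1, 0,
              0, 0, 0,
              0, 0, 0), 3, 3, byrow = TRUE)   # G:  1 -> 2
B <- matrix(c(0, 0, 0,
              1, 0, 1,
              0, 0, 0), 3, 3, byrow = TRUE)   # G': 2 -> 1, 2 -> 3
AB <- compose_graphs(A, B)
print(AB)   # contains the loop 1 -> 1 although neither G nor G' has loops
```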

                                                    Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V|-1)} \tag{3}$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, the sums run over ordered pairs $(i,j)$ of distinct vertices, and $\mu_X = \left(|V|\,(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
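Equation (3) and the associated Hamming distance can be computed directly from adjacency matrices. This base-R sketch assumes dichotomous digraphs on a common vertex set and treats the off-diagonal cells as the edge variables, matching the $|V|(|V|-1)$ normalization; offdiag, graph_cov, graph_cor, and hamming are hypothetical helpers, not sna's gcov, gcor, or hdist.

```r
# Edge variables are the off-diagonal cells, so each sum in equation (3)
# runs over |V|(|V| - 1) ordered pairs of distinct vertices.
offdiag <- function(A) A[row(A) != col(A)]

graph_cov <- function(A, B) {
  a <- offdiag(A)
  b <- offdiag(B)
  sum((a - mean(a)) * (b - mean(b))) / length(a)   # mean(a) is the graph mean
}

graph_cor <- function(A, B) {
  graph_cov(A, B) / sqrt(graph_cov(A, A) * graph_cov(B, B))
}

hamming <- function(A, B) sum(offdiag(A) != offdiag(B))  # edge changes G -> H

A <- matrix(c(0, 1, 0, 1,
              1, 0, 0, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), 4, 4, byrow = TRUE)
B <- A
B[1, 2] <- 0                 # remove one edge
hamming(A, B)                # one edge change separates the graphs
graph_cor(A, B)
```

Because the $1/(|V|(|V|-1))$ factor cancels in the ratio, graph_cor agrees with R's cor applied to the vectorized off-diagonal cells.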

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the “labeling” component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
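To make the idea of a structural correlation concrete, the following sketch computes it by brute force, as the maximum graph correlation between A and every relabeling of B. It assumes graphs of identical order with all relabelings permissible, and it enumerates all n! permutations, so it is only usable for very small graphs; it illustrates the quantity being sought rather than how gscor proceeds by default. The helpers all_perms and structural_cor are hypothetical.

```r
offdiag <- function(A) A[row(A) != col(A)]

# All permutations of 1..n as rows of a matrix (n! rows; tiny n only).
all_perms <- function(n) {
  if (n == 1) return(matrix(1, 1, 1))
  p <- all_perms(n - 1)
  out <- NULL
  for (pos in seq_len(n)) {
    # insert n at position pos of every permutation of 1..(n - 1)
    out <- rbind(out, t(apply(p, 1, function(r) append(r, n, after = pos - 1))))
  }
  out
}

# Maximum graph correlation over all relabelings of B's vertices.
structural_cor <- function(A, B) {
  perms <- all_perms(nrow(B))
  best <- -Inf
  for (k in seq_len(nrow(perms))) {
    p <- perms[k, ]
    best <- max(best, cor(offdiag(A), offdiag(B[p, p])))   # relabeled B
  }
  best
}

A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              0, 0, 0, 1,
              0, 0, 0, 0), 4, 4, byrow = TRUE)
B <- A[c(3, 1, 4, 2), c(3, 1, 4, 2)]   # an isomorphic copy with new labels
cor(offdiag(A), offdiag(B))            # labeled (ordinary) graph correlation
structural_cor(A, B)                   # 1 up to rounding: the graphs are isomorphic
```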

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to the graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
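For dichotomous graphs the central graph has a simple characterization: the total Hamming distance is a sum of separate per-cell terms, so it is minimized by taking the majority value in each cell. The sketch below uses that fact; central_graph is a hypothetical helper, and its tie-breaking rule (include the tie when exactly half the graphs contain it) is an arbitrary assumption.

```r
# Majority-vote central graph for an m x n x n array of 0/1 adjacency matrices.
central_graph <- function(graphs) {
  m <- dim(graphs)[1]
  votes <- apply(graphs, c(2, 3), sum)       # number of graphs containing each tie
  (votes >= m / 2) * 1                       # majority value, ties broken toward 1
}

gs <- array(0, dim = c(3, 3, 3))             # three graphs on three vertices
gs[1, 1, 2] <- 1
gs[2, 1, 2] <- 1                             # tie 1 -> 2 appears in two graphs
gs[3, 2, 3] <- 1                             # tie 2 -> 3 appears in one graph
cg <- central_graph(gs)
print(cg)
```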

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, for settings in which conditioning on the entire observed structure is inappropriate.
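The logic of a QAP test for graph correlation can be sketched in a few lines. The example assumes two dichotomous digraphs on the same vertex set, ignores diagonals, and reports only a one-sided p-value; qap_cor_test is a hypothetical helper rather than sna's qaptest, which offers more options.

```r
# Vertices of B are relabeled at random, which keeps B's structure intact but
# breaks its alignment with A, giving a reference distribution for the statistic.
qap_cor_test <- function(A, B, reps = 1000) {
  offdiag <- function(M) M[row(M) != col(M)]
  obs <- cor(offdiag(A), offdiag(B))
  n <- nrow(B)
  null <- replicate(reps, {
    p <- sample.int(n)                       # random relabeling
    cor(offdiag(A), offdiag(B[p, p]))
  })
  list(statistic = obs,
       p_greater = mean(null >= obs),        # one-sided: Pr(null >= observed)
       null = null)
}

set.seed(7)
n <- 15
A <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(A) <- 0
B <- A
flip <- sample(which(row(B) != col(B)), 20)  # perturb 20 edge variables
B[flip] <- 1 - B[flip]
res <- qap_cor_test(A, B, reps = 500)
res$statistic
res$p_greater
```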


                                                    Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
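Since the netlm point estimates are those of ordinary least squares applied to the vectorized adjacency matrices, they can be cross-checked with base R's lm. This sketch assumes the diagonal is excluded, which is consistent with the 375 residual degrees of freedom reported above (20 × 19 = 380 edge variables, minus 5 coefficients), and it draws the random graphs with rbinom instead of rgraph so that it runs without sna. It reproduces only the estimates, not the QAP diagnostics.

```r
set.seed(1)
n <- 20
k <- 4
# k random dichotomous digraphs stored as a k x n x n array
x <- array(rbinom(k * n * n, 1, 0.5), dim = c(k, n, n))
y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]   # same construction as the example

off <- row(y) != col(y)                       # off-diagonal cells only
dat <- data.frame(y  = y[off],
                  x1 = x[1, , ][off], x2 = x[2, , ][off],
                  x3 = x[3, , ][off], x4 = x[4, , ][off])
fit <- lm(y ~ x1 + x2 + x3 + x4, data = dat)
round(coef(fit), 6)   # intercept and x4 near 0; x1, x2, x3 near 1, 4, 2
df.residual(fit)      # 380 observations minus 5 coefficients
```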

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
     352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004
Contingency Table (predicted (rows) x actual (cols)):

   0   1
0  0   0
1 39 341

    Total Fraction Correct: 0.8973684
    Fraction Predicted 1s Correct: 0.8973684
    Fraction Predicted 0s Correct: NaN
    False Negative Rate: 0
    False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not effective at predicting the absence of ties: it predicts a tie for every dyad, so all 39 null dyads are misclassified. This is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.
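Several of the diagnostics in the netlogit summary can be recomputed from the printed values, which makes the density argument concrete. This is a check on the printed (rounded) numbers rather than a re-fit, and the variable names are illustrative.

```r
# Contingency table from the netlogit output (rows = predicted, cols = actual)
tab <- matrix(c(0, 0,
                39, 341), 2, 2, byrow = TRUE,
              dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))
frac_correct <- sum(diag(tab)) / sum(tab)              # 341 / 380
false_pos_rate <- tab["1", "0"] / sum(tab[, "0"])      # 39 / 39: every null dyad missed
density_y <- sum(tab[, "1"]) / sum(tab)                # share of observed ties

Dn <- 526.7919    # null deviance
Dr <- 174.1572    # residual deviance
dfn <- 380        # null degrees of freedom
pseudo_r2_a <- (Dn - Dr) / (Dn - Dr + dfn)
pseudo_r2_b <- (Dn - Dr) / Dn
c(frac_correct, false_pos_rate, density_y, pseudo_r2_a, pseudo_r2_b)
```

The observed density of the dependent graph coincides with the fraction correct (341/380, about 0.90) precisely because the fitted model predicts a tie for every dyad.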

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which they report, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired); estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
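The core of the bbnam reporting model can be illustrated for a single dyad when the error rates are treated as known and the graph prior is Bernoulli: the posterior probability of a tie is then a two-term Bayes' rule calculation. This is a simplified sketch under those assumptions (it does not sample error rates, run a Gibbs sampler, or work on the log scale); edge_posterior is a hypothetical helper, and em and ep denote false negative and false positive rates.

```r
# Posterior probability that one edge is present, given 0/1 reports from
# informants with known false negative (em) and false positive (ep) rates
# and a Bernoulli prior probability p0 for the edge.
edge_posterior <- function(reports, em, ep, p0 = 0.5) {
  like1 <- prod(ifelse(reports == 1, 1 - em, em))   # Pr(reports | edge present)
  like0 <- prod(ifelse(reports == 1, ep, 1 - ep))   # Pr(reports | edge absent)
  p0 * like1 / (p0 * like1 + (1 - p0) * like0)
}

edge_posterior(c(rep(1, 12), rep(0, 8)), em = 0.375, ep = 0.038)  # 12 of 20 report it
edge_posterior(rep(0, 20), em = 0.375, ep = 0.038)                # nobody reports it
edge_posterior(1, em = 0.7, ep = 0.6)  # 1/3: with em + ep > 1 a report lowers the 0.5 prior
```

The last call shows the negatively informative case: once the error rates sum to more than 1, a reported tie is evidence against the tie.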

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)}$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. bn implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
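The conditional nomination probability above can be evaluated directly. This sketch simply transcribes the formula; the bias probabilities and event counts in the calls are made-up values, and biased_net_prob is a hypothetical helper unrelated to bn's estimation routines.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k(i, j, T)
# base: Pr(B_e); bias: Pr(B_k) for each bias type; counts: s_k(i, j, T).
biased_net_prob <- function(base, bias, counts) {
  stopifnot(length(bias) == length(counts))
  1 - (1 - base) * prod((1 - bias)^counts)
}

biased_net_prob(base = 0.1, bias = 0.3, counts = 0)                # baseline only: 0.1
biased_net_prob(base = 0.1, bias = 0.3, counts = 1)                # 1 - 0.9 * 0.7 = 0.37
biased_net_prob(base = 0.1, bias = c(0.3, 0.2), counts = c(1, 2))  # 1 - 0.9 * 0.7 * 0.64 = 0.5968
```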

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

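$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$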

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the AR term's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
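For the special case of a single AR weight matrix and no MA term, equation (4) can be solved for y, which gives a direct way to simulate responses; Moran's I then summarizes the resulting autocorrelation. This is a base-R sketch under those assumptions: simulate_network_ar and morans_i are hypothetical helpers (not lnam or nacf), the ring network and parameter values are arbitrary, $I - \theta W$ is assumed invertible (true here since W is row-normalized and $|\theta| < 1$), and the Moran's I shown is the standard textbook formula, which may differ in details from nacf's conventions.

```r
# With w = 1 and no MA term, equation (4) has reduced form
#   y = (I - theta W)^{-1} (X beta + e),  e ~ iid Normal(0, sigma^2).
simulate_network_ar <- function(W, X, beta, theta, sigma = 1) {
  n <- nrow(W)
  e <- rnorm(n, sd = sigma)
  drop(solve(diag(n) - theta * W, X %*% beta + e))
}

# Moran's I:  (n / sum(W)) * sum_ij W_ij d_i d_j / sum_i d_i^2,  d = y - mean(y)
morans_i <- function(y, W) {
  d <- y - mean(y)
  (length(y) / sum(W)) * sum(W * outer(d, d)) / sum(d^2)
}

# A ring of 30 vertices with row-normalized weights
n <- 30
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  j <- i %% n + 1
  W[i, j] <- 1
  W[j, i] <- 1
}
W <- W / rowSums(W)

set.seed(3)
X <- cbind(1, rnorm(n))
y <- simulate_network_ar(W, X, beta = c(1, 0.5), theta = 0.6)
morans_i(y, W)
```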


                                                    Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[,1] <- 2
R> pem[,2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[,1] <- 2
R> pep[,2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

     a1   a2   a3   a4   a5   a6   a7   a8   a9   a10  a11  a12  a13  a14  a15
a1   0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2   0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3   0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4   0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5   1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6   0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7   1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8   0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9   0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10  0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11  0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12  1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13  0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14  1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15  1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16  0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17  1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18  1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19  0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20  0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1   1.00 1.00 1.00 0.00 0.00
a2   1.00 0.00 0.00 1.00 1.00
a3   0.00 0.00 1.00 0.00 1.00
a4   0.00 1.00 0.00 1.00 1.00
a5   1.00 1.00 0.00 0.00 1.00
a6   0.00 0.00 0.00 1.00 0.00
a7   1.00 0.00 0.00 0.00 0.00
a8   0.00 0.00 1.00 0.00 1.00
a9   1.00 1.00 1.00 1.00 0.00
a10  0.00 1.00 1.00 1.00 0.00
a11  1.00 1.00 0.00 1.00 1.00
a12  1.00 0.00 1.00 1.00 0.00
a13  0.00 0.00 1.00 0.00 1.00
a14  0.00 0.00 0.00 0.00 0.00
a15  1.00 0.00 1.00 0.00 1.00
a16  0.00 0.00 1.00 0.00 0.00
a17  0.00 0.00 1.00 0.00 1.00
a18  0.00 0.00 0.00 1.00 0.00
a19  0.00 0.00 0.00 0.00 1.00
a20  1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

                                                    MCMC Diagnostics

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20
    Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115
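The Potential Scale Reduction reported above is Gelman and Rubin's (1992) convergence diagnostic, computed across the replicate chains. To show roughly what it measures, the base R sketch below computes the textbook form of sqrt(R-hat) for a single scalar quantity from m parallel chains of n draws each. It omits the degrees-of-freedom correction and is not taken from bbnam, whose internal computation may differ. The name psrf is ours. Values near 1 mean that between-chain and within-chain variability agree. In this form the factor (n - 1)/n can also push well-mixed values slightly below 1.

```r
# Textbook Gelman-Rubin potential scale reduction factor, sqrt(R-hat),
# for one scalar quantity. `draws` is an n x m matrix holding n draws
# (rows) from each of m chains (columns). No degrees-of-freedom correction.
psrf <- function(draws) {
  n <- nrow(draws)
  # W: mean of the within-chain variances
  W <- mean(apply(draws, 2, var))
  # B: variance of the chain means, scaled by chain length
  B <- n * var(colMeans(draws))
  # Pooled estimate of the marginal posterior variance
  var_hat <- (n - 1) / n * W + B / n
  sqrt(var_hat / W)
}

set.seed(1)
# 5 chains x 20 draws, matching the dimensions of the bbnam run above
mixed <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)   # chains agree
stuck <- sweep(matrix(rnorm(20 * 5), 20, 5), 2,
               c(-2, -1, 0, 1, 2), "+")               # chain centres disagree
psrf(mixed)  # near 1
psrf(stuck)  # clearly above 1
```

A large factor signals that the chains have not yet converged to a common distribution, so a longer burn-in or more draws would be warranted.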

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which $e^- + e^+ > 1$. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user’s modeling assumptions and the data itself.
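To see why the “perverse” region matters, consider the measurement model behind bbnam (Butts 2003) in its simplest form: a single potential edge and informants whose reports are conditionally independent given the true edge state. Informant k has a false negative rate $e^-_k$ and a false positive rate $e^+_k$. A reported tie then has probability $1 - e^-_k$ if the edge exists and $e^+_k$ if it does not. It therefore counts as evidence for the edge only when $1 - e^-_k > e^+_k$, that is, when $e^-_k + e^+_k < 1$. At exactly 1 the report is uninformative, and above 1 it points the wrong way. The sketch below treats the error rates as known. bbnam instead places priors on them and samples them jointly with the network by MCMC. The helper tie_posterior is hypothetical, not an sna function.

```r
# Posterior probability that one edge is present, given binary reports from
# several informants with KNOWN error rates (a simplification of bbnam's
# model, which treats the error rates as unknown). Hypothetical helper.
tie_posterior <- function(reports, e_minus, e_plus, prior = 0.5) {
  # P(reports | edge present): a 1 is reported w.p. 1 - e_minus, a 0 w.p. e_minus
  lik_present <- prod(ifelse(reports == 1, 1 - e_minus, e_minus))
  # P(reports | edge absent): a 1 is a false positive w.p. e_plus
  lik_absent <- prod(ifelse(reports == 1, e_plus, 1 - e_plus))
  prior * lik_present / (prior * lik_present + (1 - prior) * lik_absent)
}

# Three informants unanimously report the edge.
ok <- tie_posterior(c(1, 1, 1), e_minus = rep(0.2, 3), e_plus = rep(0.1, 3))
# The same reports from "perverse" informants (e_minus + e_plus = 1.3 > 1):
bad <- tie_posterior(c(1, 1, 1), e_minus = rep(0.7, 3), e_plus = rep(0.6, 3))
c(ok = ok, perverse = bad)  # about 0.998 versus 1/9
```

Unanimous agreement from perverse informants thus lowers the posterior below the prior of 0.5, which is why priors concentrated on that region can mislead the estimator.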

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives), while the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either the false positive or the false negative rate approaches or exceeds 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
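The gap between the intersection LAS and the central graph can be reproduced from first principles. In the sketch below, dat[k, i, j] is informant k's report of an i -> j tie, and every actor also serves as an informant (as the LAS requires). Following Krackhardt (1987a), the intersection LAS keeps i -> j only if both i and j report it, and the union LAS keeps it if either does. The central graph is taken here as a simple majority rule over all informants. The helper names las and central_graph, the simulated error rates, and keeping ties reported by exactly half the informants are our own choices. None of these need match what consensus does internally.

```r
# dat[k, i, j] = 1 if informant k reports a tie from i to j (n x n x n array).
las <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  stopifnot(dim(dat)[1] == n)  # each actor must also be an informant
  out <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i == j) next
    s <- dat[i, i, j]  # the sender's own report of i -> j
    r <- dat[j, i, j]  # the receiver's report of the same tie
    out[i, j] <- if (rule == "intersection") min(s, r) else max(s, r)
  }
  out
}

central_graph <- function(dat) {
  # Keep a tie if at least half of all informants report it.
  cg <- (apply(dat, c(2, 3), mean) >= 0.5) * 1
  diag(cg) <- 0
  cg
}

# Simulated truth plus reports with false negative rate 0.3, false positive rate 0.05
set.seed(42)
n <- 20
g <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(g) <- 0
dat <- array(0, c(n, n, n))
for (k in seq_len(n)) {
  rep_k <- ifelse(g == 1, rbinom(n * n, 1, 0.7), rbinom(n * n, 1, 0.05))
  diag(rep_k) <- 0
  dat[k, , ] <- rep_k
}
c(intersection = mean(las(dat, "intersection") == g),
  union        = mean(las(dat, "union") == g),
  central      = mean(central_graph(dat) == g))
```

The intersection LAS loses a true tie whenever either of the two relevant informants misses it, so each informant's false negatives compound. The central graph pools all twenty reports and is far less sensitive to any single miss.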

As a final example of sna’s model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components and estimates both effects simultaneously. (This example requires the numDeriv package.)
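For reference, the simulation below generates data from a network autocorrelation model with an AR term on the outcome (weights $W_1$) and an MA term on the disturbances (weights $W_2$). The equations are written in our own notation and read off the code, not taken from the lnam documentation:

```latex
\begin{aligned}
y &= \rho_1 W_1 y + X\beta + e, \\
e &= \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I), \\
\Rightarrow\; y &= (I - \rho_1 W_1)^{-1}\left[X\beta + (I - \rho_2 W_2)^{-1}\nu\right].
\end{aligned}
```

The two qr.solve calls apply exactly these inverses, first to nu and then to x %*% beta + e, with r1 = $\rho_1$ = 0.2, r2 = $\rho_2$ = 0.3 and sigma = $\sigma$ = 0.1. lnam then estimates $\beta$, $\rho_1$, $\rho_2$ and $\sigma$, which summary(fit) reports as the X coefficients, the two rho terms, and Sigma.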

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4
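The information criteria above can be checked by hand. Assume the usual definitions AIC = -2 log L + 2k and BIC = -2 log L + k log n, with n = 50 observations. The printed values are then consistent with k = 8 parameters for the fitted model (five coefficients, the two rho terms, and Sigma, matching the 42 degrees of freedom reported with Sigma). They are likewise consistent with k = 2 for the “meanstd” null model (a mean and a standard deviation, matching 48 degrees of freedom). The check below uses the rounded log-likelihoods, so agreement is only to a few hundredths. The heuristic log Bayes factor also appears to equal the difference in BIC. We infer this from the numbers, not from the lnam source.

```r
# Log-likelihoods as printed by summary(fit), rounded to two decimals
n <- 50
ll_model <- 58.47;  k_model <- 5 + 2 + 1  # five betas, two rho terms, Sigma
ll_null  <- -82.48; k_null  <- 2          # mean and standard deviation

aic <- function(ll, k) -2 * ll + 2 * k
bic <- function(ll, k, n) -2 * ll + k * log(n)

c(model_df = n - k_model, null_df = n - k_null)   # 42 and 48
round(c(AIC = aic(ll_model, k_model), BIC = bic(ll_model, k_model, n)), 2)
round(c(AIC = aic(ll_null, k_null), BIC = bic(ll_null, k_null, n)), 2)
aic(ll_null, k_null) - aic(ll_model, k_model)          # AIC difference
bic(ll_null, k_null, n) - bic(ll_model, k_model, n)    # compare with 258.4
```

The small gap between the model BIC computed here (about -85.64) and the printed -85.65 is consistent with rounding of the printed log-likelihood.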

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i’s net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges. Pairs for which i’s net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean are designated by red edges. Sample output for the above example is provided in Figure 6.
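The two-standard-deviation rule used to colour edges in the net influence plot is easy to state in code. In the sketch below, flag_influence takes any matrix infl whose [i, j] entry is the net influence of i on j. It returns the (i, j) pairs that would be drawn green or red. The text does not say which influence matrix plot.lnam actually uses, so the demonstration substitutes an assumption of ours: the AR multiplier (I - rho W)^{-1}, transposed so that rows index the source of influence. Excluding the diagonal from the mean and standard deviation is also our own choice, and flag_influence is a hypothetical helper.

```r
# Return the (i, j) pairs whose influence lies at least two standard
# deviations above (green) or below (red) the mean off-diagonal influence.
flag_influence <- function(infl) {
  off <- row(infl) != col(infl)          # ignore self-influence (our choice)
  m <- mean(infl[off])
  s <- sd(infl[off])
  list(green = which(off & infl >= m + 2 * s, arr.ind = TRUE),
       red   = which(off & infl <= m - 2 * s, arr.ind = TRUE))
}

# Illustration with an ASSUMED influence matrix: for y = M (X beta + e) with
# M = (I - rho W)^{-1}, M[j, i] is the effect on y_j of a shock at vertex i.
set.seed(7)
n <- 30
W <- matrix(rbinom(n * n, 1, 0.1), n, n); diag(W) <- 0
M <- solve(diag(n) - 0.2 * W)
flags <- flag_influence(t(M))            # rows of t(M) = influencing vertex i
head(flags$green)                        # columns "row" (i) and "col" (j)
nrow(flags$red)
```

Because the rule is relative to the mean, some pairs are flagged in every non-degenerate case. The plot therefore highlights unusually strong influence rather than influence that is significant in an absolute sense.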

                                                    3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                                    Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: “Fitted vs Observed Values”, “Fitted Values vs Estimated Disturbances”, “Normal Q-Q Residual Plot” (theoretical versus sample quantiles), and “Net Influence Plot”.

                                                    References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). “Social Structure from Multiple Networks. II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions in Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                                    Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software http://www.jstatsoft.org/
published by the American Statistical Association http://www.amstat.org/

Volume 24, Issue 6 Submitted: 2007-06-01
February 2008 Accepted: 2007-12-25



R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)
+     return((n - 1) * choose(n - 1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization “for free” when working with their own centrality routines, so long as those routines support the required calling argument.
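What centralization does with such a routine can be sketched in a few lines. Freeman's (1979) general centralization index sums, over all vertices, the gap between the largest observed score and each vertex's score. It then divides by the largest value that sum could take for a graph of the same order, and the tmaxdev argument supplies that denominator. The base R version below re-implements the out-2-star score from o2scent, together with that normalization. The function names are hypothetical, and we assume that sna's normalized centralization follows this form.

```r
# Out-2-star score: choose(outdegree, 2), with adj[i, j] = 1 for an edge i -> j.
# With tmaxdev = TRUE it returns the maximum possible centralization sum,
# attained by an out-star (one vertex sending to all n - 1 others).
o2s <- function(adj, tmaxdev = FALSE) {
  n <- nrow(adj)
  if (tmaxdev) return((n - 1) * choose(n - 1, 2))
  choose(rowSums(adj), 2)
}

# Freeman-style centralization: sum of (max score - score), normalized.
centralize <- function(adj, cfun) {
  scores <- cfun(adj)
  sum(max(scores) - scores) / cfun(adj, tmaxdev = TRUE)
}

n <- 10
out_star <- matrix(0, n, n); out_star[1, -1] <- 1  # vertex 1 sends to everyone
complete <- matrix(1, n, n); diag(complete) <- 0   # every vertex has the same score
centralize(out_star, o2s)  # 1
centralize(complete, o2s)  # 0
```

With n = 10 the denominator is 9 * 36 = 324. The centralization scores printed above are all multiples of 1/324, consistent with graphs of that order.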

2.4 Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine’s (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The “structure statistics” of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\bigl(d(j, k) \le i\bigr)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
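The definition of the structure statistics translates directly into code. The sketch below computes geodesic distances by breadth-first search from each vertex and then evaluates $s_i$ for $i = 0, \ldots, N - 1$. Unreachable pairs get distance Inf and so never count. It is a from-scratch illustration in base R with hypothetical function names, not sna's implementation. In practice geodist would supply the distances. On a directed 4-cycle each additional step brings one more vertex within reach, so the statistics rise linearly to 1.

```r
# Geodesic distances by breadth-first search from every vertex
# (adj[i, j] = 1 means an edge from i to j; unreachable pairs stay Inf).
geodesic_distances <- function(adj) {
  n <- nrow(adj)
  d <- matrix(Inf, n, n)
  for (src in seq_len(n)) {
    d[src, src] <- 0
    frontier <- src
    step <- 0
    while (length(frontier) > 0) {
      step <- step + 1
      # vertices with an edge from some frontier vertex ...
      nbrs <- which(colSums(adj[frontier, , drop = FALSE]) > 0)
      # ... that have not been reached yet
      nbrs <- nbrs[is.infinite(d[src, nbrs])]
      d[src, nbrs] <- step
      frontier <- nbrs
    }
  }
  d
}

# Fararo-Sunshine structure statistics: s_i = N^-2 * #{(j, k) : d(j, k) <= i}
structure_stats <- function(adj) {
  d <- geodesic_distances(adj)
  n <- nrow(adj)
  sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
}

# Directed 4-cycle: 1 -> 2 -> 3 -> 4 -> 1
cyc <- matrix(0, 4, 4)
cyc[cbind(1:4, c(2, 3, 4, 1))] <- 1
structure_stats(cyc)   # 0.25 0.50 0.75 1.00
```

A graph containing an isolate never reaches $s_{N-1} = 1$, since every pair involving the isolate (other than the isolate with itself) stays at distance Inf.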

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G)$, $s'(H) = s'(G)$, $\ldots$. Conditioning on the order of $G$ is routine; the number of edges, the dyad census, and the degree distribution are also widely used. A somewhat weaker family of null distributions consists of those which satisfy the conditions $E[s(H)] = s(G)$, $E[s'(H)] = s'(G)$, $\ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$. The homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., the difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
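Both ideas can be illustrated with a minimal base-R sketch. dyad_census, edge_recip, rand_gnm and cug_edges are hypothetical helper names, not sna functions. The statistic used is edgewise reciprocity, the fraction of edges whose reverse edge is also present, which is only one of several possible reciprocity indices. The null model conditions on order and edge count, so $H$ is drawn uniformly from loopless digraphs with the same number of vertices and edges as $G$. Both one-tailed p-values are estimated by simulation.

```r
# Hypothetical helpers for illustration; not part of sna.
# Dyad census of a directed 0/1 adjacency matrix (diagonal ignored).
dyad_census <- function(adj) {
  a <- adj > 0
  up <- upper.tri(a)                       # count each unordered pair once
  c(Mut  = sum(a & t(a) & up),
    Asym = sum(xor(a, t(a)) & up),
    Null = sum(!a & !t(a) & up))
}

# Edgewise reciprocity: share of edges whose reverse edge is also present.
edge_recip <- function(adj) {
  dc <- dyad_census(adj)
  2 * dc[["Mut"]] / (2 * dc[["Mut"]] + dc[["Asym"]])
}

# Uniform random digraph with n vertices and exactly m edges (no loops).
rand_gnm <- function(n, m) {
  adj <- matrix(0, n, n)
  off <- which(row(adj) != col(adj))
  adj[off[sample.int(length(off), m)]] <- 1
  adj
}

# One-tailed CUG p-values, conditioning on order and edge count.
cug_edges <- function(adj, stat, reps = 1000) {
  n <- nrow(adj)
  m <- sum(adj[row(adj) != col(adj)] > 0)
  obs <- stat(adj)
  sim <- replicate(reps, stat(rand_gnm(n, m)))
  c(upper = mean(sim >= obs),   # estimate of Pr(t(H) >= t(G))
    lower = mean(sim <= obs))   # estimate of Pr(t(H) <= t(G))
}

g <- matrix(0, 4, 4)
g[1, 2] <- 1; g[2, 1] <- 1                # one mutual dyad
g[2, 3] <- 1; g[3, 4] <- 1                # two asymmetric dyads
dyad_census(g)                            # Mut 1, Asym 2, Null 3
set.seed(1)
cug_edges(g, edge_recip, reps = 500)
```

Because ties count toward both tails, the two estimated p-values can sum to slightly more than 1.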

                                                      Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))

R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count

   Agg   v1   v2   v3   v4   v5   v6   v7   v8   v9  v10
1   35    8    3    9    2   10    9    3   10    8    8
2  119   40   10   47    8   59   47   13   56   39   38
3  346  155   41  180   35  223  185   52  211  149  153
4  791  457  130  504  114  601  527  163  572  425  462
5 1351  964  303 1000  282 1143 1061  375 1104  884  990

R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count

   Agg   v1   v2   v3   v4   v5   v6   v7   v8   v9  v10
2    9    2    1    2    0    3    2    0    4    3    1
3   24    7    1   11    0   15    9    2   12    8    7
4   42   16    1   23    2   32   26    3   30   19   16
5   72   39    5   48    8   60   54   10   57   36   43


R> component.dist(g3[1,,])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities (first graph minus second) as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]

R> g4[2,,] <- g2[1,,]

R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)

R> summary(cug)

                                                      CUG Test Results

Estimated p-values
  p(f(rnd) >= f(d)): 0.299
  p(f(rnd) <= f(d)): 0.708

Test Diagnostics
  Test Value (f(d)): 0.04444444
  Replications: 1000
  Distribution Summary

    Min: -0.3333333
    1stQ: -0.06666667
    Med: 0
    Mean: -0.001288889
    3rdQ: 0.06666667
    Max: 0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)

R> summary(cug)


                                                      CUG Test Results

Estimated p-values
  p(f(rnd) >= f(d)): 0.967
  p(f(rnd) <= f(d)): 0.039

Test Diagnostics
  Test Value (f(d)): 0.04444444
  Replications: 1000
  Distribution Summary

    Min: -0.06666667
    1stQ: 0.1555556
    Med: 0.2222222
    Mean: 0.2215333
    3rdQ: 0.2888889
    Max: 0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of “role” and “position” have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
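The definition can be checked directly. The base-R sketch below compares the out- and in-neighborhoods of two vertices after removing the pair itself. It then groups vertices greedily into positions, relying on structural equivalence being an equivalence relation. The helper names is_struct_equiv and se_classes are hypothetical, not sna functions, and the tie between the two compared vertices is deliberately excluded, as in the set-difference definition.

```r
# Hypothetical helpers for illustration; not part of sna.
# TRUE if v and w are structurally equivalent in 0/1 adjacency matrix 'adj':
# N(v) \ {w} = N(w) \ {v}, checked for out- and in-neighborhoods.
is_struct_equiv <- function(adj, v, w) {
  a <- adj > 0
  keep <- setdiff(seq_len(nrow(a)), c(v, w))   # drop v and w themselves
  all(a[v, keep] == a[w, keep]) &&             # same out-neighbors
    all(a[keep, v] == a[keep, w])              # same in-neighbors
}

# Assign each vertex to a position (structural equivalence class).
se_classes <- function(adj) {
  n <- nrow(adj)
  cls <- integer(n)
  k <- 0L
  for (v in seq_len(n)) {
    if (cls[v] > 0L) next
    k <- k + 1L
    cls[v] <- k
    for (w in seq_len(n)) {
      if (cls[w] == 0L && is_struct_equiv(adj, v, w)) cls[w] <- k
    }
  }
  cls
}

star <- matrix(0, 4, 4)
star[1, 2:4] <- 1; star[2:4, 1] <- 1    # undirected star centered on vertex 1
se_classes(star)                        # 1 2 2 2: the leaves form one position
```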

In practice, exact structural equivalence is fairly rare (isolates, and pendants attached to the same vertex, being two important exceptions). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are “similar” in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
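The distance-then-cluster pipeline can be approximated in base R as follows. This is a sketch under stated assumptions. se_hamming is a hypothetical helper that counts row and column mismatches between two vertices, excluding the pair itself. It need not match the exact distance definitions used by sedist or equiv.clust. Clustering uses R's built-in hclust and cutree, and cutting just above height 0 recovers the exact equivalence classes.

```r
# Hypothetical helper for illustration; not part of sna.
# Hamming-type structural equivalence distances between all vertex pairs.
se_hamming <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (v in seq_len(n)) {
    for (w in seq_len(n)) {
      keep <- setdiff(seq_len(n), c(v, w))
      d[v, w] <- sum(adj[v, keep] != adj[w, keep]) +   # out-tie mismatches
                 sum(adj[keep, v] != adj[keep, w])     # in-tie mismatches
    }
  }
  d
}

star <- matrix(0, 4, 4)
star[1, 2:4] <- 1; star[2:4, 1] <- 1
d <- se_hamming(star)
hc <- hclust(as.dist(d))          # complete linkage by default
cutree(hc, h = 0.5)               # exact equivalence: 1 2 2 2
```

Cutting at larger heights merges vertices that are only approximately equivalent, which is the reduction trade-off described above.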

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $(i, j)$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to “blocks” of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (the mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
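For the density block type, the reduction step amounts to averaging adjacency-matrix cells within each pair of classes. The minimal sketch below assumes a membership vector cls with classes numbered 1 to k. block_density is a hypothetical helper, not sna's blockmodel. Diagonal cells are neglected within diagonal blocks, and singleton diagonal blocks are left undefined (NA).

```r
# Hypothetical helper for illustration; not part of sna.
block_density <- function(adj, cls) {
  k <- max(cls)
  dens <- matrix(NA_real_, k, k)
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      ri <- which(cls == i)
      cj <- which(cls == j)
      sub <- adj[ri, cj, drop = FALSE]
      if (i != j) {
        dens[i, j] <- mean(sub)
      } else if (length(ri) > 1) {
        m <- length(ri)   # exclude self-ties on the diagonal of the block
        dens[i, j] <- (sum(sub) - sum(diag(sub))) / (m * (m - 1))
      }                   # singleton diagonal blocks stay NA (undefined)
    }
  }
  dens
}

star <- matrix(0, 4, 4)
star[1, 2:4] <- 1; star[2:4, 1] <- 1
block_density(star, c(1, 2, 2, 2))   # rows: NA 1 / 1 0
```

As expected for structural equivalence classes, every defined block is either all 1s or all 0s.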

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
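Density-based expansion can be sketched as drawing each edge independently, with the probability given by its block. In this sketch, expand_density is a hypothetical helper, and sizes gives the number of vertices to create in each class. Treating undefined (NA) block densities as zero is an assumption of this example rather than documented sna behavior. The final lines show the Monte Carlo use described above: a simulated null distribution of overall density.

```r
# Hypothetical helper for illustration; not part of sna.
expand_density <- function(dens, sizes) {
  cls <- rep(seq_along(sizes), sizes)   # class of each new vertex
  p <- dens[cls, cls]                   # Bernoulli parameter matrix
  p[is.na(p)] <- 0                      # assumption: undefined blocks are empty
  n <- length(cls)
  g <- matrix(rbinom(n * n, 1, p), n, n)
  diag(g) <- 0                          # no self-ties
  g
}

dens <- matrix(c(0.8, 0.1, 0.3, 0.5), 2, 2)   # a 2-class density image
set.seed(42)
sims <- replicate(500, {
  g <- expand_density(dens, c(3, 4))          # 7 vertices per draw
  sum(g) / (7 * 6)                            # density of the draw
})
summary(sims)
```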

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
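Because composition is a boolean matrix product, it can be sketched in one line of base R. compose is a hypothetical helper name, not sna's operator. The second example shows how loops can appear even when neither input graph has any.

```r
# Hypothetical helper for illustration; not part of sna.
# (v, v') is an edge of the composition iff some v'' has v -> v'' in g
# and v'' -> v' in h, i.e. the boolean product of the adjacency matrices.
compose <- function(g, h) (g %*% h > 0) * 1

path <- matrix(0, 3, 3); path[1, 2] <- 1; path[2, 3] <- 1
compose(path, path)          # single edge 1 -> 3

u <- matrix(c(0, 1, 1, 0), 2, 2)   # one undirected edge, no loops
compose(u, u)                # identity matrix: both vertices gain loops
```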

                                                      Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)

R> g <- rgraph(20, tprob = gp)

R> eq <- equiv.clust(g)

R> b <- blockmodel(g, eq, h = 15)

R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))

R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\operatorname{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V| \left(|V| - 1\right)} \tag{3}$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V| (|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. (Sums run over the $|V|(|V| - 1)$ ordered pairs $(i, j)$ with $i \neq j$.) The graph variance is then $\operatorname{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \operatorname{cov}(G, H) / \sqrt{\operatorname{cov}(G, G) \operatorname{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
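Equation (3) and its relatives are straightforward to compute from the off-diagonal cells of the adjacency matrices. In the sketch below, off_diag, graph_cov, graph_cor and hamming are hypothetical helpers written for illustration. They follow the formulas above with the diagonal excluded, and they are not guaranteed to reproduce every option of gcov, gcor or hdist. Because the normalizing constant cancels, the graph correlation equals the ordinary Pearson correlation of the vectorized off-diagonal cells.

```r
# Hypothetical helpers for illustration; not part of sna.
off_diag <- function(a) a[row(a) != col(a)]   # the |V|(|V|-1) edge variables

graph_cov <- function(g, h) {
  x <- off_diag(g); y <- off_diag(h)
  mean((x - mean(x)) * (y - mean(y)))         # divide by |V|(|V|-1), as in (3)
}
graph_cor <- function(g, h) {
  graph_cov(g, h) / sqrt(graph_cov(g, g) * graph_cov(h, h))
}
hamming <- function(g, h) sum(off_diag(g) != off_diag(h))

g <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)   # directed 3-cycle
h <- t(g)                                     # the reversed cycle
graph_cor(g, h)   # -1: every edge of g is a non-edge of h and vice versa
hamming(g, h)     # 6
```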

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings. This results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the “labeling” component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
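For very small graphs, the structural correlation can be found exactly by brute force: maximize the graph correlation over all relabelings of the second graph. The sketch below assumes that all $|V|!$ permutations are allowed matchings. perms and struct_cor are hypothetical helpers, not sna functions. Because it enumerates every permutation, it is only usable for a handful of vertices, which is why heuristic search is used in practice.

```r
# Hypothetical helpers for illustration; not part of sna.
graph_cor <- function(g, h) {
  off <- row(g) != col(g)
  cor(g[off], h[off])
}

# All permutations of 1..n as rows of a matrix (n! rows).
perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  p <- perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(k) {
    rest <- setdiff(seq_len(n), k)
    cbind(rep(k, nrow(p)), matrix(rest[as.vector(p)], nrow = nrow(p)))
  }))
}

# Maximum graph correlation over all relabelings of h (exhaustive search).
struct_cor <- function(g, h) {
  P <- perms(nrow(h))
  r <- apply(P, 1, function(p) graph_cor(g, h[p, p]))
  max(r, na.rm = TRUE)
}

g <- matrix(0, 5, 5)
g[cbind(1:5, c(2:5, 1))] <- 1       # directed 5-cycle
g[1, 3] <- 1                        # plus one chord
q <- c(3, 1, 5, 2, 4)
h <- g[q, q]                        # a relabeled (isomorphic) copy of g
graph_cor(g, h)                     # correlation under the given labels
struct_cor(g, h)                    # 1: some relabeling recovers g exactly
```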

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to the graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
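Because the Hamming distance decomposes over edge variables, a graph minimizing the total Hamming distance to a set of binary graphs can be obtained cell by cell by majority vote. The sketch below assumes binary adjacency matrices of equal order. central_graph and hamming are hypothetical helpers, not sna's centralgraph. Ties at exactly one half are resolved toward 0 here, which is an arbitrary choice. The pairwise distance matrix is then passed to R's hclust, as an example of the standard exploratory tools mentioned above.

```r
# Hypothetical helpers for illustration; not part of sna.
hamming <- function(g, h) sum(g[row(g) != col(g)] != h[row(h) != col(h)])

# Cell-wise majority: minimizes the summed Hamming distance to all graphs.
central_graph <- function(gs) {
  m <- Reduce(`+`, gs) / length(gs)   # fraction of graphs with each edge
  cg <- (m > 0.5) * 1                 # ties (exactly 1/2) resolved toward 0
  diag(cg) <- 0
  cg
}

set.seed(7)
gs <- replicate(5, matrix(rbinom(36, 1, 0.4), 6, 6), simplify = FALSE)
cg <- central_graph(gs)
total <- function(x) sum(vapply(gs, hamming, numeric(1), h = x))
total(cg)                     # no larger than total(gs[[k]]) for any k

# Pairwise distances among the input graphs, clustered hierarchically
D <- outer(seq_along(gs), seq_along(gs),
           Vectorize(function(i, j) hamming(gs[[i]], gs[[j]])))
hc <- hclust(as.dist(D))      # plot(hc) would draw the dendrogram
```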

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available where conditioning on the entire observed structure is inappropriate.
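The QAP null hypothesis can be sketched by recomputing the statistic after jointly permuting the rows and columns of one graph, which relabels its vertices while preserving its structure. In this sketch, graph_cor and qap_cor are hypothetical helper names; within sna, qaptest is the package's own tool for this purpose. The example builds a noisy copy of a random graph, so the observed correlation should fall far in the upper tail of the permutation distribution.

```r
# Hypothetical helpers for illustration; not part of sna.
graph_cor <- function(g, h) {
  off <- row(g) != col(g)
  cor(g[off], h[off])
}

# Monte Carlo QAP test: compare observed correlation to relabeled copies of h.
qap_cor <- function(g, h, reps = 1000) {
  obs <- graph_cor(g, h)
  n <- nrow(h)
  sim <- replicate(reps, {
    p <- sample.int(n)               # random relabeling of h's vertices
    graph_cor(g, h[p, p])
  })
  c(obs = obs,
    p_upper = mean(sim >= obs),      # estimate of Pr(rho(G, H_perm) >= observed)
    p_lower = mean(sim <= obs))
}

set.seed(11)
g <- matrix(rbinom(100, 1, 0.3), 10, 10); diag(g) <- 0
h <- g
flip <- matrix(rbinom(100, 1, 0.1), 10, 10)
h[flip == 1] <- 1 - h[flip == 1]     # a noisy copy of g
diag(h) <- 0
qap_cor(g, h, reps = 500)
```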


                                                      Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)

R> g2 <- rgraph(5)

R> g3 <- rmperm(g2)

R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

                                                      [1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)

R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]

R> nl <- netlm(y, x)

R> summary(nl)


                                                      OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

                                                      Test Diagnostics

Null Hypothesis: qap
Replications: 1000
Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)

R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]

R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))

R> y <- rgraph(20, tprob = yp)

R> nl <- netlogit(y, x)

R> summary(nl)

                                                      Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

                                                      Goodness of Fit Statistics

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:

   352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.858
Pseudo-R^2 Measures:

   (Dn-Dr)/(Dn-Dr+dfn): 0.481324
   (Dn-Dr)/Dn: 0.6694004

                                                      Contingency Table (predicted (rows) x actual (cols))

         actual
predicted  0   1
        0  0   0
        1 39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

                                                      Test Diagnostics

Null Hypothesis: qap
Replications: 1000
Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties. It predicts a tie for every dyad, so all 39 observed null dyads are misclassified. This is largely a consequence of the high density of the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7 Network inference and process models

A final category of functions supplied by sna comprises those implementing various network inference and process models. The package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar). For this purpose, however, users are strongly encouraged to employ the more modern tools of the ergm package. There are several other models for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns an estimate of an unknown graph from a series of observed graphs. Supported methods include data-analytic tools such as the locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators. Model-based approaches are also supported, such as the consensus model of Batchelder and Romney (1988). The latter assumes that each data source has a base chance to “know” and correctly generate the true value of an edge on which it reports. Otherwise, the source produces a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the bias parameters may be omitted if desired. Estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). In this case, the edge reporting process is parameterized in terms of false positive and false negative error rates. These rates may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian: error rate priors (where applicable) are specified as beta distributions, and graph priors are specified in inhomogeneous Bernoulli form. The likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model whenever the sum of the false positive and false negative rates is less than 1. The two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam uses a multiple-chain Gibbs sampler to return draws from the joint posterior distribution of the true graph and the error parameters (where applicable). The potential scale reduction measure of Gelman and Rubin (1992), in the simplified form of Gelman et al. (1995), can be applied via potscalered.mcmc to assess convergence. bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or to construct point estimates. The helper function npostpred can be employed to obtain posterior predictive graph properties from a set of posterior draws.
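A few lines of base R can check the correspondence between the two reporting models. The sketch assumes the competency/bias form described above: a source knows an edge with probability $d$, and otherwise reports a tie with guessing probability $g$. This gives $e^- = (1-d)(1-g)$ and $e^+ = (1-d)g$. Solving yields $d = 1 - e^- - e^+$ and $g = e^+/(e^- + e^+)$, and $d$ is a positive competency only when $e^- + e^+ < 1$. The helper names errors_to_competency and competency_to_errors are hypothetical and are not part of sna.

```r
# Hypothetical helpers (not part of sna) that map between Butts (2003)-style
# error rates and a Batchelder-Romney-style competency d and guessing bias g.
# Assumed reporting model:
#   Pr(report 1 | true 1) = d + (1 - d) * g   =>  e_minus = (1 - d) * (1 - g)
#   Pr(report 1 | true 0) = (1 - d) * g       =>  e_plus  = (1 - d) * g
errors_to_competency <- function(e_minus, e_plus) {
  total <- e_minus + e_plus
  if (any(total >= 1))
    stop("mapping requires e_minus + e_plus < 1")
  # g is not identified when both error rates are zero (d = 1)
  list(d = 1 - total,
       g = ifelse(total > 0, e_plus / total, NA_real_))
}

competency_to_errors <- function(d, g) {
  list(e_minus = (1 - d) * (1 - g), e_plus = (1 - d) * g)
}

# Round trip using the mean error rates of the example later in this section
p <- errors_to_competency(e_minus = 0.375, e_plus = 0.038)
p$d   # 0.587
p$g   # about 0.092
back <- competency_to_errors(p$d, p$g)
```

When $e^- + e^+$ exceeds 1, the implied competency is negative. This is the negatively informative case that bbnam can represent but the Batchelder-Romney parameterization cannot, so the sketch simply refuses such inputs.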

sna also supports the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. Loosely, the process works as follows. One begins with a small “seed” set of vertices. Each member of this set is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members in turn may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace and $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$. The $B_k$ are “bias events,” of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad. Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. Specifying a biased net model then involves defining the various bias events, which in turn influence the structure of the network. The joint graph distribution under such a model is not, in general, known. Consequently, estimation of the model parameters (the bias event probabilities) is currently heuristic. bn implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004). It also implements a method of moments estimator based on the expected triad census, likewise proposed by Skvoretz et al. Heuristic goodness-of-fit statistics are provided, as are asymptotic goodness-of-fit tests for dyad and triad statistics.
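The conditional tie probability above can be transcribed directly into R. The base-R sketch below illustrates the formula under the stated independence assumption; it is not the internals of sna's bn function. biased_net_prob and simulate_tie are hypothetical names. The Monte Carlo helper checks the closed form by drawing the baseline and bias events as independent Bernoulli trials.

```r
# Direct transcription of
#   Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k(i, j, T)
# p_base: baseline probability Pr(B_e)
# p_bias: vector of bias-event probabilities Pr(B_k)
# s:      vector of counts s_k(i, j, T), one per bias-event type
biased_net_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Monte Carlo check: the tie occurs if the baseline event or any of the
# sum(s) potential bias events occurs, all drawn as independent Bernoulli trials.
simulate_tie <- function(p_base, p_bias, s, nsim = 1e5) {
  probs <- rep(p_bias, times = s)                 # one probability per event
  m <- length(probs)
  base <- runif(nsim) < p_base
  bias <- matrix(runif(nsim * m), nsim, m) <
          matrix(probs, nsim, m, byrow = TRUE)
  mean(base | rowSums(bias) > 0)
}

biased_net_prob(0.1, c(0.2, 0.3), c(1, 2))   # 0.6472
set.seed(1)
simulate_tie(0.1, c(0.2, 0.3), c(1, 2))      # close to 0.6472
```

Setting all $s_k$ to zero recovers the baseline probability, which reflects that bias events only ever raise the chance of a tie.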

Much attention in social network analysis is directed to structural properties per se, but we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models constitute one important family of processes often used for this purpose. See Doreian (1990); for the equivalent class of spatial autocorrelation models, see Cliff and Ord (1973) and Anselin (1988). These models are of the form

$$y = \Bigl(\sum_{i=1}^{w} \theta_i W_i\Bigr) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \Bigl(\sum_{i=1}^{z} \psi_i Z_i\Bigr) \varepsilon + \nu \tag{5}$$

where $y \in \mathbb{R}^n$ is a vector of responses and $X \in \mathbb{R}^{n \times x}$ is a covariate matrix. $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays. $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the AR term differs from the MA term in that it includes impact from neighbors' covariate scores. An AR term implies that each individual's response depends on that of their neighbors, including all covariate, disturbance, and higher-order neighborhood effects. An MA term, by contrast, implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computing sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders. It can also compute autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Inspecting network autocorrelation functions beforehand can aid in proposing weight matrices for subsequent evaluation. This parallels similar heuristics within the time series literature (see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties, such as structural equivalence; see Leenders (2002) for a useful discussion.
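The two autocorrelation indices mentioned above have simple closed forms. The sketch below uses the standard textbook definitions, with $z = y - \bar{y}$ and $S_0 = \sum_{ij} w_{ij}$. Moran's index is $I = (n/S_0)\, z^\top W z / z^\top z$. Geary's index is $C = \frac{n-1}{2S_0} \sum_{ij} w_{ij}(y_i - y_j)^2 / z^\top z$. The function names are hypothetical, and nacf's own normalization (for instance, how it handles neighborhood orders) may differ from this sketch.

```r
# Textbook Moran's I and Geary's C for a response vector y and a weight
# (adjacency) matrix W; hypothetical helpers, not necessarily nacf's exact form.
morans_i <- function(y, W) {
  z <- y - mean(y)
  n <- length(y)
  (n / sum(W)) * as.numeric(t(z) %*% W %*% z) / sum(z^2)
}

gearys_c <- function(y, W) {
  z <- y - mean(y)
  n <- length(y)
  diffs <- outer(y, y, "-")^2          # (y_i - y_j)^2 for every ordered pair
  ((n - 1) / (2 * sum(W))) * sum(W * diffs) / sum(z^2)
}

# Two disconnected dyads whose members share the same value
W <- matrix(0, 4, 4)
W[1, 2] <- W[2, 1] <- W[3, 4] <- W[4, 3] <- 1
y <- c(1, 1, -1, -1)
morans_i(y, W)   # 1: neighbors are perfectly alike
gearys_c(y, W)   # 0: no squared differences between neighbors
```

Alternating the values within each dyad, with y = c(1, -1, 1, -1), gives $I = -1$ instead, the case of perfect negative autocorrelation.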


                                                      Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set. In it, 20 error-prone informants give us reports regarding the state of the network (g). As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038. Their false negative rates (em) are likewise beta distributed, with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we use an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary method for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20 Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

The priors do not reflect the true error distribution, yet bbnam still does a good job of pinning down the error rates. It also recovers the network itself, which is actually somewhat easier to estimate in many cases. In practice, the bbnam model is fairly robust to the choice of priors, provided the error rate priors do not put a large degree of mass on the “perverse” region for which $e^- + e^+ > 1$. Two patterns can signal either excessively “perverse” priors or extreme disagreement among informants (e.g., as would result from systematic deception). The first is multiple actors whose error rates satisfy this condition with high posterior probability. The second is a strongly multimodal posterior graph distribution. Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
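One way to act on this advice is to compute, for each informant, the posterior probability that its error rates fall in the perverse region. The sketch assumes error-rate draws laid out as in the example above, with rows as draws and columns as informants, as implied by apply(b$em, 2, median). It uses simulated beta draws for three hypothetical informants rather than actual bbnam output, and perverse_prob is a hypothetical helper.

```r
# Posterior probability that each informant lies in the "perverse" region
# e^- + e^+ > 1, given (draws x informants) matrices of error-rate draws.
perverse_prob <- function(em_draws, ep_draws) {
  stopifnot(identical(dim(em_draws), dim(ep_draws)))
  colMeans(em_draws + ep_draws > 1)
}

# Toy draws: two well-behaved informants and one deceptive-looking one
set.seed(42)
em_draws <- cbind(rbeta(100, 15, 25), rbeta(100, 15, 25), rbeta(100, 40, 5))
ep_draws <- cbind(rbeta(100, 1, 25),  rbeta(100, 1, 25),  rbeta(100, 20, 5))
pp <- perverse_prob(em_draws, ep_draws)
round(pp, 2)   # near 0, 0, and 1 for the three informants
```

Several informants with values near 1 would be the warning sign described above. A single such informant may simply be unreliable.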

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice because it exacerbates the effects of false negatives. A true tie is retained only if both actors involved report it. With false negative rates near 0.375, only about $(1 - 0.375)^2 \approx 0.39$ of true ties survive. The central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
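The behavior of the classical estimators can be reproduced in miniature. In the sketch below, LAS judges the $i \to j$ tie only from the reports of $i$ and $j$: the intersection rule requires both to report it, and the union rule accepts either. The central graph is computed as the elementwise majority of the observed graphs, which minimizes the total Hamming distance to them. Breaking exact ties toward presence is an assumption here. The functions las and central_graph are hypothetical base-R helpers rather than sna's consensus(). The simulated data only mimic the error rates of the example above.

```r
# Hypothetical base-R versions of three classical consensus estimators.
# dat[k, i, j] = 1 if informant k reports a tie from i to j.
las <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    ego   <- dat[i, i, j]   # sender i's report of the i -> j tie
    alter <- dat[j, i, j]   # receiver j's report of the same tie
    out[i, j] <- if (rule == "intersection") min(ego, alter) else max(ego, alter)
  }
  out
}

central_graph <- function(dat) {
  # Elementwise majority; ties at exactly half are counted as present (assumed)
  (apply(dat, c(2, 3), mean) >= 0.5) * 1
}

# Simulated reports with error rates resembling the example above
set.seed(2008)
n   <- 20
g0  <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(g0) <- 0
em0 <- rbeta(n, 15, 25)   # false negative rates, mean 0.375
ep0 <- rbeta(n, 1, 25)    # false positive rates, mean about 0.038
dat0 <- array(0, dim = c(n, n, n))
for (k in seq_len(n)) {
  prob_k <- g0 * (1 - em0[k]) + (1 - g0) * ep0[k]   # per-cell report probability
  rep_k <- matrix(rbinom(n * n, 1, prob_k), n, n)
  diag(rep_k) <- 0
  dat0[k, , ] <- rep_k
}

acc <- c(intersection = mean(las(dat0, "intersection") == g0),
         union        = mean(las(dat0, "union") == g0),
         central      = mean(central_graph(dat0) == g0))
round(acc, 3)
```

Exact accuracies vary with the seed. At these error rates, however, the intersection LAS should consistently fall well below both the union LAS and the central graph.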

As a final example of sna's model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components and estimates both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58  < 2e-16 ***
X2      0.535608   0.009448   56.69  < 2e-16 ***
X3     -0.685068   0.007138  -95.98  < 2e-16 ***
X4      0.691812   0.008417   82.19  < 2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71  < 2e-16 ***
rho2.1  0.307491   0.021167   14.53  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts in network form the total influence of each vertex on each other vertex. Green edges mark $(i, j)$ pairs for which $i$'s net influence on $j$ is estimated to be at least two standard deviations greater than the mean net influence. Red edges mark pairs for which $i$'s net influence on $j$ is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence. Sample output for the above example is provided in Figure 6.
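The notion of net influence can be made concrete for the AR part of equation (4). Solving for $y$ gives $y = (I - \sum_i \theta_i W_i)^{-1}(X\beta + \varepsilon)$, so entry $(j, i)$ of that inverse measures the total effect of a shock at vertex $i$ on the response of vertex $j$. The sketch below uses this quantity with a single weight matrix and flags pairs at least two standard deviations above or below the mean, following the description of the plot. Treating this inverse as the plotted “net influence” is an assumption, since lnam's plot method may define it differently. net_influence and flag_influence are hypothetical names, not sna functions.

```r
# Hypothetical net-influence computation for a single AR weight matrix:
# y = (I - theta * W)^{-1} (X beta + eps), so A[j, i] is the total effect
# of a unit shock at vertex i on the response of vertex j.
net_influence <- function(W, theta) {
  n <- nrow(W)
  A <- solve(diag(n) - theta * W)   # (I - theta W)^{-1}
  t(A)                              # infl[i, j]: influence of i on j
}

# Flag off-diagonal pairs at least k standard deviations above/below the mean
flag_influence <- function(infl, k = 2) {
  offdiag <- row(infl) != col(infl)   # ignore self-influence
  m <- mean(infl[offdiag])
  s <- sd(infl[offdiag])
  list(high = which(infl >= m + k * s & offdiag, arr.ind = TRUE),
       low  = which(infl <= m - k * s & offdiag, arr.ind = TRUE))
}

# A hub: every other vertex's response depends on vertex 1
W <- matrix(0, 10, 10)
W[2:10, 1] <- 1
infl <- net_influence(W, theta = 0.5)
fl <- flag_influence(infl)
fl$high   # the nine pairs (1, j), j = 2, ..., 10
```

Here vertex 1 has influence 0.5 on every other vertex and all other pairs have none, so only the hub's outgoing pairs are flagged.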

                                                      3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines that covers many of the methods currently in wide use within the field. sna and the other packages of the statnet ensemble make such tools available within a freely available, widely used statistical computing platform. It is hoped that this will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                                      Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.]

                                                      References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks II: Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw Network Visualization Software, Version 2.067. URL http://www.analytictech.com

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2. URL http://faculty.chicagogsb.edu/ronald.burt/teaching

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org), Seattle, WA. R package version 1.3. URL http://CRAN.R-project.org/package=network

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                                      Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.



subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one’s data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G)$, $s'(H) = s'(G)$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $E\,s(H) = s(G)$, $E\,s'(H) = s'(G)$ for some $s, s'$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s'$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
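As a concrete illustration of what these censuses count, the base-R sketch below computes a dyad census and the four-class undirected triad census (each triple classified by how many of its three possible edges are present) directly from a 0/1 adjacency matrix. It is a sketch under stated assumptions, not the sna implementation: `dyad_census` and `triad_census_undirected` are hypothetical helper names, the graph is assumed to be loop-free, and the brute-force enumeration of all triples is only practical for small graphs.

```r
# Hypothetical helpers (not part of sna). A is a 0/1 adjacency matrix with
# rows = senders, columns = receivers, and no loops.

# Dyad census: counts of mutual, asymmetric, and null dyads.
dyad_census <- function(A) {
  up  <- upper.tri(A)            # each unordered pair {i, j}, i < j, once
  fwd <- A[up] == 1              # tie i -> j
  bwd <- t(A)[up] == 1           # tie j -> i, in the same pair order
  c(Mut = sum(fwd & bwd), Asym = sum(xor(fwd, bwd)), Null = sum(!fwd & !bwd))
}

# Undirected triad census: after symmetrizing, each triple is classified by
# how many of its three possible edges are present (0, 1, 2, or 3).
triad_census_undirected <- function(A) {
  S <- (A + t(A)) > 0
  trips <- combn(nrow(S), 3)     # one vertex triple per column
  k <- apply(trips, 2, function(v) S[v[1], v[2]] + S[v[1], v[3]] + S[v[2], v[3]])
  counts <- tabulate(k + 1, nbins = 4)
  names(counts) <- c("0", "1", "2", "3")
  counts
}

# Example: a mutual dyad 1 <-> 2, an asymmetric tie 2 -> 3, and isolate 4.
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- 1
A[2, 3] <- 1
dyad_census(A)              # Mut 1, Asym 1, Null 4
triad_census_undirected(A)  # 1 triad with 0 edges, 2 with 1, 1 with 2, 0 with 3
```

Both censuses partition a fixed set (the $\binom{n}{2}$ dyads and the $\binom{n}{3}$ triads), which is a useful sanity check on any implementation.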

                                                        Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic (note the signed values in the distribution summaries below), first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.3333333
                1stQ:    -0.06666667
                Med:     0
                Mean:    -0.001288889
                3rdQ:    0.06666667
                Max:     0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)


CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.06666667
                1stQ:    0.1555556
                Med:     0.2222222
                Mean:    0.2215333
                3rdQ:    0.2888889
                Max:     0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.
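The logic of a CUG test is easy to reproduce by hand, which may help when reading the cugtest output above. The base-R sketch below conditions on order and edge count by drawing uniformly from all loop-free digraphs with the observed number of edges. The statistic, the share of dyads that are symmetric (mutual or null), is an illustrative choice, and `recip`, `r_edges`, and `cug_edges` are hypothetical names; this is not how cugtest is implemented internally.

```r
# Hypothetical helpers (not part of sna). A is a 0/1 adjacency matrix
# (rows = senders, columns = receivers) with no loops.

# Illustrative reciprocity statistic: share of dyads that are symmetric
# (mutual or null).
recip <- function(A) {
  up <- upper.tri(A)
  mean(A[up] == t(A)[up])
}

# Uniform draw from all loop-free digraphs on n vertices with exactly m edges.
r_edges <- function(n, m) {
  A <- matrix(0, n, n)
  off <- which(row(A) != col(A))   # linear indices of off-diagonal cells
  A[sample(off, m)] <- 1
  A
}

# Monte Carlo CUG test conditioning on order and edge count.
cug_edges <- function(A, stat, reps = 1000) {
  obs <- stat(A)
  n <- nrow(A)
  m <- sum(A[row(A) != col(A)])
  sim <- replicate(reps, stat(r_edges(n, m)))
  c(obs = obs, p.upper = mean(sim >= obs), p.lower = mean(sim <= obs))
}

# Example: 20 vertices paired into 10 mutual dyads (20 edges in total).
A <- matrix(0, 20, 20)
for (k in seq(1, 19, by = 2)) A[k, k + 1] <- A[k + 1, k] <- 1
set.seed(1)
res <- cug_edges(A, recip, reps = 500)
res   # obs is 1: every dyad is either mutual or null
```

Replacing `r_edges` with a draw in which every off-diagonal cell is independently 1 with probability 0.5 would give a uniform draw over all digraphs of the same order, i.e., conditioning on order alone.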

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of “role” and “position” have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
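The directed definition above can be checked mechanically: two vertices are structurally equivalent when their rows (out-ties) and columns (in-ties) agree everywhere except in the cells that involve the two vertices themselves. The sketch below is a hypothetical base-R helper (`struct_equiv`, not an sna function) that assumes a loop-free 0/1 adjacency matrix.

```r
# Hypothetical helper (not part of sna): TRUE if vertices v and w of a
# loop-free digraph with 0/1 adjacency matrix A are structurally equivalent,
# i.e. they have the same out-neighbors and the same in-neighbors once any
# ties between v and w themselves are set aside.
struct_equiv <- function(A, v, w) {
  others <- setdiff(seq_len(nrow(A)), c(v, w))
  all(A[v, others] == A[w, others]) &&   # same out-neighborhood
    all(A[others, v] == A[others, w])    # same in-neighborhood
}

# Example: a mutual star with center 1 and leaves 2, 3, 4.
A <- matrix(0, 4, 4)
A[1, 2:4] <- 1
A[2:4, 1] <- 1
struct_equiv(A, 2, 3)  # TRUE: both leaves tie only to the center
struct_equiv(A, 1, 2)  # FALSE: the center also reaches vertices 3 and 4
```

The leaves here are pendants attached to the same vertex, one of the common sources of exact structural equivalence in real data.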

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are “similar” in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R’s built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
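To make the distance-then-cluster workflow concrete, the sketch below computes a Hamming-type structural equivalence distance between every pair of vertices and passes it to R's hclust and cutree. `se_hamming` is a hypothetical helper; it skips the cells involving the two vertices being compared, and sedist's own handling of those cells may differ, so its values should not be assumed to match sedist exactly.

```r
# Hypothetical helper (not part of sna): a Hamming-type structural
# equivalence distance. For each pair (v, w) it counts disagreements between
# their rows (out-ties) and columns (in-ties), skipping the cells that
# involve v or w themselves.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (v in seq_len(n)) {
    for (w in seq_len(n)) {
      k <- setdiff(seq_len(n), c(v, w))
      D[v, w] <- sum(A[v, k] != A[w, k]) + sum(A[k, v] != A[k, w])
    }
  }
  D
}

# Example: vertices 1-3 each send ties to all of vertices 4-6, so there are
# two exact structural equivalence classes.
A <- matrix(0, 6, 6)
A[1:3, 4:6] <- 1
D <- se_hamming(A)
hc <- hclust(as.dist(D))    # complete linkage, R's default
cl <- cutree(hc, k = 2)
cl                          # 1 1 1 2 2 2
```

With approximate rather than exact equivalence, the within-class distances would be small but nonzero, and the choice of cut height (or number of clusters) becomes a substantive modeling decision.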

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $(i,j)$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to “blocks” of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R’s cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.
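The density block type can be sketched in a few lines: each block value is the mean of the adjacency cells running from one class to another. The helper below (`block_density`, hypothetical, not sna's blockmodel) excludes the adjacency-matrix diagonal from the diagonal blocks, which is an assumption about how self-ties should be treated; a singleton class therefore yields NaN on its own diagonal block.

```r
# Hypothetical helper (not part of sna): density block image. Entry (r, s)
# is the mean tie value from members of class r to members of class s; the
# diagonal of A (self-ties) is excluded from the diagonal blocks, so a
# singleton class gives NaN on its own diagonal block.
block_density <- function(A, memb) {
  diag(A) <- NA
  cls <- sort(unique(memb))
  img <- matrix(NA_real_, length(cls), length(cls),
                dimnames = list(cls, cls))
  for (r in seq_along(cls)) {
    for (s in seq_along(cls)) {
      img[r, s] <- mean(A[memb == cls[r], memb == cls[s]], na.rm = TRUE)
    }
  }
  img
}

# Example: the two structural equivalence classes of a graph in which
# vertices 1-3 send ties to all of vertices 4-6.
A <- matrix(0, 6, 6)
A[1:3, 4:6] <- 1
block_density(A, memb = c(1, 1, 1, 2, 2, 2))
#   1 2
# 1 0 1
# 2 0 0
```

Because the classes here are exact structural equivalence classes, every block density is 0 or 1, as the text notes.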

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
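A minimal sketch of density-based expansion, assuming a square density image and a vector of class sizes (playing the role of the second argument in the blockmodel.expand call in the example below): the image is expanded into a Bernoulli parameter matrix and one graph is drawn from it. `expand_density` is a hypothetical helper, not sna code.

```r
# Hypothetical helper (not part of sna): expand a density image into one
# random graph. sizes[r] vertices are created for class r, and each ordered
# pair (i, j), i != j, gets a tie independently with probability
# img[class(i), class(j)].
expand_density <- function(img, sizes) {
  memb <- rep(seq_along(sizes), sizes)
  P <- img[memb, memb, drop = FALSE]       # Bernoulli parameter matrix
  n <- length(memb)
  A <- matrix(rbinom(n * n, 1, P), n, n)   # cell (i, j) drawn with P[i, j]
  diag(A) <- 0                             # no loops
  A
}

# Example: class 1 always sends to class 2, and no other ties occur.
img <- matrix(c(0, 0, 1, 0), 2)
set.seed(1)
G <- expand_density(img, sizes = c(2, 3))  # 2 vertices in class 1, 3 in class 2
G
```

Calling `expand_density` repeatedly gives a Monte Carlo sample from the inhomogeneous Bernoulli graph model described above; the example uses 0/1 densities only so that its output is deterministic.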

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna’s support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean inner product of the graphs’ respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
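The Boolean inner product is straightforward to express in base R, as the sketch below shows. `compose` is a hypothetical helper rather than sna's composition operator mentioned above; for 0/1 input it should match the definition just given, and the second example shows loops arising from loop-free graphs.

```r
# Hypothetical helper (not part of sna): Boolean composition of two graphs
# on the same vertex set. Entry (v, w) is 1 iff some u has v -> u in G1 and
# u -> w in G2, i.e. the Boolean inner product of the adjacency matrices.
compose <- function(G1, G2) (G1 %*% G2 > 0) * 1

# Example 1: a directed path 1 -> 2 -> 3 composed with itself gives 1 -> 3.
P <- matrix(0, 3, 3)
P[1, 2] <- P[2, 3] <- 1
compose(P, P)

# Example 2: a mutual dyad composed with itself yields only loops.
M <- matrix(c(0, 1, 1, 0), 2)
compose(M, M)   # the 2 x 2 identity matrix: v -> u -> v for each v
```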

                                                        Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- t(sapply(runif(20, 0, 1), rep, 20))

R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert’s Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\operatorname{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \tag{3}$$

where $A^G$, $A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\operatorname{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \operatorname{cov}(G,H) / \sqrt{\operatorname{cov}(G,G)\,\operatorname{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
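Equation (3) translates directly into base R. The sketch below uses hypothetical helper names (`off_diag`, `graph_cov`, `graph_cor`, `hamming`), not the gcov, gcor, and hdist interfaces, and assumes that the sum in (3) runs over the $|V|(|V|-1)$ ordered pairs $i \neq j$, as the divisor indicates.

```r
# Hypothetical helpers (not part of sna) following Equation (3): averages run
# over the |V|(|V| - 1) ordered pairs (i, j) with i != j, so the diagonal is
# excluded. G and H are adjacency matrices of equal order.
off_diag <- function(A) A[row(A) != col(A)]

graph_cov <- function(G, H) {
  g <- off_diag(G)
  h <- off_diag(H)
  mean((g - mean(g)) * (h - mean(h)))   # divisor |V|(|V| - 1), as in (3)
}

graph_cor <- function(G, H) {
  graph_cov(G, H) / sqrt(graph_cov(G, G) * graph_cov(H, H))
}

hamming <- function(G, H) sum(off_diag(G) != off_diag(H))

# Example: a directed 4-cycle and the same cycle with every edge reversed.
G <- matrix(0, 4, 4)
G[cbind(1:4, c(2, 3, 4, 1))] <- 1
H <- t(G)
graph_cov(G, G)   # 0.2222222, i.e. 2/9: density 1/3, so (1/3)(2/3)
graph_cor(G, H)   # -0.5: the two edge sets never overlap
hamming(G, H)     # 8 edge changes turn one graph into the other
```

Since the common divisor cancels in the ratio, `graph_cor` agrees with R's `cor` applied to the two vectors of off-diagonal entries.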

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the “labeling” component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
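For very small graphs, the maximization behind a structural correlation can be done exactly by trying every relabeling. The sketch below does this exhaustive search in base R; it is not the simulated annealing heuristic that sna uses by default, the helper names (`perms`, `graph_cor`, `struct_cor_exhaustive`) are hypothetical, and because there are $n!$ relabelings it is only feasible for a handful of vertices.

```r
# Hypothetical helpers (not part of sna).
# All permutations of 1..n, one per row (n! rows, so only for tiny n).
perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  p <- perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(i) {
    cbind(i, ifelse(p >= i, p + 1L, p))   # put i first, shift the rest
  }))
}

# Labeled graph correlation over off-diagonal cells.
graph_cor <- function(G, H) {
  keep <- row(G) != col(G)
  cor(G[keep], H[keep])
}

# Structural correlation by exhaustive search: the largest graph correlation
# between G and any relabeling H[p, p] of H.
struct_cor_exhaustive <- function(G, H) {
  P <- unname(perms(nrow(H)))
  best <- -Inf
  for (k in seq_len(nrow(P))) {
    p <- P[k, ]
    best <- max(best, graph_cor(G, H[p, p]))
  }
  best
}

# Example: H is G (a directed path 1 -> 2 -> 3 -> 4) with relabeled vertices.
G <- matrix(0, 4, 4)
G[cbind(1:3, 2:4)] <- 1
q <- c(3, 1, 4, 2)
H <- G[q, q]
graph_cor(G, H)              # -0.3333333: under the given labels no edges line up
struct_cor_exhaustive(G, H)  # 1, since G and H are isomorphic
```

The gap between the two printed values is exactly the "labeling" component discussed above: the structures are identical, and only the vertex labels disagree.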

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
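Two of the ideas above have short base-R counterparts, sketched below under stated assumptions. For 0/1 graphs, the total Hamming distance to a set of graphs is a sum of per-edge disagreements, so it is minimized by taking the majority value edge by edge. Network principal components are obtained by applying eigen to a matrix of graph correlations. The helpers (`central_graph`, `graph_cor_matrix`) are hypothetical and are not the centralgraph interface; graphs are stored as an m x n x n array, the same stacking convention used by indexing such as `g3[1,,]` in the examples above.

```r
# Hypothetical helpers (not part of sna). gs is a graph stack: an
# m x n x n array of 0/1 adjacency matrices, one graph per first index.

# For 0/1 data, the graph minimizing the total Hamming distance to all graphs
# in the stack takes the majority value edge by edge (ties, possible when m
# is even, are broken toward 0 here).
central_graph <- function(gs) (apply(gs, c(2, 3), mean) > 0.5) * 1

# m x m matrix of graph correlations over off-diagonal cells.
graph_cor_matrix <- function(gs) {
  keep <- row(gs[1, , ]) != col(gs[1, , ])
  X <- sapply(seq_len(dim(gs)[1]), function(k) gs[k, , ][keep])  # one column per graph
  cor(X)
}

# Example stack: a directed 4-cycle, the same cycle plus edge 1 -> 3, and the
# reversed cycle.
cyc <- matrix(0, 4, 4)
cyc[cbind(1:4, c(2, 3, 4, 1))] <- 1
gs <- array(0, c(3, 4, 4))
gs[1, , ] <- cyc
gs[2, , ] <- cyc
gs[2, 1, 3] <- 1
gs[3, , ] <- t(cyc)

central_graph(gs)            # the original 4-cycle (each edge is in 2 of 3 graphs)
pca <- eigen(graph_cor_matrix(gs), symmetric = TRUE)
pca$values                   # variance captured by each network component
pca$vectors[, 1]             # loadings of the three graphs on the first component
```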

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses. These test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available where conditioning on the entire observed structure is inappropriate.
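The QAP logic described above can be sketched directly. Hold one graph fixed and apply the same random permutation to the rows and columns of the other, which relabels its vertices while preserving its structure. Then compare the observed statistic with the resulting permutation distribution. The minimal Python version below uses graph correlation as the statistic and reports the upper-tail proportion. It is not a transcription of sna's qaptest, and the function names, number of replications, and seeds are illustrative assumptions.

```python
import random
from math import sqrt


def offdiag(g):
    n = len(g)
    return [g[i][j] for i in range(n) for j in range(n) if i != j]


def graph_cor(g1, g2):
    """Pearson correlation of off-diagonal cells (assumes non-constant graphs)."""
    a, b = offdiag(g1), offdiag(g2)
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def permute(g, p):
    """Apply one vertex permutation to both rows and columns."""
    n = len(g)
    return [[g[p[i]][p[j]] for j in range(n)] for i in range(n)]


def qap_test(g1, g2, reps=1000, seed=1):
    """Return the observed correlation and Pr(permuted >= observed)."""
    rng = random.Random(seed)
    obs = graph_cor(g1, g2)
    n = len(g2)
    hits = 0
    for _ in range(reps):
        p = list(range(n))
        rng.shuffle(p)
        if graph_cor(g1, permute(g2, p)) >= obs - 1e-12:
            hits += 1
    return obs, hits / reps


rng = random.Random(42)
n = 12
g1 = [[int(i != j and rng.random() < 0.3) for j in range(n)] for i in range(n)]
# g2 copies g1 but flips roughly 10% of the off-diagonal cells.
g2 = [[(g1[i][j] ^ int(rng.random() < 0.1)) if i != j else 0 for j in range(n)]
      for i in range(n)]
obs, p_upper = qap_test(g1, g2, reps=500)
print(round(obs, 3), p_upper)
```

Because whole vertices are permuted rather than individual cells, the null distribution keeps each graph's degree and clustering structure intact. This is the feature that distinguishes QAP from a naive cell-wise permutation test.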


                                                        Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1, since the graphs are isomorphic. However, the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size. See the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
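As noted above, the point estimates of netlm are those of ordinary least squares applied to the vectorized adjacency matrices. The Python sketch below reproduces the logic of the example: four random predictor graphs, and a dependent graph that is an exact linear combination of the first three. It solves the normal equations by Gaussian elimination. The sketch omits the QAP machinery entirely, and the helper names, graph size, and seed are illustrative assumptions.

```python
import random


def offdiag(g):
    """Vectorize off-diagonal cells, mirroring the 20 * 19 = 380 edge variables."""
    n = len(g)
    return [g[i][j] for i in range(n) for j in range(n) if i != j]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, k):
            f = m[r][c] / m[c][c]
            for j in range(c, k + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (m[r][k] - sum(m[r][j] * x[j] for j in range(r + 1, k))) / m[r][r]
    return x


def ols(y, cols):
    """OLS with an intercept; cols is a list of predictor vectors."""
    design = [[1.0] + [col[t] for col in cols] for t in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yt for row, yt in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


rng = random.Random(7)
n = 20
# Four random binary predictor graphs (diagonals are ignored by offdiag).
xs = [[[int(rng.random() < 0.5) for _ in range(n)] for _ in range(n)] for _ in range(4)]
# Dependent graph y = x1 + 4*x2 + 2*x3; x4 is irrelevant, as in the netlm example.
y = [[xs[0][i][j] + 4 * xs[1][i][j] + 2 * xs[2][i][j] for j in range(n)] for i in range(n)]
coef = ols(offdiag(y), [offdiag(x) for x in xs])
print([round(b, 6) for b in coef])  # intercept, x1, x2, x3, x4
```

As in the netlm output, the exact linear relationship makes the residuals vanish up to floating-point error.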

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

    Total Fraction Correct: 0.8973684
    Fraction Predicted 1s Correct: 0.8973684
    Fraction Predicted 0s Correct: NaN
    False Negative Rate: 0
    False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties. This is largely a consequence of the high density of the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, the model's parameter estimates are quite close to the true values, and the QAP test correctly identifies the irrelevant predictors.
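The goodness-of-fit quantities in the netlogit summary follow from the deviances and the contingency table by standard formulas. As a check on how to read that output, the Python sketch below recomputes them from the printed values, using 5 estimated coefficients and 380 edge variables. The formulas are the usual ones for logistic regression. We assume, rather than confirm from sna's source, that netlogit uses the same definitions.

```python
from math import log

null_dev, resid_dev = 526.7919, 174.1572   # Dn and Dr as printed
n_obs, n_coef = 380, 5                      # 20 * 19 ordered dyads; intercept + x1..x4

chi_sq = null_dev - resid_dev               # fit improvement, Dn - Dr
aic = resid_dev + 2 * n_coef
bic = resid_dev + n_coef * log(n_obs)
pseudo_r2_a = chi_sq / (chi_sq + n_obs)     # (Dn-Dr)/(Dn-Dr+dfn), with dfn = 380
pseudo_r2_b = chi_sq / null_dev             # (Dn-Dr)/Dn

# Contingency table, keyed as (predicted, actual).
table = {(0, 0): 0, (0, 1): 0, (1, 0): 39, (1, 1): 341}
total_correct = (table[(0, 0)] + table[(1, 1)]) / n_obs
pred1_correct = table[(1, 1)] / (table[(1, 0)] + table[(1, 1)])

print(round(chi_sq, 4), round(aic, 4), round(bic, 3))
print(round(pseudo_r2_a, 6), round(pseudo_r2_b, 7), round(total_correct, 7))
```

Because the model never predicts a 0, the fraction of predicted 0s that are correct is 0/0, which R reports as NaN.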

2.7 Network inference and process models

A final category of functions supplied by sna comprises those implementing various network inference and process models. The package still contains a legacy function (pstar) for fitting simple exponential random graph models via maximum pseudo-likelihood methods. However, users are strongly encouraged to employ the more modern tools of the ergm package for this purpose. There are several other models for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimating the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Supported methods include data analytic tools such as the locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators. Model-based approaches are also supported, such as the consensus model of Batchelder and Romney (1988). This model assumes that each data source has a base chance to "know" (and correctly generate) the true value of an edge on which it reports; otherwise, the source produces a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the bias parameters may be omitted if desired. Estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). In this case, the edge reporting process is parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian. Error rate priors (where applicable) are specified as beta distributions, and graph priors are specified in inhomogeneous Bernoulli form. The likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model whenever the sum of the false positive and false negative rates is less than 1. The two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). Using a multiple-chain Gibbs sampler, bbnam returns draws from the joint posterior distribution of the true graph and the error parameters (where applicable). Convergence can be assessed via potscalered.mcmc, which computes the potential scale reduction measure of Gelman and Rubin (1992) in the simplified form of Gelman et al. (1995). For model comparison, bbnam.bf supports basic comparisons using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates, and the helper function npostpred can be employed to obtain posterior predictive graph properties from a set of posterior draws.
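To see how the reporting model of Butts (2003) turns informant reports into evidence about a single edge, consider the simplest case, in which error rates are treated as known. A source reports a tie with probability 1 - e^- if the edge exists, and with probability e^+ if it does not. With a Bernoulli prior on the edge, Bayes' rule then gives the posterior probability of the edge directly. The Python sketch below shows this calculation. Its function name is hypothetical, and it assumes that reports are conditionally independent given the true edge. It also illustrates why a source with e^- + e^+ > 1 is negatively informative. It illustrates only the likelihood, not bbnam's Gibbs sampler.

```python
def edge_posterior(reports, e_minus, e_plus, prior=0.5):
    """Posterior Pr(edge = 1) from 0/1 reports by sources with known
    false negative rates (e_minus) and false positive rates (e_plus)."""
    like1 = prior        # prior * Pr(reports | edge present), built up below
    like0 = 1.0 - prior  # (1 - prior) * Pr(reports | edge absent)
    for r, em, ep in zip(reports, e_minus, e_plus):
        like1 *= (1 - em) if r else em
        like0 *= ep if r else (1 - ep)
    return like1 / (like1 + like0)


# Three fairly accurate sources: two report the tie, one does not.
p = edge_posterior([1, 1, 0], [0.375] * 3, [0.04] * 3)
# One "perverse" source (e- + e+ > 1): its tie report lowers the posterior below the prior.
q = edge_posterior([1], [0.8], [0.5])
print(round(p, 4), round(q, 4))
```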

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$ \Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)} $$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events", of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad. Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. Specifying a biased net model then involves defining the various bias events, which in turn influence the structure of the network. The joint graph distribution under such a model is not in general known, so estimation of the model parameters (the bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
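The conditional tie probability above is easy to evaluate once the baseline probability, the bias event probabilities, and the counts $s_k$ are given. The Python sketch below is a direct transcription of that formula. It does not fit anything, and the numeric values (baseline 0.1, two generic bias events with probabilities 0.3 and 0.2) are illustrative assumptions rather than estimates from bn.

```python
from math import prod


def biased_net_prob(p_base, p_bias, s):
    """Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k)) ** s_k.

    p_base: baseline probability Pr(B_e)
    p_bias: probabilities Pr(B_k) of each bias event type
    s:      counts s_k(i, j, T) of potential bias events of each type
    """
    miss = (1.0 - p_base) * prod((1.0 - pk) ** sk for pk, sk in zip(p_bias, s))
    return 1.0 - miss  # a tie occurs unless the baseline and every bias event fail


no_bias = biased_net_prob(0.1, [0.3, 0.2], [0, 0])    # reduces to the baseline, 0.1
with_bias = biased_net_prob(0.1, [0.3, 0.2], [1, 2])  # 1 - 0.9 * 0.7 * 0.8**2
print(round(no_bias, 4), round(with_bias, 4))
```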

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models constitute one important family of processes often used for this purpose. On these models, see Doreian (1990); for the equivalent class of spatial autocorrelation models, see Cliff and Ord (1973) and Anselin (1988). These models are of the form

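$$ y = \left( \sum_{i=1}^{w} \theta_i W_i \right) y + X\beta + \varepsilon \qquad (4) $$

$$ \varepsilon = \left( \sum_{i=1}^{z} \psi_i Z_i \right) \varepsilon + \nu \qquad (5) $$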

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, and $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays. The vectors $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is that AR effects include the impact of neighbors' covariate scores. An AR term implies that each individual's response depends on that of their neighbors, including all covariate, disturbance, and higher-order neighborhood effects. An MA term, by contrast, implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly.

Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computing sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders. It can also compute autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation, in analogy to similar heuristics within the time series literature (see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
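As a concrete instance of the autocorrelation indices mentioned above, the Python sketch below computes Moran's I in its usual textbook form:

I = (n / S_0) · Σ_ij w_ij (y_i − ȳ)(y_j − ȳ) / Σ_i (y_i − ȳ)^2

where S_0 is the sum of all weights. The sketch handles a single weight matrix and is not nacf itself, whose normalization options may differ. The function name and data are illustrative.

```python
def morans_i(w, y):
    """Moran's I for a weight matrix w (list of lists) and attribute vector y."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    s0 = sum(sum(row) for row in w)  # total weight S_0
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den


# Undirected 4-cycle 0-1-2-3-0.
w = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
print(morans_i(w, [1, -1, 1, -1]))  # -1.0: every tie joins unlike values
print(morans_i(w, [1, 1, -1, -1]))  # 0.0: half the ties join like values
```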


                                                        Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which 20 error-prone informants report on the state of the network (g). As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038. Their false negative rates (em) are likewise beta distributed, with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors: an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for (i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

       a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1   0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2   0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3   0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4   0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5   1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6   0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7   1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8   0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9   0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10  0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11  0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12  1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13  0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14  1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15  1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16  0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17  1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18  1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19  0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20  0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

      a16  a17  a18  a19  a20
a1   1.00 1.00 1.00 0.00 0.00
a2   1.00 0.00 0.00 1.00 1.00
a3   0.00 0.00 1.00 0.00 1.00
a4   0.00 1.00 0.00 1.00 1.00
a5   1.00 1.00 0.00 0.00 1.00
a6   0.00 0.00 0.00 1.00 0.00
a7   1.00 0.00 0.00 0.00 0.00
a8   0.00 0.00 1.00 0.00 1.00
a9   1.00 1.00 1.00 1.00 0.00
a10  0.00 1.00 1.00 1.00 0.00
a11  1.00 1.00 0.00 1.00 1.00
a12  1.00 0.00 1.00 1.00 0.00
a13  0.00 0.00 1.00 0.00 1.00
a14  0.00 0.00 0.00 0.00 0.00
a15  1.00 0.00 1.00 0.00 1.00
a16  0.00 0.00 1.00 0.00 0.00
a17  0.00 0.00 1.00 0.00 1.00
a18  0.00 0.00 0.00 1.00 0.00
a19  0.00 0.00 0.00 0.00 1.00
a20  1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

        Min   1stQ Median   Mean   3rdQ    Max
o1   0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2   0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3   0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4   0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5   0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6   0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7   0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8   0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9   0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10  0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11  0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12  0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13  0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14  0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15  0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16  0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17  0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18  0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19  0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20  0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

           Min      1stQ    Median      Mean      3rdQ       Max
o1   0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2   0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3   0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4   0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5   0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6   0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7   0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8   0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9   0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10  0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11  0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12  0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13  0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14  0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15  0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16  0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17  0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18  0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19  0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20  0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20   Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates, as well as the network itself (which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Two warning signs deserve attention: multiple actors whose error rates satisfy this condition with high posterior probability, and strongly multimodal posterior graph distributions. Either can indicate excessively "perverse" priors or extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice, since it exacerbates the effects of false negatives. The central graph and Romney-Batchelder models are far better. However, the performance of the central graph degrades quickly when either the false positive or the false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
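The locally aggregated structure estimators compared above have a simple definition in the cognitive social structure setting of Krackhardt (1987a), where informant k is also vertex k. Under the intersection LAS, the tie i -> j is judged present only if both i and j report it; under the union LAS, it is judged present if either does. The Python sketch below implements that definition. The function name and toy data are hypothetical, and we assume, rather than confirm, that consensus applies exactly this rule. The sketch makes visible why the intersection rule compounds false negatives: a tie is lost whenever either of its two reporters omits it.

```python
def las(reports, rule="intersection"):
    """reports[k][i][j] = 1 if informant k reports the tie i -> j.
    Informant k is assumed to be vertex k, so the i -> j tie is judged
    by i and j only. rule is "intersection" (both) or "union" (either)."""
    n = len(reports)
    combine = min if rule == "intersection" else max
    return [[combine(reports[i][i][j], reports[j][i][j]) if i != j else 0
             for j in range(n)] for i in range(n)]


# True ties: 0 -> 1 and 1 -> 2. Informant 1 misses the 0 -> 1 tie (a false negative).
reports = [
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],  # informant 0 reports 0 -> 1
    [[0, 0, 0], [0, 0, 1], [0, 0, 0]],  # informant 1 reports only 1 -> 2
    [[0, 0, 0], [0, 0, 1], [0, 0, 0]],  # informant 2 reports 1 -> 2
]
inter = las(reports, "intersection")
union = las(reports, "union")
print(inter)  # the 0 -> 1 tie is lost
print(union)  # both true ties recovered
```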

As a final example of sna's model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components and estimates both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot", which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges. Pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean are designated by red edges. Sample output for the above example is provided in Figure 6.
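The two qr.solve calls in the lnam example invert equations (4) and (5). The MA disturbance is ε = (I − ψZ)^{-1}ν, and the response is y = (I − θW)^{-1}(Xβ + ε). The Python sketch below mirrors that data-generating process under several simplifying assumptions: one W and one Z, sparse random weight graphs, illustrative sizes and seed, and hypothetical helper names. It uses a small dense linear solver. It is not a reimplementation of lnam, which additionally performs maximum likelihood estimation. The final check confirms that the simulated y satisfies equation (4) up to rounding error.

```python
import random


def solve(a, b):
    """Solve a x = b (dense Gaussian elimination with partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def i_minus(rho, w):
    """Return the matrix I - rho * W."""
    n = len(w)
    return [[(1.0 if i == j else 0.0) - rho * w[i][j] for j in range(n)] for i in range(n)]


def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


rng = random.Random(3)
n, k = 30, 3
W = [[int(i != j and rng.random() < 0.05) for j in range(n)] for i in range(n)]
Z = [[int(i != j and rng.random() < 0.05) for j in range(n)] for i in range(n)]
X = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
beta = [rng.gauss(0, 1) for _ in range(k)]
theta, psi, sigma = 0.2, 0.3, 0.1

nu = [rng.gauss(0, sigma) for _ in range(n)]
eps = solve(i_minus(psi, Z), nu)          # MA part: (I - psi Z) eps = nu
xb = matvec(X, beta)
y = solve(i_minus(theta, W), [a + e for a, e in zip(xb, eps)])  # AR part

# Check equations (4) and (5): both residual vectors should be ~0.
wy, ze = matvec(W, y), matvec(Z, eps)
ar_resid = max(abs(y[i] - theta * wy[i] - xb[i] - eps[i]) for i in range(n))
ma_resid = max(abs(eps[i] - psi * ze[i] - nu[i]) for i in range(n))
print(ar_resid, ma_resid)
```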

                                                        3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                                        Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.]

                                                        References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks. II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions in Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ, Prentice Hall.

                                                        Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.



 120C  210  300 
 0.30 0.00 0.00 

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null 
 8.84  9.26 26.90 
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U 
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74 
 120C   210   300 
 1.34  2.28  0.60 

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null 
 8.94 20.44 15.62 
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U 
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60 
 120C   210   300 
 8.40  7.38  1.50 

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43
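The vertex-level columns of these tables obey a simple identity that makes a handy sanity check: a path of length k visits k + 1 distinct vertices and a k-cycle visits k, so in each row the counts in v1 through v10 sum to (k + 1) × Agg for paths (e.g., 8 + 3 + 9 + 2 + 10 + 9 + 3 + 10 + 8 + 8 = 70 = 2 × 35) and to k × Agg for cycles (e.g., 2 + 1 + 2 + 0 + 3 + 2 + 0 + 4 + 3 + 1 = 18 = 2 × 9). The base-R sketch below uses a hypothetical helper, `kpath_counts`, which is not sna's implementation: it enumerates directed simple paths by depth-first search, assuming (as the table suggests) that a vertex-level entry counts the paths passing through that vertex, and then checks the path identity against the values reported above.

```r
# Hypothetical helper (not sna's kpath.census): count directed simple paths
# of length 1..maxlen by depth-first search, tabulating the aggregate count
# ("Agg") and the number of paths passing through each vertex.
kpath_counts <- function(adj, maxlen) {
  n <- nrow(adj)
  out <- matrix(0, maxlen, n + 1,
                dimnames = list(1:maxlen, c("Agg", paste0("v", 1:n))))
  extend <- function(path) {
    len <- length(path) - 1                  # number of edges so far
    if (len >= 1) {
      out[len, 1] <<- out[len, 1] + 1        # one more path of this length
      # credit every vertex on the path (column offset 1 skips "Agg")
      out[len, path + 1] <<- out[len, path + 1] + 1
    }
    if (len < maxlen) {
      last <- path[length(path)]
      # only extend to vertices not already on the path (simple paths)
      for (w in setdiff(which(adj[last, ] != 0), path)) extend(c(path, w))
    }
  }
  for (v in seq_len(n)) extend(v)
  out
}

# Directed 3-cycle 1 -> 2 -> 3 -> 1.
tri <- matrix(0, 3, 3)
tri[1, 2] <- 1; tri[2, 3] <- 1; tri[3, 1] <- 1
pc <- kpath_counts(tri, 3)
pc

# The $path.count values reported above for g3[1,,] (columns Agg, v1..v10).
reported <- rbind(
  c(  35,   8,   3,    9,   2,   10,    9,   3,   10,   8,   8),
  c( 119,  40,  10,   47,   8,   59,   47,  13,   56,  39,  38),
  c( 346, 155,  41,  180,  35,  223,  185,  52,  211, 149, 153),
  c( 791, 457, 130,  504, 114,  601,  527, 163,  572, 425, 462),
  c(1351, 964, 303, 1000, 282, 1143, 1061, 375, 1104, 884, 990))
# Each k-path has k + 1 vertices, so vertex columns sum to (k + 1) * Agg.
all(rowSums(reported[, -1]) == (1:5 + 1) * reported[, 1])
```

The exhaustive search grows quickly with maxlen, which is one reason path and cycle censuses are usually restricted to short lengths.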


R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9 
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00 
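The structure statistics above can be read as cumulative reachability: for each k, the fraction of ordered vertex pairs (including each vertex paired with itself) that lie within k steps, which is why the k = 0 value equals 1/n = 0.10 for this 10-vertex graph. The base-R sketch below assumes that definition and computes it with boolean matrix powers; the name `structure_stats` is hypothetical, and this is not sna's code.

```r
# Hypothetical helper (not sna's structure.statistics): for k = 0, ..., n - 1,
# the fraction of ordered vertex pairs (i, j), including i = j, such that j is
# reachable from i in at most k steps.
structure_stats <- function(adj) {
  n <- nrow(adj)
  a <- (adj != 0) * 1
  reach <- diag(n)                      # distance 0: each vertex reaches itself
  out <- numeric(n)
  out[1] <- mean(reach)
  for (k in seq_len(n - 1)) {
    # j is within k steps of i if it was within k - 1 steps, or lies one
    # step beyond some vertex that was.
    reach <- ((reach + reach %*% a) > 0) * 1
    out[k + 1] <- mean(reach)
  }
  names(out) <- 0:(n - 1)
  out
}

# Directed path 1 -> 2 -> 3: 3, 5, and 6 of the 9 ordered pairs lie within
# 0, 1, and 2 steps respectively.
p3 <- matrix(0, 3, 3)
p3[1, 2] <- 1; p3[2, 3] <- 1
s <- structure_stats(p3)
s
```

A curve that reaches 1 means every vertex can reach every other, consistent with the single 10-vertex component reported by component.dist above.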

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph (CUG) tests. Here, for example, we employ the difference in reciprocities between the two graphs as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.3333333
        1stQ:  -0.06666667
        Med:    0
        Mean:  -0.001288889
        3rdQ:   0.06666667
        Max:    0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)


CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.06666667
        1stQ:   0.1555556
        Med:    0.2222222
        Mean:   0.2215333
        3rdQ:   0.2888889
        Max:    0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.
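The logic of the two CUG tests above can be sketched in base R. This is an illustration under stated assumptions, not cugtest itself: reciprocity is taken to be the proportion of dyads whose two directed ties agree (both present or both absent), the order-only null draws each arc with probability 1/2, and the density-conditioned null is approximated by placing exactly the observed number of arcs uniformly at random, which may differ in detail from cugtest's density conditioning. The names `dyadic_recip` and `cug_diff` are hypothetical.

```r
# Assumed reciprocity measure: proportion of unordered dyads {i, j} whose
# two directed ties agree (both present or both absent).
dyadic_recip <- function(adj) {
  ut <- upper.tri(adj)
  mean(adj[ut] == t(adj)[ut])
}

# Hypothetical CUG-style Monte Carlo test of f(g1) - f(g2).
# cond = "order":   uniform over all digraphs of order n (each arc w.p. 1/2).
# cond = "density": uniform over digraphs with each graph's observed arc count.
cug_diff <- function(g1, g2, cond = c("order", "density"), reps = 1000) {
  cond <- match.arg(cond)
  n <- nrow(g1)
  offdiag <- which(row(g1) != col(g1))   # no self-ties
  draw <- function(g) {
    r <- matrix(0, n, n)
    if (cond == "order") {
      r[offdiag] <- rbinom(length(offdiag), 1, 0.5)
    } else {
      r[sample(offdiag, sum(g[offdiag]))] <- 1
    }
    r
  }
  obs <- dyadic_recip(g1) - dyadic_recip(g2)
  sim <- replicate(reps, dyadic_recip(draw(g1)) - dyadic_recip(draw(g2)))
  list(obs = obs, p.ge = mean(sim >= obs), p.le = mean(sim <= obs), sim = sim)
}

# Usage on two sparse random digraphs.
set.seed(42)
g1 <- matrix(rbinom(100, 1, 0.2), 10, 10); diag(g1) <- 0
g2 <- matrix(rbinom(100, 1, 0.2), 10, 10); diag(g2) <- 0
res_order <- cug_diff(g1, g2, "order")
res_dens <- cug_diff(g1, g2, "density")
c(res_order$p.ge, res_order$p.le)
c(res_dens$p.ge, res_dens$p.le)
```

As in the sna output above, the two conditioning choices can give quite different reference distributions for the same observed statistic, because sparse graphs have many symmetric null dyads that the order-only null does not reproduce.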

2.5 Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005) and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph-theoretic sense and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
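The definition of exact structural equivalence translates directly into a check on adjacency matrix rows and columns. The base-R sketch below compares out-ties (rows) and in-ties (columns) of two vertices while ignoring the ties between the pair itself, mirroring the removal of $v'$ and $v$ from the neighborhoods; it assumes a graph without self-loops. The helpers `struct_equiv` and `se_classes` are hypothetical names, not sna functions.

```r
# Hypothetical helper: TRUE if u and v are structurally equivalent in the
# (possibly directed) adjacency matrix adj, i.e., they send ties to, and
# receive ties from, exactly the same third parties.
# Assumption: no self-loops; ties between u and v themselves are ignored.
struct_equiv <- function(adj, u, v) {
  others <- setdiff(seq_len(nrow(adj)), c(u, v))
  all(adj[u, others] == adj[v, others]) &&   # same out-neighbors
    all(adj[others, u] == adj[others, v])    # same in-neighbors
}

# Partition vertices into positions (structural equivalence classes) by
# comparing each unassigned vertex with the first member of a class.
se_classes <- function(adj) {
  n <- nrow(adj)
  cls <- rep(NA_integer_, n)
  next_id <- 0L
  for (u in seq_len(n)) {
    if (!is.na(cls[u])) next
    next_id <- next_id + 1L
    cls[u] <- next_id
    for (v in seq_len(n)) {
      if (v > u && is.na(cls[v]) && struct_equiv(adj, u, v)) cls[v] <- next_id
    }
  }
  cls
}

# Star with center 1 and leaves 2-4, plus an isolate 5: the leaves form one
# position, while the center and the isolate each form their own.
star <- matrix(0, 5, 5)
star[1, 2:4] <- 1; star[2:4, 1] <- 1
se_classes(star)
```

Comparing against the first member of each class is sufficient because structural equivalence is transitive.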

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities or differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
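As a concrete instance of the workflow just described, the base-R sketch below computes a Hamming distance between the row-and-column profiles of every pair of vertices and then feeds it to R's hclust and cmdscale. It is an assumption-laden stand-in for sedist, not a reproduction of it: here the entries involving the pair itself are excluded, which may differ from how sedist treats them, and `se_hamming` is a hypothetical name.

```r
# Hypothetical helper: Hamming distance between vertices i and j, counting
# third parties k (k != i, j) to whom their outgoing or incoming ties differ.
# A distance of 0 means exact structural equivalence.
se_hamming <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    k <- setdiff(seq_len(n), c(i, j))
    d[i, j] <- sum(adj[i, k] != adj[j, k]) + sum(adj[k, i] != adj[k, j])
  }
  d
}

# A random sparse digraph on 8 vertices.
set.seed(7)
adj <- matrix(rbinom(64, 1, 0.3), 8, 8); diag(adj) <- 0
d <- se_hamming(adj)

hc <- hclust(as.dist(d))               # hierarchical clustering of vertices
memb <- cutree(hc, k = 3)              # one possible 3-position partition
coords <- cmdscale(as.dist(d), k = 2)  # 2-D MDS embedding of the distances
memb
```

Inspecting the dendrogram (plot(hc)) or the MDS configuration is a common way to judge how many positions the data support before committing to a partition.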


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.
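The density block type described above amounts to averaging the adjacency matrix within each pair of classes. The base-R sketch below forms such a density image from a membership vector; excluding the diagonal follows the "neglecting the diagonal" remark in the text and is our assumption about how self-ties are handled. The helper `block_density` is hypothetical, not sna's blockmodel.

```r
# Hypothetical helper: density image of adj under the class labels memb.
# Entry (a, b) is the mean tie value from members of class a to members of
# class b, excluding self-ties; NA if the block has no off-diagonal cells.
block_density <- function(adj, memb) {
  cls <- sort(unique(memb))
  img <- matrix(NA_real_, length(cls), length(cls),
                dimnames = list(cls, cls))
  offdiag <- row(adj) != col(adj)
  for (a in seq_along(cls)) for (b in seq_along(cls)) {
    cells <- outer(memb == cls[a], memb == cls[b]) & offdiag
    if (any(cells)) img[a, b] <- mean(adj[cells])
  }
  img
}

# Two classes {1, 2} and {3, 4}; every member of class 1 sends a tie to
# every member of class 2, and there are no other ties.
adj <- matrix(0, 4, 4)
adj[1:2, 3:4] <- 1
memb <- c(1, 1, 2, 2)
img <- block_density(adj, memb)
img
adj[order(memb), order(memb)]   # permuted matrix showing the block structure
```

Here the image is a pure one-way block pattern, the kind of clean 0/1 structure that exact structural equivalence classes produce.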

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
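The expansion step can be sketched in base R: assign each vertex of the new graph to a class, look up the interclass density as the probability of each arc, and draw independent Bernoulli ties. This is a minimal illustration, not blockmodel.expand; the helper name `expand_density` is hypothetical, and treating undefined (NA) blocks as probability 0 is our assumption.

```r
# Hypothetical helper: draw a Bernoulli digraph from a density image.
# new_memb gives each new vertex's class as a row/column index of img; the
# new graph may have a different number of vertices than the original.
expand_density <- function(img, new_memb) {
  p <- img[new_memb, new_memb]          # arc i -> j has prob img[c(i), c(j)]
  p[is.na(p)] <- 0                      # assumption: undefined blocks get 0
  g <- matrix(rbinom(length(p), 1, p), nrow(p), ncol(p))
  diag(g) <- 0                          # no self-ties
  g
}

# A two-class image: class 1 ties internally with prob 0.5 and always sends
# to class 2; class 2 sends no ties. Expand to 6 + 4 = 10 vertices.
img <- matrix(c(0.5, 0, 1, 0), 2, 2)
new_memb <- rep(1:2, c(6, 4))
set.seed(3)
g <- expand_density(img, new_memb)

# A Monte Carlo reference sample is just repeated expansion.
sims <- replicate(100, expand_density(img, new_memb), simplify = FALSE)
```

Statistics computed on `sims` give a null distribution under the inhomogeneous Bernoulli model implied by the image.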

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
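The boolean-product characterization of composition translates directly into base R. This minimal sketch (the name compose is hypothetical; it is not the package's operator) also shows why loops can appear: composing a mutual dyad with itself yields a tie from each vertex back to itself.

```r
# Boolean composition: (v, v') is a tie iff some v'' has v -> v'' in A
# and v'' -> v' in B. A %*% B counts such two-step paths.
compose <- function(A, B) {
  (A %*% B > 0) * 1   # threshold the path counts at one
}

A <- matrix(c(0, 1,
              1, 0), 2, 2, byrow = TRUE)  # a mutual dyad with no loops
compose(A, A)                              # the identity matrix: both vertices gain loops
```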

                                                          Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G$, $H$, for instance, the latter is given by

$$\operatorname{cov}(G, H) = \frac{1}{|V|\,(|V| - 1)} \sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right) \tag{3}$$

where $A^G$ and $A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\operatorname{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \operatorname{cov}(G, H) / \sqrt{\operatorname{cov}(G, G)\,\operatorname{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
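As a cross-check on equation (3), the following base-R sketch computes the Hamming distance, graph covariance, and graph correlation over the off-diagonal cells of two adjacency matrices. The function names are hypothetical; restricting to off-diagonal cells is an assumption that matches the $|V|(|V|-1)$ normalization, and sna's hdist, gcov, and gcor remain the functions to use in practice.

```r
# Off-diagonal cells of an adjacency matrix, in column-major order.
offdiag <- function(A) A[row(A) != col(A)]

# Number of ordered dyads on which the two graphs disagree.
hamming <- function(A, B) sum(offdiag(A) != offdiag(B))

# Equation (3): mean cross-product of centered edge variables over |V|(|V|-1) dyads.
graph_cov <- function(A, B) {
  a <- offdiag(A)
  b <- offdiag(B)
  mean((a - mean(a)) * (b - mean(b)))
}

graph_cor <- function(A, B) graph_cov(A, B) / sqrt(graph_cov(A, A) * graph_cov(B, B))

A <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)  # directed 3-cycle 1 -> 2 -> 3 -> 1
B <- t(A)                                    # the same cycle, reversed
hamming(A, B)    # 6: the two edge sets are disjoint
graph_cor(A, B)  # -1: B's off-diagonal cells are the complement of A's
```

Because the normalizing constant cancels in the ratio, graph_cor agrees with R's cor applied to the vectorized off-diagonal cells.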

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
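For very small graphs, the maximization behind a structural correlation can be done by brute force, which is what an exhaustive search amounts to. The sketch below (hypothetical names, base R only) evaluates the graph correlation under every relabeling of the second graph and keeps the largest value. Since there are $n!$ relabelings it is practical only for a handful of vertices, which is why heuristic search matters for larger graphs. It assumes neither graph is empty or complete, because the correlation is undefined for a constant edge set.

```r
offdiag <- function(A) A[row(A) != col(A)]
graph_cor <- function(A, B) cor(offdiag(A), offdiag(B))  # Pearson form of rho(G, H)

# Every permutation of the vector v, one per row (length(v)! rows).
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) cbind(v[i], all_perms(v[-i]))))
}

# Exhaustive structural correlation: best graph correlation over all relabelings of B.
structural_cor <- function(A, B) {
  P <- all_perms(seq_len(nrow(B)))
  best <- -Inf
  for (r in seq_len(nrow(P))) {
    p <- P[r, ]
    best <- max(best, graph_cor(A, B[p, p]))  # same permutation on rows and columns
  }
  best
}

A <- matrix(0, 4, 4)
A[1, 2] <- 1; A[2, 3] <- 1; A[3, 4] <- 1   # directed path 1 -> 2 -> 3 -> 4
q <- c(3, 1, 4, 2)
B <- A[q, q]                               # an isomorphic copy with scrambled labels
graph_cor(A, B)       # labeled correlation: negative here
structural_cor(A, B)  # 1, because some relabeling recovers A exactly
```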

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
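The central graph has a simple characterization for 0/1 graphs on a common vertex set: total Hamming distance decomposes cell by cell, and each cell's contribution is minimized by taking the majority value across the input graphs. The sketch below (hypothetical names) uses that fact. Cells where exactly half the graphs have a tie are set to 0 here, although 1 would be equally optimal, and the diagonal is included, which may differ from centralgraph's own handling.

```r
# Central graph under total Hamming distance: cellwise majority vote.
central_graph <- function(graphs) {            # graphs: list of n x n 0/1 matrices
  share <- Reduce(`+`, graphs) / length(graphs)  # fraction of graphs containing each tie
  (share > 0.5) * 1                              # exact ties (0.5) resolved to 0
}

# Total Hamming distance (all cells) from G to every graph in the list.
total_hamming <- function(G, graphs) sum(sapply(graphs, function(H) sum(G != H)))

gs <- list(matrix(c(0, 1, 1, 0), 2, 2),
           matrix(c(0, 1, 0, 0), 2, 2),
           matrix(c(1, 1, 1, 0), 2, 2))
cg <- central_graph(gs)
total_hamming(cg, gs)  # 2
```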

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available where conditioning on the entire observed structure is inappropriate.
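A QAP test for graph correlation can be sketched in a few lines of base R: the vertex labels of one graph are permuted at random (the same permutation applied to rows and columns), the statistic is recomputed, and the observed value is located in the resulting reference distribution. This illustrates the idea behind qaptest rather than its implementation; the function name qap_cor_test and the pair of one-sided p-values it returns are assumptions.

```r
offdiag <- function(A) A[row(A) != col(A)]
graph_cor <- function(A, B) cor(offdiag(A), offdiag(B))

# Compare the observed graph correlation with its distribution under random
# relabelings of B's vertices (rows and columns permuted together).
qap_cor_test <- function(A, B, reps = 1000) {
  obs <- graph_cor(A, B)
  null <- replicate(reps, {
    p <- sample(nrow(B))
    graph_cor(A, B[p, p])
  })
  list(statistic = obs,
       p_greater = mean(null >= obs),  # estimate of Pr(null >= observed)
       p_less = mean(null <= obs))     # estimate of Pr(null <= observed)
}

set.seed(42)
n <- 10
A <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(A) <- 0
B <- A
flip <- sample(which(row(B) != col(B)), 5)  # perturb five off-diagonal cells
B[flip] <- 1 - B[flip]
res <- qap_cor_test(A, B)
res$statistic
res$p_greater
```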


                                                          Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -15.687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred; its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004
Contingency Table (predicted (rows) x actual (cols)):

      0   1
  0   0   0
  1  39 341

    Total Fraction Correct: 0.8973684
    Fraction Predicted 1s Correct: 0.8973684
    Fraction Predicted 0s Correct: NaN
    False Negative Rate: 0
    False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.
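Because the point estimates of these network regression models are those of their conventional counterparts applied to vectorized adjacency matrices, a logit fit like the netlogit example above can be approximated with glm on the off-diagonal cells. The sketch below regenerates data in the same way as that example using base R only (rbinom stands in for rgraph); it reproduces point estimation, not the QAP diagnostics, and all variable names are illustrative.

```r
set.seed(1)
n <- 20
x <- array(rbinom(4 * n * n, 1, 0.5), dim = c(4, n, n))  # four random predictor graphs
eta <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]             # true log-odds; x[4, , ] is irrelevant
y <- matrix(rbinom(n * n, 1, plogis(eta)), n, n)          # one Bernoulli draw per cell

off <- row(y) != col(y)                                   # drop self-ties: n(n-1) = 380 cells
dat <- data.frame(y  = y[off],
                  x1 = x[1, , ][off], x2 = x[2, , ][off],
                  x3 = x[3, , ][off], x4 = x[4, , ][off])
fit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = dat)
round(coef(fit), 2)
df.residual(fit)  # 375, the residual degrees of freedom reported by netlogit above
```

Replacing glm with lm (and the Bernoulli draw with the deterministic sum) gives the corresponding sketch for the netlm example.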

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired), and estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
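When error rates are treated as known, the posterior probability of a single tie follows from Bayes' rule under the reporting model described above: each informant reports a true tie with probability $1 - e^-$ and reports an absent tie as present with probability $e^+$, independently of other informants. The sketch below (hypothetical function name, base R) computes that posterior for one dyad; it illustrates the likelihood underlying bbnam in the known-error-rate case, not bbnam's Gibbs sampler.

```r
# Posterior Pr(tie i -> j | reports) with known error rates.
#   reports: 0/1 vector, one entry per informant
#   em, ep:  false negative and false positive rates, one per informant
#   prior:   prior probability of the tie
edge_posterior <- function(reports, em, ep, prior = 0.5) {
  lik_tie  <- prod(ifelse(reports == 1, 1 - em, em))  # Pr(reports | tie present)
  lik_none <- prod(ifelse(reports == 1, ep, 1 - ep))  # Pr(reports | tie absent)
  prior * lik_tie / (prior * lik_tie + (1 - prior) * lik_none)
}

em <- rep(0.3, 3)
ep <- rep(0.05, 3)
edge_posterior(c(1, 1, 1), em, ep)  # close to 1
edge_posterior(c(0, 0, 0), em, ep)  # close to 0
edge_posterior(c(1, 0, 0), em, ep)  # about 0.58: one positive report outweighs two negatives
```

The last call shows why false negative and false positive rates must be modeled separately: with $e^+$ much smaller than $e^-$, a reported tie is far more diagnostic than a reported absence.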

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members in turn may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
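The conditional tie probability above is straightforward to evaluate once the bias event probabilities and their counts for a dyad are known. This sketch (hypothetical name biased_tie_prob) computes the formula directly; determining the counts $s_k(i, j, T)$ from a trace is model-specific and is left to the caller.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^{s_k(i, j, T)}
#   p_base: baseline probability Pr(B_e)
#   p_bias: vector of bias event probabilities Pr(B_k)
#   s:      vector of counts s_k(i, j, T) for the dyad, same length as p_bias
biased_tie_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  # the tie fails to appear only if the baseline trial and every bias trial all fail
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

biased_tie_prob(0.1, c(0.3, 0.2), c(0, 0))  # no bias events possible: the baseline, 0.1
biased_tie_prob(0.1, c(0.3, 0.2), c(1, 2))  # 1 - 0.9 * 0.7 * 0.8^2 = 0.5968
```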

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

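$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$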

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly.

Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
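For the pure AR case (no MA term, so $\varepsilon = \nu$), equation (4) can be solved for $y$, giving $y = (I - \sum_i \theta_i W_i)^{-1}(X\beta + \varepsilon)$. The sketch below simulates responses that way and also computes Moran's I in its standard textbook form. Function names are hypothetical, and the exact normalization used by nacf may differ from this form.

```r
# Pure AR case of equations (4)-(5): y = (I - sum_i theta_i W_i)^{-1} (X beta + eps)
simulate_ar <- function(Wlist, theta, X, beta, sigma = 1) {
  n <- nrow(X)
  A <- Reduce(`+`, Map(`*`, theta, Wlist))  # sum_i theta_i W_i
  eps <- rnorm(n, 0, sigma)
  y <- solve(diag(n) - A, X %*% beta + eps)
  list(y = drop(y), eps = eps, A = A)
}

# Moran's I: (n / sum(W)) * sum_ij W_ij z_i z_j / sum_i z_i^2, with z = y - mean(y)
morans_i <- function(y, W) {
  z <- y - mean(y)
  (length(y) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

W <- matrix(c(0, 1, 0, 1,
              1, 0, 1, 0,
              0, 1, 0, 1,
              1, 0, 1, 0), 4, 4, byrow = TRUE)  # undirected 4-cycle
morans_i(c(1, -1, 1, -1), W)                    # -1: every neighbor pair has opposite signs

set.seed(3)
X <- cbind(1, 1:4)
sim <- simulate_ar(list(W / 2), theta = 0.4, X = X, beta = c(1, 0.5))
sim$y
```

Row-normalizing W (here by dividing by the common degree, 2) and keeping $|\theta| < 1$ guarantees that $I - \theta W$ is invertible.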


                                                          Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[,1] <- 2
R> pem[,2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[,1] <- 2
R> pep[,2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

                                                          Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

                                                          Multiple Error Probability Model

                                                          Marginal Posterior Network Distribution

                                                          a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

                                                          Journal of Statistical Software 41

                                                          a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

                                                          a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

                                                          Marginal Posterior Global Error Distribution

                                                          e^- e^+Min 01443951 000042381stQ 03126975 00167584Median 03678306 00294646Mean 03783663 004936883rdQ 04423027 00574099Max 06909116 02262239

                                                          Marginal Posterior Error Distribution (by observer)

                                                          Probability of False Negatives (e^-)

                                                          Min 1stQ Median Mean 3rdQ Maxo1 03132 03599 03798 03864 04073 05071o2 02613 02944 03115 03187 03419 03995

                                                          42 Social Network Analysis with sna

                                                          o3 04148 04724 04937 04948 05213 05649o4 02511 03075 03246 03257 03448 04085o5 01814 02417 02681 02678 02887 03434o6 02881 03531 03761 03766 04046 04488o7 02395 03028 03211 03244 03449 03951o8 01444 02011 02209 02212 02398 02922o9 03708 04358 04529 04578 04787 05503o10 03210 03724 03967 03982 04259 04751o11 03064 03847 04093 04109 04371 05007o12 02367 03132 03354 03349 03607 04455o13 03534 04144 04386 04382 04600 05337o14 02438 02985 03235 03229 03452 04184o15 02585 03299 03510 03519 03706 04704o16 02502 03298 03481 03509 03699 04268o17 01759 02273 02488 02503 02668 03372o18 03959 04468 04646 04710 04922 05812o19 04944 05736 06007 05975 06189 06909o20 03737 04433 04631 04671 04916 05607

                                                          Probability of False Positives (e^+)

                                                          Min 1stQ Median Mean 3rdQ Maxo1 00195433 00397919 00490722 00510872 00585109 01069030o2 01067928 01395067 01555455 01569023 01714084 02262239o3 00084268 00165518 00224858 00236948 00293221 00551761o4 00712109 01047058 01137249 01180402 01320136 01723854o5 00034994 00103378 00150617 00169536 00212638 00468961o6 00004238 00040509 00068522 00082363 00098606 00279960o7 00061597 00136434 00192100 00207973 00266508 00484633o8 00072124 00204896 00260316 00282562 00350608 00593586o9 00804463 01092987 01213202 01246571 01372326 01935724o10 00065188 00135991 00194675 00223006 00278075 00594150o11 00173415 00358252 00445098 00464278 00551955 00828446o12 00185894 00416346 00499440 00516976 00573815 01202316o13 00029818 00108936 00155202 00170049 00209790 00401566o14 00044849 00108034 00166631 00178764 00226294 00486647o15 00084143 00199868 00271149 00290795 00355966 00606914o16 00009067 00078736 00124531 00139218 00187929 00455700o17 00066611 00216195 00273388 00290307 00346110 00691573o18 00846863 01344580 01508170 01485688 01628176 02036186o19 00037608 00117982 00171030 00179751 00225298 00466090o20 00214701 00348032 00433397 00448676 00516594 00936080

MCMC Diagnostics:

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20
    Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1
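The potential scale reduction factor in the diagnostics above is the Gelman and Rubin (1992) convergence statistic: values near 1 suggest that the replicate chains have mixed, while values well above 1 suggest they have not. As a rough sketch of what the number measures (not sna's own code), the basic form without the degrees-of-freedom correction can be computed for one scalar parameter in base R. The draws below are hypothetical, chosen to mimic 5 chains of 20 draws each.

```r
# Basic Gelman-Rubin potential scale reduction factor, sqrt(Rhat), for one
# scalar parameter. 'draws' is an (iterations x chains) matrix of
# post-burn-in draws; no degrees-of-freedom correction is applied.
psrf <- function(draws) {
  n <- nrow(draws)                       # draws per chain
  B <- n * var(colMeans(draws))          # between-chain variance
  W <- mean(apply(draws, 2, var))        # mean within-chain variance
  var_plus <- (n - 1) / n * W + B / n    # pooled estimate of posterior variance
  sqrt(var_plus / W)
}

set.seed(1)
# Hypothetical draws: 5 chains x 20 draws per chain
mixed <- matrix(rnorm(20 * 5, mean = 0.3, sd = 0.05), nrow = 20, ncol = 5)
stuck <- mixed + rep(0:4, each = 20)     # each chain centered somewhere else
psrf(mixed)   # close to 1
psrf(stuck)   # far above 1
```

Because the between-chain term only inflates the estimate, well-mixed chains give values just under or around 1, as in the Max and Med values reported for bbnam.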

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which $e^- + e^+ > 1$. Multiple actors whose error rates satisfy this condition with high posterior probability, or posterior graph distributions that are strongly multimodal, can indicate either excessively “perverse” priors or extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
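The robustness condition above is easy to check directly from posterior draws. The following minimal base-R sketch assumes that draws are stored as a draws-by-observers matrix (the orientation implied by `apply(b$em, 2, median)` in the session above). The Beta-distributed draws are hypothetical stand-ins for `b$em` and `b$ep`, with observer o4 deliberately placed in the perverse region; the helper names are ours, not sna's.

```r
set.seed(42)
n_draws <- 100
n_obs <- 4
# Hypothetical posterior draws: rows = draws, columns = observers
em <- matrix(rbeta(n_draws * n_obs, 2, 8), n_draws, n_obs)   # false negative rates
ep <- matrix(rbeta(n_draws * n_obs, 1, 20), n_draws, n_obs)  # false positive rates
em[, 4] <- rbeta(n_draws, 8, 2)  # observer o4 made "perverse" on purpose
ep[, 4] <- rbeta(n_draws, 6, 4)

# Six-number summary per observer, laid out like the tables above
err_summary <- function(m) {
  s <- t(apply(m, 2, function(v)
    c(quantile(v, c(0, 0.25, 0.5)), mean(v), quantile(v, c(0.75, 1)))))
  dimnames(s) <- list(paste0("o", seq_len(ncol(m))),
                      c("Min", "1stQ", "Median", "Mean", "3rdQ", "Max"))
  s
}
round(err_summary(ep), 4)

# Posterior probability that each observer lies in the region e^- + e^+ > 1
p_perverse <- colMeans(em + ep > 1)
p_perverse
which(p_perverse > 0.5)   # observers worth a closer look
```

Several observers flagged this way would be the warning sign described above; a single flagged observer may simply be an unreliable informant.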

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives), while the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either the false positive or the false negative rate approaches or exceeds 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can remain quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
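To see why the intersection LAS suffers under false negatives, the simpler estimators can be sketched in base R. This is not sna's consensus code. It assumes a hypothetical array `dat[k, i, j]` holding informant k's report of the i -> j tie. The LAS entry for i -> j is built from the reports of i and j only, and the central graph is implemented as a simple majority vote over all informants (exact ties counted as present). The true graph and error rates are made up for illustration.

```r
set.seed(7)
n <- 20
g <- matrix(rbinom(n * n, 1, 0.2), n, n)   # hypothetical true digraph
diag(g) <- 0
fn <- 0.3; fp <- 0.05                      # hypothetical error rates

# dat[k, i, j] = informant k's report of the i -> j tie
dat <- array(0, c(n, n, n))
for (k in 1:n) {
  noise <- matrix(runif(n * n), n, n)
  rep_k <- ifelse(g == 1, noise > fn, noise < fp) * 1  # miss with prob fn, invent with prob fp
  diag(rep_k) <- 0
  dat[k, , ] <- rep_k
}

las_int <- las_uni <- matrix(0, n, n)
for (i in 1:n) for (j in 1:n) {
  if (i != j) {
    las_int[i, j] <- dat[i, i, j] * dat[j, i, j]      # both i and j report i -> j
    las_uni[i, j] <- max(dat[i, i, j], dat[j, i, j])  # either i or j reports it
  }
}
central <- (apply(dat, c(2, 3), mean) >= 0.5) * 1     # majority of all informants

acc <- c(LAS.intersection = mean(las_int == g),
         LAS.union        = mean(las_uni == g),
         central          = mean(central == g))
round(acc, 3)
```

With a 30% false negative rate, a true tie survives the intersection only when both informants see it (probability 0.49), whereas a majority over 20 informants almost always recovers it.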

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. In this case we show an example that includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)
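Written out, the simulation code that follows generates data from the model below. The AR term acts on the outcome through $W_1$ and the MA term acts on the disturbances through $W_2$. In the code, `r1`, `r2`, and `sigma` play the roles of $\rho_1$, $\rho_2$, and $\sigma$, and the two `qr.solve` calls compute the two matrix inverses. The fitted `rho1.1`, `rho2.1`, and `Sigma` values in the output can be compared against the simulated 0.2, 0.3, and 0.1.

```latex
\begin{aligned}
y &= \rho_1 W_1 y + X\beta + e, \\
e &= \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I), \\
\Rightarrow\; y &= (I - \rho_1 W_1)^{-1}\left[ X\beta + (I - \rho_2 W_2)^{-1} \nu \right].
\end{aligned}
```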

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
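The two-standard-deviation rule used to color edges in the net influence plot can be sketched as follows. How the plot method derives the net influence matrix itself is not reproduced here. The function simply assumes some n x n matrix `infl`, with `infl[i, j]` the net influence of i on j, and leaves self-influence on the diagonal out of the mean and standard deviation (our choice). The toy matrix is hypothetical.

```r
# Classify (i, j) pairs by the two-standard-deviation rule described above.
# 'infl' is an n x n matrix with infl[i, j] = net influence of i on j;
# diagonal (self) entries are excluded from the mean and SD.
flag_influence <- function(infl, k = 2) {
  off <- row(infl) != col(infl)
  m <- mean(infl[off])
  s <- sd(infl[off])
  list(green = which(off & infl >= m + k * s, arr.ind = TRUE),  # unusually positive
       red   = which(off & infl <= m - k * s, arr.ind = TRUE))  # unusually negative
}

# Hypothetical 10 x 10 influence matrix with one strong positive and one
# strong negative pair
infl <- matrix(0, 10, 10)
infl[1, 2] <- 5
infl[3, 4] <- -5
flags <- flag_influence(infl)
flags$green   # row 1, col 2
flags$red     # row 3, col 4
```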

                                                          3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines that is diverse and that covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                                          Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances (ν); Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.]

                                                          References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). “Social Structure from Multiple Networks. II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software, Version 2.067. URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions in Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project <http://statnetproject.org/>, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project <http://statnetproject.org/>, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows, Version 4.75. URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis, Version 3.1. URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                                          Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.


R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00
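The structure statistics printed above can be read as $s_k$, the fraction of ordered vertex pairs (i, j), counting i = j, whose geodesic distance is at most k. This reading matches the printed 0.10 = 1/10 at k = 0 for an order-10 graph, but it is our interpretation of the measure, not a quotation of sna's documentation. A base-R sketch using Floyd-Warshall shortest paths (helper names are ours):

```r
# Geodesic distances by Floyd-Warshall; unreachable pairs stay at Inf.
geodesic_dist <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  D[A > 0] <- 1
  diag(D) <- 0
  # relax every pair through intermediate vertex m
  for (m in seq_len(n)) D <- pmin(D, outer(D[, m], D[m, ], "+"))
  D
}

# s_k = share of ordered pairs (i, j), i = j included, with d(i, j) <= k
structure_stats <- function(A) {
  D <- geodesic_dist(A)
  k <- 0:(nrow(A) - 1)
  setNames(sapply(k, function(kk) mean(D <= kk)), k)
}

# Hypothetical directed path 1 -> 2 -> 3
P <- matrix(0, 3, 3)
P[1, 2] <- 1
P[2, 3] <- 1
structure_stats(P)   # 3/9, 5/9, 6/9
```

Because unreachable pairs keep an infinite distance, a series that levels off below 1 signals a graph that is not strongly connected.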

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph (CUG) tests. Here, for example, we employ the difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.
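The logic of such a test can be sketched in base R before turning to the sna calls. This is not cugtest's implementation. Reciprocity is taken as the share of unordered pairs whose two directed ties agree, which we intend to mirror grecip's default dyadic measure. Conditioning on order draws each tie with probability 1/2. For the density-conditioned case we swap in a shuffle of the observed ties, which holds the exact edge count fixed. The two input graphs are hypothetical.

```r
# Dyadic reciprocity: share of unordered pairs {i, j} whose two directed
# ties agree (mutual or null).
dyadic_recip <- function(A) {
  up <- upper.tri(A)
  mean(A[up] == t(A)[up])
}

# Random digraph of the same order as A.
#   "order": each off-diagonal tie present with probability 1/2
#   "edges": A's off-diagonal entries shuffled (exact tie count kept)
rand_graph <- function(A, cmode) {
  off <- row(A) != col(A)
  R <- matrix(0, nrow(A), ncol(A))
  R[off] <- if (cmode == "order") rbinom(sum(off), 1, 0.5) else sample(A[off])
  R
}

cug_diff <- function(A1, A2, cmode = "order", reps = 1000) {
  obs <- dyadic_recip(A1) - dyadic_recip(A2)
  sims <- replicate(reps, dyadic_recip(rand_graph(A1, cmode)) -
                          dyadic_recip(rand_graph(A2, cmode)))
  tol <- 1e-9  # so that mathematically tied values count as ties
  c(stat = obs, p_ge = mean(sims >= obs - tol), p_le = mean(sims <= obs + tol))
}

set.seed(3)
# Two hypothetical order-10 digraphs with different densities
A1 <- matrix(rbinom(100, 1, 0.2), 10, 10); diag(A1) <- 0
A2 <- matrix(rbinom(100, 1, 0.5), 10, 10); diag(A2) <- 0
cug_diff(A1, A2, cmode = "order")
cug_diff(A1, A2, cmode = "edges")
```

As in the sna output below, the two conditioning choices can give quite different reference distributions for the same observed statistic, because reciprocity depends strongly on density.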

                                                            Rgt g4 lt- g1[12]

                                                            Rgt g4[2] lt- g2[1]

                                                            Rgt cug lt- cugtest(g4 gliop cmode = order GFUN = grecip OP = -

                                                            + g1 = 1 g2 = 2)

                                                            Rgt summary(cug)

                                                            CUG Test Results

                                                            Estimated p-valuesp(f(rnd) gt= f(d)) 0299p(f(rnd) lt= f(d)) 0708

                                                            Test DiagnosticsTest Value (f(d)) 004444444Replications 1000Distribution Summary

                                                            Min -033333331stQ -006666667Med 0Mean -00012888893rdQ 006666667Max 03555556

                                                            Rgt cug lt- cugtest(g4 gliop GFUN = grecip OP = - g1 = 1 g2 = 2)

                                                            Rgt summary(cug)

                                                            Journal of Statistical Software 31

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.06666667
                1stQ:    0.1555556
                Med:     0.2222222
                Mean:    0.2215333
                3rdQ:    0.2888889
                Max:     0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.
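The logic of a CUG test on a difference of statistics can be sketched in base R. This is not sna's cugtest: it assumes a dyadic reciprocity index (the share of unordered pairs whose two edge variables agree, one common definition that may differ from grecip's defaults), treats "conditioning on order" as drawing every edge with probability 1/2, and, for the density-conditioned case, draws graphs uniformly with each observed graph's exact edge count. The helpers dyadic_recip, rand_digraph_m, and cug_diff are hypothetical.

```r
# Dyadic reciprocity (assumed definition): share of unordered pairs {i, j}
# whose two directed edge variables agree (both present or both absent)
dyadic_recip <- function(a) {
  up <- upper.tri(a)
  mean(a[up] == t(a)[up])
}

# Uniform draw from loop-free digraphs on n vertices with exactly m edges
rand_digraph_m <- function(n, m) {
  a <- matrix(0L, n, n)
  off <- which(row(a) != col(a))
  a[sample(off, m)] <- 1L
  a
}

# Monte Carlo CUG test of f(g1) - f(g2) for two graphs of the same order.
#   cond = "order": each edge present with probability 1/2
#   cond = "edges": uniform over digraphs with each graph's observed edge count
cug_diff <- function(g1, g2, f, cond = c("order", "edges"), reps = 1000) {
  cond <- match.arg(cond)
  n <- nrow(g1)
  draw <- function(g) {
    if (cond == "order") {
      a <- matrix(rbinom(n * n, 1, 0.5), n, n)
      diag(a) <- 0L
      a
    } else {
      rand_digraph_m(n, sum(g[row(g) != col(g)]))
    }
  }
  obs <- f(g1) - f(g2)
  sims <- replicate(reps, f(draw(g1)) - f(draw(g2)))
  list(obs = obs, p_ge = mean(sims >= obs), p_le = mean(sims <= obs))
}

set.seed(1)
g1 <- matrix(rbinom(100, 1, 0.2), 10); diag(g1) <- 0
g2 <- matrix(rbinom(100, 1, 0.5), 10); diag(g2) <- 0
res <- cug_diff(g1, g2, dyadic_recip, cond = "edges", reps = 500)
```

As in the summaries above, the two one-sided p-values can sum to more than 1, because simulated values tied with the observed one count toward both.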

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of “role” and “position” have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are “similar” in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
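A minimal base-R sketch of the Hamming variant follows, assuming an unvalued digraph. It is not sedist's implementation (whose options and treatment of the diagonal may differ); the helper se_hamming is hypothetical, while hclust and cutree are base R. Vertices at distance 0 are exactly structurally equivalent.

```r
# Hamming-type structural equivalence distance between vertices i and j:
# disagreements between their out-rows plus their in-columns, ignoring the
# entries that involve i or j themselves (cf. N(v) \ {v'} = N(v') \ {v})
se_hamming <- function(a) {
  n <- nrow(a)
  d <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      keep <- setdiff(seq_len(n), c(i, j))
      d[i, j] <- d[j, i] <- sum(a[i, keep] != a[j, keep]) +
        sum(a[keep, i] != a[keep, j])
    }
  }
  d
}

# Toy digraph: vertices 1 and 2 both send ties to 3 and 4; vertex 5 sends to 1 and 2
a <- matrix(0, 5, 5)
a[1, 3:4] <- 1
a[2, 3:4] <- 1
a[5, 1:2] <- 1

d <- se_hamming(a)
cl <- hclust(as.dist(d))          # complete linkage (hclust's default)
classes <- cutree(cl, h = 0.5)    # merges only vertices at distance 0
classes
```

Here the cut recovers the positions {1, 2}, {3, 4}, and {5}; with real data one would typically cut at a larger height to group approximately equivalent vertices.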


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $(i, j)$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to “blocks” of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or by number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (the mean value of the cells in the associated block of the adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.
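The density reduction amounts to averaging adjacency cells within each pair of classes. The sketch below assumes an unvalued digraph and a membership vector, and excludes the diagonal (our assumption; sna's blockmodel has its own options for this). block_density is a hypothetical helper, not part of sna.

```r
# Density block image: entry (k, l) is the mean of a[i, j] over i in class k
# and j in class l with i != j. Blocks with no off-diagonal cells are left NA.
block_density <- function(a, memb) {
  classes <- sort(unique(memb))
  K <- length(classes)
  dens <- matrix(NA_real_, K, K, dimnames = list(classes, classes))
  offdiag <- row(a) != col(a)
  for (k in seq_len(K)) {
    for (l in seq_len(K)) {
      cell <- outer(memb == classes[k], memb == classes[l]) & offdiag
      if (any(cell)) dens[k, l] <- mean(a[cell])
    }
  }
  dens
}

a <- matrix(0, 5, 5)
a[1, 3:4] <- 1
a[2, 3:4] <- 1
a[5, 1:2] <- 1
memb <- c(1, 1, 2, 2, 3)   # hypothetical membership vector (e.g., from cutree)
block_density(a, memb)
```

Because classes 1, 2, and 3 here are exact structural equivalence classes, every defined block is either 0 or 1, as the text notes; the singleton class has no within-block dyads, so its diagonal block is NA.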

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
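Density expansion, as described above, can be sketched as drawing an inhomogeneous Bernoulli graph whose edge probabilities are looked up from the block image. expand_density is a hypothetical helper that only approximates what blockmodel.expand is described as doing; treating undefined (NA) blocks as empty is our own assumption.

```r
# Expand a density block image into a random digraph: class k receives
# sizes[k] vertices, and each ordered pair (i, j), i != j, is an edge
# independently with probability equal to the density of its block.
expand_density <- function(dens, sizes) {
  memb <- rep(seq_along(sizes), sizes)
  p <- dens[memb, memb, drop = FALSE]
  p[is.na(p)] <- 0          # assumption: undefined blocks are treated as empty
  n <- length(memb)
  g <- matrix(rbinom(n * n, 1, p), n, n)
  diag(g) <- 0L
  g
}

dens <- matrix(c(0.9, 0.1,
                 0.2, 0.7), 2, byrow = TRUE)
set.seed(5)
g <- expand_density(dens, c(3, 4))   # a 7-vertex draw from a 2-class image
```

Calling expand_density repeatedly yields a Monte Carlo sample from the corresponding inhomogeneous Bernoulli graph model, and the sizes argument lets the new graph differ in order from the one the image was estimated on.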

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
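The Boolean inner product mentioned above is a one-liner in base R. compose is a hypothetical stand-in for illustration, not sna's composition operator; the second example shows how loops arise from loop-free inputs.

```r
# Boolean composition: (v, v') is an edge iff some v'' has (v, v'') in G and
# (v'', v') in G'; computed as a thresholded matrix product.
compose <- function(a, b) (a %*% b > 0) * 1L

# A path 1 -> 2 -> 3 composed with itself yields the two-step tie 1 -> 3
p <- matrix(0, 3, 3)
p[1, 2] <- 1
p[2, 3] <- 1
compose(p, p)

# A loop-free mutual dyad composed with itself yields loops on both vertices
m <- matrix(c(0, 1, 1, 0), 2)
compose(m, m)
```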

                                                            Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert 1987; Krackhardt 1987a, 1988; Banks and Carley 1994; Butts and Carley 2005; Butts 2007, for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\operatorname{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \qquad (3)$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\operatorname{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \operatorname{cov}(G, H) / \sqrt{\operatorname{cov}(G, G)\,\operatorname{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
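Equation (3) translates directly into base R for unvalued digraphs on a common vertex set. This is a sketch, not gcor, gcov, or hdist: the divisor $|V|(|V|-1)$ follows equation (3), and if sna's gcov uses a different constant divisor, covariances would be rescaled but the correlation would be unchanged. The helper names are hypothetical.

```r
# Off-diagonal (i != j) edge variables of an adjacency matrix
edge_vars <- function(a) a[row(a) != col(a)]

# Graph covariance as in equation (3): divide by |V|(|V| - 1)
graph_cov <- function(a, b) {
  x <- edge_vars(a)
  y <- edge_vars(b)
  sum((x - mean(x)) * (y - mean(y))) / length(x)
}

# Graph correlation rho(G, H)
graph_cor <- function(a, b) graph_cov(a, b) / sqrt(graph_cov(a, a) * graph_cov(b, b))

# Hamming distance: number of ordered pairs on which the graphs disagree
hamming <- function(a, b) sum(edge_vars(a) != edge_vars(b))

set.seed(2)
g <- matrix(rbinom(36, 1, 0.4), 6); diag(g) <- 0
h <- matrix(rbinom(36, 1, 0.4), 6); diag(h) <- 0
graph_cor(g, h)
hamming(g, h)
```

Because the divisor cancels, graph_cor agrees with R's cor applied to the two vectors of off-diagonal edge variables.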

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the “labeling” component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by applying eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
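The following base-R sketch illustrates this workflow on a hypothetical sample of six random digraphs: network PCA via eigen on the graph correlation matrix, metric MDS via cmdscale on Hamming distances, and a central graph. It is not sna's code; in particular, the central graph is computed here from the observation that total Hamming distance decomposes edge by edge, so the minimizer takes the majority value on each edge (breaking exact ties toward absence is an arbitrary choice of ours).

```r
set.seed(42)
n <- 10   # vertices
m <- 6    # graphs in the hypothetical sample
graphs <- replicate(m, {
  a <- matrix(rbinom(n * n, 1, 0.3), n)
  diag(a) <- 0
  a
}, simplify = FALSE)

off <- row(graphs[[1]]) != col(graphs[[1]])
E <- sapply(graphs, function(a) a[off])   # one column of edge variables per graph

Rg <- cor(E)                              # graph correlation matrix
pca <- eigen(Rg, symmetric = TRUE)        # network PCA: components of Rg
H <- as.matrix(dist(t(E), method = "manhattan"))  # Hamming distances for 0/1 data
mds <- cmdscale(H, k = 2)                 # metric MDS of the graph set

# Central graph: edgewise majority minimizes the total Hamming distance
central <- as.integer(rowMeans(E) > 0.5)
total_dist <- function(v) sum(abs(E - v))
```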

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, for cases where conditioning on the entire observed structure is inappropriate.
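The QAP idea can be sketched in a few lines of base R: permute the vertex labels of one graph (rows and columns together), recompute the statistic, and locate the observed value in the resulting distribution. qap_cor is a hypothetical helper, not qaptest, and the noisy-copy data are invented for illustration.

```r
# QAP test of graph correlation between two digraphs on the same vertex set
qap_cor <- function(a, b, reps = 1000) {
  off <- row(a) != col(a)
  stat <- function(x, y) cor(x[off], y[off])
  obs <- stat(a, b)
  n <- nrow(b)
  sims <- replicate(reps, {
    p <- sample.int(n)         # random relabeling of b's vertices
    stat(a, b[p, p])
  })
  list(obs = obs, p_ge = mean(sims >= obs), p_le = mean(sims <= obs))
}

set.seed(9)
a <- matrix(rbinom(144, 1, 0.5), 12); diag(a) <- 0
b <- a
flip <- sample(which(row(a) != col(a)), 10)
b[flip] <- 1 - b[flip]         # b is a noisy copy of a
res <- qap_cor(a, b, reps = 500)
```

Permuting rows and columns jointly preserves all of b's structural properties, so the null distribution reflects only the loss of the vertex correspondence between the two graphs.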


                                                            Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled (structural) correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1
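For graphs as small as these, exhaustive search over relabelings is cheap enough to write out directly. The base-R sketch below is not gscor's implementation; perms and struct_cor_exhaustive are hypothetical helpers. Its cost grows as $n!$, which is why heuristics are needed beyond small orders.

```r
# All permutations of 1..n as rows of a matrix (recursive; small n only)
perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  p <- perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(k) cbind(k, ifelse(p >= k, p + 1L, p))))
}

# Structural correlation by exhaustive search: the maximum graph correlation
# between a and any relabeling of b
struct_cor_exhaustive <- function(a, b) {
  off <- row(a) != col(a)
  P <- perms(nrow(b))
  max(apply(P, 1, function(p) cor(a[off], b[p, p][off])))
}

a <- matrix(c(0, 1, 0, 0, 0,
              0, 0, 1, 0, 0,
              0, 0, 0, 1, 1,
              1, 0, 0, 0, 0,
              0, 0, 0, 1, 0), 5, byrow = TRUE)
b <- a[c(3, 1, 5, 2, 4), c(3, 1, 5, 2, 4)]   # an isomorphic relabeling of a
struct_cor_exhaustive(a, b)
```

Since b is isomorphic to a, the exhaustive maximum is exactly 1, whereas a heuristic search may occasionally stop short of it.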

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
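The statement that these models are analogous to conventional regressions on vectorized adjacency matrices can be checked with base R's lm. This sketch assumes the diagonal is excluded, which is consistent with the 375 residual degrees of freedom in the output above (20 × 19 = 380 dyads minus 5 coefficients). It reproduces only the point estimates, not the QAP diagnostics, and the simulated predictor graphs are hypothetical.

```r
set.seed(7)
n <- 20
# Four hypothetical Bernoulli(0.5) predictor graphs without loops
x <- replicate(4, {
  a <- matrix(rbinom(n * n, 1, 0.5), n)
  diag(a) <- 0
  a
}, simplify = FALSE)
y <- x[[1]] + 4 * x[[2]] + 2 * x[[3]]

off <- row(y) != col(y)                 # drop the diagonal: n(n - 1) = 380 dyads
d <- data.frame(y = y[off], sapply(x, function(a) a[off]))
names(d) <- c("y", paste0("x", 1:4))
fit <- lm(y ~ x1 + x2 + x3 + x4, data = d)
coef(fit)
```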

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
         352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004
Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

        Total Fraction Correct: 0.8973684
        Fraction Predicted 1s Correct: 0.8973684
        Fraction Predicted 0s Correct: NaN
        False Negative Rate: 0
        False Positive Rate: 1

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not effective at predicting the absence of ties (it predicts no null dyads at all, giving a false positive rate of 1); this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.
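netlogit's point estimates can likewise be compared with a standard logistic regression on the vectorized dyads. This base-R sketch assumes, as for netlm, that the diagonal is dropped; its goodness-of-fit summaries are glm's own and are not guaranteed to match the netlogit printout above, and the simulated graphs are hypothetical.

```r
set.seed(11)
n <- 20
x <- replicate(4, {
  a <- matrix(rbinom(n * n, 1, 0.5), n)
  diag(a) <- 0
  a
}, simplify = FALSE)
yl <- x[[1]] + 4 * x[[2]] + 2 * x[[3]]   # true linear predictor (no intercept)
yp <- 1 / (1 + exp(-yl))                  # inverse logit, as in the example
y <- matrix(rbinom(n * n, 1, yp), n)
diag(y) <- 0

off <- row(y) != col(y)
d <- data.frame(y = y[off], sapply(x, function(a) a[off]))
names(d) <- c("y", paste0("x", 1:4))
fit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = d)

# Predicted (rows) x actual (cols), thresholding fitted probabilities at 0.5
tab <- table(predicted = as.integer(fitted(fit) > 0.5), actual = d$y)
```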

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which they report, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired), and estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
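To make the reporting process concrete, suppose the informants' error rates were known. Then, with edges independent a priori and informants' reports independent given the true graph, Bayes' rule gives each edge's posterior probability directly; a Gibbs sampler over graph and error rates would use a conditional of this form for each edge. This simplification is ours: bbnam also samples the error rates, and edge_posterior and the rates below are hypothetical.

```r
# Posterior probability that one edge is present, given binary reports r from
# k informants with known false positive rates ep and false negative rates em,
# and prior edge probability p0
edge_posterior <- function(r, ep, em, p0 = 0.5) {
  like1 <- prod(ifelse(r == 1, 1 - em, em))  # Pr(reports | edge present)
  like0 <- prod(ifelse(r == 1, ep, 1 - ep))  # Pr(reports | edge absent)
  p0 * like1 / (p0 * like1 + (1 - p0) * like0)
}

ep <- rep(0.04, 5)   # hypothetical low false positive rates
em <- rep(0.40, 5)   # hypothetical high false negative rates
edge_posterior(c(1, 1, 0, 1, 0), ep, em)   # three of five informants report the edge
edge_posterior(c(0, 0, 0, 0, 0), ep, em)   # no informant reports it
```

With false positives rare and false negatives common, a few positive reports are strong evidence for an edge, while unanimous silence still leaves a small posterior probability that the edge exists.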

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ occurs with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
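The conditional nomination probability above can be evaluated directly. The sketch below only computes $\Pr(e_{ij} \mid T)$ for given inputs; it does not simulate a trace or estimate parameters as bn does, and biased_net_prob and the example values are hypothetical.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
biased_net_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Hypothetical values: baseline 0.1; two bias types with probabilities 0.3 and
# 0.2, which have had 1 and 2 opportunities, respectively, for this dyad
biased_net_prob(0.1, c(0.3, 0.2), c(1, 2))   # 1 - 0.9 * 0.7 * 0.8^2 = 0.5968
biased_net_prob(0.1, c(0.3, 0.2), c(0, 0))   # no bias opportunities: baseline 0.1
```

The product form reflects the assumption that the baseline and bias events are independent: $e_{ij}$ fails to occur only if every one of them fails.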

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990, and Cliff and Ord 1973; Anselin 1988, for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

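$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$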

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
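The AR-only case of equation (4) with a single weight matrix has the reduced form $y = (I - \theta W)^{-1}(X\beta + \varepsilon)$, which makes simulation straightforward. The base-R sketch below simulates such data and computes the standard Moran's I, $I = (n / S_0) \sum_{ij} w_{ij} z_i z_j / \sum_i z_i^2$ with $z = y - \bar{y}$ and $S_0 = \sum_{ij} w_{ij}$. It is neither lnam nor nacf (whose normalizations may differ); the row-normalized random weight matrix and all parameter values are hypothetical.

```r
set.seed(3)
n <- 30
# Hypothetical weight matrix: a random digraph, row-normalized
W <- matrix(rbinom(n * n, 1, 0.15), n)
diag(W) <- 0
rs <- rowSums(W)
W[rs > 0, ] <- W[rs > 0, , drop = FALSE] / rs[rs > 0]

X <- cbind(1, rnorm(n))
beta <- c(1, 2)
theta <- 0.5
e <- rnorm(n)
# AR-only model y = theta W y + X beta + e, solved for y
y <- drop(solve(diag(n) - theta * W, X %*% beta + e))

# Moran's I for response vector y and weight matrix w
morans_i <- function(y, w) {
  z <- y - mean(y)
  (length(y) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}
morans_i(y, W)
```

With $|\theta| < 1$ and a row-normalized $W$, the matrix $I - \theta W$ is invertible, so the reduced form is well defined.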


                                                            Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)                       # true (latent) network
R> ep <- rbeta(20, 1, 25)                # false positive rates
R> em <- rbeta(20, 15, 25)               # false negative rates
R> dat <- array(dim = c(20, 20, 20))     # informant x sender x receiver
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)   # uninformative network prior
R> pem <- matrix(nrow = 20, ncol = 2)    # Beta(2, 11) prior on each e^-
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)    # Beta(2, 11) prior on each e^+
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

                                                            Multiple Error Probability Model

                                                            Marginal Posterior Network Distribution

                                                            a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

                                                            Journal of Statistical Software 41

                                                            a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

                                                            a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

                                                            Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

                                                            Marginal Posterior Error Distribution (by observer)

                                                            Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

                                                            Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

                                                            MCMC Diagnostics

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
  Max: 1.003116
  Med: 0.9992194
  IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))       # true vs. posterior median e^-
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))       # true vs. posterior median e^+
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)   # share of dyads recovered
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions that are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
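To see why the region em + ep > 1 is called "perverse," consider the simplest case behind bbnam's likelihood: one edge, a 0.5 prior, and error rates treated as known. The base-R sketch below (hypothetical helpers edge_posterior and perverse_share) applies Bayes' rule to a set of reports; it deliberately omits bbnam's joint MCMC sampling of error rates and network, so it is an illustration, not the package's algorithm.

```r
# Posterior probability that a single edge exists, given binary reports from
# several informants whose error rates are treated as KNOWN.
# Simplification: bbnam also samples the error rates and the full network by MCMC.
edge_posterior <- function(reports, em, ep, prior = 0.5) {
  # Likelihood if the edge is present: report 1 w.p. 1 - em, report 0 w.p. em
  lik1 <- prod(ifelse(reports == 1, 1 - em, em))
  # Likelihood if the edge is absent: report 1 w.p. ep, report 0 w.p. 1 - ep
  lik0 <- prod(ifelse(reports == 1, ep, 1 - ep))
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

# An ordinary informant: a reported tie raises the edge probability
edge_posterior(1, em = 0.2, ep = 0.1)   # 0.8888889
# A "perverse" informant (em + ep > 1): a reported tie LOWERS it
edge_posterior(1, em = 0.9, ep = 0.6)   # 0.1428571

# Share of posterior draws falling in the perverse region, per informant.
# Assumes draws-by-informant matrices such as b$em and b$ep above.
perverse_share <- function(em_draws, ep_draws) {
  colMeans(em_draws + ep_draws > 1)
}
```

When em + ep > 1, an informant is more likely to report a tie that is absent than one that is present, so the report points the wrong way. perverse_share(b$em, b$ep) would give, for each informant, the fraction of posterior draws in that region; values near 1 for several informants are the warning sign discussed above.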

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1
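The simplest of these rules can be sketched in a few lines of base R. Here dat[k, i, j] is informant k's report of the tie i -> j, matching how dat was filled in the simulation. The helpers central_graph and las are hypothetical and are not sna's consensus() code; treating an exact 50/50 split as a tie in the central graph is an assumption.

```r
# Illustrative consensus rules on an m x n x n array of binary reports,
# where dat[k, i, j] is informant k's report of the tie i -> j.
# Hypothetical helpers for exposition; not sna's consensus() code.

# Central graph: keep a tie if at least half of the informants report it
central_graph <- function(dat) {
  1 * (apply(dat, c(2, 3), mean) >= 0.5)
}

# Locally aggregated structures: tie i -> j is judged only by informants
# i and j (this assumes informant k is also vertex k).
las <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      sender   <- dat[i, i, j]  # i's report of its own tie to j
      receiver <- dat[j, i, j]  # j's report of the same tie
      out[i, j] <- if (rule == "intersection") min(sender, receiver) else max(sender, receiver)
    }
  }
  out
}

# Toy data: informants 1 and 3 report the tie 1 -> 2; informant 2 misses it
toy <- array(0, dim = c(3, 3, 3))
toy[1, 1, 2] <- 1
toy[3, 1, 2] <- 1
las(toy, "intersection")[1, 2]  # 0: receiver 2 did not report it
las(toy, "union")[1, 2]         # 1
central_graph(toy)[1, 2]        # 1: two of three informants report it
```

Because the intersection rule needs both endpoints to report a tie, a false negative from either one deletes it, which is why it fares worst when false negative rates are high.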

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
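The sensitivity of the central graph to error rates near 0.5 can be seen with a binomial calculation. The sketch below assumes m informants who err independently with a common false negative rate err, and a majority rule that keeps a tie when at least half of them report it; majority_correct is a hypothetical helper, and an analogous calculation applies to false positives on absent ties.

```r
# Probability that a majority-rule central graph keeps one TRUE tie,
# assuming m informants err independently with a common false negative
# rate `err`. Under the ">= half" rule the tie survives if at least
# ceiling(m / 2) informants report it.
majority_correct <- function(err, m = 20) {
  k <- ceiling(m / 2)
  # P(at least k reports), each report made with probability 1 - err
  pbinom(k - 1, size = m, prob = 1 - err, lower.tail = FALSE)
}

# Recovery probability across a range of false negative rates
round(sapply(c(0.1, 0.375, 0.5, 0.6), majority_correct), 3)
```

Recovery probability falls steadily as err rises and drops below one half once err moves past 0.5, since the majority then points the wrong way; the exact crossover depends on how ties are broken.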

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example that includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)                      # AR weight matrix
R> w2 <- rgraph(50)                      # MA weight matrix
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)                 # MA disturbances
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)     # AR responses
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4
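The information criteria printed by summary(fit) can be reproduced from the reported log-likelihoods with the standard formulas AIC = -2 logL + 2k and BIC = -2 logL + k log(n). The parameter counts below are inferred from the printed degrees of freedom with n = 50 (50 - 42 = 8 for the full model: five coefficients, rho1.1, rho2.1, and Sigma; 50 - 48 = 2 for the null model); that lnam uses exactly these formulas internally is an assumption, checked here only against the printed values.

```r
# Reproduce the printed AIC/BIC values from the reported log-likelihoods.
n_obs    <- 50
ll_model <- 58.47;  k_model <- 8   # 5 coefficients + rho1.1 + rho2.1 + Sigma
ll_null  <- -82.48; k_null  <- 2   # mean and standard deviation

aic <- function(ll, k) -2 * ll + 2 * k
bic <- function(ll, k) -2 * ll + k * log(n_obs)

c(model = aic(ll_model, k_model), null = aic(ll_null, k_null))  # about -100.9 and 169.0
c(model = bic(ll_model, k_model), null = bic(ll_null, k_null))  # about -85.6 and 172.8
aic(ll_null, k_null) - aic(ll_model, k_model)  # about 269.9 (AIC difference)
bic(ll_null, k_null) - bic(ll_model, k_model)  # about 258.4
```

The last line matches the printed heuristic log Bayes factor, which here coincides with the difference in BIC between the null and fitted models.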

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
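The green/red classification described above can be sketched as follows. This is not the code behind plot(fit): it assumes a pure network AR model y = (I - rho W)^{-1}(X beta + e), ignores the MA term, and takes i's net influence on j to be entry [j, i] of (I - rho W)^{-1}. The helpers net_influence and flag_influence are hypothetical.

```r
# Net influence under a pure AR model y = (I - rho W)^{-1} (X beta + e):
# y_j = sum_i A[j, i] u_i, so A[j, i] is the total effect of a shock at i on j.
# Assumption for illustration; ignores the MA term and plot.lnam internals.
net_influence <- function(W, rho) {
  A <- solve(diag(nrow(W)) - rho * W)
  t(A)  # transpose so that entry [i, j] is i's net influence on j
}

# Flag pairs at least two standard deviations above ("green") or below
# ("red") the mean off-diagonal net influence.
flag_influence <- function(infl) {
  diag(infl) <- NA                # exclude self-influence
  m <- mean(infl, na.rm = TRUE)
  s <- sd(as.vector(infl), na.rm = TRUE)
  list(high = !is.na(infl) & infl >= m + 2 * s,
       low  = !is.na(infl) & infl <= m - 2 * s)
}

# Two mutually tied vertices with rho = 0.5
W_toy <- matrix(c(0, 1, 1, 0), 2)
net_influence(W_toy, 0.5)         # 4/3 on the diagonal, 2/3 off it
```

Excluding the diagonal keeps self-influence out of the mean and standard deviation used for the thresholds.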

                                                            3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines that covers many of the methods currently seeing wide use within the field. It is hoped that the inclusion of such tools, together with the other packages of the statnet ensemble, within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                                            Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.

                                                            References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks. II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

                                                            R Development Core Team (2007) R A Language and Environment for Statistical Com-puting R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0Version 261 URL httpwwwR-projectorg

                                                            Richards WD Seary AJ (2006) MultiNet for Windows Version 475 URL httpwwwsfuca~richardsMultinetPagesmultinethtm

                                                            Romney AK Weller SC Batchelder WH (1986) ldquoCulture as Consensus A Theory of Cultureand Informant Accuracyrdquo American Anthropologist 88(2) 313ndash338

                                                            Sabidussi G (1966) ldquoThe Centrality Index of a Graphrdquo Psychometrika 31 581ndash603

                                                            Shimbel A (1953) ldquoStructural Parameters of Communication Networksrdquo Bulletin of Mathe-matical Biophysics 15 501ndash507

                                                            Skvoretz J Fararo TJ Agneessens F (2004) ldquoAdvances in Biased Net Theory DefinitionsDerivations and Estimationsrdquo Social Networks 26 113ndash139

                                                            Snijders TAB (2001) SIENA Simulation Investigation for Empirical Network AnalysisVersion 31 URL httpstatgammarugnlsnijderssienahtml

                                                            Snijders TAB (2002) ldquoMarkov Chain Monte Carlo Estimation of Exponential Random GraphModelsrdquo Journal of Social Structure 3(2)

                                                            Stallman RM (2002) Free Software Free Society Selected Essays of Richard M StallmanGNU PressFree Software Foundation Boston MA

                                                            Stephenson K Zelen M (1989) ldquoRethinking Centrality Methods and Applicationsrdquo SocialNetworks 11 1ndash37

                                                            Stokman FN Van Veen FJAM (1981) GRADAP Graph Definition and Analysis Pack-age Userrsquos Manual Interuniversity Project Group GRADAP University of Amsterdam-Groningen-Nijmegen URL httpwwwassesscom

                                                            Wasserman S Robins G (2005) ldquoAn Introduction to Random Graphs Dependence Graphsand plowastrdquo In PJ Carrington J Scott S Wasserman (eds) ldquoModels and Methods in SocialNetwork Analysisrdquo chapter 10 pp 192ndash214 Cambridge University Press Cambridge


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01; Accepted: 2007-12-25


CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:   -0.06666667
                1stQ:   0.1555556
                Med:    0.2222222
                Mean:   0.2215333
                3rdQ:   0.2888889
                Max:    0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.
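The logic of such a test can be sketched in base R without sna. The helper names below (mutuality, rbernoulli_digraph, mc_test) are hypothetical, and the null model (independent Bernoulli digraphs with the observed density) is an assumption chosen for brevity; this is not the implementation used by sna's CUG routines. The two returned proportions estimate the tail probabilities printed as p(f(rnd) >= f(d)) and p(f(rnd) <= f(d)) in the CUG output.

```r
# Sketch of a CUG-style Monte Carlo test (not sna's implementation).
# Assumed null model: Bernoulli digraphs with the observed density.
# Statistic: number of mutual (reciprocated) dyads.
mutuality <- function(A) sum(A * t(A)) / 2

rbernoulli_digraph <- function(n, p) {
  A <- matrix(rbinom(n * n, 1, p), n, n)
  diag(A) <- 0                                   # no loops
  A
}

mc_test <- function(A, stat, reps = 1000) {
  n <- nrow(A)
  p <- sum(A) / (n * (n - 1))                    # observed density
  obs <- stat(A)
  sims <- replicate(reps, stat(rbernoulli_digraph(n, p)))
  c(obs  = obs,
    p_ge = mean(sims >= obs),                    # estimate of p(f(rnd) >= f(d))
    p_le = mean(sims <= obs))                    # estimate of p(f(rnd) <= f(d))
}

set.seed(1)
A <- matrix(0, 10, 10)
A[cbind(1:9, 2:10)] <- 1                         # path 1 -> 2 -> ... -> 10
A[cbind(2:10, 1:9)] <- 1                         # every tie reciprocated
res <- mc_test(A, mutuality)
res
```

With 9 mutual dyads observed at density 0.2, the upper-tail estimate should be near zero, flagging far more reciprocity than the null model produces.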

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005) and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph-theoretic sense and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form, the blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
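As a small illustration of the definition, the following base-R sketch checks exact structural equivalence of two vertices by comparing their out- and in-neighborhoods while ignoring any tie between the pair. The function struct_equiv is hypothetical (it is not part of sna). The sketch also confirms that exchanging two equivalent vertices of an undirected star leaves the adjacency matrix unchanged.

```r
# Exact structural equivalence test for an adjacency matrix A (rows = senders).
# v and w are structurally equivalent when, ignoring any tie between v and w,
# they have the same out-neighbors and the same in-neighbors.
struct_equiv <- function(A, v, w) {
  others <- setdiff(seq_len(nrow(A)), c(v, w))
  all(A[v, others] == A[w, others]) &&           # same out-neighborhood
    all(A[others, v] == A[others, w])            # same in-neighborhood
}

# Undirected star: vertex 1 is the hub, vertices 2-4 are pendants on it.
A <- matrix(0, 4, 4)
A[1, 2:4] <- 1
A[2:4, 1] <- 1

struct_equiv(A, 2, 3)   # TRUE: pendants on the same hub share their alters
struct_equiv(A, 1, 2)   # FALSE
p <- c(1, 3, 2, 4)      # permutation exchanging vertices 2 and 3
all(A[p, p] == A)       # TRUE: the exchange is an automorphism
```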

In practice, exact structural equivalence is fairly rare (isolates, and pendants attached to the same vertex, being two important exceptions). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
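A minimal sketch of this workflow in base R follows. It assumes one particular convention, a Hamming distance over each pair's rows and columns that skips the cells involving the pair itself, which may differ in detail from the options offered by sedist; se_hamming is a hypothetical helper. The resulting distance matrix is passed to hclust and cut into positions with cutree.

```r
# Approximate structural equivalence: Hamming distance between the rows and
# columns of v and w, skipping cells that involve v or w themselves (an
# assumed convention, so exactly equivalent pairs score 0).
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (v in seq_len(n)) for (w in seq_len(n)) {
    k <- setdiff(seq_len(n), c(v, w))
    D[v, w] <- sum(A[v, k] != A[w, k]) + sum(A[k, v] != A[k, w])
  }
  D
}

A <- matrix(0, 5, 5)
A[1, 2:5] <- 1; A[2:5, 1] <- 1                    # star with hub 1
A[2, 3] <- 1; A[3, 2] <- 1                        # plus a tie between 2 and 3
D <- se_hamming(A)
hc <- hclust(as.dist(D))                          # complete linkage by default
groups <- cutree(hc, k = 3)                       # positions {1}, {2, 3}, {4, 5}
groups
```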


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though by default it uses structural equivalence. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated block of the adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.
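The density block type can be sketched directly: given a class membership vector, each block value is the mean of the adjacency cells running from one class to another, with the diagonal neglected. The helper block_density below is hypothetical and is not blockmodel's implementation; note that a singleton class yields NaN on its own diagonal block, since no off-diagonal cells remain there.

```r
# Density blockmodel reduction (a sketch): block (k, l) is the mean of
# A[i, j] over i in class k and j in class l, with loops (i == j) ignored.
block_density <- function(A, memb) {
  classes <- sort(unique(memb))
  M <- A
  diag(M) <- NA                                     # neglect the diagonal
  out <- matrix(NA_real_, length(classes), length(classes),
                dimnames = list(as.character(classes), as.character(classes)))
  for (k in seq_along(classes)) {
    for (l in seq_along(classes)) {
      cells <- M[memb == classes[k], memb == classes[l]]
      out[k, l] <- mean(cells, na.rm = TRUE)        # density of block (k, l)
    }
  }
  out
}

A <- matrix(0, 4, 4)
A[1:2, 3:4] <- 1                                    # class 1 sends to every member of class 2
memb <- c(1, 1, 2, 2)
bd <- block_density(A, memb)                        # block (1, 2) has density 1, all others 0
bd
```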

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability and drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
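Density-based expansion amounts to indexing the image matrix by class membership and drawing independent Bernoulli edges. The sketch below assumes an image matrix img and a membership vector memb (both hypothetical inputs, and expand_density is a hypothetical helper rather than blockmodel.expand itself). Because memb may list any number of vertices per class, the expanded graph need not have the original order.

```r
# Expansion of a density image (a sketch): P[i, j] is the density of the
# block containing (memb[i], memb[j]); each edge is an independent Bernoulli draw.
expand_density <- function(img, memb) {
  P <- img[memb, memb, drop = FALSE]
  A <- matrix(rbinom(length(P), 1, P), nrow(P), ncol(P))
  diag(A) <- 0                                   # no loops
  A
}

set.seed(42)
img  <- matrix(c(0.9, 0.1, 0.0, 0.5), 2, 2)      # hypothetical image; class 1 -> class 2 density is 0
memb <- rep(1:2, times = c(3, 2))                # 3 vertices in class 1, 2 in class 2
A <- expand_density(img, memb)
A
```

Calling expand_density repeatedly (for example inside replicate) yields the kind of Monte Carlo sample described above.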

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
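The Boolean inner product can be written in base R as a thresholded matrix product. The sketch below (compose is a hypothetical helper, not sna's operator) also shows how a loop arises: a tie 1 -> 2 in G followed by a tie 2 -> 1 in G' yields 1 -> 1 in the composition.

```r
# Graph composition as a Boolean inner product: (v, v') is an edge of the
# composition iff some v'' has v -> v'' in G and v'' -> v' in G'.
compose <- function(A, B) (A %*% B > 0) * 1

A <- matrix(0, 3, 3); A[1, 2] <- 1                  # G:  1 -> 2
B <- matrix(0, 3, 3); B[2, 1] <- 1; B[2, 3] <- 1    # G': 2 -> 1 and 2 -> 3
comp <- compose(A, B)
comp[1, 1]   # 1: a loop, although neither G nor G' has one
comp[1, 3]   # 1: via 1 -> 2 in G, then 2 -> 3 in G'
```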

                                                              Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant within each sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original (here, 12 vertices rather than 20); this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

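R> gp <- t(sapply(runif(20, 0, 1), rep, 20))   # row i holds sender i's tie probability
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)                          # structural equivalence clustering
R> b <- blockmodel(g, eq, h = 15)                # cut the clustering at height 15
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))   # two vertices per class
R> ge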

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\operatorname{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \tag{3}$$

where $A^G$ and $A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\operatorname{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \operatorname{cov}(G, H) / \sqrt{\operatorname{cov}(G, G)\,\operatorname{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained using gcor and gcov, respectively; Hamming distances for graph sets can similarly be obtained using hdist.
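Equation (3) translates directly into base R. The helpers offdiag, graph_cov, graph_cor, and hamming below are hypothetical illustrations rather than sna's gcov, gcor, and hdist, and they assume loopless digraphs stored as square matrices on a common vertex set, so only the $|V|(|V|-1)$ off-diagonal cells enter the sums.

```r
# Graph covariance, correlation, and Hamming distance following Equation (3).
offdiag <- function(A) A[row(A) != col(A)]          # the |V|(|V| - 1) ordered pairs

graph_cov <- function(AG, AH) {
  x <- offdiag(AG); y <- offdiag(AH)
  sum((x - mean(x)) * (y - mean(y))) / length(x)   # divide by |V|(|V| - 1)
}
graph_cor <- function(AG, AH) {
  graph_cov(AG, AH) / sqrt(graph_cov(AG, AG) * graph_cov(AH, AH))
}
hamming <- function(AG, AH) sum(offdiag(AG) != offdiag(AH))  # edge changes needed

AG <- matrix(0, 3, 3)
AG[cbind(1:3, c(2, 3, 1))] <- 1                    # directed cycle 1 -> 2 -> 3 -> 1
AH <- t(AG)                                        # the reversed cycle
hamming(AG, AH)      # 6: no edge is shared, so all six cells differ
graph_cor(AG, AH)    # -1: every ordered pair is present in exactly one graph
```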

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching of arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
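For very small graphs, the maximization over matchings can be done by brute force, which makes the idea of a structural correlation concrete. The sketch below (perms, offdiag_cor, and struct_cor are hypothetical helpers, not sna code) takes the maximum graph correlation over all vertex relabelings of the second graph. With $n!$ relabelings it is only feasible for a handful of vertices, which is why heuristic search is used in practice.

```r
# All permutations of 1..n as rows of a matrix (recursive construction).
perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  p <- perms(n - 1)
  # put value i first, then every permutation of the remaining n - 1 values
  do.call(rbind, lapply(seq_len(n), function(i) cbind(i, p + (p >= i))))
}

offdiag_cor <- function(A, B) {
  off <- row(A) != col(A)                        # ignore the diagonal
  cor(A[off], B[off])
}

# Structural correlation: best graph correlation over relabelings of B.
struct_cor <- function(A, B) {
  P <- perms(nrow(A))
  max(apply(P, 1, function(p) offdiag_cor(A, B[p, p])))
}

A <- matrix(0, 4, 4)
A[cbind(1:3, 2:4)] <- 1                          # directed path 1 -> 2 -> 3 -> 4
B <- A[4:1, 4:1]                                 # the same graph, vertices relabeled
offdiag_cor(A, B)                                # -1/3 under the fixed labeling
struct_cor(A, B)                                 # 1 once the best relabeling is found
```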

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be implemented simply by applying eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
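A sketch of network principal component analysis along these lines, assuming a small set of simulated graphs on a common vertex set: each graph's off-diagonal cells are vectorized, the graph correlation matrix is formed with cor, and eigen supplies the components. The variable names are illustrative only.

```r
set.seed(7)
n <- 8                                             # vertices per graph
m <- 5                                             # number of graphs
graphs <- replicate(m, {
  A <- matrix(rbinom(n * n, 1, 0.3), n, n)
  diag(A) <- 0
  A
}, simplify = FALSE)

off <- row(graphs[[1]]) != col(graphs[[1]])       # off-diagonal cells
X <- sapply(graphs, function(A) A[off])           # one column of edge variables per graph
R <- cor(X)                                       # m x m graph correlation matrix
pca <- eigen(R, symmetric = TRUE)
pca$values                                        # variance carried by each component
pca$vectors[, 1]                                  # loadings of each graph on the first component
```

Because R is a correlation matrix, its eigenvalues sum to the number of graphs, so pca$values / m gives the share of variance per component.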

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a one-to-one mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available where conditioning on the entire observed structure is inappropriate.
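The QAP idea can be illustrated with a short base-R permutation test for the graph correlation. qap_cor_test is a hypothetical helper, not sna's qaptest: it relabels the vertices of one graph at random (permuting rows and columns together, so the graph's structure is preserved) and compares the observed correlation with the resulting distribution.

```r
qap_cor_test <- function(A, B, reps = 1000) {
  off <- row(A) != col(A)
  gcor_off <- function(X, Y) cor(X[off], Y[off])   # graph correlation on off-diagonal cells
  obs <- gcor_off(A, B)
  sims <- replicate(reps, {
    p <- sample(nrow(B))                           # random relabeling of B's vertices
    gcor_off(A, B[p, p])
  })
  list(obs = obs, p_ge = mean(sims >= obs), p_le = mean(sims <= obs))
}

set.seed(3)
n <- 15
A <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(A) <- 0
flip <- matrix(rbinom(n * n, 1, 0.1), n, n)        # cells to flip
B <- abs(A - flip); diag(B) <- 0                   # B: a noisy copy of A
res <- qap_cor_test(A, B, reps = 500)
res$obs; res$p_ge
```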


                                                              Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled (structural) correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)            # g3 is a random relabeling of g2
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)                                # four random 20-vertex graphs
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]       # y does not depend on x[4, , ]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]                 # linear predictor
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))     # logistic link
R> y <- rgraph(20, tprob = yp)                                  # Bernoulli draw of y
R> nl <- netlogit(y, x)
R> summary(nl)

                                                              Network Logit Model

                                                              Coefficients


            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
        352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

        Total Fraction Correct: 0.8973684
        Fraction Predicted 1s Correct: 0.8973684
        Fraction Predicted 0s Correct: NaN
        False Negative Rate: 0
        False Positive Rate: 1

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties: no dyad is predicted to be null, so the false positive rate is 1. This is largely a consequence of the high density of the dependent graph (approximately 0.90) and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), users are strongly encouraged to employ the more modern tools of the ergm package for this purpose; there are, however, several other models for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns an estimate of an unknown graph from a series of observed graphs. Methods supported include data-analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which it reports, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired), and estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of the false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
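To make the reporting model concrete, the following sketch computes the posterior probability of a single edge from several binary reports. It assumes known false positive and false negative rates, reports that are independent given the true graph, and a Bernoulli prior on the edge. This is a simplification of the bbnam setting (which also samples the error rates), and edge_posterior and its arguments are hypothetical. The second call shows a negatively informative report when the two error rates sum to more than 1.

```r
# Posterior Pr(edge present | reports) with known error rates (a sketch).
# e_plus: false positive rate(s); e_minus: false negative rate(s);
# p: prior probability of the edge. Rates may be scalars or one per source.
edge_posterior <- function(reports, e_plus, e_minus, p) {
  lik1 <- prod(ifelse(reports == 1, 1 - e_minus, e_minus))  # Pr(reports | edge present)
  lik0 <- prod(ifelse(reports == 1, e_plus, 1 - e_plus))    # Pr(reports | edge absent)
  p * lik1 / (p * lik1 + (1 - p) * lik0)
}

# Three sources report on one edge: two say "present", one says "absent".
post <- edge_posterior(c(1, 1, 0), e_plus = 0.1, e_minus = 0.2, p = 0.5)
post    # 0.128 / (0.128 + 0.009), about 0.934

# With e_plus + e_minus > 1, a reported tie lowers the probability of a tie:
neg <- edge_posterior(1, e_plus = 0.7, e_minus = 0.7, p = 0.5)
neg     # 0.3, below the prior of 0.5
```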

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members in turn may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex i nominates vertex j when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where T is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the (i, j) directed dyad). Bias events are taken to be independent Bernoulli trials given T, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
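To make the conditional tie probability concrete, the sketch below evaluates the formula directly and checks it against a brute-force simulation in which the baseline event and each potential bias event are drawn as independent Bernoulli trials, with the tie occurring if any of them fires. The function names and the example probabilities (a baseline of 0.1 and two hypothetical bias types with probabilities 0.3 and 0.2) are illustrative assumptions, not part of the bn interface.

```r
# Hypothetical helper (not part of sna): Pr(e_ij | T) under a biased net model.
# p_base: Pr(B_e); p_bias: vector of Pr(B_k); s: vector of counts s_k(i, j, T)
biased_net_tie_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  # The tie is absent only if the baseline event and every potential
  # bias event all fail to occur.
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Brute-force check: draw each event independently and record whether any fires
simulate_tie <- function(p_base, p_bias, s, reps = 1e5) {
  fired <- runif(reps) < p_base
  for (k in seq_along(p_bias)) {
    for (rep_k in seq_len(s[k])) {       # s[k] opportunities for bias type k
      fired <- fired | (runif(reps) < p_bias[k])
    }
  }
  mean(fired)
}

p_exact <- biased_net_tie_prob(0.1, c(0.3, 0.2), s = c(1, 2))
p_exact   # 1 - 0.9 * 0.7 * 0.8^2 = 0.5968
set.seed(42)
simulate_tie(0.1, c(0.3, 0.2), s = c(1, 2))   # close to p_exact
```

With all $s_k = 0$ the expression reduces to the baseline probability $\Pr(B_e)$, as expected when no bias opportunities have arisen in the trace.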

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990; see Cliff and Ord 1973 and Anselin 1988 for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \epsilon \tag{4}$$

$$\epsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \epsilon + \nu \tag{5}$$

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim N(0, \sigma^2 I)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the AR term's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
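As a sketch of how equations (4) and (5) generate data, and of the kind of index nacf reports, the base-R code below (a) solves the model for y in the special case of a single AR matrix and a single MA matrix (w = z = 1), using the reduced form $y = (I - \theta W)^{-1}\{X\beta + (I - \psi Z)^{-1}\nu\}$, and (b) computes Moran's I in its standard textbook form, $I = (n/S_0)\, z^\top W z / z^\top z$ with $z = y - \bar{y}$ and $S_0$ the sum of the weights. Both function names are hypothetical, and nacf's own normalization may differ from this version.

```r
# Hypothetical helpers (not part of sna), single W and single Z case.

# Draw y from equations (4)-(5) by solving the linear systems directly.
simulate_nam <- function(W, Z, X, beta, theta, psi, sigma) {
  n <- nrow(W)
  nu <- rnorm(n, mean = 0, sd = sigma)               # iid disturbances
  eps <- solve(diag(n) - psi * Z, nu)                # MA term, eq. (5)
  y <- solve(diag(n) - theta * W, X %*% beta + eps)  # AR term, eq. (4)
  list(y = drop(y), eps = drop(eps), nu = nu)
}

# Moran's I for response y and weight matrix W (textbook form).
moran_i <- function(y, W) {
  z <- y - mean(y)
  n <- length(y)
  (n / sum(W)) * drop(crossprod(z, W %*% z)) / sum(z^2)
}

# Example: a 4-vertex path graph with clustered responses
W_path <- matrix(0, 4, 4)
W_path[cbind(1:3, 2:4)] <- 1
W_path <- W_path + t(W_path)
moran_i(c(1, 1, 0, 0), W_path)   # 1/3: neighbors tend to share values
```

Solving the systems with `solve` mirrors the `qr.solve` calls used in the lnam example later in this section; for large, sparse weight matrices a sparse solver would be the more natural choice.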


                                                              Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary of the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = g * (1 - em[i]) + (1 - g) * ep[i])
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

  Replicate Chains: 5
  Burn Time: 300
  Draws per Chain: 20 Total Draws: 100
  Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
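The convergence statistic reported by summary(b) and the “perverse” region just described can both be examined directly from posterior draws. The sketch below assumes draws are stored with one row per draw and one column per informant, matching how apply(b$em, 2, median) is used above. psrf implements the simplified potential scale reduction of Gelman et al. (1995) for one scalar parameter, and perverse_prob estimates, for each informant, the posterior probability that em + ep > 1. Both functions are hypothetical illustrations; potscalered.mcmc's internal details may differ.

```r
# Hypothetical helpers (not part of sna).

# Simplified potential scale reduction (Gelman et al. 1995) for one scalar
# parameter. chains: matrix with one column per chain, one row per draw.
psrf <- function(chains) {
  n <- nrow(chains)                        # draws per chain
  B <- n * var(colMeans(chains))           # between-chain variance
  W <- mean(apply(chains, 2, var))         # mean within-chain variance
  var_plus <- (n - 1) / n * W + B / n      # pooled posterior variance estimate
  sqrt(var_plus / W)
}

# Posterior probability, per informant, of lying in the "perverse" region.
# em_draws, ep_draws: matrices with rows = draws, columns = informants.
perverse_prob <- function(em_draws, ep_draws) {
  stopifnot(all(dim(em_draws) == dim(ep_draws)))
  colMeans(em_draws + ep_draws > 1)
}

# Toy draws (not bbnam output): informant 2 is centred in the perverse
# region (em + ep near 1.1), informant 1 is well inside the ordinary one
set.seed(7)
em_draws <- cbind(rbeta(200, 15, 25), rbeta(200, 32, 8))
ep_draws <- cbind(rbeta(200, 1, 25), rbeta(200, 12, 28))
perverse_prob(em_draws, ep_draws)

# Identical chains give a value at or below 1 (here sqrt(0.75), since n is
# tiny); well-separated chains give a large value
psrf(cbind(1:4, 1:4))
psrf(cbind(1:4, 11:14))   # sqrt(30.75), about 5.5
```

Values slightly below 1, like the median of 0.999 in the bbnam output above, are expected with short chains because of the (n - 1)/n factor.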

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection LAS, the central graph, and the Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
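The contrast between these estimators is easy to see in code. A minimal base-R sketch, assuming the same layout as dat above (dat[k, i, j] is informant k's report of the i to j tie, with informants and vertices indexed identically): the LAS estimators use only the two reports from i and j about their own dyad, taking their intersection or union, while the central graph keeps each edge reported by a strict majority of informants. The elementwise majority minimizes the total Hamming distance to the reports; resolving ties at exactly one half toward absence is an assumption made here. These are hypothetical re-implementations, not the code behind consensus.

```r
# Hypothetical re-implementations (not sna's consensus code).
# dat: m x n x n array of 0/1 reports; dat[k, i, j] = informant k's report of i -> j.

las_estimate <- function(dat, type = c("intersection", "union")) {
  type <- match.arg(type)
  n <- dim(dat)[2]
  stopifnot(dim(dat)[1] == n)          # informant k must be vertex k
  est <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      ego <- dat[i, i, j]              # sender's own report
      alter <- dat[j, i, j]            # receiver's report of the same tie
      est[i, j] <- if (type == "intersection") ego * alter else max(ego, alter)
    }
  }
  est
}

central_graph_estimate <- function(dat) {
  # Strict majority rule: keep an edge if more than half of the reports include it
  (apply(dat, c(2, 3), mean) > 0.5) * 1
}

# Toy example with three informants reporting on three vertices
dat_toy <- array(0, dim = c(3, 3, 3))
dat_toy[1, 1, 2] <- 1; dat_toy[2, 1, 2] <- 1   # 1 -> 2: ego and alter agree
dat_toy[1, 2, 3] <- 1; dat_toy[2, 2, 3] <- 1   # 2 -> 3: a third party and ego
dat_toy[3, 3, 1] <- 1                          # 3 -> 1: ego only
las_estimate(dat_toy, "intersection")   # only 1 -> 2 survives
las_estimate(dat_toy, "union")          # 1 -> 2, 2 -> 3, 3 -> 1
central_graph_estimate(dat_toy)         # 1 -> 2, 2 -> 3
```

The intersection rule needs two correct reports for every true tie, so with false negative rates as high as those simulated above it misses many ties, consistent with its weak showing in the consensus output.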

As a final example of sna's model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components, and both effects are estimated simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
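A rough sketch of how a net influence matrix of this kind can be obtained, assuming only the AR part of the model with a single weight matrix: the reduced form $y = (I - \theta W)^{-1}(X\beta + \epsilon)$ implies that a unit shock at vertex i moves $y_j$ by entry (j, i) of $(I - \theta W)^{-1}$, and pairs can then be flagged with the two-standard-deviation rule described above. The function net_influence_flags is hypothetical; it is not the code used by plot(fit), which may also account for MA effects or estimation uncertainty.

```r
# Hypothetical helper (not the code behind plot.lnam): AR-only net influence.
# W: n x n weight matrix; theta: AR coefficient.
net_influence_flags <- function(W, theta) {
  n <- nrow(W)
  # y = (I - theta W)^{-1} (X beta + eps): entry [j, i] of the inverse is the
  # effect on y_j of a unit shock at vertex i, so transpose to index as [i, j].
  infl <- t(solve(diag(n) - theta * W))
  off <- infl[row(infl) != col(infl)]     # influences on *other* vertices
  mu <- mean(off)
  s <- sd(off)
  flags <- matrix("none", n, n)
  flags[infl >= mu + 2 * s] <- "green"    # unusually large positive influence
  flags[infl <= mu - 2 * s] <- "red"      # unusually negative influence
  diag(flags) <- "none"                   # self-influence is not plotted
  list(influence = infl, flags = flags)
}

# Example: a star on five vertices (vertex 1 at the centre)
W_star <- matrix(0, 5, 5)
W_star[1, 2:5] <- 1
W_star[2:5, 1] <- 1
res <- net_influence_flags(W_star, theta = 0.2)
round(res$influence, 3)   # centre-leaf entries exceed leaf-leaf entries
```

With a fitted model, theta would be replaced by the estimated AR coefficient (rho1.1 in the output above) and W by the corresponding weight matrix.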

                                                              3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines covering many of the methods currently in wide use within the field. It is hoped that the inclusion of such tools, together with the other packages of the statnet ensemble, within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                                              Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.

                                                              References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks. II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions in Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25



This process is facilitated by the function equiv.clust, which is essentially a joint front end to R's built-in hierarchical clustering function (hclust) and various positional distance functions; by default, it uses structural equivalence. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
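As a rough illustration of what equiv.clust automates, the base-R sketch below computes a Hamming-type structural equivalence distance between every pair of vertices and passes it to hclust and cutree. The helper `se_hamming` is hypothetical and is not sna's own distance code; in particular, it simply ignores the cells linking the two vertices being compared (and all loops), which is only one of several possible conventions.

```r
# Hypothetical helper (not part of sna): Hamming-type structural
# equivalence distance between all pairs of vertices of a digraph A.
# Ties between i and j themselves, and loops, are ignored.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      k <- setdiff(seq_len(n), c(i, j))      # all third parties
      D[i, j] <- sum(A[i, k] != A[j, k]) +   # differing outgoing ties
                 sum(A[k, i] != A[k, j])     # differing incoming ties
    }
  }
  D
}

# Toy digraph: vertices 1 and 2 both send ties to vertices 3 and 4
A <- matrix(0, 4, 4)
A[1:2, 3:4] <- 1

D <- se_hamming(A)
hc <- hclust(as.dist(D))      # complete linkage by default
classes <- cutree(hc, k = 2)  # cut the dendrogram into two positions
print(D)
print(classes)
```

Here vertices 1 and 2 (and likewise 3 and 4) are structurally equivalent, so their distance is zero and the two-cluster cut recovers exactly those positions.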

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or a membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to “blocks” of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or by number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (the mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
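For the density block type, the reduction described above amounts to averaging the adjacency matrix within each pair of classes. The sketch below assumes a membership vector with classes numbered 1 to k and, following the convention mentioned above, neglects the diagonal within each class; `density_image` is a hypothetical helper, not the implementation used by blockmodel.

```r
# Hypothetical helper: density block image of adjacency matrix A
# for a membership vector 'classes' taking values 1..k.
density_image <- function(A, classes) {
  k <- max(classes)
  img <- matrix(NA_real_, k, k)
  for (r in seq_len(k)) {
    for (s in seq_len(k)) {
      block <- A[classes == r, classes == s, drop = FALSE]
      if (r == s) {
        # within-class block: drop self-ties on the diagonal
        cells <- block[row(block) != col(block)]
      } else {
        cells <- block
      }
      # singleton classes have no off-diagonal within-class cells
      img[r, s] <- if (length(cells) > 0) mean(cells) else NA_real_
    }
  }
  img
}

A <- matrix(0, 4, 4)
A[1:2, 3:4] <- 1             # class 1 sends every possible tie to class 2
classes <- c(1, 1, 2, 2)

ord <- order(classes)        # permuting rows/columns by class exposes the blocks
A_blocked <- A[ord, ord]
img <- density_image(A, classes)
print(img)
```

The resulting 2 x 2 image has a single block of density 1 (class 1 to class 2) and zeros elsewhere, which is the kind of pure 1/0 block pattern that structural equivalence classes produce.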

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types of which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
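The expansion step can be sketched in base R under a few stated assumptions: the image is a k x k matrix of densities, `sizes[r]` new vertices are assigned to class r, and loops are excluded. Both helpers below are hypothetical illustrations rather than blockmodel.expand itself.

```r
# Hypothetical helper: turn a k x k density image into an n x n Bernoulli
# parameter matrix, with sizes[r] vertices assigned to class r.
expand_density_image <- function(img, sizes) {
  cls <- rep(seq_along(sizes), times = sizes)  # class of each new vertex
  P <- img[cls, cls, drop = FALSE]             # interclass density as tie probability
  diag(P) <- 0                                 # no loops
  P
}

# One draw from the inhomogeneous Bernoulli graph with parameter matrix P
draw_bernoulli_graph <- function(P) {
  n <- nrow(P)
  matrix(rbinom(n * n, 1, P), n, n)
}

img <- matrix(c(0.10, 0.20, 0.90, 0.05), 2, 2)   # hypothetical density image
P <- expand_density_image(img, sizes = c(3, 2))  # 5 vertices; order need not match the original
set.seed(1)
g_new <- draw_bernoulli_graph(P)

# Repeated draws give a Monte Carlo reference sample, here for the total edge count
null_edges <- replicate(500, sum(draw_bernoulli_graph(P)))
```

Because the class sizes are free parameters, the same image can generate graphs of any order, which is what makes this useful for simulating entry or exit of actors.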

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the Boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
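The Boolean inner product is short enough to write directly in base R. The sketch below is an illustration of the definition, not sna's own operator; it shows an ordinary two-step composition and the appearance of loops from a mutual dyad.

```r
# Composition as a Boolean inner product of adjacency matrices:
# (v, v') is an edge iff some v'' has v -> v'' in G and v'' -> v' in H.
compose_graphs <- function(G, H) {
  (G %*% H > 0) * 1
}

# Directed path 1 -> 2 -> 3: composing it with itself yields the single edge 1 -> 3
path <- matrix(0, 3, 3)
path[1, 2] <- 1
path[2, 3] <- 1
print(compose_graphs(path, path))

# A mutual dyad has no loops, but its self-composition has a loop at each vertex
dyad <- matrix(c(0, 1, 1, 0), 2, 2)
print(compose_graphs(dyad, dyad))
```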

                                                                Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities that are constant for each sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- t(sapply(runif(20, 0, 1), rep, 20))  # row i: tie probability of sender i
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6 Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\left(|V| - 1\right)} \tag{3}$$


where $A^G$ and $A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|(|V|-1)\right)^{-1}\sum_{(i,j)} A^X_{ij}$ is the graph mean (both sums run over the $|V|(|V|-1)$ ordered pairs of distinct vertices). The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H)/\sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
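Because equation (3) sums over ordered pairs of distinct vertices, the graph covariance, graph correlation, and Hamming distance can be checked with plain base R. The helpers below (`graph_cov`, `graph_cor`, `hamming`) are hypothetical illustrations for directed graphs; sna's gcov, gcor, and hdist offer further options (for example, for undirected graphs or for treating the diagonal) that this sketch does not reproduce.

```r
# Graph covariance as in equation (3): sums over ordered pairs (i, j), i != j
graph_cov <- function(AG, AH) {
  n <- nrow(AG)
  off <- row(AG) != col(AG)
  xg <- AG[off]
  xh <- AH[off]
  # mean(xg) equals the graph mean mu_G = sum / (n (n - 1))
  sum((xg - mean(xg)) * (xh - mean(xh))) / (n * (n - 1))
}

# Graph correlation: covariance scaled by the two graph standard deviations
graph_cor <- function(AG, AH) {
  graph_cov(AG, AH) / sqrt(graph_cov(AG, AG) * graph_cov(AH, AH))
}

# Hamming distance: number of off-diagonal cells in which the graphs differ
hamming <- function(AG, AH) {
  off <- row(AG) != col(AG)
  sum(AG[off] != AH[off])
}

set.seed(42)
G <- matrix(rbinom(25, 1, 0.5), 5, 5); diag(G) <- 0
H <- matrix(rbinom(25, 1, 0.5), 5, 5); diag(H) <- 0
graph_cor(G, H)
hamming(G, H)
```

Since the normalizing constant cancels in the ratio, the graph correlation coincides with the ordinary Pearson correlation of the two vectors of off-diagonal cells.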

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the “labeling” component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching of arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
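For very small graphs, the structural correlation can be computed by brute force: maximize the graph correlation over every relabeling of one graph. The base-R sketch below assumes that all vertices are exchangeable (every permutation is a permissible matching) and is feasible only for a handful of vertices, since there are n! relabelings; the helpers `all_perms` and `structural_cor` are hypothetical and do not reproduce gscor's options.

```r
# All permutations of 1..n as rows of an n! x n matrix
all_perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  sub <- all_perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(first) {
    rest <- setdiff(seq_len(n), first)
    cbind(first, matrix(rest[as.vector(sub)], nrow = nrow(sub)))
  }))
}

# Structural correlation by exhaustive search over relabelings of AH
structural_cor <- function(AG, AH) {
  off <- row(AG) != col(AG)
  perms <- all_perms(nrow(AG))
  best <- -Inf
  for (k in seq_len(nrow(perms))) {
    p <- perms[k, ]
    r <- cor(AG[off], AH[p, p][off])  # correlation under relabeling p
    if (!is.na(r) && r > best) best <- r
  }
  best
}

G2 <- matrix(0, 5, 5)
G2[1, 2] <- G2[2, 3] <- G2[3, 1] <- G2[4, 5] <- 1
G3 <- G2[c(3, 5, 1, 2, 4), c(3, 5, 1, 2, 4)]   # isomorphic relabeling of G2
off <- row(G2) != col(G2)
cor(G2[off], G3[off])       # labeled correlation
structural_cor(G2, G3)      # unlabeled correlation; 1 for isomorphic graphs
```

The gap between the two printed values is the labeling component in miniature: the graphs are identical up to relabeling, yet their labeled correlation is well below 1.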

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graph for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by applying eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
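For dichotomous graphs on a common vertex set, a graph minimizing total Hamming distance can be found cell by cell: each edge should be present exactly when it appears in more than half of the input graphs (either choice is optimal at an exact tie). The base-R sketch below uses this majority rule; `central_graph` and `total_hamming` are hypothetical helpers, and we do not assume that centralgraph breaks ties the same way.

```r
# Majority-rule central graph for an m x n x n array of binary adjacency matrices
central_graph <- function(graphs) {
  share <- apply(graphs, c(2, 3), mean)  # fraction of graphs containing each edge
  (share > 0.5) * 1
}

# Total Hamming distance (off-diagonal cells) from a graph C to every input graph
total_hamming <- function(C, graphs) {
  off <- row(C) != col(C)
  sum(apply(graphs, 1, function(A) sum(A[off] != C[off])))
}

set.seed(7)
m <- 5; n <- 6
graphs <- array(rbinom(m * n * n, 1, 0.4), dim = c(m, n, n))
for (k in seq_len(m)) {
  A <- graphs[k, , ]
  diag(A) <- 0          # no loops
  graphs[k, , ] <- A
}

C <- central_graph(graphs)
total_hamming(C, graphs)
```

With an odd number of input graphs, as here, no ties can occur and the majority graph is the unique minimizer.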

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the network versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available for cases in which conditioning on the entire observed structure is inappropriate.
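The logic of a simple QAP test for graph correlation can be sketched in base R: compute the observed correlation, then compare it with correlations obtained after randomly relabeling the vertices of one graph. The helper `qap_cor_test` and its return fields are hypothetical and do not mirror qaptest's interface or output.

```r
# QAP test for graph correlation: compare the observed correlation with its
# distribution under random relabelings (permutations) of H's vertices
qap_cor_test <- function(AG, AH, reps = 1000) {
  n <- nrow(AH)
  off <- row(AG) != col(AG)
  obs <- cor(AG[off], AH[off])
  null <- replicate(reps, {
    p <- sample.int(n)                 # random reallocation to positions
    cor(AG[off], AH[p, p][off])
  })
  list(statistic = obs,
       p_greater = mean(null >= obs),  # one-sided, upper tail
       p_less = mean(null <= obs))     # one-sided, lower tail
}

set.seed(11)
n <- 15
G <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(G) <- 0
flip <- matrix(rbinom(n * n, 1, 0.1), n, n)
H <- abs(G - flip); diag(H) <- 0      # noisy copy of G (about 10% of cells flipped)
res <- qap_cor_test(G, H)
res$statistic
res$p_greater
```

Permuting rows and columns together preserves all of H's structure (density, reciprocity, degree distribution) while breaking its alignment with G, which is what distinguishes QAP from a naive cell-wise permutation.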


                                                                Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled (structural) correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
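Since the network regression models are described above as their conventional equivalents applied to the vectorized adjacency matrices, the point estimates can be reproduced with lm on the off-diagonal cells. The sketch below assumes the diagonal is excluded, which is consistent with the 375 residual degrees of freedom in the output (20 x 19 = 380 cells minus 5 coefficients); it uses base-R draws in place of rgraph so that it runs without sna, so its data differ from the transcript, although the fit again recovers the coefficients 0, 1, 4, 2, and 0. Only the QAP tests, not the estimates, require the network machinery.

```r
set.seed(3)
n <- 20
# Four random predictor digraphs (base-R stand-in for rgraph(20, 4))
x <- array(rbinom(4 * n * n, 1, 0.5), dim = c(4, n, n))
for (k in 1:4) {
  A <- x[k, , ]
  diag(A) <- 0
  x[k, , ] <- A
}
y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]

# Vectorize the off-diagonal cells: 20 * 19 = 380 edge variables
off <- row(y) != col(y)
X <- sapply(1:4, function(k) x[k, , ][off])
colnames(X) <- paste0("x", 1:4)
fit <- lm(y[off] ~ X)
round(coef(fit), 6)
```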

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
  352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572  BIC: 203.8580
Pseudo-R^2 Measures:
  (Dn-Dr)/(Dn-Dr+dfn): 0.481324
  (Dn-Dr)/Dn: 0.6694004
Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

  Total Fraction Correct: 0.8973684
  Fraction Predicted 1s Correct: 0.8973684
  Fraction Predicted 0s Correct: NaN
  False Negative Rate: 0
  False Positive Rate: 1

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density of the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values (0, 1, 4, 2, and 0), and that the QAP test correctly identifies the irrelevant terms (x4 and the intercept) as non-significant.
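The goodness-of-fit figures in the netlogit output can be recovered from the two deviances and the contingency table. The sketch below simply redoes that arithmetic with the printed values; the AIC and BIC formulas are the standard ones with 5 estimated coefficients and 380 edge variables, an assumption that happens to reproduce the printed values.

```r
Dn <- 526.7919    # null deviance (380 df), from the output above
Dr <- 174.1572    # residual deviance (375 df)
dfn <- 380        # null degrees of freedom = number of edge variables
k <- 5            # intercept plus four coefficients

chisq <- Dn - Dr                      # fit improvement on 5 df
r2_a <- chisq / (chisq + dfn)         # (Dn-Dr)/(Dn-Dr+dfn)
r2_b <- chisq / Dn                    # (Dn-Dr)/Dn
aic <- Dr + 2 * k
bic <- Dr + k * log(dfn)

# Contingency table: predicted (rows) x actual (cols)
tab <- matrix(c(0, 39, 0, 341), 2, 2,
              dimnames = list(pred = c("0", "1"), actual = c("0", "1")))
frac_correct <- sum(diag(tab)) / sum(tab)
fpr <- tab["1", "0"] / sum(tab[, "0"])   # actual 0s predicted as 1
fnr <- tab["0", "1"] / sum(tab[, "1"])   # actual 1s predicted as 0
density_y <- sum(tab[, "1"]) / sum(tab)  # share of cells that are ties (about 0.90)

round(c(chisq = chisq, r2_a = r2_a, r2_b = r2_b, aic = aic, bic = bic), 4)
```

The false positive rate of 1 follows directly from the table: all 39 absent ties were predicted as present, which is the poor prediction of tie absence discussed above.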

2.7 Network inference and process models

A final category of functions supplied by sna comprises those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the bias parameters may be omitted if desired), and estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is, in this case, parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of the false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). The bbnam function returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992), in the simplified form of Gelman et al. (1995), can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to obtain posterior predictive graph properties from a set of posterior draws.
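
The reparameterization mentioned above can be checked by hand. Following the description of the Batchelder and Romney (1988) model given here, a source with competency D reports the true edge value with probability D and otherwise reports a tie with guessing probability g. The implied error rates are then e^- = (1 - D)(1 - g) and e^+ = (1 - D)g, so that e^- + e^+ = 1 - D. The base-R sketch below is only an illustration of that algebra: the helper names br_to_errors and errors_to_br are hypothetical and are not part of sna.

```r
# Hypothetical helpers (not part of sna) relating the Batchelder-Romney
# parameters, competency D and guessing bias g, to the false negative (em)
# and false positive (ep) rates used by bbnam.
br_to_errors <- function(D, g) {
  stopifnot(all(D >= 0 & D <= 1), all(g >= 0 & g <= 1))
  list(em = (1 - D) * (1 - g),  # true tie, but the source guesses "no tie"
       ep = (1 - D) * g)        # no tie, but the source guesses "tie"
}

errors_to_br <- function(em, ep) {
  total <- em + ep
  # em + ep > 1 would imply negative competency, i.e., negatively
  # informative reports, which the Batchelder-Romney model cannot express
  if (any(total > 1)) stop("em + ep > 1 has no Batchelder-Romney equivalent")
  list(D = 1 - total,
       # the bias is unidentified for a perfectly competent source
       g = ifelse(total > 0, ep / total, NA_real_))
}

err <- br_to_errors(D = 0.6, g = 0.1)
unlist(err)                            # em = 0.36, ep = 0.04
unlist(errors_to_br(err$em, err$ep))   # D = 0.6, g = 0.1 (up to rounding)
```

Because D = 1 - (e^- + e^+), a valid competency exists only when the two error rates sum to at most 1, which is why the correspondence breaks down in the "perverse" region discussed in the example below.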

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex i nominates vertex j when reached. Then the conditional probability of $e_{ij}$ is given by

$$
\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},
$$

where T is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the (i, j) directed dyad). Bias events are taken to be independent Bernoulli trials given T, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. The bn function currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
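
The conditional nomination probability above is simple to evaluate once the baseline probability, the bias event probabilities, and the counts s_k(i, j, T) are known. The sketch below is a direct transcription of that formula in base R; the function name biased_net_prob and the numbers in the example are hypothetical, and the code does not reproduce how bn estimates these quantities from data.

```r
# Hypothetical helper (not sna's bn function): conditional probability
# Pr(e_ij | T) under a biased net model, given
#   p_base = Pr(B_e), the baseline nomination probability,
#   p_bias = Pr(B_k) for each bias event k, and
#   s      = s_k(i, j, T), the number of times each bias event has had a
#            chance to occur for the (i, j) dyad in the current trace T.
biased_net_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s), all(s >= 0))
  # e_ij fails to occur only if the baseline trial and every bias trial fail
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Two illustrative bias events: one with probability 0.3 that has had one
# opportunity, and one with probability 0.2 that has had two.
biased_net_prob(p_base = 0.05, p_bias = c(0.3, 0.2), s = c(1, 2))
# 1 - 0.95 * 0.7 * 0.8^2 = 0.5744
```

With all s_k equal to zero the expression reduces to the baseline probability, which makes clear that bias events can only raise the chance of a nomination.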

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990; see also Cliff and Ord 1973 and Anselin 1988 for the equivalent class of spatial autocorrelation models) constitute one important family of processes that are often used for this purpose. These models are of the form

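$$
y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}
$$

$$
\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}
$$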

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the AR term's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. The nacf function can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
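
To make the two autocorrelation indices mentioned above concrete, the sketch below computes their standard textbook forms for a response vector and a network weight matrix. It is written in base R, is not the nacf implementation (which may differ in weighting or normalization details), and the function names morans_i and gearys_c are hypothetical.

```r
# Minimal base-R versions of Moran's I and Geary's C for a response vector y
# and an n x n network weight matrix W (W[i, j] > 0 if j is a neighbor of i,
# zero diagonal). Textbook formulas only; not sna's nacf.
morans_i <- function(y, W) {
  n <- length(y)
  z <- y - mean(y)
  (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

gearys_c <- function(y, W) {
  n <- length(y)
  z <- y - mean(y)
  ((n - 1) / (2 * sum(W))) * sum(W * outer(y, y, "-")^2) / sum(z^2)
}

# Undirected path 1-2-3-4 with a steadily increasing response
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)
y <- c(1, 2, 3, 4)
morans_i(y, W)  # 1/3: above its null expectation of -1/(n - 1)
gearys_c(y, W)  # 0.3: below 1, also indicating positive autocorrelation
```

Replacing W with a second-order neighborhood matrix, or with a similarity matrix built from structural equivalence distances, is one way to compare candidate weight matrices before fitting lnam.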


                                                                Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors: an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary method for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[,1] <- 2
R> pem[,2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[,1] <- 2
R> pep[,2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:
             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):
       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):
          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20 Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115
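
The potential scale reduction figures above compare between-chain and within-chain variability across the five replicate chains; values close to 1 suggest that the chains have mixed. As a rough illustration only, the base-R function below computes the simplified Gelman-Rubin statistic sqrt(Rhat) for one scalar quantity. The name psrf is hypothetical, and sna's potscalered.mcmc may differ in its details.

```r
# Simplified Gelman-Rubin potential scale reduction, sqrt(Rhat), for one
# scalar quantity; 'draws' is an n x m matrix holding n draws from each of
# m chains (one chain per column).
psrf <- function(draws) {
  n <- nrow(draws)
  B <- n * var(colMeans(draws))        # between-chain variance
  W <- mean(apply(draws, 2, var))      # mean within-chain variance
  var_plus <- (n - 1) / n * W + B / n  # pooled estimate of posterior variance
  sqrt(var_plus / W)
}

set.seed(42)
mixed <- matrix(rnorm(200 * 5), ncol = 5)  # 5 chains sampling the same target
stuck <- mixed
stuck[, 5] <- stuck[, 5] + 3               # one chain stuck in another region
psrf(mixed)  # close to 1
psrf(stuck)  # noticeably above 1
```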

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1
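
The gap between the generic priors and the simulated error rates can be quantified from the rbeta calls above, using the fact that a beta(a, b) distribution has mean a/(a + b):

$$
\mathrm{E}[e^+] = \frac{1}{1 + 25} \approx 0.038, \qquad
\mathrm{E}[e^-] = \frac{15}{15 + 25} = 0.375, \qquad
\mathrm{E}_{\text{prior}}[e^{\pm}] = \frac{2}{2 + 11} \approx 0.154.
$$

The prior mean thus overstates the typical false positive rate roughly fourfold and understates the typical false negative rate by more than half.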

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (the posterior medians correlate at about 0.92 and 0.97 with the true false negative and false positive rates, respectively) and the network itself (the posterior median graph matches g exactly), the latter being somewhat easier to estimate in many cases. In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high posterior probability, or posterior graph distributions that are strongly multimodal, can indicate either excessively "perverse" priors or extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
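
One way to screen for the warning sign described above is to estimate, for each informant, the posterior probability that the sampled error rates fall in the perverse region. The base-R helper below is hypothetical (it is not part of sna) and assumes draws are stored with one row per posterior draw and one column per observer, the layout suggested by apply(b$em, 2, median) in the example.

```r
# Hypothetical helper (not part of sna): posterior probability, for each
# observer, that the draws fall in the "perverse" region em + ep > 1.
# Assumes one row per posterior draw and one column per observer.
perverse_prob <- function(em_draws, ep_draws) {
  stopifnot(identical(dim(em_draws), dim(ep_draws)))
  colMeans(em_draws + ep_draws > 1)
}

# Toy draws for three observers; the third is mostly in the perverse region
em_draws <- cbind(c(0.30, 0.35, 0.40), c(0.20, 0.25, 0.30), c(0.70, 0.75, 0.40))
ep_draws <- cbind(c(0.05, 0.04, 0.06), c(0.02, 0.03, 0.01), c(0.50, 0.45, 0.55))
perverse_prob(em_draws, ep_draws)  # 0, 0, and 2/3
```

Applied to the fitted object, perverse_prob(b$em, b$ep) would return one probability per informant, provided that b$em and b$ep have the assumed layout.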

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice, since it exacerbates the effects of false negatives (it classifies only about 77% of cells correctly), while the central graph (about 96%) and the Romney-Batchelder model (100%) are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can remain quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
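
The contrast between the LAS and central graph estimators can be seen on a toy example. The base-R sketch below implements them under stated assumptions: dat[k, i, j] is informant k's report of the i -> j edge, the informants are also the vertices, LAS uses only the two endpoints' reports, and the central graph is taken as the cell-wise majority (the graph minimizing total Hamming distance to the reports). The function names are hypothetical and this is not sna's consensus code.

```r
# Sketch (not sna's consensus) of three classical estimators for
# cognitive social structure data, where dat[k, i, j] is informant k's
# report of the i -> j edge and the n informants are also the n vertices.
las_estimate <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  est <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # LAS uses only the reports of the edge's two endpoints
      r_i <- dat[i, i, j]
      r_j <- dat[j, i, j]
      est[i, j] <- if (rule == "intersection") min(r_i, r_j) else max(r_i, r_j)
    }
  }
  est
}

central_graph <- function(dat) {
  # Cell-wise majority vote across informants; this minimizes the total
  # Hamming distance to the reports (ties at exactly 0.5 are scored 0)
  1 * (apply(dat, c(2, 3), mean) > 0.5)
}

# Three informants; the true graph has a single edge 1 -> 2
dat <- array(0, dim = c(3, 3, 3))
dat[1, 1, 2] <- 1   # informant 1 correctly reports 1 -> 2
dat[3, 1, 2] <- 1   # informant 3 also reports it; informant 2 misses it
dat[3, 2, 3] <- 1   # informant 3 falsely reports 2 -> 3

las_estimate(dat, "intersection")  # misses 1 -> 2 (informant 2's false negative)
las_estimate(dat, "union")         # keeps 1 -> 2 but adds the false 2 -> 3
central_graph(dat)                 # recovers exactly the edge 1 -> 2
```

A single endpoint's false negative is enough to delete an edge from the intersection LAS, which is the mechanism behind its poor showing when false negative rates are high.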

As a final example of sna's model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. This example includes both AR and MA components and estimates both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are shown as green edges, while pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean are shown as red edges. Sample output for the above example is provided in Figure 6.
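
Because equations (4) and (5) define y and ε only implicitly, simulating from the model, as in the example above, requires solving them for their reduced form. With a single AR term and a single MA term (w = z = 1, with θ_1 = r1, W_1 = w1, ψ_1 = r2, and Z_1 = w2 in the example's notation), and assuming that both I - r2 w2 and I - r1 w1 are nonsingular, rearranging gives

$$
\begin{aligned}
(I - \psi_1 Z_1)\,\varepsilon = \nu \quad &\Longrightarrow \quad \varepsilon = (I - \psi_1 Z_1)^{-1}\nu,\\
(I - \theta_1 W_1)\,y = X\beta + \varepsilon \quad &\Longrightarrow \quad y = (I - \theta_1 W_1)^{-1}\left(X\beta + \varepsilon\right).
\end{aligned}
$$

These are the two linear systems solved by the qr.solve calls in the simulation: the first recovers e from nu, and the second recovers y from x %*% beta + e.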

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines that covers many of the methods currently seeing wide use within the field. It is hoped that, together with the other packages of the statnet ensemble, the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                                                Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances (ν); Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.]

                                                                References

                                                                Anselin L (1988) Spatial Econometrics Methods and Models Kluwer Norwell MA

                                                                Banks D Carley KM (1994) ldquoMetric Inference for Social Networksrdquo Journal of Classification11(1) 121ndash149

                                                                Batagelj V Mrvar A (2007) Pajek Package for Large Network Analysis University ofLjubljana Slovenia URL httpvladofmfuni-ljsipubnetworkspajek

                                                                Batchelder WH Romney AK (1988) ldquoTest Theory Without an Answer Keyrdquo Psychometrika53(1) 71ndash92

                                                                Bonacich P (1987) ldquoPower and Centrality A Family of Measuresrdquo American Journal ofSociology 92 1170ndash1182

                                                                Journal of Statistical Software 47

                                                                Boorman SA White HC (1976) ldquoSocial Structure from Multiple Networks II Role Struc-turesrdquo American Journal of Sociology 81 1384ndash1446

                                                                Borgatti SP (2007) NetDraw Network Visualization Software Version 2067 URL httpwwwanalytictechcom

                                                                Borgatti SP Carley K Krackhardt D (2006) ldquoRobustness of Centrality Measures UnderConditions of Imperfect Datardquo Social Networks 28 124ndash136

                                                                Borgatti SP Everett MG Freeman LC (1999) UCINET 60 for Windows Software forSocial Network Analysis Analytic Technologies Natick URL httpwwwanalytictechcom

                                                                Boyd JP (1969) ldquoThe Algebra of Group Kinshiprdquo Journal of Mathematical Psychology 6139ndash167

                                                                Brandes U Erlebach T (eds) (2005) Network Analysis Methodological FoundationsSpringer-Verlag Berlin

                                                                Brandes U Kenis P Wagner D (2003) ldquoCommunicating Centrality in Policy Network Draw-ingsrdquo IEEE Transactions on Visualization and Computer Graphics 9(2) 241ndash253

                                                                Breiger RL Boorman SA Arabie P (1975) ldquoAn Algorithm for Clustering Relational Data withApplications to Social Network Analysis and Comparison with Multidimensional ScalingrdquoJournal of Mathematical Psychology 12 323ndash383

                                                                Brockwell PJ Davis RA (1991) Time Series Theory and Methods Springer-Verlag NewYork second edition

                                                                Burt RS (1976) ldquoPositions In Networksrdquo Social Forces 55 93ndash122

                                                                Burt RS (1991) STRUCTURE Columbia University Software package version 42 URLhttpfacultychicagogsbeduronaldburtteaching

                                                                Butts CT (2003) ldquoNetwork Inference Error and Informant (In)Accuracy A Bayesian Ap-proachrdquo Social Networks 25(2) 103ndash140

                                                                Butts CT (2007) ldquoPermutation Models for Relational Datardquo Sociological Methodology 37257ndash281

                                                                Butts CT Carley KM (2001) ldquoMultivariate Methods for Interstructural Analysisrdquo CASOSworking paper Center for the Computational Analysis of Social and Organization SystemsCarnegie Mellon University

                                                                Butts CT Carley KM (2005) ldquoSome Simple Algorithms for Structural Comparisonrdquo Com-putational and Mathematical Organization Theory 11(4) 291ndash305

                                                                Butts CT Handcock MS Hunter DR (2007) network Classes for Relational Data StatnetProject httpstatnetprojectorg Seattle WA R package version 13 URL httpCRANR-projectorgpackage=network

                                                                Butts CT Pixley JE (2004) ldquoA Structural Approach to the Representation of Life HistoryDatardquo Journal of Mathematical Sociology 28(2) 81ndash124

                                                                48 Social Network Analysis with sna

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project <http://statnetproject.org/>, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software (http://www.jstatsoft.org/), published by the American Statistical Association (http://www.amstat.org/).

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.



with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0
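The expansion step can be pictured with a short base-R sketch. It rests on our own assumption that each block density in the image matrix is used as an independent tie probability for every ordered pair of vertices in the corresponding blocks; the helper expand_blockmodel and the 3-block image are hypothetical and do not describe the internals of blockmodel.expand.

```r
# Hypothetical helper: expand a k x k matrix of block densities into a random
# directed graph with ev[b] vertices in block b (assumption: densities are
# used as independent Bernoulli tie probabilities).
expand_blockmodel <- function(image, ev) {
  memb <- rep(seq_along(ev), ev)        # block membership of each new vertex
  p <- image[memb, memb]                # tie probability for each ordered pair
  g <- matrix(rbinom(length(p), 1, p), nrow(p), ncol(p))
  diag(g) <- 0                          # no self-ties
  g
}

set.seed(3)
# Hypothetical 3-block image: row = sending block, column = receiving block
image <- matrix(c(0.9, 0.1, 0.2,
                  0.0, 0.8, 0.3,
                  0.5, 0.5, 0.0), nrow = 3, byrow = TRUE)
ge <- expand_blockmodel(image, rep(2, nrow(image)))  # two vertices per block
ge
```

Because the order of the new graph is set entirely by the expansion counts, the same image can produce graphs larger or smaller than the one from which it was estimated.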

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert 1987; Krackhardt 1987a, 1988; Banks and Carley 1994; Butts and Carley 2005; Butts 2007, for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert’s Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$\operatorname{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V|-1)} \qquad (3)$$


where $A^G$ and $A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\operatorname{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \operatorname{cov}(G,H) / \sqrt{\operatorname{cov}(G,G)\,\operatorname{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained using gcor and gcov, respectively; Hamming distances for graph sets can similarly be obtained using hdist.
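Equation (3) and the related measures are simple to compute directly. The sketch below is plain base R with hypothetical helper names (graph_cov, graph_cor, hamming, hubert_gamma), not the code behind gcor, gcov, or hdist; it assumes directed graphs without loops, so every sum runs over the |V|(|V| - 1) ordered pairs with i ≠ j.

```r
# Off-diagonal entries of an adjacency matrix: the |V|(|V| - 1) edge variables
off_diag <- function(a) a[row(a) != col(a)]

# Graph covariance, equation (3): mean() divides by |V|(|V| - 1)
graph_cov <- function(ag, ah) {
  x <- off_diag(ag)
  y <- off_diag(ah)
  mean((x - mean(x)) * (y - mean(y)))
}

# Graph correlation: covariance scaled by the two graph variances
graph_cor <- function(ag, ah) {
  graph_cov(ag, ah) / sqrt(graph_cov(ag, ag) * graph_cov(ah, ah))
}

# Hamming distance: number of edge variables on which the graphs differ
hamming <- function(ag, ah) sum(off_diag(ag) != off_diag(ah))

# Hubert's Gamma: uncentered product-moment of the edge variables
hubert_gamma <- function(ag, ah) sum(off_diag(ag) * off_diag(ah))

set.seed(1)
n <- 6
g <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(g) <- 0
h <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(h) <- 0
c(cor = graph_cor(g, h), hamming = hamming(g, h), gamma = hubert_gamma(g, h))
```

Because the common factor $1/(|V|(|V|-1))$ cancels in the ratio, graph_cor agrees with cor() applied to the two vectors of off-diagonal entries.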

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the “labeling” component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
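For very small graphs, the maximization behind a structural correlation can be done by brute force. This sketch assumes that every relabeling of the vertices is a permissible matching (Butts and Carley 2005 also allow restricted sets) and searches all n! permutations, which is the idea behind exhaustive search; perms, structural_cor, and rand_digraph are hypothetical names, and this is not how gscor or lab.optimize are implemented.

```r
off_diag <- function(a) a[row(a) != col(a)]
gcor_plain <- function(a, b) cor(off_diag(a), off_diag(b))

# All permutations of 1:n as rows of a matrix (n! rows, so keep n small)
perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  p <- perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(k) {
    rest <- setdiff(seq_len(n), k)
    cbind(k, matrix(rest[as.vector(p)], nrow(p)))  # k first, then each ordering of the rest
  }))
}

# Structural correlation: maximum graph correlation over relabelings of b
structural_cor <- function(a, b) {
  P <- perms(nrow(b))
  max(apply(P, 1, function(p) gcor_plain(a, b[p, p])))
}

# Random digraph with at least one tie and one non-tie, so cor() is defined
rand_digraph <- function(n) {
  g <- matrix(rbinom(n * n, 1, 0.5), n, n)
  diag(g) <- 0
  g[1, 2] <- 1; g[2, 1] <- 0
  g
}

set.seed(5)
g1 <- rand_digraph(5)
g2 <- rand_digraph(5)
o <- sample(5)
g3 <- g2[o, o]                  # g3 is an isomorphic relabeling of g2
c(labeled = gcor_plain(g2, g3), structural = structural_cor(g2, g3))
```

The identity permutation is always among the candidates, so a structural correlation can never fall below the ordinary graph correlation. With 10 vertices there are already 3,628,800 relabelings, which is why heuristic search is the practical default.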

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by applying eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
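The workflow in the preceding paragraph can be sketched with base R alone. Two points are our own assumptions: for binary graphs we compute a Hamming-minimizing central graph by majority vote on each edge variable (ties resolved toward 1), and we take “network PCA” to mean eigen applied to the matrix of graph correlations. The helpers are hypothetical and are not centralgraph or the gclust.* and gdist.* functions.

```r
off_diag <- function(a) a[row(a) != col(a)]
hamming <- function(a, b) sum(off_diag(a) != off_diag(b))

set.seed(42)
n <- 6; m <- 8
# m random directed graphs on the same n vertices, loops removed
gs <- replicate(m, {
  a <- matrix(rbinom(n * n, 1, 0.4), n, n)
  diag(a) <- 0
  a
}, simplify = FALSE)

# m x m matrix of Hamming distances, then standard exploratory tools
D <- sapply(gs, function(a) sapply(gs, function(b) hamming(a, b)))
mds <- cmdscale(as.dist(D), k = 2)            # metric MDS coordinates
groups <- cutree(hclust(as.dist(D)), k = 2)   # two-cluster solution

# Central graph: an edge is present when at least half of the graphs have it,
# which minimizes the total Hamming distance to the set edge by edge
central_graph <- function(gs) (Reduce(`+`, gs) / length(gs) >= 0.5) * 1
cg <- central_graph(gs)
total_dist <- function(x) sum(sapply(gs, function(a) hamming(x, a)))

# "Network PCA": eigendecomposition of the graph correlation matrix
C <- sapply(gs, function(a) sapply(gs, function(b) cor(off_diag(a), off_diag(b))))
pc <- eigen(C, symmetric = TRUE)
pc$values / sum(pc$values)                    # share of variation per component
```

The majority rule works because the Hamming distance is a sum over edge variables, so each edge can be chosen independently to disagree with as few input graphs as possible.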

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available where conditioning on the entire observed structure is inappropriate.
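The logic of a QAP test for a bivariate statistic fits in a few lines. This minimal sketch uses the graph correlation as the statistic and relabels one graph by a uniformly random vertex permutation applied to rows and columns together; qap_cor_test is a hypothetical name, and the two tail fractions it reports are our own choice rather than the exact output of qaptest.

```r
off_diag <- function(a) a[row(a) != col(a)]
gcor_plain <- function(a, b) cor(off_diag(a), off_diag(b))

qap_cor_test <- function(a, b, reps = 1000) {
  obs <- gcor_plain(a, b)
  n <- nrow(b)
  null_dist <- replicate(reps, {
    p <- sample.int(n)            # random reallocation of actors to positions
    gcor_plain(a, b[p, p])        # permute rows and columns jointly
  })
  c(stat = obs,
    p_greater = mean(null_dist >= obs),   # fraction of permutations at least as large
    p_less = mean(null_dist <= obs))      # fraction at least as small
}

set.seed(9)
n <- 10
a <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(a) <- 0
flip <- matrix(rbinom(n * n, 1, 0.1), n, n)   # perturb about 10% of entries
b <- abs(a - flip); diag(b) <- 0              # b is a noisy copy of a
qap_cor_test(a, b)
```

Permuting whole rows and columns keeps each graph's structure intact and only breaks the alignment between the two edge sets, which is what distinguishes a QAP null from simply shuffling edge variables.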


                                                                  Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
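To make concrete the sense in which netlm is “exactly analogous” to conventional regression, the following base-R sketch vectorizes the off-diagonal entries of each adjacency matrix (one row per ordered dyad) and fits them with lm. The graphs are generated with rbinom rather than rgraph, so the numbers will not match the transcript above, and only the point estimates are reproduced; netlm's QAP diagnostics are not.

```r
set.seed(7)
n <- 20
# four random directed graphs stored as a 4 x n x n array (like rgraph(20, 4))
x <- array(rbinom(4 * n * n, 1, 0.5), c(4, n, n))
off_diag <- function(a) a[row(a) != col(a)]   # drop loops: n(n - 1) dyads

y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]

# one row per ordered dyad (i, j), i != j; one column per predictor graph
dat <- data.frame(y = off_diag(y),
                  x1 = off_diag(x[1, , ]), x2 = off_diag(x[2, , ]),
                  x3 = off_diag(x[3, , ]), x4 = off_diag(x[4, , ]))
fit <- lm(y ~ x1 + x2 + x3 + x4, data = dat)
coef(fit)          # 0, 1, 4, 2, 0 up to rounding error (the fit is exact)
df.residual(fit)   # 380 - 5 = 375, as in the netlm output
```

The 375 residual degrees of freedom in both outputs reflect the same bookkeeping: 20 × 19 = 380 ordered dyads minus five estimated coefficients.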

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
         352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

      0    1
0     0    0
1    39  341

        Total Fraction Correct: 0.8973684
        Fraction Predicted 1s Correct: 0.8973684
        Fraction Predicted 0s Correct: NaN
        False Negative Rate: 0
        False Positive Rate: 1

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties: it predicts no 0s at all. This is largely a consequence of the high density of the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model’s parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.
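The same vectorization carries over to the logistic case. The sketch below refits a simulated example of the same form with glm(..., family = binomial) on the off-diagonal dyads; it is not netlogit's code and again omits the QAP diagnostics. It also checks one detail of the printed summary: the null deviance of 526.7919 on 380 degrees of freedom equals 2 × 380 × log 2, the deviance of a model assigning probability 1/2 to every dyad. That suggests (we have not confirmed this in the sna source) that netlogit's null model is this zero-coefficient model rather than glm's intercept-only model, which would have 379 degrees of freedom.

```r
set.seed(11)
n <- 20
x <- array(rbinom(4 * n * n, 1, 0.5), c(4, n, n))
off_diag <- function(a) a[row(a) != col(a)]

yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]   # true log-odds; intercept 0
yp <- plogis(yl)                                # same as 1 / (1 + exp(-yl))
y <- matrix(rbinom(n * n, 1, yp), n, n)         # one Bernoulli draw per dyad

dat <- data.frame(y = off_diag(y),
                  x1 = off_diag(x[1, , ]), x2 = off_diag(x[2, , ]),
                  x3 = off_diag(x[3, , ]), x4 = off_diag(x[4, , ]))
fit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = dat)
coef(fit)     # should lie near 0, 1, 4, 2, 0, up to sampling error

# deviance of a model giving every one of the 380 dyads probability 1/2
dev_half <- 2 * nrow(dat) * log(2)
dev_half      # 526.7919, the "Null deviance" reported in the netlogit example
```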

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), users are strongly encouraged to employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which they report, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of the false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former’s allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to obtain posterior predictive graph properties from a set of posterior draws.
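Two ingredients of this approach can be sketched compactly. First, when the error rates are given as known, the posterior probability of each edge follows from Bayes' rule applied to independent reports: a report of a tie has probability 1 minus the false negative rate if the tie exists, and the false positive rate if it does not. Second, the potential scale reduction compares between-chain and within-chain variance. Both functions are hypothetical base-R illustrations under these assumptions (independent reports, a Bernoulli prior per edge, and the form sqrt(V/W) with V = (n - 1)/n W + B/n for the statistic); they are not bbnam or potscalered.mcmc.

```r
# Posterior Pr(a_ij = 1 | reports) when error rates are known (hypothetical helper).
# reports: m x n x n array of 0/1 reports (one n x n slice per source)
# prior:   n x n matrix of prior tie probabilities, strictly between 0 and 1
# e_pos, e_neg: length-m vectors of false positive / false negative rates
edge_posterior <- function(reports, prior, e_pos, e_neg) {
  n <- dim(reports)[2]
  log_l1 <- matrix(0, n, n)   # log Pr(reports | tie present)
  log_l0 <- matrix(0, n, n)   # log Pr(reports | tie absent)
  for (k in seq_len(dim(reports)[1])) {
    y <- reports[k, , ]
    log_l1 <- log_l1 + y * log(1 - e_neg[k]) + (1 - y) * log(e_neg[k])
    log_l0 <- log_l0 + y * log(e_pos[k]) + (1 - y) * log(1 - e_pos[k])
  }
  plogis(log(prior) - log(1 - prior) + log_l1 - log_l0)  # posterior log-odds -> prob
}

# Potential scale reduction for one scalar quantity (hypothetical helper).
# chains: matrix with one column per chain, burn-in already discarded
psrf <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))       # mean within-chain variance
  B <- n * var(colMeans(chains))         # between-chain variance
  v_hat <- (n - 1) / n * W + B / n
  sqrt(v_hat / W)
}

# Three sources reporting on a 3-vertex network (diagonal entries are ignored)
reports <- array(0, c(3, 3, 3))
reports[, 1, 2] <- c(1, 1, 0)   # two of three sources report the tie 1 -> 2
reports[, 2, 3] <- c(1, 0, 0)   # only one source reports 2 -> 3
post <- edge_posterior(reports, prior = matrix(0.5, 3, 3),
                       e_pos = rep(0.1, 3), e_neg = rep(0.2, 3))
round(post, 3)

set.seed(2)
psrf(matrix(rnorm(4000), ncol = 4))               # well-mixed chains: close to 1
psrf(cbind(rnorm(1000), rnorm(1000, mean = 5)))   # chains stuck apart: well above 1
```

In a Gibbs sampler the edge update conditions on the current error-rate draws, so a function like edge_posterior corresponds to one full-conditional step rather than to the whole estimation procedure.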

sna also supports the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices. Each member of this set is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where:

- $T$ is the current state of the trace;
- $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$;
- the $B_k$ are "bias events", of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad.

Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. Specifying a biased net model then involves defining the various bias events, which in turn influence the structure of the network. The joint graph distribution under such a model is not in general known. As such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004). It also implements a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
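The bias events are independent given $T$. The tie therefore fails to appear only if the baseline trial and every bias opportunity all fail, which is where the product in the expression above comes from. The Python sketch below evaluates that expression for one directed dyad. The bias labels, probabilities, and counts are hypothetical. Deriving the counts $s_k(i,j,T)$ from an actual trace depends on how each bias event is defined, and is not attempted here. This is not how sna's bn function works internally.

```python
def edge_prob(p_base, bias_probs, bias_counts):
    """Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k)) ** s_k(i, j, T).

    p_base:      baseline probability Pr(B_e)
    bias_probs:  dict, bias-event label -> Pr(B_k)
    bias_counts: dict, bias-event label -> s_k(i, j, T) for this dyad
                 (labels missing from bias_counts are treated as s_k = 0)
    """
    # e_ij fails only if the baseline trial and every bias opportunity fail;
    # these are independent Bernoulli trials given the trace state T.
    p_none = 1.0 - p_base
    for label, p_k in bias_probs.items():
        p_none *= (1.0 - p_k) ** bias_counts.get(label, 0)
    return 1.0 - p_none


# Hypothetical bias labels and probabilities, for illustration only.
probs = {"reciprocity": 0.3, "closure": 0.1}
print(edge_prob(0.05, probs, {}))                                 # baseline only
print(edge_prob(0.05, probs, {"reciprocity": 1, "closure": 2}))  # with bias opportunities
```

With no bias opportunities the probability reduces to the baseline value. Each additional opportunity can only raise it, since every factor in the product is at most 1.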

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models constitute one important family of processes which are often used for this purpose (see Doreian 1990; for the equivalent class of spatial autocorrelation models, see Cliff and Ord 1973 and Anselin 1988). These models are of the form

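$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$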

The terms in (4) and (5) are as follows:

- $y \in \mathbb{R}^n$ is a vector of responses;
- $X \in \mathbb{R}^{n \times x}$ is a covariate matrix;
- $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays;
- $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters;
- $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances.

$Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the two effect types differ in that the AR term includes impact from neighbors' covariate scores. An AR term implies that each individual's response depends on that of their neighbors, including all covariate, disturbance, and higher-order neighborhood effects. An MA term implies that conditional dependence between responses is limited to deviations from the expectation. AR and MA effects can thus be specified in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computing sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders. It can also compute autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation. This parallels similar heuristics in the time series literature (see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
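To make the two indices concrete, the Python sketch below computes Moran's I and Geary's C for a vertex attribute y. It uses an adjacency matrix directly as the weight matrix and follows the standard textbook definitions:

- I = (n / S0) Σ w_ij (y_i - ȳ)(y_j - ȳ) / Σ (y_i - ȳ)²
- C = ((n - 1) / (2 S0)) Σ w_ij (y_i - y_j)² / Σ (y_i - ȳ)²

Here S0 = Σ w_ij. The sketch assumes a single first-order neighborhood given by the weights. nacf's handling of in-, out-, and higher-order neighborhoods, and any normalization it applies, may differ. The function name is hypothetical.

```python
def moran_geary(w, y):
    """Return (Moran's I, Geary's C) for vertex attribute y under weights w.

    w: n x n list of lists; w[i][j] is the weight of the i -> j tie
       (e.g. a 0/1 adjacency matrix with at least one nonzero entry)
    y: list of n numbers that are not all equal
    """
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    s0 = sum(sum(row) for row in w)        # total weight S_0
    ss = sum(d * d for d in dev)           # total sum of squares
    pairs = [(i, j) for i in range(n) for j in range(n)]
    cross = sum(w[i][j] * dev[i] * dev[j] for i, j in pairs)
    sqdiff = sum(w[i][j] * (y[i] - y[j]) ** 2 for i, j in pairs)
    moran_i = (n / s0) * cross / ss
    geary_c = ((n - 1) / (2 * s0)) * sqdiff / ss
    return moran_i, geary_c


# An undirected path a - b - c - d, stored as a symmetric adjacency matrix.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
print(moran_geary(path, [1, 2, 3, 4]))  # attribute rises smoothly along the path
print(moran_geary(path, [1, 2, 1, 2]))  # attribute alternates between neighbors
```

Up to floating-point rounding, the first call gives I = 1/3 and C = 0.3, since neighbors resemble each other. The second gives I = -1 and C = 1.5, since neighbors differ. Under no autocorrelation, I has expectation -1/(n - 1) and C is near 1; C below 1 indicates positive autocorrelation.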


                                                                  Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set. In it, we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038. Their false negative rates (em) are likewise beta distributed, with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary method for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)                    # true (unobserved) digraph
R> ep <- rbeta(20, 1, 25)             # false positive rates, mean 1/26
R> em <- rbeta(20, 15, 25)            # false negative rates, mean 15/40
R> dat <- array(dim = c(20, 20, 20))  # dat[i, , ] holds informant i's report
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)   # uninformative network prior
R> pem <- matrix(nrow = 20, ncol = 2)          # beta(2, 11) prior on each e^-
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)          # beta(2, 11) prior on each e^+
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

       a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1   0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2   0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3   0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4   0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5   1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6   0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7   1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8   0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9   0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10  0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11  0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12  1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13  0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14  1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15  1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16  0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17  1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18  1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19  0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20  0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

      a16  a17  a18  a19  a20
a1   1.00 1.00 1.00 0.00 0.00
a2   1.00 0.00 0.00 1.00 1.00
a3   0.00 0.00 1.00 0.00 1.00
a4   0.00 1.00 0.00 1.00 1.00
a5   1.00 1.00 0.00 0.00 1.00
a6   0.00 0.00 0.00 1.00 0.00
a7   1.00 0.00 0.00 0.00 0.00
a8   0.00 0.00 1.00 0.00 1.00
a9   1.00 1.00 1.00 1.00 0.00
a10  0.00 1.00 1.00 1.00 0.00
a11  1.00 1.00 0.00 1.00 1.00
a12  1.00 0.00 1.00 1.00 0.00
a13  0.00 0.00 1.00 0.00 1.00
a14  0.00 0.00 0.00 0.00 0.00
a15  1.00 0.00 1.00 0.00 1.00
a16  0.00 0.00 1.00 0.00 0.00
a17  0.00 0.00 1.00 0.00 1.00
a18  0.00 0.00 0.00 1.00 0.00
a19  0.00 0.00 0.00 0.00 1.00
a20  1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer)

Probability of False Negatives (e^-)

        Min   1stQ Median   Mean   3rdQ    Max
o1   0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2   0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3   0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4   0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5   0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6   0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7   0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8   0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9   0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10  0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11  0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12  0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13  0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14  0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15  0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16  0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17  0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18  0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19  0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20  0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+)

           Min      1stQ    Median      Mean      3rdQ       Max
o1   0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2   0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3   0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4   0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5   0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6   0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7   0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8   0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9   0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10  0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11  0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12  0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13  0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14  0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15  0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16  0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17  0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18  0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19  0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20  0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20
    Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates. It also recovers the network itself, which is actually somewhat easier to estimate in many cases. In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Two patterns can indicate either excessively "perverse" priors or extreme disagreement among informants (e.g., as would result from systematic deception):

- multiple actors whose error rates satisfy this condition with high probability in the posterior;
- posterior graph distributions which are strongly multimodal.

Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
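The "perverse" region has a simple algebraic reading. Suppose, as in the Batchelder and Romney (1988) model described earlier, that an informant with competence D knows the true edge value with probability D. Otherwise, the informant guesses "present" with probability g. Then:

- the probability of reporting a true tie is D + (1 - D)g;
- the probability of reporting an absent tie is (1 - D)g;
- so e^+ = (1 - D)g and e^- = (1 - D)(1 - g).

Solving gives D = 1 - e^+ - e^- and g = e^+/(e^+ + e^-). A nonnegative competence therefore exists only when e^+ + e^- <= 1. Beyond that bound, 1 - e^- < e^+: a tie is reported less often when it exists than when it does not, so the reports are negatively informative. The Python helpers below carry out this conversion under the stated assumptions. Their names are hypothetical and they are not sna functions.

```python
def errors_to_competence(e_plus, e_minus):
    """Map (false positive rate, false negative rate) to (competence D, bias g).

    Assumed reporting model: Pr(report tie | tie) = D + (1 - D) * g and
    Pr(report tie | no tie) = (1 - D) * g, so e_plus = (1 - D) * g and
    e_minus = (1 - D) * (1 - g).
    """
    if not (0.0 <= e_plus <= 1.0 and 0.0 <= e_minus <= 1.0):
        raise ValueError("error rates must lie in [0, 1]")
    total = e_plus + e_minus
    if total > 1.0:
        # The "perverse" region: no competence D >= 0 reproduces these rates.
        raise ValueError("perverse region: e_plus + e_minus > 1")
    competence = 1.0 - total          # D = 0 when total == 1: pure guessing
    # An error-free informant never guesses, so g is not identified.
    bias = e_plus / total if total > 0 else None
    return competence, bias


def competence_to_errors(competence, bias):
    """Inverse map: (D, g) -> (e_plus, e_minus)."""
    return (1.0 - competence) * bias, (1.0 - competence) * (1.0 - bias)


# Mean error rates of the simulated informants in the example above.
d, g = errors_to_competence(0.038, 0.375)
print(d, g)                        # competence about 0.587, bias about 0.092
print(competence_to_errors(d, g))  # recovers (0.038, 0.375) up to rounding
```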

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection LAS, the central graph, and the Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice, since it exacerbates the effects of false negatives. The central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
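The difference between these estimators can be seen on a toy example. The Python sketch below implements the three data-analytic rules for an array reports[k][i][j], holding informant k's report of the i -> j tie. This is the same informant-by-sender-by-receiver layout as dat in the example above. The rules follow the usual definitions:

- the LAS intersection keeps i -> j only if both i and j report it;
- the LAS union keeps i -> j if either i or j reports it;
- the central graph keeps an edge when more than half of all informants report it. This majority rule minimizes the total number of edge disagreements with the observed graphs.

This is an illustration, not sna's consensus code. Treating an exact 50/50 split as absent is an assumption of this sketch.

```python
def las(reports, rule="intersection"):
    """Locally aggregated structure from reports[k][i][j] (informant k's view of i -> j).

    Assumes informant k is vertex k, so only i's and j's reports of i -> j are used.
    """
    if rule == "intersection":
        pick = min   # tie kept only if both i and j report it
    elif rule == "union":
        pick = max   # tie kept if either i or j reports it
    else:
        raise ValueError("rule must be 'intersection' or 'union'")
    n = len(reports)
    return [[pick(reports[i][i][j], reports[j][i][j]) for j in range(n)]
            for i in range(n)]


def central_graph(reports):
    """Majority rule: keep i -> j if more than half of all informants report it."""
    m, n = len(reports), len(reports[0])
    return [[int(2 * sum(r[i][j] for r in reports) > m) for j in range(n)]
            for i in range(n)]


def accuracy(est, truth):
    """Share of adjacency-matrix cells on which est agrees with truth."""
    n = len(truth)
    return sum(est[i][j] == truth[i][j] for i in range(n) for j in range(n)) / n ** 2


truth = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
# Informant 1 misses the 0 -> 1 tie; informant 0 misses the 1 -> 2 tie.
reports = [[[0, 1, 0], [0, 0, 0], [0, 0, 0]],
           [[0, 0, 0], [0, 0, 1], [0, 0, 0]],
           [[0, 1, 0], [0, 0, 1], [0, 0, 0]]]
for name, est in (("LAS intersection", las(reports, "intersection")),
                  ("LAS union", las(reports, "union")),
                  ("central graph", central_graph(reports))):
    print(name, accuracy(est, truth))
```

A single missed report by an endpoint (informant 1 on the 0 -> 1 tie) is enough for the intersection rule to drop a true tie. The union and majority rules both recover the true graph here. This is the false-negative sensitivity described above, in miniature.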

As a final example of sna's model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components, and estimates both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)                   # AR weight matrix
R> w2 <- rgraph(50)                   # MA weight matrix
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2                          # true AR parameter
R> r2 <- 0.3                          # true MA parameter
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)              # solves (I - r2 W2) e = nu
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)  # solves (I - r1 W1) y = x beta + e
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot". The latter depicts, in network form, the total influence of each vertex on each other vertex. Edges are colored as follows:

- green: pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence;
- red: pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence.

Sample output for the above example is provided in Figure 6.
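The information criteria in the summary above can be checked by hand. The check assumes the standard definitions AIC = -2 logLik + 2k and BIC = -2 logLik + k log n (natural log), with n = 50. The parameter counts k are read off the reported degrees of freedom: k = 50 - 42 = 8 for the fitted model with Sigma, and k = 50 - 48 = 2 for the meanstd null model. The Python sketch below then reproduces the printed values from the displayed log likelihoods. It is an arithmetic check only and does not call lnam.

```python
import math


def aic(loglik, k):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian (Schwarz) information criterion, natural log."""
    return -2.0 * loglik + k * math.log(n)


n = 50
model = {"loglik": 58.47, "k": n - 42}   # 5 coefficients, 2 rho terms, Sigma
null = {"loglik": -82.48, "k": n - 48}   # mean and standard deviation only

for name, fit in (("model", model), ("null", null)):
    print(name,
          round(aic(fit["loglik"], fit["k"]), 2),
          round(bic(fit["loglik"], fit["k"], n), 2))

print("AIC difference:",
      round(aic(null["loglik"], null["k"]) - aic(model["loglik"], model["k"]), 2))
```

The BIC values agree with the printed ones only to about 0.01 because the log likelihoods are displayed to two decimals.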

                                                                  3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines, covering many of the methods currently seeing wide use within the field. It is hoped that including such tools, together with the other packages of the statnet ensemble, within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                                                  Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion. These include (but are not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: Fitted vs Observed Values; Fitted Values vs Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs Sample Quantiles); Net Influence Plot.]

                                                                  References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks. II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


                                                                  Cliff AD Ord JK (1973) Spatial Autocorrelation Pion London

                                                                  Davis JA Leinhardt S (1972) ldquoThe Structure of Positive Interpersonal Relations in SmallGroupsrdquo In J Berger (ed) ldquoSociological Theories in Progress Volume 2rdquo pp 218ndash251Houghton Mifflin Boston

                                                                  Dodds PS Watts DJ Sabel CF (2003) ldquoInformation Exchange and the Robustness of Organi-zational Networksrdquo Proceedings of the National Academy of Sciences 100(2) 12516ndash12521

                                                                  Doreian P (1990) ldquoNetwork Autocorrelation Models Problems and Prospectsrdquo In IDAGriffith (ed) ldquoSpatial Statistics Past Present and Futurerdquo pp 369ndash389 Institute ofMathematical Geography Ann Arbor

                                                                  Doreian P Batagelj V Ferlioj A (2005) Generalized Blockmodeling Cambridge UniversityPress Cambridge

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software (http://www.jstatsoft.org/), published by the American Statistical Association (http://www.amstat.org/)

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.



where $A^G$ and $A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained using gcor and gcov, respectively. Hamming distances for graph sets can be obtained similarly using hdist.
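These definitions translate directly into a few lines of base R. The sketch below is illustrative only and is not how sna computes these quantities. It assumes directed graphs with self-ties excluded, and it assumes that $\mathrm{cov}(G,H)$ is the mean cross-product of centered off-diagonal entries, i.e., that it uses the same $(|V|(|V|-1))^{-1}$ normalization as $\mu_X$. The helper names (offdiag, graph_mean, graph_cov, graph_cor, hamming) are hypothetical.

```r
# Illustrative helpers (not part of sna): statistics computed over the
# off-diagonal entries of directed adjacency matrices (ordered pairs i != j).
offdiag <- function(A) A[row(A) != col(A)]

# Graph mean: (|V|(|V|-1))^{-1} times the sum of A_ij over ordered pairs
graph_mean <- function(A) mean(offdiag(A))

# Graph covariance (assumed form): mean cross-product of centered entries
graph_cov <- function(AG, AH) {
  g <- offdiag(AG)
  h <- offdiag(AH)
  mean((g - mean(g)) * (h - mean(h)))
}

# Graph correlation: cov(G,H) / sqrt(cov(G,G) * cov(H,H))
graph_cor <- function(AG, AH) {
  graph_cov(AG, AH) / sqrt(graph_cov(AG, AG) * graph_cov(AH, AH))
}

# Hamming distance: number of ordered pairs on which the edge sets disagree
hamming <- function(AG, AH) sum(offdiag(AG) != offdiag(AH))

# A directed 3-cycle and its reversal
AG <- matrix(c(0, 1, 0,
               0, 0, 1,
               1, 0, 0), nrow = 3, byrow = TRUE)
AH <- t(AG)
graph_mean(AG)     # 0.5
graph_cor(AG, AH)  # -1: every arc of one graph is a non-arc of the other
hamming(AG, AH)    # 6
```

Because the normalizing constant cancels in $\rho(G,H)$, graph_cor agrees with cor() applied to the vectorized off-diagonal entries; we would expect gcor to return the same value for such digraphs, although that comparison is not shown here.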

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the “labeling” component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching of arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
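To make the idea of maximizing over matchings concrete, the base R sketch below computes a structural correlation by brute force, taking the permissible matchings to be all relabelings of the second graph's vertices. It is in the spirit of exhaustive search, not a reproduction of gscor or lab.optimize: it enumerates all $|V|!$ permutations and is only usable for very small graphs, which is precisely why heuristic methods such as simulated annealing are needed in practice. The function names are hypothetical.

```r
offdiag <- function(A) A[row(A) != col(A)]
graph_cor <- function(AG, AH) cor(offdiag(AG), offdiag(AH))

# All permutations of 1..n as rows of a matrix (brute force; small n only)
all_perms <- function(n) {
  grid <- as.matrix(expand.grid(rep(list(seq_len(n)), n)))
  grid[apply(grid, 1, function(r) anyDuplicated(r) == 0), , drop = FALSE]
}

# Structural correlation under full relabeling: maximize the graph
# correlation over all simultaneous row/column permutations of AH
struct_cor_exhaustive <- function(AG, AH) {
  P <- all_perms(nrow(AH))
  best <- -Inf
  for (k in seq_len(nrow(P))) {
    p <- P[k, ]
    best <- max(best, graph_cor(AG, AH[p, p]))
  }
  best
}

A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              0, 0, 0, 1,
              1, 0, 0, 0), nrow = 4, byrow = TRUE)
B <- A[c(2, 4, 1, 3), c(2, 4, 1, 3)]  # same structure, vertices relabeled
graph_cor(A, B)              # labeled correlation: -1/35, about -0.029
struct_cor_exhaustive(A, B)  # 1, since B is isomorphic to A
```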

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the total Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be implemented simply by applying eigen to a graph covariance or correlation matrix. The ability to use standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
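As a rough illustration of this workflow using only base R, the sketch below vectorizes a stack of graphs, computes Hamming distances and a graph correlation matrix, and then applies cmdscale, hclust, and eigen. The six random 10-vertex graphs are hypothetical data; in sna itself, hdist and gcor applied to a graph stack would supply the distance and correlation matrices directly.

```r
set.seed(42)
m <- 6; n <- 10
# Hypothetical data: m random directed graphs (density 0.3), stacked m x n x n;
# diagonal entries are ignored below
graphs <- array(rbinom(m * n * n, 1, 0.3), dim = c(m, n, n))

# Vectorize each graph's off-diagonal entries (one row per graph)
off <- row(diag(n)) != col(diag(n))
V <- t(apply(graphs, 1, function(A) A[off]))

# Hamming distances: Manhattan distance between 0/1 vectors
H <- dist(V, method = "manhattan")

# Exploratory analysis with standard tools
coords <- cmdscale(H, k = 2)  # metric MDS of the graph set
tree <- hclust(H)             # hierarchical clustering of graphs

# Network PCA: eigendecomposition of the graph correlation matrix
R <- cor(t(V))
pca <- eigen(R, symmetric = TRUE)
pca$values / sum(pca$values)  # share of variance per component
```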

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available for cases where conditioning on the entire observed structure is inappropriate.
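The logic of a QAP test for a bivariate statistic can be sketched in a few lines of base R: compute the observed graph correlation, then build its reference distribution by repeatedly relabeling the vertices of one graph (permuting its rows and columns together). This is a simplified illustration in the spirit of qaptest, not its actual implementation; the function qap_cor_test, its return values, and the simulated graphs are hypothetical.

```r
offdiag <- function(A) A[row(A) != col(A)]
graph_cor <- function(AG, AH) cor(offdiag(AG), offdiag(AH))

# QAP test for graph correlation: the null distribution comes from random
# relabelings of AH's vertices, which preserve its structure exactly
qap_cor_test <- function(AG, AH, reps = 1000) {
  obs <- graph_cor(AG, AH)
  n <- nrow(AH)
  null_dist <- replicate(reps, {
    p <- sample.int(n)
    graph_cor(AG, AH[p, p])
  })
  list(statistic = obs,
       p_greater = mean(null_dist >= obs),  # estimate of Pr(rho >= observed)
       p_less = mean(null_dist <= obs))     # estimate of Pr(rho <= observed)
}

set.seed(1)
n <- 15
AG <- matrix(rbinom(n * n, 1, 0.4), n, n); diag(AG) <- 0
# Hypothetical AH: AG with roughly 10% of its off-diagonal entries flipped
flip <- matrix(rbinom(n * n, 1, 0.1), n, n); diag(flip) <- 0
AH <- abs(AG - flip)
res <- qap_cor_test(AG, AH)
res$statistic  # strongly positive
res$p_greater  # near 0: the observed correlation is unusual under relabeling
```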


                                                                    Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the structural (unlabeled) correlation between g2 and g3 here is 1, since the graphs are isomorphic, but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> y.l <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> y.p <- apply(y.l, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = y.p)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
     352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004
Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

    Total Fraction Correct: 0.8973684
    Fraction Predicted 1s Correct: 0.8973684
    Fraction Predicted 0s Correct: NaN
    False Negative Rate: 0
    False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is ineffective at predicting the absence of ties: all 39 null dyads are predicted to be ties, giving a false positive rate of 1. This is largely a consequence of the high density of the dependent graph (approximately 0.90) and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values (0 for the intercept and x4, and 1, 4, and 2 for x1 through x3) and that the QAP test correctly identifies the irrelevant predictors.
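Because netlm and netlogit are described above as the conventional models applied to vectorized adjacency matrices, their point estimates can be cross-checked against glm on the dyad-level data. The base R sketch below does this for a hypothetical logistic model. Its coefficients (an intercept of -1 and slopes 1, 2, 1, and 0) are milder than those in the example, an assumption made to reduce the risk of quasi-separation when the dependent graph is very dense. Only the point estimates are comparable: glm's standard errors assume independent dyads, which network data rarely satisfy, and that dependence is the motivation for the QAP tests.

```r
set.seed(7)
n <- 40; k <- 4
# Hypothetical predictors: k random directed graphs (density 0.5)
x <- array(rbinom(k * n * n, 1, 0.5), dim = c(k, n, n))
# True linear predictor (assumed coefficients: -1, 1, 2, 1, 0)
eta <- -1 + x[1, , ] + 2 * x[2, , ] + x[3, , ]
# Dependent graph: independent Bernoulli ties with a logistic link
y <- matrix(rbinom(n * n, 1, plogis(eta)), n, n)

# Vectorize the off-diagonal dyads (self-ties excluded)
off <- row(y) != col(y)
dyads <- data.frame(y = y[off],
                    x1 = x[1, , ][off], x2 = x[2, , ][off],
                    x3 = x[3, , ][off], x4 = x[4, , ][off])

fit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = dyads)
round(coef(fit), 2)  # should land near -1, 1, 2, 1, 0
```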

2.7 Network inference and process models

A final category of functions supplied by sna consists of those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), users are strongly encouraged to employ the more modern tools of the ergm package for this purpose. There are, however, several other models for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns an estimate of an unknown graph from a series of observed graphs. Methods supported include data-analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know,” and correctly generate, the true value of an edge on which it reports, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). In this case, the edge reporting process is parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of the false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992), in the simplified form of Gelman et al. (1995), can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to obtain posterior predictive graph properties from a set of posterior draws.
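The edge-reporting model shared by both approaches can be made concrete with a short base R sketch. Assuming each informant's false positive rate (ep) and false negative rate (em) are known, one of the cases mentioned above, and that the graph prior is an independent Bernoulli matrix, the posterior probability of each edge has a closed form, so this special case needs no simulation; the sketch only exposes the likelihood and is not a stand-in for bbnam. The second function spells out the reparameterization noted above: if a source knows the true value with probability d and otherwise guesses 1 with probability b, then ep = (1 - d)b and em = (1 - d)(1 - b), so d = 1 - ep - em, which is positive exactly when ep + em < 1. The function names and the symbols d and b are illustrative assumptions.

```r
# Posterior Pr(true edge = 1) for every dyad, given 0/1 reports from m
# informants with KNOWN error rates and an independent Bernoulli prior.
# reports: m x n x n array; ep, em: length-m false positive / negative rates;
# prior: n x n matrix of prior edge probabilities.
# Assumes 0 < ep, em < 1 and 0 < prior < 1 (otherwise log(0) yields NaN).
edge_posterior <- function(reports, ep, em, prior) {
  log_on <- log(prior)       # log weight for "true edge = 1"
  log_off <- log(1 - prior)  # log weight for "true edge = 0"
  for (k in seq_len(dim(reports)[1])) {
    r <- reports[k, , ]
    # A true edge is reported with prob 1 - em and missed with prob em
    log_on <- log_on + r * log(1 - em[k]) + (1 - r) * log(em[k])
    # A non-edge is reported with prob ep and correctly omitted with prob 1 - ep
    log_off <- log_off + r * log(ep[k]) + (1 - r) * log(1 - ep[k])
  }
  plogis(log_on - log_off)  # = exp(log_on) / (exp(log_on) + exp(log_off))
}

# Knowledge/guessing form: know with prob d, otherwise guess 1 with prob b
to_competence <- function(ep, em) {
  list(d = 1 - ep - em,  # d > 0 requires ep + em < 1
       b = ep / (ep + em))
}

# One informant (ep = 0.1, em = 0.2) reporting every tie, flat prior
reports <- array(1, dim = c(1, 3, 3))
post <- edge_posterior(reports, ep = 0.1, em = 0.2, prior = matrix(0.5, 3, 3))
post[1, 2]                         # 0.4 / 0.45, about 0.889
to_competence(ep = 0.1, em = 0.2)  # d = 0.7, b = 1/3
```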

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
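The conditional tie probability above is straightforward to evaluate once the baseline probability, the bias-event probabilities, and the counts $s_k(i,j,T)$ are given. The sketch below is a direct transcription of that formula in base R; the function name and the numeric values (for instance, reading a single bias event as a reciprocity bias) are hypothetical. Estimating these probabilities from data, which is what bn does, is a separate and harder problem.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^{s_k(i,j,T)}
# p_base: baseline probability Pr(B_e)
# p_bias: vector of bias-event probabilities Pr(B_k)
# s:      vector of counts s_k(i, j, T) of potentially occurring bias events
biased_tie_prob <- function(p_base, p_bias = numeric(0), s = numeric(0)) {
  stopifnot(length(p_bias) == length(s))
  # The tie fails to occur only if the baseline event and every bias event fail
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

biased_tie_prob(0.1)                        # no bias events: 0.1
biased_tie_prob(0.1, p_bias = 0.3, s = 1)   # 1 - 0.9 * 0.7 = 0.37
biased_tie_prob(0.1, c(0.3, 0.2), c(1, 2))  # 1 - 0.9 * 0.7 * 0.64 = 0.5968
```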

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990; see also Cliff and Ord 1973 and Anselin 1988 for the equivalent class of spatial autocorrelation models) constitute one important family of processes that are often used for this purpose. These models are of the form

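$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$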

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computing sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
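To see how equations (4) and (5) generate data, the base R sketch below simulates from a model with one AR term and one MA term ($w = z = 1$), using the same row-normalized adjacency matrix for $W_1$ and $Z_1$; that choice of weights, the network, and all parameter values are hypothetical. With $I$ the $n \times n$ identity, solving the two linear systems gives $\varepsilon = (I - \psi_1 Z_1)^{-1}\nu$ and $y = (I - \theta_1 W_1)^{-1}(X\beta + \varepsilon)$. The sketch also computes Moran's I and Geary's C from their standard textbook definitions; nacf's normalization choices may differ, so these are illustrations rather than replacements. Fitting such a model is the job of lnam and is not attempted here.

```r
set.seed(123)
n <- 30
# Hypothetical network: random digraph (density 0.15) without self-ties
A <- matrix(rbinom(n * n, 1, 0.15), n, n)
diag(A) <- 0
# Row-normalize so that (W y)_i is the mean response of i's out-neighbors
rs <- rowSums(A)
W <- A / ifelse(rs > 0, rs, 1)

X <- cbind(1, rnorm(n))   # intercept plus one covariate
beta <- c(1, 2)
theta <- 0.4              # AR parameter (theta_1, with W_1 = W)
psi <- 0.3                # MA parameter (psi_1, with Z_1 = W)
nu <- rnorm(n, sd = 0.5)  # iid disturbances
I_n <- diag(n)

eps <- drop(solve(I_n - psi * W, nu))                      # equation (5)
y <- drop(solve(I_n - theta * W, drop(X %*% beta) + eps))  # equation (4)

# Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, z = y - mean(y)
morans_I <- function(y, W) {
  z <- y - mean(y)
  (length(y) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}
# Geary's C: ((n - 1) / (2 S0)) * sum_ij w_ij (y_i - y_j)^2 / sum_i z_i^2
gearys_C <- function(y, W) {
  z <- y - mean(y)
  ((length(y) - 1) / (2 * sum(W))) * sum(W * outer(y, y, "-")^2) / sum(z^2)
}
morans_I(y, W)
gearys_C(y, W)
```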


                                                                    Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned bbnam object describes the resulting posterior properties, along with various diagnostics.
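Written out, the informant model simulated in the commands that follow is a two-component mixture for each reported edge, and the stated means follow from E[Beta(a, b)] = a/(a + b). The notation e^-_k and e^+_k for informant k's false negative and false positive rates is borrowed from the summary output; g_ij is the true tie indicator.

```latex
\Pr(\text{informant } k \text{ reports } i \to j)
  = g_{ij}\,(1 - e^-_k) + (1 - g_{ij})\,e^+_k,
\qquad e^+_k \sim \mathrm{Beta}(1, 25),\quad e^-_k \sim \mathrm{Beta}(15, 25)

\mathrm{E}[e^+_k] = \frac{1}{1 + 25} \approx 0.038,\qquad
\mathrm{E}[e^-_k] = \frac{15}{15 + 25} = 0.375,\qquad
\mathrm{E}[\text{prior error rate}] = \frac{2}{2 + 11} \approx 0.154
```

The common beta(2, 11) prior mean of about 0.154 therefore overstates the typical false positive rate and understates the typical false negative rate, which is the sense in which the priors do not match the true error distribution.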

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer)

Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics

  Replicate Chains: 5
  Burn Time: 300
  Draws per Chain: 20
  Total Draws: 100
  Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions that are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data themselves.
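The perverse-region check described above is straightforward to compute from posterior draws. The sketch below assumes, as the earlier apply(b$em, 2, median) call suggests, that error-rate draws are stored as draws × observers matrices. The helper name perverse_prob and the synthetic draws are hypothetical; with real output one would call perverse_prob(b$em, b$ep).

```r
# Sketch (hypothetical helper): posterior probability, per observer, that
# e^- + e^+ > 1. Assumes em_draws and ep_draws are draws x observers matrices,
# as with b$em and b$ep in the bbnam example.
perverse_prob <- function(em_draws, ep_draws) {
  stopifnot(identical(dim(em_draws), dim(ep_draws)))
  colMeans(em_draws + ep_draws > 1)   # share of draws in the perverse region
}

# Usage with synthetic draws (made-up data, not bbnam output)
set.seed(1)
em_draws <- matrix(rbeta(300, 15, 25), nrow = 100, ncol = 3)
ep_draws <- matrix(rbeta(300, 1, 25), nrow = 100, ncol = 3)
ep_draws[, 3] <- 0.9                  # make observer 3 resemble a systematic deceiver
round(perverse_prob(em_draws, ep_draws), 2)
```

Observers with values near 1 are the ones the text suggests investigating; several such observers at once would point to overly "perverse" priors or to deep disagreement among informants.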

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection locally aggregated structures (LAS), the central graph, and the Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
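Why the intersection LAS suffers here can be seen with a little arithmetic. Under the LAS rules, a tie i → j is judged only from the reports of i and j: the intersection keeps it if both report it, the union if either does. Assuming the two informants err independently with a false negative rate near the example's mean of 0.375, a true tie survives the intersection with probability (1 - 0.375)^2 ≈ 0.39, but survives the union with probability 1 - 0.375^2 ≈ 0.86. The base-R sketch below spells out these two rules and a majority-vote central graph. The helper names are hypothetical, it assumes informant k is vertex k, counting a tie reported by exactly half of the informants as present is an assumption, and it is not the code inside sna's consensus().

```r
# Sketch of three classical consensus rules (hypothetical helpers, not sna's code).
# dat is an m x n x n array: dat[k, i, j] = 1 if informant k reports a tie i -> j.
# The LAS rules assume informant k is vertex k (so m == n).
las_intersection <- function(dat) {
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in 1:n) {
    for (j in 1:n) {
      out[i, j] <- dat[i, i, j] * dat[j, i, j]        # both i and j must report i -> j
    }
  }
  out
}

las_union <- function(dat) {
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in 1:n) {
    for (j in 1:n) {
      out[i, j] <- max(dat[i, i, j], dat[j, i, j])    # either report suffices
    }
  }
  out
}

central_graph <- function(dat) {
  # majority rule: keep a tie if at least half of all informants report it
  (apply(dat, c(2, 3), mean) >= 0.5) * 1
}

# Usage: three vertices, true ties 1 -> 2 and 2 -> 3
g <- matrix(0, 3, 3)
g[1, 2] <- 1
g[2, 3] <- 1
dat <- array(0, dim = c(3, 3, 3))
for (k in 1:3) dat[k, , ] <- g        # every informant starts out correct
dat[2, 1, 2] <- 0                     # informant 2 misses the tie 1 -> 2
las_intersection(dat)[1, 2]           # 0: one miss deletes the tie
las_union(dat)[1, 2]                  # 1
central_graph(dat)[1, 2]              # 1: two of three informants report it
```

The union pays for its robustness to false negatives with extra false positives (roughly twice the individual false positive rate when that rate is small), which is acceptable here only because false positives are rare.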

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example that includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4
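The two qr.solve() calls in the example generate data by inverting the MA and AR structural equations in turn. The base-R sketch below repeats that construction with solve() and checks that the simulated e and y satisfy e = r2 W2 e + ν and y = r1 W1 y + Xβ + e up to rounding error. The helper rand_digraph() is a hypothetical stand-in for rgraph(50), assuming that its defaults draw independent ties with probability 0.5 and no loops.

```r
# Sketch: simulate from a joint network AR/MA model by solving the two
# structural equations, then verify both equations hold.
set.seed(42)
n <- 50
rand_digraph <- function(n, p = 0.5) {      # hypothetical stand-in for rgraph()
  A <- matrix(rbinom(n * n, 1, p), n, n)
  diag(A) <- 0                              # no self-ties
  A
}
w1 <- rand_digraph(n)                       # AR weight matrix
w2 <- rand_digraph(n)                       # MA weight matrix
x <- matrix(rnorm(n * 5), n, 5)
beta <- rnorm(5)
r1 <- 0.2; r2 <- 0.3; sigma <- 0.1
nu <- rnorm(n, 0, sigma)
I <- diag(n)
e <- solve(I - r2 * w2, nu)                 # MA: e = r2 * W2 e + nu
y <- solve(I - r1 * w1, x %*% beta + e)     # AR: y = r1 * W1 y + X beta + e

# Both residuals should be near zero (rounding error only)
max(abs(e - (r2 * w2 %*% e + nu)))
max(abs(y - (r1 * w1 %*% y + x %*% beta + e)))
```

solve() and qr.solve() give the same answer for a square, nonsingular system; the check simply confirms that the two-step construction matches the model in equation (5) and its AR counterpart.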

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
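The two-standard-deviation coloring rule can be sketched as follows. How plot(fit) actually computes net influence is not specified here, so the sketch assumes a simple AR-only proxy: with M = (I - ρW)^(-1), a unit shock at vertex i changes y_j by M[j, i], so i's net influence on j is taken to be M[j, i]. The function names are hypothetical, and excluding the diagonal (self-influence) from the mean and standard deviation is likewise an assumption.

```r
# Sketch (hypothetical helpers): flag unusually strong or weak net influence.
# Proxy for net influence: infl[i, j] = [(I - rho W)^{-1}][j, i], the total
# change in y_j produced by a unit shock at vertex i.
net_influence <- function(W, rho) {
  t(solve(diag(nrow(W)) - rho * W))
}

flag_influence <- function(infl) {
  off <- infl[row(infl) != col(infl)]       # off-diagonal entries only
  m <- mean(off)
  s <- sd(off)
  flag <- matrix("none", nrow(infl), ncol(infl))
  flag[infl >= m + 2 * s] <- "green"        # at least 2 SD above the mean
  flag[infl <= m - 2 * s] <- "red"          # at least 2 SD below the mean
  diag(flag) <- "none"                      # self-influence is never flagged
  flag
}

# Usage: vertex 1 influences vertex 2 (W[2, 1] = 1 means y_2 depends on y_1)
W <- matrix(c(0, 1, 0, 0), 2, 2)
net_influence(W, 0.3)[1, 2]                 # 0.3

infl <- matrix(0, 5, 5)
infl[1, 2] <- 10
infl[3, 4] <- -10
diag(infl) <- 100                           # large self-influence, ignored
f <- flag_influence(infl)
f[1, 2]                                     # "green"
f[3, 4]                                     # "red"
```

With a fitted model containing several AR weight matrices and MA terms the real influence matrix would be more involved; the thresholding step is the part taken directly from the description of the plot.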

                                                                    3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines that covers many of the methods currently seeing wide use within the field. It is hoped that the inclusion of such tools (together with the other packages of the statnet ensemble) within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                                                    Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.

                                                                    References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Statistical Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software (http://www.jstatsoft.org/), published by the American Statistical Association (http://www.amstat.org/)

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25



                                                                      Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Here g3 is a random relabeling of g2 produced by rmperm, so the unlabeled correlation between g2 and g3 is 1 (since the graphs are isomorphic). The value returned by gscor may nevertheless sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1
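The difference between the two measures can be sketched in base R. The sketch assumes that gcor is the product-moment correlation of the off-diagonal adjacency cells and that the structural correlation is the maximum of that correlation over all relabelings of one graph. The helpers graph_cor, all_perms, and struct_cor_exhaustive are hypothetical illustrations, not sna functions. Enumerating every relabeling is feasible only for very small graphs, which is exactly why exhaustive search becomes prohibitive.

```r
# Hypothetical base-R helpers (not the sna code) contrasting labeled and
# unlabeled graph correlation on small directed graphs.

# Labeled correlation: Pearson correlation of the off-diagonal cells
graph_cor <- function(a, b) {
  off <- row(a) != col(a)              # exclude the diagonal (loops)
  cor(a[off], b[off])
}

# All permutations of 1:n, one per row; there are n! of them
all_perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  sub <- all_perms(n - 1)
  out <- NULL
  for (k in seq_len(n)) {
    rest <- setdiff(seq_len(n), k)     # values still to be arranged after k
    out <- rbind(out, cbind(k, matrix(rest[sub], nrow(sub))))
  }
  unname(out)
}

# Unlabeled (structural) correlation: exhaustive search over relabelings of b
struct_cor_exhaustive <- function(a, b) {
  perms <- all_perms(nrow(b))
  max(apply(perms, 1, function(s) graph_cor(a, b[s, s])))
}

set.seed(1)
n <- 5
g2 <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(g2) <- 0
s <- sample(n)
g3 <- g2[s, s]                         # an isomorphic, relabeled copy of g2
graph_cor(g2, g3)                      # labeled: typically below 1
struct_cor_exhaustive(g2, g3)          # unlabeled: 1, up to rounding
```

Because the number of relabelings grows as n!, a 12-vertex graph already has 479001600 of them; a heuristic search avoids that cost at the price of occasionally missing the maximum.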

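Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner. Here the response is built from the first three of four random predictor graphs, so the fourth predictor is irrelevant by construction: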

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
                 Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14   0.000   1.000     0.000
x1           1.000000e+00   1.000   0.000     0.000
x2           4.000000e+00   1.000   0.000     0.000
x3           2.000000e+00   1.000   0.000     0.000
x4          -7.553990e-16   0.369   0.631     0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)          x1          x2          x3          x4
Min     -2.6048970  -2.9689678  -3.5940257  -2.9888472  -1.5687343
1stQ    -0.6779707  -0.6739579  -0.6980733  -0.7469624  -0.9732831
Median  -0.0841683  -0.0090468   0.0003289  -0.0116757  -0.4346029
Mean    -0.0256936  -0.0249585  -0.0161372  -0.0055288  -0.0080178
3rdQ     0.6930508   0.6393521   0.6352920   0.7064120   0.8601390
Max      2.5434373   2.7231537   3.0464596   3.6938260   1.6294713
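The logic of a QAP test can be sketched as follows: refit the regression after jointly permuting the rows and columns of the dependent graph, which preserves its structure while breaking its alignment with the predictors. This is a minimal illustration under stated assumptions, not netlm's implementation. The helper qap_ols is hypothetical, it permutes only the response, and it uses the raw coefficient as its test statistic; netlm's own null hypothesis options and statistic may differ. Noise is added to the response so that, unlike the exact fit above, the residuals are not numerically zero.

```r
# Hypothetical QAP sketch for OLS network regression (not netlm itself).
qap_ols <- function(y, x, reps = 1000) {
  n <- nrow(y)
  off <- row(y) != col(y)                        # use off-diagonal dyads only
  # one column per predictor graph; x is a k x n x n array, as from rgraph(n, k)
  X <- sapply(seq_len(dim(x)[1]), function(k) x[k, , ][off])
  fit_coef <- function(yy) coef(lm(yy[off] ~ X))
  obs <- fit_coef(y)
  # null distribution: relabel the vertices of y, keeping the predictors fixed
  null <- replicate(reps, { p <- sample(n); fit_coef(y[p, p]) })
  list(coef = obs,
       p = rowMeans(abs(null) >= abs(obs)))      # two-sided, Pr(>=|b|)
}

set.seed(42)
n <- 20
x <- array(rbinom(4 * n * n, 1, 0.5), dim = c(4, n, n))
# same structure as the example above, plus Gaussian noise
y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ] + matrix(rnorm(n * n), n, n)
res <- qap_ols(y, x, reps = 200)
round(res$coef, 2)
res$p
```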

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred; its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> y.l <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> y.p <- apply(y.l, c(1, 2), function(a) { 1 / (1 + exp(-a)) })
R> y <- rgraph(20, tprob = y.p)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
              Estimate     Exp(b) Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173   0.680   0.320     0.503
x1           0.9411361  2.5628914   0.985   0.015     0.019
x2           4.1473292 63.2648084   1.000   0.000     0.000
x3           1.8630911  6.4436238   1.000   0.000     0.000
x4          -0.1757242  0.8388493   0.318   0.682     0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004
Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

    Total Fraction Correct: 0.8973684
    Fraction Predicted 1s Correct: 0.8973684
    Fraction Predicted 0s Correct: NaN
    False Negative Rate: 0
    False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)          x1          x2          x3          x4
Min      -1.253710   -1.160806   -1.270806   -1.295749   -1.252300
1stQ     -0.215404   -0.236393   -0.229377   -0.278976   -0.250322
Median    0.078514    0.022337   -0.001591   -0.020205    0.001053
Mean      0.093105    0.025854    0.004520   -0.017570   -0.002262
3rdQ      0.408121    0.269836    0.239821    0.236166    0.252251
Max       1.704128    1.408468    1.214650    1.100783    1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties: it predicts no nulls at all, so its false positive rate is 1. This is largely a consequence of the high density of the dependent graph (approximately 0.90) and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model’s parameter estimates are quite close to the true values (0, 1, 4, 2, and 0 for the intercept and x1 through x4), and that the QAP test correctly identifies the irrelevant predictors.
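The goodness-of-fit quantities in the netlogit output follow from the printed deviances and contingency table. Assuming the usual definitions AIC = Dr + 2k and BIC = Dr + k log(n), with k = 5 coefficients and n = 380 dyads (these reproduce the printed values), the arithmetic is:

```r
# Reproduce the netlogit fit summaries from the printed values
Dn <- 526.7919; Dr <- 174.1572    # null and residual deviance
dfn <- 380; k <- 5                # dyads (20 * 19) and fitted coefficients
Dn - Dr                           # chi-squared improvement: 352.6347
Dr + 2 * k                        # AIC: 184.1572
Dr + k * log(dfn)                 # BIC: about 203.858
(Dn - Dr) / (Dn - Dr + dfn)       # first pseudo-R^2: about 0.4813
(Dn - Dr) / Dn                    # second pseudo-R^2: about 0.6694
# predicted (rows) x actual (cols), filled column by column
tab <- matrix(c(0, 39, 0, 341), 2, dimnames = list(pred = 0:1, actual = 0:1))
sum(diag(tab)) / sum(tab)         # total fraction correct: about 0.8974
```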

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which it reports, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired), and estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former’s allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
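To illustrate the data-analytic side of network inference, the sketch below simulates reports from error-prone informants under the false positive/false negative reporting process described above and pools them with a central graph estimate. It assumes the central graph is the graph minimizing the total Hamming distance to the observed graphs, which reduces to an edgewise majority vote; all sizes and rates are illustrative. Within sna, consensus provides this and the other estimators mentioned above; this base-R version only shows the mechanics.

```r
# Hypothetical simulation: informants report a true graph with their own
# false negative and false positive rates; reports are pooled by majority vote.
set.seed(7)
n <- 15; m <- 9                                  # vertices, informants (m odd: no ties)
truth <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(truth) <- 0
fneg <- rbeta(m, 3, 12)                          # illustrative false negative rates
fpos <- rbeta(m, 1, 30)                          # illustrative false positive rates
reports <- array(0, dim = c(m, n, n))
for (i in seq_len(m)) {
  # a true tie is reported w.p. 1 - fneg[i]; a non-tie w.p. fpos[i]
  p <- truth * (1 - fneg[i]) + (1 - truth) * fpos[i]
  r <- matrix(rbinom(n * n, 1, p), n, n)
  diag(r) <- 0
  reports[i, , ] <- r
}
# central graph: minimizes summed Hamming distance, i.e. edgewise majority
central <- (apply(reports, c(2, 3), mean) > 0.5) * 1
hamming <- function(a, b) sum(a != b)
mean(central == truth)                           # share of cells recovered
sum(apply(reports, 1, hamming, b = central))     # total distance to the reports
sum(apply(reports, 1, hamming, b = truth))       # never smaller than the line above
```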

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport (1957), who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which in turn influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation of model parameters (bias event probabilities) is currently heuristic. The bn function currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
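The conditional tie probability of the biased net model is easy to evaluate once the bias event probabilities and their counts $s_k(i,j,T)$ are known. The helper below is hypothetical: it takes these as given inputs and does not reconstruct a trace or implement bn. The Monte Carlo check confirms that the formula is the probability that at least one of the independent baseline and bias events occurs.

```r
# Hypothetical helper: conditional tie probability under a biased net trace,
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k(i, j, T)
biased_tie_prob <- function(base, bias, s) {
  stopifnot(length(bias) == length(s))
  1 - (1 - base) * prod((1 - bias)^s)
}

# Illustrative inputs: baseline 0.1; two bias event types with probabilities
# 0.5 and 0.2 that have occurred once and twice, respectively, for this dyad
biased_tie_prob(0.1, bias = c(0.5, 0.2), s = c(1, 2))   # 1 - 0.9 * 0.5 * 0.64 = 0.712

# Monte Carlo check: the tie occurs if the baseline event or any bias event fires
set.seed(3)
N <- 1e5
hit <- (runif(N) < 0.1) | (runif(N) < 0.5) | (runif(N) < 0.2) | (runif(N) < 0.2)
mean(hit)                                                # close to 0.712
```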

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990; see also Cliff and Ord 1973 and Anselin 1988 for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

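$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$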

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the AR term’s inclusion of impact from neighbors’ covariate scores: an AR term implies that each individual’s response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran’s I (Moran 1950) and Geary’s C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
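The two-equation model can be solved for $y$ directly. Writing $A = \sum_i \theta_i W_i$ and $B = \sum_i \psi_i Z_i$, equation (5) gives $\varepsilon = (I - B)^{-1}\nu$ and equation (4) then gives $y = (I - A)^{-1}(X\beta + \varepsilon)$, provided both inverses exist. The sketch below simulates one draw this way and computes Moran's I, $I = \frac{n}{S_0}\,\frac{\sum_{ij} w_{ij} z_i z_j}{\sum_i z_i^2}$ with $z_i = y_i - \bar{y}$ and $S_0 = \sum_{ij} w_{ij}$. It is not lnam or nacf. Using a single row-normalized random adjacency matrix for both $W$ and $Z$ is an assumption made for convenience, and all parameter values are illustrative.

```r
# Sketch (not lnam): one draw from a network autocorrelation model with one
# AR weight matrix W and one MA weight matrix Z, via the reduced form.
set.seed(11)
n <- 50
adj <- matrix(rbinom(n * n, 1, 0.1), n, n)
diag(adj) <- 0
W <- adj / pmax(rowSums(adj), 1)   # row-normalized weights (an assumption)
Z <- W                             # reuse W for the MA term, for simplicity
X <- cbind(1, rnorm(n))            # intercept plus one covariate
beta <- c(1, 2); theta <- 0.4; psi <- 0.3; sigma <- 1
Id <- diag(n)
nu <- rnorm(n, 0, sigma)
eps <- solve(Id - psi * Z, nu)                            # eq. (5) solved for eps
y <- as.vector(solve(Id - theta * W, X %*% beta + eps))   # eq. (4) solved for y

# Moran's I of y with respect to the weights W
morans_I <- function(y, W) {
  z <- y - mean(y)
  (length(y) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}
morans_I(y, W)
```

Checking that the simulated `y` and `eps` satisfy (4) and (5) term by term is a cheap guard against sign or ordering mistakes in the reduced form.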


                                                                      Example

To demonstrate the use of sna’s network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants’ false positive rates (ep) to be beta distributed with a mean of 0.038 and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.
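The quoted means follow from the Beta(a, b) mean a/(a + b) applied to the parameters used in the simulation (rbeta(20, 1, 25) for ep and rbeta(20, 15, 25) for em). The same formula shows where the Beta(2, 11) prior sits relative to them. beta_mean is a hypothetical one-line helper:

```r
# Mean of a Beta(a, b) distribution is a / (a + b)
beta_mean <- function(a, b) a / (a + b)
beta_mean(1, 25)                          # ep: about 0.038
beta_mean(15, 25)                         # em: 0.375
beta_mean(15, 25) / beta_mean(1, 25)      # 9.75, i.e. about ten times higher
beta_mean(2, 11)                          # prior mean for every error rate: about 0.154
```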

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[,1] <- 2
R> pem[,2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[,1] <- 2
R> pep[,2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

                                                                      Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

                                                                      Multiple Error Probability Model

                                                                      Marginal Posterior Network Distribution

                                                                      a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

                                                                      Journal of Statistical Software 41

                                                                      a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

                                                                      a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

                                                                      Marginal Posterior Global Error Distribution

                                                                      e^- e^+Min 01443951 000042381stQ 03126975 00167584Median 03678306 00294646Mean 03783663 004936883rdQ 04423027 00574099Max 06909116 02262239

                                                                      Marginal Posterior Error Distribution (by observer)

                                                                      Probability of False Negatives (e^-)

                                                                      Min 1stQ Median Mean 3rdQ Maxo1 03132 03599 03798 03864 04073 05071o2 02613 02944 03115 03187 03419 03995

                                                                      42 Social Network Analysis with sna

                                                                      o3 04148 04724 04937 04948 05213 05649o4 02511 03075 03246 03257 03448 04085o5 01814 02417 02681 02678 02887 03434o6 02881 03531 03761 03766 04046 04488o7 02395 03028 03211 03244 03449 03951o8 01444 02011 02209 02212 02398 02922o9 03708 04358 04529 04578 04787 05503o10 03210 03724 03967 03982 04259 04751o11 03064 03847 04093 04109 04371 05007o12 02367 03132 03354 03349 03607 04455o13 03534 04144 04386 04382 04600 05337o14 02438 02985 03235 03229 03452 04184o15 02585 03299 03510 03519 03706 04704o16 02502 03298 03481 03509 03699 04268o17 01759 02273 02488 02503 02668 03372o18 03959 04468 04646 04710 04922 05812o19 04944 05736 06007 05975 06189 06909o20 03737 04433 04631 04671 04916 05607

                                                                      Probability of False Positives (e^+)

                                                                      Min 1stQ Median Mean 3rdQ Maxo1 00195433 00397919 00490722 00510872 00585109 01069030o2 01067928 01395067 01555455 01569023 01714084 02262239o3 00084268 00165518 00224858 00236948 00293221 00551761o4 00712109 01047058 01137249 01180402 01320136 01723854o5 00034994 00103378 00150617 00169536 00212638 00468961o6 00004238 00040509 00068522 00082363 00098606 00279960o7 00061597 00136434 00192100 00207973 00266508 00484633o8 00072124 00204896 00260316 00282562 00350608 00593586o9 00804463 01092987 01213202 01246571 01372326 01935724o10 00065188 00135991 00194675 00223006 00278075 00594150o11 00173415 00358252 00445098 00464278 00551955 00828446o12 00185894 00416346 00499440 00516976 00573815 01202316o13 00029818 00108936 00155202 00170049 00209790 00401566o14 00044849 00108034 00166631 00178764 00226294 00486647o15 00084143 00199868 00271149 00290795 00355966 00606914o16 00009067 00078736 00124531 00139218 00187929 00455700o17 00066611 00216195 00273388 00290307 00346110 00691573o18 00846863 01344580 01508170 01485688 01628176 02036186o19 00037608 00117982 00171030 00179751 00225298 00466090o20 00214701 00348032 00433397 00448676 00516594 00936080

MCMC Diagnostics:

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20  Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115
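The potential scale reduction factor summarized above compares between-chain and within-chain variability for each sampled quantity; values near 1 suggest that the replicate chains agree. A minimal base-R sketch of the basic Gelman and Rubin (1992) statistic for a single scalar quantity follows. It assumes the draws are arranged as one column per chain; bbnam's own computation may differ in details such as degrees-of-freedom corrections, and the function name psrf and the example data are hypothetical.

```r
# Basic Gelman-Rubin potential scale reduction factor, sqrt(Rhat), for one
# scalar quantity. `draws` is an n x m matrix: n post-burn-in draws (rows)
# from each of m replicate chains (columns).
psrf <- function(draws) {
  n <- nrow(draws)
  chain_means <- colMeans(draws)
  # Between-chain variance B = n / (m - 1) * sum((mean_j - grand mean)^2)
  B <- n * var(chain_means)
  # Mean within-chain variance W
  W <- mean(apply(draws, 2, var))
  # Pooled estimate of the marginal posterior variance
  var_hat <- (n - 1) / n * W + B / n
  sqrt(var_hat / W)
}

set.seed(1)
# Hypothetical draws with the layout used above: 5 chains x 20 draws each
mixed <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)
# The same draws with each chain shifted to a different location (poor mixing)
stuck <- sweep(mixed, 2, c(0, 2, 4, 6, 8), "+")
psrf(mixed)  # close to 1
psrf(stuck)  # well above 1
```

Because the shift leaves each chain's internal spread unchanged, W is identical in both cases; only the between-chain term B drives the second value upward.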

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which e^- + e^+ > 1. Multiple actors whose error rates satisfy this condition with high posterior probability, or posterior graph distributions that are strongly multimodal, can indicate either excessively "perverse" priors or extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
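How much prior mass a given choice places on the perverse region can be checked by simulating directly from the priors. The sketch below assumes independent Beta priors on the false-negative rate e^- and the false-positive rate e^+; the parameter values and the helper name perverse_mass are hypothetical and are not taken from bbnam's defaults.

```r
# Monte Carlo estimate of the prior probability that e^- + e^+ > 1,
# assuming independent Beta(a_m, b_m) and Beta(a_p, b_p) priors.
perverse_mass <- function(a_m, b_m, a_p, b_p, draws = 1e5) {
  em <- rbeta(draws, a_m, b_m)   # false-negative rate draws
  ep <- rbeta(draws, a_p, b_p)   # false-positive rate draws
  mean(em + ep > 1)
}

set.seed(42)
# Hypothetical priors concentrated on small error rates
perverse_mass(1, 11, 1, 11)   # essentially zero
# Hypothetical flat priors on both rates
perverse_mass(1, 1, 1, 1)     # about 0.5
```

Under flat priors, half of the prior mass sits in the perverse region, which is exactly the situation the text warns against.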

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection LAS (locally aggregated structures), the central graph, and the Romney-Batchelder model.
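The estimators compared here differ mainly in which reports they pool. As a reading aid, the following base-R sketch implements simplified versions of three of them on a hypothetical array of cognitive social structure reports, where dat[k, i, j] is informant k's report of a tie from i to j. The LAS rules follow the usual definition (Krackhardt 1987a): a tie from i to j is kept if both (intersection) or either (union) of the two actors involved report it. The majority cutoff for the central graph, the function names, and the simulated error rates are assumptions; this is not sna's consensus code.

```r
# Intersection LAS: keep i -> j only if both i and j report it
las_intersection <- function(dat) {
  n <- dim(dat)[2]
  g <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n)
    g[i, j] <- dat[i, i, j] * dat[j, i, j]
  g
}

# Union LAS: keep i -> j if either i or j reports it
las_union <- function(dat) {
  n <- dim(dat)[2]
  g <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n)
    g[i, j] <- max(dat[i, i, j], dat[j, i, j])
  g
}

# Central graph: majority rule over all informants
# (the ">= one half" cutoff is an assumption)
central_graph <- function(dat) {
  (apply(dat, c(2, 3), mean) >= 0.5) * 1
}

# Simulate reports of a hypothetical true graph g with a high
# false-negative rate (0.4) and a low false-positive rate (0.05)
set.seed(7)
n <- 20
g <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(g) <- 0
dat <- array(0, c(n, n, n))
for (k in 1:n) {
  miss     <- matrix(rbinom(n * n, 1, 0.4), n, n)
  spurious <- matrix(rbinom(n * n, 1, 0.05), n, n)
  r <- g * (1 - miss) + (1 - g) * spurious
  diag(r) <- 0
  dat[k, , ] <- r
}

# Share of cells on which each estimate agrees with the true graph
acc <- sapply(list(intersection = las_intersection(dat),
                   union = las_union(dat),
                   central = central_graph(dat)),
              function(est) mean(est == g))
acc
```

With false negatives dominating, requiring agreement from both actors discards many true ties, so the intersection rule should trail the majority rule in this setup.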

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either the false positive or the false negative rate approaches or exceeds 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
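The breakdown of majority rule can be seen with a one-line binomial calculation. Consider a single true tie and m informants who each independently fail to report it with probability e^-. Assuming independent informant errors and a cutoff of at least half of the informants (both assumptions made for illustration), the chance that the central graph recovers the tie is a binomial tail probability; the same calculation with e^+ in place of e^- applies to absent ties. The helper name p_majority_correct is hypothetical.

```r
# P(majority rule recovers a true tie) when each of m informants
# independently misses it with probability e_minus
p_majority_correct <- function(m, e_minus) {
  k_min <- ceiling(m / 2)
  # P(at least k_min of the m informants report the tie)
  pbinom(k_min - 1, size = m, prob = 1 - e_minus, lower.tail = FALSE)
}

rates <- c(0.1, 0.3, 0.45, 0.5, 0.6)
vals <- sapply(rates, p_majority_correct, m = 20)
round(vals, 3)
```

Recovery is nearly certain at low error rates but falls steadily as e^- approaches 0.5, and drops below one half once it passes 0.5.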

As a final example of sna's model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components and estimates both effects simultaneously. (This example requires the numDeriv package.)
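The model being simulated can be written out explicitly. The equations below are read off from the simulation code that follows (w1, w2, r1, r2, beta, sigma and nu play the roles of W_1, W_2, ρ_1, ρ_2, β, σ and ν); they describe what that code generates and are offered as a reading aid rather than as a formal statement of lnam's estimator.

```latex
\begin{aligned}
y &= \rho_1 W_1 y + X\beta + e, \\
e &= \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I), \\
\Rightarrow\quad y &= (I - \rho_1 W_1)^{-1}\left[ X\beta + (I - \rho_2 W_2)^{-1}\nu \right].
\end{aligned}
```

The two qr.solve calls compute these two inverses in turn: the first builds e from ν, and the second builds y from Xβ + e.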

R> w1 <- rgraph(50)                        # weights for the AR term
R> w2 <- rgraph(50)                        # weights for the MA term
R> x <- matrix(rnorm(50 * 5), 50, 5)       # covariates
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)   # MA disturbances
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate  Std. Error  Z value  Pr(>|z|)
X1     -0.331259    0.010831   -30.58   < 2e-16 ***
X2      0.535608    0.009448    56.69   < 2e-16 ***
X3     -0.685068    0.007138   -95.98   < 2e-16 ***
X4      0.691812    0.008417    82.19   < 2e-16 ***
X5      0.016491    0.007890     2.09    0.0366 *
rho1.1  0.194935    0.002575    75.71   < 2e-16 ***
rho2.1  0.307491    0.021167    14.53   < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate  Std. Error
Sigma  0.09597    9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4
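The information criteria in this printout can be reproduced by hand from the reported log-likelihoods. The sketch below uses the rounded values shown above and infers the parameter counts from the reported degrees of freedom with n = 50: the full model has 50 - 42 = 8 parameters (five coefficients, two autocorrelation parameters, and sigma), and the null model has 50 - 48 = 2. The helper names are hypothetical. In this example the "Heuristic Log Bayes Factor" coincides, to rounding, with the BIC difference; whether lnam always computes it this way is not stated here.

```r
# Standard AIC and BIC from a log-likelihood ll with k parameters
aic <- function(ll, k) -2 * ll + 2 * k
bic <- function(ll, k, n) -2 * ll + k * log(n)

n <- 50
ll_model <- 58.47;  k_model <- 8   # values read off the printout above
ll_null  <- -82.48; k_null  <- 2

c(AIC_model = aic(ll_model, k_model), BIC_model = bic(ll_model, k_model, n),
  AIC_null  = aic(ll_null, k_null),   BIC_null  = bic(ll_null, k_null, n))
# approx. -100.94, -85.64, 168.96, 172.78

aic(ll_null, k_null) - aic(ll_model, k_model)         # approx. 269.90
bic(ll_null, k_null, n) - bic(ll_model, k_model, n)   # approx. 258.43
```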

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on every other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
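The green/red designation amounts to thresholding standardized influence scores. The sketch below illustrates that rule only. It assumes, hypothetically, that i's net influence on j is the [j, i] entry of (I - rho1 W1)^{-1}, that is, the response at j to a unit shock at i under the AR term alone; this may not be exactly the quantity that plot.lnam draws. The helper name classify_influence and the row-normalized example weights are made up for illustration.

```r
# Flag (i, j) pairs whose assumed net influence lies at least z standard
# deviations above ("green") or below ("red") the mean off-diagonal influence.
classify_influence <- function(W1, rho1, z = 2) {
  n <- nrow(W1)
  # Under y = rho1 W1 y + X beta, a shock at i reaches j via entry [j, i]
  # of (I - rho1 W1)^{-1}; transpose so that inf[i, j] is i's effect on j.
  inf <- t(solve(diag(n) - rho1 * W1))
  diag(inf) <- NA                          # ignore self-influence
  mu <- mean(inf, na.rm = TRUE)
  s  <- sd(as.vector(inf), na.rm = TRUE)
  flag <- matrix("none", n, n)
  flag[!is.na(inf) & inf >= mu + z * s] <- "green"
  flag[!is.na(inf) & inf <= mu - z * s] <- "red"
  flag
}

set.seed(3)
n <- 30
W <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(W) <- 0
W1 <- W / pmax(rowSums(W), 1)              # hypothetical row-normalized weights
flag <- classify_influence(W1, rho1 = 0.2)
table(flag)
```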

                                                                      3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines covering many of the methods currently in wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (sample vs. theoretical quantiles); Net Influence Plot.]

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

                                                                      References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions in Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01; Accepted: 2007-12-25



OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
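The "qap" test diagnostics above summarize each coefficient's distribution under random relabelings of the vertices. As a rough illustration of that permutation logic, the base R sketch below runs a classical QAP test for the correlation between two graphs. It is a stand-in, not sna's implementation: the function name qap_cor_test and its return fields are hypothetical, and the permutation scheme that netlm and netlogit actually apply to regression coefficients may differ.

```r
# Hypothetical helper: classical QAP test for the correlation between two
# adjacency matrices y and x (illustration only, not sna code).
qap_cor_test <- function(y, x, reps = 1000) {
  n <- nrow(y)
  off <- row(y) != col(y)              # off-diagonal dyads only
  obs <- cor(y[off], x[off])           # observed graph correlation
  null_dist <- replicate(reps, {
    p <- sample(n)                     # random relabeling of the vertices
    cor(y[p, p][off], x[off])          # permute rows and columns of y together
  })
  list(statistic = obs,
       p_greater = mean(null_dist >= obs),   # analogue of Pr(>=b)
       p_less = mean(null_dist <= obs))      # analogue of Pr(<=b)
}

# Toy data: y is x with roughly 10% of dyads flipped at random
set.seed(1)
n <- 20
x <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(x) <- 0
flip <- matrix(rbinom(n * n, 1, 0.1), n, n) == 1
y <- x
y[flip] <- 1 - y[flip]
diag(y) <- 0
res <- qap_cor_test(y, x, reps = 500)
res$statistic   # strongly positive
res$p_greater   # near 0
```

Permuting rows and columns together preserves each graph's internal structure (degree sequence, reciprocity, and so on) while breaking any alignment between the two graphs, which is why QAP tests remain valid under the dyadic dependence that undermines ordinary regression p-values.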

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> y.l <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> y.p <- apply(y.l, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = y.p)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004
Contingency Table (predicted (rows) x actual (cols)):

         Actual
Predicted   0   1
        0   0   0
        1  39 341

    Total Fraction Correct: 0.8973684
    Fraction Predicted 1s Correct: 0.8973684
    Fraction Predicted 0s Correct: NaN
    False Negative Rate: 0
    False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not effective at predicting the absence of ties: every dyad is predicted to be a tie, so the false positive rate is 1 and there are no predicted 0s to score. This is largely a consequence of the high density of the dependent graph (approximately 0.90) and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values (1, 4, and 2 for x1, x2, and x3, and 0 for the intercept and x4), and that the QAP test correctly identifies the irrelevant predictors.
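The goodness-of-fit figures in the netlogit summary can be reproduced from the printed deviances and contingency table. In the base R sketch below, the formulas are inferred from the output labels and checked against the printed numbers rather than taken from the sna source; in particular, the reported null deviance of 526.7919 equals 2n log 2 for n = 380 dyads, which is the deviance of a model assigning probability 1/2 to every dyad.

```r
# Recompute the netlogit fit statistics from values printed in the summary.
n  <- 380               # off-diagonal dyads in a 20-vertex digraph (20 * 19)
k  <- 5                 # estimated coefficients: intercept and x1-x4
Dr <- 174.1572          # residual deviance (printed)
Dn <- 2 * n * log(2)    # matches the printed null deviance, 526.7919

chisq <- Dn - Dr                      # fit improvement on k degrees of freedom
aic   <- Dr + 2 * k                   # 184.1572
bic   <- Dr + k * log(n)              # 203.858
r2_a  <- (Dn - Dr) / (Dn - Dr + n)    # (Dn-Dr)/(Dn-Dr+dfn), with dfn = n
r2_b  <- (Dn - Dr) / Dn               # (Dn-Dr)/Dn

# Contingency table: rows = predicted, columns = actual
tab <- matrix(c(0, 39, 0, 341), 2, 2,
              dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))
frac_correct <- sum(diag(tab)) / sum(tab)        # 341 / 380
fpr <- tab["1", "0"] / sum(tab[, "0"])           # actual 0s predicted as 1
fnr <- tab["0", "1"] / sum(tab[, "1"])           # actual 1s predicted as 0
round(c(chisq = chisq, aic = aic, bic = bic, r2_a = r2_a, r2_b = r2_b), 4)
```

The table makes the limitation discussed above concrete: the total fraction correct (about 0.897) merely mirrors the density of the dependent graph, while the false positive rate of 1 shows that no absent tie is ever predicted.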

2.7 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the latter may be omitted if desired), and estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
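To make the reporting model concrete, the sketch below computes the conditional posterior probability of a single edge given several informants' reports when each informant's false negative (e^-) and false positive (e^+) rates are treated as known. It assumes informants err independently given the true edge state and places a Bernoulli prior on the edge; the function name edge_posterior is hypothetical, and this is only the edge-level piece of the calculation, not bbnam's Gibbs sampler (which also draws the error rates). A single positive report multiplies the prior odds by (1 - e^-)/e^+, which falls below 1 exactly when e^- + e^+ > 1; this is the sense in which such reports become negatively informative.

```r
# Hypothetical helper: posterior Pr(edge present | reports) with known error
# rates. reports: 0/1 vector, one entry per informant;
# em: false negative rates; ep: false positive rates; prior: Pr(edge).
edge_posterior <- function(reports, em, ep, prior = 0.5) {
  # Likelihood of the observed reports if the edge is present / absent,
  # assuming conditionally independent informants
  lik1 <- prod(ifelse(reports == 1, 1 - em, em))
  lik0 <- prod(ifelse(reports == 1, ep, 1 - ep))
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

# Three informants with e^- = 0.375 and e^+ = 0.038; two report the edge
edge_posterior(c(1, 1, 0), em = rep(0.375, 3), ep = rep(0.038, 3))  # about 0.99

# One "perverse" informant (e^- + e^+ > 1) reports the edge
edge_posterior(1, em = 0.7, ep = 0.6)   # 1/3: the report lowers Pr(edge) from 0.5
```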

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
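The conditional nomination probability can be transcribed directly. The base R function below is a hypothetical helper, not part of sna's bn; it uses the fact that $e_{ij}$ fails to occur only if the baseline event and every potentially occurring bias event all fail. The bias probabilities and counts in the example are made-up values.

```r
# Hypothetical helper: Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
# p_base: baseline probability Pr(B_e)
# p_bias: vector of bias event probabilities Pr(B_k)
# s:      number of potential occurrences of each bias event for the dyad
biased_net_prob <- function(p_base, p_bias, s) {
  # e_ij fails only if the baseline event and every bias event occurrence fail
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Baseline 0.1; one bias event with probability 0.5 that could occur once,
# another with probability 0.2 that could occur twice
biased_net_prob(0.1, p_bias = c(0.5, 0.2), s = c(1, 2))   # 1 - 0.9 * 0.5 * 0.64 = 0.712
# With no bias events in play, only the baseline probability remains
biased_net_prob(0.1, p_bias = c(0.5, 0.2), s = c(0, 0))   # 0.1
```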

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
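For readers unfamiliar with the autocorrelation indices mentioned above, the sketch below computes Moran's I and Geary's C for a vertex attribute using their standard textbook definitions, with an adjacency matrix as the weight matrix. These are illustrative helpers (the names morans_i and gearys_c are hypothetical); nacf's options for neighborhood order, direction, and normalization are not reproduced here. Positive autocorrelation shows up as I above its null expectation of -1/(n-1) and as C below 1.

```r
# Hypothetical helpers: Moran's I and Geary's C for attribute y and
# weight matrix w (textbook definitions, not nacf's implementation).
morans_i <- function(y, w) {
  n <- length(y)
  z <- y - mean(y)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}
gearys_c <- function(y, w) {
  n <- length(y)
  z <- y - mean(y)
  # outer(y, y, "-")^2 holds the squared differences (y_i - y_j)^2
  ((n - 1) / (2 * sum(w))) * sum(w * outer(y, y, "-")^2) / sum(z^2)
}

# Undirected path graph 1-2-3-4 with an attribute that varies smoothly
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)
y <- c(1, 2, 3, 4)
morans_i(y, w)   # 1/3, above the null expectation of -1/3
gearys_c(y, w)   # 0.3, below 1
```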


                                                                        Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20 Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115
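The potential scale reduction reported above compares between-chain and within-chain variability. The base R sketch below implements the simplified Gelman et al. (1995) form for a single scalar quantity, with one column per chain; the function name psrf is hypothetical, and it is not sna's potscalered.mcmc. Because the pooled variance estimate shrinks the within-chain variance by a factor of (n - 1)/n, the value can fall slightly below 1 when chains agree closely, as in the median of 0.999 above.

```r
# Hypothetical helper: simplified potential scale reduction, sqrt(Rhat).
# chains: matrix of draws, one column per chain.
psrf <- function(chains) {
  n <- nrow(chains)                     # draws per chain
  B <- n * var(colMeans(chains))        # between-chain variance
  W <- mean(apply(chains, 2, var))      # mean within-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled estimate of posterior variance
  sqrt(var_plus / W)
}

# Two identical chains: no between-chain spread, so the value is below 1
psrf(cbind(c(1, 2, 3), c(1, 2, 3)))     # sqrt(2/3), about 0.816
# Two chains stuck in different regions: value well above 1
psrf(cbind(c(1, 2, 3), c(4, 5, 6)))     # about 2.27
```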

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (the posterior medians correlate with the true rates at 0.92 for false negatives and 0.97 for false positives), and the network itself, which is actually somewhat easier to estimate in many cases, is recovered exactly by the posterior median. In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives), while the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
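The contrast between these estimators comes down to how each combines reports. The base R sketch below implements the central graph as an edgewise majority vote and the LAS intersection and union from the sender's and receiver's own reports, assuming the dat[k, i, j] layout used in the example (informant k's report on the tie i -> j). The function names are hypothetical stand-ins for the "central.graph", "LAS.intersection", and "LAS.union" methods of consensus, and the >= 0.5 tie-breaking rule is our assumption. The sketch also shows why the intersection LAS suffers under false negatives: a true tie survives only if both endpoints report it, so with the mean false negative rate of 0.375 used here it is retained with probability of roughly 0.625^2, or about 0.39.

```r
# Hypothetical helpers; reports[k, i, j] = informant k's report on tie i -> j.
central_graph <- function(reports) {
  # Majority vote: include i -> j if at least half of informants report it
  (apply(reports, c(2, 3), mean) >= 0.5) * 1
}
las <- function(reports, type = c("intersection", "union")) {
  type <- match.arg(type)
  n <- dim(reports)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    # Sender i's own report and receiver j's own report of the tie i -> j
    both <- c(reports[i, i, j], reports[j, i, j])
    out[i, j] <- if (type == "intersection") min(both) else max(both)
  }
  out
}

# Three informants, three actors; only the tie 1 -> 2 is ever reported
toy <- array(0, dim = c(3, 3, 3))
toy[1, 1, 2] <- 1   # informant 1 (the sender) reports 1 -> 2
toy[3, 1, 2] <- 1   # informant 3 (a third party) reports 1 -> 2
central_graph(toy)[1, 2]          # 1: two of three informants agree
las(toy, "intersection")[1, 2]    # 0: receiver 2 did not report it
las(toy, "union")[1, 2]           # 1: sender 1 did report it
```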

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)               # MA disturbances
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)   # AR response
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot", which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
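One way to read the "net influence" idea is through the AR multiplier: solving $y = \theta W y + X\beta + \varepsilon$ for $y$ gives $y = (I - \theta W)^{-1}(X\beta + \varepsilon)$, so the $(j, i)$ entry of $(I - \theta W)^{-1}$ measures how a unit shock at vertex $i$ propagates to vertex $j$. The sketch below computes that matrix for a single AR weight matrix and flags off-diagonal pairs at least two standard deviations above or below the mean, mirroring the green/red coding described above. This is an interpretive assumption, not a description of what plot.lnam computes: the helper names net_influence and flag_pairs are hypothetical, MA terms are ignored, and the standard deviation is taken across pairs.

```r
# Hypothetical helpers (not sna code): AR-only net influence and flagging.
net_influence <- function(W, theta) {
  n <- nrow(W)
  # Column i of the inverse holds the effect of a shock at i on every vertex;
  # transposing puts the effect of i on j in row i, column j.
  t(solve(diag(n) - theta * W))
}
flag_pairs <- function(infl) {
  off <- row(infl) != col(infl)          # exclude self-influence
  m <- mean(infl[off])
  s <- sd(infl[off])
  list(high = which(off & infl >= m + 2 * s, arr.ind = TRUE),   # "green" pairs
       low  = which(off & infl <= m - 2 * s, arr.ind = TRUE))   # "red" pairs
}

# Two mutually tied vertices with theta = 0.5: inverse is [[4/3, 2/3], [2/3, 4/3]]
W <- matrix(c(0, 1, 1, 0), 2, 2)
infl <- net_influence(W, 0.5)
infl[1, 2]   # 2/3

# A designed influence matrix with one unusually strong pair (1 -> 2)
infl2 <- matrix(0, 5, 5)
infl2[1, 2] <- 10
flag_pairs(infl2)$high   # row 1, col 2
```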

                                                                        3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                                                        Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet


[Figure: diagnostic plots produced by plot(fit) for the lnam example, including the panels "Fitted vs. Observed Values", "Fitted Values vs. Estimated Disturbances", and "Normal Q-Q Residual Plot".]

                                                                        ntile

                                                                        s

                                                                        Net Influence Plot

                                                                        Figure 6 Plot method output for lnam

                                                                        team This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05 subaward 918197 and by NSF award IIS-0331707

                                                                        References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). “Social Structure from Multiple Networks. II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions in Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organizational Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                                                        Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/, published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.


                  Estimate     Exp(b) Pr(<=b) Pr(>=b) Pr(>=|b|)
    (intercept)  0.3077180  1.3603173   0.680   0.320     0.503
    x1           0.9411361  2.5628914   0.985   0.015     0.019
    x2           4.1473292 63.2648084   1.000   0.000     0.000
    x3           1.8630911  6.4436238   1.000   0.000     0.000
    x4          -0.1757242  0.8388493   0.318   0.682     0.642

    Goodness of Fit Statistics:

    Null deviance: 526.7919 on 380 degrees of freedom
    Residual deviance: 174.1572 on 375 degrees of freedom
    Chi-Squared test of fit improvement:
        352.6347 on 5 degrees of freedom, p-value 0
    AIC: 184.1572    BIC: 203.8580
    Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

    Contingency Table (predicted (rows) x actual (cols)):

           0    1
      0    0    0
      1   39  341

        Total Fraction Correct: 0.8973684
        Fraction Predicted 1s Correct: 0.8973684
        Fraction Predicted 0s Correct: NaN
        False Negative Rate: 0
        False Positive Rate: 1

    Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Distribution Summary:

           (intercept)        x1        x2        x3        x4
    Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
    1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
    Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
    Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
    3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
    Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties. Indeed, it predicts a tie for every dyad, so all 39 null dyads are misclassified. This is largely a consequence of the high density of the dependent graph (approximately 0.90). It is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model’s parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.
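The summary figures above are linked by simple arithmetic. As a check on how they fit together, the following Python sketch recomputes them from the printed deviances and contingency table. It assumes the conventional definitions AIC = Dr + 2p and BIC = Dr + p ln n, with p = 5 coefficients and n = 380 dyads. These definitions reproduce the printed values to the shown precision. The sketch is an illustration, not the netlogit source code.

```python
import math

# Quantities read off the netlogit summary above.
null_dev = 526.7919    # Dn
resid_dev = 174.1572   # Dr
dfn = 380              # null degrees of freedom (number of dyads)
p = 5                  # fitted coefficients: intercept plus x1..x4

chisq = null_dev - resid_dev                 # fit improvement, on p df
aic = resid_dev + 2 * p                      # assumed AIC definition
bic = resid_dev + p * math.log(dfn)          # assumed BIC definition, n = dfn
r2_a = chisq / (chisq + dfn)                 # (Dn-Dr)/(Dn-Dr+dfn)
r2_b = chisq / null_dev                      # (Dn-Dr)/Dn

# Contingency table counts, keyed by (predicted, actual).
tab = {(0, 0): 0, (0, 1): 0, (1, 0): 39, (1, 1): 341}
total = sum(tab.values())
frac_correct = (tab[0, 0] + tab[1, 1]) / total
# Share of actual non-ties predicted as ties, and of actual ties predicted as non-ties.
false_pos_rate = tab[1, 0] / (tab[1, 0] + tab[0, 0])
false_neg_rate = tab[0, 1] / (tab[0, 1] + tab[1, 1])

print(f"chi-sq {chisq:.4f}, AIC {aic:.4f}, BIC {bic:.3f}")
print(f"pseudo-R2 {r2_a:.6f} and {r2_b:.7f}")
print(f"correct {frac_correct:.7f}, FPR {false_pos_rate}, FNR {false_neg_rate}")
```

Because no 0s are predicted, "Fraction Predicted 0s Correct" would be 0/0 here, which is why the summary reports NaN.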

2.7 Network inference and process models

A final category of functions supplied by sna consists of those implementing various network inference and process models. The package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood (pstar). However, users are strongly encouraged to employ the more modern tools of the ergm package for this purpose. There are several other models for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for network inference, i.e., estimating the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns an estimate of an unknown graph from a series of observed graphs. Supported methods include data-analytic tools such as the locally aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators. They also include model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter assumes that each data source has a base chance to “know,” and correctly report, the true value of an edge on which it reports. Otherwise, the source produces a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the bias parameters may be omitted if desired. Estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). In this case, the edge reporting process is parameterized in terms of false positive and false negative error rates. These may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian: error rate priors (where applicable) are specified as beta distributions, and graph priors are specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model whenever the sum of the false positive and false negative rates is less than 1. The two approaches differ primarily in their prior structure and in the former’s allowance for negatively informative reports (e.g., due to systematic deception). bbnam uses a multiple-chain Gibbs sampler to return draws from the joint posterior distribution of the true graph and the error parameters (where applicable). The potential scale reduction measure of Gelman and Rubin (1992), in the simplified form of Gelman et al. (1995), can be applied via potscale.red.mcmc to assess convergence. bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates. The helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
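The reparameterization claim can be checked directly, reading the Batchelder and Romney (1988) model as described above (source-level parameters, priors ignored). Let a source have competence $d$ (its chance to “know”) and guessing bias $g$. It reports a true tie with probability $d + (1-d)g$ and a false tie with probability $(1-d)g$. Its false positive and false negative rates are therefore $e^+ = (1-d)g$ and $e^- = (1-d)(1-g)$. Hence $e^+ + e^- = 1 - d$, and the map can be inverted exactly when $0 < e^+ + e^- < 1$. The function names in this minimal Python sketch are hypothetical.

```python
def br_to_error_rates(d, g):
    """Batchelder-Romney (competence d, guessing bias g) -> (e_plus, e_minus).

    A source 'knows' the true edge value with probability d; otherwise it
    guesses 1 with probability g. Hence:
      Pr(report 1 | true 0) = (1 - d) * g        -> false positive rate
      Pr(report 0 | true 1) = (1 - d) * (1 - g)  -> false negative rate
    """
    if not (0 <= d <= 1 and 0 <= g <= 1):
        raise ValueError("d and g must be probabilities")
    return (1 - d) * g, (1 - d) * (1 - g)


def error_rates_to_br(e_plus, e_minus):
    """Inverse map: (e_plus, e_minus) -> (d, g).

    Since e_plus + e_minus = 1 - d, an equivalent competence d > 0 exists
    only when the error rates sum to less than 1. Sums of 1 or more
    (uninformative or negatively informative reports) have no counterpart
    with d > 0, and a sum of 0 leaves g unidentified, so both are rejected.
    """
    s = e_plus + e_minus
    if not 0 < s < 1:
        raise ValueError("need 0 < e_plus + e_minus < 1")
    return 1 - s, e_plus / s


# Hypothetical source: competence 0.6, guessing bias 0.25.
e_plus, e_minus = br_to_error_rates(0.6, 0.25)
print(e_plus, e_minus)                      # approximately 0.1 and 0.3
print(error_rates_to_br(e_plus, e_minus))   # approximately (0.6, 0.25)
```

Error rates summing to more than 1 would require a negative competence. Such reports carry information only in reverse, which is the case the Butts (2003) model admits and the consensus model does not.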

Also supported by sna are the methods for estimating biased net parameters proposed by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events.” Of these, $s_k(i,j,T)$ of type $k$ have potentially occurred for the directed dyad $(i,j)$. Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. Specifying a biased net model then involves defining the various bias events, which in turn influence the structure of the network. The joint graph distribution under such a model is not, in general, known. As such, estimation of the model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004). It also implements a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
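Because bias events combine as independent Bernoulli trials, the expression above is a noisy-OR. The tie fails to appear only if the baseline event and every potentially occurring bias event all fail. The following Python sketch evaluates the formula for given probabilities and counts $s_k(i,j,T)$. The function name and the numbers in the example are hypothetical and do not correspond to bn's parameterization or output.

```python
def biased_net_tie_prob(p_base, bias_probs, bias_counts):
    """Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k)) ** s_k(i, j, T).

    p_base      -- Pr(B_e), the baseline nomination probability
    bias_probs  -- Pr(B_k) for each bias event type k
    bias_counts -- s_k(i, j, T): how many type-k bias events could have
                   occurred for the directed dyad (i, j) in trace state T
    """
    if len(bias_probs) != len(bias_counts):
        raise ValueError("need one count per bias event type")
    # The tie is absent only if the baseline event and every bias event fail.
    p_absent = 1.0 - p_base
    for p_k, s_k in zip(bias_probs, bias_counts):
        p_absent *= (1.0 - p_k) ** s_k
    return 1.0 - p_absent


# Hypothetical values: baseline 0.1 and two bias event types with
# probabilities 0.3 and 0.2, of which 1 and 2 instances apply to (i, j).
print(biased_net_tie_prob(0.1, [0.3, 0.2], [1, 2]))  # about 0.5968
print(biased_net_tie_prob(0.1, [0.3, 0.2], [0, 0]))  # about 0.1: baseline only
```

With no applicable bias events, the probability reduces to the baseline. Each additional bias opportunity can only raise it.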

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models constitute one important family of processes often used for this purpose. See Doreian (1990) for these models, and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models. These models are of the form

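$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$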

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly.

Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
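As a concrete reference point for the two autocorrelation indices mentioned above, the sketch below computes Moran's I and Geary's C from their standard textbook definitions for one response vector and one weight matrix. The function names are hypothetical, and nacf may normalize weights or define neighborhoods differently, so this is an illustration of the indices rather than a reimplementation of nacf.

```r
# Moran's I and Geary's C for a response vector y on a weight matrix W
# (W[i, j] > 0 when j is a neighbor of i). Hypothetical helpers using the
# textbook definitions, not sna's nacf.
morans_i <- function(W, y) {
  n <- length(y)
  d <- y - mean(y)                     # deviations from the mean
  s0 <- sum(W)                         # total weight
  (n / s0) * sum(W * outer(d, d)) / sum(d^2)
}

gearys_c <- function(W, y) {
  n <- length(y)
  d <- y - mean(y)
  s0 <- sum(W)
  diffsq <- outer(y, y, "-")^2         # (y_i - y_j)^2 for every ordered pair
  ((n - 1) * sum(W * diffsq)) / (2 * s0 * sum(d^2))
}

# A three-vertex path 1 - 2 - 3 with an alternating response
W <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)
y <- c(1, 0, 1)
morans_i(W, y)   # -1: neighbors are maximally dissimilar
gearys_c(W, y)   # 1.5: values above 1 indicate negative autocorrelation
```

Moran's I is built from cross-products of deviations, while Geary's C is built from squared differences between neighbors, which is why the two indices move in opposite directions for the same data.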


                                                                          Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary method for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer)

Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics

Replicate Chains 5
Burn Time 300
Draws per Chain 20  Total Draws 100
Potential Scale Reduction (G&R's sqrt(Rhat))

Max 1.003116
Med 0.9992194
IQR 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data themselves.
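Why the region em + ep > 1 is "perverse" can be seen from the per-informant error model used to simulate the data above, in which informant $k$ reports the $(i, j)$ tie with probability $g_{ij}(1 - e^-_k) + (1 - g_{ij})e^+_k$ (here $e^-_k$ and $e^+_k$ play the roles of em[k] and ep[k], and $r^k_{ij}$ is a notation introduced only for this derivation, denoting $k$'s report). The gap between the report probabilities with and without a true tie is

$$
\Pr(r^k_{ij} = 1 \mid g_{ij} = 1) - \Pr(r^k_{ij} = 1 \mid g_{ij} = 0)
  = (1 - e^-_k) - e^+_k = 1 - (e^-_k + e^+_k)
$$

So an informant with $e^-_k + e^+_k > 1$ is more likely to report a tie when it is absent than when it is present (the reports are anti-informative), and one with $e^-_k + e^+_k = 1$ carries no information about the network at all.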

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection LAS, the central graph, and the Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
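The three simplest consensus rules compared above can be written in a few lines of base R, which makes the false-negative argument concrete. The sketch assumes an informant array laid out like dat in the example (first index = informant, then sender, then receiver, with every vertex also an informant); the function names are hypothetical, and this is not the code behind consensus(). In particular, counting exactly half of the informants as a majority in the central graph is an assumption of this sketch.

```r
# Hypothetical helpers (not sna's consensus() implementation).
# dat[k, i, j] holds informant k's 0/1 report of the i -> j tie; the LAS
# rules assume every vertex is also an informant, so dim(dat) is c(n, n, n).
las_consensus <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      ego   <- dat[i, i, j]  # i's report of its own tie to j
      alter <- dat[j, i, j]  # j's report of that same tie
      out[i, j] <- if (rule == "intersection") min(ego, alter) else max(ego, alter)
    }
  }
  out
}

central_graph <- function(dat) {
  # Majority rule over all informants; treating exactly half as a tie
  # is an assumption made for this sketch
  1 * (apply(dat, c(2, 3), mean) >= 0.5)
}

# Toy example: 3 vertices, each also an informant
toy <- array(0, dim = c(3, 3, 3))
toy[1, 1, 2] <- 1  # ego 1 reports 1 -> 2
toy[2, 1, 2] <- 1  # alter 2 confirms 1 -> 2
toy[3, 2, 3] <- 1  # alter 3 reports 2 -> 3, but ego 2 does not
toy[1, 2, 3] <- 1  # third party 1 also reports 2 -> 3
las_consensus(toy, "intersection")  # only 1 -> 2 survives
las_consensus(toy, "union")         # 1 -> 2 and 2 -> 3
central_graph(toy)                  # 1 -> 2 and 2 -> 3 (each reported by 2 of 3)
```

Because the intersection rule needs both ego and alter to report a tie, a single false negative from either of them deletes it; with false negative rates near 0.375, as in the simulated data, that happens often.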

As a final example of sna's model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components, and estimates both effects simultaneously. (This example requires the numDeriv package.)
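With a single AR term and a single MA term, equations (4) and (5) can be solved in closed form, assuming $I - \theta_1 W_1$ and $I - \psi_1 Z_1$ are invertible. Taking $W_1$ = w1, $\theta_1$ = r1, $Z_1$ = w2, and $\psi_1$ = r2, these are the two qr.solve calls in the simulation that follows:

$$
\varepsilon = (I - \psi_1 Z_1)^{-1}\nu, \qquad
y = (I - \theta_1 W_1)^{-1}\left(X\beta + \varepsilon\right)
$$

The AR multiplier $(I - \theta_1 W_1)^{-1}$ acts on $X\beta$ as well as on $\varepsilon$, so covariate effects propagate through $W_1$, whereas $Z_1$ spreads only the disturbances; this is the algebraic form of the AR/MA distinction drawn after equation (5).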

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

       Estimate Std. Error
Sigma   0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot", which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
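One plausible way to compute the quantity summarized in such a plot is sketched below. It assumes, for illustration only, that net influence is read off the reduced-form AR multiplier $(I - \theta W)^{-1}$ for a single AR weight matrix; the function name is hypothetical, and this is not the code behind plot(fit), which may also account for MA terms. Entry [i, j] of the returned matrix is the total effect of a unit shock at i on j's response, and the two-standard-deviation rule from the text is applied to the off-diagonal entries.

```r
# Hypothetical sketch, not sna's plot method: net influence taken as the
# reduced-form AR multiplier for a single weight matrix W and parameter theta.
net_influence <- function(W, theta) {
  n <- nrow(W)
  # In y = (I - theta W)^{-1} (X beta + e), entry [j, i] of the multiplier is
  # the effect of i's shock on j; transpose so influence[i, j] means i -> j.
  infl <- t(solve(diag(n) - theta * W))
  off <- row(infl) != col(infl)         # exclude self-influence
  m <- mean(infl[off])
  s <- sd(infl[off])
  list(influence = infl,
       high = off & (infl - m >= 2 * s) & s > 0,   # "green" pairs
       low  = off & (m - infl >= 2 * s) & s > 0)   # "red" pairs
}

# Toy example: vertex 2's response depends on vertex 1 only
W <- matrix(0, 5, 5)
W[2, 1] <- 1
ni <- net_influence(W, 0.5)
ni$influence[1, 2]               # 0.5: the only nonzero off-diagonal influence
which(ni$high, arr.ind = TRUE)   # flags the pair (1, 2)
```

The `s > 0` guard keeps a network with uniform off-diagonal influence (for example, an empty W) from flagging every pair.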

                                                                          3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                                                          Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: "Fitted vs. Observed Values"; "Fitted Values vs. Estimated Disturbances" (disturbances ν); "Normal Q-Q Residual Plot" (Theoretical Quantiles vs. Sample Quantiles); "Net Influence Plot".]

                                                                          References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks. II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software, Version 2.067. URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions in Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2. URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                                                          Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software (http://www.jstatsoft.org/), published by the American Statistical Association (http://www.amstat.org/)

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01. Accepted: 2007-12-25.


parameter estimates are quite close to the true values and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna comprises those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood (pstar), users are strongly encouraged to employ the more modern tools of the ergm package for this purpose. There are, however, several other models for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns an estimate of an unknown graph from a series of observed graphs. Supported methods include data analytic tools such as the locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter assumes that each data source has a base chance to "know" and correctly report the true value of an edge on which it reports, and otherwise produces a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects (the bias parameters may be omitted if desired), and estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). In this case the edge reporting process is parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. Note that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model whenever the sum of the false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and the error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992), in the simplified form of Gelman et al. (1995), can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or to construct point estimates; the helper function npostpred makes it easy to obtain posterior predictive graph properties from a set of posterior draws.
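To make the convergence check concrete, here is a minimal base R sketch of the simplified potential scale reduction formula for a single scalar quantity. It assumes the draws are arranged as an n x J matrix with one column per chain. The function name psrf_sketch is hypothetical, and the code illustrates the Gelman et al. (1995) formula rather than reproducing the code behind potscalered.mcmc.

```r
# Hypothetical helper (not sna's potscalered.mcmc): simplified potential
# scale reduction, sqrt(R-hat), for one scalar quantity.
# `draws` is an n x J matrix holding n draws from each of J chains.
psrf_sketch <- function(draws) {
  n <- nrow(draws)                       # draws per chain
  J <- ncol(draws)                       # number of chains
  chain_means <- colMeans(draws)
  # Between-chain variance B (scaled by n, as in Gelman et al.)
  B <- n / (J - 1) * sum((chain_means - mean(chain_means))^2)
  # Within-chain variance W: average of the per-chain sample variances
  W <- mean(apply(draws, 2, var))
  # Pooled estimate of the marginal posterior variance
  var_plus <- (n - 1) / n * W + B / n
  sqrt(var_plus / W)
}

set.seed(1)
# Five chains of 20 draws from the same distribution: value near 1
mixed <- matrix(rnorm(100), nrow = 20, ncol = 5)
psrf_sketch(mixed)
# Five chains stuck at different locations: value well above 1
stuck <- sweep(matrix(rnorm(100), nrow = 20, ncol = 5), 2, c(0, 2, 4, 6, 8), "+")
psrf_sketch(stuck)
```

Because of the (n - 1)/n factor, this form can fall slightly below 1 when the chains agree closely. Values well above 1 suggest that the chains have not yet mixed.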

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members may in turn nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. Specifying a biased net model then involves defining the various bias events, which in turn shape the structure of the network. The joint graph distribution under such a model is not in general known; as a result, estimation of the model parameters (the bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as are asymptotic goodness-of-fit tests for dyad and triad statistics.
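As a small illustration, the conditional tie probability above can be transcribed directly into base R. The function and argument names are hypothetical. The probabilities and bias-event counts in the usage lines are arbitrary values chosen for illustration and are not estimates produced by bn.

```r
# Hypothetical helper: Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
#   p_base: baseline probability Pr(B_e)
#   p_bias: vector of bias-event probabilities Pr(B_k)
#   s:      vector of counts s_k(i, j, T) for the (i, j) dyad at trace state T
biased_net_tie_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  # The tie is absent only if the baseline event and every bias event fail
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# No bias opportunities: the baseline probability is returned
biased_net_tie_prob(p_base = 0.1, p_bias = c(0.3, 0.2), s = c(0, 0))  # 0.1
# One opportunity for the first bias event, two for the second
biased_net_tie_prob(p_base = 0.1, p_bias = c(0.3, 0.2), s = c(1, 2))  # 0.5968
```

Each additional bias opportunity multiplies the probability that no event fires by $(1 - \Pr(B_k))$. As a result, the tie probability rises monotonically from the baseline toward 1.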

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990; and Cliff and Ord 1973, Anselin 1988, for the equivalent class of spatial autocorrelation models) constitute one important family of processes often used for this purpose. These models are of the form

$$y = \left( \sum_{i=1}^{w} \theta_i W_i \right) y + X\beta + \epsilon \tag{4}$$

$$\epsilon = \left( \sum_{i=1}^{z} \psi_i Z_i \right) \epsilon + \nu \tag{5}$$

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of i.i.d. disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is that the AR term includes the impact of neighbors' covariate scores: an AR term implies that each individual's response depends on those of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. AR and MA effects can therefore be specified in isolation as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computing sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation, in analogy to similar heuristics in the time series literature (see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct weight matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
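For reference, the textbook forms of the two indices named above are given here for responses $y$ with mean $\bar{y}$ and weights $w_{ij}$ summing to $S_0$:

$$I = \frac{n}{S_0} \frac{\sum_{i,j} w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{\sum_i (y_i - \bar{y})^2}, \qquad C = \frac{n - 1}{2 S_0} \frac{\sum_{i,j} w_{ij} (y_i - y_j)^2}{\sum_i (y_i - \bar{y})^2}.$$

The base R sketch below evaluates both indices for a single weight matrix. The helper name is hypothetical, and the sketch does not reproduce nacf's options for neighborhood type and order.

```r
# Hypothetical helper: textbook Moran's I and Geary's C for responses y on
# a weight matrix W (zero diagonal; directed or valued weights allowed).
moran_geary_sketch <- function(y, W) {
  n  <- length(y)
  z  <- y - mean(y)        # centered responses
  S0 <- sum(W)             # total weight
  ss <- sum(z^2)           # total sum of squares
  moran <- (n / S0) * sum(W * outer(z, z)) / ss
  geary <- ((n - 1) / (2 * S0)) * sum(W * outer(y, y, "-")^2) / ss
  c(moran_I = moran, geary_C = geary)
}

# Four vertices on an undirected path, with scores rising along the path
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)
moran_geary_sketch(y = c(1, 2, 3, 4), W = W)
# moran_I = 0.3333333, geary_C = 0.3
```

Under no autocorrelation, I has expectation $-1/(n-1)$ and C has expectation 1. Here, the positive I and the below-1 C both reflect the smooth trend of y along the path.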


                                                                            Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors: an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary method for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)                   # true network
R> ep <- rbeta(20, 1, 25)            # false positive rates, mean 1/26
R> em <- rbeta(20, 15, 25)           # false negative rates, mean 15/40
R> dat <- array(dim = c(20, 20, 20))  # dat[i, , ] is informant i's report
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)   # uninformative network prior
R> pem <- matrix(nrow = 20, ncol = 2)          # beta(2, 11) prior on em
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)          # beta(2, 11) prior on ep
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

                                                                            Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

                                                                            Marginal Posterior Error Distribution (by observer)

                                                                            Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics

  Replicate Chains 5
  Burn Time 300
  Draws per Chain 20 Total Draws 100
  Potential Scale Reduction (G&R's sqrt(Rhat))
    Max 1.003116
    Med 0.9992194
    IQR 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which in many cases is actually somewhat easier to estimate). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not place a large degree of mass on the "perverse" region in which em + ep > 1. Multiple actors whose error rates satisfy this condition with high posterior probability, or posterior graph distributions that are strongly multimodal, can indicate either excessively "perverse" priors or extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data themselves.
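The "perverse" region can be read off from the reparameterization noted at the start of this section. Assume the competence/guessing form of the consensus model: a source knows an edge's true value with probability D and otherwise guesses 1 with probability g. Then the false negative and false positive rates are em = (1 - D)(1 - g) and ep = (1 - D)g, so em + ep = 1 - D. The base R sketch below inverts this mapping. The helper name is hypothetical, and the mapping is an algebraic illustration, not the internal parameterization of bbnam or consensus.

```r
# Hypothetical helper: map false negative (em) and false positive (ep)
# rates to competence D and guessing bias g, assuming
#   em = (1 - D) * (1 - g)   and   ep = (1 - D) * g,
# so that D = 1 - (em + ep) and g = ep / (em + ep).
errors_to_competence <- function(em, ep) {
  total <- em + ep
  if (any(total >= 1))
    warning("em + ep >= 1: implied competence is <= 0 (the 'perverse' region)")
  data.frame(em = em, ep = ep, competence = 1 - total, bias = ep / total)
}

# Means of the beta distributions used to simulate the informants above
errors_to_competence(em = 15 / 40, ep = 1 / 26)
# Mean of the beta(2, 11) prior placed on each error rate
errors_to_competence(em = 2 / 13, ep = 2 / 13)
# A source in the perverse region (triggers the warning)
errors_to_competence(em = 0.7, ep = 0.5)
```

A competence at or below zero corresponds exactly to em + ep >= 1, which is why priors that place substantial mass there behave badly. At the simulation means (em = 0.375, ep = 1/26), the implied competence is about 0.59 and the implied bias about 0.09.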

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
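The contrasts in the paragraph above can be made quantitative under a simplified error model. Assume every informant independently misses a true tie with probability $e^-$ and reports a non-existent tie with probability $e^+$; this homogeneous model is assumed for illustration and is not the one fitted above. Under it, the intersection LAS keeps a true tie with probability $(1 - e^-)^2$, and the union keeps it with probability $1 - (e^-)^2$. Majority rule over m informants keeps it with a binomial tail probability. The function names are hypothetical.

```r
# Hit (true tie kept) and false-alarm (null kept) rates for the two LAS rules,
# under the hypothetical homogeneous, independent error model.
las_rates <- function(e_minus, e_plus) {
  c(intersection_hit   = (1 - e_minus)^2,
    intersection_false = e_plus^2,
    union_hit          = 1 - e_minus^2,
    union_false        = 1 - (1 - e_plus)^2)
}

# Majority rule with m informants: a tie is kept when at least half report it.
central_rates <- function(m, e_minus, e_plus) {
  k <- ceiling(m / 2)  # minimum number of reports needed
  c(hit   = pbinom(k - 1, m, 1 - e_minus, lower.tail = FALSE),
    false = pbinom(k - 1, m, e_plus, lower.tail = FALSE))
}

las_rates(0.3, 0.1)          # intersection keeps only 49% of true ties; union 91%
central_rates(20, 0.1, 0.1)  # near-perfect recovery with low error rates
central_rates(20, 0.6, 0.1)  # hit rate collapses once e_minus exceeds 0.5
```

The intersection trades false negatives for fewer false positives, and the union does the reverse. Majority rule works well until one error rate reaches about one half, at which point its hit (or false-alarm) rate sits near a coin flip.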

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58  < 2e-16 ***
X2      0.535608   0.009448   56.69  < 2e-16 ***
X3     -0.685068   0.007138  -95.98  < 2e-16 ***
X4      0.691812   0.008417   82.19  < 2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71  < 2e-16 ***
rho2.1  0.307491   0.021167   14.53  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4
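The information criteria in the summary above can be reproduced by hand from the printed log likelihoods. This is a useful sanity check when reading lnam output. The sketch assumes the standard definitions $\mathrm{AIC} = -2\log L + 2k$ and $\mathrm{BIC} = -2\log L + k\log n$, with n = 50. It also assumes k = 8 for the fitted model (five coefficients, rho1.1, rho2.1, and Sigma) and k = 2 for the mean/standard deviation null model. These parameter counts are inferred from the reported degrees of freedom (50 - 8 = 42 and 50 - 2 = 48); they are not taken from sna's source.

```r
n <- 50
ll_model <- 58.47; k_model <- 8   # log likelihood and inferred parameter count
ll_null <- -82.48; k_null <- 2    # null model: mean and standard deviation

aic <- function(ll, k) -2 * ll + 2 * k
bic <- function(ll, k, n) -2 * ll + k * log(n)

aic_model <- aic(ll_model, k_model)     # about -100.9
bic_model <- bic(ll_model, k_model, n)  # about -85.65
aic_null <- aic(ll_null, k_null)        # about 169.0
bic_null <- bic(ll_null, k_null, n)     # about 172.8

aic_null - aic_model                    # about 269.9
bic_null - bic_model                    # about 258.4
```

The BIC difference matches the reported heuristic log Bayes factor to rounding. That lnam computes the factor exactly this way is an inference from the numbers, not something stated in the text.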

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
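The two-standard-deviation coloring rule can be sketched on any matrix of pairwise influences. The text does not say exactly how plot(fit) computes net influence. As a hypothetical stand-in, this base-R sketch uses the AR multiplier $(I - \theta W)^{-1}$ from the reduced form of an AR-only model, transposed so that entry [i, j] is i's influence on j. The graph, the value of θ, and the restriction to off-diagonal pairs are all assumptions.

```r
set.seed(1)
n <- 10
# Hypothetical random digraph and AR parameter (not the fitted lnam values).
W <- matrix(rbinom(n * n, 1, 0.2), n, n)
diag(W) <- 0
theta <- 0.2

# Stand-in net influence: in y = (I - theta W)^{-1} (X beta + e), entry [j, i]
# of the inverse is the effect of a shock at i on j's response. Transpose so
# that infl[i, j] is i's influence on j.
infl <- t(solve(diag(n) - theta * W))

# Two-standard-deviation rule over off-diagonal (i, j) pairs.
off <- row(infl) != col(infl)
mu <- mean(infl[off])
s <- sd(infl[off])
green <- off & infl >= mu + 2 * s  # unusually strong positive influence
red <- off & infl <= mu - 2 * s    # unusually strong negative influence

which(green, arr.ind = TRUE)       # (i, j) pairs that would be drawn in green
```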

                                                                            3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines which covers many of the methods currently seeing wide use within the field. It is hoped that, together with the other packages of the statnet ensemble, the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                                                            Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam (panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot).

                                                                            References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks. II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software, Version 2.067. URL http://www.analytictech.com.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project, http://statnetproject.org, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org.

Richards WD, Seary AJ (2006). MultiNet for Windows, Version 4.75. URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis, Version 3.1. URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                                                            Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software (http://www.jstatsoft.org), published by the American Statistical Association (http://www.amstat.org). Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.



of $e_{ij}$ is given by $\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)}$, where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
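The edge probability above can be evaluated directly once the bias-event probabilities and the counts $s_k$ are known for a dyad. This base-R sketch does exactly that. The function name, the bias labels and all numbers are hypothetical, and this is not the interface of bn, which estimates these probabilities from data.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^{s_k(i, j, T)}
biased_edge_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  # The edge is absent only if the baseline event and every bias trial fail.
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Hypothetical dyad: baseline 0.1; one opportunity for a reciprocity bias
# (probability 0.3) and two for a triad-closure bias (probability 0.2).
biased_edge_prob(0.1, c(reciprocity = 0.3, triad = 0.2), c(1, 2))  # 0.5968
biased_edge_prob(0.1, c(reciprocity = 0.3, triad = 0.2), c(0, 0))  # 0.1: baseline only
```

Each additional opportunity for a bias event multiplies the probability of absence by another factor below 1. This is how reciprocity and closure effects raise tie probabilities in the model.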

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models constitute one important family of processes which are often used for this purpose. For these models, see Doreian (1990); for the equivalent class of spatial autocorrelation models, see Cliff and Ord (1973) and Anselin (1988). These models are of the form:

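$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$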

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores. An AR term implies that each individual's response depends on that of their neighbors, including all covariate, disturbance, and higher-order neighborhood effects. An MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation as well as jointly.

Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computing sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders. It can also compute autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation, in analogy to similar heuristics within the time series literature (see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct weight matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
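The two autocorrelation indices mentioned above can be written in a few lines of base R. The sketch below uses the standard textbook definitions of Moran's I and Geary's C for a response vector y and an n × n weight matrix W. The function names are hypothetical. sna's nacf may normalize or weight neighborhoods differently, so treat this as an illustration of the indices rather than a re-implementation of nacf.

```r
# Moran's I and Geary's C for a response vector y observed on the vertices
# of a network with (possibly weighted) n x n adjacency matrix W.
# Standard textbook definitions; hypothetical helpers, not sna's nacf.
morans_i <- function(y, W) {
  n  <- length(y)
  d  <- y - mean(y)               # deviations from the mean
  S0 <- sum(W)                    # total edge weight
  (n / S0) * sum(W * outer(d, d)) / sum(d^2)
}

gearys_c <- function(y, W) {
  n  <- length(y)
  d  <- y - mean(y)
  S0 <- sum(W)
  diffs2 <- outer(y, y, "-")^2    # (y_i - y_j)^2 for every ordered pair
  ((n - 1) / (2 * S0)) * sum(W * diffs2) / sum(d^2)
}

# Path graph 1 - 2 - 3 - 4 with a steadily increasing attribute
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)                     # symmetrize: undirected path
y <- c(1, 2, 3, 4)
morans_i(y, W)   # 1/3: above the null expectation of -1/3, so neighbours are alike
gearys_c(y, W)   # 0.3: values below 1 also indicate positive autocorrelation
```

Moran's I works with products of deviations, and Geary's C with squared differences between neighbors. Both summarize first-order neighborhoods only. Higher-order neighborhoods, as nacf offers, would require substituting a different W.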


                                                                              Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set. In it, we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038. Their false negative rates (em) are likewise beta distributed, with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors: an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

                                                                              Multiple Error Probability Model

                                                                              Marginal Posterior Network Distribution

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

                                                                              Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

                                                                              Marginal Posterior Error Distribution (by observer)

                                                                              Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

                                                                              Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

                                                                              MCMC Diagnostics

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
     Max: 1.003116
     Med: 0.9992194
     IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates. It also recovers the network itself, which is actually somewhat easier to estimate in many cases. In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which $e^- + e^+ > 1$. Two warning signs are multiple actors whose error rates satisfy this condition with high probability in the posterior, and posterior graph distributions that are strongly multimodal. Either can indicate excessively "perverse" priors or extreme disagreement among informants (e.g., as would result from systematic deception). Both possibilities warrant a re-examination of the user's modeling assumptions and of the data itself.
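To see why the region $e^- + e^+ > 1$ is "perverse", consider the measurement model used to simulate the reports above. An informant reports an existing tie with probability $1 - e^-$ and a non-existent tie with probability $e^+$. If the error rates were known and the reports conditionally independent, Bayes' rule would give the posterior probability of a single tie directly. The R sketch below makes that calculation under exactly those simplifying assumptions. bbnam itself treats the error rates as unknown and samples them jointly with the network, and the function name `tie_posterior` is hypothetical. The sketch shows that once $e^- + e^+ > 1$, a reported tie lowers the posterior rather than raising it.

```r
# Posterior Pr(tie present | reports) for a single directed dyad, ASSUMING
# known per-informant error rates and conditionally independent reports.
# Hypothetical helper; bbnam instead samples error rates by MCMC.
#   reports : 0/1 vector, one report per informant
#   em, ep  : false negative / false positive rates, one per informant
#   prior   : prior probability that the tie is present
tie_posterior <- function(reports, em, ep, prior = 0.5) {
  # Likelihood of the observed reports if the tie is present vs. absent
  lik1 <- prod(ifelse(reports == 1, 1 - em, em))
  lik0 <- prod(ifelse(reports == 1, ep, 1 - ep))
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

# A well-behaved informant (em + ep < 1) who reports the tie:
tie_posterior(1, em = 0.375, ep = 0.04)   # 0.625 / 0.665, about 0.94
# A "perverse" informant (em + ep > 1) who reports the tie:
tie_posterior(1, em = 0.7, ep = 0.6)      # 0.3 / 0.9 = 1/3, below the prior
```

With a prior of 0.5, a positive report raises the posterior exactly when $1 - e^- > e^+$, that is, when $e^- + e^+ < 1$.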

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection LAS, the central graph, and the Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1
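The following base-R sketch shows what the three simpler consensus rules do. It works on an informant-by-sender-by-receiver array laid out like dat above, where dat[k, i, j] is informant k's report of the i -> j tie. It assumes the rules as commonly described. An LAS tie i -> j uses only the reports of i and j themselves: the intersection requires both, and the union requires either. The central graph keeps ties reported by at least half of all informants. The function `simple_consensus` and the toy array are hypothetical. sna's consensus may handle details such as diagonals or exact one-half ties differently.

```r
# Hypothetical helper (not sna's implementation). dat is an m x n x n array
# of 0/1 reports; the LAS rules assume m == n (every vertex is an informant).
simple_consensus <- function(dat, method = c("LAS.intersection", "LAS.union",
                                             "central.graph")) {
  method <- match.arg(method)
  n <- dim(dat)[2]
  if (method == "central.graph") {
    # Tie present if reported by at least half of all informants
    return((apply(dat, c(2, 3), mean) >= 0.5) * 1)
  }
  stopifnot(dim(dat)[1] == n)
  out <- matrix(0, n, n)
  for (i in 1:n) {
    for (j in 1:n) {
      by_i <- dat[i, i, j]   # sender's own report of i -> j
      by_j <- dat[j, i, j]   # receiver's own report of i -> j
      out[i, j] <- if (method == "LAS.intersection") by_i * by_j else max(by_i, by_j)
    }
  }
  out
}

# Toy example: three informants who are also the three vertices
toy <- array(0, dim = c(3, 3, 3))
toy[1, 1, 2] <- 1   # informant 1 reports 1 -> 2
toy[2, 1, 2] <- 1   # informant 2 confirms 1 -> 2
toy[2, 2, 3] <- 1   # informant 2 reports 2 -> 3; vertex 3 does not
toy[3, 3, 1] <- 1   # informant 3 reports 3 -> 1; vertex 1 does not
simple_consensus(toy, "LAS.intersection")   # only 1 -> 2 survives
simple_consensus(toy, "LAS.union")          # 1 -> 2, 2 -> 3, and 3 -> 1
simple_consensus(toy, "central.graph")      # only 1 -> 2 (2 of 3 informants)
```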

For this scenario, the intersection LAS is an especially poor choice, since it exacerbates the effects of false negatives. The central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
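A back-of-the-envelope calculation shows why. Suppose, as a simplification, that every informant has the same false negative rate $e^-$ and that informants err independently. A true tie i -> j then survives the LAS intersection only if both i and j report it. It survives the LAS union if either does, and the central graph if at least half of the m informants report it. The R sketch below evaluates those three probabilities. The function name is hypothetical, and the homogeneous, independent errors are an assumption that differs from the heterogeneous rates in the simulation above.

```r
# Probability that a tie which truly exists is retained by each rule,
# ASSUMING every informant misses it independently with the same rate em.
tie_recovery <- function(em, m = 20) {
  c(LAS.intersection = (1 - em)^2,               # both i and j must report
    LAS.union        = 1 - em^2,                 # either i or j suffices
    # at least ceiling(m / 2) of the m informants must report the tie
    central.graph    = pbinom(ceiling(m / 2) - 1, m, 1 - em,
                              lower.tail = FALSE))
}

round(tie_recovery(0.375), 3)   # close to the simulation's mean miss rate
round(tie_recovery(0.5), 3)     # central graph falls toward a coin flip
```

The same reasoning applied to absent ties gives the mirror image. The union admits a false tie with probability $1 - (1 - e^+)^2$, while the intersection does so only with probability $(e^+)^2$. This is why neither LAS rule dominates in general.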

As a final example of sna's model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components and estimates both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot". The latter depicts, in network form, the total influence of each vertex on each other vertex. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are drawn as green edges. Pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean are drawn as red edges. Sample output for the above example is provided in Figure 6.
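One way to make "net influence" concrete for a model with one AR and one MA term follows from equations (4) and (5). Solving (5) for the disturbances and substituting into (4) gives $y = (I - \theta W)^{-1}\left(X\beta + (I - \psi Z)^{-1}\nu\right)$. So the (j, i) element of $(I - \theta W)^{-1}(I - \psi Z)^{-1}$ is the total effect of vertex i's disturbance on vertex j's response. The sketch below computes that matrix and flags pairs more than two standard deviations above or below the mean off-diagonal value. The function `net_influence` is hypothetical. It illustrates the derivation only and is not claimed to match exactly how lnam's plot method defines or thresholds net influence.

```r
# Hypothetical helper (not sna's plot code). For one AR weight matrix W
# (coefficient theta) and one MA weight matrix Z (coefficient psi), compute
# total influence of each vertex's disturbance on every other vertex's
# response, and flag unusually strong positive/negative pairs.
net_influence <- function(W, Z, theta, psi) {
  n <- nrow(W)
  I <- diag(n)
  A <- solve(I - theta * W) %*% solve(I - psi * Z)  # A[j, i]: effect of i on j
  infl <- t(A)                                       # infl[i, j]: i on j
  offdiag <- row(infl) != col(infl)
  vals <- infl[offdiag]                              # ignore self-influence
  m <- mean(vals)
  s <- sd(vals)
  if (s == 0) s <- Inf   # no spread at all: flag nothing
  list(influence  = infl,
       strong_pos = (infl >= m + 2 * s) & offdiag,   # drawn green
       strong_neg = (infl <= m - 2 * s) & offdiag)   # drawn red
}

# Illustrative chain 1 -> 2 -> 3: row i of W lists the vertices whose
# responses enter vertex i's equation (2 depends on 1, 3 depends on 2).
W <- matrix(0, 3, 3)
W[2, 1] <- 1
W[3, 2] <- 1
res <- net_influence(W, matrix(0, 3, 3), theta = 0.5, psi = 0)
res$influence   # 1 affects 2 by 0.5 directly and 3 by 0.25 through 2
```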

                                                                              3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines covering many of the methods currently in wide use within the field. It is hoped that, together with the other packages of the statnet ensemble, the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                                                              Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: "Fitted vs. Observed Values"; "Fitted Values vs. Estimated Disturbances" (estimated disturbances ν on the vertical axis); "Normal Q-Q Residual Plot" (Theoretical Quantiles vs. Sample Quantiles); "Net Influence Plot".]

                                                                              References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software, Version 2.067. URL http://www.analytictech.com.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions in Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2. URL http://faculty.chicagogsb.edu/ronald.burt/teaching.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project, http://statnetproject.org, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                                                              Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/, published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.



                                                                                Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038 (Beta(1, 25)), and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (Beta(15, 25), about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (a tie probability of 0.5 for every dyad, specified by pnet) and identical beta(2, 11) priors for all error rates. The summary method for the returned bbnam object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)                    # true (unobserved) network
R> ep <- rbeta(20, 1, 25)             # false positive rates, mean 1/26
R> em <- rbeta(20, 15, 25)            # false negative rates, mean 15/40
R> dat <- array(dim = c(20, 20, 20))  # dat[i, , ] is informant i's report
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = g * (1 - em[i]) + (1 - g) * ep[i])
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:
             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):
       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):
          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

  Replicate Chains: 5
  Burn Time: 300
  Draws per Chain: 20 Total Draws: 100
  Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1
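The MCMC diagnostics in the summary above report Gelman and Rubin's (1992) potential scale reduction factor, the square root of R-hat, computed across the replicate chains; values near 1 suggest that the chains agree. As a rough illustration of what that number measures, here is a base-R sketch of the basic between-chain/within-chain formula for a single scalar parameter. The helper name `psrf` is hypothetical, and this simplified form omits the degrees-of-freedom correction of the original paper, so it should not be expected to reproduce bbnam's values exactly.

```r
# Sketch of the potential scale reduction factor, sqrt(Rhat), for one scalar
# parameter. `chains` is an n x m matrix holding n draws from each of m chains.
# Hypothetical helper; simplified (no degrees-of-freedom correction).
psrf <- function(chains) {
  n <- nrow(chains)
  B <- n * var(colMeans(chains))        # between-chain variance
  W <- mean(apply(chains, 2, var))      # average within-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled estimate of posterior variance
  sqrt(var_plus / W)
}

set.seed(1)
# Five chains of 20 draws each (the layout reported above), all from one target...
mixed <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)
# ...and the same chains shifted so that each sits around a different value.
stuck <- sweep(mixed, 2, c(0, 2, 4, 6, 8), "+")
psrf(mixed)  # close to 1
psrf(stuck)  # far above 1: the chains disagree
```

Because B is never negative, this form is bounded below by sqrt((n - 1)/n), which is why values slightly under 1 (such as the median reported above) are unremarkable.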

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
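Why em + ep > 1 is “perverse” follows directly from the error model used to simulate dat: informant i reports an existing tie with probability 1 - em[i] and a non-existent tie with probability ep[i]. A reported tie therefore favours the tie only when 1 - em > ep, that is, when em + ep < 1. The base-R sketch below computes that likelihood ratio for a few illustrative informants; the helper names and error rates are hypothetical.

```r
# Sketch: how much does a reported tie tell us, given one informant's error
# rates? e_minus = false negative rate (e^-), e_plus = false positive rate (e^+).
# Likelihood ratio P(report | tie) / P(report | no tie) = (1 - e^-) / e^+.
report_lr <- function(e_minus, e_plus) (1 - e_minus) / e_plus

# "Perverse" informants: a reported tie is evidence *against* the tie.
is_perverse <- function(e_minus, e_plus) e_minus + e_plus > 1

e_minus <- c(0.375, 0.20, 0.70)   # illustrative values only
e_plus  <- c(0.038, 0.10, 0.60)
report_lr(e_minus, e_plus)    # above 1 for the first two informants, below 1 for the third
is_perverse(e_minus, e_plus)  # FALSE FALSE TRUE
```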

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection locally aggregated structures (LAS), the central graph, and the Romney-Batchelder model.
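Three of these estimators have short definitions. Writing dat[i, j, k] for informant i's report of a j-to-k tie, the intersection LAS keeps a tie only if both j and k report it, the union LAS keeps it if either does, and the central graph keeps ties reported by at least half of all informants. The base-R sketch below implements those definitions on simulated binary reports. It is not sna's code: the helper names are hypothetical, the 0.5 threshold is an assumption of the sketch, and consensus may differ in details such as tie-breaking.

```r
# Sketch of three classical consensus estimators for binary (0/1) reports.
# dat is an n x n x n array: dat[i, j, k] = 1 if informant i reports j -> k.
las <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  est <- matrix(0, n, n)
  for (j in 1:n) for (k in 1:n) {
    r <- c(dat[j, j, k], dat[k, j, k])  # reports by the dyad's two endpoints
    est[j, k] <- if (rule == "intersection") min(r) else max(r)
  }
  diag(est) <- 0
  est
}
# Central graph: tie present if at least half the informants report it.
central_graph <- function(dat) (apply(dat, c(2, 3), mean) >= 0.5) * 1

# Simulated reports: 20 informants, false negative rate 0.375 and false
# positive rate 0.04 for everyone (illustrative values only).
set.seed(42)
n <- 20
g_sim <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(g_sim) <- 0
dat_sim <- array(0, dim = c(n, n, n))
for (i in 1:n) {
  p <- g_sim * (1 - 0.375) + (1 - g_sim) * 0.04  # probability of reporting a tie
  rep_i <- matrix(rbinom(n * n, 1, p), n, n)
  diag(rep_i) <- 0
  dat_sim[i, , ] <- rep_i
}
acc <- c(intersection = mean(las(dat_sim, "intersection") == g_sim),
         union        = mean(las(dat_sim, "union") == g_sim),
         central      = mean(central_graph(dat_sim) == g_sim))
acc
```

With false negatives this common, the intersection rule drops most true ties (both endpoints must report them), which is the effect discussed after the consensus results.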

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
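The central graph's sensitivity to error rates near 0.5 is simple binomial arithmetic. If each of m informants independently misreports a dyad with probability err (err = em for dyads that are ties, ep for those that are not), the majority vote recovers the dyad with the probability that more than half of the reports are correct. The sketch below assumes a common error rate for all informants and an odd m, so that votes cannot tie; both are simplifications of the example above, and `majority_correct` is a hypothetical helper.

```r
# Probability that a strict majority of m independent informants, each wrong
# with the same probability err, reports a given dyad correctly.
# Assumes a common error rate and an odd m (so majority votes cannot tie).
majority_correct <- function(err, m) {
  stopifnot(m %% 2 == 1)
  # number of correct reports X ~ Binomial(m, 1 - err); we want P(X > m / 2)
  pbinom(floor(m / 2), size = m, prob = 1 - err, lower.tail = FALSE)
}

err <- c(0.1, 0.3, 0.4, 0.45, 0.5, 0.6)
round(majority_correct(err, m = 21), 3)
```

At err = 0.5 the vote is exactly a coin flip (probability 0.5), and above 0.5 the majority is more likely to be wrong than right, regardless of how many informants are added.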

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)
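Reading the specification off the simulation code that follows (rather than from lnam's documentation, which may allow further options), the data are generated by a network autoregression on y with weights W1 and parameter ρ1 (w1, r1 = 0.2), combined with a network moving-average disturbance with weights W2 and parameter ρ2 (w2, r2 = 0.3). The two qr.solve calls compute the reduced form on the last line, with n = 50 and σ = 0.1:

```latex
\begin{aligned}
y &= \rho_1 W_1 y + X\beta + e, \\
e &= \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I_n), \\
\Rightarrow\; y &= (I_n - \rho_1 W_1)^{-1}\left[ X\beta + (I_n - \rho_2 W_2)^{-1} \nu \right].
\end{aligned}
```

The fitted rho1.1, rho2.1 and Sigma in the output (about 0.195, 0.307 and 0.096) are estimates of ρ1, ρ2 and σ.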

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)                # MA disturbance
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)    # AR outcome
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4
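The information criteria in this output match the usual definitions, AIC = -2 log L + 2k and BIC = -2 log L + k log(n). Taking n = 50 and inferring k from the reported degrees of freedom (k = 50 - 42 = 8 for the full model: five coefficients, two rho parameters and Sigma; k = 50 - 48 = 2 for the null model), the printed values can be reproduced from the rounded log-likelihoods. The parameter counts are an inference from the output, not taken from lnam's source.

```r
# Reproduce the lnam goodness-of-fit figures from the rounded log-likelihoods.
# Parameter counts k are inferred from the reported degrees of freedom.
n   <- 50
ll  <- c(model = 58.47, null = -82.48)
k   <- c(model = 8,     null = 2)
aic <- -2 * ll + 2 * k
bic <- -2 * ll + k * log(n)
round(aic, 1)                          # model -100.9, null 169.0
round(bic, 1)                          # model -85.6,  null 172.8
round(aic["null"] - aic["model"], 1)   # 269.9
round(bic["null"] - bic["model"], 1)   # 258.4
```

The small gap between -85.6 here and the printed -85.65 comes from starting with a log-likelihood already rounded to 58.47. In this example, the printed heuristic log Bayes factor (258.4) coincides numerically with the BIC difference.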

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
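The flagging rule just described is a two-standard-deviation screen on a matrix of estimated net influences. The base-R sketch below applies that rule to any square matrix whose entry [i, j] is the influence of i on j. The helper name is hypothetical; computing the mean and standard deviation over off-diagonal entries only is an assumption of this sketch, not a documented detail of the plot method; and the illustrative input (a transposed AR multiplier for a random row-normalized network) is not necessarily the quantity lnam plots.

```r
# Sketch of the two-standard-deviation flagging rule for a net influence plot.
# `flag_influence` is a hypothetical helper, not part of sna.
# infl[i, j] = estimated net influence of vertex i on vertex j.
flag_influence <- function(infl, k = 2) {
  off <- row(infl) != col(infl)           # ignore self-influence (assumption)
  mu <- mean(infl[off])
  s  <- sd(infl[off])
  list(green = off & infl >= mu + k * s,  # unusually strong positive influence
       red   = off & infl <= mu - k * s)  # unusually strong negative influence
}

# Illustrative input only: for y = rho W y + u, y = M u with M = (I - rho W)^-1,
# so M[i, j] is the effect of j on i; transpose so rows index the source.
set.seed(7)
w <- matrix(rbinom(50 * 50, 1, 0.5), 50, 50)
diag(w) <- 0
w <- w / pmax(rowSums(w), 1)              # row-normalize the weights
infl <- t(solve(diag(50) - 0.2 * w))
flags <- flag_influence(infl)
c(green = sum(flags$green), red = sum(flags$red))
```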

                                                                                3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines which covers many of the methods currently seeing wide use within the field. It is hoped that, together with the other packages of the statnet ensemble, the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

[Figure 6: Plot method output for lnam. Panels: “Fitted vs. Observed Values”; “Fitted Values vs. Estimated Disturbances”; “Normal Q-Q Residual Plot” (theoretical vs. sample quantiles); “Net Influence Plot”.]

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

                                                                                References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). “Social Structure from Multiple Networks II: Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software, Version 2.067. URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2. URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows, Version 4.75. URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis, Version 3.1. URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                                                                Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software (http://www.jstatsoft.org/), published by the American Statistical Association (http://www.amstat.org/)

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25.



a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

                                                                                  Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

                                                                                  Marginal Posterior Error Distribution (by observer)

                                                                                  Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

                                                                                  Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

                                                                                  MCMC Diagnostics

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
Max: 1.003116
Med: 0.9992194
IQR: 0.0004545115
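The potential scale reduction reported above is Gelman and Rubin's (1992) convergence diagnostic: values near 1 suggest that the replicate chains agree with one another. A minimal base-R sketch of the basic statistic for one scalar quantity follows, assuming draws arranged as a draws-by-chains matrix. The function name psrf is hypothetical, the sketch omits the degrees-of-freedom correction, and sna may compute and summarize the statistic differently (the Max/Med/IQR lines suggest a summary taken over many monitored quantities).

```r
# Basic Gelman-Rubin potential scale reduction factor, sqrt(Rhat), for one
# scalar quantity. `draws` is an n x m matrix: n draws per chain (rows) from
# m replicate chains (columns). Hypothetical helper, not sna's internal code;
# the degrees-of-freedom correction is omitted.
psrf <- function(draws) {
  n <- nrow(draws)                       # draws per chain
  m <- ncol(draws)                       # number of chains
  chain_means <- colMeans(draws)
  B <- n / (m - 1) * sum((chain_means - mean(chain_means))^2)  # between-chain
  W <- mean(apply(draws, 2, var))        # average within-chain variance
  v_hat <- (n - 1) / n * W + B / n       # pooled variance estimate
  sqrt(v_hat / W)
}

# Same layout as the run above: 5 chains of 20 draws each
set.seed(1)
mixed <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)    # chains agree
stuck <- sweep(mixed, 2, c(0, 0, 0, 0, 3), "+")        # one chain shifted by 3
psrf(mixed)  # close to 1
psrf(stuck)  # clearly above 1
```

Because the pooled variance is shrunk by (n - 1)/n, this basic form can dip slightly below 1 when the chains agree, which is one way a median value just under 1 can arise.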

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which e^- + e^+ > 1. Multiple actors whose error rates satisfy this condition with high posterior probability, or posterior graph distributions that are strongly multimodal, can indicate either excessively “perverse” priors or extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
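One way to screen a fit for the warning sign just described is to estimate, for each observer, the posterior probability that e^- + e^+ > 1. The sketch assumes posterior draws stored as draws-by-observers matrices, which is how b$em and b$ep are used in the session above (apply(b$em, 2, median) takes per-observer medians). The helper perverse_prob and the simulated Beta draws are hypothetical illustrations, not part of sna or output from a real fit.

```r
# Hypothetical helper: posterior probability, per observer, of the "perverse"
# region em + ep > 1. `em` and `ep` are draws x observers matrices of
# false-negative and false-positive rates.
perverse_prob <- function(em, ep) {
  stopifnot(identical(dim(em), dim(ep)))
  colMeans(em + ep > 1)    # share of draws falling in the perverse region
}

# Illustrative draws (not from a real bbnam fit); observer 3 is made perverse
set.seed(42)
n_draws <- 1000
em <- cbind(rbeta(n_draws, 2, 8), rbeta(n_draws, 3, 7), rbeta(n_draws, 8, 2))
ep <- cbind(rbeta(n_draws, 1, 9), rbeta(n_draws, 1, 9), rbeta(n_draws, 7, 3))
p <- perverse_prob(em, ep)
round(p, 3)
which(p > 0.5)   # observers that merit a closer look
```

Several observers flagged this way, rather than one, is the pattern the text identifies as a reason to revisit the priors or the data.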

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection locally aggregated structures (LAS), the central graph, and the Romney-Batchelder model.
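To make the three simplest rules concrete, here is a base-R sketch under stated assumptions: dat is an m x n x n array of binary reports in which dat[k, i, j] is observer k's report of a tie from i to j, and (for the LAS rules) the observers are the actors themselves. The function names are hypothetical and are not sna's implementations; in particular, the tie-breaking convention for the central graph is an assumption.

```r
# Hypothetical base-R versions of three classical consensus rules.
# dat[k, i, j] = observer k's report of a tie i -> j (cognitive social
# structure, so observer k is also actor k).
las_intersection <- function(dat) {
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n)
    if (i != j) out[i, j] <- dat[i, i, j] * dat[j, i, j]   # both ends agree
  out
}

las_union <- function(dat) {
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n)
    if (i != j) out[i, j] <- max(dat[i, i, j], dat[j, i, j])  # either end
  out
}

central_graph <- function(dat) {
  # Majority rule: keep a tie reported by at least half of all observers
  # (an assumed tie-breaking convention; sna's may differ).
  (apply(dat, c(2, 3), mean) >= 0.5) * 1
}

# Three actors, each reporting the full 3 x 3 network
dat <- array(0, dim = c(3, 3, 3))
dat[1, 1, 2] <- 1   # actor 1 claims 1 -> 2
dat[2, 1, 2] <- 1   # actor 2 confirms 1 -> 2
dat[1, 2, 3] <- 1   # actor 1, a third party, claims 2 -> 3
dat[3, 3, 1] <- 1   # actor 3 claims 3 -> 1; actor 1 does not
las_intersection(dat)  # keeps only 1 -> 2
las_union(dat)         # keeps 1 -> 2 and 3 -> 1
central_graph(dat)     # keeps only 1 -> 2 (2 of 3 observers)
```

The example shows why the intersection rule is sensitive to false negatives: a single endpoint failing to report a real tie is enough to drop it.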

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
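The sensitivity of majority rule to error rates near 0.5 can be seen from a simple binomial calculation. Assume, as a simplification that is not part of any sna routine, that each of m observers misreports a given edge independently with a common probability e (e^- if the tie is present, e^+ if it is absent). The central graph then gets that edge right exactly when a majority of observers are correct, so for odd m its accuracy is P(X >= (m + 1)/2) with X ~ Binomial(m, 1 - e). The function name majority_accuracy is hypothetical.

```r
# Accuracy of a majority-rule (central graph) estimate for a single edge,
# assuming m observers err independently with a common rate e. Requires odd
# m so that votes cannot split evenly.
majority_accuracy <- function(e, m) {
  stopifnot(m %% 2 == 1)
  # correct iff more than (m - 1) / 2 observers are correct,
  # each correct with probability 1 - e
  pbinom((m - 1) / 2, size = m, prob = 1 - e, lower.tail = FALSE)
}

e <- c(0.1, 0.3, 0.45, 0.5, 0.6)
round(majority_accuracy(e, m = 21), 3)
```

Under these assumptions accuracy is exactly one half at e = 0.5 for any odd m and falls below one half beyond it, regardless of how many observers are pooled; real informant errors are heterogeneous and often correlated, so this is only a rough guide.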

As a final example of sna's model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both autoregressive (AR) and moving average (MA) components and estimates both effects simultaneously. (This example requires the numDeriv package.)
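Written out, the data-generating process simulated in the session that follows is read off from its code: r1, r2, w1, w2, x, beta, sigma, and nu there correspond to ρ₁, ρ₂, W₁, W₂, X, β, σ, and ν here. The AR term acts on y through W₁, and the MA term acts on the disturbance through W₂; lnam's rho1.1 and rho2.1 estimates target ρ₁ and ρ₂.

```latex
\begin{aligned}
y &= \rho_1 W_1 y + X\beta + e, \\
e &= \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I), \\
\Longrightarrow\quad
y &= (I - \rho_1 W_1)^{-1}\left[ X\beta + (I - \rho_2 W_2)^{-1} \nu \right].
\end{aligned}
```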

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate  Std. Error  Z value  Pr(>|z|)
X1     -0.331259    0.010831   -30.58   < 2e-16 ***
X2      0.535608    0.009448    56.69   < 2e-16 ***
X3     -0.685068    0.007138   -95.98   < 2e-16 ***
X4      0.691812    0.008417    82.19   < 2e-16 ***
X5      0.016491    0.007890     2.09    0.0366 *
rho1.1  0.194935    0.002575    75.71   < 2e-16 ***
rho2.1  0.307491    0.021167    14.53   < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

       Estimate  Std. Error
Sigma   0.09597    9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4
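The "Model log likelihood" reported above is the maximized likelihood of the fitted model. The following base-R sketch evaluates the Gaussian log-likelihood of the AR/MA model for given parameter values, to show what is being maximized. It makes these assumptions:

- ν ~ N(0, σ²I), as in the simulation.
- lnam_loglik() is a hypothetical helper.
- sna's internal parameterization and optimizer are not reproduced here.

The two log-determinant terms are the Jacobian of the map from y back to the disturbances, ν = (I - ρ2 W2)[(I - ρ1 W1)y - Xβ].

```r
# Hypothetical helper (not sna code): Gaussian log-likelihood of
#   y = rho1 W1 y + X beta + e,  e = rho2 W2 e + nu,  nu ~ N(0, sigma^2 I)
lnam_loglik <- function(y, X, W1, W2, beta, rho1, rho2, sigma) {
  n <- length(y)
  A <- diag(n) - rho1 * W1
  B <- diag(n) - rho2 * W2
  nu <- B %*% (A %*% y - X %*% beta)  # disturbances implied by y
  # log|det A| + log|det B| is the log-Jacobian of the map y -> nu
  logdet <- determinant(A, logarithm = TRUE)$modulus +
    determinant(B, logarithm = TRUE)$modulus
  as.numeric(logdet - n / 2 * log(2 * pi * sigma^2) - sum(nu^2) / (2 * sigma^2))
}

# Small simulated example in the style of the lnam code above
set.seed(42)
n <- 30
W1 <- matrix(rbinom(n * n, 1, 0.1), n, n); diag(W1) <- 0
W2 <- matrix(rbinom(n * n, 1, 0.1), n, n); diag(W2) <- 0
X <- matrix(rnorm(n * 3), n, 3)
beta <- c(1, -0.5, 0.25)
nu <- rnorm(n, 0, 0.1)
e <- qr.solve(diag(n) - 0.3 * W2, nu)
y <- qr.solve(diag(n) - 0.2 * W1, X %*% beta + e)
ll <- lnam_loglik(y, X, W1, W2, beta, rho1 = 0.2, rho2 = 0.3, sigma = 0.1)
ll
```

The same value can be obtained by evaluating the multivariate normal density of y directly. That route is much slower for large networks, which is why the disturbance form is convenient.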

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are shown as green edges. Pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean are shown as red edges. Sample output for the above example is provided in Figure 6.
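The following minimal base-R sketch implements the two-standard-deviation flagging rule just described. It assumes, for illustration only, that the net influence of i on j is read off the AR reduced-form multiplier (I - ρ1 W1)^{-1}. sna's plot method may also fold in the MA term or define influence differently, and net_influence_flags() is a hypothetical name. Pairs at least two standard deviations above or below the mean off-diagonal influence are returned as the "green" and "red" edge sets.

```r
# Hypothetical helper; the influence definition below is an assumption, not sna's
net_influence_flags <- function(W1, rho1) {
  n <- nrow(W1)
  M <- solve(diag(n) - rho1 * W1)  # y = M u, so M[j, i] is the effect of a shock to i on j
  infl <- t(M)                     # infl[i, j]: net influence of i on j
  diag(infl) <- NA                 # self-influence is excluded
  z <- (infl - mean(infl, na.rm = TRUE)) / sd(as.vector(infl), na.rm = TRUE)
  list(green = which(z >= 2, arr.ind = TRUE),   # at least 2 SD above the mean
       red   = which(z <= -2, arr.ind = TRUE))  # at least 2 SD below the mean
}

# Toy example: actor 2's outcome depends on actor 1's (W1[2, 1] = 1), nothing else
W1 <- matrix(0, 10, 10)
W1[2, 1] <- 1
flags <- net_influence_flags(W1, rho1 = 0.3)
flags$green  # one pair: row 1, col 2 (actor 1 influences actor 2)
flags$red    # no pairs
```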

Figure 6: Plot method output for lnam. Panels: Fitted vs Observed Values; Fitted Values vs Estimated Disturbances (ν); Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.

3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines that covers many of the methods currently in wide use within the field. It is hoped that offering such tools, together with the other packages of the statnet ensemble, on a freely available and widely used statistical computing platform will help further integrate network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

                                                                                  References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks II: Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                                                                  Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software (http://www.jstatsoft.org/), published by the American Statistical Association (http://www.amstat.org/)

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01. Accepted: 2007-12-25.



o3   0.4148  0.4724  0.4937  0.4948  0.5213  0.5649
o4   0.2511  0.3075  0.3246  0.3257  0.3448  0.4085
o5   0.1814  0.2417  0.2681  0.2678  0.2887  0.3434
o6   0.2881  0.3531  0.3761  0.3766  0.4046  0.4488
o7   0.2395  0.3028  0.3211  0.3244  0.3449  0.3951
o8   0.1444  0.2011  0.2209  0.2212  0.2398  0.2922
o9   0.3708  0.4358  0.4529  0.4578  0.4787  0.5503
o10  0.3210  0.3724  0.3967  0.3982  0.4259  0.4751
o11  0.3064  0.3847  0.4093  0.4109  0.4371  0.5007
o12  0.2367  0.3132  0.3354  0.3349  0.3607  0.4455
o13  0.3534  0.4144  0.4386  0.4382  0.4600  0.5337
o14  0.2438  0.2985  0.3235  0.3229  0.3452  0.4184
o15  0.2585  0.3299  0.3510  0.3519  0.3706  0.4704
o16  0.2502  0.3298  0.3481  0.3509  0.3699  0.4268
o17  0.1759  0.2273  0.2488  0.2503  0.2668  0.3372
o18  0.3959  0.4468  0.4646  0.4710  0.4922  0.5812
o19  0.4944  0.5736  0.6007  0.5975  0.6189  0.6909
o20  0.3737  0.4433  0.4631  0.4671  0.4916  0.5607

                                                                                    Probability of False Positives (e^+)

     Min        1stQ       Median     Mean       3rdQ       Max
o1   0.0195433  0.0397919  0.0490722  0.0510872  0.0585109  0.1069030
o2   0.1067928  0.1395067  0.1555455  0.1569023  0.1714084  0.2262239
o3   0.0084268  0.0165518  0.0224858  0.0236948  0.0293221  0.0551761
o4   0.0712109  0.1047058  0.1137249  0.1180402  0.1320136  0.1723854
o5   0.0034994  0.0103378  0.0150617  0.0169536  0.0212638  0.0468961
o6   0.0004238  0.0040509  0.0068522  0.0082363  0.0098606  0.0279960
o7   0.0061597  0.0136434  0.0192100  0.0207973  0.0266508  0.0484633
o8   0.0072124  0.0204896  0.0260316  0.0282562  0.0350608  0.0593586
o9   0.0804463  0.1092987  0.1213202  0.1246571  0.1372326  0.1935724
o10  0.0065188  0.0135991  0.0194675  0.0223006  0.0278075  0.0594150
o11  0.0173415  0.0358252  0.0445098  0.0464278  0.0551955  0.0828446
o12  0.0185894  0.0416346  0.0499440  0.0516976  0.0573815  0.1202316
o13  0.0029818  0.0108936  0.0155202  0.0170049  0.0209790  0.0401566
o14  0.0044849  0.0108034  0.0166631  0.0178764  0.0226294  0.0486647
o15  0.0084143  0.0199868  0.0271149  0.0290795  0.0355966  0.0606914
o16  0.0009067  0.0078736  0.0124531  0.0139218  0.0187929  0.0455700
o17  0.0066611  0.0216195  0.0273388  0.0290307  0.0346110  0.0691573
o18  0.0846863  0.1344580  0.1508170  0.1485688  0.1628176  0.2036186
o19  0.0037608  0.0117982  0.0171030  0.0179751  0.0225298  0.0466090
o20  0.0214701  0.0348032  0.0433397  0.0448676  0.0516594  0.0936080

MCMC Diagnostics:
  Replicate Chains: 5
  Burn Time: 300
  Draws per Chain: 20  Total Draws: 100
  Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115
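The potential scale reduction factor reported above compares between-chain with within-chain variability. The sketch below computes the basic Gelman and Rubin (1992) statistic for a single scalar parameter, omitting their degrees-of-freedom correction. psrf() is a hypothetical helper, and bbnam's own computation may differ in detail. Values near 1, like the median of about 0.999 above, suggest the chains agree, while a chain stuck elsewhere inflates the statistic.

```r
# Hypothetical helper: basic potential scale reduction, sqrt(Rhat), for one
# scalar parameter; draws has one column per chain
psrf <- function(draws) {
  n <- nrow(draws)
  W <- mean(apply(draws, 2, var))    # mean within-chain variance
  B <- n * var(colMeans(draws))      # between-chain variance
  V_hat <- (n - 1) / n * W + B / n   # pooled estimate of posterior variance
  sqrt(V_hat / W)
}

# 5 chains of 20 draws, matching the layout in the diagnostics above
set.seed(3)
mixed <- matrix(rnorm(20 * 5), 20, 5)
stuck <- sweep(mixed, 2, c(0, 0, 0, 0, 10), "+")  # fifth chain sits far from the rest
psrf(mixed)
psrf(stuck)
```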

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1
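The three checks above rely on base R's apply() over array margins. apply(b$em, 2, median) takes the posterior median of each column, with one column per observer. apply(b$net, c(2, 3), median) takes the elementwise median over the first (draw) dimension of a draws x n x n array of sampled networks. The toy illustration below uses made-up arrays shaped like those components. The values are simulated and are not bbnam output.

```r
set.seed(2)
n_draws <- 100; n_obs <- 20; n <- 5

# Toy stand-in for b$em: rows are posterior draws, columns are observers
true_em <- runif(n_obs, 0.1, 0.5)
em_draws <- sapply(true_em, function(p) rnorm(n_draws, mean = p, sd = 0.02))
em_hat <- apply(em_draws, 2, median)  # posterior median per observer
cor(true_em, em_hat)                  # near 1 when draws centre on the truth

# Toy stand-in for b$net: a draws x n x n array of sampled adjacency matrices
g <- matrix(rbinom(n * n, 1, 0.5), n, n)
net_draws <- array(0, dim = c(n_draws, n, n))
for (k in seq_len(n_draws)) {
  flip <- matrix(rbinom(n * n, 1, 0.05), n, n)  # a few dyads disagree per draw
  net_draws[k, , ] <- abs(g - flip)             # toggles the flipped dyads
}
g_hat <- apply(net_draws, c(2, 3), median)      # elementwise posterior median
mean(g_hat == g)                                # share of dyads recovered
```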

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which e^- + e^+ > 1. Either of two warning signs can indicate excessively "perverse" priors or extreme disagreement among informants (e.g., as would result from systematic deception). The first is multiple actors whose error rates satisfy this condition with high posterior probability. The second is a posterior graph distribution that is strongly multimodal. Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
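To screen for this problem, one can estimate the posterior probability that e^- + e^+ > 1 for each observer, directly from the sampled error rates. The sketch below assumes draws-by-observers matrices shaped like b$em and b$ep. The draws are simulated from hypothetical Beta distributions purely for illustration. The 0.5 flagging threshold is an arbitrary choice, not a bbnam default.

```r
set.seed(3)
n_draws <- 1000; n_obs <- 4

# Hypothetical posterior draws (not bbnam output): rows are draws, columns are
# observers. Observer 4 is given high error rates in both directions.
em <- cbind(matrix(rbeta(n_draws * 3, 2, 10), n_draws, 3), rbeta(n_draws, 12, 4))
ep <- cbind(matrix(rbeta(n_draws * 3, 1, 20), n_draws, 3), rbeta(n_draws, 10, 6))

# Posterior probability, per observer, of lying in the "perverse" region
p_perverse <- colMeans(em + ep > 1)
round(p_perverse, 3)

# Flag observers for whom the perverse region is more likely than not
which(p_perverse > 0.5)
```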

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several of these, including the union and intersection LAS, the central graph, and the Romney-Batchelder model.
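The sketch below shows one common formulation of three of these estimators in base R. It works on a cognitive social structure array dat, where dat[k, i, j] is informant k's report of the i -> j tie and the informants are the actors themselves. It is an illustrative reimplementation under those assumptions, not sna's consensus() code. In particular, the central graph's tie-breaking rule (keep a tie that at least half of the informants report) is an assumption.

```r
# Hypothetical reimplementations for illustration (not sna's own code).
# dat[k, i, j] = 1 if informant k reports a tie from actor i to actor j.
las <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      sender_says <- dat[i, i, j]    # i's report of its own tie to j
      receiver_says <- dat[j, i, j]  # j's report of the same tie
      out[i, j] <- if (rule == "intersection") {
        sender_says * receiver_says      # both parties must report the tie
      } else {
        max(sender_says, receiver_says)  # either party suffices
      }
    }
  }
  out
}

# Majority-rule central graph: keep a tie reported by at least half of informants
central_graph <- function(dat) (apply(dat, c(2, 3), mean) >= 0.5) * 1

# Toy data: 3 actors who are also the 3 informants
dat <- array(0, dim = c(3, 3, 3))
dat[1, 1, 2] <- 1; dat[2, 1, 2] <- 1  # 1 -> 2: both parties report it
dat[3, 2, 3] <- 1                     # 2 -> 3: only the receiver (3) reports it
dat[1, 3, 1] <- 1; dat[2, 3, 1] <- 1  # 3 -> 1: receiver (1) and third party (2)

g <- matrix(0, 3, 3); g[1, 2] <- 1; g[2, 3] <- 1; g[3, 1] <- 1  # true graph
acc <- sapply(list(intersection = las(dat, "intersection"),
                   union = las(dat, "union"),
                   central = central_graph(dat)),
              function(est) mean(est == g))
round(acc, 3)  # intersection 0.778, union 1, central 0.889
```

The intersection LAS discards any tie that one party fails to report. This is why it is especially vulnerable to false negatives.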

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)


Estimated competency scores
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives), while the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
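A small simulation shows how sensitive the central graph is to high error rates. The sketch below uses a hypothetical setup, not the scenario analysed above. Twenty informants report on a 20-actor graph. Each informant independently misses true ties with probability em and reports absent ties with probability ep, and the two rates are set equal here. A majority-rule central graph is then scored against the truth. Under these assumptions, accuracy typically stays close to 1 at low error rates and falls sharply as the rates approach and pass 0.5.

```r
set.seed(4)
n <- 20  # actors
m <- 20  # informants

# Hypothetical error model: each informant independently misses a true tie with
# probability em and reports an absent tie with probability ep
central_graph_accuracy <- function(em, ep, g) {
  reports <- array(0, dim = c(m, n, n))
  for (k in seq_len(m)) {
    u <- matrix(runif(n * n), n, n)
    reports[k, , ] <- ifelse(g == 1, u > em, u < ep)
  }
  cg <- (apply(reports, c(2, 3), mean) >= 0.5) * 1  # majority-rule central graph
  mean(cg == g)                                     # share of dyads recovered
}

g <- matrix(rbinom(n * n, 1, 0.2), n, n)  # hypothetical true graph
rates <- c(0.1, 0.3, 0.45, 0.6)
acc <- sapply(rates, function(r) central_graph_accuracy(em = r, ep = r, g = g))
names(acc) <- paste0("e=", rates)
round(acc, 3)
```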

As a final example of sna's model-based methods, we illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both AR and MA components and estimates both effects simultaneously. (This example requires the numDeriv package.)
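The equations below transcribe the data-generating steps of the simulation code that follows. They are not a restatement of lnam's documentation. In this notation, W_1 = w1 and W_2 = w2 are the AR and MA weight matrices, \rho_1 = r1, \rho_2 = r2, X = x, and \nu = nu.

```latex
\begin{aligned}
y &= \rho_1 W_1 y + X\beta + e, \\
e &= \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I), \\
\Longrightarrow\quad y &= (I - \rho_1 W_1)^{-1}\left(X\beta + (I - \rho_2 W_2)^{-1}\nu\right).
\end{aligned}
```

The two qr.solve() calls compute (I - \rho_2 W_2)^{-1}\nu and then (I - \rho_1 W_1)^{-1}(X\beta + e). The simulation sets \rho_1 = 0.2, \rho_2 = 0.3 and \sigma = 0.1.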

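R> w1 <- rgraph(50)                                   # AR weight matrix
R> w2 <- rgraph(50)                                   # MA weight matrix
R> x <- matrix(rnorm(50 * 5), 50, 5)                  # covariates
R> r1 <- 0.2                                          # AR parameter
R> r2 <- 0.3                                          # MA parameter
R> sigma <- 0.1                                       # disturbance SD
R> beta <- rnorm(5)                                   # covariate effects
R> nu <- rnorm(50, 0, sigma)                          # disturbances
R> e <- qr.solve(diag(50) - r2 * w2, nu)              # MA-structured errors
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)  # response with AR effect
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)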

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***


X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4
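The information criteria in this output can be checked by hand. The parameter counts used below are inferred from the reported degrees of freedom rather than stated in the output. The fitted model has 50 - 42 = 8 parameters: five coefficients, two autocorrelation parameters, and Sigma. The mean/standard-deviation null model has 50 - 48 = 2. With these counts, AIC = -2 log L + 2k and BIC = -2 log L + k log n reproduce the printed values up to rounding of the log likelihoods.

```r
n <- 50  # number of vertices (observations)
ic <- function(loglik, k) c(AIC = -2 * loglik + 2 * k, BIC = -2 * loglik + k * log(n))

model <- ic(58.47, k = 8)   # printed: AIC -100.9, BIC -85.65
null  <- ic(-82.48, k = 2)  # printed: AIC 169.0,  BIC 172.8
round(model, 2)
round(null, 2)
null["AIC"] - model["AIC"]  # printed AIC difference: 269.9
null["BIC"] - model["BIC"]  # compare the printed heuristic log Bayes factor, 258.4
```

In this example the heuristic log Bayes factor agrees, to the printed precision, with the difference between the null and model BIC values.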

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. Green edges mark pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence. Red edges mark pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean. Sample output for the above example is provided in Figure 6.
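The edge-colouring rule can be illustrated with a small sketch. The classification step follows the rule stated in the text. The influence matrix fed into it is an assumption made only for this illustration: it is taken from the AR multiplier (I - rho1 W1)^{-1}, transposed so that infl[i, j] is the influence of i on j. sna's plot method may define net influence differently. The helper name `flag_influence` is hypothetical.

```r
# Hypothetical helper: classify ordered pairs (i, j) by how far i's net influence
# on j lies from the mean off-diagonal influence, following the rule in the text.
flag_influence <- function(infl, k = 2) {
  off <- row(infl) != col(infl)             # ignore self-influence
  mu <- mean(infl[off]); s <- sd(infl[off])
  list(green = which(off & infl >= mu + k * s, arr.ind = TRUE),
       red   = which(off & infl <= mu - k * s, arr.ind = TRUE))
}

set.seed(5)
n <- 10
W1 <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(W1) <- 0
rho1 <- 0.2
# Illustrative assumption: take total effects from the AR multiplier
# (I - rho1 W1)^{-1}, whose [j, i] entry maps a shock at i onto j, and
# transpose so that infl[i, j] is the influence of i on j.
infl <- t(solve(diag(n) - rho1 * W1))
flags <- flag_influence(infl)
flags$green  # (row, col) = (i, j) pairs drawn as green edges
flags$red    # pairs drawn as red edges (possibly none here)
```

With a purely positive AR effect, the multiplier's entries tend to be non-negative, so few or no pairs may fall in the red band.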

                                                                                    3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines that covers many of the methods currently seeing wide use within the field. It is hoped that, together with the other packages of the statnet ensemble, the inclusion of such tools within a freely available, widely used statistical computing platform will help further integrate network analytic methods with more conventional approaches to modern data analysis.

                                                                                    Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.

                                                                                    References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions in Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                                                                    Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software (http://www.jstatsoft.org), published by the American Statistical Association (http://www.amstat.org)

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01; Accepted: 2007-12-25



Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
  Max: 1.003116
  Med: 0.9992194
  IQR: 0.0004545115
R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to the choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions that are strongly multimodal, can indicate either excessively "perverse" priors or extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
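As a rough illustration of the check described above, the following base-R sketch estimates, for each informant, the posterior probability that the error rates fall in the "perverse" region em + ep > 1. It assumes posterior draws arranged as draws-by-informant matrices, like the b$em and b$ep components used in the example; the helper name perverse_prob is hypothetical (it is not part of sna), and the toy draws are simulated rather than taken from a bbnam fit.

```r
# Hypothetical helper (not part of sna): posterior probability that each informant's
# false negative (em) and false positive (ep) rates lie in the "perverse" region em + ep > 1.
# em, ep: matrices of posterior draws (rows) by informant (columns), assumed to be
# shaped like the b$em and b$ep components of a bbnam fit.
perverse_prob <- function(em, ep, threshold = 1) {
  stopifnot(is.matrix(em), is.matrix(ep), identical(dim(em), dim(ep)))
  colMeans(em + ep > threshold)  # share of draws above the threshold, per informant
}

# Toy draws (simulated, not bbnam output): 100 draws for 4 informants,
# where informant 4 has been made deliberately "perverse".
set.seed(1)
em <- matrix(runif(400, 0.00, 0.20), nrow = 100, ncol = 4)
ep <- matrix(runif(400, 0.00, 0.20), nrow = 100, ncol = 4)
em[, 4] <- runif(100, 0.55, 0.75)
ep[, 4] <- runif(100, 0.50, 0.70)

p <- perverse_prob(em, ep)
print(round(p, 2))
print(which(p > 0.9))  # informants whose posterior mass sits mostly in the perverse region
```

By construction, informants 1 to 3 have em + ep of at most 0.4 and get probability 0, while informant 4 always exceeds 1 and is flagged.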

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, the central graph, and the Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either the false positive or the false negative rate approaches or exceeds 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
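To make the contrast between these rules concrete, the sketch below re-implements three of them in base R on a tiny cognitive social structure. It is a hypothetical illustration, not sna's consensus code: it assumes an m x n x n array dat in which dat[k, i, j] = 1 when informant k reports an i-to-j tie, that the informants are the actors themselves (so informant i is actor i), and that the central graph keeps an edge when at least half of the informants report it (the tie-breaking at exactly one half is an assumption).

```r
# Hypothetical base-R versions of three consensus rules (not sna's implementation).
# dat: m x n x n array; dat[k, i, j] == 1 if informant k reports a tie from i to j.
# Assumes informants are the actors themselves, so informant i is actor i.

# LAS union: the i -> j tie is present if either i or j reports it.
las_union <- function(dat) {
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in 1:n) {
    for (j in 1:n) {
      out[i, j] <- as.numeric(dat[i, i, j] == 1 || dat[j, i, j] == 1)
    }
  }
  out
}

# LAS intersection: the i -> j tie is present only if both i and j report it,
# so a single false negative from either party deletes the tie.
las_intersection <- function(dat) {
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in 1:n) {
    for (j in 1:n) {
      out[i, j] <- as.numeric(dat[i, i, j] == 1 && dat[j, i, j] == 1)
    }
  }
  out
}

# Central graph, assumed here to be majority rule: keep a tie if at least half
# of all informants report it.
central_graph <- function(dat) {
  (apply(dat, c(2, 3), mean) >= 0.5) * 1
}

# Toy example: true graph with ties 1 -> 2 and 2 -> 3, reported by 3 informants;
# informant 2 misses the 1 -> 2 tie (one false negative).
g <- matrix(0, 3, 3)
g[1, 2] <- 1
g[2, 3] <- 1
dat <- array(0, dim = c(3, 3, 3))
for (k in 1:3) dat[k, , ] <- g
dat[2, 1, 2] <- 0

print(mean(las_intersection(dat) == g))  # the 1 -> 2 tie is lost
print(mean(las_union(dat) == g))
print(mean(central_graph(dat) == g))
```

With one false negative, the intersection rule drops the 1 -> 2 tie (8 of 9 cells correct), while the union and majority rules recover the true graph; with false positives instead, the union rule would err in the opposite direction.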

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)
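For reference, assuming the usual form of the model with a single AR weight matrix W_1 and a single MA weight matrix W_2, the process simulated and fitted in the following example can be written as below; \rho_1, \rho_2 and \sigma play the roles of r1, r2 and sigma in the code. The last line is the reduced form that the two qr.solve calls compute: first e from \nu, then y from X\beta + e.

```latex
\begin{aligned}
y &= \rho_1 W_1 y + X\beta + e, \\
e &= \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I_n), \\
y &= (I_n - \rho_1 W_1)^{-1}\left[ X\beta + (I_n - \rho_2 W_2)^{-1} \nu \right].
\end{aligned}
```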

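R> w1 <- rgraph(50)                                  # AR weight matrix
R> w2 <- rgraph(50)                                  # MA weight matrix
R> x <- matrix(rnorm(50 * 5), 50, 5)                 # covariates
R> r1 <- 0.2                                         # AR parameter
R> r2 <- 0.3                                         # MA parameter
R> sigma <- 0.1                                      # disturbance standard deviation
R> beta <- rnorm(5)                                  # covariate effects
R> nu <- rnorm(50, 0, sigma)                         # disturbances
R> e <- qr.solve(diag(50) - r2 * w2, nu)             # effective errors
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e) # response
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)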

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
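One way to read the net influence plot is as a thresholded, standardized multiplier matrix. The base-R sketch below computes such a matrix for an AR term under stated assumptions: the net influence of i on j is taken to be entry (j, i) of (I - rho1 W1)^{-1} (the response of y_j to a unit shock at i), self-influence is ignored, and pairs are flagged with the two-standard-deviation rule described above. The function name net_influence_flags and the row-normalized random weight matrix are hypothetical, and sna's plot method may compute and combine these quantities differently (for example, when an MA term is also present).

```r
# Hypothetical sketch (not sna's plot method): flag pairs whose net influence is
# unusually large or small, using the AR multiplier (I - rho1 * W1)^{-1}.
net_influence_flags <- function(W1, rho1, k = 2) {
  n <- nrow(W1)
  # M[j, i] is the response of y_j to a unit shock at i; transpose so that
  # infl[i, j] is the net influence of i on j.
  infl <- t(solve(diag(n) - rho1 * W1))
  diag(infl) <- NA  # ignore self-influence
  z <- (infl - mean(infl, na.rm = TRUE)) / sd(as.vector(infl), na.rm = TRUE)
  list(green = !is.na(z) & z >= k,   # at least k SDs above the mean
       red   = !is.na(z) & z <= -k)  # at least k SDs below the mean
}

# Toy weight matrix: random directed graph, row-normalized so that, with
# |rho1| < 1, (I - rho1 * W1) is invertible.
set.seed(2)
n <- 20
W1 <- matrix(rbinom(n * n, 1, 0.15), n, n)
diag(W1) <- 0
W1 <- W1 / pmax(rowSums(W1), 1)

flags <- net_influence_flags(W1, rho1 = 0.2)
print(sum(flags$green))  # number of "green" (unusually positive) pairs
print(sum(flags$red))    # number of "red" (unusually negative) pairs
```

With a positive rho1 and nonnegative weights, every entry of the multiplier is nonnegative, so red flags are expected to be rare in this toy setting; negative influence requires negative weights or parameters.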

                                                                                      3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (theoretical vs. sample quantiles); Net Influence Plot.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

                                                                                      References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks. II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

                                                                                      Journal of Statistical Software 51

                                                                                      Wasserman SS Faust K (1994) Social Network Analysis Methods and Applications Struc-tural Analysis in the Social Sciences Cambridge University Press Cambridge

                                                                                      Watts DJ Strogatz SH (1998) ldquoCollective Dynamics of lsquoSmall-Worldrsquo Networksrdquo Nature393 440ndash442

                                                                                      West DB (1996) Introduction to Graph Theory Prentice Hall Upper Saddle River NJ

                                                                                      White HC (1963) An Anatomy of Kinship Englewood Cliffs NJ Prentice Hall

                                                                                      Affiliation

                                                                                      Carter T ButtsDepartment of Sociology and Institute for Mathematical Behavioral SciencesUniversity of California IrvineIrvine CA 92697-5100 United States of AmericaE-mail buttscucieduURL httpwwwfacultyucieduprofilecfmfaculty_id=5057

                                                                                      Journal of Statistical Software httpwwwjstatsoftorgpublished by the American Statistical Association httpwwwamstatorg

                                                                                      Volume 24 Issue 6 Submitted 2007-06-01February 2008 Accepted 2007-12-25


Estimated competency scores
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either the false positive or the false negative rate approaches or exceeds 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
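To make the contrast concrete, the following base R sketch (not the code behind sna's consensus()) simulates informant reports of a known digraph and compares the Hamming error of a majority-rule central graph with that of an intersection LAS, in which the tie i -> j is kept only when both i and j report it. The network size, density, error rates (false positive 0.05, false negative 0.25), seed, and all variable names are arbitrary assumptions chosen for illustration.

```r
# Base R sketch, not sna's consensus(): majority-rule central graph versus
# intersection LAS when every informant reports the whole network with error.
set.seed(42)
n       <- 20     # actors, each of whom also serves as an informant
fp_rate <- 0.05   # assumed false positive rate
fn_rate <- 0.25   # assumed false negative rate

# True digraph: density 0.2, no self-ties
truth <- matrix(rbinom(n * n, 1, 0.2), n, n)
diag(truth) <- 0

# reports[k, i, j] is informant k's report of the tie i -> j
reports <- array(0, dim = c(n, n, n))
for (k in 1:n) {
  u <- matrix(runif(n * n), n, n)
  # true ties are reported with probability 1 - fn_rate;
  # absent ties are reported with probability fp_rate
  obs <- ifelse(truth == 1, u >= fn_rate, u < fp_rate) * 1
  diag(obs) <- 0
  reports[k, , ] <- obs
}

# Central graph: keep i -> j if at least half of the informants report it
# (counting an exact half as "present" is an assumption of this sketch)
central <- (apply(reports, c(2, 3), mean) >= 0.5) * 1

# Intersection LAS: keep i -> j only if both i and j report it
las_int <- matrix(0, n, n)
for (i in 1:n) {
  for (j in 1:n) {
    if (i != j) las_int[i, j] <- reports[i, i, j] * reports[j, i, j]
  }
}

# Hamming distance of each estimate from the true graph
err_central <- sum(abs(central - truth))
err_las     <- sum(abs(las_int - truth))
c(central = err_central, intersection_LAS = err_las)
```

Under these assumptions the intersection rule drops a true tie whenever either endpoint misses it, which happens with probability 1 - (1 - 0.25)^2, or about 0.44, whereas a majority vote over 20 informants rarely errs at these rates.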

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. The example includes both autoregressive (AR) and moving average (MA) components and estimates both effects simultaneously. (This example requires the numDeriv package.)
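Written out, the data-generating process simulated in the example is a network autocorrelation model with an AR term on W1 and an MA disturbance on W2. The notation here is ours, read directly off the simulation code (rho_1 = r1, rho_2 = r2, and sigma the standard deviation passed to rnorm), rather than quoted from the lnam documentation:

```latex
\begin{align}
y &= \rho_1 W_1 y + X\beta + e, \\
e &= \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I), \\
\intertext{so that, solving each equation for its left-hand side,}
e &= (I - \rho_2 W_2)^{-1} \nu, \\
y &= (I - \rho_1 W_1)^{-1} (X\beta + e).
\end{align}
```

The last two lines correspond one-to-one to the two qr.solve() calls in the transcript.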

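R> library("sna")
R> w1 <- rgraph(50)                      # AR weight matrix (random digraph)
R> w2 <- rgraph(50)                      # MA weight matrix (random digraph)
R> x <- matrix(rnorm(50 * 5), 50, 5)     # five covariates
R> r1 <- 0.2                             # AR parameter
R> r2 <- 0.3                             # MA parameter
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)              # MA disturbances
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)  # AR outcome
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)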

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4
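As an arithmetic check on the goodness-of-fit lines, the usual definitions AIC = -2 log L + 2k and BIC = -2 log L + k ln n reproduce the printed values, taking n = 50 observations and reading the parameter counts off the reported degrees of freedom: k = 50 - 42 = 8 for the full model (five beta coefficients, rho1.1, rho2.1, and Sigma) and k = 50 - 48 = 2 for the mean/standard-deviation null model. Differences in the last digit reflect rounding of the printed log likelihoods.

```latex
\begin{align*}
\mathrm{AIC}_{\text{model}} &= -2(58.47) + 2(8) = -100.94,
  & \mathrm{BIC}_{\text{model}} &= -2(58.47) + 8\ln 50 \approx -85.64, \\
\mathrm{AIC}_{\text{null}} &= -2(-82.48) + 2(2) = 168.96,
  & \mathrm{BIC}_{\text{null}} &= -2(-82.48) + 2\ln 50 \approx 172.78, \\
\mathrm{AIC}_{\text{null}} - \mathrm{AIC}_{\text{model}} &\approx 269.9,
  & \mathrm{BIC}_{\text{null}} - \mathrm{BIC}_{\text{model}} &\approx 258.4.
\end{align*}
```

The printed heuristic log Bayes factor (258.4) coincides numerically with the BIC difference in the last line.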

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
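One way to read “total influence” in an autoregressive model is through the multiplier matrix (I - rho W)^(-1), whose (j, i) entry gives the total effect on y_j of a unit shock at vertex i. The base R sketch below computes that matrix for a hypothetical row-normalized random digraph and applies the two-standard-deviation rule described above to pick out “green” and “red” pairs. It is an illustration under stated assumptions, not the computation performed by plot(fit): the row normalization, the restriction to the AR term, and the names net_inf, green, and red are our own choices.

```r
# Hypothetical sketch of a net influence calculation for y = rho * W y + ...;
# not the code used by sna's plot method for lnam objects.
set.seed(7)
n   <- 50
rho <- 0.2

# Random digraph with density 0.5, row-normalized (an assumption here;
# the lnam example above uses raw adjacency matrices)
W <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(W) <- 0
rs <- rowSums(W)
W  <- W / ifelse(rs > 0, rs, 1)   # divides row i by its row sum

# Multiplier matrix: M[j, i] is the total effect on y_j of a unit shock at i
M       <- solve(diag(n) - rho * W)
net_inf <- t(M)                   # net_inf[i, j]: total influence of i on j

# Standardize off-diagonal influences and flag pairs at least two standard
# deviations above (green) or below (red) the mean
offdiag <- row(net_inf) != col(net_inf)
vals    <- net_inf[offdiag]
z       <- (net_inf - mean(vals)) / sd(vals)
green   <- which(z >= 2 & offdiag, arr.ind = TRUE)
red     <- which(z <= -2 & offdiag, arr.ind = TRUE)
c(green = nrow(green), red = nrow(red))
```

Because W is row-normalized and 0 < rho < 1 here, every row of M sums to 1/(1 - rho) and every entry is nonnegative, so in this setting a “red” pair can only reflect unusually weak, not negative, influence.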

                                                                                        3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines that covers many of the methods currently in wide use within the field. It is hoped that, together with the other packages of the statnet ensemble, the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

                                                                                        Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: “Fitted vs Observed Values”; “Fitted Values vs Estimated Disturbances” (disturbances ν on the vertical axis); “Normal Q-Q Residual Plot” (Theoretical Quantiles vs Sample Quantiles); “Net Influence Plot”.]

                                                                                        References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks II: Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw Network Visualization Software, Version 2.067. URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Second edition. Springer-Verlag, New York.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2. URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project <http://statnetproject.org/>, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project <http://statnetproject.org/>, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis, Version 3.1. URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

                                                                                        Journal of Statistical Software 51

                                                                                        Wasserman SS Faust K (1994) Social Network Analysis Methods and Applications Struc-tural Analysis in the Social Sciences Cambridge University Press Cambridge

                                                                                        Watts DJ Strogatz SH (1998) ldquoCollective Dynamics of lsquoSmall-Worldrsquo Networksrdquo Nature393 440ndash442

                                                                                        West DB (1996) Introduction to Graph Theory Prentice Hall Upper Saddle River NJ

                                                                                        White HC (1963) An Anatomy of Kinship Englewood Cliffs NJ Prentice Hall

                                                                                          Journal of Statistical Software 45

    X4      0.691812  0.008417   82.19   <2e-16 ***
    X5      0.016491  0.007890    2.09   0.0366 *
    rho1.1  0.194935  0.002575   75.71   <2e-16 ***
    rho2.1  0.307491  0.021167   14.53   <2e-16 ***
    ---
    Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

          Estimate Std. Error
    Sigma  0.09597   9.22e-05

    Goodness-of-Fit:
        Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
        Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
        Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
        AIC: -100.9 BIC: -85.65

    Null model: meanstd
        Null log likelihood: -82.48 on 48 degrees of freedom
        AIC: 169.0 BIC: 172.8
        AIC difference (model versus null): 269.9
        Heuristic Log Bayes Factor (model versus null): 258.4
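The base-R sketch below shows how these summary figures relate to one another. It recomputes the z values, the two-sided p value for X5, and the information criteria from the rounded numbers printed above. It makes two inferences from the printout, neither of which is a statement about how lnam is implemented:

- There are n = 50 observations. This follows from 43 residual degrees of freedom plus 7 coefficients (presumably X1 to X5, rho1.1 and rho2.1), and equally from 48 null-model degrees of freedom plus 2 parameters.
- AIC and BIC count Sigma as an eighth model parameter.

The helper name `info_crit` is hypothetical.

```r
# Rounded estimates and standard errors copied from the summary above
est <- c(X4 = 0.691812, X5 = 0.016491, rho1.1 = 0.194935, rho2.1 = 0.307491)
se  <- c(X4 = 0.008417, X5 = 0.007890, rho1.1 = 0.002575, rho2.1 = 0.021167)

# Wald-type z values and two-sided normal p values
z <- est / se
p <- 2 * pnorm(-abs(z))
print(round(z, 2))   # last digit may differ slightly because inputs are rounded
print(signif(p, 3))

# Hypothetical helper: AIC and BIC from a log likelihood with k free parameters
info_crit <- function(loglik, k, n) {
  c(AIC = -2 * loglik + 2 * k,
    BIC = -2 * loglik + k * log(n))
}

n <- 50                                    # assumed: 43 residual df + 7 coefficients
model <- info_crit(58.47, k = 8, n = n)    # assumed: 7 coefficients + Sigma
null  <- info_crit(-82.48, k = 2, n = n)   # "meanstd" null: mean and std. dev.
print(round(rbind(model = model, null = null), 2))

# Null minus model, matching the differences reported in the summary
print(round(null - model, 1))
```

Under these assumptions the recomputed values agree with the printout to within rounding. The printed Heuristic Log Bayes Factor also coincides with the BIC difference (null minus model), though this sketch does not confirm that sna computes it exactly that way.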

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i’s net influence on j is estimated to be at least two standard deviations greater than the mean net influence are drawn as green edges. Pairs for which i’s net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean are drawn as red edges. Sample output for the above example is provided in Figure 6.
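The thresholding rule behind the net influence plot can be illustrated with a short base-R sketch. For a single autoregressive term, y = rho W y + e implies y = (I - rho W)^(-1) e. The (j, i) entry of that inverse therefore measures how a disturbance at vertex i propagates to vertex j.

The sketch takes this matrix, transposed so that row i, column j is i's influence on j, as the "net influence". It then standardizes the off-diagonal entries and flags pairs at least two standard deviations above or below the mean. This is an assumption made for illustration:

- The helper names `net_influence` and `flag_influence` are hypothetical.
- The sketch ignores the second (disturbance) weight term that lnam also supports.
- It is not the code behind plot(fit).

```r
# Hypothetical helper (assumption): net influence for a single
# autoregressive term, A = (I - rho W)^(-1), so that y = A %*% e.
net_influence <- function(W, rho) {
  n <- nrow(W)
  A <- solve(diag(n) - rho * W)
  t(A)  # row i, column j: influence of vertex i's disturbance on vertex j
}

# Hypothetical helper: label (i, j) pairs whose influence lies at least
# k standard deviations above ("green") or below ("red") the mean of the
# off-diagonal entries; all other pairs are "none".
flag_influence <- function(inf, k = 2) {
  diag(inf) <- NA                               # exclude self-influence
  z <- (inf - mean(inf, na.rm = TRUE)) / sd(as.vector(inf), na.rm = TRUE)
  out <- matrix("none", nrow(inf), ncol(inf))
  out[!is.na(z) & z >= k]  <- "green"
  out[!is.na(z) & z <= -k] <- "red"
  out
}

# Usage: a directed 3-cycle in which vertex 1 depends on 2, 2 on 3, 3 on 1
W <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), nrow = 3, byrow = TRUE)
inf <- net_influence(W, rho = 0.5)
print(round(inf, 3))
print(flag_influence(inf))
```

For this cycle the inverse equals (I + 0.5W + 0.25W^2)/(1 - 0.125). Vertex 2's influence on vertex 1 is therefore 4/7, and vertex 3's influence, relayed through vertex 2, is 2/7. With only two distinct off-diagonal values, no pair reaches the two-standard-deviation threshold.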

                                                                                          3 Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a diverse collection of routines that covers many of the methods currently seeing wide use within the field. It is hoped that the inclusion of such tools, together with the other packages of the statnet ensemble, within a freely available and widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Figure 6: Plot method output for lnam (panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot).

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

                                                                                          References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks. II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                                                                          Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008
Submitted: 2007-06-01; Accepted: 2007-12-25

• Introduction and overview
  • Package history
  • sna and statnet
  • Functionality
  • Terminology and data representation
    • Importing relational data into R
• Package highlights
  • Random graph generation
    • Example
  • Visualization and data manipulation
    • Neighborhood and ego net functions
    • Visualization
  • Descriptive indices
    • Node-level indices
    • Graph-level indices
  • Connectivity and subgraph statistics
    • Example
  • Position and role analysis
    • Example
  • Exploratory edge set comparison
    • Example
  • Network inference and process models
    • Example
• Closing comments


Figure 6: Plot method output for lnam (panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot).

team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

                                                                                            References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). “Social Structure from Multiple Networks. II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In IDA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project (http://statnetproject.org/), Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

                                                                                            Affiliation

                                                                                            Carter T ButtsDepartment of Sociology and Institute for Mathematical Behavioral SciencesUniversity of California IrvineIrvine CA 92697-5100 United States of AmericaE-mail buttscucieduURL httpwwwfacultyucieduprofilecfmfaculty_id=5057

                                                                                            Journal of Statistical Software httpwwwjstatsoftorgpublished by the American Statistical Association httpwwwamstatorg

                                                                                            Volume 24 Issue 6 Submitted 2007-06-01February 2008 Accepted 2007-12-25

                                                                                            • Introduction and overview
                                                                                              • Package history
                                                                                              • sna and statnet
                                                                                              • Functionality
                                                                                              • Terminology and data representation
                                                                                                • Importing relational data into R
                                                                                                    • Package highlights
                                                                                                      • Random graph generation
                                                                                                        • Example
                                                                                                          • Visualization and data manipulation
                                                                                                            • Neighborhood and ego net functions
                                                                                                            • Visualization
                                                                                                              • Descriptive indices
                                                                                                                • Node-level indices
                                                                                                                • Graph-level indices
                                                                                                                  • Connectivity and subgraph statistics
                                                                                                                    • Example
                                                                                                                      • Position and role analysis
                                                                                                                        • Example
                                                                                                                          • Exploratory edge set comparison
                                                                                                                            • Example
                                                                                                                              • Network inference and process models
                                                                                                                                • Example
                                                                                                                                    • Closing comments

                                                                                              Journal of Statistical Software 47

                                                                                              Boorman SA White HC (1976) ldquoSocial Structure from Multiple Networks II Role Struc-turesrdquo American Journal of Sociology 81 1384ndash1446

                                                                                              Borgatti SP (2007) NetDraw Network Visualization Software Version 2067 URL httpwwwanalytictechcom

                                                                                              Borgatti SP Carley K Krackhardt D (2006) ldquoRobustness of Centrality Measures UnderConditions of Imperfect Datardquo Social Networks 28 124ndash136

                                                                                              Borgatti SP Everett MG Freeman LC (1999) UCINET 60 for Windows Software forSocial Network Analysis Analytic Technologies Natick URL httpwwwanalytictechcom

                                                                                              Boyd JP (1969) ldquoThe Algebra of Group Kinshiprdquo Journal of Mathematical Psychology 6139ndash167

                                                                                              Brandes U Erlebach T (eds) (2005) Network Analysis Methodological FoundationsSpringer-Verlag Berlin

                                                                                              Brandes U Kenis P Wagner D (2003) ldquoCommunicating Centrality in Policy Network Draw-ingsrdquo IEEE Transactions on Visualization and Computer Graphics 9(2) 241ndash253

                                                                                              Breiger RL Boorman SA Arabie P (1975) ldquoAn Algorithm for Clustering Relational Data withApplications to Social Network Analysis and Comparison with Multidimensional ScalingrdquoJournal of Mathematical Psychology 12 323ndash383

                                                                                              Brockwell PJ Davis RA (1991) Time Series Theory and Methods Springer-Verlag NewYork second edition

                                                                                              Burt RS (1976) ldquoPositions In Networksrdquo Social Forces 55 93ndash122

                                                                                              Burt RS (1991) STRUCTURE Columbia University Software package version 42 URLhttpfacultychicagogsbeduronaldburtteaching

                                                                                              Butts CT (2003) ldquoNetwork Inference Error and Informant (In)Accuracy A Bayesian Ap-proachrdquo Social Networks 25(2) 103ndash140

                                                                                              Butts CT (2007) ldquoPermutation Models for Relational Datardquo Sociological Methodology 37257ndash281

                                                                                              Butts CT Carley KM (2001) ldquoMultivariate Methods for Interstructural Analysisrdquo CASOSworking paper Center for the Computational Analysis of Social and Organization SystemsCarnegie Mellon University

                                                                                              Butts CT Carley KM (2005) ldquoSome Simple Algorithms for Structural Comparisonrdquo Com-putational and Mathematical Organization Theory 11(4) 291ndash305

                                                                                              Butts CT Handcock MS Hunter DR (2007) network Classes for Relational Data StatnetProject httpstatnetprojectorg Seattle WA R package version 13 URL httpCRANR-projectorgpackage=network

                                                                                              Butts CT Pixley JE (2004) ldquoA Structural Approach to the Representation of Life HistoryDatardquo Journal of Mathematical Sociology 28(2) 81ndash124

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6 (February 2008). Submitted: 2007-06-01; Accepted: 2007-12-25.


                                                                                                48 Social Network Analysis with sna

                                                                                                Cliff AD Ord JK (1973) Spatial Autocorrelation Pion London

                                                                                                Davis JA Leinhardt S (1972) ldquoThe Structure of Positive Interpersonal Relations in SmallGroupsrdquo In J Berger (ed) ldquoSociological Theories in Progress Volume 2rdquo pp 218ndash251Houghton Mifflin Boston

                                                                                                Dodds PS Watts DJ Sabel CF (2003) ldquoInformation Exchange and the Robustness of Organi-zational Networksrdquo Proceedings of the National Academy of Sciences 100(2) 12516ndash12521

                                                                                                Doreian P (1990) ldquoNetwork Autocorrelation Models Problems and Prospectsrdquo In IDAGriffith (ed) ldquoSpatial Statistics Past Present and Futurerdquo pp 369ndash389 Institute ofMathematical Geography Ann Arbor

                                                                                                Doreian P Batagelj V Ferlioj A (2005) Generalized Blockmodeling Cambridge UniversityPress Cambridge

                                                                                                Fararo TJ (1981) ldquoBiased Networks and Social Structure Theorems Part Irdquo Social Networks3 137ndash159

                                                                                                Fararo TJ (1983) ldquoBiased Networks and the Strength of Weak Tiesrdquo Social Networks 51ndash11

                                                                                                Fararo TJ Sunshine MH (1964) A Study of a Biased Friendship Net Youth DevelopmentCenter Syracuse NY

                                                                                                Faust K (2007) ldquoVery Local Structure in Social Networksrdquo Sociological Methodology 37209ndash256

                                                                                                Frank O Strauss D (1986) ldquoMarkov Graphsrdquo Journal of the American Statistical Association81(395) 832ndash842

                                                                                                Freeman LC (1979) ldquoCentrality in Social Networks Conceptual Clarificationrdquo Social Net-works 1(3) 223ndash258

                                                                                                Freeman LC (2004) The Development of Social Network Analysis A Study in the Sociologyof Science Empirical Press Vancouver

                                                                                                Fruchterman TMJ Reingold EM (1991) ldquoGraph Drawing by Force-directed PlacementrdquoSoftware ndash Practice and Experience 21(11) 1129ndash1164

                                                                                                Geary R (1954) ldquoThe Contiguity Ratio and Spatial Mappingrdquo The Incorporated Statistician5 115ndash145

                                                                                                Gelman A Carlin JB Stern HS Rubin DB (1995) Bayesian Data Analysis Chapman ampHallCRC London

                                                                                                Gelman A Rubin DB (1992) ldquoInference from Iterative Simulation Using Multiple SequencesrdquoStatistical Science 7 457ndash511

                                                                                                Gentleman RC Carey VJ Bates DM Bolstad B Dettling M Dudoit S Ellis B GautierL Ge Y Gentry J Hornik K Hothorn T Huber W Iacus S Irizarry R Leisch F Li CMaechler M Rossini AJ Sawitzki G Smith C Smyth G Tierney L Yang JYH Zhang

                                                                                                Journal of Statistical Software 49

                                                                                                J (2004) ldquoBioconductor Open Software Development for Computational Biology andBioinformaticsrdquo Genome Biology 5 R80 URL httpgenomebiologycom2004510R80

                                                                                                Gilks WR Richardson S Spiegelhalter DJ (eds) (1996) Markov Chain Monte Carlo inPractice Chapman amp HallCRC New York

                                                                                                Gould R Fernandez R (1989) ldquoStructures of Mediation A Formal Approach to Brokeragein Transaction Networksrdquo Sociological Methodology 19 89ndash126

                                                                                                Hall KM (1970) ldquoAn r-dimensional Quadratic Placement Algorithmrdquo Management Science17 219ndash229

                                                                                                Handcock MS Hunter DR Butts CT Goodreau SM Morris M (2003) statnet Soft-ware Tools for the Statistical Modeling of Network Data Statnet Project httpstatnetprojectorg Seattle WA R package version 20 URL httpCRANR-projectorgpackage=statnet

                                                                                                Holland PW Leinhardt S (1970) ldquoA Method for Detecting Structure in Sociometric DatardquoAmerican Journal of Sociology 70 492ndash513

                                                                                                Hubert LJ (1987) Assignment Methods in Combinatorial Data Analysis Marcel DekkerNew York

                                                                                                Huisman M van Duijn MAJ (2003) ldquoStOCNET Software for the Statistical Analysis ofSocial Networksrdquo Connections 25(1) 7ndash26

                                                                                                Ingram P Roberts PW (2000) ldquoFriendships Among Competitors in the Sydney Hotel Indus-tryrdquo American Journal of Sociology 106 387ndash423

                                                                                                Kamada T Kawai S (1989) ldquoAn Algorithm for Drawing General Undirected Graphsrdquo Infor-mation Processing Letters 31(1) 7ndash15

                                                                                                Koenker R Ng P (2007) SparseM Sparse Linear Algebra R package version 073 URLhttpCRANR-projectorgpackage=SparseM

                                                                                                Krackhardt D (1987a) ldquoCognitive Social Structuresrdquo Social Networks 9(2) 109ndash134

                                                                                                Krackhardt D (1987b) ldquoQAP Partialling as a Test of Spuriousnessrdquo Social Networks 9(2)171ndash186

                                                                                                Krackhardt D (1988) ldquoPredicting with Networks Nonparametric Multiple Regression Anal-yses of Dyadic Datardquo Social Networks 10 359ndash382

                                                                                                Krackhardt D (1994) ldquoGraph Theoretical Dimensions of Informal Organizationsrdquo In KM Car-ley MJ Prietula (eds) ldquoComputational Organizational Theoryrdquo pp 88ndash111 LawrenceErlbaum Associates Hillsdale NJ

                                                                                                Krackhardt D Blythe J McGrath C (1994) ldquoKrackPlot 30 An Improved Network DrawingProgramrdquo Connections 17(2) 53ndash55

                                                                                                Leenders TTAJ (2002) ldquoModeling Social Influence Through Network Autocorrelation Con-structing the Weight Matrixrdquo Social Networks 24(1) 21ndash47

                                                                                                50 Social Network Analysis with sna

                                                                                                Marsden PV (2005) ldquoRecent Developments in Network Measurementrdquo In PJ CarringtonJ Scott S Wasserman (eds) ldquoModels and Methods in Social Network Analysisrdquo chapter 2pp 8ndash30 Cambridge University Press Cambridge

                                                                                                Mayhew BH (1984) ldquoBaseline Models of Sociological Phenomenardquo Journal of MathematicalSociology 9 259ndash281

                                                                                                Moran PAP (1950) ldquoNotes on Continuous Stochastic Phenomenardquo Biometrika 37 17ndash23

                                                                                                Pattison P Robins GL (2002) ldquoNeighbourhood-Based Models for Social Networksrdquo Socio-logical Methodology 32 301ndash337

                                                                                                Rapoport A (1957) ldquoA Contribution to the Theory of Random and Biased Netsrdquo Bulletinof Mathematical Biophysics 15 523ndash533

                                                                                                R Development Core Team (2007) R A Language and Environment for Statistical Com-puting R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0Version 261 URL httpwwwR-projectorg

                                                                                                Richards WD Seary AJ (2006) MultiNet for Windows Version 475 URL httpwwwsfuca~richardsMultinetPagesmultinethtm

                                                                                                Romney AK Weller SC Batchelder WH (1986) ldquoCulture as Consensus A Theory of Cultureand Informant Accuracyrdquo American Anthropologist 88(2) 313ndash338

                                                                                                Sabidussi G (1966) ldquoThe Centrality Index of a Graphrdquo Psychometrika 31 581ndash603

                                                                                                Shimbel A (1953) ldquoStructural Parameters of Communication Networksrdquo Bulletin of Mathe-matical Biophysics 15 501ndash507

                                                                                                Skvoretz J Fararo TJ Agneessens F (2004) ldquoAdvances in Biased Net Theory DefinitionsDerivations and Estimationsrdquo Social Networks 26 113ndash139

                                                                                                Snijders TAB (2001) SIENA Simulation Investigation for Empirical Network AnalysisVersion 31 URL httpstatgammarugnlsnijderssienahtml

                                                                                                Snijders TAB (2002) ldquoMarkov Chain Monte Carlo Estimation of Exponential Random GraphModelsrdquo Journal of Social Structure 3(2)

                                                                                                Stallman RM (2002) Free Software Free Society Selected Essays of Richard M StallmanGNU PressFree Software Foundation Boston MA

                                                                                                Stephenson K Zelen M (1989) ldquoRethinking Centrality Methods and Applicationsrdquo SocialNetworks 11 1ndash37

                                                                                                Stokman FN Van Veen FJAM (1981) GRADAP Graph Definition and Analysis Pack-age Userrsquos Manual Interuniversity Project Group GRADAP University of Amsterdam-Groningen-Nijmegen URL httpwwwassesscom

                                                                                                Wasserman S Robins G (2005) ldquoAn Introduction to Random Graphs Dependence Graphsand plowastrdquo In PJ Carrington J Scott S Wasserman (eds) ldquoModels and Methods in SocialNetwork Analysisrdquo chapter 10 pp 192ndash214 Cambridge University Press Cambridge

                                                                                                Journal of Statistical Software 51

                                                                                                Wasserman SS Faust K (1994) Social Network Analysis Methods and Applications Struc-tural Analysis in the Social Sciences Cambridge University Press Cambridge

                                                                                                Watts DJ Strogatz SH (1998) ldquoCollective Dynamics of lsquoSmall-Worldrsquo Networksrdquo Nature393 440ndash442

                                                                                                West DB (1996) Introduction to Graph Theory Prentice Hall Upper Saddle River NJ

                                                                                                White HC (1963) An Anatomy of Kinship Englewood Cliffs NJ Prentice Hall

                                                                                                Affiliation

                                                                                                Carter T ButtsDepartment of Sociology and Institute for Mathematical Behavioral SciencesUniversity of California IrvineIrvine CA 92697-5100 United States of AmericaE-mail buttscucieduURL httpwwwfacultyucieduprofilecfmfaculty_id=5057

                                                                                                Journal of Statistical Software httpwwwjstatsoftorgpublished by the American Statistical Association httpwwwamstatorg

                                                                                                Volume 24 Issue 6 Submitted 2007-06-01February 2008 Accepted 2007-12-25

• Introduction and overview
  • Package history
  • sna and statnet
  • Functionality
  • Terminology and data representation
    • Importing relational data into R
• Package highlights
  • Random graph generation
    • Example
  • Visualization and data manipulation
    • Neighborhood and ego net functions
    • Visualization
  • Descriptive indices
    • Node-level indices
    • Graph-level indices
  • Connectivity and subgraph statistics
    • Example
  • Position and role analysis
    • Example
  • Exploratory edge set comparison
    • Example
  • Network inference and process models
    • Example
• Closing comments

J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.
