Complexity as a Methodology, Point of View, Theory Bruce Kogut EIASM and Oxford University June 2006.


What Complexity Seems to Mean In Practice

Interdisciplinary sharing of knowledge and creating a larger community of scholarship.

Appreciation of the ‘a-linear’ view of the world

The importance of ‘events’ for triggering change.

Analyzing the statistical properties of large datasets

Understanding local interactions by micro-rules whose effects depend on topology (structure) but whose interpretations rely upon contextual knowledge.

More attention to what Elster and Hedstrom call ‘mechanisms’ as opposed to ‘causes’

Research Strategies: 1. Some old ones, 2. Some opportunistic ones, 3. Some new ones.

1. Greco-Latin Squares to Charles Ragin’s comparative methods

2. Borrowed simulation ‘structures and topologies’

3. Graph dynamics relying on new estimation techniques

Example of (1): Old Method Rethought

• In economics and management, we would like to determine the complementarities, or interactions, that compose ‘best practices’ to improve performance.

• Economics gave us an elegant analysis of complementarities but poor methods.

• The empirical problem of complementarities is saturating an experimental design. (This is identical to the theory of monotone comparative statics: the power set of combinations has to be tested for its effect on performance.)

Example of (2): a useful opportunistic strategy is the NK model applied to complementarities

• Consider an NK model in which a technological landscape is hardwired (the number of nodes is given, they are connected, and K, the interactions, is given; but N and K can be varied. Fitness values are randomly assigned to nodes and hence to their combinations).

– Random boolean nets have been useful in biochemistry in which there are rules of ‘and’, ‘or’, ‘not and’. (However, from genes to phenotypic expression, there are many things that intervene: RNA, proteins. And fitness can be ‘endogenous’: my fitness can depend on how fit is your fitness in a given space.)

– We know a priori the central results from simulations in other fields: there are multiple optima, finite in number, that for some k (such as k=2) have known expected values. We also know much about search time to optima (this has received a lot of attention in science and little in social science: is the rate by which we have gotten to where we are ‘explainable’?).

– The apparatus sneaks in a language: long jumps, landscapes, iterations, that imply firms are engaged in search over a technological terrain that awaits to be discovered. (This is not social construction.)

– Unfortunately, it is hard to feed data to the model.
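To make the NK vocabulary concrete, here is a minimal Python sketch of a landscape with random fitness contributions and a brute-force count of its local optima. It is an illustration only: the ring-shaped interaction pattern, the function names `make_nk_landscape` and `local_optima`, and the seeds are assumptions, not part of the slides.

```python
import itertools
import random


def make_nk_landscape(n, k, seed=0):
    """Build a fitness function for bit tuples of length n.

    Assumption: locus i interacts with its k right-hand neighbours on a ring.
    """
    rng = random.Random(seed)
    # One lookup table per locus: a random contribution for every joint
    # state of the locus and its k neighbours.
    tables = [
        {bits: rng.random() for bits in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(config):
        total = 0.0
        for i in range(n):
            key = tuple(config[(i + j) % n] for j in range(k + 1))
            total += tables[i][key]
        return total / n

    return fitness


def local_optima(n, fitness):
    """Return every configuration that no single-bit flip can improve."""
    optima = []
    for config in itertools.product((0, 1), repeat=n):
        here = fitness(config)
        flips = (config[:i] + (1 - config[i],) + config[i + 1:] for i in range(n))
        if all(fitness(other) <= here for other in flips):
            optima.append(config)
    return optima


if __name__ == "__main__":
    for k in (0, 2, 5):
        peaks = local_optima(10, make_nk_landscape(10, k, seed=1))
        print(f"N=10, K={k}: {len(peaks)} local optima")
```

With K=0 the contributions are additive and the landscape has a single peak; raising K makes it more rugged, which is the sense in which organizations can get trapped.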

Example of (3) is the application of graphs to understanding data

1. We now have a much greater appreciation that static representations of networks cannot easily isolate ‘endogeneity’ (I smoke because I am weak or because you smoke) but, more importantly, cannot easily identify ‘social rules’.

2. We do, though, have a better understanding, even for static graphs, that some properties are consistent or inconsistent with important social behaviors.

• For example, we know that the absence of a power law in degrees is inconsistent with ‘preferential attachment’. Since preferential attachment is a reasonable way to represent such concepts as ‘prestige’ and ‘reputation’, a non-finding is important.

3. We still have a long way to go regarding ‘estimations’:

• Our models are convenient (even if very hard: exponential random graph models).

• It is hard to rule out other explanations not specified.

4. Exciting space (for me) is the combination of estimation and simulation to arrive at better understandings of the ‘possible’ interpretations.

• Formal models will also be critical.

Return to Example (1): What can old methods say to complexity?

Consider a question:

What are good corporate and labor institutions for generating growth?

Some argue that there are two prototypes:

Coordinated (e.g. Germany) and market (e.g. US), and each is good for growth (?)

Here are data from Hall and Gingerich who want to show there are 2 best configurations for setting policy

COUNTRY | Growth | Degree of wage coordination | Level of wage coordination | Labor turnover | Shareholder power | Stock market size | Dispersion of control
Austria | 1 | 1 | 1 | 1 | 1 | 1 | 1
Germany | 1 | 1 | 1 | 1 | 1 | 1 | 1
Italy | 1 | 1 | 0 | 1 | 1 | 1 | 1
Belgium | 1 | 0 | 1 | 1 | 1 | 1 | 1
Norway | 1 | 1 | 1 | 1 | 0 | 1 | 1
Finland | 1 | 1 | 1 | 0 | 1 | 1 | 1
Portugal | 1 | 0 | 1 | 1 | 1 | 1 | 1
Sweden | 0 | 0 | 1 | 1 | 1 | 0 | 1
France | 0 | 0 | 1 | 1 | 1 | 1 | 1
Denmark | 1 | 1 | 1 | 0 | 1 | 1 | 0
Japan | 1 | 1 | 1 | 1 | 0 | 0 | 0
Netherlands | 0 | 0 | 1 | 1 | 1 | 0 | 1
Switzerland | 0 | 1 | 1 | 1 | 1 | 0 | 0
Spain | 1 | 0 | 0 | 0 | 0 | 1 | 1
Ireland | 1 | 0 | 0 | 0 | 0 | 1 | 0
Australia | 0 | 0 | 0 | 0 | 0 | 0 | 0
New Zealand | 0 | 0 | 0 | 0 | 0 | 0 | 0
Canada | 0 | 0 | 0 | 0 | 0 | 0 | 0
United Kingdom | 0 | 0 | 0 | 0 | 0 | 0 | 0
United States | 0 | 0 | 0 | 0 | 0 | 0 | 0

The coordination dichotomies are all coded in the same direction, with a score of 1 signaling conformity with “coordinated” market economies and a score of 0 signaling conformity with “liberal” market economies.

Observations on the Data

• N of 20 countries and yet 6 variables, hence 64 possible combinations (2^6).

• Of these 64, only 15 are uniquely observed.

• We are making inferences based on a poorly populated space.

– Sparseness may reflect the operation of a maximizing hand that rules out inefficient (?) combinations.

– It may reflect “path dependency” and hence the paths not taken (even if perhaps better).

– It may reflect cultural preferences that rule out certain institutions, such as stock markets historically in some countries.

Absorption: A + AB = A

Reduction: AB + Ab = A(B+b) = A(1) = A
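Both rules can be checked by brute force over the truth table. The short Python sketch below does this; the helper name `equivalent` is hypothetical, and lower case (absence) is written as `not`.

```python
from itertools import product


def equivalent(f, g, n_vars):
    """True if two Boolean functions agree on every row of the truth table."""
    rows = product((False, True), repeat=n_vars)
    return all(bool(f(*row)) == bool(g(*row)) for row in rows)


# Absorption: A + AB = A
absorption_holds = equivalent(lambda a, b: a or (a and b), lambda a, b: a, 2)

# Reduction: AB + Ab = A
reduction_holds = equivalent(
    lambda a, b: (a and b) or (a and not b), lambda a, b: a, 2
)

if __name__ == "__main__":
    print("absorption:", absorption_holds)
    print("reduction:", reduction_holds)
```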

Approach One: Crisp Logic

We can try to find good causal ‘combinations’ by borrowing from electrical engineering.

Advantage of this method: it is simple and intuitive.

Problem is that with too much sparseness, we won’t get much simplicity.

Using three logical gates (join, meet, null), what are the minimal circuits you need, or what are the fewest elements you need to ‘cause’ performance?

Solution for High Growth/Low Initial GDP per capita, without simplifying assumptions:

1. degreewc levelwc turnover sharehld STOCKMKT +

2. DEGREEWC LEVELWC turnover SHAREHLD STOCKMKT +

3. DEGREEWC LEVELWC TURNOVER STOCKMKT DISPERSN +

4. DEGREEWC TURNOVER SHAREHLD STOCKMKT DISPERSN +

5. LEVELWC TURNOVER SHAREHLD STOCKMKT DISPERSN +

6. DEGREEWC LEVELWC TURNOVER sharehld stockmkt dispersn

Simplifying Assumptions

Consider the case where there are two solutions:

ABC + ABc

Using the observed cases alone, the reduction rule simplifies this only to AB.

If we permit two assumptions, we can achieve a further simplification:

Y = ABC + aBc + ABc + aBC
= (ABC + aBC) + (aBc + ABc)
= (BC) + (Bc)
= B

This is a type of simulation, but done by intuition –called theory– on unobservables.

It is a theory that explicitly reduces the complexity. It posits: ‘let’s imagine that if we had the data or if nature had been more experimental, we would indeed observe two cases with positive outcomes.’ These cases are aBc and aBC. Once we do this, we arrive at B.

This is very similar to Michael Hannan’s recent work in propositional logic in which premises are fed to a computer program that derives logical propositions. We simply say: let’s use nature as far as we can to ‘infer’ propositions and then add in theory to derive more simple expressions.
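The step from the four terms to B can be reproduced mechanically. The sketch below is a simplified pairwise-merging routine in Python, not a full minimization algorithm and not the software used for these slides; the string encoding of terms and the names `combine` and `reduce_terms` are assumptions made for illustration.

```python
from itertools import combinations


def combine(term_a, term_b):
    """Merge two product terms that differ in the case of exactly one letter.

    Terms are strings such as "aBc": upper case means the condition is
    present, lower case means it is absent. Returns None if no merge applies.
    """
    if term_a.upper() != term_b.upper():
        return None  # the terms do not involve the same conditions
    differing = [i for i, (x, y) in enumerate(zip(term_a, term_b)) if x != y]
    if len(differing) != 1:
        return None
    i = differing[0]
    return term_a[:i] + term_a[i + 1:]


def reduce_terms(terms):
    """Apply the reduction rule repeatedly until no pair of terms merges."""
    current = set(terms)
    while True:
        merged, used = set(), set()
        for term_a, term_b in combinations(sorted(current), 2):
            result = combine(term_a, term_b)
            if result is not None:
                merged.add(result)
                used.update((term_a, term_b))
        if not merged:
            return current
        current = (current - used) | merged


if __name__ == "__main__":
    # Two observed cases plus the two assumed cases aBc and aBC.
    print(reduce_terms(["ABC", "ABc", "aBc", "aBC"]))
```

Because every merge drops one letter, the loop always terminates; for the four terms above it ends with the single term B.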

Solution for Low Growth/High Initial GDP per capita,

C. without simplifying assumptions:

1. degreewc levelwc turnover sharehld stockmkt dispersn +

2. DEGREEWC LEVELWC TURNOVER SHAREHLD stockmkt dispersn +

3. degreewc LEVELWC TURNOVER SHAREHLD stockmkt DISPERSN

D. with simplifying assumptions:

1. degreewc stockmkt +

2. SHAREHLD stockmkt

Let’s do better by understanding more clearly the Limited Diversity in data

Table 6: Mapping Limited Diversity and Assessing Simplifying Assumptions*

Columns: Configurations of Labor Institutions. Rows: Configurations of Corporate Institutions.

 | dlt | dlT | dLt | Dlt | dLT | DlT | DLt | DLT
psc | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1
psC | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
pSc | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Psc | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
pSC | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1
PsC | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0
PSc | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0
PSC | 0 | 0 | 0 | 0 | 3 | 1 | 1 | 2

Corporate Institutions (upper case denotes corporatist elements):

P = low shareholder power; p = high shareholder power

S = small stock market; s = large stock market

C = low dispersion of control; c = high dispersion of control

Labor Institutions (upper case denotes corporatist elements):

D = high degree of wage coordination; d = low degree of wage coordination

L = high level of wage coordination; l = low level of wage coordination

T = low level of labor turnover; t = high level of labor turnover

*Shaded portion of the table shows cells covered by the equation for high growth.

Logical exploration of the Not-Observed

1. Reconsider the result for low growth:

low_growth = degreewc*stockmkt + SHAREHLD*stockmkt

2. We did not though combine our knowledge of what determines ‘low growth’ with that for what determines ‘high growth’. We can do this by…

Apply De Morgan’s Law by reversing the outcome, changing all upper-case to lower-case, and vice versa, and then also changing intersection to union, and vice versa:

high_growth = (DEGREEWC + STOCKMKT)*(sharehld + STOCKMKT)

We have arrived now at the maximal saturation of our experimental design, filling in as many of the 64 cells that we can.

After maximal saturation…

3. Finally, simplify the terms using Boolean algebra:

high_growth = DEGREEWC*sharehld + DEGREEWC*STOCKMKT + STOCKMKT*sharehld + STOCKMKT*STOCKMKT

By the absorption rule, and since STOCKMKT*STOCKMKT = STOCKMKT (1*1 = 1),

high_growth = STOCKMKT + DEGREEWC*sharehld

4. And if we are not happy with two explanations, we can theorize what we should observe by simplifying assumptions and reduce further.
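The De Morgan step and the final simplification can be verified by enumerating all eight combinations of the three conditions involved. In this Python sketch an upper-case condition is coded `True` and a lower-case one `False`; the function names are illustrative only.

```python
from itertools import product


def low_growth(degreewc, sharehld, stockmkt):
    # low_growth = degreewc*stockmkt + SHAREHLD*stockmkt
    return (not degreewc and not stockmkt) or (sharehld and not stockmkt)


def high_growth_de_morgan(degreewc, sharehld, stockmkt):
    # high_growth = (DEGREEWC + STOCKMKT)*(sharehld + STOCKMKT)
    return (degreewc or stockmkt) and (not sharehld or stockmkt)


def high_growth_simplified(degreewc, sharehld, stockmkt):
    # high_growth = STOCKMKT + DEGREEWC*sharehld
    return stockmkt or (degreewc and not sharehld)


def all_forms_agree():
    """Check that the negation of low growth matches both high-growth forms."""
    return all(
        (not low_growth(*row))
        == high_growth_de_morgan(*row)
        == high_growth_simplified(*row)
        for row in product((False, True), repeat=3)
    )


if __name__ == "__main__":
    print("forms agree on all 8 rows:", all_forms_agree())
```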

This is a combination of an incomplete saturated design methodology that analyzes complex non-linear interactions by a combination of logic, theory, and simulation using DATA.

Example Two Reviewed: NK model

• NK models impose a large penalty on experimentation:

– Landscapes are rugged and organizations easily get trapped.

– Long-jumps are random.

• Consider Fontana’s idea of Neutrality (and the implementation by Lobo, Fontana, and Miller)

– Fitness is discretized into bands such that organizations are inert to small changes in fitness caused by experiments in complements.

– However, for large changes in fitness, experiments can lead to adoption of new configurations.

Simulating Neutrality in the Kauffman/Levinthal NK Model (Amit Jain Implementation and Simulation)

Simulation run for neutrality in rugged landscapes

N = 10, K = 2

Number of organizations = 100

Number of time periods = 100

Number of runs = 50

Only local search

Landscape does not change (p=0)

Runs made for M=? (in program this is M=0) (standard N-K), 10, 25, 100

Time periods in graph: 1-100 standard N-K run (M = ? /0); 100-200 M=10; 200-300 M=25; 300-400 M=100

Four simulations are run: first panel is the standard, the next 3 vary neutrality from fitness bands of 10%, 25%, 100%: that is, change only if change in fitness hits the band.
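A rough Python sketch of local search with fitness bands follows. It is not the Amit Jain implementation referred to above. In particular, it assumes that a one-bit experiment is adopted whenever it does not lower the fitness band, so that same-band moves let an organization drift; the slides may intend a different adoption rule. The names `nk_fitness`, `banded`, and `adaptive_walk`, and the meaning given to `bands`, are likewise assumptions.

```python
import itertools
import random


def nk_fitness(n, k, rng):
    """Random NK fitness function; locus i depends on its k ring neighbours."""
    tables = [
        {bits: rng.random() for bits in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(config):
        return sum(
            tables[i][tuple(config[(i + j) % n] for j in range(k + 1))]
            for i in range(n)
        ) / n

    return fitness


def banded(value, bands):
    """Map a fitness in [0, 1] to a band index; bands=None keeps the raw value."""
    if bands is None:
        return value
    return min(int(value * bands), bands - 1)


def adaptive_walk(fitness, start, steps, bands, rng):
    """One-bit local search; returns the final and the visited configurations."""
    config = tuple(start)
    visited = {config}
    for _ in range(steps):
        i = rng.randrange(len(config))
        trial = config[:i] + (1 - config[i],) + config[i + 1:]
        old = banded(fitness(config), bands)
        new = banded(fitness(trial), bands)
        # Without bands only strict improvements are adopted; with bands,
        # neutral (same-band) moves are adopted as well (an assumption).
        if new > old or (bands is not None and new == old):
            config = trial
            visited.add(config)
    return config, visited


if __name__ == "__main__":
    rng = random.Random(0)
    f = nk_fitness(10, 2, rng)
    start = tuple(rng.randrange(2) for _ in range(10))
    for bands in (None, 10):
        end, seen = adaptive_walk(f, start, 100, bands, random.Random(1))
        print(f"bands={bands}: fitness {f(end):.3f}, {len(seen)} forms visited")
```

Counting the visited configurations against the 1024 possible ones for N=10 is one simple way to ask how much of the space is actually explored.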

Under Neutrality, Organizations Discover ‘Ridges’ Between Peaks

Comparing results:

1. Fitness value is higher under neutrality (for this number of simulations). In other words, local traps are less confining.

2. More organizational forms are ‘viable’ over the short run.

3. We believe, but are still checking, that there is more exploration of the possible space. If N=10, then we have 1024 combinations. But with only 100 organizations, how many combinations are actually explored in a period of time? Here we are returning to the type of question: how long should it take to see a possible universe realized?

4. We still don’t know how, nor do we think we know how, to fit data to this simulation.

Neutrality is a reasonable concept by which to capture the capability of firms to learn by trial and error before engaging in massive ‘retooling’ or ‘reengineering’.

It also captures the idea of ‘institutions’ and ‘institutional transplants’: many institutions can cross borders because they are ‘neutral’.

Example (3): Topologies, Graphs, Data, Inferences

1. The science of complexity should be an engagement of theory, data, estimation, simulation, and imagining the possible.

2. To understand ‘large complex systems’, we need a lot of data.

1. A lot of what we do uses small data sets from which we try to make claims about asymptotic significance.

3. Alternatively, we can see social action as driven by agents who interact by rules.

1. We would like to liberate them from strong topological impositions (e.g. regular graphs, or NK landscapes) but still come to understand the relation of local and macro structures on behavior, and vice versa.

Analyses of large data sets: Venture Capital In US

• We know little about entrepreneurial activities in terms of network dynamics.

• Many good studies on venture capital, but we have no global picture.

• We have no studies on dynamics.

Theories on VCs

Two common hypotheses:

1. VCs do deals to signal prestige: this should lead to the prestigious getting richer.

Graph prediction: power law in degree.

2. VCs do deals to find ‘complements’ in expertise. Graph prediction: power law in ‘weighted link strength’.

Implication of (1) for components and clusters:

Venture capital is ‘clustered’ in geographies and a few prestigious companies come later to bridge them.

Implication of (2) for components and clusters:

VC firms will seek new partners when new expertise is required and we will thus see ‘repeated ties’ for investments in known areas and ‘new ties’ for investments in new areas.

Thus we will have a dynamic between the conservational rule of relying on proven expertise and the diversity rule of seeking new partners.
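The two graph predictions rest on different node statistics: degree counts distinct partners, while strength counts co-investments, so repeated ties raise strength but not degree. The Python sketch below computes both from a list of syndicated deals; the firm labels and the deal list are made up for illustration and are not the data analyzed here.

```python
from collections import Counter
from itertools import combinations


def degree_and_strength(deals):
    """Return (degree, strength, tie weights) for a list of syndicates."""
    weights = Counter()
    for syndicate in deals:
        for a, b in combinations(sorted(set(syndicate)), 2):
            weights[(a, b)] += 1  # one more co-investment for this pair
    degree, strength = Counter(), Counter()
    for (a, b), weight in weights.items():
        degree[a] += 1
        degree[b] += 1
        strength[a] += weight
        strength[b] += weight
    return degree, strength, weights


def new_and_repeated_ties(deals):
    """Count, in deal order, pairings seen for the first time versus again."""
    seen, new, repeated = set(), 0, 0
    for syndicate in deals:
        for pair in combinations(sorted(set(syndicate)), 2):
            if pair in seen:
                repeated += 1
            else:
                seen.add(pair)
                new += 1
    return new, repeated


if __name__ == "__main__":
    # Hypothetical deals, listed in chronological order.
    deals = [["A", "B"], ["A", "B"], ["A", "C"], ["B", "C", "D"]]
    degree, strength, _ = degree_and_strength(deals)
    print(dict(degree), dict(strength), new_and_repeated_ties(deals))
```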

Deal structure

• Over 150,000 transactions over 40 years.

• Several thousand VC investors, targets

• Let’s start by posing a simple question:

– Do regional markets grow and then become integrated?

– Or is Braudel right: regions develop in relation to global (national) dynamics.

Deals distribution among Firms

[Figure: percentage of deals (0% to 100%) against percentage of firms.]

Number of Deals

[Figure: number of deals (0 to 30,000) and number of deals per firm (0 to 6) by year. Annotation: high number of deals per firm.]

Technological breaks create opportunities for new entrants.

National Component Grew Early and Connected Regions and Sectors: So much for Clusters

[Figure: percentage of nodes (0% to 100%) by year, 1962 to 1986. Series: Sectors Covered by the Giant; Geographies Covered by the Giant; Size of the Giant Component.]

We do not find a power law in degrees: VC syndications don’t seem to be the product of ‘preferential attachment’

[Figure: degree frequency plots in four panels, labeled Freq75, Freq80, Frequency90, and Frequency, each against degree K1.]

Inference by abduction: the dog did not bark; the graph does not have a power law in degree; hence the culprit, ‘rich get richer’, is innocent and released.
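To see why the degree distribution is informative about the attachment rule, one can grow two toy networks that differ only in how a newcomer picks its partner. The Python sketch below is a stylized illustration, not the estimation behind these slides; `grow_network`, its parameters, and the one-link-per-newcomer rule are assumptions.

```python
import random
from collections import Counter


def grow_network(n_nodes, preferential, seed=0):
    """Grow a network one node and one link at a time; return node degrees.

    preferential=True: the partner is drawn in proportion to its degree.
    preferential=False: the partner is drawn uniformly from existing nodes.
    """
    rng = random.Random(seed)
    degree = Counter({0: 1, 1: 1})  # start from a single link 0-1
    endpoints = [0, 1]  # every node appears once per link it holds
    for new in range(2, n_nodes):
        if preferential:
            target = rng.choice(endpoints)
        else:
            target = rng.randrange(new)
        degree[new] += 1
        degree[target] += 1
        endpoints.extend((new, target))
    return degree


if __name__ == "__main__":
    for rule in (True, False):
        degree = grow_network(5000, preferential=rule, seed=0)
        counts = Counter(degree.values())
        print("preferential" if rule else "uniform",
              "max degree:", max(degree.values()),
              "nodes with degree 1:", counts[1])
```

Under the preferential rule a few early nodes accumulate very high degree (the heavy tail of a power law); under the uniform rule the largest degree stays small. An observed distribution without that heavy tail is therefore hard to reconcile with ‘rich get richer’.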

We do have Power Laws in Strength: Incumbents like to rely upon trusted partners

• Most Deals are Incumbent to Incumbent

• Hence we find power laws in repeated ties.

• Trusted expertise based on experience, not signalling of prestige, seems to matter.

VC networks have far more repeated ties than the Broadway networks of Guimera, Uzzi et al.

[Figure: frequency (1 to 10,000) against strength s (1 to 100,000), on logarithmic axes.]

Percentage of Deals Where ‘Local’ cluster is greater than global

[Figure: percentage of deals (0% to 60%) by year, 1961 to 2004. Series: Geography; Sector.]

In other words, clusters are stronger globally than ‘within’ region or sector.

New Links are Formed when…

• A VC company goes to a new geography or sector, that is, when it needs new expertise.

• VC firms are drawn to successful targets.

 | Diversification | VC Degree | Target Degree | Year Effects
New geography | -1.577 | -1.504 | -1.559 | -1.57
 | -0.023 | -0.023 | -0.023 | -0.024
New sector | -1.723 | -1.613 | -1.663 | -1.669
 | -0.029 | -0.029 | -0.029 | -0.029
Interaction | 1.502 | 1.449 | 1.483 | 1.499
 | -0.043 | -0.043 | -0.043 | -0.044
Firm degree | | 0.002 | 0.002 | 0.002
 | | 0 | 0 | 0
Target degree | | | -0.046 | -0.056
 | | | -0.002 | -0.002
Constant | 1.398 | 1.24 | 1.487 | 23.439
 | -0.015 | -0.018 | -0.02 | -26,313.25

Conclusions to Example 3

• What we showed:

1. Looking at dynamics of graph properties rules out certain micro behaviors.

2. Clustering and giant component analysis confirms the Braudel hypothesis: clusters develop in relation to the national graph.

• What we did not show:

1. A formal model of the choice between new and old ties that is the equivalent of the Simon/Barabasi model of preferential attachment.

2. Agent-based models that test more precisely the micro rules employed.

• Why not shown:

1. I don’t think we have a good empirical model yet.

2. We are out of time.

Caveats

• Social systems are harder than physical systems if we play only by the rules of the latter.

– We don’t ask an electron ‘where y’a been and when were y’a d’ere?’

– We can ask people this question.

• Physical systems have given topologies:

– Forests are reasonably viewed as 3 dimensional lattices.

– American suburbs are often 2 dimensional lattices, but Paris is not, and people move around.

– Geographical space is not always the same as social space.

• Engineers often like to get rid of people because the problem is hard enough.

– Systems are most often, even today, socio-technical.

– Machines and people inhabit the same graph.

Interactions in physical systems. High Power Items - Jet Engine

Adding in People makes this much harder

[Diagram labels:]

• Function is physical and cannot be represented logically and symbolically

• High power

• Severe back-loading

• Interfaces must be tailored to function

• Modules display multiple behaviors in multiple energy domains

• Modules are independent in design

• Module behavior changes when combined into system

• Modules must be validated physically

• Modules must be designed anew specifically for their function

• Side effects are high power and can’t be isolated

• Separate module and system validation steps are needed

• Systems cannot be designed with good confidence that they will work

• A construction process exists that eliminates most assembled interfaces

• The design can be converted to a picture

• The picture is an incomplete abstract representation of the design

• Main function carriers can’t be standardized

From Whitney, MIT.

A conclusion

• Complexity is a point of view that the pursuit of plausibility is more rewarding than certainty.

• Social science needs to move to an open science model, where we spend more time in projects, less time collecting data.

• Simulations and estimation should be seen as part of the interpretative methodology to identify plausible mechanisms as opposed to verifying causes.

• Interactions, rules, non-saturated designs, simulations, estimations, graph theory are the words in the new vocabulary.

• But the going will not be easy… Consider Wings and Engines and ….. People

Appendix: If Time Permitted: Extend This Method of Experimental Design and Simulated and Real Data to Complementarities in Manufacturing

• Consider activity systems that describe how auto companies manufacture efficiently with quality, including the work teams and social organization.

• Can we identify better ‘prototypical’ strategies that are robust across settings?

Strategy and Prototypes

Consider strategy as the problem of choosing capabilities and markets; that is, the sets C and M are the givens to the decision: choose from CxM such that S* = argmax(C,M).

This can rarely if ever be solved, so that people think heuristically instead by prototypes that represent the “best configuration”: differentiate, cut cost, have religion.

This formulation is close to the theory of complementarities a la Milgrom and Roberts.

The empirical question is: Can we pick out the best configurations from the data?

Fuzzy Sets

• Data are no longer “crisp”.

• Important considerations are coding and functional transformations.

• Rules are set theoretic: a necessary condition means that the outcome is a subset of the condition; a sufficient condition means that the condition is a subset of the outcome.

• Values are calculated for combinations using fuzzy set algebra. These values are compared to the value of the outcome. If the outcome value is larger, this indicates the combination/element is sufficient; if smaller, the combination/element is necessary.
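The subset rules translate directly into comparisons of membership scores. The Python sketch below uses the standard fuzzy operators (minimum for intersection, maximum for union, one minus the score for negation) and a strict case-by-case comparison; applied work usually allows some tolerance, which is omitted here. The membership scores and variable names are invented for illustration and are not the auto plant data.

```python
def fuzzy_and(*memberships):
    """Intersection of fuzzy sets: the minimum membership score."""
    return min(memberships)


def fuzzy_or(*memberships):
    """Union of fuzzy sets: the maximum membership score."""
    return max(memberships)


def fuzzy_not(membership):
    """Negation of a fuzzy set: one minus the membership score."""
    return 1.0 - membership


def is_sufficient(condition, outcome):
    """Condition is a subset of the outcome: its score never exceeds it."""
    return all(c <= y for c, y in zip(condition, outcome))


def is_necessary(condition, outcome):
    """Outcome is a subset of the condition: its score is never exceeded."""
    return all(c >= y for c, y in zip(condition, outcome))


if __name__ == "__main__":
    # Hypothetical membership scores for three plants.
    teams = [0.8, 0.6, 0.2]
    automation = [0.9, 0.4, 0.3]
    productivity = [0.9, 0.7, 0.3]

    both = [fuzzy_and(t, a) for t, a in zip(teams, automation)]
    print("teams AND automation:", both)
    print("sufficient:", is_sufficient(both, productivity))
    print("necessary:", is_necessary(both, productivity))
```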

Benchmarking the fuzzy configuration against Data

[Figure: Actual Productivity against Predicted Productivity.]

The data are 70 or so auto plants around the world and consist of observations on teams, technologies, work processes, scale, etc. These practices were analyzed to find unique combinations of ‘minimal’ practices sufficient to achieve performance.

[Figure: Actual Quality against Predicted Quality.]
