OpenCorporates Co-Director Mapping: Mapping Corporate Sprawls. Tony Hirst, Dept of Communications and Systems, The Open University
Page 1: Mapping Corporate Networks With OpenCorporates

OpenCorporates Co-Director Mapping

Mapping Corporate Sprawls

Tony Hirst
Dept of Communications and Systems, The Open University

Page 2: Mapping Corporate Networks With OpenCorporates

As company filings start to appear as open data, opportunities may arise for watchdogs to start mining this data in support of their investigations and monitoring activities.

This presentation introduces several ideas relating to mapping network structures in order to learn something about the structure of “corporate sprawls”, corporate groupings defined on the basis of co-director relationships.

Page 3: Mapping Corporate Networks With OpenCorporates

Social Media Mapping

Introducing “Graphs”

Page 4: Mapping Corporate Networks With OpenCorporates

To introduce the idea of a network map, let’s have a look at a view we can construct over the Twitter social space…

Page 5: Mapping Corporate Networks With OpenCorporates

Emergent Social Positioning

Page 6: Mapping Corporate Networks With OpenCorporates

This network map shows Twitter users who are commonly followed by the followers of @TOGYnews.

Although hard to see at this scale, the map is actually constructed from labeled points connected by lines (in the jargon, “nodes connected by edges”).

The algorithm used to position the labeled nodes tries to place nodes that are heavily connected to each other close to each other. In a sense, we can view the diagram as a map, with regions that are highlighted using false colours identifying clusters of nodes that may in some sense be similar to each other based on the sharing of common followers.

Page 7: Mapping Corporate Networks With OpenCorporates

Find the followers

[Diagram: nodes A, B and focus, joined by edges labelled “Is followed by” and “Follows”]

Page 8: Mapping Corporate Networks With OpenCorporates

The map is constructed using data grabbed from the Twitter API.

Using one or more “focus” users (a specific Twitter account, for example, or the set of users of a particular hashtag), we grab a list of their followers.

Page 9: Mapping Corporate Networks With OpenCorporates

Find Friends of Followers

[Diagram: nodes A, B, focus and two peer nodes, joined by edges labelled “Is followed by” and “Follows”]

Page 10: Mapping Corporate Networks With OpenCorporates

For each of the followers, we grab a list of their friends (or a sample thereof) – that is, a list of some or all of the people they follow on Twitter.

We can use this data to construct a network of people followed by the followers of the original focus.

It is typically at this point, where there is most relational information contained within the network, that we lay it out using automatic layout tools.

Page 11: Mapping Corporate Networks With OpenCorporates

Find Common Friends of Followers

[Diagram: nodes A, B, focus and peer, joined by edges labelled “Is followed by” and “Follows”]

Page 12: Mapping Corporate Networks With OpenCorporates

Drawing on the insight that people on Twitter are likely to follow accounts that are of interest to them, we can start to imagine the network as a projection of the interests of the people who are interested in one or more of the things the focus is associated with.

However, interests of followers may spread to a wide range of topics, so we look for consistency of interest, pruning the network to remove people who are not commonly followed by the followers of the focus. That is, we remove nodes who are followed by only a few of the followers of the focus.
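As a rough illustration of this pruning step, the sketch below counts how many of the focus account's followers follow each account, then keeps only the commonly followed ones. The `friends_of` data, the account names and the `min_followers` threshold are all made up for the example; in practice the lists would come from the Twitter API.

```python
from collections import Counter

# Hypothetical sample data: each follower of the focus account mapped to
# the accounts they follow (their "friends").
friends_of = {
    "follower_a": {"acct_1", "acct_2", "acct_3"},
    "follower_b": {"acct_1", "acct_2"},
    "follower_c": {"acct_1", "acct_4"},
}

def commonly_followed(friends_of, min_followers):
    """Return accounts followed by at least min_followers of the followers."""
    counts = Counter()
    for friends in friends_of.values():
        counts.update(friends)  # each follower counts once per account
    return {acct: n for acct, n in counts.items() if n >= min_followers}

# Keep only accounts followed by two or more of the focus's followers.
kept = commonly_followed(friends_of, min_followers=2)

# Edges of the pruned network: follower -> commonly followed account.
edges = sorted(
    (follower, acct)
    for follower, friends in friends_of.items()
    for acct in friends
    if acct in kept
)
```

Raising `min_followers` gives a sparser map that concentrates on the most consistently shared interests; lowering it lets in more peripheral accounts.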

Page 13: Mapping Corporate Networks With OpenCorporates

Filter out not commonly followed

[Diagram: peer and focus nodes, with an edge labelled “Follows”]

Page 14: Mapping Corporate Networks With OpenCorporates

Having laid out the network map, we might now tidy it up a little by removing all the nodes that are not themselves followed by a significant number of the followers of the original focus.

Page 15: Mapping Corporate Networks With OpenCorporates

Emergent Social Positioning

Page 16: Mapping Corporate Networks With OpenCorporates

The result is a map that shows groups of people positioned according to the presumed shared interests of their followers.

Page 17: Mapping Corporate Networks With OpenCorporates

A More Principled Approach

Page 18: Mapping Corporate Networks With OpenCorporates

It may also be possible to use metadata associated with social networks to develop additional insights.

A recent paper describes one way of mining social network data for information about people working for a particular company, and using public biographical information along with social connection data to map out the organisational structures of large companies.

Page 19: Mapping Corporate Networks With OpenCorporates

Corporate Structure Maps

Introducing “Graphs”

Page 20: Mapping Corporate Networks With OpenCorporates

A more principled way of looking at corporate structures at a company level may be to derive them from publicly available corporate information.

Page 21: Mapping Corporate Networks With OpenCorporates

Companies & Directors

[Diagram: companies C1, C2, C3 and directors D1, D2, D3]

Page 22: Mapping Corporate Networks With OpenCorporates

For example, if we can get hold of directorial appointment and termination data, we can start to construct maps that show how companies are connected by common directors, as well as which companies are co-directed by particular directors.

As with the emergent social positioning network maps, if particular directors have particular corporate interests, we may be able to identify particular organisational groupings in corporate sprawls made up from dozens of operating companies working across a range of business areas.

Page 23: Mapping Corporate Networks With OpenCorporates

Company Records on OpenCorporates

Page 24: Mapping Corporate Networks With OpenCorporates

One possible source of open company information is OpenCorporates.

OpenCorporates’ ambitious aim is to mint a unique corporate identifier for every corporate legal entity in the world [CHECK], as well as collating and normalising (or “harmonising”) information about company filings, trademarks, patents(?) and officers (that is, company directors, company secretaries and so on).

For GB registered companies, there is a growing repository of data relating to company directorships, which provides us with an opportunity to develop maps that show how companies are connected by virtue of having common directors.

Page 25: Mapping Corporate Networks With OpenCorporates

Subsidiary Companies have “working” directors

Page 26: Mapping Corporate Networks With OpenCorporates

Just a note: in my experience of looking at data for GB registered companies, the directors of the “top” (nominal) company in a large multinational grouping are “atypical” compared to the officers appointed to UK based operating companies in the same corporate sprawl. They tend to be appointed from the great and the good, or from senior officers who do not take directorships across operating divisions or companies, rather than being directors of operating companies themselves.

When seeding corporate sprawl trawlers – algorithms that try to identify companies that make up a corporate sprawl based on co-directorships – my experience suggests that it often makes sense to seed the search with one or more operating companies whose directors are likely to be directors of other operating companies, rather than with the “top level” company.

Page 27: Mapping Corporate Networks With OpenCorporates

Co-Director Mapping

More Graphs

Page 28: Mapping Corporate Networks With OpenCorporates

We can reuse the ideas that underpin the construction of the emergent social positioning graph to map out corporate structures based on director information.

Page 29: Mapping Corporate Networks With OpenCorporates

Director Records on OpenCorporates

Page 30: Mapping Corporate Networks With OpenCorporates

As well as corporate information pages, OpenCorporates maintains information pages about directorial appointments. At the moment, there are no authority files providing identifiers that pick out the same physical person: each directorial appointment to a company gives the director a unique officer ID. It is possible to search for officers of other companies with the same name as a particular director, but there are no identifiers that link them as the same physical person. (That said, there does appear to be a slot in the metadata for authoritative identifiers.)
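To make the consequence concrete, here is a minimal sketch of treating "same name" as a stand-in for "same person". The record layout (`officer_id`, `name`, `company`) and the sample values are hypothetical, not the OpenCorporates schema, and the case and whitespace normalisation is a small relaxation of a strict exact-name match.

```python
from collections import defaultdict

# Hypothetical officer records: one record (and one officer ID) per
# appointment, so the same person can appear under several IDs.
officers = [
    {"officer_id": 101, "name": "JANE A SMITH", "company": "C1"},
    {"officer_id": 202, "name": "Jane A Smith", "company": "C2"},
    {"officer_id": 303, "name": "John Jones", "company": "C2"},
]

def name_key(name):
    """Normalise case and whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())

def group_by_name(records):
    """Group appointments by normalised name (a proxy for 'same person')."""
    groups = defaultdict(list)
    for rec in records:
        groups[name_key(rec["name"])].append(rec)
    return dict(groups)

by_name = group_by_name(officers)
companies_for_jane = sorted(r["company"] for r in by_name["jane a smith"])
```

Name matching cuts both ways: it can merge two different people who share a name, and it can miss one person whose name is recorded differently on different filings.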

Page 31: Mapping Corporate Networks With OpenCorporates

Start With One or More Seed Company

Page 32: Mapping Corporate Networks With OpenCorporates

So how might we go about constructing a corporate sprawl?

Let’s start with one or more seed companies.

Page 33: Mapping Corporate Networks With OpenCorporates

[Diagram: company C1 linked to directors D1 and D2 by edges labelled “Has director”; the slide text also includes the labels “Follows” and “Find Friends of Followers”]

Page 34: Mapping Corporate Networks With OpenCorporates

The general shape of this diagram might remind you of something…?

For each of the seed companies, we grab a list of their directors.

We can use this data to construct a network of people who are directors or other officers of the original seed company or companies.

Page 35: Mapping Corporate Networks With OpenCorporates

Find Directors of Seed Company(s)

Page 36: Mapping Corporate Networks With OpenCorporates

Here’s another way of imagining it – a company surrounded by its directors.

Page 37: Mapping Corporate Networks With OpenCorporates

[Diagram: companies C1 and C2 and directors D1 and D2, joined by edges labelled “Has director” and “Is directed by”; the slide text also includes the labels “Follows” and “Find Friends of Followers”]

Page 38: Mapping Corporate Networks With OpenCorporates

For each of the directors, we run a search for them on OpenCorporates, to see what directorial appointments have been made to other companies for people of exactly the same name.

We can use this data to construct a network of companies directed by the directors of the original seed company.

For those companies that are directed by N or more of the directors associated with the seed company or companies (where N is typically 2) we might now say they are part of the corporate sprawl. The companies sharing fewer than N directors associated with companies admitted to the corporate sprawl are added to a list of possible candidate companies. As we find more directors associated with companies included in the sprawl, we might be able to “legitimise” membership of these companies within the sprawl.
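A minimal sketch of this admission rule follows, under the assumption that the name searches have already been run and their results collected into a dictionary. The `companies_of` data, the company and director labels and the `classify` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical results of name searches: director -> companies they direct.
companies_of = {
    "D1": {"SEED", "C2", "C4"},
    "D2": {"SEED", "C2"},
    "D3": {"SEED", "C5"},
}

def classify(companies_of, sprawl, n=2):
    """Split companies outside the sprawl into admitted and candidate sets."""
    shared = defaultdict(set)  # company -> sprawl directors found on it
    for director, companies in companies_of.items():
        for company in companies:
            if company not in sprawl:
                shared[company].add(director)
    admitted = {c for c, ds in shared.items() if len(ds) >= n}
    candidates = set(shared) - admitted
    return admitted, candidates

admitted, candidates = classify(companies_of, sprawl={"SEED"}, n=2)
```

Here C2 shares two directors with the seed and is admitted, while C4 and C5 share only one each and wait on the candidate list until further directors are found.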

Page 39: Mapping Corporate Networks With OpenCorporates

Find Companies With Two or More Seed Directors

Page 40: Mapping Corporate Networks With OpenCorporates

We now have a larger set of companies, reflecting those companies who share N or more directors with the original seed company or companies.

Page 41: Mapping Corporate Networks With OpenCorporates

[Diagram: companies C1 and C2 and directors D1, D2 and D3, joined by edges labelled “Has director”; the slide text also includes the labels “Follows” and “Find Friends of Followers”]

Page 42: Mapping Corporate Networks With OpenCorporates

If we so decide, we can continue with this snowball discovery process, looking up further directors associated with companies we have included in our sprawl, with a view to trying to discover more companies that should be included in the sprawl.
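The snowball process can be sketched as a loop that keeps admitting companies until a round finds nothing new. This is an illustration under stated assumptions, not the scraper's actual code: the `directors_of` table stands in for live OpenCorporates lookups, and the `max_rounds` cap is an added safeguard against the sprawl growing without limit.

```python
from collections import defaultdict

# Hypothetical lookup tables standing in for OpenCorporates searches.
directors_of = {
    "SEED": {"D1", "D2"},
    "C2": {"D1", "D2", "D3", "D4"},
    "C3": {"D3", "D4"},
    "C4": {"D1"},
}
# Invert to get director -> companies.
companies_of = defaultdict(set)
for company, directors in directors_of.items():
    for director in directors:
        companies_of[director].add(company)

def snowball(seeds, n=2, max_rounds=5):
    """Grow a sprawl from seed companies by repeated co-director lookups."""
    sprawl = set(seeds)
    for _ in range(max_rounds):
        # Directors of every company admitted so far.
        known = set().union(*(directors_of.get(c, set()) for c in sprawl))
        # Count how many known directors sit on each outside company.
        counts = defaultdict(int)
        for director in known:
            for company in companies_of[director] - sprawl:
                counts[company] += 1
        new = {c for c, k in counts.items() if k >= n}
        if not new:
            break  # nothing more to admit
        sprawl |= new
    return sprawl

sprawl = snowball({"SEED"})
```

In this toy data C3 shares no directors with the seed itself, but it is pulled in on the second round through the directors it shares with C2. C4, with a single shared director, never qualifies.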

Page 43: Mapping Corporate Networks With OpenCorporates
Page 44: Mapping Corporate Networks With OpenCorporates

Using this snowball approach, I have constructed a scraper on Scraperwiki that mines OpenCorporates, given one or more seed companies (or seed directors) to map out corporate sprawls, limiting myself to the capture of current directors and active companies registered in the UK.

(The code needs checking and is perhaps not as easy to use as it might be. Developing a more robust and user friendly tool may be worth exploring if this approach is seen to be useful.)

Page 45: Mapping Corporate Networks With OpenCorporates

Companies & Directors

[Diagram: companies C1, C2, C3 and directors D1, D2, D3]

Page 46: Mapping Corporate Networks With OpenCorporates

So – we can generate a network that connects companies with their directors, and grow this network out to identify companies that share several directors.

As with the emergent social positioning map, we can use automatic layout tools to try to position companies and directors close to each other based on their connectivity, producing a map over the corporate sprawl.

Page 47: Mapping Corporate Networks With OpenCorporates

Companies

[Diagram: companies C1, C2, C3]

Page 48: Mapping Corporate Networks With OpenCorporates

We can view this network in various ways. For example, we might choose to view just the companies.

Page 49: Mapping Corporate Networks With OpenCorporates

PageRank

Page 50: Mapping Corporate Networks With OpenCorporates

This map shows companies in a corporate sprawl grown out from Royal Dutch Shell.

Note the presence of BP in there – somehow, these two groupings are connected by shared directorships of some intermediate company.

Page 51: Mapping Corporate Networks With OpenCorporates

Companies & Directors

[Diagram: companies C1, C2, C3 and directors D1, D2, D3]

Page 52: Mapping Corporate Networks With OpenCorporates

One of the nice things about representing this sort of structure in an abstract mathematical or computational way is that we can wrangle it with code...

So for example, companies C1 and C2 are connected by a single shared director, whereas C2 and C3 are connected by two directors.

Page 53: Mapping Corporate Networks With OpenCorporates

Companies Sharing Directors

[Diagram: companies C1, C2, C3]

Page 54: Mapping Corporate Networks With OpenCorporates

We can represent this by transforming the original bipartite graph (one with two types of node), which connects directors to companies and companies to directors, into a graph that just connects companies that share directors.

The thickness of the line (or “edge”) connecting the companies represents its “weight”, which in this case is given by the number of shared directors between connected companies.
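This collapse of a bipartite graph onto one type of node can be sketched in a few lines. The director lists below are an assumption, chosen so that C1 and C2 share one director and C2 and C3 share two, as in the running example; `project` is a hypothetical helper name.

```python
from itertools import combinations

# Assumed director lists, chosen to match the C1/C2/C3 example in the text.
directors_of = {
    "C1": {"D1"},
    "C2": {"D1", "D2", "D3"},
    "C3": {"D2", "D3"},
}

def project(members_of):
    """Collapse a bipartite mapping into weighted edges between its keys."""
    edges = {}
    for a, b in combinations(sorted(members_of), 2):
        weight = len(members_of[a] & members_of[b])
        if weight:  # only connect nodes that share at least one member
            edges[(a, b)] = weight
    return edges

company_edges = project(directors_of)
```

Comparing every pair of companies is fine for a sprawl of a few hundred companies; a graph library such as Networkx would be a more natural fit for anything larger.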

Page 55: Mapping Corporate Networks With OpenCorporates

Companies Sharing Two or More Directors

[Diagram: companies C2, C3]

Page 56: Mapping Corporate Networks With OpenCorporates

We can also filter the graph, for example by adding together the weights of all the edges incident on a node, and throwing away all nodes for whom this sum is below a specified threshold value.

We might alternatively prune the network by removing (“cutting”) all edges below a specified weight, and then throwing away nodes that aren’t connected to other nodes. (For example, we might remove connections between companies that only share a single director, and then throw away companies that aren’t connected to any other companies. Which is to say, we cut out companies that don’t share two or more directors with any other single company. When you start working with graphs, you begin to realise quite how beautiful, and powerful, a way they are for working with data elements that are related to each other in some way.)
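Both filters can be sketched over a plain dictionary of weighted edges. The edge data and thresholds here are invented for illustration. Because the graph is stored purely as edges, a node that loses all its edges simply disappears, which takes care of the "throw away unconnected nodes" step.

```python
from collections import defaultdict

# Hypothetical weighted company graph: (company, company) -> shared directors.
edges = {("C1", "C2"): 1, ("C2", "C3"): 2, ("C3", "C4"): 1, ("C4", "C5"): 3}

def filter_by_strength(edges, threshold):
    """Drop nodes whose summed incident edge weight is below threshold."""
    strength = defaultdict(int)
    for (a, b), w in edges.items():
        strength[a] += w
        strength[b] += w
    keep = {node for node, s in strength.items() if s >= threshold}
    return {(a, b): w for (a, b), w in edges.items() if a in keep and b in keep}

def cut_light_edges(edges, min_weight):
    """Drop edges below min_weight; nodes left with no edges vanish too."""
    return {pair: w for pair, w in edges.items() if w >= min_weight}

strong = filter_by_strength(edges, threshold=3)
heavy = cut_light_edges(edges, min_weight=2)
```

Note that `filter_by_strength` makes a single pass: it does not recompute the sums after nodes have been removed, which is a design choice worth revisiting if a stricter filter is wanted.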

Page 57: Mapping Corporate Networks With OpenCorporates

PageRank

Page 58: Mapping Corporate Networks With OpenCorporates

Here’s an example of the Shell corporate sprawl with the directors removed and edges connecting companies that share two or more directors. The labels are sized relative to the PageRank score of each node, which is a measure of how well connected the node is in the graph (the “importance” of each node is dependent on the “importance” of the nodes connected to it…).
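For readers who want to see what lies behind a PageRank score, here is a small power-iteration sketch for an undirected, weighted graph. It is not necessarily how the scores on the slide were computed; the damping factor of 0.85 is the conventional default, and the edge data is made up.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected, weighted edge dict."""
    # Build a symmetric adjacency map: node -> {neighbour: weight}.
    adj = {}
    for (a, b), w in edges.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        new = {node: (1.0 - damping) / n for node in adj}
        for node, nbrs in adj.items():
            total = sum(nbrs.values())
            for nbr, w in nbrs.items():
                # Each node shares its score among neighbours by edge weight.
                new[nbr] += damping * rank[node] * w / total
        rank = new
    return rank

# Hypothetical company graph: edge weight = number of shared directors.
edges = {("C1", "C2"): 2, ("C2", "C3"): 2, ("C2", "C4"): 2}
scores = pagerank(edges)
top = max(scores, key=scores.get)
```

In this star-shaped example the hub C2 comes out on top, and the three outer companies receive identical scores.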

The lines also provide a background that highlights the connectivity – and structure – of the corporate elements.

Page 59: Mapping Corporate Networks With OpenCorporates

Betweenness

Page 60: Mapping Corporate Networks With OpenCorporates

In this view, I have resized the labels based on the betweenness centrality of each node. This network statistic highlights nodes that play an important role in connecting clusters or groupings of nodes. So for example, we see the suggestion that The Consolidated Petroleum Company and Shell Mex and BP Limited may be the companies that connect the Shell sprawl to the BP one.
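The sketch below computes unnormalised betweenness centrality using Brandes' algorithm on an unweighted graph. It is offered as an illustration of the statistic rather than as the method used to produce the slide, and the two-cluster example graph is hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes) for an undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each pair was counted from both ends, so halve the totals.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical graph: two triangles joined through a single bridge company.
adj = {
    "A1": {"A2", "A3"}, "A2": {"A1", "A3"}, "A3": {"A1", "A2", "BRIDGE"},
    "BRIDGE": {"A3", "B1"},
    "B1": {"BRIDGE", "B2", "B3"}, "B2": {"B1", "B3"}, "B3": {"B1", "B2"},
}
scores = betweenness(adj)
```

The bridge node scores highest because every shortest path between the two triangles runs through it, which is the same reasoning that picks out the companies linking one sprawl to another.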

Page 61: Mapping Corporate Networks With OpenCorporates

Betweenness (repositioned)

Page 62: Mapping Corporate Networks With OpenCorporates

This is just a tweaking of the layout of the previous graph to try to highlight the separation of the different clusters.

Page 63: Mapping Corporate Networks With OpenCorporates

Companies & Directors

[Diagram: companies C1, C2, C3 and directors D1, D2, D3]

Page 64: Mapping Corporate Networks With OpenCorporates

Just as we collapsed the network to show how companies could be linked directly by virtue of co-directorships, so we can collapse the network to show how directors are connected.

For example, director D1 is connected by a single shared company to directors D2 and D3, whereas D2 and D3 are connected by two companies.
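The director-to-director view can be built by walking each company's board and counting every pair of directors found together. The director lists are an assumption, chosen to reproduce the D1/D2/D3 example; `co_director_edges` is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import combinations

# Assumed director lists, chosen to match the D1/D2/D3 example in the text.
directors_of = {
    "C1": {"D1"},
    "C2": {"D1", "D2", "D3"},
    "C3": {"D2", "D3"},
}

def co_director_edges(directors_of):
    """Count, for each pair of directors, the companies they share."""
    weights = defaultdict(int)
    for directors in directors_of.values():
        for pair in combinations(sorted(directors), 2):
            weights[pair] += 1  # one more company in common
    return dict(weights)

director_edges = co_director_edges(directors_of)
```

Counting pairs board by board avoids comparing every director with every other director, which helps when there are many directors but boards are small.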

Page 65: Mapping Corporate Networks With OpenCorporates

Co-Directors

[Diagram: directors D1, D2, D3]

Page 66: Mapping Corporate Networks With OpenCorporates

Once again, we use line thickness (that is, edge weight) to denote how heavily connected directors are.

Page 67: Mapping Corporate Networks With OpenCorporates

PageRank

Page 68: Mapping Corporate Networks With OpenCorporates

Here’s a view over connected directors in the Shell corporate sprawl.

Page 69: Mapping Corporate Networks With OpenCorporates

[Workflow diagram with the labels: OpenCorporates, Scraperwiki db, JSON, D3.js, Networkx, Gexf, Gephi, sigma.js]

Page 70: Mapping Corporate Networks With OpenCorporates

As to how we get those graphs plotted? I built a crude workflow in Scraperwiki that gets data out of the scraped database and into a form that allows it to be visualised using the Gephi desktop tool or in a web page using different Javascript libraries (sigma.js or d3.js).
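As a sketch of the JSON end of such a workflow, the snippet below turns a weighted edge list into a nodes-and-links structure. This layout, with index-based `source` and `target` fields, is one shape commonly used with d3.js force layouts; the field names are an assumption and not necessarily what the original workflow emitted.

```python
import json

def to_node_link(edges):
    """Convert a weighted edge dict to a nodes/links structure for JSON."""
    names = sorted({name for pair in edges for name in pair})
    index = {name: i for i, name in enumerate(names)}
    return {
        "nodes": [{"name": name} for name in names],
        "links": [
            {"source": index[a], "target": index[b], "value": w}
            for (a, b), w in sorted(edges.items())
        ],
    }

# Hypothetical company graph: edge weight = number of shared directors.
edges = {("C1", "C2"): 1, ("C2", "C3"): 2}
payload = json.dumps(to_node_link(edges), indent=2)
```

For Gephi, a GEXF file would play the same role; Networkx can write that format directly.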

Page 71: Mapping Corporate Networks With OpenCorporates
Page 72: Mapping Corporate Networks With OpenCorporates

This is Gephi – a cross-platform desktop tool that’s great for generating effective network visualisations. I have some tutorials and sample datasets if anyone wants to give it a whirl…

Page 73: Mapping Corporate Networks With OpenCorporates

“Where” Next…?

- geocode registered addresses
- explore non-GB registered companies

Page 74: Mapping Corporate Networks With OpenCorporates

So where can we take the OpenCorporates data next?

I have a couple of ideas:

- we can go spatial in a geographical sense and start to geocode the registered addresses of companies, to see whether any of them are located in offshore tax havens, for example, or to see whether there are different registered addresses that might lead us to yet more companies (by virtue of sharing common registered office addresses, rather than co-directors, for example);
- we could start trying to tie non-GB registered companies into the mix. At the moment, director information for other territories is sparse – might there be some other way we can look for connections?
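Before any geocoding, companies can already be linked by shared registered address using simple text normalisation. The sketch below is a crude illustration: the company names and addresses are invented, and real address matching would need far more care (abbreviations, postcodes, formatting variants).

```python
from collections import defaultdict
import re

# Hypothetical company records with made-up registered addresses.
companies = [
    {"name": "Alpha Ltd", "address": "1 High St., London, EC1A 1AA"},
    {"name": "Beta Ltd", "address": "1 HIGH ST LONDON EC1A 1AA"},
    {"name": "Gamma Ltd", "address": "22 Quay Road, Douglas, IM1 1AA"},
]

def address_key(address):
    """Crude normalisation: lower-case, strip punctuation, squeeze spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(cleaned.split())

def shared_addresses(records):
    """Return normalised addresses used by more than one company."""
    groups = defaultdict(list)
    for rec in records:
        groups[address_key(rec["address"])].append(rec["name"])
    return {addr: names for addr, names in groups.items() if len(names) > 1}

shared = shared_addresses(companies)
```

Companies found this way could be fed back into the sprawl as candidates, in the same way as companies that share only one director.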

Page 75: Mapping Corporate Networks With OpenCorporates

And “When”?

- company timelines (set-up dates, renaming)
- explore director timelines (by company)
- explore director timelines (by director)

Page 76: Mapping Corporate Networks With OpenCorporates

Another approach might be to start analysing corporate sprawls in a time dimension. There are several opportunities here:

- if we have access to company formation and dissolution dates, we can map out a timeline of a corporate sprawl, which might reveal how companies change name, directorship or association with other companies;
- if we get all the director information associated with a company, we can visualise how director appointments and terminations occurred across one or more companies, which might in turn reveal identifiable “features” that we might be able to associate with news or business restructuring events;
- if we track down companies a particular director appears to be associated with, we can start to develop “career timelines” of directors, showing how they have been associated with different corporate groupings over time (and maybe the odd company on the side…)
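Two of these timeline views can be sketched from a flat list of appointments with start and end dates. The record layout and the dates are hypothetical; an `end` of `None` is used here to mean the appointment is still current.

```python
from datetime import date

# Hypothetical appointment records; end=None means still serving.
appointments = [
    {"director": "D1", "company": "C1", "start": date(2005, 1, 1), "end": date(2009, 6, 30)},
    {"director": "D1", "company": "C2", "start": date(2008, 3, 1), "end": None},
    {"director": "D2", "company": "C1", "start": date(2007, 5, 1), "end": None},
]

def career_timeline(records, director):
    """Return one director's appointments ordered by start date."""
    own = [r for r in records if r["director"] == director]
    return sorted(own, key=lambda r: r["start"])

def serving_on(records, company, day):
    """Return directors in post at a company on a given day."""
    return sorted(
        r["director"]
        for r in records
        if r["company"] == company
        and r["start"] <= day
        and (r["end"] is None or day <= r["end"])
    )

d1_career = [r["company"] for r in career_timeline(appointments, "D1")]
board_2008 = serving_on(appointments, "C1", date(2008, 1, 1))
board_2010 = serving_on(appointments, "C1", date(2010, 1, 1))
```

Comparing board snapshots at different dates is one simple way to spot the appointment and termination "features" mentioned above.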

Page 77: Mapping Corporate Networks With OpenCorporates

Linking out and in

- linking companies or directors with external datasets

Page 78: Mapping Corporate Networks With OpenCorporates

Whilst it is possible to generate insight from the analysis of data that is contained just within OpenCorporates, there are likely to be many opportunities for using OpenCorporates to annotate other datasets, or for using external datasets to annotate OpenCorporates data.

Page 79: Mapping Corporate Networks With OpenCorporates

Sankey Flow Diagrams

Page 80: Mapping Corporate Networks With OpenCorporates

As this example starts to explore, we might try to reconcile company names as recorded in local spending data records with corporate entities identified within OpenCorporates to build up a better picture of how money flows into corporate sprawls.
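One simple way to approach this kind of reconciliation is fuzzy string matching on normalised names, sketched below with the standard library's `difflib`. The company names, the suffix table and the 0.8 cutoff are all illustrative assumptions; a dedicated reconciliation service would do better on real data.

```python
import difflib

# Hypothetical registered names and payee names from spending records.
registered = ["ACME SERVICES LIMITED", "BORCHESTER WATER PLC"]
payees = ["Acme Services Ltd", "Borchester Water", "Unknown Supplier"]

SUFFIXES = {"LIMITED": "LTD", "PUBLIC LIMITED COMPANY": "PLC"}

def normalise(name):
    """Upper-case, strip punctuation and standardise common suffixes."""
    cleaned = "".join(ch for ch in name.upper() if ch.isalnum() or ch == " ")
    cleaned = " ".join(cleaned.split())
    for long_form, short_form in SUFFIXES.items():
        if cleaned.endswith(long_form):
            cleaned = cleaned[: -len(long_form)] + short_form
    return cleaned

def reconcile(payee, registered, cutoff=0.8):
    """Return the best-matching registered name, or None below the cutoff."""
    lookup = {normalise(r): r for r in registered}
    hits = difflib.get_close_matches(normalise(payee), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None

matches = {p: reconcile(p, registered) for p in payees}
```

Any match made this way is only a suggestion and would need checking by hand before money flows are attributed to a particular company.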

On a lobbying front, we might look for mentions of meetings between government officials and company officers, and then try to make mappings between government departments and operational areas of a corporate sprawl, and so on.

Page 81: Mapping Corporate Networks With OpenCorporates

What do you think?

Page 82: Mapping Corporate Networks With OpenCorporates

[ This is part of an ongoing informal exploration of the patterns and structures we can find across large open datasets.

For more information, follow:

- blog.ouseful.info
- @psychemedia

All comments welcome. ]