The dynamical structure of political corruption networks

Luiz GA Alves 1,*, Haroldo V Ribeiro 2, Alvaro F Martins 2, Ervin K Lenzi 3, Matjaž Perc 4,5

1 University of São Paulo, 2 State University of Maringá, 3 State University of Ponta Grossa, 4 University of Maribor, 5 Complexity Science Hub

Abstract

Corruptive behaviour in politics limits economic growth, embezzles public funds and promotes socio-economic inequality in modern democracies. We analyse well-documented political corruption scandals in Brazil over the past 27 years, focusing on the dynamical structure of networks where two individuals are connected if they were involved in the same scandal. Our research reveals that corruption runs in small groups that rarely comprise more than eight people, in networks that have hubs and a modular structure that encompasses more than one corruption scandal. We observe abrupt changes in the size of the largest connected component and in the degree distribution, which are due to the coalescence of different modules when new scandals come to light or when governments change. We show further that the dynamical structure of political corruption networks can be used for successfully predicting partners in future scandals. We discuss the important role of network science in detecting and mitigating political corruption.
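The network definition used in the abstract (two individuals are connected if they were involved in the same scandal) can be illustrated with a minimal sketch. The scandal names, the people labels and the helper functions `build_network` and `largest_component_fraction` below are hypothetical and chosen only for illustration; they are not the authors' code or data.

```python
from collections import defaultdict
from itertools import combinations


def build_network(scandals):
    """Return adjacency sets: two people are linked if they share a scandal."""
    adjacency = defaultdict(set)
    for people in scandals.values():
        unique_people = sorted(set(people))
        for person in unique_people:
            adjacency[person]  # register people who appear alone in a scandal
        for a, b in combinations(unique_people, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def largest_component_fraction(adjacency):
    """Fraction of all nodes that belong to the largest connected component."""
    seen = set()
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best / len(adjacency) if adjacency else 0.0


# Toy input (hypothetical): scandal -> people involved.
toy_scandals = {
    "scandal_1": ["A", "B", "C"],
    "scandal_2": ["C", "D"],
    "scandal_3": ["E", "F"],
}
network = build_network(toy_scandals)
degrees = {person: len(neighbours) for person, neighbours in network.items()}
```

In this toy example person C takes part in two scandals and therefore joins them into a single component of four people out of six, which is the kind of coalescence the abstract refers to.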
[Figure 1 panels. A: number of people versus corruption case. B: cumulative distribution versus number of people. C: number of people versus year (1986 to 2016). D: autocorrelation versus lag (years). Values annotated in the panels: 7.51 ± 0.03 and 1.2 ± 0.4.]

Corruption cases listed on the axis of panel A: Caso Banespa (1987); Ferrovia Norte-Sul (1987); CPI da Corrupção (1988); Máfia da Previdência (1991); Caso Collor (1992); Anões do Orçamento (1993); Paubrasil (1993); Escândalo da Parabólica (1994); Pasta rosa (1995); Compra de votos para a reeleição (1997); Escândalo das privatizações (1997); Frangogate (1997); Precatórios (1997); Dossiê Cayman (1998); Grampos do BNDES (1998); Máfia dos fiscais (1998); Caso Marka/FonteCindam (1999); Desvios de Verbas do TRT-SP (1999); Garotinho e a turma do Chuvisco (2000); Sudam (2001); Violação do painel do Senado (2001); Bunker petista (2002); Caso Celso Daniel (2002); Caso Lunus (2002); CPI do Banestado (2003); Operação Anaconda (2003); Banpará (2004); Caso GTech (2004); Caso Waldomiro Diniz (2004); Superfaturamento de obras em SP (2004); Corrupção nos Correios (2005); Dólares na cueca (2005); Escândalo do Mensalão (2005); República de Ribeirão (2005); Valerioduto mineiro (2005); Aloprados (2006); Escândalo dos sanguessugas (2006); Quebra de sigilo do caseiro Francenildo (2006); Cheque da Gol (2007); Renangate Caso do laranjal alagoano (2007); Renangate Caso Mônica Veloso (2007); Renangate Caso Schincariol (2007); Renangate Golpe no INSS (2007); Cartões corporativos (2008); Caso Satiagraha (2008); Dossiê contra FHC e Ruth Cardoso (2008); Paulinho da Força e o BNDES (2008); Atos Secretos (2009); Caso Lina Vieira (2009); Mensalão do DEM (2009); Bancoop (2010); Caso Erenice (2010); Os novos aloprados (2010); Escândalo em Cidades (2011); Escândalo na Agricultura (2011); Escândalo no Esporte (2011); Escândalo nos Transportes (2011); Escândalo no Trabalho (2011); Escândalo no Turismo (2011); Caso Cachoeira (2012); Escândalo na Pesca (2012); Operação Porto Seguro (2012); Máfia do ISS (2013); Operação Lava Jato (2014); Petrolão (2014).

Figure 1: Demography and evolving behaviour of corruption scandals in Brazil. A) The number of people involved in each corruption scandal in chronological order (from 1987 to 2014). B) Cumulative probability distribution (on a log-linear scale) of the number of people involved in each corruption scandal (red circles). C) Time series of the number of people involved in corruption scandals by year (red circles). The alternating gray shades indicate the term of each general election that took place in Brazil between 1987 and 2017. D) Autocorrelation function of the time series of the yearly number of people involved in scandals (red circles).

Figure 2: Complex network representation of people involved in corruption scandals. Complex network of people involved in all corruption cases in our dataset (from 1987 to 2014). Each vertex represents a person and the edges among them occur when two individuals appear (at least once) in the same corruption scandal. Node sizes are proportional to their degrees and the colour code refers to the modular structure of the network. There are 27 significant modules, and 14 of them are within the giant component (indicated by the red dashed loop).

Figure 3: Characterization of nodes based on the within-module degree (Z) and participation coefficient (P). Each dot in the Z-P plane corresponds to a person in the network and the different shaded regions indicate the different roles according to the network cartography proposed by Guimerà and Amaral. The majority of nodes (97.5%) are classified as ultraperipheral (R1) or peripheral (R2), and the remaining are non-hub connectors (R3, three nodes), provincial hubs (R5, three nodes), and connector hubs (R6, two nodes).
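The two quantities plotted in Figure 3 can be computed directly from a network and a module assignment. The sketch below uses the usual definitions from the Guimerà and Amaral cartography: Z is the within-module degree of a node expressed as a z-score relative to the other nodes of its module, and P = 1 - sum over modules s of (k_s / k)^2, where k_s is the number of links from the node to module s and k is its total degree. The function name `z_and_p`, the toy network and the use of the population standard deviation are assumptions made for illustration, not the authors' implementation.

```python
from math import sqrt


def z_and_p(adjacency, module_of):
    """Return {node: (Z, P)} for an undirected network given as adjacency sets."""
    # Within-module degree: number of neighbours in the node's own module.
    k_within = {
        node: sum(1 for nb in neighbours if module_of[nb] == module_of[node])
        for node, neighbours in adjacency.items()
    }

    # Mean and standard deviation of the within-module degree, per module.
    # Assumption: population standard deviation (divide by n, not n - 1).
    values_by_module = {}
    for node, k in k_within.items():
        values_by_module.setdefault(module_of[node], []).append(k)
    stats = {}
    for module, values in values_by_module.items():
        mean = sum(values) / len(values)
        std = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        stats[module] = (mean, std)

    result = {}
    for node, neighbours in adjacency.items():
        mean, std = stats[module_of[node]]
        # Z is set to 0 when all nodes of the module have the same degree.
        z = (k_within[node] - mean) / std if std > 0 else 0.0

        # Count the node's links to each module, then apply the P formula.
        links_to_module = {}
        for nb in neighbours:
            links_to_module[module_of[nb]] = links_to_module.get(module_of[nb], 0) + 1
        k = len(neighbours)
        p = (1.0 - sum((c / k) ** 2 for c in links_to_module.values())) if k else 0.0
        result[node] = (z, p)
    return result


# Toy network (hypothetical): a star around "h" in module 0, bridged to module 1.
toy_adjacency = {
    "h": {"x1", "x2", "x3", "y"},
    "x1": {"h"},
    "x2": {"h"},
    "x3": {"h"},
    "y": {"h", "w"},
    "w": {"y"},
}
toy_modules = {"h": 0, "x1": 0, "x2": 0, "x3": 0, "y": 1, "w": 1}
roles = z_and_p(toy_adjacency, toy_modules)
```

In the toy network the node "h" has the highest Z in its module and a non-zero P because one of its four links leaves the module, whereas the leaves "x1" to "x3" have P = 0. Assigning the roles R1 to R7 then amounts to applying thresholds in the Z-P plane.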
[Figure 4 panels (A to C). Axes: cumulative distribution versus vertex degree, with curves for 1991, 2003 and 2014; cumulative distribution versus rescaled vertex degree, with a year scale from 1988 to 2012; characteristic degree versus year (1986 to 2016), shown against the presidential terms of Sarney (PMDB), Collor (PRN), Itamar (PMDB), FHC (PSDB), Lula (PT) and Dilma (PT). Values annotated in the panels: 17.6, 7.2 and 2.4; 2.4 ± 0.1, 5.9 ± 0.8 and 17.9 ± 0.9.]

Figure 4: The vertex degree distribution is exponential, invariant over time, and the characteristic degree exhibits abrupt changes over the years.

[Figure 5 panels (A to C). Axes: size of the largest cluster versus year and growth rate of the largest cluster versus year (1986 to 2016), both shown against the presidential terms of Sarney (PMDB), Collor (PRN), Itamar (PMDB), FHC (PSDB), Lula (PT) and Dilma (PT). Network snapshots for 1991 and 1992, 1997 and 1998, and 2004 and 2005, with modules labelled by scandal name and new nodes highlighted.]

Figure 5: Changes in the size of the largest component of the corruption network over time are caused by a coalescence of network modules.

[Figure 6. Bar plot of the fraction of correct predictions (axis from 0.00 to 0.25) for each predictor: SimRank; rooted PageRank; resource allocation; Jaccard; cosine; association strength; degree product; Adamic-Adar, common neighbours, HRV and Katz (grouped under one label); random.]

Figure 6: Predicting missing links between people in the corruption network may be useful for investigating and mitigating political corruption. We tested the predictive power of eleven methods for predicting missing links in the corruption networks. These methods are based on local similarity measures (degree product, association strength, cosine, Jaccard, resource allocation, Adamic-Adar, and common neighbours), global (path- and random walk-based) similarity measures (rooted PageRank and SimRank), and on the hierarchical structure of networks (hierarchical random graph, HRV). To assess the accuracy of these methods, we applied each algorithm to snapshots of the corruption network in a given year (excluding 2014), ranked the top-10 predictions, and verified whether these predictions appear in future snapshots of the network. The bar plot shows the fraction of correct predictions for each method. We also included the predictions of a random model where missing links are predicted by chance (error bars are 95% bootstrap confidence intervals).

Reference

• HV Ribeiro, LGA Alves, AF Martins, EK Lenzi, M Perc, The dynamical structure of political corruption networks, Journal of Complex Networks CNY002, 1-15 (2018).

* [email protected]
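The evaluation procedure described for Figure 6 (score unconnected pairs on a snapshot, keep the top-ranked pairs, and check whether they appear in a later snapshot) can be sketched for two of the local similarity measures named there. The toy snapshots and the function names below are hypothetical and the code is a minimal illustration under those assumptions, not the authors' pipeline; the Jaccard and resource allocation scores follow their standard definitions.

```python
from itertools import combinations


def jaccard(adjacency, a, b):
    """Shared neighbours divided by the size of the combined neighbourhood."""
    union = adjacency[a] | adjacency[b]
    return len(adjacency[a] & adjacency[b]) / len(union) if union else 0.0


def resource_allocation(adjacency, a, b):
    """Sum of 1/degree over the neighbours shared by a and b."""
    return sum(1.0 / len(adjacency[c]) for c in adjacency[a] & adjacency[b])


def top_k_predictions(adjacency, score, k):
    """Rank all currently unconnected pairs by score and keep the best k."""
    candidates = [
        (a, b)
        for a, b in combinations(sorted(adjacency), 2)
        if b not in adjacency[a]
    ]
    candidates.sort(key=lambda pair: score(adjacency, *pair), reverse=True)
    return candidates[:k]


def fraction_correct(predictions, future_adjacency):
    """Fraction of predicted pairs that are linked in the future snapshot."""
    if not predictions:
        return 0.0
    hits = sum(1 for a, b in predictions if b in future_adjacency.get(a, set()))
    return hits / len(predictions)


# Toy snapshots (hypothetical): the later one gains the link A-D.
snapshot = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
future = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C", "E"},
    "E": {"D"},
}

predictions = top_k_predictions(snapshot, jaccard, k=2)
accuracy = fraction_correct(predictions, future)
```

Here the pair (A, D) shares two neighbours and is ranked first by both measures, so one of the two top predictions is confirmed by the later snapshot. Repeating this for each yearly snapshot and each method would give a bar plot of the kind described for Figure 6.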