Forecasting Formal Employment in Cities

Citation: Lora, Eduardo. “Forecasting Formal Employment in Cities.” CID Research Fellow and Graduate Student Working Paper Series 2019.114, Harvard University, Cambridge, MA, May 2019.

Published Version: https://www.hks.harvard.edu/centers/cid/publications/fellow-graduate-student-working-papers

Permanent link: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37366838

Terms of Use: This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Forecasting Formal Employment in Cities

Eduardo Lora

CID Research Fellow and Graduate Student Working Paper No. 114

May 2019

© Copyright 2019 Lora, Eduardo; and the President and Fellows of Harvard College

Center for International Development at Harvard University, Working Papers


Forecasting formal employment in cities

Eduardo Lora¹
RiSE and Department of Economics, Universidad Eafit, and Center for International Development, Harvard University
May 2019

Abstract
Can “full and productive employment for all” be achieved by 2030 as envisaged by the United Nations Sustainable Development Goals? This paper assesses the issue for the largest 62 Colombian cities using social security administrative records between 2008 and 2015, which show that the larger the city, the higher its formal occupation rate. This is explained by the fact that formal employment creation is restricted by the availability of the diverse skills needed in complex sectors. Since skill accumulation is a gradual path-dependent process, future formal employment by city can be forecast using either ordinary least squares regression results or machine learning algorithms. The results show that the share of working population in formal employment will increase between 13 and nearly 32 percentage points between 2015 and 2030, which is substantial but still insufficient to achieve the goal. Results are broadly consistent across methods for the larger cities, but not the smaller ones. For these, the machine learning method provides nuanced forecasts which may help further explorations into the relation between complexity and formal employment at the city level.

¹ Comments by Mauricio Quiñones are acknowledged.


1. Introduction

United Nations Sustainable Development Goal 8 is “Promote sustained, inclusive and sustainable economic growth, full and productive employment and decent work for all”. More specifically, target 8.5 seeks to “[b]y 2030, achieve full and productive employment and decent work for all women and men, including for young people and persons with disabilities, and equal pay for work of equal value”. This paper assesses how achievable this target is for Colombia, based on a novel theory of formal employment creation in cities and two complementary forecasting methods: standard regressions and machine learning.

Cities are necessary for economic growth to take place through a process of diversification and innovation that leads to productive employment and decent work for larger shares of the population. However, urbanization is not a sufficient condition for industrialization and productive employment: the expected relation between urbanization, industrialization and employment quality is absent in many parts of the world (Gollin, Jedwab and Vollrath, 2016). Urbanization patterns, and not just urbanization rates or macroeconomic factors (such as natural resource abundance), may shed light on the role of cities in economic growth and formal employment creation, as suggested by two strong stylized facts (O’Clery et al 2018): (1) formal occupation rates are more variable across cities within countries than across developing countries (Figure 1), and (2) larger cities create proportionally more formal employment (Figure 2).

Figure 1. Boxplots for the distribution of formal occupation rates in a set of 56 developing countries (left plot) and cities in Brazil, Colombia, Mexico and the US. We observe a larger variance in formality rates across cities within countries than across countries, suggesting that the study of the determinants of formality across cities is a relevant area of study in connection with Sustainable Development Goal 8 (“full and productive employment and decent work for all”). Source: O’Clery et al (2018).


Figure 2. Formality rates increase with working age population for cities across Brazil, Colombia, Mexico and the US: larger cities have disproportionately more workers in the formal sector than smaller cities, a pattern that is statistically significant in the four countries as shown at the bottom of the figures. Source: O’Clery et al (2018).

2. Theoretical framework

One of the central issues in economic development theory is the reason for the size and persistence of informal labor in developing economies. Since formal firms have access to capital and technology that make them more productive than small or family businesses, what explains that large chunks of the labor force are not occupied in the formal sector, where labor conditions are better than in the informal sector? Economic theory has provided several explanations. In dualistic models of informality, the self-employed and their family businesses are fundamentally different from formal firms in the type of human capital they use (mainly uneducated and unproductive entrepreneurs and managers) and in what they produce (mainly low-quality products for low-income customers). The formal and informal sectors coexist because they are different (Lewis 1954, Harris and Todaro 1970, Rauch 1991). An alternative view is that of De Soto (1989, 2000), who considers that informal firms are an untapped reservoir of productive resources held back by government regulations. Relatedly, Levy (2008) sees informal businesses as entrenched firms that survive in spite of their low productivity by avoiding taxes and regulations. Lastly, in labor search models that take into account the costs and benefits of labor regulations, informal employment is not the consequence of exclusion, but the result of labor market frictions between heterogeneous workers and firms (Albrecht, Navarro and Vroman 2009; Bosch and Maloney 2010; Ulyssea 2010; Meghir, Narita and Robin 2015).

While empirical evidence has been provided in support of each of these explanations of informality, none of them recognizes the two stylized facts mentioned in the introduction, namely that formal occupation rates across cities (within a given country) have a larger variance than across countries, and that formal occupation rates are directly and significantly associated with city size. In other words, none of the mainstream theories can explain the role of cities in formal employment creation. Furthermore, some of the main variables put forward by those theories to explain the presence of informality (such as social security regimes and labor hiring and firing legislation) have little or no variance across cities within each country.

In view of these shortcomings, this paper adopts the theoretical framework developed by O’Clery et al (2019), which differs from previous theories in a number of ways. First, it focuses on cities rather than countries, because cities are the actual locations where workers and their employers interact. Second, it emphasizes skill diversity (which is central in urban economics) rather than skill levels, educational attainment or managerial capabilities. Third, it assumes that firms evolve by tinkering with skills because many feasible technologies cannot be known in advance, but need to be discovered. Formal employment creation in cities results from this evolutionary process. In larger cities, firms have better access to the diverse skills they need to produce more sophisticated goods.

The main components of the model can be summarized as follows (for the complete model see O’Clery et al 2018): City size and skill diversity are taken as exogenous. Each firm is located in a city, but sells to the whole national market under perfect competition. The output of firm $r$, which belongs to industry $j$ at time $t$, is given by a CES production function whose only production factors are types of labor differentiated by skill, where skills $k$ are hard to substitute for one another:

$$y_t^r = A_t^j \left[ \sum_{k \in K_t^j} l_t^r(k)^{\rho} \right]^{1/\rho} \qquad (1)$$

Optimal formal labor demand of each skill is given by the solution to the cost minimization problem facing the firm, from which wage by skill is obtained:

𝑊!! = 𝑤!

!! ! ! 𝐶!

! (2)


where $w_0$ is the survival (or minimum) wage of workers with general skills (which are supplied in excess of what firms demand), and where $C_t^j$ reflects the diversity of skills that are needed in industry $j$, and can therefore be interpreted as a measure of the complexity of the industry,

𝐶!! = 𝜃! 𝑘

!! ! !! ! ! (3)

Since $\theta_t(k) \geq 1$ for every skill, industry complexity is larger in industries which combine a larger subset of sophisticated skills.

Finally, firms transition from less to more sophisticated industries following a probabilistic rule such that the conditional expected value of a firm's complexity one period ahead is:

$$E_t\left[ C_{t+1}^{r} \,\middle|\, j_t(r) = j \right] = p\, C_t^j + (1 - p) \underbrace{\sum_{j' \in J} \beta_t(j, j')\, C_t^{j'}}_{\text{complexity potential}} \qquad (4)$$

Due to the definition of $\beta_t(j, j')$, the complexity potential of a given firm depends on (i) the industry to which the firm currently belongs, (ii) the distance between such industry and those industries to which the firm could migrate, (iii) the relative abundance in the local labor market of those new skills which the firm must hire in order to carry out an industry transition, and (iv) the size of the labor force in the city.

Total formal employment at the city level one period ahead ($F_{t+1}$) is thus obtained by aggregating labor demand across skills and firms:

𝐹!!! = 𝑙!!!! 𝑘 ∗!∈!!!! !! = !!!!

!

!!!!! 𝐶!!!!

! !! 𝜃!!! 𝑘

!! ! !

!∈!!!! !! (5)

Aggregate formal employment in a city depends on the current complexity of all its firms, which in turn depends on past complexity potential (equation 4). Therefore, formal employment in period $t+1$ is a function of complexity potential in period $t$:

$$F_{t+1} = f\Big( \underbrace{\sum_{j' \in J} \beta_t(j, j')\, C_t^{j'}}_{\text{complexity potential}} \Big) \qquad (6)$$

Notice that complexity potential is a weighted average of the industry complexity of the missing sectors, with weights given by the skill similarity between those sectors and the ones already present.

In order to operationalize equation (6), data are needed on industry complexity, missing sectors, and skill similarity between all pairs of industries. Since skills are tacit knowledge and therefore unobservable, industry complexity and complexity potential must be computed indirectly. To that end, O’Clery et al (2019) make use of the methodologies developed by Hidalgo and Hausmann (2009) and Neffke and Henning (2013). In essence, industry complexity is a measure of the range of skills needed in an industry, which is obtained from the number of industries present in the cities that have the industry (i.e., those industries that have revealed comparative advantage greater than 1 in the city, based on formal employment shares) and the number of cities that have the industry (i.e., those cities where the industry has revealed comparative advantage greater than 1). Skill similarity between a pair of industries is measured by the relative intensity of the labor flows between the two industries, and missing industries in a city are those with revealed comparative advantage lower than 1 (Appendix 1 provides further details on computation methods).
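The revealed comparative advantage test described above can be sketched as follows. This is a minimal illustration that assumes the standard Balassa-style ratio (the city's employment share in an industry divided by the national share); the function names and the toy employment matrix are hypothetical, and the paper's exact procedure is the one given in its Appendix 1.

```python
def revealed_comparative_advantage(emp):
    """emp[c][j] is formal employment of industry j in city c.

    Returns rca[c][j] = (share of j in city c) / (share of j nationwide).
    Assumes every city and every industry has positive total employment.
    """
    n_cities, n_industries = len(emp), len(emp[0])
    city_total = [sum(row) for row in emp]
    industry_total = [sum(emp[c][j] for c in range(n_cities))
                      for j in range(n_industries)]
    national_total = sum(city_total)
    rca = []
    for c in range(n_cities):
        row = []
        for j in range(n_industries):
            city_share = emp[c][j] / city_total[c]
            national_share = industry_total[j] / national_total
            row.append(city_share / national_share)
        rca.append(row)
    return rca


def missing_industries(rca_row):
    """Industries with RCA below 1 are treated as missing in the city."""
    return [j for j, value in enumerate(rca_row) if value < 1.0]


# Hypothetical toy data: 2 cities, 3 industries.
emp = [
    [80.0, 20.0, 0.0],
    [30.0, 30.0, 40.0],
]
rca = revealed_comparative_advantage(emp)
missing_city_0 = missing_industries(rca[0])
missing_city_1 = missing_industries(rca[1])
```

In the toy data the first city is specialized in industry 0 only, so industries 1 and 2 are its missing sectors, which are the ones that would enter its complexity potential.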

3. Data and empirical definitions

Like in O’Clery et al (2019), I use data for Colombian cities larger than 50,000 inhabitants. My definition of cities rests on the methodology proposed by Duranton (2015) to define metropolitan areas. It consists of adding iteratively a municipality to a metropolitan area if there is a share of workers, above a given threshold, that commute from the municipality to the metropolitan area. Assuming a 10 percent threshold, the methodology generates 19 metropolitan areas that consist of two or more municipalities (comprising a total of 115 municipalities). Since another 43 individual municipalities have populations above 50,000 inhabitants, a total of 62 cities is obtained.
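The iterative rule can be sketched as below. This is a simplified illustration that assumes a known core municipality and a table of commuting flows; the function name, the data layout and the toy numbers are hypothetical and are not taken from Duranton (2015) or from the paper's data.

```python
def grow_metro_area(core, commuters, workers, threshold=0.10):
    """Grow a metropolitan area around `core`.

    commuters[(origin, destination)] is the number of workers living in
    `origin` who work in `destination`; workers[m] is the number of
    resident workers of municipality m. A municipality joins the area when
    the share of its workers commuting to the current area exceeds
    `threshold`. The loop repeats until no municipality is added.
    """
    metro = {core}
    changed = True
    while changed:
        changed = False
        for m in workers:
            if m in metro:
                continue
            flow = sum(commuters.get((m, d), 0) for d in metro)
            if flow / workers[m] > threshold:
                metro.add(m)
                changed = True
    return metro


# Hypothetical toy data.
workers = {"A": 1000, "B": 200, "C": 100, "D": 300}
commuters = {("B", "A"): 30, ("C", "A"): 4, ("C", "B"): 8, ("D", "A"): 10}
metro = grow_metro_area("A", commuters, workers)
```

In the toy data, C sends only 4 percent of its workers to the core A, so it joins the area only once B has been added (4 + 8 = 12 percent), which is why the rule has to be applied iteratively.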

The main data source for the 62 cities is the social security administrative data collected by the Health and Social Security Ministry, known as PILA (Planilla Integrada de Liquidación Laboral). PILA contains information by worker and firm on days of work, sector of activity and municipality.² To aggregate these data, I count the share of the year $t$ that each worker effectively contributed to the social security system through firms per city $c$ per industry $j$ ($emp_{c,j}$). This is the formal employment for a given sector (or for the aggregate of all sectors within a city). Sectors are defined at the 4-digit industry level of the International Standard Industrial Classification (ISIC, revision 3.0).

The formal employment rate in city $c$ in year $t$ ($F_{c,t}$) is defined as formal employment divided by the city-wide population 15 years old or older ($pop_{c,t}$, estimated by DANE):

$$F_{c,t} = \frac{emp_{c,t}}{pop_{c,t}} \qquad (7)$$
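As an illustration of equation (7), the sketch below aggregates worker-level contribution records into full-year-equivalent formal employment and divides by working age population. The record layout, the 360-day year and the numbers are assumptions made for the example, not the PILA format.

```python
from collections import defaultdict


def formal_employment_rate(records, population):
    """records: iterable of (city, days_contributed, days_in_year).

    Each worker counts for the share of the year actually contributed,
    so formal employment is measured in full-year equivalents.
    population[city] is the population aged 15 or older.
    """
    emp = defaultdict(float)
    for city, days, days_in_year in records:
        emp[city] += days / days_in_year
    return {city: emp[city] / population[city] for city in population}


# Hypothetical toy data: three workers in city X, one in city Y.
records = [
    ("X", 360, 360),   # contributed the whole year
    ("X", 180, 360),   # contributed half of the year
    ("X", 90, 360),    # contributed a quarter of the year
    ("Y", 360, 360),
]
population = {"X": 10, "Y": 4}
rates = formal_employment_rate(records, population)
```

Counting shares of the year rather than heads means that a city with many short formal spells does not appear more formal than one with fewer full-year jobs.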

² The datasets have information on age and gender, which we do not use. Unfortunately, they provide no information on education, which prevents us from testing our model predictions vis-à-vis the findings of previous works discussed in the introduction.


The (simple) average formal occupation rate in cities was only 20.3 percent of working age population in 2015, with a relatively large standard deviation (11.1 percentage points). Important changes in urban formal occupation rates occurred between 2008 and 2015: the aggregate formal occupation rate for the 62 cities went up from 21.1 percent to 31.2 percent, with a (simple) average increase across cities of 8.1 percentage points and a standard deviation of 5.4 points. Formal occupation was facilitated by a rate of GDP growth of 4.1 percent, and probably also by the elimination in May of 2013 of payroll taxes representing 5 percent of the wage bill (Kugler, Kugler and Herrera-Prada 2017).

Since the formal employment rate is a variable bounded between 0 and 1, and the aim is to assess how fast it approaches 1, it is transformed to its logistic form, time-differentiated and expressed in annual terms:

$$y_{c,t-i} = \frac{1}{i} \left[ \frac{e^{F_{c,t}}}{1 + e^{F_{c,t}}} - \frac{e^{F_{c,t-i}}}{1 + e^{F_{c,t-i}}} \right] \qquad (8)$$

where $y_{c,t-i}$ will be the dependent variable and the subscript $i$ is the year-interval or number of years for the time-differentiation (which may take values between 1 and 7, given that the data cover an 8-year span). For intuition’s sake, I will refer to the dependent variable as the “annual speed towards full employment”, or “speed”, for short. The independent variables (at time $t-i$) will be complexity potential, $CP_{c,t-i}$, as explained above; the (log of) working age population, $lpop_{c,t-i}$; the logistic of the formal occupation rate, $e^{F_{c,t-i}} / (1 + e^{F_{c,t-i}})$; a dummy for the oil-producing cities (those with more than one oil well per 10,000 inhabitants: Acacías, Arauca, Barrancabermeja, Neiva and Yopal); and a synthetic measure of the exogenous sectoral shocks by city $c$ (following McGuire and Bartik 1991, the so-called Bartik shock measure for city $c$ at time $t$ is a weighted average of the rates of change between $t-i$ and $t$ of formal employment by sector at the national level, excluding city $c$, with weights equal to the employment share of each sector in city $c$ in year $t-i$).³

Two forecasting methods will be used in a complementary way. The first one will be based on ordinary least squares regressions for all the possible time frequencies of the yearly data between 2008 and 2015. After discussing the lack of consistency of some of the coefficients, two regressions are chosen to forecast the dependent variable by city and compute the formality rates by city in 2030. The second method, further explained in section 5, will be a machine learning technique known as “random forest”, by which a set of alternative results are predicted based on combinations of explanatory variables presumably associated with the results (in an unknown non-linear fashion). The two methods are complementary because, while OLS sheds light on the possible influence of each individual variable, its predictions can only be reliable if the coefficients can be consistently estimated and the relation between the dependent and the independent variables (or combinations thereof) is linear and known in advance. These limitations do not apply to machine learning techniques, which are intended to produce reliable predictions using probabilistic methods that make efficient use of all the data that may be relevant.

³ In O’Clery et al (2019) the measure of complexity potential depends on working age population, while here I am taking the latter as a separate explanatory variable. In this way, the relation between both variables can be explored in the machine learning exercises.
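The two constructed variables defined in this section, the speed and the Bartik shock, can be sketched as follows. The sketch assumes that the “logistic form” of the formality rate is $e^{F}/(1+e^{F})$ and that sectoral rates of change are simple growth rates; both are assumptions of this illustration, as are the function names and the toy employment figures.

```python
import math


def logistic(f):
    """Logistic form assumed here: e^F / (1 + e^F)."""
    return math.exp(f) / (1.0 + math.exp(f))


def annual_speed(f_start, f_end, interval):
    """Change in the logistic form of the formality rate per year."""
    return (logistic(f_end) - logistic(f_start)) / interval


def bartik_shock(city, emp_start, emp_end):
    """Weighted average of national sectoral growth, leaving `city` out.

    emp_start[c][s] and emp_end[c][s] are formal employment of sector s in
    city c at the start and end of the interval. Weights are the sector
    shares of the city at the start of the interval.
    """
    city_total = sum(emp_start[city].values())
    shock = 0.0
    for sector, city_emp in emp_start[city].items():
        start = sum(e[sector] for c, e in emp_start.items() if c != city)
        end = sum(e[sector] for c, e in emp_end.items() if c != city)
        growth = (end - start) / start
        shock += (city_emp / city_total) * growth
    return shock


# Hypothetical toy data: three cities, two sectors.
emp_start = {
    "X": {"a": 50.0, "b": 50.0},
    "Y": {"a": 100.0, "b": 200.0},
    "Z": {"a": 100.0, "b": 200.0},
}
emp_end = {
    "X": {"a": 70.0, "b": 40.0},
    "Y": {"a": 110.0, "b": 200.0},
    "Z": {"a": 110.0, "b": 220.0},
}
shock_x = bartik_shock("X", emp_start, emp_end)
speed_example = annual_speed(0.20, 0.28, 7)
```

Leaving the city itself out of the national growth rates is what makes the shock plausibly exogenous to local conditions: the employment changes of city X in the toy data play no role in its own shock.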

4. Regression-based forecasts

Table 1 is a summary of the regressions. Only the 7-year (i.e., the full 2008-2015 period) and 1-year interval regressions are presented (see Appendix 2 for all the intervals). In the upper panel the 62 observations correspond to the number of cities because there is only one period. In the two other panels, the number of observations is 434, since there are 7 one-year periods (434 = 62 x 7).


Table 1. Regressions of speed towards full formal employment on complexity potential and other controls
(Pooled ordinary least squares for different intervals, with year dummies)

Full 7-year period                          Coefficient   Standard error   t statistic   P>|t|
Complexity potential at t-7 (log)            0.003043      0.0007914        3.85         0
Working age population at t-7 (log)         -0.0006131     0.0003166       -1.94         0.058
Formality rate at t-7 (logistic)             0.1132962     0.046996         2.41         0.019
Oil producing city                           0.0037701     0.0007497        5.03         0
Bartik shock between t-7 and t              -0.0419715     0.0237082       -1.77         0.082
Constant                                    -0.0388139     0.0235932       -1.65         0.106
Number of obs = 62; Adj R-squared = 0.5891

1-year intervals (full specification)       Coefficient   Standard error   t statistic   P>|t|
Complexity potential at t-1 (log)            0.0033963     0.0006686        5.08         0
Working age population at t-1 (log)         -0.0006598     0.0002322       -2.84         0.005
Formality rate at t-1 (logistic)            -0.0272684     0.0122967       -2.22         0.027
Oil producing city                           0.0016853     0.0005864        2.87         0.004
Bartik shock between t-1 and t               0.2048173     0.0303162        6.76         0
Constant                                     0.0329708     0.0071898        4.59         0
Year dummies: F(6,422) = 5.841, P = 0
Number of obs = 434; Adjusted R-squared = 0.5020

1-year intervals (simplified specification) Coefficient   Standard error   t statistic   P>|t|
Complexity potential at t-1 (log)            0.0030968     0.0003987        7.77         0
Oil producing city                           0.0036794     0.0005224        7.04         0
Constant                                     0.0118205     0.0011629       10.16         0
Year dummies: F(6,422) = 36.571, P = 0
Number of obs = 434; Adjusted R-squared = 0.4331

The interpretation of the coefficients is not straightforward because of the way the dependent variable is defined. However, it is clear that although all the explanatory variables are significantly associated with the speed towards full employment (at the 10 percent level or better), some change sign between the 7-year and the 1-year full specification (upper and middle panels). This suggests that their relation with the dependent variable is not adequately captured: there may be important interactions between the explanatory variables or dynamic issues that are ignored in the specification adopted. Since both the number of cities and the number of periods are small, not much can be done to overcome these problems with standard econometrics. As we will see, machine learning techniques are able to deal with these limitations.

The lower panel shows a simplified version of the 1-year interval regression, which only includes the explanatory variables that are significantly and consistently directly or inversely associated with the dependent variable. Those are just complexity potential and the dummy variable for oil-producing cities.

I use the coefficients of the middle- and lower-panel regressions to forecast formal employment in 2030, with the following additional assumptions and methods:

• Complexity potential by city is assumed constant at the 2015 values
• Working age population by city is projected at the same growth rate observed between 2008 and 2015
• Formality rate at t-1 (logistic) by city is calculated recursively with the forecast of the dependent variable for the previous year
• Oil producing city dummy is kept unchanged throughout the forecast period
• Bartik shock by city is assumed constant at the mean of the yearly data for 2008-2015.

The results appear in Figures 3-5 (and Appendices 2 and 3). A brief summary is in order. Figure 3 shows that formality rates will increase throughout the whole sample of cities and forecast options: all cities will advance towards the full-formal employment target. However, it is unclear whether formality rates will tend to converge. In the full specification, formality rates tend to converge (because all increase by about the same amount), but in the simplified specification they tend to diverge (increases are proportional to the initial values). Also, with the full specification, formal employment rates in many cities will be above 0.6, and even 0.8 in 2030, suggesting that “full and productive employment and decent work for all women and men” may be within reach. But in the simplified specification, only a handful of cities will get that high.
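The recursive forecasting steps listed above can be sketched as follows. The coefficients are those of the middle and lower panels of Table 1; everything else is an assumption of this illustration: the logistic form $e^{F}/(1+e^{F})$, the omission of the year dummies (whose values are not reported), and the hypothetical city inputs. The output is therefore illustrative and does not reproduce the paper's forecasts.

```python
import math

# Coefficients copied from Table 1 (1-year intervals); year dummies omitted.
FULL = {"const": 0.0329708, "log_cp": 0.0033963, "log_pop": -0.0006598,
        "lag": -0.0272684, "oil": 0.0016853, "bartik": 0.2048173}
SIMPLIFIED = {"const": 0.0118205, "log_cp": 0.0030968, "log_pop": 0.0,
              "lag": 0.0, "oil": 0.0036794, "bartik": 0.0}


def logistic(f):
    """Logistic form assumed here: e^F / (1 + e^F)."""
    return math.exp(f) / (1.0 + math.exp(f))


def inverse_logistic(v):
    """Recover F from its logistic form (requires 0 < v < 1)."""
    return math.log(v / (1.0 - v))


def forecast_formality(f_2015, log_cp, log_pop, pop_growth, oil, bartik,
                       coef, years=15):
    """Iterate the fitted speed equation one year at a time.

    log_cp, oil and bartik are held constant; log population grows by
    pop_growth (log points) per year; the lagged logistic formality rate
    is updated with the previous year's forecast.
    """
    level = logistic(f_2015)
    for _ in range(years):
        speed = (coef["const"] + coef["log_cp"] * log_cp
                 + coef["log_pop"] * log_pop + coef["lag"] * level
                 + coef["oil"] * oil + coef["bartik"] * bartik)
        level += speed
        log_pop += pop_growth
    return inverse_logistic(level)


# Hypothetical non-oil city: formality 0.22, log complexity potential -2.5,
# log population 12, population growth 1% a year, mean Bartik shock 0.05.
simplified_2030 = forecast_formality(0.22, -2.5, 12.0, 0.01, 0, 0.05, SIMPLIFIED)
full_2030 = forecast_formality(0.22, -2.5, 12.0, 0.01, 0, 0.05, FULL)
```

In the simplified specification the speed is constant for each city, because its only regressors are held fixed, whereas in the full specification the negative coefficient on the lagged formality rate makes the speed decline as formality rises.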


Figure 3. Formality rates forecasts by city. Regression-based forecasts of formal employment by city show that all cities would move towards full employment. In the simplified specification, formal employment rates tend to diverge.
[Figure 3: plot of the formality rate in 2030 against the formality rate in 2015, by city, for the full specification and the simplified specification.]

Figure 4. Projected formal employment growth rates and city size. Regression-based forecasts of formal employment growth rates by city show substantially more dispersion in growth rates and between the two specifications among the smaller cities.
[Figure 4: plot of the annual rate of formal employment growth 2015-2030 against population in 2015 (log), by city, for the full specification and the simplified specification. Labelled cities: Medellín Met, Rionegro Met, Barranquilla Met, Bogotá Met, Cartagena Met, Tunja Met, Duitama Met, Sogamoso Met, Manizales Met, Girardot Met, Villavicencio Met, Pasto Met, Ipiales Met, Cúcuta Met, Armenia Met, Pereira Met, Bucaramanga Met, Cali Met, Tuluá Met.]


Figure 4 makes clear that the differences between the two forecasts are strongly related to city size: while for the smaller cities the rates of employment growth can differ by more than 10 percentage points, for the largest cities the differences are negligible (the figure shows the names of the multi-municipality cities only, most of which are also the largest cities).

Figure 5. Projected formal employment growth rates and initial complexity potential. Regression-based forecasts of formal employment growth rates by city show high dispersion among cities whose initial complexity potential is low.
[Figure 5: plot of the annual rate of formal employment growth 2015-2030 against complexity potential in 2015 (logs), by city, for the full specification and the simplified specification, with the multi-municipality cities labelled.]

Although the theoretical framework emphasizes the importance of complexity potential, it may not be the unique factor influencing the forecasts, as suggested by Figure 3: with the full specification, which includes other variables, many of the low-complexity cities show high formal employment rates, which is not apparent in the simplified specification. In the latter, the fastest growing cities have medium levels of initial complexity potential.

To conclude the presentation of the regression-based forecasts, Table 3 shows the aggregates of the most relevant results. In 2015, the formal employment rate in the urban areas was 34 percent of the population in working age, and the average across cities 22 percent. Remember that our definition of formal employment takes into account the actual number of weeks of work of every employee. From this basis, the formal employment rate will probably reach between 63 and 66 percent in 2030, and the simple average will be between 43 and 59 percent, depending on the regression specification on which the forecasts are based. While formal employment in the 62 cities grew 8 percent per year between 2008 and 2015 (or 10.5 percent on average), it will probably slow down to a rate of growth of about 6 percent in the future (or between 7 and 10 percent on average). This is due to the fact that the largest cities will see more modest rates of formal employment growth.

While these results suggest that the choice of specification does not make much of a difference for the (weighted) aggregate of the 62 cities, this is certainly not the case for the simple averages or for the individual cities, as we have seen. That is where the machine learning techniques may be more adequate.

Table 3. Regression-based forecasts for the aggregate of the 62 cities

                                  Current   Projected (2030)      Projected (2030)
                                            Full specification    Simplified specification
Formal employment rate
  Weighted average                34.3%     66.1%                 62.5%
  Simple average                  22.0%     59.0%                 43.0%
Formal employment growth rate
  Weighted average                7.7%      6.3%                  5.9%
  Simple average                  10.5%     10.0%                 6.8%

5. Machine learning forecasts

Machine learning is a type of artificial intelligence used to predict outcomes from input data without explicitly specifying the relation between the outcomes and the input data. The algorithms used in machine learning are able to discover the patterns in the data that best fit the outcomes, without any theory or model that relates the outcomes and the inputs. I will use the machine learning technique known as “random forest”, which is typically applied to predicting categories of an outcome by fitting randomly constructed decision trees to random subsets of the data. A decision tree is simply a step-by-step process to decide which category something belongs to. It should be noted that there are two types of randomness in random forests. One is the random selection of the data in each subset and the other is the random branching or splitting of the inputs in the subset. The two types of randomness are ways to prevent overfitting and determine how reliable the predictions are (for an intuitive introduction to random forests see Hartshorn, 2016).

Several decisions must be made to apply the random forest technique. Basically:

• Outcome categories must be defined. In our case, the outcome is the dependent variable defined in equation (8) and the categories will be its quartiles. Since I use the 434 observations of the 1-year intervals (as in the middle and lower panel regressions in Table 1), each quartile contains 108 or 109 observations. The program’s objective will be to predict the category to which each observation belongs.
• Input data must be selected: I will use the same set of explanatory variables as in the “full specification” (listed in the middle panel of Table 1). Since I want to make predictions of the outcome categories for 2030, I also include the input data for that year (the same used in the regression-based forecasts).
• Input data categories: although it is not strictly necessary to “discretize” the input data, it improves the reliability of the results when the number of observations is small, as in our case. I have constructed deciles of each variable for the 434 observations between 2008 and 2015, except the dummy for oil producing cities. I then applied the categorization criteria to the 62 observations of the 2030 input data.
• Number of trees, or simulations: 1000.
• Other: although many features of the program may be modified, I have used the default options in the Stata program for random forests.

The prediction scores are summarized in Table 4. The “success rate” for the whole sample is 78 percent, that is, the percentage of outcomes predicted in the correct outcome category (listed in the first column). The success rates of each of the categories range between 86 percent for category 1 (slowest speed of formal employment change) and 72 percent for category 4 (fastest). Keep in mind that, since there are four categories, the expected success rate of a completely random prediction would be 25 percent in each category (and therefore in the total as well).

The success rate should not be confused with the probability that the category predicted for an individual outcome is the correct one. Since each of the 434 individual outcomes will enter in many of the simulations (more exactly 63.2 percent of the simulations, see Hartshorn, 2016), the program computes the percentage of those cases in which it has made the correct prediction. The last column of Table 4 shows that, on average, that probability is 44 percent (and very similar for each of the categories).
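Two of the numbers above can be checked with a short sketch: the quartile sizes of 108 or 109 observations, and the 63.2 percent share of simulations in which a given observation takes part. The sketch assumes that each tree is grown on a bootstrap sample drawn with replacement and of the same size as the data, which is the usual random forest setup but is not confirmed here for the Stata program used; the function names and the toy speeds are hypothetical.

```python
def quantile_categories(values, n_bins):
    """Assign categories 1..n_bins by rank, with bins as equal as possible."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    categories = [0] * n
    for rank, i in enumerate(order):
        categories[i] = rank * n_bins // n + 1
    return categories


def inclusion_probability(n):
    """Chance that one observation appears in a bootstrap sample of size n.

    Each of the n draws misses the observation with probability 1 - 1/n,
    so it is left out of the whole sample with probability (1 - 1/n)^n,
    which tends to 1/e as n grows.
    """
    return 1.0 - (1.0 - 1.0 / n) ** n


# Hypothetical speeds for 434 city-year observations (all distinct).
speeds = [0.001 * i for i in range(434)]
categories = quantile_categories(speeds, 4)
quartile_sizes = [categories.count(c) for c in (1, 2, 3, 4)]
share_of_trees = inclusion_probability(434)
```

The same ranking function with `n_bins=10` would give the deciles used to discretize the input variables.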


Table 4. Score summary of machine learning predictions

Annual speed towards full        Falsely     Correctly   Total number   Success   Mean probability of the
employment category              predicted   predicted   of cases       rate      correct predictions
1 = Less than 0.05 pp            15          94          109            86%       46%
2 = Between 0.05 and 0.28 pp     28          80          108            74%       40%
3 = Between 0.28 and 0.54 pp     24          85          109            78%       42%
4 = More than 0.54 pp            30          78          108            72%       48%
Total                            97          337         434            78%       44%
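The success rates in Table 4 follow directly from the counts of falsely and correctly predicted cases. A minimal check, with the counts copied from the table and hypothetical variable names:

```python
# category: (falsely predicted, correctly predicted), from Table 4
table4 = {1: (15, 94), 2: (28, 80), 3: (24, 85), 4: (30, 78)}

# Success rate = correct predictions / all cases in the category.
success_rate = {cat: correct / (false + correct)
                for cat, (false, correct) in table4.items()}

total_correct = sum(correct for _, correct in table4.values())
total_cases = sum(false + correct for false, correct in table4.values())
overall_success_rate = total_correct / total_cases

# Benchmark: a purely random guess among four equally likely categories.
random_baseline = 1 / len(table4)
```

Rounded to whole percentages, these reproduce the success rate column of the table, against a random benchmark of 25 percent.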

Table 5 presents a summary of prediction scores for a selection of cities (all of them multi-municipality cities). For three of those, random forest predicts correctly the speed category every year between 2008 and 2015. Although the probability of each of those individual events is moderate (again, around 44 percent), the consistency of the prediction suggests, for instance, that it is highly reliable that Barranquilla and Rionegro belong to speed category 3, while Ipiales belongs to speed category 1. At the bottom of the table is Bogotá, with only three correct predictions that it belongs to category 4 (the fastest).

Table 5. Score of past formal employment change predictions by machine learning, selected cities

City                 Number of correct        Median growth group    Mean probability of belonging
                     predictions 2008-2015    predicted 2008-2015    to growth group 2008-2015
                     (out of 7)
Barranquilla Met     7                        3                      48%
Rionegro Met         7                        3                      44%
Ipiales Met          7                        1                      43%
Villavicencio Met    6                        4                      47%
Cúcuta Met           6                        3                      47%
Armenia Met          6                        3                      41%
Pereira Met          5                        4                      49%
Tunja Met            5                        4                      45%
Duitama Met          5                        3                      45%
Sogamoso Met         5                        3                      40%
Girardot Met         5                        2                      39%
Tuluá Met            5                        1                      38%
Cartagena Met        4                        3.5                    51%
Manizales Met        4                        4                      49%
Medellín Met         4                        4                      48%
Cali Met             4                        3.5                    44%
Bucaramanga Met      4                        4                      43%
Bogotá Met           3                        4                      45%

The objective of the exercise is to forecast the speed category of each city in the future. A summary of the results for the same selection of cities is presented in Table 6.

Table 6. Future formal employment change group predicted by machine learning
(groups of formal employment rate change: 1 = Less than 0.05 pp; 2 = Between 0.05 and 0.28 pp; 3 = Between 0.28 and 0.54 pp; 4 = More than 0.54 pp)

City                 Growth group predicted   Probability of belonging to group
Manizales Met        4                        55%
Pereira Met          4                        55%
Tunja Met            4                        51%
Medellín Met         4                        50%
Bogotá Met           4                        48%
Cali Met             4                        45%
Bucaramanga Met      4                        43%
Villavicencio Met    4                        42%
Armenia Met          4                        39%
Rionegro Met         4                        37%
Cúcuta Met           3                        59%
Barranquilla Met     3                        51%
Sogamoso Met         3                        40%
Tuluá Met            3                        38%
Cartagena Met        3                        36%
Duitama Met          2                        44%
Girardot Met         2                        31%
Ipiales Met          1                        41%

Most of the large cities belong to the fastest category of formal employment growth in the future, which in many cases differs from the past, as we will see below. The probability of that event is relatively high for some of those cities. Only three of the multi-municipality cities are classified in the slower categories. Appendix 6, which presents the complete list of cities, shows that 18 cities are classified in the slowest category, and in some cases with high probabilities. Most of those are small cities.


How different are these machine learning forecasts from the regression-based ones and the past records of the cities presented in the previous section? Table 7 focuses again on the same selection of cities, and complete results can be seen in Appendix 7. As the last column of the table indicates, in only a handful of the cities (Tunja, Manizales, Villavicencio and Pereira) do the three classifications coincide. This strongly suggests that the cities belong to the fastest group, where they are consistently classified. The machine-learning based forecasts are less optimistic than the ones based on the simplified regression (or the ones based on the full specification regression, which are all category 4 and not included in the table), but more optimistic than what a simple extrapolation of the past would suggest.

Table 7. Comparison of regression and machine-learning predictions of future formal employment change

(groups of formal employment rate change: 1 = Less than 0.05 pp; 2 = Between 0.05 and 0.28 pp; 3 = Between 0.28 and 0.54 pp; 4 = More than 0.54 pp)

City                         2008-2015 median   Regression-based (simplified specification)   Machine-learning based   Number of same categories
Tunja Met                    4                  4                                             4                        3
Manizales Met                4                  4                                             4                        3
Villavicencio Met            4                  4                                             4                        3
Pereira Met                  4                  4                                             4                        3
Medellín Met                 3                  4                                             4                        1
Rionegro Met                 3                  4                                             4                        1
Bogotá Met                   3                  4                                             4                        1
Armenia Met                  3                  4                                             4                        1
Bucaramanga Met              3                  4                                             4                        1
Cali Met                     3                  4                                             4                        1
Barranquilla Met             3                  4                                             3                        1
Cartagena Met                3                  4                                             3                        1
Sogamoso Met                 3                  4                                             3                        1
Pasto Met                    3                  4                                             3                        1
Cúcuta Met                   3                  4                                             3                        1
Tuluá Met                    1                  4                                             3                        0
Girardot Met                 3                  3                                             2                        1
Pamplona                     2                  3                                             2                        1
Duitama Met                  3                  4                                             2                        0
Ipiales Met                  1                  3                                             1                        1
Averages and percent same    3.0                3.9                                           3.3                      14%
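The last column of Table 7 appears to count how many of the three possible pairs of classifications coincide, which is the breakdown shown column by column in Appendix 7. A minimal sketch of that count follows; the function name is illustrative, and the three example rows are copied from Table 7.

```python
from itertools import combinations

def count_pairwise_agreements(past_median, regression, machine_learning):
    """Count how many of the three pairs of classifications coincide (0, 1 or 3)."""
    groups = (past_median, regression, machine_learning)
    return sum(a == b for a, b in combinations(groups, 2))

# Rows from Table 7: (2008-2015 median, regression-based, machine-learning based)
examples = {
    "Tunja Met": (4, 4, 4),
    "Medellín Met": (3, 4, 4),
    "Tuluá Met": (1, 4, 3),
}
agreements = {city: count_pairwise_agreements(*g) for city, g in examples.items()}
print(agreements)
```

When all three classifications agree, all three pairs agree, so the count jumps from 1 to 3 and a value of 2 cannot occur.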


In order to compare the forecasts for 2030 by the different methods, the category predictions by machine learning must be converted into formal employment growth rates and then extrapolated to 2030. To that end, I assume that the value of the dependent variable (speed) in each category exactly corresponds to the median of the category, which I then use to make the calculations. Figure 6 compares the forecasts by the three methods of the formality rates in 2030. Notice that the machine learning forecasts form four straight lines: each one of them corresponds to a speed category, given that I have used the same speed for all the cities in each category. As already mentioned, the machine learning predictions are less optimistic than the regression-based ones. Furthermore, for the cities classified in category 1 (slowest speed), formality rates will not change, according to the machine-learning forecast. Although most of these cities initially have low formality rates, two of them have initial formality rates about the average (Barrancabermeja and Buga) and one of them starts from a very high formality rate (Yopal).
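The conversion from a predicted category to a 2030 formality rate can be sketched as follows. This is only an illustration under stated assumptions: speed is read as the annual change in the formality rate, the extrapolation is linear over the 15 years from 2015 to 2030, and the median speeds in the dictionary are hypothetical placeholders chosen inside the category bounds, not the medians actually used in the paper.

```python
# Hypothetical median speeds per category, expressed as annual change in the
# formality rate (fractions, not percentage points). Placeholder values only.
HYPOTHETICAL_MEDIAN_SPEED = {1: 0.0, 2: 0.0017, 3: 0.0041, 4: 0.0070}

def project_formality_rate(rate_2015, category, years=15,
                           medians=HYPOTHETICAL_MEDIAN_SPEED):
    """Linearly extrapolate a formality rate using the category's median speed."""
    projected = rate_2015 + years * medians[category]
    return min(max(projected, 0.0), 1.0)  # keep the rate inside [0, 1]

# Same starting rate, four categories: four parallel lines when plotted
# against the 2015 rate, one per speed category.
for category in (1, 2, 3, 4):
    print(category, round(project_formality_rate(0.30, category), 4))
```

Because every city in a category receives the same speed, the projected rate depends only on the initial rate and the category, which is why such forecasts line up on one straight line per category.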

Figure 6. Machine-learning based forecasts of formal employment rates are lower and less differentiated by city than those based on regressions.

[Figure 6. Formality rate forecasts by city, regression and machine learning based. Vertical axis: formality rate in 2030. Horizontal axis: formality rate in 2015. Series: machine learning, simplified regression, full regression.]


Figure 7. Machine-learning based forecasts of formal employment growth rates are lower than those based on regressions, especially for many of the smaller cities.

[Figure 7. Projected formal employment growth rates and city size (regression and machine learning forecasts). Vertical axis: annual rate of formal employment growth 2015-2030. Horizontal axis: population in 2015 (log). Series: machine learning, full regression, simplified regression.]

Figure 8. Machine-learning based forecasts of formal employment growth rates are much less dispersed than those based on regressions, especially for many of the smaller cities.

[Figure 8. Formal employment growth rates and complexity (regression and machine learning forecasts). Vertical axis: annual rate of formal employment growth 2015-2030. Horizontal axis: complexity potential in 2015 (log). Series: machine learning, full regression, simplified regression.]


Figure 7 shows that the formal employment growth rates of the three methods are similar for the largest cities but tend to diverge for smaller cities. The same pattern holds in relation to initial complexity potential. Finally, to conclude the presentation of the results, Table 8 compares the aggregates of the 62 cities from the three methods. The formal employment rate for the aggregate, currently 34.3 percent, may reach between 47.9 percent and 66.1 percent, depending on the forecast method (and the simple average between 29.1 and 59.4 percent, starting from 22 percent). While in the period 2008-2015 total formal employment in the 62 cities grew 7.7 percent per annum, it may be expected to grow in the future between 4 and 6.3 percent (simple average between 2.9 and 10 percent, compared with 10.5 percent in the recent past).

Table 8. Forecasts summary for the aggregate of the 62 cities

                                  Current   Projected (2030)
                                            Regression-based,     Regression-based,          Machine-learning
                                            full specification    simplified specification   based
Formal employment rate
  Weighted average                34.3%     66.1%                 62.5%                      47.9%
  Simple average                  22.0%     59.4%                 43.0%                      29.1%
Formal employment growth rate
  Weighted average                7.7%      6.3%                  5.9%                       4.0%
  Simple average                  10.5%     10.0%                 6.8%                       2.9%
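The gap between the weighted and the simple averages in Table 8 reflects the weight of large, more formal cities in the aggregate. A minimal sketch of the two calculations, assuming the weighted average is total formal employment divided by total population across cities; the three cities and their figures are entirely hypothetical.

```python
def weighted_and_simple_average(formal_employment, population):
    """Return (weighted, simple) averages of city formality rates."""
    rates = [e / p for e, p in zip(formal_employment, population)]
    weighted = sum(formal_employment) / sum(population)  # aggregate rate
    simple = sum(rates) / len(rates)                     # each city counts equally
    return weighted, simple

# Hypothetical data: one large, relatively formal city and two small ones
formal = [400_000, 10_000, 5_000]
population = [1_000_000, 100_000, 100_000]
weighted, simple = weighted_and_simple_average(formal, population)
print(round(weighted, 3), round(simple, 3))
```

With these made-up numbers the weighted average exceeds the simple one, the same ordering as in the formal employment rate rows of Table 8.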

6. Discussion

In order to assess these results, it must be recalled that the definition of formal employment used in this paper is not the share of the occupied that had some formal employment or social security in the reference period. With the formal employment criterion used by DANE (employees in establishments of more than 5 workers) and a 3-month (rolling) reference period, the formality rate in 2015 in the 23 largest cities and their metropolitan areas was 50.7 percent. With the social security criterion, it was either 64.6 or 46.8 percent, depending on whether social security affiliation refers to health or pensions. In any of these definitions, there is only one margin through which the formality rate may increase, which is the status (either formal or informal) of the occupied. In my definition, there are four margins, as can be seen in this expression, which is an expansion of equation (7):

$$F_{c,t} = \frac{emp_{c,t}}{pop_{c,t}} = \frac{emp_{c,t}}{workers_{c,t}} \cdot \frac{workers_{c,t}}{occupied_{c,t}} \cdot \frac{occupied_{c,t}}{laborforce_{c,t}} \cdot \frac{laborforce_{c,t}}{pop_{c,t}} \qquad (9)$$

$$F_{c,t} = \frac{emp_{c,t}}{pop_{c,t}} = \text{work intensity rate}_{c,t} \cdot \text{official formality rate}_{c,t} \cdot (1 - \text{unemployment rate}_{c,t}) \cdot \text{participation rate}_{c,t} \qquad (10)$$

where the work intensity rate is the share of the year t that workers on average effectively contribute to the social security system, given my definition of $emp_{c,t}$. My formal employment rate and the official formality rate would move proportionally as long as the three other margins remain unchanged. If so, the official formality rate would go up from a range between 46.8 and 64.6 percent, as we have just seen, to a range between 65.3 percent and 90.2 percent in the machine-learning based forecast. But this conclusion is unwarranted because, although I have not explicitly modelled the three other margins (i.e. the work intensity, the unemployment and the participation margins), they are implicitly considered in the forecasts, and it would not be reasonable to expect substantial increases in the official formality rate without increases in the other rates.

As argued before, the official definitions of (in)formality are not adequate to assess the feasibility of the sustainable development goal of “full and productive employment and decent work for all women and men”. My definition is much better suited to this end. That being so, it is abundantly clear from the forecasts that reaching the full employment goal lies much further in the future than 2030. This does not contradict the finding that, most likely, formality rates will increase in most if not all Colombian cities larger than 50,000 inhabitants. Nor does it deny that the different forecast methods consistently indicate that the formal employment growth rates in the largest cities will be about 5 percent. However, there is much less consistency in the predictions for the mid-size and smaller cities, many of which are not very optimistic.

Given the limitations of the regression-based forecasts, the machine-learning based one should be given serious consideration. The main strength of the latter lies not in its ability to predict aggregates, but in all the nuances it provides with respect to the individual predictions. For some of the smaller cities (such as Carmen de Bolívar and Chiquinquirá), it predicts with confidence that formal employment rates will stagnate at their low initial level, contrary to what the full specification regression would suggest. In other cases (such as Tunja and Popayán), it strongly predicts a fast process of labor formalization, consistent with the still incipient past tendencies, but also with the predictions based on regressions. Yet in others, the predictions not only differ widely across methods, but those by machine learning are statistically weak (Fusagasugá, Tuluá).

As argued in the theoretical section and shown in the regression results, complexity potential is the strongest and most consistent predictor of formal employment rate changes in cities. However, the machine-learning method suggests that the relation between the two variables is less straightforward than assumed in the regression-based methods. Further research is needed to understand how the ability of cities to make use of their skill mix in developing new industries may be affected by urban features such as density, availability of transportation means, women’s access to workplaces, etc.
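The range of 65.3 to 90.2 percent quoted in the Discussion for the official formality rate follows from scaling the official rates by the projected proportional increase in the aggregate formal employment rate. The short check below uses only the rounded figures reported in the text and in Table 8, so it reproduces the range only up to rounding; the variable names are illustrative.

```python
# Aggregate formal employment rate, percent (Table 8, weighted average)
current_rate = 34.3
ml_rate_2030 = 47.9            # machine-learning based projection for 2030

# Official formality rates in 2015 under the social security criterion
official_low, official_high = 46.8, 64.6   # pensions, health

# Proportional movement: holds only if the other three margins stay unchanged
scale = ml_rate_2030 / current_rate
projected_low = official_low * scale
projected_high = official_high * scale
print(round(scale, 3), round(projected_low, 1), round(projected_high, 1))
```

The small discrepancy at the lower end relative to the 65.3 percent in the text is consistent with the inputs here being rounded to one decimal.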


References

Albrecht, James, Lucas Navarro, and Susan Vroman. 2009. “The effects of Labour Market Policies in an Economy with an Informal Sector”. Economic Journal, 119(539): 1105-29.

Bosch, Mariano, and William F. Maloney. 2010. “Comparative analysis of labor market dynamics using Markov processes: An application to informality”. Labour Economics, 17(4): 621-31.

De Soto, Hernando. 1989. The Other Path: The Invisible Revolution in the Third World. New York: Harper and Row.

_______________. 2000. The Mystery of Capital: Why Capitalism Triumphs in the West and Fails Everywhere Else. New York: Basic Books.

Duranton, Gilles. 2015. Delineating Metropolitan Areas: Measuring Spatial Labour Market Networks Through Commuting Patterns. In: Watanabe T., Uesugi I., Ono A. (eds) The Economics of Interfirm Networks. Advances in Japanese Business and Economics, vol 4. Springer, Tokyo.

Gollin, D., Jedwab, R. & Vollrath D., “Urbanization with and without Industrialization”, Journal of Economic Growth (2016) 21: 35. https://doi-org.ezp-prod1.hul.harvard.edu/10.1007/s10887-015-9121-4

Harris, John R., and Michael P. Todaro. 1970. “Migration, Unemployment, and Development: A Two-Sector Analysis.” American Economic Review 60(1): 126–42.

Hartshorn, Scott. Machine Learning With Random Forests And Decision Trees: A Visual Guide For Beginners. Kindle Edition, 2016.

Hidalgo, César and Ricardo Hausmann. 2009. “The Building Blocks of Economic Complexity”, Proceedings of the National Academy of Sciences, 106(26): 10570-5. DOI: 10.1073/pnas.0900943106.

Kugler, Adriana, Maurice D. Kugler and Luis O. Herrera-Prada. 2017. "Do Payroll Tax Breaks Stimulate Formality? Evidence from Colombia’s Reform," Economia, Journal of the Latin American and Caribbean Economic Association, Fall 2017: 3-40.

Levy, Santiago. 2008. Good Intentions, Bad Outcomes: Social Policy, Informality, and Economic Growth in Mexico. Brookings Institution Press.

Lewis, W. Arthur. 1954. “Economic Development with Unlimited Supplies of Labor.” Manchester School of Economic and Social Studies 22(2): 139–91.

McGuire, T.J., Bartik, T.J., 1991. Who Benefits from State and Local Economic Development Policies? JSTOR.

Meghir, Costas, Renata Narita, and Jean-Marc Robin. 2015. “Wages and Informality in Developing Countries”. American Economic Review, 105: 1509-46.

Neffke, Frank and Martin Henning. 2013. “Skill Relatedness and Firm Diversification”, Strategic Management Journal, 34(3): 297-316.

O’Clery, N., Chaparro, J.C., Gómez-Liévano, A., *Lora, E. 2019. “Skill Diversity and the Evolution of Formal Employment in Cities”, submitted to Research Policy.

Rauch, James E. 1991. “Modeling the Informal Sector Formally.” Journal of Development Economics 35(1): 33–47.

Ulyssea, Gabriel. 2010. “Regulation of entry, labor market institutions and the informal sector”. Journal of Development Economics, 91(1): 87-99.


Appendix 1 - Calculation Methods for Industry Complexity

This appendix explains the methods for calculating the industry complexity variable introduced at the end of Section 2. It is adapted from Hidalgo and Hausmann (2009) and Neffke and Henning (2013). The actual calculations used formal employment data of all industries producing either goods or services (ISIC-AC, Rev. 3, at 4 digits, using social security data from PILA). In the equations below, the sub-index c indicates cities and the sub-index p indicates industries. While no time sub-index is used here, all calculations are applied for each year separately (2008-2015).

Calculation of Revealed Comparative Advantages

The computation starts with data for employment by industry, city and year, organized in matrix form:

$$X_{cp}$$

From this matrix, the following aggregates are computed:

$$X_c = \sum_p X_{cp}, \qquad X_p = \sum_c X_{cp}, \qquad X = \sum_c \sum_p X_{cp}$$

These metrics are used to calculate the Revealed Comparative Advantage (RCA) for each city/industry combination:

$$RCA_{cp} = \frac{X_{cp} / X_c}{X_p / X}$$
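The RCA calculation can be sketched in a few lines of plain Python. The employment matrix below is hypothetical (rows are cities, columns are industries), and the sketch assumes every city and every industry has some employment, so no total is zero.

```python
def revealed_comparative_advantage(X):
    """X[c][p] is employment of city c in industry p; returns the RCA matrix."""
    city_totals = [sum(row) for row in X]            # X_c
    industry_totals = [sum(col) for col in zip(*X)]  # X_p
    total = sum(city_totals)                         # X
    return [
        [(X[c][p] / city_totals[c]) / (industry_totals[p] / total)
         for p in range(len(industry_totals))]
        for c in range(len(X))
    ]

# Hypothetical employment counts: 3 cities x 3 industries
X = [
    [80, 20, 0],
    [10, 30, 60],
    [10, 50, 40],
]
rca = revealed_comparative_advantage(X)
for row in rca:
    print([round(value, 2) for value in row])
```

A value above 1 means the industry's share of the city's employment exceeds its share of employment across all cities.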

Diversity and Ubiquity Calculations

The RCA matrix is transformed into a binary matrix depending on whether a particular value is at least 1 or not:


$$M_{cp} = \begin{cases} 1 & \text{if } RCA_{cp} \geq 1 \\ 0 & \text{if } RCA_{cp} < 1 \end{cases}$$

This matrix indicates the industries that are relatively large in each city. It is then used to compute the Diversity indicator at the city level and the Ubiquity indicator at the industry level, that is, the count of the number of industries with relatively large employment for each city, and the count of the cities that have a given industry with a relatively high intensity:

$$k_{c,0} = \sum_p M_{cp}$$

$$k_{p,0} = \sum_c M_{cp}$$
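As an illustration of the two counts, the sketch below binarizes a small hypothetical RCA matrix (rows are cities, columns are industries) and sums it by row and by column. The function names are illustrative.

```python
def binarize(rca):
    """M[c][p] = 1 when RCA is at least 1, else 0."""
    return [[1 if value >= 1 else 0 for value in row] for row in rca]

def diversity_and_ubiquity(M):
    """Return (k_c0, k_p0): row sums per city and column sums per industry."""
    diversity = [sum(row) for row in M]        # industries with RCA >= 1 per city
    ubiquity = [sum(col) for col in zip(*M)]   # cities with RCA >= 1 per industry
    return diversity, ubiquity

# Hypothetical RCA values: 3 cities x 3 industries
rca = [
    [2.4, 0.6, 0.0],
    [0.3, 0.9, 1.8],
    [0.3, 1.5, 1.2],
]
M = binarize(rca)
diversity, ubiquity = diversity_and_ubiquity(M)
print(M, diversity, ubiquity)
```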

Industry Economic Complexity

The complexity of an industry can be measured by its ubiquity weighted by the diversity of the localities that have revealed comparative advantage in such industry. Extending this exercise ad infinitum, correcting diversity with ubiquity and vice-versa with consecutive iterations, is called the method of reflections. It can be expressed as follows:

$$k_{p,n} = \frac{1}{k_{p,0}} \sum_c M_{cp} \frac{1}{k_{c,0}} \sum_{p'} M_{cp'} k_{p',n-2} = \sum_{p'} k_{p',n-2} \sum_c \frac{M_{cp'} M_{cp}}{k_{c,0} k_{p,0}} = \sum_{p'} k_{p',n-2} M^{C}_{p,p'}$$

Where:

$$M^{C}_{p,p'} \equiv \sum_c \frac{M_{cp'} M_{cp}}{k_{c,0} k_{p,0}}$$

Using vector notation, the calculation method can be written in a compact manner as:

$$\mathbf{k}_n = \mathbf{M}^{C} \times \mathbf{k}_{n-2}$$

When $n \to \infty$, the following expression obtains:


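$$\mathbf{M}^{C} \times \mathbf{k} = \lambda \mathbf{k}$$

where $\mathbf{k}$ is an eigenvector of $\mathbf{M}^{C}$. The eigenvector associated with the second largest eigenvalue of $\mathbf{M}^{C}$ is taken as the Industry Complexity Index.

The Index is calculated on employment levels per industry/city combination, including only industries with at least 50 formal employees in an average month, and only cities with at least 10 industries with 50 or more formal employees.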
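The equivalence between the two-step averaging and the compact matrix form of the method of reflections can be checked on a toy example. The sketch below is plain Python on a hypothetical binary matrix in which every city and every industry has at least one entry equal to 1. It stops at one double reflection and does not attempt the eigenvector step, which would normally be delegated to a linear algebra library.

```python
def reflection_matrix(M):
    """Industry-by-industry matrix: sum over c of M[c][p]*M[c][q] / (k_c0 * k_p0)."""
    diversity = [sum(row) for row in M]        # k_c0
    ubiquity = [sum(col) for col in zip(*M)]   # k_p0
    n_cities, n_industries = len(M), len(M[0])
    return [
        [sum(M[c][p] * M[c][q] / (diversity[c] * ubiquity[p])
             for c in range(n_cities))
         for q in range(n_industries)]
        for p in range(n_industries)
    ]

def reflect_twice(M, k_industry):
    """Industry values two reflections later, via the intermediate city averages."""
    diversity = [sum(row) for row in M]
    ubiquity = [sum(col) for col in zip(*M)]
    k_city = [sum(m * k for m, k in zip(row, k_industry)) / diversity[c]
              for c, row in enumerate(M)]
    return [sum(M[c][p] * k_city[c] for c in range(len(M))) / ubiquity[p]
            for p in range(len(M[0]))]

# Hypothetical binary matrix: 3 cities x 3 industries
M = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
]
k0 = [sum(col) for col in zip(*M)]             # ubiquity as the starting vector
MC = reflection_matrix(M)
k2_matrix = [sum(MC[p][q] * k0[q] for q in range(len(k0))) for p in range(len(k0))]
k2_steps = reflect_twice(M, k0)
print(k2_matrix, k2_steps)
```

Each row of the matrix sums to one, so repeated reflections pull all industries toward a common value; this is why the informative ranking is read from the eigenvector tied to the second largest eigenvalue and not from the leading one.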


Appendix 2. Regressions of speed towards full formal employment on complexity potential and other controls

(Pooled ordinary least squares for different intervals, with year dummies)

Full 7-year period                          Coefficient    Standard error   t statistic          P>|t|
Complexity potential at t-7 (log)           0.003043       0.0007914        3.85                 0
Working age population at t-7 (log)         -0.0006131     0.0003166        -1.94                0.058
Formality rate at t-7 (logistic)            0.1132962      0.046996         2.41                 0.019
Oil producing city                          0.0037701      0.0007497        5.03                 0
Bartik shock between t-7 and t              -0.0419715     0.0237082        -1.77                0.082
Constant                                    -0.0388139     0.0235932        -1.65                0.106
Number of observations = 62
Adjusted R-squared = 0.5891

6-year intervals                            Coefficient    Standard error   t statistic          P>|t|
Complexity potential at t-6 (log)           0.0030322      0.0005672        5.35                 0
Working age population at t-6 (log)         -0.0005583     0.0002257        -2.47                0.015
Formality rate at t-6 (logistic)            0.0777203      0.026448         2.94                 0.004
Oil producing city                          0.0034717      0.0005683        6.11                 0
Bartik shock between t-6 and t              -0.0258881     0.0154644        -1.67                0.097
Constant                                    -0.0221075     0.0132719        -1.67                0.098
Year dummies                                                                F(1,117) = 8.784     0.004
Number of observations = 124
Adjusted R-squared = 0.5776

5-year intervals                            Coefficient    Standard error   t statistic          P>|t|
Complexity potential at t-5 (log)           0.0029394      0.000478         6.15                 0
Working age population at t-5 (log)         -0.0004868     0.0001827        -2.66                0.008
Formality rate at t-5 (logistic)            0.0371817      0.0154663        2.4                  0.017
Oil producing city                          0.0027807      0.0004671        5.95                 0
Bartik shock between t-5 and t              -0.0046998     0.0114487        -0.41                0.682
Constant                                    -0.0031799     0.007911         -0.4                 0.688
Year dummies                                                                F(2,178) = 2.3       0.103
Number of observations = 186
Adjusted R-squared = 0.5334

4-year intervals                            Coefficient    Standard error   t statistic          P>|t|
Complexity potential at t-4 (log)           0.0029197      0.0004596        6.35                 0
Working age population at t-4 (log)         -0.0004501     0.0001706        -2.64                0.009
Formality rate at t-4 (logistic)            0.0158137      0.0133056        1.19                 0.236
Oil producing city                          0.0022181      0.0004436        5                    0
Bartik shock between t-4 and t              0.0154851      0.0119289        1.3                  0.195
Constant                                    0.0070455      0.0069345        1.02                 0.311
Year dummies                                                                F(3,239) = 6.548     0
Number of observations = 248
Adjusted R-squared = 0.514

3-year intervals                            Coefficient    Standard error   t statistic          P>|t|
Complexity potential at t-3 (log)           0.0029632      0.0004778        6.2                  0
Working age population at t-3 (log)         -0.0005133     0.0001734        -2.96                0.003
Formality rate at t-3 (logistic)            -0.0015121     0.0120469        -0.13                0.9
Oil producing city                          0.0018829      0.0004502        4.18                 0
Bartik shock between t-3 and t              0.0446801      0.0134568        3.32                 0.001
Constant                                    0.0166122      0.0064531        2.57                 0.011
Year dummies                                                                F(4,300) = 6.922     0
Number of observations = 310
Adjusted R-squared = 0.5149

2-year intervals                            Coefficient    Standard error   t statistic          P>|t|
Complexity potential at t-2 (log)           0.0032913      0.0005331        6.17                 0
Working age population at t-2 (log)         -0.0006558     0.0001903        -3.45                0.001
Formality rate at t-2 (logistic)            -0.0025988     0.0124833        -0.21                0.835
Oil producing city                          0.001869       0.0004873        3.84                 0
Bartik shock between t-2 and t              0.0717888      0.0178002        4.03                 0
Constant                                    0.0199271      0.0068108        2.93                 0.004
Year dummies                                                                F(5,361) = 8.34      0
Number of observations = 372
Adjusted R-squared = 0.5402

1-year intervals (full specification)       Coefficient    Standard error   t statistic          P>|t|
Complexity potential at t-1 (log)           0.0033963      0.0006686        5.08                 0
Working age population at t-1 (log)         -0.0006598     0.0002322        -2.84                0.005
Formality rate at t-1 (logistic)            -0.0272684     0.0122967        -2.22                0.027
Oil producing city                          0.0016853      0.0005864        2.87                 0.004
Bartik shock between t-1 and t              0.2048173      0.0303162        6.76                 0
Constant                                    0.0329708      0.0071898        4.59                 0
Year dummies                                                                F(6,422) = 5.841     0
Number of observations = 434
Adjusted R-squared = 0.5020

1-year intervals (simplified specification) Coefficient    Standard error   t statistic          P>|t|
Complexity potential at t-1 (log)           0.0030968      0.0003987        7.77                 0
Oil producing city                          0.0036794      0.0005224        7.04                 0
Constant                                    0.0118205      0.0011629        10.16                0
Year dummies                                                                F(6,422) = 36.571    0
Number of observations = 434
Adjusted R-squared = 0.4331
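To illustrate how the simplified specification translates complexity potential into a predicted speed, the sketch below plugs the reported 1-year coefficients into a linear prediction. The year-dummy coefficients are not reported in this appendix, so they enter through a hypothetical `year_effect` argument that defaults to zero; the input value of -2.5 for log complexity potential is likewise only an example within the range shown in Figure 8.

```python
# Coefficients from the 1-year simplified specification in Appendix 2
SIMPLIFIED = {
    "constant": 0.0118205,
    "log_complexity_potential": 0.0030968,
    "oil_producing_city": 0.0036794,
}

def predicted_speed(log_complexity_potential, oil_producing_city=False,
                    year_effect=0.0):
    """Linear prediction of speed; year_effect is a placeholder (not reported)."""
    return (SIMPLIFIED["constant"]
            + SIMPLIFIED["log_complexity_potential"] * log_complexity_potential
            + SIMPLIFIED["oil_producing_city"] * int(oil_producing_city)
            + year_effect)

base = predicted_speed(-2.5)
oil = predicted_speed(-2.5, oil_producing_city=True)
print(round(base, 7), round(oil, 7))
```

Under these assumptions an oil producing city with the same complexity potential is predicted to move faster by exactly the oil coefficient.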


Appendix 3. Current and projected formality rates

(ordered by mid projection)

City                       Current (2015)   Projected (2030),     Projected (2030),
                                            full specification    simplified specification
Yopal                      57%              88%                   100%
Medellín Met               44%              73%                   78%
Bogotá Met                 43%              71%                   75%
Bucaramanga Met            42%              71%                   72%
Manizales Met              40%              68%                   62%
Tunja Met                  39%              69%                   59%
Neiva                      39%              75%                   89%
Villavicencio Met          38%              70%                   66%
Popayán                    36%              65%                   57%
Cali Met                   35%              66%                   66%
Pereira Met                35%              69%                   66%
Barrancabermeja            35%              80%                   85%
Acacías                    34%              68%                   74%
Ibagué                     33%              65%                   57%
Guadalajara de Buga        32%              63%                   44%
Santa Marta                31%              63%                   54%
San Andrés                 30%              65%                   48%
Rionegro Met               30%              63%                   54%
Cartagena Met              29%              65%                   58%
Apartadó                   28%              63%                   44%
Valledupar                 28%              59%                   46%
Armenia Met                27%              63%                   53%
Montería                   27%              61%                   49%
Barranquilla Met           25%              60%                   54%
Pasto Met                  25%              60%                   46%
Arauca                     24%              67%                   65%
Duitama Met                24%              66%                   50%
Cúcuta Met                 24%              62%                   52%
Sincelejo                  24%              61%                   46%
Quibdó                     23%              59%                   39%
Palmira                    22%              62%                   46%
Florencia                  20%              61%                   41%
Cartago                    20%              60%                   42%
Sogamoso Met               19%              60%                   42%
Riohacha                   19%              56%                   33%
Girardot Met               19%              55%                   35%
Tuluá Met                  18%              58%                   41%
Aguachica                  16%              56%                   32%
Santander de Quilichao     16%              52%                   26%
Espinal                    16%              55%                   31%
Fusagasugá                 16%              54%                   33%
La Dorada                  15%              56%                   32%
Granada                    15%              52%                   28%
Pamplona                   13%              50%                   18%
Montelíbano                12%              50%                   18%
Fundación                  12%              51%                   21%
Buenaventura               12%              51%                   28%
Ocaña                      12%              52%                   26%
Pitalito                   11%              55%                   32%
Caucasia                   11%              55%                   30%
Chiquinquirá               11%              52%                   23%
Ciénaga                    8%               48%                   17%
Ipiales Met                8%               51%                   24%
Chigorodó                  8%               51%                   21%
Magangué                   7%               48%                   18%
San Andres de Tumaco       7%               48%                   20%
Turbo                      7%               49%                   21%
Cereté                     7%               49%                   18%
Maicao                     6%               49%                   18%
Corozal                    5%               48%                   15%
Lorica                     5%               48%                   19%
El Carmen de Bolívar       3%               43%                   7%

Total 62 cities            34%              66%                   63%
Correlation with past      100%             95%                   95%


Appendix 4. Past and projected formal employment growth rates

(ordered by mid projection)

City                       Past (2008-2015)   Projected (2015-2030),   Projected (2015-2030),
                                              full specification       simplified specification
Fusagasugá                 22%                11%                      8%
Aguachica                  21%                10%                      6%
Magangué                   20%                14%                      7%
Acacías                    19%                8%                       8%
Granada                    18%                11%                      7%
Yopal                      17%                6%                       8%
Ocaña                      16%                12%                      7%
Lorica                     16%                17%                      10%
Quibdó                     16%                7%                       5%
Pitalito                   15%                14%                      10%
Ciénaga                    14%                13%                      6%
Valledupar                 14%                8%                       7%
Villavicencio Met          14%                7%                       7%
Girardot Met               13%                9%                       5%
San Andres de Tumaco       13%                17%                      10%
El Carmen de Bolívar       13%                22%                      9%
Maicao                     12%                17%                      10%
Montería                   12%                8%                       6%
Chiquinquirá               12%                14%                      8%
Sincelejo                  11%                9%                       7%
Pasto Met                  11%                8%                       6%
Caucasia                   11%                15%                      11%
Neiva                      11%                6%                       7%
Pamplona                   11%                11%                      3%
Rionegro Met               10%                7%                       6%
Popayán                    10%                5%                       4%
Ipiales Met                10%                16%                      10%
Arauca                     10%                9%                       8%
Cartagena Met              10%                7%                       6%
Fundación                  10%                11%                      4%
Chigorodó                  10%                17%                      11%
Florencia                  10%                10%                      7%
Bucaramanga Met            10%                5%                       5%
Ibagué                     10%                6%                       5%
Cúcuta Met                 9%                 9%                       7%
Santa Marta                9%                 7%                       6%
Riohacha                   9%                 12%                      8%
Buenaventura               9%                 13%                      9%
Armenia Met                9%                 7%                       5%
Barrancabermeja            9%                 6%                       7%
Corozal                    8%                 17%                      8%
Cereté                     8%                 16%                      8%
San Andrés                 8%                 7%                       4%
Barranquilla Met           8%                 8%                       7%
Santander de Quilichao     8%                 11%                      5%
Tunja Met                  8%                 6%                       5%
Manizales Met              8%                 4%                       4%
Sogamoso Met               7%                 8%                       6%
Espinal                    7%                 9%                       5%
Bogotá Met                 7%                 5%                       6%
Pereira Met                7%                 6%                       5%
Duitama Met                7%                 8%                       6%
La Dorada                  7%                 10%                      6%
Cartago                    7%                 8%                       6%
Cali Met                   7%                 6%                       6%
Apartadó                   6%                 10%                      7%
Montelíbano                6%                 13%                      5%
Medellín Met               6%                 5%                       5%
Turbo                      6%                 18%                      12%
Palmira                    5%                 8%                       6%
Tuluá Met                  3%                 10%                      7%
Guadalajara de Buga        1%                 5%                       2%

Total 62 cities            8%                 6%                       6%
Correlation with past      100%               24%                      27%


Appendix 5. Score of past formal employment change predictions by machine learning

City                       Number of correct predictions   Median growth group     Mean probability of belonging
                           2008-2015 (out of 7)            predicted 2008-2015     to growth group 2008-2015
Yopal                      7                               4                       62%
Neiva                      7                               4                       52%
Barranquilla Met           7                               3                       48%
San Andrés                 7                               3                       45%
Rionegro Met               7                               3                       44%
Ipiales Met                7                               1                       43%
Cartago                    7                               2                       41%
Florencia                  7                               2                       39%
Apartadó                   7                               2                       38%
Chigorodó                  6                               1.5                     53%
El Carmen de Bolívar       6                               1                       52%
Turbo                      6                               1.5                     50%
Villavicencio Met          6                               4                       47%
Cúcuta Met                 6                               3                       47%
Arauca                     6                               3                       46%
Santander de Quilichao     6                               1.5                     46%
Chiquinquirá               6                               1                       46%
Magangué                   6                               2                       45%
Quibdó                     6                               2                       45%
Popayán                    6                               3.5                     45%
Ibagué                     6                               3.5                     44%
Acacías                    6                               4                       43%
Pasto Met                  6                               3                       43%
Montelíbano                6                               1.5                     42%
Guadalajara de Buga        6                               2                       41%
Armenia Met                6                               3                       41%
Valledupar                 6                               2.5                     41%
Pamplona                   6                               2                       40%
Riohacha                   6                               1                       40%
Palmira                    6                               2                       38%
Caucasia                   6                               2                       37%
Barrancabermeja            5                               4                       51%
Lorica                     5                               1                       49%
Cereté                     5                               1                       49%
Pereira Met                5                               4                       49%
Maicao                     5                               1                       48%
Tunja Met                  5                               4                       45%
Duitama Met                5                               3                       45%
San Andres de Tumaco       5                               2                       45%
La Dorada                  5                               2                       45%
Montería                   5                               3                       42%
Buenaventura               5                               1                       40%
Pitalito                   5                               2                       40%
Sogamoso Met               5                               3                       40%
Ocaña                      5                               2                       40%
Girardot Met               5                               2                       39%
Santa Marta                5                               3                       39%
Espinal                    5                               2                       38%
Tuluá Met                  5                               1                       38%
Sincelejo                  5                               2                       38%
Fusagasugá                 5                               3                       37%
Corozal                    4                               1                       52%
Cartagena Met              4                               3.5                     51%
Manizales Met              4                               4                       49%
Medellín Met               4                               4                       48%
Ciénaga                    4                               1.5                     48%
Cali Met                   4                               3.5                     44%
Bucaramanga Met            4                               4                       43%
Granada                    4                               1.5                     40%
Aguachica                  4                               2                       39%
Bogotá Met                 3                               4                       45%
Fundación                  3                               2                       38%
Median                     5.5                             2                       44%


Appendix 6. Future formal employment change group predicted by machine learning

(groups of formal employment rate change: 1 = Less than 0.05 pp; 2 = Between 0.05 and 0.28 pp; 3 = Between 0.28 and 0.54 pp; 4 = More than 0.54 pp)

City                       Growth group predicted   Probability of belonging to group
Popayán                    4                        59%
Manizales Met              4                        55%
Pereira Met                4                        55%
Tunja Met                  4                        51%
Medellín Met               4                        50%
Acacías                    4                        48%
Bogotá Met                 4                        48%
Cali Met                   4                        45%
Bucaramanga Met            4                        43%
Villavicencio Met          4                        42%
Armenia Met                4                        39%
Rionegro Met               4                        37%
Cúcuta Met                 3                        59%
Arauca                     3                        59%
Barranquilla Met           3                        51%
Montería                   3                        49%
San Andrés                 3                        47%
Palmira                    3                        46%
Santander de Quilichao     3                        45%
Aguachica                  3                        44%
Neiva                      3                        43%
Santa Marta                3                        43%
Pasto Met                  3                        43%
Sincelejo                  3                        43%
Ibagué                     3                        40%
Sogamoso Met               3                        40%
Caucasia                   3                        40%
Apartadó                   3                        38%
Tuluá Met                  3                        38%
Cartagena Met              3                        36%
Quibdó                     2                        51%
Chigorodó                  2                        50%
Espinal                    2                        46%
Cartago                    2                        45%
Duitama Met                2                        44%
La Dorada                  2                        44%
Turbo                      2                        43%
Pamplona                   2                        42%
Valledupar                 2                        41%
Montelíbano                2                        39%
Ciénaga                    2                        38%
Granada                    2                        38%
Florencia                  2                        37%
Girardot Met               2                        31%
El Carmen de Bolívar       1                        63%
Cereté                     1                        58%
Chiquinquirá               1                        55%
Maicao                     1                        53%
Corozal                    1                        52%
Magangué                   1                        49%
San Andres de Tumaco       1                        47%
Yopal                      1                        47%
Buenaventura               1                        45%
Lorica                     1                        44%
Ocaña                      1                        44%
Barrancabermeja            1                        43%
Guadalajara de Buga        1                        42%
Ipiales Met                1                        41%
Riohacha                   1                        37%
Pitalito                   1                        35%
Fundación                  1                        34%
Fusagasugá                 1                        30%


Appendix 7. Comparison of regression and machine-learning predictions of future formal employment change

(groups of formal employment rate change: 1 = Less than 0.05 pp; 2 = Between 0.05 and 0.28 pp; 3 = Between 0.28 and 0.54 pp; 4 = More than 0.54 pp)

Columns: (A) 2008-2015 median; (B) Regression-based (simplified specification); (C) Machine-learning based. Same predictions?: Median 2008-2015 and regression-based (A and B); Median 2008-2015 and machine-learning based (A and C); Regression-based and machine-learning based (B and C); Total (out of 3).

City                         A     B     C     A and B   A and C   B and C   Total (out of 3)
Aguachica                    3     3     3     1         1         1         3
Tunja Met                    4     4     4     1         1         1         3
Manizales Met                4     4     4     1         1         1         3
Popayán                      4     4     4     1         1         1         3
Villavicencio Met            4     4     4     1         1         1         3
Acacías                      4     4     4     1         1         1         3
Pereira Met                  4     4     4     1         1         1         3
San Andrés                   3     3     3     1         1         1         3
Florencia                    2     4     2     0         1         0         1
Santander de Quilichao       2     3     3     0         0         1         1
Lorica                       1     3     1     0         1         0         1
Ocaña                        3     3     1     1         0         0         1
Sincelejo                    3     4     3     0         1         0         1
Medellín Met                 3     4     4     0         0         1         1
Apartadó                     2     3     3     0         0         1         1
Chigorodó                    2     3     2     0         1         0         1
Rionegro Met                 3     4     4     0         0         1         1
Turbo                        2     3     2     0         1         0         1
Barranquilla Met             3     4     3     0         1         0         1
Bogotá Met                   3     4     4     0         0         1         1
Cartagena Met                3     4     3     0         1         0         1
El Carmen de Bolívar         1     3     1     0         1         0         1
Chiquinquirá                 1     3     1     0         1         0         1
Sogamoso Met                 3     4     3     0         1         0         1
La Dorada                    2     3     2     0         1         0         1
Montería                     3     4     3     0         1         0         1
Cereté                       1     3     1     0         1         0         1
Montelíbano                  2     3     2     0         1         0         1
Girardot Met                 3     3     2     1         0         0         1
Quibdó                       2     3     2     0         1         0         1
Neiva                        4     4     3     1         0         0         1
Riohacha                     1     3     1     0         1         0         1
Santa Marta                  3     4     3     0         1         0         1
Ciénaga                      2     3     2     0         1         0         1
Granada                      3     3     2     1         0         0         1
Pasto Met                    3     4     3     0         1         0         1
Ipiales Met                  1     3     1     0         1         0         1
Cúcuta Met                   3     4     3     0         1         0         1
Pamplona                     2     3     2     0         1         0         1
Armenia Met                  3     4     4     0         0         1         1
Bucaramanga Met              3     4     4     0         0         1         1
Corozal                      1     3     1     0         1         0         1
Ibagué                       3     4     3     0         1         0         1
Espinal                      2     3     2     0         1         0         1
Cali Met                     3     4     4     0         0         1         1
Cartago                      2     4     2     0         1         0         1
Arauca                       3     4     3     0         1         0         1
Yopal                        4     4     1     1         0         0         1
Duitama Met                  3     4     2     0         0         0         0
Caucasia                     2     4     3     0         0         0         0
Magangué                     2     3     1     0         0         0         0
Valledupar                   3     4     2     0         0         0         0
Fusagasugá                   3     4     1     0         0         0         0
Pitalito                     2     4     1     0         0         0         0
Maicao                       2     3     1     0         0         0         0
Fundación                    2     3     1     0         0         0         0
San Andres de Tumaco         2     3     1     0         0         0         0
Barrancabermeja              3     4     1     0         0         0         0
Buenaventura                 2     3     1     0         0         0         0
Guadalajara de Buga          2     3     1     0         0         0         0
Palmira                      2     4     3     0         0         0         0
Tuluá Met                    1     4     3     0         0         0         0
Averages and percent same    2.5   3.5   2.4   21%       56%       26%       34%
