Essays on ethnicity and marriage in Sub-Saharan Africa

École des Hautes Études en Sciences Sociales

Doctoral school: ED 465 – Économie Panthéon-Sorbonne

Paris School of Economics

Doctorate

Discipline: Economics

Juliette CRESPIN-BOUCAUD

Essays on ethnicity and marriage in Africa

Thesis supervised by Denis COGNEAU

Defense date: 13 September 2021

Referees 1 Catherine GUIRKINGER, Professor at the University of Namur

2 Michael GRIMM, Professor at the University of Passau

Jury 1 Denis COGNEAU, Director of Studies, EHESS; Research Director, IRD; Professor at PSE

2 Valerie GOLAZ, Research Director, INED

3 Catherine GUIRKINGER, Professor at the University of Namur

4 Michael GRIMM, Professor at the University of Passau

5 Oliver VANDEN EYNDE, Research Fellow (Chargé de recherche), CNRS; Professor at PSE

To my grandfathers, Joel and Albert, whose love of faraway destinations and of learning

sowed the seeds of this thesis.

ACKNOWLEDGEMENTS

All the stages of this PhD, from applying for a doctoral scholarship to completing this manuscript, were made possible by the help and support of countless people. Here is some acknowledgment of what has been given to me over these years.

My gratitude goes to Denis Cogneau, my advisor during this PhD. Your rigor, your keen sense of ethics, and your interest in the social sciences are what convinced me to pursue a PhD with you as my advisor, and I have never regretted this choice. The chapters of this thesis owe you a great deal: you were always available to examine results and ideas with me and to reread a new (n-th) version of these chapters. Beyond the “research” side of the PhD, your (sometimes difficult) questions and your (often very apt) advice shaped the way I built myself as a researcher. Finally, through the ups and downs of the PhD, you always found the words to improve both my papers and my morale. Thank you.

I thank Oliver Vanden Eynde, who has followed this PhD from the start. Oliver, I am extremely grateful to have benefited from your sharp insights and your kind words. Your feedback and suggestions always nudged me toward the bigger picture when it was most needed. I thank all the jury members for agreeing to read and review this PhD. I am grateful to Michael Grimm and Catherine Guirkinger for their comments, suggestions, and encouragement. I thank Valerie Golaz for agreeing to examine this PhD.

This PhD would, of course, not have been the same without my co-authors. I was extremely lucky to work with great researchers who also happened to be very different kinds of researchers. First, thank you to Rozenn Hotte, with whom the second chapter of this thesis was written. It was a pleasure to exchange ideas, fieldwork stories, and bits of code to build this project. Thank you for taming my inclination toward exhaustive exactitude (also known as tiny-detail collectionitis) in favor of an overall vision of the project; I know it was not an easy fight. I still have some progress to make on this point and a lot to learn from you, so see you very soon for the next project!

The third chapter of this PhD is joint work with Alexander Moradi and Catherine Boone. Alex, thank you for inviting me for the visiting stay at the University of Sussex that marked the beginning of our collaboration. Thank you for always treating me as a researcher whose inputs were of equal value to yours despite differences in age and status. I have enjoyed exploring Kenyan history in your company. Catherine, your knowledge of Kenya, African politics, and ethnicity is astonishing. This last chapter benefited greatly from your insights. I also thank you for inviting me to join the Spatial Inequalities in African Political Economy Project as an Associate Research Fellow. This position exposed me to new and exciting research questions and responsibilities. It provided both new ideas that improved this PhD and a welcome change from working on it. Finally, thanks for not marrying me off in Kenya despite the appealing marriage proposals; this would probably have marked the end of my PhD.

Public research funding made this thesis possible: it was funded by a doctoral scholarship from ENS Paris-Saclay, EHESS, Université Paris 1, and the LSE. My research and the research stays I undertook were made possible by this funding and by additional funding from the Development group and from PGSE. I thank the administrative teams at PSE: their work made such a welcoming research environment possible. More specifically, thanks to Veronique Guilletin, who resolves the small questions and big problems of PSE's PhD students with kindness and efficiency.

To keep some structure in this acknowledgment section, the following paragraphs are organized according to the distance between my home and the location of people and places. This variable has the property of predicting almost perfectly the likelihood that the events mentioned below took place before March 2020. First, my PhD was enriched by a research stay at the University of Sussex (UK) and by fieldwork trips to Côte d'Ivoire and Kenya. At the University of Sussex, I thank members of the economics department, with special thanks to Panka Bencsik, Annemie Maertens, and Andy McKey. In Côte d'Ivoire, Hughes Kouadio, Albertine Kouadio, and Claire: thank you very much to all three of you!

In Kenya, I benefited from the help and support of the team at IFRA: many thanks to Marie-Aude Fouere, Chloe Josse-Durand, and Marion Asego. Thanks also to the BIEA team: their garden and library provided great places to work. For inviting me to join them during their fieldwork, thanks to Lea Lacan and Miriam Waltz. For their help, support, and great company, thanks to Alex Dyzenhaus, Clarissa Lehne, Riley Linebaugh, and Abla Safir. Thanks to Leigh Gardner, Michael Wahman, Andrew Linke, and Fibian Lukalo: it was a great work trip, and not only because we saw elephants. Special thanks to Francesca di Matteo: thank you for lending me your anthropologist's glasses during this stay in Kenya. Qualitative interviews were only possible because I worked with four fantastic interviewers: Francis Onyango, Jehu Nyawara, Florence Mukami, and Esther Mbua. I am extremely grateful for their help and the care they put into finding the best respondents. I also want to thank all the people who took the time to answer my questions and were willing to trust me and to open up about their lives. Asanteni sana (thank you very much)! I also want to thank all the people with whom I had the opportunity to discuss my work, at conferences, on airplanes, and on Zoom. The interest you showed in my research and your comments and suggestions were extremely helpful.

Second, I benefited from an amazing research environment at PSE. Thanks to all members of the development group at PSE for their insightful comments on my work, all their great presentations at CFDS, and the discussions on Wednesday evenings. Thank you to Sylvie Lambert and Karen Macours for making sure that everything always runs smoothly within the Development group. Thank you, comrade Oscar! Organizing the CFDS with you taught me a lot, and we had much more fun than worries. I particularly thank Sylvie Lambert for her rereadings and comments on all topics related to marriage and divorce. I thank Luc Behaghel, Karen Macours, Akiko Suwa-Eisenmann, and Liam Wren Lewis for their insightful comments on my work. Thanks to the members of the Economic History group for its stimulating yet supportive atmosphere. My PhD was also enriched by my teaching at Paris 1: thank you to my students. I probably learned more from your questions than you did from my answers.

I was extremely lucky to have awesome officemates, not once, but twice. Long live R6-01! To the first Dream Team (Paola, Rozenn, Lisa, Yasmine, and Sarah), thank you for creating such a special atmosphere of kindness and mutual support. I realize how lucky I was to be able to ask you absolutely every question that crossed my mind, especially at the beginning of my PhD. Lisa, thanks for rere(...)rereading my paper on intermarriages; you taught me the basics of “writing econ papers”. Rozenn, I have already mentioned you as a co-author, but I would like to thank you here for your ever-kind friendship, your countless rereadings, whether of this thesis or of applications, and your frankness about the fact that, no, research is not always plain sailing. Your support was invaluable during this PhD.

Many thanks to the G.C. Team (Sarah, Andrea, Charlotte, Duncan, and Zhexun). Despite the pandemic restrictions, we were able to enjoy some glitter & paillettes times together. Your support, ideas, and love of laughter have been a great joy during the past two years. Thanks to the GIS/python/GNU-Linux/let's push the efficiency frontier further crew: Etienne, Andrea, Aaron, Sarah, and Ximena. Sarah, I have already mentioned you three times, and never three without four: this thesis would have been much less fun and pleasant without you. I have followed you, or you have followed me, since the M1, from one building and office to the next, from one Dream Team to the next, all the way to the end of our respective theses. Thank you for sharing my enthusiasm for reproducible research MOOCs, text editors, Guardian articles, book swaps, and Zozo stickers on Signal. Thanks to all my fellow PhD students who have made these years at PSE so lively and happy. Among those I have not mentioned yet: Cristina, Emanuela, Georgia, Giulio, Helene, Ismael, Jonathan, Julieta, Juni, Justine, Kelsey, Malka, Maiting, Marion L, Marion R, Melanie, Paul D-P, Victor C, Victor P, Shaden, Yajna.

Third, a large chunk of this PhD was written at home. While working on a PhD during a global pandemic was at times challenging, I felt lucky to be able to focus (or attempt to focus) on my PhD and to work from home. Thanks to the people with whom I shared daily life during these years: Benjamin, Bridget, Louise, Laureline, and Tote. Thanks to Sam for the logistical and moral support and for being an excellent living-room co-(office)-mate. The successive lockdowns were gentler thanks to you.

Finally (else), thanks to everyone who surrounded, accompanied, fed, and supported me during all these years of the PhD. Thank you to my family for their unfailing support, their sense of humor, and their ability to talk about things other than my thesis. A huge thank you to my parents, who always welcomed and encouraged my curiosity and who passed on to me perseverance and tenacity. All of this, and much more, their love, carried me through this thesis. Thanks to all my friends who have not been mentioned above. A special mention to the S.C. club for the fits of laughter, the vacations (!), and the support. Thanks also to all those with whom I had the joy of dancing, repairing bikes, hiking, or bivouacking, and of sharing weekends and evenings. These moments are what made the work on this thesis both sustainable (the oxygen before diving back in) and enjoyable (the joy of returning to projects and questions after a break).

SUMMARY

This PhD brings together three empirical chapters that are related to either ethnic identity or marital decisions in sub-Saharan Africa.

The first chapter documents the evolution of interethnic and interfaith marriages in 15 countries of sub-Saharan Africa using the Demographic and Health Surveys (DHS). I find that 20.4% of marriages are interethnic and 9.7% are interfaith, indicating that ethnic and religious differences are not always barriers to marriage. Over time, the share of interethnic marriages increased, whereas the share of interfaith marriages decreased. The increase in the share of interethnic marriages can only partly be explained by rising urbanization and education levels. This finding suggests that changes in preferences and social norms may also be at work. The results show that some ethnic boundaries became more porous whereas religious boundaries did not.
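To make the two headline numbers concrete, the sketch below computes an observed intermarriage share from a list of couples and, as one possible benchmark, the share that would be expected if spouses were paired at random given the group distributions of wives and of husbands. The lists of figures and tables refer to “random” intermarriage shares; the exact definition used in the chapter may differ from this one. The function names and the toy data are hypothetical, and the same code applies to religion by replacing ethnic groups with religious affiliations.

```python
from collections import Counter


def observed_share(couples):
    """Share of couples whose spouses report different groups."""
    mixed = sum(1 for wife, husband in couples if wife != husband)
    return mixed / len(couples)


def random_share(couples):
    """Expected share of mixed couples if spouses were paired at random,
    holding fixed the group distributions of wives and of husbands:
    1 - sum over groups g of p_g(wives) * p_g(husbands)."""
    n = len(couples)
    wives = Counter(wife for wife, _ in couples)
    husbands = Counter(husband for _, husband in couples)
    # Counter returns 0 for a group that no husband reports.
    same_group = sum((wives[g] / n) * (husbands[g] / n) for g in wives)
    return 1 - same_group


# Hypothetical toy data: (wife's group, husband's group) for five couples.
couples = [("A", "A"), ("A", "B"), ("B", "B"), ("B", "B"), ("C", "A")]
print(observed_share(couples))  # 2 of the 5 couples are mixed
print(random_share(couples))
```

Comparing the observed share with such a benchmark separates changes in group composition from changes in who marries whom.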

The second chapter provides new evidence on the consequences of parental divorce for children in Africa. Using survey data that collected the detailed life histories of Senegalese women and their children, we investigate how children's educational outcomes are affected by their parents' divorce. With a sibling fixed-effect strategy, we find that siblings who were younger at the time of the divorce were more likely than their older siblings to have attended primary school. This higher level of investment does not persist in the long run. Custody and fostering decisions do not seem to mediate the positive effects on school attendance. Our findings are consistent with either an improvement in the financial situation (due to remarriage) or an increase in the decision-making power of mothers after the divorce.
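The logic of a sibling fixed-effect comparison can be illustrated with a minimal sketch: demeaning the outcome and the regressor within each family removes everything siblings share (same mother, household, and community) and yields the same slope as a regression with one dummy per family. The sketch assumes a single binary regressor x (hypothetically, 1 if the child was young, for instance below school-entry age, at the time of the divorce) and a binary outcome y (ever attended primary school). A full specification would typically also include child-level controls such as sex or birth order, which are omitted here; the function name and data are hypothetical and this is not the chapter's estimation code.

```python
from collections import defaultdict


def sibling_fe_slope(rows):
    """Within-family OLS slope of y on x.

    rows: iterable of (family_id, x, y). Demeaning x and y within each
    family removes anything shared by siblings (the family fixed effect);
    families where x does not vary contribute nothing to the estimate.
    """
    by_family = defaultdict(list)
    for family_id, x, y in rows:
        by_family[family_id].append((x, y))

    num = 0.0
    den = 0.0
    for children in by_family.values():
        mean_x = sum(x for x, _ in children) / len(children)
        mean_y = sum(y for _, y in children) / len(children)
        for x, y in children:
            num += (x - mean_x) * (y - mean_y)
            den += (x - mean_x) ** 2
    if den == 0:
        raise ValueError("no within-family variation in x")
    return num / den


# Hypothetical toy data: (mother id, young at divorce, ever attended primary).
rows = [
    (1, 1, 1), (1, 0, 0),
    (2, 1, 1), (2, 0, 1),
    (3, 0, 0), (3, 0, 1),  # no variation in x: does not identify the slope
]
print(sibling_fe_slope(rows))
```

As the third family shows, only families with siblings of different ages at the divorce contribute to the estimate, which is why the characteristics of these “identifying” families matter for interpretation.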

The third chapter assesses the effects of ethnic homogenization on public good provision by studying a large-scale land reform program that took place in Kenya and led to a significant reduction in ethnic diversity. We implement a spatial regression discontinuity design. We find a strong discontinuity in ethnic diversity but no difference in school provision between program areas and counterfactual areas, in either the short run or the long run. Because individuals were resettled in the program areas, they likely lacked the dense social networks that favor collective action. Our results are not driven by spillovers from treatment to counterfactual areas. A mediation analysis indicates that income effects are unlikely to drive this null result.
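A spatial regression discontinuity design compares locations just inside and just outside the program boundary. The minimal sketch below assumes that the running variable is the signed distance to the boundary (positive inside program areas) and fits separate local linear regressions on each side with a triangular kernel, a common default rather than necessarily the chapter's exact choice; the estimated effect is the gap between the two fitted values at the boundary. The bandwidth h, the function names, and the data are hypothetical, and boundary-segment fixed effects and controls are omitted.

```python
def _local_linear_at_zero(points, h):
    """Weighted least-squares fit of y = a + b*d with triangular kernel
    weights w = 1 - |d|/h; returns the intercept a (the fit at d = 0)."""
    sw = swd = swy = swdd = swdy = 0.0
    for d, y in points:
        w = 1 - abs(d) / h
        sw += w
        swd += w * d
        swy += w * y
        swdd += w * d * d
        swdy += w * d * y
    det = sw * swdd - swd ** 2
    if det == 0:
        raise ValueError("need at least two distinct distances on each side")
    # Intercept from the 2x2 weighted normal equations.
    return (swy * swdd - swd * swdy) / det


def rd_jump(data, h):
    """Sharp RD estimate at the boundary.

    data: (signed distance to boundary in km, outcome), with positive
    distances inside program areas and negative ones outside.
    """
    inside = [(d, y) for d, y in data if 0 <= d < h]
    outside = [(d, y) for d, y in data if -h < d < 0]
    return _local_linear_at_zero(inside, h) - _local_linear_at_zero(outside, h)


# Hypothetical toy data: outcome rises with distance, with a jump of 3 at 0.
data = [(d, (5 if d >= 0 else 2) + 0.5 * d) for d in range(-4, 5)]
print(rd_jump(data, h=5))
```

A null result in this setting corresponds to a jump close to zero in the outcome (for instance, the number of schools) despite a clear jump in ethnic diversity.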

Keywords: Ethnicity, Marriage, Education, Religion, Children, Public Goods, Land Reform, Sub-Saharan Africa.

JEL Classification: H41, I25, J11, J12, J13, J15, Z12, N37, N97, Q15

RESUME

This PhD brings together three empirical chapters that relate either to ethnic identity or to marital decisions in sub-Saharan Africa.

The first chapter documents the evolution of interethnic and interfaith marriages in 15 sub-Saharan African countries using the Demographic and Health Surveys (DHS). Overall, 20.4% of marriages are interethnic and 9.7% are interfaith, which indicates that ethnic and religious differences are not always obstacles between groups. I show that the share of interethnic marriages increased and that the share of interfaith marriages decreased. The increase in the share of interethnic marriages is only partly explained by the rise in urbanization and in average education levels. This suggests that changes in preferences and social norms likely play a role in the rising share of interethnic marriages. These results indicate that some ethnic boundaries have become more porous, but that religious boundaries have not.

The second chapter provides new insights into the consequences of parental divorce for children in Africa. Using the detailed life histories of Senegalese women and their children, we study how children's educational outcomes are affected by their parents' divorce. We use a sibling fixed-effect strategy and find that siblings who were younger at the time of the divorce are more likely than their older siblings to have attended primary school. This higher level of investment does not persist in the long run. Custody and fostering decisions do not seem to explain our results. These results could be explained either by an improvement in the financial situation of remarried mothers or by greater decision-making power of mothers after the divorce.

The third chapter assesses the effects of ethnic homogenization on investment in public goods. To do so, we study a large-scale land reform program that took place in Kenya and led to a significant reduction in ethnic diversity. This natural experiment allows us to use a spatial regression discontinuity design. We find a strong discontinuity in ethnic diversity between program areas and counterfactual areas, but no difference in the number of schools. A mediation analysis indicates that income effects are probably not behind this null result. The results are probably due to the loss of dense social networks caused by migration, rather than to ethnic diversity per se.

Keywords: Ethnicity, Marriage, Education, Religion, Children, Public Goods, Land Reform, Sub-Saharan Africa.

JEL Classification: H41, I25, J11, J12, J13, J15, Z12, N37, N97, Q15

Contents

Acknowledgements

Summary

Resume

Table of Contents

List of Figures

List of Tables

Introduction

1 Interethnic and interfaith marriages in sub-Saharan Africa
1.1 Introduction
1.2 Data
1.3 Intermarriages and marriage markets: Preferences, norms, and diversity
1.4 Empirical strategy
1.4.1 Descriptive statistics: Comparisons across countries and across identity categories
1.4.2 Assessing time trends
1.5 Results on pooled sample
1.5.1 Descriptive statistics
1.5.2 Time trends
1.6 Results at country-level
1.6.1 Descriptive statistics
1.6.2 Time trends on interethnic marriages
1.6.3 Time trends on interfaith and Muslim-Christian marriages
1.7 Robustness analysis
1.7.1 Testing for heterogeneity in the “other” group
1.7.2 Testing the remarriage story
1.7.3 Testing the “assimilation”/conversion story
1.7.4 Testing Birth year v. Cohabitation year
1.8 Concluding remarks
Appendix
A-1.1. Data
A-1.2. Linguistic distance measures
Online Appendix
B-1.1. Supplementary Appendix on data
B-1.2. Descriptive statistics
B-1.3. Ethnic composition over time
B-1.4. Additional results at country-level
B-1.5. Additional robustness analyses at country-level


2 Parental divorce and children's educational outcomes in Senegal
2.1 Introduction
2.2 Data: Pauvreté et structure familiale
2.3 Background: Divorce and education in Senegal
2.3.1 Insights on divorces in Senegal
2.3.2 The Senegalese education system
2.4 Methodology
2.4.1 Empirical strategy
2.4.2 Identification and interpretation issues
2.5 Results
2.5.1 Results: Ever attended primary school
2.5.2 Results: Completed primary
2.5.3 Sensitivity checks
2.6 Channels
2.6.1 Children: Custody and fostering decisions after divorce
2.6.2 Mothers: Financial resources, remarriage and decision-making power
2.7 Conclusion
Appendix
A-2.1. Individual determinants of educational outcomes
A-2.2. Observable characteristics of identifying families
A-2.3. Additional table and figure: Custody and fostering decisions
A-2.4. Additional results
Online Appendix
B-2.1. Additional robustness checks
B-2.2. Additional tables

3 Ethnic homogenization and public goods: Evidence from Kenya's land reform program
3.1 Introduction
3.2 Data
3.3 Background: Land, ethnicity, and public goods in Kenya
3.3.1 Colonial period
3.3.2 Land reform post-independence
3.3.3 Schools in Kenya
3.4 Empirical Strategy
3.4.1 Specification
3.4.2 Identifying assumption
3.4.3 Data for SRDD
3.5 Results
3.5.1 Validity checks: Ethnicity and pre-treatment comparison
3.5.2 Effects of program
3.6 Discussion
3.6.1 Ethnicity and school provision
3.6.2 Alternative explanations
3.7 Extensions
3.7.1 Other comparisons
3.7.2 Additional robustness tests
3.8 Conclusion
Appendix

Conclusion

References

List of Figures

1.1 Changes over time: Education, Urban residence, Diversity levels
1.2 Survey effect: Marital status and age at survey date
1.3 Intermarriage shares on pooled sample
1.4 Shares of interethnic marriages
1.5 Shares of interfaith marriages
1.6 Observed interethnic marriage shares over birth cohorts
1.7 Observed interfaith marriage shares over birth cohorts
1.8 Shares of interethnic/interfaith marriages over birth cohort
A-1.1 Ethnolinguistic tree for groups listed in DHS Benin
A-1.2 Ethnolinguistic tree for groups listed in DHS Kenya
B-1.1 Random interethnic marriage shares - Panel A
B-1.2 Random interethnic marriage shares - Panel B

2.1 Age of mothers and of their children at the time of the divorce
2.2 With whom do children usually live?
2.3 Formal education by age and birth year
2.4 Coefficients associated with age-at-divorce variables
A-2.1 With whom do children of divorced parents live?
A-2.2 Coefficients associated with age-at-divorce variables
B-2.1 Coefficients on age at divorce – All children aged 7 and older

3.1 Map of Kenya: Scheduled Areas, program areas, other redistributed areas, and main cities
3.2 Illustration: Boundaries studied
3.3 Ethnic fractionalization at the border (DHS)
A-3.1 Map: Boundaries studied
A-3.2 Ethnic fractionalization in Kenya (censuses)


List of Tables

1.1 Average intermarriage shares and linguistic distance
1.2 Trends - Intermarriage shares
1.3 Trend - Observed interethnic marriage shares
1.4 Trend - Linguistic distance between spouses
1.5 Trend - Observed interfaith marriage shares
1.6 Robustness checks on interethnic marriages and interfaith marriages – Pooled sample
A-1.1 Data description by survey wave
B-1.1 DHS – Countries and waves not included in the main sample
B-1.2 Observed and random intermarriage shares
B-1.3 Muslim/Christian marriages & religious structure
B-1.4 Linguistic distance - descriptive statistics
B-1.5 Descriptive statistics on education and urban residence levels
B-1.6 Women's characteristics and interethnic marriage
B-1.7 Women's characteristics and interfaith marriage
B-1.8 Trend on interethnic marriage shares
B-1.9 Trend on interfaith marriage shares
B-1.10 Ethnic identification and time in union
B-1.11 Religious identification and time in union
B-1.12 Trend - Year of marriage
B-1.13 Robustness table – Interethnic marriages
B-1.14 Robustness table – Interfaith marriages

2.1 Characteristics of divorced women and of their children
2.2 Balance test of characteristics according to children's age at the time of divorce
2.3 Effect of parental divorce on primary school attendance and completion
2.4 Sensitivity to the definition of the sample (columns) and to the definition of being affected by divorce (rows)
2.5 Heterogeneity of effects on attendance: custody and fostering decisions
2.6 Heterogeneity of effects on attendance: Remarriage, age, and education of mothers
A-2.1 Correlations between individual characteristics and school attendance
A-2.2 Characteristics of families according to children's age at divorce
A-2.3 Custody and fostering decisions after a divorce
B-2.1 Robustness checks: Primary school attendance
B-2.2 Schooling of children according to mother's characteristics
B-2.3 Characteristics of families of divorced women according to the age composition of children at the time of the survey

3.1 Characteristics of settlement schemes
3.2 Is the program associated with ethnic homogenization?
3.3 Program areas vs. counterfactual – Altitude & pre-treatment characteristics
3.4 Program areas vs. counterfactual – Short run outcomes
3.5 Program areas vs. counterfactual – Long run outcome measures


3.6 Program areas vs. counterfactual – Sponsor of schools
3.7 Program areas vs. counterfactual – Border segment analysis (income potential)
3.8 Program areas vs. counterfactual – Mediation analysis: Field size and long run outcome measures
A-3.1 Schools in Kenya
A-3.2 Field size in low and high income potential program areas
A-3.3 Program areas vs. counterfactual – Long run outcome measures (Ethnicity boundary FE controls)
A-3.4 Program areas vs. counterfactual – Border segment analysis (ethnic majority)
A-3.5 Descriptive statistics: Population, schools, and students by ethnic majority
A-3.6 Program areas vs. African Land Units – Altitude & pre-treatment characteristics
A-3.7 Program areas vs. African Land Units – Short run outcomes
A-3.8 Program areas vs. African Land Units – Long run outcome measures
A-3.9 Program areas vs. African Land Units – Border segment analysis (income potential)
A-3.10 Non-program European settler areas vs. African Land Units – Altitude & pre-treatment characteristics
A-3.11 Non-program European settler areas vs. African Land Units – Short run outcomes
A-3.12 Non-program European settler areas vs. African Land Units – Long run outcome measures
A-3.13 Robustness – Flexible bandwidth selection, correction of standard errors

INTRODUCTION

This PhD brings together three empirical chapters on ethnic identity and marital decisions in sub-Saharan Africa. I start by presenting the two issues discussed in this dissertation: ethnic identity and marital decisions. I then present the chapters of my dissertation, together with “behind the scenes” insights from qualitative interviews and new research developments.

General motivation

The starting point of this PhD was my M2 thesis on interethnic marriage patterns in Kenya and Ghana. I found the topic fascinating, and I soon found myself wondering about two big questions that became the foundation of my PhD: What is ethnic identity? How do people choose whom to marry?

What exactly ethnic identity is, and how to measure it, has long been debated (Kanbur et al. (2011) provide a summary of the issues at stake). In this PhD, ethnic identity is defined as the ethnic group that surveys and censuses record for an individual. I do not discuss the formation of ethnic categories (see Lynch (2011) for an example on Kenya) or how the list of ethnic categories included in a given survey is determined (Fearon, 2003). It must be noted that ethnic identity is likely to be mismeasured. This PhD aims to provide evidence supporting the need for a more complex view of ethnic identity and ethnic diversity than is found in most of the literature, and to do so without drawing on the fact that ethnic categories are social constructs (a topic that has received more attention).

In the words of Chandra (2006), ethnic membership is “determined by attributes associated with, or believed to be associated with, descent”. As such, concerns about passing on ethnic membership result, in most societies, in restrictive marriage rules (Anderson and Bidner, 2021). Here we are, back to the partner choice question! While marriage decisions may affect whether children identify as members of a certain ethnic group, ethnic groups also have different marriage rules, which in turn affect the pool of potential partners. These rules vary widely. They can prescribe that individuals marry outside their clan, for instance among the Luo in Kenya (Luke and Munshi, 2006), or they can allow cousins to marry, for instance in Senegal (Hotte and Marazyan, 2020).

In this PhD, “married individuals” (as well as “husbands”, “wives”, “partners”, and “spouses”) may have married under civil, religious, or customary law, or may not have formally married yet. Cohabiting couples are included in the same category as married couples. Most unmarried cohabiting couples have started the formalities needed for a customary marriage but have not completed them. This is often because they are waiting until the husband-to-be can afford the bride price (a transfer from the groom or his parents to the parents of the bride). These couples are nonetheless in a committed partnership, which is why they are included in the same category as married couples. For the same reason, I use “divorce” and “separation” interchangeably throughout this dissertation.

Having defined the key concepts used in this PhD, I now provide a more detailed introduction to ethnicity and to marriages and divorces.

Ethnicity

In economics, ethnic identity has mostly been taken as given, and not as a historical construct or as the product of individual decisions (be they rational or not). The impact of ethnic fractionalization has been investigated1. By contrast, how ethnic identities are formed and maintained has received little attention in the ethnic politics literature, although both ethnic identity formation and ethnic diversity levels need to be endogenized (Alesina and La Ferrara, 2005).2 Surveys conducted in African countries rarely allow respondents to select multiple ethnic affiliations, or none at all. The indicators built on these measures, such as the ethnolinguistic fractionalization index (Alesina et al., 2003) and polarization measures (Montalvo and Reynal-Querol, 2005), likewise treat every individual as belonging to a single ethnic group. This gap is all the more puzzling given the literature on the complexity and multi-layering of ethnic identities (Chandra 2006; Posner 2005). There are (at least) two understudied research questions regarding ethnicity in sub-Saharan Africa. The first is how ethnic identities are “transmitted” across generations. The second is how ethnic diversity is made up at the national and sub-national levels.

1Since the seminal paper of Easterly and Levine (1997a) on ethnic diversity and growth in Africa, ethnic fractionalization and its consequences have been widely studied.

2Michalopoulos (2012) studies the origins of linguistic diversity. Ahlerup and Olsson (2012) propose a theoretical model that explains the emergence of groups within populations.
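Both indices are functions of group population shares alone, so they inherit the single-affiliation assumption discussed above: each individual counts in exactly one group. The sketch below is a minimal illustration of the two standard formulas. The first is fractionalization, F = 1 − Σ s_i². The second is a polarization index of the Reynal-Querol form, P = 4 Σ s_i²(1 − s_i). The function names and example shares are illustrative assumptions. Fractionalization is also the probability that two randomly drawn individuals belong to different groups. It is therefore the interethnic marriage share one would expect if spouses were matched at random from a single population.

```python
import math
from typing import Sequence


def _check_shares(shares: Sequence[float]) -> None:
    # Both indices assume every individual belongs to exactly one group,
    # so the group shares must sum to one.
    if not math.isclose(sum(shares), 1.0):
        raise ValueError("group shares must sum to 1")


def fractionalization(shares: Sequence[float]) -> float:
    """1 - sum of squared shares: probability that two randomly drawn
    individuals belong to different groups."""
    _check_shares(shares)
    return 1.0 - sum(s * s for s in shares)


def polarization(shares: Sequence[float]) -> float:
    """Reynal-Querol-type index 4 * sum s_i^2 (1 - s_i): equal to 1
    with two equally sized groups."""
    _check_shares(shares)
    return 4.0 * sum(s * s * (1.0 - s) for s in shares)


if __name__ == "__main__":
    for label, shares in [("two equal groups", [0.5, 0.5]), ("ten equal groups", [0.1] * 10)]:
        print(label, round(fractionalization(shares), 3), round(polarization(shares), 3))
```

With ten equally sized groups, fractionalization is high (0.9) but polarization is low (0.36). With two equal groups the ranking flips (0.5 and 1.0). This is why the two families of measures can tell different stories about the same country.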

Regarding the first question, Bisin and Verdier (2000) develop a model of the transmission of identity across generations. They find that identity is transmitted mostly through marriage: intermarriage leaves minority-group parents with a weaker socialization technology, so that their children are less strongly attached to the minority identity. Intermarriages have thus long been used to measure the strength of cleavages within societies (Kalmijn, 1998), as they combine a measure of segregation (who meets whom and where) and a measure of who is thought to be an acceptable spouse. Chapter 1 aims to provide quantitative evidence on intermarriage in sub-Saharan Africa.
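The mechanism can be illustrated with a deliberately simplified, stylized transmission process in the spirit of Bisin and Verdier (2000). In homogamous couples, children are directly socialized into their parents' identity with probability d. Otherwise, they adopt the identity of a randomly met role model, who belongs to the minority with probability equal to its population share q. Children of mixed couples are assumed here to receive only this oblique socialization. In the original model, socialization effort and marriage choices are endogenous. In this sketch, d, q, the family composition, and the one-child-per-couple assumption are fixed for illustration, and the function names are hypothetical.

```python
def p_minority_child(family: str, d: float, q: float) -> float:
    """Probability that a child adopts the minority identity (stylized sketch).

    family: "minority" or "majority" for homogamous couples, "mixed" for heterogamous ones
    d: probability that direct (parental) socialization succeeds in a homogamous couple
    q: minority population share, i.e. probability that a random role model is from the minority
    """
    if family == "minority":
        return d + (1.0 - d) * q  # direct socialization, otherwise oblique
    if family == "mixed":
        return q                  # assumption: only oblique socialization
    if family == "majority":
        return (1.0 - d) * q      # minority identity only through oblique socialization
    raise ValueError(f"unknown family type: {family!r}")


def next_minority_share(family_shares: dict, d: float, q: float) -> float:
    """Minority share among children, assuming one child per couple."""
    return sum(share * p_minority_child(f, d, q) for f, share in family_shares.items())


if __name__ == "__main__":
    d, q = 0.6, 0.2
    full_homogamy = {"minority": 0.2, "mixed": 0.0, "majority": 0.8}
    half_intermarry = {"minority": 0.1, "mixed": 0.2, "majority": 0.7}
    print(round(next_minority_share(full_homogamy, d, q), 3))
    print(round(next_minority_share(half_intermarry, d, q), 3))
```

Take d = 0.6 and a minority share of 0.2. A child of a homogamous minority couple then keeps the minority identity with probability 0.68, against 0.2 for a child of a mixed couple. If all minority members marry within their group, the minority share stays at 0.2 in the next generation. If half of them intermarry, it falls to 0.164.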

Regarding the second question, it has been shown that states play a key role in shaping levels of ethnic diversity. Collecting data on ethnicity requires government decisions or approval, which may affect the categories listed in a given country. For example, are minority groups explicitly included in the ethnic classification, or are they lumped together in the catch-all “other groups” category? Ethnic classifications themselves affect the measured level of ethnic diversity (Fearon, 2003; Posner, 2004). Another type of intervention is resettlement policies, either “nation-building” policies (Bazzi et al., 2019; Miguel, 2004) or forced migration policies (Charnysh and Peisakhin, 2021; Miho et al., 2019). A more extreme type of intervention aims to change the level of ethnic diversity at the national level through ethnic cleansing (Conversi, 2010). Chapter 3 studies the impact of an ethnic-based redistributive land reform program (an ethnic homogenization policy) on school provision in Kenya.

Marriages and divorces

Partner choice is one of the most important decisions affecting individuals’ welfare (Becker, 1973). In most of sub-Saharan Africa, parental involvement in spouse choice remains strong (Fafchamps and Quisumbing, 2007). Parents might want their children (or at least some of them) to marry someone from the same group as themselves (Anderson and Bidner, 2021). This group may be the ethnic group, to ensure the transmission of their ethnic identity (Bisin and Verdier, 2000), or the extended family, notably for insurance purposes (Hotte and Marazyan, 2020). These concerns might not always be fully compatible with assortative matching on other dimensions, such as education (see Furtado and Theodoropoulos (2011) on US marriage markets).

However, marital rules and traditions are not static: parental involvement is decreasing in some contexts (Bertrand-Dansereau and Clark, 2016), and not all individuals marry according to social norms. Chapter 1 aims to better understand partner choice in sub-Saharan Africa and to capture some of the changes in marriage-related norms.

Partner choice also matters because of the risks of choosing the wrong partner, which leads to another key decision: whether or not to divorce. Most countries in sub-Saharan Africa are characterized by high marital instability: Clark and Brauner-Otto (2015) estimate that approximately 25% of first unions end in divorce. The consequences of divorce for women’s welfare have been studied in more depth (Djuikom and van de Walle, 2018; Lambert et al., 2019) than its consequences for children (Chae, 2016). Chapter 2 aims to assess the impact of divorce on children’s educational outcomes in Senegal.

This dissertation

This dissertation is made up of three empirical research papers, each of which relates to ethnicity or marriage in sub-Saharan Africa. The first chapter documents patterns of interethnic and interfaith marriages in 15 countries of sub-Saharan Africa. The second chapter discusses the consequences of parental divorce for children’s educational attainment in Senegal. The third chapter investigates the effects on public good provision of a Kenyan land redistribution program that resulted in the ethnic homogenization of program areas. I present the outline of each chapter, sometimes linking it to insights from fieldwork and archival work, as well as to new research developments. A more detailed introduction to the data and methods used can be found at the end of this section.

Chapter 1 – Interethnic and interfaith marriages in sub-Saharan Africa

Summary of the paper This paper documents interethnic and interfaith marriage patterns to better understand which identity-related cleavages matter in sub-Saharan Africa. Using Demographic and Health Surveys (DHS) spanning 15 countries, I build a representative sample of women born between 1955 and 1989. Extrapolating to the population of these countries, I find that 20.4% of marriages are interethnic and 9.7% are interfaith, indicating that ethnic and religious differences are not always barriers. Once diversity levels are accounted for, both shares are similar. In the pooled sample of these 15 countries, the share of interethnic marriages increased, and in no country did interethnic marriages become less frequent. The share of interfaith marriages decreased in the pooled sample; only in Cameroon did interfaith marriages become more frequent. The share of Muslim-Christian marriages remained stable in the pooled sample. The increase in the share of interethnic marriages can only partly be explained by rising urbanization and education levels, suggesting that changes in preferences and social norms may also be at work. The decrease in the share of interfaith marriages is due to decreasing levels of religious diversity, as traditional religions were replaced by Islam and Christianity. These results show that some ethnic boundaries became more porous whereas religious boundaries did not. However, religious boundaries shifted as a result of changes in the religious landscape.
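Extrapolating to the population requires making DHS sample weights comparable across pooled surveys. These weights are normalized within each survey so that the weighted and unweighted numbers of cases coincide. The procedure used in Chapter 1 is not spelled out here, so the following is only a minimal sketch of one common approach. Its main assumption is that each survey's weights are rescaled so that the surveys of a given country jointly sum to a population target for that country, with the target split equally across that country's survey waves. The record layout, population figures, and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (survey_id, country, dhs_weight, is_interethnic)
Record = Tuple[str, str, float, bool]


def pooled_weighted_share(records: Iterable[Record], population: Dict[str, float]) -> float:
    """Population-weighted share of interethnic marriages across pooled surveys."""
    records = list(records)
    survey_total = defaultdict(float)      # sum of raw weights within each survey
    surveys_by_country = defaultdict(set)  # survey waves available for each country
    for survey, country, weight, _ in records:
        survey_total[survey] += weight
        surveys_by_country[country].add(survey)

    num = den = 0.0
    for survey, country, weight, interethnic in records:
        n_waves = len(surveys_by_country[country])
        # Share of the survey's total weight, scaled to the country's population
        # and split equally across that country's waves (an assumption).
        adjusted = weight / survey_total[survey] * population[country] / n_waves
        num += adjusted * float(interethnic)
        den += adjusted
    return num / den


if __name__ == "__main__":
    records = [("A1", "A", 1.0, True), ("A1", "A", 1.0, False),
               ("B1", "B", 2.0, True), ("B2", "B", 5.0, True)]
    print(pooled_weighted_share(records, {"A": 100.0, "B": 300.0}))
```

In this toy example the unweighted share is 0.75, while the population-weighted share is 0.875, because the larger country B contributes three quarters of the population. Cohort-specific population targets would be a natural refinement when the pooled data are used to assess time trends.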

Insights from qualitative research & links to new research While these insights are not presented in the chapters of this dissertation, I spent time in Cote d’Ivoire and in Kenya conducting qualitative interviews on marriage, divorce, and intergenerational transmission. One question I had been curious about was how information on individuals’ ethnic identity is produced. How was the question asked, and how did people know how to answer it? I did not ask this directly. Still, it became clear during the interviews that there is a short story (the one box that will be ticked by the interviewer) and a longer story. I have come to think of the question “What is your ethnic group?” as similar to the question “Where were you born?”. An interviewee could answer “In Paris”, and then tell a story about moving to a small town in Brittany at the age of two and rarely visiting the capital since. The standard questionnaire answer hence does not measure the “true value” of an individual’s ethnic identity. One respondent, who had presented himself as a Luo man without adding any qualifiers, suddenly remarked that of course he had been able to deal with a situation involving thugs, because he heard them speak Luhya and was able to reason with them. When I asked him how he had learned this language, he simply remarked: “Oh, but my mother is Luhya.” Another story was that of a Christian woman in Kenya. When she told me the story of her marriage, I learned that she had grown up in a Muslim family and had converted to marry her Christian husband. So far, nothing too remarkable. But when I asked her whether her parents had approved of the marriage, she quickly answered that her mother had grown up in a Christian family and had converted to Islam before marrying. This piece of information seemed so important to me (a family with two generations of interfaith marriages!) and explained so much, yet it had not been mentioned earlier in the interview. Both stories are examples of situations in which individuals’ mixed ancestry matters for understanding their choices and decisions. They also show how strongly social norms dictate how individuals should introduce themselves. It is no coincidence that both stories have a strongly gendered dimension. Social conventions in Kenya dictate that children belong to their father’s ethnic group, and it seems that women convert before marrying much more often than men do.

Another question that became salient while conducting interviews was that of intergenerational transmission. A striking pattern emerged: in interethnic families, parents were very likely to address their children in a language that was not their own mother tongue, usually Kiswahili in Kenya and French in Cote d’Ivoire. Some respondents said that “tribes” were a thing of the past and that they did not intend for their children to learn another language. Others wished that their children would learn their mother tongue, a wish often associated with the idea of sending the children “back home” to speak with their grandparents. One father in Cote d’Ivoire jokingly told me about his primary-school-aged children, who could understand his mother tongue but not speak it. Frustrated, he had simply pretended not to understand any other language anymore, especially when his children were bickering. His children, he reported, quickly switched to his mother tongue when addressing him. While such parental strategies might succeed, children born to parents of different ethnic groups seem likely to have a weaker ethnic attachment than children of mono-ethnic couples. One reason is that they often cannot fluently speak the languages associated with their parents’ home villages. This intuition is supported by Dulani et al. (2021). In a recent paper, the authors explore one of the implications of high intermarriage rates by examining the electoral preferences of multiethnic voters in Kenya and Malawi. They find that mixed individuals are less likely than mono-ethnic individuals to support the party associated with their stated ethnic group. Their findings stress the importance of updating the theoretical and empirical approaches used when addressing ethnicity in African countries.

Chapter 2 – Parental divorce and children’s educational outcomes in Senegal

The second chapter of this PhD, written jointly with Rozenn Hotte (CY Cergy Paris University), deals with divorce in Senegal and its impact on children’s educational outcomes. This paper provides new evidence on the consequences of parental divorce for children in Africa. Using survey data that collected detailed life histories of Senegalese women and their children, we investigate how children’s educational outcomes are affected by their parents’ divorce. We use a sibling fixed-effects strategy, which controls for all the factors common to all children in a family, such as parental preferences regarding education or the parents’ level of education, alleviating concerns about omitted variable bias. We compare children who were old enough to have been enrolled in primary school at the time of the divorce with their younger siblings, for whom enrollment decisions had not yet been made at that time. We find that younger siblings were more likely than their older siblings to have attended primary school. This higher level of investment does not persist in the long run: there are no differences between siblings in primary school completion. Custody and fostering decisions do not seem to mediate the positive effects on school attendance. Our findings are consistent with either an improvement in the financial situation (due to remarriage) or an increase in mothers’ decision-making power after the divorce.

Chapter 3 – Ethnic homogenization and public goods: Evidence from Kenya’s land reform

program

Summary of the paper The third chapter of this PhD, written jointly with Catherine Boone (London School of Economics and Political Science) and Alexander Moradi (Free University of Bozen-Bolzano), is concerned with the consequences of land reform and ethnic homogenization for public good provision. We examine the effects of ethnic homogenization on public good provision using a natural experiment that took place in Kenya: the settlement schemes program, a large-scale land reform program that led to a significant reduction in ethnic diversity. We use a novel dataset on the precise location of program area boundaries (Lukalo et al., 2019), combined with archival, survey, census, and satellite data, to implement a spatial regression discontinuity design. We argue that the border between program areas (treatment) and neighboring areas (counterfactual) is plausibly random at the local level, and we confirm that there are no observable differences in pre-treatment characteristics. We find a strong discontinuity in ethnic diversity but no difference in school provision between program areas and counterfactual areas, in either the short or the long run. One explanation is that, having been resettled into the program areas, individuals likely lacked the dense social networks that favor collective action. Such networks would be needed either to hold politicians accountable or to provide public goods through cooperation at the community level. Our results are not driven by spillovers from treatment to counterfactual areas. A mediation analysis indicates that income effects are unlikely to drive this null result.

Talking about land in Kenya The third chapter of this PhD is my first economic history chapter and my first interdisciplinary research collaboration. I was lucky to join the team of the Spatial Inequalities in African Political Economy Project for a trip to Kenya in May and June 2019. During this trip, I had the opportunity to listen to talks on land in Kenya given by Kenyan researchers and students. These talks offered fascinating insights into which research questions were deemed important across academic and cultural settings. Among the (selected) sample of Kenyan researchers and students whom I saw present their research, many worked on the link between subsistence income and plot size. “What is the minimal plot size needed to support a Kenyan family?”, they were asking. What I heard was: there is not enough land. The issue of inequality and land redistribution often came up when audiences reacted to the presentation of the preliminary findings of Boone et al. (2021). Almost always, what people were discussing was class inequality, not ethnic inequality. Having read so many papers about ethnicity, I sometimes forgot that ethnicity is not necessarily the relevant cleavage (not that this was news to me).

Data & methods

Data In this dissertation, I use detailed household survey data to study marriages and divorces, as well as geolocated data and archival documents. This PhD also gave me the opportunity to collect qualitative data in Cote d’Ivoire and Kenya and to conduct archival research at the British National Archives.

Household survey data The Demographic and Health Surveys (DHS) were used in Chapter 1 and Chapter 3. DHS questionnaires provide information on who is married to whom within the household. In specific countries and waves, they also provide information on respondents’ ethnic and religious identity. Respondents (women and men) are asked to report their own ethnic identity. This makes the measure much less prone to measurement error and declaration bias than in other surveys: some Living Standards Measurement Study (LSMS) surveys only ask the household head or the most knowledgeable person to declare the ethnic identity of all household members. Another advantage of the DHS data is that the surveys cover a wide range of countries and a long period. The earliest surveys in African countries that collected information on ethnic identity were implemented in 1992, and the program is still ongoing.

The second wave of the Pauvrete et Structure Familiale (PSF, Poverty and Family Structure) survey, conducted in Senegal in 2011, was used in Chapter 2. The survey is described in detail in De Vreyer et al. (2008). This database provides an extremely rich and detailed account of the lives of Senegalese households, including detailed information on marital histories, consumption, and migration. Surveys conducted in sub-Saharan Africa rarely record marital histories and often only record respondents’ marital status at the time of the survey. The PSF database includes information on all the unions a respondent experienced: the start and end dates of each union, and the reason why the last union that ended did so. This allowed us to identify divorced women and their children, rather than having to guess whether a woman who had remarried had divorced or been widowed. The PSF database also includes information on all children of household members. As we retrieve information on divorces through each child’s mother, we need information on children who are not living with their mother (i.e. who are not living in the surveyed household). Otherwise, our estimates could be biased by endogenous decisions about where children live.


Shapefiles of settlement schemes and archival documents on the land reform program Chapter 3 was made possible by access to the exact boundaries of the settlement schemes from Lukalo et al. (2019). They constructed a map layer from over 1,500 digitized Registry Index Maps (RIM) kept by Survey of Kenya in 2018. Polygons were joined with attribute data from the Ministry of Lands and Physical Planning (MoLPP) dataset on Kenyan settlement schemes, presented in Lukalo and Odari (2016). The final database includes shapefiles with boundary information as well as information on the schemes. We also relied on documents from the British National Archives to obtain more information on the program. Presenting the complete list of the data used in this chapter is beyond the scope of this introduction; a detailed list can be found in Section 3.2.

Qualitative interviews in Cote d’Ivoire and Kenya I conducted qualitative interviews on marriage and divorce in Cote d’Ivoire in June 2016, with 21 respondents living in and around Abidjan. I conducted the same type of qualitative interviews in Kenya in May and August 2019, with 52 respondents in urban and rural Kenya. Interview locations included:
- Nairobi (the capital city) and Ongata Rongai (a peri-urban area close to Nairobi);
- Kisumu (the third largest city of Kenya, in the west of the country) and rural areas of Kisumu county;
- Nakuru (the fourth largest city of Kenya, in central Kenya);
- Mombasa (the second largest city of Kenya, on the coast) and villages in Kilifi and Kwale counties.
These areas were selected to cover a diverse range of settings and to ensure that people from different backgrounds were interviewed.

Methods These three empirical chapters rely on a wide range of methods. Chapter 1 is a descriptive paper. Its main challenges concerned the representativeness of the data, and especially how observations needed to be weighted when several surveys were pooled together and then used to assess time trends (a point on which the DHS documentation is mostly silent). Chapter 2 relies on a sibling fixed-effects specification that is common in family economics and has already been used to study the impact of divorce on children in developed countries (Bjorklund and Sundstrom, 2006; Le Forner, 2020). Many parental and family characteristics are likely to influence both the probability that parents divorce and the schooling of their children. A simple comparison of children according to their parents’ divorce status would therefore be biased. Sibling fixed effects control for all the factors common to all children in a family, such as parental preferences regarding education or the parents’ level of education. Short of allocating divorce randomly, this strategy is one of the few that can come close to a causal estimate of the impact of divorce.
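The within-family comparison can be sketched as follows. The outcome and the exposure variable are demeaned within each family, which removes everything siblings share, such as parental preferences or education. The slope is then estimated from the remaining within-family variation only. This is a minimal single-regressor sketch on made-up data. It omits the controls and standard errors a real application would need, and it is not the specification used in Chapter 2. The exposure indicator, function names, and numbers are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def sibling_fe_estimate(family: Sequence[Hashable], x: Sequence[float], y: Sequence[float]) -> float:
    """OLS slope of y on x after demeaning both within each family (sibling fixed effects).

    Families in which all siblings share the same x (e.g. a single child, or no divorce)
    have no within-family variation and do not contribute to the estimate."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # family -> [sum of x, sum of y, number of children]
    for f, xi, yi in zip(family, x, y):
        t = totals[f]
        t[0] += xi
        t[1] += yi
        t[2] += 1
    num = den = 0.0
    for f, xi, yi in zip(family, x, y):
        sx, sy, n = totals[f]
        xd, yd = xi - sx / n, yi - sy / n  # deviations from the family means
        num += xd * yd
        den += xd * xd
    if den == 0.0:
        raise ValueError("no within-family variation in x")
    return num / den


def naive_difference(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean outcome of exposed minus non-exposed children, ignoring family membership."""
    exposed = [yi for xi, yi in zip(x, y) if xi == 1]
    others = [yi for xi, yi in zip(x, y) if xi == 0]
    return sum(exposed) / len(exposed) - sum(others) / len(others)


# Hypothetical data: x = 1 if the enrollment decision was made after the divorce.
# Outcome = family baseline (0.5, 0.3, 0.9) + 0.1 * x, without noise; f3 never divorced.
family = ["f1", "f1", "f2", "f2", "f2", "f3", "f3"]
x = [0, 1, 0, 1, 1, 0, 0]
y = [0.5, 0.6, 0.3, 0.4, 0.4, 0.9, 0.9]

if __name__ == "__main__":
    print(sibling_fe_estimate(family, x, y))  # close to 0.1
    print(naive_difference(x, y))             # negative: confounded by family baselines
```

The fixed-effects estimate recovers the within-family effect of 0.1. The naive comparison is negative because, in this toy example, the families in which divorce occurred have lower baseline outcomes.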

Chapter 3 combines a natural experiment with a spatial regression discontinuity design. The natural experiment is a redistributive land reform program, the settlement scheme program that was implemented in Kenya after independence. Households were selected into the program based on area-specific ethnic criteria, which resulted in the ethnic homogenization of program areas. We use the exact borders of these program areas to assess the impact of an ethnic homogenization policy on the provision of public goods (so far, schools).
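The sketch below is a minimal local linear spatial RD estimator. It assumes the running variable is the signed distance to the nearest program-area border, positive inside program areas. Within a fixed bandwidth, the outcome is regressed on a treatment dummy, distance, and their interaction, using triangular kernel weights. The coefficient on the dummy then estimates the jump at the border. This does not reproduce the estimation in Chapter 3, which, according to the list of tables, also includes border segment analyses and flexible bandwidth selection. The fixed bandwidth, the kernel, and the function name are assumptions.

```python
import numpy as np


def spatial_rd_estimate(dist_km, outcome, bandwidth_km):
    """Local linear estimate of the jump in `outcome` at distance 0 from the border."""
    d = np.asarray(dist_km, dtype=float)
    y = np.asarray(outcome, dtype=float)
    keep = np.abs(d) < bandwidth_km          # local sample around the border
    d, y = d[keep], y[keep]
    treated = (d >= 0).astype(float)         # 1 inside a program area
    w = 1.0 - np.abs(d) / bandwidth_km       # triangular kernel, largest at the border
    # Separate intercepts and slopes on each side of the border
    X = np.column_stack([np.ones_like(d), treated, d, treated * d])
    sw = np.sqrt(w)                          # weighted least squares via row rescaling
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return float(coef[1])                    # difference in intercepts at distance 0


# Synthetic check: a linear trend in distance plus a jump of 0.5 at the border
dist = np.linspace(-10, 10, 41)
outcome = 2.0 + 0.3 * dist + 0.5 * (dist >= 0)

if __name__ == "__main__":
    print(spatial_rd_estimate(dist, outcome, bandwidth_km=5.0))  # close to 0.5
```

Separate slopes on each side of the border let the outcome trend differently in program and counterfactual areas. The kernel gives most weight to locations closest to the border, where the local-randomness argument is most plausible.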

CHAPTER 1

INTERETHNIC AND INTERFAITH MARRIAGES IN SUB-SAHARAN AFRICA

Abstract1 This paper documents interethnic and interfaith marriage patterns to better understand which identity-related cleavages matter in sub-Saharan Africa. Using Demographic and Health Surveys (DHS) spanning 15 countries, I build a representative sample of women born between 1955 and 1989. Extrapolating to the population of these countries, I find that 20.4% of marriages are interethnic and 9.7% are interfaith, indicating that ethnic and religious differences are not always barriers. Once diversity levels are accounted for, both shares are similar. In the pooled sample of these 15 countries, the share of interethnic marriages increased, and in no country did interethnic marriages become less frequent. The share of interfaith marriages decreased in the pooled sample; only in Cameroon did interfaith marriages become more frequent. The share of Muslim-Christian marriages remained stable in the pooled sample. The increase in the share of interethnic marriages can only partly be explained by rising urbanization and education levels, suggesting that changes in preferences and social norms may also be at work. The decrease in the share of interfaith marriages is due to decreasing levels of religious diversity, as traditional religions were replaced by Islam and Christianity. These results show that some ethnic boundaries became more porous whereas religious boundaries did not. However, religious boundaries shifted as a result of changes in the religious landscape.

1This chapter was published in World Development in 2020 (Crespin-Boucaud, 2020). I extend special thanks to Denis Cogneau for carefully reading my paper several times and for suggesting numerous changes that greatly improved the paper. I am grateful to Yannick Dupraz, Sylvie Lambert, Alexander Moradi, Lisa Oberlander, and Oliver Vanden Eynde for insightful discussions and comments on previous versions of this work. I thank seminar participants at the Paris School of Economics (PSE) and at the University of Sussex as well as two anonymous referees for their constructive comments.


1.1 Introduction

Social identity is an individual characteristic that has long been shown to be complex and multi-layered (Posner, 2005). However, a large fraction of the literature on sub-Saharan Africa has relied on a unidimensional view of identity, equating identity with ethnicity. Even though surveys and censuses in most countries now include categories for “mixed race” or “mixed ancestry”, very few large-scale surveys conducted in African countries include such an option. This perpetuates the idea that ethnic identity is allegiance to one group or tribe, and to one homeland. Moreover, while the impact of ethnic fractionalization has been investigated2, the manner in which ethnic identities are formed and maintained has received little attention. As ethnicity is transmitted through descent3, we would expect interethnic marriages to be rare in societies where ethnic cleavages are rigid: marrying within one’s group is a means for an individual to ensure that her/his identity is passed down to her/his children4. Intermarriages have long been used to measure the strength of cleavages within societies (Kalmijn, 1998), as they combine a measure of segregation (who meets whom and where) and a measure of who is thought to be an acceptable spouse. However, there is no quantitative evidence on interethnic marriages in sub-Saharan Africa. This paper aims to fill this gap.

I study interethnic and interfaith marriages in 15 countries in sub-Saharan Africa using data from the Demographic and Health Surveys (DHS). This paper has four aims:
- providing descriptive statistics on interethnic and interfaith marriages;
- discussing results at the extensive margin (marrying outside one’s ethnic group) and at the intensive margin (how far outside one’s ethnic group?);
- assessing time trends;
- analyzing which factors have contributed to these trends.
Contrasting interethnic with interfaith marriages provides a broader picture of intermarriage patterns in African countries. It must be emphasized, however, that religious identity is more fluid than ethnic identity, as conversion allows individuals to change their religious affiliation.
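Measuring the intensive margin requires a distance between the ethnic groups of the two spouses. The sketch below illustrates a tree-based linguistic distance of the kind used in the ethnolinguistic diversity literature. Each group's language is described by its path of classification nodes in a language tree (family, branch, sub-branch, and so on). The distance is one minus a resemblance computed from the share of leading nodes the two paths have in common. The exponent (0.5 here), the toy classifications, and the function names are assumptions for illustration. They are not the measures computed in this chapter.

```python
from typing import Sequence


def shared_nodes(path_a: Sequence[str], path_b: Sequence[str]) -> int:
    """Number of leading classification nodes two languages have in common."""
    n = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        n += 1
    return n


def tree_distance(path_a: Sequence[str], path_b: Sequence[str],
                  max_depth: int, exponent: float = 0.5) -> float:
    """Distance in [0, 1]: 0 for identical paths, 1 when no node is shared."""
    resemblance = (shared_nodes(path_a, path_b) / max_depth) ** exponent
    return 1.0 - resemblance


# Hypothetical classifications (family, branch, sub-branch, language)
lang_a = ("F1", "B1", "S1", "L1")
lang_b = ("F1", "B1", "S2", "L2")  # same family and branch as lang_a
lang_c = ("F2", "B3", "S5", "L9")  # different family

if __name__ == "__main__":
    for other in (lang_a, lang_b, lang_c):
        print(other[-1], round(tree_distance(lang_a, other, max_depth=4), 3))
```

Averaging such a distance over interethnic couples gives one possible summary of the intensive margin: two spouses whose languages share a family and a branch count as closer than spouses whose languages belong to different families.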

2Since the seminal paper of Easterly and Levine (1997a), several works have pointed out the detrimental effects of ethnic diversity on growth and public good provision (Alesina and La Ferrara, 2005; Churchill and Smyth, 2017; de la Cuesta and Wantchekon, 2016; Gershman and Rivera, 2018; Goren, 2014), with only a few surveys reaching different conclusions (Gisselquist et al., 2016).

3Chandra (2006) defines ethnic membership as “determined by attributes associated with, or believed to be associated with, descent”. Kanbur et al. (2011) summarize debates over definitions of ethnic identity and of related measures. In this paper, I define ethnic groups based on the DHS classification, and I do not discuss how intermarriages could have influenced or determined these classifications.

4Bisin and Verdier (2000) develop a model of transmission of identity across generations and find that intermarriages result in minority-group parents having access to a weaker socialization technology. In sub-Saharan Africa, ethnic fractionalization does not necessarily result in majority-minority settings. I nonetheless expect the theoretical result to hold: high shares of intermarriage should be associated with weaker ethnic/religious affiliations for parents and their children.

First, I find that 20.4% of married women are in an interethnic union, in contrast with 9.7% in an interfaith union. Interethnic unions are hence far from rare events in sub-Saharan

countries, and their share ranges from 10.4% in Burkina Faso to 46% in Zambia. Interfaith

marriage shares range from 1.8% in Niger to 19.3% in Cote d’Ivoire. Second, using a sample

of women born between 1955 and 1989, I find that interethnic marriages became more com-

mon for later-born cohorts relative to earlier-born ones, while interfaith marriages became less

common. There is no country where the share of interethnic marriages decreased, and inter-

faith marriages increased only in Cameroon. Third, building on recent research on how to

measure ethno-linguistic diversity (Desmet et al., 2016; Gershman and Rivera, 2018), I compute

new linguistic distance measures that allow me to take into account diversity within and across

countries. In the case of interfaith marriages, I do not use a distance measure but instead study

Muslim-Christian marriages separately, as this type of union is arguably the most distant kind

of interfaith marriage in sub-Saharan Africa. I find that changes at the extensive margin do

not translate into changes at the intensive margin. Interethnic marriage shares increased, but

there is no clear pattern regarding variation in linguistic distance. Interfaith marriage shares

decreased, but Muslim-Christian marriage shares remained stable. Fourth, I examine whether

time trends on intermarriage shares can be explained by increased education and urbanization

levels. To do so, I compare time trends across specifications with controls and without controls.

The results for interethnic marriages suggest that, while education and urbanization

play a role in the increase of interethnic marriage shares, part of the increase could come from

changes in norms and preferences about interethnic marriages. Likewise, I find that urbaniza-

tion and education are not the key drivers of the decrease in interfaith marriages: this decrease

is mostly due to the decreased levels of religious diversity over time. This study of intermar-

riages finds that some – though not all – ethnic boundaries became more porous. Religious

boundaries did not become more porous, but the religious landscape changed as traditional

religions were replaced by Islam and Christianity. Finally, I confirm that my results are robust

to varying definitions of intermarriages. I also test the hypothesis that spouses become more

similar as the length of their marriage increases. Ethnic “assimilation” does not drive the re-

sults. However, there is evidence of conversion during marriage: my estimate is a lower bound

on the decline of interfaith marriages.

This paper contributes to three strands of the literature. First, this paper extends the empirical

literature on intermarriages (Fryer Jr, 2007; Furtado and Theodoropoulos, 2011; Kalmijn and

Van Tubergen, 2006; Monden and Smits, 2005; Qian and Lichter, 2007, 2011). Second, it con-

tributes to a growing literature that nuances or contests the idea that ethnicity is always the

key cleavage in sub-Saharan Africa. Contributions have suggested that which identity cate-

gory is salient depends on the context (Eifert et al., 2010; Miles and Rochefort, 1991). Looking


at the micro-level literature, Berge et al. (2018) show that there is little evidence of co-ethnic

bias in behavioral games, contradicting results based on Implicit Association Tests (IAT) (such as

Habyarimana et al. (2007) in Uganda; Lowes et al. (2015) in DRC). The complexity of relation-

ships between ethnic groups (at the political level) was emphasized by Francois et al. (2015) and

Mozaffar et al. (2003) regarding electoral coalitions and power sharing. Simson (2018) shows

that once education is controlled for, public sector jobs in Kenya and in Uganda are rather eq-

uitably distributed along ethnic lines. Third, this paper adds to the literature comparing the

evolution and salience of ethnic and religious cleavages (McCauley, 2014).

The rest of the paper is organized as follows. Section 1.2 presents the data. Section 1.3 lists factors that could explain the prevalence of intermarriages. Section 1.4 presents the empirical strategy. Section 1.5 reports results on the pooled sample and Section 1.6 results at the country level. Section 1.7 tests alternative explanations and provides robustness checks on the findings. Section 1.8 concludes.

1.2 Data

In this section, I present the data sources used and explain how the sample is built.

Data sources: DHS and Ethnologue

I use Demographic and Health Surveys (DHS) that were implemented in sub-Saharan Africa

(the surveys used are listed in Table A-1.1, Appendix A-1.1). DHS questionnaires provide information

on who is married5 to whom (within the household), and in specific countries and waves, these

questionnaires also collect information on respondents’ ethnic and religious identity. The

descriptive sample includes the 25 countries with information on ethnic identity6. The main

sample is made up of 15 countries for which there are at least two survey waves that gather in-

formation on the ethnic and religious identity of respondents: Benin, Burkina Faso, Cameroon,

Cote d’Ivoire, Gabon, Ghana, Guinea, Kenya, Malawi, Mali, Niger, Senegal, Uganda, Togo,

and Zambia. The main sample is made up of women born between 1955 and 1989 and of their

husbands.

5Throughout the paper I use the terms “marriage,” “spouse,” “husband,” and “wife” to refer to married couples as well as to cohabiting couples.

6These 25 countries are the 15 countries from the main sample, plus the Central African Republic, Chad, the Republic of the Congo (Congo-Brazzaville), the Democratic Republic of the Congo (DRC), Ethiopia, Liberia, Mozambique, Namibia, Nigeria, and Sierra Leone. The countries included in this study are not a random sample of African countries. Including a question on the respondent’s ethnic identity is not a decision made independently of whether ethnicity matters in a country: the sample above cannot be considered representative of countries that did not include such a question in the DHS.

Additionally, I exploit the Ethnologue dictionary (Simons and Fennig, 2017) to get information on the classification of each ethnic group’s traditional language. I use these classifications to

compute the linguistic distance of all of the pairs of ethnic groups. For each pair, I identify

the lowest common linguistic node that they share and compute the number of nodes between

each group and the common node. The mean of these two distances is the linguistic distance

of this pair (detailed methodology in Appendix A-1.2.).
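The pairwise distance described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author’s code: each language’s classification is assumed to be given as a list of node names from the family root down to the language itself, and the “number of nodes between each group and the common node” is counted as the number of steps down the tree from the lowest common node. The classification paths are hypothetical placeholders, not actual Ethnologue entries.

```python
from typing import Sequence


def linguistic_distance(path_a: Sequence[str], path_b: Sequence[str]) -> float:
    """Mean number of tree steps from two languages to their lowest common node.

    Each path lists classification nodes from the family root down to the
    language itself (an assumed input format, loosely modelled on Ethnologue).
    """
    # Depth of the lowest common node = length of the shared prefix.
    shared = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        shared += 1
    # Steps from each language up to the lowest common node.
    steps_a = len(path_a) - shared
    steps_b = len(path_b) - shared
    return (steps_a + steps_b) / 2


# Hypothetical classification paths, for illustration only.
lang_a = ["Family", "Branch 1", "Sub-branch 1a", "Group 1a-i", "Language A"]
lang_b = ["Family", "Branch 1", "Sub-branch 1a", "Group 1a-ii", "Language B"]
lang_c = ["Family", "Branch 1", "Sub-branch 1b", "Language C"]

print(linguistic_distance(lang_a, lang_b))  # 2.0
print(linguistic_distance(lang_a, lang_c))  # 2.5
```

Identical paths give a distance of 0. Whether the appendix counts edges or intermediate nodes is a detail this sketch does not try to reproduce.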

Comparability over time: Reweighting and recoding

The main sample includes at least two data waves for each country, thus raising issues about

comparability over time. I briefly explain the steps taken to ensure that I can identify time

trends using this sample (the online Appendix details the processes used in this study).

The main sample is made up of women born between 1955 and 1989: for each year within

this period, the sample includes women from all of the 15 countries of the main sample. I

reweight the sample to make it representative of the population of married women in each

country. Reweighting and selecting the time period 1955-1989 ensure that the share of each

country remains (roughly) constant over time. Changes over time are hence not due to changes

in the respective weights of countries in the sample over time.

I recode both ethnic and religious categories to build a classification that fulfills two criteria.

First, the classification does not vary within a country. Second, for all of the cohorts and survey

waves, all of the groups listed in this classification have at least one member of each gender.

Grouping in fewer categories mechanically decreases the number of unions appearing as in-

terethnic/interfaith, so a time-invariant classification is needed to measure changes over time.

After recoding, ethnic classifications are specific to each country and include fewer than 10 categories for most countries7. The category “other (ethnicity)” groups together members of ethnic

groups that were not listed in all waves, people who did not identify with a specific group, and

foreign nationals. Recoding religious classifications makes apparent a key change during the

survey period: the surge of Pentecostalism in Africa (Mayrargue, 2004; Meyer, 2004). Changes

in classification are likely to reflect the agenda of church leaders, as “new Churches” have an

interest in being formally recognized in order to proselytize, which is not the behavior of faiths

less invested in proselytizing, such as traditional religions. Because new faith groups continue

to be listed, harmonizing nomenclatures across waves requires a high level of aggregation:

Christian, Muslim, other. The category “other (faith)” includes followers of traditional religions, atheists, and members of new religious movements that cannot be linked to Christianity or Islam. Among the women who belong to the “other (faith)” group, at least 41.8% identify with a traditional religion. This is a lower bound on their share, as many survey waves do not distinguish traditional religions from other faiths that do not belong to Christianity or Islam.

7Depending on the country, ethnic classifications became more or less detailed. For instance, Akan subgroups are listed separately in Ghana in the older survey waves but are only listed as “Akan” in the recent survey waves. However, the reverse phenomenon happened in Kenya, where groups listed together (Meru/Embu) are listed separately in more recent survey waves.
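The point that coarser categories mechanically lower measured intermarriage can be checked on a toy example. The sketch below is illustrative only: the couples are invented, survey weights are ignored, and the `crosswalk` dictionary is a hypothetical stand-in for the recoding step described above (here merging two Akan subgroups, in the spirit of footnote 7).

```python
from typing import Dict, List, Optional, Tuple


def intermarriage_share(couples: List[Tuple[str, str]],
                        crosswalk: Optional[Dict[str, str]] = None) -> float:
    """Percentage of couples whose spouses fall in different categories.

    couples: (wife_group, husband_group) pairs.
    crosswalk: optional mapping from detailed labels to harmonized labels;
    labels absent from the mapping are kept as they are.
    """
    crosswalk = crosswalk or {}
    mixed = sum(
        crosswalk.get(wife, wife) != crosswalk.get(husband, husband)
        for wife, husband in couples
    )
    return 100 * mixed / len(couples)


# Invented, unweighted couples, for illustration only.
couples = [("Asante", "Fante"), ("Asante", "Asante"), ("Fante", "Ewe"), ("Ewe", "Ewe")]
akan_crosswalk = {"Asante": "Akan", "Fante": "Akan"}

print(intermarriage_share(couples))                  # 50.0
print(intermarriage_share(couples, akan_crosswalk))  # 25.0
```

The same four couples look twice as “mixed” under the detailed labels, which is why a classification that changed across waves could produce a spurious time trend.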

Variables: Intermarriages and individual characteristics

To study intermarriages, I build variables that measure intermarriages as well as variables that

are likely to influence the likelihood of intermarriage.

Ethnic and religious identity are self-declared in the DHS: I hence consider that the respon-

dent’s answer is a measure of their “true identity”. A marriage is interethnic (interfaith) if the

spouses’ answers correspond to different ethnic (faith) categories in the recoded classification.

Ethnic and religious identity categories may be fluid and change, especially in the case of con-

version for marriage. I discuss how religious conversion and ethnic “assimilation” might affect

my results in section 1.7.

I consider two main variables that can lead to intermarrying: education and urban residence.

The DHS include little retrospective information, so I cannot reconstruct the individual char-

acteristics at the time that the marriage started. Marriage decisions are taken based on the

characteristics of individuals but also on expectations, such as joining a spouse in the city or

being able to graduate high school. I use characteristics at survey date to proxy for past char-

acteristics and expectations: current characteristics do not perfectly correspond to past char-

acteristics but allow me to take into account (realized) expectations. I use information on the

highest completed level of education at survey date, which should be a good proxy of the level

of education at the time that the union was formed8. I use urban residence at the survey date.

Migration mostly takes place from rural to urban areas9: the current place of residence captures

some unobserved characteristics of individuals that might be correlated with their propensity to

intermarry, such as one’s occupation. Using this variable thus results in overestimating the

relationship between urban residence and intermarriage.

8It is unlikely that women can stay in school after getting married. Considering only women in union who have attended primary school, I find that, under the assumption that girls start school at age 8, 69% of women started their first cohabitant union at least two years after completing their schooling, 21% around the same time as they completed their schooling, and 10% before that.

9Among women for whom I have information on childhood place of residence, 12.6% of women who live in a rural area at the time of the survey grew up in an urban area, whereas 41.3% of women living in an urban area grew up in a rural area. As this sample consists of women belonging to earlier-born cohorts, these figures may be even higher for later-born cohorts.


1.3 Intermarriages and marriage markets: Preferences, norms, and diversity

In this section, I provide a framework for interpreting the models and the results presented in the paper.

Types of factors influencing intermarriage shares

In his seminal paper, Kalmijn (1998) distinguishes three factors that could explain the preva-

lence of intermarriages: individual preferences, diversity levels within (local) marriage mar-

kets, and the influence of norms and of third parties.

The individual preferences factor gathers all of the preferences that individuals have concerning

their matches on the marriage market. Two main characteristics of matches on the marriage

market are socio-economic resources and cultural resources: people are likely to want to marry

someone whose economic prospects are good and with whom they share values and prefer-

ences.

The diversity level factor encompasses all of the channels related to how diverse marriage mar-

kets are. Some societies are highly heterogeneous, and others are more homogeneous, for in-

stance if there is a majority group. Moreover, spatial segregation affects how diverse local

marriage markets are. Low levels of diversity are associated with low levels of intermarriages

(by sheer limitation due to the numbers of potential spouses from other groups).

The third parties/norms factor includes the channels of group identification and group sanctions, and, in the setting studied, the fact that members of one’s kin may be

directly responsible for choosing one’s spouse. Field studies have shown transitions from kin-

selected to self-selected marriages, for instance, Bertrand-Dansereau and Clark (2016) (Malawi)

and Clark et al. (2010) (Kenya): elders and parents are less involved in the matching process.

Third-party influence is likely to work against intermarriages (Sporlein et al., 2014).

DHS variables and the Kalmijn framework

As indicated in section 1.2, I use two variables that are likely to influence the likelihood of

intermarriage: education and urban residence. These variables capture aspects of the types of

factors listed above.

Education could affect individual preferences through several channels. Education, especially


secondary and higher, is in many countries conducted in a vehicular language, thus helping

to remove language barriers in marriage markets. Additionally, by transmitting a common

culture, education could switch preferences away from group identification and towards a na-

tional identification. Moreover, higher education takes place in (mixed) urban settings (diver-

sity level factor). Educated women might have more of a say in the choice of their spouse: third

parties may be less involved in the matching process.

Urban areas are on average more mixed than rural ones: diversity levels are likely to be higher

in cities than in the countryside, and marriage markets are likely to be less segregated. Social

norms may be different in cities, and more accepting of intermarriages.

How did marriage markets change?

Figure 1.1: Changes over time: Education, Urban residence, Diversity levels

[Figure: two panels by birth cohort of women (1955-59 to 1985-89). Left panel: Primary only, Secondary/Higher, Living in urban area. Right panel: Religious fractionalization, Share other religion.]

Sample & data: Women in union, pooled sample. 95% confidence intervals included (except on the measure of religious fractionalization). Left panel: Share of women living in an urban area, and share of women whose highest completed education level is primary/secondary school. The share of women living in urban areas reflects current place of residence: the magnitude of the change would be higher using information on type of place of birth. Right panel: Weighted average of religious fractionalization at the country level and share of women belonging to the group “other (faith)”.

My main sample is made up of women born between 1955 and 1989. During this period, several

changes linked to the variables listed above took place. Figure 1.1 shows a visual representation

of the changes that I can observe in the data. Its left panel shows that education levels as well

as urbanization increased: these changes could lead to higher rates of intermarriages for later-born cohorts than for earlier-born cohorts.

Moreover, another key change took place: the decrease in the religious diversity of the pop-

ulation. The right panel of Figure 1.1 shows the decrease in the share of people identifying

with faiths other than Islam and Christianity and the associated decrease in religious diversity. Under the assumption that people meet on a national marriage market and that no other

factors affected interfaith marriages, the decrease in religious diversity will mechanically re-

sult in lower interfaith marriage shares. Under the same assumptions, the share of interethnic

marriages should remain stable, as there is no sizable change in ethnic diversity levels10.

1.4 Empirical strategy

In this section, I present measures of ethnic and religious diversity and introduce the specifica-

tions used to measure time trends.

1.4.1 Descriptive statistics: Comparisons across countries and across identity cate-

gories

Cross-country comparisons must be done carefully in order to be meaningful: if intermarriage

shares are low in a given country, it could be that, according to the classification used, there is

not much diversity in this country to begin with. As discussed in section 1.2 in the case of Pente-

costal churches, the classification chosen in each survey cannot be considered as encompassing

the same reality across countries and identity categories. To neutralize this classification effect,

I compute the share of intermarriages that we would observe if individuals were matched ran-

domly in a national marriage market. This share measures how diverse each country is. It is

equivalent to computing an ethnic fractionalization index (EF) and a religious fractionalization

index (RF) for each (national) marriage market. More formally, this fractionalization measure

($F_c$) corresponds to:

$$F_c = 1 - \sum_{i=1}^{n} p_{wi} \, p_{mi}$$

In this equation, $c$ denotes countries and $n$ is the number of ethnic (respectively religious) groups in the survey. Subscripts $w$ denote women and subscripts $m$ denote men: $p_{wi}$ is the share of married women who belong to group $i$, and $p_{mi}$ the share of married men who belong to group $i$. I assume that a person’s decision to marry does not depend on the ethnic and religious identity of her/his potential matches. There are no additional entries into or exits from the marriage market compared to what we observe in the data: polygamous men are thus counted as many times as the number of their cohabiting wives.
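As a sketch of how $F_c$ can be computed from couple-level data, assume one record per married woman (so that a polygamous husband is repeated once per cohabiting wife, as stated above) and ignore the survey weights that the paper does use. Group labels “A” and “B” are placeholders and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def group_shares(groups: Sequence[str]) -> Dict[str, float]:
    """Share of individuals in each group (unweighted)."""
    total = len(groups)
    return {g: n / total for g, n in Counter(groups).items()}


def fractionalization(wife_groups: Sequence[str], husband_groups: Sequence[str]) -> float:
    """Random-matching intermarriage share F_c = 1 - sum_i p_wi * p_mi.

    wife_groups[k] and husband_groups[k] describe the k-th union, one entry
    per married woman, so a polygamous man appears once per cohabiting wife.
    """
    p_w = group_shares(wife_groups)
    p_m = group_shares(husband_groups)
    return 1 - sum(share * p_m.get(g, 0.0) for g, share in p_w.items())


wives = ["A", "A", "B", "B"]
husbands = ["A", "B", "B", "B"]
print(fractionalization(wives, husbands))  # 0.5
```

Multiplied by 100, this is the kind of “random” share reported in Table 1.1; dividing an observed share by it gives the ratio column.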

Fractionalization indices do not include any information on how (dis)similar groups are11: they are useful to identify changes at the extensive margin but do not allow for identification of changes at the intensive margin.

10A more thorough discussion of ethnic diversity at the country level and interethnic marriage patterns can be found in the online Appendix.

11Since the seminal paper by Greenberg (1956), adjusting measures of fractionalization by a measure of similarity

In the case of ethnicity, I account for these differences by using linguistic distance measures (detailed methodology in Appendix A-1.2). The country-specific random linguistic distance ($d_c$) is computed according to the formula below:

$$d_c = \sum_{i=1}^{n} \sum_{j=1}^{n} p_{wi} \, p_{mj} \, d_{ij}$$

where $d_{ij}$ is the linguistic distance between group $i$ and group $j$, $p_{wi}$ the share of married women who belong to group $i$, and $p_{mj}$ the share of married men who belong to group $j$. The linguistic distance $d_{ii}$ is set to 0. The linguistic distances $d_{zi}$ and $d_{iz}$, where $z$ is the group “other (ethnicity)”, are set to the median value of the linguistic distance between pairs of ethnic groups listed in country $c$.
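The same shares give the random linguistic distance $d_c$. The sketch below is a minimal illustration under stated assumptions: pairwise distances $d_{ij}$ for the listed groups are supplied as a dictionary keyed by unordered pairs (a hypothetical input format), the label `"other"` stands for “other (ethnicity)”, and survey weights are ignored. It follows the formula exactly as written, including same-group pairs with $d_{ii} = 0$.

```python
import statistics
from collections import Counter
from typing import Dict, FrozenSet, Sequence


def group_shares(groups: Sequence[str]) -> Dict[str, float]:
    """Share of individuals in each group (unweighted)."""
    total = len(groups)
    return {g: n / total for g, n in Counter(groups).items()}


def random_linguistic_distance(wife_groups: Sequence[str],
                               husband_groups: Sequence[str],
                               dist: Dict[FrozenSet[str], float],
                               other: str = "other") -> float:
    """d_c = sum_i sum_j p_wi * p_mj * d_ij under random national matching.

    dist maps each unordered pair of listed groups to its linguistic distance;
    pairs involving `other` receive the median of those listed-pair distances.
    A missing listed pair raises KeyError.
    """
    median_d = statistics.median(dist.values())

    def d(i: str, j: str) -> float:
        if i == j:
            return 0.0  # d_ii = 0, including when both spouses are "other"
        if other in (i, j):
            return median_d  # d_zi = d_iz = country median
        return dist[frozenset((i, j))]

    p_w = group_shares(wife_groups)
    p_m = group_shares(husband_groups)
    return sum(pw * pm * d(i, j) for i, pw in p_w.items() for j, pm in p_m.items())


# Hypothetical groups A, B, C and distances, for illustration only.
distances = {frozenset(("A", "B")): 2, frozenset(("A", "C")): 4, frozenset(("B", "C")): 3}
print(random_linguistic_distance(["A", "B"], ["B", "other"], distances))  # 2.0
```

Here the median is taken over the supplied pairs, which should therefore cover all pairs of ethnic groups listed in the country.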

In the case of religious differences, there is no standard way to quantify the differences be-

tween groups. I study separately interfaith marriages – mixed marriages between Muslims,

Christians, and “other (faith)” – and Christian-Muslim marriages, the latter arguably being the

most distant kind of intermarriage along the religious dimension in sub-Saharan Africa.

1.4.2 Assessing time trends

I design a simple additive model to estimate trends in intermarriage. My baseline specification is a linear probability model12 that I run on the pooled sample (including country fixed effects) and at the country level. I present below the specifications used at the country level. I specify the model at the woman level13: individuals ($w$) are women. This choice reflects the fact that 17% of women appear with at least one co-wife in the dataset: a woman appears only once in the dataset, but a man appears matched with all of his cohabiting wives.

between groups has been suggested to take into account the depth of cleavages: for instance, some ethnic groups speak the same language (e.g., in Zambia; Posner, 2005), while others are very distant according to the linguistic tree (e.g., the Somali and the Kikuyu in Kenya). These differences are not taken into account by fractionalization measures: the distance between all groups is set to be the same.

12The results do not change when using a logit model. Results available from the author.

13The results do not change if I study intermarriage by collapsing the dataset to keep one observation per man. In this case, I consider a man to be in an interethnic (interfaith) union if at least one of his wives is not a member of his ethnic (religious) group. Results available from the author.


Model 1 : Time trends

$$\text{Intermarriage}_w = \alpha + \beta_1 \text{BirthYear}_w + \beta_2 \text{Age}_w + \beta_3 \text{Age}_w^2 + \varepsilon_w \qquad (1.1)$$

$\text{Intermarriage}_w$ is an indicator variable that equals 100 if the union is interethnic (interfaith),

and 0 otherwise14. I consider unions to be interethnic (interfaith) if spouses do not belong to

the same group. When both spouses belong to the group “other,” I consider them to be in an

intraethnic (intrafaith) union. In Section 1.7, I test whether the results are robust to considering

these unions as interethnic (interfaith).
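The coding rule for the dependent variable can be written as a small function. This is a sketch only: the function and argument names are hypothetical, and the flag `other_pair_is_mixed` stands for the alternative coding tested in Section 1.7.

```python
def intermarriage_indicator(wife_group: str, husband_group: str,
                            other: str = "other",
                            other_pair_is_mixed: bool = False) -> int:
    """Return 100 for an interethnic (interfaith) union and 0 otherwise.

    Baseline rule: spouses in different recoded categories are intermarried,
    and two spouses who are both in the residual "other" category count as an
    intraethnic (intrafaith) union. Coding the outcome as 0/100 lets the linear
    probability model's coefficients read as percentage points.
    """
    if wife_group == other and husband_group == other:
        return 100 if other_pair_is_mixed else 0
    return 100 if wife_group != husband_group else 0


print(intermarriage_indicator("Akan", "Ewe"))     # 100
print(intermarriage_indicator("other", "other"))  # 0
print(intermarriage_indicator("other", "other", other_pair_is_mixed=True))  # 100
```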

$\text{BirthYear}_w$ is a continuous variable defined as the year of birth of each woman. It is the main variable of interest: if the coefficient associated with it is positive, the share of intermarriages has increased over time. I use birth year rather than year of first cohabitation to

capture time trends for two reasons. First, birth year is available for all women, while using

cohabitation year would restrict the sample to women in their first union, as I only have in-

formation on the year of first cohabitation. Second, age at marriage or year of cohabitation is

endogenous to educational achievements and to the type of place of residence, while year of

birth is more exogenous to these individual characteristics. I compare birth year to cohabitation

year as a robustness check in Section 1.7. $\text{Age}_w$ is the age at survey date.

Age and birth year effects

Figure 1.2: Survey effect: Marital status and age at survey date

[Figure: two panels by birth year (1955 to 1985). Left panel: Age, Maximum age, Minimum age. Right panel: Share married women, Share remarried women.]

Left panel: The sample includes only women in union at the time of the survey. 95% confidence intervals. The “maximum age” is the age of the oldest woman surveyed for each birth year. The “minimum age” is the age of the youngest woman surveyed for each birth year. These ages depend on the timing of surveys within each country. Women aged 15 to 49 are surveyed in DHS, hence the flat lines at these two points. Right panel: These shares are computed using all women surveyed. 95% confidence intervals. “Married women” are the women in union at the time of the survey, not the ever-married women.

14The indicator variable is set to 100 so that coefficients can be read as changes in percentage points.


I add quadratic controls for age in the model to control for age effects. This ensures that the

patterns that I identify in the data are due to change across cohorts and not to age effects15. I

use surveys implemented from 1992 to 2018: women born in 1955 were older than 35 in the

first DHS survey of each country; women born in 1985 were younger than 35. As shown in

Figure 1.2, the timing of survey waves can be seen in the age composition of cohorts: earlier-

born cohorts are older at survey date than later-born cohorts. Whether a woman is married or

not is a function of age: differences in age composition of cohorts (left panel) are mirrored by

differences in the share of married and remarried women by cohort (right panel). As women in

earlier-born cohorts are older at the survey date, they are more likely to have married and more

likely to have remarried, after either divorce or widowhood. The same characteristics

are likely to drive both the type of marital status that I observe (married/remarried/never

married) and the type of marital outcome that I observe (intermarried or not). For instance,

if women who marry young are more likely to marry within their group, then, without age

controls, I would estimate time trends that are due to the fact that cohorts differ with respect to

their age composition.
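Why repeated survey waves matter for separating age effects from birth-year effects can be seen from the rank of the Model 1 design matrix. In the minimal sketch below (synthetic cohorts and survey years chosen only for illustration; the function name is hypothetical), a single wave makes age an exact linear function of birth year, so the constant, birth-year, and age columns are collinear; a second wave restores full rank. Variables are centred for numerical stability, which does not change the column space.

```python
import numpy as np


def model1_design(survey_years, birth_years):
    """Design matrix for Model 1: constant, BirthYear, Age, Age^2.

    Every cohort is assumed to be observed in every survey wave. Birth year
    and age are centred (at 1970 and 30) purely for numerical stability;
    centring does not change which effects are identified.
    """
    rows = []
    for survey_year in survey_years:
        for birth_year in birth_years:
            age = survey_year - birth_year
            rows.append([1.0, birth_year - 1970, age - 30, (age - 30) ** 2])
    return np.array(rows, dtype=float)


cohorts = range(1970, 1980)
one_wave = model1_design([1998], cohorts)
two_waves = model1_design([1998, 2008], cohorts)

print(np.linalg.matrix_rank(one_wave))   # 3: age = survey year - birth year
print(np.linalg.matrix_rank(two_waves))  # 4: all four coefficients identified
```

The balanced grid is a simplification; with the quadratic age term, not every cohort needs to be observed twice.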

Model 2: Time trends with controls

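$$\text{Intermarriage}_w = \alpha + \beta_1 \text{BirthYear}_w + \beta_2 \text{Age}_w + \beta_3 \text{Age}_w^2 + \beta_4 \text{Primary}_w + \beta_5 \text{Secondary}_w + \beta_6 \text{Urban}_w + \beta_7 \text{Remarried}_w + \varepsilon_w \qquad (1.2)$$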

I introduce additional variables in the model to test whether they explain changes in intermarriage shares. I assume these variables have a constant effect over time. I add dummies for the highest education level, $\text{Primary}_w$ and $\text{Secondary}_w$, the reference category being “no education”16. $\text{Urban}_w$ is a dummy that takes the value 1 if the respondent lives in an urban area. Moreover, to control further for cohort composition effects, I add a dummy variable, $\text{Remarried}_w$, which takes the value 1 if the respondent has remarried. I discuss alternative ways to measure the impact of remarriage in Section 1.7.

15This is possible as my main sample is made up of countries for which I have at least two survey waves. I hence observe birth cohorts at different ages. Thus, 82.7% of women belong to birth cohorts that were sampled at least twice in their country. Given the quadratic function that I use to estimate age effects, I do not need all of the cohorts to have been sampled twice to estimate age effects and birth year effects separately.

16The DHS classification distinguishes between secondary and higher education. Only 2.3% of married women in my sample completed university, so I aggregate secondary education and higher education into a single category.

Throughout the paper, I compare the coefficient associated with birth year between specification 1.1 and specification 1.2. The birth year coefficient in specification 1.1 measures time trends. The birth year coefficient in specification 1.2 measures time trends that cannot be explained by changes in education levels, in urbanization, and in cohort composition due to remarriage. As such, it could capture changes in preferences and in social norms. However, several other variables may contribute to individuals’ likelihood to intermarry, such as parental education, whether one’s parents intermarried, or whether both parents are still alive at the time of the marriage decision: the coefficient in specification 1.2

may also capture some of these omitted factors.
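The logic of comparing the birth-year coefficient across the two specifications can be illustrated on synthetic data. In the hypothetical setup below, urban residence rises across cohorts and fully determines intermarriage, so a Model 1-style regression finds a positive birth-year trend that vanishes once urban residence is controlled for, as in Model 2. This is a simplified sketch: education and remarriage dummies, survey weights, and country fixed effects are omitted, the data-generating process is invented, and the function names are hypothetical.

```python
import numpy as np


def simulate(survey_years=(2008, 2018), cohorts=range(1970, 1990), per_cell=10):
    """Synthetic women in which intermarriage depends only on urban residence."""
    birth, age, urban = [], [], []
    for survey_year in survey_years:
        for birth_year in cohorts:
            n_urban = (birth_year - 1970) // 2  # 0 to 9 urban women per cell
            for k in range(per_cell):
                birth.append(birth_year)
                age.append(survey_year - birth_year)
                urban.append(1.0 if k < n_urban else 0.0)
    birth = np.array(birth, dtype=float)
    age = np.array(age, dtype=float)
    urban = np.array(urban, dtype=float)
    y = 100.0 * urban  # outcome coded 0/100, as in the paper
    return y, birth, age, urban


def birth_year_coefficient(y, birth, age, controls=()):
    """OLS coefficient on birth year, with age, age^2 and optional controls."""
    a = age - 30  # centred for numerical stability
    X = np.column_stack([np.ones_like(y), birth - 1970, a, a ** 2, *controls])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


y, birth, age, urban = simulate()
print(birth_year_coefficient(y, birth, age))           # positive trend (Model 1)
print(birth_year_coefficient(y, birth, age, [urban]))  # about 0 (Model 2)
```

In real data, the Model 2 coefficient would not fall exactly to zero; the residual trend is what the paper interprets as possible changes in preferences and norms, plus any omitted factors.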

1.5 Results on pooled sample

This section presents the results on the pooled sample. The country-specific results are detailed

in section 1.6.

1.5.1 Descriptive statistics

Table 1.1: Average intermarriage shares and linguistic distance

Measure                          Observed   Random   Ratio   N
Interethnic marriages (%)        20.4       80.0     25.5    97,111
Linguistic distance (nodes)      3.29       3.25     1.01    21,704
Interfaith marriages (%)         9.7        33.8     28.7    96,549 (a)
Muslim/Christian marriages (%)   2.4        21.7     11.2    83,291

Data & sample: Women in union, pooled sample. Weighted data.
Interethnic, interfaith, and Muslim-Christian marriage shares: The observed share corresponds to the share observed in the population. The random share corresponds to the share that we would observe if people currently in union had matched at random, under two assumptions. First, there is no exit from or entry into the marriage market compared to what we observe. Second, polygamy decisions are independent of women’s ethnicity and religion: polygamous men appear on the random market the same number of times as in the observed market. Random shares are computed for each country, considering a national marriage market. The random share for the pooled sample is the weighted average of those national random shares. The ratio is computed as the ratio of the observed share to the random share.
Linguistic distance: The random and observed linguistic distances are computed considering a national marriage market, using information on interethnically married couples. Linguistic distance between two spouses when only one of them belongs to the group “other (ethnicity)” is set to the country-specific median linguistic distance, computed on distances for all pairs of ethnolinguistic groups.
Muslim-Christian marriage shares (observed and random) are computed using only marriages in which neither spouse is a member of the group “other (faith)”.

(a) One survey wave in Senegal does not include a question on religion.

Table 1.1 displays the estimations of observed intermarriage shares and contrasts them with

the intermarriage shares that we would have observed under random matching. Interethnic

unions are on average more frequent than interfaith unions: 20.4% of women are married to a

man who is not from the same ethnic group as them, and 9.7% of women are married to a man

who is not a member of the same religious group as them. However, the number of categories

and the level of diversity differ depending on whether we consider ethnicity or faith: under

random matching, we would observe around 80% of interethnic marriages and around 33.8%

of interfaith marriages. When we look at the ratio of the observed share of intermarriages to


the random share of intermarriages, interfaith marriages and interethnic marriages are roughly equally common: between 25% and 30% of the random share of intermarriages is realized.

I find that interethnic unions take place at a linguistic distance that is similar to what we would

observe under random matching. Whereas 28.7% of the random share of interfaith unions is realized, Muslim-Christian marriages are rare17: they make up 2.4% of marriages when considering only Muslim and Christian respondents, and 2.1% otherwise. This is 11.2% of what we would observe under random matching. Most interfaith unions hence involve a spouse who

identifies as Muslim or Christian and a spouse who belongs to the group “other (faith)”. Indi-

viduals who are neither Muslim nor Christian are more likely to be in an interfaith union than

Muslims and Christians. Couples that include at least one follower of “other (faith)” make up

14% of the sample, but 79% of interfaith couples, most of them taking place between a Christian

spouse and an “other (faith)” spouse. The high propensity of “other (faith)” members to inter-

marry is consistent with the fact that traditional religions are more tolerant of intermarriages.

It is also likely that the conversion process from a traditional religion to Islam or to a Christian

denomination might not concern both spouses at the same time.
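The decomposition of interfaith couples by whether a spouse belongs to “other (faith)” can be sketched as follows. This is a hypothetical illustration: it assumes each couple is stored as a (wife, husband) pair of the three faith labels used in the chapter and, as in the main specification, treats two “other (faith)” spouses as an intrafaith union. The 0/100 coding mirrors the dependent variables of Table 1.2.

```python
MUSLIM, CHRISTIAN, OTHER = "Muslim", "Christian", "other (faith)"


def union_flags(wife, husband):
    """Return (interfaith, muslim_christian), each coded 0 or 100.

    Two "other (faith)" spouses count as intrafaith (main-specification assumption).
    """
    interfaith = 100 if wife != husband else 0
    muslim_christian = 100 if {wife, husband} == {MUSLIM, CHRISTIAN} else 0
    return interfaith, muslim_christian


def share_with_other(couples):
    """Percentage of interfaith couples in which at least one spouse is "other (faith)"."""
    interfaith = [c for c in couples if union_flags(*c)[0] == 100]
    if not interfaith:
        return 0.0
    return 100 * sum(OTHER in couple for couple in interfaith) / len(interfaith)


# Illustrative couples, not survey data
couples = [
    (MUSLIM, MUSLIM),
    (CHRISTIAN, OTHER),
    (OTHER, CHRISTIAN),
    (CHRISTIAN, MUSLIM),
    (OTHER, OTHER),
]
```

In this toy list, two of the three interfaith couples involve an “other (faith)” spouse; the chapter reports 79% for the pooled sample.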

1.5.2 Time trends

Assessing time trends

Figure 1.3 shows the share of each type of intermarriage by birth cohort of women, thus providing visual evidence on the magnitude of changes.

¹⁷ 66.5% of Muslim-Christian unions are unions between a Muslim man and a Christian woman. Most Islamic scholars consider that Muslim women are forbidden to marry non-Muslim men, whereas Muslim men can marry women who belong to other monotheist religions. Hence, this imbalance in the types of Muslim-Christian unions indicates that such unions, while rare, are not merely an artifact of measurement error.


Figure 1.3: Intermarriage shares on pooled sample

[Figure: observed share (%, y-axis 0 to 30) of interethnic marriages, interfaith marriages, and Muslim-Christian marriages by five-year birth cohort of women (x-axis, 1955-59 to 1985-89).]

Sample & data: Women in union, pooled sample. 95% confidence intervals included. Observed share of intermarriages by birth cohort of women.
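A minimal sketch of how cohort-level shares like those in Figure 1.3 could be tabulated: women are grouped into five-year birth cohorts, and a share with a normal-approximation (Wald) 95% confidence interval is computed for each cohort. This is a deliberate simplification with hypothetical function names: it ignores the DHS sampling weights and the cluster-level design used in the chapter, so its intervals would generally differ from design-based ones.

```python
import math
from collections import defaultdict


def cohort_label(birth_year, start=1955, width=5):
    """Map a birth year to a five-year cohort label such as '1955-59'."""
    low = start + width * ((birth_year - start) // width)
    return f"{low}-{str(low + width - 1)[-2:]}"


def cohort_shares(birth_years, intermarried, z=1.96):
    """Unweighted share (%) of intermarried women per cohort, with a Wald CI.

    Returns {cohort: (share, lower bound, upper bound)}, all in percent.
    """
    groups = defaultdict(list)
    for year, flag in zip(birth_years, intermarried):
        groups[cohort_label(year)].append(1 if flag else 0)
    results = {}
    for label, values in sorted(groups.items()):
        n = len(values)
        p = sum(values) / n
        half_width = z * math.sqrt(p * (1 - p) / n)
        results[label] = (100 * p, 100 * (p - half_width), 100 * (p + half_width))
    return results
```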

Table 1.2 shows time trends for the three kinds of intermarriage studied: interethnic, interfaith, and Muslim-Christian marriages. Interethnic marriages are more frequent for women in later-born cohorts than for women in earlier-born cohorts: in column (1), the coefficient on birth year is positive and significant. After controlling for education, urbanization, and remarriage (column (2)), the magnitude of the coefficient decreases, but the coefficient remains positive and significant. This result suggests that other factors, possibly changes in norms and preferences, contributed to the increase in the share of interethnic marriages.
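The country fixed-effects logic behind Table 1.2 can be illustrated with a stripped-down within estimator. This sketch assumes a single regressor (birth year) and absorbs country fixed effects by demeaning within country (the Frisch-Waugh-Lovell logic). It omits the age terms and individual controls, and it does not compute the cluster-robust standard errors reported in the chapter. With a 0/100 outcome, the slope reads as a change in percentage points per birth year. The function name is hypothetical.

```python
from collections import defaultdict


def fe_slope(y, x, country, weights=None):
    """Weighted OLS slope of y on x with country fixed effects (within estimator)."""
    if weights is None:
        weights = [1.0] * len(y)
    sum_w = defaultdict(float)
    sum_wx = defaultdict(float)
    sum_wy = defaultdict(float)
    # Weighted country means of x and y
    for yi, xi, ci, wi in zip(y, x, country, weights):
        sum_w[ci] += wi
        sum_wx[ci] += wi * xi
        sum_wy[ci] += wi * yi
    num = den = 0.0
    # Slope on within-country deviations
    for yi, xi, ci, wi in zip(y, x, country, weights):
        dx = xi - sum_wx[ci] / sum_w[ci]
        dy = yi - sum_wy[ci] / sum_w[ci]
        num += wi * dx * dy
        den += wi * dx * dx
    return num / den


# Toy data: two countries with the same within-country trend (0.5 per year)
y = [10.0, 10.5, 11.0, 0.0, 0.5, 1.0]
x = [0, 1, 2, 10, 11, 12]
c = ["A", "A", "A", "B", "B", "B"]
```

Without the demeaning step, pooling the two toy countries would yield a negative slope, because country B combines high values of the regressor with low outcome levels; the fixed effects remove such between-country differences.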

Interfaith marriages decreased over time (column (3)), as expected from the decrease in religious diversity. The share of Muslim-Christian marriages (column (5)) did not change over time, which suggests that norms and preferences concerning interfaith marriages did not change over the period. Stable norms concerning religious intermarriage are consistent with the decrease in interfaith marriages being due to the decrease in religious diversity documented in section 1.3, rather than to norms shifting towards less tolerance of interfaith marriages.

Extrapolating the birth-year coefficients over 25 years (1960-1985), the model without individual controls (columns (1) and (3)) predicts an increase of 6.6 percentage points in the share of interethnic marriages and a decrease of 4.5 percentage points in the share of interfaith marriages. Once education and urban residence are controlled for (column (2)), the increase in interethnic marriage shares is 4.2 percentage points. The fact that women in later-born cohorts are more educated and more likely to live in urban areas than women in earlier-born cohorts thus explains roughly one-third of the trend in interethnic marriages. Once education and urban residence are controlled for (column (4)), the decrease in interfaith marriage shares is 3.7 percentage points: these variables explain little of the observed trend, corroborating the idea that the decrease in interfaith marriages is mainly due to declining levels of religious diversity.

Table 1.2: Trends - Intermarriage shares

                       (1)         (2)         (3)         (4)         (5)         (6)
Dependent variable     Interethnic marriage    Interfaith marriage     Muslim-Christian marriage
Birth year             0.265***    0.168***    -0.179***   -0.149***   0.0143      0.00461
                       (0.0421)    (0.0403)    (0.0289)    (0.0287)    (0.0157)    (0.0154)
Age                    0.428**     -0.152      -0.306**    -0.305**    -0.0307     -0.120
                       (0.199)     (0.197)     (0.141)     (0.141)     (0.0777)    (0.0770)
Age squared            -0.00515*   0.00202     0.00131     0.00134     0.000225    0.00134
                       (0.00294)   (0.00292)   (0.00221)   (0.00220)   (0.00117)   (0.00115)
Primary                            2.925***                0.568                   0.568**
                                   (0.637)                 (0.457)                 (0.222)
Secondary/Higher                   7.914***                -1.512***               1.339***
                                   (0.787)                 (0.561)                 (0.306)
Urban                              13.12***                -2.523***               1.141***
                                   (0.661)                 (0.414)                 (0.232)
Remarried                          6.443***                3.808***                1.650***
                                   (0.643)                 (0.502)                 (0.285)
Controls
Country fixed effects  X           X           X           X           X           X
Observations(a)        97111       97111       96549       96549       96549       96549
R-squared              0.247       0.272       0.134       0.139       0.029       0.033
Mean dep. variable     20.4                    9.7                     2.1

Data & sample: Women in union, pooled sample. Weighted data.
Specification: OLS regression. Standard errors, in parentheses, are clustered at the DHS-cluster level. As the specification includes country fixed effects, there is no constant in the model.
Columns (1) and (2): the dependent variable equals 0 if the union is intraethnic and 100 if it is interethnic.
Columns (3) and (4): the dependent variable equals 0 if the union is intrafaith and 100 if it is interfaith. Three faith groups are defined: “Muslim”, “Christian”, and “other (faith)”.
Columns (5) and (6): the dependent variable equals 100 if one spouse is Christian and the other Muslim, and 0 otherwise.
Results for all columns should be read as changes in percentage points.
Significance levels: * p<0.10, ** p<0.05, *** p<0.01.

a One survey wave in Senegal does not include a question on religion.
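The extrapolation above is simple arithmetic on the birth-year coefficients of Table 1.2: a linear trend of b percentage points per birth year implies a change of 25b over 25 cohorts. The sketch below reproduces it; the "share explained" measure (one minus the ratio of the trend with controls to the trend without) is an illustrative back-of-the-envelope quantity, and the variable names are hypothetical.

```python
YEARS = 25  # 1960-1985, as in the text

# Birth-year coefficients from Table 1.2 (percentage points per birth year)
coefficients = {
    "interethnic, no controls (1)": 0.265,
    "interethnic, controls (2)": 0.168,
    "interfaith, no controls (3)": -0.179,
    "interfaith, controls (4)": -0.149,
}

# Implied change over the period, in percentage points
changes = {label: b * YEARS for label, b in coefficients.items()}


def share_explained(no_controls, with_controls):
    """Fraction of a trend that disappears once controls are added."""
    return 1 - with_controls / no_controls


ethnic_explained = share_explained(changes["interethnic, no controls (1)"],
                                   changes["interethnic, controls (2)"])
faith_explained = share_explained(changes["interfaith, no controls (3)"],
                                  changes["interfaith, controls (4)"])
```

Rounded, the implied changes are 6.6, 4.2, -4.5, and -3.7 percentage points; the controls account for about 37% of the interethnic trend (roughly one-third) but only about 17% of the interfaith trend.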

Assessing individual characteristics

Looking at interethnic marriages and at Muslim-Christian marriages, the coefficients on education and on urban residence are consistent with what we would expect from variables capturing part of the individual preferences factor and part of the diversity level factor. Completion of primary school, rather than having no education, is associated with a higher likelihood of intermarrying. Completion of secondary school is associated with an even higher likelihood. Urban residence is also associated with an increase in the likelihood of intermarriage¹⁸. In the case of interfaith marriages, secondary education and urban residence are negatively correlated with the likelihood of being in an interfaith marriage, but these variables also capture the likelihood of belonging to the group “other (faith)”. Members of this group are often followers of traditional religions, and school attendance and urban residence are negatively correlated with the likelihood of being a member of this group¹⁹.

¹⁸ Using a sample of earlier-born cohorts for which I have information on childhood place of residence, I run the specifications from Table 1.2. The coefficient estimated for the current place of residence is slightly higher than the coefficient for the childhood place of residence.

Remarried women are more likely to be married outside of their group than women who are still in their first union, whatever the type of intermarriage considered. Social norms may differ between women who marry for the first time and women who remarry, as women have more freedom in choosing a spouse when they have already been married (Locoh and Thiriat, 1995). Similarly, earlier-born women may remarry under the same set of (more accepting) norms as later-born women who enter their first union. Lastly, women who remarried may have different (unobserved) characteristics that also lead them to marry outside of their group, whether in their first union or in subsequent ones. I discuss these hypotheses about remarriage in section 1.7.

1.6 Results at country-level

This section presents descriptive statistics and results on time trends at the country level. For brevity, tables include only the coefficient associated with the variable BirthYear and show results for the two main specifications (with and without controls). The full country-level results and the results with control variables introduced one by one are available in the online Appendix.

¹⁹ 76.6% of women who belong to the group “other (faith)” did not complete primary school, while 44.4% of Muslim and Christian women did not. 12.8% of women who belong to the group “other (faith)” live in an urban area, while 28.5% of Muslim and Christian women do.


1.6.1 Descriptive statistics

Figure 1.4: Shares of interethnic marriages

[Map, left panel: share of interethnic marriages (%). Number of countries per bin: [0,10]: 4; (10,20]: 9; (20,30]: 7; (30,40]: 3; (40,50]: 2; (50,60], (60,70], (70,80]: 0; no data: 26.]
[Map, right panel: ratio of observed to random share of interethnic marriages (%). Number of countries per bin: [0,10]: 1; (10,20]: 8; (20,30]: 8; (30,40]: 4; (40,50]: 2; (50,60]: 2; (60,70], (70,80]: 0; no data: 26.]

Data: Survey wave (DHS) conducted closest to 2005. Sample: Women currently in union.
Left panel: Share of interethnic marriages. Right panel: Ratio of observed share to random share of interethnic marriages. Higher ratios mean that the share of interethnic marriages is closer to what would be observed under random matching.
The corresponding data can be found in the online Appendix.

The maps in Figure 1.4 show the observed share of interethnic marriages and the ratio of the observed to the random share of interethnic marriages. Striking differences between countries appear. In Congo-Brazzaville and in Zambia, more than 40% of married women are in an interethnic marriage, whereas this share is lower than 10% in DRC, Kenya, Namibia, and Nigeria. The maps of the observed share and of the ratio look similar because random shares of interethnic marriages are high: there are only two countries where the random share is lower than 75%.


Figure 1.5: Shares of interfaith marriages

[Map, left panel: share of interfaith marriages (%). Number of countries per bin: [0,10]: 14; (10,20]: 6; (20,30]: 5; (30,40] and above: 0; no data: 26.]
[Map, right panel: ratio of observed to random share of interfaith marriages (%). Number of countries per bin: [0,10]: 2; (10,20]: 2; (20,30]: 7; (30,40]: 5; (40,50]: 4; (50,60]: 0; (60,70]: 3; (70,80]: 2; no data: 26.]

Data: Survey wave (DHS) conducted closest to 2005. Sample: Women currently in union.
Left panel: Share of interfaith marriages. Right panel: Ratio of observed share to random share of interfaith marriages. Higher ratios mean that the share of interfaith marriages is closer to what would be observed under random matching.
The corresponding data can be found in the online Appendix.

The maps in Figure 1.5 show the observed share of interfaith marriages and the ratio of the observed to the random share of interfaith marriages. In stark contrast to interethnic marriage patterns, the share of interfaith marriages is low: the highest share of interfaith marriages is 29.6% (Congo-Brazzaville), while the highest share of interethnic marriages is over 40%. However, the level of religious fractionalization is much lower than the level of ethnic fractionalization, hence the much darker shades of the map on the right panel. Countries are also more heterogeneous with respect to religious fractionalization, which ranges from 3.6% (Niger) to 64.1% (Benin).
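Fractionalization is not defined in this section. A common definition, assumed here for illustration only, is the probability that two randomly drawn individuals belong to different groups, 1 - Σ s_g², where s_g is the population share of group g. Under that definition, and if wives and husbands had the same group distribution, the index would coincide with the random intermarriage share, which is why low religious fractionalization goes hand in hand with low random shares and hence high ratios of observed to random shares. The group shares below are hypothetical.

```python
def fractionalization(shares):
    """1 - sum of squared group shares, in percent (shares must sum to one)."""
    total = sum(shares.values())
    if abs(total - 1) > 1e-9:
        raise ValueError("group shares must sum to one")
    return 100 * (1 - sum(s * s for s in shares.values()))


# Hypothetical faith distributions before and after a decline of "other (faith)"
before = {"Muslim": 0.30, "Christian": 0.40, "other (faith)": 0.30}
after = {"Muslim": 0.35, "Christian": 0.50, "other (faith)": 0.15}
```

In this toy example, halving the “other (faith)” share lowers the index from 66 to 60.5, the mechanism through which a shrinking “other (faith)” group reduces religious diversity.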

On the pooled sample, ratios of observed to random shares are similar for interethnic and interfaith marriages, but this is not the case when looking at countries separately. Notably, the distribution of this ratio is wider for interfaith marriages than for interethnic marriages: there is no country for which the ratio is higher than 60% for interethnic marriages, but it is higher than 60% for interfaith marriages in Congo-Brazzaville, Gabon, Namibia, Niger, and Zambia.


1.6.2 Time trends on interethnic marriages

Extensive margin

Time trends

Figure 1.6: Observed interethnic marriage shares over birth cohorts

[Figure, two panels: observed share of interethnic marriages (%, y-axis 0 to 50) by five-year birth cohort (x-axis, 1955-59 to 1985-89). Panel A countries: Mali, Uganda, Senegal, Ghana, CI, Guinea, Togo, Benin, Kenya. Panel B countries: Zambia, Gabon, Malawi, Cameroon, Niger, BF.]

Sample & data: Women currently in union, weighted DHS data at country level.
Panel A (left): Countries for which the trend on interethnic marriages is significantly different from 0.
Panel B (right): Countries for which the trend on interethnic marriages is not significantly different from 0.
Countries are sorted into these two panels according to regression results from Table 1.3. Countries appear in the legend in descending order with respect to the share of interethnic marriages in the 1985-1989 cohort.
BF: Burkina Faso; CI: Cote d’Ivoire.

Figure 1.6 presents a visual representation of the changes in the share of interethnic marriages over time. Panel A shows that the trajectories of countries where interethnic marriages became more frequent look similar. Panel B shows that, out of the six countries where interethnic marriage shares did not increase, three (Zambia, Gabon, and Malawi) already had shares of interethnic marriages higher than 25%. Two others, Burkina Faso and Niger, are the only countries in the sample where the ethnic fractionalization index is lower than 70%. They both have very large majority groups (the Mossi in Burkina Faso, the Hausa in Niger), which may mean that the context in which unions take place in these two countries differs from that of countries without a demographic majority group. The exception is Cameroon: its increase in interethnic marriage shares is positive but not significant²⁰, even though its share of interethnic marriages is the same as the pooled-sample average and it has no majority group.

Turning to regression analysis, Table 1.3 lists the coefficient associated with birth year for two sets of regressions, without and with the following controls: education, urban place of residence, and remarriage. The share of interethnic marriages significantly increased over time in Benin, Cote d’Ivoire, Ghana, Guinea, Kenya, Mali, Senegal, Togo, and Uganda (Panel A). In terms of magnitudes, an estimated increase of 0.3 percentage points per birth year translates into an increase of 7.5 percentage points when extrapolated over 25 years. Once I control for the individual characteristics correlated with interethnic marriages, the trends remain positive and significant in six countries out of nine: Benin, Cote d’Ivoire, Guinea, Mali, Senegal, and Uganda. In Ghana, the coefficient becomes insignificant when either education or urban residence is introduced into the model. In Kenya, the coefficient drops when the type of place of residence is introduced, but it only loses significance when all of the variables are introduced jointly. In Togo, adding education levels to the model explains away the trend. In Panel B, the introduction of control variables does not change the results.

²⁰ While there seems to be a trend for Cameroon in Figure 1.6, the trend is insignificant when age controls are added.

Table 1.3: Trend - Observed interethnic marriage shares

                       (1)          (2)          (3)     (4)
Dependent variable     Interethnic marriage      Mean    N
                       Birth year coefficient
                       (each cell: coefficient from a separate regression)
Panel A: Increase in interethnic marriage shares
Benin                  0.242***     0.176**      15.1    10977
                       (0.0741)     (0.0727)
Cote d’Ivoire          0.254*       0.262*       19.2    2677
                       (0.136)      (0.134)
Ghana                  0.262*       0.192        19.4    6487
                       (0.140)      (0.140)
Guinea                 0.728***     0.694***     14.0    4732
                       (0.200)      (0.203)
Kenya                  0.257***     0.0820       10.5    9169
                       (0.0624)     (0.0538)
Mali                   0.253*       0.305**      30.6    8499
                       (0.130)      (0.129)
Senegal                0.326***     0.177*       23.5    8339
                       (0.0914)     (0.0913)
Togo                   0.338***     0.164        14.4    3701
                       (0.113)      (0.114)
Uganda                 0.455***     0.413***     24.3    2465
                       (0.135)      (0.123)
Panel B: No change in interethnic marriage shares
Burkina Faso           0.00631      -0.0680      10.4    9170
                       (0.0856)     (0.0849)
Cameroon               0.545        0.215        20.5    3066
                       (0.332)      (0.317)
Gabon                  0.364        0.410        38.0    2274
                       (0.289)      (0.278)
Malawi                 0.00730      -0.156       31.8    9241
                       (0.121)      (0.120)
Niger                  -0.154       -0.114       12.7    5603
                       (0.120)      (0.123)
Zambia                 0.0675       0.0821       46.0    10711
                       (0.132)      (0.125)
Controls
Age & Age²             X            X
Education                           X
Urban                               X
Remarried                           X

Sample & data: Women currently in union, weighted DHS data at country level. Specification: OLS regressions run separately for the 15 countries of the sample. Standard errors, in parentheses, are clustered at the DHS-cluster level. The dependent variable equals 0 if the union is intraethnic and 100 if the union is interethnic.
Columns (1) and (2) report the coefficient associated with the birth year variable. Each cell corresponds to a separate regression. Column (3) reports the mean of the dependent variable, i.e. the share (%) of interethnic marriages in the regression sample. Column (4) reports the number of observations for each country.
Results in columns (1) and (2) can be interpreted as changes in percentage points.
Significance levels: * p<0.10, ** p<0.05, *** p<0.01.
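Whether a country appears in Panel A or Panel B of Table 1.3 depends on the significance of its birth-year coefficient. As a rough check, the stars can be approximated from the ratio of each coefficient to its standard error using two-sided normal critical values (1.645, 1.960, 2.576). This is an approximation that assumes large samples; exact p-values from cluster-robust inference may differ slightly near the thresholds. The function name is hypothetical.

```python
def stars(coef, se):
    """Approximate significance stars from a two-sided normal test."""
    t = abs(coef / se)
    if t >= 2.576:
        return "***"
    if t >= 1.960:
        return "**"
    if t >= 1.645:
        return "*"
    return ""


# Column (1) of Table 1.3: (coefficient, standard error)
column_1 = {
    "Benin": (0.242, 0.0741),
    "Ghana": (0.262, 0.140),
    "Cameroon": (0.545, 0.332),
    "Zambia": (0.0675, 0.132),
}
panel = {country: ("A" if stars(b, se) else "B") for country, (b, se) in column_1.items()}
```

Cameroon illustrates the caveat: its t-ratio of about 1.64 sits just below the 10% threshold.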

Individual characteristics

Countries are strikingly similar with respect to correlates of interethnic marriages, and there are no differences between countries where interethnic marriage shares increased and countries where they did not. The results are consistent with those on the pooled sample: primary education, secondary education, urban residence, and remarriage are all positively correlated with the likelihood of being in an interethnic union. The two exceptions are Uganda, where women who attended primary school are less likely than their uneducated counterparts to marry outside of their ethnic group (when urban residence is controlled for), and Gabon, where urban residence is uncorrelated with interethnic marriage. The share of Gabonese women living in urban areas is above 80%, while the share of women living in urban areas is lower than 45% in all of the other countries: urbanization might stop being a mixing factor once urbanization levels are high.

Intensive margin: Linguistic distance

Descriptive statistics and time trends

Table 1.4 shows the results of regressing the linguistic distance between spouses (conditional on being in an interethnic union) on birth year, with both sets of controls. Changes at the extensive margin do not necessarily correspond to changes at the intensive margin. The linguistic distance of interethnic marriages increased in three countries: two where interethnic marriages became more frequent (Benin and Togo) and one (Cameroon) where interethnic marriages did not increase. The linguistic distance decreased in Cote d’Ivoire, Kenya, and Senegal, all countries where the share of interethnic marriages increased. The linguistic distance did not change in the nine other countries of the sample. Introducing individual characteristics into the model changes the results only in Uganda, where the trend turns negative. The ratio of the observed to the random linguistic distance (column (4)) ranges from 0.6 (Gabon) to 1.5 (Togo) and is close to one in most countries: conditional on being in an interethnic union, most of the linguistic distance is realized. Moreover, in seven countries of the sample, this ratio is larger than one, indicating that interethnic marriages are more distant than they would be if they were formed at random (considering only intermarried people). Senegal is the only country where this ratio is lower than one and where the linguistic distance of interethnic marriages decreased.

Table 1.4: Trend - Linguistic distance between spouses

                       (1)           (2)           (3)     (4)     (5)
Dependent variable     Linguistic distance         Mean    Ratio   N
                       Birth year coefficient
                       (each cell: coefficient from a separate regression)
Panel A: Increase in interethnic marriage shares
Increase in linguistic distance
Benin                  0.0151**      0.0145**      3.4     0.8     1650
                       (0.00692)     (0.00686)
Togo                   0.0155**      0.0148*       3.4     1.5     459
                       (0.00746)     (0.00790)
No change in linguistic distance
Ghana                  0.0124        0.0102        3.2     1.0     1238
                       (0.00841)     (0.00881)
Guinea                 0.0230        0.0151        5.9     1.4     673
                       (0.0199)      (0.0209)
Mali                   -0.00813      -0.0105       6.6     1.3     2568
                       (0.00819)     (0.00812)
Uganda                 -0.0170       -0.0209*      1.7     0.7     605
                       (0.0117)      (0.0126)
Decrease in linguistic distance
Cote d’Ivoire          -0.0197**     -0.0196**     4.0     1.0     515
                       (0.00941)     (0.00876)
Kenya                  -0.0577***    -0.0460**     3.9     1.2     922
                       (0.0205)      (0.0199)
Senegal                -0.0277**     -0.0283**     3.5     0.8     1813
                       (0.0126)      (0.0123)
Panel B: No change in interethnic marriage shares
Increase in linguistic distance
Cameroon               0.0779***     0.0566***     3.0     0.7     653
                       (0.0225)      (0.0198)
No change in linguistic distance
Burkina Faso           -0.0169       -0.0125       4.9     1.2     995
                       (0.0111)      (0.0112)
Gabon                  0.00515       0.00530       1.4     0.6     923
                       (0.00538)     (0.00542)
Malawi                 0.00144       0.000873      1.8     1.1     3023
                       (0.00181)     (0.00182)
Niger                  0.0108        0.0120        4.4     1.3     794
                       (0.0102)      (0.0102)
Zambia                 -0.00172      -0.00172      1.6     0.9     4873
                       (0.00269)     (0.00261)
Controls
Age & Age²             X             X
Education                            X
Urban                                X
Remarried                            X

Sample & data: Women currently in an interethnic union, weighted DHS data at country level. Specification: OLS regression run separately for the 15 countries of the sample. Standard errors, in parentheses, are clustered at the DHS-cluster level. The dependent variable is the linguistic distance (measure defined in Appendix A-1.2.) associated with each interethnic union.
Columns (1) and (2) report the coefficient associated with the birth year variable. Each cell corresponds to a separate regression. Column (3) reports the mean linguistic distance for intermarried couples. Column (4) reports the ratio of the mean observed linguistic distance to the random linguistic distance, computed by randomly matching individuals who married outside of their ethnic group. Column (5) reports the number of observations for each country.
Significance levels: * p<0.10, ** p<0.05, *** p<0.01.


The evolution of linguistic distances depends on the type of interethnic marriages observed in earlier-born cohorts. For instance, comparing the cases of Benin and Kenya (linguistic trees for these two countries are depicted in Appendix A-1.2.), the average linguistic distance of interethnic marriages decreased in Kenya and increased in Benin. In Benin, as interethnic marriages became more frequent, all groups started intermarrying more, resulting in a decrease in the share of Adja-Fon unions, whose distance is one node, and an increase in the share of unions with a distance larger than four nodes (e.g. Yoruba-Peulh unions). Increasing linguistic distances indicate that women in later-born cohorts marry further away from their group, and hence that some ethnic cleavages may have lost salience. In Kenya, the decrease in linguistic distance stems mostly from the fact that earlier-born cohorts include more interethnic couples in which at least one spouse is Kalenjin or Luo (the only two groups that belong to the Nilo-Saharan branch) than later-born cohorts²¹. As interethnic marriages became more common, the share of such unions among intermarried couples decreased, resulting in a decrease in the average linguistic distance of interethnic unions. This result is still consistent with some ethnic barriers, not captured by linguistic distances, becoming less salient.
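Footnote 21 provides the ingredients for a simple composition calculation: in Kenya, interethnic unions average 7.2 nodes when at least one spouse is Luo or Kalenjin and 3.3 nodes otherwise. Treating these two averages as fixed (a simplifying assumption), the mean distance is a weighted average of the two, so a falling share of unions involving a Luo or Kalenjin spouse mechanically lowers it, even if no couple-level distance changes. The function names are hypothetical.

```python
NILO_SAHARAN_MEAN = 7.2  # nodes, at least one Luo or Kalenjin spouse (footnote 21)
OTHER_MEAN = 3.3         # nodes, all other interethnic unions (footnote 21)


def mean_distance(share_nilo_saharan):
    """Average linguistic distance given the share of unions involving Luo or Kalenjin."""
    return share_nilo_saharan * NILO_SAHARAN_MEAN + (1 - share_nilo_saharan) * OTHER_MEAN


def implied_share(mean):
    """Share of Luo/Kalenjin unions consistent with a given mean distance."""
    return (mean - OTHER_MEAN) / (NILO_SAHARAN_MEAN - OTHER_MEAN)
```

Under this two-group approximation, Kenya's mean distance of 3.9 nodes in Table 1.4 corresponds to roughly 15% of interethnic unions involving a Luo or Kalenjin spouse.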


Correlates of the linguistic distance of interethnic marriages differ from correlates of interethnic marriage itself: primary and secondary education are negatively correlated with the linguistic distance between spouses in most countries. Such a reversal between the extensive and the intensive margin may be explained by the ‘over-selection’ of individuals. Higher education levels are correlated with a higher likelihood of marrying outside of one’s ethnic group, so individuals who marry outside of their ethnic group without having attended school are likely to have unobserved characteristics, such as being strong-willed, that also lead them to marry further away from their group or to marry without any consideration of group differences.


Figure 1.7: Observed interfaith marriage shares over birth cohorts

[Figure, two panels: observed share of interfaith marriages (%, y-axis 0 to 50) by five-year birth cohort (x-axis, 1955-59 to 1985-89). Left panel countries: Gabon, Togo, Ghana, Benin, BF, Kenya, Zambia. Right panel countries: CI, Cameroon, Malawi, Mali, Uganda, Guinea, Niger, Senegal.]

Sample & data: Women currently in union, weighted DHS data at country level.
Panels A1 and B1 (left): Countries for which the trend on interfaith marriages is negative and significantly different from 0.
Panels A2, A3 and B2 (right): Countries for which the trend on interfaith marriages is not significantly negative: the trend is not significantly different from 0, except in Cameroon (A3), the only country where the share of interfaith marriages increased.
Countries are sorted into two panels according to the regression results from Table 1.5. Countries appear in the legend in descending order with respect to the share of interfaith marriages in the 1985-1989 cohort.
BF: Burkina Faso; CI: Cote d’Ivoire.

1.6.3 Time trends on interfaith and Muslim-Christian marriages

Time trends

Figure 1.7 presents a visual representation of the change in interfaith marriage shares over time. Comparing the two panels, countries seem to converge towards low levels of interfaith marriages. Apart from Cameroon and Cote d’Ivoire, all of the countries where interfaith marriages did not become less frequent have shares of interfaith marriages lower than 10% in all cohorts.

Table 1.5 shows the coefficient associated with birth year in country-specific regressions of the likelihood of being in an interfaith union on birth year and on the two sets of additional variables. Without individual controls other than age, the share of interfaith marriages increased only in Cameroon. It decreased in Benin, Burkina Faso, Gabon, Ghana, Kenya, Togo, and Zambia. Controlling for education levels, urban place of residence, and remarriage explains the trend in Benin, Gabon, and Togo. In Benin, the trend becomes insignificant when an indicator variable for remarried status is introduced. In Togo, the introduction of either education variables or urban residence explains the trend. In Gabon, it is the joint effect of the three variables.

²¹ Linguistic distance does not perfectly capture the cleavages between groups and, contrary to the share of intermarriages, is sensitive to extreme values. For instance, in Kenya, the distance between, on the one hand, the Luo and Kalenjin groups (Nilo-Saharan branch) and, on the other hand, all other ethnic groups except the Somali is high (over 7.5 nodes). Interethnic unions in Kenya have an average linguistic distance of 7.2 nodes if at least one of the spouses is Luo or Kalenjin, and of 3.3 nodes otherwise. Luo-Kalenjin unions themselves make up 1.6% of interethnic unions in Kenya, even though the linguistic distance between these two groups is lower than with other groups, indicating that factors other than linguistic distance are also at play.


Table 1.5: Trend - Observed interfaith marriage shares

                       (1)          (2)          (3)    (4)          (5)          (6)    (7)
Dependent variable     Interfaith marriage              Interfaith marriage
                       Muslim/Christian/Others   Mean   Muslim-Christian          Mean   N
                       Birth year coefficient           Birth year coefficient
                       (each cell: coefficient from a separate regression)
Panel A: Increase in interethnic marriage shares
Panel A1: Decrease in interfaith marriage shares
Benin                  -0.150**     -0.103       16.7   0.0430       0.0396       2.9    10977
                       (0.0754)     (0.0753)            (0.0332)     (0.0341)
Ghana                  -1.520***    -1.301***    18.3   -0.167***    -0.172***    2.8    6487
                       (0.133)      (0.131)             (0.0581)     (0.0585)
Kenya                  -0.189***    -0.123***    6.4    0.0112       -0.00311     1.0    9169
                       (0.0445)     (0.0451)            (0.0174)     (0.0173)
Togo                   -0.214**     -0.0945      18.9   0.0585**     0.0512*      1.2    3701
                       (0.109)      (0.112)             (0.0248)     (0.0284)
Panel A2: No change in interfaith marriage shares
Cote d’Ivoire          0.122        0.148        19.3   0.0544       0.0616       2.6    2677
                       (0.134)      (0.134)             (0.0472)     (0.0474)
Guinea                 0.0423       0.0481       5.1    0.0167       0.00898      0.9    4732
                       (0.122)      (0.122)             (0.0268)     (0.0271)
Mali                   0.00585      -0.000175    6.2    0.0198       0.0201       1.1    8499
                       (0.0660)     (0.0659)            (0.0257)     (0.0252)
Senegal                -0.0399      -0.0469      1.9    -0.0222      -0.0281      1.3    7777
                       (0.0594)     (0.0585)            (0.0516)     (0.0500)
Uganda                 0.0496       0.0140       5.7    0.0948*      0.0572       4.3    2465
                       (0.0570)     (0.0603)            (0.0501)     (0.0507)
Panel A3: Increase in interfaith marriage shares
Cameroon               0.489**      0.523**      10.8   0.0439       0.0467       1.6    3066
                       (0.233)      (0.236)             (0.0840)     (0.0870)
Panel B: No change in interethnic marriage shares
Panel B1: Decrease in interfaith marriage shares
Burkina Faso           -0.251***    -0.237***    12.1   0.00294      -0.0213      3.6    9170
                       (0.0898)     (0.0894)            (0.0447)     (0.0451)
Gabon                  -0.570**     -0.402       18.6   0.0729       0.0402       4.0    2274
                       (0.239)      (0.244)             (0.118)      (0.128)
Zambia                 -0.161***    -0.146***    4.4    -0.0123      -0.0126      0.5    10711
                       (0.0496)     (0.0492)            (0.0188)     (0.0191)
Panel B2: No change in interfaith marriage shares
Malawi                 -0.123       -0.0130      7.8    0.0250       0.0432       2.2    9241
                       (0.0786)     (0.0816)            (0.0370)     (0.0383)
Niger                  0.0912       0.0962*      1.8    -0.00343     -0.00439     0.4    5603
                       (0.0572)     (0.0567)            (0.0161)     (0.0163)
Controls
Age & Age²             X            X                   X            X
Education                           X                                X
Urban                               X                                X
Remarried                           X                                X

Sample & data: Women currently in union, weighted DHS data at country level. Specification: OLS regression run separately for the 15 countries of the sample. Standard errors, in parentheses, are clustered at the DHS-cluster level.
Columns (1) to (3): The dependent variable equals 0 if the union is intrafaith and 100 if the union is interfaith, considering three religious groups (Christians, Muslims, other (faith)). Columns (1) and (2) report the coefficient associated with the birth year variable. Each cell corresponds to a separate regression. Column (3) reports the mean share (%) of interfaith marriages.
Columns (4) to (6): The dependent variable equals 100 if the union is a Muslim-Christian union and 0 otherwise. Columns (4) and (5) report the coefficient associated with the birth year variable. Each cell corresponds to a separate regression. Column (6) reports the mean share (%) of Muslim-Christian marriages. Column (7) reports the number of observations for each country.
Results in columns (1), (2), (4) and (5) can be interpreted as changes in percentage points.
Significance levels: * p<0.10, ** p<0.05, *** p<0.01.

Muslim-Christian union shares changed in only three countries, decreasing in Ghana and increasing (although to a small extent) in Togo and Uganda. The coefficients are insignificant in all other countries, a finding consistent with the absence of a trend in the pooled sample (Table 1.2).


Crossing these two sets of results with results on the change in the share of “other (faith)” over time (the full results can be found in the online Appendix), I find that the share of “other (faith)” decreased in all of the countries where the share of interfaith marriages decreased, with the exception of Zambia. This share increased only in Senegal and Niger, where interfaith marriage shares did not decrease. In keeping with results on the pooled sample, decreasing interfaith marriage shares are likely to be driven by the decline in the share of the group “other (faith)” and the resulting decrease in religious diversity. However, at the country level, individual characteristics (education, urban residence, and remarriage) suffice to explain the trend in Benin, Gabon, and Togo. In Ghana, where both interfaith marriages and Muslim-Christian marriages became less common, it is likely that social norms or preferences are moving away from tolerating interfaith marriages.


Countries are heterogeneous with respect to correlates of interfaith marriages, as education levels and urban residence also capture the likelihood of being a member of “other (faith)” in most countries. Consistent with results on the pooled sample, urban residence and secondary education are negatively correlated with the likelihood of being in an interfaith union in most countries, and the sign of the coefficient on primary schooling differs across countries. In contrast, education levels and urban residence are either insignificant or positively correlated with the likelihood of being in a Muslim-Christian marriage, thus mirroring the results on individual characteristics associated with marrying outside of one’s ethnic group.

1.7 Robustness analysis

I implement four robustness checks on my findings. First, I relax the assumption that a mar-

riage is intraethnic or intrafaith when both spouses belong to the group “other”. Second, I test

whether the results are robust to alternative assumptions on remarried women’s first unions

and whether the trends are also found when considering separately women in their first union

and remarried women. Third, using only women in their first union, I test whether “assim-

ilation” and conversion take place over the length of a marriage. Fourth, using only women

in their first union, I compare time trends measured using birth year and using cohabitation year. Table 1.6 displays results from the main specification and from the regressions under each of these alternative assumptions. For brevity, the results are presented for the pooled sample. A full discussion of the country-level results can be found in the online Appendix.


1.7.1 Testing for heterogeneity in the “other” group

The categories “other (ethnicity)” and “other (faith)” are more heterogeneous than the other categories. In the main specification, I assume that when both spouses belong to the group “other”, their union is in-group. Assuming instead that these unions are out-group unions (the “other-other” assumption), more unions appear as intermarriages. Figure 1.8 compares intermarriage shares under the main assumption and under the “other-other” assumption: more unions are now counted as interethnic and as interfaith, thus providing an upper bound on the share of such unions.
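The difference between the two counting rules can be made concrete with a minimal sketch. It assumes that each couple is stored as a (wife group, husband group) pair and that the residual group carries a hypothetical label `"other"`; the outcome takes the values 0 and 100 as in the regressions, and the only thing that changes under the “other-other” assumption is how a union between two “other” spouses is scored.

```python
OTHER = "other"  # hypothetical label for the residual "other" group


def intermarriage(wife_group: str, husband_group: str, other_other_is_out_group: bool = False) -> int:
    """Return 100 for an out-group union and 0 for an in-group union."""
    if wife_group == OTHER and husband_group == OTHER:
        # Main assumption: two "other" spouses form an in-group union.
        # "Other-other" assumption: the same union is counted as out-group.
        return 100 if other_other_is_out_group else 0
    return 100 if wife_group != husband_group else 0


def intermarriage_share(couples, other_other_is_out_group=False):
    """Unweighted mean of the 0/100 outcome, i.e. a share in percent."""
    values = [intermarriage(w, h, other_other_is_out_group) for w, h in couples]
    return sum(values) / len(values)


couples = [("Fon", "Fon"), ("Fon", "Adja"), (OTHER, OTHER), (OTHER, "Fon")]
print(intermarriage_share(couples))                                 # 50.0 (main assumption)
print(intermarriage_share(couples, other_other_is_out_group=True))  # 75.0 (upper bound)
```

Because the alternative rule can only turn in-group unions into out-group ones, the share it produces can never be lower than under the main assumption, which is why it acts as an upper bound.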

In Table 1.6, the coefficients in columns (3) and (4) are similar to those in columns (1) and (2): the results on the pooled sample are robust to counting “other”-“other” unions as inter-group unions. The absolute magnitude of the coefficient is higher under the “other-other” assumption, especially for interfaith marriages, which is consistent with the overall decrease in the share of “other (faith)”.

At the country level, the main results carry through. There is no country for which the share

of interethnic marriages decreased. The trend on interfaith marriages in Niger turns positive

and significant, which is consistent with the fact that the share of “other (faith)” increased over

time in this country.

Figure 1.8: Shares of interethnic/interfaith marriages over birth cohort

[Figure: two panels (left: interethnic marriages; right: interfaith marriages). x-axis: birth cohort, 1955-59 to 1985-89; y-axis: share of marriages (%), 0 to 40. Series: Main, Bound other, Bound high, Bound low.]

Sample & data: Women in union, pooled sample. 95% confidence intervals included. Left panel: Interethnic marriages. Right panel: Interfaith marriages. Bound high: Shares under the assumption that all remarried women were in interethnic/interfaith first unions. Bound low: Shares under the assumption that all remarried women were in intraethnic/intrafaith first unions. Bound other: Shares under the assumption that when both spouses belong to the group “other”, they are in an interethnic/interfaith union.

1.7.2 Testing the remarriage story

Women remarry either after a divorce or after being widowed. As such, remarriage is a function of age (older women are more likely to be widows) and may also be a function of the characteristics of a woman’s first union (intermarriages may be more likely to end in divorce22). The age controls do not capture the fact that some unions may be more likely to end than others: cohort composition effects might bias the estimated time trends. Remarried women are more likely to be in an interethnic (interfaith) marriage than women in their first marriage, but they may already have been in a first marriage that was interethnic (interfaith). To better understand how remarriage patterns affect my results, I study bounds on the estimates and use a sub-sample analysis.

First, I bound my estimates by making assumptions on the first unions of remarried women. Figure 1.8 depicts how the different assumptions affect intermarriage shares. The “higher bound” assumption assigns an interethnic (interfaith) first union to all of the women who have remarried. The “lower bound” assumption assigns an intraethnic (intrafaith) first union to all of the women who have remarried. Three points must be noted regarding these assumptions. First, they are extreme: either 100% or 0% of remarried women are assumed to have had an interethnic (interfaith) first marriage, while 25.6% of them are in an interethnic marriage at the survey date and 13.1% in an interfaith one. Second, the “higher bound” assumption on interethnic marriages is an extreme assumption, as the ratio of the share of remarried women who married outside of their group to the share of not-remarried women who married outside of their group changes over birth cohorts. Third, there is a trend in remarriage. When regressing the Remarried variable on year of birth and quadratic age controls, there is a negative and significant trend: the share of remarried women is higher in earlier-born cohorts than in later-born cohorts, even when age is controlled for23. This means that the “higher bound” assumption works against finding a positive trend, and that the “lower bound” assumption works in favor of finding a positive trend.
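The direction of the bias introduced by each bound can be illustrated with a small sketch. The cohort labels, the toy records, and the function names are hypothetical; each woman is described by her cohort, whether her current union is an intermarriage, whether she has remarried, and a survey weight. In the toy data the main-assumption share is flat, but remarriage is more common in the earlier cohort, as in the text.

```python
from collections import defaultdict


def first_union_outcome(intermarried: bool, remarried: bool, assumption: str = "main") -> int:
    """0/100 outcome under the main, "higher bound" ("high") or "lower bound" ("low") assumption."""
    if assumption not in {"main", "high", "low"}:
        raise ValueError(f"unknown assumption: {assumption}")
    if remarried and assumption == "high":
        return 100  # every remarried woman assumed to have had an out-group first union
    if remarried and assumption == "low":
        return 0    # every remarried woman assumed to have had an in-group first union
    return 100 if intermarried else 0


def cohort_shares(women, assumption="main"):
    """women: iterable of (cohort, intermarried, remarried, weight); returns weighted shares in %."""
    totals, weighted = defaultdict(float), defaultdict(float)
    for cohort, intermarried, remarried, weight in women:
        totals[cohort] += weight
        weighted[cohort] += weight * first_union_outcome(intermarried, remarried, assumption)
    return {cohort: weighted[cohort] / totals[cohort] for cohort in sorted(totals)}


# Toy data: two remarried women in the earlier cohort, none in the later one.
women = [
    ("1955-59", True, True, 1.0),
    ("1955-59", False, True, 1.0),
    ("1955-59", False, False, 2.0),
    ("1985-89", True, False, 1.0),
    ("1985-89", False, False, 3.0),
]
for assumption in ("main", "high", "low"):
    print(assumption, cohort_shares(women, assumption))
# main {'1955-59': 25.0, '1985-89': 25.0}
# high {'1955-59': 50.0, '1985-89': 25.0}
# low {'1955-59': 0.0, '1985-89': 25.0}
```

In this toy example, the higher bound turns a flat profile into a decline across cohorts and the lower bound into a rise, which matches the statement that the first works against and the second in favor of finding a positive trend.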

Table 1.6 (columns (5) to (8)) shows the coefficient on birth year under these two assumptions. Under the “lower bound” assumption, the trend on interethnic marriages remains positive and significant. However, under the “higher bound” assumption, there is no trend on interethnic (first) marriages, and there is a small but significant negative trend once education and urban residence are controlled for. At the country level, the time trends turn negative and significant in Benin, Senegal, and Togo, resulting in the insignificant trend on the pooled sample. As the trend on interethnic marriages is only barely negative and significant under the “higher bound” assumption, which works in favor of finding a negative trend, it is extremely unlikely that interethnic first marriages became less frequent over time, and unlikely that their share remained constant.

22Whether spouses belong to the same group might affect the likelihood of divorce. However, not all remarried women have divorced; some were widowed. Interethnic unions are associated with lower age gaps between spouses, a higher likelihood of living in an urban area, and higher education levels: it is likely that intraethnic marriages are more likely to be ended by the husband’s death than interethnic marriages are.

23Two factors are likely to explain this trend in the share of remarried women. First, widowhood has become a less common experience due to changes in life expectancy. Second, not remarrying may be an option that is accessible to a higher share of later-born women than of their earlier-born counterparts.

The trend on interfaith marriages remains negative and significant under both assumptions.

Nonetheless, the magnitude of the coefficient drops under the “lower bound” assumption, as

expected. The results on interfaith marriages at the country-level are robust to these bounds.

Second, I test whether the trends that I observe come from remarried women or from women

in their first union (columns (9) to (11), and columns (13) and (14), Table 1.6). There are trends

in all of the sub-samples. Such trends are not found for all of the countries, a result that may

stem from differences between countries and sub-samples, or from the fact that the sample size

is too small in the remarried sub-sample. I find no sub-sample in which the trend on interethnic

marriages is negative and significant. Regarding interfaith marriages, the trend is positive and

significant only for remarried Nigerien women and for Cameroonian women.

Even if we believe that the results from the “higher bound” assumption on interethnic first mar-

riages are correct and hence that the share of interethnic first marriages did not change over

time, the results from the sub-sample analysis show that there is a positive and significant

trend when considering remarriages, even when education and urban residence are controlled

for. These results are consistent with the hypothesis that women who remarry have more of a

say on whom they marry, and that social norms around interethnic marriages have relaxed over

time. These results may also capture changes in the composition of the remarried sub-sample: among remarried women, the share of widows is likely to be higher, and the share of divorcees lower, in earlier-born cohorts than in later-born cohorts. Divorced women might have different preferences from widows, and they may also be more likely to choose whom they remarry.

1.7.3 Testing the “assimilation”/conversion story

Older women have spent more time in a union than younger women: as spouses spend a longer time in a union, their ethnic or religious identity may change. I cannot test for conversion or “assimilation” that takes place before cohabitation or marriage, but I can test these two channels during the time in union. Exploiting the fact that there are at least two survey waves for each country, I can study whether women who were born in the same year and married for the first time in the same year are more or less likely to report belonging to the same ethnic (religious) group as their husband as the length of union increases. However, the identification ultimately rests on differences across survey waves, so this also captures any effect linked to survey waves.
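The reason the identification rests on survey waves is an accounting identity: for a woman surveyed in year s, born in year b, who first cohabited at age a, the number of years since cohabitation is s - b - a. Within a single survey wave, years since cohabitation is therefore an exact linear combination of the constant, birth year, and age at cohabitation. The sketch below checks this with a rank test on made-up values; the function name and the numbers are illustrative only.

```python
import numpy as np


def design_rank(birth_year, age_at_cohab, survey_year):
    """Rank of [constant, birth year, age at cohabitation, years since cohabitation]."""
    birth_year = np.asarray(birth_year, dtype=float)
    age_at_cohab = np.asarray(age_at_cohab, dtype=float)
    survey_year = np.asarray(survey_year, dtype=float)
    # Accounting identity: years since cohabitation = survey year - (birth year + age at cohabitation)
    years_since = survey_year - (birth_year + age_at_cohab)
    columns = [birth_year, age_at_cohab, years_since]
    # Centring each column leaves the rank unchanged (a constant column is present)
    # and keeps the numbers small for the SVD behind matrix_rank.
    X = np.column_stack([np.ones_like(birth_year)] + [c - c.mean() for c in columns])
    # Absolute tolerance below which a singular value is treated as zero
    return np.linalg.matrix_rank(X, tol=1e-8)


birth = [1960, 1965, 1970, 1975]
age_at_cohab = [18, 20, 19, 22]
print(design_rank(birth, age_at_cohab, [2000] * 4))                # 3: one wave, collinear
print(design_rank(birth, age_at_cohab, [1998, 2008, 2003, 2013]))  # 4: several waves
```

With a single wave the design matrix loses a column's worth of information; only variation in the survey year separates the length of union from birth year and age at cohabitation, so any wave-specific effect is absorbed into the length-of-union coefficient.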


Table 1.6: Robustness checks on interethnic marriages and interfaith marriages – Pooled sample

Regression results. Each cell: coefficient from a separate regression (standard errors in parentheses). Dependent variables: interethnic marriage and interfaith marriage.

Col.  Sample             Assumption    Regressor                           Interethnic         Interfaith
(1)   All married women  Main          Birth year                          0.265*** (0.0421)   -0.179*** (0.0289)
(2)   All married women  Main          Birth year                          0.168*** (0.0403)   -0.149*** (0.0287)
(3)   All married women  Bound other   Birth year                          0.296*** (0.0514)   -0.402*** (0.0394)
(4)   All married women  Bound other   Birth year                          0.228*** (0.0501)   -0.299*** (0.0388)
(5)   All married women  Higher bound  Birth year                          -0.00227 (0.0457)   -0.354*** (0.0416)
(6)   All married women  Higher bound  Birth year                          -0.0808* (0.0445)   -0.304*** (0.0413)
(7)   All married women  Lower bound   Birth year                          0.268*** (0.0369)   -0.0960*** (0.0242)
(8)   All married women  Lower bound   Birth year                          0.157*** (0.0351)   -0.0789*** (0.0241)
(9)   First union        Main          Birth year                          0.263*** (0.0437)   -0.146*** (0.0293)
(10)  First union        Main          Birth year                          0.135*** (0.0415)   -0.123*** (0.0292)
(11)  First union        Main          Birth year                          0.212*** (0.0433)   -0.146*** (0.0292)
                                       Number of years since cohabitation  -0.0597 (0.0458)    -0.221*** (0.0329)
                                       Age at cohabitation                 0.821*** (0.0799)   -0.258*** (0.0545)
(12)  First union        Main          Year of cohabitation                0.174*** (0.0357)   -0.0796*** (0.0252)
(13)  Remarried          Main          Birth year                          0.353*** (0.101)    -0.273*** (0.0776)
(14)  Remarried          Main          Birth year                          0.302*** (0.0981)   -0.268*** (0.0778)

Share of intermarriages (%)   Interethnic   Interfaith
Columns (1)-(2)               20.4          9.7
Columns (3)-(4)               27.9          15.5
Columns (5)-(6)               32.5          23.8
Columns (7)-(8)               16.2          7.6
Columns (9)-(12)              19.4          9.1
Columns (13)-(14)             25.6          13.1

Country fixed-effects  X X X X X X X X X X X X X X
Age & Age2             X X X X X X X X X X X X X
Education & Urban      X X X X X X X
Remarried              X

Data & sample: Women in union, pooled sample. Weighted data. Specification: OLS regression. Standard errors are clustered at the DHS-cluster level. As the specification includes country fixed effects, there is no constant in the model. The dependent variable equals 0 if the union is intraethnic (intrafaith) and 100 if the union is interethnic (interfaith). The definition of the dependent variable varies across columns (1) to (8). Results are estimated under the main specification, but on two sub-samples, in columns (9) to (14). Columns (1) to (10), (13) and (14) report the coefficient associated with the birth year variable. Column (11) reports the coefficients associated with the birth year variable, the number of years since first cohabitation, and the age at first cohabitation. Column (12) reports the coefficient associated with the year of first cohabitation variable. (1), (2): Main specification, All women: Dependent variable: Interethnic (interfaith) marriages as observed in the data. (3), (4): Other bound, All women: Dependent variable: Interethnic (interfaith) marriages, with “other”-“other” unions counted as interethnic (interfaith) ones. (5), (6): Higher bound, All women: Dependent variable: Interethnic (interfaith) marriages, with all women who remarried counted as being in an interethnic (interfaith) union. (7), (8): Lower bound, All women: Dependent variable: Interethnic (interfaith) marriages, with all women who remarried counted as being in an intraethnic (intrafaith) union. (9), (10), (11), (12): Main specification, First unions: Only women in their first union. (13), (14): Main specification, Remarried: Only women who have remarried. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
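The specification described in the note (weighted OLS of a 0/100 outcome on birth year and age controls, with one dummy per country instead of a constant, and standard errors clustered by DHS cluster) can be sketched in NumPy. The function names, the simulated data, and the Stata-style CR1 small-sample correction are assumptions made for illustration; the text does not state which software or correction was used, and the toy outcome is continuous rather than 0/100.

```python
import numpy as np


def build_design(birth_year, age, country):
    """Columns: birth year, age, age squared, then one dummy per country (no constant)."""
    birth_year = np.asarray(birth_year, dtype=float)
    age = np.asarray(age, dtype=float)
    country = np.asarray(country)
    levels = np.unique(country)
    dummies = (country[:, None] == levels[None, :]).astype(float)
    return np.column_stack([birth_year, age, age ** 2, dummies]), levels


def wls_cluster(y, X, weights, clusters):
    """Weighted least squares with cluster-robust (CR1) standard errors."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    clusters = np.asarray(clusters)
    sw = np.sqrt(w)
    # Point estimates: OLS on sqrt(w)-scaled data, equal to (X'WX)^-1 X'Wy
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ (X * w[:, None]))
    n, k = X.shape
    meat = np.zeros((k, k))
    groups = np.unique(clusters)
    for g in groups:
        in_g = clusters == g
        score = X[in_g].T @ (w[in_g] * resid[in_g])  # sum of w_i * x_i * u_i in cluster g
        meat += np.outer(score, score)
    n_groups = len(groups)
    cov = n_groups / (n_groups - 1) * (n - 1) / (n - k) * bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))


# Simulated data (illustrative only)
rng = np.random.default_rng(0)
n = 400
country = rng.choice(["BEN", "GHA", "TGO"], size=n)
X, levels = build_design(rng.integers(1955, 1990, size=n), rng.integers(15, 50, size=n), country)
true_beta = np.array([0.25, 0.1, -0.001, 10.0, 20.0, 30.0])
y = X @ true_beta + rng.normal(0.0, 1.0, size=n)
clusters = rng.integers(0, 50, size=n)
beta, se = wls_cluster(y, X, rng.uniform(0.5, 2.0, size=n), clusters)
print(f"birth year: {beta[0]:.3f} (se {se[0]:.3f})")
```

Dropping the constant and keeping all country dummies gives the same fitted model as a constant plus all but one dummy; it simply makes each dummy coefficient a country-specific intercept.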

Table 1.6 (column (11)) shows results from the regression of the Intermarriage variables on

birth year, length of union, and age at first union. Comparing column (9) to column (11), I find

that the time trends are robust to controlling for length of union and age at first union.

The coefficient of the number of years since cohabitation is not significant for interethnic mar-

riages but is negative and significant for interfaith marriages. Therefore, in the case of intereth-

nic marriages, “assimilation” and selective divorces seem unlikely. In the case of interfaith

marriages, the longer the union, the more likely it is that spouses have the same faith. This re-

sult is consistent with the fact that the share of “other (faith)” decreased over time: conversion

during marriage may be one of the mechanisms that is behind this decrease. Another hypoth-

esis would be that interfaith unions are more likely to break than intrafaith unions, but this

hypothesis does not account for the decline in traditional religions.

Concerning the age at first cohabitation, women who were older when they started cohabiting are more likely to be in an interethnic union, which is consistent with the fact that these women are more educated and more likely to live in an urban area than their counterparts, and that these characteristics are positively correlated with the likelihood of being in an interethnic union. Women who were older at the time of their first cohabitation are less likely to be in an interfaith union, which is consistent with the fact that they are less likely to belong to a traditional religion.

1.7.4 Testing Birth year v. Cohabitation year

The start of cohabitation may be a better measure of norms at the time that the union started: it corresponds to what people may perceive, such as the fact that more (or fewer) people are marrying outside of their group. However, cohabitation year is less exogenous than birth year, and age at marriage is higher for later-born cohorts, leading these cohorts to start cohabiting at even later dates. In Table 1.6, comparing column (10) with column (12), the results are robust to using cohabitation year instead of birth year on the sample of women who are still in their first union. At the country level, the results are also robust to using cohabitation year instead of birth year. Cohabitation year does appear to be endogenous to education and urban residence. When using cohabitation year rather than birth year, there is a trend in all of the countries except Burkina Faso and Niger, but after controlling for education and urban residence, this trend remains significant in the countries where there was a trend using birth year and in Gabon. After controlling for education and urban residence, only in Cote d’Ivoire and in Senegal do the trends estimated on interfaith marriages differ depending on whether birth year or cohabitation year is used.

1.8 Concluding remarks

This paper documents patterns of interethnic and interfaith marriages in sub-Saharan African

countries. I use data from Demographic and Health Surveys that gather information on marital

history, education and geographic location to build a sample of women born between 1955 and

1989. I find that the share of interethnic marriages varies between countries but that such

unions are not uncommon: 20.4% of women are married to a man who is not from the same

ethnic group as them, contrasting with 9.7% of women who are married to someone who does

not share their faith, and 2.1% of women in Muslim-Christian marriages.

Studying the marital outcomes of women born between 1955 and 1989, I find that interethnic marriages became more common in half of the sample, and that their share remained constant in the other half. This study concludes that higher educational attainment and widespread urbanization contributed to that increase, but that these changes cannot explain all of it, suggesting that ethnic cleavages may be losing salience in some parts of sub-Saharan Africa. Linguistic distance between the spouses of intermarried couples increased in some countries, suggesting that all ethnic boundaries were lessened. In other countries, linguistic distance decreased: boundaries between groups that are linguistically close may disappear, while boundaries with other groups may be reinforced. As interethnic marriages are far from rare and are even increasing in half of the countries of the sample, using ethnicity as a proxy for strong identification with one single group may be misguided. In contrast, interfaith marriages are becoming less common, a fact that can mostly be attributed to the conversion of former followers of traditional religions to Islam or Christianity, resulting in lower levels of religious diversity. The share of Muslim-Christian marriages remains low and has not changed in most countries. However, new religious groups, especially new branches within Christianity, are counted separately in recent survey waves: religious boundaries within faith groups may become more salient as a result of the expansion of Christianity and Islam.

Future research could study marriages that are intrafaith but interdenominational. An additional strand of research could focus on better understanding the channels through which education and urbanization affect the likelihood of marrying outside one’s own ethnic group: Are cities different from rural areas because they are more ethnically heterogeneous? Or do people living in cities have more agency in choosing a partner? Are educated women more likely to marry outside of their ethnic group because they have more agency in choosing a spouse, or because they gained access to more mixed marriage markets by attending higher education institutions? Related work could investigate whether marrying outside one’s own ethnic group is the

result of strategic behavior (Luke and Munshi, 2006). Are urban-dwellers turning away from

membership in their ethnic group to benefit from their membership in a religious group? Faith

groups or networks related to attending the same church or mosque could also be an opportu-

nity to access jobs and support. Deepening our understanding of whether marriage decisions

reinforce or change identity affiliations would bring important contributions to the political

economy literature on conflict, as well as to the literature on networks.


Appendix

A-1.1. Data

Table A-1.1: Data description by survey wave

                      Ethnicity       Religion     Remarried
Survey                (1)   (2)    (3)   (4)    (5)     (6)
Benin 1996              9    11    4.4     7   35.9    20.0
Benin 2001              9    11    4.4     7   28.2    18.7
Benin 2006              9    11    4.3    11   27.5    14.5
Benin 2011              9    10    1.8    10   22.0     9.7
Benin 2017              9    10    5.1    10   16.7    14.3
Burkina Faso 1993      10    12    7.5     5   20.1    13.2
Burkina Faso 1998      10    19   11.2     5   18.3    13.5
Burkina Faso 2003      10    12    7.8     5   14.0    11.3
Burkina Faso 2010      10    17   13.2     6    9.6    10.3
Cameroon 1998          23    43   24.1     6   11.5    19.5
Cameroon 2004          23    48   21.3     8   10.9    19.2
Cote d’Ivoire 1994     14    44   44.3     5   29.0    16.8
Cote d’Ivoire 2011     14    56   46.8     9   18.3    14.7
Gabon 2000              9    10   26.6     7   12.2    29.1
Gabon 2012              9    11   31.7     8    5.1    25.3
Ghana 1993              8    13    3.7     7   23.5    23.9
Ghana 1998              8    13    4.7     9   15.6    23.9
Ghana 2003              8    10    8.6     9   11.0    22.1
Ghana 2008              8    10    4.7    11    9.8    19.3
Ghana 2014              8     9    3.3    10    6.4    21.1
Guinea 1999             7     8    1.3     5    7.3    15.0
Guinea 2005             7     8    1.1     4    6.1    16.6
Guinea 2012             7     7    3.8     5    7.4    15.6
Kenya 1993             11    12    4.7     6    3.7     5.2
Kenya 1998a            10    11    4.8     6    3.0     4.5
Kenya 2003             11    15    7.1     6    2.7     5.8
Kenya 2008             11    14    4.5     5    2.7     4.9
Kenya 2014             11    23    6.2     5    1.6     7.2

                      Ethnicity       Religion     Remarried
Survey                (7)   (8)    (9)  (10)   (11)    (12)
Malawi 2000             9    11    7.7     8    2.6    24.6
Malawi 2004             9     9    7.6     8    1.5    19.9
Malawi 2010             9    14    6.8     9    1.9    21.2
Malawi 2015             9    11    5.6     8    0.5    23.6
Mali 1995              10    12   10.5     4    8.3    12.7
Mali 2000              10    11    4.5     5    4.4    12.5
Mali 2010              10    14    5.6     6    6.1    14.0
Mali 2012              10    13    6.9     8    3.3     6.3
Niger 1992              7    12    2.5     4    0.3    29.5
Niger 1998              7    11    1.7     4    0.7    24.9
Niger 2006              7     9    1.1     4    1.3    17.7
Senegal 1992b           7     9    5.3  n.a.   n.a.    23.5
Senegal 2005            7     9    8.3     2    0.0    19.4
Senegal 2010            7     8    9.4     4    0.7    12.0
Senegal 2014            7     8    5.1     3    0.5    12.7
Senegal 2015            7     8    7.7     3    1.0    12.8
Senegal 2016            7     8    6.6     3    0.5    12.4
Senegal 2017            7     8    9.0     4    0.0    13.5
Togo 1998               6     7    8.6     7   48.5    22.1
Togo 2013               6     8    6.8    13   28.9    15.7
Uganda 1996            20    32   10.4     5    1.4    21.5
Uganda 2016            20    47   11.1    11    0.9    22.3
Zambia 1996            31    46    6.8     6    1.8    17.8
Zambia 2001            31    48    8.1     6    1.5    18.3
Zambia 2007            31    52    6.2     5    1.2    16.0
Zambia 2013            31    54    5.3     5    1.0    17.9

Note: This table lists all survey waves included in the main sample, and presents results useful to understand recoding choices across waves. Sample: Women in union at the time of the survey. Ethnicity: Columns (1) and (7): Number of ethnic groups in the common classification (at least one married woman and one married man in each survey wave and each birth cohort). Columns (2) and (8): Number of ethnic groups with at least one woman listed in the DHS classification. Columns (3) and (9): Share of women belonging to the “other (ethnicity)” group (includes foreigners) in the common classification. Religion: Columns (4) and (10): Number of religious groups with at least one woman listed in the DHS classification. The common classification is made up of three groups for all waves except Senegal 2005, where Christians and Others are pooled together. Columns (5) and (11): Share of women belonging to the “other (faith)” group in the common classification. Remarriage: Columns (6) and (12): Share of remarried women.
a The only Somali people in this wave are two men: this wave is not representative of the north-east of Kenya.
b Religious affiliation not included in the questionnaire. I use a different set of weights when not using this wave.


A-1.2. Linguistic distance measures

Figure A-1.1: Ethnolinguistic tree for groups listed in DHS Benin

[Figure: ethnolinguistic tree rooted at “Proto-Human” and branching into the Niger-Congo and Nilo-Saharan families, down to the groups Adja, Bariba, Betamaribe, Dendi, Fon, Peulh, Yoa & Lopka, and Yoruba.]

The names in bold are the names of the ethnic groups listed in DHS Benin. The names in italic are the names of the linguistic groups the ethnic groups were matched to.

Gershman and Rivera (2018) describe the specificity of linguistic trees in the case of ethnolinguistic groups in sub-Saharan Africa. While languages have not always been associated with ethnicity (Canut, 2002), there is currently a strong association between languages (or language groups) and ethnic groups. First, I match each group listed in the recoded DHS classification to the corresponding linguistic group according to the information listed in the Ethnologue dictionary (Simons and Fennig, 2017). Second, I compute the linguistic distance of each pair of linguistic groups within a country.

I define the linguistic distance between two groups as the mean number of nodes to their first common subfamily. The number of nodes is computed from the linguistic group level. For instance, the linguistic distance between the ethnic groups Adja and Fon, in Benin, is 1 (one step is needed from each group to reach their nearest common linguistic subfamily, Gbe: Aja-Gbe (1) and Fon-Gbe (1)). The linguistic distance between Kamba and Kikuyu is 0 (they share a subfamily: Kikuyu-Kamba). The linguistic distance between Kalenjin and Luo is 4.5 (the average of the distances Kalenjin-Nilotic (3) and Luo-Nilotic (6)).
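The definition can be written as a short function over classification paths. This is a sketch under stated assumptions: each linguistic group is represented by a hypothetical list of nodes running from the root of the tree down to the linguistic group itself (abridged from Figures A-1.1 and A-1.2), and the distance is the average, over the two groups, of the number of steps up to their first shared node. It reproduces the Adja-Fon and Kamba-Kikuyu examples.

```python
def linguistic_distance(path_a, path_b):
    """Mean number of steps from each linguistic group up to their first common node.

    Each path lists the classification from the root down to the linguistic group.
    """
    common = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        common += 1
    return ((len(path_a) - common) + (len(path_b) - common)) / 2


# Abridged, illustrative paths ending at the linguistic group level
GBE = ["Proto-Human", "Niger-Congo", "Atlantic-Congo", "Volta-Congo", "Kwa", "Left Bank", "Gbe"]
aja = GBE + ["Aja"]  # matched to the ethnic group Adja
fon = GBE + ["Fon"]
kikuyu_kamba = ["Proto-Human", "Niger-Congo", "Atlantic-Congo", "Volta-Congo", "Benue-Congo",
                "Bantoid", "Southern", "Narrow Bantu", "Central", "E", "Kikuyu-Kamba"]

print(linguistic_distance(aja, fon))                    # 1.0
print(linguistic_distance(kikuyu_kamba, kikuyu_kamba))  # 0.0: Kikuyu and Kamba share a group
```

Averaging the two path lengths, rather than summing them, is what produces non-integer values such as the 4.5 reported for Kalenjin and Luo.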

When computing the value of the observed linguistic distance, if one spouse does not belong to any identified ethnic group but the partner does, I define the linguistic distance between them as the mean linguistic distance of intermarried couples within the country. When computing the value of the random linguistic distance, I define the linguistic distance of couples in which only one spouse is “other (ethnicity)” as the median value of the defined distances at the language pair level.
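The two fallback rules can be sketched as follows. The group labels, the toy distance table, and the function names are hypothetical; distances between identified groups are stored in a dictionary keyed by unordered pairs. Two further choices are assumptions not spelled out in the text: the observed fallback averages only over intermarried couples in which both spouses belong to identified groups, and an “other”-“other” couple is given a distance of 0, in line with the main in-group assumption.

```python
from statistics import mean, median

OTHER = "other"  # hypothetical label for "other (ethnicity)"


def pair_distance(a, b, dist):
    """Distance between two identified groups; dist is keyed by frozenset pairs."""
    return 0.0 if a == b else dist[frozenset((a, b))]


def fill_values(couples, dist):
    """Return (observed fill, random fill) for couples with exactly one "other" spouse."""
    # Assumption: average over intermarried couples with two identified groups only
    mixed = [pair_distance(w, h, dist) for w, h in couples if w != h and OTHER not in (w, h)]
    observed_fill = mean(mixed)
    random_fill = median(dist.values())  # median over the defined language pairs
    return observed_fill, random_fill


def observed_distances(couples, dist):
    observed_fill, _ = fill_values(couples, dist)
    out = []
    for w, h in couples:
        if w == h:
            out.append(0.0)  # includes "other"-"other", treated as in-group (main assumption)
        elif OTHER in (w, h):
            out.append(observed_fill)
        else:
            out.append(pair_distance(w, h, dist))
    return out


# Toy distances between three hypothetical groups A, B, C
dist = {frozenset(("A", "B")): 1.0, frozenset(("A", "C")): 4.5, frozenset(("B", "C")): 3.0}
couples = [("A", "A"), ("A", "B"), ("B", "C"), ("A", OTHER), (OTHER, OTHER)]
print(fill_values(couples, dist))         # (2.0, 3.0)
print(observed_distances(couples, dist))  # [0.0, 1.0, 3.0, 2.0, 0.0]
```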

Figure A-1.2: Ethnolinguistic tree for groups listed in DHS Kenya.

[Figure: ethnolinguistic tree rooted at “Proto-Human” and branching into the Afro-Asiatic, Nilo-Saharan, and Niger-Congo families, down to the groups Kalenjin, Kamba, Kikuyu, Kisii, Luhya, Luo, Meru & Embu, Mijikenda Swahili, Somali, and Taita & Taveta.]

The names in bold are the names of the ethnic groups listed in DHS Kenya. The names in italic are the names of the linguistic groups the ethnic groups were matched to.


Online Appendix

B-1.1. Supplementary Appendix on data

DHS data

The main sample is made up of 15 countries. Below are the criteria for inclusion in the main sample, as well as more detailed methodological information on reweighting and recoding the data. Table 7 in the paper lists the data waves used in the main sample. Table B-1.1 lists the survey waves that are not included in the main sample, as well as the reason why they were not included.

Criteria - Main sample

The criteria for inclusion are as follows: First, countries must have implemented at least two

survey waves that include ethnic information24. How ethnic classifications are chosen is not

mentioned in the DHS reports. Second, the ethnic classifications must be comparable across

waves. Third, ethnic groups must be ethnolinguistic groups that can be matched to linguis-

tic groups25 using Ethnologue (Simons and Fennig, 2017). Fourth, the surveys must include

women born between 1955 and 1989, in order to observe women from all of the countries for

each birth year within the study period.

24A question on religious identity is included in all of the surveys except Senegal 1992. I compute specific weights for specifications run without Senegal 1992. Excluding this survey does not change the study period.

25For instance, DRC and Chad list groups that correspond to geographic areas (e.g. “cuvette central” and “uele lac albert” in DRC). These places are heterogeneous in terms of ethnic groups, thus leading me to exclude Chad and DRC from the main sample.


Table B-1.1: DHS – Countries and waves not included in the main sample

Country in main sample, survey wave(a): reason why wave not in main sample
Cameroon 2011 (10): No common classification
Cote d’Ivoire 1998(b): –
Malawi 1992: Ethnicity not available
Niger 12012: Ethnicity not available
Senegal 1997(c): –
Uganda 2000, 2006, 2011 (19 / 5(d)): Ethnicity not available

Country in maps section 6.1, survey waves and number of ethnic groups: reason why country not in main sample
Central African Republic 1994 (10): Only one wave
Chad 1996 (13), 2004 (13), 2015 (13): Restricted study period(f)
Congo (Brazzaville) 2005 (85), 2011 (12): No common classification
Congo Democratic Republic 2007 (10), 2013 (11): Restricted study period
Ethiopia 2000 (42), 2004 (48), 2011 (47), 2016 (42): Common classification only 2011 and 2016; Restricted study period
Liberia 2007, 2013 (18): Only one wave
Mozambique 1997 (7), 2003, 2011 (21): No common classification
Namibia 2000 (10), 2006, 2013: Only one wave
Nigeria 2003, 2008 (11), 2013 (400): No common classification
Sierra Leone 2008 (10), 2013 (12): Restricted study period

Country not included, survey waves: reason why not included
Angola 2015: Ethnicity not available
Burundi 2010, 2016: Ethnicity not available
Comoros 1996, 2012: Ethnicity not available
Eswatini 2006: Ethnicity not available
Lesotho 2004, 2009, 2014: Ethnicity not available
Madagascar 1992, 1997, 2003, 2008: Ethnicity not available
Rwanda 1992 (3), 2000, 2005, 2010, 2014: Ethnicity not available
Sao Tome and Principe 2008: Ethnicity not available
South Africa 1998(e): Ethnicity not available
Tanzania 1991, 1996, 1999, 2004, 2010, 2015: Ethnicity not available
Zimbabwe 1994, 1999, 2005, 2010, 2015: Ethnicity not available

Listed countries and waves: The list includes only countries and waves for which information about couples was included. For each survey wave, if a question about ethnicity was included, I indicate the number of ethnic groups with at least one female member. Number of ethnic groups: Number of ethnic groups used when pooling survey waves for a country. The number of groups includes the group “others”. Survey waves in brackets are not included in the sample.
a When a survey took place during two calendar years, the year listed is the year when data collection started.
b Cote d’Ivoire 1998: Men were not asked their ethnic identity.
c Senegal 1997: No information on whether women have remarried or not.
d Uganda 2011: Men’s ethnic identities are classified into 5 groups, while women’s are classified into 19 groups. This wave is not included in the sample in order to avoid losing too much information by recoding ethnic groups into 5 categories.
e South Africa 1998: Race is included, ethnicity is not.
f Restricted study period: Inclusion of this country would lead to studying only a restricted sample, as the overlap of birth cohorts in this country and in countries in the main sample is too small.

Reweighting - Main sample

When reweighting each survey wave, I take into account several issues associated with weights. First, the weights provided by DHS do not sum up to population size. Using World Bank population statistics, I make sure that the weights of each survey correspond to the population size. Second, women aged 15-49 are surveyed in all of the households, but men are surveyed in only a fraction of the surveyed households. The lowest sampling rate of men is 25% (Malawi 2000), and the highest is 100% (DHS Ghana 2014, DHS Zambia 2013/2014). I adjust the weights by multiplying them by the inverse of the sampling rate of men. Third, the number of survey waves differs across countries. I correct for these differences.
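The three adjustments can be chained in a short sketch. Several details are assumptions rather than statements from the text: the population totals are hypothetical inputs standing in for the World Bank figures, the rescaling factor is computed on the weights of all interviewed women in a wave, and the correction for the number of waves is implemented as a division by the number of waves observed for the country.

```python
from collections import defaultdict


def reweight_couples(couples, women_weight_totals, population, men_sampling_rate):
    """Adjusted weights for couple records (illustrative sketch).

    couples: list of (country, wave, raw_dhs_weight), one tuple per couple.
    women_weight_totals: (country, wave) -> sum of raw weights over all interviewed women.
    population: (country, wave) -> population total the weights should add up to.
    men_sampling_rate: (country, wave) -> share of households where men were interviewed.
    """
    waves_per_country = defaultdict(set)
    for country, wave, _ in couples:
        waves_per_country[country].add(wave)

    adjusted = []
    for country, wave, raw in couples:
        key = (country, wave)
        # 1. Rescale so that the women's weights of this wave add up to the population total.
        weight = raw * population[key] / women_weight_totals[key]
        # 2. Couples are only observed where men were interviewed: multiply by 1 / sampling rate.
        weight /= men_sampling_rate[key]
        # 3. Assumption: divide by the number of waves so that a country does not gain
        #    weight in the pooled sample merely because it has more survey waves.
        weight /= len(waves_per_country[country])
        adjusted.append(weight)
    return adjusted


couples = [("A", 2000, 2.0), ("A", 2010, 1.0), ("B", 2005, 4.0)]
weights = reweight_couples(
    couples,
    women_weight_totals={("A", 2000): 10.0, ("A", 2010): 5.0, ("B", 2005): 8.0},
    population={("A", 2000): 100.0, ("A", 2010): 200.0, ("B", 2005): 80.0},
    men_sampling_rate={("A", 2000): 0.5, ("A", 2010): 1.0, ("B", 2005): 0.25},
)
print(weights)  # [20.0, 20.0, 160.0]
```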

Recoding ethnic and religious groups

The common ethnic classification includes only the groups that were listed in all surveys. In a few cases, such as Cameroon, I do not use a survey wave whose classification differs too much from the other waves. When the number of groups does not vary much across waves, I recode the ethnic classifications under the assumptions that individuals have a preferred answer to the question “what is your ethnic group?” and that this answer is not affected by changes in the classification. There are two cases: groups that appear only in some waves, and groups that are alternatively listed as several subgroups and as one group. If individuals give an answer that is not in the list (e.g. Maasai), this answer is recoded as “other (ethnicity)”. I hence assume that a Maasai individual would have been coded as belonging to the “other” group in DHS surveys that do not list this group. Based on that assumption, I assign the identity “other (ethnicity)” to all Maasai individuals, as the common classification for Kenya does not include the group Maasai. I also assume that subgroups are recoded into the corresponding group in the classification. For instance, early DHS in Ghana list ethnic groups as “1. Asante, 2. Akwapim, 3. Fante, 4. Other Akan, 5. Ga/Adangbe (etc.)”, whereas later waves list only “1. Akan, 2. Ga/Adangbe (etc.)”. I recode all of the Akan answers into a single category. To alleviate concerns about measurement error due to the recoding of ethnic groups, I check that the share of respondents listed in the group “other ethnic group” (detailed statistics in Table 7, Appendix A of the paper) remains roughly constant across cohorts and survey waves.
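The two recoding rules, and the check on the share of “other (ethnicity)”, can be sketched as follows. This is an illustrative sketch, not the code used for the thesis: the function names are hypothetical, the Ghana mapping follows the example above, and the Kenyan group list is an illustrative subset rather than the actual common classification. The share check is unweighted here for simplicity.

```python
from collections import defaultdict

OTHER_ETHNICITY = "other (ethnicity)"


def recode_ethnicity(answer, common_groups, subgroup_to_group):
    """Map a raw DHS answer onto the common classification of a country."""
    # Rule 1: listed subgroups are merged into their parent group (e.g. Asante -> Akan)
    group = subgroup_to_group.get(answer, answer)
    # Rule 2: any answer outside the common classification becomes "other (ethnicity)"
    return group if group in common_groups else OTHER_ETHNICITY


def other_share_by_cell(records):
    """Percentage of 'other (ethnicity)' per (birth cohort, survey wave), unweighted."""
    counts = defaultdict(lambda: [0, 0])
    for cohort, wave, group in records:
        cell = counts[(cohort, wave)]
        cell[0] += group == OTHER_ETHNICITY
        cell[1] += 1
    return {key: 100.0 * n_other / n for key, (n_other, n) in counts.items()}


# Illustrative inputs (hypothetical subsets of the real classifications)
GHANA_SUBGROUPS = {"Asante": "Akan", "Akwapim": "Akan", "Fante": "Akan", "Other Akan": "Akan"}
GHANA_COMMON = {"Akan", "Ga/Adangbe"}
KENYA_COMMON = {"Kikuyu", "Luo", "Luhya", "Kalenjin", "Kamba"}

print(recode_ethnicity("Fante", GHANA_COMMON, GHANA_SUBGROUPS))  # Akan
print(recode_ethnicity("Maasai", KENYA_COMMON, {}))             # other (ethnicity)
```

A roughly constant output of `other_share_by_cell` across cells is what the check described above looks for.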

The case of religious groups is more straightforward. I recode religious groups into three groups: Muslims, Christians, and “other (faiths)”. “Other (faiths)” includes members of traditional religions, agnostics/atheists, members of other religions listed as such, and a handful of very small religious groups. The share of members of “other (faiths)” does not remain constant over time; this reflects changes in the religious composition of countries rather than errors in the categorization of groups, as religious groups are easier to identify than ethnic groups.
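The three-way religious recoding can be sketched as a lookup. The label sets below are hypothetical illustrations; in practice each survey wave's own codebook determines which labels fall into each group.

```python
MUSLIM = "Muslim"
CHRISTIAN = "Christian"
OTHER_FAITHS = "other (faiths)"

# Illustrative label lists (assumption): real DHS labels vary by country and wave
MUSLIM_LABELS = {"muslim", "islam", "moslem"}
CHRISTIAN_LABELS = {
    "catholic", "roman catholic", "protestant", "anglican", "methodist",
    "pentecostal", "evangelical", "other christian",
}


def recode_religion(label):
    """Collapse a raw religion label into Muslim, Christian, or other (faiths)."""
    key = label.strip().lower()
    if key in MUSLIM_LABELS:
        return MUSLIM
    if key in CHRISTIAN_LABELS:
        return CHRISTIAN
    # Traditional religions, no religion, other listed religions, very small groups
    return OTHER_FAITHS


print(recode_religion(" Catholic "), recode_religion("Traditional"))
```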

B-1.2. Descriptive statistics

Intermarriages: random, observed, and ratio of intermarriages.

Table B-1.2 shows the observed and random intermarriage shares, and the ratio between the two, at the country level; this corresponds to what is displayed in Figures 4 and 5 of the paper. Table B-1.3 shows Christian/Muslim marriages and the religious structure of each country. Table B-1.4 shows descriptive statistics on linguistic distances.
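The random share and the Observed/Random ratio can be computed as below. This sketch assumes that the random share is the probability that a wife and a husband drawn independently from the married population belong to different groups. When women and men have the same group distribution, this reduces to the ethnolinguistic fractionalization index, ELF = 1 − Σ s_g², which is the label on the vertical axis of Figures B-1.1 and B-1.2. The ratio reproduces the Observed/Random column of Table B-1.2 up to rounding (Benin: 100 × 14.0 / 77.5 ≈ 18.1). Function names are hypothetical.

```python
def random_intermarriage_share(women_shares, men_shares):
    """Percentage of intermarriages expected under random matching.

    women_shares, men_shares: dicts mapping group -> population share (each sums to 1).
    """
    groups = set(women_shares) | set(men_shares)
    # Probability that a random wife and a random husband share the same group
    same_group = sum(women_shares.get(g, 0.0) * men_shares.get(g, 0.0) for g in groups)
    return 100.0 * (1.0 - same_group)


def observed_to_random_ratio(observed, random):
    """Observed share expressed as a percentage of the random share."""
    return 100.0 * observed / random


print(random_intermarriage_share({"A": 0.5, "B": 0.5}, {"A": 0.5, "B": 0.5}))  # 50.0
print(round(observed_to_random_ratio(14.0, 77.5), 1))                           # 18.1
```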


Table B-1.2: Observed and random intermarriage shares

Survey year | Country | Interethnic: Observed | Interethnic: Random | Interethnic: Observed/Random | Interfaith: Observed | Interfaith: Random | Interfaith: Observed/Random
2006 | Benin | 14.0 | 77.5 | 18.1 | 14.9 | 64.1 | 23.3
2003 | Burkina Faso | 11.5 | 67.5 | 17.0 | 11.3 | 55.6 | 20.3
2004 | Cameroon | 24.3 | 94.3 | 25.8 | 12.1 | 50.6 | 23.8
1994 | CAR (a) | 25.1 | 81.6 | 30.8 | 3.4 | 16.2 | 21.1
2004 | Chad | 19.0 | 87.3 | 21.7 | 7.4 | 53.7 | 13.9
2005 | Congo | 50.5 | 94.5 | 53.5 | 29.6 | 48.3 | 61.2
2012 | Cote d’Ivoire | 26.1 | 88.4 | 29.5 | 20.2 | 63.8 | 31.7
2007 | DRC (b) | 9.2 | 81.3 | 11.3 | 4.9 | 10.3 | 48.1
2003 | Ethiopia | 11.4 | 76.6 | 14.9 | 3.3 | 45.6 | 7.3
2000 | Gabon | 35.8 | 82.4 | 43.5 | 21.4 | 34.6 | 61.8
2003 | Ghana | 19.2 | 75.5 | 25.4 | 16.8 | 50.5 | 33.3
2005 | Guinea | 10.9 | 75.1 | 14.5 | 4.6 | 27.4 | 16.7
2003 | Kenya | 8.7 | 87.4 | 9.9 | 7.9 | 20.7 | 38.0
2013 | Liberia | 35.6 | 88.1 | 40.4 | 14.1 | 30.0 | 47.1
2004 | Malawi | 30.4 | 79.3 | 38.4 | 6.6 | 25.8 | 25.6
2006 | Mali | 28.6 | 84.3 | 33.9 | 5.8 | 15.4 | 37.7
2011 | Mozambique | 22.4 | 86.5 | 25.9 | 21.9 | 53.2 | 41.2
2000 | Namibia | 9.1 | 80.5 | 11.4 | 5.1 | 7.5 | 68.9
2006 | Niger | 11.9 | 59.7 | 20.0 | 2.4 | 3.3 | 73.0
2008 | Nigeria | 9.5 | 80.6 | 11.8 | 3.6 | 51.1 | 7.0
2005 | Senegal | 22.9 | 73.6 | 31.1 | 2.1 | 7.5 | 27.8
2008 | Sierra Leone | 17.8 | 73.1 | 24.3 | 11.0 | 29.8 | 36.7
1998 | Togo | 11.0 | 72.3 | 15.2 | 21.7 | 59.8 | 36.2
1995 | Uganda | 23.5 | 93.4 | 25.2 | 5.8 | 22.5 | 25.7
2007 | Zambia | 48.9 | 91.9 | 53.2 | 4.2 | 5.6 | 75.1

Data: DHS survey wave conducted closest to 2005. Sample: Women currently in union. These shares correspond to what is plotted in the maps in Figures 4 and 5 of the paper.

a Central African Republic
b Democratic Republic of the Congo

Descriptive statistics on explanatory variables: primary, secondary, and urban residence


Table B-1.3: Muslim/Christian marriages & religious structure

Country | Intermarriage share: Interfaith marriage | Intermarriage share: Christian/Muslim marriage | Population share: Muslim | Population share: Christian | Population share: Other/Traditional
Benin | 16.3 | 2.8 | 24.9 | 47.1 | 28.0
Burkina Faso | 11.7 | 3.7 | 57.7 | 28.3 | 14.0
Cameroon | 11.2 | 1.7 | 20.7 | 69.6 | 9.8
Cote d’Ivoire | 18.3 | 2.7 | 43.8 | 35.6 | 20.6
Gabon | 21.2 | 2.6 | 6.7 | 81.6 | 11.7
Ghana | 18.9 | 2.7 | 20.6 | 66.1 | 13.3
Guinea | 4.0 | 0.9 | 87.1 | 8.0 | 4.9
Kenya | 6.5 | 1.1 | 9.9 | 87.5 | 2.6
Malawi | 6.6 | 2.2 | 11.1 | 87.7 | 1.2
Mali | 5.8 | 1.0 | 91.3 | 3.5 | 5.2
Niger | 1.7 | 0.6 | 98.8 | 0.6 | 0.6
Senegal | 1.8 | 1.0 | 89.3 | 3.3 | 7.3
Togo | 19.9 | 1.2 | 17.2 | 43.0 | 39.9
Uganda | 6.7 | 4.9 | 10.3 | 88.2 | 1.5
Zambia | 4.1 | 0.4 | 0.5 | 98.3 | 1.2

Data: Pooled DHS for each country. Weighted data. Sample: Women currently in union.

Table B-1.4: Linguistic distance - descriptive statistics

Linguistic distance | All couples: Mean | All couples: SD | Intermarried couples: Mean | Intermarried couples: SD
Benin | 0.5 | 1.3 | 3.2 | 1.4
Burkina Faso | 0.9 | 2.0 | 4.9 | 1.5
Cameroon | 0.8 | 1.8 | 3.4 | 2.1
Cote d’Ivoire | 0.9 | 2.0 | 3.8 | 2.5
Gabon | 0.9 | 1.3 | 2.3 | 0.9
Ghana | 0.6 | 1.4 | 3.2 | 1.2
Guinea | 0.9 | 2.2 | 6.0 | 1.7
Kenya | 0.5 | 1.7 | 4.6 | 3.2
Malawi | 0.9 | 1.3 | 2.8 | 0.4
Mali | 2.2 | 3.3 | 6.5 | 1.9
Niger | 0.6 | 1.6 | 4.4 | 1.0
Senegal | 0.8 | 1.8 | 3.7 | 2.0
Togo | 0.4 | 1.1 | 3.3 | 1.1
Uganda | 0.5 | 1.7 | 2.0 | 2.7
Zambia | 1.1 | 1.4 | 2.3 | 1.1

Data: Pooled DHS for each country. Weighted data. Sample: Women currently in union.


Table B-1.5: Descriptive statistics on education and urban residence levels

Country | Primary only: Mean (1) | Primary only: Trend (2) | Secondary/higher: Mean (3) | Secondary/higher: Trend (4) | Urban residence: Mean (5) | Urban residence: Trend (6) | Observations (all columns)
Benin | 15.3 | 0.0558 (0.0634) | 8.3 | 0.235*** (0.0490) | 0.4 | 0.00414*** (0.000999) | 10977
Burkina Faso | 9.9 | 0.348*** (0.0607) | 4.3 | 0.156*** (0.0357) | 0.1 | 0.00401*** (0.000728) | 9170
Cameroon | 39.8 | 0.832** (0.407) | 24.7 | -0.110 (0.309) | 0.4 | 0.0259*** (0.00465) | 3066
Cote d’Ivoire | 23.3 | -0.172 (0.152) | 9.1 | 0.0583 (0.0887) | 0.4 | -0.000642 (0.00168) | 2677
Gabon | 31.1 | -1.430*** (0.269) | 61.5 | 1.349*** (0.282) | 0.8 | 0.00891*** (0.00216) | 2274
Ghana | 23.0 | -0.742*** (0.152) | 41.3 | 1.603*** (0.142) | 0.4 | 0.0105*** (0.00185) | 6487
Guinea | 8.2 | 0.155 (0.106) | 5.7 | 0.199** (0.0936) | 0.2 | 0.00266 (0.00168) | 4732
Kenya | 59.7 | -0.000866 (0.111) | 29.3 | 0.538*** (0.108) | 0.3 | 0.0124*** (0.00109) | 9169
Malawi | 62.7 | 0.318** (0.131) | 13.6 | 1.150*** (0.0797) | 0.2 | 0.000716 (0.00123) | 9241
Mali | 10.2 | -0.152** (0.0743) | 5.5 | 0.177*** (0.0647) | 0.2 | -0.00374*** (0.00105) | 8499
Niger | 9.5 | 0.111 (0.0960) | 3.1 | 0.124*** (0.0399) | 0.1 | -0.000198 (0.00103) | 5603
Senegal | 19.8 | 0.478*** (0.0760) | 10.2 | 0.306*** (0.0727) | 0.4 | 0.00514*** (0.00114) | 8339
Togo | 34.8 | 0.748*** (0.139) | 16.4 | 0.778*** (0.0998) | 0.3 | 0.00802*** (0.00123) | 3701
Uganda | 56.7 | 0.703*** (0.148) | 19.6 | 0.800*** (0.115) | 0.2 | 0.00653*** (0.000968) | 2465
Zambia | 60.9 | -0.397*** (0.112) | 25.9 | 0.606*** (0.102) | 0.4 | -0.00184 (0.00116) | 10711

Sample & data: Women currently in union, weighted DHS data at the country level.
Columns (1), (3), and (5): Percentage of women whose highest educational attainment is primary school (column (1)), secondary school or higher (column (3)), and who live in an urban area (column (5)).
Columns (2), (4), and (6): OLS regressions run separately for the 15 countries of the sample. Standard errors, in parentheses, are clustered at the DHS-cluster level. The dependent variable is listed in the header of each part of the table: it is a dummy equal to either 0 or 100. Results in columns (2), (4), and (6) can be interpreted as the change in percentage points associated with being born a year later, once quadratic controls for age are introduced.
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
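The trend regressions of this appendix (weighted OLS with standard errors clustered at the DHS-cluster level) can be sketched with NumPy. This is a minimal sketch, not the estimation code behind the tables: the function name is hypothetical, and it computes the basic cluster-robust (CR0) sandwich variance without the finite-sample correction that statistical packages usually apply by default, so its standard errors would be slightly smaller. Because the outcome is coded 0 or 100, the coefficient on birth year reads directly as a change in percentage points per year of birth.

```python
import numpy as np


def wls_cluster(y, X, weights, clusters):
    """Weighted least squares with cluster-robust (CR0) standard errors.

    y        : outcome, e.g. a 0/100 indicator
    X        : regressor matrix, including a column of ones
    weights  : survey weights
    clusters : cluster identifier of each observation (e.g. DHS cluster)
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    clusters = np.asarray(clusters)

    # Point estimates: beta = (X'WX)^{-1} X'Wy
    xtwx = X.T @ (X * w[:, None])
    beta = np.linalg.solve(xtwx, X.T @ (w * y))

    # Sandwich variance: bread @ meat @ bread, one weighted score vector per cluster
    resid = y - X @ beta
    bread = np.linalg.inv(xtwx)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(clusters):
        in_g = clusters == g
        score = X[in_g].T @ (w[in_g] * resid[in_g])
        meat += np.outer(score, score)
    vcov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(vcov))


# Toy example: intercept-only regression on a 0/100 outcome with two clusters
beta, se = wls_cluster([0, 100, 0, 100], np.ones((4, 1)), np.ones(4), [1, 2, 1, 2])
print(beta, se)
```

In the setting of the tables, `X` would stack a constant, birth year, age, and age squared. Birth year and age are separately identified only because several survey waves are pooled, since within a single wave birth year plus age equals the survey year.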


B-1.3. Ethnic composition over time

Are changes in intermarriage rates due to changes in the composition of the marriage market? I show below how random intermarriage shares changed across cohorts.

Figure B-1.1 shows random interethnic marriage shares for countries where the share of interethnic marriages increased over time (results from specification 1.1). In these countries, random shares remained stable over time: since the level of ethnic diversity did not change, it cannot explain the increase in interethnic marriage shares.

Figure B-1.2 shows the random interethnic marriage shares for countries where the share of

interethnic marriages did not significantly increase over time. Fluctuations of these random

shares are due to changes in the share of “other (ethnicity)” over birth cohorts in Gabon and in

Burkina Faso.

The only country where the level of ethnic diversity may have decreased is Niger. Random shares are lower for later-born cohorts than for earlier-born ones because the share of Haussa women increased from 58% to 64% of the married population. This increase does not reflect a change in the population, but the fact that Haussa girls marry even younger than other girls, so that they make up a larger share of married women in the youngest cohorts. This composition effect should therefore be absorbed by the age controls. In Niger, time trends on interethnic marriages are negative but not significant.


Figure B-1.1: Random interethnic marriage shares - Panel A

[Figure: one panel per country (Benin, Cote d’Ivoire, Ghana, Guinea, Kenya, Mali, Senegal, Togo, Uganda). Vertical axis: ELF. Horizontal axis: women’s birth cohort, from 1955-1959 to 1985-1989.]

Sample & data: Women and men in union at the time of the survey, by birth cohort of women.


Figure B-1.2: Random interethnic marriage shares - Panel B

[Figure: one panel per country (Burkina Faso, Cameroon, Gabon, Malawi, Niger, Zambia). Vertical axis: ELF. Horizontal axis: women’s birth cohort, from 1955-1959 to 1985-1989.]

Sample & data: Women and men in union at the time of the survey, by birth cohort of women.

B-1.4. Additional results at country-level

Tables 4 and 5 present the detailed results from the regression in model 2. Tables B-1.8 and B-1.9 present the coefficient on the variable BirthYear when the control variables are introduced one by one.
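The five specifications behind columns (1) to (5) of Tables B-1.8 and B-1.9 can be written down as formula strings, reading the control sets off the tables' “Controls” rows. The variable names are hypothetical, and the strings use the R/Patsy formula style (accepted, for instance, by the statsmodels formula interface); this only illustrates how controls are added one by one, not the code used for the tables.

```python
# Hypothetical variable names; age controls enter every column
BASE_CONTROLS = ["age", "age_squared"]
ADDED_CONTROLS = {
    1: [],
    2: ["primary", "secondary_higher"],  # education dummies
    3: ["urban"],
    4: ["remarried"],
    5: ["primary", "secondary_higher", "urban", "remarried"],
}


def specification(column, outcome="interethnic"):
    """Return the formula 'outcome ~ birth_year + controls' for one table column."""
    regressors = ["birth_year"] + BASE_CONTROLS + ADDED_CONTROLS[column]
    return f"{outcome} ~ " + " + ".join(regressors)


for col in sorted(ADDED_CONTROLS):
    print(col, specification(col))
```

Comparing the birth-year coefficient in column (1) with column (5) shows how much of the raw trend is accounted for by education, urban residence, and remarriage.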


Table B-1.6: Women’s characteristics and interethnic marriage

Panel A | Benin | Cote d’Ivoire | Ghana | Guinea | Kenya | Mali | Senegal | Togo | Uganda
Dependent variable: Interethnic marriage
Birth year | 0.176** (0.0727) | 0.262* (0.134) | 0.192 (0.140) | 0.694*** (0.203) | 0.0820 (0.0538) | 0.305** (0.129) | 0.177* (0.0913) | 0.164 (0.114) | 0.413*** (0.123)
Age | 0.713* (0.364) | 1.326 (0.860) | -1.152* (0.671) | 1.003** (0.494) | -1.009* (0.579) | 0.359 (0.687) | -0.192 (0.717) | 0.215 (0.725) | -1.207 (0.928)
Age squared | -0.00840 (0.00558) | -0.0196 (0.0133) | 0.0146 (0.00990) | -0.00664 (0.00697) | 0.0147* (0.00860) | -0.000660 (0.0107) | 0.00504 (0.0110) | -0.000603 (0.0111) | 0.0171 (0.0139)
Primary | 6.487*** (1.236) | 5.199* (2.818) | 1.717 (1.871) | 4.087* (2.203) | 1.975 (1.251) | 7.836*** (2.416) | 11.34*** (2.138) | 4.347** (1.691) | -4.724* (2.699)
Secondary/Higher | 13.53*** (1.775) | 21.25*** (3.869) | 3.216** (1.635) | 12.28*** (3.441) | 2.928** (1.405) | 22.24*** (3.432) | 17.74*** (2.894) | 10.38*** (2.900) | -0.526 (3.509)
Urban | 10.31*** (1.078) | 13.85*** (2.517) | 4.469*** (1.497) | 3.223* (1.847) | 12.46*** (1.634) | 16.05*** (2.611) | 16.27*** (1.739) | 9.563*** (2.126) | 17.07*** (4.118)
Remarried | 0.0223** (0.0114) | 0.0141 (0.0235) | 0.0419*** (0.0157) | 0.0388** (0.0182) | 0.0859*** (0.0221) | 0.0427* (0.0244) | 0.0699*** (0.0196) | 0.0367* (0.0191) | 0.125*** (0.0269)
Constant | -352.5** (144.9) | -528.4** (267.6) | -342.4 (278.9) | -1383.2*** (405.3) | -141.0 (106.0) | -588.0** (256.5) | -337.6* (182.2) | -322.8 (226.6) | -773.6*** (245.8)
Observations | 10977 | 2677 | 6487 | 4732 | 9169 | 8499 | 8339 | 3701 | 2465
R-squared | 0.046 | 0.071 | 0.013 | 0.025 | 0.044 | 0.049 | 0.091 | 0.041 | 0.050
Share intermarriage | 15.1 | 19.2 | 19.4 | 14.0 | 10.5 | 30.6 | 23.5 | 14.4 | 24.3

Panel B and pooled sample | Burkina Faso | Cameroon | Gabon | Malawi | Niger | Zambia | Pooled sample
Birth year | -0.0680 (0.0849) | 0.215 (0.317) | 0.410 (0.278) | -0.156 (0.120) | -0.114 (0.123) | 0.0821 (0.125) | 0.168*** (0.0403)
Age | -0.504 (0.440) | -0.505 (0.790) | 1.951 (1.591) | 0.426 (0.582) | 0.0473 (0.568) | 0.0633 (0.555) | -0.152 (0.197)
Age squared | 0.00681 (0.00689) | 0.00387 (0.0111) | -0.0270 (0.0241) | -0.00896 (0.00928) | -0.00360 (0.00916) | -0.00476 (0.00860) | 0.00202 (0.00292)
Primary | 2.323* (1.355) | 0.919 (2.029) | 22.33*** (4.073) | 3.986*** (1.474) | 3.496 (2.439) | 3.246* (1.831) | 2.925*** (0.637)
Secondary/Higher | 20.61*** (2.650) | 5.697** (2.781) | 24.74*** (3.732) | 13.67*** (2.458) | 10.25*** (3.330) | 12.29*** (2.236) | 7.914*** (0.787)
Urban | 9.881*** (1.343) | 11.83*** (2.507) | -1.667 (2.986) | 19.22*** (2.094) | 10.77*** (1.902) | 27.53*** (1.589) | 13.12*** (0.661)
Remarried | 0.0201* (0.0114) | 0.0817*** (0.0222) | 0.138*** (0.0341) | 0.0520*** (0.0137) | 0.0718*** (0.0181) | 0.0818*** (0.0170) | 0.0644*** (0.00643)
Constant | 150.3 (169.7) | -400.4 (633.4) | -828.9 (554.2) | 327.6 (238.4) | 235.1 (246.2) | -130.2 (247.8) |
Country-fixed effects | | | | | | | X
Observations | 9170 | 3066 | 2274 | 9241 | 5603 | 10711 | 97111
R-squared | 0.045 | 0.037 | 0.039 | 0.038 | 0.029 | 0.099 | 0.272
Share intermarriage | 10.4 | 20.5 | 38.0 | 31.8 | 12.7 | 46.0 | 20.4

Data: Pooled DHS for each country. Weighted data. The pooled sample regression includes country-fixed effects (hence no constant). Sample: Women in union at the time of the survey. Specification: OLS regression. Standard errors, in parentheses, are clustered at the DHS-cluster level. The dependent variable equals 0 if the union is intraethnic and 100 if the union is interethnic. The regression equation is the same as displayed in column (2) of table 3 of the paper.
Panel A: Countries with a positive and significant trend on interethnic marriages, when only age is controlled for.
Panel B: Countries for which the trend on interethnic marriages is insignificant when only age is controlled for.
Results can be interpreted as changes in percentage points.



Table B-1.7: Women’s characteristics and interfaith marriage

Panels A1 & B1 | Benin | Burkina Faso | Gabon | Ghana | Kenya | Togo | Zambia | Pooled sample
Dependent variable: Interfaith marriage
Birth year | -0.103 (0.0753) | -0.237*** (0.0894) | -0.402 (0.244) | -1.301*** (0.131) | -0.123*** (0.0451) | -0.0945 (0.112) | -0.146*** (0.0492) | -0.149*** (0.0287)
Age | -0.672 (0.421) | -0.554 (0.453) | -0.479 (1.277) | -0.960* (0.574) | -0.456 (0.389) | 0.198 (0.756) | -0.181 (0.267) | -0.305** (0.141)
Age squared | 0.00701 (0.00643) | 0.00630 (0.00702) | -0.00484 (0.0187) | -0.00698 (0.00857) | 0.00318 (0.00590) | -0.00703 (0.0115) | 0.000812 (0.00413) | 0.00134 (0.00220)
Primary | -2.268** (1.113) | -0.273 (1.268) | 5.925 (5.517) | 5.406*** (1.666) | -7.841*** (1.572) | 0.0553 (1.887) | -4.198*** (1.150) | 0.568 (0.457)
Secondary/Higher | -2.822** (1.422) | 5.394** (2.092) | -2.611 (4.867) | -6.159*** (1.359) | -10.19*** (1.564) | -4.480** (2.279) | -5.036*** (1.242) | -1.512*** (0.561)
Urban | -1.549 (0.973) | -3.256*** (1.160) | -3.408 (2.788) | -5.931*** (1.301) | -0.989 (0.756) | -7.608*** (1.854) | -0.496 (0.620) | -2.523*** (0.414)
Remarried | 0.0592*** (0.0120) | 0.0308** (0.0134) | 0.0524* (0.0309) | 0.0472*** (0.0146) | 0.00353 (0.0132) | 0.0551** (0.0231) | 0.00903 (0.00797) | 0.0381*** (0.00502)
Constant | 234.7 (149.8) | 490.3*** (178.7) | 835.1* (487.8) | 2628.1*** (261.7) | 267.0*** (88.90) | 208.8 (222.0) | 300.9*** (97.79) |
Observations | 10977 | 9170 | 2274 | 6487 | 9169 | 3701 | 10711 | 96549
R-squared | 0.006 | 0.005 | 0.026 | 0.092 | 0.019 | 0.017 | 0.009 | 0.139
Share intermarriage | 16.7 | 12.1 | 18.6 | 18.3 | 6.4 | 18.9 | 4.4 | 9.7
Share other | 25.6 | 14.0 | 7.8 | 11.2 | 2.5 | 36.4 | 1.3 | 8.5
Trend on “other (faiths)” | -0.841*** (0.110) | -0.650*** (0.127) | -0.610*** (0.160) | -0.725*** (0.132) | -0.0705** (0.0300) | -1.235*** (0.179) | -0.0355 (0.0299) | -0.288*** (0.0297)

Panels A2 & B2, and A3 | Cote d’Ivoire | Guinea | Malawi | Mali | Niger | Senegal | Uganda | Cameroon
Birth year | 0.148 (0.134) | 0.0481 (0.122) | -0.0130 (0.0816) | -0.000175 (0.0659) | 0.0962* (0.0567) | -0.0469 (0.0585) | 0.0140 (0.0603) | 0.523** (0.236)
Age | -0.628 (0.979) | -0.493 (0.334) | -0.199 (0.405) | -0.678* (0.358) | 0.0450 (0.175) | 0.211 (0.235) | -1.403*** (0.540) | 1.458* (0.767)
Age squared | 0.00904 (0.0151) | 0.00742 (0.00520) | 0.00316 (0.00660) | 0.00941* (0.00553) | 0.00102 (0.00329) | -0.00373 (0.00353) | 0.0205** (0.00847) | -0.0165 (0.0106)
Primary | 9.623*** (2.595) | 3.711** (1.518) | -4.231*** (1.196) | 0.502 (1.007) | -0.362 (0.725) | -0.0795 (0.617) | 0.536 (1.303) | -0.764 (1.441)
Secondary/Higher | 12.28*** (3.928) | 1.520 (1.408) | -7.307*** (1.317) | 0.958 (1.376) | -0.634 (0.693) | 2.541** (1.269) | 3.275 (2.028) | 3.121* (1.755)
Urban | -8.541*** (2.236) | -3.718*** (0.912) | 0.265 (1.090) | -4.008*** (0.740) | -0.232 (0.513) | 0.0281 (0.582) | 3.241* (1.796) | -0.790 (1.337)
Remarried | 0.0955*** (0.0360) | 0.0316** (0.0127) | 0.0320*** (0.0102) | 0.0220** (0.0106) | 0.00516 (0.00569) | 0.00570 (0.00727) | 0.0621*** (0.0147) | -0.0137 (0.0165)
Constant | -264.1 (268.2) | -82.25 (244.3) | 39.36 (161.8) | 18.35 (131.6) | -190.2* (111.2) | 91.71 (116.2) | -1.933 (118.4) | -1047.0** (476.0)
Observations | 2677 | 4732 | 9241 | 8499 | 5603 | 7777 | 2465 | 3066
R-squared | 0.031 | 0.009 | 0.011 | 0.006 | 0.002 | 0.004 | 0.020 | 0.006
Share intermarriage | 19.3 | 5.1 | 7.8 | 6.2 | 1.8 | 1.9 | 5.7 | 10.8
Share other | 21.9 | 6.9 | 1.6 | 5.2 | 0.9 | 0.5 | 1.1 | 11.2
Trend on “other (faiths)” | -0.575*** (0.189) | 0.0224 (0.222) | -0.120*** (0.0397) | -0.206*** (0.0685) | 0.0711** (0.0360) | 0.0254* (0.0133) | -0.0208 (0.0237) | 0.198 (0.429)

Data: Pooled DHS for each country. Weighted data. The pooled sample regression includes country-fixed effects (hence no constant). Sample: Women in union at the time of the survey. Specification: OLS regression. Standard errors, in parentheses, are clustered at the DHS-cluster level. The dependent variable equals 0 if the union is intrafaith and 100 if the union is interfaith. I consider three religious groups (Christians, Muslims, Others); intermarriage happens between these three groups. The regression equation is the same as displayed in column (2) of table 5.
Panel A1 & B1: Countries with a negative and significant trend on interfaith marriages, when only age is controlled for.
Panel A2 & B2: Countries for which the trend on interfaith marriages is insignificant when only age is controlled for.
Panel A3: Countries with a positive and significant trend on interfaith marriages, when only age is controlled for.
Trend on “others”: Coefficient associated with birth year, from a regression where the dependent variable is an indicator variable equal to 100 if the respondent is neither Muslim nor Christian.
Results can be interpreted as changes in percentage points.



Table B-1.8: Trend on interethnic marriage shares

Dependent variable: Interethnic marriage | (1) | (2) | (3) | (4) | (5) | (6) Mean | (7) N
Columns (1) to (5): birth year coefficient; each cell is the coefficient from a separate regression.
Panel A: Increase in interethnic marriage shares
Benin | 0.242*** (0.0741) | 0.193*** (0.0727) | 0.188** (0.0737) | 0.253*** (0.0743) | 0.176** (0.0727) | 15.1 | 10977
Cote d’Ivoire | 0.254* (0.136) | 0.249* (0.135) | 0.265** (0.134) | 0.253* (0.137) | 0.262* (0.134) | 19.2 | 2677
Ghana | 0.262* (0.140) | 0.205 (0.142) | 0.208 (0.139) | 0.277** (0.139) | 0.192 (0.140) | 19.4 | 6487
Guinea | 0.728*** (0.200) | 0.692*** (0.202) | 0.712*** (0.202) | 0.733*** (0.200) | 0.694*** (0.203) | 14.0 | 4732
Kenya | 0.257*** (0.0624) | 0.224*** (0.0597) | 0.0978* (0.0543) | 0.252*** (0.0625) | 0.0820 (0.0538) | 10.5 | 9169
Mali | 0.253* (0.130) | 0.215* (0.127) | 0.331** (0.130) | 0.278** (0.131) | 0.305** (0.129) | 30.6 | 8499
Senegal | 0.326*** (0.0914) | 0.158* (0.0900) | 0.213** (0.0903) | 0.367*** (0.0926) | 0.177* (0.0913) | 23.5 | 8339
Togo | 0.338*** (0.113) | 0.178 (0.114) | 0.241** (0.113) | 0.351*** (0.113) | 0.164 (0.114) | 14.4 | 3701
Uganda | 0.455*** (0.135) | 0.445*** (0.131) | 0.342*** (0.125) | 0.485*** (0.133) | 0.413*** (0.123) | 24.3 | 2465
Panel B: No change in interethnic marriage shares
Burkina Faso | 0.00631 (0.0856) | -0.0553 (0.0849) | -0.0531 (0.0854) | 0.00936 (0.0855) | -0.0680 (0.0849) | 10.4 | 9170
Cameroon | 0.545 (0.332) | 0.540 (0.333) | 0.213 (0.321) | 0.528 (0.331) | 0.215 (0.317) | 20.5 | 3066
Gabon | 0.364 (0.289) | 0.377 (0.278) | 0.385 (0.288) | 0.413 (0.289) | 0.410 (0.278) | 38.0 | 2274
Malawi | 0.00730 (0.121) | -0.242** (0.123) | -0.00862 (0.116) | 0.0211 (0.121) | -0.156 (0.120) | 31.8 | 9241
Niger | -0.154 (0.120) | -0.181 (0.121) | -0.152 (0.120) | -0.104 (0.123) | -0.114 (0.123) | 12.7 | 5603
Zambia | 0.0675 (0.132) | -0.0649 (0.128) | 0.124 (0.125) | 0.0823 (0.133) | 0.0821 (0.125) | 46.0 | 10711
Controls: Age & Age squared | X | X | X | X | X
Controls: Education | | X | | | X
Controls: Urban | | | X | | X
Controls: Remarried | | | | X | X

Data: Pooled DHS for each country. Weighted data. Sample: Women in union at the time of the survey. Specification: OLS regression run separately for the 15 countries of the sample. Standard errors, in parentheses, are clustered at the DHS-cluster level. The dependent variable equals 0 if the union is intraethnic and 100 if the union is interethnic.
Columns (1) to (5) report the coefficient associated with the birth year variable. Each cell corresponds to a separate regression. Column (6) reports the share of interethnic marriages. Column (7) reports the number of observations for each country. Comparing columns (1) and (5) shows whether there is a trend in the share of interethnic marriages (column (1)) and whether this trend remains (column (5)) once we control for individual characteristics (education, urban residence, whether the woman is not in her first union) that are positively correlated with the likelihood of being in an interethnic union. Results in columns (1) to (5) can be interpreted as changes in percentage points.



Table B-1.9: Trend on interfaith marriage shares

Dependent variable: Interfaith marriage | (1) | (2) | (3) | (4) | (5) | (6) Mean | (7) N
Columns (1) to (5): birth year coefficient; each cell is the coefficient from a separate regression.
Panel A: Increase in interethnic marriage shares
Panel A1: Decrease in interfaith marriage shares
Benin | -0.150** (0.0754) | -0.140* (0.0754) | -0.141* (0.0753) | -0.118 (0.0755) | -0.103 (0.0753) | 16.7 | 10977
Ghana | -1.520*** (0.133) | -1.351*** (0.129) | -1.433*** (0.137) | -1.498*** (0.133) | -1.301*** (0.131) | 18.3 | 6487
Kenya | -0.189*** (0.0445) | -0.133*** (0.0436) | -0.162*** (0.0456) | -0.190*** (0.0446) | -0.123*** (0.0451) | 6.4 | 9169
Togo | -0.214** (0.109) | -0.142 (0.112) | -0.144 (0.110) | -0.187* (0.108) | -0.0945 (0.112) | 18.9 | 3701
Panel A2: No change in interfaith marriage shares
Cote d’Ivoire | 0.122 (0.134) | 0.133 (0.135) | 0.118 (0.133) | 0.146 (0.135) | 0.148 (0.134) | 19.3 | 2677
Guinea | 0.0423 (0.122) | 0.0404 (0.124) | 0.0505 (0.122) | 0.0472 (0.122) | 0.0481 (0.122) | 5.1 | 4732
Mali | 0.00585 (0.0660) | 0.00798 (0.0661) | -0.00809 (0.0659) | 0.0149 (0.0658) | -0.000175 (0.0659) | 6.2 | 8499
Senegal | -0.0399 (0.0594) | -0.0508 (0.0594) | -0.0409 (0.0592) | -0.0367 (0.0586) | -0.0469 (0.0585) | 1.9 | 7777
Uganda | 0.0496 (0.0570) | 0.0136 (0.0612) | 0.0232 (0.0560) | 0.0644 (0.0559) | 0.0140 (0.0603) | 5.7 | 2465
Panel A3: Increase in interfaith marriage shares
Cameroon | 0.489** (0.233) | 0.498** (0.231) | 0.482** (0.236) | 0.493** (0.233) | 0.523** (0.236) | 10.8 | 3066
Panel B: No change in interethnic marriage shares
Panel B1: Decrease in interfaith marriage shares
Burkina Faso | -0.251*** (0.0898) | -0.251*** (0.0896) | -0.242*** (0.0895) | -0.243*** (0.0896) | -0.237*** (0.0894) | 12.1 | 9170
Gabon | -0.570** (0.239) | -0.431* (0.242) | -0.512** (0.244) | -0.549** (0.238) | -0.402 (0.244) | 18.6 | 2274
Zambia | -0.161*** (0.0496) | -0.146*** (0.0487) | -0.164*** (0.0497) | -0.158*** (0.0497) | -0.146*** (0.0492) | 4.4 | 10711
Panel B2: No change in interfaith marriage shares
Malawi | -0.123 (0.0786) | -0.0212 (0.0811) | -0.122 (0.0784) | -0.109 (0.0789) | -0.0130 (0.0816) | 7.8 | 9241
Niger | 0.0912 (0.0572) | 0.0927 (0.0580) | 0.0911 (0.0571) | 0.0953* (0.0564) | 0.0962* (0.0567) | 1.8 | 5603
Controls: Age & Age squared | X | X | X | X | X
Controls: Education | | X | | | X
Controls: Urban | | | X | | X
Controls: Remarried | | | | X | X

Data: Pooled DHS for each country. Weighted data. Sample: Women in union at the time of the survey. Specification: OLS regression run separately for the 15 countries of the sample. Standard errors, in parentheses, are clustered at the DHS-cluster level. The dependent variable equals 0 if the union is intrafaith and 100 if the union is interfaith.
Columns (1) to (5) report the coefficient associated with the birth year variable. Each cell corresponds to a separate regression. Column (6) reports the share of interfaith marriages. Column (7) reports the number of observations for each country. Comparing columns (1) and (5) shows whether there is a trend in the share of interfaith marriages (column (1)) and whether this trend remains (column (5)) once we control for individual characteristics (education, urban residence, whether the woman is not in her first union) that are positively correlated with the likelihood of being in an interfaith union. Results in columns (1) to (5) can be interpreted as changes in percentage points.
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.


B-1.5. Additional robustness analyses at the country level

I implement four robustness checks on my findings and present here the results for each country. First, I relax the assumption that a marriage is intraethnic or intrafaith when both spouses belong to the group “other”. Second, I test whether results are robust to alternative assumptions on remarried women’s first unions. Third, using only women in their first union, I test whether “assimilation” and conversion take place over the length of a marriage. Fourth, using only women in their first union, I compare time trends measured using birth year and using marriage year. Table B-1.13 (ethnicity) and Table B-1.14 (religion) display results from the main regression and from the regressions under each of these alternative assumptions.

Testing for heterogeneity in the “other” group

The group “other ethnicity/faith” is more heterogeneous than other groups. In the main specification, I assume that when both spouses belong to the group “other”, their union is an in-group one. Under the alternative assumption that these unions are in fact out-group unions, more unions are counted as intermarriages.
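The difference between the two assumptions only concerns couples in which both spouses belong to “other”, as this sketch shows. The function names and group labels are hypothetical; the outcome uses the same 0/100 coding as the dependent variable in the tables.

```python
OTHER = "other"


def intermarriage_indicator(wife_group, husband_group, other_other_in_group=True):
    """0/100 intermarriage indicator under the main or the alternative assumption."""
    if wife_group == OTHER and husband_group == OTHER:
        # Main specification: in-group union; robustness check: out-group union
        return 0 if other_other_in_group else 100
    return 0 if wife_group == husband_group else 100


def intermarriage_share(couples, other_other_in_group=True):
    """Unweighted share (in %) of intermarriages among (wife, husband) pairs."""
    values = [intermarriage_indicator(w, h, other_other_in_group) for w, h in couples]
    return sum(values) / len(values)


couples = [(OTHER, OTHER), ("Akan", "Ewe"), ("Akan", "Akan"), (OTHER, "Akan")]
print(intermarriage_share(couples), intermarriage_share(couples, False))  # 50.0 75.0
```

Because the switch only affects “other-other” couples, the trend changes under the alternative assumption only where the share of such couples varies across cohorts.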

In the case of interethnic marriages, the results change in a few countries (main results in columns (1) and (2), results under this hypothesis in columns (7) and (8) of Table B-1.13 (ethnicity) and Table B-1.14 (religion)). Among countries where interethnic marriages increased, trends become insignificant for Cote d’Ivoire and Mali. Among countries where they did not increase under the main specification, trends become significant for Burkina Faso and Gabon. In Burkina Faso, Gabon and Mali, these changes arise because the share of “other-other” unions varied over time. In Cote d’Ivoire, the share of “others” is around 45%, the highest among all countries in the sample, and only 16% of “others” are married outside their group: the “other” hypothesis therefore shifts a large fraction of unions from intraethnic to interethnic. Even under this hypothesis, there is no country where interethnic marriage shares decrease.

For interfaith unions, results do not change in countries where the share of such unions decreased. Trends on interfaith unions turn negative in Cote d’Ivoire, Malawi and Mali. This finding is consistent with the decline in the share of members of traditional religions in these countries: unions between two members of this group are more frequent in earlier-born cohorts, so counting them as interfaith rather than intrafaith raises interfaith shares mostly in those cohorts. By contrast, Niger saw an increase in interfaith marriages under this hypothesis, because the share of “other” increased in the youngest Nigerien cohorts.


Testing the remarriage story

First, I bound my estimates by making assumptions on the first unions of remarried women. Table B-1.13 (ethnicity) and Table B-1.14 (religion) (columns (1) to (6)) show the results at the country level. The “lower bound” hypothesis (on the birth year coefficient) assigns an interethnic union to all women who have remarried, which translates into a higher share of interethnic marriages. The “higher bound” hypothesis (on the birth year coefficient) assigns an intraethnic union to all women who have remarried, which translates into a lower share of interethnic marriages on average. For Panel A, the two bounds have conflicting signs in Benin, Senegal and Togo. Results for other countries and for interfaith marriages are robust to these changes. Under the lower bound hypothesis, results change in the same direction for both interethnic and interfaith marriages (trends turn negative, but never positive). This indicates that the effect captured is mostly that remarried women are more likely to be older and to belong to earlier-born cohorts: the lower bound assigns to these earlier-born cohorts high shares of intermarriage, even higher than those observed in later-born cohorts.
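The two bounding assumptions amount to overwriting the outcome of remarried women before re-estimating the birth-year trend. This sketch uses hypothetical names and the 0/100 coding of the tables; the bounded outcomes would then be passed to the same weighted regression as in the main specification.

```python
def bounded_outcomes(outcomes, remarried, bound):
    """Replace remarried women's 0/100 outcome with an assumed first-union value.

    bound="lower":  every remarried woman is assigned an intermarriage (100)
    bound="higher": every remarried woman is assigned an in-group union (0)
    """
    if bound not in ("lower", "higher"):
        raise ValueError("bound must be 'lower' or 'higher'")
    assumed = 100 if bound == "lower" else 0
    return [assumed if is_remarried else y for y, is_remarried in zip(outcomes, remarried)]


y = [0, 100, 0]
remarried = [False, False, True]
print(bounded_outcomes(y, remarried, "lower"), bounded_outcomes(y, remarried, "higher"))
```

Since remarried women are concentrated in earlier-born cohorts, the “lower” assignment raises intermarriage shares mainly among those cohorts, which pushes the birth-year coefficient down.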

Second, I test whether the trends I observe come from remarried women or from women in their first union (Table B-1.13 (ethnicity) and Table B-1.14 (religion), results in columns (9) to (13)). For interethnic marriages, there is no trend in either sub-sample in any country of Panel B. Any trend found in the whole sample is found among women in their first union, and there are positive trends for remarried women in Benin, Senegal and Uganda (coefficients are positive and large in all countries but Cote d’Ivoire). For interfaith marriages, I find negative trends in both sub-samples in all countries where interfaith shares decreased, except in Benin and Togo, where trends become insignificant. Point estimates are large for the remarried sample, suggesting that this is likely an issue of statistical power. The coefficient turns negative for Malawi, where remarried women and women in their first union do not experience the same trends. In Niger, the coefficient is positive for remarried women.

Testing the assimilation/conversion story

Older women have spent more time in union than younger women: as spouses spend longer in union, their ethnic or religious identity may change.26 Because I have several survey waves for each country, I can study whether women who were born in the same year and first married in the same year are more or less likely to report belonging to the same ethnic (religious) group as their husband as the length of their union increases. However, identification ultimately rests on differences across survey waves, so this will also capture any effect linked to the survey wave. Results using this specification should be compared with results on women still in their first union (Table B-1.13 (ethnicity) and Table B-1.14 (religion), columns (9) to (11)).

26Conversion or “assimilation” may take place before cohabitation or marriage, but I cannot estimate such changes using DHS data.

Table B-1.10: Ethnic identification and time in union

Panel A                             Benin         Cote d’Ivoire Ghana         Guinea        Kenya         Mali          Senegal       Togo          Uganda
Birth year                          0.230***      0.286**       0.185         0.746***      0.196***      0.175         -0.0774       0.320***      0.398***
                                    (0.0765)      (0.143)       (0.140)       (0.204)       (0.0649)      (0.134)       (0.204)       (0.114)       (0.146)
Number of years since cohabitation  0.0903        -0.0811       -0.257*       0.596***      -0.0948       0.0466        -0.248        0.148         -0.259
                                    (0.0854)      (0.158)       (0.152)       (0.204)       (0.0747)      (0.157)       (0.202)       (0.130)       (0.174)
Age at cohabitation                 0.941***      1.030***      0.369*        1.167***      0.612***      1.110***      1.297***      1.261***      0.413
                                    (0.125)       (0.283)       (0.206)       (0.262)       (0.154)       (0.235)       (0.261)       (0.234)       (0.355)
Constant                            -458.3***     -564.5**      -350.5        -1487.9***    -386.8***     -335.6        155.6         -643.4***     -767.9***
                                    (153.2)       (283.1)       (280.8)       (407.4)       (127.6)       (268.5)       (408.5)       (226.9)       (291.1)
Observations                        9390          2262          5124          3977          8569          7549          6687          3048          1915
R-squared                           0.012         0.019         0.010         0.016         0.013         0.007         0.029         0.024         0.016
Share intermarriage                 14.9          19.3          18.9          13.5          10.0          30.1          23.6          14.0          22.0

Panel B and pooled sample           Burkina Faso  Cameroon      Gabon         Malawi        Niger         Zambia        Pooled sample
Birth year                          -0.0143       0.500         0.214         0.0501        -0.139        -0.0246       0.210***
                                    (0.0874)      (0.342)       (0.306)       (0.131)       (0.122)       (0.143)       (0.0447)
Number of years since cohabitation  -0.0796       0.0556        -0.256        -0.0447       -0.209*       -0.343**      -0.0630
                                    (0.0935)      (0.348)       (0.359)       (0.133)       (0.122)       (0.142)       (0.0470)
Age at cohabitation                 0.805***      0.759*        0.897**       0.732***      0.477*        1.403***      0.811***
                                    (0.216)       (0.427)       (0.421)       (0.256)       (0.262)       (0.240)       (0.0812)
Constant                            25.43         -981.5        -404.1        -80.28        280.4         72.78
                                    (175.2)       (684.4)       (612.0)       (261.8)       (241.8)       (284.9)
Observations                        8040          2452          1553          7196          4305          8743          80810
R-squared                           0.005         0.011         0.018         0.003         0.003         0.015         0.244
Share intermarriage                 10.3          19.5          34.3          31.1          11.2          45.4          19.4

Data: Pooled DHS for each country. Weighted data. The pooled sample regression includes country fixed effects (hence no constant). Sample: Women still in their first union at the time of the survey. Specification: OLS regression. Standard errors are clustered at the DHS-cluster level. The dependent variable equals 0 if the union is intraethnic and 100 if the union is interethnic.
Under the assumption that the occurrence of divorce and of widowhood is not correlated with a woman’s ethnicity, her marital status and her husband’s ethnicity, the variable “number of years since cohabitation” would indicate whether women are more likely to declare that they belong to the same ethnic group as their husband (thus “assimilating” into his ethnic group) as the length of the union increases. The results are more suggestive of a “selection” into divorce/widowhood story than of an “assimilation” story, as the length of the union is not significant in most countries, and as the signs are conflicting when this coefficient is significant.
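The reason identification rests on survey waves can be checked mechanically. The sketch below is a hypothetical illustration with simulated birth years, ages at first cohabitation and interview years (none taken from the DHS): it builds the regressors of Table B-1.10 and shows that, within a single survey wave, the number of years since cohabitation is an exact linear combination of the constant, birth year and age at cohabitation, so its coefficient is only identified when women are observed in several waves.

```python
import numpy as np


def design_matrix(birth_year, age_at_cohab, survey_year):
    """Columns: constant, birth year, years since first cohabitation, age at first cohabitation."""
    birth_year = np.asarray(birth_year, dtype=float)
    age_at_cohab = np.asarray(age_at_cohab, dtype=float)
    survey_year = np.asarray(survey_year, dtype=float)
    # Year of first cohabitation = birth year + age at first cohabitation, so
    # years since cohabitation = survey year - (birth year + age at cohabitation).
    years_since = survey_year - (birth_year + age_at_cohab)
    return np.column_stack(
        [np.ones_like(birth_year), birth_year, years_since, age_at_cohab]
    )


rng = np.random.default_rng(0)
n = 500
birth = rng.integers(1950, 1975, size=n)   # simulated birth years (1950 to 1974)
age_c = rng.integers(14, 25, size=n)       # simulated ages at first cohabitation (14 to 24)

# Hypothetical single wave: every woman interviewed in the same year.
X_one_wave = design_matrix(birth, age_c, np.full(n, 2012))
# Hypothetical pooled waves: the interview year varies across women.
X_pooled = design_matrix(birth, age_c, rng.choice([2001, 2006, 2012], size=n))

print(np.linalg.matrix_rank(X_one_wave))  # 3: years since cohabitation is collinear
print(np.linalg.matrix_rank(X_pooled))    # 4: all four coefficients identified
```

The same algebra explains why any effect tied to the survey wave itself loads onto the "years since cohabitation" coefficient: the only independent variation in that regressor is the interview year.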

Table B-1.10 shows the test of the assimilation story for interethnic marriages. Women who were older when they started cohabiting are more likely to be in an interethnic union, which is consistent with the fact that these women are more educated and more likely to live in an urban area than their counterparts, and that these characteristics are positively correlated with the likelihood of being in an interethnic union. The coefficient on the number of years since cohabitation is not significant in the pooled sample, which hides discrepancies across countries. The length of union is positively correlated with the likelihood of being in an interethnic union only in Guinea, and is insignificant in the other countries where the share of interethnic unions has increased. The positive coefficient for Guinean women is indicative of selective divorce: women who were in an intraethnic union are more likely to divorce than their counterparts, perhaps because they were less likely to have chosen their first husband than women who married outside of their ethnic group. This story is consistent with the fact that the point estimate of the birth year coefficient is high for the subsample of remarried women, even if insignificant. In countries where interethnic marriages did not become more frequent, the coefficient on the length of union is negative everywhere except in Cameroon (where it is small and insignificant), and it is significant in Ghana, Niger, and Zambia. This indicates that there might be selective divorce or assimilation in these countries, which might be why we do not observe a trend in interethnic marriage shares.
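The selective-divorce reading can be illustrated with a minimal simulation using made-up dissolution rates. In this sketch no woman ever changes her declared ethnicity, yet among women still in their first union the share of interethnic unions rises with union duration, simply because intraethnic unions are assumed to end more often. Reversing the two (hypothetical) survival rates produces a declining share, so a negative coefficient on the length of union is also compatible with selection, not only with assimilation.

```python
def share_interethnic_still_married(p_inter, surv_intra, surv_inter, years):
    """Share of interethnic unions (in %, 0 to 100) among women still in their
    first union after `years` years.

    p_inter: share of first unions that are interethnic at formation.
    surv_intra, surv_inter: yearly probability that an intra- / interethnic
    union survives (no divorce, no widowhood). Nobody changes identification.
    """
    inter = p_inter * surv_inter ** years
    intra = (1 - p_inter) * surv_intra ** years
    return 100 * inter / (inter + intra)


durations = range(0, 31, 5)
# Hypothetical rates: intraethnic unions dissolve faster than interethnic ones.
shares = [share_interethnic_still_married(0.15, 0.97, 0.99, t) for t in durations]
# Hypothetical reverse case: interethnic unions dissolve faster.
reversed_shares = [share_interethnic_still_married(0.15, 0.99, 0.97, t) for t in durations]

print([round(s, 1) for s in shares])           # increasing with duration
print([round(s, 1) for s in reversed_shares])  # decreasing with duration
```

When both survival rates are equal, the share stays at its initial value, which is what the "no selection" assumption in the table notes amounts to.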

Table B-1.11: Religious identification and time in union

Panels A1 & B1                      Benin         Burkina Faso  Gabon         Ghana         Kenya         Togo          Zambia        Pooled sample
Birth year                          -0.121        -0.213**      -0.449*       -1.423***     -0.160***     -0.113        -0.140**      -0.146***
                                    (0.0770)      (0.0909)      (0.264)       (0.149)       (0.0441)      (0.116)       (0.0551)      (0.0292)
Number of years since cohabitation  -0.194**      -0.200**      -0.727***     -1.522***     -0.258***     -0.191        -0.101*       -0.221***
                                    (0.0877)      (0.0993)      (0.271)       (0.160)       (0.0614)      (0.142)       (0.0571)      (0.0329)
Age at cohabitation                 -0.298**      0.0835        -0.886***     -2.096***     -0.504***     -0.703***     -0.264***     -0.258***
                                    (0.124)       (0.200)       (0.303)       (0.182)       (0.0907)      (0.223)       (0.0979)      (0.0544)
Constant                            262.6*        433.5**       930.0*        2886.0***     335.0***      257.3         287.0***
                                    (154.2)       (181.7)       (528.2)       (298.2)       (87.85)       (231.9)       (110.2)
Observations                        9390          8040          1553          5124          8569          3048          8743          80810
R-squared                           0.001         0.002         0.009         0.069         0.006         0.004         0.003         0.125
Share intermarriage                 15.8          11.7          17.0          17.0          6.3           17.8          4.2           9.1

Panels A2 & B2, A3 (Cameroon)       Cote d’Ivoire Guinea        Malawi        Mali          Niger         Senegal       Uganda        Cameroon
Birth year                          0.0901        0.0318        -0.183**      0.0371        0.0828        -0.0268       0.0989*       0.392*
                                    (0.135)       (0.129)       (0.0779)      (0.0693)      (0.0702)      (0.0601)      (0.0566)      (0.223)
Number of years since cohabitation  -0.209        -0.0509       -0.0751       -0.0600       0.0648        -0.0682       -0.134        0.333
                                    (0.184)       (0.112)       (0.0765)      (0.0795)      (0.0597)      (0.0636)      (0.0934)      (0.254)
Age at cohabitation                 0.479         0.0726        -0.146        -0.105        0.134         0.195         -0.0630       0.791***
                                    (0.321)       (0.141)       (0.160)       (0.117)       (0.0992)      (0.120)       (0.153)       (0.297)
Constant                            -166.7        -58.77        372.0**       -64.73        -164.5        52.21         -188.2*       -779.5*
                                    (268.6)       (258.3)       (155.4)       (138.5)       (140.0)       (120.3)       (112.6)       (447.2)
Observations                        2262          3977          7196          7549          4305          6687          1915          2452
R-squared                           0.007         0.001         0.002         0.001         0.002         0.008         0.005         0.004
Share intermarriage                 17.8          4.6           6.9           6.0           1.7           1.9           4.5           11.1

Data: Pooled DHS for each country. Weighted data. The pooled sample regression includes country fixed effects (hence no constant). Sample: Women still in their first union at the time of the survey. Specification: OLS regression. Standard errors are clustered at the DHS-cluster level. The dependent variable equals 0 if the union is intrafaith and 100 if the union is interfaith. I consider three religious groups (Christians, Muslims, Others); an interfaith marriage is a union between members of two different groups.
Under the assumption that the occurrence of divorce and of widowhood is not correlated with a woman’s religious affiliation, her marital status and her husband’s religious affiliation, the variable “number of years since cohabitation” would indicate whether women are more likely to declare that they belong to the same religious group as their husband (thus “assimilating” into his religious group) as the length of the union increases.
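The specification described in the notes of Tables B-1.10 and B-1.11 (OLS, a 0/100 outcome, sampling weights, standard errors clustered at the DHS-cluster level) can be sketched with NumPy as below. This is a hypothetical re-implementation on simulated data, not the code used for the thesis; the finite-sample factor G/(G-1)·(n-1)/(n-k) is the one Stata applies with clustered errors in `regress`, and its use here is an assumption. Because the outcome is coded 0/100, coefficients read directly in percentage points.

```python
import numpy as np


def wls_cluster(y, X, clusters, weights=None):
    """Weighted least squares with cluster-robust (CR1) standard errors.

    y: outcome coded 0/100 (coefficients read as percentage points).
    X: regressors, including a constant column.
    clusters: one cluster identifier per observation.
    weights: sampling weights (all ones if None).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    n, k = X.shape

    Xw = X * w[:, None]
    bread = np.linalg.inv(X.T @ Xw)      # (X'WX)^-1
    beta = bread @ (Xw.T @ y)            # (X'WX)^-1 X'Wy
    resid = y - X @ beta

    # Score of each cluster: sum over its members of w_i * x_i * e_i.
    groups, idx = np.unique(np.asarray(clusters), return_inverse=True)
    G = len(groups)
    scores = np.zeros((G, k))
    np.add.at(scores, idx, Xw * resid[:, None])

    # Sandwich bread @ (sum_g s_g s_g') @ bread, written as A'A so the
    # diagonal is a sum of squares and never negative.
    A = scores @ bread
    factor = G / (G - 1) * (n - 1) / (n - k)
    vcov = factor * (A.T @ A)
    return beta, np.sqrt(np.diag(vcov))


# Simulated example (not DHS data): 200 clusters of 20 women each.
rng = np.random.default_rng(1)
n_clusters, per_cluster = 200, 20
cluster_id = np.repeat(np.arange(n_clusters), per_cluster)
birth_year = rng.integers(1960, 1995, size=cluster_id.size).astype(float)
cluster_shock = rng.normal(0.0, 0.5, size=n_clusters)[cluster_id]
latent = -0.5 + 0.02 * (birth_year - 1977) + cluster_shock + rng.normal(size=cluster_id.size)
interethnic = np.where(latent > 0.5, 100.0, 0.0)   # 0/100 coding as in the tables

X = np.column_stack([np.ones_like(birth_year), birth_year])
beta, se = wls_cluster(interethnic, X, cluster_id)
print(beta[1], se[1])  # birth-year coefficient (pp per year) and its clustered SE
```

Clustering matters here because women interviewed in the same DHS cluster share unobserved local conditions (the simulated `cluster_shock`), which makes their outcomes correlated.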


Table B-1.11 shows the test of the conversion story for interfaith marriages. Patterns differ across panels. In countries where the share of interfaith marriages has not changed (lower panel, A2 & B2), neither the length of union nor age at cohabitation is significant. In Cameroon, the only country where interfaith marriages have become more frequent, results are similar to those for interethnic marriages: women who were older at the time of their first cohabitation are more likely to be in an interfaith union. In countries where the share of interfaith unions decreased over time (upper panel, A1 & B1), I find that women who were older at the time of their first cohabitation are less likely to be in an interfaith union in all countries but Burkina Faso, which is consistent with the fact that they are less likely to belong to a traditional religion. The length of union is negatively correlated with the likelihood of being in an interfaith union in all countries of this panel (significantly so everywhere except in Togo), which is consistent with either conversion or selective divorce. It seems likely that followers of traditional religions convert during their marriage: given the intense proselytizing of other faiths, conversion patterns are more likely to go in this direction than in the direction of Muslim or Christian individuals converting to the faith of their spouse.

Testing Birth year v. Year of first cohabitation

The results from Table B-1.12 are discussed in the paper.

Table B-1.12: Trend - Year of marriage

Each cell in columns (1) and (2): coefficient on the year of marriage from a separate regression (standard error in parentheses below).

Dependent variable: interethnic marriage. Panel B: No change in interethnic marriage shares (with birth year)

                (1)           (2)           (3)     (4)
                Coefficient   Coefficient   Mean    N
Burkina Faso    0.119         -0.0450       10.3    8040
                (0.0830)      (0.0821)
Cameroon        0.605***      0.116         21.4    2452
                (0.216)       (0.231)
Gabon           0.622***      0.501**       34.3    1553
                (0.212)       (0.210)
Ghana           0.288***      0.171         19.3    5124
                (0.108)       (0.111)
Malawi          0.220*        -0.142        31.1    7196
                (0.116)       (0.120)
Niger           0.0342        -0.0774       11.2    4305
                (0.0978)      (0.0992)
Zambia          0.407***      0.144         45.5    8743
                (0.122)       (0.120)

Dependent variable: interfaith marriage. Panels A2 & B2: No change in interfaith marriage shares (with birth year)

                (1)           (2)           (3)     (4)
                Coefficient   Coefficient   Mean    N
Cote d’Ivoire   0.238**       0.213*        17.8    2262
                (0.119)       (0.119)
Guinea          0.0675        0.0892        4.6     3977
                (0.0968)      (0.104)
Malawi          -0.167**      -0.0465       6.9     7196
                (0.0675)      (0.0696)
Mali            0.0297        0.0301        6.0     7549
                (0.0526)      (0.0539)
Niger           0.0792        0.0863        1.7     4305
                (0.0617)      (0.0667)
Senegal         0.135**       0.126**       1.9     6687
                (0.0654)      (0.0558)
Uganda          0.120**       0.0725        4.5     1915
                (0.0546)      (0.0590)

Controls: Age & Age2 in columns (1) and (2); Education and Urban in column (2).

Data: Pooled DHS for each country. Weighted data. Sample: Women in their first union at the time of the survey. Specification: OLS regression run separately for each of the 15 countries of the sample. Standard errors are clustered at the DHS-cluster level. The dependent variable equals 0 if the union is intraethnic (intrafaith) and 100 if it is interethnic (interfaith). Columns (1) and (2) report the coefficient associated with the year of marriage variable; each cell corresponds to a separate regression, and these coefficients can be interpreted as changes in percentage points. Column (3) reports the mean of the dependent variable, i.e. the share (in %) of interethnic (interfaith) marriages in the regression sample. Column (4) reports the number of observations for each country. Results are displayed only for countries in which interethnic (interfaith) marriage shares did not change when year of birth is used to measure time trends.



Table B-1.13: Robustness table – Interethnic marriages

Regression results. Dependent variable: interethnic marriage. Columns (1) to (13): each cell is the birth year coefficient from a separate regression (standard errors on the line below). Columns (14) to (19): share of interethnic marriages. Column (20): number of observations.

Columns by sample and assumption: (1), (2) all married women, main; (3), (4) all married women, lower bound; (5), (6) all married women, upper bound; (7), (8) all married women, others bound; (9), (10), (11) first union, main; (12), (13) remarried, main; (14) all married women, main; (15) all married women, lower bound; (16) all married women, upper bound; (17) all married women, others bound; (18) first union, main; (19) remarried, main; (20) N, all married women.

Panel A: Increase in interethnic marriage shares
Benin 0.242*** 0.176** -0.282*** -0.344*** 0.271*** 0.198*** 0.243*** 0.181** 0.228*** 0.154** 0.163** 0.387** 0.316** 15.1 27.6 12.7 16.4 14.9 16.5 10977
(0.0741) (0.0727) (0.0871) (0.0855) (0.0641) (0.0630) (0.0778) (0.0764) (0.0782) (0.0757) (0.0748) (0.152) (0.152)
Cote d’Ivoire 0.310** 0.326** 0.148 0.154 0.374*** 0.379*** 0.143 0.125 0.390*** 0.370*** 0.346** -0.133 -0.0217 20.4 32.4 17.0 51.2 20.1 22.1 2677
(0.137) (0.135) (0.151) (0.146) (0.123) (0.120) (0.205) (0.199) (0.142) (0.139) (0.141) (0.291) (0.275)
Guinea 0.728*** 0.694*** 0.501** 0.484** 0.649*** 0.616*** 0.892*** 0.856*** 0.744*** 0.703*** 0.716*** 0.654 0.652 14.0 27.2 11.4 15.1 13.5 16.7 4732
(0.200) (0.203) (0.197) (0.199) (0.174) (0.176) (0.214) (0.217) (0.203) (0.206) (0.207) (0.405) (0.404)
Kenya 0.257*** 0.0820 0.293*** 0.148** 0.227*** 0.0684 0.286*** 0.221*** 0.247*** 0.0801 0.0605 0.332 0.125 10.5 15.1 9.5 15.1 10.0 18.2 9169
(0.0624) (0.0538) (0.0707) (0.0662) (0.0613) (0.0518) (0.0763) (0.0743) (0.0649) (0.0550) (0.0554) (0.291) (0.300)
Mali 0.253* 0.305** -0.0803 -0.0417 0.370*** 0.387*** 0.121 0.172 0.261* 0.293** 0.256** 0.548 0.553 30.6 37.7 26.8 33.2 30.1 35.2 8499
(0.130) (0.129) (0.129) (0.124) (0.123) (0.120) (0.143) (0.141) (0.134) (0.129) (0.130) (0.349) (0.341)
Senegal 0.326*** 0.177* -0.241** -0.395*** 0.369*** 0.197** 0.384*** 0.212** 0.291*** 0.106 0.0702 0.644*** 0.443** 23.5 34.8 19.1 26.7 23.6 32.4 8339
(0.0914) (0.0913) (0.107) (0.108) (0.0831) (0.0811) (0.0954) (0.0943) (0.102) (0.101) (0.102) (0.193) (0.186)
Togo 0.338*** 0.164 -0.0762 -0.222* 0.367*** 0.206** 0.314** 0.173 0.382*** 0.178 0.170 0.255 0.146 14.4 29.6 11.5 18.2 14.0 16.3 3701
(0.113) (0.114) (0.132) (0.134) (0.0928) (0.0932) (0.139) (0.139) (0.115) (0.117) (0.116) (0.245) (0.240)
Uganda 0.455*** 0.413*** 0.137 0.0870 0.387*** 0.257** 0.581*** 0.655*** 0.453*** 0.288** 0.261* 0.596** 0.740*** 24.3 39.1 17.1 30.9 22.0 32.8 2465
(0.135) (0.123) (0.151) (0.150) (0.116) (0.106) (0.149) (0.137) (0.149) (0.140) (0.140) (0.251) (0.234)

Panel B: No change in interethnic marriage shares
Burkina Faso 0.00631 -0.0680 -0.241** -0.299*** 0.0337 -0.0374 0.265** 0.190 0.00690 -0.0636 -0.0665 0.0557 -0.106 10.4 20.7 9.1 18.4 10.3 11.3 9170
(0.0856) (0.0849) (0.104) (0.103) (0.0763) (0.0756) (0.128) (0.127) (0.0878) (0.0870) (0.0867) (0.189) (0.185)
Cameroon 0.344 0.0297 0.412 0.318 0.143 -0.145 0.505 0.239 0.259 -0.0267 0.0272 0.612 0.326 22.3 36.6 17.2 34.0 21.4 26.3 3066
(0.348) (0.334) (0.370) (0.364) (0.312) (0.300) (0.482) (0.462) (0.372) (0.359) (0.357) (0.714) (0.709)
Gabon 0.364 0.410 -0.0121 0.0609 0.320 0.186 0.844*** 0.763*** 0.282 0.212 0.176 0.707 0.882* 38.0 51.9 25.1 60.4 34.3 48.0 2274
(0.289) (0.278) (0.279) (0.271) (0.240) (0.233) (0.298) (0.287) (0.311) (0.294) (0.287) (0.541) (0.533)
Ghana 0.199 0.133 -0.170 -0.169 0.195* 0.101 0.134 0.0929 0.162 0.0621 0.0408 0.380 0.367 19.8 36.4 15.2 22.5 19.3 21.5 6487
(0.140) (0.140) (0.153) (0.152) (0.108) (0.109) (0.162) (0.160) (0.142) (0.142) (0.142) (0.268) (0.270)
Malawi 0.00730 -0.156 -0.230* -0.291** 0.161 -0.0585 -0.109 -0.269** 0.0513 -0.163 -0.155 -0.0690 -0.131 31.8 46.4 24.2 35.1 31.1 34.4 9241
(0.121) (0.120) (0.129) (0.128) (0.111) (0.113) (0.125) (0.127) (0.135) (0.135) (0.131) (0.238) (0.233)
Niger -0.154 -0.114 -0.745*** -0.749*** 0.0209 0.00617 -0.168 -0.127 -0.0894 -0.102 -0.110 -0.167 -0.160 12.7 30.7 8.8 13.5 11.2 18.1 5603
(0.120) (0.123) (0.137) (0.137) (0.0916) (0.0923) (0.122) (0.125) (0.118) (0.119) (0.122) (0.294) (0.290)
Zambia 0.0666 0.0808 -0.138 -0.129 0.174 0.128 0.0513 0.0600 0.0382 0.0402 0.0272 0.275 0.208 46.1 55.0 37.6 48.6 45.5 49.1 10711
(0.132) (0.124) (0.122) (0.118) (0.128) (0.122) (0.134) (0.127) (0.145) (0.138) (0.135) (0.291) (0.252)

Controls
Age & Age2  X X X X X X X X X X X X
Education & Urban  X X X X X X X
Remarried  X
Length of union & Age at cohabitation  X

Data: Pooled DHS for each country. Weighted data. Specification: OLS regression run separately for the 15 countries of the sample. Standard errors are clustered at the DHS-cluster level. The dependent variable equals 0 if the union is intraethnic and 100 if the union is interethnic. The definition of the dependent variable varies across specifications (1) to (8). Columns (9) to (13) use the main specification but are estimated on two sub-samples. Columns (14) to (19) show the observed share of intermarriages for the different specifications and sub-samples. Column (20) displays the number of observations. Columns (1) to (13) report the coefficient associated with the birth year variable; each cell corresponds to a separate regression.
(1), (2), (14): Main specification, all women. Dependent variable: interethnic marriages as observed in the data.
(3), (4), (15): Lower bound, all women. Dependent variable: interethnic marriages, with all women who remarried counted as being in an interethnic union.
(5), (6), (16): Upper bound, all women. Dependent variable: interethnic marriages, with all women who remarried counted as being in an intraethnic union.
(7), (8), (17): Others bound, all women. Dependent variable: interethnic marriages, with “other”-“other” unions counted as interethnic ones.
(9), (10), (11), (18): Main specification, first unions: only women in their first union.
(12), (13), (19): Main specification, remarried: only women who have remarried.
Significance levels: * p<0.10, ** p<0.05, *** p<0.01.
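The four definitions of the dependent variable used in columns (1) to (8) and (14) to (17) can be summarized in a short sketch. It assumes a hypothetical record per woman (her group, her current husband's group, whether she has remarried); the field names and group labels are illustrative, and the sketch ignores sampling weights. The same logic applies to Table B-1.14, with religious groups in place of ethnic groups.

```python
from dataclasses import dataclass


@dataclass
class Woman:
    own_group: str       # hypothetical group label, or "other"
    husband_group: str   # group of her current husband
    remarried: bool      # True if her current union is not her first


def intermarriage(w: Woman, definition: str = "main") -> int:
    """Dependent variable (0 = same group, 100 = intermarriage) under the
    definitions listed in the table notes."""
    if definition not in {"main", "lower", "upper", "others"}:
        raise ValueError(f"unknown definition: {definition}")
    if w.remarried and definition == "lower":
        return 100   # lower bound: every remarried woman counted as intermarried
    if w.remarried and definition == "upper":
        return 0     # upper bound: every remarried woman counted as in-group
    if w.own_group == "other" and w.husband_group == "other":
        # "other"-"other" unions: same label in the main definition,
        # counted as intermarriages under the "others" bound.
        return 100 if definition == "others" else 0
    return 100 if w.own_group != w.husband_group else 0


def share(women, definition="main"):
    """Unweighted share (in %) of intermarriages in a list of women."""
    return sum(intermarriage(w, definition) for w in women) / len(women)


sample = [
    Woman("A", "A", False),
    Woman("A", "B", False),
    Woman("other", "other", False),
    Woman("C", "C", True),
]
for d in ("main", "lower", "upper", "others"):
    print(d, share(sample, d))
```

Counting remarried women as intermarried raises the observed share, which is why columns (15) and (17) are always above column (14) in the table, while column (16) is below it.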


Table B-1.14: Robustness table – Interfaith marriages

Regression results. Dependent variable: interfaith marriage. Columns (1) to (13): each cell is the birth year coefficient from a separate regression (standard errors on the line below). Columns (14) to (19): share of interfaith marriages. Column (20): number of observations.

Columns by sample and assumption: (1), (2) all married women, main; (3), (4) all married women, lower bound; (5), (6) all married women, upper bound; (7), (8) all married women, others bound; (9), (10), (11) first union, main; (12), (13) remarried, main; (14) all married women, main; (15) all married women, lower bound; (16) all married women, upper bound; (17) all married women, others bound; (18) first union, main; (19) remarried, main; (20) N, all married women.

Panel A1 & B1: Decrease in interfaith marriage shares
Benin -0.150** -0.103 -0.559*** -0.535*** -0.00572 0.00693 -0.810*** -0.695*** -0.108 -0.0934 -0.106 -0.147 -0.145 16.7 28.4 13.4 36.4 15.8 21.5 10977
(0.0754) (0.0753) (0.0815) (0.0818) (0.0645) (0.0646) (0.120) (0.118) (0.0779) (0.0780) (0.0770) (0.173) (0.170)
Burkina Faso -0.251*** -0.237*** -0.424*** -0.409*** -0.149* -0.147* -0.772*** -0.694*** -0.205** -0.201** -0.207** -0.491** -0.465** 12.1 22.0 10.4 23.3 11.7 15.2 9170
(0.0898) (0.0894) (0.0966) (0.0960) (0.0800) (0.0798) (0.153) (0.151) (0.0908) (0.0904) (0.0905) (0.221) (0.218)
Gabon -0.570** -0.402 -0.619** -0.313 -0.287 -0.187 -0.584** -0.354 -0.469* -0.260 -0.249 -0.787* -0.722 18.6 39.2 12.4 20.9 17.0 22.9 2274
(0.239) (0.244) (0.282) (0.290) (0.192) (0.198) (0.246) (0.248) (0.263) (0.274) (0.275) (0.465) (0.458)
Ghana -1.520*** -1.301*** -1.443*** -1.203*** -1.078*** -0.932*** -2.101*** -1.686*** -1.478*** -1.274*** -1.266*** -1.576*** -1.411*** 18.3 34.5 13.4 26.3 17.0 23.4 6487
(0.133) (0.131) (0.140) (0.139) (0.117) (0.117) (0.164) (0.155) (0.146) (0.145) (0.147) (0.242) (0.242)
Kenya -0.189*** -0.123*** -0.101* -0.0187 -0.167*** -0.0985** -0.213*** -0.115** -0.173*** -0.0987** -0.101** -0.469** -0.508** 6.4 11.7 6.0 7.7 6.3 7.3 9169
(0.0445) (0.0451) (0.0593) (0.0588) (0.0417) (0.0417) (0.0513) (0.0554) (0.0443) (0.0443) (0.0440) (0.212) (0.218)
Togo -0.214** -0.0945 -0.481*** -0.384*** -0.0375 0.0428 -1.054*** -0.385** -0.148 -0.0422 -0.0327 -0.447 -0.400 18.9 32.7 14.6 47.1 17.8 23.9 3701
(0.109) (0.112) (0.119) (0.125) (0.0975) (0.101) (0.177) (0.163) (0.118) (0.123) (0.122) (0.279) (0.282)
Zambia -0.161*** -0.146*** -0.414*** -0.349*** -0.102** -0.0916** -0.158*** -0.141*** -0.141** -0.127** -0.130** -0.213* -0.208* 4.4 20.9 3.4 4.6 4.2 5.5 10711
(0.0496) (0.0492) (0.0998) (0.0983) (0.0448) (0.0441) (0.0506) (0.0499) (0.0560) (0.0552) (0.0543) (0.113) (0.114)

Panel A2 & B2: No change in interfaith marriage shares
Cote d’Ivoire 0.122 0.148 -0.0716 -0.0656 0.155 0.159 -0.357* -0.323 0.141 0.142 0.101 0.185 0.184 19.3 30.4 15.0 34.0 17.8 27.8 2677
(0.134) (0.134) (0.150) (0.148) (0.114) (0.116) (0.212) (0.206) (0.134) (0.135) (0.136) (0.405) (0.392)
Guinea 0.0423 0.0481 -0.103 -0.0857 0.0453 0.0470 -0.0381 -0.0165 0.0469 0.0504 0.0356 0.0381 0.0245 5.1 19.7 3.9 9.1 4.6 7.6 4732
(0.122) (0.122) (0.166) (0.166) (0.109) (0.110) (0.230) (0.232) (0.129) (0.130) (0.130) (0.279) (0.279)
Malawi -0.123 -0.0130 -0.500*** -0.274** -0.109* -0.0417 -0.163** -0.0376 -0.182** -0.0788 -0.0810 0.113 0.186 7.8 27.6 5.4 8.5 6.9 10.8 9241
(0.0786) (0.0816) (0.129) (0.132) (0.0586) (0.0584) (0.0820) (0.0838) (0.0778) (0.0772) (0.0768) (0.187) (0.200)
Mali 0.00585 -0.000175 -0.385*** -0.379*** 0.0649 0.0498 -0.172* -0.177** 0.0474 0.0310 0.0158 -0.217 -0.227 6.2 16.2 5.3 8.0 6.0 7.7 8499
(0.0660) (0.0659) (0.0845) (0.0861) (0.0618) (0.0617) (0.0899) (0.0895) (0.0682) (0.0681) (0.0690) (0.188) (0.183)
Niger 0.0912 0.0962* -0.690*** -0.678*** 0.0758 0.0767 0.111* 0.115* 0.0840 0.0854 0.0820 0.137* 0.139* 1.8 23.3 1.3 2.0 1.7 2.1 5603
(0.0572) (0.0567) (0.123) (0.122) (0.0544) (0.0549) (0.0620) (0.0614) (0.0691) (0.0697) (0.0692) (0.0817) (0.0795)
Senegal -0.0399 -0.0469 -0.695*** -0.677*** 0.00852 0.00101 -0.0347 -0.0426 -0.00466 -0.0112 -0.0305 -0.160 -0.203 1.9 15.7 1.6 2.0 1.9 2.3 7777
(0.0594) (0.0585) (0.160) (0.158) (0.0510) (0.0509) (0.0605) (0.0595) (0.0620) (0.0619) (0.0609) (0.172) (0.168)
Uganda 0.0496 0.0140 -0.148 -0.0970 0.103** 0.0728 0.0515 0.0166 0.127** 0.0881 0.0643 -0.114 -0.167 5.7 25.5 3.5 6.1 4.5 9.9 2465
(0.0570) (0.0603) (0.129) (0.132) (0.0430) (0.0453) (0.0576) (0.0615) (0.0567) (0.0613) (0.0616) (0.145) (0.143)

Panel A3: Increase in interfaith marriage shares
Cameroon 0.489** 0.523** 0.588* 0.771** 0.319* 0.308 0.481 0.736 0.432* 0.445* 0.413* 0.727 0.805 10.8 28.3 8.9 18.7 11.1 9.8 3066
(0.233) (0.236) (0.311) (0.309) (0.192) (0.193) (0.472) (0.458) (0.232) (0.234) (0.226) (0.573) (0.570)

Controls
Age & Age2  X X X X X X X X X X X X X
Education  X X X X X X X X
Urban  X X X X X X X X

Data: Pooled DHS for each country. Weighted data. Specification: OLS regression run separately for the 15 countries of the sample. Standard errors are clustered at the DHS-cluster level. The dependent variable equals 0 if the union is intrafaith and 100 if the union is interfaith. The definition of the dependent variable varies across specifications (1) to (8). Columns (9) to (13) use the main specification but are estimated on two sub-samples. Columns (14) to (19) show the observed share of intermarriages for the different specifications and sub-samples. Column (20) displays the number of observations. Columns (1) to (13) report the coefficient associated with the birth year variable; each cell corresponds to a separate regression.
(1), (2), (14): Main specification, all women. Dependent variable: interfaith marriages as observed in the data.
(3), (4), (15): Lower bound, all women. Dependent variable: interfaith marriages, with all women who remarried counted as being in an interfaith union.
(5), (6), (16): Upper bound, all women. Dependent variable: interfaith marriages, with all women who remarried counted as being in an intrafaith union.
(7), (8), (17): Others bound, all women. Dependent variable: interfaith marriages, with “other”-“other” unions counted as interfaith ones.
(9), (10), (11), (18): Main specification, first unions: only women in their first union.
(12), (13), (19): Main specification, remarried: only women who have remarried.
Significance levels: * p<0.10, ** p<0.05, *** p<0.01.

CHAPTER 2

PARENTAL DIVORCE AND CHILDREN’S EDUCATIONAL OUTCOMES IN

SENEGAL

Joint with Rozenn Hotte1

Abstract

This paper provides new evidence on the consequences of parental divorce for children in Africa. Using survey data that collected the detailed life histories of Senegalese women and their children, we investigate how children’s educational outcomes are affected by their parents’ divorce. We use a sibling fixed-effects strategy that allows us to control for all the factors that are common to all children in a family, such as parental preferences regarding education or the level of education of the parents, alleviating concerns of omitted variable bias. We compare children who were old enough to have been enrolled in primary school at the time of the divorce to their younger siblings, for whom enrollment decisions had not yet been made at the time of the divorce. We find that younger siblings were more likely than their older siblings to have attended primary school. This higher level of investment does not persist in the long run: there are no differences between siblings when considering primary school completion. We find that custody and fostering decisions do not seem to mediate the positive effects on school attendance. Our findings are consistent with either an improvement of the financial situation (due to remarriage) or an increase in the decision-making power of mothers after the divorce.

1This chapter was published in World Development (Crespin-Boucaud and Hotte, 2021). We thank Denis Cogneau and Sylvie Lambert for their advice as well as detailed comments that greatly improved the quality of this paper. We are grateful to Sarah Deschenes, Oliver Vanden Eynde, Michael Grimm, Helene Le Forner, Alexis Le Nestour, Karine Marazyan, Martin Ravallion, Paola Villar, Dominique van de Walle, and two anonymous referees for insightful discussions, suggestions, and feedback on this paper. We also thank participants of the PSI-PSE and CFDS seminars at the Paris School of Economics (PSE), of the internal seminar of the Department of Economics of the University of Sussex, of the CSAE Conference in Oxford, of the Nordic Conference on Development Economics in Copenhagen, of the Journees Augustin Cournot in Strasbourg, of the LAGV Conference in Aix-en-Provence and of the DIAL Conference in Paris for their helpful comments.


2.1 Introduction

Changes in family structure due to parental death or divorce are likely to affect children. However, while the consequences of orphanhood have been studied, little is known about the consequences of divorce for children in sub-Saharan Africa, even though approximately 25% of first unions there end in divorce (Clark and Brauner-Otto, 2015). A divorce is likely to imply economic losses, changes in caregivers, and psychological distress (Amato, 2000). All children whose parents divorce may face these consequences, but in countries where formal safety nets are scarce and poverty levels are high, such as those in sub-Saharan Africa, the consequences of a divorce might be more severe than in developed countries and may affect children's health and access to education.

This paper addresses whether divorce affects the (basic) educational decisions parents make for their children. We use data collected in Senegal to provide new insights into the consequences of divorce in sub-Saharan Africa. Divorces in Senegal are neither rare nor common: approximately 20% of first unions in Senegal end in divorce, close to the average divorce rate in sub-Saharan Africa (Clark and Brauner-Otto, 2015). Moreover, couples from better-off backgrounds are more likely to divorce than couples from poorer backgrounds (Lambert et al., 2019, using the same dataset as we do), which raises the question of whether children’s educational outcomes suffer when their (educated) parents divorce.

We study whether a child's age at the time of the divorce is correlated with primary schooling decisions. A simple comparison of children according to the divorce status of their parents would not provide a satisfactory answer to this question: many parental and family characteristics are likely to simultaneously influence the probability that parents divorce and the schooling of their children, leading to omitted variable bias. To avoid this issue, we use a sibling fixed-effects strategy that allows us to control for all the factors that are common to all children in a family, such as parental preferences regarding education or the level of education of the parents (Bjorklund and Sundstrom, 2006; Le Forner, 2020). This strategy relies only on within-family differences in age at the time of the divorce: we compare children for whom schooling decisions had not yet been made by the divorce date to their older siblings, for whom schooling decisions had likely already been made by that date.
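The logic of the sibling comparison can be illustrated with a within-family estimator: subtracting each family's mean removes everything siblings share, so only differences between siblings (here, whether a child was young at the divorce date) identify the effect. The sketch below uses made-up numbers and omits the child-level controls a real specification would include; `sibling_fe` and `pooled_ols_slope` are hypothetical helper names, not the authors' code.

```python
import numpy as np


def sibling_fe(y, d, family):
    """Within-family (sibling fixed-effects) slope of y on d.

    Demeaning within each family removes every factor shared by siblings
    (parental education, preferences, the divorce itself), so only
    differences between siblings identify the coefficient."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    _, fam_idx = np.unique(np.asarray(family), return_inverse=True)
    counts = np.bincount(fam_idx)
    y_within = y - (np.bincount(fam_idx, weights=y) / counts)[fam_idx]
    d_within = d - (np.bincount(fam_idx, weights=d) / counts)[fam_idx]
    return float(d_within @ y_within / (d_within @ d_within))


def pooled_ols_slope(y, d):
    """Naive slope that ignores families, for comparison."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    d_c = d - d.mean()
    return float(d_c @ (y - y.mean()) / (d_c @ d_c))


# Hypothetical data: three families; young = 1 if the child was 5 or younger at divorce.
family = [0, 0, 1, 1, 1, 2, 2]
young = [1, 0, 1, 1, 0, 0, 1]
family_effect = {0: 10.0, 1: 60.0, 2: 30.0}   # unobserved, shared by siblings
true_effect = 20.0
outcome = [family_effect[f] + true_effect * d for f, d in zip(family, young)]

print(sibling_fe(outcome, young, family))   # close to 20, the true within-family effect
print(pooled_ols_slope(outcome, young))     # biased: picks up differences across families
```

Because the family effects here happen to be larger in families with more "young" children, the pooled slope overstates the effect, which is the omitted variable problem the sibling comparison is designed to avoid.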

We study two related primary schooling outcomes. The first outcome is whether a child has ever attended school (primary school attendance). This outcome captures the first investment in (formal) education that a child can receive. The second outcome is whether the child has completed primary school (5th or 6th grade). This outcome captures higher levels of investment in schooling as well as retention in the educational system. This paper focuses on primary schooling and does not discuss higher levels of schooling due to sample size limitations.

The survey includes information neither on children’s academic performance (such as test scores) nor on the quality of the education to which they have access. These indicators would have provided a more complete picture of the impact of divorce on education (Glewwe and Kremer, 2006; Jones et al., 2014; Van der Gaag and Adams, 2010). While we do not know how well children perform in school, it seems likely that children who attend school do not perform worse than children who have never been enrolled in primary school: we can thus interpret an increase in primary school attendance as a likely increase in human capital.

We use the 2011 wave of the survey Pauvrete et Structure Familiale (PSF; De Vreyer et al., 2008). This survey combines two elements that are key to implementing the sibling fixed-effects identification strategy and that are rarely found together in household surveys. First, the PSF survey collected detailed information on marital histories, including the year when and the reason why (divorce or death) each marital union ended. Second, it collected information on children younger than 25 born to all members of the surveyed households, thus including children who do not live in the household. As custody of children is an outcome of divorce, information on children who are not household members ensures that the sample is not selected on the results of divorce.

Overall, our findings suggest that divorce does not negatively affect the educational outcomes of children who were young when their parents divorced. Children who were 5 or younger when their parents divorced were more likely to have attended primary school than both other children and their older siblings. We show that this difference is not caused by negative shocks resulting both in divorce and in older children not attending primary school. The positive effect we observe for younger children is instead explained by the fact that divorced parents seem able to compensate their younger children after a divorce. This positive effect on school attendance is robust to varying the cutoff for the age at divorce. However, this finding must be nuanced: children who were 5 or younger at the divorce date were not more likely to have completed primary school than their siblings who were of primary school age at the divorce date, or than their siblings who were supposed to have completed primary school by the divorce date. Finally, children who were between 6 and 9 years old when their parents divorced were as likely to have completed primary school as their older siblings. This finding suggests that divorce might not have negative consequences for children when considering primary-level educational outcomes.
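The treatment indicator behind this comparison can be sketched as follows, assuming as a simplification that a child's age at divorce is the difference between the year the union ended (recorded in the PSF) and the child's year of birth. The function names, the handling of children born after the divorce (returned as `None`, i.e. left out), and the example years are hypothetical; the `cutoff` parameter mirrors the robustness check that varies the age-at-divorce cutoff.

```python
from typing import Optional


def age_at_divorce(birth_year: int, divorce_year: int) -> int:
    """Child's age at the divorce date, approximated from calendar years."""
    return divorce_year - birth_year


def young_at_divorce(birth_year: int, divorce_year: int, cutoff: int = 5) -> Optional[bool]:
    """True if the child was `cutoff` or younger at the divorce date, False if
    older, None if born after the divorce (hypothetically excluded here)."""
    age = age_at_divorce(birth_year, divorce_year)
    if age < 0:
        return None
    return age <= cutoff


# Hypothetical siblings born in 1995, 2000 and 2003; parents divorced in 2004.
siblings = (1995, 2000, 2003)
for cutoff in (3, 5, 7):
    print(cutoff, [young_at_divorce(b, 2004, cutoff) for b in siblings])
```

Moving the cutoff reclassifies only the siblings whose age falls between the old and new thresholds, so a result that survives several cutoffs is less likely to hinge on one arbitrary boundary.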

Custody and fostering decisions do not seem to drive the positive effect on primary school attendance. Part of the observed effect seems to be driven by the children of women who remarry, which is consistent with the fact that remarriage often allows women to improve their financial situation. Another potential channel is that some women might see their decision-making power increase after they divorce: if a woman’s preference for education is stronger than her ex-husband’s, she might invest more in her children by sending them to school after the divorce. Nevertheless, children who were young when their parents divorced were not more likely to complete primary school than their older siblings, which indicates that this initial investment is not sustained long enough to allow children to complete their primary education.

This paper makes two contributions to the literature. First, we provide new evidence that divorce does not necessarily negatively affect children’s basic educational outcomes, using a sibling fixed-effects methodology that allows us to control for selection into divorce.2 While not finding a negative impact of divorce on schooling outcomes may be surprising, it is in line with recent evidence on the health outcomes of children whose parents divorce. Smith-Greenaway (2020) concludes that in sub-Saharan Africa, following divorce, children’s health benefits from their biological parents’ education to the same degree as that of children with married parents, highlighting that selection into divorce and remarriage might be what drives the differences previously observed between children.3

Second, this paper expands the literature on the link between divorce and children’s education

in developing countries by providing results in the Senegalese context that challenge previous

findings. Gnoumou Thiombiano et al. (2013) find that in Burkina Faso, children of divorced

parents are less likely to attend school than their counterparts but do not discuss selection into

divorce. Chae (2016) uses specifications with child fixed effects and finds that parental divorce

in rural Malawi is associated with lower grade attainment and a lower likelihood of children

attending school at the time of the survey. Our findings are at odds with both of these papers,

which is not surprising given that the selection into divorce is different in these three countries.

While divorce rates are approximately 50% in rural Malawi and very low in Burkina Faso, the

2When considering the broader category of parental absence, the consensus seems to be that the death of the father has either negative or no consequences for his children, depending on the outcome considered (Beegle et al., 2006; van de Walle, 2013), and that the death of the mother has negative consequences for her children (Beegle et al., 2010; Case and Ardington, 2006). A divorce does not necessarily imply paternal absence, and the impacts of both shocks are likely to differ due to differences in the type of shock as well as differences in the characteristics of the families affected by these shocks.

3Clark and Hamplova (2013) and Gnoumou Thiombiano et al. (2013) conclude that children of divorced mothers have worse health outcomes than children whose parents are still married.


Senegalese divorce rates are close to 20%. Relatedly, divorced mothers in Senegal are more ed-

ucated than their nondivorced counterparts, which is not the case for divorced mothers in rural

Malawi or in Burkina Faso. In the Senegalese case, children of divorced parents do not seem

to have worse schooling outcomes due to their parents’ divorce. This finding emphasizes the

need to study the consequences of divorce in several societies since context influences selection

into divorce and the potential consequences for children.

The rest of the paper is organized as follows. We introduce the dataset and survey used in our

analysis in Section 2.2. Section 2.3 introduces elements of the context surrounding divorce and

education in Senegal. Section 2.4 details the identification strategy. Section 2.5 presents the

results and robustness checks. Section 2.6 discusses the channels that may mediate the impact

of parental divorce on children’s primary school enrollment. Section 2.7 concludes.

2.2 Data: Pauvrete et structure familiale

We use the second wave of the survey Enquete Pauvrete et Structure Familiale4 (PSF) that was

conducted in Senegal in 2011.5 The survey is described in detail in De Vreyer et al. (2008).

The PSF database has two features that allow us to identify women who divorced and children born to parents who later divorced. First, the PSF records detailed information on marital

histories. Respondents were asked how many times they had experienced marital dissolu-

tion and, if relevant, what the reason for the most recent marital dissolution was (separation

or death of their husband). Respondents also provided the dates when their current and last

unions began and ended. In the case of a divorce or of a separation, the date likely reflects the

time when the separation became effective rather than the end of any subsequent legal process.

Second, the PSF includes information on the children of each household member, regardless of

whether the children are themselves household members. Each individual in the household is

asked to indicate which children living in the household are her own and to list her children

living elsewhere, provided that they are younger than 25 years old. Thus, there is no selection

of children based on whom they live with, ensuring that our results are not biased by decisions

regarding custody and place of residence of the children after a divorce.

4Momar Sylla and Matar Gueye of the Agence Nationale de la Statistique et de la Demographie of Senegal (ANSD), Philippe De Vreyer (University of Paris-Dauphine and IRD-DIAL), Sylvie Lambert (Paris School of Economics-INRA) and Abla Safir (World Bank) designed the survey. The data collection was conducted by the ANSD.

5The first PSF wave covered a representative sample of the Senegalese population, and the second wave included respondents of the first wave and the household members living with them. The second wave had almost twice as many respondents as the first (28,000 versus 14,450 individuals). The sample of interest is not large enough in the first PSF wave; hence, we use the second wave.


We focus on divorced mothers and their children and do not study fathers who have divorced.

There are two reasons for this choice. First, we cannot identify the children affected by a di-

vorce through their father if he was polygamous at the time of the divorce, as the data contain

information on the date of the divorce, but not on the rank of the woman whom the man divorced (for individuals already surveyed in 2006). Second, we need information on children’s

age at the divorce date, and women are more likely to accurately report the age of their children

than men, who, on average, have more children than women due to high polygamy rates.6 The

data we use are partly retrospective and, in some cases, concern children who are not house-

hold members. However, misreporting should not be a major issue, as it is likely that mothers

remember birth, marriage, and divorce years and know whether their children went to school,

even if they are not living in the same household as their children.

There are two main limitations to using the PSF. First, households in which divorced women

live at the time of the survey are not the households in which these women and their children

lived before the divorce.7 Information on the previous household is more limited than infor-

mation on the current household. The data allow us to know if the child was living in the

surveyed household or if she had already left it, but we do not have retrospective information

on the exact place of residence at the time when the child was in school. Second, information

collected on children who are not members of the surveyed household is limited. Two types

of education-related questions are asked about these children: whether the child is currently

attending school and what her highest level of education is. Information on the age at which a

child started school was collected only for children living in the surveyed households.

2.3 Background: Divorce and education in Senegal

2.3.1 Insights on divorces in Senegal

Two different worlds: Legal and customary divorces

We draw from a report by Lagoutte et al. (2014) that provides detailed information on and

analyses of marital dissolution practices in Senegal.

6Forty-three percent of married men older than 45 have more than one wife. Polygamous men who live with all their wives have on average 7.5 children.

7We can retrieve information on the household that existed before the divorce only if the divorce took place between the two waves. There were only 65 women who divorced between the two waves. Among these 65 women, only 43 had at least one child with their husband, and among them, only 24 had a child older than 6 in 2011.

Since 1973, according to the Code Senegalais de la Famille, legal marital dissolution (divorce) must be pronounced, even if the marriage was not legally registered. However, qualitative work conducted by one of the authors suggests that most women do not formally divorce but

instead receive a customary divorce. Under customary law, a wife may ask her husband for a

divorce, but he needs to agree to the divorce for it to be effective. Conversely, a husband can

divorce his wife even if she does not agree to the divorce.8 Under the 1973 Senegalese family

law, which is meant to protect women and ensure gender equality, both parties can file for

divorce. However, women are much more likely than men to file for divorce: women file approximately 75% of the claims in court. This discrepancy may arise because in a polygynous

society, a man can marry another woman rather than seek a divorce in the case of an unhappy

marriage. Most of the divorce rulings are granted on the grounds of incompatibilite d’humeur

(a rather vague term; literally, “mood incompatibilities”) or of “defaut d’entretien par le mari”

(the husband failed to support his wife economically).9 Both grounds for divorce predate the 1973 Senegalese family law: they are included in Islamic law and in customary law.

When a couple with children divorces, custody of the children must be decided. Children’s residence after a divorce is uncertain: Lambert et al. (2019) stress

that women often declare in interviews that they worry about their children being taken away

from them should they separate from their husbands. In the case of a formal divorce, the judge

decides on the children’s residence and declares one parent to be the main caregiver. Mothers

are usually granted custody of their daughters and of their young sons, and the judge can order

the father to pay child support. In the case of a customary divorce, fathers can claim custody of

their children as soon as they are no longer being nursed. If the children stay with their mother,

their father might contribute to their living expenses if he is able and willing to do so. Below,

we describe the patterns we observe in the data regarding the families that are affected by a

divorce.

Characteristics of divorced mothers and their children

We contrast the characteristics of divorced mothers with the characteristics of mothers who

have never divorced (Panel A, Table 2.1).10 Among mothers of children younger than 25 years

8There are reports that repudiation (a unilateral divorce right granted only to the husband), while outlawed, is still practiced in Senegal. However, no woman ever mentioned having been repudiated during qualitative interviews, which might be due to the social stigma associated with the practice.

9Lagoutte et al. (2014) report that few divorces are jointly filed. The two grounds for divorce most commonly cited are also likely to hide other reasons for divorce, such as infidelity. Additionally, alimony can be provided only if the husband filed for divorce on the grounds that he does not get along with his wife or in the case of a divorce due to a serious illness.

10Throughout the paper, “divorced mothers” refers to divorced mothers who had, at the time of the survey, at least one child younger than 25 who was born from the union that ended in a divorce. Similarly, “mothers” refers to women who had at least one child younger than 25 years old on the survey date. It is important to note that 20% of women who divorce are childless: those women are not included in our analysis.


old, 11% have ever divorced.

Divorced mothers, on average, seem to be better off than their counterparts: they are more

educated, more likely to live in an urban area and more likely to come from a rather well-off

background, as indicated by the fact that the fathers of divorced mothers are more likely to be

self-employed or state-employed. This difference is not surprising: women who divorce need

to access resources to compensate them for the (potential) loss of resources associated with the

divorce, especially if their ex-husbands provided for them during their marriage. Educated

women are likely to have better outside options than uneducated women: they have access to

more valuable jobs and to better matches in the (re)marriage market.11 Their family network

may be able to provide more financial assistance. Additionally, the profession of the father

is a proxy for social class and may capture how empowered women are. Divorced women

are therefore positively selected: this may mean either that most women in our sample have chosen to divorce their ex-husbands or that men do not abandon women who are vulnerable.

When considering the type of marriage, mothers who are the first wife in a polygamous union

are less likely to divorce than other mothers. Women of higher rank in a polygamous union are

more likely to divorce, which could be due to difficulties in cohabitation.12

When including all the variables defined before the divorce in a linear probability model (LPM)

(column 4), the findings are similar: higher levels of education as well as some categories re-

lated to fathers’ occupations are associated with a higher likelihood of divorce. Additionally,

the coefficients associated with two ethnic groups (Wolof and Poular) are also significant: social

norms and practices related to divorce may vary across groups.13

This positive selection into divorce is also seen in the characteristics of the children (Panel B,

Table 2.1). Children whose parents divorced were 7 percentage points more likely to have

attended primary school than children whose parents did not divorce, 5 percentage points

more likely to have completed primary school by the expected age, and 6 percentage points

more likely to have attended secondary school. This difference disappears when we control for

the education of the mother and is consistent with the vast literature linking parental education

to investments in children’s human capital (on Senegal, see Dumas and Lambert, 2011).

Children whose parents have divorced have, on average, a lower birth order than children

11Seventy-six percent of the divorced mothers in the sample had already worked before the divorce (and this proportion does not vary according to educational status).

12These results should be interpreted with caution, as they could be affected by measurement issues. For instance, if a woman divorced her husband because he was about to take a second wife, she could declare her previous marriage to have been either a polygamous union (with her being the first wife) or a monogamous union.

13These differences in the probability of divorce do not affect our identification strategy, since we compare children who have the same mother.


Figure 2.1: Age of mothers and of their children at the time of the divorce

(a) Age of mothers at the time of the divorce. [Histogram; x-axis: age at divorce (10 to 60); y-axis: percent.]

(b) Age of children at the time of the divorce. [Histogram; x-axis: age at divorce, children (0 to 25); y-axis: percent.]

whose parents have not divorced. This difference is explained by the fact that there are fewer

children born to parents who divorce than to parents who do not; thus, there are few children

with a high birth order among children of divorced parents.

Characteristics at the time of divorce

Mothers were on average 29 years old when the divorce took place and almost all mothers

who divorced did so when they were between 20 and 35 years old (Figure 2.1a).14 The average

length of the marriage before divorce was 7 years, and the median was 5 years. Divorced

mothers had on average 1.77 children from their last union: 50% had only one child and 30%

had two children. Mothers who had not divorced had on average 2.96 children born to their last

union. This difference is explained by the fact that few women divorce after long marriages,

resulting in a lower number of children being born into the union. Relatedly, women who

divorce do so, on average, shortly after the birth of a child: 73% of divorces occur when the

youngest child is a toddler. Hence, children whose parents divorce are rather young: children

are on average 6 years old at the time of the divorce and few children are older than 10 when

their parents divorce (Figure 2.1b).

Characteristics of households after a divorce

Situation of divorced mothers A divorce almost always results in changes in the household

composition. The two most common living arrangements for a woman after her divorce are to remarry or to move back in with her parents. At the time of the survey, 41% of ever-divorced women had remarried and 42% lived with at least one of their parents.

Few divorced women live on their own due to financial constraints as well as social norms that

14The irregular shape of the histograms is likely due to age heaping. However, there is no heaping on divorce dates with respect to the survey date.


Figure 2.2: With whom do children usually live?

(a) Children whose parents have not divorced.

(b) Children whose parents have divorced.

[Both panels: share of children (0 to 100) by age on the survey date (0-5, 5-10, 10-15 and 15-25 years). Categories: lives with both parents (panel a only), lives with mother, lives with father, lives with other people.]

prescribe that women of child-bearing age must be married (Lambert et al., 2019). Divorced

women may also choose to move back into their parents’ house after a divorce to obtain finan-

cial and emotional support as well as help with the children.

At the time of the survey, divorced mothers lived in households that were on average wealthier, in terms of per capita household consumption, than the households of mothers who had not divorced (Panel A, Table 2.1). This finding is consistent with positive selection into divorce. It also suggests that the potential negative impact of divorce on financial resources does not reverse women’s relative financial situations.

Custody: With whom do children of divorced parents live? A total of 64% of the children

of divorced parents live with their mother. Whatever their age, this is the most common situ-

ation (Figure 2.2). Nevertheless, whether parents are divorced or not, with whom the children

live is a function of both the child’s age and his/her gender. Teenagers and adults are more

likely to live with people who are not their parents, mostly because of marriage or work.15

Coresidence patterns nevertheless differ in two ways for children whose parents are divorced. First, among children whose parents are still together, very few live with their father but not with their mother, and this share increases only slightly with age; among children of divorced parents, by contrast, the share of children who live with their father increases as children become older. Second, daughters are less likely to live with their father than sons are (Figure A-2.1 in the Appendix). Both findings are consistent with qualitative reports suggesting that fathers can claim custody of their children once they turn 7 and that they more often live with their sons than with their daughters. Young children who do not live with either of their parents are usually fostered. A fostered child is a child whom her parents sent to live with a host family, often relatives such as grandparents, an uncle or an aunt (Marazyan, 2015). Fostering is more common for children of divorced mothers (11% versus 6%) than for other children. This difference remains significant even when controlling for the education of the mother.

15Twenty-nine percent of all young women (15-25) and 4.5% of all young men are married. These shares are lower (though not significantly so) for children of divorced mothers: 26.6% for daughters and 1.2% for sons.


Table 2.1: Characteristics of divorced women and of their children

                                            (1)           (2)           (3)           (4)
Descriptive statistics                      Mean          Mean          Difference    LPM

Panel A: Mothers                            Has divorced  Never divorced
Pre-divorce characteristics
  Age                                       35.80         36.23         -0.44         0.00
  Highest education level
    No formal education                     0.52          0.66          -0.14***
    Primary                                 0.31          0.20          0.11***       0.04***
    Secondary or higher                     0.15          0.09          0.06***       0.04**
    Qur’anic                                0.35          0.34          0.01          -0.01
  Ethnicity & religion
    Mourid brotherhood                      0.37          0.33          0.03          0.01
    Wolof                                   0.47          0.44          0.03          0.02*
    Serere                                  0.12          0.12          -0.01         0.01
    Poular                                  0.27          0.24          0.03          0.03**
  Father’s occupation
    Inactivity of the father of the wife    0.13          0.14          -0.01
    Farmer                                  0.27          0.43          -0.16***      -0.02
    Independent or informal employee        0.26          0.21          0.05**        0.01
    State-employed or employer              0.24          0.15          0.09***       0.03+
    Occupation unknown                      0.09          0.07          0.02          0.07**
  Characteristics of the marriage
    Polygamy, first rank                    0.10          0.18          -0.09***      -0.02**
    Polygamy, higher rank                   0.25          0.18          0.06***       0.03**
Characteristics on the survey date
  Lives in rural area                       0.37          0.57          -0.21***
  Household consumption per capita
    Food expenditures (hh)                  189259.97     164698.05     24561.93
    Other expenditures (hh)                 263200.18     159468.74     103731.43**
  Family composition
    Mother lives with one of her parents    0.42          0.11          0.31***
    Number of children alive                3.21          3.43          -0.22
    Number of children (≤25 y.o.)           2.76          3.03          -0.27**
    Number of children - last union (a)     1.77          2.96          -1.19***

Number of women (b)                         290           3,952         4,242         3,927

Panel B: Children older than 7              Parents       Parents
                                            divorced      never divorced
  Age                                       14.50         14.97         -0.46+
  Birth order                               2.37          3.48          -1.11***
  Child is a girl                           0.48          0.49          -0.02
  Child lives with mother                   0.59          0.81          -0.22***
  Has been fostered                         0.12          0.08          0.04***
  Attended primary school (≥ 7 y.o.)        0.72          0.65          0.07***
  Completed primary school (≥ 10 y.o.)      0.51          0.47          0.05+
  Attended secondary school (≥ 14 y.o.)     0.42          0.36          0.06*

Number of children                          387           8,030         8,417

Note: The table presents characteristics of mothers and of their children according to whether they experienced a divorce. Panel A: Mothers of at least one child younger than 25 on the survey date. Divorced mothers are women who have divorced from the father of at least one of their children younger than 25 on the survey date. Other mothers appear in the “never divorced” group (which thus includes all still-married mothers but also widows). Panel B: All children older than 7. Education-level variables are defined only for children older than an age threshold (mentioned in parentheses) that reflects how the Senegalese school system is organized, so the composition of the sample changes for these variables. The numbers of observations reported (387 and 8,030) correspond to the number of children older than 7 on the survey date in each group. Specifications: Column (1) reports the mean of each variable for mothers who have divorced and for children whose parents divorced. Column (2) reports the mean of each variable for mothers who have not divorced and for children whose parents did not divorce. Column (3) reports the difference in means between the group of mothers (children) who experienced a divorce and the group of mothers (children) who did not (column (1) - column (2)), together with significance levels from a t-test. Column (4) reports the results of a linear probability model in which the dependent variable is a binary variable that takes on the value 1 when a woman has divorced and 0 otherwise. Significance levels: + p<0.15, * p<0.10, ** p<0.05, *** p<0.01.
a Number of children younger than 25 born either into the current union (if not divorced) or into the last union (if divorced). Half-siblings of children whose parents divorced are excluded from the sample.
b The number of observations displayed corresponds to the number of mothers in each group. For some characteristics, the number of observations is lower than displayed, due to missing values on the Qur’anic and polygamy variables. The number reported in column (4) corresponds to the number of women for whom the variables included in the model are not missing.
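Column (3) of Table 2.1 reports differences in means with t-test significance levels; the reported numbers equal column (1) minus column (2) (for instance, 0.52 - 0.66 = -0.14 for mothers with no formal education). The note does not say which t-test variant was used, so this Python sketch assumes Welch's unequal-variance test with a normal approximation to the p-value, which is reasonable for groups of several hundred observations. The function names and the toy data are illustrative, not PSF variables.

```python
import math
from statistics import mean, variance


def diff_in_means(group1, group2):
    """Return (mean1 - mean2, Welch t statistic, two-sided p-value).

    The p-value uses a normal approximation to the t distribution,
    adequate when both groups contain hundreds of observations.
    """
    diff = mean(group1) - mean(group2)
    # Welch standard error: the two group variances are not assumed equal
    se = math.sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    t_stat = diff / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return diff, t_stat, p_value


def stars(p_value):
    """Significance markers using the thresholds from the table note."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    if p_value < 0.15:
        return "+"
    return ""


# Toy binary indicators (1 = attended primary school), not PSF data
divorced = [1, 0, 1, 1]
never_divorced = [0, 0, 1, 0]
d, t, p = diff_in_means(divorced, never_divorced)
print(f"difference = {d:.2f}{stars(p)}, t = {t:.2f}, p = {p:.3f}")
```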


2.3.2 The Senegalese education system

A dual education system In Senegal, schooling is mandatory for children aged 6 to 16.16 Chil-

dren must attend school within the formal school system. The formal school system is made

up of four educational levels: preschool, primary school, secondary school (middle and high

schools), and higher education. Formal schools are referred to either as “French schools” or as

“French-Arabic schools”, depending on the language of instruction of the school. Most chil-

dren start attending school after they have turned 5 and before they turn 7, but some children

start attending school for the first time when they are older than 7 (Figure 2.3a).17 Although schooling is mandatory from the age of 6, not all children attend primary school. The low supply of schools in rural areas (Cisse et al., 2004) may explain why some children start attending school later than age 6. Moreover, public schools do not charge school fees,

but there are additional monetary costs to attending school, such as transportation fees and the

costs of school supplies, that may prevent some families from sending their children to school.

Religious schools, known as Qur’anic schools or as daara in Wolof (Chehami, 2016), are com-

mon in Senegal but are not part of the formal education system. While a few

Qur’anic schools teach both a standard curriculum and a religious one, most focus almost

exclusively on religious education (Andre and Demonsant, 2014). The religious education sys-

tem and the formal education system are not necessarily exclusive: children can attend primary

school and a part-time Qur’anic school.18

Outcome variables We study three educational outcomes. The main variable of interest is

whether a child has any primary schooling. This outcome captures the first investment in

(formal) education that a child can receive. If a child attended a Qur’anic school but never at-

tended primary school, she is considered as having no primary education, since the curriculum

of Qur’anic schools is mainly religious. In our main specification, we consider this variable to

be defined for children who were older than 7 by the survey date. We check that our results are

robust to moving this age cutoff upwards, excluding children up to the age of 10.
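The coding rule for this first outcome can be sketched in Python as follows. The category labels and the `Child` record are hypothetical stand-ins for the PSF questionnaire items, and we read "older than 7" as "aged 7 or older", matching the "(≥ 7 y.o.)" label in Table 2.1. Passing a higher `age_cutoff` mimics the robustness check that drops younger children.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the "highest level of education" answer; the
# actual PSF codebook categories are not reproduced here.
NONE, QURANIC, PRIMARY, SECONDARY = "none", "quranic", "primary", "secondary"


@dataclass
class Child:
    age: int            # age on the survey date
    highest_level: str  # one of the labels above


def any_primary_school(child: Child, age_cutoff: int = 7) -> Optional[int]:
    """1 if the child has ever attended primary school, 0 otherwise.

    Returns None when the child is younger than the cutoff, so the child
    leaves the estimation sample. Qur'anic schooling alone is coded 0,
    since its curriculum is mainly religious.
    """
    if child.age < age_cutoff:
        return None
    return int(child.highest_level in (PRIMARY, SECONDARY))


children = [Child(6, PRIMARY), Child(8, QURANIC), Child(9, PRIMARY), Child(12, NONE)]
main = [any_primary_school(c) for c in children]                   # main specification
robust = [any_primary_school(c, age_cutoff=10) for c in children]  # stricter age cutoff
```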

16The law n° 2004-37, passed in December 2004, made primary schooling mandatory for children older than six. As the survey took place in 2011, children aged 6 at the time of the survey should hence have been enrolled in school when the survey was conducted.

17Some children attend preschool; hence, they appear as having attended formal school even when they are younger than 6. Children who attend formal preschool usually go on to attend formal primary school.

18Using information on all children who are members of surveyed households, we find that 38% of children who attended primary school also attended a Qur’anic school, and 47% of children older than 7 who have never been enrolled in primary school attended a Qur’anic school. Only 17.5% of children attended neither Qur’anic nor primary school.

The second variable of interest is whether a child has (almost) completed primary school. We consider that a child has completed primary school if she has attended fifth grade (CM1, the second-to-last grade in primary school).19 This variable hence captures a greater investment in children’s education than the first outcome variable does. A child who attends primary school will not necessarily

complete it: only 70% of children who have attended primary school complete it. Several fac-

tors explain why the completion rate is not higher. First, grade repetition is common in Senegal

(Ndaruhutse et al., 2008). In 2006, 12% of students had repeated at least a grade (Boubacar and

Francois, 2007). As the opportunity cost of schooling increases with age, repeating a grade is

likely to raise the probability that the child drops out of school. Second, the supply of schools

is even more limited for the higher grades of primary education: 36% of primary schools do

not offer the whole primary cycle (Boubacar and Francois, 2007). In our main specification,

we consider this variable to be defined for children aged 10 and older, as children who start

school at age 6 are expected to be in fifth grade by age 10. Due to grade repetition, some children complete primary school only at age 13 or 14. The outcome variable therefore captures whether a child had completed primary school by the time of the survey, and a few children coded as not having completed it are likely to complete it after the age of 10.
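A minimal sketch of the completion outcome, assuming on-time entry into first grade at age 6 so that fifth grade (CM1) is reached at age 10. The grade coding (0 for never attended, 1 to 6 for the primary grades) is a hypothetical encoding; for children living outside the surveyed household, the pooled "5 or 6 years of primary school" answer would map to a grade of 5 or more.

```python
from typing import Optional

ENTRY_AGE = 6  # official age at which children start primary school
CM1_GRADE = 5  # fifth grade, the second-to-last primary grade


def expected_age_in_grade(grade: int, entry_age: int = ENTRY_AGE) -> int:
    """Age of an on-time pupil in a given grade (first grade at entry_age)."""
    return entry_age + grade - 1


def completed_primary(age: int, highest_primary_grade: int,
                      attended_secondary: bool,
                      age_cutoff: int = expected_age_in_grade(CM1_GRADE)) -> Optional[int]:
    """1 if the child reached fifth grade or beyond, 0 otherwise, None if too young.

    highest_primary_grade: 0 = never attended, 1 = first grade, ..., 6 = last grade.
    """
    if age < age_cutoff:
        return None
    return int(attended_secondary or highest_primary_grade >= CM1_GRADE)
```

Because the default cutoff is derived from the entry age, a different assumption about on-time entry would shift the age threshold consistently.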

The third variable of interest is whether a child has (exclusively) attended Qur’anic school.

Attending a Qur’anic school was reported separately from the highest level of education for

children living in the surveyed households. For children living elsewhere, Qur’anic school

was only listed as a possible answer to the question regarding their highest education level, so

it is reported only for children who did not attend primary school.20 Hence, we cannot study

whether children attended Qur’anic school as a complement to formal schooling. Considering

other outcomes (such as transitions into secondary school) is not possible due to sample size

limitations.21
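Because Qur'anic attendance appears only as a "highest level" answer for children living elsewhere, a consistent variable has to treat Qur'anic schooling as exclusive for every child. This sketch, with hypothetical argument names, zeroes out Qur'anic attendance for household members who also report formal schooling; the age cutoff of 7 is an assumption carried over from the primary-school outcome.

```python
from typing import Optional


def quranic_only(age: int, any_formal_schooling: bool, attended_quranic: bool,
                 age_cutoff: int = 7) -> Optional[int]:
    """1 if the child attended a Qur'anic school but never attended formal school.

    Household members answer a separate Qur'anic-school question, so they
    may report both; such children are coded 0 here. This mirrors the
    questionnaire for children living elsewhere, where Qur'anic school only
    appears when no formal schooling is reported.
    """
    if age < age_cutoff:
        return None
    return int(attended_quranic and not any_formal_schooling)
```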

19For children who were not living in the surveyed household, the list of possible answers to the highest level of education question pooled together the last two years of primary school (“5 or 6 years of primary school”); thus, we cannot distinguish fifth grade completion from sixth grade completion.

20Enumerators were trained to report any formal schooling as the highest level of education, even if the child had also attended a Qur’anic school.

21A sibling fixed-effects model for secondary school attendance would be estimated on only 23 identifying observations.


Figure 2.3: Formal education by age and birth year

(a) Share of children who have any formal schooling. [Y-axis: any formal schooling (0 to 0.8); x-axis: age on the survey date (0 to 25); girls and boys, with 95% confidence intervals.]

(b) Share of individuals who have any formal schooling (corresponding ages: 10 to 70 y.o.). [Y-axis: any formal schooling (0 to 0.8); x-axis: birth year (1940 to 2000); women and men, with 95% confidence intervals.]

2.4 Methodology

How are children affected by their parents’ divorce? Many confounding factors could explain

the differences found when comparing children whose parents have divorced and children

whose parents have not divorced. The method we use relies on sibling fixed effects, thus controlling for any (potentially unobserved) factors that are common to siblings. These factors include, for instance, the education level of the parents and of other family members, parental preferences for education, the socioeconomic background of the family and its status within the

community, and family norms and rules, including the language spoken at home before the

divorce.
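To illustrate why comparing siblings removes family-level confounders, the following simulation (all parameters are illustrative assumptions, not PSF estimates) builds in positive selection: an unobserved family factor raises both schooling and the probability of divorce, and only the younger siblings in divorced families are exposed. A naive comparison of exposed and unexposed children picks up the family factor, whereas subtracting family means from the outcome and the exposure (the within, or sibling fixed-effects, transformation) recovers the negative effect built into the data.

```python
import numpy as np


def family_means(x, family):
    """Mean of x within each family, returned at the child level."""
    _, idx = np.unique(family, return_inverse=True)
    sums = np.bincount(idx, weights=x)
    counts = np.bincount(idx)
    return (sums / counts)[idx]


def naive_slope(y, d):
    """OLS slope of y on a binary d with an intercept (a difference in means)."""
    return float(y[d == 1].mean() - y[d == 0].mean())


def sibling_fe_slope(y, d, family):
    """OLS slope of y on d after removing family means (sibling fixed effects)."""
    y_w = y - family_means(y, family)
    d_w = d - family_means(d, family)
    return float(d_w @ y_w / (d_w @ d_w))


# Simulated data (illustrative assumptions only)
rng = np.random.default_rng(0)
n_families, n_siblings, true_effect = 2000, 4, -0.10
family = np.repeat(np.arange(n_families), n_siblings)
birth_rank = np.tile(np.arange(n_siblings), n_families)
# Unobserved family factor (e.g. mother's education) raising both schooling
# and the probability of divorce: positive selection into divorce
mu = rng.normal(size=n_families)
divorced = rng.random(n_families) < 1 / (1 + np.exp(-(mu - 1)))
# In divorced families, only the two youngest siblings were exposed young
d = (divorced[family] & (birth_rank >= 2)).astype(float)
y = 0.5 * mu[family] + true_effect * d + rng.normal(scale=0.3, size=family.size)

print("naive:", round(naive_slope(y, d), 3))
print("sibling FE:", round(sibling_fe_slope(y, d, family), 3))
```

Only families whose siblings differ in exposure contribute to the within estimate, which is why the number of identifying observations can become small for some outcomes.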

2.4.1 Empirical strategy

Primary school enrollment

Framework We consider the impact of parental divorce on children’s enrollment in primary

school. As children are supposed to start attending primary school the year they turn 6, we

consider children who were 5 or younger when their parents divorced to be those affected by

the divorce and children who were 6 or older when their parents divorced to be not affected

by the divorce in terms of this specific schooling outcome: whether a child has ever attended

primary school.22
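The classification into affected and unaffected children can be written as a small function. It approximates age at divorce by the difference between calendar years, which is an assumption, and raises an error for a child born after the divorce, on the assumption that such a child cannot belong to the dissolved union. The returned tuple mirrors the two indicators AgeAtDivorce0/5 and AgeAtDivorce6/25 used in Model 2.1.

```python
from typing import Optional, Tuple


def divorce_exposure(birth_year: int,
                     divorce_year: Optional[int]) -> Tuple[int, int]:
    """Return (AgeAtDivorce0/5, AgeAtDivorce6/25) for one child.

    Age at divorce is approximated by the difference in calendar years.
    Children whose parents never divorced get (0, 0).
    """
    if divorce_year is None:
        return (0, 0)
    age_at_divorce = divorce_year - birth_year
    if age_at_divorce < 0:
        # Assumed impossible: a child born after the divorce is not from the dissolved union
        raise ValueError("divorce predates the child's birth")
    if age_at_divorce <= 5:
        return (1, 0)  # divorce before the official school-entry age of 6
    return (0, 1)      # school entry could be decided before the divorce
```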

22Reverse causality is unlikely to be at work here, given the chronology of the events considered. Children can be enrolled in primary school beginning at the age of 6, and children can be affected by divorce only when they are 5 or younger by the divorce date. For affected children, the education decision could not have been implemented before the divorce decision was made. Conflict over education decisions (for instance, if parents have differing preferences regarding their children’s education) may predate the divorce, but its effects would be mediated by the decision to divorce (for instance, if a divorce results in a change regarding who makes decisions about the children’s education).


In our analysis, we consider primary school and Qur’anic school to be substitutes, and we define the variables accordingly: the Qur’anic school variable captures attendance

at a Qur’anic school for children who do not attend primary school. To compare both types of

educational choices, we study the likelihood that a child has (exclusively) attended Qur’anic

school using the same framework and specifications used when studying primary school en-

rollment.

Model 2.1: Basic LPM The first model is a linear probability model23 without fixed effects,

specified as follows:

$$\text{AnyPrimarySchool}_{is} = \alpha_0 + \alpha_1\,\text{AgeAtDivorce0/5}_i + \alpha_2\,\text{AgeAtDivorce6/25}_i + \beta\,\text{Controls}_i + \varepsilon_{is} \tag{2.1}$$

In this equation, i denotes children and s denotes a family (defined as a group of full siblings).

Standard errors are clustered at the family (sibling-group) level.

Variables The outcome variable, AnyPrimarySchool, is an indicator variable that takes on the

value 1 when a child has attended or is attending primary school and 0 otherwise. The main

variable of interest, AgeAtDivorce0/5, is an indicator variable that takes on the value 1 if the

child was 5 or younger when her parents divorced and 0 if her parents either divorced when she

was 6 or older or did not divorce. AgeAtDivorce6/25 is a binary variable that takes on the value

1 if the child was 6 or older when her parents divorced and 0 otherwise. Controls is a vector of

individual characteristics that includes the following variables: a binary variable that takes on

the value 1 if the child is a girl, quadratic controls for year of birth, and four binary variables

that account for birth order (birth orders higher than 4 are grouped together).24 We discuss the

correlations between each of these variables and the outcome variables in the Appendix of the paper (Table A-2.1 and Section A-2.1). In an alternative specification, we add binary variables

for the highest education level of the mother as well as the interactions of those variables with

the variables included in Controls.
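The variable definitions above can be sketched as a small coding function. This is a minimal illustration, not the authors' code: the `Child` record, its field names, and the approximation of age at divorce as divorce year minus birth year are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Child:
    # Hypothetical record layout; the field names are illustrative only.
    birth_year: int
    divorce_year: Optional[int]  # None if the parents did not divorce
    birth_rank: int              # 1 = firstborn
    is_girl: bool
    attended_primary: bool
    attended_quranic: bool


def code_variables(c: Child) -> dict:
    """Build the outcome, treatment and control variables used in model (2.1)."""
    # Assumption: age at divorce is approximated as divorce year minus birth year.
    age = None if c.divorce_year is None else c.divorce_year - c.birth_year
    row = {
        "AnyPrimarySchool": int(c.attended_primary),
        # Qur'anic school is treated as a substitute for primary school:
        # it is counted only for children who never attended primary school.
        "QuranicOnly": int(c.attended_quranic and not c.attended_primary),
        "AgeAtDivorce0/5": int(age is not None and 0 <= age <= 5),
        "AgeAtDivorce6/25": int(age is not None and 6 <= age <= 25),
        "Girl": int(c.is_girl),
        "BirthYear": c.birth_year,
        "BirthYearSq": c.birth_year ** 2,  # quadratic control for year of birth
    }
    # Birth-order categories 1, 2, 3 and 4+ (higher ranks grouped); one of
    # them must be omitted as the reference category when a constant is used.
    for k in (1, 2, 3):
        row[f"BirthOrder{k}"] = int(c.birth_rank == k)
    row["BirthOrder4plus"] = int(c.birth_rank >= 4)
    return row
```

For example, a second-born girl aged 4 at the divorce gets AgeAtDivorce0/5 = 1 and AgeAtDivorce6/25 = 0, while a child whose parents did not divorce gets 0 on both indicators.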

23 We do not estimate logit models, as logit models with fixed effects produce estimates only from groups in which there is variation in the outcome variable: many observations are lost, and the control variables are consequently poorly estimated. Therefore, to compare the results from the model without fixed effects with the results from the sibling fixed-effects model, we specify the first model as a linear probability model. The results using a logit are consistent with those of the LPM without sibling fixed effects.
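To make the loss of observations concrete, the sketch below keeps only the families in which the outcome varies across siblings, which are the only families that contribute to a fixed-effects (conditional) logit likelihood. The row layout and the key names `family` and `AnyPrimarySchool` are hypothetical.

```python
from collections import defaultdict


def families_with_outcome_variation(rows, family_key="family", outcome="AnyPrimarySchool"):
    """Return the families whose siblings do not all share the same outcome.

    Only these families contribute to a fixed-effects (conditional) logit
    likelihood; families in which every sibling has the same outcome drop out.
    """
    values = defaultdict(set)
    for row in rows:
        values[row[family_key]].add(row[outcome])
    return {fam for fam, seen in values.items() if len(seen) > 1}
```

A linear probability model with sibling fixed effects keeps every family in the sample, even though families without within-family variation in either the outcome or the regressors do not help identify the coefficients.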

24 The model includes only individual controls. Family size is not included in the controls, so higher birth orders also capture larger family sizes in the basic linear probability model. In the model with sibling fixed effects, family size is absorbed by the fixed effects, as it is the same for all full siblings. The coefficient on age at divorce, 0-5, remains positive and significant (0.137) when we include family size in the basic linear probability model, and the coefficients on the birth order variables themselves change very little.


Model 2.2: LPM with sibling fixed effects The second model is a linear probability model with sibling fixed effects.25 Including sibling fixed effects is equivalent to controlling for (potentially unobserved) factors that are common to all siblings. Estimates from this model should hence be less biased than estimates from the basic LPM.

$$\text{AnyPrimarySchool}_{is} = \alpha_0 + \alpha_1\,\text{AgeAtDivorce0/5}_i + \beta\,\text{Controls}_i + \gamma_s + \varepsilon_{is} \tag{2.2}$$

In this equation, i denotes children and s denotes a family (defined as a group of full siblings).

Standard errors are clustered at the family (sibling-group) level.

Variables The outcome variables, the variable AgeAtDivorce0/5, and the vector of variables

Controls are defined as in model 2.1. γs represents the sibling fixed effects. As sibling fixed

effects are included, the right-hand-side variables can only be estimated if they vary within

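families. Thus, AgeAtDivorce6/25 cannot be estimated: together with AgeAtDivorce0/5, it identifies children whose parents divorced, a characteristic shared by all full siblings, so it is collinear with AgeAtDivorce0/5 once the fixed effects are included.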
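The collinearity argument can be checked numerically. In a linear model, sibling fixed effects amount to demeaning every variable within its family (the within transformation). The sketch below uses made-up toy data and hypothetical helper names: in a divorced family where every child falls into one of the two age-at-divorce groups, the demeaned AgeAtDivorce6/25 is exactly minus the demeaned AgeAtDivorce0/5. It also computes the fixed-effects slope for a single regressor without controls; the estimates in the paper include controls and clustered standard errors, which this sketch omits.

```python
from collections import defaultdict
from statistics import fmean


def demean_by_family(values, families):
    """Subtract each family's mean: the within transformation behind sibling FE."""
    groups = defaultdict(list)
    for v, f in zip(values, families):
        groups[f].append(v)
    means = {f: fmean(vs) for f, vs in groups.items()}
    return [v - means[f] for v, f in zip(values, families)]


def within_slope(y, x, families):
    """Fixed-effects estimate of one coefficient (no controls) from demeaned data."""
    y_w = demean_by_family(y, families)
    x_w = demean_by_family(x, families)
    sxx = sum(xi * xi for xi in x_w)
    if sxx == 0:
        raise ValueError("regressor has no within-family variation")
    return sum(xi * yi for xi, yi in zip(x_w, y_w)) / sxx


# Toy data: family A divorced (one child aged 5 or younger, two aged 6 or older),
# family B did not divorce.
fam = ["A", "A", "A", "B", "B"]
x05 = [1, 0, 0, 0, 0]
x625 = [0, 1, 1, 0, 0]
y = [1, 0, 1, 1, 0]
```

In this toy example, the affected child attended school while half of the older siblings did, so the within slope on AgeAtDivorce0/5 is 0.5; family B carries no within-family variation in the regressor and does not contribute to it.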

Sample The estimation sample is made up of children older than 7 who have at least one

full sibling who is also older than 7. These conditions on age at survey date and on family

composition are needed given that the outcome variable is defined only for children who were

older than 7 by the survey date and that the sibling fixed-effects model can only be estimated

for families with at least two children. We consider only full siblings in the analysis.26 To

compare the results from the two models, we estimate the basic linear probability model on the

same sample as the (more data-intensive) sibling fixed-effects model.
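The sample restriction can be sketched as a two-step filter: keep children above the age threshold, then keep only families with at least two such children. The dictionary keys `family` (a full-sibling group) and `age_at_survey` are hypothetical. The sketch follows the main text's strict "older than 7"; the table notes phrase the family condition as "7 or older", so the exact inequality is an assumption.

```python
from collections import Counter


def sibling_fe_sample(children, min_age=7):
    """Keep children older than `min_age` who have at least one full sibling
    also older than `min_age`: a family with a single eligible child cannot
    contribute to a sibling fixed-effects model.
    """
    eligible = [c for c in children if c["age_at_survey"] > min_age]
    n_eligible = Counter(c["family"] for c in eligible)
    return [c for c in eligible if n_eligible[c["family"]] >= 2]
```

Setting `min_age=10` gives the corresponding restriction used for primary school completion.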

Primary school completion

Studying primary school completion allows us to test whether divorce has longer-run consequences for children's educational outcomes. We modify the analysis framework described above to take into account the higher age threshold (10 years old) associated with this variable. The higher age threshold reduces the sample size, thus limiting the type of analyses that can be conducted. The results for primary school completion should hence be interpreted with caution.

25 Such models have been widely used in the literature on the impact of divorce (Bjorklund and Sundstrom, 2006; Bratberg et al., 2014; Ermisch and Francesconi, 2001; Francesconi et al., 2010; Le Forner, 2020).

26 The results do not change when we add half-siblings from the mother's side to the sample (using mother fixed effects instead of full-sibling fixed effects).


Framework and sample We consider the impact of parental divorce on whether children

reached 5th grade in primary school (by the expected age). Since children who start primary

school at age 6 should be attending 5th grade by age 10, we consider children who were 9 or

younger when their parents divorced to be affected by the divorce and children who were 10

or older when their parents divorced to be not affected by the divorce when we study primary

school completion. Hence, the estimation sample is made up of children older than 10 who

have at least one full sibling who is also older than 10.
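The completion design splits children into three age-at-divorce groups, with children aged 10 or older at the divorce as the omitted reference. A minimal sketch of that coding, assuming age at divorce is already available as an integer (None when the parents did not divorce); the function name is hypothetical.

```python
def completion_groups(age_at_divorce):
    """Return (AgeAtDivorce0/5, AgeAtDivorce6/9) for model (2.3).

    Children aged 10 or older at the divorce, and children whose parents did
    not divorce (passed as None), fall in the omitted reference group (0, 0).
    """
    if age_at_divorce is None or age_at_divorce >= 10:
        return (0, 0)
    if age_at_divorce < 0:
        raise ValueError("child born after the divorce")
    if age_at_divorce <= 5:
        return (1, 0)
    return (0, 1)
```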

Model 2.3: LPM with sibling fixed effects

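$$\text{CompletedPrimary}_{is} = \alpha_0 + \alpha_1\,\text{AgeAtDivorce0/5}_i + \alpha_2\,\text{AgeAtDivorce6/9}_i + \beta\,\text{Controls}_i + \gamma_s + \varepsilon_{is} \tag{2.3}$$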

In this equation, i denotes children and s denotes a family (defined as a group of full siblings). Standard errors are clustered at the family (sibling-group) level. The outcome variable CompletedPrimary is an indicator variable that takes on the value 1 when a child has completed primary school (reached the fifth of the six years of primary school) and 0 otherwise. The main variables of interest are AgeAtDivorce0/5 and AgeAtDivorce6/9. Both are indicator variables that take on the value 1 if the child's age at the time of the divorce falls within the variable's age range and 0 otherwise. The vector of variables Controls is defined as in model 2.1. γs represents the sibling fixed effects.

2.4.2 Identification and interpretation issues

Identifying assumption The sibling fixed-effects (SFE) model identifies the causal effect of divorce under the assumption that, in the absence of divorce, children who were 5 or younger at the time of the divorce would have had the same educational outcomes as their older siblings. For this assumption to be credible, two conditions must be fulfilled. First, the timing of the divorce must be as good as random. Second, the siblings must have, on average, the same potential educational outcome: there must not be systematic differences in ability or endowments between siblings that depend on birth order. We discuss these two points below.

Conditional on divorce occurring, is the date of the divorce a random event?

The marriage market literature uses the idea of “sympathy shocks” (Dupuy and Galichon, 2014) that occur randomly and increase the quality of a match. Similarly, the quality of a match could be decreased by a random “reverse sympathy shock”, leading to divorce. During interviews conducted in Senegal, divorced parents' narratives supported this idea.27 This assumption is backed by the fact that variables that capture family composition are not correlated with the divorce date. First, the (wide) distribution of the age-at-divorce variable (Figure 2.1) suggests that parents do not strategically time their divorce with respect to their children's ages.28 Second, families for which the coefficient of interest is estimated (those with at least one child aged 5 or younger at the time of the divorce and one aged 6 or older) are not different from other families that also experienced divorce. The mothers of children for whom the coefficient of interest is estimated do not appear to be different from other divorced mothers, apart from their structural demographic characteristics (because the age of the women, the number of their children, their children's ages, and the length of marriage are all correlated). Detailed results can be found in the Appendix (Table A-2.2 and Section A-2.2).
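Which divorced families actually identify the SFE coefficient can be sketched directly: a family contributes only if it has at least one child aged 5 or younger and one aged 6 or older at the divorce. The keys `family` and `age_at_divorce` are hypothetical; None marks children whose parents did not divorce.

```python
from collections import defaultdict


def identifying_families(children):
    """Families in which the SFE coefficient on AgeAtDivorce0/5 is identified:
    at least one child aged 5 or younger and one aged 6 or older at the divorce.
    """
    ages = defaultdict(list)
    for c in children:
        if c["age_at_divorce"] is not None:  # None: parents did not divorce
            ages[c["family"]].append(c["age_at_divorce"])
    return {
        fam
        for fam, a in ages.items()
        if any(x <= 5 for x in a) and any(x >= 6 for x in a)
    }
```

Families in which all children were 5 or younger (or all 6 or older) at the divorce do not enter this set, which is why the identifying sample of the SFE model differs from that of the basic LPM.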

Balance test on children’s characteristics

To support the assumption that the divorce date is not correlated with the characteristics of the children that could also determine their educational outcomes, we conduct a balance test of children's characteristics across age-at-divorce groups. This test helps us check that older siblings' educational outcomes can be a credible counterfactual for their younger siblings' outcomes. Table 2.2 reports the results of this balance test on the estimation sample (siblings older than 7 whose parents divorced) for the available individual characteristics determined before the divorce. These variables are also the ones used as controls in the regression models.

As expected from a sample made up of siblings, there are systematic differences among age-at-divorce groups in birth order and year of birth. As all siblings experience their parents' divorce on the same date, children who were 6 or older on the divorce date were also older than their younger siblings on the survey date; hence, there was a 4.8-year average age difference between these two groups. Relatedly, children who were 6 or older at the time of the divorce were more likely to be firstborn children and less likely to be thirdborn.29 We might worry that older siblings receive educational endowments that are on average different from those of their younger siblings and that older siblings' educational outcomes would therefore

27 From qualitative interviews conducted in Senegal, it seems that some people do indeed consider divorces to result from a reverse sympathy shock that older generations used to endure due to mougne (a Wolof term that describes an attitude of resignation that allows one to endure difficulties).

28 Qualitative interviews conducted in Senegal indicate that individuals' priors regarding the impact of divorce differ greatly (and often seem correlated with their own experiences), so the patterns we observe might also result from timing according to different priors (e.g., “it is better to divorce while the children are young, so they will not be affected by the conflict” or “it is better to stay together till the children are teenagers or young adults”).

29 Birth order is computed using all children born to the same mother and thus includes older siblings who were older than 25 as well as half-siblings. Few children whose parents divorced have older half-siblings (from the mother's side), and their birth rank is not affected by whether they have younger half-siblings. As a robustness check, we estimate the model on a sample that includes half-siblings. The coefficient on divorce remains large and positive (results available in the Online Appendix).


not be a good counterfactual for their younger siblings' outcomes. To capture these potential systematic differences in education endowments, we add birth order fixed effects to the model, thus ensuring that the age-at-divorce variable does not capture birth order effects.30 Parents may also make different educational decisions for their children depending on time-varying factors: social norms, expectations, and school-related costs at the time a child should enter school. To capture these time-varying factors, the model includes time trends (quadratic controls for year of birth).

Age and birth order are mechanically correlated across siblings, but this is not the case for

gender. Hence, if the date of divorce is random, then the share of girls should be the same when

considering the different age-at-divorce categories. We confirm that this is the case (Table 2.2).
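Each row of Table 2.2 can be sketched as a difference in means with a t-statistic. The version below uses a Welch (unequal-variance) t-statistic on plain lists of values; the text does not say whether the reported test uses this variant, a pooled-variance test, or family-clustered standard errors, so that choice is an assumption, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def balance_row(affected, not_affected):
    """Difference in means (not affected minus affected, as in column (3) of
    Table 2.2) and a Welch t-statistic for that difference.
    """
    m1, m2 = mean(affected), mean(not_affected)
    # Welch standard error: each group's sample variance over its size.
    se = sqrt(variance(affected) / len(affected) + variance(not_affected) / len(not_affected))
    diff = m2 - m1
    return diff, diff / se
```

Applied to the girl indicator, the two lists would hold 0/1 values for children aged 5 or younger and 6 or older at the divorce; because gender is not mechanically linked to birth order, a small and insignificant difference is what random divorce timing predicts.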

The identification assumption also requires that children who were 5 or younger at the time of the divorce have the same health outcomes and the same educational abilities as children who were 6 or older at the time of the divorce, as health and ability could affect their educational outcomes. However, there is no reason to expect these differences to be correlated with the age-at-divorce category in the Senegalese context.31

Threats to identification The SFE strategy does not distinguish between the impact of a divorce and the impact of any other time-varying factors that would cause younger children not to have the same educational outcomes as their older siblings. Given the Senegalese context, the main threat to identification is a time-varying factor that both triggers a divorce and causes younger children to have better educational outcomes than their older siblings.

Positive shock triggering a divorce A positive income shock that allows a woman to divorce would

be a confounding factor. The income shock could allow the woman to divorce and thereby

increase the likelihood that the younger children are sent to school. However, potential positive

income shocks are either extremely rare (because few women inherit from their parents, this

type of windfall income is unlikely to occur often enough to explain a large share of divorces)

or are likely to affect women who would already have the means to divorce (women with a

formal job are likely to already be well enough off to divorce if they wish to do so).

30 In the results presented in the paper, birth order fixed effects are estimated on the sample made up of both siblings whose parents divorced and siblings whose parents did not divorce. The model can also be estimated on a sample made up only of siblings whose parents divorced, as there are children of each birth rank in both age-at-divorce categories. In this subsample, the coefficient on divorce remains large and positive (results available in the Online Appendix).

31 If parents divorce because their youngest child has a disability and the stress on the family is too high, then the age-at-divorce variable would capture differences in ability across siblings. However, this scenario is unlikely in the Senegalese context: to the best of our knowledge, having a disabled child has never been mentioned as a cause of divorce, either in our own qualitative fieldwork or in research done by others (Dial, 2008).


Table 2.2: Balance test of characteristics according to children’s age at the time of divorce

                     (1)              (2)              (3)
                     Affected         Not affected
                     ≤ 5 at divorce   ≥ 6 at divorce
                     Mean             Mean             Difference
Child is a girl      0.42             0.47             0.05
Birth year           1999.20          1994.40          -4.81***
First child          0.15             0.34             0.18***
Second child         0.36             0.26             -0.09
Third child          0.23             0.15             -0.08*
Fourth and more      0.26             0.25             -0.00
N                    98               167              265

Note: The table presents characteristics of children according to their age at the time of divorce.
Sample: Children of divorced parents who are older than 7 on the survey date, only when they belong to a family in which two children are 7 or older on the survey date.
Specification: Column (1) reports the mean of each variable for children who were 5 or younger when their parents divorced. Column (2) reports the mean of each variable for children who were 6 or older when their parents divorced. Column (3) reports the difference in means (column (2) minus column (1)) and the significance of the t-test of the difference. P-values are denoted as follows: + p<0.15, * p<0.10, ** p<0.05, *** p<0.01.

Negative shock triggering a divorce

Short-term adverse circumstances could trigger divorce and decrease the likelihood that children who should attend school do so. When the situation improves, the younger siblings are then more likely to attend school than the older children are. We cannot directly test this hypothesis, as the data include information on only the latest shock experienced by households. Instead, we compare children who could have been affected by such a negative shock (children aged 6 to 9 at the time of the divorce) to their older siblings (children aged 10 or older at the time of the divorce), who are arguably less likely to have been affected by it (the shock would need to have taken place at least 5 years before the divorce). The results show that children aged 6 to 9 when their parents divorced are as likely to have attended primary school as their older siblings, which indicates that the negative shock story is unlikely to explain our results.

Exposure to conflict The occurrence of conflict is often correlated with divorce, but conflict might

take place both before and after the divorce.

The impact of conflict is likely to vary depending on children's age and on the length of the conflict, making it difficult to predict clearly what the impact of conflict could be: older children might be more affected, children may get used to conflict, conflict might worsen or recede after the divorce, etc. Moreover, we expect that high conflict levels could affect children's well-being and some educational outcomes, such as test scores, but not necessarily the decision to send a child to school, especially as children can be enrolled in primary school even if they are older than 7. We hence do not consider conflict to be a threat to the sibling fixed-effects identification strategy.

2.5 Results

2.5.1 Results: Ever attended primary school

Main results Columns (1) and (2) of Table 2.3 report the results of two basic linear probability models. Column (1) reports the results of a regression of whether the child has ever attended primary school on indicator variables for the age-at-divorce groups. Being 5 or younger at the time of the divorce is associated with a higher likelihood of having attended primary school. When controlling for birth year, birth order, gender, and their interactions with the level of education of the mother (column 2), the magnitude of the coefficient decreases, but it remains positive and significant. On average, children who were 5 or younger when their parents divorced were 11 percentage points more likely to have attended primary school than their counterparts whose parents did not divorce: this difference represents a 16% increase in the share of children who have ever attended school.

Columns (3) to (5) report the results of three sibling fixed-effects models. Being 5 or younger

at the time of the divorce is still associated with a higher likelihood of having attended school

(column 3). The inclusion of controls does not change the results much (column 4); if anything,

the coefficient increases when the controls are added. The addition of the sibling fixed effects

slightly increases the magnitude of the coefficient of interest (column 2 vs. column 4). The

sibling fixed effects capture some unobserved characteristics that are common to siblings: the

basic LPM estimates seem to be slightly downward biased compared to the results estimated

with the SFE model.32

Column (6) reports the results of a regression of whether the child has (exclusively) attended a Qur'anic school on indicators for the age-at-divorce groups and sibling fixed effects (the same specification as in column 4). The estimated coefficient is -0.145: children who were 5 or younger at the time of the divorce were less likely to have exclusively attended

32 Introducing sibling fixed effects changes the observations on which the coefficient of interest is identified: the identifying families in the LPMs include families in which all the children were 5 or younger at the time of the divorce, while the identifying families in the SFE model include only families in which there is at least one child aged 5 or younger at the time of the divorce and one aged 6 or older. We reestimate the LPM on a restricted sample that excludes families in which all children were 5 or younger at the time of the divorce, so that the effects of the age-at-divorce groups are estimated on the same identifying families as in the SFE model. The coefficient of interest in the LPM is then 0.131, which is still markedly different from the SFE coefficient.


a Qur’anic school than their older siblings.33 This finding indicates that the older siblings of

affected children are more likely, on average, to have attended Qur’anic school than no school

at all.

Our findings indicate that parental divorce does not necessarily lead to worse schooling outcomes for children who were young at the time of divorce. It seems that parents may even overcompensate their younger children after a divorce. In the remainder of this paper, we test the robustness of these results and investigate which channels could mediate them.

Table 2.3: Effect of parental divorce on primary school attendance and completion

Sample: columns (1) to (6), at least 2 children, older than 7 years old; columns (7) and (8), ≥ 2 children, ≥ 10 years old.
Dependent variable: columns (1) to (5), ever attended primary school; column (6), Qur'anic only; column (7), ever attended; column (8), completed.

                          (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)
Specification             LPM       LPM       SFE       SFE       SFE       SFE       SFE       SFE
Age at divorce
  0-5 y.o.                0.147***  0.110**   0.139**   0.164***  0.143**   -0.145*** 0.164     0.0297
                          (0.0466)  (0.0453)  (0.0616)  (0.0622)  (0.0623)  (0.0507)  (0.115)   (0.137)
  0-5 × girl                                                      0.0517
                                                                  (0.0912)
  6-9 y.o.                                                                            0.0287    0.0183
                                                                                      (0.0920)  (0.103)
  6-25 y.o.               -0.0200   -0.0734
                          (0.0510)  (0.0494)
Girl                                                              0.0102
                                                                  (0.00989)
Controls                  No        Yes       No        Yes       Yes       Yes       Yes       Yes
Controls × educ           No        Yes       No        Yes       No        Yes       No        No
Share primary             0.65      0.65      0.65      0.65      0.65      0.19      0.66      0.48
Joint significance p (a)                                          0.03
R2                        0.0013    0.1200    0.0010    0.0558    0.0316    0.0388    0.0267    0.0971
N observations            7,314     7,314     7,314     7,314     7,314     7,309     5,612     5,612
N families                                    2,143     2,143     2,143     2,143     1,758     1,758
N identifying obs. (0-5)  171       171       134       134       102       134       121       121
N identifying fam. (0-5)                      47        47        36        47        44        44

Note: Models (1) and (2): linear probability models. Models (3) to (8): linear probability models with sibling fixed effects. Columns (1) to (5) and column (7): the outcome variable is an indicator variable that takes on the value 1 if the child has attended primary school and 0 otherwise. Column (6): the outcome variable is an indicator variable that takes on the value 1 if the child has attended (exclusively) Qur'anic school and 0 otherwise. Column (8): the outcome variable is an indicator variable that takes on the value 1 if the child has completed primary school and 0 otherwise.
AgeAtDivorce0/5 is an indicator variable that takes on the value 1 if the child was 5 or younger when her parents divorced and 0 if her parents either divorced when she was 6 or older or did not divorce. AgeAtDivorce6/25 is an indicator variable that takes on the value 1 if the child was 6 or older at the time of divorce and 0 if either the child was 5 or younger or her parents are not divorced.
Control variables include: quadratic control for year of birth, birth order indicators (4 categories), and an indicator variable that takes on the value 1 if the child is a girl and 0 if the child is a boy. Controls × educ variables include: the interaction of all the control variables with the highest education level of the mother (coded into 4 categories: no education, primary, secondary and higher, and unknown).
At least 2 children, older than 7 years old: children who are older than 7 on the survey date, only when they belong to a family in which two children are 7 or older on the survey date. At least 2 children, older than 10 years old: children who are older than 10 on the survey date, only when they belong to a family in which two children are 10 or older on the survey date.

a: p-value of the joint significance of the coefficient on AgeAtDivorce0/5 and the coefficient on its interaction with the variable girl.
Robust standard errors in parentheses (clustered at the family (sibling-group) level). Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.

Source: PSF2.

Heterogeneity: Gender and age at divorce As girls are on average more likely to attend

primary school than boys, we test whether girls’ educational outcomes are more affected by

a divorce before the age of 5 than boys’ educational outcomes are. The coefficient associated

33 We might worry that older children were less likely than their younger siblings to live with their mother and that, as a result, their schooling outcomes are not accurately reported; more specifically, that the outcome is reported as Qur'anic schooling or no education even though they did attend primary school. When we estimate the sibling fixed-effects specification with “living in the surveyed household” as the dependent variable, the coefficient associated with age at divorce, 0-5, is -0.0365 (s.e. 0.0758), indicating that children who were younger at the time of divorce were not more likely to live in the surveyed households (with their mother). As such, the effect is not driven by reporting bias correlated with where the child lived on the survey date.


with the interaction term between age at divorce and gender is close to 0 and not significant in

either the basic LPM (not reported) or the SFE models (column (5)): girls and boys who were

younger than 5 on the divorce date do not seem to be differentially affected by their parents’

divorce when considering primary schooling. Given that the number of identifying children

and families in this specification is only slightly lower than in the main specification, it seems

that this insignificant result is not due to low power.

To understand whether the results are driven by specific ages at divorce, we estimate models

that include binary variables for each age at divorce (from 0 to 6 years old at the time of the

divorce) instead of two age-at-divorce categories (5 or younger and 6 or older). 34 The results

do not seem to be driven by a specific age group for children aged between 1 and 5 at the time

of the divorce. The coefficients associated with these age-at-divorce variables are displayed in

Figure A-2.2 in the Appendix.

Could the results be driven by a negative shock that lowered older children’s likelihood of

attending school? If a (negative) economic shock triggers (most) divorces, then we would

expect that divorce occurs rather shortly after the shock. For children who are older than 10

at the time of the divorce, the shock would need to have happened at least 5 years before the

divorce to affect their enrollment in primary school. As such, children older than 10 at the

time of the divorce are less likely to have been affected by a potential negative shock that also

triggers a divorce than children younger than 10 at the time of the divorce. We hence use

children older than 10 at the time of the divorce as the control group (not affected children) in

the main specification (column 7). The results show that children who were between 6 and

9 years old at the time of the divorce were as likely to have attended primary school as their

older siblings. We consider only children older than 10 on the survey date, so the sample size

is reduced, and the power is likely to be too low for us to detect significant effects. However,

the coefficient on age at divorce, 6-9 (0.02), is much smaller than the coefficient on age at divorce, 0-5 (0.164, also insignificant), so the result for ages 6-9 at the time of the divorce can credibly be interpreted as an indication that children aged 6-9 at the time of the divorce have the same schooling outcomes as their older siblings.

34 The effects are hence identified using a larger set of observations and families, as we can leverage variation within the 0-5 age group.

We then estimate models that include binary variables for each age at divorce (from 0 to 9 years old at the time of the divorce). Children aged 10 or older at the time of the divorce are hence the control group, as discussed above. Figure 2.4 displays the coefficients associated with each age-at-divorce variable. The coefficients estimated on all the age-at-divorce variables between 6 and 9 are close to 0: children whose parents divorced when they were between 6 and 9 have the same likelihood of attending primary school as children who were 10 or older when their parents divorced. Similarly, the coefficient associated with being between 6 and 25 at the time of the divorce is not significant in columns (1) and (2) of Table 2.3: the probability of attending primary school for children who were 6 or older at the time of the divorce does not differ from that of children whose parents did not divorce. Thus, the positive coefficient on being 5 or younger at the time of divorce reflects children who were 5 or younger on the divorce date being more educated than would be expected, rather than children who were 6 or older at the time of the divorce being less educated than would be expected.
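The age-specific specification replaces the two age-at-divorce categories with one indicator per age. A minimal sketch, with hypothetical variable names: children older than `max_age` at the divorce get 0 on every indicator and form the omitted category, as the 10-or-older group does when `max_age=9`.

```python
def age_dummies(age_at_divorce, max_age=9):
    """One indicator per age at divorce from 0 to `max_age`.

    Children older than `max_age` at the divorce, and children whose parents
    did not divorce (passed as None), get 0 everywhere: together they form the
    omitted reference category.
    """
    return {f"AgeAtDivorce_{a}": int(age_at_divorce == a) for a in range(max_age + 1)}
```

Because each affected child now has its own age indicator, the comparison with the reference group draws on variation within the 0-5 range as well.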

Figure 2.4: Coefficients associated with age-at-divorce variables

[Figure 2.4 plot: estimated coefficients (vertical axis) by age at divorce, 1 to 9 (horizontal axis); series 1: LPM without controls, 2: SFE without controls, 3: SFE with controls]

Note: Coefficients associated with binary variables for age at the time of the divorce. The omitted category corresponds to children aged 10 or older at the time of the divorce.
Specification: The dependent variable is an indicator variable that takes on the value 1 if the child has attended primary school and 0 otherwise. The model is a linear probability model with controls (similar to column (7), Table 2.3).
Sample: Children older than 10 on the survey date who belong to families with at least 2 children older than 10 on the survey date.

2.5.2 Results: Completed primary

Column (8) in Table 2.3 reports the results for whether a child has completed primary school (5th or 6th grade). Being 5 or younger at the time of the divorce is not associated with a higher likelihood of completing primary school: the coefficient is small (0.0297) and not significant. This finding indicates that gains in primary school enrollment do not translate into a higher likelihood of having completed primary school, possibly because the investment in education required to complete primary school is much higher than that required to have attended it. Children who were between 6 and 9 when their parents divorced were as likely as their older siblings to have completed primary school (on time). A divorce during primary school therefore does not seem to affect children's likelihood of completing 5th grade on time: children are not more likely than their older siblings to drop out of primary school after their parents divorce.35

2.5.3 Sensitivity checks

Table 2.4 reports the results of the main specification when both the age threshold for inclusion

in the sample (columns) and the age threshold to be considered affected by the divorce (rows)

vary.
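The two-way grid of Table 2.4 can be sketched as a loop over pairs of thresholds. As an illustration of why significance fades as the sample age threshold rises, the sketch counts, for each cell, the divorced families that still identify the coefficient. The key names are hypothetical, and the counting rule (at least one child at or below the affected cutoff and one above it, among children older than the sample threshold) is a simplification of the full estimation.

```python
from collections import defaultdict
from itertools import product


def n_identifying_families(children, min_survey_age, affected_max_age):
    """Count divorced families that identify the SFE coefficient for one cell:
    keep children older than `min_survey_age` on the survey date, then require
    at least one child aged `affected_max_age` or younger at the divorce and
    at least one child older than that.
    """
    ages_by_family = defaultdict(list)
    for c in children:
        if c["age_at_survey"] > min_survey_age and c["age_at_divorce"] is not None:
            ages_by_family[c["family"]].append(c["age_at_divorce"])
    return sum(
        1
        for ages in ages_by_family.values()
        if any(a <= affected_max_age for a in ages)
        and any(a > affected_max_age for a in ages)
    )


def sensitivity_grid(children, survey_ages=range(7, 11), cutoffs=range(4, 9)):
    """Identifying-family counts for each (sample age threshold, affected cutoff)."""
    return {
        (s, k): n_identifying_families(children, s, k)
        for s, k in product(survey_ages, cutoffs)
    }
```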

Sensitivity to the definition of the sample (age on the survey date) As the main sample is made up of all children older than 7 on the survey date, the main outcome is mismeasured for some children who had not (yet) begun attending primary school on the survey date but who would start attending it later. To check that this measurement error does not drive the results, we vary the minimum age on the survey date required for inclusion in the sample, going in steps from 7 to 10, the age at which we no longer observe new entries into primary school (Figure 2.3a). The magnitude of the coefficients (keeping constant the definition of being affected by divorce) remains similar across columns. The main effect (age at divorce, 0-5) remains positive and significant for age thresholds below 10, indicating that our results are unlikely to be much affected by measurement error. The decrease in the significance of the coefficient as the age threshold increases seems more likely to result from a loss of power due to the reduced sample size and the related decrease in the number of identifying families.

Sensitivity to the definition of the affected-by-divorce variable (age at the time of divorce). We also check that the results are robust to varying the upper age limit for the affected-by-divorce group (which also changes the lower age limit of the not-affected group). As expected from Figure 2.4, the coefficients are positive and significant for the age-at-divorce groups 0-5 and 0-6 and not significant for the other age thresholds. This pattern reflects inclusion and exclusion errors in the affected group: the coefficients on the age-at-divorce groups 0-4, 0-7, and 0-8 are hence partly estimated on siblings for whom we expect no difference in primary school attendance. The age threshold of 4 results in errors of exclusion from the affected group (the not-affected group then includes children aged 5): the coefficient on the age-at-divorce group 0-4 remains positive, but its magnitude decreases and its standard errors increase. Age thresholds of 7 and 8 result in errors of inclusion in the affected group (which then includes children aged 7 or 8): the magnitude of the coefficients on the age-at-divorce groups 0-7 and 0-8 drops.

35 As a balance test, we confirm that the share of girls is not significantly different across the age-at-divorce groups (0-5, 6-9, 10 or older). Using a sample of children who were older than 11 on the survey date, we find that the coefficients on age at divorce, 6-9, and age at divorce, 6-10, remain close to 0 and not significant. Using a sample of children who were older than 12 on the survey date, we find that the coefficient on age at divorce, 6-9, remains close to 0 and not significant. However, the coefficients on age at divorce, 6-10, and age at divorce, 6-11, are negative and significant. Due to the low number of identifying observations and to potential measurement error (delayed completion of primary school), these results must be interpreted with caution.

Table 2.4: Sensitivity to the definition of the sample (columns) and to the definition of being affected by divorce (rows)

Dependent variable: Ever attended primary school

Sample (age on survey date)   ≥ 6        ≥ 7        ≥ 8        ≥ 9        ≥ 10

Age at divorce 0-4 y.o.       0.115      0.115      0.0993     0.0926     0.110
                              (0.112)    (0.120)    (0.117)    (0.121)    (0.135)
  Identifying children        139        127        88         75         62
  Identifying families        50         45         33         30         25
Age at divorce 0-5 y.o.       0.134**    0.168***   0.198***   0.147**    0.140
                              (0.0642)   (0.0611)   (0.0694)   (0.0707)   (0.0920)
  Identifying children        157        134        108        94         63
  Identifying families        55         47         39         35         25
Age at divorce 0-6 y.o.                  0.0915*    0.122*     0.0890     0.0601
                                         (0.0545)   (0.0628)   (0.0634)   (0.0741)
  Identifying children                   141        119        108        81
  Identifying families                   49         42         39         31
Age at divorce 0-7 y.o.                             0.0416     0.0512     0.0438
                                                    (0.0872)   (0.0917)   (0.0956)
  Identifying children                              110        96         79
  Identifying families                              37         32         28
Age at divorce 0-8 y.o.                                        0.0603     0.00982
                                                               (0.0804)   (0.0826)
  Identifying children                                         85         74
  Identifying families                                         28         26
N children (d)                286        265        227        211        192
N families (d)                109        102        88         82         75
N children (all)              7,896      7,314      6,733      6,183      5,695
N families (all)              2,280      2,143      2,006      1,884      1,775

Note: Columns show results when varying the age threshold for inclusion in the sample. Rows show results when varying the age threshold to be considered affected by divorce.
Cells: Each cell reports the coefficient on the variable AgeAtDivorce of a regression using the sibling fixed-effects model. The outcome variable is an indicator variable that takes on the value 1 if the child has attended or attends formal (primary or secondary) school and 0 otherwise.
Control variables include: quadratic control for year of birth, birth order indicators (4 categories) and an indicator variable that takes on the value 1 if the child is a girl and 0 if the child is a boy.
Robust standard errors in parentheses (clustered at the family (sibling-group) level).
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.

Source: PSF2.
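The counts of identifying children and families in Table 2.4 shift with both thresholds because, in a sibling fixed-effects model, only families whose sampled children fall on both sides of the age-at-divorce cut-off contribute to the coefficient. The sketch below illustrates that bookkeeping under stated assumptions: the record layout (`family_id`, `age_at_divorce`, `age_at_survey`), the function name `identifying_counts` and the toy data are hypothetical rather than drawn from PSF2, and the chapter's exact counting rule may differ (for instance, if it also requires variation in the outcome).

```python
from collections import defaultdict
from typing import NamedTuple


class Child(NamedTuple):
    # Hypothetical record layout; these are not PSF2 variable names.
    family_id: str
    age_at_divorce: int
    age_at_survey: int


def identifying_counts(children, min_survey_age, max_affected_age):
    """Return (identifying children, identifying families) for one cell.

    Assumptions: a child enters the sample if age_at_survey >= min_survey_age;
    she is 'affected' if age_at_divorce <= max_affected_age; a family is
    identifying if its sampled children include both affected and
    non-affected siblings.
    """
    flags_by_family = defaultdict(list)
    for child in children:
        if child.age_at_survey >= min_survey_age:
            flags_by_family[child.family_id].append(
                child.age_at_divorce <= max_affected_age
            )

    n_children = n_families = 0
    for flags in flags_by_family.values():
        # Sibling fixed effects only use within-family variation in the
        # treatment dummy, so families without such variation drop out.
        if any(flags) and not all(flags):
            n_families += 1
            n_children += len(flags)
    return n_children, n_families


# Hypothetical toy data: three sibling groups.
children = [
    Child("A", age_at_divorce=3, age_at_survey=9),
    Child("A", age_at_divorce=8, age_at_survey=14),
    Child("B", age_at_divorce=2, age_at_survey=8),
    Child("B", age_at_divorce=4, age_at_survey=10),
    Child("C", age_at_divorce=1, age_at_survey=6),
    Child("C", age_at_divorce=7, age_at_survey=12),
]

print(identifying_counts(children, min_survey_age=6, max_affected_age=5))  # (4, 2)
print(identifying_counts(children, min_survey_age=7, max_affected_age=5))  # (2, 1)
```

With the cut-off at 5, raising the survey-age threshold from 6 to 7 drops family C, whose youngest child no longer enters the sample; this is the mechanism by which identifying families thin out as one moves right across Table 2.4.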

2.6 Channels

In this section, we discuss what might explain the results for primary school attendance and primary school completion. We first test channels that could affect children directly: custody and fostering decisions. As most children live with their mothers, especially when they are young, we next consider how a divorce may affect mothers and discuss their access to resources, remarriage and decision-making power.

2.6.1 Children: Custody and fostering decisions after divorce

We examine whether the person with whom a child lives after the divorce matters for her educational outcomes. If parents have differing preferences regarding their child's education, then the child's educational outcomes depend on which parent has custody. If parents foster their child, they are likely to foster her to relatives who share their preferences for education but have access to more resources.

We consider two variables that capture with whom a child was living before the age of 7 (the age at which she should start attending school): living with her father (but not with her mother) and being fostered. These variables allow us to study the link between custody and fostering decisions and primary school attendance. The resulting estimates should be interpreted as correlations: custody and fostering decisions are also determined by unobservable characteristics of the children (for instance, personality) that may also influence educational outcomes. Additionally, to test whether custody and fostering decisions mediate the positive link between being younger than 5 at the time of the divorce and increased school attendance, we would ideally need to know with whom the child was living between her parents' divorce and the age of 7. However, we cannot reconstruct this variable for the whole sample and hence consider custody and fostering decisions made before the age of 7.36

Our analysis follows three successive steps. First, we check whether children whose parents divorced when they were 5 or younger had different caregivers from those their older siblings had (Table A-2.3 in the Appendix). As expected, children who were younger than 5 years old at the time of the divorce were less likely than their older siblings to have lived full time with their mother before age 7: they were 9.9 percentage points more likely to have lived with their father and 13.9 percentage points more likely to have been fostered before the age of 7 than their older siblings were.

Second, we check whether having lived with one's father and having been fostered are associated with a higher likelihood of attending primary school (columns 1 and 2, Table 2.5).

36 Recovering retrospective information on who had custody of a child before the age of 7 is already a strenuous effort, and we are not able to credibly recover this information for all children. However, we conduct two checks to ensure the validity of these variables and of the analysis. First, we are able to build a variable that captures custody and fostering decisions made between the divorce date and the age of 7 for a subsample of children. The correlation between this variable (“after divorce, before 7”) and the variable used in the analysis (“before 7”) is 0.8. Second, as the variable creation process results in additional missing values, thus restricting the sample, we estimate the regressions from Table 2.3 on this sample and confirm that the results do not change.


Having lived with the father before age 7 has no significant effect. Having been fostered is negatively correlated with the likelihood of having attended primary school (column 1). However, this correlation is not necessarily causal. Fostering may be a means for parents to invest in their children's education, for instance if they cannot finance it themselves: these children might have been even less likely to attend school had they not been fostered. The correlation disappears when we introduce the sibling fixed effects (column 2): it is in fact driven by family-level characteristics.37
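Why a correlation that runs through family-level characteristics vanishes under sibling fixed effects can be seen in a stripped-down sketch. It assumes a single binary regressor and no controls (the models in Table 2.5 also include birth-year, birth-order and gender controls), and the function names and toy data are hypothetical. Demeaning every variable within a sibling group (equivalent, for OLS, to adding one dummy per family) removes anything siblings share.

```python
from collections import defaultdict


def within_estimate(rows):
    """Sibling fixed-effects (within) slope of y on a single regressor d.

    rows: iterable of (family_id, d, y). Each variable is demeaned within
    its family; the slope is sum(d_dm * y_dm) / sum(d_dm ** 2).
    """
    groups = defaultdict(list)
    for family_id, d, y in rows:
        groups[family_id].append((d, y))

    num = den = 0.0
    for members in groups.values():
        d_bar = sum(d for d, _ in members) / len(members)
        y_bar = sum(y for _, y in members) / len(members)
        for d, y in members:
            num += (d - d_bar) * (y - y_bar)
            den += (d - d_bar) ** 2
    if den == 0:
        raise ValueError("no within-family variation in d")
    return num / den


def pooled_estimate(rows):
    """Pooled OLS slope of y on d (with an intercept), ignoring families."""
    rows = list(rows)
    n = len(rows)
    d_bar = sum(d for _, d, _ in rows) / n
    y_bar = sum(y for _, _, y in rows) / n
    num = sum((d - d_bar) * (y - y_bar) for _, d, y in rows)
    den = sum((d - d_bar) ** 2 for _, d, _ in rows)
    return num / den


# Hypothetical toy data: (family, fostered, ever attended primary school).
rows = [
    ("P", 1, 0), ("P", 1, 0), ("P", 0, 0),  # no child in family P attends
    ("R", 0, 1), ("R", 0, 1), ("R", 1, 1),  # every child in family R attends
]

print(pooled_estimate(rows))  # negative: about -0.333
print(within_estimate(rows))  # 0.0
```

In the toy data, fostering is more frequent in the family where no child attends school, so the pooled slope is negative; within each family, attendance does not vary with fostering, so the within slope is zero. This mirrors, in stylized form, the contrast between columns (1) and (2) of Panel B.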

Third, we add the age-at-divorce variable and its interaction with the custody and fostering variables to the regression model (columns 3 and 4, Table 2.5). The coefficient on age at divorce remains significant, and the interaction terms are never significant. When we consider instead whether a child lived (exclusively) with her mother before the age of 7, this variable is positively correlated with attendance (as expected from the signs of the coefficients on living with one's father and on fostering), but its interaction with age at divorce is negative and not significant. These findings suggest that custody and fostering decisions do not mediate the results. They should nevertheless be interpreted with caution, as the coefficients are identified on a small number of observations: the null result could therefore also reflect a lack of power.

2.6.2 Mothers: Financial resources, remarriage and decision-making power

We discuss how mothers' characteristics could affect children's educational outcomes after a divorce. These characteristics are endogenously determined, but studying them can shed light on what happens to families after a divorce.

Financial resources. The first potential channel is that a divorce may increase access to resources, either permanently or temporarily, for some women. If getting a divorce allows women to be more financially independent (Dial, 2008), then divorced women would be better off than before, as they have more control over their resources. Similarly, women who divorce a man who did not contribute to household expenses would be in a better financial situation after the divorce than before. However, such a permanent improvement does not seem consistent with the fact that the results for primary school attendance do not carry through to primary school completion. A temporary shock would be more in line with the observed patterns. For instance, women may benefit from additional transfers from their family network after their divorce. More specifically, mothers may receive more transfers from their siblings intended to help with their children's education (such as the transfers, not specific to the case of divorce, highlighted by Baland et al. (2016) in the Cameroonian context). This support may allow recently divorced women to send their young children to school by helping them pay for educational expenses, such as transport fees and school supplies. This support from the family network might fade away in the longer run.

37 Beck et al. (2015) (using the same dataset as we do) find that fostered children have the same educational outcomes as their host siblings.

Table 2.5: Heterogeneity of effects on attendance: custody and fostering decisions

Dependent variable: Ever attended primary school

                          (1) LPM     (2) SFE     (3) LPM     (4) SFE

Panel A: Interaction with variable living with father
Age at divorce 0-5 y.o.                           0.177***    0.164**
                                                  (0.0532)    (0.0644)
With father               -0.00128    -0.0565     -0.0102     -0.0642*
                          (0.0340)    (0.0351)    (0.0350)    (0.0349)
With father × 0-5                                 0.0297      0.0725
                                                  (0.119)     (0.156)
Controls                  Yes         Yes         Yes         Yes
Share with father (a)     0.15        0.15        0.15        0.15
N observations            7,253       7,253       7,253       7,253
N families                            2,125                   2,125
N identifying obs.        63          46          37          37
N identifying families    25          18          13          13

Panel B: Interaction with variable fostered
Age at divorce 0-5 y.o.                           0.181***    0.157**
                                                  (0.0510)    (0.0731)
Fostered                  -0.101***   -0.0304     -0.108***   -0.0358
                          (0.0290)    (0.0265)    (0.0297)    (0.0268)
Fostered × 0-5                                    0.0754      0.105
                                                  (0.109)     (0.158)
Controls                  Yes         Yes         Yes         Yes
Share fostered (b)        0.11        0.11        0.11        0.11
N observations            7,289       7,289       7,289       7,289
N families                            2,134                   2,134
N identifying obs.        64          54          35          31
N identifying families    20          16          11          9

Note: Columns (1) and (3) present basic linear probability models, and columns (2) and (4) present linear probability models with sibling fixed effects on the main sample. The outcome variable is an indicator variable that takes on the value 1 if the child has attended primary school and 0 otherwise.
Sample, Panel A: Results are estimated for a subsample of the main sample. This subsample is made up of children for whom the variable having lived with the father is not missing and who belong to families with at least two children for whom this variable is not missing.
Sample, Panel B: Results are estimated for a subsample of the main sample. This subsample is made up of children for whom the variable fostered is not missing and who belong to families with at least two children for whom this variable is not missing.

a Share of children who were living with their father, but not with their mother, before the age of 7 among children of divorced parents.

b Share of children who were fostered before the age of 7 among children of divorced parents.

Control variables include: quadratic control for year of birth, birth order indicators (4 categories) and an indicator variable that takes on the value 1 if the child is a girl and 0 if the child is a boy.
Robust standard errors in parentheses (clustered at the family (sibling-group) level).
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.

Source: PSF2.

We cannot directly assess the short- and long-run impacts of divorce on women's income and resources, as we do not have retrospective data on their economic situation. However, the panel dimension of the PSF survey allows us to describe how per capita household consumption changes after a divorce. We use data on the 43 women who divorced the father of their children between the two survey waves. For these women, annual per capita household consumption seems to be rather stable (397,455 FCFA per year per capita in 2011 versus 388,034 in 2006). On average, these women do not appear to have experienced dramatic changes in their economic situation. These estimates are compatible with a short-term positive economic shock after a divorce followed by a return to average consumption levels.
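For scale, the relative change between the 2006 and 2011 figures quoted above can be computed directly, using the figures exactly as reported and without any adjustment for prices:

```latex
\frac{397{,}455 - 388{,}034}{388{,}034} = \frac{9{,}421}{388{,}034} \approx 0.024
```

That is, an increase of about 2.4 percent over the five years between the two waves.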

Resources: Remarriage. Remarriage may be a means for women to access financial resources. We investigate whether the positive link between divorce and primary school attendance is driven by women who have remarried (columns 2 and 3, Table 2.6). The coefficient on the interaction term between a child being younger than 5 at the time of the divorce and the remarriage of the child's mother before the child turned 7 is not significant, but its size (0.209) is relatively large compared to that of the coefficient on divorce before age 5 (0.118). When the interaction with remarriage is added, the coefficient associated with being younger than 5 at the time of the divorce falls from 0.193 (column 1) to 0.118 (column 3) and is no longer significant. The positive link between divorce and primary school attendance therefore seems to be driven by women who remarried (shortly) after their divorce. Remarriage might allow a woman to allocate more resources to her children's education, given the economies of scale associated with marriage as well as the potential direct contribution of her new husband, who may help with educational expenses even if the child does not live with him.
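To make the interaction in column (3) of Table 2.6 concrete, the sibling fixed-effects version can be written as below. The notation is ours and purely illustrative (it need not match the symbols used for the chapter's main specification): α_f is a sibling-group fixed effect, D_if equals 1 if child i of family f was aged 0-5 at the divorce, R_if equals 1 if the mother remarried before the child turned 7, and X_if collects the controls listed under the table (quadratic in birth year, birth-order indicators, girl indicator).

```latex
y_{if} = \alpha_f + \beta D_{if} + \theta R_{if} + \gamma \left( D_{if} \times R_{if} \right) + X_{if}'\delta + \varepsilon_{if}
```

Under this reading, β is the association for children whose mother had not remarried and β + γ the association for children whose mother had. Plugging in the column (3) point estimates gives 0.118 for the former and 0.118 + 0.209 = 0.327 for the latter; neither coefficient is individually significant and the standard error of the sum is not reported, so this illustrates magnitudes rather than testing them.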

These results should not be interpreted as a pure effect of remarriage, as selection effects are also at work. Remarried women are on average younger than other ever-divorced women and are hence likely to have better opportunities in the labor market.38 When we examine heterogeneity in the effect of divorce on primary school attendance by the mother's age (columns 4 and 5, Table 2.6), we find that the effects are driven by mothers who were younger than the median age at the time of the divorce (29 years old).

We cannot further disentangle whether this effect is due to remarriage or to (young) age, as the two are strongly correlated. Additionally, the decision to remarry is endogenous to other characteristics of women, as some women choose not to remarry. Notably, women who cannot afford remarriage tend to remain single (Lambert et al., 2019). We hence expect remarriage to have a positive impact among women who face difficult financial situations, but we do not expect differences between women according to whether they remarried once their income and access to resources are controlled for.

38 Women who remarried in our sample were on average 27 years old at the time of the divorce, whereas women who did not were on average 30 years old. The difference is statistically significant.


Table 2.6: Heterogeneity of effects on attendance: Remarriage, age, and education of mothers

Dependent variable: Ever attended primary school
Sample: Restricted to children of divorced parents (a)

                                          (1)       (2)       (3)       (4)       (5)       (6)       (7)
                                          SFE       LPM       SFE       LPM       SFE       LPM       SFE
Age at divorce 0-5 y.o.                   0.193*    0.243***  0.118     0.221**   0.0830    0.216**   0.227*
                                          (0.105)   (0.0796)  (0.122)   (0.0927)  (0.144)   (0.0938)  (0.128)
Remarriage of the mother                            -0.0553   0.0194
                                                    (0.169)   (0.243)
Remarriage × 0-5                                    -0.0813   0.209
                                                    (0.164)   (0.205)
Mother younger than 29 at divorce                                       -0.0556
                                                                        (0.115)
Mother younger than 29 × 0-5                                            -0.00794  0.209
                                                                        (0.125)   (0.144)
Educated mother                                                                             0.321***
                                                                                            (0.0965)
Educated mother × 0-5                                                                       -0.0807   -0.0746
                                                                                            (0.112)   (0.111)
Controls                                  Yes       Yes       Yes       Yes       Yes       Yes       Yes
Share remarried/younger than 29/educated            0.17      0.17      0.37      0.37      0.38      0.38
N observations                            265       265       265       265       265       258       258
N families                                102                 102                 102                 99
N identifying obs.                        134       58        48        88        58        69        50
N identifying families                    47        22        17        36        22        25        16

Note: Columns (1), (3), (5) and (7) present linear probability models with sibling fixed effects on the main sample. Columns (2), (4) and (6) present basic linear probability models. The outcome variable is an indicator variable that takes on the value 1 if the child has attended primary school and 0 otherwise. The results in column (1) differ from those in column (4) of Table 2.3 because the sample is restricted here to children of divorced women.

a Results are estimated on children who are older than 7 on the survey date and whose mother has divorced, only when they belong to a family in which two children are 7 or older on the survey date.

Control variables include: quadratic control for year of birth, birth order indicators (4 categories) and an indicator variable that takes on the value 1 if the child is a girl and 0 if the child is a boy.
Robust standard errors in parentheses (clustered at the family (sibling-group) level).
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.

Source: PSF2.

Decision-making power and preferences. After a divorce, a woman may gain more bargaining power or become the sole decision maker regarding her children. If women have a stronger preference than their husbands for their children's primary education, this may explain why children who are younger at the time of the divorce have better educational outcomes than their older siblings. The link between women's bargaining power, their stronger preference for their children's education, and their investment in the education of their children has been documented in several societies (see Doss (2013) for a literature review and Menon et al. (2014) on Vietnam), though the evidence has been contested in others (Akresh et al. (2016) do not find such a correlation in the Burkinabe context). This potential channel cannot be tested directly, as the PSF survey contains no information on parental preferences for their children's education.

Education of the mother. Higher educational attainment among women is likely to be positively correlated with those women's access to resources, their bargaining power within the household, and their preference for education, but also with higher household-level consumption and income. This correlation is likely to hold across all education levels, but as few mothers have completed more than primary school, we can only use two categories: educated mothers and uneducated mothers.

On the one hand, we expect all children of educated parents to have attended primary school: the income constraint should not be binding enough to prevent such parents from sending their child to school. Additionally, educated parents are likely to have stronger preferences for their children's education. When parents have a (relatively) high income and strong preferences for their children's education, all their children should attend primary school, and there can then be no difference across siblings in the likelihood of having attended primary school. If that is the case, the positive effect of divorce on education should be mostly driven by children of uneducated women. On the other hand, we expect the (relative) decision-making power channel to work similarly for both educated and uneducated mothers. Educated mothers are likely to have more bargaining power in absolute terms, but the effectiveness of this channel depends on the bargaining power a woman has relative to her husband, so both educated and uneducated women could gain more decision-making power after they divorce.

We hence test whether the impact of divorce on the child's primary school attendance differs depending on whether her mother has attended primary school (or higher levels of education) (columns 6 and 7, Table 2.6). The interaction between having an educated mother and age at the time of the divorce is negative but not significant. As there is no significant difference between the two categories of mothers, the result could be interpreted as evidence that a (relative) decision-making power channel is indeed at work (for all divorced women) and that the relaxing of the income constraint after a divorce is not what drives the results. However, the negative sign of the coefficient would be consistent with the positive effect of parental divorce on education being driven more by children of uneducated women than by children of educated women, and therefore with a relaxing of the budget constraint after a divorce for uneducated women. It might be that only mothers who have at least completed primary school are financially secure enough not to face a binding income constraint regarding their children's education. If that is the case, testing the income-constraint channel would require grouping mothers who only attended primary school with uneducated mothers and comparing them to mothers who attended secondary school, but the small number of women who attended secondary school does not allow us to do so.

Overall, these findings suggest that the positive effects of divorce on children younger than 5 at the time of the divorce are driven, for the most part, by children whose mothers have remarried. Remarrying is likely to go hand in hand with an increase in resources, especially in the Senegalese context, in which female labor force participation is low. However, if the positive effects we observe were driven only by a relaxation of the budget constraint, we would expect them to be driven by uneducated mothers, for whom the budget constraint is likely to be much more binding than for their educated counterparts. This is not the case: the effects are observed for both educated and uneducated women. All women are likely to gain bargaining power regarding their children's education in the case of a divorce. If women have a stronger preference for education than their former husbands, this channel would explain part of the observed increase in children's education.

Why is there no longer-term effect? The absence of results regarding (on time) completion of primary education means that, in the long run, divorced parents do not invest more in younger children's education. In the case of remarriage, this may be explained by the fact that new husbands do not value supporting the education of a child who is not theirs over a long period. Moreover, the monetary and opportunity costs of schooling are lower for younger children than for older children. For the latter, school supplies and transportation are likely to be more expensive. Instead of attending school, older children can also contribute to the household's income and welfare by working or by helping the adults at home. None of the channels considered as explanations for the increased attendance in primary school is likely to be strong enough to overcome these increased direct and indirect costs of investing in children's human capital.

2.7 Conclusion

This paper offers new evidence on the consequences of parental divorce for children in Africa. Using a dataset that records the detailed life histories of Senegalese individuals and of their family members, we investigate how children's educational outcomes are affected by their parents' divorce. Unlike previous studies, we show that divorce is not always detrimental to children's educational outcomes. Using a sibling fixed-effects model that allows us to control for all characteristics common to siblings, we find that children younger than 5 years old at the time of the divorce have a higher likelihood of attending school than their siblings who were older than 6 at the time of the divorce. This positive effect is driven not by a negative effect of the divorce on children who were older than 6 at the time, but by increased investment in younger children's education after the divorce. This higher investment is not sustained for long: these children are as likely as their older siblings to have completed primary school.


These positive findings are likely explained by selection rather than by institutions specific to the Senegalese context. We find no evidence that fostering, an institution that is common in most of West Africa, mediates the effects, indicating that selection into divorce and selection into remarriage are likely to drive the results. The positive selection into divorce based on education indicates that, for most women, high levels of income, support from the extended family, and bargaining power may be needed to decide to divorce. As there is selection into divorce, we cannot infer from our results what the effects of divorce would be for children if selection into divorce changed, for instance following a legal reform of child support and alimony or changes in social norms. Our findings nevertheless nuance the concerns raised by some in Senegal over the consequences of parental divorce for children, and further research on what drives the effects we observe could help policymakers design policies that better support divorced parents.

Future research on Senegal could aim to better understand what mediates the effects observed and why there are positive effects on primary school attendance but no effect on primary school completion. First, the ideal data would include information collected before the divorce as well as after. Second, relying on better measures of human capital and school performance would allow for a more complete picture of the effects of divorce on children's education. Data collection could then focus on test scores, cognitive and noncognitive skills, and psychological well-being. Third, a more nuanced understanding of divorce would require high-frequency panel data that track families' income and expenditures as well as family composition, and especially where and with whom children are staying and who their main caretaker is. These data could enable the identification of factors that help families compensate their younger children after a divorce. If these factors are income-related, these findings could provide leverage to improve the situation of divorced women with poor access to resources, especially if selection into divorce changes and more women from less well-off backgrounds divorce.


Appendix

A-2.1. Individual determinants of educational outcomes

Table A-2.1 reports the results from three linear probability models in which the outcome variables are attendance in primary school, attendance in Qur'anic school (exclusively), and completion of primary school. The independent variables are the variables used as controls in our main specifications (Table 2.3): gender, birth year, and birth order. To be able to comment on birth order, we control for family size (family size is captured by the sibling fixed effects in our main specification). For comparison purposes, we use the same sample as in Table 2.3: it is made up of children who belong to families in which the outcome variable is defined for at least two children.

Column (1) presents the correlation between the control variables and the likelihood of having attended primary school. Girls are more likely than boys to have attended primary school. This is consistent with the fact that the education rate of girls converged with that of boys in the 1990s in Senegal (Figure 2.3b). This difference is also found in the Demographic and Health Survey for Senegal 2010-2011: among children aged 7 to 15, 66% of girls and 63% of boys had some primary schooling or were attending primary school on the survey date.

The trends observed in Figures 2.3a and 2.3b are reflected in the regression results. The coefficient on birth year is positive and significant, capturing increases in attendance rates over time: children in later-born cohorts are more likely to have attended primary school than children in earlier-born cohorts. Column (2) presents the correlation between the control variables and the likelihood of having attended only Qur'anic school. Boys are more likely than girls to have attended only a Qur'anic school, a finding that is consistent with qualitative evidence on Qur'anic schools (Chehami, 2016).

There are no time trends in Qur'anic school attendance. Column (3) presents the correlation between the control variables and the likelihood of having completed primary school by age 10. Boys and girls are equally likely to complete primary school. The time trends in completion of primary school are similar to (and, if anything, stronger than) those observed for primary school attendance: children in later-born cohorts are more likely to have completed primary school than children in earlier-born cohorts. Controlling for family size, children with a higher birth rank are less likely to have completed primary school than first-born children.


Table A-2.1: Correlations between individual characteristics and school attendance

Dependent variable: (1) Ever attended primary school; (2) Attended only Qur'anic school; (3) Completed primary school
Sample: children older than 7 in columns (1) and (2); children older than 10 in column (3)

                        (1)           (2)           (3)

Child is a girl         0.0342***     -0.0920***    0.00729
                        (0.0118)      (0.0101)      (0.0134)
Birth rank
  Second child          -0.00462      0.00350       -0.0343*
                        (0.0134)      (0.0113)      (0.0182)
  Third child           -0.0212       0.00341       -0.0184
                        (0.0163)      (0.0137)      (0.0209)
  Fourth and more       -0.0142       0.00279       -0.0453**
                        (0.0176)      (0.0146)      (0.0215)
Birth year              7.999***      -0.662        21.21***
                        (0.874)       (0.717)       (1.361)
Birth year squared      -0.00200***   0.000166      -0.00532***
                        (0.000219)    (0.000180)    (0.000341)
Constant                -7978.6***    660.8         -21127.9***
                        (872.2)       (715.8)       (1356.6)
Controls family size    Yes           Yes           Yes
Share schooling         0.65          0.19          0.48
Number of children      7,314         7,309         5,612

Note: Linear probability models. In column (1), the outcome variable is an indicator variable that takes on the value 1 if the child has attended formal primary school and 0 otherwise. In column (2), the outcome variable is an indicator variable that takes on the value 1 if the child has attended only Qur'anic school and 0 otherwise. In column (3), the outcome variable is an indicator variable that takes on the value 1 if the child has attended the 5th grade of primary school and 0 otherwise.
Sample: Columns (1) and (2): Children who are older than 7 on the survey date, only when they belong to a family in which two children are 7 or older on the survey date. Column (3): Children who are older than 10 on the survey date, only when they belong to a family in which two children are 10 or older on the survey date.
Robust standard errors in parentheses (clustered at the family (sibling-group) level).
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.

Source: PSF2.

A-2.2. Observable characteristics of identifying families

Table A-2.2 displays the characteristics of mothers according to whether all their children were younger than 5 at the time of the divorce, all their children were older than 5 at the time of the divorce, or (at least) one child was younger than 5 and one child was older than 5 at the time of the divorce. Families in the latter group make up the identifying families.

Women who divorced when all their children were older than 5 had been married longer (as more time had elapsed between the birth of their last child and their divorce) and had fewer children younger than 25 (due to right-censoring of the data) than other divorced women. Women who divorced when all their children were younger than 5 had been married for a shorter period, about 5 years on average (which is consistent with having two children who were younger than 5 at the time of the divorce), and had fewer children born into their last union than other divorced mothers in the sample. These differences arise because the age of women, the number of their children, and the length of their marriage are all correlated. There were no significant differences in the education levels of mothers, although mothers whose children were all younger than 5 at the time of the divorce seem to be more educated. This is consistent with the fact that these women were younger than other divorced mothers and that the average level of education of women has increased over time, as shown in Figure 2.3b for primary school (the same trend can be seen for higher educational levels). The mothers of children on whom our coefficient of interest is estimated hence do not appear to differ from other mothers who divorce when they have at least two children, apart from structural demographic factors.
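The identifying-family logic can be made concrete with a minimal Python sketch of a within-family (sibling fixed effects) slope for a single regressor, such as an indicator for being 5 or younger at divorce. It is an illustration under simplifying assumptions rather than the estimation code behind these tables: it omits the birth-year, birth-order, and sex controls, and the helper name `sibling_fe_slope` and the toy data are hypothetical. After demeaning within each family, a family in which the regressor takes the same value for every child (all children younger than 5, all older than 5, or parents not divorced) contributes nothing to the estimate.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Tuple


def sibling_fe_slope(
    rows: Iterable[Tuple[Hashable, float, float]],
) -> Tuple[float, List[Hashable]]:
    """Within-family OLS slope of y on x (hypothetical helper).

    rows: (family_id, x, y) triples, e.g. x = 1 if the child was 5 or
    younger at divorce, y = 1 if the child ever attended primary school.
    Returns the slope and the families with within-family variation in x.
    """
    by_family = defaultdict(list)
    for family, x, y in rows:
        by_family[family].append((x, y))

    num = 0.0  # sum over families of within-family cross-products of x and y
    den = 0.0  # sum over families of within-family squared deviations of x
    identifying = []
    for family, obs in by_family.items():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        sxx = sum((x - mean_x) ** 2 for x, _ in obs)
        if sxx == 0:
            # x is constant within this family: after demeaning it adds zero
            # to both sums, so the family does not identify the slope.
            continue
        identifying.append(family)
        num += sum((x - mean_x) * (y - mean_y) for x, y in obs)
        den += sxx

    if den == 0:
        raise ValueError("no family has within-family variation in x")
    return num / den, identifying


# Toy example: only families "F1" and "F3" have children on both sides of the cutoff.
data = [
    ("F1", 1, 1), ("F1", 0, 0),  # identifying family
    ("F2", 0, 1), ("F2", 0, 0),  # parents not divorced: x = 0 for every child
    ("F3", 1, 1), ("F3", 0, 1),  # identifying family with no schooling gap
]
beta, families = sibling_fe_slope(data)
print(beta, families)
```

Dropping family F2 from the toy data leaves the estimate unchanged, since it has no within-family variation in the regressor.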

Table A-2.2: Characteristics of families according to children’s age at divorce

                                      (1)            (2)            (3)             (4)                 (5)
                                      Identifying    All children   All children    Diff. identifying   Diff. identifying
                                                     older than 5   younger than 5  vs. all older       vs. all younger
Age                                   37.81          42.92          35.65           5.11***             -2.16
Highest education level
  No formal education                 0.64           0.65           0.47            0.00                -0.17
  Primary                             0.27           0.14           0.47            -0.13               0.20
  Secondary or higher                 0.09           0.22           0.06            0.13                -0.03
Household consumption
  Food expenditures (hh)              156730.92      206790.20      123768.48       50059.28            -32962.44
  Other expenditures (hh)             143408.60      128730.60      151600.76       -14678.00           8192.16
Family composition
  Number of children alive            5.00           4.00           4.92            -1.00               -0.08
  Number of children (≤25 y.o.)       4.02           3.08           4.18            -0.94***            0.16
  Number of children, last union (a)  2.96           2.76           2.18            -0.19               -0.78**
  Last marriage duration              9.94           16.14          4.70            6.20**              -5.24**

Number of mothers                     47             38             17              85                  64

Note: The table presents characteristics of women and of their families according to the age of children at the time of divorce. Column (4) presents the results of a difference-in-means test between identifying families and families where all the children were older than 5 at the time of the divorce. Column (5) presents the results of a difference-in-means test between identifying families and families where all the children were younger than 5 at the time of the divorce. Sample: Divorced mothers who have at least two children (from their marriage that ended) who are older than 7 on the survey date. Significance of the t-test of the difference is reported in columns (4) and (5). P-values are denoted as follows: + p<0.15, * p<0.10, ** p<0.05, *** p<0.01.
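The significance markers in columns (4) and (5) come from t-tests of differences in means. A minimal Python sketch of such a test is given below, assuming unequal group variances (Welch's statistic) and, for simplicity, a two-sided p-value from the normal approximation rather than the exact t distribution; it also encodes the star convention of the note. The function names and sample values are hypothetical. The difference is computed as the comparison group minus the identifying group, which matches the signs in the table (e.g., 42.92 - 37.81 = 5.11 for age).

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def diff_in_means_test(
    identifying: Sequence[float], comparison: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (difference, Welch t-statistic, two-sided p-value).

    The difference is mean(comparison) - mean(identifying). The p-value uses a
    normal approximation to the t distribution (an assumption of this sketch).
    """
    diff = mean(comparison) - mean(identifying)
    se = math.sqrt(variance(identifying) / len(identifying)
                   + variance(comparison) / len(comparison))
    t_stat = diff / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return diff, t_stat, p_value


def significance_marker(p: float) -> str:
    """Star convention of Table A-2.2: + p<0.15, * p<0.10, ** p<0.05, *** p<0.01."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    if p < 0.15:
        return "+"
    return ""


# Hypothetical values for two very small groups of mothers.
diff, t_stat, p = diff_in_means_test([1, 2, 3], [4, 5, 6])
print(diff, round(t_stat, 3), significance_marker(p))
```

With groups of 17 to 47 mothers, an exact t-based p-value would be somewhat larger than this normal approximation, so the sketch is only indicative near the thresholds.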

A-2.3. Additional table and figure: Custody and fostering decisions

A-2.4. Additional results

Figure A-2.1: With whom do children of divorced parents live?

(a) Girls younger than 25 whose parents divorced.

(b) Boys younger than 25 whose parents divorced.

[Figure not reproduced: both panels use a vertical axis from 0 to 100.]


Table A-2.3: Custody and fostering decisions after a divorce

                          (1)          (2)          (3)          (4)          (5)          (6)
Dependent variable        With mother  With mother  With father  With father  Fostered     Fostered
Specification             LPM          SFE          LPM          SFE          LPM          SFE
Sample                    Main sample (a)

Age at divorce: 0-5 y.o.  -0.235***    -0.219***    0.133***     0.0985*      0.0932**     0.139**
                          (0.0567)     (0.0794)     (0.0432)     (0.0551)     (0.0462)     (0.0652)
Controls                  Yes          Yes          Yes          Yes          Yes          Yes
Mean dep. var.            0.90         0.90         0.04         0.04         0.06         0.06
R2                        0.0098       0.0051       0.0081       0.0058       0.0058       0.0033
N observations (b)        7,276        7,276        7,253        7,253        7,289        7,289

Note: Columns (1), (3), and (5): Linear probability model. Columns (2), (4), and (6): Linear probability model with sibling fixed effects. Columns (1) and (2): The outcome variable is an indicator variable that takes the value 1 if the child has been living with her mother at least until the age of 7 and 0 otherwise. Columns (3) and (4): The outcome variable is an indicator variable that takes the value 1 if the child has been living with her father (but not with her mother) before the age of 7 and 0 otherwise. Columns (5) and (6): The outcome variable is an indicator variable that takes the value 1 if the child has been fostered before the age of 7 and 0 otherwise. Control variables include: quadratic control for year of birth, birth order indicators (4 categories), and an indicator variable that takes the value 1 if the child is a girl and 0 if the child is a boy.
a Children who are older than 7 on the survey date, only when they belong to a family in which two children are 7 or older on the survey date.
b The sample size varies because the outcome variable is missing for some observations.
Robust standard errors in parentheses (clustered at the family (sibling-group) level). Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.

Source: PSF2.

Figure A-2.2: Coefficients associated with age-at-divorce variables

[Figure not reproduced: coefficients plotted against age at divorce (1 to 6) for three specifications. 1: LPM without controls; 2: SFE without controls; 3: SFE with controls.]

Note: Coefficients associated with binary variables for age at the time of the divorce. The omitted category corresponds to children older than 7 at the time of the divorce. Specification: The dependent variable is an indicator variable that takes the value 1 if the child has attended primary school and 0 otherwise. The model is a linear probability model with basic controls (similar to column (4), Table 2.3, but without interacted controls). Sample: Children older than 7 on the survey date who belong to families with at least 2 children older than 7 on the survey date.


Online Appendix

B-2.1. Additional robustness checks

Table B-2.1: Robustness checks: Primary school attendance

Columns: (1) to (13)
Model: LPM (1), SFE (2), LPM (3), SFE (4), LPM (5), SFE (6), LPM (7), SFE (8), LPM (9), LPM (10); sibling-difference model (11) to (13)
Sample: Main sample (1)-(2); Restricted sample (3)-(4); Divorce (5)-(6); W. half-siblings (7)-(8); All children aged 7 or older (9)-(10); Sibling pairs (divorce) (a) (11)-(13)
Dependent variable: Ever enrolled in primary school (1)-(10); ∆school (youngest - oldest) (11)-(13)

Age at divorce
  0-5 y.o., columns (1) to (10): 0.165*** (0.0469); 0.168*** (0.0611); 0.131* (0.0671); 0.168*** (0.0611); 0.197*** (0.0706); 0.197* (0.106); 0.165*** (0.0385); 0.115*** (0.0379); 0.130*** (0.0340); 0.137*** (0.0336)
  6-25 y.o.: -0.0318 (0.0514); -0.0316 (0.0514); -0.0212 (0.0470); 0.00696 (0.0437)
Sibling pair
  Identifying, columns (11) to (13): 0.206** (0.101); 0.262** (0.123); 0.216 (0.130)
  Both under 5: -0.164 (0.156)
Constant, columns (1) to (13): -7905.5*** (866.1); -8120.7*** (753.2); -7910.7*** (870.5); -8165.2*** (753.0); -419.9 (5018.3); -3024.6 (6645.6); -7777.5*** (838.6); -8033.6*** (732.3); 0.640*** (0.00831); -7383.5*** (798.7); -0.0566 (0.0691); -0.230 (0.203); -0.142 (0.220)

Controls: Yes; Yes; Yes; Yes; Yes; Yes; Yes; Yes; No; Yes; No; No; No
Controls – pairs: No for columns (1) to (11); Yes for columns (12) and (13)
Share primary: 0.65; 0.65; 0.65; 0.65; 0.69; 0.69; 0.64; 0.64; 0.64; 0.64; 0.04; 0.04; 0.04
R2: 0.0149; 0.0316; 0.0140; 0.0319; 0.0468; 0.0580; 0.0144; 0.0303; 0.0016; 0.0141; 0.0407; 0.0766; 0.0876
Number of observations: 7,314; 7,314; 7,277; 7,277; 265; 265; 7,939; 7,939; 8,417; 8,417; 100; 100; 100
Additional observations (b), columns (7) to (10): 671; 671; 1,149; 1,149

Note: LPM & SFE: Basic linear probability model: columns (1), (3), (5), (7), (9), and (10). Linear probability model with sibling fixed effects: columns (2), (4), (6), and (8). The outcome variable is an indicator variable that takes the value 1 if the child has attended or attends formal (primary or secondary) school. AgeAtDivorce0/5 is an indicator variable that takes the value 1 if the child was 5 or younger at the time of divorce and 0 if either the child was older than 5 or her parents are not divorced. AgeAtDivorce6/25 is an indicator variable that takes the value 1 if the child was 6 or older at the time of divorce and 0 if either the child was 5 or younger or her parents are not divorced. Control variables include: quadratic control for birth year, birth order indicators (4 categories), and an indicator variable that takes the value 1 if the child is a girl. Robust standard errors in parentheses (clustered at the mother level). Sibling-difference model: Columns (11) to (13) report results from a sibling-difference model. The outcome variable is defined as the schooling outcome of the youngest child minus the schooling outcome of the oldest child of the pair. The variable of interest is an indicator variable that takes the value 1 if one child was younger than 5 at the divorce date and the other was older (identifying pair). Controls – pairs variables include: the age difference of the pair and the age of the oldest child; indicator variables for whether the pair is a girl-girl pair, an older boy-younger girl pair, or an older girl-younger boy pair; and an indicator variable for whether the oldest child is a girl. Sample(s): Main sample: Children who are older than 7 at the survey date, only when they belong to a family in which two children are 7 or older at the survey date. Restricted sample: Main sample excluding children who were 5 or younger at the divorce date and who do not have a sibling who was older than 6 at the divorce date.
Divorce sample: Main sample restricted to children whose parents divorced. W. half-siblings: Main sample, including half-siblings. All children aged 7 or older: All children aged 7 or older at the survey date.
a If there is a pair of children with one older than 5 and one younger, we select this pair by selecting the child who is closest to the cutoff on each side. If there is no such pair, we choose the two children who were closest to age 5 at the time of divorce. Observations from two families for which the children closest to the threshold include twins are dropped from the sample.
b Number of additional children included in the sample (relative to the number of children in the main sample).
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.

Source: PSF2.
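The two age-at-divorce indicators defined in the note to Table B-2.1 can be written as a small coding function. This Python sketch is illustrative only: the function name is hypothetical, ages are assumed to be recorded in completed years, and children born after the divorce are assumed to be outside the sample (the sketch rejects negative ages).

```python
from typing import Optional, Tuple


def age_at_divorce_indicators(age_at_divorce: Optional[int]) -> Tuple[int, int]:
    """Return (AgeAtDivorce0/5, AgeAtDivorce6/25) for one child.

    age_at_divorce: child's age in completed years when the parents divorced,
    or None if the parents are not divorced (hypothetical encoding).
    """
    if age_at_divorce is None:
        return 0, 0  # parents not divorced: both indicators are 0
    if age_at_divorce < 0:
        raise ValueError("child born after the divorce: outside this sketch")
    if age_at_divorce <= 5:
        return 1, 0  # 5 or younger at the time of divorce
    return 0, 1      # 6 or older at the time of divorce


for age in (None, 0, 5, 6, 14):
    print(age, age_at_divorce_indicators(age))
```

Because every child of divorced parents receives exactly one of the two indicators, the reference category in a specification that includes both is children whose parents did not divorce.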

Sensitivity to the definition of the control group: sample made up only of children whose parents divorced. When we run the estimation on the sample made up only of children whose parents divorced, we find similar results (columns (5) and (6), Table B-2.1): being younger than 5 at the time of divorce is still positively and significantly correlated with primary school attendance, in both the basic LPM and the SFE specifications. The magnitude of the coefficient is also similar.

Alternative specification: sibling-difference model. We report results using an alternative strategy: a sibling-difference model (columns (11) to (13) in Table B-2.1). For each family, we select the pair of siblings closest to age 5 at the time of divorce.39 The sample size is limited, but we find that within identifying pairs, younger siblings are more likely to have attended primary school than their older siblings, compared with pairs of children who were both older than 5 at the time of divorce.
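The pair-selection rule described in footnote 39 and the construction of the sibling-difference outcome can be sketched as follows. This is a hypothetical Python illustration, not the authors' code: it assumes that an exact age at divorce (in years) is available for each child, classifies a child as young if she was 5 or younger in completed years, and treats two children with exactly the same age at divorce as twins. The names `Child`, `PairResult`, `select_pair`, and `sibling_difference` are illustrative.

```python
from typing import List, NamedTuple, Optional


class Child(NamedTuple):
    child_id: str
    age_at_divorce: float  # exact age in years at the divorce date (assumed available)
    schooled: int          # 1 if the child ever attended primary school, else 0


class PairResult(NamedTuple):
    delta_school: int  # schooled(youngest) - schooled(oldest)
    identifying: int   # 1 if the youngest was 5 or younger and the oldest was not
    both_young: int    # 1 if both children were 5 or younger at divorce


def is_young(child: Child) -> bool:
    return int(child.age_at_divorce) <= 5  # 5 or younger in completed years


def select_pair(children: List[Child]) -> Optional[List[Child]]:
    young = [c for c in children if is_young(c)]
    old = [c for c in children if not is_young(c)]
    if young and old:
        # A pair straddles the cutoff: take the closest child on each side.
        pair = [max(young, key=lambda c: c.age_at_divorce),
                min(old, key=lambda c: c.age_at_divorce)]
    else:
        # No such pair: take the two children closest to age 5.
        pair = sorted(children, key=lambda c: abs(c.age_at_divorce - 5))[:2]
    if len(pair) < 2:
        return None
    # Drop the family if a selected child has a twin (same age at divorce).
    for c in pair:
        if sum(o.age_at_divorce == c.age_at_divorce for o in children) > 1:
            return None
    return pair


def sibling_difference(children: List[Child]) -> Optional[PairResult]:
    pair = select_pair(children)
    if pair is None:
        return None
    youngest, oldest = sorted(pair, key=lambda c: c.age_at_divorce)
    return PairResult(
        delta_school=youngest.schooled - oldest.schooled,
        identifying=int(is_young(youngest) and not is_young(oldest)),
        both_young=int(is_young(youngest) and is_young(oldest)),
    )


family = [Child("a", 2.0, 1), Child("b", 4.5, 1), Child("c", 7.0, 0), Child("d", 9.0, 0)]
print(sibling_difference(family))
```

In the sibling-difference regressions, `delta_school` would then be regressed on `identifying` (and, in one specification, also on `both_young`), with the pair-level controls listed in the note to Table B-2.1.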

Including half-siblings in the sample. We report regression results from both the LPM specification and the SFE specification in columns (7) and (8) of Table B-2.1. The sample includes 671 additional observations. The results are similar to those obtained when half-siblings are excluded from the sample.

39The selection process is as follows. If there is a pair of children with one older than 5 and one younger, we select this pair by selecting the child who is closest to the cutoff on each side. If there is no such pair, we choose the two children who were closest to age 5 at the time of divorce. We drop observations from two families for which the children closest to the threshold include twins.


Including children who do not have a sibling older than 7 in the sample. Because these children do not have a full sibling older than 7, we cannot estimate the SFE model on this sample. Results from our LPM without SFE are reported in columns (9) and (10) of Table B-2.1. The sample includes 1,149 additional observations. Being younger than 5 remains significantly and positively correlated with primary school attendance, but the coefficient is slightly smaller (0.137) than the one estimated on the main sample (0.164).

Figure B-2.1 shows the results of the linear probability model when binary variables for each age at divorce are introduced (ages 0 to 6). Model 1 is reported for comparison purposes (sample of children of divorced mothers with at least two children). The sample used to estimate model 2 includes all children older than 7 whose parents divorced (including those who do not have a sibling over 7 to be compared with). The sample used to estimate models 3 and 4 (model 4 includes controls) includes all children older than 7, whether or not their parents are divorced. The coefficients from these three models appear similar to those from model 1, except for ages 4 and 5 at divorce, where they seem lower: this pattern is consistent with the lower coefficient on being younger than 5 when estimated on the larger sample. The coefficient on being 6 at the time of divorce is almost exactly 0 regardless of the sample and model considered. These findings suggest that the results estimated on the main sample are not overly sensitive to the inclusion of children who do not have a sibling older than 7.

Figure B-2.1: Coefficients on age at divorce – All children aged 7 and older

[Figure not reproduced: coefficients plotted against age at divorce (1 to 6) for four models. 1: LPM, divorce, at least 2 children; 2: LPM, divorce, at least 1 child; 3: LPM, at least 1 child; 4: LPM with controls, at least 1 child.]

Note: Coefficients associated with binary variables for ages at divorce. The omitted category groups ages at divorce higher than 7. The dependent variable is an indicator variable that takes the value 1 if the child has attended primary school, and 0 otherwise. Sample: All children older than 7 at the survey date.


B-2.2. Additional tables

Table B-2.2: Schooling of children according to mother’s characteristics

Columns: (1) to (6)
Dependent variable: Ever attended primary school (1)-(3); Completed primary (4)-(6)

Mother has divorced: 0.0625** (0.0314); 0.0158 (0.0289); 0.0306 (0.0361); -0.00870 (0.0336)
Primary (any): 0.319*** (0.0148); 0.326*** (0.0147); 0.336*** (0.0155); 0.337*** (0.0154)
Secondary (any): 0.394*** (0.0155); 0.400*** (0.0156); 0.392*** (0.0154); 0.392*** (0.0153)
Unknown: 0.248*** (0.0303); 0.256*** (0.0296); 0.275*** (0.0300); 0.275*** (0.0300)

Controls: No; Yes; Yes; Yes; Yes; Yes
Share schooling: 0.65; 0.65; 0.65; 0.67; 0.67; 0.67
Number of children: 8,333; 8,333; 8,333; 6,608; 6,608; 6,608

Note: Linear probability models. In columns (1), (2), and (3), the outcome variable is an indicator variable that takes the value 1 if the child has attended or attends formal primary school. In columns (4), (5), and (6), the outcome variable is an indicator variable that takes the value 1 if the child has attended or attends the fifth grade of primary school. Sample: Columns (1), (2), and (3): all children older than 7 at the survey date. Columns (4), (5), and (6): all children older than 10 at the survey date. Control variables include: quadratic control for birth year, birth order indicators (4 categories), and an indicator variable that takes the value 1 if the child is a girl. Robust standard errors in parentheses (clustered at the mother level). Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.

Source: PSF2.

Table B-2.3: Characteristics of families of divorced women according to the age composition of children at the time of the survey

                                      (1)            (2)            (3)            (4)
                                      2 children     2 children     2 children     2 children
                                      over 7         over 8         over 9         over 10
Highest education level
  No formal education                 0.62           0.60           0.61           0.64
  Primary                             0.25           0.27           0.26           0.23
  Secondary or higher                 0.13           0.13           0.13           0.14
Family structure
  Age                                 39.35          40.15          40.61          41.43
Household consumption
  Food expenditures (hh)              169886.72      179069.06      180729.71      186110.75
  Other expenditures (hh)             139790.27      137739.03      140885.80      146620.24
Number of children alive              4.77           5.16           5.35           5.59
Father’s occupation
  Farmer                              0.29           0.30           0.32           0.33
  Independent or informal employee    0.25           0.24           0.22           0.20
  State-employed or employer          0.22           0.23           0.22           0.20
  Occupation unknown                  0.08           0.06           0.05           0.05
Mother is remarried                   0.43           0.44           0.46           0.48

Number of mothers                     102            88             82             75

Note: The table presents characteristics of mothers and families according to the composition of the family at the time of the survey. Sample: All mothers surveyed in 2011 who have divorced, with at least one child younger than 25 and two children over 7 (over 8, over 9, and over 10) at the time of the survey in column 1 (respectively columns 2, 3, and 4).

CHAPTER 3

ETHNIC HOMOGENIZATION AND PUBLIC GOODS: EVIDENCE FROM

KENYA’S LAND REFORM PROGRAM

Joint with Catherine Boone and Alexander Moradi1

Abstract Little is known about the effects of ethnic homogenization policies, even though the literature linking ethnic fractionalization to negative development outcomes makes them an obvious policy option, if not an ethical one. In this paper, we examine the effects of ethnic homogenization on public good provision using a natural experiment that took place in Kenya. We study a large-scale land reform program that led to a significant reduction in ethnic diversity: the settlement schemes program. Using a novel dataset on the precise location of program area boundaries (Lukalo et al., 2019), which we combine with archival, survey, census, and satellite data, we implement a spatial regression discontinuity design. We argue that the border between program areas (treatment) and neighboring areas (counterfactual) is plausibly random at the local level, and we confirm that there are no observable differences in pre-treatment characteristics. We find a strong discontinuity in ethnic diversity but no differences in school provision between program areas and counterfactual areas in either the short run or the long run. As individuals were resettled to the program areas, they likely lacked the dense social networks that favor collective action, either to hold politicians accountable or to provide public goods through cooperation at the community level. Our results are not driven by spillovers from treatment to counterfactual areas. A mediation analysis indicates that income effects are unlikely to drive this null result.

1Catherine Boone acknowledges the financial support of ESRC Grant # ES/R005753/1 “Spatial Dynamics in African Political Economy” (Boone, PI). Juliette Crespin-Boucaud acknowledges the support of the EUR grant ANR-17-EURE-0001. This paper benefited from discussions with Denis Cogneau, Oliver vanden Eynde, Jonathan Lehne, Nicolas Navarrete H., Stephan Kyburz, Avner Seror, Rebecca Simson, and Maiting Zhuang. We are grateful to seminar audiences at Bicocca, LSE, PSE, Wageningen, audiences of the IFS-UCL-LSE/STICERD Development Economics Seminar, as well as to participants of the NEUDC conference, the Journees de Microeconomie Appliquee (JMA) conference, the German Development Economics Conference, and the International Development Economics Conference. All remaining errors are our own.


3.1 Introduction

A large body of work has linked ethnic fractionalization to negative outcomes, worldwide and in African countries in particular (Alesina and La Ferrara, 2005; Alesina et al., 2016; Easterly and Levine, 1997b). Yet the literature has paid little attention to potential solutions. Nor has it paid much attention to the fact that sub-national levels of ethnic diversity result from historical processes. Most policies implemented have aimed at greater integration (Bazzi et al. (2019) on Indonesia; Boesen (2019); Miguel (2004) on villagization in Tanzania) rather than at ethnic separation, which appears to be a nuclear button (Milanovic, 2003). Kenya provides a natural experiment in which a policy did precisely that.

This paper investigates the effects of ethnic homogenization on public good provision in Kenya.

We study a large-scale land reform program that led to a significant reduction in ethnic diversity: the settlement schemes program.2 In the run-up to independence and in its early years, the Kenyan government bought land owned by European settlers and redistributed it to African farmers. About 3-4% of the Kenyan population took part in the program. The government wanted

to propel agricultural development and to defuse land hunger (Boone et al., 2021). The land

reform combined elements of a rural development program, a redistributive land reform, and

an ethnic homogenization policy.

Using a novel dataset about the precise location of program area boundaries (Lukalo et al., 2019)

that we combine with archival, survey, census, and satellite data, this paper relies on a spatial

regression discontinuity design. The identifying assumption is that the exact borders between

program areas (treatment) and neighboring non-program areas (counterfactual) are as good

as random at the local level. We argue that this is plausibly the case because of constraints related to land acquisition. First, the Kenyan government wanted to acquire land in specific parts of the country. Land prices turned out to be higher than expected, so the amount of land to be included in the schemes was revised downward (Leo, 1984). Moreover, state capacity was limited: even where the government wanted to include more land, it lacked the administrative capacity to do so. We confirm that i) pre-treatment covariates do not differ across the border and that ii) ethnic diversity decreased substantially in the program areas as a result of the program.

Our main outcome of interest is school provision. We also report results on population density, field size, and polling stations, which help us understand the mechanisms at work.
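The spatial regression discontinuity design compares locations just inside and just outside program area borders. A minimal sketch of a local linear RD point estimate follows, written in Python under explicit assumptions: the running variable is a signed distance to the nearest program border (positive inside program areas), observations are weighted with a triangular kernel within a bandwidth chosen by hand, and separate weighted linear fits on each side are evaluated at the border. The function names, sign convention, and data are hypothetical, and the chapter's actual specification (for instance, any border-segment fixed effects, covariates, bandwidth selection, or inference) is not reproduced here.

```python
from typing import List, Sequence, Tuple


def triangular_weight(distance: float, bandwidth: float) -> float:
    """Triangular kernel: weight 1 at the border, falling to 0 at |d| = bandwidth."""
    return max(0.0, 1.0 - abs(distance) / bandwidth)


def weighted_fit_at_border(points: List[Tuple[float, float, float]]) -> float:
    """Weighted least-squares line through (d, y, w) points, evaluated at d = 0."""
    sw = sum(w for _, _, w in points)
    mean_d = sum(w * d for d, _, w in points) / sw
    mean_y = sum(w * y for _, y, w in points) / sw
    sdd = sum(w * (d - mean_d) ** 2 for d, _, w in points)
    if sdd == 0:
        raise ValueError("need at least two distinct distances on each side")
    sdy = sum(w * (d - mean_d) * (y - mean_y) for d, y, w in points)
    slope = sdy / sdd
    return mean_y - slope * mean_d  # intercept = fitted value at the border


def local_linear_rd(distance: Sequence[float], outcome: Sequence[float],
                    bandwidth: float) -> float:
    """Estimated jump in the outcome at the border (inside minus outside)."""
    inside, outside = [], []
    for d, y in zip(distance, outcome):
        w = triangular_weight(d, bandwidth)
        if w == 0.0:
            continue  # beyond the bandwidth: no weight
        (inside if d >= 0 else outside).append((d, y, w))
    if len(inside) < 2 or len(outside) < 2:
        raise ValueError("too few observations on one side of the border")
    return weighted_fit_at_border(inside) - weighted_fit_at_border(outside)


# Hypothetical data: outcome = 1 + 0.5 * distance, plus a jump of 2 inside program areas.
dist = [-4.0, -3.0, -2.0, -1.0, 0.5, 1.0, 2.0, 3.0, 9.0]
y = [1 + 0.5 * d + (2 if d >= 0 else 0) for d in dist]
y[-1] = 100.0  # far from the border: receives zero weight with bandwidth 5
print(round(local_linear_rd(dist, y, bandwidth=5.0), 6))
```

Fitting each side separately is equivalent to a single weighted regression with a side indicator and side-specific slopes; the separate fits are used here only because they keep the sketch free of matrix algebra.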

2The use of area-specific ethnic criteria to select beneficiaries has been documented by Leo (1984). This paper is the first to quantify the effect this policy had on levels of ethnic diversity.

Our main finding is that there are no differences in school provision between program areas and counterfactual areas in the short run as well as in the long run. Results do not change if we

consider per capita measures. Results on field size (a proxy for farm size and income3) suggest that program areas are poorer and less unequal than the counterfactual areas. These results are at odds with the vast literature that finds a negative relationship between ethnic fractionalization and public good provision, but they are in line with the literature arguing that ethnic diversity in fact proxies for the strength of social networks within communities (Eubank et al., 2019; Fearon and Laitin, 1996; Miguel and Gugerty, 2005). As individuals were resettled to the program areas, they likely lacked the dense social networks that favor collective action, either to hold politicians accountable or to provide public goods through cooperation at the community level. For that interpretation to hold, the land reform itself must not have had a direct negative impact on school provision, for instance through income effects or changes in inequality. We use a mediation analysis to show that our results on school provision are robust to the inclusion of a proxy for income (field size). Moreover, we use border segments unaffected by the program to rule out spillover effects.
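One simple way to check whether a mediator such as field size absorbs a program effect is to compare the program coefficient with and without the mediator as a control. The Python sketch below does this for a single binary treatment and a single mediator using the Frisch-Waugh-Lovell theorem (residualizing both variables on the mediator). It is a deliberately simplified stand-in under stated assumptions (no border fixed effects, distance controls, or standard errors) and not necessarily the mediation procedure used in this chapter; the function names and toy data are hypothetical. In the toy data the raw gap is entirely absorbed by the mediator; in the reading adopted in the text, a program coefficient that barely changes when field size is added indicates that income effects are unlikely to drive the result.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def simple_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """OLS slope of y on x with an intercept."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(v: Sequence[float], m: Sequence[float]) -> List[float]:
    """Residuals from regressing v on a constant and m."""
    b = simple_slope(m, v)
    mv, mm = fmean(v), fmean(m)
    return [(vi - mv) - b * (mi - mm) for vi, mi in zip(v, m)]


def effect_with_and_without_mediator(
    treat: Sequence[float], outcome: Sequence[float], mediator: Sequence[float]
) -> Tuple[float, float]:
    """Return (coefficient without mediator, coefficient controlling for it).

    By Frisch-Waugh-Lovell, the coefficient on treat in a regression of outcome
    on (1, treat, mediator) equals the slope between their residuals on (1, mediator).
    """
    raw = simple_slope(treat, outcome)
    t_res = residualize(treat, mediator)
    y_res = residualize(outcome, mediator)
    controlled = sum(a * b for a, b in zip(t_res, y_res)) / sum(a * a for a in t_res)
    return raw, controlled


# Toy data: the outcome depends only on the mediator, which is higher under treatment.
treat = [0, 0, 1, 1]
mediator = [1, 2, 2, 3]
outcome = [2 * m + 1 for m in mediator]
print(effect_with_and_without_mediator(treat, outcome, mediator))
```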

This paper relates to three strands of literature.4 First, there is a very large literature on ethnic diversity and its consequences (Alesina et al., 1999, 2016; Baldwin and Huber, 2010; Miguel and Gugerty, 2005). Kenya is a prominent case in studies of ethnic diversity. The existence of ethnic politics and favoritism has been documented (Harris and Posner, 2019; Kramon and Posner, 2016). Ethnic diversity has been shown to negatively affect productivity in firms (Hjort, 2014), public sector performance (vanden Eynde et al., 2018), and public good provision such as roads (Burgess et al., 2015) and schools (Kramon and Posner, 2016; Miguel and Gugerty, 2005).5 Using a plausibly exogenous source of ethnic homogeneity, we do not find more public goods in ethnically homogeneous areas. This suggests that there may be contexts where the simple link between ethnic diversity and public good provision is broken. We are not the first to report such a finding. Harris and Posner (2019) argue, using Kenya as a case study, that development projects, including school-related projects, are allocated more according to needs than as rewards to political supporters. The effect of ethnic politics on schooling has also recently been challenged by Simson and Green (2020).

3This is a strong assumption that may nevertheless hold in small geographic areas, where soil quality and cultivation practices are similar.

4We do not interpret our results as land reform results with external validity: the land reform program is bundled with an ethnic homogenization component, and the program area borders correspond to discontinuities in ethnic fractionalization levels.

5Miguel and Gugerty (2005) is highly relevant for our paper. The authors find ethnic heterogeneity to be associated with lower-quality school facilities, supposedly because of lower funds generated from public fundraising events. Their identification relies on historical heterogeneous settlement patterns where two ethnic groups meet (in rural western Kenya, which corresponds to African Land Units in our study). They also look at community water wells.

Second, this paper relates to the literature on the origins of local-level ethnic diversity and on how ethnic diversity must be historicized. We document and exploit a hitherto neglected exogenous source of homogenization.

The land reform extended Kenya’s ethnic “homelands” in a rather peaceful but very effective

way. Third, we contribute to the literature on resettlement and forced migration. Becker et al. (2020) showed how Polish WWII refugees, who lost their property in the wake of Poland’s westward shift, acquired more human capital as a result of this experience. Using the same historical event, Charnysh and Peisakhin (2021) showed how community values survived in group-majority settlements of displaced people. Bazzi et al. (2019) found greater national integration and more public goods in diverse communities of Indonesia’s resettlement program. Miho et al. (2019) documented spillovers of gender norms from the deportees to the local population under Stalin’s ethnic deportations. Our results suggest that there are contexts where resettlement had little to no effect on the provision of public goods. Our results cannot be explained by ethnicity per se; they may instead reflect the lack of a deep-rooted social network typically associated with ethnicity, which was also lacking in nearby fractionalized communities because of resettlement on both sides of the border.

The paper is organized as follows. Section 3.2 presents the data. Section 3.3 provides back-

ground on the land reform program under study. Section 3.4 details the methodology used.

Section 3.5 presents the results from the main border comparison. Section 3.6 discusses their

interpretation. Section 3.7 provides additional results using other border comparisons as well as robustness checks. Section 3.8 concludes.

3.2 Data

This project combines historical administrative records with modern survey data and GIS in-

formation from a variety of sources.

Program areas: Settlement schemes. We obtained the exact boundaries of the settlement

schemes from Lukalo et al. (2019), who constructed a map layer from over 1,500 digitized Reg-

istry Index Maps (RIM) kept by Survey of Kenya in 2018. Polygons were joined with attribute

data from the Ministry of Lands and Physical Planning (MoLPP) dataset on Kenyan settlement

schemes, presented in Lukalo and Odari (2016). We added more background data on the pro-

gram areas included in our sample from the Annual Reports of the Department of Settlement

(1974), notably land prices, estimated income potential, and ethnic group selected for settle-

ment.


European owned farms: Scheduled Areas. We digitized the boundaries of the Scheduled

Areas from Kenya National Bureau of Statistics (1926). This collection of sheets at 1:250,000

scale shows the plots reserved for European farmers as well as Forest Reserves and African

Land Units.6

Physical geography. We use data on crop suitability for coffee, tea, maize, sugar, and wheat at

a 5km x 5km resolution from FAO/IIASA (2011). Elevation data comes from the Shuttle Radar

Topography Mission (SRTM) and was sourced from Regional Centre For Mapping Resource

For Development (2020) at 30m x 30m resolution. A shapefile of forests and game reserves is

sourced from Kenya National Bureau of Statistics (1926) and Foundation (2020).

Ethnic diversity. From the 1962 and 1989 Population Censuses we retrieved the ethnic back-

ground of the population at the smallest administrative units available (“location” & “sub-

location”, respectively).7 We also know ethnicity at the GPS location of households from the

Kenya DHS surveys 2003, 2008-09, 2014 (Kenya National Bureau of Statistics, 2015).
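Diversity at the location or sub-location level can be summarized from these census counts. A common choice in the literature cited in the introduction is the fractionalization index, one minus the sum of squared group shares, which gives the probability that two individuals drawn at random (with replacement) belong to different groups. The Python sketch below assumes this index; this passage does not state which diversity measure the chapter computes, and the function name and counts are hypothetical.

```python
from typing import Mapping


def fractionalization(counts: Mapping[str, float]) -> float:
    """Fractionalization index: 1 - sum of squared group shares in one unit.

    counts maps each ethnic group to its number of members in the unit.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("unit has no recorded population")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical counts for two sub-locations.
homogeneous = {"group_a": 950, "group_b": 50}
mixed = {"group_a": 400, "group_b": 350, "group_c": 250}
print(round(fractionalization(homogeneous), 3), round(fractionalization(mixed), 3))
```

A value near 0 means that almost everyone in the unit belongs to one group; values closer to 1 indicate many groups of similar size.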

Population. From Jedwab et al. (2017) we obtained data on urban population for the years

1901, 1911, 1920, 1926, 1931, 1948, 1962, 1969, 1979, 1989, 1999, 2009 for any city with more

than 500 inhabitants at any point in time (N=249). The data is based on estimates and censuses and gives point coordinates of the city center, but does not record the extent of urban sprawl. Kenya

Gazetteers for the years 1955, 1964, and 1978 provide point locations of populated places (U.S.

Board on Geographic Names, 2018).8 Finally, CIESIN (2016) provides population estimates at

a resolution of 1 arc-second (approximately 30m in Kenya) for the year 2015. Their estimates

are based on high-resolution (0.5m) satellite imagery, classifying blocks of optical satellite data

as settled (containing buildings) or not, and then using proportional allocation to distribute

population data from subnational census data to the settlement extents.9
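The proportional-allocation step can be illustrated with a simplified sketch: each census unit's population is spread over the grid cells classified as settled, in proportion to each cell's settled area. This Python example only illustrates the general idea under that assumption (cells that are not settled receive zero population); it is not CIESIN's production method, and the function name and numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Tuple


def allocate_population(
    unit_population: Mapping[str, float],
    cells: List[Tuple[str, str, float]],
) -> Dict[str, float]:
    """Distribute census-unit population to cells in proportion to settled area.

    cells: (cell_id, unit_id, settled_area) triples; settled_area is 0 for
    cells classified as not settled.
    """
    settled_total: Dict[str, float] = defaultdict(float)
    for _, unit, area in cells:
        settled_total[unit] += area

    allocation: Dict[str, float] = {}
    for cell, unit, area in cells:
        total = settled_total[unit]
        # A unit with no settled cells cannot be allocated; its cells get 0.
        allocation[cell] = unit_population[unit] * area / total if total > 0 else 0.0
    return allocation


cells = [("c1", "A", 2.0), ("c2", "A", 3.0), ("c3", "A", 0.0), ("c4", "B", 1.0)]
print(allocate_population({"A": 1000, "B": 300}, cells))
```

By construction, the allocated values within each unit that has settled cells sum back to the census total of that unit.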

Agriculture. From FAO (2000) we obtained a polygon shapefile with field sizes in 2000, distinguishing between three categories of predominant field size: i) small (<2 ha), ii) medium (2-5 ha), and iii) large (>5 ha).10

6We preferred this source over later data because its large scale allows highly accurate digitization of the borders. By 1926, the boundaries had largely been set (Morgan, 1963). We confirmed this by comparing the Scheduled Areas of 1924 with those of 1953 (Troup, 1953) and 1962 (Morgan and Shaffer, 1966).

7In 1962, a location was the smallest administrative unit and there were 490 locations. In 1989, the number of sub-locations was 3,715. On average, the 1962 and 1989 administrative units comprised 17,620 and 5,714 individuals, respectively.

8The Gazetteers recorded places smaller than 500 inhabitants, or 792, 2,341, and populated places in 1955, 1964, and 1978, respectively.

9Night light data are problematic in rural contexts (Gibson et al., 2020).

10These data are based on FAO’s Africover dataset (2000) and were presented in World Resources Institute (2007, Map 5.7). We merged mixed field sizes (labeled “medium mixed with small” and “large mixed with medium”) to


Political Economy. The geographic coordinates of 26,447 polling stations used during the 2012 presidential election come from Maron (2013). They were originally released by the Independent Electoral and Boundaries Commission (IEBC), which was founded in 2011 and is responsible for supervising referenda and elections in Kenya.

Public Goods. From the 1955, 1964, and 1978 gazetteers, we obtained point locations of schools, markets, and facilities (U.S. Board on Geographic Names, 2018).11 These counts are not accurate over time, but their spatial variation can be considered largely consistent.12 We obtained the precise point coordinates of all primary and secondary schools in Kenya from the 2007 school census collected by the Ministry of Education (2009).

3.3 Background: Land, ethnicity, and public goods in Kenya

3.3.1 Colonial period

European settler areas. When the British colonized Kenya at the end of the 19th century, they alienated large parts of the territory and reserved the land for European settlement. We refer to this area, around Nairobi and in central and western Kenya, as the European settler area (see Figure 3.1).13 Three factors were paramount for the alienation of land (Morgan, 1963). First, the climate was deemed suitable for European settlement: at its high altitude, generally above 5,000 feet (ca. 1,500 meters), the alienated land was malaria-free. Second, the land was close to the railway line, which gave European farmers market access.14 Last, the land had a low permanent population density at the end of the 19th century, and part of it was used as cattle grazing grounds. This led the authorities to declare the land uninhabited.15 The European farms, ranging in size from 400 to 12,000 hectares, required African labor. This was secured by a squatter system (Leo, 1984). In return for labor, European settlers allowed African families to live on the farm and cultivate a plot of their own.16 The status of squatter meant that Africans could be evicted at the end of the labor contract. Overall, there was sizable labor immigration into the European settler areas, and it was not restricted to a certain ethnicity.

the overall categories (“medium” and “large”, respectively). Moreover, we classify as non-agricultural, and exclude, any polygons for which the field size was not mentioned, as well as “urban settlements” (N=33). From the same source, we know whether coffee and tea were cultivated.

11The category “facilities” groups together a wide range of features such as administrative facilities (offices, community centers, post offices, banks, police posts, prisons), agricultural facilities (veterinary facilities, agricultural research stations, experimental farms, dairies), recreational and sports facilities, religious buildings (mosques, missions, churches), and industrial facilities (quarries, mines, mills, power stations).

12For example, the number of schools reported in 1955 and 1964 is largely consistent with data in Survey of Kenya (1959). It then decreases by 1978, however, and falls short of the numbers reported in Kenya (1980).

13These areas were officially called “Scheduled Areas” and informally referred to as the “White Highlands”.

14Most of the railway was built between 1895 and 1929.

15Kenya was hit by several shocks that depopulated this territory: the rinderpest of 1892, with losses of 80–90% of cattle (Mack, 1970); locust swarms resulting in a devastating famine in 1897–1899; and a smallpox epidemic (Ambler, 1988). The pastoralist Maasai, who used ca. 60% of the land, were displaced from the Scheduled Areas to the Reserves in the South and Laikipia.

16Agreements varied in the amount of labor that needed to be supplied, the size of the plot, the allowance of stockholdings, and wage payments (Youe, 1988). Over time, Europeans sought to replace the system with wage labor.

Figure 3.1: Map of Kenya: Scheduled Areas, program areas, other redistributed areas, and main cities.

African Land Units. The British confined Africans to “Native Reserves”. Some of these were adjacent to the European settler areas, while others were separated from them by a forest (see Figure 3.1). Each reserve was designated for an African ethnic group: there was a Kikuyu Reserve, a Luo Reserve, and so forth. We call these areas the African Land Units. In these areas, land tenure was based on what the colonial government deemed the prevailing customary law: land was community property and could not be sold to individuals. Tribe members only had rights to the land they were cultivating (Parsons, 2012).17 These areas were rural. Most people were engaged in small-scale agriculture and animal husbandry, relying on family labor. The rigid reserve system restricted expansion, and with population growth some areas became overcrowded. A class of landless people emerged. Moreover, the colonial government neglected these areas in the provision of public goods (Eynde vanden et al., 2018).

3.3.2 Land reform post-independence

The land in the European settler area was made up of mixed farms, agricultural estates and plantations, and ranches. Of the land acquired by the Kenyan government, almost all was in the “mixed farm areas”.18 Most mixed farms were marginally profitable or operating at a loss, and much of their hectarage was unexploited. European settlers knew that they could not survive economically without subsidies, state protection, and the colonial, apartheid-like labor-repressive regime (Leys, 1974). When British colonial rule was about to come to an end, many European farmers wanted to leave.19 The independent Kenyan government wanted to mitigate the land problem and create a new class of African farmers that would maintain agricultural productivity. It decided not to expropriate the land but to buy it. Funds came from the UK, Germany, and the World Bank, partly as grants and partly as interest-bearing loans. By 1971, about 500,000 hectares, or 17% of the European settler area, had been transferred to about 49,500 African families. Based on the population count of the 1962 Census, this represented about 4% of the population.20

Evictions took place.

17Kikuyu customary law, in fact, recognized individual ownership (Leo, 1984, pp. 29-32).

Program Areas. In this paper, we focus on the 120 conventional schemes implemented before 1973. We call them the program areas. We exclude settlement schemes located in the Coast province (N=8) because climatic and agricultural conditions there are very different from those of the Kenyan Highlands. We exclude Haraka schemes (N=17), which settled the landless poor on very small plots (0.6 hectares on average) taken from abandoned or mismanaged European farms. We exclude cooperative schemes (N=12) because of their cooperative ownership and organizational structures, which were later described as a failure (World Bank, 1973). We also exclude ranches (N=6), which were designated for extensive agriculture (cattle herding) because of low soil quality and water availability. Finally, we exclude schemes created after 1973. The government continued to create new smallholder settlement schemes in the following decades. However, increasingly marginal lands were allocated. The government also moved away from agricultural development goals and used the creation of schemes to defuse tensions in politically strategic areas (Boone et al., 2021).21 The treatment in these excluded schemes is very different, endogenous selection is a concern, and the counterfactual and policy implications are less clear. Figure 3.1 shows the location of three types of area. Program areas appear in black. Non-program land reform parcels allocated during our study period (non-conventional schemes) appear in gray. Land reform parcels allocated after 1973 appear in light gray.22

18Most of the rural white population would have been residing in the mixed farm areas.

19The 1962 census counted 55,759 Europeans. Their number dropped to ca. 5,000 in 1969 (Jedwab et al., 2017).

20In comparison, the villagization programs (the relocation of people into centralized planned villages) in Tanzania (1970s), Mozambique (1980s), and Ethiopia (1986-1989) involved about 30%, 15%, and 28% of the population, respectively (Hanlon, 1990; Lorgen, 2000). By contrast, forced migration under Stalin amounted to 6% of the Soviet population (Polian, 2004).

21See Di Matteo (2019) for a study of Kenyan land reform from the 1990s to 2016.

22The exact boundaries of 17 schemes are unknown because their Registry Index Maps have not survived. We nevertheless know their approximate location, size, and type: 15 of the 17 schemes are Haraka schemes. These do not differ in size from the Haraka schemes whose locations are known (1,206 and 1,236 hectares for schemes of known and unknown location, respectively).


How did the state select the land to be included in the program areas? The newly formed Kenyan government did not want to create a patchwork of plots and schemes. It wanted them to be clustered. In line with the idea of building ethnic nations, it also aimed to acquire land close to the former Native Reserves.23 The government then faced several constraints on how much land could be incorporated. First, the Kenyan government agreed with the British government that land qualifying as “compassionate purchases” (land from older farmers or widows, for instance) had to be bought, and the extent of such land was not known in advance. Second, the development loans were agreed in advance. The Kenyan government was prepared to pay market prices as of 1957-1959 and then adjust the price based on current crop profitability. In practice, however, there was room for negotiation, and in some cases land was valued according to a different formula.24 In the end, land prices turned out to be higher than officials expected, and so the land to be included in the program areas was revised downward (Leo, 1984).25 Third, state capacity was limited. The whole of Kenya’s Department of Agriculture was occupied with the land reform. Even if the government had wanted to include more land, it could not have supported this administratively. These constraints made the exact boundaries of the program areas unpredictable, which is the foundation of our empirical strategy.

Table 3.1: Characteristics of settlement schemes

Type of scheme | Number | Estimated yearly income potential (in £) | Average size of plots (in hectares) | Average number of plots | Average settlement charge per hectare (in £) | Average development loan (in £)
Low income | 83 | 33.7 [14.6] | 12.1 [6.0] | 348.7 [175.2] | 20.5 [9.7] | 127.0 [33.8]
High income | 37 | 104.2 [25.0] | 15.9 [8.3] | 145.6 [95.9] | 22.4 [6.4] | 243.9 [57.3]

Data from Department of Settlement (1974). Conventional farms only. Excluding cooperatives (N=8), ranches (N=6), Haraka (N=12), Harambee (N=2), and Shirika (N=4) schemes. The settlement charge reflected the government’s purchase price minus a grant element of 23%. A large percentage was given as a loan (90% and 100% in the low- and high-income potential schemes, respectively; Kenya, 1971). Development loans were meant to cover the purchase of productive assets and inputs such as housing, fencing, livestock, and crop cultivation. Standard deviations in brackets.

The program areas made up the bulk of the land reform. About 412,000 hectares were transferred to 35,000 families, or 2.8% of the population as of 1962.26 The government created two types of schemes according to the income that farmers could earn on their plot: low-income and high-income schemes.27 Table 3.1 reports descriptive statistics. Low-income schemes were more frequent than high-income schemes. Their earning potential, as estimated by the government at the time, was about three times lower. The average plot size, in contrast, was not that different. Plots in low-income schemes were on average 12.1 hectares, about one quarter smaller than the average of 15.9 hectares in high-income schemes. Income potential is thus driven not by plot size but by differences in soil quality and in the crops that can be cultivated (Leo, 1984, p. 159).28 The final columns show that the settlement charge29 was similar for both types of schemes. African farmers assumed this charge as a debt in return for their plots. Development loans, which covered permanent improvements such as buildings and farming inventory, were about twice as large for the high-income schemes.

23In principle, this may suggest that European farmers closer to the Native Reserves had a better negotiating position than farmers further away from the Native Reserves, for whom the risk of not being able to sell the land to the government was higher.

24Ruthenberg (1966) reports a price calculated “on the basis of eight times the average annual profit, working on a 12.5 percent return on the capital invested in land, buildings, water supplies, roads, etc.”, which he deemed extremely generous.

25See the replies to European farmers who complained that their land was not included in the scheme: “Land file: Revised land settlement scheme”, FCO/41/6927, British National Archives.

26Today, about 2% of Kenyans live in the program areas (own estimation).
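The comparisons between scheme types above are ratios of the group means in Table 3.1. The short Python check below recomputes them. The dictionary transcribes the table's means (standard deviations are omitted), and the helper names are illustrative rather than taken from the source.

```python
# Group means transcribed from Table 3.1 (standard deviations omitted)
table_3_1 = {
    "low": {"income_potential": 33.7, "plot_size_ha": 12.1,
            "charge_per_ha": 20.5, "dev_loan": 127.0},
    "high": {"income_potential": 104.2, "plot_size_ha": 15.9,
             "charge_per_ha": 22.4, "dev_loan": 243.9},
}


def ratio(var):
    """High-income mean divided by low-income mean for one column."""
    return table_3_1["high"][var] / table_3_1["low"][var]


def relative_gap(var):
    """How much smaller the low-income mean is, as a share of the high-income mean."""
    return 1 - table_3_1["low"][var] / table_3_1["high"][var]


for var in table_3_1["low"]:
    print(f"{var:18s} high/low = {ratio(var):.2f}")
print(f"plot size gap      = {relative_gap('plot_size_ha'):.1%}")
```

The ratios come out at about 3.09 for income potential, 1.31 for plot size, 1.09 for the settlement charge per hectare, and 1.92 for development loans. The low-income plot size is about 24% below the high-income mean.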

Selection into high- and low-income program areas. The selection of farmers into the schemes depended on three criteria: income, skills, and ethnicity. The first two differed by type of scheme, that is, between high- and low-income program areas. After all, the government wanted to create a new class of African farmers on good-quality land and commercially oriented farms, i.e., the high-income program areas.

First, the land was not given for free: African farmers had to buy it. The costs were linked to market prices and varied according to the income potential (see Table 3.1). However, farmers did not need to pay up front. They typically received a loan repayable after 5 years from farm proceeds. Hence, credit-constrained farmers were not, for the most part, excluded (Kenya, 1971). Once farmers repaid the loan, they owned the land.30 In low-income program areas, even landless people and squatters were given land. In contrast, in high-income program areas, farmers were required to prove their ability to invest. According to Leo (1984, p. 82), the required sum was one that only prosperous farmers or members of the petty bourgeoisie could afford.

Second, the government was keen to select farmers who would be able to repay the loans and maintain agricultural production (i.e., national food security). Hence, skills and agricultural knowledge were an advantage. In low-income program areas, it was enough to have lived on the land, either as an employee of a European farmer or as a squatter. Besides, the government provided agricultural training as part of the land reform.31 High-income potential areas meant larger commercial enterprises, so there was probably a stronger emphasis on entrepreneurial skills there (Nottidge and Goldsack, 1966).

Third, the government sought to extend the regional base of ethnic groups across the Reserves (Leo, 1984, pp. 120-4).32 Kikuyus were offered land in Central Kenya, Merus and Kamba in the Eastern region, and the Kalenjin in the Rift Valley (Lukalo et al., 2019).33 Some lands had large numbers of African laborers and squatters of the “wrong ethnic group”, and these people were largely removed.34 The ethnicity requirements applied to both low- and high-income program areas.

27The government named the schemes differently, namely high- and low-density schemes, based on the targeted population density.

28Differences in the potential income of schemes are indeed correlated with soil quality. We use data from FAO/IIASA (2011) on crop suitability for the main export crops relevant for this part of Kenya (coffee, maize, and tea). With these data, we compute the average crop suitability value of each settlement scheme. The average crop suitability is always numerically higher in high-income schemes (36 schemes) than in low-income schemes (85 schemes). T-tests of differences in means are, however, not statistically significant.

29The settlement charge covered the expenses incurred by the Kenyan government in purchasing the land plus a 10% “bad-debt” margin.

30There is evidence that farmers sold land parcels through informal means before they had fully repaid the loan (Ambwere, 2003).

31In 1962, mobile training teams toured the schemes and explained phosphate application, line planting of maize, dipping, calf-rearing, and artificial insemination (Department of Settlement, various). Later, more formal training

A field survey in the early 1980s illustrates the differences between the program areas. Leo (1984) compared the high-income scheme Passenga with the neighboring low-income scheme Ol-Kalou West in Nyandarua North. The high-income area had fewer people per plot (9.3 versus 13.3), more laborers (2.1 vs. 0.7), and fewer squatters remaining on the land (0.32 vs. 2.2). It also had more plot-holders with a commercial background (40.5% vs. 6.8%) and more absentee landlords (33% vs. 8%).35 Leo (1984) also observed signs of prosperity in the high-income program areas, such as corrugated iron roofs and houses of wood and stone. The low-income program areas, by contrast, were dominated by mud houses, thatched roofs, and extensive areas of uncleared bush.

Counterfactual: Other parts of the former European settler area. What happened to the part of the European settler area that was not allocated before 1973? From Lukalo et al. (2019), we calculated that the Kenyan government acquired another 208,000 hectares, or 7% of the European settler area, and used it for smallholder schemes implemented after 1973. About 25% of the land was acquired by private land-buying companies. These companies bought the European farms, subdivided the land, and sold it individually (Leys, 1975, pp. 89-90).36 Some of the land was channeled as large holdings to members of the government elite. Some remained government property, including as Agricultural Development Corporation (ADC) farms (Boone et al., 2021). About 50% of the area remained in the hands of foreign landowners.37 Much of this was ranch land and corporate holdings. Overall, to use the words of Leo (1984), the “regular market in land was being allowed to become the main vehicle for land transfer to the African bourgeoisie”. This area constitutes the counterfactual: what would have happened in the absence of the land reform.

was provided at farmer training centers. More of these were built close to the schemes.

32The land reform program started at the beginning of independence negotiations (first preparations were made in 1961). In line with the idea of regionalism (“Majimboism”), most ethnic groups were represented. The exception was the Maasai, who were not considered despite having a historical claim on the land. See Medard and Golaz (2011) for additional information on internal borders and conflict related to access to land.

33Scraping the various issues of the Department of Settlement (various), we find 59 schemes explicitly linked to one of the ethnic groups. Unfortunately, a systematic listing is not available.

34The 1966/67 issue of the Department of Settlement (various), for example, states that “squatters who were residing on the settled plots [Chepsir & Tugenon] have been removed to Lumbwa Township.”

35His survey consisted of interviews with a 40% random sample of plot-holders or managers.

36The government supported this by increasing the capital stock of the Land and Agricultural Bank of Kenya. It also permitted loans of up to 80% of the value of the land if funding was not available in the money market (Leo, 1984, p. 99).

37Part of it consists of the famous Delamere estates, which, however, are nowhere close to the program areas and therefore not relevant for our study.

3.3.3 Schools in Kenya

Education was funded partly centrally and partly locally (Simson and Green, 2020). In the first decade of independence, the construction of primary schools and teachers’ houses was funded locally. Afterward, the central government stepped in and paid for some or all of the recurrent expenditures (Eshiwani, 1990). Schools built through such local initiative were referred to as Harambee schools. Kenya’s secondary schooling system had a significant private component. Historically, many secondary schools were created as Harambee projects under local initiative using local resources.38 Harambee schools were typically of lower quality, and many were later integrated into the formal government school system (Mwiria, 1991).

We use data from the 2007 census of schools to determine which institutions are in charge of the schools.39 In 2007, the largest share (41%) of primary schools in Kenya were still financed and run by religious organizations. About 31% of primary schools were run by the central government (Table A-3.1 in the Appendix). About 20% of schools were funded by private individuals or organizations. Only 3% of schools were registered as community schools. Almost half (49%) of secondary schools were run by religious organizations. The share of secondary schools funded by the central government (22%) was lower than for primary schools. Another 5% of secondary schools were registered as community schools.

3.4 Empirical Strategy

The main discontinuity we are interested in is the border between the program areas and the rest of the European settler area (see Figure 3.2). Below, we discuss the specification used as well as its motivation and identifying assumptions.

3.4.1 Specification

The spatial regression discontinuity approach exploits the fact that the status of the land changes discontinuously at the border of the program areas. The identifying assumption concerns cells on either side of this border. After controlling for the distance to the program area, being on the program-area side (treatment) or on the other side (control) is assumed to be a random event, uncorrelated with other unobserved determinants of public good outcomes. Differences in outcomes across this border can hence be attributed to the program.

38Schooling made up about 60% of Harambee projects (Ngau, 1987).

39We use the reported sponsor of each school as well as the status of the school (private or public). Recommendation 13.28 of the Report of the Commission of Inquiry into the Education System of Kenya sets out the role of sponsors. Sponsors are required to take an active role in spiritual, financial, and infrastructural development to maintain their sponsor status (Republic of Kenya, 1999).

Figure 3.2: Illustration: Boundaries studied. [Schematic of the European Settler Areas, split into Program Areas (plots distributed to African farmers) and the Counterfactual (plots not distributed), next to the African Land Units. Comparisons: #1 Program Areas vs. Counterfactual; #2 Program Areas vs. African Land Units.]

We choose 250m x 250m grid cells as the unit of observation.40 We assign geographic features to the cell in which they are located. We define $d_c$ as the distance (in km) from the centroid of each cell $c$ to the program area border. Negative (positive) values identify cells in counterfactual (program) areas. We then estimate model 3.1 on the cells for which $|d_c| \le \Delta$. For the main results, $\Delta$ is 5km. As a robustness check, $\Delta$ is the bandwidth determined by a data-driven selection procedure (Calonico et al., 2017). We use a polynomial of order 1 in distance (Gelman and Imbens, 2019).

$$Y_c = \alpha \cdot d_c + T_c \cdot \beta \cdot d_c + f(Geo_c) + \eta_c \qquad (3.1)$$

$Y_c$ is any of the outcome variables studied. $T_c$ is an indicator variable that takes the value 1 if the cell is part of a program area and 0 otherwise. $\beta$ is the treatment effect of interest. $f(Geo_c)$ is a third-order polynomial function of the latitude and longitude of the centroid of each cell. Standard errors are estimated using a heteroskedasticity-robust nearest-neighbor variance estimator.41
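The local-linear logic of the design can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not the authors' estimation code. It treats the discontinuity as a level shift at the border and uses a uniform kernel with the 5km bandwidth of the main results. It omits the geographic polynomial $f(Geo_c)$, the data-driven bandwidth, and the nearest-neighbor variance estimator. The function names and simulated parameters are hypothetical.

```python
import random


def linear_fit(xs, ys):
    """OLS fit of y on (1, x); returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_jump(distance_km, outcome, bandwidth_km=5.0):
    """Local-linear RD estimate of the jump in the outcome at the border (d = 0).

    distance_km: signed distance of each cell to the border, negative on the
        counterfactual side and positive on the program side.
    Only cells with |d| <= bandwidth_km enter (uniform kernel). Fitting one line
    per side and differencing the intercepts gives the same jump as a pooled OLS
    regression of the outcome on T, d and T*d.
    """
    left = [(d, y) for d, y in zip(distance_km, outcome) if -bandwidth_km <= d < 0]
    right = [(d, y) for d, y in zip(distance_km, outcome) if 0 <= d <= bandwidth_km]
    intercept_left, _ = linear_fit(*zip(*left))
    intercept_right, _ = linear_fit(*zip(*right))
    return intercept_right - intercept_left


if __name__ == "__main__":
    # Hypothetical simulated cells: linear trend in distance plus a 0.5 jump
    random.seed(0)
    d = [random.uniform(-10, 10) for _ in range(5000)]
    y = [1.0 + 0.2 * di + (0.5 if di >= 0 else 0.0) + random.gauss(0, 0.3) for di in d]
    print(round(rd_jump(d, y), 3))  # should be close to the simulated 0.5
```

Fitting the two sides separately keeps the sketch dependency-free. A production version would instead run the pooled regression with the geographic controls and the chosen variance estimator.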

40Colonization and then land reform disrupted land occupation patterns. By not using “villages”, we avoid pitfalls of comparability in settlement structures across space and time.

41We do not use administrative unit fixed effects in our analysis. Administrative boundaries in Kenya were redrawn shortly before independence, joining program area blocs to reserve lands (Boone et al., 2021). As such, administrative boundaries often follow the outer borders of program areas and are therefore endogenous to the treatment.


3.4.2 Identifying assumption

Motivation. The identifying assumption is that the exact boundary at the local level was as good as random. We argue that this is plausible. The Kenyan government aimed to create contiguous territories rather than a patchwork of scattered farms. Inclusion in the program depended on the funding available to buy the land, and this funding came from development loans that were agreed in advance. European farmers had a good negotiating position. Although the prices paid by the Kenyan government were above market prices, there was a strategic benefit to holding on to the land. This incentive decreased towards the boundary.42 In any case, land prices turned out to be higher than officials expected, and so the land to be included in the program areas was revised downward (Leo, 1984).43 The land and pre-treatment economic activities should therefore be very similar on each side of the boundary.

Boundaries of interest. The main results are based on the boundaries between the program areas and the counterfactual areas (i.e., plots not redistributed), as depicted in Figure 3.2. Of all the possible boundary segments located within the former European settler area, we study those depicted in Figure A-3.1.

We exclude boundary segments from our analysis according to four criteria.

First, we exclude boundaries that touch forest and game reserves. Forest reserves were meant for neither European nor African agricultural production. During colonial times, they also served as “buffers” to the African Land Units. Over time, some of the forests disappeared, but satellite images generally show very strong discontinuities along the old forest reserves. Program areas de facto frequently border protected areas (see Figure 3.1), and these areas cannot be considered a valid counterfactual.

Second, we exclude boundaries between program areas and non-program redistributed areas (i.e., non-conventional program areas and plots redistributed after 1973). These areas cannot be considered a valid counterfactual either.

Third, we exclude boundaries located within clusters of program plots, as these boundaries might not be exogenous.

Fourth, we exclude the boundaries located in Nyandarua county (the area to the east of Nakuru). The boundary there follows one of the escarpments of the Great Rift Valley and is hence not exogenous.

42African purchasing power was low and demand limited, whereas the sudden increase in land supplied by the many Europeans wanting to leave implied low prices.

43See the replies to European farmers who complained that their land was not included in the program: “Land file: Revised land settlement scheme”, FCO/41/6927, British National Archives.


3.4.3 Data for SRDD

Matching grid cells to data. Units of analysis are 250m x 250m grid cells that are matched to the variables of interest according to their location. First, schools, polling stations, and DHS clusters are matched to cells using their GPS coordinates. Second, the average population, buildings, and altitude are computed for each cell from high-resolution raster files (pixels of about 30m x 30m in Kenya). Third, cells are matched to the agricultural data (polygons that indicate field size and the type of crop grown).
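Point features and raster pixels can be matched to grid cells as follows. This is a minimal sketch, assuming coordinates are already projected to meters (for example, a UTM zone). It also assumes that each roughly 30m pixel is assigned to the 250m cell containing its center. This is a simplification, since 30m pixels do not nest exactly in 250m cells. The function names are hypothetical.

```python
from collections import defaultdict

CELL_SIZE_M = 250.0  # side length of a grid cell, in meters


def cell_index(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the (column, row) of the grid cell containing a projected point."""
    return (int(x_m // cell_size), int(y_m // cell_size))


def count_points(points, cell_size=CELL_SIZE_M):
    """Count point features (e.g. schools or polling stations) in each cell."""
    counts = defaultdict(int)
    for x, y in points:
        counts[cell_index(x, y, cell_size)] += 1
    return dict(counts)


def average_raster(pixels, cell_size=CELL_SIZE_M):
    """Average raster values (e.g. population, altitude) within each cell.

    pixels: iterable of (x_center_m, y_center_m, value); each pixel is
    assigned to the cell that contains its center.
    """
    sums = defaultdict(float)
    n = defaultdict(int)
    for x, y, value in pixels:
        cell = cell_index(x, y, cell_size)
        sums[cell] += value
        n[cell] += 1
    return {cell: sums[cell] / n[cell] for cell in sums}


print(count_points([(10, 10), (200, 240), (300, 10)]))  # {(0, 0): 2, (1, 0): 1}
```

Floor division keeps the indexing consistent for negative coordinates. For example, a point at x = -1 m falls in column -1, not column 0.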

Additionally, we compute two measures of interest for schools and polling stations. First, we compute the number of schools, students, and polling stations in each cell. Second, to provide a per capita measure, we compute Voronoi polygons around each primary school, secondary school, and polling station. These polygons proxy for the catchment areas of polling stations and schools. We assign the population of each Voronoi polygon to the cell in which the school or polling station is located. This ensures that population is not double-counted and that cells with zero population are not dropped from the analysis.44
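The population of a Voronoi polygon can be computed without building the polygons explicitly. A location lies in a facility's Voronoi polygon exactly when that facility is its nearest one. The Python sketch below therefore assigns each population pixel to its nearest school or polling station and sums the pixels per facility. It assumes projected coordinates in meters and uses a brute-force search. For national-scale data, a spatial index such as a k-d tree (e.g. `scipy.spatial.cKDTree`) would be the practical choice. The function and facility names are hypothetical.

```python
import math
from collections import defaultdict


def voronoi_population(facilities, pop_pixels):
    """Sum the population inside each facility's Voronoi polygon.

    facilities: dict facility_id -> (x_m, y_m)
    pop_pixels: iterable of (x_m, y_m, population)
    Each pixel is assigned to its nearest facility (ties go to the facility
    listed first); facilities that receive no pixel are absent from the result.
    """
    totals = defaultdict(float)
    for x, y, pop in pop_pixels:
        nearest = min(facilities, key=lambda f: math.dist((x, y), facilities[f]))
        totals[nearest] += pop
    return dict(totals)


schools = {"school_A": (0.0, 0.0), "school_B": (1000.0, 0.0)}
pixels = [(100.0, 0.0, 50.0), (400.0, 50.0, 20.0), (900.0, 0.0, 30.0), (600.0, 0.0, 5.0)]
print(voronoi_population(schools, pixels))  # {'school_A': 70.0, 'school_B': 35.0}
```

Each facility's total would then be attached to the grid cell containing the facility. A per capita measure, such as students per inhabitant, divides by that total.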

Excluded cells. We compute the distance from each cell to the closest boundary of interest. We then drop the cells that should not be considered counterfactual areas. Some cells are located in land reform parcels that are not conventional program areas (e.g., Haraka schemes) or in land reform parcels allocated later than 1972. These cells are excluded from the control group. We drop cells that fall into protected areas such as forests and cells that fall into water bodies (lakes, main rivers). Additionally, we drop cells that are crossed by a boundary of interest. For such cells, we cannot know whether the features assigned to them are in the treatment area or in the counterfactual area.45
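Cells crossed by a boundary of interest can be flagged with a simple heuristic, sketched below in Python. A cell is treated as crossed when its four corners do not all fall on the same side of a program-area polygon, using a standard ray-casting point-in-polygon test. This heuristic is an assumption of the sketch, not necessarily the authors' procedure. It misses a boundary that enters and leaves a cell through the same edge without enclosing a corner. Points lying exactly on an edge are also ambiguous. The function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (list of (x, y) vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def is_crossed(cell_x0, cell_y0, program_polygon, cell_size=250.0):
    """Heuristic: flag a cell as crossed by the program boundary when its four
    corners are not all on the same side (inside vs. outside) of the polygon."""
    corners = [(cell_x0, cell_y0), (cell_x0 + cell_size, cell_y0),
               (cell_x0, cell_y0 + cell_size), (cell_x0 + cell_size, cell_y0 + cell_size)]
    status = {point_in_polygon(cx, cy, program_polygon) for cx, cy in corners}
    return len(status) > 1


square = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]  # toy program area
print(is_crossed(250, 250, square), is_crossed(900, 100, square))  # False True
```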

Measurement error and resolution. For a spatial discontinuity analysis, the underlying data should ideally be measured at a resolution at least as fine as that of the cells. If, for example, soil quality is only measured at a 5km x 5km resolution, we will not detect any real discontinuity on the ground at a 250m x 250m resolution. This is not a concern for our analysis, as none of the data we use is measured at a resolution coarser than that of the cells.

A related issue is measurement error. In general, classical measurement error in the location of observations will attenuate any discontinuity on the ground. Two kinds of measurement error could be a concern for our analysis. First, there could be measurement error in the outcome data that we use. For geocoded places (schools, polling stations), measurement error in the coordinates is likely random. It would then not generate a spurious discontinuity at the boundary, although it could attenuate a true one. Coordinates of DHS clusters have built-in measurement error, as the clusters are displaced by up to 5km in rural areas (up to 10km for 1% of these clusters). This measurement error should not affect program areas more than other areas, but it is likely to blur the discontinuity.

44Voronoi polygons sometimes extend across the boundaries under study. We argue that there is no reason for the catchment area of a school to stop at the border between program areas and other areas, as schools are not an easily excludable public good. Using the population of the cell to proxy for the number of inhabitants yields results similar to those obtained with Voronoi polygons. With that proxy, however, cells with zero population are dropped from the per capita analysis.

45This also addresses measurement error close to the border.
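A small simulation shows how displacement of recorded locations blurs a discontinuity. The Python sketch below makes several simplifying assumptions. Distance to the border is one-dimensional. True locations are uniform over ±10km. The outcome jumps by exactly 1 at the border, with no slope or noise. Recorded locations are shifted by a uniform random amount of up to the chosen displacement. The real DHS procedure displaces clusters in a random direction in two dimensions. The function name and parameters are hypothetical.

```python
import random


def observed_jump(displacement_km, n=20000, window_km=1.0, true_jump=1.0, seed=0):
    """Difference in mean outcome just right vs. just left of the border,
    computed on displaced (recorded) locations.

    The outcome has no slope and no noise, so any shortfall relative to
    true_jump comes from the displacement of locations alone.
    """
    rng = random.Random(seed)
    right, left = [], []
    for _ in range(n):
        true_d = rng.uniform(-10, 10)
        y = true_jump if true_d >= 0 else 0.0
        # Recorded location shifted by up to displacement_km (1-D simplification)
        obs_d = true_d + rng.uniform(-displacement_km, displacement_km)
        if 0 <= obs_d < window_km:
            right.append(y)
        elif -window_km <= obs_d < 0:
            left.append(y)
    return sum(right) / len(right) - sum(left) / len(left)


print(observed_jump(0.0))            # 1.0: no displacement, full jump recovered
print(round(observed_jump(5.0), 2))  # far below 1 once locations move up to 5 km
```

With a 5km displacement, a recorded point just inside the boundary is almost as likely to lie truly outside it as inside. The estimated jump therefore shrinks toward zero.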

Second, there could be measurement error in the boundaries of the program areas. This type of measurement error seems unlikely. The boundaries of the program areas were digitized based on official Registry Index Maps (RIM), which the National Land Commission (NLC) obtained from Survey of Kenya in 2018.46 We confirm that the boundaries match geographical features observed in OpenStreetMap and coincide well with plot boundaries from the British National Archives.

3.5 Results

We compare areas close to the border that separates the program areas from the part of the European settler area left to market forces (the counterfactual). We first present results on a pooled sample of program areas. We then discuss heterogeneity according to the expected income potential of program areas at the time of allocation.

3.5.1 Validity checks: Ethnicity and pre-treatment comparison

Ethnicity and the land reform program. The land reform strikingly changed the ethnic composition within the program areas (Figure A-3.2).

Table 3.2 shows the results of a regression of ethnic fractionalization (ELF)47 on two variables: the share of the administrative unit (1962 locations and 1989 sublocations) located within the European settler areas but outside the program areas, and the share of the administrative unit located within program areas. In 1962, European settler areas are significantly more ethnically diverse than the African Land Units (column 1). The future program areas are also more diverse than the African Land Units. Their diversity is not significantly different from that of other parts of the European settler areas (p-value: 0.15, column 1).
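The regression behind Table 3.2 can be sketched as ordinary least squares. The Python example below is hypothetical: its regressors are the two area shares (the African Land Units share is the omitted category), the unit's area, and a quadratic in centroid latitude and longitude. Whether the thesis includes the latitude-longitude interaction is not stated, so including it here is an assumption. The sketch returns point estimates only and does not reproduce the region- or province-clustered standard errors. Centring the coordinates keeps the quadratic terms well conditioned.

```python
import numpy as np


def design_matrix(share_settler, share_program, area, lat, lon):
    """Stack regressors in the spirit of Table 3.2 (hypothetical layout).

    share_settler: share of the unit in non-program European settler areas
    share_program: share of the unit in program areas
    (the omitted category is the share in African Land Units)
    area: unit area; lat, lon: centroid coordinates, ideally centred.
    The quadratic in coordinates includes an interaction term (an assumption).
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    return np.column_stack([
        np.ones_like(lat),  # constant
        share_settler,
        share_program,
        area,
        lat, lon, lat**2, lon**2, lat * lon,
    ])


def ols(y, X):
    """OLS point estimates by least squares (no clustered standard errors)."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 462  # same size as the 1962 sample, but the data are simulated
    s_settler = rng.uniform(0, 0.5, n)
    s_program = rng.uniform(0, 0.5, n)
    area = rng.uniform(0.1, 2.0, n)
    lat, lon = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
    X = design_matrix(s_settler, s_program, area, lat, lon)
    true = np.array([10.0, 50.0, 40.0, 1.0, 0.5, -0.5, 0.2, 0.1, 0.0])
    y = X @ true + rng.normal(0, 5, n)
    beta = ols(y, X)
    print("settler:", beta[1].round(1), "program:", beta[2].round(1))
```

The coefficients on the two shares read as differences from the African Land Units, which is how the notes to Table 3.2 interpret them.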

When the European settler areas are split further, separating the areas never included in any program from the future land reform or resettlement parcels, the program areas appear slightly less diverse than the non-program areas (p-value: 0.04, column 2).

46The Registry Index Maps are official, legal maps of the plots in Kenya that were produced by surveyors of the Government of Kenya. These maps were kept in paper form at Survey of Kenya. They were georeferenced through a joint effort of the NLC, the Spatial Analysis Lab of the University of Richmond, and the LSE in 2018-2020. The work took place in Nairobi, Richmond, and London.

47ELF equals one minus the Herfindahl index of ethnic group shares. It expresses the probability that two randomly selected individuals from the administrative unit belong to different ethnic groups.
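The ELF definition in footnote 47 translates directly into code. The Python sketch below computes one minus the Herfindahl index of group shares from group counts or from individual labels. The group names and counts in the example are hypothetical. The means in Table 3.2 (for example 14.82) suggest the index is reported on a 0-100 scale, so the example multiplies by 100; that scaling is an assumption.

```python
from collections import Counter
from typing import Iterable, Mapping


def elf_from_counts(counts: Mapping[str, float]) -> float:
    """Ethnolinguistic fractionalization: 1 minus the Herfindahl index of shares.

    Equals the probability that two individuals drawn at random (with
    replacement) from the unit belong to different groups.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("group counts must sum to a positive number")
    herfindahl = sum((c / total) ** 2 for c in counts.values())
    return 1.0 - herfindahl


def elf_from_individuals(groups: Iterable[str]) -> float:
    """ELF computed from individual ethnic labels (e.g. survey respondents)."""
    return elf_from_counts(Counter(groups))


if __name__ == "__main__":
    unit = {"Group A": 50, "Group B": 30, "Group C": 20}  # hypothetical counts
    print(round(100 * elf_from_counts(unit), 1))  # 62.0 on a 0-100 scale
```

A unit with a single group has an ELF of 0. With shares of 50%, 30% and 20%, the index is 1 - (0.25 + 0.09 + 0.04) = 0.62.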

Table 3.2: Is the program associated with ethnic homogenization?

Columns (1) to (4): dependent variable ELF 1962, unit: location. Columns (5) to (8): dependent variable ELF 1989, unit: sublocation. Column (9): dependent variable ELFa, unit: DHS cluster. Standard errors in parentheses.

% in non-program European settler areas: (1) 0.534*** (0.0322); (2) 0.519*** (0.0337); (3) 0.533*** (0.0322); (4) 0.534*** (0.0322); (5) 0.378*** (0.0632); (6) 0.379*** (0.0668); (7) 0.378*** (0.0632); (8) 0.378*** (0.0632)
% in program area: (1) 0.390*** (0.0905); (2) 0.313*** (0.0949); (5) 0.0484 (0.0664); (6) 0.0491 (0.0617)
% in ethnic program area: (3) 0.591*** (0.184); (7) 0.0285 (0.0614)
% in non-ethnic program area: (3) 0.333*** (0.101); (7) 0.0686 (0.0870)
% in program area (high income): (4) 0.441* (0.241); (8) 0.0564 (0.0407)
% in program area (low income): (4) 0.373*** (0.116); (8) 0.0458 (0.0852)
% in land reform parcels (<1973): (2) -0.432 (0.573); (6) 0.0380 (0.341)
% in land reform parcels (≥1973): (2) 0.627*** (0.180); (6) -0.0520 (0.338)
In program area (DHS): (9) -0.345 (2.777)
In non-program European settler area (DHS): (9) 16.15*** (1.234)
Area (in thousands km2): (1) 200.0 (145.6); (2) 216.2 (144.0); (3) 198.2 (145.5); (4) 200.0 (145.7); (5) 1050.7 (825.4); (6) 1050.7 (825.7); (7) 1054.9 (825.3); (8) 1050.6 (825.5)
Lat & long (quadratic): Yes in all columns
Mean ELF: 14.82 (columns 1 to 4); 14.44 (columns 5 to 8); 41.02 (column 9)
s.d. ELF: 21.30 (columns 1 to 4); 20.77 (columns 5 to 8); 23.42 (column 9)
Mean % in program area: 1.75 (columns 1 to 4); 2.60 (columns 5 to 8)
p-valueb: (1) 0.15; (2) 0.04; (5) 0.01; (6) 0.01
p-valuec: (3) 0.21; (4) 0.82; (7) 0.55; (8) 0.91
N: 462 (columns 1 to 4); 3,500 (columns 5 to 8); 2,342 (column 9)

Results from an OLS regression in which the outcome variable is the ethnolinguistic fractionalization (ELF) index. The higher its value, the more diverse the area. In columns 1 to 4, the sample is made up of all the locations (administrative units) that could be matched with the 1962 census data. In columns 5 to 8, the sample is made up of all the sublocations (administrative units) that could be matched with the 1989 census data. Locations (1962) and sublocations (1989) are the smallest administrative units for which ethnic data are available. In column 9, the sample is made up of all the surveyed clusters from the Demographic and Health Surveys conducted in Kenya in 2003, 2008-09, and 2014. Ethnic categories were recoded to be the same in both censuses and in the DHS surveys.
% in European settler area: This is computed by excluding the program area. The % in European settler area coefficient can hence be interpreted as the difference in ethnic fractionalization between the European settler area (without the program area) and the African Land Units. The % in program area coefficient can be interpreted as the difference in ethnic fractionalization between the program area and the African Land Units in columns 1, 2, 5, and 6.
The coefficient on % in land reform parcels (<1973) can be interpreted as the difference in ethnic fractionalization between non-program land reform parcels (allocated before 1973) and the African Land Units. The coefficient on % in land reform parcels (≥1973) can be interpreted as the difference in ethnic fractionalization between non-program land reform parcels (allocated in 1973 or later) and the African Land Units.
In columns 3 and 7, the coefficient % in ethnic program area captures the level of ethnic fractionalization for the program areas coded as "ethnic" (according to Department of Settlement (various)), and % in non-ethnic program area captures the level of ethnic fractionalization for the program areas coded as "non-ethnic" (those for which no information regarding an ethnic criterion was found). In columns 4 and 8, the coefficient % in program area (high income) captures the level of ethnic fractionalization for high income potential program areas, and % in program area (low income) captures the level of ethnic fractionalization for low income potential program areas (Department of Settlement, various).
The area of each administrative unit is added to the regression. The DHS clusters are points, and we do not know the size of the area surveyed. Quadratic controls for the latitude and longitude of the centroid of each location, sublocation, and DHS cluster are also added to the regression. Standard errors are clustered at the region level (columns 1 to 4) or at the province level (columns 5 to 8). Regions in 1962 and provinces in 1989 are equivalent to level 1 (highest level) administrative boundaries. Significance levels are denoted as follows: + p<0.15, * p<0.10, ** p<0.05, *** p<0.01.

b Test of equality of the coefficients estimated for % in European settler area and % in program area.
c In columns 3 and 7: test of equality of the coefficients estimated for % in non-ethnic program area and % in ethnic program area. In columns 4 and 8: test of equality of the coefficients estimated for % in program area (low income) and % in program area (high income).
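The equality tests in notes b and c compare two coefficients from the same regression. The hypothetical Python sketch below computes a two-sided Wald-type p-value with a normal approximation. The covariance between the two coefficients is not reported in Table 3.2, so the sketch defaults it to zero as an assumption. It also ignores small-cluster corrections.

```python
import math


def equality_test_pvalue(b1, b2, se1, se2, cov12=0.0):
    """Two-sided p-value for H0: b1 = b2 under a normal approximation.

    cov12 is the estimated covariance between the two coefficients; it is
    not reported in Table 3.2, so the default of zero is an assumption.
    """
    var_diff = se1 ** 2 + se2 ** 2 - 2.0 * cov12
    if var_diff <= 0:
        raise ValueError("variance of the difference must be positive")
    z = (b1 - b2) / math.sqrt(var_diff)
    # P(|Z| > |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Column 1 of Table 3.2, with the covariance term set to zero
    print(round(equality_test_pvalue(0.534, 0.390, 0.0322, 0.0905), 3))
```

With the column 1 estimates and zero covariance, this gives about 0.13 rather than the reported 0.15. The gap presumably reflects the omitted covariance term and the clustered inference used in the thesis, which this sketch does not reproduce.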

When ethnic program areas and non-ethnic program areas (i.e. program areas for which archival documents make no explicit mention of an ethnic criterion) are considered separately, their pre-reform levels of ethnic fractionalization are not significantly different (p-value: 0.21, column 3). The same holds when (future) high income potential program areas and (future) low income potential program areas are considered separately (p-value: 0.89, column 4). Before the program was implemented, the program areas were therefore ethnically diverse.

In 1989, after the reform was implemented, the former European settler area remains more ethnically diverse than the former African Land Units (column 5). The program areas became as homogeneous as the former African Land Units, which is consistent with reports from Leo (1984). The coefficient on the share of the unit located in program areas drops from 0.39 in 1962 to 0.05 in 1989 and is no longer significant.48 The coefficient on program areas is significantly different from the one on non-program areas, including when we control for future land reform or resettlement parcels (p-value: 0.01, columns 5 and 6). When ethnic and non-ethnic program areas are considered separately, their post-reform levels of ethnic fractionalization are not significantly different (p-value: 0.55, column 7). This finding suggests that program areas not explicitly classified as "ethnic" were also allocated on an ethnic basis: both types of program areas are as homogeneous as the former African Land Units. Similarly, (future) high income potential and (future) low income potential program areas have similar levels of ethnic fractionalization (p-value: 0.91, column 8). This indicates that the ethnic criterion was used in both the low income potential and the high income potential program areas.

As an additional test of the association between program areas and low levels of ethnic diversity, column 9 reports a regression of cluster-level ethnic fractionalization on two indicators: whether the cluster is located in a program area, and whether it is located in the former European settler area outside the program areas. The former African Land Units are the reference category. The results of this regression are in line with those from the 1989 census: the former European settler areas remain more ethnically diverse than the former African Land Units, and the program areas became as homogeneous as the former African Land Units.49

Additionally, we report RD plots (Figure 3.3) showing that the differences in ethnic fractionalization are also found at the border under consideration. Figure 3.3a shows the average cluster-level ethnic fractionalization within distance bins for clusters located within 5km of the boundary between program areas and counterfactual areas. There is a "jump" at the border, which indicates that ethnic sorting on the counterfactual side is unlikely to be a concern. In other words, people who bought land close to the program areas do not seem to have been only those who belonged to the same ethnic group as the program beneficiaries.50 As an additional check, we also report results on cluster-level ethnic fractionalization for clusters located within 5km of the boundary between program areas and African Land Units (Figure 3.3b). Given the small number of clusters in this analysis, the level of ethnic diversity is not precisely estimated. If anything, ethnic diversity appears to be higher in the former African Land Units than in the program areas, which indicates that the program areas are indeed less diverse than the rest of the country.

48The spatial units do not have the same size in each census, which may change the estimated coefficients. For instance, the coefficient associated with non-program European settler areas decreases from 0.53 to 0.38. This change might arise because the units considered in 1989 are smaller, and hence less likely to group together ethnically homogeneous areas with different ethnic majorities. The change in the non-program European settler area coefficient is much smaller than the change in the program area coefficient, indicating that the latter is not driven only by changes in the size of the units of analysis. This change in units of analysis does not affect within-census comparisons.

49The DHS clusters in rural areas were randomly displaced by up to 5km. As a result, some DHS households are wrongly assigned to treatment or counterfactual areas. As this built-in measurement error is random, it should not bias our results towards finding a difference between program areas and other areas.

50DHS clusters are displaced by up to 5km in rural areas (10km for 2% of clusters), so this measurement error is likely to make the discontinuity in ethnic fractionalization close to the border smaller.

Figure 3.3: Ethnic fractionalization at the border (DHS). (a) Program area/counterfactual: RD plot of ethnic diversity (sample average within bins and polynomial fit of order 4) against the distance to the border of the program area (negative = untreated Scheduled Areas). (b) Program area/ALU: RD plot of ethnic diversity (sample average within bins and polynomial fit of order 4) against the distance to the border of the program area (negative = former ALU).

Table 3.3: Program areas vs. counterfactual – Altitude & pre-treatment characteristics

Outcomes: (1) populated places 1955, (2) facilities 1955, (3) markets 1955, (4) health centers 1955, (5) schools 1955, (6) wells 1955 (number in cell); (7) and (8) altitude (m, average in cell). Column (8) uses a data-driven bandwidth. Standard errors in parentheses.

RD Estimate: (1) -0.000134 (0.000205); (2) -0.000170 (0.000156); (3) 0 (.); (4) 0 (.); (5) 0 (.); (6) 0 (.); (7) -24.44*** (3.720); (8) 0.479 (7.730)
Mean DV: 0.0000992; 0.0000248; 0; 0; 0; 0; 1917.3; 1917.3
s.d. DV: 0.00996; 0.00498; 0; 0; 0; 0; 303.8; 303.8
Polynomial: 1 in all columns
Bandwidth: 5 in columns 1 to 7; 1.340 in column 8
Eff. N: 65362 in columns 1 to 7; 21057 in column 8

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the neighboring areas in the former European settler area (counterfactual). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence that the variable is higher in the program areas than in the counterfactual areas. Data sources: Data on altitude are extracted from Regional Centre For Mapping Resource For Development (2020). Other variables were obtained from the Kenya Gazetteers (U.S. Board on Geographic Names, 2018). Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.

Altitude and pre-treatment characteristics. As shown in Table 3.3, there are no differences between program areas and counterfactual areas in pre-treatment variables (columns 1 to 6). There are no schools, wells, health centers, or markets within 5km of the boundary studied, and populated places and facilities are extremely rare. In 1962, there were no cities close to the border under study (results available from the authors). The difference in altitude is significant with the 5km bandwidth (column 7) but not with a smaller, data-driven bandwidth (column 8). This indicates that the difference in altitude does not stem from a discontinuity at the border.
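An RD plot such as Figure 3.3 combines two ingredients: mean outcomes within distance bins, and a separate polynomial fit on each side of the border. The Python sketch below shows one way to compute both. The bin width of 0.5 km is an assumption, because the figure does not state its binning rule. The order-4 fit follows the figure legend.

```python
import numpy as np


def rd_bins(distance, outcome, bandwidth=5.0, bin_width=0.5):
    """Mean outcome within equal-width distance bins, as in an RD plot.

    Bins are [k * bin_width, (k + 1) * bin_width), so no bin straddles the
    border at 0; negative distances are the untreated side.
    Returns (bin midpoints, bin means, bin counts) for non-empty bins.
    """
    d = np.asarray(distance, dtype=float)
    y = np.asarray(outcome, dtype=float)
    keep = np.abs(d) <= bandwidth
    d, y = d[keep], y[keep]
    idx = np.floor(d / bin_width).astype(int)
    mids, means, counts = [], [], []
    for k in np.unique(idx):
        in_bin = idx == k
        mids.append((k + 0.5) * bin_width)
        means.append(y[in_bin].mean())
        counts.append(int(in_bin.sum()))
    return np.array(mids), np.array(means), np.array(counts)


def side_polynomial_fits(distance, outcome, bandwidth=5.0, order=4):
    """Separate polynomial fits on each side of the border (np.polyfit coefficients)."""
    d = np.asarray(distance, dtype=float)
    y = np.asarray(outcome, dtype=float)
    keep = np.abs(d) <= bandwidth
    left, right = keep & (d < 0), keep & (d >= 0)
    return np.polyfit(d[left], y[left], order), np.polyfit(d[right], y[right], order)
```

The visual jump at the border is `np.polyval(right, 0.0) - np.polyval(left, 0.0)`. Plots like this are descriptive. The formal estimates in the tables come from a local regression, not from these global fits.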


3.5.2 Effects of program

After 20 years. Table 3.4 reports results for outcome variables measured in 1964 and 1978. There are no significant differences between the program areas and the counterfactual areas.

Table 3.4: Program areas vs. counterfactual – Short run outcomes

Outcomes (number in cell): (1) populated places, (2) facilities, (3) markets, (4) health centers, (5) schools, (6) wells. Standard errors in parentheses.

1964
RD Estimate: (1) 0.000459 (0.000312); (2) -0.000154 (0.000384); (3) 0.000203 (0.000301); (4) 0 (.); (5) -0.0000242 (0.000135); (6) 0 (.)
Mean DV: 0.000198; 0.000231; 0.0000909; 0; 0.0000579; 0
s.d. DV: 0.0147; 0.0163; 0.00954; 0; 0.00761; 0
Polynomial: 1; Bandwidth: 5; Eff. N: 65362 (all columns)

1978
RD Estimate: (1) 0.000303 (0.000344); (2) -0.000242 (0.000393); (3) 0.000396 (0.000362); (4) 0 (.); (5) 0.000128 (0.000204); (6) 0 (.)

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the neighboring areas in the former European settler area (counterfactual). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence that the variable is higher in the program areas than in the counterfactual areas. Data source: All variables were obtained from the Kenya Gazetteers (U.S. Board on Geographic Names, 2018). Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
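The RD Estimate rows in Tables 3.3 to 3.5 measure the jump in an outcome at the border. The Python sketch below shows the basic local linear version. It keeps cells within the bandwidth and regresses the outcome on a treatment dummy, distance, and their interaction. The coefficient on the dummy is the jump. Several choices here are assumptions rather than the thesis's exact setup:

- The kernel used in the thesis is not stated in this section, so the sketch uses a uniform kernel.
- The thesis reports nearest neighbor variance estimates; the sketch uses simple HC0 robust standard errors instead.
- The third-order latitude-longitude polynomial can be passed through the hypothetical `controls` argument.

```python
import numpy as np


def rd_local_linear(distance, outcome, bandwidth=5.0, controls=None):
    """Sharp RD jump at distance 0 by local linear regression (uniform kernel).

    Regresses the outcome on a constant, a treatment dummy (distance >= 0),
    distance, and treatment * distance within |distance| <= bandwidth, plus
    optional control columns. Returns (estimate, HC0 robust SE, effective N).
    """
    d_all = np.asarray(distance, dtype=float)
    keep = np.abs(d_all) <= bandwidth
    d = d_all[keep]
    y = np.asarray(outcome, dtype=float)[keep]
    treat = (d >= 0).astype(float)
    cols = [np.ones_like(d), treat, d, treat * d]
    if controls is not None:
        c = np.asarray(controls, dtype=float)[keep]
        cols.append(c.reshape(len(d), -1))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # beta[1] is a linear combination of y with weights given by row 1 of (X'X)^-1 X'
    weights = X @ np.linalg.inv(X.T @ X)[1]
    se = float(np.sqrt(np.sum((weights * resid) ** 2)))
    return float(beta[1]), se, int(keep.sum())
```

Allowing separate slopes on each side keeps the estimate local to the border. The bandwidth of 5 matches the 5km samples used in the tables.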

After 40 years. Table 3.5 reports results for contemporaneous outcomes. Population density is higher in the program areas than in the counterfactual areas, and the same difference appears in the number of buildings (columns 1 and 2, Panel A). However, the difference is small: 0.12 more people per cell is equivalent to 2 more people per km2. There are no differences in the number of primary and secondary schools, in the number of students, or in the number of polling stations (columns 3 to 7, Panel A). There are no differences either when using per capita measures of schools, students, and polling stations (Panel B). This indicates that public good provision is on par with population levels in 2007. We also find no differences in the number of schools in 1964, 1978, and 2007, so program and counterfactual areas appear to be on the same path regarding school provision.51

51Positive and significant results on school provision in 1964 and 1978 would have indicated that the program areas had more schools in the short run but that the counterfactual areas caught up with them in the longer run. This is not the case.

Table 3.5: Program areas vs. counterfactual – Long run outcome measures

Panel A (number in cell): (1) population, (2) buildings, (3) primary schools, (4) secondary schools, (5) students (primary), (6) students (secondary), (7) polling stations. Standard errors in parentheses.
RD Estimate: (1) 0.118* (0.0610); (2) 0.0281*** (0.00737); (3) -0.000208 (0.00198); (4) -0.000132 (0.000956); (5) -0.496 (0.818); (6) 0.210 (0.219); (7) -0.00165 (0.00183)
Mean DV: 4.163; 0.670; 0.0138; 0.00341; 4.540; 0.599; 0.00723
s.d. DV: 3.905; 0.470; 0.127; 0.0619; 54.81; 15.29; 0.116
Polynomial: 1; Bandwidth: 5; Eff. N: 65362 (all columns)

Panel B (number per capita): primary schools, secondary schools, students (primary), students (secondary), polling stations.
RD Estimate: primary schools -0.00000276 (0.00000237); secondary schools -0.000000108 (0.000000237); students (primary) -0.000384 (0.000646); students (secondary) 0.0000209 (0.0000466); polling stations -0.000000992 (0.000000876)
Mean DV: 0.0000119; 0.000000798; 0.00323; 0.000132; 0.00000307
s.d. DV: 0.000145; 0.0000176; 0.0452; 0.00401; 0.0000712
Polynomial: 1; Bandwidth: 5; Eff. N: 65362 (all columns)

Panel C: (1) cell in field, (2) non-agri field, (3) small field (<2 ha), (4) medium field (2-5 ha), (5) large field (≥5 ha), (6) tea, (7) coffee.
RD Estimate: (1) 0.00472* (0.00249); (2) -0.00637*** (0.000770); (3) 0.0919*** (0.00739); (4) 0.0915*** (0.00650); (5) -0.172*** (0.00552); (6) -0.0225*** (0.00226); (7) 0.00332** (0.00167)

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the neighboring areas in the former European settler area (counterfactual). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence that the variable is higher in the program areas than in the counterfactual areas. Data sources (Panels A & B): Population data and building data (2015) are extracted from CIESIN (2016). The number of schools and the number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013). Per capita measures (Panel B): Per capita estimates use Voronoi polygons to compute the population associated with each school and polling station. Data sources (Panel C): Data on field size and crops cultivated were extracted from FAO (2000). Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
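The per capita measures in Panel B attach a population to each school through Voronoi polygons. The Python sketch below approximates this by assigning each populated grid cell to its nearest school, which is what a Voronoi partition of the schools does, and then summing the cell populations. Two assumptions apply. First, coordinates are taken to be projected (for example in metres), so Euclidean distance is meaningful. Second, the thesis does not describe here how school-level ratios are then mapped back to cells, so the sketch stops at school-level ratios. The brute-force distance matrix suits small examples; for large grids, `scipy.spatial.cKDTree(school_xy).query(cell_xy)` returns the nearest-school indices more efficiently.

```python
import numpy as np


def catchment_population(school_xy, cell_xy, cell_pop):
    """Population in each school's Voronoi polygon, approximated with grid cells.

    Each populated cell is assigned to its nearest school (planar Euclidean
    distance), as in a Voronoi partition of the schools; ties go to the lower
    school index. Returns one population total per school.
    """
    schools = np.asarray(school_xy, dtype=float)
    cells = np.asarray(cell_xy, dtype=float)
    pop = np.asarray(cell_pop, dtype=float)
    # Squared distances with shape (n_cells, n_schools)
    sq_dist = ((cells[:, None, :] - schools[None, :, :]) ** 2).sum(axis=2)
    nearest = sq_dist.argmin(axis=1)
    return np.bincount(nearest, weights=pop, minlength=len(schools))


def per_capita(values, population):
    """School-level count divided by catchment population; NaN where population is 0."""
    values = np.asarray(values, dtype=float)
    population = np.asarray(population, dtype=float)
    out = np.full_like(values, np.nan)
    np.divide(values, population, out=out, where=population > 0)
    return out
```

For example, with schools at (0, 0) and (10, 0) and cells at (1, 0), (4, 0) and (9, 0), the first school's catchment includes the two nearer cells. Catchments built this way can extend across the program border, which matches the argument in footnote 44.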

Income & Inequality. Considering the field size measure, cells in the program areas are more likely to be part of small and medium fields (columns 3 and 4, Panel C) and less likely to be part of fields larger than 5 hectares (column 5). The coefficients on the likelihood that tea and coffee are grown are statistically significant but small. They also concern crops that are unlikely to be grown close to the boundaries under study: tea and coffee are each grown in less than 2% of the cells. These differences are therefore unlikely to drive large differences in income between program areas and counterfactual areas.52 If field size is taken as a proxy for farm size,53 the difference in field size across the border indicates that households in the program areas might be, on average, worse off than households in the counterfactual areas. Additionally, about half of the cells are located in small fields, and cells in the program areas are less likely to be part of large fields. The program areas therefore seem to be less unequal than the counterfactual areas.

52These differences in the likelihood of growing tea and coffee are not explained by differences in field size.

53While field size does not equal farm size, we found a strong positive correlation between the two using data from the 2015/2016 Kenya Integrated Household Budget Survey (Kenya National Bureau of Statistics, 2015/2016): the correlation coefficient between farm size and field size is 0.74.


Spillovers & catchment area of schools. Another issue in interpreting the results is potential spillovers from program areas to counterfactual areas. The main potential spillover is that schools on the program side might serve children on both sides of the border. As there are no differences in the number of schools, this does not seem to be the case. Per capita measures take into account the fact that the catchment areas of schools (measured by Voronoi polygons) do not necessarily stop at the program area border.

Table 3.6: Program areas vs. counterfactual – Sponsor of schools

Share of schools by sponsor (%). Columns: program areas; counterfactual; difference (t-test)a

Primary schools
Central gvt: 24.30; 29.23; -4.93+
Local gvt: 0.93; 1.43; -0.49
Community: 1.25; 1.78; -0.54
Private individual: 22.12; 22.10; 0.01
Religious organisation (public): 6.23; 5.35; 0.88
Religious organisation (private): 41.74; 34.22; 7.52**
Private ind. org. / NGO: 2.80; 5.53; -2.72*
Unknown/others: 0.62; 0.36; 0.27
Number of schools: 321; 561; 882 in total

Secondary schools
Central gvt: 28.38; 16.78; 11.60**
Local gvt: 0.00; 2.01; -2.01
Community: 10.81; 4.70; 6.11*
Private individual: 6.76; 18.12; -11.36**
Religious organisation (public): 4.05; 4.03; 0.03
Religious organisation (private): 45.95; 53.02; -7.07
Private ind. org. / NGO: 2.70; 1.34; 1.36
Unknown/others: 1.35; 0.00; 1.35
Number of schools: 74; 149; 223 in total

Schools located within 5km of the boundary under study (estimation sample). The difference in the number of schools is explained by the difference in the number of cells in the program areas and in the counterfactual areas.
a Column (3) reports the results of a t-test of differences in means. Significance levels are denoted as follows: + p<0.15, * p<0.10, ** p<0.05, *** p<0.01.
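The difference column of Table 3.6 is a t-test of differences in means for sponsor indicators. The Python sketch below runs a pooled two-sample t-test on 0/1 indicators from counts, with a normal approximation for the p-value. The thesis does not say whether the pooled or Welch variant was used, so the pooled one is an assumption. The counts in the example (134 of 321 and 192 of 561 primary schools with a private religious sponsor) are back-computed from the rounded percentages in the table and are therefore approximate.

```python
import math


def share_difference_ttest(k1, n1, k2, n2):
    """Pooled two-sample t-test of equal means for 0/1 indicators.

    k1 of n1 schools in group 1 and k2 of n2 in group 2 have the sponsor type
    of interest. Returns (difference in percentage points, t statistic,
    two-sided p-value from a normal approximation).
    """
    p1, p2 = k1 / n1, k2 / n2
    # For a 0/1 variable, the sum of squared deviations is n * p * (1 - p)
    ss1 = n1 * p1 * (1 - p1)
    ss2 = n2 * p2 * (1 - p2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (p1 - p2) / se
    p_value = math.erfc(abs(t) / math.sqrt(2.0))
    return 100 * (p1 - p2), t, p_value


if __name__ == "__main__":
    diff, t, p = share_difference_ttest(134, 321, 192, 561)
    print(f"difference: {diff:.2f} pp, t = {t:.2f}, p = {p:.3f}")
```

For private religious primary schools, this gives a difference of about 7.52 percentage points with t of about 2.23 and p of about 0.026. That is consistent with the ** reported in the table.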

Quality of schools (sponsors). We provide results on the sponsor of each school in 2007. Table 3.6 describes the sponsors of the schools in the estimation sample (schools located within 5km of the border under study). As the number of schools is the same across the boundary (and across time), the percentages should be interpreted as substitution between types of schools.54 Primary schools in the program areas are 7.5 percentage points more likely to be private religious schools than schools in the counterfactual areas. Primary schools in the counterfactual areas are more likely than those in the program areas to be government schools or to be funded by private individual organizations or NGOs. Among secondary schools, those in the program areas are more likely to be financed by the central government and by the community, and less likely to be financed by private individuals. As harambee schools have gradually been integrated into the formal school system, it would be useful to know whether the government schools were initially community schools. The typical harambee school was of lower quality than other types of schools, but without additional information we cannot draw conclusions about the quality of schools. These differences in school sponsors might reflect differences in the cost of providing schools depending on the level of ethnic diversity (i.e. religious institutions may find it less costly to provide schools in homogeneous areas than in diverse ones).

54Our data points on schools are 1964, 1978, and 2007. There might have been differences in the number of schools in the 1980s and 1990s that we are not capturing. The difference in the share of school sponsors between treatment and control areas is in line with the idea that program areas might have financed more community-based religious schools. The central government and NGOs would then have stepped in to provide schools in counterfactual areas. We plan to collect school censuses from these missing decades to test this hypothesis.

Border segment analysis: Income potential of program areas. We now consider two categories of boundary: i) the boundary between low income potential program areas and neighboring counterfactual areas, and ii) the boundary between high income potential program areas and neighboring counterfactual areas. Map A-3.1 depicts the potential income of program areas, as reported in the archival documents.55

Panel A of Table 3.7 presents the results for low income potential program areas. These areas are not different from the counterfactual areas, except for the number of buildings (Panel A1, column 2). The results on field size are similar to those for the pooled sample (Table 3.5).

Panel B of Table 3.7 presents the results for high income potential program areas. These areas are more populated than the neighboring counterfactual areas. This positive population difference is unexpected, given that high income potential program areas were initially meant to have a lower population density.56 The number of secondary schools and secondary school students (per capita) is higher in these areas. This finding could indicate that, in higher income potential program areas, people were more successful and built more schools. We cannot tell whether the positive coefficient on the number of secondary school students per capita reflects stronger preferences for education in the program areas (demand side) or schools that now have a larger catchment area and more students. The results on field size are similar to those for the pooled sample (Table 3.5). Both the high income potential and the low income potential program areas appear to be worse off than their respective neighboring areas. Cells close to the high income potential border are more likely to belong to large fields (32% of cells in the estimation sample of Panel B) than cells close to the low income potential border (18% of cells in the estimation sample of Panel A). In addition, the coefficient on large fields is smaller in magnitude in Panel B than in Panel A. These findings indicate that households in high income potential program areas are likely to be better off than households in low income potential program areas, both in absolute terms and relative to the counterfactual areas. This finding also holds when looking at descriptive statistics on field size in high and low income potential program areas (see Table A-3.2 in Appendix).

55Belonging to a high income potential program area rather than a low income potential one is not random: soil quality likely differs between these program areas. However, if soil quality varies smoothly at the border, each type of program area can be compared to its own neighboring areas.

56Program beneficiaries may have needed to hire more laborers, which would drive the population estimates up.

Table 3.7: Program areas vs. counterfactual – Border segment analysis (income potential)

Outcomes in Panels A1 and B1: (1) population and (2) buildings (number in cell); (3) primary schools, (4) secondary schools, (5) students (primary), (6) students (secondary), and (7) polling stations (number per capita). Outcomes in Panels A2 and B2: (1) cell in field, (2) non-agri field, (3) small field (<2 ha), (4) medium field (2-5 ha), (5) large field (≥5 ha), (6) tea, (7) coffee. Standard errors in parentheses.

Panel A: Low potential income (< 100 £/year)
Panel A1 RD Estimate: (1) -0.0105 (0.0665); (2) 0.0165** (0.00819); (3) -0.00000398 (0.00000285); (4) -0.000000356 (0.000000265); (5) -0.000619 (0.000766); (6) 0.0000137 (0.0000573); (7) -0.00000144 (0.00000105)
Panel A2 RD Estimate: (1) 0.00868*** (0.00311); (2) -0.00512*** (0.000622); (3) 0.0822*** (0.00841); (4) 0.100*** (0.00743); (5) -0.169*** (0.00588); (6) -0.0116*** (0.00132); (7) 0.00614*** (0.00211)

Panel B: High potential income (≥ 100 £/year)
Panel B1 RD Estimate: (1) 0.473*** (0.143); (2) 0.0471*** (0.0170); (3) 0.00000240 (0.00000416); (4) 0.000000965* (0.000000522); (5) 0.00104 (0.00110); (6) 0.0000807* (0.0000481); (7) 0.000000621 (0.00000139)
Panel B2 RD Estimate: (1) 0.000249 (0.00195); (2) -0.0150*** (0.00302); (3) 0.0926*** (0.0134); (4) 0.0470*** (0.0124); (5) -0.124*** (0.0139); (6) -0.0400*** (0.00732); (7) 0 (.)
Panel B2 Mean DV: 0.990; 0.00395; 0.416; 0.251; 0.319; 0.0574; 0
Panel B2 s.d. DV: 0.0981; 0.0532; 0.468; 0.413; 0.440; 0.214; 0
Polynomial: 1; Bandwidth: 5; Eff. N (Panel B2): 11301 (all columns)

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the neighboring areas in the former European settler area (counterfactual). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence that the variable is higher in the program areas than in the counterfactual areas. Data sources: Population data and building data (2015) are extracted from CIESIN (2016). The number of schools and the number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013). Per capita estimates use Voronoi polygons to compute the population associated with each school and polling station. Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.


Border segment analysis: Ethnic majority. We investigate whether the results are driven by differences across ethnic groups (defined as the majority ethnic group associated with each border segment) that would be averaged out in the main specification. This is not the case: the results are robust to including ethnic segment fixed effects (Table A-3.3, in Appendix).57 We also present the results separately for border segments grouped by the majority ethnic group each segment is associated with (Table A-3.4, in Appendix). The numbers of schools and polling stations remain insignificant when considering separately the ethnic boundary segments associated with the Kikuyu, Kalenjin, and Luhya.58 The number of primary school students per capita is weakly significant and negative in Kikuyu program areas. We also report the level of school provision in the program areas associated with these three ethnic groups (Table A-3.5, in Appendix). There are no significant differences in the per capita numbers of primary schools and primary school students across these program areas: the difference in the number of primary schools between Kalenjin and Luhya program areas is not found in the number of primary school students, indicating that primary school provision is on par with population levels. The per capita numbers of secondary schools and secondary school students are significantly higher in Kikuyu program areas than in Kalenjin and Luhya program areas. This result is in line with Simson and Green (2020), who show that Kikuyu individuals are more likely to have attended secondary school.
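Footnote 57 describes how the ethnic boundary segments are built. A minimal pure-Python sketch is given below, assuming projected coordinates in metres and assuming that each segment is assigned according to its midpoint (the chapter does not say which point is used). `location_of` and `largest_group` are hypothetical stand-ins for the point-in-polygon lookup against census locations and for the 1989 census table of the largest group per location.

```python
import math


def segment_midpoints(polyline, seg_len=10.0):
    """Cut a polyline [(x, y), ...] (metres) into consecutive pieces of length seg_len
    and return the midpoint of each piece; the final piece may be shorter."""
    cum = [0.0]  # cumulative arc length at each vertex
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]

    def point_at(s):
        # Find the edge containing arc length s and interpolate linearly along it.
        for i in range(len(cum) - 1):
            if cum[i] <= s <= cum[i + 1] and cum[i + 1] > cum[i]:
                t = (s - cum[i]) / (cum[i + 1] - cum[i])
                (x0, y0), (x1, y1) = polyline[i], polyline[i + 1]
                return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        return polyline[-1]

    mids, start = [], 0.0
    while start < total:
        end = min(start + seg_len, total)
        mids.append(point_at((start + end) / 2))
        start = end
    return mids


def assign_majority(polyline, location_of, largest_group, seg_len=10.0):
    """Label each segment with the largest group of the location containing its midpoint."""
    return [largest_group[location_of(p)] for p in segment_midpoints(polyline, seg_len)]


# Toy example (hypothetical data): a 25 m boundary crossing from location "A" into "B" at x = 10.
labels = assign_majority(
    [(0.0, 0.0), (25.0, 0.0)],
    location_of=lambda p: "A" if p[0] < 10 else "B",
    largest_group={"A": "Group 1", "B": "Group 2"},
)
print(labels)  # ['Group 1', 'Group 2', 'Group 2']
```

The segment labels then serve as the categories for the ethnic segment fixed effects.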

3.6 Discussion

The results presented above are to be interpreted as the effects of the program under study. In this section, we discuss possible interpretations of these results in light of theories related to ethnic fractionalization and public good provision. We then discuss whether the effects could be explained by direct effects of the land reform itself. Information on school sponsors is difficult to interpret in the absence of earlier data points; in this section, we therefore focus on the results on the number of schools rather than on school sponsors.

3.6.1 Ethnicity and school provision

Community-level explanations

We can think of four channels linking ethnic homogeneity to higher levels of public good provision that are potentially relevant to the areas we study: a preference for mixing with co-ethnics; similar preferences over the type of public good to be funded; homogeneous returns to investments across ethnic lines; and social sanctions being easier to enforce.59 These channels would lead us to expect higher levels of public good provision in the program areas than in the counterfactual areas. Yet we do not find such a difference in 2007. Why not?

57 The boundary lines are divided into 10 meter long segments. Each boundary segment is then assigned to the ethnic group that is the largest in the location the segment is located in, using the ethnicity data from the 1989 census.

58 We report results for these three ethnic groups as they correspond to the ethnic boundary segments associated with a high number of cells.

Resettlement on both sides of the border

Network density. A possible explanation for the null result that we find is that we compare two areas in which resettlement took place: the loss of social ties may have made cooperation more difficult and social sanctions less enforceable on both sides of the border, resulting in similar levels of school provision on both sides. This interpretation is consistent with the idea that the level of ethnic diversity in fact proxies for the strength of social networks within communities (see Eubank et al. (2019) for an empirical test of this hypothesis, first put forward in Miguel and Gugerty (2005) and Fearon and Laitin (1996)). Dense social networks allow communities to share information and hold politicians accountable for the delivery of public goods. Newly resettled individuals are probably less likely to share such a dense social network, which could make social sanctions or collective action difficult to implement. As individuals on both the treatment and the counterfactual sides lack this dense social network, independently of the ethnic diversity of their community, we do not find an effect.

Self-selection on willingness/ability to cooperate with strangers. As people voluntarily moved to the former European settler area, it is possible that they self-selected according to their willingness and ability to cooperate with strangers in general, resulting in no difference in public good provision.

Sorting on the counterfactual side (at program implementation date). As our main result is that the program area and the counterfactual area do not have different levels of school provision, one concern is that public good provision could be higher in the counterfactual area than “predicted” by its level of ethnic fractionalization, because of sorting in the counterfactual area on preferences for ethnic diversity. Individuals who have a higher preference for ethnic diversity would be more likely to participate in community activities (Alesina and La Ferrara, 2000). Sorting on similar preferences for the level of public good provision would also result in additional cooperation (Miguel and Gugerty, 2005).60 However, it is unlikely that individuals who sought to buy land in the former European settler areas could anticipate what the ethnic composition of the area would be, as individuals and families were mostly trying to secure access to

59 See Miguel and Gugerty (2005) for a review of theories of ethnic diversity and collective action.

60 As preferences over public good provision vary within ethnic groups (Alesina et al., 1999), sorting on preferences for public goods does not imply sorting across ethnic lines.


land and as the ethnic composition of the counterfactual area was not known ex ante. Ex post Tiebout sorting is similarly unlikely in a context such as rural Kenya, where residential mobility is low.61

Political economy: Allocation of schools by government

A positive effect of ethnic homogenization on public good provision could have been offset by higher levels of government investment in education in the counterfactual areas. Two types of political economy channels could explain why the government would have provided more schools to counterfactual areas. To secure votes, governments could have targeted the provision of schools in non-program areas either as compensation to households that had not been able to benefit from the reform, or as an investment in more diverse communities in which elections are likely to be more contested if there is ethnic voting (Hassan, 2017; Horowitz, 2019).62 This interpretation could be supported by the results on primary school sponsors but does not hold for secondary schools. Further developments of this paper will aim to understand the difference in school providers across treatment and counterfactual areas, as well as the differences between secondary and primary school results. It may be that the mechanisms that drive the provision of primary schools are not the same as those that drive the provision of secondary schools.

3.6.2 Alternative explanations

To interpret the results in terms of ethnic homogenization, two additional assumptions are needed. The first is that there was no direct impact of the land reform on the outcomes we consider. The second is that individuals on both sides of the border were selected similarly. We discuss these assumptions below. An additional channel would be that only one area has convenient access to polling stations (and thus the ability to support or sanction the government); however, as there are no differences in the number of polling stations (in absolute terms and per capita), we can rule this explanation out.

Direct impact of the land reform program. The land reform program could have affected the provision of schools either through income effects or through redistribution effects (inequality). Regarding the former, if the border is indeed random at the local level, we do not expect

61 Rhode and Strumpf (2003) argue that long-run trends in geographic segregation are inconsistent with Tiebout sorting models (Tiebout, 1956) where residential choice depends solely on local public goods.

62 However, public good provision is a costly strategy, and to the best of our knowledge there is no evidence of such targeting of public goods to buy votes. Harris and Posner (2019) argue, using Kenya as a case study, that development projects (including school-related projects) are allocated more according to need than as a reward to political supporters.


differences in soil quality. To the best of our knowledge, the resolution of the available soil quality data does not allow us to test whether soil quality varies at the border.63 We argue that the economic benefits of the reform were limited by the fact that the land had to be bought by the beneficiaries rather than given to them. As participation in the program was voluntary and the land had to be bought, we can assume that beneficiaries did not expect the benefits to be large. (Field sizes are smaller, but if the loans have not been fully repaid, the beneficiaries would have been able to invest more money in their farms.) Such a direct positive income effect would be more consistent with higher levels of public good provision in all types of program areas, and does not explain the null result we find. Similarly, a reduction in inequality would be associated with an increase in public good provision.

Finally, the results on field size suggest that households in program areas might be poorer than their counterparts. This could indicate a negative income effect that could counteract the (theoretically) positive effect of ethnic homogenization and hence explain our results. To explore this hypothesis, we conduct a mediation analysis (Table 3.8). The results on schools and polling stations remain insignificant when controlling for field size, which indicates that a negative income effect is not driving the null result we observe.
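The logic of this mediation check (re-estimating the treatment coefficient after adding field-size fixed effects) can be shown on a toy example in which field size fully accounts for the raw difference. This pure-Python sketch drops the RD location terms for brevity and uses invented data; it only illustrates the mechanics of adding categorical controls ("small" is the omitted category), not the chapter's specification. The helper `ols` is hypothetical.

```python
def ols(rows, ys):
    """OLS coefficients via the normal equations (Gaussian elimination with pivoting)."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(k)]
    for col in range(k):
        p = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[p] = a[p], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (a[r][k] - sum(a[r][c] * beta[c] for c in range(r + 1, k))) / a[r][r]
    return beta


# Invented cells: (treated, field size category). The outcome depends on field size only.
cells = [(1, "small"), (1, "medium"), (1, "large"), (1, "large"),
         (0, "small"), (0, "small"), (0, "medium"), (0, "large")]
outcome_by_size = {"small": 1.0, "medium": 2.0, "large": 3.0}
ys = [outcome_by_size[size] for _, size in cells]

# Without the mediator: y on [1, treated].
raw = ols([[1.0, float(t)] for t, _ in cells], ys)[1]

# With field-size fixed effects: y on [1, treated, medium, large].
adjusted = ols([[1.0, float(t), float(s == "medium"), float(s == "large")] for t, s in cells], ys)[1]

print(round(raw, 6), round(adjusted, 6))
```

Here the raw treatment coefficient is 0.5 while the adjusted one is zero; in Table 3.8 the analogous comparison concerns whether the RD Estimate changes once field size is held fixed.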

Selection of individuals/beneficiaries (income effects). We have little information on the individuals selected as program area beneficiaries, but evidence from Leo (1984, p. 82) suggests that the selection on income and ability was stronger in high income potential areas than in low income potential areas. We do find more students attending school in high income potential program areas than in the counterfactual areas. This is consistent with income effects stemming from the ability of beneficiaries (skills and capacity to make investments), which would also be consistent with the result on field size. These more skilled beneficiaries are likely to also have a higher demand for education, which could explain the higher number of students per capita in these areas.

An alternative explanation is that households that settled in the counterfactual areas were positively selected in terms of income and skills compared to their counterparts in the low income potential program areas. The intuition is that these households might have had to pay more upfront to buy land plots (or to convince a bank to lend them money, which might have required more credentials than inclusion in the government-run land redistribution program). However, these areas do not have more community schools than the program areas: this channel thus seems unlikely to have been at work.

63 Available data on crop suitability (FAO/IIASA, 2011) have a resolution of 5 km by 5 km in Kenya.


Table 3.8: Program areas vs. counterfactual – Mediation analysis: Field size and long run outcome measures

                       (1)            (2)            (3)            (4)            (5)
                       Nb primary     Nb secondary   Nb students    Nb students    Nb polling
                       schools        schools        primary        secondary      stations
Panel A
RD Estimate            -0.000877      -0.000273      -0.725         0.176          -0.00213
                       (0.00198)      (0.000956)     (0.818)        (0.219)        (0.00183)
Field size controls    Yes            Yes            Yes            Yes            Yes

Panel B (per capita)
RD Estimate            -0.00000318    -0.000000134   -0.000445      0.0000147      -0.00000118
                       (0.00000237)   (0.000000237)  (0.000646)     (0.0000466)    (0.000000876)
Field size controls    Yes            Yes            Yes            Yes            Yes

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the neighboring areas in the former European settler area (counterfactual). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence that the value is higher in the program areas than in the counterfactual areas.
Data sources (Panels A & B): Population data and building data (2015) are extracted from CIESIN (2016). The number of schools and the number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013).
Per capita measures (Panel B): Per capita estimates are computed using Voronoi polygons to estimate the population associated with each school and polling station.
Data sources (field size): Data on field size were extracted from FAO (2000).
The regressions include field size fixed effects (small, medium, large). Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
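The Voronoi-based per capita measures mentioned in the note rest on a simple idea: every location belongs to the Voronoi polygon of its nearest facility. A minimal sketch, assuming planar (projected) coordinates and a gridded population surface, assigns each population cell to its nearest school or polling station and sums population by facility; a per capita value is then a count (students, or 1 for the facility itself) divided by that catchment population. How facility-level values are aggregated to estimation cells is not spelled out in the note, so the sketch stops at the facility level. Function names are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math


def catchment_population(pop_cells, facilities):
    """Sum population by Voronoi polygon: each (x, y, pop) cell goes to its nearest
    facility (ties go to the facility listed first)."""
    totals = [0.0] * len(facilities)
    for x, y, pop in pop_cells:
        nearest = min(range(len(facilities)), key=lambda j: math.dist((x, y), facilities[j]))
        totals[nearest] += pop
    return totals


def per_capita(counts, totals):
    """Counts (e.g. students, or 1 per school) divided by catchment population."""
    return [c / p if p > 0 else float("nan") for c, p in zip(counts, totals)]


# Toy data (invented): two schools and four population cells.
schools = [(0.0, 0.0), (10.0, 0.0)]
pop_cells = [(1.0, 0.0, 100.0), (4.0, 0.0, 50.0), (6.0, 0.0, 200.0), (9.0, 1.0, 150.0)]
totals = catchment_population(pop_cells, schools)   # [150.0, 350.0]
students_pc = per_capita([300, 700], totals)         # [2.0, 2.0]
schools_pc = per_capita([1, 1], totals)
```

Assigning points to the nearest facility gives the same partition as intersecting the population grid with Voronoi polygons, without building the polygons explicitly.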

3.7 Extensions

3.7.1 Other comparisons

Motivation – European settler area boundary. We argued that the boundary between African Land Units and the European settler areas was plausibly random at the local level. Indeed, as Figure 3.1 shows, the borders of the African Land Units often followed straight lines. However, while adjacent, the two areas differed in many respects at the time the land reform was implemented. African Land Units were governed by customary land tenure. Africans could not expand agricultural holdings into the European settler areas; as a consequence, population growth led to very small landholdings. Moreover, the colonial government neglected these areas in the provision of public goods (Eynde vanden et al., 2018). Hence, a comparison between African Land Units and program areas would not allow for a causal interpretation of the effect of the program.64

Nevertheless, the comparison is interesting in its own right. The program areas extended the regional base of ethnic groups across the Reserves (Leo, 1984, p. 120-4), so the two areas became similar in ethnic dominance. Moreover, it has been argued that customary law spilled over to the program areas. It is conceivable that the situation in the African Land Units represents the equilibrium, and that the program areas converged to it over the following 50 years. We can test this hypothesis by applying the same methodology to the border between program areas and African Land Units. If the hypothesis holds, we should find pre-treatment differences that become smaller and disappear over time.

A natural cross-check is whether we find any differences between African Land Units and Scheduled Areas. This serves as an external validity check. In principle, this effect should equal the sum of the two effects above.65 However, because the boundaries under study differ from those used for the two other comparisons, this may not be the case.
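Written out with the sign conventions of footnote 65, and denoting by $\hat{\beta}_{A-B}$ the RD estimate of the difference between area A and area B at their common border (our notation, not the chapter's), the cross-check compares the direct estimate on the non-program border with a combination of the two other estimates. The approximation sign reflects the caveat above that the boundaries differ.

```latex
\hat{\beta}_{\mathrm{CF}-\mathrm{ALU}}
  \;\approx\; \underbrace{\left(-\hat{\beta}_{\mathrm{P}-\mathrm{CF}}\right)}_{\text{counterfactual} - \text{program area}}
  \;+\; \underbrace{\hat{\beta}_{\mathrm{P}-\mathrm{ALU}}}_{\text{program area} - \text{African Land Unit}}
```

The minus sign on the first term matters because the main results are reported as program area minus counterfactual.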

Program areas – African Land Units

Altitude and pre-treatment characteristics. As shown in Table A-3.6, there are no differences between program areas and African Land Units at the border when considering pre-treatment variables (columns 1 to 6). In 1962, there were no cities close to the border under study (results available from the authors). Altitude appears to be significant when we use the 5 km bandwidth (column 7) but is not significant when using a smaller, data-driven bandwidth (column 8), indicating that the difference in altitude does not come from a discontinuity in altitude at the border.

Effects of program – After 20 years. Table A-3.7 reports results for outcome variables measured in 1964 and in 1978. In 1964, the program areas were less likely to have, at the border,

64 Of course, if there were indeed no differences at the local level between African Land Units and European settler areas, we may interpret any difference observed thereafter as being caused by “being included in the European settler areas”. However, we would not be able to pin down the actual treatment.

65 (counterfactual - African Land Unit) = (counterfactual - program area) + (program area - African Land Unit). Note that the results reported in section 3.5.2 are (program area - counterfactual).


populated places, markets, and schools. This finding is consistent with the fact that the former African Land Units had been more populated, and populated for longer. The level of public good provision is likely to be on par with population (proxied by the number of populated places). In 1978, the only significant difference is in the number of government facilities, which was still higher in the former African Land Units than in the program areas. This finding is consistent with the idea that program areas catch up over time with the former African Land Units on variables associated with the length of settlement.

Effects of program – After 40 years/Contemporaneous outcomes. Table A-3.8 reports results for contemporaneous outcomes. We find that population density is lower in the program areas than in the former African Land Units, and that this difference is also found in the number of buildings (columns 1 and 2, Panel A). This is consistent with the fact that the African Land Units were more populated than the European settler area during the colonial period, and seem to have remained so. There are no differences in the number of primary and secondary schools, in the number of students, or in the number of polling stations (columns 4 to 7, Panel A). There are no differences when using per capita measures of schools, students, and polling stations (columns 4 to 7, Panel B). When considering field size (Panel C), the program areas have, on average, fewer small fields (column 3) and more large fields (column 5). Tea is more likely to be grown in the program areas; coffee is not grown in this part of the country. If we consider field size to be a proxy for farm size, then the difference in field size across the border indicates that the households living in the program areas might be, on average, better off than the households living in the former African Land Units. Additionally, given that about 70% of the cells are located in small fields, the program areas appear to be more unequal than the former African Land Units, as cells in the program areas are more likely to be located in large fields.

Border segment analysis: Income potential of program areas. We now consider two boundary categories: i) the boundary between low income potential program areas and neighboring African Land Units and ii) the boundary between high income potential program areas and neighboring African Land Units (Map A-3.1 depicts the income potential of program areas, as reported in the archival documents).

Panel A of Table A-3.9 presents the results for low income potential program areas. These areas are not different from the African Land Units, except that the population is lower, but the magnitude of the difference is small: 0.62 fewer people per cell corresponds to 10 fewer people per km2 (Panel A1, column 1). There are no differences when we consider field size. Panel B of Table A-3.9 presents the results for high income potential program areas. These areas are much less populated than the neighboring African Land Units, which indicates some persistence over time, as the high income potential program areas were meant to be low population density program areas: 2.75 fewer people per cell corresponds to 44 fewer people per km2. When considering field size (Panel B2), the high income potential program areas have, on average, fewer small fields (column 3) and more large fields (column 5), as well as more tea-growing fields.66 The average positive effect observed on the pooled sample is hence driven entirely by the difference between the high income potential areas and the neighboring African Land Units. This difference is likely to come from the African Land Units side (as the cells in the estimation sample of panel B are much more likely to be in a small field than the cells in the estimation sample of panel A). For both types of program areas, the provision of schools and polling stations seems on par with population levels.
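The per-cell and per-km2 figures in this paragraph imply a cell size that is not stated in this section. A back-of-the-envelope check, assuming the conversion is simply (difference per cell) × (cells per km2), shows that both pairs of numbers imply about 16 cells per km2, i.e. square cells of roughly 250 m by 250 m. This cell size is our inference from the reported figures, not a number given in the chapter; the helper names are hypothetical.

```python
import math


def cells_per_km2(per_cell, per_km2):
    """Cells per km2 implied by a per-cell and a per-km2 version of the same difference."""
    return per_km2 / per_cell


def cell_side_m(n_cells_per_km2):
    """Side length in metres of a square cell, given the number of cells per km2."""
    return 1000.0 / math.sqrt(n_cells_per_km2)


low = cells_per_km2(0.62, 10)    # low income potential program areas (Panel A)
high = cells_per_km2(2.75, 44)   # high income potential program areas (Panel B)
print(round(low, 1), round(high, 1), round(cell_side_m(16), 1))  # 16.1 16.0 250.0
```

Since the per-cell difference is a reduction in both panels, the per-km2 difference is a reduction as well.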

Interpretation. We cannot attribute the observed effects specifically to either the colonial period (1901-1963) or the post-colonial period. It is likely that the three areas considered (African Land Units, high income potential program areas, and low income potential program areas) started from different levels of population and public goods in 1963. However, the situation in the low income potential program areas seems to have converged to the one in the former African Land Units. The results on the high income potential program areas suggest that these places are better off than the former African Land Units. As these areas were selected, this success may be due to differences in the program (notably initial plot size and selection of beneficiaries) or in soil quality. The situation in these program areas has not converged to the one in the former African Land Units, but as we have little data on plot or farm size over time, we cannot know whether these program areas are in the process of converging to it, notably through plots of land being subdivided at the time of inheritance.

Non-program European settler areas – African Land Units

Results. As shown in Table A-3.10, when considering pre-treatment variables and altitude, the only difference across the boundary of the former Scheduled Area is the number of populated places, which was slightly higher in the former European settler area than in the African Land Units. In 1964, there were more populated places, more markets, and more schools in the African Land Units than in the former European settler area, the same result as the one observed when comparing the program areas to the African Land Units (Table A-3.11), which indicates that the whole former European area had fewer of these types of facilities than the former African Land Units. In 1978, the former European settler area still had fewer schools than

66 This effect is entirely driven by the high income potential program area in the Kisii region (South-West of Kenya).


the former African Land Units. When considering contemporaneous outcomes (Table A-3.12), we find that the non-program areas of the former European settler areas are still less populated and have fewer buildings (columns 1 and 2, Panel A), fewer schools and students, and fewer polling stations. These differences mostly stem from differences in population levels: the only difference that remains significant when considering per capita estimates is the number of polling stations per capita (column 7, Panel B). Cells within the former European settler area are less likely to be located in small and medium fields, and more likely to be located in large fields, which is what can be expected given that the African Land Units were overpopulated and that plots were smaller there.

Comparison with results from other border comparisons. We compare the estimated coefficients of Panel A (Table A-3.12) to the coefficients predicted from Panel A of Table 3.5 and Table A-3.8. The signs of the predicted and estimated coefficients on population and buildings are the same. The estimated coefficients on the number of secondary schools, the number of students for both types of schools, and the number of polling stations also have the same sign as the predicted coefficients. The estimated and predicted coefficients differ only for the number of primary schools. The results hence go in the same direction whether we use a simple difference (counterfactual vs African Land Units) or a double difference. This finding suggests that there are no large spillovers from program areas onto neighboring areas: if there were large spillovers, the double-difference results (spillovers included) would differ from the simple-difference results (no spillovers).
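The sign comparison can be sketched as follows, applying the identity of footnote 65: the predicted counterfactual-minus-ALU coefficient is the program-minus-ALU estimate (Table A-3.8) minus the program-minus-counterfactual estimate (Table 3.5), and its sign is compared with the direct estimate (Table A-3.12). The function names are hypothetical and the numbers in the usage line are placeholders, not values from the tables.

```python
def sign(v):
    """Return -1, 0 or 1."""
    return (v > 0) - (v < 0)


def predicted_cf_minus_alu(beta_p_minus_cf, beta_p_minus_alu):
    """(CF - ALU) = (CF - P) + (P - ALU) = -beta(P - CF) + beta(P - ALU)."""
    return -beta_p_minus_cf + beta_p_minus_alu


def compare_signs(estimates):
    """estimates: {outcome: (beta_p_minus_cf, beta_p_minus_alu, beta_cf_minus_alu)}.
    Returns {outcome: True if predicted and estimated coefficients share a sign}."""
    return {
        outcome: sign(predicted_cf_minus_alu(p_cf, p_alu)) == sign(cf_alu)
        for outcome, (p_cf, p_alu, cf_alu) in estimates.items()
    }


# Placeholder numbers only (not taken from Tables 3.5, A-3.8 or A-3.12).
print(compare_signs({"outcome_1": (0.06, -1.2, -1.0), "outcome_2": (0.002, -0.001, 0.004)}))
# {'outcome_1': True, 'outcome_2': False}
```

Comparing signs rather than magnitudes is a deliberately loose test, in line with the caveat that the three comparisons use different boundary stretches.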

3.7.2 Additional robustness tests

We test whether our main results are sensitive to bandwidth selection (and to the associated specification error) by reporting regression results that follow the misspecification error correction procedure and data-driven bandwidth selection of Calonico et al. (2017). Table A-3.13 reports the results for the main outcomes. The results on population and number of buildings carry through, and no difference is found for primary or secondary school outcomes.
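The standard errors reported throughout rely on a heteroskedasticity-robust nearest neighbor variance estimator, the default variance option in the rdrobust software accompanying Calonico et al. Its key ingredient is a nearest-neighbor residual for each observation, computed separately on each side of the cutoff: the outcome minus the mean outcome of its J closest neighbors in the running variable, scaled by sqrt(J/(J+1)). The simplified pure-Python sketch below breaks distance ties by input order, which is our simplification; J = 3 mirrors the software's usual default. The sandwich formula into which the squared residuals enter is not shown, and `nn_residuals` is a hypothetical name.

```python
import math


def nn_residuals(x, y, J=3):
    """Nearest-neighbour residuals e_i = sqrt(J / (J + 1)) * (y_i - mean of y over the
    J observations whose running variable is closest to x_i). Use on one side of the cutoff."""
    n = len(x)
    if n <= J:
        raise ValueError("need more than J observations")
    residuals = []
    for i in range(n):
        # (distance, index) pairs; sorting breaks distance ties by index (a simplification).
        neighbours = sorted((abs(x[j] - x[i]), j) for j in range(n) if j != i)[:J]
        mean_nn = sum(y[j] for _, j in neighbours) / J
        residuals.append(math.sqrt(J / (J + 1)) * (y[i] - mean_nn))
    return residuals


# The squared residuals stand in for the conditional variance of each observation.
e = nn_residuals([0.5, 1.0, 2.0, 3.5, 4.0], [1.0, 1.4, 2.1, 3.0, 3.2], J=3)
sigma2 = [ei ** 2 for ei in e]
```

Matching on the running variable avoids estimating a regression function just to obtain residuals, which is why this estimator is robust to heteroskedasticity.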

3.8 Conclusion

Using a spatial regression discontinuity design, we study the short-run and long-run effects of a land reform program that aimed to extend Kenya’s ethnic enclaves. We find a strong discontinuity in ethnic diversity but no differences in school provision, in either absolute or per capita terms, between program areas and counterfactual areas. The land reform had a lasting impact on field size, which we interpret as indicative of income and inequality effects. These effects could potentially cancel out or reinforce any effect of ethnic homogenization; a mediation analysis, however, suggests that this is unlikely. Moreover, we studied two other borders: i) the border between the program areas and African Land Units and ii) the border between non-program European settler areas and African Land Units. The results from the first additional comparison suggest that the low income potential program areas tend to converge to the situation in the African Land Units. Comparing double-difference results with the results from the non-program border suggests that spatial spillovers are unlikely.

The results of the paper point to several next steps. The difference in school sponsors between program and counterfactual areas indicates that there might be differences that our current measure (number of schools) does not capture. Program areas might have financed more community-based religious schools; as a result, the central government and NGOs might have stepped in to provide schools in counterfactual areas. There might also be differences in the number of schools in the 1980s and 1990s that we are not capturing (georeferenced data on schools are available only for 1964, 1978, and 2007, with only the latter being a census). Hence, we plan to collect earlier school censuses that would allow us to study these hypotheses. We also did not study school quality across the border. The available 2007 census data contain some information regarding quality (e.g. available facilities, student to staff ratio) that we plan to use to test whether sponsors indeed correlate with school quality.

Investigating further the timing of the settlement of new communities in the former European settler areas could help extend the discussion on sorting.

We will also do more to strengthen our claim of a quasi-random border. From archival documents held by the British National Archives, we digitized the plot boundaries as they were before the land reform was implemented.67 Using these maps, we will provide information on the plots that were included in the program areas and those that were not (plot size, access to a road, date of inclusion in the program). We expect these findings to confirm that the border of the program areas is as good as random at the local level.

Our results may be specific to schooling and may not carry over to other types of public goods. We therefore plan to include other georeferenced measures of public good provision (e.g. roads, wells). To further strengthen the interpretation that a lack of dense social networks on both sides of the border may explain our result, we may use information from the questions asked in the various Afrobarometer surveys.

67 FCO141-6917 contains maps of the areas around Kitale, Kisii, Ol’Kalou, and Machakos. FCO141-6927 and FCO141-19109 cover the Nyarandua schemes. Plot boundaries are only available for farms in the European settler areas.


Another possible extension would be to use electoral results to test whether program areas vote more for the incumbent party (constituencies are larger than program areas, but we could run the same type of analysis as for ethnic diversity). Future versions of this paper should discuss alternatives to ethnolinguistic fractionalization (polarization measures, or the Politically Relevant Ethnic Groups (PREG) measure of Posner (2004)).
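For reference, the fractionalization index used in the chapter and one common polarization alternative, the Reynal-Querol index RQ = 4 Σ s_k² (1 − s_k), can both be computed from group shares. Which polarization measure future versions would adopt is not specified, so this is only a sketch. Posner's PREG measure applies the fractionalization formula to politically relevant groups only, so it would reuse `elf` on a different list of groups. Function names are hypothetical.

```python
def _shares(counts):
    total = float(sum(counts))
    return [c / total for c in counts]


def elf(counts):
    """Ethnolinguistic fractionalization: 1 - sum of squared group shares, i.e. the probability
    that two individuals drawn at random (with replacement) belong to different groups."""
    return 1.0 - sum(s ** 2 for s in _shares(counts))


def rq_polarization(counts):
    """Reynal-Querol polarization: 4 * sum s_k^2 (1 - s_k); maximal (1) with two equal groups."""
    return 4.0 * sum(s ** 2 * (1.0 - s) for s in _shares(counts))


for groups in ([100], [50, 50], [40, 30, 30], [25, 25, 25, 25]):
    print(groups, round(elf(groups), 3), round(rq_polarization(groups), 3))
```

The two indices diverge as the number of groups grows: fractionalization keeps rising with more equal-sized groups, whereas polarization peaks with two groups of equal size.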


Figure A-3.1: Map: Boundaries studied.

Appendix

Table A-3.1: Schools in Kenya

Sponsor (% of schools)               Primary schools   Secondary schools
Central gvt                          32.90             22.26
Local gvt                            2.58              1.91
Community                            1.93              5.90
Private individual                   18.52             16.85
Religious organisation (public)      5.07              4.80
Religious organisation (private)     33.52             44.22
Private ind. org. / NGO              5.18              3.32
Unknown/others                       0.29              0.74
Number of schools                    6118              1626

All schools in Kenya.


Figure A-3.2: Ethnic fractionalization in Kenya (censuses)

(a) Census 1962: Ethnolinguistic fractionalization, only “Africans”, location-level data.

(b) Census 1989: Ethnolinguistic fractionalization, only Kenyan nationals, sublocation-level data.


Table A-3.2: Field size in low and high income potential program areas

                               (1)                 (2)                 (3)
                               Low income          High income         Difference
                               potential           potential           t-test (a)
                               program areas       program areas
Share of cells located in
  small fields                 0.53                0.45                -0.08***
  medium fields                0.38                0.38                -0.01
  large fields                 0.07                0.16                0.09***
Number of cells                20555               6430                26985

Cells located in program areas, within 5 km of the boundaries under study (boundaries to counterfactual areas and to former African Land Units).
(a) Column (3) reports the results of a t-test of differences in means. Significance levels are denoted as follows: + p<0.15, * p<0.10, ** p<0.05, *** p<0.01.
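Because each row of Table A-3.2 is the mean of a 0/1 indicator, the column (3) test can be approximated from the reported shares and cell counts alone. The sketch below uses the unequal-variance (Welch) form of the t statistic, with p(1 − p) as the variance of an indicator. Since the shares are rounded to two decimals, the statistics are only indicative and need not match the authors' computation; `diff_in_shares_t` is a hypothetical name.

```python
import math


def diff_in_shares_t(p_low, n_low, p_high, n_high):
    """Welch-type t statistic for p_high - p_low, each p being the mean of a 0/1 indicator."""
    se = math.sqrt(p_low * (1 - p_low) / n_low + p_high * (1 - p_high) / n_high)
    return (p_high - p_low) / se


n_low, n_high = 20555, 6430  # number of cells in each group (Table A-3.2)
for label, p_low, p_high in [("small", 0.53, 0.45), ("large", 0.07, 0.16)]:
    print(label, round(diff_in_shares_t(p_low, n_low, p_high, n_high), 1))
```

With more than 6,000 cells in each group, differences of eight or nine percentage points translate into t statistics well above 10, consistent with the *** reported in the table.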


Table A-3.3: Program areas vs. counterfactual – Long run outcome measures (Ethnicity boundary FE controls)

                       (1)            (2)            (3)            (4)            (5)            (6)            (7)
Panel A
Outcome                Population     Buildings      Nb primary     Nb secondary   Nb students    Nb students    Nb polling
                       2015           2015           schools        schools        primary        secondary      stations
RD Estimate            0.0647         0.0369***      -0.000152      -0.000442      -0.444         0.206          -0.00162
                       (0.0612)       (0.00748)      (0.00199)      (0.000949)     (0.827)        (0.223)        (0.00186)
Ethnicity FE (a)       Yes            Yes            Yes            Yes            Yes            Yes            Yes

Panel B (per capita)
Outcome                Nb primary     Nb secondary   Nb students    Nb students    Nb polling
                       schools        schools        primary        secondary      stations
RD Estimate            -0.00000257    -0.000000197   -0.000290      0.0000183      -0.000000890
                       (0.00000243)   (0.000000234)  (0.000652)     (0.0000474)    (0.000000885)
Ethnicity FE (a)       Yes            Yes            Yes            Yes            Yes

Panel C (field size and crops)
Outcome                Cell in        Non-agri       Small field    Medium field   Large field    Tea            Coffee
                       field          field          (<2 ha)        (2-5 ha)       (≥ 5 ha)
RD Estimate            0.00580**      -0.00725***    0.113***       0.0809***      -0.181***      -0.0198***     0.00507***
                       (0.00252)      (0.000777)     (0.00738)      (0.00639)      (0.00569)      (0.00222)      (0.00187)
Ethnicity FE (a)       Yes            Yes            Yes            Yes            Yes            Yes            Yes

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the neighboring areas in the former European settler area (counterfactual). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence that the value is higher in the program areas than in the counterfactual areas.
Data sources (Panels A & B): Population data and building data (2015) are extracted from CIESIN (2016). The number of schools and the number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013).
Per capita measures (Panel B): Per capita estimates are computed using Voronoi polygons to estimate the population associated with each school and polling station.
Data sources (Panel C): Data on field size and crops cultivated were extracted from FAO (2000).
(a) The boundary lines are divided into 10 meter long segments. Each boundary segment is then assigned to the ethnic group that is the largest in the location the segment is located in, using the ethnicity data from the 1989 census.
The regressions include ethnicity fixed effects. Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.


Table A-3.4: Program areas vs. counterfactual – Border segment analysis (ethnic majority)

(1) (2) (3) (4) (5) (6) (7)

Kikuyu border segments | Population (Nb in cell) | Buildings (Nb in cell) | Primary schools (Nb per capita) | Secondary schools (Nb per capita) | Students, primary (Nb per capita) | Students, secondary (Nb per capita) | Polling stations (Nb per capita)

RD Estimate | -0.203* | -0.0514*** | -0.00000839 | -0.000000648 | -0.00203* | -0.000136 | 0.000000119
(s.e.) | (0.117) | (0.0170) | (0.00000648) | (0.000000709) | (0.00116) | (0.000166) | (0.00000135)


Kikuyu border segments | Cell in field | Non-agri field | Small field (<2 ha) | Medium field (2-5 ha) | Large field (≥ 5 ha) | Tea | Coffee
RD Estimate | 0.0477*** | -0.0187*** | 0.140*** | -0.0377*** | -0.0356*** | 0.00182*** | 0.0197***
(s.e.) | (0.00922) | (0.00224) | (0.0130) | (0.00936) | (0.00813) | (0.000299) | (0.00541)


Kalenjin border segments | Population (Nb in cell) | Buildings (Nb in cell) | Primary schools (Nb per capita) | Secondary schools (Nb per capita) | Students, primary (Nb per capita) | Students, secondary (Nb per capita) | Polling stations (Nb per capita)

RD Estimate | 0.432*** | 0.0636*** | -0.000000711 | -0.000000274 | -0.000448 | 0.0000573 | -0.00000223
(s.e.) | (0.0965) | (0.0110) | (0.00000374) | (0.000000320) | (0.00100) | (0.0000526) | (0.00000161)


Kalenjin border segments | Cell in field | Non-agri field | Small field (<2 ha) | Medium field (2-5 ha) | Large field (≥ 5 ha) | Tea | Coffee
RD Estimate | -0.000430 | 0 | 0.00704 | 0.103*** | -0.111*** | 0.0000507 | 0
(s.e.) | (0.00210) | (.) | (0.0145) | (0.0135) | (0.0138) | (0.000706) | (.)

Mean DV | 0.993 | 0 | 0.314 | 0.374 | 0.305 | 0.000129 | 0
s.d. DV | 0.0818 | 0 | 0.434 | 0.467 | 0.434 | 0.00985 | 0
Polynomial | 1 | 1 | 1 | 1 | 1 | 1 | 1
Bandwidth | 5 | 5 | 5 | 5 | 5 | 5 | 5
Eff. N | 13366 | 13366 | 13366 | 13366 | 13366 | 13366 | 13366

Luhya border segments | Population (Nb in cell) | Buildings (Nb in cell) | Primary schools (Nb per capita) | Secondary schools (Nb per capita) | Students, primary (Nb per capita) | Students, secondary (Nb per capita) | Polling stations (Nb per capita)

RD Estimate | -0.969*** | -0.0380*** | 0.00000171 | 0.000000103 | 0.00165 | 0.0000522 | 0.00000134
(s.e.) | (0.0967) | (0.0136) | (0.00000303) | (0.000000237) | (0.00160) | (0.0000598) | (0.00000126)


Luhya border segments | Cell in field | Non-agri field | Small field (<2 ha) | Medium field (2-5 ha) | Large field (≥ 5 ha) | Tea | Coffee
RD Estimate | -0.00631*** | 0.0000130* | 0.120*** | 0.116*** | -0.242*** | -0.0204*** | 0
(s.e.) | (0.00180) | (0.00000781) | (0.0111) | (0.00908) | (0.00831) | (0.00387) | (.)


This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the neighboring areas in the former European settler area (counterfactual). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence the variable is higher in the program areas than in the counterfactual areas.
Data sources: Population data and building data (2015) are extracted from CIESIN (2016). The number of schools and the number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013). Per capita measures are computed using Voronoi polygons to estimate the population associated with each school and polling station.
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
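The Voronoi-based per capita measures mentioned in the notes can be sketched as follows. This is a hedged illustration, not the thesis's processing code. It relies on one fact: a point lies in a facility's Voronoi polygon exactly when that facility is its nearest one (Euclidean distance), so catchment populations can be obtained without building polygons explicitly. Several choices are assumptions: population is represented as weighted points (for instance, settlement-layer pixels), coordinates are projected, and the function names are hypothetical.

```python
import numpy as np


def catchment_population(pop_xy, pop_counts, facility_xy):
    """Population inside each facility's Voronoi polygon.

    Each population point is assigned to its nearest facility, which is
    equivalent to finding the Voronoi polygon that contains it.
    Coordinates are assumed to be projected (e.g. meters).
    """
    pop_xy = np.asarray(pop_xy, dtype=float)
    facility_xy = np.asarray(facility_xy, dtype=float)
    # Squared distances between every population point and every facility.
    d2 = ((pop_xy[:, None, :] - facility_xy[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # ties go to the lower-indexed facility
    return np.bincount(nearest, weights=np.asarray(pop_counts, dtype=float),
                       minlength=len(facility_xy))


def per_capita(values, catchment_pop):
    """values / catchment population, NaN where the catchment is empty."""
    values = np.asarray(values, dtype=float)
    catchment_pop = np.asarray(catchment_pop, dtype=float)
    out = np.full(catchment_pop.shape, np.nan)
    mask = catchment_pop > 0
    out[mask] = values[mask] / catchment_pop[mask]
    return out


# Toy example: four population points and two schools.
pop = catchment_population([(0, 0), (1, 0), (9, 0), (10, 0)],
                           [100, 200, 300, 400],
                           [(0, 0), (10, 0)])
students_pc = per_capita([60, 70], pop)
```

The brute-force distance matrix is fine for a sketch. With millions of pixels, a spatial index such as a k-d tree would be the natural substitute. This sketch does not show how facility-level ratios are then mapped back to grid cells.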

Appendix 149

Table A-3.5: Descriptive statistics: Population, schools, and students by ethnic majority

| (1) | (2) | (3) | (4) | (5) | (6)
| Kikuyu program areas | Kalenjin program areas | Difference Kalenjin-Kikuyu^a | Luhya program areas | Difference Luhya-Kikuyu^a | Difference Luhya-Kalenjin^a

Average population (by cell) | 3.8016485 | 4.7233672 | 0.9217188*** | 5.3424799 | 1.5408314*** | 0.6191126***
Avg nb of primary schools (per capita) | 0.0000126 | 0.0000130 | 0.0000004 | 0.0000098 | -0.0000028 | -0.0000032*
Avg nb of primary school students (per capita) | 0.0029468 | 0.0030429 | 0.0000961 | 0.0040349 | 0.0010881 | 0.0009920
Avg nb of secondary schools (per capita) | 0.0000012 | 0.0000005 | -0.0000007** | 0.0000006 | -0.0000006** | 0.0000001
Avg nb of secondary school students (per capita) | 0.0002034 | 0.0000734 | -0.0001300** | 0.0000885 | -0.0001149* | 0.0000151

Number of cells | 5240 | 9776 | 15016 | 6463 | 11703 | 16239

The sample includes all cells located in a program area and close to a border segment coded as Kikuyu, Kalenjin, or Luhya. (The boundary lines are divided into 10-meter-long segments. Each boundary segment is then assigned to the ethnic group that is the largest in the location where the segment lies, using the ethnicity data from the 1989 census.)
^a Columns (3), (5), and (6) report the results of a t-test of differences in means. Significance levels are denoted as follows: + p<0.15, * p<0.10, ** p<0.05, *** p<0.01.
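The difference-in-means tests in columns (3), (5), and (6) can be sketched as below. The note does not say which t-test variant was used. Welch's unequal-variance test is an assumption here, as is the normal approximation to the p-value, which is reasonable with several thousand cells per group. The star thresholds are those stated in the note.

```python
import math
from statistics import mean, variance


def welch_diff(a, b):
    """Difference in means (b - a), Welch standard error, t statistic and df."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors of each mean
    se = math.sqrt(va + vb)
    diff = mean(b) - mean(a)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return diff, se, diff / se, df


def p_value_normal(t):
    """Two-sided p-value from the normal approximation (large samples)."""
    return math.erfc(abs(t) / math.sqrt(2))


def stars(p):
    """Significance markers as defined in the note to Table A-3.5."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    if p < 0.15:
        return "+"
    return ""
```

For instance, passing the Kikuyu cells as `a` and the Kalenjin cells as `b` would give the Kalenjin-Kikuyu difference reported in column (3).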

Table A-3.6: Program areas vs. African Land Units – Altitude & pre-treatment characteristics



RD Estimate | -0.000289 | 0 | 0 | 0 | 0 | 0 | -22.03*** | 1.065
(s.e.) | (0.000570) | (.) | (.) | (.) | (.) | (.) | (7.065) | (12.10)


This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the former African Land Units (control). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence the variable is higher in the program areas than in the former African Land Units.
Data sources: Data on altitude are extracted from Regional Centre For Mapping Resource For Development (2020). Other variables were obtained from the Kenya Gazetteers (U.S. Board on Geographic Names, 2018).
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
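The heteroskedasticity-robust nearest neighbor variance estimator named in these notes can be sketched in two steps. This is a simplified illustration, not the estimation code behind the tables. It follows the construction documented for rdrobust's `vce(nn)` option (Calonico et al., 2017) as I understand it. Each squared residual is approximated by comparing an observation's outcome with the mean outcome of its J closest neighbors in the running variable, scaled by J/(J+1), and these terms feed a sandwich formula for the weighted least squares coefficients. Several choices are assumptions: J = 3 as the default, neighbors taken on one side of the cutoff at a time, ties broken arbitrarily, and no clustering. The function names are hypothetical.

```python
import numpy as np


def nn_squared_residuals(x, y, J=3):
    """Nearest-neighbor estimate of each observation's squared residual.

    Compares y_i with the mean of the J observations closest in x
    (excluding i itself) and scales by J / (J + 1).
    Apply separately to observations on each side of the cutoff.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    for i in range(len(x)):
        d = np.abs(x - x[i])
        d[i] = np.inf  # exclude the observation itself
        nbrs = np.argsort(d, kind="stable")[:J]
        out[i] = J / (J + 1) * (y[i] - y[nbrs].mean()) ** 2
    return out


def sandwich_variance(X, w, sigma2):
    """Robust variance of WLS coefficients: (X'WX)^-1 X'W S W X (X'WX)^-1."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    bread = np.linalg.inv(X.T @ (w[:, None] * X))
    meat = X.T @ ((w ** 2 * sigma2)[:, None] * X)
    return bread @ meat @ bread
```

Unlike plug-in regression residuals, the nearest-neighbor terms do not depend on the fitted local polynomial. This is one reason the approach is attractive when the bandwidth is small.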

Table A-3.7: Program areas vs. African Land Units – Short run outcomes



RD Estimate | -0.000236*** | -0.00109 | -0.000876** | 0 | -0.00249* | 0
(s.e.) | (0.0000782) | (0.000852) | (0.000391) | (.) | (0.00144) | (.)



RD Estimate | 0 | -0.00122 | -0.000847** | -0.000125 | -0.00225 | 0
(s.e.) | (.) | (0.000874) | (0.000395) | (0.000188) | (0.00143) | (.)

Mean DV | 0.000164 | 0.000986 | 0.000699 | 0.0000411 | 0.00275 | 0
s.d. DV | 0.0128 | 0.0314 | 0.0279 | 0.00641 | 0.0547 | 0
Polynomial | 1 | 1 | 1 | 1 | 1 | 1
Bandwidth | 5 | 5 | 5 | 5 | 5 | 5
Eff. N | 9720 | 9720 | 9720 | 9720 | 9720 | 9720

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the former African Land Units (control). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence the variable is higher in the program areas than in the former African Land Units.
Data source: All variables were obtained from the Kenya Gazetteers (U.S. Board on Geographic Names, 2018).
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
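The specification summarized in these notes can be sketched as a weighted least squares regression. The footer rows give Polynomial 1 and Bandwidth 5, and the notes add a third-order polynomial in the latitude and longitude of cell centroids. Several details in the sketch are assumptions rather than documented choices: the triangular kernel (rdrobust's default), the sign convention that distance is non-negative inside program areas, the inclusion of all cross terms in the location polynomial, and the function names. The bandwidth is in whatever unit distance is measured in.

```python
import numpy as np


def latlon_cubic(lat, lon):
    """All monomials lat**a * lon**b with 1 <= a + b <= 3 (nine columns)."""
    return np.column_stack([lat ** a * lon ** b
                            for a in range(4) for b in range(4) if 1 <= a + b <= 3])


def rd_estimate(dist, y, lat, lon, bandwidth=5.0):
    """Local linear RD estimate of the jump in y at dist = 0.

    dist >= 0 is taken to mean 'inside a program area' (assumed convention).
    Observations with |dist| < bandwidth get triangular kernel weights, and
    the slope in dist may differ on each side of the boundary.
    """
    dist, y, lat, lon = (np.asarray(v, dtype=float) for v in (dist, y, lat, lon))
    keep = np.abs(dist) < bandwidth
    d = dist[keep]
    treat = (d >= 0).astype(float)
    X = np.column_stack([np.ones_like(d), treat, d, treat * d,
                         latlon_cubic(lat[keep], lon[keep])])
    w = 1.0 - np.abs(d) / bandwidth  # triangular kernel weights
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y[keep] * sw, rcond=None)
    return coef[1]  # coefficient on the treatment dummy = RD estimate


# Synthetic check: a jump of 2 at the boundary, smooth in distance and location.
rng = np.random.default_rng(0)
n = 2000
dist = rng.uniform(-10, 10, n)
lat, lon = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
y = 1 + 2 * (dist >= 0) + 0.3 * dist + 0.5 * lat - 0.2 * lon ** 2
est = rd_estimate(dist, y, lat, lon)
```

The synthetic outcome has no noise and lies inside the model's span, so the estimate recovers the true jump of 2 up to rounding. Standard errors would come from a separate variance step, such as the nearest neighbor estimator named in the notes.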


Table A-3.8: Program areas vs. African Land Units – Long run outcome measures


Unit | Nb in cell | Nb in cell | Nb in cell | Nb in cell | Nb in cell | Nb in cell | Nb in cell
RD Estimate | -0.848*** | -0.160*** | 0.00142 | -0.000250 | -2.970 | -1.539 | -0.00332
(s.e.) | (0.276) | (0.0322) | (0.0161) | (0.00797) | (4.850) | (0.985) | (0.00879)



RD Estimate | 0.00000456 | 0.00000101 | -0.00207 | -0.000394 | -0.00000237
(s.e.) | (0.0000123) | (0.00000228) | (0.00265) | (0.000275) | (0.00000231)



RD Estimate | 0.00772 | 0 | -0.161*** | 0.0395 | 0.130*** | 0.170*** | 0
(s.e.) | (0.00818) | (.) | (0.0358) | (0.0275) | (0.0258) | (0.0232) | (.)


This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the former African Land Units (control). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence the variable is higher in the program areas than in the former African Land Units.
Data sources (Panels A & B): Population data and building data (2015) are extracted from CIESIN (2016). The number of schools and the number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013).
Per capita measures (Panel B): Per capita measures are computed using Voronoi polygons to estimate the population associated with each school and polling station.
Data sources (Panel C): Data on field size and crops cultivated were extracted from FAO (2000).
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.


Table A-3.9: Program areas vs. African Land Units – Border segment analysis (income potential)

(1) (2) (3) (4) (5) (6) (7)

Panel A: Low potential income < 100 £/year


RD Estimate | -0.620*** | -0.0237 | 0.00000319 | 0.00000327 | -0.00311 | -0.0000588 | -0.00000380
(s.e.) | (0.228) | (0.0312) | (0.0000148) | (0.00000362) | (0.00262) | (0.000220) | (0.00000293)



RD Estimate | 0.00698 | 0 | 0.00518 | 0.0248 | -0.0230 | -0.0101 | 0
(s.e.) | (0.0117) | (.) | (0.0442) | (0.0387) | (0.0301) | (0.0126) | (.)


Panel B: High potential income ≥ 100 £/year


RD Estimate | -2.752*** | -0.391*** | -0.00000203 | -0.00000331* | -0.000303 | -0.00166 | 0.00000335
(s.e.) | (0.745) | (0.0605) | (0.0000200) | (0.00000174) | (0.00704) | (0.00130) | (0.00000393)



RD Estimate | 4.77e-14*** | 0 | -0.411*** | -0.0126 | 0.423*** | 0.457*** | 0
(s.e.) | (1.27e-14) | (.) | (0.0564) | (0.0181) | (0.0521) | (0.0593) | (.)

Mean DV | 0.989 | 0 | 0.846 | 0.0724 | 0.0704 | 0.126 | 0
s.d. DV | 0.105 | 0 | 0.345 | 0.239 | 0.245 | 0.300 | 0
Polynomial | 1 | 1 | 1 | 1 | 1 | 1 | 1
Bandwidth | 5 | 5 | 5 | 5 | 5 | 5 | 5
Eff. N | 2817 | 2817 | 2817 | 2817 | 2817 | 2817 | 2817

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the former African Land Units (control). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence the variable is higher in the program areas than in the former African Land Units.
Data sources: Population data and building data (2015) are extracted from CIESIN (2016). The number of schools and the number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013). Per capita measures are computed using Voronoi polygons to estimate the population associated with each school and polling station.
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.


Table A-3.10: Non-program European settler areas vs. African Land Units – Altitude & pre-treatment characteristics



RD Estimate | 0.000543* | 0.000257 | 0 | 0 | 0 | 0 | -2.936 | -2.765
(s.e.) | (0.000289) | (0.000208) | (.) | (.) | (.) | (.) | (3.507) | (6.192)


This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the non-program European settler area (counterfactual) and the former African Land Units (control). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence the variable is higher in the non-program European settler area than in the former African Land Units.
Data sources: Data on altitude are extracted from Regional Centre For Mapping Resource For Development (2020). Other variables were obtained from the Kenya Gazetteers (U.S. Board on Geographic Names, 2018).
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.

Table A-3.11: Non-program European settler areas vs. African Land Units – Short run outcomes



RD Estimate | -0.00177* | -0.0000207 | -0.000580* | -0.0000240 | -0.000932* | 0.0000607
(s.e.) | (0.000949) | (0.000353) | (0.000318) | (0.0000215) | (0.000518) | (0.000152)

Mean DV | 0.00130 | 0.000395 | 0.000325 | 0.00000692 | 0.00109 | 0.0000208
s.d. DV | 0.0452 | 0.0205 | 0.0184 | 0.00263 | 0.0347 | 0.00456
Polynomial | 1 | 1 | 1 | 1 | 1 | 1
Bandwidth | 5 | 5 | 5 | 5 | 5 | 5
Eff. N | 86013 | 86013 | 86013 | 86013 | 86013 | 86013


RD Estimate | -0.00134 | -0.000463 | -0.000509 | -0.0000240 | -0.00155** | 0.0000607
(s.e.) | (0.000933) | (0.000430) | (0.000358) | (0.0000215) | (0.000634) | (0.000152)

Mean DV | 0.00136 | 0.000575 | 0.000395 | 0.00000692 | 0.00135 | 0.0000208
s.d. DV | 0.0460 | 0.0248 | 0.0199 | 0.00263 | 0.0398 | 0.00456
Polynomial | 1 | 1 | 1 | 1 | 1 | 1
Bandwidth | 5 | 5 | 5 | 5 | 5 | 5
Eff. N | 86013 | 86013 | 86013 | 86013 | 86013 | 86013

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the non-program European settler area (counterfactual) and the former African Land Units (control). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence the variable is higher in the non-program European settler area than in the former African Land Units.
Data source: All variables were obtained from the Kenya Gazetteers (U.S. Board on Geographic Names, 2018).
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.


Table A-3.12: Non-program European settler area vs. African Land Units – Long run outcome measures


Unit | Nb in cell | Nb in cell | Nb in cell | Nb in cell | Nb in cell | Nb in cell | Nb in cell
RD Estimate | -2.111*** | -0.160*** | -0.00513** | -0.00210* | -1.714* | -0.367 | -0.00455**
(s.e.) | (0.0708) | (0.00688) | (0.00241) | (0.00117) | (1.030) | (0.313) | (0.00206)



RD Estimate | -0.00000106 | 4.40e-08 | 0.000778 | 0.0000144 | -0.00000282**
(s.e.) | (0.00000340) | (0.000000280) | (0.00134) | (0.0000721) | (0.00000119)



RD Estimate | -0.0331*** | 0.00154* | -0.203*** | -0.00874* | 0.177*** | 0.0149*** | -0.0355***
(s.e.) | (0.00419) | (0.000825) | (0.00661) | (0.00450) | (0.00436) | (0.00390) | (0.00334)


This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the non-program European settler area (counterfactual) and the former African Land Units (control). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence the variable is higher in the non-program European settler area than in the former African Land Units.
Data sources (Panels A & B): Population data and building data (2015) are extracted from CIESIN (2016). The number of schools and the number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013).
Per capita measures (Panel B): Per capita measures are computed using Voronoi polygons to estimate the population associated with each school and polling station.
Data sources (Panel C): Data on field size and crops cultivated were extracted from FAO (2000).
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.


Table A-3.13: Robustness – Flexible bandwidth selection, correction of standard errors

| (1) | (2) | (3) | (4) | (5) | (6) | (7)
Outcome | Population | Buildings | Primary schools | Secondary schools | Students (primary) | Students (secondary) | Polling stations
Unit | Nb in cell | Nb in cell | Nb in cell | Nb in cell | Nb in cell | Nb in cell | Nb in cell

Panel A: Border schemes vs. other parts of the European settler area

Conventional | 0.273** | 0.0584*** | 0.00120 | -0.000245 | -0.0700 | 0.0492 | 0.000608
(s.e.) | (0.112) | (0.0142) | (0.00309) | (0.00175) | (1.299) | (0.423) | (0.00283)
Bias-corrected | 0.316*** | 0.0645*** | 0.00140 | -0.000337 | 0.0582 | -0.0565 | 0.00140
(s.e.) | (0.112) | (0.0142) | (0.00309) | (0.00175) | (1.299) | (0.423) | (0.00283)
Robust | 0.316** | 0.0645*** | 0.00140 | -0.000337 | 0.0582 | -0.0565 | 0.00140
(s.e.) | (0.132) | (0.0164) | (0.00366) | (0.00208) | (1.549) | (0.503) | (0.00329)

Mean DV | 4.163 | 0.670 | 0.0138 | 0.00341 | 4.540 | 0.599 | 0.00723
s.d. DV | 3.905 | 0.470 | 0.127 | 0.0619 | 54.81 | 15.29 | 0.116
Polynomial | 1 | 1 | 1 | 1 | 1 | 1 | 1
Bandwidth | 1.708 | 1.543 | 1.792 | 1.670 | 1.747 | 1.382 | 1.808
Eff. N | 26676 | 24217 | 27905 | 26105 | 27265 | 21707 | 28087

Panel B: Border schemes vs. African Land Units

Conventional | -0.683** | -0.139*** | 0.0246 | 0.000387 | 9.853 | -0.205 | -0.00617
(s.e.) | (0.324) | (0.0382) | (0.0322) | (0.00838) | (12.28) | (1.371) | (0.0141)
Bias-corrected | -0.648** | -0.131*** | 0.0315 | 0.00369 | 12.61 | 0.325 | -0.00383
(s.e.) | (0.324) | (0.0382) | (0.0322) | (0.00838) | (12.28) | (1.371) | (0.0141)
Robust | -0.648* | -0.131*** | 0.0315 | 0.00369 | 12.61 | 0.325 | -0.00383
(s.e.) | (0.366) | (0.0450) | (0.0370) | (0.0125) | (13.99) | (1.584) | (0.0179)

Mean DV | 6.981 | 0.830 | 0.0265 | 0.00773 | 8.866 | 1.575 | 0.0150
s.d. DV | 5.044 | 0.376 | 0.172 | 0.0921 | 71.42 | 25.28 | 0.217
Polynomial | 1 | 1 | 1 | 1 | 1 | 1 | 1
Bandwidth | 3.312 | 3.767 | 2.263 | 4.773 | 1.560 | 2.741 | 2.881
Eff. N | 5618 | 6648 | 3440 | 9164 | 2180 | 4379 | 4666

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). In Panel A, the coefficient RD Estimate captures the difference between the program areas (treatment) and the neighboring areas in the former European settler area (counterfactual). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence the variable is higher in the program areas than in the counterfactual areas. In Panel B, the coefficient RD Estimate captures the difference between the program areas (treatment) and the former African Land Units (control). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence the variable is higher in the program areas than in the former African Land Units.
Data sources: Population data and building data (2015) are extracted from CIESIN (2016). The number of schools and the number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013).
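The three rows of Table A-3.13 follow the usual rdrobust reporting (Calonico et al., 2017). The Conventional row pairs the conventional estimate with its standard error. The Bias-corrected row subtracts an estimated bias from the point estimate but keeps the conventional standard error. The Robust row uses the bias-corrected estimate with a standard error that also accounts for bias estimation, which is why the last two rows share point estimates while the first two share standard errors. The sketch below recomputes normal-approximation p-values and 95% confidence intervals from the reported (rounded) numbers. The significance thresholds are assumed to match the other tables (* p<0.10, ** p<0.05, *** p<0.01), and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def normal_inference(estimate, se, level=0.95):
    """z statistic, two-sided p-value and confidence interval (normal approximation)."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return z, p, (estimate - crit * se, estimate + crit * se)


def stars(p):
    return "***" if p < 0.01 else "**" if p < 0.05 else "*" if p < 0.10 else ""


# Table A-3.13, Panel A, column (1) (population), as reported (rounded values).
panel_a_population = {
    "Conventional": (0.273, 0.112),
    "Bias-corrected": (0.316, 0.112),
    "Robust": (0.316, 0.132),
}
results = {}
for row, (est, se) in panel_a_population.items():
    z, p, ci = normal_inference(est, se)
    results[row] = (stars(p), ci)
    print(f"{row:<15} z={z:5.2f} p={p:.4f} {stars(p):<3} 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

The recomputed stars match the table in this column. Because the Robust row's interval is centered on the bias-corrected estimate and is wider, it is the natural one to report as a confidence interval.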

CONCLUSION

What makes findings surprising? Evidence from a PhD dissertation.

In this conclusion, I test whether the main findings of this PhD are surprising. Using a novel dataset on researchers’ priors, I compare the findings of this PhD to researchers’ priors to assess the degree of unexpectedness of the findings. I develop a model in which researchers’ priors can be informed by their own experiences (anecdote-based priors), by their knowledge of the literature (evidence-based priors), or not informed at all (uninformed priors). The results suggest that whether findings are surprising depends on the distribution of researchers’ prior types.

Priors were measured using the seminar questions and research discussions of a representative sample of the researchers who attend the same seminars and conferences as I do. This measure is likely to suffer from severe measurement bias that should be attributed, for the most part, to recall bias. This PhD provided the ideal setting for a study of researchers’ priors, as the topics it engages with helped make researchers’ priors observable. First, everybody has some experience (direct or indirect) of marriage or divorce.68 During private discussions, researchers made their priors explicit, and these priors were likely to be based on their personal experiences. Second, there seems to be a very strong association between Africa and ethnicity among researchers outside of development economics, so the topic of ethnicity comes up rather often during seminars and conferences. (The strength of this association could be tested using an Implicit Association Test.) Throughout this conclusion, I use the words “surprising” and “unexpected” interchangeably.

Chapter 1 concludes that i) interethnic marriages are far from rare and ii) interethnic marriages have become more common. This finding runs counter to what can be derived from most of the economic literature on ethnicity in Africa. The average economist appears surprised by the

68 It was hence easy to interview a very diverse pool of respondents, as respondents were often interested in the interview topics and felt confident that they were knowledgeable about them. One exception was older men in polygamous marriages.

156 Conclusion

magnitude of the effect (1 out of 5 women is in an interethnic marriage), thus indicating that it is indeed an unexpected finding.

Chapter 2 concludes that children’s likelihood of having ever attended school is not negatively affected by their parents’ divorce. Is this a surprising finding? On the one hand, the literature on divorce tends to find negative associations between divorce and educational outcomes. On the other hand, the outcome we study is a basic investment in education, which we would not expect to be affected by divorce in developed settings. Ultimately, whether this finding is considered surprising depends on individuals’ priors on divorce, conflict, and education.

Chapter 3 tentatively concludes that ethnic homogenization does not result in more public good provision if the homogenization process also disrupts (co-ethnic) social networks. This finding is a recent development of the paper, hence data on priors are too limited to draw conclusions about its unexpectedness. One could nevertheless argue that documenting the drastic change in ethnic diversity after the program is an interesting contribution.

Working with surprising results: A guide

This PhD explores three research questions whose answers turned out to be rather unexpected. It has hence been an intense experience in abduction economics (Heckman and Singer, 2017).69 One of the great joys of this PhD has been to tackle uninformed priors (or prenotions; Durkheim, 1987). One of the worst research-related fears that I experienced during this PhD was wondering whether the “surprising findings” reflected the state of the world or errors.70 Perhaps unsurprisingly, this combination resulted in another great joy of this PhD: engaging with the topic of research integrity and replicability.

As a word of conclusion regarding this dissertation, I wish to share a story about measurement error (as well as, of course, marriage): I was in Nakuru, Kenya, when the 2019 census was conducted. As per the guidelines of the Kenya National Bureau of Statistics (KNBS), I thus had to be counted. Having worked on marriages, divorces, ethnicity, and land, I now had to answer questions on these topics myself. Racked with guilt over the idea of making a false statement in the census, I told the census enumerator that I was not married. That might have been an error. I then answered the other census questions while the enumerator was trying to

69 “Abduction is the process of generating and revising models, hypotheses and data analyzed in response to surprising findings.” (Heckman and Singer, 2017). The authors advocate abduction as a strategy for reacting to surprise.

70 PEBCAK errors (Problem Exists Between Chair And Keyboard) are one of the most common sources of errors. When facing a coding error, I find great comfort in Nick Eubank’s words: “It’s natural to think that the reason we find problems in the code behind published papers is carelessness or inattention on behalf of authors, and that the key to minimizing problems in our code is to be more careful. The truth, I have come to believe, is more subtle: humans are effectively incapable of writing error-free code.” (Eubank, 2016)


convince me of his interest in getting to know me. That effort came at the expense of his work as an enumerator: I noticed that most of my answers were not recorded accurately. Apologies to anyone who comes across a weird outlier in central Kenya.

REFERENCES

Ahlerup, P. and Olsson, O., 2012. “The roots of ethnic diversity.” Journal of Economic Growth 17(2), 71–102.

Akresh, R., De Walque, D., and Kazianga, H., 2016. Evidence from a randomized evaluation of the household welfare impacts of conditional and unconditional cash transfers given to mothers or fathers. The World Bank.

Alesina, A., Baqir, R., and Easterly, W., 1999. “Public goods and ethnic divisions.” Quarterly Journal of Economics 114(4), 1243–1284.

Alesina, A., Devleeschauwer, A., Easterly, W., Kurlat, S., and Wacziarg, R., 2003. “Fractionalization.” Journal of Economic Growth 8(2), 155–194.

Alesina, A. and La Ferrara, E., 2000. “Participation in heterogeneous communities.” The Quarterly Journal of Economics 115(3), 847–904.

—, 2005. “Ethnic diversity and economic performance.” Journal of Economic Literature 43(3), 762–800.

Alesina, A., Michalopoulos, S., and Papaioannou, E., 2016. “Ethnic Inequality.” Journal of Political Economy 124(2), 428–488.

Amato, P. R., 2000. “The Consequences of Divorce for Adults and Children.” Journal of Marriage and Family 62(4), 1269–1287.

Ambler, C. H., 1988. Kenyan communities in the age of imperialism: the central region in the late nineteenth century. Yale historical publications; v.136, New Haven; London: Yale University Press, x, 181 pages.

Ambwere, S., 2003. Policy Implications of Land Subdivision in Settlement Areas: A Case Study of Lumakanda Settlement Scheme. Thesis.

Anderson, S. and Bidner, C., 2021. “Family Institutions.” Prepared for the Handbook of Family Economics.

Andre, P. and Demonsant, J.-L., 2014. “Substitution between Formal and Qur’anic Schools in Senegal.” The Review of Faith and International Affairs 12(2), 61–65.

Baland, J.-M., Bonjean, I., Guirkinger, C., and Ziparo, R., 2016. “The economic consequences of mutual help in extended families.” Journal of Development Economics 123, 38–56.


160 References

Baldwin, K. and Huber, J. D., 2010. “Economic versus cultural differences: Forms of ethnic diversity and public goods provision.” American Political Science Review 104(4), 644–662.

Bazzi, S., Gaduh, A., Rothenberg, A. D., and Wong, M., 2019. “Unity in Diversity? How Intergroup Contact Can Foster Nation Building.” American Economic Review 109(11), 3978–4025.

Beck, S., Vreyer, P. D., Lambert, S., Marazyan, K., and Safir, A., 2015. “Child fostering in Senegal.” Journal of Comparative Family Studies 46(1), 57–73.

Becker, G. S., 1973. “A theory of marriage: Part I.” Journal of Political Economy 81(4), 813–846.

Becker, S. O., Grosfeld, I., Grosjean, P., Voigtlander, N., and Zhuravskaya, E., 2020. “Forced migration and human capital: Evidence from post-WWII population transfers.” American Economic Review 110(5), 1430–63.

Beegle, K., De Weerdt, J., and Dercon, S., 2006. “Orphanhood and the Long-Run Impact on Children.” American Journal of Agricultural Economics 88(5), 1266–1272.

—, 2010. “Orphanhood and human capital destruction: Is there persistence into adulthood?” Demography 47(1), 163–180.

Berge, L. I. O., Bjorvatn, K., Galle, S., Miguel, E., Posner, D. N., Tungodden, B., and Zhang, K., 2018. “Ethnically Biased? Experimental Evidence from Kenya.” Journal of the European Economic Association.

Bertrand-Dansereau, A. and Clark, S., 2016. “Pragmatic tradition or romantic aspiration? The causes of impulsive marriage and early divorce among women in rural Malawi.” Demographic Research 35, 47–80.

Bisin, A. and Verdier, T., 2000. “‘Beyond the melting pot’: cultural transmission, marriage, and the evolution of ethnic and religious traits.” The Quarterly Journal of Economics 115(3), 955–988.

Bjorklund, A. and Sundstrom, M., 2006. “Parental separation and children’s educational attainment: A siblings analysis on Swedish register data.” Economica 73(292), 605–624.

Boesen, J., 2019. Tanzania: from ujamaa to villagization. University of Toronto Press.

Boone, C., Lukalo, F., and Joireman, S. F., 2021. “Promised Land: Settlement Schemes in Kenya, 1962 to 2016.” Political Geography 89, 102393.

Boubacar, N. and Francois, R., 2007. Senegal, Country case study. Country Profile commissioned for the EFA Global Monitoring Report 2007, Strong foundations: early childhood care and education.

Bratberg, E., Rieck, K. M. E., and Vaage, K., 2014. “Intergenerational earnings mobility and divorce.” Journal of Population Economics 27(4), 1107–1126.

Burgess, R., Jedwab, R., Miguel, E., Morjaria, A., and Padro i Miquel, G., 2015. “The value of democracy: evidence from road building in Kenya.” American Economic Review 105(6), 1817–51.

Calonico, S., Cattaneo, M. D., Farrell, M. H., and Titiunik, R., 2017. “Rdrobust: Software for Regression-discontinuity Designs.” The Stata Journal 17(2), 372–404.

Canut, C., 2002. “Langues et filiation en Afrique.” Les Temps Modernes (4), 410–440.


Case, A. and Ardington, C., 2006. “The impact of parental death on school outcomes: Longitudinal evidence from South Africa.” Demography 43(3), 401–420.

Chae, S., 2016. “Parental divorce and children’s schooling in rural Malawi.” Demography 53(6), 1743–1770.

Chandra, K., 2006. “What is ethnic identity and does it matter?” Annu. Rev. Polit. Sci. 9, 397–424.

Charnysh, V. and Peisakhin, L., 2021. “The Role of Communities in the Transmission of Political Values: Evidence from Forced Population Transfers.” British Journal of Political Science, 1–21.

Chehami, J., 2016. “Les familles et le daara au Senegal.” Afrique contemporaine (1), 77–89.

Churchill, S. A. and Smyth, R., 2017. “Ethnic diversity and poverty.” World Development 95, 285–302.

CIESIN, 2016. “High Resolution Settlement Layer (HRSL).” Source imagery for HRSL © 2016 DigitalGlobe.

Cisse, F., Daffe, G., and Diagne, A., 2004. “Les inegalites dans l’acces a l’education au Senegal.” Revue d’economie du developpement 12(2), 107–122.

Clark, S. and Brauner-Otto, S., 2015. “Divorce in sub-Saharan Africa: Are Unions Becoming Less Stable?” Population and Development Review 41(4), 583–605.

Clark, S. and Hamplova, D., 2013. “Single motherhood and child mortality in sub-Saharan Africa: A life course perspective.” Demography 50(5), 1521–1549.

Clark, S., Kabiru, C., and Mathur, R., 2010. “Relationship transitions among youth in urban Kenya.” Journal of Marriage and Family 72(1), 73–88.

Conversi, D., 2010. “Cultural homogenization, ethnic cleansing, and genocide.” In “Oxford Research Encyclopedia of International Studies.”

Crespin-Boucaud, J., 2020. “Interethnic and interfaith marriages in sub-Saharan Africa.” World Development 125, 104668.

Crespin-Boucaud, J. and Hotte, R., 2021. “Parental divorces and children’s educational outcomes in Senegal.” World Development 145, 105483.

de la Cuesta, B. and Wantchekon, L., 2016. “Is Language Destiny? The Origins and Consequences of Ethnolinguistic Diversity in Sub-Saharan Africa.” In “The Palgrave Handbook of Economics and Language,” Springer, pages 513–537.

De Vreyer, P., Lambert, S., Safir, A., and Sylla, M., 2008. “Pauvrete et structure familiale, pourquoi une nouvelle enquete.” Stateco (102), 261–275.

Department of Settlement, 1974. Annual Report 1974. Department of Settlement.

—, various. Annual Reports. Department of Settlement.

Desmet, K., Ortuno-Ortin, I., and Wacziarg, R., 2016. “Linguistic cleavages and economic development.” In “The Palgrave Handbook of Economics and Language,” Springer, pages 425–446.

Di Matteo, F., 2019. Decolonising Property in Kenya?: Tracing Policy Processes of Kenyan Contemporary Land Reform (1990s-2016). A Study of the Politicization of Decision-Making in Historical Perspective. Ph.D. thesis, Paris, EHESS.

Dial, F. B., 2008. Mariage et divorce a Dakar: itineraires feminins. KARTHALA Editions.

Djuikom, M. A. and van de Walle, D. P., 2018. “Marital Shocks and Women’s Welfare in Africa.” World Bank Policy Research Working Paper (8306).

Doss, C., 2013. “Intrahousehold bargaining and resource allocation in developing countries.”The World Bank Research Observer 28(1), 52–78.

Dulani, B., Harris, A. S., Horowitz, J., and Kayuni, H., 2021. “Electoral preferences among multiethnic voters in Africa.” Comparative Political Studies 54(2), 280–311.

Dumas, C. and Lambert, S., 2011. “Educational Achievement and Socio-economic Background: Causality and Mechanisms in Senegal.” Journal of African Economies 20(1), 1–26.

Dupuy, A. and Galichon, A., 2014. “Personality traits and the marriage market.” Journal of Political Economy 122(6), 1271–1319.

Durkheim, E., 1987. “Les regles de la methode sociologique (1895).” Paris: PUF.

Easterly, W. and Levine, R., 1997a. “Africa’s growth tragedy: policies and ethnic divisions.” The Quarterly Journal of Economics 112(4), 1203–1250.

—, 1997b. “Africa’s growth tragedy: policies and ethnic divisions.” Quarterly Journal of Economics 112(4), 1203–1250.

Eifert, B., Miguel, E., and Posner, D. N., 2010. “Political competition and ethnic identification in Africa.” American Journal of Political Science 54(2), 494–510.

Ermisch, J. F. and Francesconi, M., 2001. “Family structure and children’s achievements.” Journal of Population Economics 14(2), 249–270.

Eshiwani, G. S., 1990. Implementing Educational Policies in Kenya. World Bank Discussion Papers No. 85, Washington, D. C.: World Bank.

Eubank, N., 2016. “Embrace Your Fallibility: Thoughts on Code Integrity.” https://www.nickeubank.com/wp-content/uploads/2016/06/Eubank_EmbraceYourFallibility.pdf. Accessed: 2021-07-14.

Eubank, N. et al., 2019. “Social networks and the political salience of ethnicity.” Quarterly Journal of Political Science 14(1), 1–39.

Vanden Eynde, O., Kuhn, P. M., and Moradi, A., 2018. “Trickle-Down Ethnic Politics: Drunk and Absent in the Kenya Police Force (1957-1970).” American Economic Journal: Economic Policy 10(3), 388–417.

Fafchamps, M. and Quisumbing, A. R., 2007. “Household formation and marriage markets in rural areas.” Handbook of Development Economics 4, 3187–3247.

FAO, 2000. “Agricultural Fields: Kenya, 2000.”

FAO/IIASA, 2011. “Global Agro-ecological Zones (GAEZ v3.0).” FAO Rome, Italy and IIASA, Laxenburg, Austria.

Fearon, J. D., 2003. “Ethnic and cultural diversity by country.” Journal of Economic Growth 8(2), 195–222.

Fearon, J. D. and Laitin, D. D., 1996. “Explaining interethnic cooperation.” American Political Science Review 90(4), 715–735.

Foundation, O., 2020. “Global Agro-ecological Zones (GAEZ v3.0).”

Francesconi, M., Jenkins, S. P., and Siedler, T., 2010. “Childhood family structure and schooling outcomes: evidence for Germany.” Journal of Population Economics 23(3), 1073–1103.

Francois, P., Rainer, I., and Trebbi, F., 2015. “How is power shared in Africa?” Econometrica 83(2), 465–503.

Fryer Jr, R. G., 2007. “Guess who’s been coming to dinner? Trends in interracial marriage over the 20th century.” Journal of Economic Perspectives 21(2), 71–90.

Furtado, D. and Theodoropoulos, N., 2011. “Interethnic marriage: a choice between ethnic and educational similarities.” Journal of Population Economics 24(4), 1257–1279.

Gelman, A. and Imbens, G., 2019. “Why high-order polynomials should not be used in regression discontinuity designs.” Journal of Business & Economic Statistics 37(3), 447–456.

Gershman, B. and Rivera, D., 2018. “Subnational diversity in Sub-Saharan Africa: Insights from a new dataset.” Journal of Development Economics 133, 231–263.

Gibson, J., Olivia, S., and Boe-Gibson, G., 2020. “Night lights in economics: Sources and uses.” Etudes et Documents, n°1, CERDI.

Gisselquist, R. M., Leiderer, S., and Nino-Zarazua, M., 2016. “Ethnic heterogeneity and public goods provision in Zambia: Evidence of a subnational “diversity dividend”.” World Development 78, 308–323.

Glewwe, P. and Kremer, M., 2006. “Schools, teachers, and education outcomes in developing countries.” Handbook of the Economics of Education 2, 945–1017.

Gnoumou Thiombiano, B., LeGrand, T. K., and Kobiane, J.-F., 2013. “Effects of Parental Union Dissolution on Child Mortality and Child Schooling in Burkina Faso.” Demographic Research 29, 797–816.

Goren, E., 2014. “How ethnic diversity affects economic growth.” World Development 59, 275–297.

Greenberg, J. H., 1956. “The measurement of linguistic diversity.” Language 32(1), 109–115.

Habyarimana, J., Humphreys, M., Posner, D. N., and Weinstein, J. M., 2007. “Why does ethnic diversity undermine public goods provision?” American Political Science Review 101(4), 709–725.

Hanlon, J., 1990. Mozambique: the revolution under fire. London; Atlantic Highlands, N.J.: Zed Books.

Harris, J. A. and Posner, D. N., 2019. “(Under what conditions) Do politicians reward their supporters? Evidence from Kenya’s constituencies development fund.” American Political Science Review 113(1), 123–139.

Hassan, M., 2017. “The Strategic Shuffle: Ethnic Geography, the Internal Security Apparatus, and Elections in Kenya.” American Journal of Political Science 61(2), 382–395.

Heckman, J. J. and Singer, B., 2017. “Abducting economics.” American Economic Review 107(5), 298–302.

Hjort, J., 2014. “Ethnic Divisions and Production in Firms.” The Quarterly Journal of Economics 129(4), 1899–1946.

Horowitz, J., 2019. “Ethnicity and the Swing Vote in Africa’s Emerging Democracies: Evidence from Kenya.” British Journal of Political Science 49(3), 901–921.

Hotte, R. and Marazyan, K., 2020. “Demand for insurance and within-kin-group marriages: Evidence from a West-African country.” Journal of Development Economics 146, 102489.

Jedwab, R., Kerby, E., and Moradi, A., 2017. “History, Path Dependence and Development: Evidence from Colonial Railroads, Settlers and Cities in Kenya.” Economic Journal, 1467–1494.

Jones, S., Schipper, Y., Ruto, S., and Rajani, R., 2014. “Can your child read and count? Measuring learning outcomes in East Africa.” Journal of African Economies 23(5), 643–672.

Kalmijn, M., 1998. “Intermarriage and homogamy: Causes, patterns, trends.” Annual Review of Sociology 24(1), 395–421.

Kalmijn, M. and Van Tubergen, F., 2006. “Ethnic intermarriage in the Netherlands: Confirmations and refutations of accepted insights.” European Journal of Population/Revue europeenne de demographie 22(4), 371–397.

Kanbur, R., Rajaram, P. K., and Varshney, A., 2011. “Ethnic diversity and ethnic strife. An interdisciplinary perspective.” World Development 39(2), 147–158.

Kenya, 1971. An Economic Appraisal of the Settlement Schemes 1964/65 - 1967/68. Statistics Division and Ministry of Finance and Economic Planning.

—, 1980. Educational Trends 1973-77. Nairobi: Central Bureau of Statistics, Ministry of Economic Planning and Community Affairs.

Kenya National Bureau of Statistics, 1926. “Colony & Protectorate of Kenya: Plans Showing Administrative Boundaries.” Report, Nairobi.

—, 2015. “Kenya Demographic and Health Survey 2014.” Report.

Kenya National Bureau of Statistics, 2015/2016. “Kenya Integrated Household Budget Survey.”

Kramon, E. and Posner, D. N., 2016. “Ethnic Favoritism in Education in Kenya.” Quarterly Journal of Political Science 11(1), 1–58.

Lagoutte, S., Bengaly, A., Youra, B., Fall, P. T., and Danish Institute for Human Rights, 2014. Rupture du lien matrimonial, pluralisme juridique et droits des femmes en Afrique de l’Ouest francophone. Copenhagen: Danish Institute for Human Rights. OCLC: 900293711.

Lambert, S., van de Walle, D., and Villar, P., 2019. Towards Gender Equity in Development, chap. Marital trajectories, women’s autonomy and women’s wellbeing in Senegal. Oxford: Oxford University Press.

Le Forner, H., 2020. “Age At Parents’ Separation and Children Achievement: Evidence From France Using a Sibling Approach.” Annals of Economics and Statistics (forthcoming).

Leo, C., 1984. Land and class in Kenya. Political economy of world poverty, Toronto: Univ. of Toronto Press, 244 pages.

Leys, C., 1974. “Interpreting African Underdevelopment: Reflections on the ILO Report on Employment, Incomes and Equality in Kenya.” Manpower and Unemployment Research in Africa, 19–28.

—, 1975. Underdevelopment in Kenya: the political economy of neo-colonialism, 1964-1971. Berkeley: University of California Press.

Locoh, T. and Thiriat, M.-P., 1995. “Divorce et remariage des femmes en Afrique de l’Ouest. Le cas du Togo.” Population, 61–93.

Lorgen, C. C., 2000. “Villagisation in Ethiopia, Mozambique, and Tanzania.” Social Dynamics 26(2), 171–198.

Lowes, S., Nunn, N., Robinson, J. A., and Weigel, J., 2015. “Understanding Ethnic Identity in Africa: Evidence from the Implicit Association Test (IAT).” American Economic Review 105(5), 340–45.

Lukalo, Boone, Browne, and Joireman, 2019. “Kenya Settlement Schemes Data Project.” London, Nairobi, and Richmond: NCL, LSE, and UoR.

Lukalo, F. and Odari, S., 2016. “Exploring the Status of Settlement Schemes in Kenya.”

Luke, N. and Munshi, K., 2006. “New roles for marriage in urban Africa: Kinship networks and the labor market in Kenya.” The Review of Economics and Statistics 88(2), 264–282.

Lynch, G., 2011. I Say to You: ethnic politics and the Kalenjin in Kenya. University of Chicago Press.

Mack, R., 1970. “The great African cattle plague epidemic of the 1890’s.” Tropical Animal Health and Production 2(4), 210–219.

Marazyan, K., 2015. “Resource Allocation in Extended Sibships: An Empirical Investigation for Senegal.” Journal of African Economies 24(3), 416–452.

Maron, M., 2013. “Kenya Election data.”

Mayrargue, C., 2004. “Trajectoires et enjeux contemporains du pentecotisme en Afrique de l’Ouest.” Critique internationale (1), 95–109.

McCauley, J. F., 2014. “The political mobilization of ethnic and religious identities in Africa.” American Political Science Review 108(4), 801–816.

Menon, N., Van Der Meulen Rodgers, Y., and Nguyen, H., 2014. “Women’s land rights and children’s human capital in Vietnam.” World Development 54, 18–31.

Meyer, B., 2004. “Christianity in Africa: From African independent to Pentecostal-charismatic churches.” Annu. Rev. Anthropol. 33, 447–474.

Michalopoulos, S., 2012. “The origins of ethnolinguistic diversity.” American Economic Review 102(4), 1508–39.

Miguel, E., 2004. “Tribe or nation? Nation building and public goods in Kenya versus Tanzania.” World Politics 56(3), 327–362.

Miguel, E. and Gugerty, M. K., 2005. “Ethnic diversity, social sanctions, and public goods in Kenya.” Journal of Public Economics 89(11–12), 2325–2368.

Miho, A., Jarotschkin, A., and Zhuravskaya, E., 2019. “Diffusion of Gender Norms: Evidence from Stalin’s Ethnic Deportations.” Available at SSRN 3417682.

Milanovic, B., 2003. Is inequality in Africa really different? Washington, D.C.: World Bank Development Research Group Poverty Team, 43 pages.

Miles, W. F. and Rochefort, D. A., 1991. “Nationalism versus ethnic identity in sub-Saharan Africa.” American Political Science Review 85(2), 393–403.

Ministry of Education, 2009. “2007 School Census.” Online Database.

Monden, C. W. and Smits, J., 2005. “Ethnic intermarriage in times of social change: The case of Latvia.” Demography 42(2), 323–345.

Montalvo, J. G. and Reynal-Querol, M., 2005. “Ethnic polarization, potential conflict, and civil wars.” American Economic Review 95(3), 796–816.

Morgan, W. and Shaffer, N. M., 1966. Population of Kenya: Density and Distribution. Nairobi: Oxford University Press.

Morgan, W. T. W., 1963. “The ’White Highlands’ of Kenya.” The Geographical Journal 129(2), 140–155.

Mozaffar, S., Scarritt, J. R., and Galaich, G., 2003. “Electoral institutions, ethnopolitical cleavages, and party systems in Africa’s emerging democracies.” American Political Science Review 97(3), 379–390.

Mwiria, K., 1991. “Education for subordination: African education in colonial Kenya.” History of Education 20(3), 261–273.

Medard, C. and Golaz, V., 2011. “Les frontieres interieures du Kenya : une contrainte pour l’acces a la terre.” CERISCOPE Frontieres.

Ndaruhutse, S., Branelly, L., Latham, M., and Penson, J., 2008. Grade repetition in primary schools in Sub-Saharan Africa: an evidence base for change. CfBT Education Trust Reading, UK.

Ngau, P. M., 1987. “Tensions in Empowerment: The Experience of the “Harambee” (Self-Help) Movement in Kenya.” Economic Development and Cultural Change 35(3), 523–538.

Nottidge, C. P. R. and Goldsack, J. R., 1966. The Million-Acre Settlement Scheme 1962-1966. Department of Settlement.

Parsons, T., 2012. “Being Kikuyu in Meru: Challenging the Tribal Geography of Colonial Kenya.” The Journal of African History 53(1), 65–86.

Polian, P. M., 2004. Against their will: the history and geography of forced migrations in the USSR. Budapest; New York: Central European University Press.

Posner, D. N., 2004. “Measuring ethnic fractionalization in Africa.” American Journal of Political Science 48(4), 849–863.

—, 2005. Institutions and ethnic politics in Africa. Cambridge University Press.

Qian, Z. and Lichter, D. T., 2007. “Social boundaries and marital assimilation: Interpreting trends in racial and ethnic intermarriage.” American Sociological Review 72(1), 68–94.

—, 2011. “Changing patterns of interracial marriage in a multiracial society.” Journal of Marriage and Family 73(5), 1065–1084.

Regional Centre For Mapping Resource For Development, 2020. “Kenya SRTM DEM 30meters.” http://opendata.rcmrd.org/datasets/kenya-srtm-dem-30meters.

Republic of Kenya, 1999. Totally Integrated Quality Education and Training TIQET - Report of the Commission of Inquiry into the Education System of Kenya. Nairobi: Republic of Kenya.

Rhode, P. W. and Strumpf, K. S., 2003. “Assessing the importance of Tiebout sorting: Local heterogeneity from 1850 to 1990.” American Economic Review 93(5), 1648–1677.

Ruthenberg, H. and Ifo-Institut fur Wirtschaftsforschung (Afrika-Studienstelle), 1966. African agricultural production development policy in Kenya, 1952-1965. Berlin: Springer-Verlag.

Simons, G. F. and Fennig, C. D., 2017. “Ethnologue: Languages of the world.” SIL International 20.

Simson, R., 2018. “Ethnic (in)equality in the public services of Kenya and Uganda.” African Affairs 118(470), 75–100.

Simson, R. and Green, E., 2020. “Ethnic favouritism in Kenyan education reconsidered: when a picture is worth more than a thousand regressions.” The Journal of Modern African Studies 58(3), 425–460.

Smith-Greenaway, E., 2020. “Does Parents’ Union Instability Disrupt Intergenerational Advantage? An Analysis of Sub-Saharan Africa.” Demography, 1–29.

Sporlein, C., Schlueter, E., and van Tubergen, F., 2014. “Ethnic intermarriage in longitudinal perspective: Testing structural and cultural explanations in the United States, 1880–2011.” Social Science Research 43, 1–15.

Survey of Kenya, 1959. Atlas of Kenya: a comprehensive series of new and authentic maps prepared from the national survey and other governmental sources; with gazetteer and notes on pronunciation & spelling. 1st ed. Nairobi: Printed by the Survey of Kenya, ix l., 44 l. of maps (part col.).

Tiebout, C. M., 1956. “A pure theory of local expenditures.” Journal of Political Economy 64(5), 416–424.

Troup, L., 1953. Inquiry into the General Economy of Farming in the Highlands having regard to Capital and Long- and Short-term Financial Commitments, whether Secured or Unsecured, excluding Farming Enterprises solely concerned with the Production of Sisal, Wattle, Tea and Coffee. Nairobi: Government Printer.

U.S. Board on Geographic Names, 2018. “Gazetteer Kenya.” http://geonames.nga.mil/gns/html/namefiles.html.

van de Walle, D., 2013. “Lasting Welfare Effects of Widowhood in Mali.” World Development 51, 1–19.

Van der Gaag, J. and Adams, A., 2010. “Where Is the Learning? Measuring Schooling Efforts in Developing Countries. Policy Brief 2010-04.” Brookings Institution.

World Bank, 1973. “Agricultural Sector Survey - Kenya.” Report 254a-KE, Eastern Africa Projects Department.

World Resources Institute, 2007. Nature’s Benefits in Kenya, An Atlas of Ecosystems and Human Well-Being. World Resources Institute.

Youe, C. P., 1988. “Settler Capital and the Assault on the Squatter Peasantry in Kenya’s Uasin Gishu District, 1942-63.” African Affairs 87(348), 393–418.